Using Deep Learning to Understand and Model how a Virtual Assistant, like Siri, knows when to Act

Authors

  • Shravan Devraj, Oak Park High School, Oak Park, CA, USA
  • Ross Greer, University of California, San Diego

DOI:

https://doi.org/10.47611/jsrhs.v12i4.5243

Keywords:

machine learning, audio processing, audio signal classification, convolutional neural networks, virtual assistants

Abstract

Virtual assistants are all around us and have changed the way we interact with technology. To better understand their inner workings, we developed a deep convolutional neural network (DCNN) trained on mel spectrograms to classify audio, demonstrating and visualizing one approach that mimics the audio classification techniques virtual assistants use. We hypothesized that mel spectrograms of wake words and non-wake words can be used to classify audio accurately. Of the 85 files in our dataset, 58 were used for training and validation and 27 for testing. On the test set, our model achieved a value of 1 for precision, recall, and accuracy, classifying every wake word and non-wake word correctly.
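For readers unfamiliar with the reported metrics, the following is a minimal sketch of how precision, recall, and accuracy can be computed for a binary wake-word classifier. It is not the authors' evaluation code: the function name `binary_metrics`, the label encoding (1 for wake word, 0 for non-wake word), and the 14/13 class balance of the 27 test files are assumptions made purely for illustration, since the abstract does not report them.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, and accuracy for binary labels.

    Hypothetical helper for illustration; 'positive' marks the wake-word class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts relative to the positive (wake-word) class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Assumed test labels: 27 files, as in the abstract. The split between
# wake (1) and non-wake (0) files is NOT reported and is invented here.
y_true = [1] * 14 + [0] * 13
y_pred = list(y_true)  # a classifier that gets every test file right

scores = binary_metrics(y_true, y_pred)
print(scores)
```

With every prediction correct, there are no false positives or false negatives, so all three metrics equal 1. Because a 27-file test set is small, a single misclassified file would already move accuracy by roughly 3.7 percentage points.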

Published

11-30-2023

How to Cite

Devraj, S., & Greer, R. (2023). Using Deep Learning to Understand and Model how a Virtual Assistant, like Siri, knows when to Act. Journal of Student Research, 12(4). https://doi.org/10.47611/jsrhs.v12i4.5243

Issue

Section

HS Research Projects