Exploring the Application of the Whisper Model in Automatic Aphasia Speech Evaluation

Authors

  • Inpyo Lee, Northern Valley Regional High School at Demarest
  • Tracey Salerno

DOI:

https://doi.org/10.47611/jsrhs.v13i4.8083

Keywords:

Whisper model, aphasia, communication disorder, speech evaluation

Abstract

Aphasia is a communication and speech disorder that impairs an individual's ability to express themselves in everyday writing and speech. This paper explores the potential of automatic speech evaluation models, such as the Whisper model, to evaluate aphasia and potentially other speech impairments. Though effective, traditional methods for aphasia assessment are time-consuming and require specialized clinical expertise. To address these challenges, the study fine-tunes the Whisper model on the AphasiaBank dataset to create a more efficient and accessible evaluation tool. The first fine-tuning trial focused on the phonemic transcript generation part of the Whisper model and achieved a low accuracy of 56.89%, mainly because of minor token prediction errors and word omissions. The second trial focused on the model’s prediction structure, included prompt and correction tokens, and improved accuracy to 70.76%. This indicates that contextual information and correctness tokens can significantly enhance the model’s performance. Because only a sample of the AphasiaBank dataset was available for this paper, further research should train the model on the entire dataset. The results show that AI models such as the Whisper model have potential as alternative tools for aphasia testing and evaluation. Deployed as an app or tool, such a model could help remedy the scarcity of speech-language pathologists (SLPs) and make assessment more accessible to all individuals struggling with communication disorders.
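
The abstract does not state how accuracy was computed. As a rough illustration of why word omissions can depress a token-level score, the sketch below compares two hypothetical metrics on an invented example sentence: a strict position-by-position comparison and an alignment-based comparison. The function names, the metrics, and the sample tokens are assumptions for illustration only and are not taken from the paper.

```python
from difflib import SequenceMatcher


def positional_accuracy(reference, prediction):
    """Fraction of reference positions whose token matches the prediction at the same index."""
    if not reference:
        return 0.0
    matches = sum(1 for r, p in zip(reference, prediction) if r == p)
    return matches / len(reference)


def aligned_accuracy(reference, prediction):
    """Fraction of reference tokens recovered after aligning the two sequences."""
    if not reference:
        return 0.0
    matcher = SequenceMatcher(a=reference, b=prediction, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


# Hypothetical data: the prediction omits a single word ("is").
reference = ["the", "boy", "is", "kicking", "the", "ball"]
omitted = ["the", "boy", "kicking", "the", "ball"]

print(f"positional: {positional_accuracy(reference, omitted):.2f}")
print(f"aligned:    {aligned_accuracy(reference, omitted):.2f}")
```

Under these assumptions, one omitted word shifts every later token out of position, so the strict comparison credits only the first two tokens, while the aligned comparison still credits five of the six reference tokens.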

Published

11-30-2024

How to Cite

Lee, I., & Salerno, T. (2024). Exploring the Application of the Whisper Model in Automatic Aphasia Speech Evaluation. Journal of Student Research, 13(4). https://doi.org/10.47611/jsrhs.v13i4.8083

Section

HS Research Articles