Developing a Machine Learning-Based Optical Character Recognition System to Convert Text Images into Audio for the Visually Impaired
DOI: https://doi.org/10.47611/jsrhs.v14i1.8530
Keywords: Optical Character Recognition, Text-To-Speech, Audiobooks
Abstract
Visually impaired individuals face significant challenges in reading traditional printed books, which convey written content through visual means. Without sight, they cannot read the text directly, so conventional reading is impossible. As a result, visually impaired people often depend on assistive technologies such as screen readers and audiobooks to access written material. However, converting printed books into audiobooks is time-consuming, labor-intensive, and expensive. It involves hiring professional narrators, recording the entire book, and editing the audio for clarity and quality, all of which require significant human and technical resources and drive up costs. To address this problem, I propose a machine learning-based Optical Character Recognition (OCR) system that converts text images into audio. The proposed system uses convolutional neural networks (CNNs) and long short-term memory (LSTM) networks for accurate text recognition and conversion. The approach achieved a word-based exact matching score of 93.724. Furthermore, I implemented the system on a low-cost embedded board to demonstrate its feasibility in real-world scenarios. I expect that this approach can help visually impaired individuals access written content more easily and affordably.
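The abstract reports a word-based exact matching score of 93.724 but does not give its formula. The following is a minimal sketch that assumes the score is the percentage of word samples (for example, cropped word images) whose recognized string matches the ground-truth label character for character. The function name `word_exact_match` is hypothetical and does not come from the paper.

```python
def word_exact_match(predictions, references):
    """Hypothetical helper: percentage of word samples whose predicted
    string is identical to the ground-truth string (no partial credit)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one sample is required")
    # Count samples where the whole word was recognized exactly.
    matches = sum(pred == ref for pred, ref in zip(predictions, references))
    # Express as a percentage, the scale assumed for the reported 93.724.
    return 100.0 * matches / len(references)


if __name__ == "__main__":
    preds = ["reading", "books", "audlo", "signal"]
    refs = ["reading", "books", "audio", "signal"]
    print(word_exact_match(preds, refs))  # 3 of 4 words match exactly
```

Under this assumed definition, a single wrong character ("audlo" vs. "audio") makes the entire word count as a miss. This makes the metric stricter than character-level accuracy.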
Copyright (c) 2025 Juna Ariyoshi; Jane Chun

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.


