Developing a Machine Learning-Based Optical Character Recognition System to Convert Text Images into Audio for the Visually Impaired

Authors

  • Juna Ariyoshi, Taejon Christian International School
  • Jane Chun, Taejon Christian International School

DOI:

https://doi.org/10.47611/jsrhs.v14i1.8530

Keywords:

Optical Character Recognition, Text-To-Speech, Audiobooks

Abstract

Visually impaired individuals face significant challenges in reading traditional printed books, because printed text can be accessed only by sight. As a result, they often depend on assistive technologies such as screen readers and audiobooks to access written material. However, converting printed books into audiobooks is time-consuming, labor-intensive, and expensive: it involves hiring professional narrators, recording the entire book, and editing the audio to ensure clarity and quality. To address this problem, I propose a machine learning-based Optical Character Recognition (OCR) system that converts text images into audio signals. The proposed system uses convolutional neural networks and long short-term memory (LSTM) networks for accurate text recognition and conversion. The approach achieved a word-based exact matching score of 93.724. Furthermore, I implemented the system on a low-cost embedded board to demonstrate its feasibility and applicability in real-world scenarios. I expect that this approach can help visually impaired individuals access written content more easily and affordably.
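
The abstract does not define how the word-based exact matching score is computed. One common reading, shown here only as a minimal sketch under that assumption, is the percentage of word images whose predicted transcription equals the ground-truth word character for character. The function name `word_exact_match` and the sample words are hypothetical and are not taken from the paper.

```python
def word_exact_match(predictions, references):
    """Return the percentage of words predicted exactly right.

    Assumption: predictions[i] is the recognized text for the same
    word image whose ground-truth label is references[i].
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one reference word is required")
    # A word counts only if every character matches (case-sensitive).
    matches = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * matches / len(references)


if __name__ == "__main__":
    # Hypothetical sample: the third prediction confuses the letter O with zero.
    preds = ["machine", "learning", "0CR", "system"]
    refs = ["machine", "learning", "OCR", "system"]
    print(f"{word_exact_match(preds, refs):.1f}")  # prints 75.0
```

Under this definition a single wrong character makes the whole word count as a miss, so the score is stricter than a character-level accuracy computed on the same output.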

Published

02-28-2025

How to Cite

Ariyoshi, J., & Chun, J. (2025). Developing a Machine Learning-Based Optical Character Recognition System to Convert Text Images into Audio for the Visually Impaired. Journal of Student Research, 14(1). https://doi.org/10.47611/jsrhs.v14i1.8530

Section

HS Research Projects