Enhanced OCR Recognition for Madurese Text Documents: A Genetic Algorithm Approach with Tesseract 5.5
DOI:
https://doi.org/10.30595/juita.v13i2.25794
Keywords:
Image Preprocessing, Optical Character Recognition, Genetic Algorithm Optimization, Madurese Language Processing, Tesseract OCR
Abstract
This study enhances Optical Character Recognition (OCR) for the Madurese language using a Genetic Algorithm (GA). It addresses the challenges of processing Madurese text documents with a nine-step image preprocessing workflow whose step order is determined by GA optimization. The nine steps are rescaling, grayscale conversion, adaptive thresholding, deskewing, median blur, Otsu thresholding, border removal, contrast enhancement, and noise reduction. The system uses the Tesseract 5.5 OCR engine configured with Vietnamese language model parameters to accommodate Madurese writing characteristics. In experiments on a dataset of 500 images, the GA-optimized preprocessing sequence achieved a 24.32% Word Error Rate (WER) and a 7.47% Character Error Rate (CER), a substantial improvement over the baseline Tesseract implementation. Further optimization through language model selection, particularly the Occitan (OCI) model, yielded 100% accuracy in specific test cases. The research also explored several fitness function configurations, of which a 0.7:0.3 WER-to-CER ratio proved most effective. These results demonstrate the potential of GA optimization to enhance OCR performance for regional languages with unique characteristics, contributing to the broader field of document digitization and language preservation.
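The abstract reports WER, CER, and a 0.7:0.3 WER-to-CER fitness ratio but does not give the formula itself. The following is a minimal sketch of one plausible reading, assuming the fitness is a weighted sum of the two error rates, that lower values are better, and that both rates are computed from Levenshtein distance against a reference transcription. The function names and the sample strings are illustrative and do not come from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def fitness(reference: str, hypothesis: str,
            w_wer: float = 0.7, w_cer: float = 0.3) -> float:
    """Assumed fitness: weighted sum of WER and CER (lower is better)."""
    return w_wer * wer(reference, hypothesis) + w_cer * cer(reference, hypothesis)


if __name__ == "__main__":
    # Placeholder strings, not taken from the paper's dataset.
    truth = "the quick brown fox"
    ocr_output = "the quick brwn fox"
    print(f"WER={wer(truth, ocr_output):.4f}")
    print(f"CER={cer(truth, ocr_output):.4f}")
    print(f"fitness={fitness(truth, ocr_output):.4f}")
```

In a GA over preprocessing orders, a score like this would be averaged over the OCR output of each candidate sequence and used to rank candidates. Weighting WER more heavily penalizes sequences that damage whole words even when most characters survive.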
Copyright (c) 2025 JUITA: Jurnal Informatika

This work is licensed under a Creative Commons Attribution 4.0 International License.
