Pattern Matching Algorithms for Optimizing the Accuracy of Optical Character Recognition in Automated Migrant Worker Registration Systems

DOI: https://doi.org/10.33650/jeecom.v8i1.14487
Authors

(1) * Ferry Wiranto (Institut Teknologi dan Sains Mandala, Indonesia)
(*) Corresponding Author

Abstract


Manual registration of migrant workers takes 10-15 minutes per person and produces data error rates of up to 15%, which limits service efficiency in protection organizations. This research addresses the problem by developing a domain-specific Optical Character Recognition (OCR) system optimized with multiple pattern matching algorithms tailored to Indonesian identity documents. Unlike general-purpose OCR approaches, the system implements six pattern variations for extracting the RT/RW field (neighborhood and community unit numbers) and three hybrid strategies (direct, fuzzy, and contextual matching) for occupation fields, designed specifically to handle format inconsistencies in KTP (national identity card) and KK (family card) documents. Testing on 50 document samples yielded accuracy between 75% and 95% depending on field type, with the multiple-pattern approach showing a 30.8% improvement over single-pattern methods for RT/RW fields and a 20% improvement for occupation fields. Real-world deployment at Migrant Care Jember produced measurable operational gains: a 67% reduction in processing time (from 12 to 4 minutes), an 80% reduction in errors (from 15% to 3%), and a threefold increase in service capacity without additional personnel. An integrated confidence level system with visual indicators (green/yellow/red) lets non-technical users identify fields that require verification, improving practical usability. The study demonstrates that domain-specific pattern matching optimization can bridge the gap between theoretical OCR advances and practical implementation challenges in resource-constrained organizations, with direct implications for migrant worker protection services.
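
The ideas named in the abstract (several pattern variations for one field, a direct/fuzzy/contextual fallback chain, and a color-coded confidence indicator) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not list the six RT/RW patterns, the occupation vocabulary, or the confidence thresholds, so the three regular expressions, the `OCCUPATIONS` list, the 0.8 fuzzy cutoff, and the 0.85/0.60 color thresholds are hypothetical. The sketch is written in Python for brevity and is not the authors' implementation, which the keywords indicate is built on Tesseract.js.

```python
import re
from difflib import SequenceMatcher

# Hypothetical pattern variations for the RT/RW field, tried in order.
RT_RW_PATTERNS = [
    # Canonical layout, e.g. "RT/RW : 002/005"
    re.compile(r"\bRT\s*/\s*RW\s*[:.]?\s*(\d{1,3})\s*/\s*(\d{1,3})", re.I),
    # Separate labels, e.g. "RT 02 RW 05"
    re.compile(r"\bRT\s*[:.]?\s*(\d{1,3})\s*/?\s*RW\s*[:.]?\s*(\d{1,3})", re.I),
    # Common OCR confusions in the label or separator, e.g. "R7/RW: 003|011"
    re.compile(r"\bR[T7I1]\s*/\s*R[WVM]\s*[:.]?\s*(\d{1,3})\s*[/|\\]\s*(\d{1,3})", re.I),
]

# Hypothetical occupation vocabulary (a small subset chosen for illustration).
OCCUPATIONS = [
    "PETANI/PEKEBUN",
    "WIRASWASTA",
    "PELAJAR/MAHASISWA",
    "KARYAWAN SWASTA",
    "MENGURUS RUMAH TANGGA",
]


def extract_rt_rw(text):
    """Return (rt, rw) zero-padded to three digits, or None if no pattern matches."""
    for pattern in RT_RW_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1).zfill(3), match.group(2).zfill(3)
    return None


def match_occupation(text, fuzzy_cutoff=0.8):
    """Return (value, score, strategy) using direct, fuzzy, then contextual matching."""
    upper = text.upper()

    # Strategy 1 (direct): a vocabulary term appears verbatim in the OCR text.
    for occupation in OCCUPATIONS:
        if occupation in upper:
            return occupation, 1.0, "direct"

    # Strategy 2 (fuzzy): compare each line, with any label before the last
    # colon stripped, against the vocabulary and keep the best similarity.
    best, best_score = None, 0.0
    for line in upper.splitlines():
        candidate = line.split(":")[-1].strip()
        for occupation in OCCUPATIONS:
            score = SequenceMatcher(None, candidate, occupation).ratio()
            if score > best_score:
                best, best_score = occupation, score
    if best_score >= fuzzy_cutoff:
        return best, best_score, "fuzzy"

    # Strategy 3 (contextual): fall back to the raw text after the
    # "Pekerjaan" (occupation) label, with a low assumed score.
    match = re.search(r"PEKERJAAN\s*[:.]?\s*(.+)", upper)
    if match:
        return match.group(1).strip(), 0.5, "contextual"
    return None


def confidence_color(score, high=0.85, low=0.60):
    """Map a score in [0, 1] to the green/yellow/red indicator (assumed thresholds)."""
    if score >= high:
        return "green"
    if score >= low:
        return "yellow"
    return "red"
```

Under these assumptions, `extract_rt_rw("RT 02 RW 05")` falls through to the second pattern, and `match_occupation("Pekerjaan : W1RASWASTA")` is resolved by the fuzzy strategy because the misread character prevents a direct hit. Ordering the strategies from strictest to loosest means the looser fallbacks only run when the stricter ones fail, so a lower score (and a yellow or red indicator) naturally marks the fields an operator should verify.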


Keywords

Pattern Matching, Optical Character Recognition, Tesseract.js, Migrant Workers, Algorithm Optimization










Copyright (c) 2026 Ferry Wiranto

 
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).

Journal of Electrical Engineering and Computer (JEECOM)
Published by LP3M, Nurul Jadid University, Probolinggo, East Java, Indonesia.