Levenshtein and Similar Text PHP Functions for Correcting Typographical Errors

Dika Rizky Yunianto
Elok Nur Hamdana
Imam Fahrur Rozi

Abstract

Typographical errors (typos) are common in both printed and digital content, including text entered on websites. One way to correct them is to use text similarity: a measure of how alike two words are serves as the basis for choosing the intended word. PHP, which is widely used in website development, provides two built-in text similarity functions, levenshtein() and similar_text(). The levenshtein() function quantifies the difference between two strings as an edit distance, whereas similar_text() measures their likeness as the number of matching characters. Combining the two makes it possible to assess both the closeness and the divergence of two strings, giving a fuller picture of their similarity. In empirical testing, the combination achieved a precision of 85%, higher than the precision of either levenshtein() or similar_text() used alone. The approach can help improve the accuracy of textual content on websites and support error-free, professional web-based communication.
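
The idea of combining the two functions can be illustrated with a minimal PHP sketch. The abstract does not state how the two scores are combined, so the ranking rule used here (lowest levenshtein() distance first, ties broken by the higher similar_text() percentage) is an assumption for illustration only, as are the helper name suggestCorrection() and the sample dictionary.

```php
<?php
// Hypothetical helper: the ranking rule is an assumption, not the authors' method.
// Returns the dictionary word that best matches $input, or null if the dictionary is empty.
function suggestCorrection(string $input, array $dictionary): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    $bestPercent = -1.0;

    foreach ($dictionary as $candidate) {
        // levenshtein(): minimum number of single-character edits (lower means closer).
        $distance = levenshtein($input, $candidate);

        // similar_text(): returns the count of matching characters and writes
        // the similarity as a percentage into $percent (higher means closer).
        similar_text($input, $candidate, $percent);

        // Prefer the smaller edit distance; on a tie, prefer the higher percentage.
        if ($distance < $bestDistance
            || ($distance === $bestDistance && $percent > $bestPercent)) {
            $best = $candidate;
            $bestDistance = $distance;
            $bestPercent = $percent;
        }
    }

    return $best;
}

// Sample dictionary (illustrative only).
$dictionary = ['website', 'webinar', 'whitest'];
echo suggestCorrection('wesbite', $dictionary), PHP_EOL;
```

With this sample dictionary the script prints website, because that candidate has the smallest edit distance to the misspelled input. A real corrector would need a full word list and, most likely, a threshold below which no suggestion is offered.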

Article Details

Section
Informatics
