DESIGN AND IMPLEMENTATION OF AN INTERACTIVE WEB-BASED DATA MINING SYSTEM USING KNN, SVM, AND RANDOM FOREST WITH STREAMLIT

DOI: https://doi.org/10.33650/codex.v1i01.14453
Authors

(1) Muhammad Faisal* (Universitas Islam Madura, Indonesia)
(2) Kholqi Maulana (Universitas Islam Madura, Indonesia)
(*) Corresponding Author

Abstract


The rapid growth of digital data requires effective tools to extract meaningful information and support decision-making. Data mining and machine learning techniques play an important role in analyzing large datasets and producing accurate classifications. However, implementing machine learning models often requires technical expertise and complex tools. This study designs and implements a web-based data mining system built on the Streamlit framework and integrated with three classification algorithms: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest. The research method comprises system design, implementation, and evaluation on three datasets: Iris, Wine, and Digits. The system provides an interactive interface that allows users to select a dataset, configure algorithm parameters, evaluate classification accuracy, and visualize the results. The implementation results show that all three algorithms perform effectively, with Random Forest achieving the highest accuracy, followed by SVM and then KNN. The developed system integrates machine learning classification methods into a user-friendly web-based platform, enabling efficient data analysis and visualization. This study demonstrates that interactive web-based data mining systems can make machine learning applications more accessible and easier to understand for both academic and practical use.
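The classification workflow summarized above (pick a dataset, choose and parameterize a classifier, measure accuracy) can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code: it assumes the three datasets are the copies bundled with scikit-learn (`load_iris`, `load_wine`, `load_digits`), a stratified 80/20 hold-out split, standardized features for the distance- and kernel-based models, and the hypothetical parameter keys `K`, `C`, `n_estimators`, and `max_depth`. The paper's actual split, preprocessing, and exposed parameters may differ.

```python
from sklearn.datasets import load_digits, load_iris, load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: Iris, Wine, and Digits are the scikit-learn bundled copies.
DATASETS = {"Iris": load_iris, "Wine": load_wine, "Digits": load_digits}
CLASSIFIERS = ("KNN", "SVM", "Random Forest")


def build_classifier(name, params):
    """Return a scikit-learn estimator for the chosen algorithm.

    The parameter keys (K, C, n_estimators, max_depth) are hypothetical
    stand-ins for whatever the original interface exposes.
    """
    if name == "KNN":
        # KNN is distance-based, so features are standardized first (assumption).
        return make_pipeline(StandardScaler(),
                             KNeighborsClassifier(n_neighbors=params.get("K", 5)))
    if name == "SVM":
        # The default RBF-kernel SVC is also scale-sensitive; C sets regularization.
        return make_pipeline(StandardScaler(), SVC(C=params.get("C", 1.0)))
    if name == "Random Forest":
        # Tree ensembles do not need scaling; a fixed seed keeps runs reproducible.
        return RandomForestClassifier(n_estimators=params.get("n_estimators", 100),
                                      max_depth=params.get("max_depth"),
                                      random_state=0)
    raise ValueError(f"Unknown classifier: {name!r}")


def evaluate(dataset_name, clf_name, params=None, test_size=0.2, seed=0):
    """Fit on a stratified hold-out split and return test-set accuracy."""
    X, y = DATASETS[dataset_name](return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)
    model = build_classifier(clf_name, params or {})
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    # Print a small accuracy grid: every dataset against every classifier.
    for ds in DATASETS:
        for clf_name in CLASSIFIERS:
            print(f"{ds:<7} {clf_name:<14} accuracy = {evaluate(ds, clf_name):.3f}")
```

In a Streamlit front end, the dataset and algorithm names could come from `st.sidebar.selectbox`, numeric parameters from `st.sidebar.slider`, and the returned score could be displayed with `st.write`. Keeping the model logic in plain functions like these leaves it testable outside the web interface.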


Keywords

Classification, Data Mining, Random Forest, Streamlit, Support Vector Machine.











Copyright (c) 2026 Muhammad Faisal, Kholqi Maulana

Creative Commons License
 

CODEX: Journal of Software Engineering
Published by Lembaga Penerbitan, Penelitian dan Pengabdian kepada Masyarakat (LP3M) of Nurul Jadid University, Probolinggo, East Java, Indonesia.