A Comprehensive Review of Classifiers Used with Imbalanced Data in Machine Learning

DOI: https://doi.org/10.33650/jeecom.v6i1.8510

Author(s)


(1) * Muammar Reza Pahlawan (Magister of Informatics, Universitas AMIKOM Yogyakarta), Indonesia
(2) Arief Setyanto (Magister of Informatics, Universitas AMIKOM Yogyakarta), Indonesia
(3) M. Rudyanto Arief (Magister of Informatics, Universitas AMIKOM Yogyakarta), Indonesia
(*) Corresponding Author

Abstract


The rapid advance of technology in recent years has produced a large volume of digital content, which in turn has opened research opportunities in fields such as machine learning. One machine learning method is classification, which aims to group data according to its class. However, factors such as data imbalance can cause the results of this method to fall short of expectations. This study presents a comprehensive review of classification methods in text processing, with a focus on the challenges posed by imbalanced data. With the exponential growth of digital content, the need to categorize and analyze text data effectively has become increasingly critical. Classification methods play an important role in this effort, supporting tasks such as sentiment analysis, document classification, and information retrieval. Data imbalance, characterized by a skewed class distribution, nevertheless poses a significant obstacle to the reliability and effectiveness of classification models. This study is intended to help readers understand which methods are commonly used for classification and how those methods generally perform in particular cases such as imbalanced data. The review highlights Support Vector Machine (SVM) as the most prominent classification method at 25%, followed by K-Nearest Neighbours (KNN) and Random Forest at 19%, then Decision Tree and Naïve Bayes. Alternative methods tailored to specific research objectives and challenges are also explored. These usage percentages were derived from the collection of journal articles that the researchers gathered and examined.
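As an illustration of why a skewed class distribution undermines the reliability of a classifier, the following minimal Python sketch scores a trivial model that always predicts the majority class. The 95:5 class split, the labels "neg" and "pos", and the helper function names are hypothetical choices made for this example and are not taken from the reviewed studies.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """Per-class recall: correctly predicted samples of a class / all samples of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical imbalanced data set: 95 majority ("neg") and 5 minority ("pos") samples.
y_true = ["neg"] * 95 + ["pos"] * 5
# A trivial baseline that ignores its input and always predicts the majority class.
y_pred = ["neg"] * len(y_true)

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
recalls = recall_per_class(y_true, y_pred)

print(f"accuracy          = {acc:.2f}")
print(f"balanced accuracy = {bal:.2f}")
print(f"recall per class  = {recalls}")
```

Under these assumptions the baseline scores high on plain accuracy while never detecting a single minority sample, which is why studies on imbalanced data tend to report per-class or balanced metrics alongside accuracy.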


Keywords

Classification Methods; Support Vector Machine (SVM); K-Nearest Neighbors (KNN); Random Forest; Imbalanced Data




Copyright (c) 2024 Muammar Reza Pahlawan, Arief Setyanto, M. Rudyanto Arief

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


Journal of Electrical Engineering and Computer (JEECOM)
Published by LP3M Nurul Jadid University, Indonesia, Probolinggo, East Java, Indonesia.