The Effect of Data Split Composition on the Accuracy of Sentiment Analysis with the Naïve Bayes and SVM Algorithms

Yoga Adi Prasetyo, Ema Utami, Ainul Yaqin
DOI: https://doi.org/10.33650/jeecom.v6i2.9188



Abstract

Sentiment analysis is an important area of natural language processing and of modern social media applications. This study investigates how varying the composition of the data split affects the accuracy of sentiment analysis models built with Support Vector Machine (SVM) and Naïve Bayes. The experiments use variations of k-fold cross-validation to compare results across different proportions of training and test data. The results show that the split composition has a significant effect on the accuracy of both algorithms, with some proportions producing more consistent and stable results than others. These findings offer practical guidance for training sentiment analysis models that are more effective and reliable. Features are extracted with Term Frequency-Inverse Document Frequency (TF-IDF), and classification is performed with Naïve Bayes and SVM. Model performance is evaluated with accuracy, precision, recall, and F1-score. The SVM model with an 80:20 ratio achieved an accuracy of 76.66% and an F1-score of 77%, outperforming SVM and Naïve Bayes at the other ratios tested.
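As a rough illustration of the two mechanical steps the abstract describes, splitting the data at a given ratio and scoring predictions with accuracy, precision, recall, and F1-score, the following sketch uses only the Python standard library. It is not the authors' implementation: the function names `split_by_ratio` and `classification_metrics`, the placeholder samples, the fixed seed, and every ratio other than 80:20 are assumptions made for the example, and the TF-IDF and classifier steps are omitted.

```python
import random


def split_by_ratio(samples, train_fraction, seed=42):
    """Shuffle samples reproducibly, then cut them into train and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall and F1 with one label treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical placeholder data: 100 labelled items, not the paper's dataset.
samples = [(f"review {i}", "positive" if i % 2 == 0 else "negative")
           for i in range(100)]

# Only 80:20 is named in the abstract; the other ratios are assumed here.
sizes = {}
for train_pct in (60, 70, 80, 90):
    train, test = split_by_ratio(samples, train_pct / 100)
    sizes[train_pct] = (len(train), len(test))
    print(f"{train_pct}:{100 - train_pct} -> train={len(train)}, test={len(test)}")

# Hypothetical predictions, used only to exercise the metric function.
y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "positive", "negative"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a full experiment, each training portion would be vectorized and fed to the classifier, and the metrics would be computed on the matching test portion, so that the scores can be compared across ratios.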


Keywords

Sentiment analysis; Data splitting; TF-IDF; SVM; Naïve Bayes

Copyright (c) 2024 Yoga Adi Prasetyo

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


Journal of Electrical Engineering and Computer (JEECOM)
Published by LP3M Nurul Jadid University, Probolinggo, East Java, Indonesia.