Classification of Music for Study Based on Spotify Audio Features Using Random Forest with Feature Importance Analysis and Reduction

DOI: https://doi.org/10.33650/jeecom.v8i1.13200
Authors

(1) * Laksmita Dewi Supraba (Universitas Amikom Yogyakarta), Indonesia
(2) Andi Sunyoto (Universitas Amikom Yogyakarta), Indonesia
(*) Corresponding Author

Abstract


Music has a significant impact on how people think and feel during their daily activities. This study aims to classify the types of music that are suitable for learning activities using Spotify's audio features, in order to support a more flexible and personalized music recommendation system. The dataset comes from Spotify Study Music and consists of 172,819 songs with 12 audio features, grouped into three main categories: Pop tracks, Classical soundtracks, and Lo-fi tracks. The research process includes data pre-processing, handling class imbalance using SMOTE, data normalization, feature importance analysis, cross-validation, and feature reduction. After normalization, all features fall within the range 0.0-1.0 without changing the characteristics of their original distributions. The Random Forest model performed exceptionally well, with an average accuracy of 99% on cross-validation and 99.9% on training data, indicating the model's ability to efficiently recognize musical patterns. Feature importance analysis shows that energy, loudness, acousticness, instrumentalness, and liveness have the most significant influence in distinguishing music characteristics for learning, while removing mode, popularity, duration_ms, and danceability during feature reduction analysis resulted in a significant decrease in accuracy. This study recommends retaining the acousticness, instrumentalness, and liveness features because they play an important role in maintaining the stability and accuracy of music classification models that support the learning process.
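
The two pre-processing steps named in the abstract, normalization to the range 0.0-1.0 and SMOTE oversampling, can be illustrated with a minimal sketch. It assumes min-max scaling (the abstract does not name the normalization method) and uses a simplified, pure-Python version of the SMOTE interpolation idea rather than the authors' implementation. The function names, the loudness values, and the toy minority rows are all hypothetical and chosen only for illustration.

```python
import random


def min_max_normalize(values):
    """Linearly rescale numbers to [0.0, 1.0]; order and relative spacing are kept."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def smote_oversample(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: each synthetic row is a random point on the line segment
    between a minority row and one of its k nearest minority neighbours.
    Requires at least two minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to the base row.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical loudness values in dB (not taken from the study's dataset).
loudness_db = [-23.0, -14.5, -9.0, -5.5, -3.0]
normalized = min_max_normalize(loudness_db)

# Hypothetical minority-class rows with two already-normalized features each.
minority_rows = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.30]]
majority_count = 6  # assumed size of the largest class
synthetic = smote_oversample(minority_rows, n_new=majority_count - len(minority_rows))

print(normalized)
print(len(minority_rows) + len(synthetic) == majority_count)
```

Because min-max scaling is a linear map, the ranking of the values and their relative gaps are unchanged, which is one way to read the statement that normalization does not alter the original distribution. The synthetic rows always lie between two existing minority rows, so they stay inside the region the minority class already occupies.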



Keywords

Spotify Study Music, SMOTE, Feature Importance, Feature Reduction, Random Forest, Classification.









Copyright (c) 2026 Laksmita Dewi Supraba

 
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).

Journal of Electrical Engineering and Computer (JEECOM)
Published by LP3M Nurul Jadid University, Probolinggo, East Java, Indonesia.