IMPLEMENTATION OF SMOTE TO HANDLE IMBALANCED DATA IN HOTEL SENTIMENT ANALYSIS IN WEST NUSA TENGGARA USING THE SVM ALGORITHM

Erry Maricha Oki Nur Haryanto, Adhien Kenya Anima Estetikha, Rahmad Arif Setiawan

Abstract


Digital platforms that connect tourism stakeholders in Indonesia have been widely adopted, especially for lodging services, where dozens of inns offering a variety of facilities are listed. At the same time, advances in machine learning have prompted extensive research on sentiment analysis, which can be applied to the growing tourism industry. Tourists often struggle to find a hotel or inn that suits their needs. One common approach is to read reviews from previous visitors, but the sheer number of reviews can itself be confusing. Sentiment analysis evaluates text to determine a person's sentiments, emotions, expressions, and attitudes, and is usually performed by training a machine learning model on a dataset. This research analyzes the Support Vector Machine (SVM) algorithm, trained with Sequential Minimal Optimization (SMO), combined with the Synthetic Minority Over-sampling Technique (SMOTE) to classify a sentiment analysis dataset of visitor reviews of hotels in West Nusa Tenggara, collected from the Traveloka site using Scrapy. By applying a method for handling the imbalanced dataset, the SVM classification model is expected to be more accurate and less biased toward the majority class. Without SMOTE, the SVM algorithm achieved an accuracy of 87.62%; with SMOTE, it achieved 87.99%, an improvement of 0.37 percentage points.
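The core idea of SMOTE is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours, rather than simply duplicating existing samples. The sketch below illustrates that step in standard-library Python under stated assumptions: reviews are assumed to be already converted to numeric feature vectors (for example TF-IDF), the function `smote` and its parameters `n_new`, `k`, and `seed` are hypothetical names chosen for illustration, and the toy data is made up. It is not the authors' implementation; the abstract does not say which SMOTE tool was used.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: generate n_new synthetic minority-class samples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to `base`; keep the k closest.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up 2-D vectors: 6 majority-class reviews and 2 minority-class reviews.
majority = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.95, 0.05], [0.7, 0.3], [0.75, 0.25]]
minority = [[0.1, 0.9], [0.2, 0.8]]

new_points = smote(minority, n_new=len(majority) - len(minority), k=1)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # 6 6
```

In a pipeline along the lines the abstract describes, the balanced vectors would then be used to train the SVM classifier. As a general practice, oversampling is applied to the training split only, so that synthetic points derived from test reviews do not leak into the accuracy evaluation.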

Keywords: bias, imbalanced dataset, SVM, SMOTE.






This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).

Informatics Engineering Study Program, Universitas Janabadra