COMPARISON OF DATA ANALYSIS OF MULTI-CATEGORY NOMINAL FEATURES USING THE ADAPTIVE SYNTHETIC NOMINAL (ADASYN-N) AND ADAPTIVE SYNTHETIC-KNN (ADASYN-KNN) METHODS
ABSTRACT
The growing need for efficient algorithms for data manipulation, analysis, and intelligent use has made this a very active research area in machine learning. However, some areas are still not fully developed, especially the classification of imbalanced data. Class imbalance arises when the number of instances in one class differs greatly from the number in another. It is detrimental to data mining because machine learning algorithms have difficulty classifying minority classes (those with few instances) correctly. There are several approaches to handling imbalance, one of which is resampling the original data. The first resampling approach is under-sampling, which balances the classes by randomly removing majority-class instances. The second is over-sampling, which balances the class distribution by randomly replicating minority-class instances.
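The two basic resampling ideas described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the ADASYN-N or ADASYN-KNN procedure studied here; the function names `random_undersample` and `random_oversample`, the toy rows, and the class labels are all hypothetical and chosen only for the example.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect the rows belonging to each class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(rows, labels, seed=0):
    """Randomly drop instances so every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        out_rows.extend(rng.sample(members, target))  # sample without replacement
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_oversample(rows, labels, seed=0):
    """Randomly replicate instances so every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw the missing instances with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy dataset with two nominal features per row.
rows = [("red", "small"), ("red", "large"), ("blue", "small"), ("blue", "large"),
        ("green", "small"), ("green", "large"), ("red", "medium"), ("blue", "medium")]
labels = ["majority"] * 6 + ["minority"] * 2

under_rows, under_labels = random_undersample(rows, labels)
over_rows, over_labels = random_oversample(rows, labels)
print(Counter(labels), Counter(under_labels), Counter(over_labels))
```

Under-sampling discards majority-class information, while random over-sampling only duplicates existing minority rows; synthetic methods such as those compared in this study generate new minority instances instead of copying them.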
This study compares two over-sampling techniques for overcoming class imbalance in datasets with multi-category nominal features: Adaptive Synthetic-Nominal (ADASYN-N) and Adaptive Synthetic-kNN (ADASYN-KNN). Seven datasets with multi-category nominal features and imbalanced class distributions are used. Each dataset is over-sampled with both methods and then classified using the Random Forest method. The classification accuracy on the original dataset is then compared with the accuracy on the datasets over-sampled with ADASYN-N and ADASYN-KNN.
Keywords: ADASYN-KNN, ADASYN-N, class imbalance, nominal, multi-category, over-sampling
Informatics Engineering Study Program, Universitas Janabadra