Optimization of software defects prediction in imbalanced class using a combination of resampling methods with support vector machine and logistic regression

Ustyannie, Windyaning (2021) Optimization of software defects prediction in imbalanced class using a combination of resampling methods with support vector machine and logistic regression. Jurnal INFOTEL (Informatics, Telecommunication, and Electronics), 13 (4). pp. 176-184. e-ISSN: 2460-0997, p-ISSN: 2085-3688

Full text (peer review): peer review.pdf, Accepted Version (916 kB)

Abstract

The main obstacle to highly accurate software defect prediction arises when the data set has imbalanced classes and dichotomous characteristics. The imbalanced class problem can be addressed with a data-level approach, such as resampling. When the data set has dichotomous characteristics, software defect prediction can be approached as a classification problem. This study analyzes the performance of the proposed software defect prediction method to identify which combination of resampling method and classification method yields the highest accuracy. The proposed method has two stages. First, the data are resampled using oversampling, undersampling, or a hybrid method. Second, the resampled data are classified using either the Support Vector Machine (SVM) algorithm or the Logistic Regression (LR) algorithm. The proposed model was tested on five NASA MDP data sets, each with the same 37 attributes. A t-test gives p-value = 0.0344 < 0.05 and t-statistic = 3.1524 > t-critical = 2.7765, which indicates that the proposed combination is suitable for classifying imbalanced classes. Resampling also improved classifier performance: compared with no resampling, it raised the average AUC by 17.19% for the SVM algorithm and by 7.26% for the LR algorithm. Among the combinations of the three resampling methods with the SVM and LR algorithms, oversampling with SVM performed best for software defect prediction in imbalanced classes, with an average accuracy of 84.02% and an average AUC of 91.65%.
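The abstract does not name the specific resampling algorithms. As a minimal sketch, assuming plain random oversampling, random undersampling, and a simple hybrid that moves every class to the mean class size (all three are illustrative stand-ins, not necessarily the author's methods), the three resampling families could look like this in pure Python. The function names are hypothetical. In typical use, resampling is applied to the training split only, so the test data keep their natural class ratio.

```python
import random
from collections import Counter

# Assumption: random over/undersampling stand in for the paper's unspecified methods.


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until every class matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so every class matches the minority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def hybrid_resample(X, y, seed=0):
    """Hypothetical hybrid: undersample large classes and oversample small ones to the mean class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = round(sum(counts.values()) / len(counts))
    X_out, y_out = [], []
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        if n >= target:
            chosen = rng.sample(idx, target)
        else:
            chosen = idx + [rng.choice(idx) for _ in range(target - n)]
        for i in chosen:
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Toy data: eight clean modules (0) and two defective ones (1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
for fn in (random_oversample, random_undersample, hybrid_resample):
    _, y_res = fn(X, y)
    print(fn.__name__, dict(Counter(y_res)))
```

The resampled training set would then be passed to whichever classifier is being compared (SVM or LR in the study).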
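The abstract reports a t-test over five data sets. Assuming it is a paired test on per-data-set scores with and without resampling (the abstract does not state the exact pairing), the statistic can be sketched as below. With five data sets there are 4 degrees of freedom, for which the two-tailed critical value at alpha = 0.05 is about 2.776, consistent with the 2.7765 quoted above. The function name and the AUC numbers in the demo are hypothetical, not results from the study.

```python
import math

# Two-tailed critical t value for alpha = 0.05 with 4 degrees of freedom
# (five data sets); the abstract quotes it as 2.7765.
T_CRITICAL_DF4 = 2.7764


def paired_t_statistic(after, before):
    """Paired t statistic on per-data-set differences (after - before).

    Assumption: scores with and without resampling are paired on the same
    data sets; the abstract does not state the exact pairing.
    """
    if len(after) != len(before) or len(after) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean_d / math.sqrt(var_d / n)


# Hypothetical AUCs for five data sets, for illustration only.
with_resampling = [0.90, 0.85, 0.88, 0.92, 0.80]
without_resampling = [0.80, 0.78, 0.75, 0.85, 0.70]
t = paired_t_statistic(with_resampling, without_resampling)
print(f"t = {t:.4f}, significant at alpha = 0.05: {t > T_CRITICAL_DF4}")
```

A full analysis would also report the p-value, which requires the t distribution's CDF (for example from a statistics library) rather than a single tabulated critical value.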
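Accuracy alone can look good on an imbalanced set, because a model that labels every module as clean is right as often as clean modules occur; this is one reason AUC is reported alongside accuracy. A minimal sketch of both metrics follows, computing AUC with the pairwise Mann-Whitney formulation (the probability that a random defective module scores higher than a random clean one, ties counting half). The helper names are hypothetical and the toy labels are illustrative, not NASA MDP data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc_score(y_true, scores):
    """AUC via pairwise comparison of positive (1) and negative (0) scores.

    Each positive/negative pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise (the Mann-Whitney U formulation).
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: ten modules, one defective, and a useless model.
y_true = [0] * 9 + [1]
y_pred = [0] * 10      # predict "clean" everywhere
scores = [0.0] * 10    # constant score, so every pair is a tie
print(accuracy(y_true, y_pred), auc_score(y_true, scores))
```

Here accuracy is 0.9 while AUC is 0.5, no better than chance. The pairwise loop is quadratic in the number of modules; a rank-based computation scales better on larger data sets.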

Item Type: Article
Uncontrolled Keywords: defect prediction, imbalanced class, logistic regression, resampling, support vector machine
Subjects: Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources
Depositing User: Windyaning Ustyannie
Date Deposited: 09 Aug 2022 01:46
Last Modified: 28 Dec 2022 02:38
URI: http://eprints.akprind.ac.id/id/eprint/1103