Behaviour of Imbalanced Data in Presence of Borderline and Noisy Examples using Hybrid SPIDER2-IPF Boosting Ensemble Method

Neelam Rout, Debahuti Mishra, Manas Kumar Mallick

doi:10.17762/sfs.v10i4S.1302

Published: Apr 3, 2023

Keywords:

Classification, Imbalance Data, Borderline & Noisy Examples, Ensemble Method, SPIDER2-IPF Model, Filter Methods, Boosting Method, Performance Metrics, Wilcoxon Test

Abstract

Classification is a crucial task in many fields. Conventional classification algorithms are limited by issues such as class imbalance, class overlap, and noise. A class imbalance problem occurs when the classes in a classification dataset have markedly unequal numbers of instances. When imbalanced datasets also contain noisy and borderline examples, classification becomes significantly more difficult. Noisy examples can resemble minority samples, so a method for resolving class imbalance may focus excessively on the noise, impairing performance. This study proposes the SPIDER2-IPF model to handle noisy and borderline examples. An experiment employing the Saturation, PANDA, Classification, ANR, and SPIDER2-IPF filter methods shows the impact of borderline and noisy examples from the rare class on classifier performance. To handle noisy and borderline samples, the SPIDER2-IPF model is built using SPIDER2 resampling techniques. In imbalanced datasets, an Iterative-Partitioning Filter (IPF) can remove noise from both the majority and minority classes and address issues caused by noisy and borderline samples. The SPIDER2-IPF model achieves 99.51%, 67.78%, 81.39%, 99.36%, 99.63%, and 99.48% for sensitivity, specificity, G-Mean, precision, recall, and F-Measure, respectively. The experimental results indicate that the proposed method can effectively address class imbalance in the presence of noisy and borderline examples, and the Wilcoxon test results support its effectiveness on imbalanced data.
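For readers unfamiliar with the metrics listed above, the following minimal sketch shows how they are commonly computed from a binary confusion matrix, with the minority class treated as positive. It uses the standard textbook definitions, under which sensitivity and recall are the same quantity, so a single value is returned for both. The abstract does not state the exact definitions or averaging used in the paper, so the function name `imbalance_metrics` and the counts below are purely illustrative assumptions and do not reproduce the reported results.

```python
from math import sqrt


def imbalance_metrics(tp, fn, fp, tn):
    """Compute common imbalance-aware metrics from confusion-matrix counts.

    tp, fn: minority (positive) examples classified correctly / incorrectly.
    fp, tn: majority (negative) examples classified incorrectly / correctly.
    Hypothetical helper for illustration; not taken from the paper.
    """
    sensitivity = tp / (tp + fn)   # true positive rate, also called recall
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # fraction of positive predictions that are correct
    # G-Mean balances accuracy on both classes; it drops toward 0 if either class is ignored.
    g_mean = sqrt(sensitivity * specificity)
    # F-Measure (F1) is the harmonic mean of precision and recall.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
        "precision": precision,
        "f_measure": f_measure,
    }


# Made-up counts for an imbalanced dataset: 50 minority and 950 majority examples.
metrics = imbalance_metrics(tp=40, fn=10, fp=30, tn=920)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Plain accuracy on these made-up counts would be 96%, while precision and F-Measure are much lower, which is why imbalance studies typically report G-Mean and F-Measure alongside sensitivity and specificity.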

Issue

Vol. 10 No. 4S (2023): Special Issue 4

Section

Articles
