
Design and Implementation of an Enhanced Classifier for Software Defect Prediction

Statement of the Problem

Software test automation has undeniably improved the software testing process, bringing speed, accuracy, wider test coverage, consistency and cost savings, and enabling frequent and thorough testing. However, existing defect datasets consistently suffer from class imbalance, null values and high dimensionality, which can impair defect classification and lead to inaccurate predictions. Moreover, most existing defect detection models focus on accuracy and overall performance rather than on the defect return rate, neglecting the model's primary purpose: identifying as many likely defects in the software as possible.

The defect return rate of existing models is therefore suboptimal. This study attempts to develop a more efficient classifier by applying a suitable boosting technique, class balancing and removal of null values, in order to obtain a prediction model that performs better across all the metrics identified for evaluation. The preprocessing step is sketched below.
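
To make the preprocessing concrete, the following is a minimal sketch in Python, assuming pandas, scikit-learn and imbalanced-learn are available. The file name jm1.csv, the "defects" label column (NASA PROMISE convention) and the choice of SMOTE as the balancing technique are illustrative assumptions, not this study's actual implementation.

    import pandas as pd
    from imblearn.over_sampling import SMOTE  # one possible balancing technique

    # Load the dataset (file name is a hypothetical local CSV export of JM1).
    df = pd.read_csv("jm1.csv")
    df = df.dropna()                  # drop rows containing null values

    X = df.drop(columns=["defects"])  # software-metric features (all numeric)
    y = df["defects"]                 # binary defect label (assumed boolean)

    # Oversample the minority (defective) class so both classes are equally
    # represented before training.
    X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)

SMOTE stands in here for whatever balancing technique the study finally adopts; undersampling or class weighting would slot into the same place.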

1.3 Aim and Objectives of the Study

This research aims to address the challenges of software defect prediction by designing and implementing a modified classification model that overcomes class imbalance, null values, class overlap and high dimensionality.


To achieve this aim, the following objectives are set:

1. Identify the challenges posed by class imbalance, null values, class overlap and high dimensionality in software defect prediction.

2. Implement suitable class balancing, null value removal and dimensionality reduction techniques, and design a boosting algorithm to be applied to the normalised classifier.

3. Implement a normalisation technique to enhance the proposed prediction model.

4. Apply a modified boosting algorithm to the normalised model (objectives 3 and 4 are sketched together after this list).

5. Compare the performance of the modified model with existing models to determine its effectiveness in addressing the identified challenges.
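
Continuing the earlier sketch, objectives 3 and 4 could be combined in a single pipeline. Min-max scaling here stands in for the normalisation technique and scikit-learn's stock AdaBoost stands in for the modified boosting algorithm; both are placeholders rather than the study's actual design.

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler

    # Hold out a test split from the balanced data of the earlier sketch.
    X_train, X_test, y_train, y_test = train_test_split(
        X_bal, y_bal, test_size=0.2, random_state=42)

    # Objective 3: normalise the features; objective 4: boost the classifier.
    model = Pipeline([
        ("normalise", MinMaxScaler()),
        ("boost", AdaBoostClassifier(n_estimators=100, random_state=42)),
    ])
    model.fit(X_train, y_train)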


The output from deploying our model on the JM1 dataset, displayed above, shows an average of over 80% across all the metrics and an overall performance of 86%, which makes it a good classifier. Only Random Forest outperformed our classifier on this dataset.
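
For context, a comparison of the sketched pipeline against a Random Forest baseline on the same held-out split might look as follows; the metric set (accuracy, precision, recall, F1) and the boolean label are assumptions carried over from the earlier sketches, not the study's reported procedure.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import (accuracy_score, f1_score,
                                 precision_score, recall_score)

    # Train the baseline on the same split used for the boosted model.
    rf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

    # Report the same metrics for both models on the held-out test split.
    for name, clf in [("boosted model", model), ("random forest", rf)]:
        pred = clf.predict(X_test)
        print(f"{name}: acc={accuracy_score(y_test, pred):.3f} "
              f"prec={precision_score(y_test, pred):.3f} "
              f"rec={recall_score(y_test, pred):.3f} "
              f"f1={f1_score(y_test, pred):.3f}")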

Comparison with Existing Literature:
