

International Journal of Open Source Software and Processes
Volume 10 • Issue 4 • October-December 2019

Software Fault Prediction Using Deep Learning Algorithms
Osama Al Qasem, Yarmouk University, Irbid, Jordan
Mohammed Akour, Al Yamamah University, Riyadh, Saudi Arabia & Yarmouk University, Irbid, Jordan

ABSTRACT

Software fault prediction (SFP) can be used to detect faulty constructs at early stages of the development lifecycle, and it can also be applied in several later phases of the development process. Machine learning (ML) is widely used in this area. One of the most promising branches of ML is deep learning, which achieves remarkable performance in various areas. Two deep learning algorithms are used in this paper: multi-layer perceptrons (MLPs) and the convolutional neural network (CNN). To evaluate the studied algorithms, four commonly used datasets from NASA are used (PC1, KC1, KC2 and CM1). The experimental results show that the CNN algorithm achieves prediction superiority over the MLP algorithm. The accuracy and detection rate measurements obtained when using CNN are, respectively: PC1 97.7% - 73.9%, KC1 100% - 100%, KC2 99.3% - 99.2% and CM1 97.3% - 82.3%. This study provides promising results for using deep learning in software fault prediction research.

Keywords
Convolutional Neural Network (CNN), Deep Learning Algorithms, Fault Prediction, Machine Learning (ML),
Multi-Layer Perceptrons (MLP)

1. INTRODUCTION

Software testing and quality assurance activities are performed to discover faults in the software prior to its delivery to the customer. During the design or development phases, some faults may pass through to the next level without being detected and fixed, so it is important not only to detect faults but also to predict their occurrence as a pre-activity that prevents a fault before it becomes real. Software fault prediction (SFP) is one of the quality models that help conduct the software development life cycle in a healthy manner. To reduce software failures, fault prediction aids in planning, controlling and executing software development activities and classifies each software module as faulty or non-faulty, serving as a shield against later unforeseen risks and eventually increasing the efficiency and effectiveness of the software (Rana, 2015). SFP also reduces the cost, time and effort that must be spent on the software (Rana, 2015).
The evolution of software engineering has led to the development of many fault prediction algorithms using machine learning (supervised and unsupervised) techniques. Different fault prediction algorithms are assessed with diverse metrics, such as accuracy, error rate, precision, time taken, recall and F-measure.
Previous SFP models use ML techniques to predict software faults. For a fault prediction model to succeed and predict faults accurately, three factors are essential: selecting data that contain enough faulty units to identify the relationship between software code metrics and faults; selecting quality metrics and applying feature selection to reduce training time; and selecting appropriate, well-understood techniques (Nisa & Ahsan, 2015).
The literature is rich with prediction models used for SFP. Some of the frequently used ML algorithms include decision trees, Naïve Bayes classification, and neural networks. Using these algorithms has improved prediction accuracy. However, they have limitations: expert knowledge is needed to process the data, a comparatively high level of human expert interaction is required, and a massive amount of training data is needed for operation (a considerable time overhead), which can become challenging in a dynamic environment (Zhao, Yan, Chen, Mao, Wang, & Gao, 2019).
To address the above restrictions, an area of research that currently attracts attention across multiple domains is deep learning. Deep learning is a set of techniques for learning in multi-layer neural networks; it is a subfield of machine learning that uses supervised and/or unsupervised strategies. It has achieved remarkable success in various domains such as computer vision and bioinformatics (Peng, Jiang, Wang, Alwageed, & Yao, 2017). Deep learning allows computational models composed of multiple layers to learn representations of data with multiple levels of abstraction (LeCun, Bengio, & Hinton, 2015). It automatically extracts essential features from raw data, which makes it robust to variations in the input (Paul & Singh, 2015). It learns over multiple layers, where each layer is defined from a lower layer by a set of algorithms. The essential example of a deep learning model is the deep feedforward network, or multi-layer perceptron (MLP). Generally, the usefulness of deep learning grows as the amount of training data increases; as a result, its accuracy and its capability to solve complicated applications continually increase (Bengio, Goodfellow & Courville, 2017).
Deep learning has advanced the AI community by making progress on problems that had resisted solution for many years. It is expected to bring more successes in the future because it requires very little engineering by hand. Many of the new learning algorithms and architectures currently being developed build on deep learning (LeCun, Bengio, & Hinton, 2015). Some of these techniques are the Deep Belief Network (DBN), Convolutional Neural Network (CNN), Autoencoder (AE) and Multilayer Perceptrons (MLPs).
The main objective of this research is to implement different deep learning algorithms for SFP and then investigate the performance of the studied deep learning approaches against traditional approaches. The authors explicitly compare the evaluation results of MLP and CNN with several previously studied approaches.

2. BACKGROUND

Many researchers have studied the effectiveness of applying traditional machine learning algorithms to SFP, such as neural networks (NN) (Bisi & Goyal, 2015; Mundada, Murade, Vaidya & Swathi, 2016; Pahal & Chillar, 2017), association rules (Ma, Dejaeger, Vanthienen & Baesens, 2011), k-Nearest Neighbours, and Naïve Bayes (Dwivedi & Kumar, 2016). Most of these ML algorithms achieved satisfactory results with good classification accuracy. However, they have limitations: ANNs cannot manage imprecise information, SVMs are not suitable for a large number of software metrics, constructing a decision tree is complex, association rules require continuous software-metric values, and clustering requires an unlabeled dataset (Paramshetti & Phalke, 2014). Applying these algorithms requires substantial domain expertise and feature engineering. A thorough exploratory data analysis is usually performed on the dataset, followed by dimension reduction techniques; finally, the most representative features are passed to the ML algorithm. The knowledge bases of different domains and applications can differ from one another and require specialists for each domain, which makes appropriate feature extraction difficult and limits the ability to transfer ML models trained in one area and generalize them to other areas (Zhang, Wang & Habetler, 2019). To overcome some of these limitations and achieve even better performance, deep learning-based methods are becoming increasingly popular.


The advantages of deep learning compared to traditional machine learning are its best-in-class performance, its ability to extract features automatically (eliminating the feature engineering stage), and the ease with which a trained model can be generalized to other domains (Zhang, Wang & Habetler, 2019).
Deep learning is a rapidly growing research topic: there are many deep learning architectures, and new models are being developed to suit different research areas. The community is quite open, and a number of good-quality books provide useful information. However, only limited work covers the deep learning techniques that have been applied to SFP. Most of the deep learning algorithms used in SFP serve as pre-processing steps rather than as classifiers. For example, Tong, Liu and Wang (2018) used stacked denoising autoencoders to extract deep representations from traditional software metrics and then used a two-stage ensemble to address the class imbalance problem; Yang, Lo, Xia, Zhang and Sun (2015) leveraged a deep belief network to build a set of expressive features from a set of initial change features and then used a machine learning classifier; and Li, He, Zhu and Lyu (2017) used a CNN to automatically learn semantic and structural features of programs. Given the ability of deep learning to process large-scale data and to learn multi-level, hierarchical representations (Zhao et al., 2019), deep learning can be a powerful tool for SFP. In this paper, the authors use Multilayer Perceptrons and Convolutional Neural Networks for classification.

2.1. Multilayer Perceptrons (MLPs)


The multilayer perceptron (MLP) is the essential deep learning model: it extends artificial neural networks with many layers. Artificial neural networks with multiple hidden layers, called deep neural networks, have become popular because they lead to remarkable performance gains on difficult learning tasks and succeed in various machine learning projects. Consequently, deep neural networks are now favoured over shallow networks.
The deep neural network has hundreds of hyper-parameters and complex topologies. Moreover, the choice of design is very important; most of the time, success depends on finding the right architecture for the problem. Recently, developers have focused on designing distinct architectures for new problems. When the numbers of layers and units are increased, the network represents functions of higher complexity, where neurons receive as input the activations of the previous layer and perform a simple computation.
An MLP adjusts its structure based on external or internal information, so it is an adaptive system. It is used to detect patterns in data or to model complex relationships between inputs and outputs. It has been applied in various domains with considerable success, and has been effectively applied in the area of prediction (Naser, 2012).

2.2. Convolutional Neural Networks (CNNs)


A CNN is a particular type of neural network for processing data that employs a mathematical operation called convolution (Bengio, Goodfellow & Courville, 2017). It is a type of feed-forward neural network (Yang, Chen, Yan, Zhao, & Fan, 2017).
A CNN consists of three types of layers: convolutional layers, pooling layers, and fully connected layers (Yang, Chen, Yan, Zhao, & Fan, 2017), with the stacked convolutional and pooling layers followed by fully connected layers as in an ordinary NN. The convolutional layer is considered the core of a CNN. It contains a group of learnable kernels that are convolved across the dimensions (width and height) of the input features during the forward pass, producing a 2-D activation map for each kernel. The weights of the convolutional layers are used for feature extraction, while the fully connected layer is used for classification.
In this work, the authors aim to investigate the capability of MLP and CNN algorithms for SFP. The algorithms were implemented in the Python programming language. A NASA dataset (Promise, 2019) was used for training and testing; the dataset details are provided in the forthcoming sections. The adopted approach is based on modifying the structure of the neural network: the number of layers, the number of neurons per layer, and the network parameters were changed until the desired results were reached. Initially, MLPs were applied, and the neural network was reconstructed repeatedly to achieve the best result. After that, the CNN algorithm was applied with the same procedure until a competitive result was reached.
The benefits of using a CNN include its ability to extract features automatically, the fact that the parameters of the classifier and the feature learner are trained jointly during the back-propagation process, and the introduction of pooling layers, which has been observed to be valuable in computer vision applications (Paramshetti & Phalke, 2014). Convolutional networks have proven to perform well on various tasks such as handwritten digit classification and face detection (Zeiler & Fergus, 2014). This promising performance encourages us to utilize them for SFP.

3. LITERATURE REVIEW

It is necessary to explore relevant research to ensure an understanding of the aspects of SFP. There is a significant body of research that tackles the various prediction algorithms for SFP. This section focuses on publications that followed machine learning approaches, neural network models, and deep learning approaches.
Machine learning approaches can deal with both supervised and unsupervised data. The following studies present classification models used to predict faults from supervised data; in each, a new approach is proposed and compared with other classification methods on different datasets. Ma, Dejaeger, Vanthienen and Baesens (2011) evaluated a classification method based on association rules (CBA2) and compared it to the RIPPER and C4.5 rule classifiers. It was applied to 12 datasets from the NASA Metrics Data Program (MDP). The rule sets produced on one dataset and one software project can also be used to predict defects in analogous software projects and are applicable to other datasets. The CBA2 algorithm gave the best performance in terms of sensitivity and AUC. Dwivedi and Kumar (2016) developed SFP models using several data mining classification and prediction techniques; neural networks, k-Nearest Neighbours and Naïve Bayes were analysed and compared on a PROMISE Repository dataset. Their paper presents two SFP models developed with the RapidMiner tool: MODEL-I, trained with an NN classifier, and MODEL-II, whose training includes a stacking meta-learner. The experiments showed that MODEL-I predicted faults more effectively, with 91.54% prediction accuracy. For unsupervised data, a clustering model can be used to predict faults. Kaur, Kaur, Gulati and Aggarwal (2010) proposed and evaluated K-Sorensen-means clustering, a new K-means clustering algorithm for SFP that uses the Sorensen distance for calculating cluster distances. The proposed approach was applied to three NASA MDP datasets (JM1, PC1 and CM1) and implemented in MATLAB; the results show an advantage for K-Sorensen-means clustering compared to K-Canberra-means.
Pahal and Chillar (2017) suggested a hybrid model using an artificial neural network (ANN) and Simplified Swarm Optimization (SSO) for SFP. The ANN is used for categorizing software modules into fault-prone and non-fault-prone, while SSO is used to reduce the dimensionality of the dataset. Four datasets from the NASA repository were used. This hybrid model shows excellent performance compared with other prediction methodologies and was verified to be more effective at finding the association between faults and software metrics. Babu and Babu (2015) investigated a neural network approach for defect prediction in software development. They used fuzzy neural networks to deploy a mechanism for tracking the reliability of the software, relying on fuzzy rules with the number of defects per kilo lines of code as input. They used the KC1 dataset to predict software defects based on popular neural algorithms (RTC, RC, and CART), and found that the performance of the neural network algorithms improved with pre-processed and normalized datasets.
Mundada et al. (2016) used a back-propagation learning algorithm in their proposed ANN prediction model, which starts with an initial weight that is updated during the learning phase by the gradient descent method. The data are then trained with the resilient back-propagation algorithm. They used the JM1 software dataset, the Python programming language, and the NumPy and neurolab frameworks to implement the neural networks. It was observed that the ANN model performs better in terms of error prediction in comparison with other analytical models.
Another dimensionality reduction approach is suggested by Bisi and Goyal (2015), who used an ANN model with Sensitivity Analysis (SA-ANN) to find the input variables that best explain the output. Principal Component Analysis (PCA-ANN) was also used. The input data are scaled with a logarithmic function, and training and prediction are performed by the ANN model. These approaches were applied to four publicly available datasets (CM1, PC1, KC1 and KC2). The results show that the accuracy of PCA-ANN is higher than that of the SA-ANN approach.
Deep learning can also be used as a pre-training process in software fault prediction. Li, He, Zhu and Lyu (2017) used a CNN to automatically learn semantic and structural features of programs and applied logistic regression as the final classifier. Owhadi-Kareshk, Sedaghat and Akbarzadeh-T (2017) proposed a pre-training technique for a shallow ANN in which a Denoising AutoEncoder (DAE) is used: the model starts the training procedure from the weights and biases of a trained DAE. The experiments were run on seven public NASA datasets. They compared the suggested model (Pre-ANN) with SVM, ANN, Principal Component Analysis (PCA)-SVM, Kernel PCA-SVM and AE-SVM. The model was implemented in MATLAB, and 5-fold cross-validation was used to ensure the robustness of the experiments. The results confirm that pre-training improves accuracy: the model achieved higher accuracy on four of the seven datasets.
The results of previous work demonstrate the feasibility of deep learning techniques in the field of program analysis. It is promising to adapt deep learning to other software engineering research.

4. RESEARCH METHODOLOGY

This section describes the methodology used to apply MLPs and CNN to software fault prediction on the NASA datasets. It starts with the selection of four datasets with varying fault percentages, which are then normalized as a pre-processing step and divided into training and testing folds. The MLP is applied first; its topology and parameters are modified, its performance is measured, and the results are compared to find the best result achieved. The same steps are repeated for the CNN. Figure 1 summarizes these steps. The MLP and CNN algorithms are implemented in Python 3.6 using several libraries. Each step is discussed in greater detail in the subsequent sections.

4.1. Selected Dataset


The datasets are selected from the NASA Metrics Data Program and include software measurement data and the associated error data collected. Each instance of these datasets is a program module. The quality of a module is described by its error rate, i.e., the number of faults in the module, indicating whether the module is faulty or not. The number of faulty modules is typically outnumbered by the non-faulty ones. The analyses in this study use the features of four datasets to validate the proposed approach; their characteristics are presented in Table 1. As can be seen from Table 1, the smallest dataset contains 498 instances, whereas the largest contains 2109 instances. Each instance refers to a single software subroutine, function, or method characterized by Lines of Code (LOC) based metrics, McCabe metrics, base and derived Halstead measures, and a branch count.
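As a concrete illustration, the sketch below loads one of these datasets and splits it into the training and testing folds mentioned above. It assumes the PROMISE data has been exported to a CSV file with a boolean "defects" column; the file name and column name are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch only: file name and "defects" column name are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("kc1.csv")                       # one of the NASA PROMISE datasets
X = df.drop(columns=["defects"]).values           # software metric features per module
y = df["defects"].astype(int).values              # 1 = faulty module, 0 = non-faulty

# Hold out a test fold; stratify because faulty modules are the minority class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```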

4.2. Pre-Processing
Normalization maps a numerical attribute from its existing range to a new range according to an equation. It is useful for the prediction process because it enhances the performance of classifiers; it is a method used to standardize the range of the independent variables or features of the data.


Figure 1. Methodology

Table 1. Dataset characteristics

Dataset | Attributes | Language | Instances | Faulty | Percentage
CM1     | 22         | C        | 498       | 49     | 9.83%
KC1     | 22         | C++      | 2109      | 326    | 15.45%
KC2     | 22         | C++      | 522       | 105    | 20.50%
PC1     | 22         | C        | 1109      | 77     | 6.94%

The authors applied this rescaling to each numeric attribute so that all attributes share a common range. The following equation gives the rescaling formula:

$$X_i' = \frac{X_i - \min(x)}{\max(x) - \min(x)} \quad (1)$$

where $X_i'$ is the rescaled value, $X_i$ is the original value, and $\max(x)$ and $\min(x)$ are the maximum and minimum values of the feature.
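A minimal sketch of this pre-processing step follows; it applies equation (1) per feature using scikit-learn's MinMaxScaler. Fitting the scaler on the training fold only is our assumption, since the paper does not state how the rescaling was fitted.

```python
# Min-max rescaling per equation (1); fitting only on the training fold is an assumption.
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                    # maps each feature to [0, 1]
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)   # reuse training min/max on the test fold
```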

4.3. Applying MLPs


An MLP is a neural network with multiple hidden layers: it has an input layer, several hidden layers, and one output layer. The attributes form the inputs to the network; the values in the input layer are weighted and fed simultaneously to the first hidden layer. The outputs of the first hidden layer form the input to the next hidden layer, and so on until the last hidden layer. The weighted outputs of the last hidden layer are the input to the output layer, which emits the network's prediction for the given tuples. When the network is fully connected, each neuron provides input to every neuron in the next layer (Han & Kamber, 2012).


Forward propagation accepts an input x and produces an output y: the inputs x propagate through the hidden neurons at each layer until the network finally produces y. Back-propagation lets information flow backwards through the network in order to compute the gradient, while another algorithm uses this gradient to perform learning (Bengio, Goodfellow, & Courville, 2017).
Back-propagation is used to adjust the weights in order to reduce the mean error between the actual and predicted values; this adjustment is made from the output layer back to the first hidden layer (Han & Kamber, 2012).
The proposed architecture for SFP is displayed in Figure 2. It contains an input layer, hidden layers, and an output layer. The hidden part contains three to five layers. The input layer (holding the input features of the network) takes data of shape 21*1 and passes it to the first hidden layer, which receives the weighted inputs and sends its output to the next hidden layer. Several activation functions are compared for the hidden layers; the sigmoid function is used for the output layer.
In this paper, the MLP is used with 3 to 5 hidden layers. The network is trained with the Adagrad optimizer for 1000 to 20000 epochs with a batch size of 5 to 20.
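The following is a minimal Keras sketch consistent with this configuration (3 to 5 hidden layers, Adagrad optimizer, sigmoid output). The hidden-layer width, dropout placement, and loss function are our assumptions, since they are not stated explicitly above.

```python
# Minimal sketch of the MLP described in Section 4.3.
# Assumptions: hidden-layer width (64), dropout placement, binary cross-entropy loss.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

def build_mlp(n_features=21, hidden_layers=5, units=64, dropout_rate=0.5):
    model = Sequential()
    model.add(Dense(units, activation="relu", input_shape=(n_features,)))
    for _ in range(hidden_layers - 1):
        model.add(Dense(units, activation="relu"))
        model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation="sigmoid"))   # faulty vs. non-faulty
    model.compile(optimizer="adagrad", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Example usage with the baseline settings reported in Section 5.1 (5 layers, 10000 epochs, batch size 5):
# model = build_mlp()
# model.fit(X_train_scaled, y_train, epochs=10000, batch_size=5, verbose=0)
```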

4.4. Applying CNN


In general, a CNN consists of convolutional and pooling (subsampling) layers followed by one or more fully connected layers; these layers are stacked on top of each other to form a deep model. Convolutional layers learn feature representations of their input. Their neurons are arranged into feature maps. Each neuron in a feature map is connected to a neighborhood of neurons in the previous layer via a set of trainable weights, sometimes called a filter bank. The inputs are convolved with the weights to compute a new feature map, and the results are sent through an activation function. All the neurons within the same feature map are constrained to share the same weights, while the maps of different features within the same convolutional layer have different weights, so that several features can be extracted at each location (LeCun, Bengio & Hinton, 2015).
The pooling layers are used to reduce the spatial resolution of the feature maps, thereby achieving spatial invariance to distortions and translations. Initially, it was common to use average pooling layers, which propagate the average of all input values to the next layer.

Figure 2. MLPs architecture


In more recent models, max pooling layers are used to propagate the maximum value within a receptive field to the next layer.
The fully connected layers follow the convolutional and pooling layers; they perform high-level reasoning and interpret the feature representations.
CNNs were first used to process images with two-dimensional convolution and pooling layers plus a fully connected layer, and were later applied to natural language processing with one-dimensional convolution and pooling layers plus a fully connected layer.
Our CNN is composed of one-dimensional convolution layers, one-dimensional pooling layers, a fully connected layer and an activation function.
The proposed architecture for SFP is displayed in Figure 3. It contains an input layer, a hidden part, and an output layer; the hidden part contains one or more CNN layers followed by the output layer. The input layer takes data of shape 22*1 and passes it to the first convolutional layer, which produces a tensor of shape 22*64 that is passed to a max-pooling layer, reducing it to 11*64. The second convolutional layer produces a tensor of shape 11*128, which the second max-pooling layer reduces to 5*128. Flattening transforms the two-dimensional feature matrix (5*128) into a vector of length 640 that is fed to a dense (fully connected) layer. Finally, the sigmoid activation function is used to classify the outputs.
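A minimal Keras sketch of this Conv1D architecture follows. The filter counts and pooling sizes follow the shapes described above; the kernel size, padding, dense-layer width, optimizer and loss are assumptions not stated in the text.

```python
# Minimal sketch of the 1-D CNN described in Section 4.4.
# Assumptions: kernel size 3, 'same' padding, dense width 64, Adam optimizer.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

def build_cnn(n_features=22):
    model = Sequential([
        Conv1D(64, kernel_size=3, padding="same", activation="relu",
               input_shape=(n_features, 1)),                            # -> (22, 64)
        MaxPooling1D(pool_size=2),                                      # -> (11, 64)
        Conv1D(128, kernel_size=3, padding="same", activation="relu"),  # -> (11, 128)
        MaxPooling1D(pool_size=2),                                      # -> (5, 128)
        Flatten(),                                                      # -> 640-vector
        Dense(64, activation="relu"),
        Dense(1, activation="sigmoid"),                                 # faulty / non-faulty
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Example usage with the best CNN settings reported later (4000 epochs, batch size 15):
# model = build_cnn()
# model.fit(X_train_scaled.reshape(-1, 22, 1), y_train, epochs=4000, batch_size=15, verbose=0)
```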

4.5. Modifying the Model


The settings that have to be defined for the network include the activation function, the number of layers and neurons per layer, and the hyper-parameters.

4.5.1. Hyper-Parameter Tuning


Tuning the hyper-parameters is important for choosing the correct settings to obtain the desired results, and it frequently depends on experience rather than theoretical knowledge. Trade-offs are intrinsic to parameter selection because of restrictions such as memory limitations (Snuverink, 2017).
Number of epochs: an epoch is one complete pass through the dataset, encompassing one forward and one backward pass during training (Snuverink, 2017). In the experiments, the number of epochs was varied from 1000 up to 20000.
Batch size: the number of training samples propagated through the network in one forward/backward pass. In the experiments, the batch size was varied from 5 to 20.
Dropout rate: dropout is a technique used to prevent over-fitting; it provides a way of approximately combining exponentially many different neural network architectures efficiently.
Optimizer function: used to update the weights in the back-propagation stage in order to reduce the error (Bengio et al., 2017).
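A hedged sketch of this tuning loop is shown below: it grid-searches the epoch and batch-size values quoted above and keeps the configuration with the highest validation accuracy. The validation split and the specific grid points are illustrative assumptions, not the authors' exact procedure.

```python
# Illustrative hyper-parameter search over the ranges described in Section 4.5.1.
from sklearn.model_selection import train_test_split

def tune(build_fn, X, y, epoch_grid=(1000, 5000, 10000, 20000), batch_grid=(5, 10, 20)):
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    best_config, best_acc = None, 0.0
    for epochs in epoch_grid:
        for batch_size in batch_grid:
            model = build_fn()                  # e.g. the build_mlp sketch above
            model.fit(X_tr, y_tr, epochs=epochs, batch_size=batch_size, verbose=0)
            _, acc = model.evaluate(X_val, y_val, verbose=0)
            if acc > best_acc:
                best_config, best_acc = (epochs, batch_size), acc
    return best_config, best_acc
```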

4.5.2. The Number of Layers

The number of layers is one of the most important criteria in the architecture of the network (Shafi, Ahmad, Shah & Kashif, 2006). In the experiments, the number of layers was varied from 1 to 5.

Figure 3. CNN architecture


4.5.3. Activation Function


Activation functions are an active research field, as they enable DNNs to be trained quickly and accurately. The commonly used activation functions are the sigmoid, the rectified linear unit, and the hyperbolic tangent. In this paper, the authors used more than one activation function in order to compare them and determine the most suitable one:

• Sigmoid: A real-valued function that is defined for all real inputs and has a positive derivative everywhere. The sigmoid function is defined by the formula:

$$S(x) = \frac{1}{1 + e^{-x}} \quad (2)$$

• Hyperbolic Tangent (Tanh): Defined as the ratio between the hyperbolic sine and hyperbolic cosine functions, or equivalently the ratio of the half difference and half sum of the two exponential functions at the points x and -x. Tanh outputs values between -1 and 1 (Karlik & Olgac, 2011). The Tanh function is defined by the formula:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \quad (3)$$

• Rectified Linear Unit (ReLU): Defined as $\mathrm{ReLU}(x) = \max(0, x)$, it has become common because it significantly accelerates the convergence of Stochastic Gradient Descent (SGD); it requires less computing power and is much simpler than the Tanh or Sigmoid activation functions (Grinciūnaitė, 2016).
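As a small illustration, the three candidate activations can be written directly as NumPy functions; this snippet is for intuition only and is not part of the trained models.

```python
# The three candidate activation functions, written out for illustration.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # equation (2)

def tanh(x):
    return np.tanh(x)                 # equation (3)

def relu(x):
    return np.maximum(0.0, x)         # max(0, x)
```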

A comparison of the deep learning algorithms' performance is conducted in terms of classification accuracy. In addition, the True Negative Rate (TNR) and the detection rate are used; both are computed from the positive and negative predictions summarized in the confusion matrix (Table 2).

• Detection rate: The proportion of positive cases that are correctly identified, i.e., when a module actually contains a fault, how often the prediction is positive (the true positive rate):

Detection rate = TP / (TP + FN)

• Accuracy: The closeness of a measured value to the true value; here, the proportion of the total number of predictions that are correct:

Table 2. Confusion matrix

Actual \ Prediction | Positive            | Negative
Positive            | True Positive (TP)  | False Negative (FN)
Negative            | False Positive (FP) | True Negative (TN)


Accuracy = (TP + TN) / (TP + TN + FP + FN)

• True Negative Rate (TNR): The ability of the test to correctly identify modules that do not have faults:

TNR = TN / (TN + FP)
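A small sketch of how these three measures can be computed from a confusion matrix with scikit-learn is given below; the 0/1 label encoding and the 0.5 decision threshold are assumptions.

```python
# Compute detection rate, accuracy, and TNR from predictions (0 = non-faulty, 1 = faulty).
from sklearn.metrics import confusion_matrix

def evaluate(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    detection_rate = tp / (tp + fn)              # true positive rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    tnr = tn / (tn + fp)                         # true negative rate
    return detection_rate, accuracy, tnr

# Example: y_pred = (model.predict(X_test_scaled) > 0.5).astype(int).ravel()
```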

The Python programming language is employed to perform our experiments. Python is a powerful multi-paradigm programming language, optimized for programmer productivity, code readability, and software quality. Python is helpful for accomplishing real-world tasks and is commonly used in a wide range of areas as a general-purpose language (Lutz, 2013).
Several software libraries have been developed to simplify the implementation of machine learning algorithms; the libraries used here are NumPy, pandas, Matplotlib, and scikit-learn. For deep learning, Keras is used; it is considered one of the most powerful tools for building deep learning models, integrates easily with the Python libraries, and provides high-level building blocks for developing and evaluating deep learning models.

5. RESULTS

The final implementation of the MLPs and CNN is done in Python 3.6.5 using the Keras framework, with Spyder as the development environment.
In training the MLPs and the CNN, the authors were guided by obtaining the highest accuracy and by determining the effect of modifying the model architecture. When only one parameter (such as the number of epochs) was changed, there was no improvement in accuracy, so the parameters were modified systematically, and the best results obtained were used in re-running the experiments until a satisfactory result was reached.

5.1. MLPs Results


Experiments were carried out to measure accuracy, detection rate and TNR, and to determine the AUC. The baseline architecture for the network was 5 layers, 10000 epochs, a batch size of 5, the Adagrad optimizer, a dropout rate of 0.5, and the ReLU activation function. Figure 4 presents the visualized baseline model. The baseline was then modified according to the results.

5.1.1. Modify Hyper-Parameter


The authors modified the parameters and performed more than 150 experiments in order to achieve the best result; the results are shown in Tables 3, 4, 5, 6, and 7.
The best results were obtained by combining the parameter values listed in Tables 3 to 7; they are shown in Table 8.

5.2. CNN Results


In these experiments, the baseline model consists of three convolutional layers followed by a fully connected layer. Each convolutional layer includes a bias, ReLU activations and batch normalization, and is followed by a max pooling layer. The model was trained with a batch size of 10. The layers and their outputs can be seen in Figure 5. The model was modified until the best result was achieved.

5.2.1. Modify Hyper-Parameter


The authors modified the parameters and performed more than 70 experiments in order to achieve the best result; the results are shown in Tables 9, 10, and 11.


Figure 4. Visualized baseline MLPs model

Table 3. Best no. of epoch

Dataset No. of Epoch


PC1 10000
KC1 15000
KC2 20000
CM1 17000

Table 4. Best batch size

Dataset Batch Size


PC1 5
KC1 10
KC2 10
CM1 7


Table 5. Best dropout rate

Dataset Dropout Rate


PC1 .2
KC1 .5
KC2 .5
CM1 .5

Table 6. Best activation function

Dataset Activation Function


PC1 ReLU
KC1 ReLU
KC2 ReLU
CM1 ReLU

Table 7. Best no. of layer

Dataset No. of Layer


PC1 5
KC1 3
KC2 3
CM1 5

Table 8. Best results

Dataset | No. of Epoch | Batch Size | Dropout Rate | No. of Layer | Activation Function | Accuracy | Detection Rate | TNR
PC1     | 10000        | 5          | .2           | 5            | ReLU                | .935     | .363           | .977
KC1     | 15000        | 10         | .5           | 3            | ReLU                | .857     | .315           | .95
KC2     | 20000        | 10         | .5           | 3            | ReLU                | .831     | .447           | .928
CM1     | 17000        | 7          | .5           | 5            | ReLU                | .905     | .191           | .979

The best results were obtained by combining the parameter values listed in Tables 9, 10, and 11; they are shown in Table 12. Table 13 shows the best results obtained from the two algorithms.

6. COMPARISON

6.1. MLPs and CNN Comparison Results


The results show a clear advantage for the CNN in all three evaluation measures (accuracy, detection rate, and TNR). Figures 6, 7, 8, and 9 illustrate this clearly.


Figure 5. Visualized baseline CNN model

Table 9. Best no. of epoch

Dataset No. of Epoch


PC1 4000
KC1 4000
KC2 4000
CM1 4000

Table 10. Best batch size

Dataset Batch Size


PC1 15
KC1 15
KC2 15
CM1 10

The CNN achieved excellent results compared to the MLP. The CNN reached 100% on the KC1 dataset, and its lowest accuracy, 97.3%, was on the CM1 dataset. On the other hand, the best MLP result was 93.5% on the PC1 dataset, and its lowest was 83.1% on KC2 (compared with 99.3% when using the CNN).


Table 11. Best no. of layer

Dataset No. of Layer


PC1 3
KC1 3
KC2 3
CM1 3

Table 12. Best results

Dataset | No. of Epoch | Batch Size | No. of Layer | Accuracy | Detection Rate | TNR
PC1     | 4000         | 15         | 3            | .978     | .739           | .996
KC1     | 4000         | 15         | 3            | 1.00     | 1.00           | 1.00
KC2     | 4000         | 15         | 3            | .993     | .992           | 1.00
CM1     | 4000         | 10         | 3            | .973     | .823           | .992

Table 13. Best results

Figure 6. PC1 comparison

6.2. Comparison With Machine Learning Algorithms


Table 14 shows the results of evaluating 25 classifiers on four different NASA datasets, taken from (Haghighi et al., 2012), and compares them to our results.


Figure 7. KC1 comparison

Figure 8. KC2 comparison

Figure 9. CM1 comparison

7. CONCLUSION

The goal of this study is to investigate the capabilities and potential of deep learning algorithms, improve the performance of SFP, and determine the best algorithms that could be used for this purpose.
The authors conducted experiments followed by analysis and comparisons to verify the effectiveness of the deep learning algorithms (CNN and MLPs). The experiments were applied to four NASA datasets obtained from the PROMISE software engineering repository. The experimental results were evaluated using accuracy, detection rate and TNR. Finally, the analysis and comparisons were carried out to demonstrate the goals of our study.


Table 14. Comparison with ML

Algorithm | KC1 | KC2 | CM1 | PC1 (accuracy, %)
Bayes Net 69.8 78.3 64.6 74.3
Naïve Bayes 82.3 83.5 85.3 89.1
Naïve Bayes Updateable 82.3 83.5 85.3 89.1
Logistic 85.6 82.9 88.1 92.4
Multilayer Perceptron 85.9 84.6 87.5 93.5
SGD 85. 84.4 89.5 93.0
Simple Logistic 85.7 84.2 89.1 92.6
SMO 84.7 82.7 89.5 92.9
Voted Perceptron 83.7 30.2 90.1 92.6
IBK 84. 80.4 84.7 90.0
Kstar 83.9 79.1 87.1 91.7
LWL 84.4 79.5 89.7 93.2
AdaBoostM1 84.z 81.4 90.1 93.0
Attribute Selected Classifier 84.3 82.5 89.3 93.4
Bagging 85.4 82.9 89.7 93.5
Classification Via Regression 85.5 81.8 89.3 93.1
CVParameter Selection 84.5 79.5 90.1 93.0
Filtered Classifier 84.8 82.3 90.1 93.5
Logit Boost 85.3 83.5 88.9 93.1
Multi Class Classifier 85.6 82.9 88.1 92.4
Multi Scheme 84.5 79.5 90.1 93.0
Random Committee 85.4 81.2 87.7 93.5
Random SubSpace 85.4 83.9 90.1 93.2
Stacking 84.5 79.5 90.1 93.0
Vote 84.5 79.5 90.1 93.0
OUR Results
MLPs 85.7 83.1 90.5 93.5
CNN 100 99.3 97.3 97.8

The results of this study show that the use of MLPs gives modest results compared to the CNN. The CNN algorithm achieved outstanding results of up to 100% for KC1, 99.3% for KC2, 97.7% for PC1, and 97.3% for CM1. For future work, it would be useful to explore deep learning algorithms that were not applied in this study, and to determine whether any algorithm performs better than the CNN applied here.


REFERENCES

Bengio, Y., Goodfellow, I., & Courville, A. (2017). Deep learning (Vol. 1). MIT Press.
Bisi, M., & Goyal, N. K. (2015). Early Prediction of Software Fault-Prone Module using Artificial Neural
Network. International Journal of Performability Engineering, 11(1).
Grinciūnaitė, A. (2016). Development of a Deep Learning Model for 3D Human Pose Estimation in Monocular
Videos [Doctoral dissertation]. Vilnius Gediminas Technical University.
Han, J., & Kamber, M. (2012). Data Mining: Concepts and Techniques.
Karlik, B., & Olgac, A. V. (2011). Performance analysis of various activation functions in generalized MLP
architectures of neural networks. International Journal of Artificial Intelligence and Expert Systems, 1(4),
111–122.
Kaur, D., Kaur, A., Gulati, S., & Aggarwal, M. (2010, September). A clustering algorithm for software fault
prediction. Proceedings of the 2010 International Conference on Computer and Communication Technology
(ICCCT) (pp. 603-607). IEEE. doi:10.1109/ICCCT.2010.5640474
Kumar Dwivedi, V., & Singh, M. K. (2016). Software Defect Prediction Using Data Mining Classification
Approach. International Journal of Technical Research and Applications, 4(6), 31–35.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Li, J., He, P., Zhu, J., & Lyu, M. R. (2017, July). Software defect prediction via convolutional neural network.
Proceedings of the 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS)
(pp. 318-328). IEEE. doi:10.1109/QRS.2017.42
Lutz, M. (2013). Learning Python (5th ed.). O'Reilly Media.
Ma, B., Dejaeger, K., Vanthienen, J., & Baesens, B. (2011). Software defect prediction based on association
rule classification.
Mundada, D., Murade, A., Vaidya, O., & Swathi, J. N. (2016). Software fault prediction using artificial neural
network and Resilient Back Propagation. International Journal on Computer Science and Engineering, 5(03).
Naser, S. S. A. (2012). Predicting learners performance using artificial neural networks in linear programming
intelligent tutoring system. International Journal of Artificial Intelligence & Applications, 3(2), 65–73.
doi:10.5121/ijaia.2012.3206
Nisa, I. U., & Ahsan, S. N. (2015, December). Fault prediction model for software using soft computing techniques.
Proceedings of the 2015 International Conference on Open Source Systems & Technologies (ICOSST) (pp. 78-
83). IEEE. doi:10.1109/ICOSST.2015.7396406
Owhadi-Kareshk, M., Sedaghat, Y., & Akbarzadeh-T, M. R. (2017, October). Pre-training of an artificial neural
network for software fault prediction. Proceedings of the 2017 7th International Conference on Computer and
Knowledge Engineering (ICCKE) (pp. 223-228). IEEE. doi:10.1109/ICCKE.2017.8167880
Pahal, A., & Chillar, R. S. (2017). A Hybrid Approach for Software Fault Prediction Using Artificial Neural
Network and Simplified Swarm Optimization. International Journal of Advanced Research in Computer and
Communication Engineering, 6(3), 601–605. doi:10.17148/IJARCCE.2017.63140
Paramshetti, P., & Phalke, D. A. (2014). Survey on software defect prediction using machine learning techniques.
International Journal Of Science And Research, 3(12), 1394–1397.
Paul, S., & Singh, L. (2015, December). A review on advances in deep learning. Proceedings of the 2015 IEEE
Workshop on Computational Intelligence: Theories, Applications and Future Directions (WCI) (pp. 1-6). IEEE.
Peng, S., Jiang, H., Wang, H., Alwageed, H., & Yao, Y. D. (2017, April). Modulation classification using
convolutional neural network based deep learning model. Proceedings of the 2017 26th Wireless and Optical
Communication Conference (WOCC) (pp. 1-5). IEEE. doi:10.1109/WOCC.2017.7929000
Promise Software Engineering Repository. (2019). Retrieved from http://promise.site.uottawa.ca/SERepository


Rana, R. (2015). Software defect prediction techniques in automotive domain: evaluation, selection and adoption.
Shafi, I., Ahmad, J., Shah, S. I., & Kashif, F. M. (2006, December). Impact of varying neurons and hidden layers
in neural network architecture for a time frequency application. Proceedings of the 2006 IEEE International
Multitopic Conference (pp. 188-193). IEEE. doi:10.1109/INMIC.2006.358160
Snuverink, I. (2017). Deep Learning for Pixelwise Classification of Hyperspectral Images: A generalizing model
for a fixed scene subject to temporally changing weather, lighting and seasonal conditions.
Tong, H., Liu, B., & Wang, S. (2018). Software defect prediction using stacked denoising autoencoders and two-
stage ensemble learning. Information and Software Technology, 96, 94–111. doi:10.1016/j.infsof.2017.11.008
Yang, S., Chen, L. F., Yan, T., Zhao, Y. H., & Fan, Y. J. (2017, May). An ensemble classification algorithm
for convolutional neural network based on AdaBoost. Proceedings of the 2017 IEEE/ACIS 16th International
Conference on Computer and Information Science (ICIS) (pp. 401-406). IEEE. doi:10.1109/ICIS.2017.7960026
Yang, X., Lo, D., Xia, X., Zhang, Y., & Sun, J. (2015, August). Deep learning for just-in-time defect prediction.
Proceedings of the 2015 IEEE International Conference on Software Quality, Reliability and Security (pp. 17-
26). IEEE. doi:10.1109/QRS.2015.14
Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks.
Proceedings of the European conference on computer vision (pp. 818-833). Springer, Cham.
Zhang, S., Zhang, S., Wang, B., & Habetler, T. G. (2019). Machine learning and deep learning algorithms for bearing fault diagnostics - A comprehensive review.
Zhao, R., Yan, R., Chen, Z., Mao, K., Wang, P., & Gao, R. X. (2019). Deep learning and its applications to machine
health monitoring. Mechanical Systems and Signal Processing, 115, 213–237. doi:10.1016/j.ymssp.2018.05.050


Mohammed Akour is an Associate Professor of Software Engineering at Al Yamamah University. He received his Bachelor (2006) and Master (2008) degrees in Computer Information Systems from Yarmouk University with honors. He joined Yarmouk University as a Lecturer in August 2008 after graduating with his master's degree in Computer Information Systems. In August 2009, he left Yarmouk University to pursue his PhD in Software Engineering at North Dakota State University (NDSU). He rejoined Yarmouk University in April 2013 after graduating with his PhD in Software Engineering from NDSU with honors. He serves as organizer, co-chair and publicity chair for several IEEE conferences, and as an editorial review board member for more than 10 ISI-indexed prestigious journals. He is a member of the International Association of Engineers (IAENG). At Yarmouk University, Dr. Akour served as Head of Accreditation and Quality Assurance and was then appointed director of the Computer and Information Center. In 2018, he was appointed Vice Dean of Student Affairs at Yarmouk University. In 2019, he joined Al Yamamah University in Riyadh, Saudi Arabia, as an Associate Professor in Software Engineering.
