Article
Deep Learning-Based Software Defect Prediction via Semantic
Key Features of Source Code—Systematic Survey
Ahmed Abdu 1 , Zhengjun Zhai 1,2, * , Redhwan Algabri 3, * , Hakim A. Abdo 4 , Kotiba Hamad 5, *
and Mugahed A. Al-antari 6, *
Abstract: Software defect prediction (SDP) methodology could enhance software’s reliability through predicting any suspicious defects in its source code. However, developing defect prediction models is a difficult task, as has been demonstrated recently. Several research techniques have been proposed over time to predict source code defects. However, most of the previous studies focus on conventional feature extraction and modeling. Such traditional methodologies often fail to find the contextual information of the source code files, which is necessary for building reliable prediction deep learning models. Alternatively, the semantic feature strategies of defect prediction have recently evolved and developed. Such strategies could automatically extract the contextual information from the source code files and use them to directly predict the suspicious defects. In this study, a comprehensive survey is conducted to systematically show recent software defect prediction techniques based on the source code’s key features. The most recent studies on this topic are critically reviewed through analyzing the semantic feature methods based on the source codes, the domain’s critical problems and challenges are described, and the recent and current progress in this domain is discussed. Such a comprehensive survey could enable research communities to identify the current challenges and future research directions. An in-depth literature review of 283 articles on software defect prediction and related work was performed, of which 90 are referenced.

Keywords: software defect prediction (SDP); source code representation; deep learning; semantic key features

MSC: 03-04

Citation: Abdu, A.; Zhai, Z.; Algabri, R.; Abdo, H.A.; Hamad, K.; Al-antari, M.A. Deep Learning-Based Software Defect Prediction via Semantic Key Features of Source Code—Systematic Survey. Mathematics 2022, 10, 3120. https://doi.org/10.3390/math10173120

Academic Editors: George E. Tsekouras, Christos Kalloniatis and Dimitrios Makris

Received: 1 August 2022; Accepted: 26 August 2022; Published: 31 August 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Due to their roles in assisting developers and testers to automatically allocate the suspicious defects and prioritizing testing efforts, software defect prediction (SDP) techniques are gaining much popularity. SDP could also identify whether a software project is faulty or healthy in the early project life cycle and effectively improve the testing process and software quality [1,2]. Software is a set of instructions, programs, or data used to execute specific tasks in various applications such as cyber security [3], robotics [4,5], autonomous vehicles [6], image processing [7,8], prediction of malignant mesothelioma [9], recommendation systems [10], etc.
such SDP models. The specific and limited source code features have been manually
generated to identify the faulty files among others such as McCabe features [11], Halstead
features [12], Chidamber and Kemerer (CK) features [13], MOOD features [14], change
features [15], and other object-oriented (OO) features [16]. Several machine learning
techniques for SDP have been implemented, including Support Vector Machine (SVM) [17],
Naive Bayes (NB) [18], Neural Network (NN) [19], Decision Tree (DT) [20], Dictionary
Learning [21], and Kernel Twin Support Vector Machines (KTSVM) [22].
However, code regions with different semantic key features cannot be distinguished using existing traditional features, because traditional features can carry the same information for code regions with different semantics (see Section 4.1). Semantic feature strategies have evolved in recent years as alternative methodologies to detect suspicious defects from source code. Rather than using a metric generator analysis, such methods can predict the defects directly from the project’s source code [23–25]. These approaches use recent deep learning techniques to fit the prediction model by creating multi-dimensional features, operating either on the source code reorganized as a sequential structure or on abstract syntax trees in the input data set. In the future, advanced classification algorithms could use the semantic key features, similar to the scenario of metric-based defect prediction, to achieve more effective prediction performance.
Despite the rapid advancement of the SDP domain, the preliminary survey stud-
ies [26–30] show insufficient reviews to capture the latest challenges of the deep learning
techniques that used the source code key features for defect prediction. For example, the
recent progress in the related source code analysis and representation domain was pre-
sented based on the different processing techniques such as abstract syntax trees (AST) and
token, graph, or statement-based methods where they were successfully used for software
defect prediction.
Researchers working in the field of semantic features-based SDP need to focus on two
essential aspects: the representation of source code and deep learning techniques. Figure 1
shows the classifications of semantic features-based SDP approaches (See Sections 4.2, 4.4–4.6).
In this study, a systematic literature review (SLR) strategy is conducted based on seven
initial research questions (see Table 1). The main contributions of this study are:
• A comprehensive study is conducted to show the state-of-the-art software defect
prediction approaches using the contextual information that is generated from the
source codes. The strategies for analyzing and representing the source code are the main topic of this study.
• The current challenges and threats related to the semantic features-based SDP are
evaluated through examining and addressing the current potential solutions. In
addition, we provide guidelines for building an efficient deep learning model for the
SDP over various conditions.
• We also group deep learning (DL) models that have recently been used for the SDP
based on the semantic features collected from the source codes and compare the
performance of the various DL models, and the best DL models that outperform the
existing conventional ones are outlined.
This paper is organized as follows: Section 2 presents the review of background and
related studies, Section 3 presents the methodology of this study, Section 4 illustrates the
results and answers the research questions, Section 5 summarizes the results of this study,
and lastly, Section 6 presents the conclusion of this study.
Figure 2. Traditional procedure of software defect prediction (SDP) in an abstract view.
Previous studies have presented several SDP techniques based on standard features. Wang et al. [32] proposed a semi-boost algorithm to define the latent clustering relationship between modules expressed in a boosting framework. Wu et al. [33] proposed dictionary
learning under a semi-supervised approach using the labeled and unlabeled defect datasets.
Furthermore, their method considered the classification cost errors during the dictionary
learning. Zhang et al. [34] proposed an approach to generate a class balance training dataset
using a Laplacian point sampling method for the labeled fault-free programs, then calculated a
relationship graph’s weights using a non-negative sparse algorithm. Their proposed approach
predicted the identities of unlabeled software packages using a label propagation algorithm
based on the constructed non-negative sparse graph. Zhu et al. [1] introduced a method for
defect prediction based on a Naive Bayesian algorithm which built a training model based on
the data gravity and feature dimension weight, then calculated the prior probabilities for the
training data using the information weights to construct a prediction classifier for SDP. Jin [22]
used the Kernel twin support vector machine (KTSVM) for performing the domain adaptation
to meet various training data distributions. They used their strategy as the cross-project defect
prediction (CPDP) model.
2.2. Software Defect Prediction Based on Analysis of the Source Code Semantic Features
Although several traditional approaches have been presented for SDP, these traditional
approaches still need to improve their prediction performance. Such SDP models based on
the semantic features are currently gaining popularity in software engineering. A typical
semantic feature-based SDP process is shown in Figure 3. The first step is to parse the source
code and extract code nodes. There are three widely used techniques for representing the
source code: Graph-based method, AST-based method, and Token-based method [35] (See
Section 4.2).
Figure 3. Typical semantic feature-based SDP process: token sequences (token1, token2, token3, token4, ...) extracted from CFG nodes are mapped to integer identifiers through an encoding/embedding step and passed to a classifier that labels each module as defective or clean.
The second step is to extract the token node vectors using the Encoding or Embedding
process. Only the numerical vectors can be used as an input quantity for deep learning
models. Thus, a unique integer identification is assigned to each token. A mapping
between the different tokens and integers is created to construct the semantic features
and then the token vectors are encoded/embedded into integer vectors for the deep
learning models. Generally, word embedding is implemented using the global vectors
(GloVe) [36] or word2vec [37] techniques. After the numeric vectors are extracted from the
AST nodes, the labeling procedure is conducted automatically using the blaming function or
annotating in Version Control System (VCS) [23,24,38,39]. The last step is to train the deep
learning models using the labeled semantic feature dataset. After the DL model is perfectly
fine-tuned, the model evaluation is performed using the testing subset to investigate the
prediction performance, showing whether the assessed source code module is clean or it
has some defects. The most common deep learning approaches for the SDP include, but are
not limited to, Convolutional Neural Networks (CNN), Deep Belief Networks (DBN), Long Short-Term Memory (LSTM), and Transformer architectures (See Sections 4.4 and 4.5).
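The token-to-integer mapping and embedding step described above can be sketched as follows. This is a minimal illustration with hypothetical token names and a toy random-embedding table; a real pipeline would train the embeddings with word2vec or GloVe instead.

```python
# Sketch of the encoding step: tokens extracted from parsed source code are
# mapped to unique integer IDs, and each ID is looked up in an embedding
# table (randomly initialized here purely for illustration).
import random

def build_vocabulary(token_sequences):
    """Assign a unique integer identifier to each distinct token."""
    vocab = {}
    for sequence in token_sequences:
        for token in sequence:
            if token not in vocab:
                vocab[token] = len(vocab) + 1  # 0 is reserved for padding
    return vocab

def encode(sequence, vocab):
    """Replace each token with its integer identifier."""
    return [vocab[token] for token in sequence]

# Hypothetical token sequences, as they might come from AST/CFG nodes.
files = [["try", "InputStream", "read", "close"],
         ["while", "read", "write", "close"]]
vocab = build_vocabulary(files)
encoded = [encode(f, vocab) for f in files]

# Toy embedding table: one small random vector per vocabulary entry.
random.seed(0)
embedding = {i: [random.random() for _ in range(4)] for i in vocab.values()}
vectors = [[embedding[i] for i in seq] for seq in encoded]
```

The integer vectors in `encoded` (or their embedded forms in `vectors`) are what a deep learning model would actually consume.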
3. Methodology
Section 3.1 introduces the research strategy, Section 3.2 describes the research questions,
and Section 3.3 presents the search for related studies.
Exclusion principles:
• The study reported without an experimental investigation of the semantic feature-
based technique.
• The article has only an abstract (accessibility is not part of this criterion; both open-access and subscription papers were included).
• The paper is not a primary relevant research article.
• The study does not provide a detailed description of how deep learning is applied.
4. Results
In this section, we discuss our answers to each study question.
Semantic features represent the contextual information of a source code that standard features cannot express.
Figure 7 illustrates four Java code files. For example, Figure 7a shows an original defective version (memory leak bug), and Figure 7b shows a clean version (code after fixing the bug). In the code version in Figure 7a, the variables (‘is’ and ‘os’) are initialized before the “try” block, so an IOException can leave them unclosed. The defective code file can cause a memory leak (https://issues.apache.org/jira/browse/LUCENE-3251, we accessed this link last time on 1 August 2022), but this has been corrected in Figure 7b through shifting the initializing statements into the try statement. In addition, the buggy code file in Figure 7c
and the clean code file in Figure 7d have the same structure of do-while block; the only
difference is that Figure 7c does not include the loop increment statement, which will
lead to an infinite loop bug. These Java code files contain the same source code properties regarding function calls, complexity, code size, etc. Using traditional features like code size and complexity to represent these two Java files will lead to equivalent feature vectors. However, the two files have immensely different semantic information. In this case, semantic information is required to create more accurate prediction models.
Figure 7. Motivating Java code example. (a) Original buggy code (memory leak bug), (b) Code after
fixing the memory leak bug, (c) Original buggy code (infinite loop bug), and (d) Code after fixing the
infinite loop bug.
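The situation in Figure 7c,d can be mimicked with a small sketch. The snippets and the toy metric extractor below are hypothetical illustrations, not the paper’s actual code or feature set:

```python
# Two fragments that differ only in a loop-increment statement: identical
# size and branching yield identical traditional metric vectors, even
# though one of them loops forever.
buggy = ["int i = 0;", "do {", "process(i);", "} while (i < n);"]
fixed = ["int i = 0;", "do {", "process(i++);", "} while (i < n);"]

def traditional_features(lines):
    """Toy metric vector: lines of code plus a crude branch count."""
    loc = len(lines)
    branches = sum(1 for l in lines if "while" in l or "if" in l)
    return (loc, branches)

# Identical traditional feature vectors...
assert traditional_features(buggy) == traditional_features(fixed)
# ...but different token content, which semantic methods can exploit.
assert buggy != fixed
```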
In the token-based method, the source code fragment is divided into tokens. The bag
of words (BOW) representation is used to count the frequency of each token appearing in a
document [46]. A token-based method requires more Central Processing Unit (CPU) time
and memory than a line-by-line examination because a code line usually contains several
tokens. However, the token-based approach can use several transforms, resulting in efficient
elimination of coding style differences and identification of many code segments as clone
pairs [47]. Source code representation can also benefit numerous applications, such as cyber security [48,49], robotics [50,51], etc., by reducing memory and speed overheads.
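A minimal bag-of-words sketch of the token-based method follows; the regular-expression tokenizer is an illustrative assumption, not any specific tool’s lexer:

```python
# Split a code fragment into tokens and count each token's frequency,
# giving the BOW representation described in the text.
import re
from collections import Counter

def tokenize(code):
    """Very rough lexer: identifiers, numbers, and punctuation symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)

code = "if (count > 0) { count = count - 1; }"
bow = Counter(tokenize(code))
```

Here `bow["count"]` is 3, so the representation captures how often each token occurs, regardless of formatting or coding style.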
Abstract Syntax Tree (AST) is a method used to represent the source code semantic
information and has been applied to software engineering tools and programming lan-
guages [52]. ASTs do not contain details such as delimiters and punctuation identifiers
compared to the standard code file. In contrast, ASTs can present contextual and linguistic
information about source code [53].
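Python’s standard `ast` module gives a quick feel for what an AST-based representation retains; the snippet parsed here is a toy example:

```python
# Parse a small function and walk its AST: the tree drops delimiters and
# punctuation but preserves the structural/contextual information.
import ast

source = "def f(x):\n    if x > 0:\n        return x\n    return -x"
tree = ast.parse(source)

# A walk over the tree yields node types that can serve as AST tokens.
node_types = [type(node).__name__ for node in ast.walk(tree)]
```

The resulting `node_types` list contains entries such as `FunctionDef`, `If`, and `Return`, which SDP approaches commonly serialize into token sequences.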
In the case of the graph-based method, such as control flow graphs (CFGs) [54], program
dependence graphs [55], and graph representations of programs [56], they can be applied
to describe contextual information of the source code.
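A hand-built toy CFG illustrates the graph-based representation; real tools derive such graphs automatically, and the block labels here are purely illustrative:

```python
# Nodes are basic blocks; edges are possible control transfers.
cfg = {
    "entry": ["cond"],
    "cond":  ["body", "exit"],  # branch: enter the loop or leave
    "body":  ["cond"],          # back edge closing the loop
    "exit":  [],
}

def reachable(cfg, start):
    """Collect every block reachable from `start` (depth-first search)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(cfg[node])
    return seen
```

Graph analyses such as reachability (here, `reachable(cfg, "entry")` covers all four blocks) expose control-flow context that token counts cannot.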
Each technique of code representation is carefully constructed to collect specific data
that can either be applied independently or with other methods to produce a more extensive
result. Then, the features carrying important information from the source code will be
gathered, which can be used in code clone detection, defect prediction, automatic program
repair, etc. [57].
Several public labeled datasets for SDP have also been generated. Table 4 shows a list
of available labeled datasets used for SDP. Such datasets usually consist of pairs of faulty
and correct code parts.
4.5. Rq-5: Performance Analysis of Different Deep Learning Techniques Applied to Semantic
Feature-Based SDP Models
This section discusses the SDP models based on semantic information directly ex-
tracted from source code using DL techniques. Wang et al. [23] proposed the first approach
to generate contextual information from source code and employ it in SDP. Afterwards,
researchers proposed several methods to extract the contextual features from the source
code using deep learning techniques. Table 5 shows the detail of semantic feature-based
SDP studies, the DL techniques such as DBN, CNN, LSTM, RNN, and BERT, and the jour-
nal information, publication year, evaluation metrics, and methods used for representing
source code and extracting semantic features. AST is usually used to describe the source
code as shown in Table 5.
Table 5. Details of semantic feature-based SDP studies.

• Wang et al. [23], 2016 — DL method: DBN; code representation: AST; evaluation metric: F1 score.
Performance: The study proposed using a powerful representation-learning technique and DL to automatically learn the semantic features from source code to predict software defects.
Pros: The same DBN methodology can be applied to various data types and applications.
Cons: DBN models require massive data to achieve better performance.
Limitations: Their experiments were performed on Java projects; the approach needs to be tested on other software projects.

• Wang et al. [24], 2018 — DL method: DBN; code representation: AST; evaluation metrics: F1 score and PofB20.
Performance: An extension of Wang et al. [23] that demonstrates the viability and efficiency of predicting defects using semantic representation from source code.
Limitations: Their outputs were highly efficient; they only needed to apply them to real projects.

• Phan et al. [74], 2017 — DL method: CNN; code representation: CFGs; evaluation metric: AUC.
Performance: The authors suggested using deep neural networks and accurate graphs representing instruction execution patterns to detect defect features automatically.
Pros: Easy to improve the prediction model through adding a convolutional layer for pooling (like CBOW).
Cons: CNNs typically operate much more slowly because of procedures like max pooling.
Limitations: Comparative analysis is limited; their results need to be compared with the findings of other approaches.

• Li et al. [75], 2017 — DL method: CNN; code representation: AST; evaluation metric: F1 score.
Performance: They built a prediction model using CNN to learn semantic information from the program’s AST.
Limitations: Results must be compared to those obtained by other methods based on AST node sequences.
• Meilong et al. [76], 2020 — DL method: CNN; code representation: AST; evaluation metric: F1 score.
Performance: This study explored contextual data based on AST sequences to highlight defect features.
Limitations: They need to conduct experiments on more projects in many languages to verify the generalizability of their study.

• Dam et al. [77], 2019 — DL method: LSTM; code representation: AST; evaluation metrics: F1 score and AUC.
Performance: They built a prediction system upon the LSTM architecture to capture the source code’s syntax and semantics representation.
Pros: LSTM is an excellent tool for any task requiring sequence and text generation.
Cons: Training an LSTM model requires more memory, and they can overfit easily.
Limitations: The results are limited; various datasets must be used to test the results. The findings must also be compared to other approaches.

• Majd et al. [78], 2020 — DL method: LSTM; code representation: Statement-Level (SL) features; evaluation metrics: F1 score and Accuracy.
Performance: They developed effective SDP models based on statement-level features. Results were evaluated on more than 100,000 C/C++ programs.
Limitations: Additional metrics for other programming languages, such as Java, Python, or Kotlin, are required.

• Deng et al. [79], 2020 — DL method: LSTM; code representation: AST; evaluation metric: F1 score.
Performance: They built a prediction model to automatically learn semantic information from the program’s ASTs using LSTM.
Limitations: Results must be compared to those obtained by other methods based on AST node sequences.
• Liang et al. [80], 2019 — DL method: LSTM; code representation: AST; evaluation metric: F1 score.
Performance: The study proposed to express semantic features using word2vec and used LSTM as the predictor to enhance the performance of the SDP model.
Limitations: More semantic features about programs, such as data flow details, should be captured as features to validate their work.

• Lin and Lu [81], 2021 — DL method: BiLSTM; code representation: AST; evaluation metrics: F1 score, AUC, and MCC.
Performance: A BiLSTM-based network was used to extract token sequences from the AST and train the proposed model by learning contextual features.
Pros: BiLSTM is better for solving the fixed sequence-to-sequence prediction problem with varying input and output lengths.
Cons: BiLSTM is costly because it uses double LSTM cells.
Limitations: There is a need for additional metrics for programming languages such as Java, Python, Kotlin, etc.

• Wang et al. [57], 2021 — DL method: GH-LSTM; code representation: AST; evaluation metrics: F1 score and PofB20.
Performance: A defect prediction model based on gated hierarchical LSTM (GH-LSTM) has been proposed, which uses the semantic features extracted from the word embeddings of the ASTs of source code files.
Pros: It is a good choice for source code analysis tasks such as named entity recognition.
Cons: It is difficult to train accurately.
Limitations: The comparison study is constrained; it is necessary to contrast their results with other approaches’ results.

• Fan et al. [82], 2019 — DL method: ARNN; code representation: AST; evaluation metrics: F1 score and AUC.
Performance: An attention-based RNN has been proposed to predict software defects based on semantic features extracted from the source code’s AST.
Pros: While creating a model, it accurately captures the relationships between the text’s words.
Cons: The vanishing gradient problem affects deep RNNs, negatively affecting prediction performance.
Limitations: Experimental work should be conducted on a wide range of projects, including both personal and company-related projects.
• Xu et al. [83], 2020 — DL method: GNN; code representation: AST; evaluation metrics: F1 score, AUC, and Accuracy.
Performance: A GNN has been constructed to predict software defects using the contextual information extracted from the AST.
Pros: GNN is suitable for training models on any dataset that contains pairwise relationships in the input data.
Cons: GNNs are not noise-resistant in graph-based data.
Limitations: To validate their study, more semantic program features, like data flow information, should be included as features.

• Uddin et al. [84], 2022 — DL methods: BiLSTM and BERT; code representation: Full-token and AST; evaluation metric: F1 score.
Performance: The suggested approach employs a BERT model to capture the semantic features of code to predict software defects.
Pros: BERT models, pre-trained on a big corpus, make smaller and more precise downstream tasks easier.
Cons: Training BERT models is slow because they are big and require updating a lot of weights.
Limitations: Their study lacks comparison with other domain adaptation methods for SDP.
Figures 8 and 9 show the distribution of semantic feature-based SDP using DL.
Figure 8a shows the paper distribution per code representation techniques, Figure 8b
shows distribution per evaluation measures, and Figure 8c shows paper distribution per on-
line libraries. Figure 9a shows paper distribution per year, and Figure 9b shows distribution
according to DL techniques.
Figure 8. (a) Distribution per code representation techniques, (b) Distribution per evaluation mea-
sures, and (c) Distribution per online libraries.
Figure 9. (a) Distribution per year and (b) Distribution according to DL techniques.
4.6. Rq-5.1: Which Deep Learning Techniques Outperform Existing SDP Models?
We compare the recall, F-measure, and precision of different SDP models to measure the efficiency of various semantic features-based SDP models. Table 6 shows the recall, F-measure, and precision of the deep learning models for semantic features-based SDP. The results indicate that the F-measure score ranges between 0.645 and 0.658 for DBN models, between 0.609 and 0.627 for CNN models, and between 0.521 and 0.696 for LSTM models, while it is 0.575 for ARNN, 0.673 for GNN, and 0.689 for BERT. Table 6 shows that, despite the different values for all the techniques used, they are generally comparable, with a range between 0.521 and 0.696.
Table 6. Recall, Precision, and F-measure values of semantic features-based SDP using deep learning models.
Figure 10 shows the average precision, F-measure, and recall of the DL techniques used in semantic features-based SDP. Based on these results, the LSTM approach is the most efficient among all DL techniques and is also the most commonly utilized DL technique in the SDP area. DBN, CNN, and BERT are also used as SDP models in many studies; BERT outperforms DBN and CNN, while DBN outperforms CNN.
Figure 10. Average precision, recall, and f-measure of DL techniques used in the semantic features-
based SDP.
In effort-aware cases, we assume that software resources are confined when developers perform a code check, which is the most common situation in practical SDP use. In this scenario, developers or testers can only check a few instances of the predicted bugs; therefore, specifically designed metrics should evaluate the prediction performance under effort-aware scenarios.
A. Non-effort-aware evaluation metrics
SDP models have four possible outcomes when predicting the result of a code change: (1) predict a defective code change as defective (True Positive, TP), (2) predict a defective code change as non-defective (False Negative, FN), (3) predict a non-defective code change as non-defective (True Negative, TN), and (4) predict a non-defective code change as defective (False Positive, FP). The prediction model calculates performance indicators such as recall, F-measure, precision, and AUC on the test set based on these four possibilities.
Precision represents the ratio of correctly classified faulty changes to all changes predicted as faulty, which is given by:

Precision = TP / (TP + FP). (1)
Recall refers to the ratio of correctly classified faulty changes to all truly faulty changes, which is expressed by:

Recall = TP / (TP + FN). (2)
F-measure: a comprehensive performance indicator that combines the recall and precision rates. It is the harmonic mean of recall and precision, as follows:

F-measure = (2 × precision × recall) / (precision + recall). (3)
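Equations (1)–(3) can be computed directly from the confusion counts; the counts used below are made-up illustration values:

```python
# Precision, recall, and F-measure from TP/FP/FN counts, as in
# Equations (1)-(3).
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_measure(p, r):
    """Harmonic mean of precision and recall (Equation (3))."""
    return 2 * p * r / (p + r)

tp, fp, fn = 40, 10, 20  # illustrative confusion counts
p, r = precision(tp, fp), recall(tp, fn)  # 0.8 and 2/3
f1 = f_measure(p, r)                      # 8/11
```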
AUC: Area Under the Curve is a frequently used performance metric in real-time
SDP research. A two-dimensional space represents the ROC curve using recall as the
y-coordinate and Pf as the x-coordinate [85].
B. Effort-aware evaluation metrics
Developers try to examine less code to catch as many faults as possible to detect defects
more efficiently. In this study, we choose two measures that can evaluate the performance
of code examination according to the findings provided by SDP models in the effort-aware
scenario: PofB20 and Popt.
PofB20 is a metric for determining the ratio of defects that a programmer can find by checking 20 percent of the lines of code (LOC). When the programmer finishes inspecting 20 percent of the full code, the PofB20 score is the percentage of faults identified. For the same amount of examined LOC, a higher PofB20 value indicates that the prediction model exposes more of the project’s defects.
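A sketch of PofB20 under this definition follows; the (score, LOC, bug-count) tuples are invented example data, and partially inspected files are ignored for simplicity:

```python
# Rank files by predicted defect score, "inspect" files until 20% of total
# LOC is covered, and report the fraction of all bugs found.
def pofb20(files):
    """files: list of (predicted_score, loc, bug_count) tuples."""
    total_loc = sum(loc for _, loc, _ in files)
    total_bugs = sum(bugs for _, _, bugs in files)
    budget = 0.2 * total_loc
    inspected_loc, found = 0, 0
    for score, loc, bugs in sorted(files, reverse=True):
        if inspected_loc + loc > budget:
            break  # next file would exceed the 20% LOC budget
        inspected_loc += loc
        found += bugs
    return found / total_bugs

files = [(0.9, 100, 4), (0.7, 100, 2), (0.2, 800, 2)]
# Budget is 200 LOC: the two top-ranked files (6 of 8 bugs) fit exactly.
score = pofb20(files)  # 0.75
```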
Popt is the indicator of effort-aware performance based on the Alberg diagram con-
cept [86], which presents the prediction model performances in the effort-aware case. The
x-axis displays the percentage of examined LOC, and the y-axis indicates the recall of an
SDP model, as shown in Figure 11. Popt can be derived from the prediction model (m),
which is given by:
Popt = (Area(Optimal) − Area(m)) / (Area(Optimal) − Area(Worst)), (4)

where Optimal describes the situation in which all files are arranged in descending order of defect density, and Worst refers to the scenario in which the files are arranged in ascending order of defect density.
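Equation (4) can be sketched as follows, using trapezoidal areas under the cumulative-recall curves of Figure 11; the file data and the density-based orderings are illustrative assumptions (note that some papers instead report one minus this quantity, so that higher is better):

```python
# Each ordering of files induces a curve of cumulative recall (y) against
# cumulative %LOC inspected (x); Popt compares the areas under the model,
# optimal, and worst curves.
def curve_area(files):
    """Area under cumulative-recall vs. cumulative-%LOC for a given order."""
    total_loc = sum(loc for loc, _ in files)
    total_bugs = sum(bugs for _, bugs in files)
    x, y, area = 0.0, 0.0, 0.0
    for loc, bugs in files:
        nx, ny = x + loc / total_loc, y + bugs / total_bugs
        area += (nx - x) * (y + ny) / 2  # trapezoid segment
        x, y = nx, ny
    return area

def popt(model_order):
    """Equation (4): (Area(Optimal) - Area(m)) / (Area(Optimal) - Area(Worst))."""
    # Optimal: descending defect density (bugs/LOC); Worst: ascending.
    optimal = sorted(model_order, key=lambda f: f[1] / f[0], reverse=True)
    worst = list(reversed(optimal))
    a_opt, a_m, a_w = map(curve_area, (optimal, model_order, worst))
    return (a_opt - a_m) / (a_opt - a_w)

files = [(100, 1), (50, 4), (200, 1)]  # (loc, bug_count) in the model's order
score = popt(files)  # 7/24: this ranking sits between optimal (0) and worst (1)
```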
Researchers probably selected these measures because they are frequently used in
deep learning research. Accuracy is a poor measure for defect prediction research because
the SDP datasets are imbalanced. Besides accuracy, other measures must also be used to
evaluate the efficiency of the SDP models, including recall, precision, and PofB20.
detection, masked language modeling, and next sentence prediction. Table 3 shows a list of
the standard unlabeled datasets appropriate for SDP.
In addition, the distribution of the classes can also be a factor that affects the difficulty of building datasets from real software projects. A project often has fewer faulty files or procedures than valid ones. Consequently, standard classifiers may correctly identify the majority class (clean code) but miss the smaller category of defect-prone code.
5. Discussion
This section presents the systematic literature review’s general discussion and valid-
ity issues.
RQ-7: Unlike natural languages, collecting contextual information from source code faces several challenges. The complexity of the source code structure is one challenge for source code representation: a code fragment can depend on a far-off component or even be found inside another code fragment, and it can be difficult to assess whether a coding unit is flawed without knowing its context. Furthermore, the lack of sufficiently labeled datasets for semantic features-based SDP is another challenge. To address this problem, developers can employ contextual word embedding with pre-training; this strategy pre-trains the language model using self-supervised learning on large amounts of unlabeled data.
research selection criteria. The selected articles are organized according to code represen-
tation techniques, deep learning types, datasets, evaluation metrics, best deep learning
techniques, gaps, and challenges, and the related results are presented. Researchers mostly
preferred the AST method to represent source code and extract the semantic features. Fur-
thermore, several repositories and datasets exist for semantic features-based SDP studies.
Researchers also used deep learning for building semantic features-based SDP models;
LSTM is commonly used in these tasks.
This study will help software developers and researchers analyze and evaluate SDP modeling. They can select suitable deep learning techniques and code representation methods under various scenarios; they can identify the relationships between the type of projects,
code representation methods, deep learning techniques, and required evaluation measures
over such situations. In addition, it will enable them to handle various SDP-related threats
and difficulties.
The following are future recommendations for researchers and software developers in
the SDP field:
• There should be a more general evaluation of the various semantic features-based SDP approaches; only a few studies have investigated generalized methods. When researchers adopt this perspective, it will be possible to compare the efficiency and performance of SDP approaches.
• Industry should have free access to datasets to conduct more research experiments for SDP. Researchers and developers should reduce the demand for large labeled datasets by applying self-supervised learning to large amounts of unlabeled data. They should add more features and test cases so DL can be easily used without overfitting.
• It is essential to compare the performance of DL techniques to other DL/statistical approaches to assess their potential for SDP.
• Mobile applications have received widespread attention because of their simplicity,
portability, timeliness, and efficiency. The business community is constantly devel-
oping office and mobile applications suitable for various scenarios. Therefore, SDP
techniques can also be applied to mobile application-based architectures. Practitioners should take particular care to explore the applicability of existing approaches to mobile applications.
Author Contributions: Conceptualization, A.A.; Data curation, H.A.A.; Formal analysis, R.A.; Fund-
ing acquisition, K.H. and M.A.A.-a.; Project administration, Z.Z.; Resources, H.A.A.; Software, A.A.;
Supervision, Z.Z.; Writing—original draft, A.A.; Writing—review & editing, R.A., K.H. and M.A.A.-a.
All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Sejong University Industry-University Cooperation Founda-
tion under the project number (20220224) and by the National Research Foundation (NRF) of South
Korea (2020R1A2C1004720).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Acknowledgments: We acknowledge the fund sponsors of this research.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Zhu, K.; Zhang, N.; Ying, S.; Wang, X. Within-project and cross-project software defect prediction based on improved transfer
naive bayes algorithm. Comput. Mater. Contin. 2020, 63, 891–910.
2. Zhu, K.; Ying, S.; Zhang, N.; Zhu, D. Software defect prediction based on enhanced metaheuristic feature selection optimization
and a hybrid deep neural network. J. Syst. Softw. 2021, 180, 111026. [CrossRef]
3. Shaukat, K.; Luo, S.; Varadharajan, V.; Hameed, I.A.; Chen, S.; Liu, D.; Li, J. Performance comparison and current challenges of
using machine learning techniques in cybersecurity. Energies 2020, 13, 2509. [CrossRef]
4. Algabri, R.; Choi, M.T. Deep-learning-based indoor human following of mobile robot using color feature. Sensors 2020, 20, 2699.
[CrossRef] [PubMed]
5. Algburi, R.N.A.; Gao, H.; Al-Huda, Z. A new synergy of singular spectrum analysis with a conscious algorithm to detect faults in
industrial robotics. Neural Comput. Appl. 2022, 34, 7565–7580. [CrossRef]
6. Alghodhaifi, H.; Lakshmanan, S. Autonomous vehicle evaluation: A comprehensive survey on modeling and simulation
approaches. IEEE Access 2021, 9, 151531–151566. [CrossRef]
7. Al-Huda, Z.; Peng, B.; Yang, Y.; Algburi, R.N.A. Object scale selection of hierarchical image segmentation with deep seeds. IET
Image Process. 2021, 15, 191–205. [CrossRef]
8. Peng, B.; Al-Huda, Z.; Xie, Z.; Wu, X. Multi-scale region composition of hierarchical image segmentation. Multimed. Tools Appl.
2020, 79, 32833–32855. [CrossRef]
9. Alam, T.M.; Shaukat, K.; Hameed, I.A.; Khan, W.A.; Sarwar, M.U.; Iqbal, F.; Luo, S. A novel framework for prognostic factors
identification of malignant mesothelioma through association rule mining. Biomed. Signal Process. Control 2021, 68, 102726.
[CrossRef]
10. Shaukat, K.; Alam, T.M.; Ahmed, M.; Luo, S.; Hameed, I.A.; Iqbal, M.S.; Li, J.; Iqbal, M.A. A model to enhance governance
issues through opinion extraction. In Proceedings of the 2020 11th IEEE Annual Information Technology, Electronics and Mobile
Communication Conference (IEMCON), Vancouver, BC, Canada, 4–7 November 2020; pp. 511–516.
11. McCabe, T.J. A complexity measure. IEEE Trans. Softw. Eng. 1976, 4, 308–320. [CrossRef]
12. Halstead, M.H. Elements of Software Science (Operating and Programming Systems Series); Elsevier Science Inc.: New York, NY,
USA, 1977.
13. Chidamber, S.R.; Kemerer, C.F. A metrics suite for object oriented design. IEEE Trans. Softw. Eng. 1994, 20, 476–493. [CrossRef]
14. Harrison, R.; Counsell, S.J.; Nithi, R.V. An evaluation of the MOOD set of object-oriented software metrics. IEEE Trans. Softw.
Eng. 1998, 24, 491–496.
15. Jiang, T.; Tan, L.; Kim, S. Personalized defect prediction. In Proceedings of the 2013 28th IEEE/ACM International Conference on
Automated Software Engineering (ASE), Silicon Valley, CA, USA, 11–15 November 2013; pp. 279–289.
16. e Abreu, F.B.; Carapuça, R. Candidate metrics for object-oriented software within a taxonomy framework. J. Syst. Softw. 1994,
26, 87–96. [CrossRef]
17. Elish, K.O.; Elish, M.O. Predicting defect-prone software modules using support vector machines. J. Syst. Softw. 2008, 81, 649–660.
[CrossRef]
18. Wang, T.; Li, W.H. Naive bayes software defect prediction model. In Proceedings of the 2010 International Conference on
Computational Intelligence and Software Engineering, Wuhan, China, 10–12 December 2010; pp. 1–4.
19. Erturk, E.; Sezer, E.A. A comparison of some soft computing methods for software fault prediction. Expert Syst. Appl. 2015,
42, 1872–1879. [CrossRef]
20. Gayatri, N.; Nickolas, S.; Reddy, A.; Reddy, S.; Nickolas, A. Feature selection using decision tree induction in class level metrics
dataset for software defect predictions. In Proceedings of the World Congress on Engineering and Computer Science, San
Francisco, CA, USA, 20–22 October 2010; Volume 1, pp. 124–129.
21. Wan, H.; Wu, G.; Yu, M.; Yuan, M. Software defect prediction based on cost-sensitive dictionary learning. Int. J. Softw. Eng. Knowl.
Eng. 2019, 29, 1219–1243. [CrossRef]
22. Jin, C. Cross-project software defect prediction based on domain adaptation learning and optimization. Expert Syst. Appl. 2021,
171, 114637. [CrossRef]
23. Wang, S.; Liu, T.; Tan, L. Automatically learning semantic features for defect prediction. In Proceedings of the 2016 IEEE/ACM
38th International Conference on Software Engineering (ICSE), Austin, TX, USA, 14–22 May 2016; pp. 297–308.
24. Wang, S.; Liu, T.; Nam, J.; Tan, L. Deep semantic feature learning for software defect prediction. IEEE Trans. Softw. Eng. 2018,
46, 1267–1293.
25. Pan, C.; Lu, M.; Xu, B. An empirical study on software defect prediction using CodeBERT model. Appl. Sci. 2021, 11, 4793.
[CrossRef]
26. Pandey, S.K.; Mishra, R.B.; Tripathi, A.K. Machine learning based methods for software fault prediction: A survey. Expert Syst.
Appl. 2021, 172, 114595. [CrossRef]
27. Akimova, E.N.; Bersenev, A.Y.; Deikov, A.A.; Kobylkin, K.S.; Konygin, A.V.; Mezentsev, I.P.; Misilov, V.E. A survey on software
defect prediction using deep learning. Mathematics 2021, 9, 1180. [CrossRef]
28. Catal, C.; Diri, B. A systematic review of software fault prediction studies. Expert Syst. Appl. 2009, 36, 7346–7354. [CrossRef]
29. Hall, T.; Beecham, S.; Bowes, D.; Gray, D.; Counsell, S. A systematic literature review on fault prediction performance in software
engineering. IEEE Trans. Softw. Eng. 2011, 38, 1276–1304. [CrossRef]
30. Hosseini, S.; Turhan, B.; Gunarathna, D. A systematic literature review and meta-analysis on cross project defect prediction. IEEE
Trans. Softw. Eng. 2017, 45, 111–147. [CrossRef]
31. Balogun, A.O.; Basri, S.; Mahamad, S.; Abdulkadir, S.J.; Almomani, M.A.; Adeyemo, V.E.; Al-Tashi, Q.; Mojeed, H.A.; Imam,
A.A.; Bajeh, A.O. Impact of feature selection methods on the predictive performance of software defect prediction models: An
extensive empirical study. Symmetry 2020, 12, 1147. [CrossRef]
32. Wang, T.; Zhang, Z.; Jing, X.; Liu, Y. Non-negative sparse-based SemiBoost for software defect prediction. Softw. Test. Verif. Reliab.
2016, 26, 498–515. [CrossRef]
33. Wu, F.; Jing, X.Y.; Sun, Y.; Sun, J.; Huang, L.; Cui, F.; Sun, Y. Cross-project and within-project semisupervised software defect
prediction: A unified approach. IEEE Trans. Reliab. 2018, 67, 581–597. [CrossRef]
34. Zhang, Z.W.; Jing, X.Y.; Wang, T.J. Label propagation based semi-supervised learning for software defect prediction. Autom.
Softw. Eng. 2017, 24, 47–69. [CrossRef]
35. Hua, W.; Sui, Y.; Wan, Y.; Liu, G.; Xu, G. Fcca: Hybrid code representation for functional clone detection using attention networks.
IEEE Trans. Reliab. 2020, 70, 304–318. [CrossRef]
36. Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference
on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
37. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their
compositionality. Adv. Neural Inf. Process. Syst. 2013, 26, 3111–3119.
38. Kim, S.; Zimmermann, T.; Pan, K.; Whitehead, E.J., Jr. Automatic identification of bug-introducing changes. In Proceedings of the 21st
IEEE/ACM International Conference on Automated Software Engineering (ASE’06), Tokyo, Japan, 18–22 September 2006; pp.
81–90.
39. Śliwerski, J.; Zimmermann, T.; Zeller, A. When do changes induce fixes? ACM Sigsoft Softw. Eng. Notes 2005, 30, 1–5. [CrossRef]
40. Malhotra, R. A systematic review of machine learning techniques for software fault prediction. Appl. Soft Comput. 2015,
27, 504–518. [CrossRef]
41. Li, Z.; Jing, X.Y.; Zhu, X. Progress on approaches to software defect prediction. IET Softw. 2018, 12, 161–175. [CrossRef]
42. Rathore, S.S.; Kumar, S. A study on software fault prediction techniques. Artif. Intell. Rev. 2019, 51, 255–327. [CrossRef]
43. Li, N.; Shepperd, M.; Guo, Y. A systematic review of unsupervised learning techniques for software defect prediction. Inf. Softw.
Technol. 2020, 122, 106287. [CrossRef]
44. Kitchenham, B.; Brereton, O.P.; Budgen, D.; Turner, M.; Bailey, J.; Linkman, S. Systematic literature reviews in software
engineering—A systematic literature review. Inf. Softw. Technol. 2009, 51, 7–15. [CrossRef]
45. Allamanis, M.; Barr, E.T.; Devanbu, P.; Sutton, C. A survey of machine learning for big code and naturalness. ACM Comput. Surv.
(CSUR) 2018, 51, 1–37. [CrossRef]
46. Sajnani, H.; Saini, V.; Svajlenko, J.; Roy, C.K.; Lopes, C.V. Sourcerercc: Scaling code clone detection to big-code. In Proceedings of
the 38th International Conference on Software Engineering, Austin, TX, USA, 14–22 May 2016; pp. 1157–1168.
47. Kamiya, T.; Kusumoto, S.; Inoue, K. CCFinder: A multilinguistic token-based code clone detection system for large scale source
code. IEEE Trans. Softw. Eng. 2002, 28, 654–670. [CrossRef]
48. Shaukat, K.; Luo, S.; Varadharajan, V.; Hameed, I.A.; Xu, M. A survey on machine learning techniques for cyber security in the
last decade. IEEE Access 2020, 8, 222310–222354. [CrossRef]
49. Shaukat, K.; Luo, S.; Chen, S.; Liu, D. Cyber threat detection using machine learning techniques: A performance evaluation
perspective. In Proceedings of the 2020 International Conference on Cyber Warfare and Security (ICCWS), Virtual Event, 20–21
October 2020; pp. 1–6.
50. Algabri, R.; Choi, M.T. Target recovery for robust deep learning-based person following in mobile robots: Online trajectory
prediction. Appl. Sci. 2021, 11, 4165. [CrossRef]
51. Algabri, R.; Choi, M.T. Robust person following under severe indoor illumination changes for mobile robots: Online color-based
identification update. In Proceedings of the 2021 21st International Conference on Control, Automation and Systems (ICCAS),
Jeju, Korea, 12–15 October 2021; pp. 1000–1005.
52. Baxter, I.D.; Yahin, A.; Moura, L.; Sant'Anna, M.; Bier, L. Clone detection using abstract syntax trees. In Proceedings of the
International Conference on Software Maintenance, Bethesda, MD, USA, 20 November
1998; pp. 368–377.
53. Zhang, J.; Wang, X.; Zhang, H.; Sun, H.; Wang, K.; Liu, X. A novel neural source code representation based on abstract syntax tree.
In Proceedings of the 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), Montreal, QC, Canada,
25–31 May 2019; pp. 783–794.
54. Allen, F.E. Control flow analysis. ACM Sigplan Not. 1970, 5, 1–19. [CrossRef]
55. Gabel, M.; Jiang, L.; Su, Z. Scalable detection of semantic clones. In Proceedings of the 30th International Conference on Software
Engineering, Leipzig, Germany, 10–18 May 2008; pp. 321–330.
56. Yousefi, J.; Sedaghat, Y.; Rezaee, M. Masking wrong-successor Control Flow Errors employing data redundancy. In Proceedings
of the 2015 5th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, 29 October 2015;
pp. 201–205.
57. Wang, H.; Zhuang, W.; Zhang, X. Software defect prediction based on gated hierarchical LSTMs. IEEE Trans. Reliab. 2021,
70, 711–727. [CrossRef]
58. Alon, U.; Brody, S.; Levy, O.; Yahav, E. code2seq: Generating sequences from structured representations of code. arXiv 2018,
arXiv:1808.01400.
59. Allamanis, M.; Sutton, C. Mining source code repositories at massive scale using language modeling. In Proceedings of the 2013
10th Working Conference on Mining Software Repositories (MSR), San Francisco, CA, USA, 18–19 May 2013; pp. 207–216.
60. Kanade, A.; Maniatis, P.; Balakrishnan, G.; Shi, K. Learning and evaluating contextual embedding of source code. In Proceedings
of the International Conference on Machine Learning, Virtual Event, 13–18 July 2020; pp. 5110–5121.
61. Iyer, S.; Konstas, I.; Cheung, A.; Zettlemoyer, L. Summarizing source code using a neural attention model. In Proceedings of the
54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August
2016; pp. 2073–2083.
62. Allamanis, M.; Brockschmidt, M.; Khademi, M. Learning to represent programs with graphs. arXiv 2017, arXiv:1711.00740.
63. Bryksin, T.; Petukhov, V.; Alexin, I.; Prikhodko, S.; Shpilman, A.; Kovalenko, V.; Povarov, N. Using large-scale anomaly detection
on code to improve kotlin compiler. In Proceedings of the 17th International Conference on Mining Software Repositories, Seoul,
Korea, 29–30 June 2020; pp. 455–465.
64. D’Ambros, M.; Lanza, M.; Robbes, R. Evaluating defect prediction approaches: A benchmark and an extensive comparison.
Empir. Softw. Eng. 2012, 17, 531–577. [CrossRef]
65. Kamei, Y.; Shihab, E.; Adams, B.; Hassan, A.E.; Mockus, A.; Sinha, A.; Ubayashi, N. A large-scale empirical study of just-in-time
quality assurance. IEEE Trans. Softw. Eng. 2012, 39, 757–773. [CrossRef]
66. Wu, R.; Zhang, H.; Kim, S.; Cheung, S.C. Relink: Recovering links between bugs and changes. In Proceedings of the 19th
ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, Szeged, Hungary,
5–9 September 2011; pp. 15–25.
67. Yatish, S.; Jiarpakdee, J.; Thongtanunam, P.; Tantithamthavorn, C. Mining software defects: Should we consider affected releases?
In Proceedings of the 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), Montreal, QC, Canada,
25–31 May 2019; pp. 654–665.
68. Jureczko, M.; Madeyski, L. Towards identifying software project clusters with regard to defect prediction. In Proceedings of the
6th International Conference on Predictive Models in Software Engineering, Timişoara, Romania, 12–13 September 2010; pp. 1–10.
69. Peters, F.; Menzies, T. Privacy and utility for defect prediction: Experiments with morph. In Proceedings of the 2012 34th
International Conference on Software Engineering (ICSE), Zurich, Switzerland, 2–9 June 2012; pp. 189–199.
70. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
71. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
72. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [CrossRef]
73. Guo, D.; Ren, S.; Lu, S.; Feng, Z.; Tang, D.; Liu, S.; Zhou, L.; Duan, N.; Svyatkovskiy, A.; Fu, S.; et al. Graphcodebert: Pre-training
code representations with data flow. arXiv 2020, arXiv:2009.08366.
74. Phan, A.V.; Le Nguyen, M.; Bui, L.T. Convolutional neural networks over control flow graphs for software defect prediction.
In Proceedings of the 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), Boston, MA, USA,
6–8 November 2017; pp. 45–52.
75. Li, J.; He, P.; Zhu, J.; Lyu, M.R. Software defect prediction via convolutional neural network. In Proceedings of the 2017
IEEE International Conference on Software Quality, Reliability and Security (QRS), Prague, Czech Republic, 25–29 July 2017;
pp. 318–328.
76. Meilong, S.; He, P.; Xiao, H.; Li, H.; Zeng, C. An approach to semantic and structural features learning for software defect
prediction. Math. Probl. Eng. 2020, 2020, 6038619. [CrossRef]
77. Dam, H.K.; Pham, T.; Ng, S.W.; Tran, T.; Grundy, J.; Ghose, A.; Kim, T.; Kim, C.J. Lessons learned from using a deep tree-based
model for software defect prediction in practice. In Proceedings of the 2019 IEEE/ACM 16th International Conference on Mining
Software Repositories (MSR), Montreal, QC, Canada, 25–31 May 2019; pp. 46–57.
78. Majd, A.; Vahidi-Asl, M.; Khalilian, A.; Poorsarvi-Tehrani, P.; Haghighi, H. SLDeep: Statement-level software defect prediction
using deep-learning model on static code features. Expert Syst. Appl. 2020, 147, 113156. [CrossRef]
79. Deng, J.; Lu, L.; Qiu, S. Software defect prediction via LSTM. IET Softw. 2020, 14, 443–450. [CrossRef]
80. Liang, H.; Yu, Y.; Jiang, L.; Xie, Z. Seml: A semantic LSTM model for software defect prediction. IEEE Access 2019, 7, 83812–83824.
[CrossRef]
81. Lin, J.; Lu, L. Semantic feature learning via dual sequences for defect prediction. IEEE Access 2021, 9, 13112–13124. [CrossRef]
82. Fan, G.; Diao, X.; Yu, H.; Yang, K.; Chen, L. Software defect prediction via attention-based recurrent neural network. Sci. Program.
2019, 2019, 6230953. [CrossRef]
83. Xu, J.; Wang, F.; Ai, J. Defect prediction with semantics and context features of codes based on graph representation learning.
IEEE Trans. Reliab. 2020, 70, 613–625. [CrossRef]
84. Uddin, M.N.; Li, B.; Ali, Z.; Kefalas, P.; Khan, I.; Zada, I. Software defect prediction employing BiLSTM and BERT-based semantic
feature. Soft Comput. 2022, 26, 7877–7891. [CrossRef]
85. Lessmann, S.; Baesens, B.; Mues, C.; Pietsch, S. Benchmarking classification models for software defect prediction: A proposed
framework and novel findings. IEEE Trans. Softw. Eng. 2008, 34, 485–496. [CrossRef]
86. Mende, T.; Koschke, R. Effort-aware defect prediction models. In Proceedings of the 2010 14th European Conference on Software
Maintenance and Reengineering, Madrid, Spain, 15–18 March 2010; pp. 107–116.
87. Tufano, M.; Watson, C.; Bavota, G.; Di Penta, M.; White, M.; Poshyvanyk, D. Deep learning similarities from different
representations of source code. In Proceedings of the 2018 IEEE/ACM 15th International Conference on Mining Software
Repositories (MSR), Gothenburg, Sweden, 27 May–3 June 2018; pp. 542–553.
88. Wohlin, C.; Runeson, P.; Höst, M.; Ohlsson, M.C.; Regnell, B.; Wesslén, A. Experimentation in Software Engineering; Springer Science
& Business Media: Berlin/Heidelberg, Germany, 2012.
89. Thornton, A.; Lee, P. Publication bias in meta-analysis: Its causes and consequences. J. Clin. Epidemiol. 2000, 53, 207–216.
[CrossRef]
90. Troya, J.; Moreno, N.; Bertoa, M.F.; Vallecillo, A. Uncertainty representation in software models: A survey. Softw. Syst. Model.
2021, 20, 1183–1213. [CrossRef]