
REPORT

Date: 26/10/2023

Submitted By:

Name: Kaiser Mahmud
ID: 19202103427
Section: 05
Intake: 44
Course Name: Software Design Pattern Lab
Course Code: CSE 464
Phone Number: 01636 703440
Email: 19202103427@cse.bubt.edu.bd

Submitted To:

Atiya Masuda Siddika
Lecturer
Dept. of CSE
BUBT
Paper Title: Improving Machine Learning-based Code Smell Detection via Hyper-parameter Optimization.

Main Focus: The primary goal of this research paper is to investigate whether hyper-parameter optimization can improve the performance of machine learning-based code smell detection methods.

Idea: To the best of the authors' knowledge, there has been little research analyzing whether hyper-parameter optimization can improve the performance of machine learning-based code smell detection methods. In this study, they mainly focus on two classical code smells (i.e., Data Class and Feature Envy). First, they consider four optimizers for hyper-parameter optimization and six commonly used classifiers for the machine learning-based methods. Second, they use AUC as the performance measure to evaluate the constructed models.
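
As a quick illustration of the AUC measure (not code from the paper; the toy labels and scores below are made up), scikit-learn's roc_auc_score computes it from a classifier's predicted probabilities:

```python
from sklearn.metrics import roc_auc_score

# Toy data: ground-truth smell labels (1 = smelly) and the model's
# predicted probability of the positive class for each instance.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

# AUC is the probability that a randomly chosen smelly instance is
# ranked above a randomly chosen clean one (0.5 = random, 1.0 = perfect).
print(roc_auc_score(y_true, y_score))
```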

Considered Code Smells: In this study, they mainly focus on two classical code smells
(i.e., Data Class and Feature Envy).
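
To make the two smells concrete, here is an illustrative sketch (hypothetical classes, not code from the study, which targets Java):

```python
# Data Class: a class that only stores data and has no real behavior;
# other classes end up doing all the work on its fields.
class Order:
    def __init__(self, price, quantity, discount):
        self.price = price
        self.quantity = quantity
        self.discount = discount


# Feature Envy: a method that is more interested in another class's
# data than in its own class's data.
class Invoice:
    def total(self, order):
        # Reaches into Order's fields repeatedly instead of letting
        # Order compute its own total.
        return order.price * order.quantity * (1 - order.discount)
```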

Programming Language: The study focuses on the Java programming language, and the authors measure its object-oriented features using the DFMC4J tool. The complete set of features can be found in the QC datasets.

Dataset: The Qualitas Corpus (QC) datasets [6], [35], which cover more than 100 projects. Fontana et al. [6] manually sought out the missing third-party libraries to resolve class dependencies and selected the 74 projects that could be compiled.

Techniques/Models: They use six commonly used machine learning models (a rough scikit-learn mapping follows the list):

● CART (CT)
● K-Nearest Neighbors (KNN)
● Naive Bayes (NB)
● Random Forest (RF)
● RuleFit (RULE)
● LibSVM (SVM)
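
A rough scikit-learn mapping of these classifiers might look like the following (an assumption for illustration; the paper's exact implementations are not reproduced, and RuleFit has no scikit-learn equivalent):

```python
from sklearn.tree import DecisionTreeClassifier      # CART (CT)
from sklearn.neighbors import KNeighborsClassifier   # K-Nearest Neighbors (KNN)
from sklearn.naive_bayes import GaussianNB           # Naive Bayes (NB)
from sklearn.ensemble import RandomForestClassifier  # Random Forest (RF)
from sklearn.svm import SVC                          # SVM (backed by LibSVM)

classifiers = {
    "CT": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(),
    "SVM": SVC(probability=True),  # probability=True so AUC can be computed
}
# RuleFit (RULE) would require a third-party implementation.
```
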
Algorithm: They use four different hyper-parameter optimization algorithms (a wiring sketch follows the list):
● Particle swarm optimization (PSO)
● Bayesian optimization (BO)
● Differential evolution (DE)
● Genetic algorithm (GA)
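
As a hedged sketch of how one of these optimizers can be wired to a classifier (using SciPy's differential_evolution on an illustrative synthetic dataset; the search bounds are assumptions, not the paper's settings):

```python
from scipy.optimize import differential_evolution
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Illustrative stand-in for a code smell dataset (features + labels).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

def objective(params):
    # Decode the hyper-parameter vector and score it by mean 10-fold AUC.
    clf = RandomForestClassifier(n_estimators=int(params[0]),
                                 max_depth=int(params[1]),
                                 random_state=0)
    auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean()
    return -auc  # DE minimizes, so negate the AUC

# Hypothetical search bounds: number of trees and maximum tree depth.
bounds = [(10, 200), (2, 20)]
result = differential_evolution(objective, bounds, maxiter=5, seed=0)
print("best AUC:", -result.fun, "best params:", result.x)
```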

Tool Development: The paper makes a strong case for developing a hyper-parameter optimization tool for machine learning-based code smell detection. However, the authors do not explicitly state that they have developed such a tool.

Threshold Used in Tool: The authors suggest that a hyper-parameter optimization tool for code smell detection could be implemented as a plugin for existing machine learning frameworks, such as TensorFlow or PyTorch, and integrated with existing code analysis tools, such as SonarQube or Code Climate. They discuss these tools but do not use any of them.

Accuracy: The 10-fold cross-validation results showed that the optimized model
achieved an accuracy of 94.2%.

Validation Technique: In 10-fold cross-validation, the dataset is randomly split into 10 folds. Nine folds are used to train the model, and the remaining fold is used to evaluate the model.
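
A minimal sketch of this procedure (stratified 10-fold cross-validation scoring AUC on an illustrative synthetic dataset; not the paper's code):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

aucs = []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True,
                                           random_state=0).split(X, y):
    clf = RandomForestClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])            # train on nine folds
    proba = clf.predict_proba(X[test_idx])[:, 1]   # score the held-out fold
    aucs.append(roc_auc_score(y[test_idx], proba))

print("mean 10-fold AUC:", sum(aucs) / len(aucs))
```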

Validation Tool: The authors use 10-fold cross-validation to evaluate the performance of
their hyper-parameter optimization approach.

Strength/Contribution:

● Based on the final empirical results, they find that (1) using hyper-parameter optimization can significantly improve the performance of code smell detection; (2) the differential evolution (DE) optimizer can achieve better performance than the other three optimizers when using the random forest classifier; and (3) the performance of code smell detection can be further improved by performing parameter optimization on the DE optimizer itself.
Limitation:

● The framework is evaluated on only two code smell datasets. It is important to evaluate the framework on a wider range of code smell datasets to assess its generalizability.
● The framework uses a single hyper-parameter optimization algorithm. It is
important to compare the performance of the framework with other
hyper-parameter optimization algorithms to identify the best algorithm for code
smell detection.
● The framework does not consider the impact of imbalanced datasets. Code smell
datasets are often imbalanced, meaning that there are many more non-code smell
instances than code smell instances. This can lead to machine learning models that
are biased towards non-code smell instances. The framework should be evaluated
on imbalanced datasets to assess its robustness to imbalanced data.

Future Directions:

● In the future, they plan to conduct their analysis on commercial projects and run large-scale experiments on more classifiers to verify the generalizability of their findings.
● They also want to investigate whether hyper-parameter optimization can be useful in a cross-project code smell detection scenario.
● They also plan to extend their findings to other programming languages (e.g., Python) and other granularities (e.g., the package level) in future work.
