Ebug Final

PREDICTING SOFTWARE DEFECT COMPLEXITY AND
ACCURACY USING BUG TRACKING AND CLUSTERING
Project Domain : Machine Learning
STUDENT NAME Project Supervisor:

Reg no:XXXXXX NAME,
Designation
ABSTRACT
• Many open-source, free, and commercial bug tracking tools have been developed, and more are
currently under development.
• The number of issues reported against software projects grows daily, so developers have
started using bug tracking systems to manage bug reports.
• The industry needs criteria for selecting the best tool among those available, one that helps
to fix bugs and to track the progress of bug fixes.
• Collecting useful information from a large, unorganized set of these reports is still a
difficult problem, because different bug tracking systems provide their data through many
different channels, such as web interfaces.
• We present comprehensive classification criteria for reviewing the available tools and
propose a new, modified tool for bug tracking and reporting.
• The tool also supports reporting the bugs that are found, assigning each bug to a developer,
and monitoring the progress of bug fixing through graphical/charting facilities and status
updates.
• In this project we use a Support Vector Machine (SVM) as the existing method and a
Convolutional Neural Network (CNN) as the proposed method. The results show that the proposed
CNN performs better than the existing SVM.
INTRODUCTION TO DOMAIN
• Machine Learning
• In the statistical context, Machine Learning is defined as an application of artificial
intelligence where available information is used through algorithms to process or assist
the processing of statistical data.
• While Machine Learning involves concepts of automation, it requires human guidance.
Machine Learning involves a high level of generalization in order to get a system that
performs well on yet unseen data instances.
• Machine learning is a relatively new discipline within Computer Science that provides a
collection of data analysis techniques. Some of these techniques are based on
well-established statistical methods (e.g. logistic regression and principal component
analysis), while many others are not.
• Most statistical techniques follow the paradigm of determining a particular probabilistic
model that best describes observed data among a class of related models.
• Similarly, most machine learning techniques are designed to find models that best fit
data (i.e. they solve certain optimization problems), except that these machine learning
models are no longer restricted to probabilistic ones.
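As a small illustration of "finding the model that best fits the data" as an optimization problem, the sketch below fits a straight line by least squares. The data points and the function name fit_line are invented for this example and are not part of the project.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (closed-form solution)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical noisy observations that roughly follow y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
slope, intercept = fit_line(xs, ys)
```

The returned slope and intercept are the parameter values that minimize the sum of squared differences between the line and the observed points.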
INTRODUCTION TO PROJECT
• The main purpose of the Bug Tracking System project is to provide online support to
software engineers who encounter bugs or errors in software.
• The system maintains project details, developer details, and tester details. The Bug Tracking
System is used to record and follow up on bugs.
• It does not find bugs itself but provides full information about the bugs that have been
detected. It lets any user who wants to know about an identified bug look up the information
recorded for it. The engineers develop the project according to the client's requirements, and
the tester identifies bugs in the testing phase.
• Whenever the tester finds a bug, the tester adds the bug ID and its details to the database
and informs the project manager and the developer.
• The bug details in the database table are accessible to the project manager and the developer.
• When a client requests or orders a product to be developed, the system/project manager is
responsible for adding users to the Bug Tracking System and assigning projects to them. For
each bug the system records the bug ID, bug name, bug priority, project name, bug location,
and bug type.
OBJECTIVE OF PROJECT
• To find Ebugs in an efficient manner.
• To reduce time consumption.
• To provide a user-friendly model.
• To be applicable to all datasets.
LITERATURE SURVEY 1
• An Eye Tracking Research on Debugging Strategies towards Different
Types of Bugs. Fei Peng; Chunyu Li; Xiaohan Song; Wei Hu; Guihuan Feng
• Publisher: IEEE 2022
• In this paper, experiments conducted on 20 participants suggest
that there are differences between the eye movement data of
successful and failed debugging samples.
• Specifically, for data flow bugs it is beneficial to pay attention to
the changes of variables. For control flow bugs, however, it is more
important to read the code and understand its logical structure.
• The authors believe that combining this conclusion with the error
messages provided by the compiler can help programmers find defects
more efficiently.
LITERATURE SURVEY 2
• The Eclipse and Mozilla defect tracking dataset: A genuine dataset for
mining bug information. Ahmed Lamkanfi; Javier Pérez; Serge Demeyer
• This paper proposes the Eclipse and Mozilla Defect Tracking Dataset, a
representative database of bug data, filtered to contain only genuine
defects (i.e., no feature requests) and designed to cover the whole
bug-triage life cycle (i.e., store all intermediate actions).
• The authors have used this dataset for predicting bug severity, for
studying bug-fixing time, and for identifying erroneously assigned
components.
• Sharing these data with the rest of the community will allow for
reproducibility, validation, and comparison of the results obtained in
bug-report analyses and experiments.
LITERATURE SURVEY 3
• Feature Ranking and Aggregation for Bug Triaging in Open-Source
Issue Tracking Systems. Anjali Goyal; Neetu Sardana
• This paper presents a methodology to rank the non-textual bug
parameters using feature ranking and aggregation techniques.
• The presented methodology has been evaluated on four open-source
systems, namely Mozilla Firefox, Eclipse, GNOME, and OpenOffice.
• The experimental evaluation shows that the ranking of bug
parameters is consistent among the different open-source
projects of the Bugzilla repository.
LITERATURE SURVEY 4
• A bug Mining tool to identify and analyze security bugs using Naive
Bayes and TF-IDF. Diksha Behl; Sahil Handa; Anuja Arora
• This paper focuses on security bugs and presents a bug mining system
for the identification of security and non-security bugs using
term frequency-inverse document frequency (TF-IDF) weights and
Naive Bayes.
• The authors performed experiments on bug report repositories of bug
tracking systems such as Bugzilla and Debugger.
• The proposed approach applies text mining methodology and TF-IDF
to the existing historic bug report database, based on each bug's
description, to predict the nature of the bug and to train a statistical
model for manually mislabeled bug reports present in the database.
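To make the TF-IDF weighting mentioned in this survey concrete, the following minimal sketch computes TF-IDF weights for a few bug report descriptions in plain Python. The example reports, the function name tf_idf, and the formula variant (term frequency normalised by document length, log(N / df) for the inverse document frequency) are assumptions for illustration; the surveyed paper may use a different variant. Documents are assumed to be non-empty.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term in the document / number of terms in it
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical bug report descriptions, invented for this example.
reports = [
    "buffer overflow in login form",
    "login button misaligned in form",
    "password leak in log file",
]
weights = tf_idf(reports)
```

A word such as "in", which occurs in every report, receives weight zero, while a word that occurs in only one report, such as "overflow", receives the highest weight in that report. Weights of this kind can then serve as input features for a classifier such as Naive Bayes.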
LITERATURE SURVEY 5
• A Bug Rule Based Technique with Feedback for Classifying
Bug Reports. Tao Zhang; Byungjeong Lee
• The authors propose a bug rule based classification technique to
categorize bug reports.
• By utilizing a developer feedback mechanism, the technique
distinguishes duplicate and valid bug reports and is
expected to improve the accuracy of bug report retrieval.
• Finally, the authors show the feasibility of this technique in an
experiment and a case study.
EXISTING SYSTEM
• Support Vector Machine (SVM):
• The main task of an SVM is to find an optimal hyperplane that separates examples of
different classes in a high-dimensional space. More than one hyperplane may separate
the data; the SVM chooses the one with the largest margin.
• This process depends on the support vectors, the data points that lie closest to the
decision surface and determine the optimal decision surface.
• It performs classification by mapping the input vectors into a high-dimensional space
and constructing a hyperplane to separate the data. Training the SVM amounts to solving
a convex quadratic programming problem, which can equivalently be written as an
unconstrained minimization of the regularized hinge loss.
• The SVM is regarded as one of the most effective classification methods.
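As a rough illustration of SVM training viewed as an unconstrained minimization problem, the sketch below trains a linear SVM by sub-gradient descent on the regularized hinge loss, in plain Python. It is a from-scratch toy, not the implementation used in the project (which may rely on a library); the data points, the function names, and the hyperparameter values are assumptions made for this example.

```python
def train_linear_svm(points, labels, lam=0.01, epochs=200, lr=0.1):
    """Train a linear SVM by per-sample sub-gradient descent.

    Each step lowers  lam/2 * ||w||^2 + max(0, 1 - y * (w.x + b))
    for the current sample. Labels must be +1 or -1.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: the hinge term is active.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the regularizer contributes to the gradient.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane x lies on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable 2-D feature vectors.
points = [(-2.0, -1.0), (-1.0, -2.0), (-1.5, -1.5),
          (1.0, 2.0), (2.0, 1.0), (1.5, 1.5)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
```

The learned pair (w, b) defines the separating hyperplane w.x + b = 0; the training points whose margin is closest to 1 play the role of the support vectors.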
DISADVANTAGE OF EXISTING SYSTEM
• Highly manual.
• Data can be lost or destroyed.
• It is difficult to update, delete, and view data.
• Maintaining and retrieving user records is difficult.
• No check of source address.
• Time consumption is high.
• Cannot be applied to all datasets.
PROBLEM STATEMENT
• Low accuracy in the prediction of Ebugs.
• Cannot be applied to all datasets.
• Time-consuming process.
• Complex model.
PROPOSED SYSTEM
• Convolutional Neural Network:
• In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep
neural networks, most commonly applied to analyzing visual imagery.
• They are also known as shift invariant or space invariant artificial neural networks (SIANN),
based on their shared-weights architecture and translation invariance characteristics.
• CNNs are regularized versions of multilayer perceptrons. Multilayer perceptrons usually
mean fully connected networks, that is, each neuron in one layer is connected to all
neurons in the next layer.
• The "fully-connectedness" of these networks makes them prone to overfitting data. Typical
ways of regularization include adding some form of magnitude measurement of weights to
the loss function.
• CNNs take a different approach towards regularization: they take advantage of the
hierarchical pattern in data and assemble more complex patterns using smaller and simpler
patterns.
• Therefore, on the scale of connectedness and complexity, CNNs are on the lower extreme.
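The shared-weights architecture and translation invariance described above can be illustrated with a tiny one-dimensional convolution in plain Python. This is only a sketch of a single convolution, ReLU, and pooling stage, not the network used in the project; the kernel values, the toy input signals, and the function names are invented for the example.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation: the same kernel weights slide over every position."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def global_max_pool(values):
    """Keep only the strongest response, regardless of where it occurred."""
    return max(values)


# A hypothetical kernel that responds to a rising edge, and a toy input.
kernel = [-1.0, 1.0]
signal = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]  # same pattern, one position to the right

features = relu(conv1d(signal, kernel))
features_shifted = relu(conv1d(shifted, kernel))
```

Because the two kernel weights are reused at every position, shifting the input pattern simply shifts the response in the feature map, and the pooled value is the same for both inputs.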
PROPOSED SYSTEM ADVANTAGES
• No manual effort is required.
• Data cannot be lost or destroyed.
• It is easy to update, delete, and view data.
• Maintaining and retrieving user records is easy.
• Time consumption is low.
• Can be applied to all datasets.
ARCHITECTURE DIAGRAM
USE CASE DIAGRAM
MODULES
1. Input dataset
2. Analysis of size of data set.
3. Oversampling.
4. Training and Testing.
5. Apply algorithms.
6. Predict results.
MODULES DESCRIPTION
1. Input dataset:
• The dataset can be taken from an online data source
provider. A large dataset must be collected so that the
model can predict accurately and efficiently.
2. Analysis of data set:
• Here the dataset is analyzed. The size of the data is
taken into account for further processing.
3. Oversampling (using SMOTE): We have created a detailed
history of all Ebugs that have occurred over a long
period. SMOTE (Synthetic Minority Over-sampling Technique)
is then used to balance the classes by generating synthetic
samples of the minority class.
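The oversampling step can be sketched in plain Python as follows. This is a simplified version of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the implementation used in the project; the feature vectors, the function name smote, and the parameter values are assumptions for illustration.

```python
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by interpolating between neighbours.

    Simplified SMOTE: pick a minority sample, pick one of its k nearest
    minority neighbours, and create a new point on the segment between them.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D feature vectors for the minority class.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_samples = smote(minority, n_synthetic=6)
balanced_minority = minority + new_samples
```

Every synthetic point lies on a line segment between two existing minority samples, so the minority class grows without simply duplicating records.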
MODULES DESCRIPTION
4. Training and Testing Subset: Because the dataset is imbalanced,
many classifiers are biased toward the majority classes: the
features of the minority class are treated as noise and ignored.
Hence we propose to select a sample dataset for training and testing.
5. Applying algorithm: The following methods are applied to
the sub-sample dataset.
• a. Support Vector Machine (SVM)
• b. Convolutional Neural Network (CNN)
6. Predicting results: The test subset is applied to the trained
model. The metric used is accuracy. The accuracy curve is
plotted to check that the desired results are achieved.
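The training/testing split and the accuracy metric from the steps above can be sketched as follows, in plain Python. The toy labels ("minor" and "severe"), the function names, and the 25% test fraction are assumptions for illustration; the project's own split and class labels are not specified in these slides.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, test_fraction=0.25, seed=0):
    """Split into train and test sets, keeping the class proportions in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    def pick(idx):
        return [samples[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical imbalanced dataset: 8 "minor" reports and 4 "severe" ones.
samples = list(range(12))
labels = ["minor"] * 8 + ["severe"] * 4
(train_x, train_y), (test_x, test_y) = stratified_split(samples, labels)

# A trivial baseline that always predicts the majority class.
baseline_predictions = ["minor"] * len(test_y)
baseline_accuracy = accuracy(test_y, baseline_predictions)
```

In this toy split the majority-class baseline already reaches an accuracy of 2/3 without detecting a single "severe" report, which shows why the class imbalance has to be handled before accuracy is used to compare SVM and CNN.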
HARDWARE REQUIREMENTS
• Processor: Intel Core i5
• RAM: 4 GB
• Hard Disk: 500 GB
• Monitor: 14-inch
SOFTWARE REQUIREMENTS
• Technology : Python
• IDE : Python IDE
CONCLUSION
• In this project we have reviewed the technologies that are used for
building and improving bug tracking systems.
• We have also introduced the different techniques used to implement
them.
• Present methods include a database server, SQL, and admin information.
• Bug Host, BugHerd, Mantis Bug Tracker, Bugzilla, and Bug Genie are
then used for a comparative study in terms of accuracy and the
storage of structured data in the database.
• This comparison will help us make our system more convenient
and useful.
• Based on this research, we have proposed a system that predicts
the time required for a particular task.
