
SOFTWARE METRICS

Data Collection
Data Analysis

06/11/2024
“We cannot make good decisions with bad data.”

What Is Good Data?

Some characteristics of good data:

Correctness: The data should be collected according to the exact rules of definition of the metric.
Accuracy: The size of the difference between the data and the actual value; the smaller, the better.
Precision: The number of decimal places needed to express the data.
Consistency: Data should be consistent from one measuring device or person to another.
Timed: Data associated with a particular activity should be time-stamped.

How To Define The Data?
Direct Measurement
The measurement of an attribute which does not depend
on the measurement of any other attribute.
Example:
Length of a program is measured by counting the lines of
code.

Indirect Measurement
The measurement of an attribute which depends on the
measurement of one or more attributes.
Example:
Module defect density = (number of defects) / (module size)
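As a rough illustration of an indirect measurement, the sketch below derives defect density from two direct measurements. The function name, the choice of thousands of lines of code (KLOC) as the size unit, and the sample numbers are assumptions made for this example, not part of the slides.

```python
def defect_density(defect_count: int, module_size_kloc: float) -> float:
    """Indirect measure: defects per thousand lines of code (KLOC)."""
    if defect_count < 0:
        raise ValueError("defect count cannot be negative")
    if module_size_kloc <= 0:
        raise ValueError("module size must be positive")
    return defect_count / module_size_kloc


# Hypothetical module: 12 recorded defects in 3,000 lines of code (3 KLOC).
density = defect_density(12, 3.0)
print(density)  # 4.0 defects per KLOC
```

Both inputs are direct measurements (a count of defects and a count of lines), so any error in either one carries over into the derived value.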

[Figure: data from processes, products, and resources is collected as raw data, extracted into refined data, and analyzed into derived attribute values. Direct measurement and indirect measurement are marked on this flow.]

Terminology
• An error is a human mistake that causes a fault.
• A fault occurs when a human error results in a mistake in a software product. That is, the fault is the encoding of the human error. Faults represent the problems that the developer sees.
• A failure is the inability of a system or component to perform its required function according to its specification. Failures represent the problems that the user sees.

What do we need to record about a problem?

• Location: Where did the problem occur?
• Timing: When did it occur?
• Symptom: What was observed?
• End result: What consequences resulted?
• Mechanism: How did it occur?
• Cause: Why did it occur?
• Severity: How much was the user affected?
• Cost: How much did it cost?

Example
In the 1980s, problems with a radiation-therapy machine were discovered at the East Texas Cancer Center. The machine administered two types of radiation therapy: x-ray and electron.

The failure report looked like this:

Location: East Texas Cancer Center, USA
Timing: March 21, 1986
Symptom: “Malfunction 54” appeared on screen
End result: Strength of beam too great by a factor of 100
Mechanism: The heat generation and control unit was not working properly
Cause: Unintentional design fault
Severity: Critical, as injury to Mr. Cox was fatal
Cost: $600,000
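One way to keep such reports uniform is to give every report the same eight fields. The sketch below is only an illustration: the class name, the field types, and the sample report (a made-up billing problem) are assumptions, not something the slides prescribe.

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass(frozen=True)
class ProblemReport:
    location: str     # where the problem occurred
    timing: datetime  # when it occurred
    symptom: str      # what was observed
    end_result: str   # which consequences resulted
    mechanism: str    # how it occurred
    cause: str        # why it occurred
    severity: str     # how much the user was affected
    cost: float       # how much it cost (currency unit is an assumption)


# Hypothetical report, invented purely for illustration.
report = ProblemReport(
    location="billing module",
    timing=datetime(2024, 1, 15, 9, 30),
    symptom="invoice totals displayed as zero",
    end_result="invoices had to be reissued",
    mechanism="rounding step truncated small amounts",
    cause="unintentional design fault",
    severity="significant",
    cost=1200.0,
)

# List any field that was left empty when the report was filled in.
missing = [name for name, value in asdict(report).items() if value in ("", None)]
print(missing)  # []
```

Checking for empty fields at entry time is a cheap guard against the omissions that manual recording tends to introduce.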

Classification of failure based on severity

• Catastrophic: Failures involve the loss of one or more lives.
• Critical: Failures cause serious permanent injury but no loss of life.
• Significant: Failures cause light injury.
• Documentation: Failures result in no personal injury and pose no safety issue.

Changes

• Location: The module where the change is made.
• Timing: When the change was made.
• Symptom: The type of change.
• End result: Success of the change, as shown by testing.
• Mechanism: How and by whom the change was performed.
• Cause: Corrective, adaptive, or preventive.
• Severity: Impact on the rest of the system.
• Cost: Time and effort for implementing the change.

How to collect data
Manual data collection: Managers, systems analysts, programmers, testers, and users must record raw data on forms. Manual recording is subject to bias, error, omission, and delay. Unfortunately, in many instances there is no alternative to manual data collection.

Automatic data collection: Automatic data capture is therefore desirable, and sometimes essential, for example when recording the execution time of real-time software.

We should keep the following points in mind when collecting data:

• Keep procedures simple.
• Avoid unnecessary recording.
• Train staff in the need to record data.
• Validate all collected data at a central collection point.
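Execution time is a typical case where automatic capture is more practical than a form. The following is a minimal sketch, assuming an in-memory list as the collection point. The names `execution_log`, `timed`, and `build_report` are hypothetical and chosen only for this example.

```python
import functools
import time
from datetime import datetime, timezone

# Hypothetical collection point; a real project would persist these records.
execution_log = []


def timed(func):
    """Record a timestamp and the elapsed time of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started_at = datetime.now(timezone.utc)
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are recorded too.
            execution_log.append({
                "function": func.__name__,
                "started_at": started_at.isoformat(),
                "elapsed_seconds": time.perf_counter() - start,
            })
    return wrapper


@timed
def build_report(n):
    # Stand-in workload for the example.
    return sum(i * i for i in range(n))


result = build_report(1000)
print(execution_log[0]["function"], execution_log[0]["elapsed_seconds"])
```

Because the record is written by the decorator rather than by a person, it is time-stamped and complete for every call, which addresses the bias, omission, and delay noted for manual recording.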

When to collect data and how to store data

• Data collection planning must begin when project planning begins. The actual data collection takes place during many phases of development. For example, some data about project personnel (such as qualifications or experience) can be collected at the start of the project, while other data collection, such as effort, begins at project start and continues through operation and maintenance.

• Raw software-engineering data should be stored in a database, set up using a database management system (DBMS). As an automated tool for organizing, storing, and retrieving data, a DBMS has many advantages over both paper records and computer-stored ‘flat’ files.
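To make the DBMS point concrete, here is a small sketch using SQLite through Python's standard `sqlite3` module. The table layout, column names, and effort figures are all assumptions invented for the example. The slides do not name a particular DBMS or schema.

```python
import sqlite3

# In-memory database for the example; a project would use a file or a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE effort (
        project      TEXT NOT NULL,
        phase        TEXT NOT NULL,
        recorded_on  TEXT NOT NULL,  -- ISO date, so every row is time-stamped
        person_hours REAL NOT NULL CHECK (person_hours >= 0)
    )
    """
)

# Hypothetical raw data collected across several phases.
rows = [
    ("alpha", "design", "2024-01-10", 12.5),
    ("alpha", "coding", "2024-02-02", 30.0),
    ("alpha", "testing", "2024-03-15", 18.0),
]
conn.executemany("INSERT INTO effort VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Retrieval is a query rather than a manual search through files.
total = conn.execute(
    "SELECT SUM(person_hours) FROM effort WHERE project = ?", ("alpha",)
).fetchone()[0]
print(total)  # 60.5
```

The `NOT NULL` and `CHECK` constraints let the database reject some invalid entries at insert time, which a flat file would accept silently.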

Data analysis and terminology
To perform the analysis, we use statistical techniques to describe the distribution of attribute values, as well as the relationships between or among attributes. For example:

• How many very high values are there in the data?
• How many very low values are there in the data?
• How do the data progress from low to high?

Population: A population includes all of the elements of a set of data.

Sample: A sample consists of one or more observations drawn from the population.

Sampling: A sampling method is a procedure for selecting sample elements from a population.
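The three terms can be illustrated with simple random sampling without replacement, which is only one of several possible sampling methods. In the sketch below, the population values, the sample size, and the fixed seed are assumptions chosen for the example.

```python
import random
from statistics import mean

# Hypothetical population: one size measurement for each of 17 modules.
population = [15, 43, 61, 10, 43, 57, 58, 65, 50, 60, 50, 96, 51, 61, 32, 78, 48]

# Sampling method: simple random sampling without replacement.
rng = random.Random(42)  # fixed seed so the draw is repeatable
sample = rng.sample(population, k=5)

print("sample:", sample)
print("population mean:", mean(population))
print("sample mean:", mean(sample))
```

The sample mean will generally differ from the population mean, and the gap tends to shrink as the sample size grows.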

Data analysis techniques

• Box plots
• Scatter plots
• Control charts
• Measures of association
• Robust correlation
• Linear regression
• Robust regression
• Multivariate regression

The nature of the data

Normal Distribution

Skewed Distribution

Non-normal Distribution

Purpose of the experiment

There are two major reasons to conduct a formal investigation, whether it be an experiment, case study, or survey:

• To confirm a theory
• To explore a relationship

Decision Tree

Box Plots
A box plot summarizes the range of a set of data. It shows where most of the data are clustered and whether there are any outliers.

[Figure: box plot labeled with the lower tail, lower quartile (l), median, upper quartile (u), and upper tail.]

Box length (d) = u - l
Upper tail = u + 1.5d
Lower tail = l - 1.5d

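Box Plots

System  MOD
A       15
B       43
C       61
D       10
E       43
F       57
G       58
H       65
I       50
J       60
K       50
L       96
M       51
N       61
P       32
Q       78
R       48

Sorted values: 10, 15, 32, 43, 43, 48, 50, 50, 51, 57, 58, 60, 61, 61, 65, 78, 96

[Figure: box plot of MOD marking lower tail 16, lower quartile 43, median 51, upper quartile 61, and upper tail 88.]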
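The box plot quantities for the MOD values can be checked with a short script. This sketch assumes one common quartile convention: each quartile is the median of the half of the sorted data lying below or above the overall median. It also takes the lower tail as l - 1.5d. The variable names are chosen for the example.

```python
from statistics import median

# MOD value for each system, as listed on the slide.
mod = {
    "A": 15, "B": 43, "C": 61, "D": 10, "E": 43, "F": 57,
    "G": 58, "H": 65, "I": 50, "J": 60, "K": 50, "L": 96,
    "M": 51, "N": 61, "P": 32, "Q": 78, "R": 48,
}

values = sorted(mod.values())
n = len(values)
half = n // 2
lower_half = values[:half]            # values below the median position
upper_half = values[half + (n % 2):]  # values above the median position

m = median(values)      # median
l = median(lower_half)  # lower quartile
u = median(upper_half)  # upper quartile
d = u - l               # box length
upper_tail = u + 1.5 * d
lower_tail = l - 1.5 * d

# Systems whose value falls outside the tails are outlier candidates.
outliers = {s: v for s, v in mod.items() if v < lower_tail or v > upper_tail}
print(m, l, u, d, lower_tail, upper_tail)
print(outliers)
```

Under this convention the script gives a median of 51, l = 43, u = 61, d = 18, a lower tail of 16, and an upper tail of 88, with systems A (15), D (10), and L (96) lying outside the tails.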

Scatter Plots
Whereas a box plot shows information about one variable, a scatter plot depicts the relationship between two variables.

Control Charts
A control chart helps us see whether our data are within acceptable bounds. If the data are out of bounds, we can take action to prevent problems before they occur.
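The slides do not say how the acceptable bounds are set. A minimal sketch, assuming the bounds are the mean of a baseline period plus or minus two sample standard deviations (three is also common), might look like this. The weekly defect counts are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical baseline: defects reported in each of eight earlier weeks.
baseline = [12, 15, 11, 14, 13, 16, 12, 14]

K = 2  # number of standard deviations for the limits (an assumption)
center = mean(baseline)
spread = stdev(baseline)  # sample standard deviation
upper_limit = center + K * spread
lower_limit = center - K * spread

# New observations are compared against the limits from the baseline.
new_observations = {"week 9": 13, "week 10": 22, "week 11": 15}
out_of_bounds = [
    week
    for week, count in new_observations.items()
    if not lower_limit <= count <= upper_limit
]
print(round(lower_limit, 2), round(upper_limit, 2))
print(out_of_bounds)  # ['week 10']
```

Computing the limits from a baseline period, rather than from the data being judged, keeps one extreme new value from widening the bounds that are meant to flag it.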

Measures of association
Scatter plots examine the behavior of two attributes, and sometimes we can determine that two attributes are related. A change in one attribute usually seems to provoke a predictable change in the other, but we do not know for certain that a similar change will take place in the future.

The correlation coefficient r measures the strength of the linear relationship between x and y:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

$\bar{x}$ represents the mean of x
$\bar{y}$ represents the mean of y

• If r is 1, then x and y have a perfect positive linear relationship
• If r is -1, then x and y have a perfect negative linear relationship
• If r is 0, then there is no linear relationship
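The interpretation of r can be checked numerically. The sketch below assumes r is the Pearson product-moment correlation coefficient, which matches the slide's reference to the means of x and y. The function name and the data are made up for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("r is undefined when one attribute is constant")
    return covariation / sqrt(variation_x * variation_y)


# Hypothetical data: module size against two invented fault counts.
size = [10, 20, 30, 40, 50]
rising = [2 * s + 1 for s in size]     # increases linearly with size
falling = [100 - 3 * s for s in size]  # decreases linearly with size

print(pearson_r(size, rising))   # close to 1
print(pearson_r(size, falling))  # close to -1
```

A value near 0 rules out only a linear relationship. Two attributes can still be strongly related in a non-linear way, which is one reason to look at the scatter plot as well.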

Thank You
