DEPARTMENT OF SOFTWARE ENGINEERING

COURSE TITLE: - Statistics and Probability

COURSE CODE: - SWEG2101

Name: Bantamlak Gezahagn

Section B

Id: ETS0109/12

Submitted to Mr. Ashenafi
APPLICATIONS OF STATISTICS AND PROBABILITY TO SOFTWARE
ENGINEERING

Introduction

Software engineering is a relatively new discipline and has therefore been subject to a great deal of scrutiny; since its founding days in the 1940s it has evolved steadily into the field we know today. Over the course of that evolution its applications have continuously matured and been updated. The ongoing goal of providing improved technologies and practices is to raise the productivity of practitioners and the quality of the applications delivered to users.

Software engineering was spurred by the so-called software crisis of the 1960s, 1970s and 1980s, which exposed many of the problems of software development. The crisis was originally defined in terms of productivity but evolved to emphasize quality: many software projects ran over budget and schedule, some caused property damage, and a few caused loss of life, while some used the term simply to describe their inability to hire qualified programmers. Probability and statistics relate to all of this because the two disciplines have fundamentally changed the way we do science and the way we think about our world. They have many applications in software engineering, from designing software through to testing it and in many other areas of the field, and they have opened the way for artificial intelligence (AI) to become the next trending and vital area of computing.

Though there are many areas of software engineering that use probability and statistics, the following are the most common:
Data Science

Data science is a field of software engineering that uses statistics to extract knowledge from data in various forms, whether by automatic or semi-automatic means. It draws on the data-analysis fields of statistics, data mining, and predictive analytics, and is similar to knowledge discovery in databases.

Data mining

Data mining is the survey and analysis, by automatic or semi-automatic means, of large quantities of data in order to uncover meaningful patterns and rules. It applies statistics in the form of exploratory data analysis and predictive modelling to expose patterns and trends in large data sets. Knowledge of statistics is essential in order to analyze data and learn from it: using statistical principles, we can generalize from samples to turn raw data into information.

• Most data mining techniques are statistical exploratory data analysis tools.
• Understanding the data and how it was gathered is particularly important.
• Database sampling or cluster analysis helps reduce the dimension and size of massive data sets (see the sketch after this list).
• Statistical visualization tools aid in the analysis of massive data sets.
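As a rough illustration of the sampling point above, the sketch below (Python, with an invented synthetic data set) draws a small random sample from a large collection of measurements and shows that exploratory statistics computed on the sample track those of the full population.

```python
import random
import statistics

# Hypothetical "massive" data set: one numeric measurement per record.
random.seed(42)
population = [random.gauss(mu=50.0, sigma=8.0) for _ in range(1_000_000)]

# Database-style sampling: work with a small random subset instead of the full table.
sample = random.sample(population, k=5_000)

# Exploratory statistics on the sample generalize to the population.
print("population mean:", round(statistics.mean(population), 3))
print("sample mean:    ", round(statistics.mean(sample), 3))
print("sample std dev: ", round(statistics.stdev(sample), 3))
```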

Data Analysis

Data analysis helps engineers make decisions about several things, for example the purpose of an operation, part design characteristics, specifications and tolerances of parts, materials, manufacturing process design, setup and tooling, working conditions, material handling, plant layout, and workplace design. An understanding of how the product is manufactured assists in the development of an optimum manufacturing method.
Quality control

Quality control is a process by which entities review the quality of their systems; quality and process control use statistics as a tool to manage conformance to the specifications of manufacturing processes and their products.

Process Control

Using statistics in process control, also known as statistical process control (SPC), is a method of quality control which employs statistical methods to monitor and control a process. This helps
ensure the process operates efficiently, producing more specification-conforming product with
less waste (rework or scrap). SPC can be applied to any process where the "conforming product"
(product meeting specifications) output can be measured. Key tools used in SPC include run
charts, control charts, a focus on continuous improvement, and the design of experiments. An
example of a process where SPC is applied is manufacturing lines.
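The sketch below illustrates the control-chart idea in Python using invented subgroup measurements: it estimates the process spread from within-subgroup variation and flags any subgroup mean that falls outside the three-sigma control limits (a simplified X-bar chart, without the usual bias-correction constants).

```python
import math
import statistics

# Hypothetical subgroup measurements (five parts measured per production batch).
subgroups = [
    [10.1, 9.8, 10.0, 10.2, 9.9],
    [10.0, 10.1, 9.7, 10.3, 10.0],
    [10.4, 10.5, 10.6, 10.3, 10.5],  # a batch that has drifted upward
    [9.9, 10.0, 10.1, 9.8, 10.2],
]

n = len(subgroups[0])
means = [statistics.mean(g) for g in subgroups]
grand_mean = statistics.mean(means)

# Estimate the process spread from the within-subgroup variation
# (simplified: average subgroup standard deviation, no bias-correction constant).
sigma_within = statistics.mean(statistics.stdev(g) for g in subgroups)

# Three-sigma limits for the chart of subgroup means.
ucl = grand_mean + 3 * sigma_within / math.sqrt(n)
lcl = grand_mean - 3 * sigma_within / math.sqrt(n)

for i, m in enumerate(means, start=1):
    status = "in control" if lcl <= m <= ucl else "OUT OF CONTROL"
    print(f"subgroup {i}: mean = {m:.2f} ({status})")
print(f"centre line = {grand_mean:.2f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```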

Design of Experiments
Design of experiments uses statistical techniques to test and construct models of engineering components and systems. It is also a methodology for formulating scientific and engineering problems using statistical models.
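As a small, hedged example of the idea, the following Python sketch enumerates a hypothetical two-factor, two-level (2x2) factorial experiment with invented response values and estimates the main effect of each factor.

```python
from itertools import product

# Hypothetical 2x2 factorial experiment: two factors (temperature, pressure),
# each at a low (-1) and high (+1) level, with one measured response per run.
runs = list(product([-1, +1], repeat=2))           # (temperature, pressure)
responses = {(-1, -1): 42.0, (+1, -1): 50.0,       # invented response values
             (-1, +1): 45.0, (+1, +1): 55.0}

def main_effect(factor_index):
    """Average response at the high level minus average at the low level."""
    high = [responses[r] for r in runs if r[factor_index] == +1]
    low = [responses[r] for r in runs if r[factor_index] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

print("temperature effect:", main_effect(0))   # (50+55)/2 - (42+45)/2 = 9.0
print("pressure effect:   ", main_effect(1))   # (45+55)/2 - (42+50)/2 = 4.0
```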

A-B tests

A common use case in advertising is choosing between two versions of an ad or a visual feature. The A-B method is fairly simple: divide the population randomly into two groups (A and B), show each group a different version, and measure the difference in clicks (or any other metric). However, statistics plays a central role in determining whether an observed change is significant. An A-B test is an example of statistical hypothesis testing, a process whereby a hypothesis is made about the relationship between two data sets and those data sets are then compared against each other to determine whether there is a statistically significant relationship or not.

To put this in more practical terms, a prediction is made that Page Variation #B will perform
better than Page Variation #A, and then data sets from both pages are observed and compared to
determine if Page Variation #B is a statistically significant improvement over Page Variation #A.

This process is an example of statistical hypothesis testing.
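A minimal sketch of such a test, assuming invented click counts for the two page variations, is the classical two-proportion z-test shown below: it pools the two conversion rates under the null hypothesis of no difference and checks whether the observed gap is statistically significant at the 5% level.

```python
import math

# Hypothetical click data for the two page variations.
clicks_a, visitors_a = 120, 2400   # Page Variation #A
clicks_b, visitors_b = 156, 2400   # Page Variation #B

p_a = clicks_a / visitors_a
p_b = clicks_b / visitors_b

# Pooled proportion and standard error under the null hypothesis (no difference).
p_pool = (clicks_a + clicks_b) / (visitors_a + visitors_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))

# Two-proportion z statistic; |z| > 1.96 is significant at the 5% level (two-sided).
z = (p_b - p_a) / se
print(f"rate A = {p_a:.3%}, rate B = {p_b:.3%}, z = {z:.2f}")
print("significant at 5%" if abs(z) > 1.96 else "not significant at 5%")
```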

Statistical analysis is our best tool for predicting outcomes we don’t know using information
we do know.
For example, we have no way of knowing with 100% accuracy how the next 100,000 people
who visit our website will behave. That is information we cannot know today, and if we were to
wait until those 100,000 people visited our site, it would be too late to optimize their experience.

What we can do is observe the next 1,000 people who visit our site and then use statistical
analysis to predict how the following 99,000 will behave.

If we set things up properly, we can make that prediction with good accuracy, which allows us to optimize how we interact with those 99,000 visitors. This is why A-B testing can be so valuable to businesses.

In short, statistical analysis allows us to use information we know to predict outcomes we don’t
know with a reasonable level of accuracy.
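To illustrate that extrapolation, the sketch below (with invented numbers) builds a 95% confidence interval for the conversion rate from the 1,000 observed visitors and projects it onto the next 99,000.

```python
import math

# Hypothetical observation: 1,000 visitors, 47 of them convert.
visitors, conversions = 1_000, 47
p_hat = conversions / visitors

# 95% confidence interval for the true conversion rate (normal approximation).
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / visitors)
low, high = p_hat - margin, p_hat + margin

# Extrapolate that interval to the next 99,000 visitors we have not seen yet.
print(f"observed rate: {p_hat:.1%} (95% CI {low:.1%} to {high:.1%})")
print(f"expected conversions among the next 99,000 visitors: "
      f"{int(low * 99_000)} to {int(high * 99_000)}")
```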

Signal Processing

Statistical signal processing (applying principles of statistics in signal processing) is an approach to signal processing which treats signals as stochastic processes (collections of random variables), utilizing their statistical properties to perform signal processing tasks. Statistical techniques are widely used in signal processing applications. For example, one can model the probability distribution of the noise incurred when photographing an image and construct techniques based on this model to reduce the noise in the resulting image. This is typically accomplished using either a Bayesian or a frequentist model.
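The sketch below gives a toy version of this idea in Python, assuming an invented constant signal observed through additive Gaussian noise: a frequentist estimate (the sample mean) is compared with a Bayesian estimate obtained from the standard conjugate normal-normal update.

```python
import random
import statistics

# Hypothetical signal: a constant level of 5.0 observed through additive
# Gaussian noise, the simple noise model described above.
random.seed(0)
true_level = 5.0
observations = [true_level + random.gauss(0.0, 1.0) for _ in range(200)]

# Frequentist estimate: the sample mean of the noisy observations.
freq_estimate = statistics.mean(observations)

# Bayesian estimate: combine a prior belief (mean 4.0, variance 4.0) with the
# data, assuming the noise variance (1.0) is known; this is the standard
# conjugate normal-normal update.
prior_mean, prior_var, noise_var = 4.0, 4.0, 1.0
n = len(observations)
post_var = 1.0 / (1.0 / prior_var + n / noise_var)
post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)

print(f"frequentist estimate:    {freq_estimate:.3f}")
print(f"Bayesian posterior mean: {post_mean:.3f}")
```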

Software Quality Assurance

The quality of data, which is to say its accuracy, must be known whenever it is to be used for decision-making purposes. This is only possible when the data are produced by a valid analytical system operating in a state of statistical control. A quality assurance program should be established, consisting of quality control of the analytical system and quality assessment of the data it produces. Data quality objectives should be established for every measurement situation, and the accuracy attained must be within these limits. Ideally, the attained accuracy should exceed the required accuracy by a factor of three, at a minimum. The estimation of attained accuracy is best made using reliable reference materials; when they are not available, spikes may be used with lesser confidence. No matter which estimation techniques are used, decisions must be made on the basis of statistical tests of significance. The evaluation of accuracy is a continuing operation and is facilitated by the use of appropriate control charts. In general, statistical SQA is a technique that measures quality in a quantitative fashion. It implies that information about defects is collected and categorized and that an attempt is made to trace each defect to its underlying cause. It uses the Pareto principle to identify the vital causes (80% of defects can be traced to 20% of causes) and moves to correct the problems that have caused the defects. For example, a statistical technique known as the Error Index (EI) is used to develop an overall indication of improvement in software quality. The EI is computed as follows:

Let

• Ei – the total number of errors uncovered during the i-th step of the software engineering process
• Si – number of serious errors
• Mi – number of moderate errors
• Ti – number of minor errors
• PS – product size at the i-th step
• Ws, Wm, Wt – weighting factors for serious, moderate, and minor errors; recommended values are 10, 3, and 1 respectively.

At each step of the software process, a phase index PIi is computed as:

PIi = Ws(Si/Ei) + Wm(Mi/Ei) + Wt(Ti/Ei)

The overall error index is then typically obtained by weighting each phase index by its step number and normalizing by product size, EI = Σ(i × PIi)/PS, so that errors discovered later in the process count more heavily.
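A small worked sketch of these formulas, with invented error counts and product size, is shown below; the step names and numbers are hypothetical and only illustrate how the phase indices and the overall index are combined.

```python
# Hypothetical error counts (serious, moderate, minor) for each step of the
# software engineering process, plus the product size PS.
steps = [
    (3, 8, 21),    # step 1: requirements
    (2, 6, 15),    # step 2: design
    (5, 12, 30),   # step 3: code
    (1, 4, 9),     # step 4: test
]
product_size = 12.5                      # PS, e.g. KLOC; invented value

# Recommended weighting factors from the text: serious 10, moderate 3, minor 1.
w_s, w_m, w_t = 10, 3, 1

phase_indices = []
for serious, moderate, minor in steps:
    total = serious + moderate + minor                       # Ei for this step
    pi = w_s * serious / total + w_m * moderate / total + w_t * minor / total
    phase_indices.append(pi)

# Overall error index: each phase index weighted by its step number and
# normalised by product size, EI = sum(i * PIi) / PS.
ei = sum(i * pi for i, pi in enumerate(phase_indices, start=1)) / product_size
print("phase indices:", [round(pi, 2) for pi in phase_indices])
print(f"error index EI = {ei:.2f}")
```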

Software Reliability

Software reliability is another very important quality factor, defined as the probability of failure-free operation of a computer program in a specified environment for a specified time. For example, a program X can be estimated to have a reliability of 0.96 over 8 elapsed hours.

Software reliability can be measured directly and estimated using historical and development data. The key to this measurement is the meaning of the term failure, which is defined as non-conformance to software requirements.
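As a hedged illustration, the sketch below assumes the simple exponential reliability model R(t) = e^(−λt) with an invented mean time between failures, and reproduces a figure close to the 0.96-over-8-hours example above.

```python
import math

# Hypothetical failure data: mean time between failures (MTBF) estimated from
# historical operation, assuming a simple exponential reliability model.
mtbf_hours = 200.0
failure_rate = 1.0 / mtbf_hours          # lambda

def reliability(t_hours: float) -> float:
    """Probability of failure-free operation for t_hours: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * t_hours)

# Probability the program runs for 8 elapsed hours without failing.
print(f"R(8 h) = {reliability(8.0):.3f}")   # about 0.96
```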

Software safety

Software safety is a software SQA activity that focuses on identifying potential hazards that may affect software negatively and cause an entire system to fail.


Modeling and analysis

A modeling and analysis process is conducted as part of software safety, and hazards are identified and categorized by criticality and risk.

Once system-level hazards are identified, analysis techniques are used to assign severity and probability of occurrence, similar to risk analysis; to be effective, the software must be analyzed in the context of the entire system. Once hazards are identified and analyzed, safety-related requirements can be specified for the software. Reliability and safety are closely related: software reliability uses statistical techniques to determine the likelihood that a software failure will occur, but the occurrence of a software failure does not necessarily result in a hazard or mishap. Software safety, on the other hand, examines the ways in which failures result in conditions that can lead to a mishap.

Time and methods engineering

Time and methods engineering uses statistics to study repetitive operations in manufacturing in order to set standards and find optimum (in some sense) manufacturing procedures.

System identification
The field of system identification uses statistical methods to build mathematical models of
dynamical systems from measured data. System identification also includes the optimal design
of experiments for efficiently generating informative data for fitting such models as well as
model reduction.
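The sketch below illustrates the idea on an invented first-order discrete-time system: input/output data are simulated from known parameters and then the parameters are recovered by ordinary least squares.

```python
# Least-squares identification of a first-order discrete model
# y[k+1] = a*y[k] + b*u[k] from measured input/output data.
# The data and the true parameters below are invented for the example.

a_true, b_true = 0.8, 0.5
u = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0]   # input sequence
y = [0.0]
for k in range(len(u) - 1):
    y.append(a_true * y[k] + b_true * u[k])               # noiseless simulation

# Normal equations for the two unknowns (a, b):
# minimise sum over k of (y[k+1] - a*y[k] - b*u[k])^2.
syy = sum(y[k] * y[k] for k in range(len(u) - 1))
suu = sum(u[k] * u[k] for k in range(len(u) - 1))
syu = sum(y[k] * u[k] for k in range(len(u) - 1))
sy1y = sum(y[k + 1] * y[k] for k in range(len(u) - 1))
sy1u = sum(y[k + 1] * u[k] for k in range(len(u) - 1))

det = syy * suu - syu * syu
a_hat = (sy1y * suu - syu * sy1u) / det
b_hat = (syy * sy1u - syu * sy1y) / det
print(f"estimated a = {a_hat:.3f}, b = {b_hat:.3f}")   # recovers 0.800 and 0.500
```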

Error-correction 

Error detection is the detection of errors caused by noise or other impairments during transmission from the transmitter to the receiver. Error correction is the detection of errors and the reconstruction of the original, error-free data.
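As a minimal, hedged illustration of the detection side, the sketch below appends a single even-parity bit to a block of data bits and shows how the receiver detects a bit flipped by the channel; full error correction (for example Hamming codes) adds more redundancy so that the flipped bit can also be located and repaired.

```python
# Single even-parity bit: the sender appends a bit that makes the number of 1s
# even, and the receiver re-checks the parity to detect a corrupted bit.

def add_parity(bits: list[int]) -> list[int]:
    """Append an even-parity bit to a list of 0/1 data bits."""
    return bits + [sum(bits) % 2]

def check_parity(codeword: list[int]) -> bool:
    """Return True if the received codeword still has even parity."""
    return sum(codeword) % 2 == 0

data = [1, 0, 1, 1, 0, 0, 1]
sent = add_parity(data)

# Simulate a noisy channel that flips one bit during transmission.
received = sent.copy()
received[2] ^= 1

print("sent:    ", sent, "parity ok:", check_parity(sent))
print("received:", received, "parity ok:", check_parity(received))
```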
