STATISTICS 1 - CAL Edited

STATISTICS LEVEL 1
Speaker Name
ST Context
• STatS is a project started in 2012 under the sponsorship of PQE Management.
• Its aim is to enable ST to reach best-in-class level by introducing innovative statistical tools and methodologies at company level.
• The main goals of STatS are to review, rationalize and improve the effectiveness of statistical methodology in general.
• STatS is intended to continuously improve our detection capability through the adoption of an advanced statistical approach, and to reduce DPPM (Defective Parts Per Million) through innovation in the statistical techniques deployed in ST manufacturing.
• It drives and supports the deployment and correct application of the Statistics Manuals in all ST manufacturing plants.
ST Restricted
Statistics Learning
• Statistics level 1
• SPC 1
• Statistics level 2
• SPC 2
• Measurement System Analysis (MSA)
• Statistical Model Building
• Design of Experiments (DOE)
• Multivariable Statistics
Why this course?
• To provide the fundamentals of statistics
• To answer current statistical questions in everyday work
• To produce more accurate/effective statistical analysis
Training purpose
• Describe the difference between population and sample
• Properly describe some important features of a sample using both graphical methods (histogram, bar diagram, box plot …) and numerical methods (indices of position and spread such as average, median, percentiles, standard deviation, variance…)
• Understand the meaning of a random variable and become familiar with the normal distribution
• Compare samples descriptively and interpret a Normal Probability Plot (NPP)
Benefits
• To analyze information at the proper level and take the right decisions accordingly
Let’s get to know each other…
Round table:
• Name
• Organization
• Are you already using statistical methodology?
• If so, what are the main applications?
• Expectations from the course
Pre-test
• Complete the questionnaire to the best of your knowledge
• This is not an individual ‘control’ test: it will give us an idea of your level of knowledge about the subject before the training (so, if you don’t know, don’t worry …)
• We will repeat the questionnaire at the end of the course to measure the learning that has taken place
10 minutes
Structure of the course
Structure of the course/Agenda
Welcome
Day 1 (9:00 – 18:00)
Module 1
• Introduction
• First Concepts
  • Population VS Sample
  • Descriptive VS Inferential
  • Parameter VS Statistic
Module 2
• Types of data
Module 3
• Graphical Presentation of Data
  • Presentations for Numeric and Categorical Data
  • Presentations for One Sample or for Two Samples
Module 4
• Descriptive Indices for
  • Location
  • Spread
Day 2 (9:00 – 13:00)
Module 4
• Descriptive Indices for the association between two variables
Module 5
• Random Variables
• Theoretical Distributions
  • For Continuous Variables
  • For Discrete Variables
Conclusion
Module 1: Introduction
Module 1 objectives
At the end of this chapter, you will be able to:
• Know why we need Statistics
• Describe the difference between Population and Sample
• Describe the main branches of Statistics
Why do we need Statistics?
Dealing with Uncertainty
Everyday decisions are based on incomplete information.
Consider:
• The yield of a certain product will be higher in six months than it is now.
• If the number of attendees of this course is as high as predicted, the rate of good statistical reports will increase by 10% in the next 6 months.
Why do we need Statistics?
… because of uncertainty, the previous statements should be modified:
• The yield of a certain product is likely to be higher in six months than it is now.
• If the number of attendees of this course is as high as predicted, it is probable that the rate of good statistical reports will increase by 10% in the next 6 months.
Statistics helps us assess HOW likely an event is.
The Decision Making Process
Decision Making Process Steps (with the tools that help in each step)
Begin here: Identify the Problem
Data
↓ Tools: Descriptive Statistics, Probability, Computers
Information
↓ Tools: Inferential Statistics, Experience, Theory, Literature, Computers
Knowledge
↓
Decision
Key Definitions
• A population is the collection of all items of interest or under investigation
  • N represents the population size
• A sample is an observed subset of the population
  • n represents the sample size
• A parameter is a specific characteristic of a population
• A statistic is a specific characteristic of a sample
Provide some examples of: Population, Sample, Parameter and Statistic.
Key Definitions
• A sample is drawn from a population. The most important feature of a sample is its ability to represent the entire population as closely as possible.
A good sample is said to be “representative” of the population.
Several techniques exist to help draw a representative sample.
For example, Simple Random Sampling is a procedure in which:
• each member of the population is chosen strictly by chance,
• each member of the population is equally likely to be chosen, and
• every possible sample of n objects is equally likely to be chosen.
The resulting sample is called a random sample.
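A minimal Python sketch of Simple Random Sampling, assuming the population can be listed as item identifiers. The 10,000 `UNIT_xxxxx` IDs and the helper name `simple_random_sample` are hypothetical and only mirror the N = 10,000 / n = 500 figures used in the next example. `random.Random.sample` draws n distinct members without replacement, so every member and every subset of size n has the same chance of being selected.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n, without replacement.

    random.Random.sample gives every member the same chance of selection
    and makes every subset of size n equally likely.
    """
    if n > len(population):
        raise ValueError("sample size n cannot exceed population size N")
    rng = random.Random(seed)  # seeded generator, for reproducible examples
    return rng.sample(population, n)

# Hypothetical population of N = 10,000 unit IDs
population = [f"UNIT_{i:05d}" for i in range(10_000)]
sample = simple_random_sample(population, n=500, seed=1)
print(len(sample), len(set(sample)))  # 500 500 (no unit is drawn twice)
```

Fixing `seed` only makes the illustration repeatable; in a real sampling plan the seed would not be fixed in advance.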
Descriptive and Inferential Statistics
Two branches of Statistics:
• Descriptive statistics
  • Collecting, summarizing, and processing data to transform data into information
• Inferential statistics
  • Providing the bases for predictions, forecasts, and estimates that are used to transform information into knowledge
POPULATION (N = 10,000): (true) Mean = ?  → Sampling →  SAMPLE (n = 500): Average = 56.2
Descriptive statistics
From the 500 sample values, we calculate the average. We might also generate some graphs.
Possible error → calculation error.
Inferential statistics
Actually, we are interested in something that refers to the entire population, not only in a description of the sample values → we can ESTIMATE, for example, the value of the mean of the entire population.
Possible error → inferential error: the sample will never represent the entire population 100%.
Descriptive
• Collect data
  • e.g., Survey
• Present data
  • e.g., Tables and Graphs
• Summarize data
  • e.g., Sample mean $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Inference
•Estimation
• e.g., Estimate the population mean
weight using the sample mean weight
•Hypothesis testing
• e.g., Test the claim that the population
mean weight is 120 pounds
Inference is the process of drawing conclusions or making
decisions about a population based on sample results
Module 1 Key Learnings
• Decision making process
• Incomplete information in decision making
• Description of Simple Random Sampling
• Key definitions:
• Population vs. Sample
• Parameter vs. Statistic
• Descriptive vs. Inferential statistics
Module 2: Types of Data
• Identify the different types of data we deal with
• Identify the different levels of measurement
Types of Data: Variables Classification
Variable
• Numerical
  • Continuous
  • Discrete
• Categorical
Types of Data: Classification Details
Numerical variables
• Continuous: quantitative variables that can take any value on the scale of real numbers and can be ordered.
  EXAMPLES: Oxide Thickness, Bond Line Thickness, Resistance
• Discrete: quantitative variables that can take only integer values and can be ordered.
  EXAMPLE: Number of Good Dice/Wafer
Types of Data: Classification Details
Categorical variables
Qualitative variables. Their “values” usually belong to different, mutually exclusive categories. They are not numbers.
EXAMPLES:
- Visual Inspection. Possible results → Good/No-Good
- Survey. Possible results → (1) Excellent (2) Good (3) Fair (4) Poor
Provide some examples of Numerical (Continuous & Discrete) and Categorical variables.
Levels of Measurement
Numerical
• Ratio Data: differences between measurements are meaningful and a true zero exists.
  EXAMPLES: Thickness, Height, Age, Income,…
• Interval Data: differences between measurements are meaningful, but there is no true zero.
  EXAMPLE: Temperature in Fahrenheit or Celsius.
Categorical
• Ordinal Data: ordered categories (rankings, order, or scaling).
  EXAMPLE: Satisfaction/Quality Rating
• Nominal Data: categories with no meaningful ordering or direction.
  EXAMPLES: Type of car owned, Marital Status,…
Activity
• Classify the following variables into the types previously seen:
Variable 1 | Variable 2 | Variable 3 | Variable 4 | Variable 5 | Variable 6
Process_Equipment | Potting Resin Thickness (mm) | Defectivity | Final Test Equipment | RDS_ON | Number of cycles
FPT01_H1 | 19 | 614 | TFX_10 | 344.6 | 6
FPT01_H2 | 6 | 630 | TFX_10 | 331.5 | 0
FPT02_H1 | 6 | 623 | TFX_11 | 321.4 | 11
FPT02_H2 | 16 | 612 | TFX_07 | 330.6 | 3
FPT01_H1 | 15 | 658 | TFX_07 | 219.9 | 9
FPT01_H2 | 5 | 662 | TFX_08 | 311.4 | 16
FPT02_H1 | 4 | 638 | TFX_09 | 389.9 | 0
FPT02_H2 | 6 | 630 | TFX_09 | 322.7 | 1
VARIABLE TYPE? (5 minutes)
Answer
Variable 1, Process_Equipment → Categorical, Nominal
Variable 2, Potting Resin Thickness (mm) → Numerical, Continuous
Variable 3, Defectivity → Numerical, Discrete
Variable 4, Final Test Equipment → Categorical, Nominal
Variable 5, RDS_ON → Numerical, Continuous
Variable 6, Number of cycles → Numerical, Discrete
• Types of variables
• Levels of Measurement
Module 3: Graphical Presentation of Data
• Create and interpret graphs to describe categorical variables: frequency distribution, bar chart, pie chart, Pareto diagram
• Create a line chart to describe time-series data
• Create and interpret graphs to describe numerical variables: frequency distribution, histogram
• Construct and interpret graphs to describe relationships between variables: scatter plot, cross table
• Create and interpret graphs to compare two variables: double histogram (numerical), side-by-side bar chart (categorical)
Graphical Presentations
• Raw data are usually not easy to use for decision making
• Some type of organization is needed:
  • Table
  • Graph
Consider that:
The choice of the proper graph depends on the variable being summarized.
Types of Variables VS Types of Graphs
For categorical variables, you can use:
❑ Frequency distribution
❑ Bar chart
❑ Pie chart
❑ Pareto diagram
For numerical variables, you can use:
❑ Line chart
❑ Frequency distribution
❑ Histogram and ogive
❑ Stem-and-leaf display
❑ Scatter plot
For Categorical Data
• Tabulating Data: Frequency Distribution Table
• Graphing Data: Bar Chart, Pie Chart, Pareto Diagram
Frequency Distribution Table
WHAT IS IT?
A Frequency Distribution Table is a simple method to evaluate the distribution of the frequencies associated with the possible values of a variable of interest.
If the variable is categorical, it is very easy to determine the values to include in the first (left) column of the table: all the possible categories of the variable are considered.
For each category, different types of frequencies can be included in the table: FREQUENCY, RELATIVE FREQUENCY and CUMULATIVE FREQUENCY. Moreover, each one can be expressed as a percentage.
Frequency Distribution Table
EXAMPLE: for a certain product and in a certain interval of time, 200 devices failed the electrical tests with 3 different failure modes (BIN2, BIN6 and BIN8). The results are summarized in the following Frequency Distribution Table:
NOTE: the categorical variable is “The Failure Mode”. The categories are BIN2, BIN6 and BIN8.
Category (BIN) | Frequency | Relative Frequency | Relative Frequency % | Cumulative Relative Frequency | Cumulative Relative Frequency %
BIN 2 | 78 | 78/200 = 0.390 | 39% | 0.390 | 39%
BIN 6 | 49 | 49/200 = 0.245 | 24.5% | 0.390 + 0.245 = 0.635 | 39 + 24.5 = 63.5%
BIN 8 | 73 | 73/200 = 0.365 | 36.5% | 0.635 + 0.365 = 1 | 63.5 + 36.5 = 100%
TOTAL | 200 | 1 | 100% | (--) | (--)
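The table above can be reproduced with a short Python sketch; the counts 78, 49 and 73 come from the example, while the helper name `frequency_table` is hypothetical. It computes, for each category in the given order, the frequency, the relative frequency and the cumulative relative frequency (percentages are obtained by formatting the same numbers).

```python
from collections import Counter

def frequency_table(observations, categories):
    """Return rows (category, frequency, relative, cumulative relative)."""
    counts = Counter(observations)
    n = len(observations)
    rows, cumulative = [], 0.0
    for cat in categories:
        rel = counts[cat] / n          # relative frequency
        cumulative += rel              # running sum of relative frequencies
        rows.append((cat, counts[cat], rel, cumulative))
    return rows

# Rebuild the 200 failing devices from the example counts
failures = ["BIN 2"] * 78 + ["BIN 6"] * 49 + ["BIN 8"] * 73
table = frequency_table(failures, ["BIN 2", "BIN 6", "BIN 8"])
for cat, f, rel, cum in table:
    print(f"{cat}: {f:3d}  {rel:.3f}  {rel:.1%}  {cum:.3f}  {cum:.1%}")
```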
Bar Chart
WHAT IS IT?
• Bar charts are graphs used for qualitative (categorical) data.
• They are made up of bars whose heights (vertical axis) are the frequencies contained in the frequency distribution table, plotted against the categories of the considered variable (horizontal axis).
• The height of the bars shows the frequency or percentage for each category.
• It may be helpful to display the frequency distribution table under the chart.
Bar Chart
Bar chart (data from the previous example). The vertical axis shows the frequencies.
Category (BIN) | Frequency
BIN 2 | 78
BIN 6 | 49
BIN 8 | 73
TOTAL | 200
Bar Chart
Bar chart (data from the previous example). The vertical axis shows the frequencies %.
Category (BIN) | Frequency %
BIN 2 | 39%
BIN 6 | 24.5%
BIN 8 | 36.5%
TOTAL | 100%
How to generate bar chart in JMP
Pie Chart
WHAT IS IT?
• A presentation of the frequencies or percentages
• Pie charts are often used for qualitative (categorical) data.
• The sizes of the pie slices show the frequency or percentage for each category.
Pie Chart
EXAMPLE of Pie-chart (data from the previous example)
How to generate Pie chart in JMP

Pareto Diagram
WHAT IS IT?
• Used to portray categorical data.
• A bar chart where the categories are shown in descending order of frequency.
• A cumulative polygon is often shown in the same graph.
• Used to separate the “vital few” (the categories whose percentage frequencies sum to about 80%) from the “trivial many” (the remaining categories).
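A Python sketch of the Pareto ordering, using the BIN counts from the frequency table example (the helper name `pareto` is hypothetical). Categories are sorted by descending frequency and cumulative shares are accumulated; here a category is flagged as part of the “vital few” if it is still needed to reach the 80% threshold, which is one possible reading of the “about 80%” rule of thumb, not a fixed standard.

```python
def pareto(counts, threshold=0.80):
    """Sort categories by descending frequency and flag the 'vital few'.

    A category is flagged as vital if the cumulative share *before* it is
    still below the threshold, i.e. it is needed to reach about 80%.
    """
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    rows, cumulative = [], 0.0
    for cat, freq in ordered:
        vital = cumulative < threshold
        cumulative += freq / total
        rows.append((cat, freq, cumulative, vital))
    return rows

rows = pareto({"BIN 2": 78, "BIN 6": 49, "BIN 8": 73})
for cat, freq, cum, vital in rows:
    label = "vital few" if vital else "trivial many"
    print(f"{cat}: {freq:3d}  cumulative {cum:6.1%}  {label}")
```

With only three BINs, BIN 2 and BIN 8 together reach 75.5%, so all three categories are needed to reach 80%; the split becomes more informative when there are many categories.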
Pareto Diagram
EXAMPLE of Pareto diagram (data from the previous example)
The vertical axis shows the cumulative frequencies.
How to generate Pareto Plot in JMP
For Numerical Data
• Time series → Graphing data: Line chart
• Frequency & cumulative frequency → Tabulating data: Frequency distribution; Graphing data: Histogram & ogive, Stem & leaf
• Variables association → Graphing data: Scatter plot
Line Chart
WHAT IS IT?
• A line chart shows the values of one or more variables over time.
• If several variables are plotted on the same graph, their comparative behavior can be investigated to show trends, differences, cyclic patterns, etc.
• If the points are a statistic (e.g. an average), they can be replaced by box-plots in order to show the spread.
• Time is measured on the horizontal axis.
• The variable of interest is measured on the vertical axis.
NOTE: this plot is helpful for categorical and for discrete variables as well.
Line Chart
Line chart: some examples
Frequency Distribution (Numeric Data)
WHAT IS IT?
A frequency distribution for a numeric variable is a list or a table
containing class groupings (categories or ranges within which the
data fall) and the corresponding frequencies with which data fall
within each class or category.
A frequency distribution is a way to summarize data.
The distribution condenses the raw data into a more useful form and
allows for a quick visual interpretation of the data.
Frequency Distribution (Numeric Data)
For numerical variables, determining the classes is not as obvious as for categorical variables. In this case, the classes are not “naturally” defined by the possible categories of the variable; they are chosen arbitrarily (i.e. subjectively) and in a non-unique way.
In the frequency distribution table, we might include relative and relative % frequencies, and cumulative and cumulative % frequencies.
A generic Frequency Distribution Table to group n observations:
Class | Frequency | Relative Frequency | Relative Frequency % | Cumulative Frequency | Cumulative Frequency %
CLASS 1 | | | | |
CLASS 2 | | | | |
TOTAL | n | 1 | 100% | (--) | (--)
Frequency Distribution
EXAMPLE: given the following group of 10 raw data, build a Frequency Distribution Table.
Raw data: 75.4786, 74.6043, 74.5925, 73.7692, 74.3453, 74.4622, 74.815, 74.0306, 75.6101, 74.0489
min = 73.7692, max = 75.6101
A possible (non-unique) way of grouping is:
CLASS 1 → 73.0000 but less than 74.0000
CLASS 2 → 74.0000 but less than 75.0000
CLASS 3 → 75.0000 but less than 76.0000
Class | Frequency | Relative Frequency
CLASS 1 | 1 | 0.1
CLASS 2 | 7 | 0.7
CLASS 3 | 2 | 0.2
TOTAL | 10 | 1
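The grouping above can be checked with a short Python sketch, assuming left-closed classes [lower, upper) as in the example; `group_into_classes` is a hypothetical helper name. The `for ... else` raises an error if a value falls outside every class, so each observation ends up in exactly one class.

```python
def group_into_classes(data, edges):
    """Count observations in left-closed classes [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for k in range(len(edges) - 1):
            if edges[k] <= x < edges[k + 1]:
                counts[k] += 1
                break          # classes are mutually exclusive
        else:
            raise ValueError(f"{x} lies outside the class limits")
    return counts

raw = [75.4786, 74.6043, 74.5925, 73.7692, 74.3453,
       74.4622, 74.8150, 74.0306, 75.6101, 74.0489]
counts = group_into_classes(raw, edges=[73.0, 74.0, 75.0, 76.0])
n = len(raw)
for k, c in enumerate(counts, start=1):
    print(f"CLASS {k}: frequency {c}, relative frequency {c / n:.1f}")
```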
Frequency Distribution
NOTES
1. Grouping data has clear interpretative advantages, but as a result some detail is lost. In fact, bins are also called “classes of equivalence”: all the observations grouped in the same bin are considered equivalent. This implies that, once the groups are formed, it is no longer possible to discriminate between observations that belong to the same group. See also the Stem & Leaf Display.
2. Class limits must be chosen so as to guarantee mutually exclusive classes, i.e. each observation can be included in one and only one class.
Number of Classes (K)
To define the number of classes (K), you might use the following rule of thumb:
Number of observations (n) | Number of classes (K)
n < 50 | 5-7
50 ≤ n ≤ 100 | 7-8
101 ≤ n ≤ 500 | 8-10
501 ≤ n ≤ 1000 | 10-11
1001 ≤ n ≤ 5000 | 11-14
n > 5000 | 14-20
Alternatively, other known rules are:
- $K = \sqrt{n}$
- $K = 1 + \frac{10}{3}\log_{10}(n)$   For sample sizes n < 100
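The two alternative rules can be evaluated with a few lines of Python. Rounding to the nearest integer is an assumption (the rules above do not say how to round), and the logarithm is taken as base 10, since 10/3 is close to the usual Sturges-type constant 1/log10(2) ≈ 3.32.

```python
import math

def classes_sqrt(n):
    """Square-root rule: K = sqrt(n), rounded to the nearest integer."""
    return round(math.sqrt(n))

def classes_log(n):
    """K = 1 + (10/3) * log10(n), rounded to the nearest integer."""
    return round(1 + (10 / 3) * math.log10(n))

for n in (10, 50, 100):
    print(n, classes_sqrt(n), classes_log(n))
```

For n = 50 both rules give K = 7, which falls inside the 5-7 / 7-8 ranges of the rule-of-thumb table.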
Class Width (W)
To generate K equally sized (*) classes (i.e. of uniform width), the width W of each class is given by:
$W = \frac{\max - \min}{K}$
where max and min are the largest and the smallest sample values, respectively.
(*) NOTE: it is possible to generate a histogram with unequal class widths, but its interpretation is different, since the bar heights alone are not enough to “catch” the relative importance of each class.
For details, see ADCS 8482919_A, §6.1.1.1 CASE B, page 24/400.
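Applied to the 10-value example used for the frequency table, the formula can be sketched in Python as follows; choosing K = 3 is an assumption for illustration, and the helper names are hypothetical. In practice W is often rounded to a convenient value, and the first class limit need not coincide with min.

```python
def class_width(data, k):
    """W = (max - min) / K for K equally sized classes."""
    return (max(data) - min(data)) / k

def class_edges(data, k):
    """K + 1 class limits starting at min(data).

    The last limit equals max(data), so the last class must be closed on
    the right for the largest observation to be counted.
    """
    w = class_width(data, k)
    lo = min(data)
    return [lo + i * w for i in range(k + 1)]

raw = [75.4786, 74.6043, 74.5925, 73.7692, 74.3453,
       74.4622, 74.8150, 74.0306, 75.6101, 74.0489]
print(round(class_width(raw, 3), 4))  # 0.6136
print([round(e, 4) for e in class_edges(raw, 3)])
```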
Histogram
WHAT IS IT?
• A graph of the data in a frequency distribution is called a histogram.
• The interval endpoints are shown on the horizontal axis.
• The vertical axis shows either frequency, relative frequency, or percentage.
• Bars of the appropriate heights represent the number of observations within each class (no gaps between bars are allowed).
• A minimum of 30-40 observations is required to obtain interpretable results.
• Histograms are used to study the shape (e.g. symmetry), location and spread of the data.
Histogram
EXAMPLE: build a histogram for the following data: 75.4786, 74.6043, 74.5925, 73.7692, 74.3453, 74.4622, 74.8150, 74.0306, 75.6101, 74.0489
STEP 1: choose the classes (number and width):
CLASS 1 → 73.0000 but less than 74.0000
CLASS 2 → 74.0000 but less than 75.0000
CLASS 3 → 75.0000 but less than 76.0000
STEP 2: create a frequency distribution table:
Class | Frequency | Relative Frequency
CLASS 1 | 1 | 0.1
CLASS 2 | 7 | 0.7
CLASS 3 | 2 | 0.2
STEP 3: draw the histogram for frequency or relative frequency:
[Figure: two histograms over the classes 73–74, 74–75 and 75–76, with bar heights 1, 7, 2 (frequency) and 0.1, 0.7, 0.2 (relative frequency)]
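A text-only rendering of the same histogram can be sketched in Python without any plotting library (the helper name `text_histogram` is hypothetical); each `#` stands for one observation, and the classes and counts are those of the example. This only illustrates the idea and is not the JMP procedure referenced elsewhere in the course.

```python
def text_histogram(edges, counts, symbol="#"):
    """Return one line per class: '[lower, upper) | ###'."""
    lines = []
    for k, c in enumerate(counts):
        lines.append(f"[{edges[k]:g}, {edges[k + 1]:g}) | {symbol * c}")
    return lines

for line in text_histogram([73, 74, 75, 76], [1, 7, 2]):
    print(line)
```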
Histogram
QUESTIONS
1. How wide should each interval be? (How many classes should be used?)
2. How should the endpoints of the intervals be determined?
• These questions are often answered by trial and error, subject to user judgment
• The goal is to create a distribution that is neither too “jagged” nor too “blocky”
• The goal is to show appropriately the pattern of variation in the data
Histogram
Many (narrow) class intervals
• may yield a very jagged distribution with gaps from empty classes
• can give a poor indication of how frequency varies across classes
[Figure: histogram of Temperature with narrow classes; frequency (0–3.5) vs. upper class endpoints 4, 8, 12, …, 60, More]
Few (wide) class intervals
• may compress variation too much and yield a blocky distribution
• can obscure important patterns of variation
[Figure: histogram of Temperature with wide classes; frequency (0–12) vs. upper class endpoints 0, 30, 60, More]
(X axis labels are upper class endpoints)
Histogram Interpretation
• The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the center.
[Figure: Symmetric Distribution, frequency vs. classes 1–9]
How to generate Histogram in JMP
Histogram Interpretation
The shape of the distribution is said to be skewed if the observations are not symmetrically distributed around the center.
• A positively skewed distribution (skewed to the right) has a longer tail that extends to the right, in the direction of positive values.
[Figure: Positively Skewed Distribution, frequency vs. classes 1–9]
• A negatively skewed distribution (skewed to the left) has a longer tail that extends to the left, in the direction of negative values.
[Figure: Negatively Skewed Distribution, frequency vs. classes 1–9]
Ogive
Use of the cumulative frequency distribution
EXAMPLE: build an ogive for the following data: 75.4786, 74.6043, 74.5925, 73.7692, 74.3453, 74.4622, 74.8150, 74.0306, 75.6101, 74.0489
Class | Upper Interval Endpoint | Cumulative Frequency %
73 but less than 74 | 74 | 10
74 but less than 75 | 75 | 80
75 but less than 76 | 76 | 100
[Figure: ogive, cumulative frequency % (10, 80, 100) vs. interval endpoint (74, 75, 76)]
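The cumulative frequency % column can be derived from the class frequencies (1, 7, 2) of the same example; a Python sketch with `ogive_points` as a hypothetical helper name. Each point pairs an upper interval endpoint with the percentage of observations below it.

```python
from itertools import accumulate

def ogive_points(upper_endpoints, frequencies):
    """Pair each upper class endpoint with the cumulative frequency %."""
    n = sum(frequencies)
    cumulative = accumulate(frequencies)  # running totals: 1, 8, 10
    return [(u, 100 * c / n) for u, c in zip(upper_endpoints, cumulative)]

print(ogive_points([74, 75, 76], [1, 7, 2]))  # [(74, 10.0), (75, 80.0), (76, 100.0)]
```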
Stem-and-Leaf Display
WHAT IS IT?
• It is a simple way to see distribution details in a data set
METHOD: separate the sorted data series into leading digits (the stem) and trailing digits (the leaves)
• For example, the number 45 is the union of a stem, which is the tens digit (4), and a leaf, which is the ones digit (5).
Stem-and-Leaf Display
EXAMPLE: build a stem-and-leaf display for the following data (50 values):
X values: 75.4786, 74.6043, 74.5925, 73.7692, 74.3453, 74.4622, 74.815, 74.0306, 75.6101, 74.0489, 74.3663, 75.7653, 74.5463, 75.8547, 75.0494, 74.0381, 75.4592, 74.9785, 75.2846, 74.1965, 73.7567, 74.3161, 74.5273, 75.2975, 72.4336, 74.9129, 74.5601, 75.5037, 73.1873, 71.0224, 73.9911, 73.8043, 74.0654, 74.1178, 75.2165, 76.537, 74.4765, 74.0991, 74.6521, 76.8578, 73.1728, 74.3207, 74.2735, 75.1362, 75.7757, 76.0959, 74.6455, 74.8574, 76.0823, 74.3449
X sorted: 71.0224, 72.4336, 73.1728, 73.1873, 73.7567, 73.7692, 73.8043, 73.9911, 74.0306, 74.0381, 74.0489, 74.0654, 74.0991, 74.1178, 74.1965, 74.2735, 74.3161, 74.3207, 74.3449, 74.3453, 74.3663, 74.4622, 74.4765, 74.5273, 74.5463, 74.5601, 74.5925, 74.6043, 74.6455, 74.6521, 74.815, 74.8574, 74.9129, 74.9785, 75.0494, 75.1362, 75.2165, 75.2846, 75.2975, 75.4592, 75.4786, 75.5037, 75.6101, 75.7653, 75.7757, 75.8547, 76.0823, 76.0959, 76.537, 76.8578
STEM and LEAF definition:
The stems are the integer parts of the values and the leaves are the first decimal digit (truncated, not rounded). So, for example, the value 75.4786 has stem = 75 and leaf = 4. Each stem is split over two lines: leaves 0–4 on the first line and leaves 5–9 on the second.
The Stem-and-Leaf plot (stems | leaves):
LO | 71.02
71 |
71 |
72 | 4
72 |
73 | 11
73 | 7789
74 | 000001123333344
74 | 55556668899
75 | 0122244
75 | 56778
76 | 00
76 | 58
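The split-stem construction used above (leaf = first decimal digit, truncated; leaves 0–4 on the first line of each stem, 5–9 on the second) can be sketched in Python; the helper name `stem_and_leaf` is hypothetical. Unlike the display above, this sketch does not move 71.0224 to a separate LO line, so that value appears as leaf 0 on the first 71 line.

```python
import math
from collections import defaultdict

def stem_and_leaf(values):
    """Split-stem display: stem = integer part, leaf = first decimal digit (truncated).

    Returns the display as a list of 'stem|leaves' strings, two lines per stem.
    """
    lines = defaultdict(list)
    for x in sorted(values):
        stem = math.floor(x)
        leaf = math.floor(x * 10) % 10   # truncation: 75.4786 -> 4, not 5
        half = 0 if leaf < 5 else 1      # 0-4 on first line, 5-9 on second
        lines[(stem, half)].append(str(leaf))
    first_stem = min(stem for stem, _ in lines)
    last_stem = max(stem for stem, _ in lines)
    display = []
    for stem in range(first_stem, last_stem + 1):
        for half in (0, 1):
            display.append(f"{stem}|{''.join(lines.get((stem, half), []))}")
    return display

data = [
    75.4786, 74.6043, 74.5925, 73.7692, 74.3453, 74.4622, 74.8150, 74.0306,
    75.6101, 74.0489, 74.3663, 75.7653, 74.5463, 75.8547, 75.0494, 74.0381,
    75.4592, 74.9785, 75.2846, 74.1965, 73.7567, 74.3161, 74.5273, 75.2975,
    72.4336, 74.9129, 74.5601, 75.5037, 73.1873, 71.0224, 73.9911, 73.8043,
    74.0654, 74.1178, 75.2165, 76.5370, 74.4765, 74.0991, 74.6521, 76.8578,
    73.1728, 74.3207, 74.2735, 75.1362, 75.7757, 76.0959, 74.6455, 74.8574,
    76.0823, 74.3449,
]
for line in stem_and_leaf(data):
    print(line)
```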
Presentations for Two Variables
• The graphs illustrated so far have involved only a single variable.
• When two variables exist, other techniques are used.
Presentations for 2 Variables: Scatter Plots, Cross Tables
Scatter Plot
WHAT IS IT?
• Scatter plots are used for paired observations(*) taken from two numerical variables
• In a scatter plot, one variable is measured on the vertical axis and the other variable is measured on the horizontal axis.
(*) NOTE
The values plotted are $P_i = (X_i, Y_i)$, $i = 1, \dots, n$ (i.e. $P_1 = (X_1, Y_1)$, $P_2 = (X_2, Y_2)$, …, $P_n = (X_n, Y_n)$)
Scatter Plot
Scatter plots are very informative tools. For 2 variables X and Y, they make it possible to assess, for example:
• Association between X and Y: no association vs. association
• Type of association: linear vs. other
• Direction of the association: positive vs. negative
• Presence of outliers
[Figure: example scatter plots of Y vs. X for each of the cases above]
How to generate Scatterplot in JMP
Double Histograms
Histograms can be produced to compare the distributions of two numerical variables. In the example below, the histograms of the variables X and Y are plotted on the same graph.
[Figure: double histogram, X above and Y below the horizontal axis; frequency (0–25) vs. values 345–355]
NOTE: For other graphical methods like this, see ADCS 8482819_A - §6.2.1.1
Cross Tables
WHAT IS IT?
• Cross tables (or contingency tables) list the number of observations for every combination of values of two categorical or ordinal variables
• If there are r categories for the first variable (rows) and c categories for the second variable (columns), the table is called an r × c cross table
Cross Tables
EXAMPLE
4 × 3 cross table for:
Time dedicated to 4 types of Task (rows) by 3 Operators (columns)
Task \ Operator | Operator A | Operator B | Operator C | TOTAL
Type 1 | 46.5 | 55.0 | 27.5 | 129.0
Type 2 | 32.0 | 44.0 | 13.0 | 89.0
Type 3 | 15.5 | 20.0 | 19.0 | 54.5
Type 4 | 16.0 | 28.0 | 7.0 | 51.0
TOTAL | 110.0 | 147.0 | 66.5 | 323.5
Side by Side Bar Chart
EXAMPLE
Using the data from the previous example, produce a “side by side bar chart”.
[Figure: side-by-side bar chart of Time (0–60) for Task Types 1–4, with one bar each for Operator A, Operator B and Operator C]
How to generate Side by Side chart in JMP
Activity
• Open the file TRAINING DATAStat1_Cal.xlsx
• Consider column F, “Measurement Equipment”. It contains the names of 4 tester equipment. Each equipment has been used a different number of times; in total, the group of equipment has been used 544 times. We want to assess the distribution of usage of the different equipment.
• Generate the proper graphical representation.
Answer
• Measurement Equipment – Bar Chart
• From Excel, copy the column Measurement Equipment to a JMP data table
• In JMP, select Edit > Paste with Column Names
• Follow the JMP routine for creating a Bar Chart
Answer
• Measurement Equipment – Pie Chart
• From Excel, copy the column Measurement Equipment to a JMP data table
• In JMP, select Edit > Paste with Column Names
• Follow the JMP routine for creating a Pie Chart
Activity

• Consider column B, “BALL SHEAR”. It contains 205 measurements of Ball Shear for DFN8_Cu on FWB1106.
Answer
• BALL SHEAR Histogram
• From Excel, copy the column BALL SHEAR to a JMP data table
• Click Analyze > Distribution
• Follow the JMP routine for creating a Histogram
Activity

• Consider column G, “Relative Humidity(%)”, and column H, “Tensile Strength”. Each contains 1000 values.
• Generate a scatterplot for these two variables and interpret it.
Answer
Scatterplot of “Relative Humidity (%)” VS. “Tensile Strength”
• From Excel, copy the 2 columns to a JMP data table
• Click Graph > Scatterplot Matrix
• Follow the JMP routine for creating a Scatter Plot
INTERPRETATION
The graphical analysis shows that:
• The two variables do not seem to be correlated
• Graphical Presentation for One Variable

• Frequency Distribution Table
• Bar Chart
• Pie Chart
• Pareto Diagram
• Line Chart
• Histogram
• Stem and Leaf
• Graphical Presentation for Two Variables

• Scatter Plot
• Double Histograms
• Cross (Contingency) Table
• Side by Side Bar Chart
Module 4: Descriptive Indices
At the end of this chapter, you will be able to:
• Calculate and interpret the main “Indices of Location”

• Calculate and interpret the main “Indices of Spread”
• Interpret the main “Indices of association”
• Recognize a symmetrical distribution
Descriptive Indices
• Graphical presentations are very useful in describing data
  • GOAL: help transform data into information
  • Provide an overall, quick, and intuitive idea about the behavior of processes
  • In some cases suffer from subjectivity (in their interpretation and in some steps of their creation)
  • Are easy to build
  • Can be useful for both numerical and categorical data
• Numerical descriptions of data provide complementary information
  • GOAL: the same as for graphical presentations
  • Provide information on specific aspects of the considered data (e.g. location and spread)
  • Thanks to their numerical nature, they do not suffer from subjectivity
  • Are easy to calculate
  • Are used for numerical data
Symbols
When referring to descriptive indices, conventional symbols are used: letters of the Latin alphabet for sample statistics, and letters of the Greek alphabet for population parameters.
INDEX | IF SAMPLE: STATISTIC | IF POPULATION: PARAMETER
MEAN | $\bar{X}$ | $\mu$ (or other, e.g. $\lambda$)
STANDARD DEVIATION | $S$ | $\sigma$
VARIANCE | $S^2$ | $\sigma^2$
PROPORTION | $P$ | $\theta$
NOTE: when a sample statistic is used to estimate an unknown population parameter, this is indicated by the symbol ^. For example: $\bar{X} = \hat{\mu}$ (read: «$\bar{X}$ is the estimated value of $\mu$»).
Descriptive Indices Outline
INDEX | PROVIDES INFORMATION ABOUT
Mean, Median, Mode, Percentiles | Data Location (on a scale)
Range, Variance, Standard Deviation | Data Variation (Variability or Spread)
Skewness, Kurtosis | Shape of data distribution (compared to the normal distribution)
Covariance, Correlation Coefficient | Association between variables
Measures of Location
• AVERAGE: arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$
• MEDIAN: central value of the ordered sample (or, the 50th percentile)
• MODE: most frequently observed value
• PERCENTILE: value greater than a certain % of the observations (e.g. the 80th percentile is greater than 80% of the values)
Mean
• The average or mean is the most common measure of central tendency
• For a population of N values:
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \dots + x_N}{N}$
where $x_i$ are the population values and N is the population size
• For a sample of size n:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$
where $x_i$ are the observed values and n is the sample size
Mean
• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)
Example (values on a 0–10 scale):
1, 2, 3, 4, 5 → Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
1, 2, 3, 4, 10 (10 is an outlier) → Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
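The two averages above can be checked with Python's standard `statistics` module; `statistics.median` is included as well, to show that the same outlier leaves the median unchanged.

```python
from statistics import mean, median

without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]   # 5 replaced by the outlier 10

print(mean(without_outlier), mean(with_outlier))      # 3 4
print(median(without_outlier), median(with_outlier))  # 3 3
```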
Median
MEDIAN
• Given an ordered (ascending order) sample of size n:
  • The median is preceded and followed by 50% of the sample values
  • The median occupies the 0.5(n+1)th position of the sample. This is only the position of the median in the ordered sample, NOT its value.
  • If n is odd, the median is the central observation.
  • If n is even, the median is the average of the two central observations.
• Compared to the average, the median is more robust to outliers.
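The position rule can be written out directly in Python. The sketch below (hypothetical helper `median_by_position`, illustrative samples) sorts the sample, computes the 0.5(n+1)th position and, when n is even, averages the two central observations; it agrees with `statistics.median`.

```python
from statistics import median

def median_by_position(sample):
    """Median via the 0.5*(n+1)th position of the ordered sample (1-based)."""
    x = sorted(sample)
    n = len(x)
    pos = 0.5 * (n + 1)          # e.g. n=5 -> 3rd, n=6 -> 3.5th
    lower = x[int(pos) - 1]      # observation at floor(pos), 1-based
    if pos.is_integer():         # n odd: the central observation
        return lower
    upper = x[int(pos)]          # n even: average the two central ones
    return (lower + upper) / 2

odd_sample = [7, 1, 3, 9, 5]         # ordered: 1 3 5 7 9
even_sample = [7, 1, 3, 9, 5, 11]    # ordered: 1 3 5 7 9 11
print(median_by_position(odd_sample), median(odd_sample))    # 5 5
print(median_by_position(even_sample), median(even_sample))  # 6.0 6.0
```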
Median
“The median is more robust to outliers than the average”.
1, 2, 3, 4, 5 → Median = 3, Average = 3
1, 2, 3, 4, 10 (10 is an outlier) → Median = 3, Average = 4
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes
[Figure: two example dot plots, one with Mode = 9 (scale 0–14) and one with No Mode (scale 0–6)]
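A Python sketch based on `statistics.multimode` (standard library, Python 3.8+), which returns every most-frequent value and therefore covers the one-mode and several-mode cases. The “no mode” case uses an assumed convention: if there are at least two distinct values and all of them occur equally often, no mode is reported. The helper name `modes` and the data sets are illustrative, not the ones in the figure.

```python
from statistics import multimode

def modes(data):
    """Return the list of modes, or [] when all distinct values tie."""
    result = multimode(data)
    distinct = len(set(data))
    if distinct > 1 and len(result) == distinct:  # every value ties: no mode
        return []
    return result

print(modes([1, 3, 9, 9, 9, 12, 14]))      # [9]
print(modes([0, 1, 2, 3, 4, 5, 6]))        # [] -> no mode
print(modes([2, 2, 5, 7, 7]))              # [2, 7] -> two modes
print(modes(["Good", "No-Good", "Good"]))  # ['Good'] -> categorical data
```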
Mode
Bimodal distribution (2 modes)
[Figure: histogram titled “Symmetric Distribution” showing a bimodal distribution; frequency (0–10) vs. classes]
Percentile
DEFINITION
A percentile is a value below which a certain proportion of the data falls (the remaining proportion falls above it).
“The pth percentile is a value, Y(p), such that at most (100p)% of the measurements are less than this value and at most 100(1- p)% are greater. The 50th percentile is called the median. Percentiles split a set of ordered data into hundredths. For example, 70% of the data should fall below the 70th percentile”.
From: NIST/SEMATECH e-Handbook of Statistical Methods - http://www.itl.nist.gov/div898/handbook/
Quartile
QUARTILES
The quartiles are the percentiles that divide the ordered sample into 4 parts, each containing the same amount of data. The 3 quartiles are generally indicated by Q1, Q2 and Q3.
[Figure: the ordered sample divided by Q1, Q2 and Q3 into four parts of 25% each.]
Given an ordered sample of size n:
Q1, the first quartile, is the observation whose value is greater than 25% of the values of the whole sample (and smaller than the remaining 75%). It occupies the 0.25(n+1)th position of the ordered sample.
Q2, the second quartile (or median), is the observation whose value is greater than 50% of the values of the whole sample (and smaller than the remaining 50%). It occupies the 0.5(n+1)th position of the ordered sample.
Q3, the third quartile, is the observation whose value is greater than 75% of the values of the whole sample (and smaller than the remaining 25%). It occupies the 0.75(n+1)th position of the ordered sample.
When the position is not a whole number, the quartile is obtained by interpolating between the two adjacent ordered observations.
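The p(n+1) position rule can be sketched in Python. The `percentile` helper is hypothetical and written for illustration only. It interpolates linearly when the position is fractional, which is one common convention among several. Python's `statistics.quantiles` with its default `method="exclusive"` places the cut points at the same p(n+1) positions, so it serves as a cross-check for this sample.

```python
import statistics

def percentile(data, p):
    """p-th percentile (0 < p < 1) using the p(n+1) position rule.

    The position is 1-based in the ordered sample. A fractional position is
    resolved by linear interpolation between the two neighbouring observations,
    and positions outside [1, n] are clamped to the first/last observation.
    """
    x = sorted(data)
    n = len(x)
    pos = min(max(p * (n + 1), 1), n)
    lower = int(pos)    # integer part: 1-based index of the lower neighbour
    frac = pos - lower  # fractional part: weight given to the upper neighbour
    if lower == n:
        return float(x[-1])
    return x[lower - 1] + frac * (x[lower] - x[lower - 1])

data = [7, 2, 5, 1, 8, 3, 6, 4]  # hypothetical sample, n = 8
quartiles = [percentile(data, p) for p in (0.25, 0.50, 0.75)]
print(quartiles)
# Cross-check with the standard library (default "exclusive" method).
print(statistics.quantiles(data, n=4))
```

With n = 8, Q1 sits at position 0.25 × 9 = 2.25. Its value is therefore a quarter of the way from the 2nd to the 3rd ordered observation.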
Measures of Variation
Measures of variation: RANGE, INTERQUARTILE RANGE, VARIANCE, STANDARD DEVIATION.
Measures of variation give information on the spread or variability of the data values.
[Figure: distributions with the same center but different variation.]
Range
RANGE
It is the difference between the largest and the smallest observations:
RANGE = Xmax - Xmin
[Figure: dot plot of observations on a 0 to 14 axis.]
RANGE = 14 - 1 = 13
Range
“The Range does not consider the distribution of the observations”.
[Figure: two dot plots on a 7 to 12 axis with differently distributed observations; both have RANGE = 12 - 7 = 5.]
“The Range is sensitive to the presence of outliers”.

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
RANGE = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
RANGE = 120 - 1 = 119
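The two samples above can be checked with a few lines of Python. The data are exactly those on the slide. The helper name `data_range` is only illustrative.

```python
def data_range(values):
    """RANGE = Xmax - Xmin."""
    return max(values) - min(values)

# The two samples from the slide (25 observations each).
sample = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
sample_with_outlier = sample[:-1] + [120]  # last value 5 replaced by 120

print(data_range(sample))               # 5 - 1
print(data_range(sample_with_outlier))  # 120 - 1
```

A single extreme observation is enough to change the range from 4 to 119.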
Interquartile Range
Some outlier problems can be eliminated by using the interquartile range (IQR):
• Eliminate the high- and low-valued observations and calculate the range of the middle 50% of the data
• Interquartile range = 3rd quartile – 1st quartile
IQR = Q3 – Q1
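Applied to the two samples from the Range slide, the IQR is unaffected by the outlier. The sketch below assumes quartiles at the p(n+1) positions, which is the default `"exclusive"` method of Python's `statistics.quantiles`. The helper name `iqr` is illustrative.

```python
import statistics

def iqr(values):
    """IQR = Q3 - Q1, with quartiles at the p(n+1) positions (statistics' default method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Same two samples as in the Range example.
sample = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
sample_with_outlier = sample[:-1] + [120]

print(iqr(sample), iqr(sample_with_outlier))  # the outlier leaves the IQR unchanged
```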
Box-Plot
Using the definitions of Quartiles and IQR, it is possible to create a very useful
graphical presentation: the box-plot, also called “box and whiskers plot”.
The elements needed to generate a box-plot are:
1. A “box”, defined by the IQR → it includes the central 50% of the observations
2. A line within the box, corresponding to the median (the mean can also be shown)
3. Two lines called “whiskers”, with length defined as follows:
• Upper Whisker (UW): if max(x) < Q3+1.5IQR => UW = max(x). Otherwise, UW = Q3+1.5IQR
• Lower Whisker (LW): if min(x) > Q1-1.5IQR => LW = min(x). Otherwise, LW = Q1-1.5IQR
Observations larger than Q3+1.5IQR or smaller than Q1-1.5IQR are plotted outside the whiskers and are suspected to be outliers.
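[Figure: box-plot along a Values axis. The BOX spans the IQR from Q1 to Q3, with Q2 (median) inside. The Lower Whisker (LW) and Upper Whisker (UW) are each labelled 25%. Points beyond the whiskers are marked as possible outliers.]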
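The box-plot construction above can be sketched in Python. The function name `box_plot_summary` is hypothetical, and the quartiles assume the p(n+1) positions. The sketch follows the whisker rule exactly as stated on the slide: the whisker ends at the fence when data lie beyond it. Some plotting libraries, for example matplotlib's `boxplot`, instead end each whisker at the most extreme observation that lies inside the fence.

```python
import statistics

def box_plot_summary(values):
    """Quantities needed to draw a box-plot, following the whisker rule stated above."""
    x = sorted(values)
    q1, q2, q3 = statistics.quantiles(x, n=4)
    iqr = q3 - q1
    upper_limit = q3 + 1.5 * iqr
    lower_limit = q1 - 1.5 * iqr
    # UW = max(x) if max(x) < Q3 + 1.5IQR, otherwise UW = Q3 + 1.5IQR (symmetrically for LW)
    uw = x[-1] if x[-1] < upper_limit else upper_limit
    lw = x[0] if x[0] > lower_limit else lower_limit
    suspected = [v for v in x if v > upper_limit or v < lower_limit]
    return {"Q1": q1, "Q2": q2, "Q3": q3, "IQR": iqr, "LW": lw, "UW": uw,
            "suspected_outliers": suspected}

# The sample with the outlier from the Range slide.
sample_with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]
print(box_plot_summary(sample_with_outlier))
```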
Box-Plot
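EXAMPLES
• Whiskers at X minimum and X maximum: LW = X min, UW = X max; no outliers.
• Whiskers at X minimum and Q3+1.5IQR: LW = X min, UW = Q3 + 1.5IQR; 1 suspected outlier above the upper whisker.
• Whiskers at Q1-1.5IQR and X maximum: LW = Q1 - 1.5IQR, UW = X max; suspected outlier(s) below the lower whisker.
• Whiskers at Q1-1.5IQR and Q3+1.5IQR: LW = Q1 - 1.5IQR, UW = Q3 + 1.5IQR; suspected outliers on both sides.
Box-Plot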
To compare two or more groups of data, it is helpful to display several box-plots on the same graph.
EXAMPLE
[Figure: box-plots of GROUP “A” and GROUP “B” on a common Values axis.]
Variance
VARIANCE
Average of squared deviations of values from the mean
• For a population of N values:
$$\sigma_X^2 = \frac{\sum_{i=1}^{N}(X_i-\mu)^2}{N} = \frac{(X_1-\mu)^2+(X_2-\mu)^2+\cdots+(X_N-\mu)^2}{N}$$
• For a sample (*) of size n:
$$s_X^2 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1} = \frac{(X_1-\bar{X})^2+(X_2-\bar{X})^2+\cdots+(X_n-\bar{X})^2}{n-1}$$
Where: μ = population mean
X̄ = sample mean
N = population size
n = sample size
Xi = ith value of the variable X
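The two formulas differ only in the denominator (N for a population, n - 1 for a sample). The sketch below implements both directly and cross-checks them against `statistics.pvariance` and `statistics.variance` from Python's standard library. The data values and function names are hypothetical.

```python
import statistics

def population_variance(values):
    """sigma^2: squared deviations from the population mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; mean = 5
print(population_variance(data), statistics.pvariance(data))
print(sample_variance(data), statistics.variance(data))
```

The squared deviations sum to 32. That gives 32/8 = 4 for the population formula and 32/7 ≈ 4.571 for the sample formula.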
Standard Deviation
STANDARD DEVIATION
Square root of the average of squared deviations of values from the mean
• For a population of N values:
$$\sigma_X = \sqrt{\frac{\sum_{i=1}^{N}(X_i-\mu)^2}{N}} = \sqrt{\frac{(X_1-\mu)^2+(X_2-\mu)^2+\cdots+(X_N-\mu)^2}{N}}$$
• For a sample of size n:
$$s_X = \sqrt{\frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1}} = \sqrt{\frac{(X_1-\bar{X})^2+(X_2-\bar{X})^2+\cdots+(X_n-\bar{X})^2}{n-1}}$$
Where: μ = population mean
X̄ = sample mean
N = population size
n = sample size
Xi = ith value of the variable X
• Most commonly used measure of variation
• Shows variation about the mean
• It is sensitive to outliers
• Has the same units as the original data
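The sensitivity to outliers can be seen with the two samples from the Range slide. This is a sketch using Python's `statistics.stdev`, which computes the sample standard deviation (n - 1 denominator).

```python
import math
import statistics

sample = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
sample_with_outlier = sample[:-1] + [120]

s = statistics.stdev(sample)
s_out = statistics.stdev(sample_with_outlier)

# The standard deviation is the square root of the sample variance.
assert math.isclose(s, math.sqrt(statistics.variance(sample)))

print(f"s without outlier = {s:.3f}, s with outlier = {s_out:.3f}")
```

Changing a single value makes s grow by more than a factor of 20 (from about 1.08 to about 23.7).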
EXAMPLES
[Figure: three dot plots on a common 11 to 21 axis, all with mean X̄ = 15.5. Case A: S = 3.338. Case B: S = 0.926. Case C: S = 4.570.]
Advantages of Variance & Standard Deviation
• Each value in the data set is used in the calculation
• Values far from the mean are given extra weight, because deviations from the mean are squared (on the other hand, this makes the standard deviation and the variance highly sensitive to the presence of outliers)
Coefficient of Variation
• Measures relative variation
• Is expressed as a percentage (%)
• Shows variation relative to the mean
• Can be used to compare two or more sets of data measured in different units
$$CV = \frac{s}{\bar{x}} \cdot 100\%$$
Coefficient of Variation
EXAMPLE – comparison of the variability of two production lots
• LOT1: average oxide thickness = 500, standard deviation = 15
$$CV_{LOT1} = \frac{s}{\bar{x}} \cdot 100\% = \frac{15}{500} \cdot 100\% = 3\%$$
• LOT2: average oxide thickness = 650, standard deviation = 15
$$CV_{LOT2} = \frac{s}{\bar{x}} \cdot 100\% = \frac{15}{650} \cdot 100\% \approx 2.3\%$$
Both lots have the same standard deviation, but Lot 2 is less variable relative to its larger thickness.
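The lot comparison above reduces to one line of arithmetic per lot. In this sketch the helper name is illustrative, and it assumes a positive mean, as for the thickness values here.

```python
def coefficient_of_variation(s, mean):
    """CV in percent: standard deviation relative to the (positive) mean."""
    return 100.0 * s / mean

cv_lot1 = coefficient_of_variation(15, 500)
cv_lot2 = coefficient_of_variation(15, 650)
print(f"CV LOT1 = {cv_lot1:.1f}%, CV LOT2 = {cv_lot2:.1f}%")
```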
Measures of Shape
• SKEWNESS: compares the “asymmetry” of a distribution with the Normal distribution.
• KURTOSIS: compares the shape of the “peak” of a (unimodal) distribution with the shape of the peak of a Normal distribution.
NOTE: in this course, the measures of shape are only mentioned. More details are included in the course “Statistics Level 2”.
Skewness – measure of asymmetry
SKEWNESS measures “the extent of the lack of symmetry” of a distribution, compared to the normal distribution (which is perfectly symmetrical, with skewness equal to zero).
[Figure: different shapes according to different values of skewness: SKN(X) = 0, SKN(X) > 0, SKN(X) < 0.]
Kurtosis – measure of peakedness
KURTOSIS (departure from the shape of the peak of a Normal)
[Figure: different shapes according to different values of kurtosis: KUR(X) > 0, KUR(X) = 0, KUR(X) < 0.]
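The course does not give formulas for these measures. The sketch below uses one common choice as an assumption: the moment coefficients of skewness and excess kurtosis (with 3 subtracted, so that a Normal scores 0, matching the KUR(X) = 0 reference above). Statistical packages may report bias-adjusted versions, so their values can differ for small samples. All data are hypothetical.

```python
def _central_moment(values, k):
    """k-th central moment, using the divisor n."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** k for x in values) / n

def moment_skewness(values):
    """Moment coefficient of skewness: 0 for symmetrical data, > 0 for a long right tail."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment kurtosis minus 3, so that a Normal distribution scores 0."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3

print(moment_skewness([1, 2, 3, 4, 5]))          # symmetrical
print(moment_skewness([1, 1, 1, 2, 10]))         # long right tail
print(excess_kurtosis([1, 2, 3, 4, 5]))          # flat, no tails
print(excess_kurtosis([0] * 8 + [-5, 5]))        # sharp peak with heavy tails
```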
Asymmetry: Mean and Median
In a symmetrical distribution, the mean and the median are the same value.
Position of MEAN and MEDIAN in case of symmetry/asymmetry of the distribution:
• Symmetry: Average = Median
• Positive asymmetry (long right tail): Average > Median
• Negative asymmetry (long left tail): Average < Median
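The relative position of mean and median can be checked on three small hypothetical samples, one for each case above. The sketch uses Python's `statistics` module.

```python
from statistics import mean, median

symmetric = [2, 3, 4, 5, 6]          # hypothetical samples
right_skewed = [1, 1, 1, 2, 10]      # long right tail: positive asymmetry
left_skewed = [-10, -2, -1, -1, -1]  # long left tail: negative asymmetry

for label, data in (("symmetry", symmetric),
                    ("positive asymmetry", right_skewed),
                    ("negative asymmetry", left_skewed)):
    print(f"{label}: average = {mean(data)}, median = {median(data)}")
```

The tail pulls the average toward it, while the median stays with the bulk of the data.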
Asymmetry and Box-Plot
In a symmetrical distribution,
◼ the mean and the median are the same value
◼ Q1 is as far from Q2 as Q2 is from Q3, i.e. (Q2-Q1) = (Q3-Q2)
Both conditions can be checked using a box-plot:
• (Q2-Q1) > (Q3-Q2) => non-symmetrical
• (Q2-Q1) = (Q3-Q2) => symmetrical
Indicating the mean with the symbol “+” inside the box:
• Mean = Median (Q2) => symmetrical
[Figure: box-plots with Q1, Q2 and Q3 marked for each case.]
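The quartile check can be sketched in Python. This assumes quartiles at the p(n+1) positions (the default of `statistics.quantiles`); the samples are hypothetical. With real data the two gaps are almost never exactly equal, so the comparison still calls for judgment.

```python
import statistics

def quartile_gaps(values):
    """Return (Q2 - Q1, Q3 - Q2); equal gaps are consistent with a symmetrical distribution."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q2 - q1, q3 - q2

print(quartile_gaps([1, 2, 3, 4, 5, 6, 7]))     # hypothetical symmetrical sample
print(quartile_gaps([1, 2, 3, 4, 20, 40, 80]))  # hypothetical right-skewed sample
```

In the skewed sample the upper gap is far larger than the lower one. This is the mirror image of the non-symmetrical case drawn above.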
Indices for the association between variables
To study the association between 2 variables, two indices can be considered:
• The covariance
• The coefficient of correlation
Covariance
Use it to study the linear relationship between two random variables.
In particular, given two variables X1 and X2, this index provides information about:
• The existence of a linear relationship between X1 and X2.
• The direction of the relationship.
For a population of size N:
$$\mathrm{Cov}(X_1, X_2) = \sigma_{X_1X_2} = \frac{\sum_{i=1}^{N}(X_{1i}-\mu_{X_1})(X_{2i}-\mu_{X_2})}{N}$$
For a sample of size n:
$$\mathrm{Cov}(X_1, X_2) = s_{X_1X_2} = \frac{\sum_{i=1}^{n}(X_{1i}-\bar{X}_1)(X_{2i}-\bar{X}_2)}{n-1}$$
INTERPRETATION
cov(X1, X2) < 0 → negative linear relationship
cov(X1, X2) = 0 → no linear relationship
cov(X1, X2) > 0 → positive linear relationship
The covariance can take any value from -∞ to +∞, and its magnitude depends on the units of measurement of X1 and X2. For this reason, this index is not of great help in assessing the intensity (or strength) of the linear association between the variables.
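The sample formula can be written directly in Python. The data and the function name are hypothetical (Python 3.10+ also offers `statistics.covariance`). The last line shows why covariance says little about strength: rescaling one variable, for example by changing its units, rescales the covariance by the same factor.

```python
def sample_covariance(x1, x2):
    """s_X1X2: sum of products of deviations from the sample means, divided by n - 1."""
    if len(x1) != len(x2) or len(x1) < 2:
        raise ValueError("need two paired samples of equal size n >= 2")
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    return sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)

x = [1, 2, 3, 4, 5]                                        # hypothetical data
print(sample_covariance(x, [2, 4, 6, 8, 10]))              # both increase together
print(sample_covariance(x, [10, 8, 6, 4, 2]))              # one increases, the other decreases
print(sample_covariance(x, [2000, 4000, 6000, 8000, 10000]))  # same relation, other units
```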
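Graphical interpretation of Covariance
[Figure: scatter plot of Y versus X, divided into four quadrants by the lines X = mX and Y = mY.]
• (Xi − mX) > 0 and (Yi − mY) > 0 (upper right), or (Xi − mX) < 0 and (Yi − mY) < 0 (lower left): the product of the deviations is positive, so these points push COV(X,Y) > 0.
• (Xi − mX) < 0 and (Yi − mY) > 0 (upper left), or (Xi − mX) > 0 and (Yi − mY) < 0 (lower right): the product of the deviations is negative, so these points push COV(X,Y) < 0.
Points concentrated in the upper-right and lower-left quadrants indicate a positive linear relationship. Points concentrated in the upper-left and lower-right quadrants indicate a negative linear relationship.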
Pearson’s Correlation coefficient - r
Use it to study the linear relationship between two random variables.

In particular, given two variables X1 and X2, this index provides information about:
• The existence of a linear relationship between X1 and X2.
• The direction of the relationship.
• The intensity or strength of the linear relationship.
For a population of size N:
$$\rho = \mathrm{corr}(X_1, X_2) = \frac{\mathrm{Cov}(X_1, X_2)}{\sigma_{X_1}\,\sigma_{X_2}}$$
For a sample of size n:
$$r = \mathrm{corr}(X_1, X_2) = \frac{\mathrm{Cov}(X_1, X_2)}{s_{X_1}\,s_{X_2}}$$
INTERPRETATION
corr(X1, X2) = -1 → perfect negative linear relationship
-1 < corr(X1, X2) < 0 → (different intensities of) negative linear relationship
corr(X1, X2) = 0 → no linear relationship
0 < corr(X1, X2) < 1 → (different intensities of) positive linear relationship
corr(X1, X2) = 1 → perfect positive linear relationship
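A direct Python transcription of the sample formula is shown below. The data and function name are hypothetical (Python 3.10+ also provides `statistics.correlation`). Unlike the covariance, r does not change when a variable is expressed in different units.

```python
import math

def pearson_r(x1, x2):
    """Sample correlation r = Cov(X1, X2) / (s_X1 * s_X2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x1) / (n - 1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in x2) / (n - 1))
    return cov / (s1 * s2)

x = [1, 2, 3, 4, 5]                                    # hypothetical data
print(pearson_r(x, [2, 4, 6, 8, 10]))                  # perfect positive linear relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))                  # perfect negative linear relationship
print(pearson_r(x, [2000, 4000, 6000, 8000, 10000]))   # same relationship, other units
print(pearson_r(x, [2, 5, 5, 9, 9]))                   # positive, but not perfect
```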
Pearson’s Correlation coefficient - r
Correlation coefficient values for different degrees of association between variables X1 and X2 (each panel is a scatter plot of X2 versus X1):
• A (r = -1): perfect negative linear correlation.
• B (r = 1): perfect positive linear correlation.
• C (r = 0): no linear correlation.
• D (r = 0): no linear correlation.
• E (r = -0.32): negative linear correlation (weak).
• F (r = 0.58): positive linear correlation (mild).
Pearson’s Correlation coefficient
EXAMPLE
Evaluate the extent of the correlation between two electrical parameters, Isat and Vt, of a MOS transistor in C045 nm technology. A sample of 225 pairs of values (Isat and Vt) is drawn. The data are summarized in the following table. For brevity, only the first and last 5 rows of the original data-set are shown; however, the analysis was performed on the entire data-set.
The data-set:
LOT_WAFER_SITE    PIDS04L006LS (µA/µm)    PVT06L004LS (V)
Q135WEZ_10_1      -323.5                  -0.3588
Q135WEZ_10_2      -322.5                  -0.3727
Q135WEZ_10_3      -311.6667               -0.3468
Q135WEZ_10_4      -325.75                 -0.3543
Q135WEZ_10_5      -333.5                  -0.3448
(...)             (...)                   (...)
Q135WEZ_9_5       -321                    -0.3707
Q135WEZ_9_6       -284.1667               -0.3823
Q135WEZ_9_7       -305.3333               -0.3912
Q135WEZ_9_8       -319.8333               -0.3645
Q135WEZ_9_9       -329.8333               -0.3482
Pearson’s Correlation coefficient
EXAMPLE (continuation)
[Figure: scatter plot of PIDS04L006LS (µA/µm, axis from -360 to -260) versus PVT06L004LS (V, axis from -0.43 to -0.31).]
Correlation matrix:
                        PIDS04L006LS (µA/µm)    PVT06L004LS (V)
PIDS04L006LS (µA/µm)    1                       -0.7405
PVT06L004LS (V)         -0.7405                 1
Pearson’s correlation coefficient is -0.7405. This suggests a strong linear relationship between the variables. Since the correlation coefficient is negative, as one variable increases, the other tends to decrease (negative correlation).
Notes on Pearson’s Correlation coefficient
❑ The correlation coefficient is a unit-free index whose value must lie between -1 and +1
inclusive. For this reason, in addition to the existence and direction of the relationship, this
index provides information on the intensity of the linear relationship between two variables.
❑ Pearson’s correlation coefficient assumes that the two considered variables jointly form a bivariate normal distribution. This aspect will be explained in the course “Statistics Level 2”, where alternative approaches (for cases in which this assumption does not hold) are also considered.
❑ A value of +1 would result if all the points could be connected by a straight line with a
positive slope.
❑ A value of -1 would occur if all the points could be connected by a straight line with a
negative slope. Neither extreme case could be expected to occur in practice, however.
❑ The intensity of the linear relation between X and Y is higher the closer the correlation gets to either -1 or +1.
❑ If the random variables X and Y are independent, then the correlation coefficient is 0.
However, the converse is not true, since only the linear relationship is detectable by the
correlation coefficient (for example, the relationship may be quadratic).
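The quadratic case can be demonstrated with a tiny hypothetical data set. Here Y = X² is completely determined by X, yet r is exactly 0 because the relationship is not linear. The `pearson_r` helper is a self-contained restatement of the sample formula. In it, the (n - 1) factors of the covariance and of the two standard deviations cancel.

```python
import math

def pearson_r(x1, x2):
    """Sample correlation; the (n - 1) factors cancel, so plain sums are used."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]    # hypothetical values, symmetric around 0
y = [v ** 2 for v in x]  # Y depends on X, but not linearly
r = pearson_r(x, y)
print(r)
```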
Activity

• Consider the column B, “Ball Shear”.
• Use the statistical package to calculate all the statistics explained in this module
60 minutes
Answer
[Figure: JMP output.]
Activity

• Consider the column B, “Ball Shear”.
• Use the statistical package to generate a boxplot and interpret the result
60 minutes
Answer
Graphical evidence of a symmetrical distribution
• Measures of location: Mean, Median, Mode, Percentile, Quartile
• Measures of spread: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of variation
• Measures of shape: Skewness and Kurtosis; how to use a Box-Plot to check if a distribution is symmetrical
• Measures of association: Covariance, Pearson’s correlation Coefficient
Module 5: Random Variables
At the end of this chapter, you will be able to:

• Discriminate between discrete and continuous random variables
• Describe the main properties of distribution functions and cumulative distribution
functions
Random Variable
WHAT IS IT?
Statistics and Mathematics are not the same! However, they often use the same terms, but with different meanings…
In Algebra, a variable is an unknown quantity. Usually, the problem consists in finding its value. For example, given the equation 12-3x=0, we can find that x=4.
In Statistics, a Random Variable is different…
X is called a “variable” because its value can “vary” within a set of possible values. X can take on any of those values, and it does so randomly. That’s why X is called a random variable.
Random Variable
Lowercase vs. capital letters
Conventionally,
• A random variable is given a capital letter, e.g. X, Y, Z, W,…
• The values that a random variable can take on are given lowercase letters, e.g. {x1, x2, …, xn} for the random variable X, {y1, y2, …, yn} for the random variable Y, and so on.
So, for example, we write X={x1, x2, …, xn} and read “the random variable X can take on the values x1, x2, …, xn”.
Random Experiment & Random Variable
DEFINITIONS
(1) Random Experiment

• A Random Experiment is the activity needed to collect data on a specific aspect of a process. Here, “process” is meant in a general sense, not only as a “production process”: tossing a coin is also a process. The outcome of the experiment can be either:
• The result of a measurement process (e.g. “the thickness of a layer” or the result of a “pull test in wire bonding”)
• A count (e.g. “number of failed dice in a wafer”)
• The experiment is called random because its outcome varies according to some random mechanism and cannot be predicted for any single trial. The idea of randomness is that the value of the random variable will vary from trial to trial as the experiment is repeated, according to the inherent variability of the considered process.
(2) Random Variable

• A Random Variable is the set of possible values from a Random Experiment. Or, in other terms,
a Random Variable takes values which are the outcome of a Random Experiment.
• The set of possible values is called the Sample Space.
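To see what “random” means in practice, the sketch below simulates the coin-tossing process mentioned above in Python. Several choices are arbitrary and made only for illustration: 10 tosses per trial, 20 trials and the seed. The random variable is X = number of heads, whose set of possible values is {0, 1, …, 10}.

```python
import random

def coin_toss_trial(rng, n_tosses=10):
    """One trial of the random experiment: toss a fair coin n_tosses times, return X = number of heads."""
    return sum(rng.random() < 0.5 for _ in range(n_tosses))

rng = random.Random(2012)  # arbitrary seed, only to make the run reproducible
outcomes = [coin_toss_trial(rng) for _ in range(20)]
print(outcomes)  # the value of X varies from trial to trial
```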
Continuous & Discrete Random Variables
• Continuous Random Variables take on any value in an interval, including fractions and decimals. Examples: results of a “pull test”, measures of thickness.
• Discrete Random Variables take on only values from a countable list of distinct values (typically integers, as in counts, so no fractions or decimals). Example: count of “good” dice in a wafer (or at FT).
Probability Model
Statistics provides us with probability models.
DEFINITIONS
(1) Probability Model
It is a mathematical model that relates a value of a random variable to the probability of occurrence of that value in the population.
(2) Probability Distribution
It is the probability model associated with a discrete random variable.
(3) Probability Density Function
It is the probability model associated with a continuous random variable.
Probability Distribution & Density Function
• DISCRETE RANDOM VARIABLE → PROBABILITY DISTRIBUTION. Example: what is the probability that, in a wafer randomly selected from a lot, the number of defective dice (the variable X) is 3?
• CONTINUOUS RANDOM VARIABLE → PROBABILITY DENSITY FUNCTION. Example: what is the probability that, for a randomly selected wafer, the value of thickness (the variable X) is included in the interval (x1; x2)?
Probability Distribution & Density Function
Main properties of the Probability Distribution and of the Density Function
Probability Distribution
• It is indicated by P(X=x)=P(x) → read: “the probability that X takes on the value x”
• 0 ≤ P(x) ≤ 1 for every possible value x
• $\sum_{\text{all possible } x} P(x) = 1$: the sum of P(x) over all the possible values of x is equal to one
Density Function
• It is indicated by f(x)
• We generally refer to the probability that X belongs to an interval of possible values: $P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx$
• P(X = x0) = 0: the probability that X is equal to any single value x0 is equal to zero
• $\int_{\text{all possible } x} f(x)\,dx = 1$: the probability that X belongs to the interval of all the possible values is 1
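Both sets of properties can be checked on two simple hypothetical models. For the discrete case, a fair six-sided die is used, with exact fractions so that the sum is exactly 1. For the continuous case, a uniform variable on [5, 7] is used, whose density is the constant 1/2 on that interval. For the uniform variable, the interval probability is the integral of f, which here reduces to the length of the overlap times 1/2. The helper name `uniform_prob` is illustrative.

```python
from fractions import Fraction

# Discrete (hypothetical): X = face shown by a fair six-sided die.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}
assert all(0 <= p <= 1 for p in die_pmf.values())
print(sum(die_pmf.values()))  # probabilities over all possible x sum to one

# Continuous (hypothetical): X uniform on [a, b] = [5, 7], density f(x) = 1 / (b - a).
def uniform_prob(x1, x2, a=5.0, b=7.0):
    """P(x1 <= X <= x2): integral of the uniform density over the overlap with [a, b]."""
    lo, hi = max(x1, a), min(x2, b)
    return max(hi - lo, 0.0) / (b - a)

print(uniform_prob(5.5, 6.5))  # probability of an interval
print(uniform_prob(6.0, 6.0))  # P(X = 6): a single value has probability zero
print(uniform_prob(5.0, 7.0))  # the whole range of possible values
```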
Theoretical Probability Models
QUESTION
“So, how can Statistics help us calculate the probability that a certain event takes place?”
ANSWER
“Statisticians have defined many different probability models that can be used to describe real-world phenomena. They are called Theoretical Probability Models.”
Real-world phenomena do not all behave the same way → we need different models to interpret their different behaviors.
[Figure: some Theoretical Probability Models, or DISTRIBUTIONS, grouped into DISCRETE DISTRIBUTIONS and CONTINUOUS DISTRIBUTIONS.]
The Normal Distribution
If a Random Variable X is normally distributed, its density function is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
Where: e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
μ = the population mean
σ = the population standard deviation
x = any value of the continuous variable, -∞ < x < +∞
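The density formula translates directly into Python. The sketch cross-checks it against `statistics.NormalDist.pdf` from the standard library; the parameters μ = 10 and σ = 2 are hypothetical.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density f(x) = exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * sqrt(2*pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 10.0, 2.0  # hypothetical parameters
for x in (6.0, 10.0, 13.0):
    # Compare the hand-written formula with the standard library implementation
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

At x = μ the exponent is 0, so the density reaches its maximum 1/(σ√(2π)).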
The Normal Distribution
[Figure: Normal density function f(X) plotted against X, centered at μ, with the standard deviation σ marked.]
Main features of the Normal Distribution:

• It is symmetric
• It is unimodal
• It is bell-shaped
• It is defined by two parameters:
• μ – the mean of the distribution (-∞ < μ < ∞)
• σ² – the variance of the distribution (σ² > 0)
Graphical Interpretation of σ
For the Normal distribution, the standard deviation σ has a useful graphical interpretation. It allows us to estimate the expected percentage of the population values falling between the limits μ ± kσ (k = 1, 2, 3): approximately 68.27%, 95.45% and 99.73% respectively. This is illustrated in the following figure:
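These percentages can be reproduced with `statistics.NormalDist` from Python's standard library. The coverage of μ ± kσ is the same for every Normal distribution, because z = (x - μ)/σ maps it onto the standard normal. The second distribution below (μ = 13, σ = 1.5) is a hypothetical check of that fact.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mu = 0, sigma = 1
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"mu ± {k} sigma: {100 * p:.2f}%")

# Same result for a non-standard Normal (hypothetical mu = 13, sigma = 1.5), k = 2
d = NormalDist(13, 1.5)
print(d.cdf(13 + 2 * 1.5) - d.cdf(13 - 2 * 1.5))
```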
Variations of the Parameters
By varying the parameters μ and σ, we obtain different normal distributions
Variations of the parameters
• Changing μ shifts the distribution left or right.
• Changing σ increases or decreases the spread.
[Figure: normal density f(x) with the mean μ and the standard deviation σ marked on the x axis.]
Graphical Assessment of Normality
• In many situations it is necessary to know whether the sample data on which we do our analyses come from a Normal distribution.
• This can be initially assessed using the graphical methods presented here.
• Notes:
• At a graphical level, we can only produce a partial assessment of normality. We can be confident in the conclusion drawn from a graph that clearly indicates non-normality; conversely, if the graph suggests a normal behavior, some doubts should remain.
• To obtain more complete information, other procedures should be employed (→ Inferential methods, which are not presented in this course).
Graphical Assessment of Normality
Graphical Methods to Assess Normality:
• Histogram
• Normal Probability Plot
Histogram to Assess Normality
• The Normal distribution has some typical characteristics:
• It is symmetrical
• It is unimodal (only one Mode)
If the histogram presents ALL these features, there is graphical evidence of normality.
EXAMPLES: histograms of samples drawn from different distributions are shown.
UNIFORM Distribution (5-7)
• It seems symmetrical
• It is NOT bell-shaped
• It is NOT unimodal
Conclusion: Non-Normal distribution
GAMMA Distribution (3,1.2)
• It is NOT symmetrical
• It is unimodal
Conclusion: Non-Normal distribution
BIMODAL (2 Normal together)
• It seems symmetrical
• It is NOT unimodal
Conclusion: Non-Normal distribution
NORMAL Distribution (13,1)
• It is symmetrical
• It is unimodal
Conclusion: Normal distribution
Normal Probability Plot
Normal probability plot
How to build & use it:
• Arrange the data from low to high values
• Find the cumulative normal probabilities for all values
• Examine a plot of the observed values vs. the cumulative probabilities (with the cumulative normal probability on the vertical axis and the observed data values on the horizontal axis)
• Evaluate the plot for evidence of linearity: a linear pattern of the plotted points is evidence of normality, while non-linear patterns indicate, according to their shape, different non-normal distributions
• Some cases are illustrated in the following examples. The red asterisks refer to the first and third quartiles (Q1 and Q3) of the plotted data
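The construction steps above can be sketched in Python with `statistics.NormalDist`. One assumption needs flagging: the i-th smallest of n observations is given cumulative probability i/(n+1), and statistical packages use several similar formulas. Plotting each ordered value against the corresponding normal quantile is equivalent to using a probability-scaled axis. As a rough numerical summary of “linearity”, the sketch also computes the correlation of the plotted points (close to 1 means an almost straight line). That summary is an illustrative addition, not a formal test. Both data sets are hypothetical; the first mimics the Normal (13,1) example.

```python
from statistics import NormalDist, mean

def npp_points(values):
    """(normal quantile, ordered observation) pairs for a normal probability plot.

    Assumption: the i-th smallest of n observations gets cumulative
    probability i / (n + 1).
    """
    x = sorted(values)
    n = len(x)
    z = NormalDist()  # standard normal
    return [(z.inv_cdf(i / (n + 1)), v) for i, v in enumerate(x, start=1)]

def linearity(values):
    """Correlation of the NPP points: values close to 1 indicate an almost straight line."""
    pts = npp_points(values)
    qs = [q for q, _ in pts]
    xs = [v for _, v in pts]
    mq, mx = mean(qs), mean(xs)
    sxy = sum((q - mq) * (v - mx) for q, v in pts)
    sqq = sum((q - mq) ** 2 for q in qs)
    sxx = sum((v - mx) ** 2 for v in xs)
    return sxy / (sqq * sxx) ** 0.5

normal_like = [NormalDist(13, 1).inv_cdf(i / 21) for i in range(1, 21)]  # idealised Normal(13, 1) data
right_skewed = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]                         # hypothetical right-skewed data
print(round(linearity(normal_like), 4), round(linearity(right_skewed), 4))
```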
[Figure: normal probability plots (Data Quantiles versus Normal Quantiles) for six cases: NORMAL DISTRIBUTION; SYMMETRICAL NON-NORMAL (t); SYMMETRICAL NON-NORMAL (Uniform); SYMMETRICAL NON-NORMAL (Bimodal); LEFT-SKEWED DISTRIBUTION; RIGHT-SKEWED DISTRIBUTION.]
Activity

• Consider the column A, “Package Thickness”. It contains 1000 measurements of
thickness
• Interpret the graph
Answer
Evidence of a bimodal distribution
Activity

• Consider the column I, “Oxide Thickness”. It contains 118 measurements of thickness
• Generate the proper graphical representation to study the shape of its distribution
Answer
• OXIDE THICKNESS
Evidence of a symmetrical, bell-shaped distribution
Activity
• Open the file TRAINING DATA.xlsx

• Consider the column C, “Punch Diameter”. It contains 138 measurements of Tape
Punch hole diameter
• Generate the proper graphical representation to study the shape of its distribution
Answer
• Micromodule Tape punch hole diameter
Evidence of a non-symmetrical distribution (right-skewed)
Activity
• Open the file TRAINING DATA.xlsx

• Consider column A, “Package Thickness”, and column B, “Ball Shear”.
• For each sample, generate the proper graphical representation to assess normality
• Interpret the graphs comparing the two populations
Answer
Graphical assessment of normality
Evidence of:
• “Package Thickness” → non-normal distribution (bimodal)
• “Ball Shear” → approximately normal behavior
• Random Variables
• Random Experiment
• Discrete Random Variables
• Continuous Random Variables
• Probability Distributions
• Density Functions
• Theoretical Probability Models
• The Normal Distribution

• Main Features
• Graphical Interpretation of the Standard Deviation
• Variations of the Parameters
• Graphical evaluation of normality
Conclusion
• You are able to understand the difference between population and sample
• You can describe many features of a sample using both graphical methods and numerical methods
• You know what a random variable is and are familiar with the normal distribution.
• You can compare populations descriptively.
• You can interpret a Normal Probability Plot (NPP).
Conclusion
• What you could do next to further improve your statistical competency:
• Use what you have learned as much as possible, and start tomorrow!
• The only way to avoid forgetting what you learned: do not wait too long after the course before you start applying the techniques shown in the training.
• Think about attending the next training course, “Statistics Level 2”
• You will learn:
• How to use sample data to assess important aspects of the entire population (inference)
• How to perform a statistical test of hypotheses.
• How to determine if the dataset contains outliers
Post-test
• Complete the post-test to the best of your knowledge
It allows us to measure the learning that has taken place during the
training.
10-15 minutes
Customer satisfaction
How can we improve for next time?
Kirkpatrick Level 1 evaluation questionnaire
You will receive an e-mail. Please take 5 minutes to complete the evaluation form: this will help us continually improve the learning content, facilitation and organization.
CONGRATULATIONS!!
JMP Routines
How to make Bar chart in JMP
File: Bar Pie and Pareto.jmp
Click File > New > Data Table
Key in data in the empty data table
Go to Graph Menu > Chart
From Select Columns, click on Freq > Statistics, then select % of Total
How to make Bar chart in JMP
Click on Category (Bin), then check that the type of chart is set to Bar Chart (other chart options are available).
Click the OK button.
How to make Pie chart in JMP
Click on Category (Bin), then check that the type of chart is set to Pie Chart (other chart options are available).
Click the OK button.
How to make Pareto Plot in JMP
Click File > New > Data Table
Key in data in the empty data table
Go to Graph Menu > Pareto Plot
From Select Columns, click on Category > Y, Cause, then Freq > Freq
How to make Pareto Plot in JMP
Right-click on the Pareto Plot bar (enclosed in the box) and select Label Cum Percent Points to display the % for each category.
How to make Histogram in JMP
File: Thickness.jmp
Open file Thickness.jmp
Click Analyze > Distribution
Cast the Thickness column into Y, Columns
Click OK
How to make Histogram in JMP
Initial output display:
Click on the red hotspot, then click Stack
Stacked output display:
How to make Scatterplot in JMP
File: Absentee Rate.jmp
Open file Absentee Rate.jmp
Click Graph > Scatterplot Matrix
Cast Absences to Y, Columns and Experience to X. Click OK
Objective of the study: to find out whether long-term employees are more reliable and absent less often
How to make Scatterplot in JMP
• Does experience appear to influence the absentee rate?
How to make Side by Side chart in JMP
File: Side by side.jmp
Click on Graph > Chart
Assign the Data column to Statistics (Data), and the Task/Operator columns to Categories, X, Levels
How to make Side by Side chart in JMP
Click the OK button.
