
STATISTICS LEVEL 1

Speaker Name
ST Context 2

• STatS is a project started in 2012 under the sponsorship of PQE Management.

• Its purpose is to allow ST to become best-in-class by introducing innovative statistical tools and methodologies at Company level.

• The main goals of STatS are to review, rationalize and improve the effectiveness of statistical methodology in general.

• STatS is intended to continuously improve our detection capability through the adoption of an advanced statistical approach, and to reduce DPPM (Defective Parts Per Million) through innovation in the statistical techniques deployed in ST manufacturing.

• STatS also drives and supports the deployment and correct application of the Statistics Manuals in all ST manufacturing plants.

ST Restricted
Statistics Learning 3

Statistics level 1 SPC 1

Statistics level 2 SPC 2

Measurement System Analysis (MSA)
Statistical Model Building

Design of Experiments (DOE)
Multivariable Statistics

Why this course? 4

• To provide the fundamentals of statistics

• To answer current statistical questions in everyday work

• To produce more accurate/effective statistical analysis

Training purpose 5

• Describe the difference between population and sample

• Describe important features of a sample properly, using both graphical methods (histogram, bar diagram, box plot …) and numerical methods (indices of position and spread such as average, median, percentiles, standard deviation, variance …).

• Understand the meaning of a random variable and become familiar with the normal distribution.

• Compare samples descriptively and interpret a Normal Probability Plot (NPP).

Benefits 6

• To analyze information at the proper level and take the right decision accordingly

Let’s get to know each other… 7

Round table:
• Name
• Organization
• Are you already using statistical methodology?
• If so, what are the main applications?
• Expectations from the course

Pre-test 8

• Complete the questionnaire to the best of your knowledge

• This is not an individual ‘control’ test

It will allow us to have an idea of your level of knowledge about the subject prior to the training (so, if you don’t know, don’t worry …)

We will re-do the questionnaire at the end of the course to measure the learning that has taken place

10 minutes
Structure of the course

Structure of the course/Agenda 10

Welcome

Day 1 (9:00 – 18:00)

Module 1
• Introduction
• First Concepts
  • Population VS Sample
  • Descriptive VS Inferential
  • Parameter VS Statistic

Module 2
• Types of data

Module 3
• Graphical Presentation of Data
  • Presentations for Numeric and Categorical Data
  • Presentations for One Sample or for Two Samples

Module 4
• Descriptive Indices for
  • Location
  • Spread

Day 2 (9:00 – 13:00)

Module 4
• Descriptive Indices for the association between two variables

Module 5
• Random Variables
• Theoretical Distributions
  • For Continuous Variables
  • For Discrete Variables

Conclusion
Module 1: Introduction

Module 1 objectives 12

• At the end of this chapter, you will be able to:
  • Explain why we need Statistics
  • Describe the difference between Population and Sample
  • Describe the main parts of Statistics

Why we need Statistics? 13

Dealing with Uncertainty


Everyday decisions are based on incomplete information
Consider:

• The yield of a certain product will be higher in six months than it is now.

• If the number of attendees of this course is as high as predicted, the rate of good statistical reports will increase by 10% in the next 6 months.

Why we need Statistics? 14

… because of uncertainty, the previous statements should be modified:
• The yield of a certain product is likely to be higher in six months than it is now.

• If the number of attendees of this course is as high as predicted, it is probable that the rate of good statistical analyses will increase by 10% in the next 6 months.

Statistics helps us assess HOW likely an event is.

The Decision Making Process 15

Decision Making Process Steps (begin at the top) and the tools that help in each step:

Identify the Problem
→ Data
   (tools: Descriptive Statistics, Probability, Computers)
→ Information
   (tools: Inferential Statistics, Experience, Theory, Literature, Computers)
→ Knowledge
→ Decision

Key Definitions 16

• A population is the collection of all items of interest or under investigation
  • N represents the population size

• A sample is an observed subset of the population
  • n represents the sample size

• A parameter is a specific characteristic of a population

• A statistic is a specific characteristic of a sample

Provide some examples of: Population, Sample, Parameter and statistic.

Key Definitions 17

• A sample is drawn from a population. The most important feature of a sample is its ability to represent the entire population as closely as possible.
A good sample is said to be “representative” of the population.

Several techniques exist to help draw a representative sample.

For example, Simple Random Sampling is a procedure in which:
• each member of the population is chosen strictly by chance,
• each member of the population is equally likely to be chosen, and
• every possible sample of n objects is equally likely to be chosen.
The resulting sample is called a random sample.
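
As an illustration only, Simple Random Sampling can be sketched in Python with the standard library. The population of 10,000 numbered devices, the function name and the seed are assumptions made for this example; they do not come from the course material.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every subset of size n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the illustration reproducible
    return rng.sample(population, n)

# Hypothetical population: N = 10,000 device identifiers
population = list(range(1, 10_001))
sample = simple_random_sample(population, n=500, seed=1)

print(len(sample))       # sample size n
print(len(set(sample)))  # same number: no item is drawn twice
```

Sampling here is without replacement, so each member of the population can appear at most once in the sample.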

Descriptive and Inferential Statistics 18

Two branches of Statistics:

• Descriptive statistics
  • Collecting, summarizing, and processing data to transform data into information

• Inferential statistics
  • Providing the basis for predictions, forecasts, and estimates that are used to transform information into knowledge

Descriptive and Inferential Statistics 19

POPULATION: N=10,000, (True) Mean=?
→ Sampling →
SAMPLE: n=500, Average=56.2

Descriptive statistics
From the 500 sample data, we calculate the average.
We might also generate some graphs.
Possible error → calculation error.

Inferential statistics
We are actually interested in the entire population, not only in a description of the sample values. → We can ESTIMATE, for example, the value of the mean of the entire population.
Error → inferential error: the sample will never represent the whole population 100%.
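
The difference between the two kinds of error can be made concrete with a small simulation. This is only a sketch: the population below is generated artificially (normal values with mean 56 and standard deviation 5 are assumptions chosen for illustration), whereas in practice the true mean is unknown.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of N = 10,000 values (simulated, not real data)
population = [rng.gauss(56.0, 5.0) for _ in range(10_000)]

# Simple random sample of n = 500
sample = rng.sample(population, 500)

sample_average = statistics.mean(sample)   # descriptive: computed from the sample
true_mean = statistics.mean(population)    # normally unknown in practice
inferential_error = sample_average - true_mean

print(f"sample average    = {sample_average:.2f}")
print(f"population mean   = {true_mean:.2f}")
print(f"inferential error = {inferential_error:+.2f}")
```

The sample average is used as the estimate of the population mean; the small remaining difference is the inferential error, which exists even when the calculation itself is correct.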

Descriptive and Inferential Statistics 20

Descriptive
•Collect data
• e.g., Survey

•Present data
• e.g., Tables and Graphs

•Summarize data
• e.g., Sample mean = $\frac{\sum X_i}{n}$

Descriptive and Inferential Statistics 21

Inference
•Estimation
• e.g., Estimate the population mean
weight using the sample mean weight

•Hypothesis testing
• e.g., Test the claim that the population
mean weight is 120 pounds
Inference is the process of drawing conclusions or making
decisions about a population based on sample results

Module 1 Key Learnings 22

• Decision making process

• Incomplete information in decision making

• Description of Simple Random Sampling

• Key definitions:
• Population vs. Sample
• Parameter vs. Statistic
• Descriptive vs. Inferential statistics

Module 2: Types of Data

Module 2 objectives 24

• At the end of this chapter, you will be able to:
  • Identify the different types of data we deal with
  • Identify the different levels of measurement

Types of Data: Variables Classification 25

Variable
  • Numerical
    • Continuous
    • Discrete
  • Categorical

Types of Data: Classification Details 26

Variable
  • Numerical
    • Continuous
    • Discrete
  • Categorical

Continuous: quantitative variables. They can take any value on the scale of real numbers and can be ordered.
EXAMPLE: Oxide Thickness, Bond Line Thickness, Resistance

Discrete: quantitative variables. They can take any value on the scale of integer numbers and can be ordered.
EXAMPLE: Number of Good Dice/Wafer

Types of Data: Classification Details 27

Variable
  • Numerical
    • Continuous
    • Discrete
  • Categorical

Categorical: qualitative variables. Their “values” usually belong to different, mutually exclusive categories. They are not numbers.
EXAMPLES:
- Visual Inspection. Possible results → Good/No-Good
- Survey. Possible results → (1) Excellent (2) Good (3) Fair (4) Poor

Provide some examples of: Numerical (Continuous & Discrete) and Categorical variables.

Levels of Measurement 28

Numerical
• Ratio Data: differences between measurements, true zero exists.
  EXAMPLES: Thickness, Height, Age, Income, …
• Interval Data: differences between measurements, but no true zero.
  EXAMPLE: Temperature in Fahrenheit or Celsius.

Categorical
• Ordinal Data: ordered categories (rankings, order, or scaling).
  EXAMPLE: Satisfaction/Quality Rating
• Nominal Data: categories (no meaningful ordering or direction).
  EXAMPLE: Type of car owned, Marital Status, …

Activity 29

• Classify the following variables into the types previously seen:

Variable 1         Variable 2                    Variable 3   Variable 4            Variable 5  Variable 6
Process_Equipment  Potting Resin Thickness (mm)  Defectivity  Final Test Equipment  RDS_ON      Number of cycles
FPT01_H1           614                           19           TFX_10                344.6       6
FPT01_H2           630                           6            TFX_10                331.5       0
FPT02_H1           623                           6            TFX_11                321.4       11
FPT02_H2           612                           16           TFX_07                330.6       3
FPT01_H1           658                           15           TFX_07                219.9       9
FPT01_H2           662                           5            TFX_08                311.4       16
FPT02_H1           638                           4            TFX_09                389.9       0
FPT02_H2           630                           6            TFX_09                322.7       1

VARIABLE TYPE? (5 minutes)


Answer 30

Variable    Name                          VARIABLE TYPE
Variable 1  Process_Equipment             Categorical, Nominal
Variable 2  Potting Resin Thickness (mm)  Numerical, Continuous
Variable 3  Defectivity                   Numerical, Discrete
Variable 4  Final Test Equipment          Categorical, Nominal
Variable 5  RDS_ON                        Numerical, Continuous
Variable 6  Number of cycles              Numerical, Discrete
Module 2 Key Learnings 31

• Types of variables

• Levels of Measurement

Module 3: Graphical Presentation of Data

Module 3 objectives 33

• At the end of this chapter, you will be able to:

  • Create and interpret graphs to describe categorical variables: frequency distribution, bar chart, pie chart, Pareto diagram
  • Create a line chart to describe time-series data
  • Create and interpret graphs to describe numerical variables: frequency distribution, histogram
  • Construct and interpret graphs to describe relationships between variables: scatter plot, cross table
  • Create and interpret graphs to compare two variables: double histogram (numerical), side-by-side bar chart (categorical)

Graphical Presentations 34

• Raw data are usually not easy to use for decision making

• Some type of organization is needed
  • Table
  • Graph

Consider that:

The choice of the proper graph depends on the variable being summarized

Types of Variables VS Types of Graphs 35

For categorical variables, you can use:
❑ Frequency distribution
❑ Bar chart
❑ Pie chart
❑ Pareto diagram

For numerical variables, you can use:
❑ Line chart
❑ Frequency distribution
❑ Histogram and ogive
❑ Stem-and-leaf display
❑ Scatter plot

Graphical Presentations 36

For Categorical Data

Tabulating Data
• Frequency Distribution Table

Graphing Data
• Bar Chart
• Pie Chart
• Pareto Diagram

Frequency Distribution Table 37

WHAT IS IT?
A Frequency Distribution Table is a simple method to evaluate how the frequencies are distributed across the possible values of a variable of interest.

If the variable is categorical, it is very easy to determine the values to include in the leftmost column of the table: all the possible categories of the variable are listed.

For each category, different types of frequencies can be included in the table: FREQUENCY, RELATIVE FREQUENCY and CUMULATIVE FREQUENCY. Moreover, each one can be expressed as a percentage.

Frequency Distribution Table 38

EXAMPLE: For a certain product and in a certain interval of time, 200 devices did
not pass electric tests on 3 different failure modes (BIN2, BIN6 and BIN8). The results
are summarized in the following Frequency Distribution Table:

NOTE: the categorical variable is “The Failure Mode”. The categories are BIN2, BIN6 and BIN8.

Category (BIN)  Frequency  Relative Frequency  Frequency %  Cumulative Frequency  Cumulative Frequency %
BIN 2           78         78/200=0.390        39%          0.390                 39%
BIN 6           49         49/200=0.245        24.5%        0.390+0.245=0.635     39+24.5=63.5%
BIN 8           73         73/200=0.365        36.5%        0.635+0.365=1         63.5+36.5=100%
TOTAL           200        1                   100%         (--)                  (--)
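
The same frequencies can be recomputed with a few lines of Python. This is a minimal sketch using only the counts given in the example; the variable names are chosen for illustration.

```python
from itertools import accumulate

# Failure counts from the example (200 failing devices in total)
counts = {"BIN 2": 78, "BIN 6": 49, "BIN 8": 73}
total = sum(counts.values())

# Relative frequency = frequency / total
relative = {category: f / total for category, f in counts.items()}

# Cumulative frequency = running sum of the relative frequencies
cumulative = dict(zip(counts, accumulate(relative.values())))

for category in counts:
    print(f"{category}: f={counts[category]:3d}  "
          f"rel={relative[category]:.3f} ({relative[category]:.1%})  "
          f"cum={cumulative[category]:.3f} ({cumulative[category]:.1%})")
```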

Bar Chart 39

WHAT IS IT?

• Bar charts are graphs used for qualitative (categorical) data.
• They consist of bars whose heights are the frequencies from the frequency distribution table (vertical axis), plotted against the categories of the variable (horizontal axis).
• The height of each bar shows the frequency or percentage for its category.
• It may be helpful to display the frequency distribution table under the chart.

Bar Chart 40

Bar chart (data from the previous example)
The vertical axis shows the frequencies.

Category (BIN)  Frequency
BIN 2           78
BIN 6           49
BIN 8           73
TOTAL           200
Bar Chart 41

Bar chart (data from the previous example)
The vertical axis shows the frequencies %.

Category (BIN)  Frequency %
BIN 2           39%
BIN 6           24.5%
BIN 8           36.5%
TOTAL           100%
How to generate bar chart in JMP
Pie Chart 42

WHAT IS IT?

• Pie charts present frequencies or percentages.
• Pie charts are often used for qualitative (categorical) data.
• The size of each pie slice shows the frequency or percentage for its category.

Pie Chart 43

EXAMPLE of Pie-chart (data from the previous example)

How to generate Pie chart in JMP


Pareto Diagram 44

WHAT IS IT?
• Used to portray categorical data.

• A bar chart, where categories are shown in descending order of frequency.

• A cumulative polygon is often shown in the same graph.

• Used to separate the “vital few” (the categories whose frequencies % sum to about 80%) from the “trivial many” (the remaining categories).
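
The ordering and the cumulative polygon of a Pareto diagram can be computed before any plotting. The sketch below reuses the BIN counts from the frequency distribution example; the variable names are illustrative.

```python
# Failure counts from the frequency distribution example
counts = {"BIN 2": 78, "BIN 6": 49, "BIN 8": 73}
total = sum(counts.values())

# Sort the categories by descending frequency, as in a Pareto diagram
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Build the cumulative percentage used for the cumulative polygon
running = 0
pareto = []
for category, freq in ordered:
    running += freq
    pareto.append((category, freq, 100 * running / total))

for category, freq, cum_pct in pareto:
    print(f"{category}: {freq:3d}  cumulative {cum_pct:.1f}%")
```

With only three categories of similar size there is no clear “vital few” here; the separation becomes useful when many categories are present.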

Pareto Diagram 45

EXAMPLE of Pareto diagram (data from the previous example)


The vertical axis shows the cumulative frequencies.

How to generate Pareto Plot in JMP


Graphical Presentations 46

For Numerical Data

Time series
• Line chart

Frequency & cumulative frequency
• Tabulating Data: Frequency Distribution Table
• Graphing Data: Histogram & ogive, Stem & leaf

Variables association
• Scatter plot

Line Chart 47

WHAT IS IT?
• A line chart shows the values of one or more variables over time.
• If several variables are plotted on the same graph, their comparative behavior can be investigated to show trends, differences, cyclic patterns etc.
• If the points are a statistic (e.g. an average), each point can be replaced by a box plot in order to show the spread.
• Time is measured on the horizontal axis.
• The variable of interest is measured on the vertical axis.

NOTE: this plot is helpful for categorical and for discrete variables as well.

Line Chart 48

Line chart : some examples

Frequency Distribution (Numeric Data) 49

WHAT IS IT?
A frequency distribution for a numeric variable is a list or a table
containing class groupings (categories or ranges within which the
data fall) and the corresponding frequencies with which data fall
within each class or category.
A frequency distribution is a way to summarize data.

The distribution condenses the raw data into a more useful form and
allows for a quick visual interpretation of the data.

Frequency Distribution (Numeric Data) 50

For numerical variables, determining the classes is not as obvious as for categorical variables: the classes are not “naturally” defined by the possible categories of the variable. Here, they are chosen arbitrarily (i.e. subjectively) in a non-unique way.
In the frequency distribution table, we might include: relative and relative % frequencies, cumulative and cumulative % frequencies.

A generic Frequency Distribution Table to group n observations:

Class    Frequency  Relative Frequency  Relative Frequency %  Cumulative Frequency  Cumulative Frequency %
CLASS 1
CLASS 2
TOTAL    n          1                   100%                  (--)                  (--)

Frequency Distribution 51

EXAMPLE: given the following group of 10 raw data, build a Frequency Distribution Table.

Raw data:
75.4786  74.6043  74.5925  73.7692  74.3453
74.4622  74.815   74.0306  75.6101  74.0489

min = 73.7692, max = 75.6101

A possible (non-unique) way of grouping is:
CLASS 1 → 73.0000 but less than 74.0000
CLASS 2 → 74.0000 but less than 75.0000
CLASS 3 → 75.0000 but less than 76.0000

Frequency Distribution Table

Class    Frequency  Relative Frequency
CLASS 1  1          0.1
CLASS 2  7          0.7
CLASS 3  2          0.2
TOTAL    10         1
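
The grouping in this example can be reproduced with a short Python sketch. The class limits are the ones chosen above; the half-open intervals (lower limit included, upper limit excluded) follow the wording “but less than”.

```python
data = [75.4786, 74.6043, 74.5925, 73.7692, 74.3453,
        74.4622, 74.8150, 74.0306, 75.6101, 74.0489]

# Class limits: each class is [lower, upper), so classes are mutually exclusive
edges = [73.0, 74.0, 75.0, 76.0]

frequencies = [0] * (len(edges) - 1)
for x in data:
    for i in range(len(edges) - 1):
        if edges[i] <= x < edges[i + 1]:
            frequencies[i] += 1
            break

relative = [f / len(data) for f in frequencies]

print(frequencies)  # [1, 7, 2]
print(relative)     # [0.1, 0.7, 0.2]
```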

Frequency Distribution 52

NOTES
1. Grouping data has clear interpretative advantages but as a result some detail is
lost (in fact, bins are also called “classes of equivalence” → all the observations
grouped in the same bin are considered equivalent. This implies that, once the
groups are formed, it will not be possible anymore to discriminate between
observations that belong to the same group). See also, Stem & Leaf Display.
2. Class limits must be chosen in order to guarantee mutually exclusive
classes, i.e. each observation can be included in one and only one class.

Number of Classes (K) 53

To define the number of classes (K), you might use the following rule of thumb:

Number of observations (n)  Number of classes (K)
n < 50                      5-7
50 ≤ n ≤ 100                7-8
101 ≤ n ≤ 500               8-10
501 ≤ n ≤ 1000              10-11
1001 ≤ n ≤ 5000             11-14
n > 5000                    14-20

Alternatively, other known rules are:
- K = square root of n
- K = 1+(10/3)log(n) (for sample sizes n<100)
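
The two alternative rules can be evaluated as follows. This sketch assumes that log means the base-10 logarithm, which the slide does not state explicitly; the function names are illustrative, and the results would normally be rounded to a whole number of classes.

```python
import math

def classes_sqrt(n):
    """K = square root of n."""
    return math.sqrt(n)

def classes_log(n):
    """K = 1 + (10/3) * log(n), assuming a base-10 logarithm."""
    return 1 + (10 / 3) * math.log10(n)

for n in (30, 50, 100):
    print(f"n={n:4d}  sqrt rule: {classes_sqrt(n):.1f}  log rule: {classes_log(n):.1f}")
```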

Class Width (W) 54

To generate K equally sized (*) classes (i.e. of uniform width), the width W of
each class is given by:

$W = \frac{\max - \min}{K}$

Where, max and min are the largest and the smallest sample values respectively.

(*) NOTE: it is possible to generate a histogram with unequal class widths but its interpretation is
different since the bar heights are not enough to “catch” the relative importance of each class.
For details, see ADCS 8482919_A, §6.1.1.1 CASE B, page 24/400.
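
As a quick check of the class width formula, the sketch below applies it to the 10 raw data of the frequency distribution example with K = 3. In that example the limits were instead rounded to whole numbers (width 1), which is an equally valid, non-unique choice.

```python
data = [75.4786, 74.6043, 74.5925, 73.7692, 74.3453,
        74.4622, 74.8150, 74.0306, 75.6101, 74.0489]

K = 3  # number of classes chosen for this illustration
W = (max(data) - min(data)) / K

print(f"min={min(data)}, max={max(data)}, W={W:.4f}")
```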

Histogram 55

WHAT IS IT?
• A graph of the data in a frequency distribution is called a histogram.
• The interval endpoints are shown on the horizontal axis.
• The vertical axis is either frequency, relative frequency, or percentage.
• Bars of the appropriate heights are used to represent the number of
observations within each class (no gaps between bars are allowed).
• A minimum number of 30-40 observations is required to obtain
interpretable results.
• Histograms are used to study shape (e.g. symmetry), location and
spread of the data.

Histogram 56

EXAMPLE: build a histogram for the following data:

75.4786  74.6043  74.5925  73.7692  74.3453
74.4622  74.8150  74.0306  75.6101  74.0489

STEP 1: choose the classes (number and width):
CLASS 1 → 73.0000 but less than 74.0000
CLASS 2 → 74.0000 but less than 75.0000
CLASS 3 → 75.0000 but less than 76.0000

STEP 2: create a frequency distribution table:

Class    Frequency  Relative Frequency
CLASS 1  1          0.1
CLASS 2  7          0.7
CLASS 3  2          0.2

STEP 3: draw the histogram for frequency or relative frequency:

[Figure: two histograms over the classes 73-74, 74-75 and 75-76; left: frequency (1, 7, 2); right: relative frequency (0.1, 0.7, 0.2)]
Histogram 57

QUESTIONS
1. How wide should each interval be?
(How many classes should be used?)

2. How should the endpoints of the intervals be determined?


• Often answered by trial and error, subject to user judgment
• The goal is to create a distribution that is neither too "jagged" nor too "blocky"
• The goal is to appropriately show the pattern of variation in the data

Histogram 58

Many (narrow class intervals)
• may yield a very jagged distribution with gaps from empty classes
• can give a poor indication of how frequency varies across classes

[Figure: histogram of Temperature with many narrow classes (upper endpoints 4, 8, …, 60, More); vertical axis: Frequency]

Few (wide class intervals)
• may compress variation too much and yield a blocky distribution
• can obscure important patterns of variation

[Figure: histogram of Temperature with few wide classes (upper endpoints 0, 30, 60, More); vertical axis: Frequency]

(X axis labels are upper class endpoints)

Histogram Interpretation 59

• The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the center.

[Figure: histogram of a symmetric distribution; horizontal axis: classes 1 to 9, vertical axis: Frequency]

How to generate Histogram in JMP


Histogram Interpretation 60

The shape of the distribution is said to be skewed if the observations are not symmetrically distributed around the center.

Positively Skewed Distribution
A positively skewed distribution (skewed to the right) has a longer tail that extends to the right, in the direction of positive values.
[Figure: histogram of a positively skewed distribution; horizontal axis: classes 1 to 9, vertical axis: Frequency]

Negatively Skewed Distribution
A negatively skewed distribution (skewed to the left) has a longer tail that extends to the left, in the direction of negative values.
[Figure: histogram of a negatively skewed distribution; horizontal axis: classes 1 to 9, vertical axis: Frequency]
Ogive 61

Use of the cumulative frequency distribution

EXAMPLE: build an ogive for the following data:

75.4786  74.6043  74.5925  73.7692  74.3453
74.4622  74.8150  74.0306  75.6101  74.0489

Class                Upper Interval Endpoint  Cumulative Frequency %
73 but less than 74  74                       10
74 but less than 75  75                       80
75 but less than 76  76                       100

[Figure: ogive; cumulative frequency % (10, 80, 100) plotted against the upper interval endpoints (74, 75, 76)]
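
The points of the ogive can be derived directly from the class frequencies. A minimal sketch, using the frequencies (1, 7, 2) obtained earlier for the three classes:

```python
from itertools import accumulate

frequencies = [1, 7, 2]          # classes 73-74, 74-75, 75-76
upper_endpoints = [74, 75, 76]   # the ogive is plotted at the upper endpoints

n = sum(frequencies)
cumulative_pct = [100 * c / n for c in accumulate(frequencies)]

ogive_points = list(zip(upper_endpoints, cumulative_pct))
print(ogive_points)  # [(74, 10.0), (75, 80.0), (76, 100.0)]
```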

Stem-and-Leaf Display 62

WHAT IS IT?

• It is a simple way to see distribution details in a data set

METHOD: Separate the sorted data series into leading digits (the stem) and trailing digits (the leaves)

• For example, the number 45 is the union of a stem, which is the tens digit (4), and a leaf, which is the ones digit (5).

Stem-and-Leaf Display 63

EXAMPLE: build a stem-and-leaf display for the following data:

X values (50 observations):
75.4786  74.6043  74.5925  73.7692  74.3453
74.4622  74.815   74.0306  75.6101  74.0489
74.3663  75.7653  74.5463  75.8547  75.0494
74.0381  75.4592  74.9785  75.2846  74.1965
73.7567  74.3161  74.5273  75.2975  72.4336
74.9129  74.5601  75.5037  73.1873  71.0224
73.9911  73.8043  74.0654  74.1178  75.2165
76.537   74.4765  74.0991  74.6521  76.8578
73.1728  74.3207  74.2735  75.1362  75.7757
76.0959  74.6455  74.8574  76.0823  74.3449

X sorted:
71.0224  72.4336  73.1728  73.1873  73.7567
73.7692  73.8043  73.9911  74.0306  74.0381
74.0489  74.0654  74.0991  74.1178  74.1965
74.2735  74.3161  74.3207  74.3449  74.3453
74.3663  74.4622  74.4765  74.5273  74.5463
74.5601  74.5925  74.6043  74.6455  74.6521
74.815   74.8574  74.9129  74.9785  75.0494
75.1362  75.2165  75.2846  75.2975  75.4592
75.4786  75.5037  75.6101  75.7653  75.7757
75.8547  76.0823  76.0959  76.537   76.8578

STEM and LEAF definition:
The stems are the integer part of the values and the leaves are the first decimal digit (in the plot below, the remaining decimals are dropped). So, for example, the value 75.4786 is formed by a stem=75 and a leaf=4.

The Stem-and-Leaf plot (stems | leaves):
LO|71,02
71|
71|
72|4
72|
73|11
73|7789
74|000001123333344
74|55556668899
75|0122244
75|56778
76|00
76|58
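
A stem-and-leaf display can be sketched in a few lines of Python. The example below is a simplified illustration: it uses the 10 values of the histogram example, takes the integer part as stem and the first decimal digit (without rounding) as leaf, and does not split each stem into two rows as the plot above does. It assumes positive values.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by integer part (stem); the leaf is the first decimal digit."""
    display = defaultdict(list)
    for x in sorted(values):
        stem = int(x)             # integer part
        leaf = int(x * 10) % 10   # first decimal digit, remaining decimals dropped
        display[stem].append(leaf)
    return dict(display)

data = [75.4786, 74.6043, 74.5925, 73.7692, 74.3453,
        74.4622, 74.8150, 74.0306, 75.6101, 74.0489]

display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(f"{stem}|{''.join(str(leaf) for leaf in leaves)}")
```

Each printed row keeps the individual first decimals, so less detail is lost than in a frequency distribution table.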

Presentations for Two Variables 64

• Graphs illustrated so far have involved only a single variable.
• When two variables exist, other techniques are used.

Presentations for 2 Variables
• Numerical: Scatter Plots
• Categorical: Cross Tables

Scatter Plot 65

WHAT IS IT?
• Scatter Plots are used for paired observations(*) taken from two numerical variables
• In a Scatter Plot, one variable is measured on the vertical axis and the other variable is measured on the horizontal axis.

(*) NOTE
The values plotted are: Pi=(Xi,Yi), i=1,…,n (i.e. P1=(X1,Y1), P2=(X2,Y2), …, Pn=(Xn,Yn))

Scatter Plot 66

Scatter Plots are very informative tools. For 2 variables X and Y, they make it possible to assess, for example:
• Association between X and Y: no association / association
• Type of association: linear / other
• Direction of the association: positive / negative
• Presence of outliers

[Figure: example scatter plots of Y against X illustrating each case]

How to generate Scatterplot in JMP


Double Histograms 67

Histograms can be produced to compare the distributions of two numerical variables. In the example below, the histograms of the variables X and Y are plotted on the same graph.

[Figure: double histogram of X (upper bars) and Y (lower bars) over the classes 345 to 355; vertical axis: frequency, 0 to 25 in each direction]

NOTE: For other graphical methods like this, see ADCS 8482819_A - §6.2.1.1

Cross Tables 68

WHAT IS IT?
• Cross Tables (or contingency tables) list the number of observations for every combination of values for two categorical or ordinal variables

• If there are r categories for the first variable (rows) and c categories for the second variable (columns), the table is called an r x c cross table

Cross Tables 69

EXAMPLE
4 x 3 Cross Table for:
Time dedicated to 4 types of Task (rows) by 3 Operators (columns)

Task \ Operator Operator A Operator B Operator C TOTAL


Type 1 46.5 55.0 27.5 129.0
Type 2 32.0 44.0 13.0 95.0
Type 3 15.5 20.0 19.0 49.0
Type 4 16.0 28.0 7.0 51.0
TOTAL 110.0 147.0 67.0 324.0

Side by Side Bar Chart 70

EXAMPLE
Using the data from the previous example, produce a “side by side bar chart”

[Figure: side-by-side bar chart of Time (vertical axis, 10 to 60) by task (Task Type 1 to Task Type 4), with one bar each for Operator A, Operator B and Operator C]

How to generate Side by Side chart in JMP


Activity 71

• Open the file TRAINING DATAStat1_Cal.xlsx


• Consider column F, “Measurement Equipment”. It contains the names of 4 pieces of Tester equipment. Each piece of equipment has been used a variable number of times. The total number of times the group of equipment has been used is 544. We want to assess the distribution of usage across the different pieces of equipment
• Generate the proper graphical representation

Answer 74
• Measurement Equipment – Barchart
• From excel, copy the column Measurement
Equipment to JMP data table
• Go to JMP, File > Edit > Copy with
Column name
• Follow the JMP routine on creating
Bar Chart

Answer 75
• Measurement Equipment - Piechart
• From excel, copy the column Measurement
Equipment to JMP data table
• Go to JMP, File > Edit > Copy with
Column name
• Follow the JMP routine on creating
Pie Chart

Activity 76

• Open the file TRAINING DATAStat1_Cal.xlsx


• Consider column B, “BALL SHEAR”. It contains 205 measurements of Ball Shear for
DFN8_Cu on FWB1106
• Generate the proper graphical representation

Answer 78
• BALL SHEAR Histogram
• From excel, copy the column BALL SHEAR to
JMP data table
• Click Analyze > Distribution
• Follow the JMP routine on creating Histogram

Activity 79

• Open the file TRAINING DATAStat1_Cal.xlsx


• Consider column G, “Relative Humidity(%)” and column H, “Tensile Strength”. They
contain 1000 values.
• Generate a scatterplot for these two variables and interpret it.

Answer 80

Scatterplot of “Relative Humidity (%)” VS. “Tensile Strength”


• From excel, copy the 2 columns to JMP data table
• Click Graph > Scatterplot Matrix
• Follow the JMP routine on creating Scatter Plot

INTERPRETATION
Graphical analysis of the plot shows:
• The two variables do not appear to be correlated

Module 3 Key Learnings 81

• Graphical Presentation for One Variable


• Frequency Distribution Table
• Bar Chart
• Pie Chart
• Pareto Diagram
• Line Chart
• Histogram
• Stem and Leaf

• Graphical Presentation for Two Variables


• Scatter Plot
• Double Histograms
• Cross (Contingency) Table
• Side by Side Bar Chart

Module 4: Descriptive Indices

Module 4 objectives 83

At the end of this chapter, you will be able to:

• Calculate and interpret the main “Indices of Location”


• Calculate and interpret the main “Indices of Spread”
• Interpret the main “Indices of association”
• Recognize a symmetrical distribution

Descriptive Indices 84

• Graphical Presentations are very useful in describing data
  • GOAL: help transform data into information
  • Provide an overall, quick, and intuitive idea about the behavior of processes
  • In some cases suffer from subjectivity (in their interpretation and in some steps of their creation)
  • Are easy to build
  • Can be useful for both numerical and categorical data

• Numerical Descriptions of Data provide complementary information
  • GOAL: the same as for graphical presentations
  • Provide information on specific aspects of the considered data (e.g. location and spread)
  • Thanks to their numerical nature, they do not suffer from subjectivity
  • Are easy to calculate
  • Are used for numerical data

Symbols 85

When referring to descriptive indices, conventional symbols are used:

Letters of the Latin alphabet for Sample Statistics, and letters of the
Greek alphabet for Population Parameters.

INDEX               SAMPLE STATISTIC  POPULATION PARAMETER
MEAN                $\bar{X}$         $\mu$ (or other, e.g. $\lambda$)
STANDARD DEVIATION  $S$               $\sigma$
VARIANCE            $S^2$             $\sigma^2$
PROPORTION          $P$               $\theta$

NOTE: when a sample statistic is used to estimate an unknown population parameter, this is indicated by the symbol ^ . For example: $\bar{X} = \hat{\mu}$ (read: «$\bar{X}$ is the estimated value of $\mu$»).

Descriptive Indices Outline 86

INDEX                                PROVIDES INFORMATION ABOUT
Mean, Median, Mode, Percentiles      Data Location (on a scale)
Range, Variance, Standard Deviation  Data Variation (Variability or Spread)
Skewness, Kurtosis                   Shape of data distribution (compared to the normal distribution)
Covariance, Correlation Coefficient  Association between variables

Measures of Location 87

AVERAGE: the arithmetic average.
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$

MEDIAN: the central value of the ordered sample (or, the 50th percentile).

MODE: the most frequently observed value.

PERCENTILE: the value greater than a certain % of the observations (e.g. the 80th percentile is greater than 80% of the values).

Mean 88

• The Average or mean is the most common measure of central tendency

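• For a population of N values:

$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \dots + x_N}{N}$

where the $x_i$ are the population values and N is the population size.

• For a sample of size n:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$

where the $x_i$ are the observed values and n is the sample size.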

Mean 89

• The most common measure of central tendency

• Mean = sum of values divided by the number of values

• Affected by extreme values (outliers)

Without an outlier (values 1, 2, 3, 4, 5):
Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3

With an outlier (values 1, 2, 3, 4, 10):
Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
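
The effect of the outlier on the mean can be checked with Python's standard library. The two small data sets are the ones used in the calculation above.

```python
import statistics

without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]   # the last value is an outlier

mean_without = statistics.mean(without_outlier)
mean_with = statistics.mean(with_outlier)

print(mean_without)  # 3
print(mean_with)     # 4
```

A single extreme value moves the mean from 3 to 4 even though the other four observations are unchanged.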

Median 90

MEDIAN

• Given an ordered (ascending order) sample of size n:


• The median is preceded and followed by 50% of the sample values
• The median occupies the 0.5(n+1)th position of the sample. This is only the
position of the median in the ordered sample, NOT its value.
• If n is odd, the median is the central observation.
• If n is even, the median is the average of the two central observations.
• If compared to the average, the median is more robust to outliers.
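
The position rule 0.5(n+1) can be turned into a short function. This is an illustrative sketch (the function name is an assumption); the result is compared with `statistics.median` from the standard library.

```python
import statistics

def median_by_position(values):
    """Median located at the 0.5(n+1)th position of the ordered sample."""
    ordered = sorted(values)
    n = len(ordered)
    position = 0.5 * (n + 1)           # 1-based position, not the value itself
    lower = ordered[int(position) - 1]
    if position.is_integer():          # n odd: the central observation
        return lower
    upper = ordered[int(position)]     # n even: average the two central observations
    return (lower + upper) / 2

print(median_by_position([1, 2, 3, 4, 5]))    # 3
print(median_by_position([1, 2, 3, 4, 10]))   # 3, unchanged by the outlier
print(median_by_position([1, 2, 3, 4]))       # 2.5
print(statistics.median([1, 2, 3, 4]))        # 2.5
```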

Median 91

“The median is more robust to outliers than the average”.

[Figure: two dot plots on a 0 to 10 axis; the second one contains an outlier]

Without the outlier: Median = 3, Average = 3
With the outlier: Median = 3, Average = 4

Mode 92

• A measure of central tendency

• Value that occurs most often

• Not affected by extreme values

• Used for either numerical or categorical data

• There may be no mode

• There may be several modes

[Figure: two dot plots; first plot (axis 0 to 14): Mode = 9; second plot (axis 0 to 6): No Mode]
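
The three situations (one mode, several modes, no mode) can be illustrated with a small function. The data sets are hypothetical, and treating a sample in which every value occurs only once as having “no mode” is a convention assumed for this sketch.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []   # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 3, 9, 9, 9, 12]))          # one mode
print(modes([0, 1, 2, 3, 4, 5, 6]))        # no mode
print(modes(["a", "b", "a", "b", "c"]))    # two modes, categorical data
```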

Mode 93

Bimodal distribution (2 modes)


[Figure: two bar charts, each titled "Symmetrical distribution", showing Frequency (0 to 10) by Classes.]

Percentile

DEFINITION
A percentile is a value for which a certain proportion of data falls above and
below it.

“The pth percentile is a value, Y(p), such that at most (100p)% of the
measurements are less than this value and at most 100(1- p)% are greater. The
50th percentile is called the median. Percentiles split a set of ordered data into
hundredths. For example, 70% of the data should fall below the 70th percentile”.

From: NIST/SEMATECH e-Handbook of Statistical Methods - http://www.itl.nist.gov/div898/handbook/

Quartile

QUARTILES
The quartiles are percentiles which divide the ordered sample into 4 parts, each containing the same amount of data. The 3 quartiles are generally indicated by Q1, Q2, and Q3.

25% 25% 25% 25%

Q1 Q2 Q3

given an ordered sample,


Q1, the first quartile, is the observation whose value is greater than 25% of the values of the whole sample
(and smaller than 75%). It occupies the 0.25 (n+1)th position of the ordered sample.
Q2, the second quartile (or median), is the observation whose value is greater (and smaller) than 50% of
the values of the whole sample. It occupies the 0.5 (n+1)th position of the ordered sample.
Q3, the third quartile, is the observation whose value is greater than 75% of the values of the whole sample
(and smaller than 25%). It occupies the 0.75 (n+1)th position of the ordered sample.
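
The position rule above can be sketched in code. The slides give only the position p(n+1); the linear interpolation used when that position is not a whole number is an assumption of this sketch, as are the helper name `value_at_position` and the sample values. Python's `statistics.quantiles`, with its default method, follows the same (n+1) rule and is used as a cross-check.

```python
from statistics import quantiles

def value_at_position(ordered, p):
    """Value at the p*(n+1)-th position (1-based) of an ordered sample.

    Interpolates linearly between neighbours when the position is fractional
    (an assumption: the slides define only the position itself).
    """
    n = len(ordered)
    position = p * (n + 1)
    if position <= 1:
        return ordered[0]
    if position >= n:
        return ordered[-1]
    lower = int(position)                  # whole part of the position
    fraction = position - lower
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

sample = sorted([12, 7, 3, 9, 15, 5, 10])  # hypothetical sample, n = 7
q1, q2, q3 = (value_at_position(sample, p) for p in (0.25, 0.5, 0.75))
print(q1, q2, q3)                          # positions 2, 4 and 6 of the ordered sample
print(quantiles(sample, n=4))              # cross-check with the standard library
```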

Measures of Variation

RANGE, INTERQUARTILE RANGE, VARIANCE, STANDARD DEVIATION

Measures of variation give information on the spread or variability of the data values.

Same center,
different variation

Range

RANGE
It is the difference between the largest and the smallest observations:
RANGE = Xmax - Xmin

Example (smallest observation 1, largest observation 14):

RANGE = 14 - 1 = 13

Range

“The Range does not consider the distribution of the observations”.

[Figure: two samples on a 7 to 12 axis, with observations distributed differently.]

In both cases: RANGE = 12 - 7 = 5

“The Range is sensitive to the presence of outliers”.


1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
RANGE = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
RANGE = 120 - 1 = 119

Interquartile Range

Some outlier problems can be eliminated by using the interquartile range (IQR):

Eliminate the high- and low-valued observations and calculate the range of the middle 50% of the data.

Interquartile range = 3rd quartile – 1st quartile

IQR = Q3 – Q1
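
To see how the IQR behaves compared with the range, the sketch below reuses the two 25-value samples from the Range slide (the second one ends with 120 instead of 5). Quartiles are taken from `statistics.quantiles`, whose default method uses the (n+1) position rule with linear interpolation; the function name `spread` is illustrative.

```python
from statistics import quantiles

base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]   # sample from the Range slide
with_outlier = base[:-1] + [120]               # same sample, 5 replaced by 120

def spread(sample):
    """Return (range, IQR) of a sample."""
    q1, _, q3 = quantiles(sample, n=4)
    return max(sample) - min(sample), q3 - q1

for name, sample in (("base", base), ("with outlier", with_outlier)):
    sample_range, iqr = spread(sample)
    print(f"{name}: range = {sample_range}, IQR = {iqr}")
```

The range jumps from 4 to 119, while the IQR is 1.5 for both samples.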

Box-Plot

Using the definitions of Quartiles and IQR, it is possible to create a very useful
graphical presentation: the box-plot, also called “box and whiskers plot”.
The elements needed to generate a box-plot are:
1. A “box”, defined by the IQR → it includes the central 50% of the observations
2. A line within the box, corresponding to the median (in addition, also the mean can be shown)
3. Two lines called “whiskers”, with length defined as follows:
• Upper Whisker (UW): if max(x)<Q3+1.5IQR => UW=max(x). Otherwise, UW=Q3+1.5IQR
• Lower Whisker (LW): if min(x)>Q1-1.5IQR => LW=min(x). Otherwise, LW=Q1-1.5IQR

Observations larger than Q3+1.5IQR OR smaller than Q1-1.5IQR are plotted outside the
whiskers and suspected to be outliers.
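
The whisker and outlier rules above can be sketched as a small function. It follows the definitions on this slide (whisker ends at Q3+1.5IQR or Q1-1.5IQR when the extreme value lies beyond them); note that many statistical packages instead end the whisker at the most extreme observation inside the limit. The function name `box_plot_elements` is illustrative, and the data are the Range slide sample containing the value 120.

```python
from statistics import quantiles

def box_plot_elements(sample):
    """Quartiles, whisker ends and suspected outliers, following the slide's rule."""
    q1, q2, q3 = quantiles(sample, n=4)
    iqr = q3 - q1
    upper_limit = q3 + 1.5 * iqr
    lower_limit = q1 - 1.5 * iqr
    return {
        "Q1": q1, "Q2": q2, "Q3": q3,
        "UW": max(sample) if max(sample) < upper_limit else upper_limit,
        "LW": min(sample) if min(sample) > lower_limit else lower_limit,
        "suspected_outliers": [x for x in sample
                               if x > upper_limit or x < lower_limit],
    }

sample = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]
elements = box_plot_elements(sample)
print(elements)
```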
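[Figure: box-plot on a "Values" axis. The BOX spans Q1 to Q3 (the IQR) with Q2 (Median) marked inside; the Lower Whisker (LW) and the Upper Whisker (UW) are each labeled 25%; points beyond the whiskers are marked as possible outliers.]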

Box-Plot
EXAMPLES

X minimum X maximum
• LW = X min
• UW = X max
• No outliers

X minimum Q3+1.5IQR
• LW = X min
• UW = Q3 + 1.5IQR
• 1 suspected outlier

Q1-1.5IQR X maximum
• LW =Q1 - 1.5IQR
• UW = X max
• 1 suspected outlier

Q1-1.5IQR Q3+1.5IQR
• LW = Q1 - 1.5IQR
• UW = Q3 + 1.5IQR
• 3 suspected outliers

Box-Plot

To compare two or more groups of data, it is helpful to display several box-plots on the same graph.

EXAMPLE

[Figure: box-plots of GROUP “A” and GROUP “B” displayed on a common "Values" axis.]

Variance

VARIANCE
Average of squared deviations of values from the mean

• For a population of N values:

$\sigma_x^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{(X_1 - \mu)^2 + (X_2 - \mu)^2 + \dots + (X_N - \mu)^2}{N}$

• For a sample (*) of size n:

$s_x^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}$

Where: μ = population mean
X̄ = sample mean
N = population size
n = sample size
X_i = ith value of the variable X

Standard Deviation

STANDARD DEVIATION
Square root of average of squared deviations of values from the mean

• For a population of N values:

$\sigma_x = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = \sqrt{\frac{(X_1 - \mu)^2 + (X_2 - \mu)^2 + \dots + (X_N - \mu)^2}{N}}$

• For a sample of size n:

$s_x = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}}$

Where: μ = population mean
X̄ = sample mean
N = population size
n = sample size
X_i = ith value of the variable X
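
The difference between the population formulas (divide by N) and the sample formulas (divide by n - 1) can be sketched as below. The data values and the helper `sum_of_squared_deviations` are hypothetical; the functions from Python's `statistics` module are used only as a cross-check.

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance, stdev, variance

def sum_of_squared_deviations(values):
    """Sum of squared deviations of the values from their mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

values = [1, 2, 3, 4, 5]                 # hypothetical data, mean = 3
n = len(values)
ssd = sum_of_squared_deviations(values)  # (-2)^2 + (-1)^2 + 0 + 1^2 + 2^2 = 10

population_variance = ssd / n            # treats the values as the whole population
sample_variance = ssd / (n - 1)          # treats the values as a sample
population_sd = sqrt(population_variance)
sample_sd = sqrt(sample_variance)

print(population_variance, sample_variance)
print(population_sd, sample_sd)
```

Dividing by n - 1 always gives a slightly larger value than dividing by n; the gap shrinks as the sample size grows.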

Standard Deviation

• Most commonly used measure of variation

• Shows variation about the mean

• It is sensitive to outliers

• Has the same units as the original data

Standard Deviation

EXAMPLES

Case A: X̄ = 15.5, S = 3.338
Case B: X̄ = 15.5, S = 0.926
Case C: X̄ = 15.5, S = 4.570

[Figure: three dot plots on a common 11 to 21 axis, with the same mean and different spread.]

Advantages of Variance & Standard Dev.

• Each value in the data set is used in the calculation

• Values far from the mean are given extra weight, because deviations from the mean are squared. On the other hand, this makes the standard deviation and the variance highly sensitive to the presence of outliers.

Coefficient of Variation

• Measures relative variation

• Always in percentage (%)

• Shows variation relative to mean

• Can be used to compare two or more sets of data measured in different units

$CV = \frac{s}{\bar{x}} \cdot 100\%$

Coefficient of Variation

EXAMPLE – comparison of the variability of two production lots

• LOT1:
• Average oxide thickness = 500
• Standard deviation = 15

$CV_{LOT1} = \frac{s}{\bar{x}} \cdot 100\% = \frac{15}{500} \cdot 100\% = 3\%$

• LOT2:
• Average oxide thickness = 650
• Standard deviation = 15

$CV_{LOT2} = \frac{s}{\bar{x}} \cdot 100\% = \frac{15}{650} \cdot 100\% = 2.3\%$

Both lots have the same standard deviation, but Lot 2 is less variable relative to its larger thickness.
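
The two-lot comparison can be reproduced with a one-line function. The averages and standard deviations are the ones given in the example; the function name is illustrative.

```python
def coefficient_of_variation(std_dev, average):
    """Coefficient of variation, in percent."""
    return 100 * std_dev / average

cv_lot1 = coefficient_of_variation(15, 500)
cv_lot2 = coefficient_of_variation(15, 650)

print(f"Lot 1: CV = {cv_lot1:.1f}%")
print(f"Lot 2: CV = {cv_lot2:.1f}%")
```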

Measures of Shape

SKEWNESS: compares the “asymmetry” of a distribution with the Normal distribution.

KURTOSIS: compares the shape of the “peak” of a (unimodal) distribution with the shape of the peak of a Normal distribution.

NOTE: in this course, the measures of shape are only mentioned. More details are included in the course “Statistics Level 2”.

Skewness – measure of asymmetry

SKEWNESS measures “the extent of the lack of symmetry” of a distribution, compared to the normal distribution (perfectly symmetrical and with skewness equal to zero).

Different shapes according to different values of skewness

SKN(X) = 0

SKN(X) > 0 SKN(X) < 0

Kurtosis – measure of peakedness

KURTOSIS (Departure from the Shape of the Peak of a Normal)

Different shapes according to different values of kurtosis

KUR(X) > 0

KUR(X) = 0

KUR(X) < 0

Asymmetry Mean and Median

In a symmetrical distribution, the mean and the median are the same value

Position of MEAN and MEDIAN in case of symmetry/asymmetry of the distribution.

Symmetry: Average = Median
Negative asymmetry: Average < Median
Positive asymmetry: Average > Median

Asymmetry and Box-Plot

In a symmetrical distribution,
◼ the mean and the median are the same value
◼ the distance between Q1 and Q2 is the same as the distance between Q2 and Q3, i.e. (Q2-Q1) = (Q3-Q2)

It is possible to check both conditions using a box-plot:

(Q2-Q1) > (Q3-Q2) => non-symmetrical

Q1 Q2 Q3

(Q2-Q1) = (Q3-Q2) => symmetrical

Q1 Q2 Q3

Indicating the mean with the symbol “ +”:


+ Mean = Median (Q2) => symmetrical

Q1 Q2 Q3

Indices for the association between variables

To study the association between 2 variables, two indices can be considered :

• The covariance
• The coefficient of correlation

Covariance

Use it to study the linear relationship between two random variables.

In particular, given two variables X1 and X2, this index provides information about:
• The existence of a linear relationship between X1 and X2.
• The direction of the relationship.

For a population of size N:

$Cov(X_1, X_2) = \sigma_{X_1 X_2} = \frac{\sum_{i=1}^{N} (X_{1i} - \mu_{X_1})(X_{2i} - \mu_{X_2})}{N}$

For a sample of size n:

$Cov(X_1, X_2) = s_{X_1 X_2} = \frac{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{n - 1}$

INTERPRETATION

cov(X1, X2) < 0 → negative linear relationship
cov(X1, X2) = 0 → no linear relationship
cov(X1, X2) > 0 → positive linear relationship

Since the covariance can take any value between -∞ and +∞, and its size depends on the units of the variables, this index is not of great help in assessing the intensity (or strength) of the linear association between the variables.
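
A minimal sketch of the sample covariance follows. The data are hypothetical and the function name is illustrative. The last case rescales one variable by 1000 (for example, a change of units) to show why the size of the covariance says little about the strength of the relationship.

```python
def sample_covariance(x1, x2):
    """Sample covariance of two equally long lists of values (divides by n - 1)."""
    n = len(x1)
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    return sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2)) / (n - 1)

x1 = [1, 2, 3, 4, 5]
increasing = [2, 4, 6, 8, 10]             # moves with x1
decreasing = [10, 8, 6, 4, 2]             # moves against x1
rescaled = [v * 1000 for v in increasing]  # same relationship, different units

print(sample_covariance(x1, increasing))  # positive
print(sample_covariance(x1, decreasing))  # negative
print(sample_covariance(x1, rescaled))    # much larger, same relationship
```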

Graphical interpretation Covariance

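[Figure: scatter plot in the (X, Y) plane, divided into four quadrants by the axes X’ and Y’ drawn through the point P = (mX, mY).]

• Upper right quadrant: (Xi − mX) > 0 and (Yi − mY) > 0 → contributes to COV(X,Y) > 0
• Upper left quadrant: (Xi − mX) < 0 and (Yi − mY) > 0 → contributes to COV(X,Y) < 0
• Lower left quadrant: (Xi − mX) < 0 and (Yi − mY) < 0 → contributes to COV(X,Y) > 0
• Lower right quadrant: (Xi − mX) > 0 and (Yi − mY) < 0 → contributes to COV(X,Y) < 0

Points concentrated in the lower left and upper right quadrants: positive linear relationship.
Points concentrated in the upper left and lower right quadrants: negative linear relationship.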

Pearson’s Correlation coefficient - r

Use it to study the linear relationship between two random variables.


In particular, given two variables X1 and X2, this index provides information about:
• The existence of a linear relationship between X1 and X2.
• The direction of the relationship.
• The intensity or strength of the linear relationship.

For a population of size N:

$\rho = corr(X_1, X_2) = \frac{Cov(X_1, X_2)}{\sigma_{X_1} \sigma_{X_2}}$

For a sample of size n:

$r = corr(X_1, X_2) = \frac{Cov(X_1, X_2)}{s_{X_1} s_{X_2}}$

INTERPRETATION

corr (X1, X2) = -1 → perfect negative linear relationship
-1 < corr (X1, X2) < 0 → (different intensities of) negative linear relationship
corr (X1, X2) = 0 → no linear relationship
0 < corr (X1, X2) < 1 → (different intensities of) positive linear relationship
corr (X1, X2) = 1 → perfect positive linear relationship
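
The sample formula for r can be sketched directly from the definitions of covariance and standard deviation. The data are hypothetical and the function name `pearson_r` is illustrative. Unlike the covariance, r does not change when a variable is rescaled.

```python
from math import isclose, sqrt

def pearson_r(x1, x2):
    """Sample Pearson correlation: covariance divided by both standard deviations."""
    n = len(x1)
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2)) / (n - 1)
    s1 = sqrt(sum((a - mean1) ** 2 for a in x1) / (n - 1))
    s2 = sqrt(sum((b - mean2) ** 2 for b in x2) / (n - 1))
    return cov / (s1 * s2)

x1 = [1, 2, 3, 4, 5]
perfect = [2, 4, 6, 8, 10]               # exact straight line
noisy = [2, 1, 4, 3, 5]                  # increasing trend with scatter
rescaled = [v * 1000 for v in noisy]     # same data in different units

print(pearson_r(x1, perfect))
print(pearson_r(x1, noisy))
print(pearson_r(x1, rescaled))
```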

Pearson’s Correlation coefficient - r

Correlation coefficient values for different degrees of association between variables X1 and X2.

[Figure: six scatter plots of X2 versus X1, labeled A to F.]

A: Perfect negative linear correlation (r = -1).
B: Perfect positive linear correlation (r = 1).
C: No linear correlation (r = 0).
D: No linear correlation (r = 0).
E: Negative linear correlation, weak (r = -0.32).
F: Positive linear correlation, mild (r = 0.58).

Pearson’s Correlation coefficient

EXAMPLE
Evaluate the extent of the correlation between two electrical parameters: Isat and Vt of a MOS transistor in C045 nm technology. A sample of 225 pairs of values (Isat and Vt) is drawn. Data are summarized in the following table. For brevity, only the first and last 5 rows of the original data-set are shown in the table; however, the analysis was performed on the entire data-set.

The Data-set

LOT_WAFER_SITE    PIDS04L006LS (µA/µm)    PVT06L004LS (V)
Q135WEZ_10_1      -323.5                  -0.3588
Q135WEZ_10_2      -322.5                  -0.3727
Q135WEZ_10_3      -311.6667               -0.3468
Q135WEZ_10_4      -325.75                 -0.3543
Q135WEZ_10_5      -333.5                  -0.3448
(…)               (…)                     (…)
Q135WEZ_9_5       -321                    -0.3707
Q135WEZ_9_6       -284.1667               -0.3823
Q135WEZ_9_7       -305.3333               -0.3912
Q135WEZ_9_8       -319.8333               -0.3645
Q135WEZ_9_9       -329.8333               -0.3482

Pearson’s Correlation coefficient

EXAMPLE (continuation)

[Figure: scatter plot of PIDS04L006LS (µA/µm, vertical axis from -360 to -260) vs PVT06L004LS (V, horizontal axis from -0.43 to -0.31).]

Correlation matrix

                        PIDS04L006LS (µA/µm)    PVT06L004LS (V)
PIDS04L006LS (µA/µm)                            -0.7405
PVT06L004LS (V)         -0.7405

Pearson’s correlation coefficient is -0.7405. This suggests a strong linear relationship between the variables. Since the correlation coefficient is negative, as one variable increases, the other tends to decrease (negative correlation).

Notes on Pearson’s Correlation coefficient

❑ The correlation coefficient is a unit-free index whose value must lie between -1 and +1
inclusive. For this reason, in addition to the existence and direction of the relationship, this
index provides information on the intensity of the linear relationship between two variables.
❑ The Pearson correlation coefficient assumes that the two considered variables jointly form a
bivariate normal distribution. This aspect will be explained in the course “Statistics Level
2”, where alternative approaches (in case this assumption is not true) are also considered.
❑ A value of +1 would result if all the points could be connected by a straight line with a
positive slope.
❑ A value of -1 would occur if all the points could be connected by a straight line with a
negative slope. Neither extreme case could be expected to occur in practice, however.
❑ The intensity of the linear relation between X and Y is higher as the correlation gets closer to ±1.

❑ If the random variables X and Y are independent, then the correlation coefficient is 0.
However, the converse is not true, since only the linear relationship is detectable by the
correlation coefficient (for example, the relationship may be quadratic).
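
The last note can be illustrated with a purely quadratic relationship. In the sketch below (hypothetical data, illustrative function name), Y is completely determined by X, yet the correlation coefficient is zero because the relationship is not linear.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]      # y depends on x exactly, but not linearly

r = pearson_r(x, y)
print(r)                     # zero: no linear relationship is detected
```

This is why a scatter plot should always accompany the coefficient: r = 0 rules out a linear relationship only, not every kind of dependence.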

Activity

• Open the file TRAINING DATAStat1_Cal.xlsx


• Consider the column B, “Ball Shear”.
• Use the statistical package to calculate all the statistics explained in this module

60 minutes
Answer

JMP

Activity

• Open the file TRAINING DATAStat1_Cal.xlsx


• Consider the column B, “Ball Shear”.
• Use the statistical package to generate a boxplot and interpret the result

60 minutes
Answer

Graphical evidence of a symmetrical distribution

Module 4 Key Learnings

• Measures of location
  • Mean
  • Median
  • Mode
  • Percentile
  • Quartile

• Measures of spread
  • Range
  • Interquartile Range
  • Variance
  • Standard Deviation
  • Coefficient of variation

• Measures of shape
  • Skewness and Kurtosis
  • How to use a Box-Plot to check if a distribution is symmetrical

• Measures of association
  • Covariance
  • Pearson’s correlation coefficient

Module 5: Random Variables

Module 5 objectives

At the end of this chapter, you will be able to:


• Discriminate between discrete and continuous random variables
• Describe the main properties of distribution functions and cumulative distribution
functions

Random Variable

WHAT IS IT?
Statistics and Mathematics are not the same! However, they often use the same terms.
But with different meanings…

In Algebra, a variable is an unknown quantity. Usually, the problem consists in finding out its
value. For example, given the equation 12-3x=0, we can find that x=4.

In Statistics, a Random Variable is different…


X is called a “variable” because its value can “vary” within a set of possible values. X can take on any of those values, and it does so randomly. That’s why X is called a random variable.

Random Variable

Lowercase VS Capital letters

Conventionally,

• A random variable is given a capital letter, e.g. X, Y, Z, W,…


• Values (that a random variable can take on) are given lowercase letters, e.g. {x1, x2, …, xn}
for the random variable X, {y1, y2, …, yn} for the random variable Y, and so on.

So, we write for example: X={x1, x2, …, xn} and read “the random variable X can take on the
values x1, x2, …, xn”.

Random Experiment & Random Variable

DEFINITIONS

(1) Random Experiment


• A Random Experiment is the activity needed to collect data on a specific aspect of a process (“process” is meant here in a general sense and does not refer only to a “production process”: tossing a coin is a process too). The outcome of the experiment can be either:
• The result of a measurement process (e.g. “the thickness of a layer” or the result of a “pull test in wire bonding”)
• A count (e.g. “number of failed dice in a wafer”)
• The experiment is called random because its outcome takes on different values according to some random mechanism, and its result cannot be predicted in any trial. The idea of randomness is that the value of the random variable will vary from trial to trial as the experiment is repeated, according to the inherent variability of the considered process.

(2) Random Variable


• A Random Variable is the set of possible values from a Random Experiment. Or, in other terms,
a Random Variable takes values which are the outcome of a Random Experiment.

• The set of possible values is called the Sample Space.

Continuous & Discrete Random Variables

Continuous Random Variables take on any value in an interval, including fractions and decimals.
Examples: results of a “pull test”, measures of thickness.

Discrete Random Variables take on integer values (a countable list of distinct values) only. They never include fractions or decimals.
Example: count of “good” dice in a wafer (or at FT).

Probability Model

Statistics helps us by providing probability models.

DEFINITIONS
(1) Probability Model
It is a mathematical model that relates a value of a random variable to the probability of occurrence of that value in the population.

(2) Probability Distribution
It is the probability model associated with a discrete random variable.

(3) Probability Density Function
It is the probability model associated with a continuous random variable.

Probability Distribution & Density Function

DISCRETE RANDOM VARIABLE → PROBABILITY DISTRIBUTION
Example: what is the probability that, in a wafer randomly selected from a lot, the number of defective dice (the variable X) is 3?

CONTINUOUS RANDOM VARIABLE → PROBABILITY DENSITY FUNCTION
Example: what is the probability that, for a wafer randomly selected, the value of thickness (the variable X) is included in the interval (x1;x2)?

Probability Distribution & Density Function

Main properties of the Probability Distribution and of the Density Function


Probability Distribution
• It is indicated by P(X=x)=P(x) → read: “the probability that X takes on the value x”
• 0 ≤ P(x) ≤ 1 for every possible value x
• $\sum_{\text{all possible } x} P(x) = 1$: the sum of P(x) over all the possible values of x is equal to one

Density Function
• It is indicated by f(x)
• We generally refer to the probability that X belongs to an interval of possible values
• P(X = x0) = 0: the probability that X is exactly equal to a single value x0 is equal to zero
• $\int_{\text{all possible } x} f(x)\,dx = 1$: the probability that X belongs to the interval of all the possible values is 1
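
The two properties of a probability distribution can be checked on a concrete discrete model. The sketch below assumes, purely for illustration, that the number of defective dice in a wafer follows a binomial model; the wafer size (50 dice) and the defect probability (0.02) are hypothetical values, not data from the course.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial model with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n_dice, p_defect = 50, 0.02          # hypothetical wafer size and defect probability
distribution = {k: binomial_pmf(k, n_dice, p_defect) for k in range(n_dice + 1)}

total = sum(distribution.values())   # should be 1, up to rounding error
p_three = distribution[3]            # P(X = 3): exactly 3 defective dice

print(f"sum of P(x) = {total:.6f}")
print(f"P(X = 3) = {p_three:.4f}")
```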

Theoretical Probability Models

QUESTION
“So, how can Statistics help us in calculating the probability that a certain event takes place?”

ANSWER
“Statisticians have defined many different probability models that can be used to describe real-world phenomena. They are called Theoretical Probability Models.”

Real-world phenomena do not all behave the same way.
→ We need different models to interpret their different behaviors.

[Figure: some Theoretical Probability Models, or DISTRIBUTIONS, grouped into DISCRETE DISTRIBUTIONS and CONTINUOUS DISTRIBUTIONS.]
The Normal Distribution

If a Random Variable X is normally distributed, its density function is given by:

$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}$

Where, e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
μ = the population mean
σ = the population standard deviation
x = any value of the continuous variable, −∞ < x < ∞
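
The density formula translates directly into code. The sketch below writes it out term by term and compares it with `NormalDist.pdf` from Python's `statistics` module; the parameter values (μ = 13, σ = 1) and the evaluation points are illustrative.

```python
from math import exp, isclose, pi, sqrt
from statistics import NormalDist

def normal_density(x, mu, sigma):
    """Normal density f(x) for mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

mu, sigma = 13, 1
for x in (11, 12, 13, 14, 15):
    by_formula = normal_density(x, mu, sigma)
    by_library = NormalDist(mu, sigma).pdf(x)
    print(f"x = {x}: f(x) = {by_formula:.5f} (library: {by_library:.5f})")
```

The density is highest at x = μ and takes the same value at points placed symmetrically around μ.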

The Normal Distribution

[Figure: Normal Density Function f(X) plotted against X, a bell-shaped curve centered at μ with spread σ.]

Main features of the Normal Distribution:


• It is symmetric
• It is unimodal
• It is bell-shaped
• It is defined by two parameters:
• μ – the mean of the distribution (-∞ < μ < ∞)
• σ² – the variance of the distribution (σ² > 0)

Graphical Interpretation of σ

For the Normal distribution, a graphical interpretation of the standard deviation σ makes it possible to estimate the expected percentage of the population values falling between the limits μ ± kσ (k=1,2,3). This is illustrated in the following figure:
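
The percentages for μ ± kσ can be computed from the cumulative distribution function of the normal distribution. The sketch below uses the standard normal (μ = 0, σ = 1) from Python's `statistics` module; the result is the same for any μ and σ, because the limits are expressed in units of σ.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling between mu - k*sigma and mu + k*sigma, for k = 1, 2, 3
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, probability in coverage.items():
    print(f"mu ± {k} sigma: {100 * probability:.2f}% of the population")
```

This gives about 68.27%, 95.45% and 99.73% for k = 1, 2 and 3.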

Variations of the Parameters

By varying the parameters μ and σ, we obtain different normal distributions

Variations of the parameters

Changing μ shifts the distribution left or right.

Changing σ increases or decreases the spread.

[Figure: normal density f(x) plotted against x, centered at μ with spread σ.]

Graphical Assessment of Normality

• In many situations it is necessary to know whether the sample data on which we do our analyses come from a Normal distribution.

• This can be initially assessed using the graphical methods presented here.

• Notes:
• At a graphical level, we can only produce a partial assessment of normality: we can be confident about the conclusion from a graph that clearly indicates non-normality. Conversely, if the graph suggests a normal behavior, some doubts should remain.
• To obtain more complete information, other procedures should be employed (→ Inferential methods, which are not presented in this course).

Graphical Assessment of Normality

Graphical Methods to Assess Normality

Histogram Normal Probability Plot

Histogram to Assess Normality

• The Normal distribution has some typical characteristics.


• It is symmetrical
• It is bell-shaped
• It is Unimodal (only one Mode)

If the histogram presents ALL these features, there is graphical evidence of normality.

Histogram to Assess Normality

EXAMPLES: histograms of samples drawn from different distributions are shown.

UNIFORM Distribution (5-7)


• It seems symmetrical
• It is NOT bell-shaped
• It is NOT unimodal

Conclusion: Non-Normal distribution

GAMMA Distribution (3,1.2)


• It is NOT symmetrical
• It is NOT bell-shaped
• It is unimodal

Conclusion: Non-Normal distribution

Histogram to Assess Normality

BIMODAL (2 Normal together)


• It seems symmetrical
• It is NOT bell-shaped
• It is NOT unimodal

Conclusion: Non-Normal distribution

NORMAL Distribution (13,1)


• It is symmetrical
• It is bell-shaped
• It is unimodal

Conclusion: Normal distribution

Normal Probability Plot

Normal probability plot

How to build & use it:


• Arrange data from low to high values
• Find cumulative normal probabilities for all values
• Examine a plot of the observed values vs. cumulative probabilities (with the
cumulative normal probability on the vertical axis and the observed data values
on the horizontal axis)
• Evaluate the plot for evidence of linearity: a linear pattern of the plotted points is evidence of normality, while non-linear patterns indicate (according to their shape) different, non-normal distributions
• In the following examples, some cases are illustrated. The red asterisks refer to
the first and the third quartile (Q1 and Q3) of the plotted data
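
The construction steps above can be sketched without any plotting library. The code pairs each ordered observation with a normal quantile and then measures how close the points are to a straight line. Several choices here are assumptions of this sketch, not part of the course material: the plotting position (i - 0.5)/n, the use of the correlation between the two coordinates as a numerical summary of linearity, and the two idealized samples, which are built from theoretical quantiles so that the example is reproducible.

```python
from math import log, sqrt
from statistics import NormalDist

def normal_probability_points(sample):
    """Pairs (normal quantile, observed value) for a normal probability plot."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting position (i - 0.5) / n is one common convention (assumed here).
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

def linearity(points):
    """Correlation between the two coordinates of the plotted points."""
    n = len(points)
    mean_q = sum(q for q, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    sqx = sum((q - mean_q) * (x - mean_x) for q, x in points)
    sqq = sum((q - mean_q) ** 2 for q, _ in points)
    sxx = sum((x - mean_x) ** 2 for _, x in points)
    return sqx / sqrt(sqq * sxx)

n = 50
positions = [(i - 0.5) / n for i in range(1, n + 1)]
normal_sample = [NormalDist(13, 1).inv_cdf(p) for p in positions]  # idealized normal data
skewed_sample = [-log(1 - p) for p in positions]                   # idealized right-skewed data

r_normal = linearity(normal_probability_points(normal_sample))
r_skewed = linearity(normal_probability_points(skewed_sample))
print(f"normal sample: {r_normal:.4f}, right-skewed sample: {r_skewed:.4f}")
```

A value very close to 1 corresponds to points lying on a straight line; the skewed sample bends away from the line and gives a visibly lower value. The plot itself remains the main tool, since the shape of the departure from linearity also suggests the kind of non-normality.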

Normal Probability Plot

[Figure: four normal probability plots, each plotting Data Quantiles against Normal Quantiles.]
• NORMAL DISTRIBUTION
• SYMMETRICAL NON-NORMAL (t)
• SYMMETRICAL NON-NORMAL (Uniform)
• SYMMETRICAL NON-NORMAL (Bimodal)


Normal Probability Plot

[Figure: two normal probability plots, each plotting Data Quantiles against Normal Quantiles.]
• LEFT-SKEWED DISTRIBUTION
• RIGHT-SKEWED DISTRIBUTION

Activity

• Open the file TRAINING DATAStat1_Cal.xlsx


• Consider the column A, “Package Thickness”. It contains 1000 measurements of
thickness
• Generate the proper graphical representation
• Interpret the graph

Answer

Evidence of a bimodal distribution

Activity

• Open the file TRAINING DATAStat1_Cal.xlsx


• Consider the column I, “Oxide Thickness”. It contains 118 measurements of thickness
• Generate the proper graphical representation to study the shape of its distribution
• Interpret the graph

Answer
• OXIDE THICKNESS

Evidence of a symmetrical, bell-shaped distribution

Activity

• Open the file TRAINING DATA.xlsx


• Consider the column C, “Punch Diameter”. It contains 138 measurements of Tape
Punch hole diameter
• Generate the proper graphical representation to study the shape of its distribution
• Interpret the graph

Answer

• Micromodule Tape punch hole diameter

Evidence of a non-symmetrical distribution (right-skewed)

Activity

• Open the file TRAINING DATA.xlsx


• Consider column A , “Package Thickness” and column B, “Ball Shear”.
• For each sample, generate the proper graphical representation to assess normality
• Interpret the graphs comparing the two populations

Answer

Graphical assessment of normality

Evidence of:
• “Package Thickness” → non-normal distribution (bimodal)
• “Ball Shear” → approximately normal behavior

Module 5 Key Learnings

• Random Variables
  • Random Experiment
  • Discrete Random Variables
  • Continuous Random Variables
  • Probability Distributions
  • Density Functions
  • Theoretical Probability Models

• The Normal Distribution
  • Main Features
  • Graphical Interpretation of the Standard Deviation
  • Variations of the Parameters
  • Graphical evaluation of normality

Conclusion
• You understand the difference between population and sample.

• You can describe many features of a sample using both graphical methods and numerical methods.

• You know the meaning of random variables and are familiar with the normal distribution.

• You can compare populations descriptively.

• You can interpret a Normal Probability Plot (NPP).

Conclusion
• What you could do next to further improve your statistical competency:
• Use what you have learned as much as possible, starting tomorrow!
• The only way to avoid forgetting what you learned: do not wait too long after the course to start implementing the techniques shown in the training.
• Think about attending the next training course, “Statistics Level 2”
• You will learn:
• How to use sample data to assess important aspects of the entire population (inference)
• How to perform a statistical test of hypotheses
• How to determine if the dataset contains outliers

Post-test

• Complete the post-test to the best of your knowledge

It allows us to measure the learning that has taken place during the
training.

10-15 minutes

Customer satisfaction

How can we improve for next time?

Kirkpatrick Level 1 evaluation questionnaire

You will receive an e-mail. Please take 5 minutes to complete the evaluation form: this will help us to continually improve the learning content, facilitation and organization.

CONGRATULATIONS!!

JMP Routines

How to make Bar chart in JMP
File: Bar Pie and Pareto.jmp

Click File > New > Data Table

Key in data in empty data table

Go to Graph Menu > Chart

From the Select Columns click on Freq > Statistics, then Select % of Total

How to make Bar chart in JMP

Click on Category (Bin) then

Check that the type of chart is set on Bar Chart
(Other chart options available)

Click the OK button

How to make Pie chart in JMP

Click on Category (Bin) then

Check that the type of chart is set on Pie Chart
(Other chart options available)

Click the OK button

How to make Pareto in JMP

Click File > New > Data Table


Key in data in empty data table

Go to Graph Menu > Pareto Plot

From the Select Columns click on Category > Y,Cause, then Freq > Freq

How to make Pareto Plot in JMP

Right click on the Pareto Plot bar (enclosed in the box) and select Label Cum Percent Points to display % for each category

How to make Histogram in JMP
File: Thickness.jmp

Open file Thickness.jmp

Click Analyze > Distribution

Cast Thickness column in Y,columns

Click OK

How to make Histogram in JMP

Initial output display:

Click on red hotspot, then click Stack

Stacked output display:

How to make Scatterplot in JMP

File: Absentee Rate.jmp

Open file Absentee Rate.jmp
Click Graph > Scatterplot Matrix

Cast Absences to Y, columns and Experience in X. Click OK

Objective of the study: To know if long-term employees would be more reliable and absent less often
How to make Scatterplot in JMP

• Does it look like experience influences the absentee rate?

How to make Side by Side chart in JMP
File: Side by side.jmp
Click on Graph > Chart

Assign Data Column to Statistics (Data), and Task/Operator columns to Categories,X,Levels

How to make Side by Side chart in JMP

Click the OK button