STATISTICS 1 - CAL Edited
Speaker Name
ST Context 2
• The main goals of STatS are to review, rationalize, and improve the
effectiveness of statistical methodology in general.
ST Restricted
Statistics Learning 3
• Measurement System Analysis (MSA)
• Statistical Model Building
• Design of Experiments (DOE)
• Multivariable Statistics
Why this course? 4
Training purpose 5
Benefits 6
Let’s get to know each other… 7
Round table:
• Name
• Organization
• Are you already using statistical methodology?
• If so, what are the main applications?
• Expectations from the course
Pre-test 8
10 minutes
Structure of the course
Structure of the course/Agenda 10
Module 3
• Graphical Presentation of Data
• Presentations for Numeric and Categorical Data
• Presentations for One Sample or for Two Samples
Module 4
• Descriptive Indices for Location and Spread
Conclusion
Module 1: Introduction
Module 1 objectives 12
Why do we need Statistics? 13
• The yield of a certain product will be higher in six months than it is now.
Why do we need Statistics? 14
The Decision Making Process 15
Decision
Key Definitions 16
Key Definitions 17
Descriptive and Inferential Statistics 18
• Descriptive statistics
• Collect, summarize, and process data to transform data into information
• Inferential statistics
• Provide the basis for predictions, forecasts, and estimates that are used to
transform information into knowledge
Descriptive and Inferential Statistics 19
POPULATION: N = 10,000, (true) mean = ?
SAMPLE (obtained by sampling): n = 500, average = 56.2
Descriptive statistics
From the 500 sample values, we calculate the average.
We might also generate some graphs.
Possible error → calculation error.
Inferential statistics
We are actually interested in a characteristic of the entire population,
not only in a description of the sample values. → We can ESTIMATE, for example,
the value of the mean of the entire population.
Error → inferential error: a sample never represents the entire population perfectly.
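The population/sample picture above can be mimicked with a short simulation. This is only a sketch: the population is synthetic (an assumed normal distribution with mean 56 and standard deviation 10), and the names `population` and `sample` are hypothetical, chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic population of N = 10,000 values (assumed distribution, illustration only).
population = [random.gauss(56, 10) for _ in range(10_000)]

# Draw a simple random sample of n = 500 without replacement.
sample = random.sample(population, 500)

# Descriptive statistics: summarize the values we actually observed.
sample_mean = statistics.mean(sample)

# Inferential statistics: the sample mean is used as an estimate of the
# population mean. The true mean is known here only because we simulated it.
true_mean = statistics.mean(population)
inferential_error = sample_mean - true_mean

print(f"sample mean = {sample_mean:.2f}, true mean = {true_mean:.2f}, "
      f"error = {inferential_error:+.2f}")
```

The printed error is generally small but not zero, which is the inferential error described on the slide.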
Descriptive and Inferential Statistics 20
Descriptive
•Collect data
• e.g., Survey
•Present data
• e.g., Tables and Graphs
•Summarize data
• e.g., Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Descriptive and Inferential Statistics 21
Inference
•Estimation
• e.g., Estimate the population mean
weight using the sample mean weight
•Hypothesis testing
• e.g., Test the claim that the population
mean weight is 120 pounds
Inference is the process of drawing conclusions or making
decisions about a population based on sample results
Module 1 Key Learnings 22
• Key definitions:
• Population vs. Sample
• Parameter vs. Statistic
• Descriptive vs. Inferential statistics
Module 2: Types of Data
Module 2 objectives 24
Types of Data: Variables Classification 25
Variable
• Numerical: Continuous or Discrete
• Categorical
Types of Data: Classification Details 26
Types of Data: Classification Details 27
Levels of Measurement 28
Numerical
Interval Data: differences between measurements are meaningful, but there is no true zero.
EXAMPLE: Temperature in Fahrenheit or Celsius.
Categorical
Nominal Data: categories with no meaningful ordering or direction.
EXAMPLE: Type of car owned, marital status, …
Activity 29
VARIABLE TYPE
Module 2 Key Learnings 31
• Types of variables
• Levels of Measurement
Module 3: Graphical Presentation of Data
Module 3 objectives 33
Graphical Presentations 34
Consider that:
Types of Variables VS Types of Graphs 35
For…
Graphical Presentations 36
Bar Chart
Frequency Distribution Table
Pie Chart
Pareto Diagram
Frequency Distribution Table 37
WHAT IS IT?
A Frequency Distribution Table is a simple way to evaluate how the
frequencies are distributed across the possible values of a
variable of interest.
Frequency Distribution Table 38
EXAMPLE: For a certain product and in a certain interval of time, 200 devices did
not pass electric tests on 3 different failure modes (BIN2, BIN6 and BIN8). The results
are summarized in the following Frequency Distribution Table:
NOTE: the categorical variable is “The Failure Mode”. The categories are BIN2, BIN6 and BIN8.
Bar Chart 39
WHAT IS IT?
Bar Chart 40
Bar chart (data from the previous example). The vertical axis shows the frequencies.
Category (BIN)   Frequency
BIN 2            78
BIN 6            49
BIN 8            73
TOTAL            200
Bar Chart 41
Bar chart (data from the previous example). The vertical axis shows the % frequencies.
Category (BIN)   Frequency %
BIN 2            39%
BIN 6            24.5%
BIN 8            36.5%
TOTAL            100%
How to generate bar chart in JMP
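Outside JMP, the same frequency table can be produced in a few lines. The sketch below is illustrative only: the list `failures` is hypothetical raw data, constructed so that its counts match the example (78, 49 and 73 failed devices).

```python
from collections import Counter

# Hypothetical raw data: one failure-mode label per failed device,
# built to match the counts of the example.
failures = ["BIN2"] * 78 + ["BIN6"] * 49 + ["BIN8"] * 73

counts = Counter(failures)
total = sum(counts.values())

# Frequency distribution table: absolute frequency and frequency % per category.
table = {category: (n, 100 * n / total) for category, n in sorted(counts.items())}

for category, (n, pct) in table.items():
    print(f"{category:<6}{n:>5}{pct:>8.1f}%")
print(f"{'TOTAL':<6}{total:>5}{100.0:>8.1f}%")
```

The absolute frequencies give the heights of the bars in the first bar chart, and the percentages those of the second.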
Pie Chart 42
WHAT IS IT?
Pie Chart 43
WHAT IS IT?
• Used to portray categorical data.
Pareto Diagram 45
Line Chart 47
WHAT IS IT?
• A line chart shows the values of one or more variables over time.
• If more variables are plotted on the same graph, the comparative
behavior can be investigated to show trends, differences, cyclic
patterns etc.
• If the points are a statistic (e.g. an average), the points can be
replaced by a box-plot in order to show the spread.
• Time is measured on the horizontal axis.
• The variable of interest is measured on the vertical axis.
NOTE: this plot is helpful for categorical and for discrete variables as well.
Line Chart 48
Frequency Distribution (Numeric Data) 49
WHAT IS IT?
A frequency distribution for a numeric variable is a list or a table
containing class groupings (categories or ranges within which the
data fall) and the corresponding frequencies with which data fall
within each class or category.
A frequency distribution is a way to summarize data.
The distribution condenses the raw data into a more useful form and
allows for a quick visual interpretation of the data.
Frequency Distribution (Numeric Data) 50
For numerical variables, determining the classes is not as obvious a task as it is
for categorical variables: the classes are not “naturally” defined by the possible
values of the variable. Instead, they are chosen arbitrarily (i.e. subjectively),
and the choice is not unique.
The frequency distribution table might include relative and relative %
frequencies, and cumulative and cumulative % frequencies.
Frequency Distribution 51
EXAMPLE
Raw data (n = 10): 75.4786, 74.6043, 74.5925, 73.7692, 74.3453, 74.4622, 74.8150, 74.0306, 75.6101, 74.0489
min = 73.7692, max = 75.6101
A possible (non-unique) way of grouping is:
CLASS 1 → 73.0000 but less than 74.0000
CLASS 2 → 74.0000 but less than 75.0000
CLASS 3 → 75.0000 but less than 76.0000
Class     Frequency   Relative frequency
CLASS 1   1           0.1
CLASS 2   7           0.7
CLASS 3   2           0.2
TOTAL     10          1
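The grouping in this example can be reproduced with a small script. It is a minimal sketch using the ten raw values and the three class limits listed above; the variable names are illustrative.

```python
raw = [75.4786, 74.6043, 74.5925, 73.7692, 74.3453,
       74.4622, 74.8150, 74.0306, 75.6101, 74.0489]

# Class limits: each class is "lower limit but less than upper limit",
# so every observation falls in one and only one class.
edges = [73.0, 74.0, 75.0, 76.0]

counts = [0] * (len(edges) - 1)
for x in raw:
    for k in range(len(counts)):
        if edges[k] <= x < edges[k + 1]:
            counts[k] += 1
            break

n = len(raw)
relative = [c / n for c in counts]

# Cumulative relative frequencies (the values an ogive would plot).
cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(running / n)

for k, (c, r, cum) in enumerate(zip(counts, relative, cumulative), start=1):
    print(f"CLASS {k}: frequency={c}, relative={r}, cumulative={cum}")
```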
Frequency Distribution 52
NOTES
1. Grouping data has clear interpretative advantages but as a result some detail is
lost (in fact, bins are also called “classes of equivalence” → all the observations
grouped in the same bin are considered equivalent. This implies that, once the
groups are formed, it will not be possible anymore to discriminate between
observations that belong to the same group). See also, Stem & Leaf Display.
2. Class limits must be chosen in order to guarantee mutually exclusive
classes, i.e. each observation can be included in one and only one class.
Number of Classes (K) 53
To define the number of classes (k), you might use the following rule of thumb:
Number of observations (n)   Number of classes (k)
n < 50                       5-7
50 ≤ n ≤ 100                 7-8
101 ≤ n ≤ 500                8-10
501 ≤ n ≤ 1000               10-11
1001 ≤ n ≤ 5000              11-14
n > 5000                     14-20
Class Width (W) 54
To generate K equally sized (*) classes (i.e. of uniform width), the width W of
each class is given by:
$$W = \frac{\max - \min}{K}$$
where max and min are the largest and the smallest sample values, respectively.
(*) NOTE: it is possible to generate a histogram with unequal class widths but its interpretation is
different since the bar heights are not enough to “catch” the relative importance of each class.
For details, see ADCS 8482919_A, §6.1.1.1 CASE B, page 24/400.
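The rule of thumb for the number of classes and the class-width formula can be combined in a short helper. This is a sketch under the assumption that the table boundaries are inclusive as written; the function names `suggested_classes` and `class_width` are hypothetical.

```python
def suggested_classes(n):
    """Return the (min, max) number of classes suggested by the rule-of-thumb table."""
    if n < 50:
        return (5, 7)
    if n <= 100:
        return (7, 8)
    if n <= 500:
        return (8, 10)
    if n <= 1000:
        return (10, 11)
    if n <= 5000:
        return (11, 14)
    return (14, 20)


def class_width(values, k):
    """Width W of k equally sized classes: (max - min) / k."""
    return (max(values) - min(values)) / k


# The ten raw values from the frequency distribution example, grouped in K = 3 classes.
raw = [75.4786, 74.6043, 74.5925, 73.7692, 74.3453,
       74.4622, 74.8150, 74.0306, 75.6101, 74.0489]
width = class_width(raw, 3)
print(suggested_classes(200), round(width, 4))
```

In practice the class limits are usually rounded to convenient values, as in the example where limits of 73, 74, 75 and 76 are used instead of steps of about 0.61.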
Histogram 55
WHAT IS IT?
• A graph of the data in a frequency distribution is called a histogram.
• The interval endpoints are shown on the horizontal axis.
• The vertical axis shows either frequency, relative frequency, or
percentage.
• Bars of the appropriate heights are used to represent the number of
observations within each class (no gaps between bars are allowed).
• A minimum number of 30-40 observations is required to obtain
interpretable results.
• Histograms are used to study shape (e.g. symmetry), location and
spread of the data.
Histogram 56
EXAMPLE: build a histogram for the following data:
75.4786, 74.6043, 74.5925, 73.7692, 74.3453, 74.4622, 74.8150, 74.0306, 75.6101, 74.0489
STEP 1: choose the classes (number and width):
CLASS 1 → 73.0000 but less than 74.0000
CLASS 2 → 74.0000 but less than 75.0000
CLASS 3 → 75.0000 but less than 76.0000
[Figure: two histograms over the classes 73-74, 74-75 and 75-76, one with frequencies and one with relative frequencies on the vertical axis.]
Histogram 57
QUESTIONS
1. How wide should each interval be?
(How many classes should be used?)
Histogram 58
• Too many classes (narrow intervals) may yield a very jagged distribution with gaps
from empty classes.
• Too few classes (wide intervals) may compress variation too much and yield a
blocky distribution.
[Figure: two histograms of Temperature (vertical axis: Frequency), one with narrow classes and one with wide classes. X axis labels are upper class endpoints.]
Histogram Interpretation 59
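Symmetric Distribution
[Figure: histogram of a symmetric distribution (vertical axis: Frequency).]
A positively skewed distribution (skewed to the right) has a longer tail
that extends to the right in the direction of positive values.
A negatively skewed distribution (skewed to the left) has a longer tail
that extends to the left in the direction of negative values.
[Figure: histograms of a right-skewed and a left-skewed distribution (vertical axis: Frequency).]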
Ogive 61
[Figure: ogive. Vertical axis: Cumulative frequency %; horizontal axis: Interval endpoint (74, 75, 76).]
Stem-and-Leaf Display 62
WHAT IS IT?
Stem-and-Leaf Display 63
stems leaves
Presentations for Two Variables 64
Numerical Categorical
Scatter Plot 65
WHAT IS IT?
• Scatter Plots are used for paired observations(*) taken from
two numerical variables
• The Scatter Plot:
• one variable is measured on the vertical axis and the other
variable is measured on the horizontal axis.
(*) NOTE
The values plotted are: Pi=(Xi,Yi), i=1,…,n (i.e. P1=(X1,Y1), P2=(X2,Y2),…,Pn=(Xn,Yn))
Scatter Plot 66
Scatter Plots are very informative tools. For 2 variables X and Y, they make it possible to assess, for example:
• Positive or negative relationships between X and Y
• Outliers
[Figure: scatter plots of Y against X showing a positive relationship, a negative relationship, and outliers; a further plot with a frequency axis.]
NOTE: For other graphical methods like this, see ADCS 8482819_A - §6.2.1.1
Cross Tables 68
WHAT IS IT?
• Cross Tables (or contingency tables) list the number of
observations for every combination of values for two categorical or
ordinal variables
Cross Tables 69
EXAMPLE
4 x 3 Cross Table for:
Time dedicated to 4 types of Task (rows) by 3 Operators (columns)
Side by Side Bar Chart 70
EXAMPLE
Using the data from the previous example, produce a “side by side bar chart”
[Figure: side by side bar chart of Time (vertical axis, 10 to 60) for Operator A, Operator B and Operator C.]
Answer 74
• Measurement Equipment – Bar Chart
• From Excel, copy the column Measurement
Equipment to a JMP data table
• Go to JMP, File > Edit > Copy with
Column name
• Follow the JMP routine for creating a
Bar Chart
Answer 75
• Measurement Equipment - Pie Chart
• From Excel, copy the column Measurement
Equipment to a JMP data table
• Go to JMP, File > Edit > Copy with
Column name
• Follow the JMP routine for creating a
Pie Chart
Activity 76
Answer 78
• THICKNESS Histogram
• From Excel, copy the column BALL SHEAR to
a JMP data table
• Click Analyze > Distribution
• Follow the JMP routine for creating a Histogram
Activity 79
Answer 80
INTERPRETATION
The graphical analysis shows:
• The two variables do not seem to be correlated
Module 3 Key Learnings 81
Module 4: Descriptive Indices
Module 4 objectives 83
Descriptive Indices 84
Symbols 85
Letters of the Latin alphabet are used for Sample Statistics, and letters of the
Greek alphabet for Population Parameters.
INDEX                SAMPLE STATISTIC   POPULATION PARAMETER
MEAN                 $\bar{X}$          $\mu$ (or other, e.g. $\lambda$)
STANDARD DEVIATION   $S$                $\sigma$
VARIANCE             $S^2$              $\sigma^2$
PROPORTION           $P$                $\theta$
NOTE: when a sample statistic is used to estimate an unknown population parameter, this is
indicated by the symbol ^. For example: $\bar{X} = \hat{\mu}$ (read: «$\bar{X}$ is the estimated value of $\mu$»).
Descriptive Indices Outline 86
Data Variation (Variability or Spread): Range, Variance, Standard Deviation
Association between variables: Covariance, Correlation Coefficient
Measures of Location 87
MEAN: arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
MEDIAN: central value of the ordered sample (or, the 50th percentile).
MODE: most frequently observed value.
PERCENTILE: value greater than a certain % of the observations (e.g. the 80th percentile is greater than 80% of the values).
Mean 88
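• For a population of N values:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
where the $x_i$ are the population values and N is the population size.
• For a sample of size n:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
where the $x_i$ are the observed values and n is the sample size.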
Mean 89
[Figure: two dot plots on a 0-10 axis, the second with an outlier.]
Without outlier: Mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3
With outlier: Mean = (1 + 2 + 3 + 4 + 10) / 5 = 20 / 5 = 4
Median 90
MEDIAN
Median 91
[Figure: two dot plots on a 0-10 axis, the second with an outlier.]
Without outlier: Median = 3, Average = 3
With outlier: Median = 3, Average = 4
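The different sensitivity of the mean and the median to an outlier can be checked directly. The sketch below uses the two sets of five values from the Mean slide (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10); the variable names are illustrative.

```python
import statistics

without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]   # the largest value is replaced by an outlier

for label, values in (("without outlier", without_outlier),
                      ("with outlier", with_outlier)):
    print(label,
          "mean =", statistics.mean(values),
          "median =", statistics.median(values))
```

The mean moves from 3 to 4, while the median stays at 3.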
Mode 92
MODE
[Figure: two dot plots. On the 0-14 axis, Mode = 9; on the 0-6 axis, there is no mode.]
Mode 93
[Figure: two bar charts of Frequency (0 to 10) by Classes.]
Percentile 94
DEFINITION
A percentile is a value for which a certain proportion of data falls above and
below it.
“The pth percentile is a value, Y(p), such that at most (100p)% of the
measurements are less than this value and at most 100(1- p)% are greater. The
50th percentile is called the median. Percentiles split a set of ordered data into
hundredths. For example, 70% of the data should fall below the 70th percentile”.
Quartile 95
QUARTILES
The quartiles are percentiles that divide the ordered sample into 4 parts, each
containing the same amount of data. The 3 quartiles are generally indicated by Q1, Q2,
and Q3.
Q1 Q2 Q3
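Quartiles and percentiles can be computed with Python's standard library. Note that several interpolation conventions exist, so other tools (including JMP) may return slightly different values on small samples; the sketch below uses the default method of `statistics.quantiles` on an illustrative data set.

```python
import statistics

data = list(range(1, 12))  # illustrative ordered sample: 1, 2, ..., 11

# With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Q2 is the 50th percentile, i.e. the median.
median = statistics.median(data)

# With n=100, the list holds the 1st to 99th percentiles; index 69 is the 70th.
p70 = statistics.quantiles(data, n=100)[69]

print(q1, q2, q3, median, p70)
```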
Measures of Variation 96
Same center,
different variation
Range 97
RANGE
It is the difference between the largest and the smallest observations:
RANGE = Xmax - Xmin
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
RANGE = 14 - 1 = 13
Range 98
[Figure: two dot plots on a 7-12 axis.]
RANGE = 12 - 7 = 5 in both cases
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
RANGE = 120 - 1 = 119
Interquartile Range 99
Box-Plot 100
Using the definitions of Quartiles and IQR, it is possible to create a very useful
graphical presentation: the box-plot, also called “box and whiskers plot”.
The elements needed to generate a box-plot are:
1. A “box”, defined by the IQR → it includes the central 50% of the observations
2. A line within the box, corresponding to the median (in addition, also the mean can be shown)
3. Two lines called “whiskers”, with length defined as follows:
• Upper Whisker (UW): if max(x)<Q3+1.5IQR => UW=max(x). Otherwise, UW=Q3+1.5IQR
• Lower Whisker (LW): if min(x)>Q1-1.5IQR => LW=min(x). Otherwise, LW=Q1-1.5IQR
Observations larger than Q3+1.5IQR OR smaller than Q1-1.5IQR are plotted outside the
whiskers and suspected to be outliers.
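The box-plot elements listed above can be computed as follows. This is a sketch that follows the whisker definition given on this slide (some tools instead end the whisker at the most extreme observation inside the limit); the quartiles use the default method of `statistics.quantiles`, and the data set and function name are illustrative.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, IQR, whiskers and suspected outliers, as defined on the slide."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    upper_limit = q3 + 1.5 * iqr
    lower_limit = q1 - 1.5 * iqr

    # Whiskers: stop at the extreme observation if it lies inside the limit,
    # otherwise stop at the limit itself.
    uw = max(values) if max(values) < upper_limit else upper_limit
    lw = min(values) if min(values) > lower_limit else lower_limit

    # Observations beyond the limits are suspected outliers.
    outliers = [x for x in values if x > upper_limit or x < lower_limit]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr,
            "lw": lw, "uw": uw, "outliers": outliers}

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]  # illustrative sample with one large value
summary = box_plot_summary(data)
print(summary)
```

For this sample the upper whisker stops at Q3 + 1.5IQR = 18 and the value 30 is flagged as a suspected outlier, while the lower whisker stops at the minimum.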
[Figure: box-plot with the Lower Whisker (LW), the BOX containing Q2 (Median), and the Upper Whisker (UW); two regions are labelled 25%.]
Box-Plot 101
EXAMPLES
X minimum X maximum
• LW = X min
• UW = X max
• No outliers
X minimum Q3+1.5IQR
• LW = X min
• UW = Q3 + 1.5IQR
• 1 suspected outlier
Q1-1.5IQR X maximum
• LW =Q1 - 1.5IQR
• UW = X max
• 1 suspected outlier
Q1-1.5IQR Q3+1.5IQR
• LW = Q1 - 1.5IQR
• UW = Q3 + 1.5IQR
• 3 suspected outliers
Box-Plot 102
EXAMPLE
GROUP “A”
GROUP “B”
Values
Variance 103
VARIANCE
Average of squared deviations of values from the mean
• For a population of N values:
$$\sigma_x^2 = \frac{\sum_{i=1}^{N}(X_i-\mu)^2}{N} = \frac{(X_1-\mu)^2 + (X_2-\mu)^2 + \cdots + (X_N-\mu)^2}{N}$$
• For a sample (*) of size n:
$$s_x^2 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1} = \frac{(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + \cdots + (X_n-\bar{X})^2}{n-1}$$
Standard Deviation 104
STANDARD DEVIATION
Square root of average of squared deviations of values from the mean
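• For a population of N values:
$$\sigma_x = \sqrt{\frac{\sum_{i=1}^{N}(X_i-\mu)^2}{N}} = \sqrt{\frac{(X_1-\mu)^2 + (X_2-\mu)^2 + \cdots + (X_N-\mu)^2}{N}}$$
• For a sample of size n:
$$s_x = \sqrt{\frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1}} = \sqrt{\frac{(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + \cdots + (X_n-\bar{X})^2}{n-1}}$$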
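The population and sample formulas differ only in the divisor (N versus n-1). The sketch below implements both directly and compares them with the corresponding functions of Python's `statistics` module; the data values and function names are illustrative.

```python
import math
import statistics

def population_variance(values):
    """Average of squared deviations from the mean, dividing by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the sample mean, dividing by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

data = [1, 2, 3, 4, 5]  # illustrative values; squared deviations sum to 10

sigma2 = population_variance(data)   # 10 / 5
s2 = sample_variance(data)           # 10 / 4
sigma = math.sqrt(sigma2)            # population standard deviation
s = math.sqrt(s2)                    # sample standard deviation

print(sigma2, s2, round(sigma, 4), round(s, 4))
```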
Standard Deviation 105
• It is sensitive to outliers
Standard Deviation 106
EXAMPLES
[Figure: three dot plots on an 11-21 axis with the same mean and different spread.]
Case A: $\bar{X}$ = 15.5, S = 3.338
Case B: $\bar{X}$ = 15.5, S = 0.926
Case C: $\bar{X}$ = 15.5, S = 4.570
Advantages of Variance & Standard Dev. 107
Coefficient of Variation 108
Coefficient of Variation 109
• LOT1:
• Average oxide thickness = 500
• Standard deviation = 15
$$CV_{LOT1} = \frac{s}{\bar{x}} \cdot 100\% = \frac{15}{500} \cdot 100\% = 3\%$$
• LOT2:
• Average oxide thickness = 650
• Standard deviation = 15
$$CV_{LOT2} = \frac{s}{\bar{x}} \cdot 100\% = \frac{15}{650} \cdot 100\% = 2.3\%$$
Both Lots have the same standard deviation, but Lot 2 is less variable relative to its larger thickness.
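The two coefficients of variation can be reproduced with a one-line function. This is a minimal sketch; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """CV in percent: 100 * s / x_bar. Only meaningful when the mean is not zero."""
    return 100 * std_dev / mean

cv_lot1 = coefficient_of_variation(15, 500)
cv_lot2 = coefficient_of_variation(15, 650)

print(f"LOT1: {cv_lot1:.1f}%  LOT2: {cv_lot2:.1f}%")
```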
Measures of Shape 110
SKEWNESS KURTOSIS
NOTE: in this course, the measures of shape are only mentioned. More details are
included in the course “Statistics Level 2”.
Skewness – measure of asymmetry 111
SKN(X) = 0
Kurtosis – measure of peakedness 112
KUR(X) > 0
KUR(X) = 0
KUR(X) < 0
Asymmetry Mean and Median 113
In a symmetrical distribution, the mean and the median are the same value
Asymmetry and Box-Plot 114
In a symmetrical distribution,
◼ the mean and the median are the same value
◼ the distance from Q1 to Q2 equals the distance from Q2 to Q3, i.e. (Q2-Q1) = (Q3-Q2)
[Figure: three box-plots with Q1, Q2 and Q3 marked.]
Indices for the association between variables 115
• The covariance
• The coefficient of correlation
Covariance 116
In particular, given two variables X1 and X2, this index provides information about:
• The existence of a linear relationship between X1 and X2.
• The direction of the relationship.
For a population of size N:
$$\mathrm{Cov}(X_1, X_2) = \sigma_{X_1 X_2} = \frac{\sum_{i=1}^{N}(X_{1i}-\mu_{X_1})(X_{2i}-\mu_{X_2})}{N}$$
For a sample of size n:
$$\mathrm{Cov}(X_1, X_2) = s_{X_1 X_2} = \frac{\sum_{i=1}^{n}(X_{1i}-\bar{X}_1)(X_{2i}-\bar{X}_2)}{n-1}$$
INTERPRETATION
Since the covariance can take any value between −∞ and +∞, this index is not of great help in assessing
the intensity (or strength) of the linear association between the variables.
Graphical interpretation Covariance 117
[Figure: scatter plot divided into quadrants by the means mX and mY; the quadrants are labelled by the signs of (Xi − mX) and (Yi − mY).]
Pearson’s Correlation coefficient - r 118
For a population of size N:
$$\rho = \mathrm{corr}(X_1, X_2) = \frac{\mathrm{Cov}(X_1, X_2)}{\sigma_{X_1}\,\sigma_{X_2}}$$
For a sample of size n:
$$r = \mathrm{corr}(X_1, X_2) = \frac{\mathrm{Cov}(X_1, X_2)}{s_{X_1}\,s_{X_2}}$$
INTERPRETATION
Pearson’s Correlation coefficient - r 119
Correlation coefficient values for different degrees of association between variables X1 and X2.
[Figure: six scatter plots (A to F) of X2 against X1.]
C: No linear correlation (r = 0).
D: No linear correlation (r = 0).
E: Negative linear correlation (weak), r = -0.32.
F: r = 0.58.
Pearson’s Correlation coefficient 120
EXAMPLE
Evaluate the extent of the correlation between two electrical parameters, Isat and Vt, of a MOS transistor in
C045 nm technology. A sample of 225 pairs of values (Isat and Vt) is drawn. The data are summarized in the
following table (for brevity, only the first and last 5 rows of the original data-set are shown in the table;
however, the analysis was performed on the entire data-set).
The Data-set
LOT_WAFER_SITE   PIDS04L006LS (µA/µm)   PVT06L004LS (V)
Q135WEZ_10_1     -323.5                 -0.3588
Q135WEZ_10_2     -322.5                 -0.3727
Q135WEZ_10_3     -311.6667              -0.3468
Q135WEZ_10_4     -325.75                -0.3543
Q135WEZ_10_5     -333.5                 -0.3448
Pearson’s Correlation coefficient 121
EXAMPLE (continuation)
[Figure: scatter plot of the data-set; vertical axis from -360 to -260, horizontal axis from -0.43 to -0.31.]
Notes on Pearson’s Correlation coefficient 122
❑ The correlation coefficient is a unit-free index whose value must lie between -1 and +1
inclusive. For this reason, in addition to the existence and direction of the relationship, this
index provides information on the intensity of the linear relationship between two variables.
❑ Pearson’s correlation coefficient assumes that the two variables considered jointly follow a
bivariate normal distribution. This aspect will be explained in the course “Statistics Level
2”, where alternative approaches (in case this assumption does not hold) are also considered.
❑ A value of +1 would result if all the points could be connected by a straight line with a
positive slope.
❑ A value of -1 would occur if all the points could be connected by a straight line with a
negative slope. Neither extreme case could be expected to occur in practice, however.
❑ The intensity of the linear relation between X and Y is higher as the correlation gets closer to
either -1 or +1.
❑ If the random variables X and Y are independent, then the correlation coefficient is 0.
However, the converse is not true, since only the linear relationship is detectable by the
correlation coefficient (for example, the relationship may be quadratic).
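The sample covariance and Pearson's r, and the last note about non-linear relationships, can be illustrated with a small sketch. The data values and function names below are illustrative and are not taken from the Isat/Vt data-set.

```python
import math

def sample_covariance(x1, x2):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    return sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)

def sample_std(x):
    """Sample standard deviation (the covariance of a variable with itself is its variance)."""
    return math.sqrt(sample_covariance(x, x))

def pearson_r(x1, x2):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    return sample_covariance(x1, x2) / (sample_std(x1) * sample_std(x2))

x = [-2, -1, 0, 1, 2]

# All points on a straight line with positive slope.
r_linear = pearson_r(x, [2 * v + 1 for v in x])

# Exact quadratic dependence: Y is fully determined by X, but not linearly.
r_quadratic = pearson_r(x, [v ** 2 for v in x])

print(round(r_linear, 6), round(r_quadratic, 6))
```

The first value is 1 (up to rounding) and the second is 0, even though Y depends entirely on X in the quadratic case.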
Activity 123
60 minutes
Answer 124
JMP
Activity 125
60 minutes
Answer 126
Module 4 Key Learnings 127
Module 5: Random Variables
Module 5 objectives 129
Random Variable 130
WHAT IS IT?
Statistics and Mathematics are not the same! However, they often use the same terms.
But with different meanings…
In Algebra, a variable is an unknown quantity. Usually, the problem consists in finding out its
value. For example, given the equation 12-3x=0, we can find that x=4.
Random Variable 131
Conventionally,
So, we write for example: X={x1, x2, …, xn} and read “the random variable X can take on the
values x1, x2, …, xn”.
Random Experiment & Random Variable 132
DEFINITIONS
Continuous & Discrete Random Variables 133
Probability Model 134
Probability Distribution & Density Function 135
Probability Distribution (discrete variable): what is the probability that, in a wafer
randomly selected from a lot, the number of defective dice (the variable X) is 3?
Density Function (continuous variable): what is the probability that, for a wafer
randomly selected, the value of thickness (the variable X) is included in the interval (x1;x2)?
Probability Distribution & Density Function 136
• $\sum_{\text{all possible } x} P(x) = 1$: the sum of P(x) over all the possible values of x is equal to one
Density Function
• It is indicated by f(x)
• We generally refer to the probability that X belongs to an interval of possible values
• $P(X = x_0) = 0$: the probability that X is equal to a single value x0 is equal to zero
• $\int_{\text{all possible } x} f(x)\,dx = 1$: the probability that X belongs to the interval of all the possible values is 1
Theoretical Probability Models 137
QUESTION
“So, how can Statistics help us calculate the probability that a certain event takes place?”
ANSWER
“Statisticians have defined many different probability models that can be used to describe real-world
phenomena. They are called Theoretical Probability Models.”
DISCRETE DISTRIBUTIONS
CONTINUOUS DISTRIBUTIONS
The Normal Distribution 138
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
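The normal density can be evaluated numerically to check two properties of a density function: the total area under f(x) is 1, and probabilities refer to intervals. This is a sketch with illustrative parameter values (μ = 0, σ = 1); `NormalDist` is part of Python's `statistics` module, while `normal_pdf` is a hypothetical helper written here from the formula.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 0.0, 1.0  # illustrative parameter values

# Numerical check that the area under f(x) is 1
# (trapezoidal rule over the interval mu ± 8 sigma).
steps = 10_000
a, b = mu - 8 * sigma, mu + 8 * sigma
h = (b - a) / steps
area = h * (sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
            + 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma)))

# Probability that X falls within one standard deviation of the mean,
# obtained from the cumulative distribution function.
dist = NormalDist(mu, sigma)
p_within_one_sigma = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)

print(round(area, 6), round(p_within_one_sigma, 4))
```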
The Normal Distribution 139
[Figure: bell-shaped curve of f(X) against X, centred at μ.]
Graphical Interpretation of σ 140
Variations of the Parameters 141
Variations of the parameters 142
Changing σ increases or decreases the spread.
[Figure: normal curve with μ marked on the x axis and σ indicated.]
Graphical Assessment of Normality 143
• Notes:
• At a graphical level, we can only produce a partial assessment of normality: we
can be confident about a conclusion drawn from a graph that clearly indicates
non-normality. Conversely, if the graph resembles normal behavior, some doubt
should remain.
• To obtain more complete information, other procedures should be employed (→
Inferential methods, which are not presented in this course).
Graphical Assessment of Normality 144
Histogram to Assess Normality 145
Histogram to Assess Normality 146
Histogram to Assess Normality 147
Normal Probability Plot 148
Normal Probability Plot 149
[Figure: six normal probability plots, each with axes labelled Data and Quantiles.]
Activity 151
Answer 152
Activity 153
Answer 154
• OXIDE THICKNESS
Activity 155
Answer 156
Activity 157
Answer 159
Evidence of:
• “Package Thickness” → non-normal distribution (bimodal)
• “Ball Shear” → approximately normal behavior
Module 5 Key Learnings 160
• Random Variables
• Random Experiment
• Discrete Random Variables
• Continuous Random Variables
• Probability Distributions
• Density Functions
• Theoretical Probability Models
Conclusion
• You are able to understand the difference between
population and sample
Conclusion
• What you could do next to further improve your statistical
competency:
• Use what you have learned as much as possible, starting tomorrow!
• The only way to avoid forgetting what you learned is not to wait too long after the
course before applying the techniques shown in the training.
• Think about attending the next training course on “Statistics Level 2”
• You will learn:
• How to use sample data to assess important aspects of the entire population (inference)
• How to perform a statistical test of hypotheses
• How to determine if the dataset contains outliers
Post-test 163
It allows us to measure the learning that has taken place during the
training.
10-15 minutes
Customer satisfaction 164
CONGRATULATIONS!!
JMP Routines
How to make Bar chart in JMP 167
File: Bar Pie and Pareto.jmp
How to make Bar chart in JMP 168
Click the OK button
How to make Pie chart in JMP 169
How to make Pie chart in JMP 170
Click the OK button
How to make Pareto in JMP 171
How to make Pareto Plot in JMP 172
How to make Histogram in JMP 173
File: Thickness.jmp
How to make Histogram in JMP 174
How to make Scatterplot in JMP 175
Objective of the study: to determine whether long-term employees are more reliable and
absent less often
How to make Scatterplot in JMP 176
How to make Side by Side chart in JMP 177
File: Side by side.jmp
Click on Graph > Chart
How to make Side by Side chart in JMP 178
Click the OK button