2. Fundamentals of Statistics
Definition of statistics
❑ A collection of quantitative data pertaining to any subject or group, especially when the data are systematically gathered and collected.
❑ The science that deals with the collection, tabulation, analysis, interpretation, and presentation of quantitative data.
Two phases of statistics
❑ Descriptive or deductive statistics: describes and analyzes a subject or group.
❑ Inductive statistics: draws important conclusions about the population from a limited sample of data.
Data collection
❑ Data may be collected by direct observation or indirectly through written or verbal questions.
❑ Data collected for quality control purposes are gathered by direct observation and are classified as either variables or attributes.
Data Types
Data
- Variable (quantitative)
  - Continuous: measurable
  - Discrete: countable
- Attribute (qualitative): countable
  - Binomial
  - Nominal
  - Ordinal
Data collection
❑ Variable data:
❑ Measurable. If capable of any degree of subdivision, it is referred to as continuous.
❑ Examples: weight, length.
❑ Variables that exhibit gaps are called discrete.
❑ Sometimes it is convenient for verbal or nonnumerical data to assume the nature of a variable; e.g., the quality of a surface finish can be classified as good (3), average (2), or poor (1).
❑ While many quality characteristics are stated in terms of variables, many others must be stated as attributes.
Data collection
❑ Attributes:
❑ Attributes are quality characteristics that are classified as either conforming or nonconforming (go/no-go).
❑ Characteristics that are judged by visual observation are classified as attributes.
❑ Sometimes it is desirable for variables to be classified as attributes; e.g., the exact weight of a package may matter less than whether the weight is within specifications.
Data collection
❑ In data collection, the number of significant figures depends on the intended use of the data.
❑ For example, for data on the life of light bulbs, 995.6 h is acceptable; 995.632 h is more precise than necessary.
❑ If the upper and lower specifications are 9.58 and 9.52 mm, the data should be collected to the nearest 0.01 mm.
❑ Measuring instruments may not give a true reading because of problems with accuracy and precision.
Accuracy and precision
[Figure: four target patterns relative to the true value: accurate & precise, accurate & not precise, precise & not accurate, not accurate & not precise]
A measurement system can be accurate but not precise, precise but not accurate, neither, or both. For example, if an experiment contains a systematic error, then increasing the sample size generally increases precision but does not improve accuracy. The result would be consistent yet inaccurate. Eliminating the systematic error improves accuracy but does not change precision.
Describing the data
Sometimes so much data is collected that it is more confusing than helpful. Consider the data shown in Table 1.
0 1 3 0 1
1 5 4 1 2
1 0 2 0 0
2 1 1 1 2
0 4 1 3 1
1 3 4 0 0
1 3 0 1 2
TABLE 1 Number of Daily Billing Errors
Describing the data
Clearly these data, in this form, are difficult to use and are not effective in describing the data's characteristics. Some means of summarizing the data is needed to show what value the data tend to cluster about and how the data are dispersed or spread out.
Two techniques are available to summarize data: graphical and analytical.
❑ The graphical technique is a plot or picture of a frequency distribution.
❑ Analytical techniques summarize data by computing a measure of central tendency and a measure of dispersion.
Sometimes both the graphical and analytical techniques are used.
Graphical Techniques
Frequency Distribution – Histograms
Ungrouped data
❑ Ungrouped data comprises a listing of the observed values, as shown in Table 1. A method of processing the data is necessary.
❑ A much better understanding can be obtained by tallying the frequency of each
value of Daily Billing Errors as shown in Table 2.
❑ The numerical value for the number of tallies is called the frequency.
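The tally in Table 2 can be reproduced with a short script. A minimal Python sketch, assuming Table 1 has been typed in row by row; `collections.Counter` does the counting, and a row of `|` characters stands in for the hand-drawn tally marks:

```python
from collections import Counter

# Table 1: number of daily billing errors on 35 days, entered row by row
table1 = [
    0, 1, 3, 0, 1,
    1, 5, 4, 1, 2,
    1, 0, 2, 0, 0,
    2, 1, 1, 1, 2,
    0, 4, 1, 3, 1,
    1, 3, 4, 0, 0,
    1, 3, 0, 1, 2,
]

# Count how many days had each number of errors (the frequency)
tally = Counter(table1)

for value in sorted(tally):
    marks = "|" * tally[value]  # stand-in for tally marks
    print(f"{value}  {marks:<13}  {tally[value]}")
```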
Table 2 - Tally of Number of Daily Billing Errors
Number Nonconforming | Tabulation | Frequency
0 | ||||| |||| | 9
1 | ||||| ||||| ||| | 13
2 | ||||| | 5
3 | |||| | 4
4 | ||| | 3
5 | | | 1
Ungrouped data
❑ If the "Tabulation" column is eliminated, the resulting table is classified as a frequency distribution,
and can be graphically presented as a histogram.
❑ A histogram consists of a set of rectangles that represent the frequency in each category, as shown in Fig. 1.
Fig. 1 Frequency histogram (frequency vs. number nonconforming, 0 to 5)
Ungrouped data
❑ Other types of graphical presentation are the relative frequency distribution, the cumulative frequency distribution, and the relative cumulative frequency distribution.
❑ Relative frequency is calculated by dividing the frequency for each data value by the total. These calculations are shown in the 3rd column of Table 3. The graphical presentation is shown in Fig. 2.
❑ Cumulative frequency is calculated by adding the frequency of each data value to the sum of the frequencies for the previous data values. These calculations are shown in the 4th column of Table 3. The graphical presentation is shown in Fig. 3.
❑ Relative cumulative frequency is calculated by dividing the cumulative frequency for each data value by the total. These calculations are shown in the 5th column of Table 3. The graphical presentation is shown in Fig. 4.
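The three derived columns of Table 3 follow mechanically from the frequencies. A minimal Python sketch, taking the Table 2 frequencies as input; `itertools.accumulate` produces the running (cumulative) sums:

```python
from itertools import accumulate

values = [0, 1, 2, 3, 4, 5]   # number nonconforming
freq = [9, 13, 5, 4, 3, 1]    # frequencies from Table 2
total = sum(freq)             # 35 days

relative = [f / total for f in freq]                   # frequency / total
cumulative = list(accumulate(freq))                    # running sum of frequencies
relative_cumulative = [c / total for c in cumulative]  # cumulative / total

for v, f, r, c, rc in zip(values, freq, relative, cumulative, relative_cumulative):
    print(f"{v}  {f:2d}  {r:.2f}  {c:2d}  {rc:.2f}")
```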
Table 3 - Relative Frequency Distributions of Data
Number Nonconforming | Frequency | Relative Frequency | Cumulative Frequency | Relative Cumulative Frequency
0 | 9 | 9/35 = 0.26 | 9 | 9/35 = 0.26
1 | 13 | 13/35 = 0.37 | 9 + 13 = 22 | 22/35 = 0.63
2 | 5 | 5/35 = 0.14 | 22 + 5 = 27 | 27/35 = 0.77
3 | 4 | 4/35 = 0.11 | 27 + 4 = 31 | 31/35 = 0.89
4 | 3 | 3/35 = 0.09 | 31 + 3 = 34 | 34/35 = 0.97
5 | 1 | 1/35 = 0.03 | 34 + 1 = 35 | 35/35 = 1.00
Total | 35 | 1.00 | |
Fig. 2 Relative frequency histogram (relative frequency, 0 to 0.4, vs. number nonconforming, 0 to 5)
Fig. 3 Cumulative frequency histogram (cumulative frequency, 0 to 40, vs. number nonconforming, 0 to 5)
Fig. 4 Relative cumulative frequency histogram (relative cumulative frequency, 0 to 1.00, vs. number nonconforming, 0 to 5)
Grouped data
Most data are continuous rather than discrete and require grouping:
1. Collect data and construct a tally sheet.
2. Determine the range.
3. Determine the cell interval and the number of cells.
4. Determine the cell midpoints.
5. Post the cell frequency.
6. Construct the histogram.
Grouped data
1. Collect data and construct a tally sheet.
- Collect the individual observations representing the data.
- Determine the minimum and maximum observations.
2. Determine the range.
• The range is the difference between the highest observed value and the lowest observed value:
• R = XH − XL
• XH = highest number
• XL = lowest number
Grouped data
3. Determine the cell interval and the number of cells.
❑ The cell interval is the distance between adjacent cell midpoints, as shown in Fig. 5.
❑ The cell interval (i) and the number of cells (n) are related by the formula n = R/i.
❑ Since n and i are both unknown, a trial-and-error approach is used to find the interval that meets the following guidelines.
Grouped data
Guidelines to determine the number of cells
❑ In general, the number of cells should be between 5 and 20.
❑ Use 5 to 9 cells when the number of observations is less than 100;
❑ Use 8 to 17 cells when the number of observations is between 100 and 500; and
❑ Use 15 to 20 cells when the number of observations is greater than 500.
Another method to determine the number of cells n, where N is the number of observations:
$n = \sqrt{N}$
Fig. 5 Cell classification (cell interval i, cell midpoint, lower boundary, upper boundary)
Grouped data
4. Determine the cell midpoints.
The cell midpoint is determined by using the formula
Mp = XL + i/2
where Mp is the midpoint of the cell,
XL is the lower boundary of the cell, and
i is the cell interval.
5. Post the cell frequency.
Cell frequency is the sum of the frequencies of the values within the cell boundaries. Make a tally of the values.
6. Construct the histogram.
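Steps 2 to 4 reduce to a few lines of arithmetic. The Python sketch below is one way to express them, assuming (as the worked examples that follow do) that the number of cells is taken as √N rounded up; the helper names `cell_plan` and `cell_midpoints` are hypothetical, and in practice the computed interval is then rounded to a convenient value by judgment.

```python
import math

def cell_plan(data):
    """Steps 2-3: range R, number of cells n (sqrt(N) rounded up), interval i = R / n."""
    x_high, x_low = max(data), min(data)
    r = x_high - x_low                    # R = XH - XL
    n = math.ceil(math.sqrt(len(data)))   # n = sqrt(N), rounded up (assumption)
    i = r / n                             # from n = R / i
    return r, n, i

def cell_midpoints(first_lower_boundary, interval, n_cells):
    """Step 4: Mp = XL + i/2, where XL is each cell's lower boundary."""
    return [first_lower_boundary + k * interval + interval / 2 for k in range(n_cells)]

# Usage with 100 evenly spaced observations 1, 2, ..., 100
r, n, i = cell_plan(list(range(1, 101)))
print(r, n, i)                     # range 99, 10 cells, interval 9.9
print(cell_midpoints(1, 10, 3))    # [6.0, 16.0, 26.0]
```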
Grouped data
Example problem 1
A company that fills bottles of oil tries to maintain a specific weight of the product. The table gives the weights of 110 bottles that were checked at random intervals. Make a tally of these weights and construct a frequency histogram (weights are in kg).
Grouped data
Example problem 1
6.00 5.98 6.01 6.01 5.97 5.99 5.98 6.01 5.99 5.98 5.96
5.88 5.99 5.69 6.93 5.79 6.81 5.90 5.60 5.70 6.71 5.80
5.77 6.61 6.50 5.96 6.40 5.47 5.55 5.69 5.79 6.31 6.20
6.11 6.03 6.01 5.99 5.59 6.92 6.80 5.68 6.71 5.78 5.55
6.60 5.58 6.95 6.50 6.40 5.58 5.92 6.30 5.90 6.20 6.10
6.90 5.90 6.80 5.94 5.99 6.72 6.60 5.73 6.52 6.41 6.30
5.94 6.21 6.14 6.92 6.81 5.97 5.99 6.72 5.99 6.62 5.85
6.52 5.49 6.41 5.93 5.83 6.30 6.02 5.99 6.22 5.75 6.12
5.76 5.55 6.00 6.09 6.91 5.53 5.66 6.81 6.70 6.61 5.68
6.50 5.88 5.65 5.75 6.43 5.85 6.32 5.44 6.22 6.12 5.63
Grouped data
Example problem 1 – Sol.
R = XH − XL
= 6.95 − 5.44 = 1.51 kg = 1510 g
$n = \sqrt{N} = \sqrt{110} = 10.49 \approx 11$
n = R/i
11 = 1510/i
i = 1510/11 = 137.3 g ≈ 0.14 kg
Grouped data
Example problem 1 – Sol.
Group/cell (kg) | Frequency fi
5.44 – 5.57 | 7
5.58 – 5.71 | 12
5.72 – 5.85 | 12
5.86 – 5.99 | 24
6.00 – 6.13 | 13
6.14 – 6.27 | 6
6.28 – 6.41 | 9
6.42 – 6.55 | 6
6.56 – 6.69 | 5
6.70 – 6.83 | 10
6.84 – 6.97 | 6
Total | 110
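The cell frequencies can be checked by machine. A minimal Python sketch, assuming the 110 weights are typed in from the data table: weights are converted to whole hundredths of a kilogram so that the cell boundaries (first lower boundary 5.44 kg, interval 0.14 kg, 11 cells) are exact integers and floating-point rounding cannot push a value into the wrong cell.

```python
# Weights (kg) of the 110 oil bottles in Example problem 1, entered row by row
weights = [
    6.00, 5.98, 6.01, 6.01, 5.97, 5.99, 5.98, 6.01, 5.99, 5.98, 5.96,
    5.88, 5.99, 5.69, 6.93, 5.79, 6.81, 5.90, 5.60, 5.70, 6.71, 5.80,
    5.77, 6.61, 6.50, 5.96, 6.40, 5.47, 5.55, 5.69, 5.79, 6.31, 6.20,
    6.11, 6.03, 6.01, 5.99, 5.59, 6.92, 6.80, 5.68, 6.71, 5.78, 5.55,
    6.60, 5.58, 6.95, 6.50, 6.40, 5.58, 5.92, 6.30, 5.90, 6.20, 6.10,
    6.90, 5.90, 6.80, 5.94, 5.99, 6.72, 6.60, 5.73, 6.52, 6.41, 6.30,
    5.94, 6.21, 6.14, 6.92, 6.81, 5.97, 5.99, 6.72, 5.99, 6.62, 5.85,
    6.52, 5.49, 6.41, 5.93, 5.83, 6.30, 6.02, 5.99, 6.22, 5.75, 6.12,
    5.76, 5.55, 6.00, 6.09, 6.91, 5.53, 5.66, 6.81, 6.70, 6.61, 5.68,
    6.50, 5.88, 5.65, 5.75, 6.43, 5.85, 6.32, 5.44, 6.22, 6.12, 5.63,
]

# Work in whole hundredths of a kg so that boundaries are exact integers
hundredths = [round(w * 100) for w in weights]
first_lower = 544   # 5.44 kg, the smallest observation
interval = 14       # 0.14 kg
n_cells = 11

counts = [0] * n_cells
for h in hundredths:
    counts[(h - first_lower) // interval] += 1   # index of the cell holding h

for k, count in enumerate(counts):
    low = first_lower + k * interval
    high = low + interval - 1                    # last hundredth inside the cell
    print(f"{low / 100:.2f} - {high / 100:.2f}: {count}")
```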
Histogram of oil bottle weights (frequency vs. weight in kg, one bar per cell)
Grouped data
Example problem 2
The relative strengths of 150 silver solder welds are tested, and the results are given in the table. Determine the cell interval and the approximate number of cells. Make a table showing cell midpoints, cell boundaries, and observed frequencies. Plot a frequency histogram.
Grouped data
Example problem 2
1.5 1.2 3.1 1.3 0.7 1.3 3.4 1.3 1.7 2.6 1.1 0.8
0.1 2.9 1.0 1.3 2.6 1.7 1.0 1.5 2.2 3.0 2.0 1.8
0.3 0.7 2.4 1.5 0.7 2.1 2.9 2.5 2.0 3.0 1.5 1.3
3.5 1.1 0.7 0.5 1.6 1.4 2.2 1.0 1.7 3.1 2.7 2.3
1.7 3.2 3.0 1.7 2.8 2.2 0.6 2.0 1.4 3.3 2.2 2.9
1.8 2.3 3.3 3.1 3.3 2.9 1.6 2.3 3.3 2.0 1.6 2.7
2.2 1.2 1.3 1.4 2.3 2.5 1.9 2.1 3.4 1.5 0.8 2.2
3.1 2.1 3.5 1.4 2.8 2.8 1.8 2.4 1.2 3.7 1.3 2.1
1.5 1.9 2.0 3.0 0.9 3.1 2.9 3.0 2.1 1.8 1.1 1.4
1.9 1.7 1.5 3.0 2.6 1.0 2.8 1.8 1.8 2.4 2.3 2.2
2.9 1.8 1.4 1.4 3.3 2.4 2.1 1.2 1.4 1.6 2.4 2.1
1.8 2.1 1.6 0.9 2.1 1.5 2.0 1.1 3.8 1.3 1.3 1.0
0.9 2.9 2.5 1.6 1.2 2.4
Grouped data
R = XH − XL
= 3.8 − 0.1 = 3.7
$n = \sqrt{N} = \sqrt{150} = 12.25 \approx 13$
n = R/i
13 = 3.7/i
i = 3.7/13 = 0.28 ≈ 0.3
Cell boundaries (upper boundary excluded) | Midpoint xi | Frequency fi
0.1 – 0.4 | 0.25 | 2
0.4 – 0.7 | 0.55 | 2
0.7 – 1.0 | 0.85 | 9
1.0 – 1.3 | 1.15 | 14
1.3 – 1.6 | 1.45 | 25
1.6 – 1.9 | 1.75 | 20
1.9 – 2.2 | 2.05 | 18
2.2 – 2.5 | 2.35 | 18
2.5 – 2.8 | 2.65 | 8
2.8 – 3.1 | 2.95 | 17
3.1 – 3.4 | 3.25 | 11
3.4 – 3.7 | 3.55 | 4
3.7 – 4.0 | 3.85 | 2
Total | | 150
Histogram of strength of silver solder welds (frequency vs. strength, bars at the cell midpoints 0.25 to 3.85)
Uses of Histogram
❑ The histogram describes the variation in the process. It is used to:
1. Determine the process capability,
2. Compare with specifications,
3. Suggest the shape of the population, and
4. Indicate discrepancies in data such as gaps.
Fig. 6 Characteristics of Frequency Distribution Graphs
❑ A smooth curve represents a population frequency distribution, whereas the histogram represents a sample frequency distribution.
[Figure panels: symmetrical (normal), skewed to the right, skewed to the left, bimodal, peaked, flat]
Characteristics of Frequency Distribution Graphs
Frequency distribution graphs provide a basis for decision making without further analysis. They have certain identifiable characteristics:
❑ Symmetry or lack of symmetry of the data: are the data equally distributed on each side of the central value, or are they skewed to the right or to the left?
❑ Number of modes or peaks in the data.
❑ Location of the data.
❑ Spread of the data (quite peaked or flat).
Figure 7 Differences due to location, spread, and shape
Analysis of Histograms
❑ Analysis of a histogram can provide information concerning specifications.
❑ Fig. 8 shows a histogram of the percent wash concentration in a steel tube cleaning operation prior to painting.
❑ No complex statistics are needed to show that corrective action is needed to narrow the spread of the distribution and bring it closer to the ideal value of 1.6%.
❑ Concentrations less than 1.45% produce poor quality, while concentrations greater than 1.75% are costly and therefore reduce productivity.
Fig. 8 Histogram of wash concentration (frequency vs. wash concentration, 0.7% to 2.8%, with the ideal value marked)
Interpreting Histograms
Fig. 9 Histogram Shapes
❑ Normal. Many measured characteristics follow a normal distribution. The histogram is bell-shaped. The normal distribution is so common that if the histogram is not bell-shaped, we should ask ourselves "why not?"
❑ Bimodal (or multimodal). These histograms have two (bimodal) or more (multimodal) peaks. Such histograms result when the data come from two or more distributions. For example, if the data came from different suppliers, machines, shifts, and so on, a bimodal (or multimodal) histogram will signal large differences due to these causes.
❑ Empty interval. In this case, one of the intervals has zero frequency. This may result from unfair or biased data collection.
❑ Positive skew. Positive skew means a long tail to the right. This is common when successful efforts are being made to minimize the measured value. Also, the sample variance has a positively skewed distribution.
❑ Negative skew. Negative skew means a long tail to the left. This is common when successful efforts are being made to increase the measured value. Such a histogram may also result if sorting is taking place.
❑ Uniform. This histogram looks more like a rectangular distribution. Such a histogram can result if the process mean is not in control, as in the case when tool wear is taking place.
❑ Outlier. Here one or more cells are greatly separated from the main body of the histogram. Such observations are often the result of measurement errors or other mistakes.
Test for normality
Histogram
❑ Visual examination of a histogram developed from a large amount of data will give an indication of the underlying population distribution.
❑ If a histogram is unimodal, symmetrical, and tapers off at the tails, normality is a definite possibility, and this may be sufficient information in many practical situations.
❑ The larger the sample size, the better the judgment of normality.
❑ A minimum sample size of 50 is recommended.
Analytical Techniques
Measures of Central Tendency
A measure of central tendency is a numerical value that describes the central position of the data, or how the data tend to build up around the center.
There are 3 measures in common use:
1- The average
2- The median
3- The mode
1- Average of ungrouped data: the average is the most commonly used measure, especially with symmetrical distributions. It is the sum of the observations divided by their number.
Ungrouped data:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
Grouped data:
$\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{\sum_{i=1}^{h} f_i}$
where
X̄ = average
n = number of observed values
Xi = observed values (ungrouped data) or cell midpoints (grouped data)
h = number of cells
fi = frequency of the ith cell
Ungrouped data - Example
❑ The resistance values of 5 coils in Ω are
1. x1 = 3.35
2. x2 = 3.37
3. x3 = 3.28
4. x4 = 3.34
5. x5 = 3.30
$\bar{X} = \frac{3.35 + 3.37 + 3.28 + 3.34 + 3.30}{5} = \frac{16.64}{5} = 3.33$ Ω
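The ungrouped average is a one-liner in most languages. A minimal Python sketch using the standard-library `statistics.mean` on the five coil readings:

```python
from statistics import mean

resistances = [3.35, 3.37, 3.28, 3.34, 3.30]  # ohms, coils 1-5
average = mean(resistances)                   # sum of observations / number of observations
print(f"{average:.2f} ohm")                   # 3.33 ohm
```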
2- Average of grouped data: when the data have been grouped in a frequency distribution, the formula is
$\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{\sum_{i=1}^{h} f_i}$
where X̄ = average
Xi = midpoints of cells
h = number of cells
fi = frequency of the ith cell
Grouped data – Example 1
❑ Given the frequency distribution of the life of 320 automotive tires in 1000 km, as shown in Table 4, determine the average.
Solution
$\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{\sum_{i=1}^{h} f_i} = \frac{11549}{320} = 36.1$ (in 1000 km) = 36,100 km
Table 4 - Frequency Distribution of the Life of 320 Tires in 1000 km
Group | Midpoint xi | Frequency fi | Computation fi·xi
23.6 – 26.5 | 25 | 4 | 100
26.6 – 29.5 | 28 | 36 | 1008
29.6 – 32.5 | 31 | 51 | 1581
32.6 – 35.5 | 34 | 63 | 2142
35.6 – 38.5 | 37 | 58 | 2146
38.6 – 41.5 | 40 | 52 | 2080
41.6 – 44.5 | 43 | 34 | 1462
44.6 – 47.5 | 46 | 16 | 736
47.6 – 50.5 | 49 | 6 | 294
Total | | 320 | 11549
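As a cross-check on the Computation column, a minimal Python sketch of the grouped-average formula applied to Table 4 (midpoints and frequencies typed in from the table). The products sum to 11549, the numerator used in the solution:

```python
# Table 4: cell midpoints (in 1000 km) and frequencies for 320 tires
midpoints = [25, 28, 31, 34, 37, 40, 43, 46, 49]
freqs = [4, 36, 51, 63, 58, 52, 34, 16, 6]

sum_fx = sum(f * x for f, x in zip(freqs, midpoints))  # sum of fi * xi
sum_f = sum(freqs)                                     # sum of fi (number of tires)
grouped_average = sum_fx / sum_f

print(sum_fx, sum_f, round(grouped_average, 1))        # 11549 320 36.1
```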
Grouped data – Example 2
❑ The weights of 65 castings are distributed as follows:
Midpoint xi | Frequency fi
3.5 | 6
3.8 | 9
4.1 | 18
4.4 | 14
4.7 | 13
5.0 | 5
1. Determine the average.
2. Plot a frequency histogram.
3. Evaluate the production process if the specifications are 4.25 ± 0.60 kg.
Grouped data – Example 2 – Sol
❑ Compute the column fi·xi
Midpoint xi | Frequency fi | Computation fi·xi
3.5 | 6 | 21
3.8 | 9 | 34.2
4.1 | 18 | 73.8
4.4 | 14 | 61.6
4.7 | 13 | 61.1
5.0 | 5 | 25
Total | 65 | 276.7
$\bar{X} = \frac{\sum_{i=1}^{6} f_i X_i}{\sum_{i=1}^{6} f_i} = \frac{276.7}{65} = 4.26$ kg
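Frequency histogram of casting weights (frequency vs. weight, bars at 3.5 to 5.0 kg; lower specification 3.65 kg, nominal 4.25 kg, and upper specification 4.85 kg marked)
The process is centered and controlled, but not capable: the 3.5 kg and 5.0 kg cells fall outside the specification limits.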
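The average and the specification check for the castings can be sketched the same way. This Python version assumes, as a simplification of our own, that every casting in a cell sits at the cell midpoint, so a whole cell counts as out of specification when its midpoint lies outside 4.25 ± 0.60 kg.

```python
midpoints = [3.5, 3.8, 4.1, 4.4, 4.7, 5.0]   # kg
freqs = [6, 9, 18, 14, 13, 5]

n = sum(freqs)                                            # 65 castings
average = sum(f * x for f, x in zip(freqs, midpoints)) / n

lsl, usl = 4.25 - 0.60, 4.25 + 0.60                       # 3.65 kg and 4.85 kg
# Simplification: treat every casting in a cell as if it were at the midpoint
outside = sum(f for f, x in zip(freqs, midpoints) if x < lsl or x > usl)

print(f"average = {average:.2f} kg")
print(f"{outside} of {n} castings fall in cells outside {lsl:.2f}-{usl:.2f} kg")
```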
2- The median: the value that divides a series of ordered observations so that the number of items above it equals the number below it. It is an effective measure for skewed distributions.
3- The mode: the value that occurs with the greatest frequency.
❑ A series of numbers is referred to as unimodal if it has one mode,
❑ bimodal if it has two modes, and
❑ multimodal if it has more than two modes.
The Median
Example 1. Find the median distance for the following data.
85, 125, 130, 65, 100, 70, 75, 50, 140, 95, 70
Sol.
Ordered data: 50, 65, 70, 70, 75, 85, 95, 100, 125, 130, 140
There is a single middle value (the 6th of 11 values), so
Median = 85
The Median
Example 2. Find the median distance for the following data.
85, 125, 130, 65, 100, 70, 75, 50, 140, 135, 95, 70
Sol.
Ordered data: 50, 65, 70, 70, 75, 85, 95, 100, 125, 130, 135, 140
There are two middle values (85 and 95), so take their mean:
Median = (85 + 95)/2 = 90
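Both median examples follow one rule: sort, then take the middle value or the mean of the two middle values. A minimal Python sketch of that rule (the function name `median_of` is hypothetical), checked against the standard-library `statistics.median`:

```python
from statistics import median

def median_of(values):
    """Middle value of the ordered data (odd n), or the mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

example1 = [85, 125, 130, 65, 100, 70, 75, 50, 140, 95, 70]
example2 = [85, 125, 130, 65, 100, 70, 75, 50, 140, 135, 95, 70]

print(median_of(example1), median_of(example2))   # 85 90.0
print(median(example1), median(example2))         # the standard library agrees: 85 90.0
```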
The Median – Grouped data
Example 1: Finding the median of the given grouped data
x | f | Cumulative frequency
19 | 13 | 13
21 | 15 | 13 + 15 = 28
23 | 20 | 28 + 20 = 48
25 | 18 | 48 + 18 = 66
27 | 16 | 66 + 16 = 82
29 | 17 | 82 + 17 = 99
31 | 23 | 99 + 23 = 122
Here, the total frequency is N = ∑f = 122.
N/2 = 122/2 = 61
The median is the (N/2)th value, i.e., the 61st value.
The 61st value falls in the row whose cumulative frequency is 66 (the first cumulative frequency that reaches 61), whose corresponding x value is 25.
Hence, the median = 25.
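Locating the (N/2)th value amounts to a search over the running totals. A minimal Python sketch using `itertools.accumulate` and `bisect.bisect_left` from the standard library:

```python
from bisect import bisect_left
from itertools import accumulate

x = [19, 21, 23, 25, 27, 29, 31]
f = [13, 15, 20, 18, 16, 17, 23]

cumulative = list(accumulate(f))   # [13, 28, 48, 66, 82, 99, 122]
N = cumulative[-1]                 # total frequency, 122
position = N / 2                   # the (N/2)th value, 61

# First row whose cumulative frequency reaches the (N/2)th position
k = bisect_left(cumulative, position)
median_value = x[k]
print(median_value)                # 25
```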
Example 2: Finding the median of the given grouped data
The following table shows the number of hours per day spent watching TV in a sample of 500 people. What is the median number of TV viewing hours?
Hours | 0-1 | 2-3 | 4-5 | 6-7 | 8-9 | 10-11 | 12-13
Frequency | 55 | 87 | 145 | 90 | 73 | 35 | 15
Solution
First we construct a frequency table like the following:
Interval | Midpoint (x) | Frequency (f) | Cumulative frequency | fx
0-1 | 0.5 | 55 | 55 | 55(0.5) = 27.5
2-3 | 2.5 | 87 | 142 | 87(2.5) = 217.5
4-5 | 4.5 | 145 | 287 | 145(4.5) = 652.5
6-7 | 6.5 | 90 | 377 | 90(6.5) = 585
8-9 | 8.5 | 73 | 450 | 73(8.5) = 620.5
10-11 | 10.5 | 35 | 485 | 35(10.5) = 367.5
12-13 | 12.5 | 15 | 500 | 15(12.5) = 187.5
Total | | 500 | | 2658
The middle data value is between values 250 and 251. Both these data values occur in the "4-5" interval (look at the Cumulative frequency column). So the median is 4.5, the midpoint of the "4-5" interval.
Calculating the median using a formula
$\text{Median} = L + \frac{N/2 - F_1}{F_2 - F_1} \times i$
where
L ..... lower limit of the median group
i ...... group interval
N ..... total frequency
F1 .... the nearest cumulative frequency before the (N/2)th value
F2 .... the nearest cumulative frequency after the (N/2)th value
The (N/2)th value lies in the group (4-5), which is the median group.
So, L = 4
i = 2
N = 500
F1 = 142
F2 = 287
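Substituting these values into the interpolation formula Median = L + (N/2 − F1)/(F2 − F1) × i, where F2 − F1 is the frequency of the median group, gives the estimate below. A minimal Python sketch; the function name `grouped_median` is hypothetical, and the inputs are exactly the values listed (L = 4, i = 2). This interpolated estimate differs from the 4.5 obtained by simply taking the midpoint of the median group.

```python
def grouped_median(L, i, N, F1, F2):
    """Median = L + (N/2 - F1) / (F2 - F1) * i.

    L  : lower limit of the median group
    i  : group interval
    N  : total frequency
    F1 : cumulative frequency before the median group
    F2 : cumulative frequency through the median group (F2 - F1 = its frequency)
    """
    return L + (N / 2 - F1) / (F2 - F1) * i

m = grouped_median(L=4, i=2, N=500, F1=142, F2=287)
print(round(m, 2))   # 5.49
```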
The Mode
❑ The mode of a set of data is the value in the set that occurs most often.
❑ A set of data can be bimodal. It is also possible to have a set of data with no mode.
❑ Bimodal: 15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26
❑ Unimodal: 6, 8, 9, 9, 9, 10, 11, 14, 15, 18
❑ No mode: 2.7, 3.5, 4.9, 5.1, 8.3
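A small Python sketch that classifies the three example series. The standard library's `statistics.multimode` returns every value when none repeats, so this sketch (function name `modes` is hypothetical) adds an explicit no-mode rule of our own to match the "no mode" example:

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often; an empty list means no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:                 # every value occurs once, so there is no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26]))   # [18, 24]  bimodal
print(modes([6, 8, 9, 9, 9, 10, 11, 14, 15, 18]))            # [9]       unimodal
print(modes([2.7, 3.5, 4.9, 5.1, 8.3]))                      # []        no mode
```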
The Mode – Grouped data
For grouped data, the group mode (or modal group) is the group with the highest frequency.
To find the mode for grouped data, use the following formula:
$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$
where:
L ......... is the lower boundary of the group mode
Δ1 ........ is the difference between the frequency of the group mode and the frequency of the group before it
Δ2 ........ is the difference between the frequency of the group mode and the frequency of the group after it
i ......... is the group width
Example: Based on the grouped data below, find the mode.
Time to travel to work | Frequency
1 – 11 | 8
11 – 21 | 14
21 – 31 | 12
31 – 41 | 9
41 – 51 | 7
Solution
Based on the table, the group mode is 11 – 21.
L = 11, Δ1 = 14 − 8 = 6, Δ2 = 14 − 12 = 2, i = 10
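Plugging these values into Mode = L + Δ1/(Δ1 + Δ2) × i can be sketched in Python as follows. The function name `grouped_mode` is hypothetical, and treating the frequency outside the table as 0 when the modal group is the first or last group is our own assumption:

```python
def grouped_mode(lowers, freqs, width):
    """Mode = L + d1 / (d1 + d2) * i for the group with the highest frequency."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])   # index of the modal group
    before = freqs[k - 1] if k > 0 else 0                # assumption: 0 outside the table
    after = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1 = freqs[k] - before
    d2 = freqs[k] - after
    return lowers[k] + d1 / (d1 + d2) * width

lowers = [1, 11, 21, 31, 41]   # lower boundaries of the groups
freqs = [8, 14, 12, 9, 7]
print(grouped_mode(lowers, freqs, width=10))   # 18.5
```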
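Figure 10 Relationship among average, median, and mode: in a symmetrical distribution the three coincide; in a positively skewed distribution the mode is smallest and the average largest, with the median between them; in a negatively skewed distribution the average is smallest and the mode largest, with the median between them.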
Measures of Dispersion
Introduction
❑ A second tool of statistics is composed of the measures of dispersion, which
describe how the data are spread out or scattered on each side of the central
value. Measures of dispersion and measures of central tendency are both
needed to describe a collection of data.
❑ Two common types of measures of dispersion:
1- Range
2- Standard deviation
1- Range
The range of a series of numbers is the difference between the largest and smallest values or observations. Symbolically, it is given by the formula
R = XH − XL
where R = range
XH = highest observation in a series
XL = lowest observation in a series
Example problem
❑ The weights of a sample of 10 bottles of shampoo are recorded as follows (in g):
150, 147, 152, 156, 144, 148, 149, 153, 146, 151
Determine the range of the sample.
Solution
Max weight XH = 156 g
Min weight XL = 144 g
Range R = XH − XL = 156 − 144 = 12 g
2- Standard deviation
❑ The standard deviation is a numerical value, in the units of the observed values, that measures the spreading tendency of the data.
❑ A large standard deviation shows greater variability of the data than does a small standard deviation. In symbolic terms it is given by the formula
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$ or $s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}}$
where s = sample standard deviation
Xi = observed value
X̄ = average
n = number of observed values
Example Problem 1
❑ Determine the standard deviation of the moisture content of a roll of kraft paper. The results of six readings across the paper web are 6.7, 6.0, 6.4, 5.9, 6.4, and 5.8%.
Solution
With ΣXi = 37.2 and ΣXi² = 231.26:
$s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}} = \sqrt{\frac{6(231.26) - (37.2)^2}{6(6-1)}} = 0.35\%$
Example Problem 2
❑ Four readings of the thickness of a paper are 0.076, 0.082, 0.073, and 0.077 mm. Determine the sample standard deviation.
Sol.
With ΣXi = 0.308 and ΣXi² = 0.023758:
$s = \sqrt{\frac{4(0.023758) - (0.308)^2}{4(4-1)}} = \sqrt{\frac{0.095032 - 0.094864}{12}} = \sqrt{\frac{0.000168}{12}} = \sqrt{0.000014} = 0.0037$ mm
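Both forms of the standard deviation formula can be checked against the two examples. A minimal Python sketch (the helper names are hypothetical), compared with the standard-library `statistics.stdev`, which also uses the n − 1 divisor. The computational form subtracts two nearly equal numbers, so with tightly clustered data it can lose precision in floating point; the definitional form avoids that.

```python
import math
from statistics import stdev

def sd_definition(xs):
    """s = sqrt( sum((Xi - Xbar)^2) / (n - 1) )"""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

def sd_computational(xs):
    """s = sqrt( (n * sum(Xi^2) - (sum Xi)^2) / (n * (n - 1)) )"""
    n = len(xs)
    return math.sqrt((n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1)))

moisture = [6.7, 6.0, 6.4, 5.9, 6.4, 5.8]   # %, Example Problem 1
thickness = [0.076, 0.082, 0.073, 0.077]    # mm, Example Problem 2

for data in (moisture, thickness):
    print(round(sd_definition(data), 4), round(sd_computational(data), 4), round(stdev(data), 4))
```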
Example Problem 3
Standard deviation for grouped data
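The worked solution for this problem did not survive in this copy. As a hedged sketch only: a commonly used grouped-data form of the computational formula is s = √[(n Σ fi Xi² − (Σ fi Xi)²) / (n(n − 1))], with Xi the cell midpoints and n = Σ fi. Below it is applied, purely for illustration, to the castings distribution from the average section (our choice of data, not the slide's); the function name `grouped_sd` is hypothetical.

```python
import math

def grouped_sd(midpoints, freqs):
    """s = sqrt((n * sum(f*X^2) - (sum(f*X))^2) / (n * (n - 1))), with n = sum(f)."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

# Castings data (kg) from Grouped data - Example 2
midpoints = [3.5, 3.8, 4.1, 4.4, 4.7, 5.0]
freqs = [6, 9, 18, 14, 13, 5]
print(round(grouped_sd(midpoints, freqs), 3))   # about 0.421 kg
```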
Relationship between the measures of dispersion (range and standard deviation)
❑ The range is useful when the amount of data is small.
❑ The standard deviation is used when a more precise measure of dispersion is desired (number of observations > 10).
❑ As shown in Fig. 11, two distributions may have the same average and range but different standard deviations. The distribution on the bottom is much better: its sample standard deviation is much smaller, which means better quality.
Fig. 11 Comparison of two distributions with equal average and range
Table 6 Analytical Techniques - Recap
Measures of central tendency
❑ Average: the arithmetic mean of a group of numbers.
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
Weighted average: $\bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$
❑ Median: the value that divides a series of ordered observations. It is an effective measure for skewed distributions. For an odd number of observations, the median is the single value located in the middle; for an even number of observations, it is the average of the two middle values.
❑ Mode: the value that occurs with the greatest frequency. The mode of a series of numbers varies from no mode to multimodal: a series is referred to as unimodal if it has one mode, bimodal if it has two modes, and multimodal if there are more than two modes.
Measures of dispersion
❑ Range: the difference between the largest and smallest values or observations. R = XH − XL
❑ Standard deviation: a numerical value, in the units of the observed values, that measures the spreading tendency of the data.
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}}$
Concept of a population and a sample
❑ A sample is selected to represent the population.
❑ Since the composition of samples will fluctuate, the computed statistics will be larger or smaller than their true population values (parameters).
❑ Sampling is necessary when measuring the entire population is:
- impossible
- too expensive
- destructive
- too dangerous
❑ Different symbols are used to distinguish sample statistics from population parameters.
Table 7 Comparison of sample and population
Sample statistic | Population parameter
X̄ average | µ (X0) mean
s sample standard deviation | σ (s0) standard deviation
Table 8 Results of 8 samples of green & blue spheres
Sample number | Sample size | No. of green spheres | No. of blue spheres | % of green spheres
1 | 10 | 1 | 9 | 10
2 | 10 | 2 | 8 | 20
3 | 10 | 5 | 5 | 50
4 | 10 | 1 | 9 | 10
5 | 10 | 3 | 7 | 30
6 | 10 | 0 | 10 | 0
7 | 10 | 2 | 8 | 20
8 | 10 | 1 | 9 | 10
Total | 80 | 15 | 65 | 18.8
Comparison of sample and population
❑ Table 8 shows the results of an experiment that illustrates the relationship between samples and the population.
❑ A container holds 800 blue and 200 green spheres. The 1000 spheres are considered the population, with 20% green spheres.
❑ Eight samples of 10 spheres each are selected; the spheres are checked for colour and replaced (one by one).
❑ The table illustrates the difference between the sample results and what should be expected from the known population.
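The experiment in Table 8 is easy to imitate in software. A minimal Python sketch, assuming a simulated container of 200 green and 800 blue spheres and drawing each sphere with replacement, as described. The seed is an arbitrary choice, so the individual percentages will differ from Table 8 while still scattering around the population value of 20%:

```python
import random

population = ["green"] * 200 + ["blue"] * 800   # 20% green
rng = random.Random(2024)                        # arbitrary fixed seed for repeatability

sample_percents = []
for _ in range(8):                               # 8 samples
    # draw 10 spheres one at a time, returning each before the next draw
    sample = [rng.choice(population) for _ in range(10)]
    sample_percents.append(100 * sample.count("green") / 10)

print(sample_percents)
print(sum(sample_percents) / len(sample_percents))   # overall % green across all 80 spheres
```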
The Normal Curve
❑ One type of population that is quite common is called the normal curve. The normal curve is a symmetrical, unimodal, bell-shaped distribution with the mean, median, and mode having the same value.
❑ A population curve or distribution is developed from a frequency histogram.
❑ As the sample size of a histogram gets larger and larger, the cell interval can be made smaller and smaller.
The Normal Curve
❑ When the sample size is quite large and the cell interval is very small, the histogram will take on the appearance of a smooth polygon or a curve representing the population.
❑ Much of the variation in nature and in industry follows the frequency distribution of the normal curve.
❑ A normal population curve of 1000 observations of the resistance (in ohms) of an electrical device, with population mean µ = 90 Ω and population standard deviation σ = 2 Ω, is shown in Figure 12. The interval between the dotted lines is equal to one standard deviation, σ.
Figure 12 The normal curve (µ = 90 Ω, σ = 2.0 Ω; frequency vs. X, with ticks at 84, 86, 88, 90, 92, 94, 96)
The standardized normal distribution
❑ All normal distributions of continuous variables can be converted to the standardized normal distribution (see Fig. 13) by using the standardized normal value Z.
❑ For example, consider the value of 92 Ω in Fig. 12, which is one standard deviation above the mean. Conversion to the Z value is
$Z = \frac{X_i - \mu}{\sigma} = \frac{92 - 90}{2} = +1$
Figure 13 The standardized normal distribution (µ = 0, σ = 1; Z axis from −3 to +3)
The standardized normal distribution
❑ Fig. 13 shows the standardized curve with its mean of zero and standard deviation of 1. The area under the curve is equal to 1.0, or 100%, and therefore can easily be used for probability calculations.
❑ A normal area table is provided as Table A in the appendix.
Relationship to the mean and standard deviation
❑ Fig. 14 shows three normal curves with different mean values and the same standard deviation. The only change is in location.
❑ Fig. 15 shows three normal curves with the same mean value but different standard deviations. The figure illustrates the principle that the larger the standard deviation, the flatter the curve, and the smaller the standard deviation, the more peaked the curve.
❑ The two parameters (mean and standard deviation) are independent.
Figure 14 Normal curves with different means but identical standard deviations (µ = 14, µ = 20, µ = 29; axis from 5 to 38)
Figure 15 Normal curves with different standard deviations but identical means (σ = 1.5, σ = 3, σ = 4.5; axis from 5 to 35)
Figure 16 Percent of items included between certain values of the standard deviation: 68.26% between µ ± 1σ, 95.46% between µ ± 2σ, and 99.73% between µ ± 3σ
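These percentages can be recomputed from the standard normal curve itself. A minimal Python sketch using `statistics.NormalDist` (Python 3.8+); small differences in the second decimal (for example 68.27% versus 68.26%) come from the rounding in four-place normal tables:

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

coverage = {}
for k in (1, 2, 3):
    # area between -k sigma and +k sigma
    coverage[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within +/-{k} sigma: {coverage[k] * 100:.2f}%")
```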
Applications
❑ The areas under the curve for various Z values are given in Table A in the appendix. Table A, "Areas under the Normal Curve," is a left-reading table, which means that the given areas are for the portion of the curve from −∞ to a particular value, Xi.
The first step is to determine the Z value using the formula
$Z = \frac{X_i - \mu}{\sigma}$
where Z = standard normal value
Xi = individual value
µ = mean
σ = population standard deviation
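Where a printed Table A is not at hand, the left-reading area can be computed directly. A minimal Python sketch using `statistics.NormalDist().cdf` (Python 3.8+) as a stand-in for Table A; `z_value` and `area_left` are hypothetical helper names, and Z is rounded to two places to mimic reading the table. The inputs are those of the example problems that follow:

```python
from statistics import NormalDist

def z_value(x, mu, sigma):
    """Z = (Xi - mu) / sigma"""
    return (x - mu) / sigma

def area_left(z):
    """Area under the standard normal curve from -infinity to z (left-reading, like Table A)."""
    return NormalDist().cdf(z)

# Cereal: fraction below 0.274 kg (mu = 0.297 kg, sigma = 0.024 kg)
z1 = round(z_value(0.274, 0.297, 0.024), 2)
below = area_left(z1)

# Cereal: fraction above 0.347 kg = total area 1 minus the left-reading area
z2 = round(z_value(0.347, 0.297, 0.024), 2)
above = 1 - area_left(z2)

# Line voltage: fraction between 116 V and 120 V (mu = 118.5 V, sigma = 1.20 V)
z_lo = round(z_value(116, 118.5, 1.20), 2)
z_hi = round(z_value(120, 118.5, 1.20), 2)
between = area_left(z_hi) - area_left(z_lo)

print(z1, round(below, 4))   # -0.96 0.1685
print(z2, round(above, 4))   # 2.08 0.0188
print(round(between, 4))     # 0.8756
```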
Example problem (1)
❑ The mean weight of a particular brand of cereal for the past year is 0.297 kg (10.5 oz), with a standard deviation of 0.024 kg. Assuming a normal distribution, find the percent of the data that falls below the lower specification limit of 0.274 kg. (Note: since the mean and standard deviation were determined from a large number of tests during the year, they are considered to be valid estimates of the population values.)
[Figure: normal curve with µ = 0.297 and σ = 0.024; Area1 is the region from −∞ to Xi = 0.274]
Example problem (solution)
$Z = \frac{X_i - \mu}{\sigma} = \frac{0.274 - 0.297}{0.024} = -0.96$
From Table A, for Z = −0.96, Area1 = 0.1685, or 16.85%.
Thus, 16.85% of the data are less than 0.274 kg.
Example problem (2)
❑ Using the data from the preceding problem, determine the percentage of the data that fall above 0.347 kg.
Sol.
❑ Since Table A is a left-reading table, the solution to this problem requires the use of the relationship Area1 + Area2 = AreaT = 1.0000.
❑ Therefore, Area2 is determined and subtracted from 1.0000 to obtain Area1.
[Figure: normal curve with µ = 0.297 and σ = 0.024; total area AreaT = 1.0000 from −∞ to +∞; Area2 lies to the left of Xi = 0.347 and Area1 to its right]
$Z = \frac{X_i - \mu}{\sigma} = \frac{0.347 - 0.297}{0.024} = +2.08$
From Table A, for Z = +2.08, Area2 = 0.9812.
Area1 = AreaT − Area2 = 1.0000 − 0.9812 = 0.0188, or 1.88%
Thus, 1.88% of the data are above 0.347 kg.
Example problem (3)
❑ A large number of tests of line voltage to home residences show a mean of 118.5 V and a population standard deviation of 1.20 V. Determine the percentage of data between 116 and 120 V.
❑ Since Table A is a left-reading table, the solution requires that the area to the left of 116 V be subtracted from the area to the left of 120 V. The graph and calculations show the technique.
[Figure: normal curve with µ = 118.5 and σ = 1.20; Area2 lies to the left of Xi = 116, Area3 to the left of Xi = 120, and Area1 between 116 and 120]
$Z_2 = \frac{X_i - \mu}{\sigma} = \frac{116 - 118.5}{1.20} = -2.08$    $Z_3 = \frac{X_i - \mu}{\sigma} = \frac{120 - 118.5}{1.20} = +1.25$
From Table A, for Z2 = −2.08, Area2 = 0.0188, and for Z3 = +1.25, Area3 = 0.8944.
Area1 = Area3 − Area2 = 0.8944 − 0.0188 = 0.8756, or 87.56%
Thus, 87.56% of the data are between 116 and 120 V.
Example problem (4)
❑ If it is desired to have 12.1% of the line voltage below 115 V, how should the mean voltage be adjusted? The
dispersion is σ = 1.20 V.
❑ The Solution to this type problem is the reverse of the other problems. First 12.1% or 0.1210, is found in the
body of table A. This give a Z value and using the formula for Z, we can solve for the mean voltage. Form Table A
with Area1 = 0.1210, the Z value of –1.17 is obtained.
[Figure: normal curve with σ = 1.20 V and unknown mean X0. Area1 = 0.1210 lies to the left of Xi = 115 V.]
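$$Z = \frac{X_i - X_0}{\sigma} \quad\Rightarrow\quad -1.17 = \frac{115 - X_0}{1.20} \quad\Rightarrow\quad X_0 = 115 + 1.17(1.20) = 116.4\ \text{V}$$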
❑ Thus, the mean voltage should be centered at 116.4 V for 12.1% of the values to be less than 115 V.
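The reverse lookup (area to Z, then Z to mean) can be sketched with the inverse CDF. This assumes `NormalDist().inv_cdf` is an acceptable replacement for searching the body of Table A, and the variable names are illustrative. The last line checks that the new mean really puts 12.1% of the values below 115 V.

```python
from statistics import NormalDist

target_area = 0.121   # desired fraction below the limit
limit = 115.0         # V
sigma = 1.20          # V

z = NormalDist().inv_cdf(target_area)   # reverse table lookup, about -1.17
mean = limit - z * sigma                # Z = (Xi - X0) / sigma solved for X0
check = NormalDist(mean, sigma).cdf(limit)
print(f"Z = {z:.2f}, required mean = {mean:.1f} V, fraction below 115 V = {check:.4f}")
```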
Example problem (5)
❑ The population mean weight of a company's racing bicycles is 9.07 kg, with a population standard deviation of 0.4 kg. If the distribution is approximately normal, determine:
❑ A) the % of bicycles less than 8.3 kg
❑ B) the % of bicycles greater than 10.00 kg
❑ C) the % of bicycles between 8.3 and 10.00 kg
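[Figure: normal curve with µ = 9.07 kg and σ = 0.4 kg. Area1 lies to the left of Xa = 8.3 kg, Area2 lies to the left of Xb = 10 kg, Area3 lies to the right of Xb, and Area4 lies between Xa and Xb.]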
Example problem 5 (solution)
$$Z_a = \frac{X_a - \mu}{\sigma} = \frac{8.3 - 9.07}{0.4} = -1.925 \qquad Z_b = \frac{X_b - \mu}{\sigma} = \frac{10 - 9.07}{0.4} = +2.325$$
a) Table A lists Z = −1.92 (area 0.0274) and Z = −1.93 (area 0.0268). Interpolating for Za = −1.925 gives Area1 ≈ 0.0271.
Then 2.71% of the bicycles weigh less than 8.3 kg.
b) For Zb = +2.325, Area2 ≈ 0.9899.
Then 98.99% of the bicycles weigh less than 10 kg, so the bicycles weighing more than 10 kg are
Area3 = 1 − 0.9899
= 0.0101 or 1.01%
c) Area4 = Area2 − Area1
= 0.9899 − 0.0271
= 0.9628 or 96.28%
Thus, 96.28% of the bicycles are between 8.3 and 10 kg.
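All three parts of this problem can be checked together. The following is a minimal sketch that assumes `statistics.NormalDist` in place of Table A. Since it evaluates the CDF at Z = −1.925 and +2.325 exactly, no interpolation between table rows is needed. The printed percentages should come out near 2.7%, 1.0%, and 96.3%.

```python
from statistics import NormalDist

bikes = NormalDist(mu=9.07, sigma=0.4)   # bicycle weights, kg
area1 = bikes.cdf(8.3)          # (a) fraction lighter than 8.3 kg
area2 = bikes.cdf(10.0)         # fraction lighter than 10 kg
area3 = 1.0 - area2             # (b) fraction heavier than 10 kg
area4 = area2 - area1           # (c) fraction between 8.3 and 10 kg
for label, a in (("a) < 8.3 kg", area1), ("b) > 10 kg", area3), ("c) 8.3-10 kg", area4)):
    print(f"{label}: {a:.2%}")
```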

Example problem (6)
❑ Plastic strips that are used in a sensitive electronic device are manufactured to a maximum specification of 305.70 mm and a minimum specification of 304.55 mm. If the strips are less than the minimum specification, they are scrapped; if they are greater than the maximum specification, they are reworked. The part dimensions are normally distributed with a population standard deviation of 0.25 mm. What % of the product is scrap? What % is rework? How can the process be centered to eliminate all but 0.1% of the scrap? What is the rework % then?
[Figure: normal curve with σ = 0.25 mm. Area1 lies to the left of Xmin = 304.55 mm, and Area2 lies to the left of Xmax = 305.70 mm.]
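Assuming the process is centered midway between the specifications:
$$\mu = \frac{X_{min} + X_{max}}{2} = \frac{304.55 + 305.70}{2} = 305.125\ \text{mm}$$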
$$Z_1 = \frac{X_{min} - \mu}{\sigma} = \frac{304.55 - 305.125}{0.25} = -2.30$$
From Table A it is found that for Z1 = – 2.3, Area1 = 0.0107
Thus, 1.07% of the strips are scrapped.
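$$Z_2 = \frac{X_{max} - \mu}{\sigma} = \frac{305.70 - 305.125}{0.25} = +2.30$$
From Table A it is found that for Z2 = +2.30, Area2 = 0.9893.
Thus, % of rework = 1 − 0.9893 = 0.0107 = 1.07%. Because the process is centered between the specifications, the rework percentage equals the scrap percentage.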
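The scrap and rework fractions for the centered process can be sketched as two tail areas. This sketch assumes `statistics.NormalDist` in place of Table A and a process mean exactly midway between the specifications. With the mean centered, both tails should come out at about 1.07%.

```python
from statistics import NormalDist

x_min, x_max, sigma = 304.55, 305.70, 0.25   # specifications and sigma, mm
mu = (x_min + x_max) / 2                      # process centered between the specs
strips = NormalDist(mu, sigma)
scrap = strips.cdf(x_min)           # below the minimum spec: scrapped
rework = 1.0 - strips.cdf(x_max)    # above the maximum spec: reworked
print(f"mean = {mu:.3f} mm, scrap = {scrap:.2%}, rework = {rework:.2%}")
```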
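To leave only 0.1% scrap, the area to the left of Xmin must be 0.0010. From Table A, an area of 0.0010 gives Z ≈ −3.09.
$$Z = \frac{X_{min} - \mu}{\sigma} \quad\Rightarrow\quad -3.09 = \frac{304.55 - \mu}{0.25} \quad\Rightarrow\quad \mu = 304.55 + 3.09(0.25) = 305.3225 \approx 305.32\ \text{mm}$$
The process should therefore be centered at about 305.32 mm. The rework percentage then becomes:
$$Z = \frac{X_{max} - \mu}{\sigma} = \frac{305.70 - 305.3225}{0.25} = +1.51$$
From Table A it is found that for Z = +1.51, area = 0.9345.
Thus, rework % = 1 − 0.9345 = 0.0655 = 6.55%. Moving the mean up to reduce scrap increases the rework.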
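The re-centering step can be sketched the same way: find Z for the target scrap fraction, solve for the mean, then evaluate the upper tail. This sketch assumes `NormalDist().inv_cdf` in place of the reverse Table A lookup, and the variable names are illustrative. The rework should come out near 6.55%.

```python
from statistics import NormalDist

x_min, x_max, sigma = 304.55, 305.70, 0.25   # mm
target_scrap = 0.001                          # 0.1 % of parts below x_min

z = NormalDist().inv_cdf(target_scrap)   # about -3.09
new_mu = x_min - z * sigma               # Z = (Xmin - mu) / sigma solved for mu
strips = NormalDist(new_mu, sigma)
rework = 1.0 - strips.cdf(x_max)         # upper tail beyond the max spec
print(f"Z = {z:.2f}, new mean = {new_mu:.2f} mm, rework = {rework:.2%}")
```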

