
Fundamentals of statistics

Definition of statistics

❑ A collection of quantitative data pertaining to any subject or group, especially when the data are systematically gathered and collated.

❑ The science that deals with collection, tabulation, analysis,

interpretation, and presentation of quantitative data.

Two phases of statistics

❑ Descriptive or deductive statistics:

Describes and analyzes a subject or group.
❑ Inductive statistics:
Determines, from a limited sample of data, important conclusions about the population.

Data collection

❑ Data may be collected by direct observation or indirectly through

written or verbal questions.

❑ Data that are collected for quality control purposes are collected by
direct observation and are classified as either variable or attribute.

Data Types


Variable (quantitative)
   - Continuous: measurable
   - Discrete: countable

Attribute (qualitative): countable
   - Binomial
   - Nominal
   - Ordinal

Data collection

❑ Variable data:
❑ Measurable. If capable of any degree of subdivision, it is referred to as continuous.
❑ Examples: weight, length, ...
❑ Variables that exhibit gaps are called discrete.
❑ Sometimes it is convenient for verbal or non numerical data to assume the
nature of a variable, e.g. the quality of a surface finish can be classified as
good (3), average (2), & poor (1).
❑ While many quality characteristics are stated in terms of variables, many
others must be stated as attributes.

Data collection

❑ Attributes:
❑ Are those quality characteristics that are classified as either
conforming or nonconforming, go / no go.
❑ Characteristics that are judged by visual observation are classified as attributes.
❑ Sometimes it is desirable for variables to be classified as attributes e.g.
the weight of a package may not be as important as if the weight is
within specs or not.

Data collection
❑ In data collection, the number of figures is a function of the intended use of the data.

❑ For example, for data on the life of light bulbs it is acceptable to record 995.6 h; 995.632 h is more precise than necessary.

❑ If your upper and lower specs are 9.58 and 9.52mm, then the data collected
should be to the nearest .01 mm.

❑ Your measuring instruments may not give a true reading because of problems
due to accuracy and precision.

Accuracy and precision

[Figure: accuracy and precision relative to the true value, showing the combinations accurate and precise, accurate but not precise, precise but not accurate, and neither]

A measurement system can be accurate but not precise, precise but not
accurate, neither, or both. For example, if an experiment contains
a systematic error, then increasing the sample size generally increases
precision but does not improve accuracy. The result would be a consistent
yet inaccurate set of results. Eliminating the systematic error improves accuracy but does
not change precision.

Describing the data
Sometimes so much data is collected that it is more confusing than helpful.
Consider the data shown in Table 1.

0 1 3 0 1
1 5 4 1 2
1 0 2 0 0
2 1 1 1 2
0 4 1 3 1
1 3 4 0 0
1 3 0 1 2

TABLE 1 Number of Daily Billing Errors.

Describing the data

Clearly these data, in this form, are difficult to use and do not effectively describe their own
characteristics. Some means of summarizing the data is needed to show what values the data
tend to cluster about and how the data are dispersed or spread out.
Two techniques are available to accomplish this summarization of data, graphical and analytical.
❑ The graphical technique is a plot or picture of a frequency distribution.
❑ Analytical techniques summarize data by computing a measure of central tendency and a
measure of the dispersion.

Sometimes both the graphical and analytical techniques are used.

Graphical Techniques

Frequency Distribution – Histograms

Ungrouped data

❑ Ungrouped data comprises a listing of the observed values as shown in Table 1. A

method of processing the data is necessary.
❑ A much better understanding can be obtained by tallying the frequency of each
value of Daily Billing Errors as shown in Table 2.
❑ The numerical value for the number of tallies is called the frequency.
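
The tallying step can be sketched in Python as follows. This is only an illustration: the variable names are ours, and `collections.Counter` stands in for the hand-drawn tally marks.

```python
from collections import Counter

# Table 1: number of daily billing errors (35 observations, read row by row).
billing_errors = [
    0, 1, 3, 0, 1,
    1, 5, 4, 1, 2,
    1, 0, 2, 0, 0,
    2, 1, 1, 1, 2,
    0, 4, 1, 3, 1,
    1, 3, 4, 0, 0,
    1, 3, 0, 1, 2,
]

# Counter records how many times each value occurs (its frequency).
frequency = Counter(billing_errors)

for value in sorted(frequency):
    # One "|" per observation plays the role of a tally mark.
    print(f"{value}  {'|' * frequency[value]:<13}  {frequency[value]}")
```

The printed frequencies can be compared line by line with the Frequency column of Table 2.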

Frequency Distribution – Histograms
Table 2 Tally of Number of Daily Billing Errors

Number Nonconforming    Tabulation         Frequency
0                       ||||| ||||         9
1                       ||||| ||||| |||    13
2                       |||||              5
3                       ||||               4
4                       |||                3
5                       |                  1

Frequency Distribution – Histograms
Ungrouped data

❑ If the "Tabulation" column is eliminated, the resulting table is classified as a frequency distribution,
and can be graphically presented as a histogram.
❑ A histogram consists of a set of rectangles that represent the frequency in each category as
shown in Fig. 1

Fig. 1 Frequency histogram (x-axis: number nonconforming, 0 to 5; y-axis: frequency)

Frequency Distribution – Histograms
Ungrouped data
❑ Other types of graphical presentation are the relative frequency distribution, the cumulative
frequency distribution, and the relative cumulative frequency distribution.
❑ Relative frequency is calculated by dividing the frequency for each data value by the total. These
calculations are shown in the 3rd column of Table 3 . Graphical presentation is shown in Fig. 2
❑ Cumulative frequency is calculated by adding the frequency of each data value to the sum of the
frequencies for the previous data values. These calculations are shown in the 4th column of Table 3
. Graphical presentation is shown in Fig. 3
❑ Relative cumulative frequency is calculated by dividing the cumulative frequency for each data
value by the total. These calculations are shown in the 5th column of Table 3 . Graphical
presentation is shown in Fig. 4
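
The three calculations described above can be sketched in a few lines of Python. The frequencies are those of the daily billing errors data; the variable names are illustrative only.

```python
from itertools import accumulate

# Frequency distribution of daily billing errors (values 0 to 5).
values = [0, 1, 2, 3, 4, 5]
frequency = [9, 13, 5, 4, 3, 1]
total = sum(frequency)

# Relative frequency: each frequency divided by the total.
relative = [f / total for f in frequency]
# Cumulative frequency: running sum of the frequencies.
cumulative = list(accumulate(frequency))
# Relative cumulative frequency: each cumulative frequency divided by the total.
relative_cumulative = [c / total for c in cumulative]

for row in zip(values, frequency, relative, cumulative, relative_cumulative):
    print("{:>2} {:>3} {:>5.2f} {:>3} {:>5.2f}".format(*row))
```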

Table 3 Relative Frequency Distributions of Data

Number           Frequency    Relative         Cumulative    Relative cumulative
nonconforming                 frequency        frequency     frequency
0                9            9/35 = 0.26      9             9/35 = 0.26
1                13           13/35 = 0.37     9+13 = 22     22/35 = 0.63
2                5            5/35 = 0.14      22+5 = 27     27/35 = 0.77
3                4            4/35 = 0.11      27+4 = 31     31/35 = 0.89
4                3            3/35 = 0.09      31+3 = 34     34/35 = 0.97
5                1            1/35 = 0.03      34+1 = 35     35/35 = 1.00
Total            35           1.00

Fig. 2 Relative frequency histogram (x-axis: number nonconforming, 0 to 5; y-axis: relative frequency)

Fig. 3 Cumulative frequency histogram (x-axis: number nonconforming, 0 to 5; y-axis: cumulative frequency)

Fig. 4 Relative cumulative frequency histogram (x-axis: number nonconforming, 0 to 5; y-axis: relative cumulative frequency)

Grouped data

Most data are continuous rather than discrete and require grouping
1. Collect data and construct a tally sheet.
2. Determine the range.
3. Determine the cell interval and the number of cells.
4. Determine the cell midpoints.
5. Post the cell frequency.
6. Construct the histogram

Grouped data

1. Collect data and construct a tally sheet.

- Individual observations are collected representing the data
- Determine minimum and maximum observations.
2. Determine the range.
• The range is the difference between the highest observed value and
the lowest observed value
• R = XH-XL
• XH = highest number
• XL = Lowest Number

Grouped data

3. Determine the cell interval and no. of cells.

❑ The cell interval is the distance between adjacent cell midpoints as shown in Fig. 5.

❑ The cell interval ( i ) and the numbers of cells (n) are interrelated by the formula,

n = R/i

❑ Since n and i are both unknown, a trial and error approach is used to find the interval that
will meet the following guidelines.

Grouped data
Guidelines to determine number of cells

❑ In general, the number of cells should be between 5 and 20.

❑ Use 5 to 9 cells when the number of observations is less than 100;

❑ Use 8 to 17 cells when the number of observations is between 100 and 500; and

❑ Use 15 to 20 cells when the number of observations is greater than 500.

Another method to determine the number of cells n:

$n = \sqrt{N}$

where N is the number of observations.

Fig. 5 Cell classification (labels: interval i, midpoint, boundary)

Grouped data
4. Determine the cell midpoints.
The cell midpoint is determined by using the formula
Mp = XL + i / 2
Where Mp is the midpoint of the cell
XL is the lower boundary of the cell
i is the cell interval
5. Post the cell frequency.
Cell frequency is the sum of frequencies of values within the cell boundaries. Make
a tally of the values
6. Construct the histogram

Grouped data
Example problem 1

A company that fills bottles of oil tries to maintain a specific weight of the product.
The table gives the weight of 110 bottles that were checked at random
intervals. Make a tally of these weights and construct a frequency histogram
( weight is in KGs )

Grouped data
Example problem 1
6.00 5.98 6.01 6.01 5.97 5.99 5.98 6.01 5.99 5.98 5.96

5.88 5.99 5.69 6.93 5.79 6.81 5.90 5.60 5.70 6.71 5.80

5.77 6.61 6.50 5.96 6.40 5.47 5.55 5.69 5.79 6.31 6.20

6.11 6.03 6.01 5.99 5.59 6.92 6.80 5.68 6.71 5.78 5.55

6.60 5.58 6.95 6.50 6.40 5.58 5.92 6.30 5.90 6.20 6.10

6.90 5.90 6.80 5.94 5.99 6.72 6.60 5.73 6.52 6.41 6.30

5.94 6.21 6.14 6.92 6.81 5.97 5.99 6.72 5.99 6.62 5.85

6.52 5.49 6.41 5.93 5.83 6.30 6.02 5.99 6.22 5.75 6.12

5.76 5.55 6.00 6.09 6.91 5.53 5.66 6.81 6.70 6.61 5.68

6.50 5.88 5.65 5.75 6.43 5.85 6.32 5.44 6.22 6.12 5.63

Grouped data
Example problem 1 – Sol.
$R = X_H - X_L = 6.95 - 5.44 = 1.51\ \text{kg} = 1510\ \text{g}$

$n = \sqrt{N} = \sqrt{110} = 10.49 \approx 11$

$n = R/i \;\Rightarrow\; 11 = 1510 / i \;\Rightarrow\; i = 1510 / 11 = 137.3\ \text{g} \approx 0.14\ \text{kg}$

Grouped data
Example problem 1 – Sol.

Group / cell     Frequency fi
5.44 – 5.57      7
5.58 – 5.71      12
5.72 – 5.85      12
5.86 – 5.99      24
6.00 – 6.13      13
6.14 – 6.27      6
6.28 – 6.41      9
6.42 – 6.55      6
6.56 – 6.69      5
6.70 – 6.83      10
6.84 – 6.97      6

Total            110
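
As a cross-check, the grouping steps (range, number of cells, cell interval, cell frequencies) can be sketched in Python using the 110 bottle weights. Working in integer hundredths of a kilogram and rounding both n and i upward are assumptions of this sketch, chosen to match the rounding used in the worked solution.

```python
import math

# Weights (kg) of the 110 oil bottles from Example problem 1.
weights = [
    6.00, 5.98, 6.01, 6.01, 5.97, 5.99, 5.98, 6.01, 5.99, 5.98, 5.96,
    5.88, 5.99, 5.69, 6.93, 5.79, 6.81, 5.90, 5.60, 5.70, 6.71, 5.80,
    5.77, 6.61, 6.50, 5.96, 6.40, 5.47, 5.55, 5.69, 5.79, 6.31, 6.20,
    6.11, 6.03, 6.01, 5.99, 5.59, 6.92, 6.80, 5.68, 6.71, 5.78, 5.55,
    6.60, 5.58, 6.95, 6.50, 6.40, 5.58, 5.92, 6.30, 5.90, 6.20, 6.10,
    6.90, 5.90, 6.80, 5.94, 5.99, 6.72, 6.60, 5.73, 6.52, 6.41, 6.30,
    5.94, 6.21, 6.14, 6.92, 6.81, 5.97, 5.99, 6.72, 5.99, 6.62, 5.85,
    6.52, 5.49, 6.41, 5.93, 5.83, 6.30, 6.02, 5.99, 6.22, 5.75, 6.12,
    5.76, 5.55, 6.00, 6.09, 6.91, 5.53, 5.66, 6.81, 6.70, 6.61, 5.68,
    6.50, 5.88, 5.65, 5.75, 6.43, 5.85, 6.32, 5.44, 6.22, 6.12, 5.63,
]

# Integer hundredths of a kilogram avoid floating-point edge effects at cell limits.
hundredths = [round(w * 100) for w in weights]

low, high = min(hundredths), max(hundredths)
data_range = high - low                        # R = XH - XL
n_cells = math.ceil(math.sqrt(len(weights)))   # n = sqrt(N), rounded up
interval = math.ceil(data_range / n_cells)     # i = R / n, rounded up to 0.01 kg

# Post each observation to the cell it falls in.
frequencies = [0] * n_cells
for value in hundredths:
    frequencies[(value - low) // interval] += 1

for k, f in enumerate(frequencies):
    lower = low + k * interval
    upper = lower + interval - 1
    print(f"{lower / 100:.2f} - {upper / 100:.2f}  {f}")
```

Each printed line gives the lower and upper limits of a cell and the number of observations falling inside it.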
Example problem 1 – Sol.
Histogram of oil bottle weights (x-axis: oil bottle weight in kg, one bar per cell; y-axis: frequency, highest bar at 24)

Grouped data
Example problem 2

The relative strengths of 150 silver solder welds are tested, and the results are
given in the table. Determine the cell interval and the approximate number of
cells. Make a table showing cell midpoints, cell boundaries, and observed
frequencies. Plot a frequency histogram.

Grouped data
Example problem 2
1.5 1.2 3.1 1.3 0.7 1.3 3.4 1.3 1.7 2.6 1.1 0.8

0.1 2.9 1.0 1.3 2.6 1.7 1.0 1.5 2.2 3.0 2.0 1.8

0.3 0.7 2.4 1.5 0.7 2.1 2.9 2.5 2.0 3.0 1.5 1.3

3.5 1.1 0.7 0.5 1.6 1.4 2.2 1.0 1.7 3.1 2.7 2.3

1.7 3.2 3.0 1.7 2.8 2.2 0.6 2.0 1.4 3.3 2.2 2.9

1.8 2.3 3.3 3.1 3.3 2.9 1.6 2.3 3.3 2.0 1.6 2.7

2.2 1.2 1.3 1.4 2.3 2.5 1.9 2.1 3.4 1.5 0.8 2.2

3.1 2.1 3.5 1.4 2.8 2.8 1.8 2.4 1.2 3.7 1.3 2.1

1.5 1.9 2.0 3.0 0.9 3.1 2.9 3.0 2.1 1.8 1.1 1.4

1.9 1.7 1.5 3.0 2.6 1.0 2.8 1.8 1.8 2.4 2.3 2.2

2.9 1.8 1.4 1.4 3.3 2.4 2.1 1.2 1.4 1.6 2.4 2.1

1.8 2.1 1.6 0.9 2.1 1.5 2.0 1.1 3.8 1.3 1.3 1.0

0.9 2.9 2.5 1.6 1.2 2.4

Grouped data
Example problem 2 – Sol.
$R = X_H - X_L = 3.8 - 0.1 = 3.7$

$h = \sqrt{N} = \sqrt{150} = 12.25 \approx 13$

$h = R/i \;\Rightarrow\; 13 = 3.7 / i \;\Rightarrow\; i = 3.7 / 13 \approx 0.3$

Example problem 2 – Sol.

Cell boundaries     Midpoint xi    Frequency fi
0.1 to < 0.4        0.25           2
0.4 to < 0.7        0.55           2
0.7 to < 1.0        0.85           9
1.0 to < 1.3        1.15           14
1.3 to < 1.6        1.45           25
1.6 to < 1.9        1.75           20
1.9 to < 2.2        2.05           18
2.2 to < 2.5        2.35           18
2.5 to < 2.8        2.65           8
2.8 to < 3.1        2.95           17
3.1 to < 3.4        3.25           11
3.4 to < 3.7        3.55           4
3.7 to < 4.0        3.85           2

Total                              150
Example problem 2 – Sol.
Histogram of strength of silver welds (x-axis: cell midpoints 0.25 to 3.85; y-axis: frequency)


Uses of Histogram

❑ The histogram describes the variation in the process. It is used to:

1. Determine the process capability,

2. Compare with specifications,

3. Suggest the shape of the population, and

4. Indicate discrepancies in data such as gaps.

❑ A smooth curve represents a population frequency distribution, whereas the histogram represents a sample frequency distribution.

Fig. 6 Characteristics of Frequency Distribution (panels: symmetrical, skewed to the right, skewed to the left, bimodal, peaked, flat)

Characteristics of Frequency Distribution Graphs
Provide a basis for decision making without further analysis.

Have certain identifiable characteristics:

❑ Symmetry or lack of symmetry of the data. Are the data equally distributed on each side of the central value, or
are the data skewed to the right or to the left?

❑ Number of modes or peaks to the data.

❑ Location of data.

❑ Spread of the data (quite peaked or flat).

Figure 7 Differences due to location, spread, and shape

Analysis of Histograms

❑ Analysis of a histogram can provide information concerning specifications.

❑ Fig. 8 shows a histogram for the % of wash concentration in a steel tube
cleaning operation prior to painting.
❑ No complex statistics are needed to show that corrective actions are needed
to bring the spread of the distribution closer to the ideal value of 1.6%.
❑ Concentrations less than 1.45% produce poor quality, while concentrations
more than 1.75% are costly and therefore reduce productivity

Fig. 8 Histogram of wash concentration (x-axis: wash concentration in %, 0.7 to 2.8; y-axis: frequency)

Interpreting Histogram

Fig. 9 Histogram Shapes

Interpreting Histogram
❑ Normal. Many measured characteristics follow a normal distribution . The
histogram is bell-shaped. Normal distribution is so common that if the
histogram is not bell shaped, we should ask ourselves “why not?”

❑ Bimodal (or Multimodal). These histograms have two (bimodal) or many

(multimodal) peaks. Such histograms result when the data come from two or
more distributions. For example, if the data came from different suppliers,
machines, shifts, and so on, a bimodal (or multimodal) histogram will signal
large differences due to these causes.

Interpreting Histogram

❑ Empty Interval. In this case, one of the intervals has zero frequency. This may
result from bias in data collection.

❑ Positive Skew. Positive skew means a long tail to the right. This is common
when successful efforts are being made to minimize the measured value. Also,
variance has a positively skewed distribution.

Interpreting Histogram

❑ Negative Skew. Negative skew means a long tail to the left. This is common
when successful efforts are being made to increase the measured value. Such
a histogram may also result if sorting is taking place.

❑ Uniform. This histogram looks more like a rectangular distribution. Such a

histogram can result if the process mean is not in control, as in the case when
tool wear is taking place.

Interpreting Histogram

❑ Outlier. Here one or more cells are greatly separated from the main body of
the histogram. Such observations are often the result of wrong measurement
or other mistakes.

Test for normality
❑ Visual examination of a histogram developed from a large amount of data will
give an indication of the underlying population distribution.

❑ If a histogram is unimodal, symmetrical, and tapers off at the tails, normality
is a definite possibility, and this may be sufficient information in many practical situations.

❑ The larger the sample size, the better the judgment of normality.

❑ A minimum sample size of 50 is recommended.

Analytical Techniques

Measures of Central Tendency

A measure of central tendency is a numerical value that describes the central position of the data,
or how the data tend to build up in the center.
There are 3 measures in common use:
1- The average
2- The median
3- The mode

Measures of Central Tendency

1- Average of ungrouped data: the most commonly used measure, especially with
symmetrical distributions. It is the sum of the observations divided by their
number.

Ungrouped data:

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \dots + X_n}{n}$

Grouped data:

$\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{\sum_{i=1}^{h} f_i}$

where
X̄ = average
n = number of observed values
Xi = observed values (ungrouped data) or midpoints of cells (grouped data)
h = number of cells
fi = frequency of the i-th cell

Ungrouped data - Example

❑ Resistance value of 5 coils in Ω are

1. x1 = 3.35
2. x2 = 3.37
3. x3 = 3.28
4. x4 = 3.34
5. x5 = 3.30

$\bar{X} = \frac{3.35 + 3.37 + 3.28 + 3.34 + 3.30}{5} = 3.33\ \Omega$
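
The same calculation as a minimal Python sketch (the variable names are illustrative):

```python
# Resistance values of the 5 coils, in ohms.
resistances = [3.35, 3.37, 3.28, 3.34, 3.30]

# Average = sum of the observations divided by their number.
average = sum(resistances) / len(resistances)
print(f"Average = {average:.2f} ohm")
```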

Measures of Central Tendency

2- Average of grouped data: When the data have been grouped in a frequency
distribution, the formula is as follows:

$\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{\sum_{i=1}^{h} f_i}$

where
X̄ = average
Xi = midpoints of cells
h = number of cells
fi = frequency of the i-th cell

Grouped data – Example 1

❑ Given the frequency distribution of the life of 320 automotive tires in
1000 km as shown in Table 4, determine the average.

$\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{\sum_{i=1}^{h} f_i} = \frac{11549}{320} = 36.1$ (in 1000 km) = 36 100 km

Table 4 Frequency Distribution of the life of 320 tires in 1000 km

Cell boundaries    Midpoint Xi    Frequency fi    Computation fi Xi
23.6 – 26.5        25             4               100
26.6 – 29.5        28             36              1008
29.6 – 32.5        31             51              1581
32.6 – 35.5        34             63              2142
35.6 – 38.5        37             58              2146
38.6 – 41.5        40             52              2080
41.6 – 44.5        43             34              1462
44.6 – 47.5        46             16              736
47.6 – 50.5        49             6               294

Total                             320             11549
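
The grouped-data average can be reproduced with a short Python sketch that sums fi Xi over the cells and divides by the total frequency. The midpoints and frequencies are those of Table 4; the names are illustrative.

```python
# Cell midpoints (in 1000 km) and cell frequencies for the 320 tires.
midpoints = [25, 28, 31, 34, 37, 40, 43, 46, 49]
frequencies = [4, 36, 51, 63, 58, 52, 34, 16, 6]

# Sum of f_i * X_i over all cells, and the total number of observations.
total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
n = sum(frequencies)

average = total_fx / n
print(f"sum f*x = {total_fx}, n = {n}, average = {average:.1f} (in 1000 km)")
```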

Grouped data – Example 2
❑ The weight of 65 castings is distributed as follows:
Midpoint Frequency

xi fi
3.5 6
3.8 9
4.1 18
4.4 14
4.7 13
5.0 5

1.Determine the average

2.Plot a frequency histogram
3.Evaluate the production process if specs are 4.25 ± 0.60 kg

Grouped data – Example 2 – Sol

❑ Compute the column fi Xi.

Midpoint Xi    Frequency fi    Computation fi Xi
3.5            6               21.0
3.8            9               34.2
4.1            18              73.8
4.4            14              61.6
4.7            13              61.1
5.0            5               25.0
Total          65              276.7

$\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{\sum_{i=1}^{h} f_i} = \frac{276.7}{65} = 4.26\ \text{kg}$

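Frequency histogram of casting weight (x-axis: weight in kg, cell midpoints 3.5 to 5.0; specification limits marked at 3.65 and 4.85 kg, target at 4.25 kg)

The process is centered and controlled, but not all of its output is within specification: the cells at 3.5 and 5.0 kg lie outside the limits of 3.65 and 4.85 kg.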

Measures of Central Tendency

2- The median: the value that divides a series of ordered observations so that the number of
observations above it equals the number below it. It is an effective measure for skewed distributions.

3. The mode: is the value that occurs with greatest frequency.

❑ A series of numbers is referred to as unimodal : if it has one mode
❑ Bimodal : if it has two modes
❑ Multimodal : if there are more than two modes

The Median

Example 1. Find the median distance for the following data.
85, 125, 130, 65, 100, 70, 75, 50, 140, 95, 70

Ordered data: 50, 65, 70, 70, 75, 85, 95, 100, 125, 130, 140

There is a single middle value (the 6th of 11), so Median = 85.

The Median

Example 2. Find the median distance for the following data.
85, 125, 130, 65, 100, 70, 75, 50, 140, 135, 95, 70

Ordered data: 50, 65, 70, 70, 75, 85, 95, 100, 125, 130, 135, 140

There are two middle values (85 and 95), so take their mean.
Median = 90
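
Both cases (odd and even number of observations) can be handled by one small function. This is a minimal sketch; the function name `median` is ours.

```python
def median(values):
    """Return the middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                          # single middle value
    return (ordered[middle - 1] + ordered[middle]) / 2  # mean of the two middle values


odd_case = [85, 125, 130, 65, 100, 70, 75, 50, 140, 95, 70]
even_case = [85, 125, 130, 65, 100, 70, 75, 50, 140, 135, 95, 70]

print(median(odd_case))
print(median(even_case))
```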

The Median – Grouped data
Example 1 :Finding median of the given grouped data
x f Cumulative
19 13 13
21 15 13 + 15 = 28
23 20 28 + 20 = 48
25 18 48 + 18 = 66
27 16 66 + 16 = 82
29 17 82 + 17 = 99
31 23 99 + 23 = 122

Here, the total frequency is N = ∑f = 122.

N/2 = 122 / 2 = 61
The median is the (N/2)th value, that is, the 61st value.
The 61st value falls in the row with cumulative frequency 66 (the first cumulative frequency to reach 61),
whose corresponding x value is 25.
Hence, the median = 25.
The Median – Grouped data
Example 2 : Finding median of the given grouped data

The following table shows the number of hours per day of

watching TV in a sample of 500 people:
What is the median number of TV viewing hours?

Hours 0-1 2-3 4-5 6-7 8-9 10-11 12-13

Frequency 55 87 145 90 73 35 15


First we construct a frequency table like the following:

The Median – Grouped data

Interval Midpoint (x) Frequency (f) Cumulative fx

0-1 0.5 55 55 55(0.5) = 27.5
2-3 2.5 87 142 87(2.5) = 217.5
4-5 4.5 145 287 145(4.5) = 652.5
6-7 6.5 90 377 6.5(90) = 585
8-9 8.5 73 450 8.5(73) = 620.5
10-11 10.5 35 485 10.5(35) = 367.5
12-13 12.5 15 500 12.5(15) = 187.5
Total 2658

The Median – Grouped data

The middle data value is between values 250 and 251.

Both these data values occur in the “4-5” interval (look at
the Cumulative Frequency column). So the median is 4.5,
the midpoint of the “4-5” interval.

The Median – Grouped data

Calculating the median using the formula

$\text{Median} = L + \frac{N/2 - F_1}{F_2 - F_1} \times i$

where
L = lower limit of the median group
i = group interval
N = total frequency
F1 = the nearest cumulative frequency before the (N/2)th value
F2 = the nearest cumulative frequency after the (N/2)th value
The Median – Grouped data

Calculating the median using formula

the (N/2)th value lies at the group ( 4-5) which

is the median group

So, L=4
i =2
N = 500
F1 = 142
F2 = 287
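
A minimal sketch of this calculation follows. It assumes the common linear-interpolation form Median = L + i (N/2 − F1) / (F2 − F1), which is consistent with the symbols defined above; the function name and argument names are illustrative.

```python
def grouped_median(lower_limit, interval, total, cum_before, cum_after):
    """Interpolate the median inside the median group.

    Assumes the linear-interpolation form
    Median = L + i * (N/2 - F1) / (F2 - F1).
    """
    return lower_limit + interval * (total / 2 - cum_before) / (cum_after - cum_before)


# Values from the TV viewing example: L = 4, i = 2, N = 500, F1 = 142, F2 = 287.
median_hours = grouped_median(4, 2, 500, 142, 287)
print(f"Median = {median_hours:.2f} hours")
```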

The Median – Grouped data

Calculating the median using the formula

$\text{Median} = 4 + \frac{250 - 142}{287 - 142} \times 2 = 4 + 1.49 = 5.49$ hours

The Mode

❑ The mode of a set of data is the value in the set that occurs most often.
❑ A set of data can be bimodal. It is also possible to have a set of data with no mode.
❑ Bimodal: 15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26
❑ Unimodal: 6, 8, 9, 9, 9, 10, 11, 14, 15, 18
❑ No mode: 2.7, 3.5, 4.9, 5.1, 8.3
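
The three cases can be distinguished with a small helper. This is a sketch only: the function name `modes` and the convention of returning an empty list when no value repeats are our assumptions.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no value repeats (no mode)."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


bimodal = [15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26]
unimodal = [6, 8, 9, 9, 9, 10, 11, 14, 15, 18]
no_mode = [2.7, 3.5, 4.9, 5.1, 8.3]

print(modes(bimodal), modes(unimodal), modes(no_mode))
```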

The Mode – Grouped data
For grouped data, the group mode (or modal group) is the group
with the highest frequency.
To find the mode for grouped data, use the following formula:

$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$

where
L = the lower boundary of the group mode
Δ1 = the difference between the frequency of the group mode and the frequency of the group before it
Δ2 = the difference between the frequency of the group mode and the frequency of the group after it
i = the group width
The Mode – Grouped data
Example: Based on the grouped data below, find the mode

Time to travel to work    Frequency
1 to < 11                 8
11 to < 21                14
21 to < 31                12
31 to < 41                9
41 to < 51                7

Based on the table, the group mode is 11 to < 21.
L = 11, Δ1 = 14 – 8 = 6, Δ2 = 14 – 12 = 2, i = 10

$\text{Mode} = 11 + \frac{6}{6 + 2} \times 10 = 18.5$
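
The grouped-mode calculation can be sketched as below, assuming the usual form Mode = L + i Δ1 / (Δ1 + Δ2). The function name and the treatment of a missing neighbouring group as frequency 0 are our assumptions.

```python
def grouped_mode(lower_boundaries, frequencies, width):
    """Estimate the mode of grouped data as L + width * d1 / (d1 + d2)."""
    k = frequencies.index(max(frequencies))  # position of the modal group
    before = frequencies[k - 1] if k > 0 else 0
    after = frequencies[k + 1] if k + 1 < len(frequencies) else 0
    d1 = frequencies[k] - before  # difference to the group before
    d2 = frequencies[k] - after   # difference to the group after
    return lower_boundaries[k] + width * d1 / (d1 + d2)


# Time to travel to work: lower boundaries of the groups and their frequencies.
lower_boundaries = [1, 11, 21, 31, 41]
frequencies = [8, 14, 12, 9, 7]

mode = grouped_mode(lower_boundaries, frequencies, width=10)
print(mode)
```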

Measures of Central Tendency

Figure 10 Relationship among average, median, and mode
Symmetrical: average, median, and mode coincide.
Positively skewed: mode < median < average.
Negatively skewed: average < median < mode.


Measures of Dispersion

❑ A second tool of statistics is composed of the measures of dispersion, which
describe how the data are spread out or scattered on each side of the central
value. Measures of dispersion and measures of central tendency are both
needed to describe a collection of data.
❑ Two common types of measures of dispersion:
1- Range
2- Standard deviation

Measures of Dispersion

1- Range
The range of a series of numbers is the difference between the largest
and smallest values or observations. Symbolically, it is given by
the formula.
R = XH – XL
Where R = range
XH = highest observation in a series
XL = lowest observation in a series

Example problem

❑ If the weights of a sample of 10 bottles of shampoo are recorded as

follows ( in gm).
150, 147, 152, 156, 144, 148, 149, 153, 146,151
Determine the range of sample
Max weight XH = 156 gm
Min weight XL = 144 gm
Range R = XH – XL
= 156 – 144 = 12 gm
Measures of Dispersion

2- Standard deviation
❑ The standard deviation is a numerical value, in the units of the observed values, that measures the
spreading tendency of the data.
❑ A large standard deviation shows greater variability of the data than does a small standard
deviation. In symbolic terms it is given by the formula

$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$   or   $s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}}$

where
s = sample standard deviation
Xi = observed value
X̄ = average
n = number of observed values

Example Problem 1

❑ Determine the standard deviation of the moisture content of a roll of Kraft paper. The results of six
readings across the paper web are 6.7, 6.0, 6.4, 5.9, 6.4, and 5.8%.

With ∑Xi = 37.2 and ∑Xi² = 231.26:

$s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}} = \sqrt{\frac{6(231.26) - (37.2)^2}{6(6 - 1)}} = 0.35\%$
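
Both forms of the sample standard deviation formula can be checked against each other with a short Python sketch. The function names are illustrative; the data are the six moisture readings.

```python
import math


def stdev_definition(values):
    """s = sqrt( sum (Xi - Xbar)^2 / (n - 1) )."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def stdev_shortcut(values):
    """s = sqrt( (n * sum Xi^2 - (sum Xi)^2) / (n * (n - 1)) )."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return math.sqrt((n * total_sq - total ** 2) / (n * (n - 1)))


moisture = [6.7, 6.0, 6.4, 5.9, 6.4, 5.8]  # percent moisture content
print(f"{stdev_definition(moisture):.2f}  {stdev_shortcut(moisture):.2f}")
```

The second form avoids computing the average first, which is why it is convenient for hand calculation.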
Example Problem 2

❑ Four readings of the thickness of a paper are 0.076, 0.082, 0.073, and 0.077 mm. Determine the
sample standard deviation.

With ∑Xi = 0.308 and ∑Xi² = 0.023758:

$s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}} = \sqrt{\frac{4(0.023758) - (0.308)^2}{4(4 - 1)}} = \sqrt{\frac{0.095032 - 0.094864}{12}} = \sqrt{\frac{0.000168}{12}} = \sqrt{0.000014} = 0.0037\ \text{mm}$



Example Problem 3
2-1 Standard deviation for grouped data

Relationship between the measures of
dispersion (range & standard deviation)

❑ The range is useful when the number of observations is small.

❑ The standard deviation is used when a more precise measure of dispersion is desired (number of
observations > 10).

❑ As shown in Fig. 11, two distributions may have the same average and range but different standard
deviations. The distribution on the bottom has a much smaller sample standard deviation, which
means better quality.

Fig. 11 Comparison of two distributions with
equal average and range

Table 6 Analytical Technique - Recap

Measures of Central Tendency

Average: the arithmetic mean of a group of numbers.
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \dots + X_n}{n}$
Weighted average: $\bar{X}_w = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$

Median: the value that divides a series of ordered observations. It is an effective measure for skewed distributions.
For an odd number of observations, the median is the single value located at the middle.
For an even number of observations, the median is the average of the two middle values.

Mode: the value that occurs with greatest frequency.
The mode of a series of numbers varies from no mode to multimodal.
A series of numbers is referred to as unimodal if it has one mode, bimodal if it has two modes, and multimodal if there are more than two modes.

Measures of Dispersion

Range: the difference between the largest and smallest values or observations.
R = XH – XL

Standard deviation: a numerical value in the units of the observed values that measures the spreading tendency of the data.
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$   or   $s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}}$

Concept of a population and a sample
❑ A sample is selected to represent the population.
❑ Since the composition of samples will fluctuate, the computed statistics will
be larger or smaller than their true population values (parameters).
❑ Sampling is necessary when measuring of the entire population is:
- impossible
- too expensive
- destructive
- too dangerous
❑ We use different symbols to differentiate between samples and population.

Table 7 Comparison of sample and population

Sample (statistic)               Population (parameter)
X̄   average                      µ (X̄o)   mean
s   sample standard deviation    σ (so)   standard deviation

Table 8 Results of 8 samples of green & blue spheres

Sample number   Sample size   No. of green spheres   No. of blue spheres   % of green spheres
1               10            1                      9                     10
2               10            2                      8                     20
3               10            5                      5                     50
4               10            1                      9                     10
5               10            3                      7                     30
6               10            0                      10                    0
7               10            2                      8                     20
8               10            1                      9                     10
Total           80            15                     65                    18.8

Comparison of sample and population

❑ Table 8 shows the results of an experiment that illustrates the relationship between samples
and the population.

❑ A container holds 800 blue and 200 green spheres . The 1000 spheres are considered the
population with 20% green spheres.

❑ 8 samples of size 10 spheres are selected, checked in colour and replaced ( one by one ).

❑ The table illustrates the difference between the sample results and what should be expected
from the known population.
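
The comparison can be made explicit with a short Python sketch that recomputes each sample statistic from the counts in Table 8 and sets it against the population parameter of 20% green. The variable names are illustrative.

```python
# Number of green spheres found in each of the 8 samples of size 10 (Table 8).
green_counts = [1, 2, 5, 1, 3, 0, 2, 1]
sample_size = 10
population_percent_green = 20.0  # 200 green spheres out of 1000

# Statistic for each sample: percentage of green spheres.
sample_percents = [100 * g / sample_size for g in green_counts]

# Statistic for all 80 sampled spheres taken together.
overall_percent = 100 * sum(green_counts) / (sample_size * len(green_counts))

for number, percent in enumerate(sample_percents, start=1):
    print(f"sample {number}: {percent:.0f}% green "
          f"(differs from the population by {percent - population_percent_green:+.0f} points)")
print(f"all samples combined: {overall_percent:.2f}% green")
```

Individual samples scatter widely around 20%, while the combined result lies much closer to the population value.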

The Normal Curve

❑ One type of population that is quite common is called the normal curve. The normal curve is a
symmetrical, unimodal, bell-shaped distribution with the mean, median, and mode having

the same value.

❑ A population curve or distribution is developed from a frequency histogram.

❑ As the sample size of a histogram gets larger and larger, the cell interval gets smaller and
smaller.

❑ When the sample size is quite large and the cell interval is very small, the histogram will
take on the appearance of a smooth polygon or a curve representing the population.

❑ Much of the variation in nature and in industry follows the frequency distribution of the normal
curve.

❑ A curve of the normal population of 1000 observations of the resistance in ohms of an electrical
device with population mean, µ, of 90 Ω and population standard deviation, σ, of 2 Ω is shown
in Figure 12. The interval between dotted lines is equal to one standard deviation, σ.

Figure 12 The normal curve (µ = 90 Ω, σ = 2.0 Ω; x-axis from −∞ to +∞, marked at 84, 86, 88, 90, 92, 94, and 96 Ω)

The standardized normal distribution

❑ All normal distributions of continuous variables can be converted to the standardized normal
distribution ( see fig. 13) by using the standardized normal value Z.

❑ For example, consider the value of 92 Ω in Fig. 12, which is one standard
deviation above the mean. Conversion to the Z value is

$Z = \frac{X_i - \mu}{\sigma} = \frac{92 - 90}{2} = +1$

Figure 13 The standardized normal distribution (µ = 0, σ = 1; x-axis: Z from −∞ to +∞, marked at −3 to +3)

The standardized normal distribution

❑ Fig. 13 shows the standardized curve with its mean of zero and standard deviation of 1. The area
under the curve is equal to 1.0 or 100% and therefore can easily be used for probability calculations.

❑ A normal area table is provided as Table A in the appendix

Relationship to the mean and standard deviation

❑ Fig. 14 shows three normal curves with different mean values and the same standard deviation.
The only change is in location.

❑ Fig. 15 shows three normal curves with the same mean value but different standard deviations.
The figure illustrates the principle that the larger the standard deviation, the flatter the curve,
and the smaller the standard deviation, the more peaked the curve.

❑ It is noted that the two parameters ( mean & standard deviation ) are independent.

Figure 14 Normal curves with different means but identical standard deviations (µ = 14, µ = 20, µ = 29)

Figure 15 Normal curves with different standard deviations but identical means (σ = 1.5, σ = 3, σ = 4.5)

Figure 16 Percent of items included between certain values of the standard deviation (x-axis: −3σ, −2σ, −1σ, µ, +1σ, +2σ, +3σ)


❑ The areas under the curve for various Z values are given in Table A in the appendix. Table A,
"Areas under the Normal Curve," is a left-reading table, which means that the given areas are
for that portion of the curve from −∞ to a particular value, Xi.
The first step is to determine the Z value using the formula

$Z = \frac{X_i - \mu}{\sigma}$

where
Z = standard normal value
Xi = individual value
µ = mean
σ = population standard deviation
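
Where a printed table is not at hand, the left-reading area can be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in for Table A; the helper names `z_value` and `area_left` are ours.

```python
from statistics import NormalDist


def z_value(x, mean, sigma):
    """Standard normal value Z = (Xi - mu) / sigma."""
    return (x - mean) / sigma


def area_left(z):
    """Area under the standardized normal curve from minus infinity to z."""
    return NormalDist().cdf(z)


# The 92-ohm value from the resistance example: mu = 90, sigma = 2.
z = z_value(92, 90, 2)
print(f"Z = {z:+.2f}, area to the left = {area_left(z):.4f}")
```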

Example problem (1)

❑ The mean value of the weight of a particular brand of cereal for the past year is 0.297 kg (10.5 oz) with
a standard deviation of 0.024 kg. Assuming a normal distribution, find the percent of the data that falls below
the lower specification limit of 0.274 kg. (Note: Since the mean and standard deviation were determined from a
large number of tests during the year, they are considered to be valid estimates of the population values.)

[Figure: normal curve with µ = 0.297 and σ = 0.024; Area1 is the area from −∞ to Xi = 0.274]

Example problem (solution)

$Z = \frac{X_i - \mu}{\sigma} = \frac{0.274 - 0.297}{0.024} = -0.96$

From Table A it is found that for Z = −0.96,

Area1 = 0.1685 or 16.85%

Thus, 16.85% of the data are less than 0.274 kg.

Example problem (2)

❑ Using the data from the preceding problem, determine the percentage of the data that fall above 0.347 kg.
❑ Since Table A is a left-reading table, the solution to this problem requires the use of the relationship Area1 +
Area2 = AreaT = 1.0000.
❑ Therefore, Area2 is determined and subtracted from 1.0000 to obtain Area1.

[Figure: normal curve with µ = 0.297 and σ = 0.024; Area2 lies to the left of Xi = 0.347, Area1 to the right, and AreaT = 1.0000]

Example problem (solution)

$Z_2 = \frac{X_i - \mu}{\sigma} = \frac{0.347 - 0.297}{0.024} = +2.08$

From Table A it is found that for Z2 = +2.08,

Area2 = 0.9812
Area1 = AreaT – Area2
= 1.0000 – 0.9812
= 0.0188 or 1.88%
Thus, 1.88% of the data are above 0.347 kg.

Example problem (3)
❑ A large number of tests of line voltage to home residences show a mean of 118.5 V and a population
standard deviation of 1.20 V. Determine the percentage of data between 116 and 120 V.
❑ Since Table A is a left-reading table, the solution requires that the area to the left of 116 V be
subtracted from the area to the left of 120 V. The graph and calculations show the technique.

[Figure: normal curve with µ = 118.5 and σ = 1.20; Area2 lies to the left of Xi = 116, Area3 to the left of Xi = 120, and Area1 between them]

Example problem (solution)

$Z_2 = \frac{X_i - \mu}{\sigma} = \frac{116 - 118.5}{1.20} = -2.08$

$Z_3 = \frac{X_i - \mu}{\sigma} = \frac{120 - 118.5}{1.20} = +1.25$

From Table A it is found that for Z2 = −2.08, Area2 = 0.0188, and for
Z3 = +1.25, Area3 = 0.8944.
Area1 = Area3 – Area2
= 0.8944 – 0.0188
= 0.8756 or 87.56%
Thus, 87.56% of the data are between 116 and 120 V.
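
The three kinds of area problem worked so far (below a value, above a value, between two values) can be sketched with one helper. `statistics.NormalDist` stands in for Table A, and Z is rounded to two decimals to mimic a table lookup; both choices, and the helper name `table_area`, are assumptions of this sketch.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # standardized normal distribution: mean 0, sigma 1


def table_area(x, mean, sigma):
    """Left-reading area for x, with Z rounded to two decimals as in a printed table."""
    z = round((x - mean) / sigma, 2)
    return STANDARD.cdf(z)


# Cereal weights: mean 0.297 kg, sigma 0.024 kg.
below = table_area(0.274, 0.297, 0.024)      # area to the left of 0.274 kg
above = 1 - table_area(0.347, 0.297, 0.024)  # total area minus the area to the left

# Line voltage: mean 118.5 V, sigma 1.20 V.
between = table_area(120, 118.5, 1.20) - table_area(116, 118.5, 1.20)

print(f"below 0.274 kg: {below:.4f}")
print(f"above 0.347 kg: {above:.4f}")
print(f"between 116 and 120 V: {between:.4f}")
```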
Example problem (4)

❑ If it is desired to have 12.1% of the line voltage below 115 V, how should the mean voltage be adjusted? The
dispersion is σ = 1.20 V.
❑ The solution to this type of problem is the reverse of the other problems. First, 12.1%, or 0.1210, is found in the
body of Table A. This gives a Z value, and using the formula for Z, we can solve for the mean voltage. From Table A
with Area1 = 0.1210, the Z value of −1.17 is obtained.

[Figure: normal curve with σ = 1.20 and unknown mean X̄0; Area1 = 0.1210 lies to the left of Xi = 115]

Example problem (solution)

$Z = \frac{X_i - \bar{X}_0}{\sigma} \;\Rightarrow\; -1.17 = \frac{115 - \bar{X}_0}{1.20} \;\Rightarrow\; \bar{X}_0 = 116.4\ \text{V}$

❑ Thus, the mean voltage should be centered at 116.4 V for 12.1% of the values to
be less than 115 V.
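
The reverse lookup can be sketched with the inverse of the normal area function. `NormalDist.inv_cdf` from the Python standard library plays the role of reading Z from the body of Table A; the function name `required_mean` is ours.

```python
from statistics import NormalDist


def required_mean(limit, fraction_below, sigma):
    """Mean that leaves the given fraction of a normal population below the limit."""
    z = NormalDist().inv_cdf(fraction_below)  # Z whose left-hand area equals the fraction
    # Z = (limit - mean) / sigma, solved for the mean.
    return limit - z * sigma


mean_voltage = required_mean(limit=115, fraction_below=0.1210, sigma=1.20)
print(f"Required mean = {mean_voltage:.1f} V")
```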

Example problem (5)
❑ The population mean weight of a company's racing bicycle is 9.07 kg with a population standard deviation
of 0.4 kg. If the distribution is approximately normal, determine
❑ A) the % of bicycles less than 8.3 kg
❑ B) the % of bicycles greater than 10.00 kg
❑ C) the % of bicycles between 8.3 and 10.00 kg

[Figure: normal curve with µ = 9.07 and σ = 0.4; Area1 lies to the left of Xa = 8.3, Area2 to the left of Xb = 10, Area3 to the right of Xb, and Area4 between Xa and Xb]

Example problem 5 (solution)

$Z_a = \frac{X_a - \mu}{\sigma} = \frac{8.3 - 9.07}{0.4} = -1.925$

$Z_b = \frac{X_b - \mu}{\sigma} = \frac{10 - 9.07}{0.4} = +2.325$

a) From Table A, interpolating between the entries for Z = −1.92 and Z = −1.93, Za = −1.925 gives Area1 = 0.0271.

Thus, 2.71% of bicycles have weights less than 8.3 kg.

b) For Zb = +2.325, Area2 = 0.9899.

Thus, 98.99% of bicycles have weights less than 10 kg, and

bicycles with weights more than 10 kg = 1 – 0.9899

= 0.0101 or 1.01% (Area3)

c) Area4 = Area2 – Area1

= 0.9899 – 0.0271

= 0.9628 or 96.28%

Thus, 96.28% of the bicycles are between 8.3 and 10 kg.

Example problem (6)
❑ Plastic strips that are used in a sensitive electronic device are manufactured to a maximum specification of
305.70 mm and a minimum specification of 304.55 mm. If the strips are less than the minimum specification, they are scrapped; if
greater than the maximum specification, they are reworked. The part dimensions are normally distributed with a
population standard deviation of 0.25 mm. What % of the product is scrap? What % is rework? How can
the process be centered to eliminate all but 0.1% of the scrap? What is the rework % then?

[Figure: normal curve with σ = 0.25; Area1 lies to the left of Xmin = 304.55 and Area2 to the left of Xmax = 305.7]

Example problem 6 (solution)

With the process centered between the specification limits,

$\mu = \frac{X_{min} + X_{max}}{2} = \frac{304.55 + 305.70}{2} = 305.125\ \text{mm}$

$Z_1 = \frac{X_{min} - \mu}{\sigma} = \frac{304.55 - 305.125}{0.25} = -2.3$

From Table A it is found that for Z1 = −2.3, Area1 = 0.0107.

Thus, 1.07% of the strips are scrapped.

$Z_2 = \frac{X_{max} - \mu}{\sigma} = \frac{305.7 - 305.125}{0.25} = +2.3$

From Table A it is found that for Z2 = +2.3, Area2 = 0.9893.

Thus, % of rework = 1 – 0.9893 = 0.0107 = 1.07%.

Example problem 6 (solution)

For 0.1% scrap the area below Xmin must be 0.0010, and from Table A this corresponds to Z = −3.09.

$Z = \frac{X_{min} - \mu}{\sigma} \;\Rightarrow\; -3.09 = \frac{304.55 - \mu}{0.25} \;\Rightarrow\; \mu = 304.55 + 3.09 \times 0.25 = 305.32\ \text{mm}$

With the process centered at 305.32 mm, the rework is found from

$Z = \frac{X_{max} - \mu}{\sigma} = \frac{305.70 - 305.32}{0.25} = +1.52$

From Table A it is found that for Z = +1.52, area = 0.9357.

Thus, rework % = 1 – 0.9357 = 0.0643 = 6.43%.

