CHAPTER 6 - Basic Statistical Concepts

BTV3413




• To review basic statistical concepts

• To understand how to graphically and analytically study a process by using statistics
• To know how to create and interpret a frequency diagram and a histogram
• To know how to calculate the mean, median, mode, range and standard deviation for
a given set of numbers
• To understand the importance of the normal curve and the central limit theorem in
quality assurance.
• To know how to find the area under a curve using the standard normal probability
distribution (tables)
• To understand how to interpret the information analyzed


• Statistics is the collection, tabulation, analysis, interpretation, and presentation
of analytical data, which provides a viable method of supporting or clarifying a
topic under discussion.

• In industry, the statistical representation of data is the foundation for quality
assurance and quality improvement processes, if correctly used. It can be used
as decision support for changing processes or pursuing a particular course of
action.

• Assumptions based on incomplete information can lead to incorrect decisions,
unwise investments, and an uncomfortable environment.

Population Vs Samples

• The entire group of individuals is called the population.

• For example, a researcher may be interested in the relation between class
size (variable 1) and academic performance (variable 2) for the population of
third-grade children.

• Usually, populations are so large that a researcher cannot examine the entire
group.

• Therefore, a sample is selected to represent the population in a research
study. The goal is to use the results obtained from the sample to help answer
questions about the population.


• In quality control, two types of numerical data can be collected: Variable data
and attribute data.

• A variable is a characteristic or condition that can change or take on different
values.

• Attribute data are those quality characteristics that are observed to be either
present or absent, conforming or nonconforming.

• Although both variable and attribute data can be described by numbers,
attribute data are countable, not measurable.

• Attribute data will always be a whole number because it counts the presence or
absence of a chosen characteristic.


• A sample is a subset of elements or measurements taken from a population.

• Descriptive or deductive statistics describe a population or complete group of
data. When describing a population using deductive statistics, the investigator
must study each entity within the population. This provides a great deal of
information about the population, product, or process, but gathering the
information is time-consuming.

• Inductive statistics deal with a limited amount of data or a representative
sample of the population.

• Measurement error is the difference between a value measured and the true
value. The error that occurs is one of either accuracy or precision.

• Accuracy refers to how far from the actual or real value the measurement is.

• Precision is the ability to repeat a series of measurements and get the same
value each time (sometimes referred to as repeatability)
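
The difference between accuracy and precision can be illustrated with a short Python sketch. The gauge readings and the true thickness below are hypothetical values chosen for illustration, and the helper name `accuracy_and_precision` is an assumption. Bias (mean minus true value) stands in for accuracy and the standard deviation of the readings stands in for precision.

```python
from statistics import mean, pstdev

def accuracy_and_precision(measurements, true_value):
    """Return (bias, spread): bias = mean - true value, spread = SD of the readings."""
    bias = mean(measurements) - true_value
    spread = pstdev(measurements)
    return bias, spread

# Hypothetical readings of a part whose true thickness is 0.0625 inch.
true_thickness = 0.0625
gauge_a = [0.0631, 0.0630, 0.0631, 0.0630, 0.0631]  # repeatable but offset
gauge_b = [0.0620, 0.0630, 0.0625, 0.0622, 0.0628]  # centred but scattered

for name, readings in (("A", gauge_a), ("B", gauge_b)):
    bias, spread = accuracy_and_precision(readings, true_thickness)
    print(f"gauge {name}: bias = {bias:+.5f}, spread = {spread:.5f}")
```

In this made-up data, gauge A is precise but not accurate, while gauge B is accurate on average but less precise.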

Statistic : Frequency diagram
• A frequency diagram shows the number of times each of the measured values
occurred when the data were collected. This diagram can be created either
from measurements taken from a process or from data taken from the
occurrences of events.

To create:
1. Collect the data. Record the measurements or counts of the characteristics of
interest.
2. Count the number of times each measurement or count occurs.
3. Construct the diagram by placing the counts or measured values on the x axis and
the frequency or number of occurrences on the y axis. The x axis must contain each
possible measurement value from the lowest to the highest, even if a particular
value does not have any corresponding measurements. A bar is drawn on the
diagram to depict each of the values and the number of times the value occurred in
the data collected.
4. Interpret the frequency diagram. Study the diagrams you create and think about the
diagram’s shape, size, and location in terms of the desired target specification.
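
The counting and construction steps can be sketched in Python as follows. This is a minimal illustration, assuming integer-coded measurements; the data list is hypothetical and is not the clutch plate data, and the function name `frequency_diagram` is an assumption. Every value from the lowest to the highest is listed, even when its count is zero.

```python
from collections import Counter

def frequency_diagram(values):
    """Return [(value, count), ...] for every integer value from min to max."""
    counts = Counter(values)
    return [(v, counts.get(v, 0)) for v in range(min(values), max(values) + 1)]

# Hypothetical coded measurements (integers), not the clutch plate data.
data = [3, 5, 4, 5, 7, 5, 4, 3, 5, 7, 4, 5]

# Draw one text bar per value; the bar length is the frequency.
for value, count in frequency_diagram(data):
    print(f"{value:>3} | {'#' * count} ({count})")
```

The value 6 never occurs in this data, yet it still appears on the axis with an empty bar, as step 3 requires.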

Example : clutch plate grouped data for thickness

To respond to customer issues, the engineers involved in the clutch plate problem
are studying the thickness of the part. To gain a clearer understanding of incoming
material thickness, they plan to create a frequency diagram for the grouped data
shown in the table. The first step is performed by the operator, who randomly
selects five parts each hour, measures the thickness of each part, and records
the values.

Data analysis : graphical

[Figure: clutch plate thickness tally sheet and frequency distribution (coded 0.06)]

Statistic : Histogram
• Similar to frequency diagrams.
❑ The most notable difference between the two is that on a histogram the data are
grouped into cells. Each cell contains a range of values.
Step 1: Collect the data and construct a tally sheet
Step 2: Calculate the range
Step 3: Create the cells by determining the cell intervals, midpoints, and boundaries
Step 4: Label the axes
Step 5: Post the values
Step 6: Interpret the histogram
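
Steps 2 to 5 can be sketched in Python as shown here. This is only one possible approach, assuming equal-width cells that start at the lowest value and a number of cells chosen by the user. The data, the function name `build_histogram`, and the choice of four cells are hypothetical, and the sketch assumes the values are not all identical.

```python
def build_histogram(values, n_cells):
    """Group values into n_cells equal-width cells.

    Returns a list of (lower_boundary, upper_boundary, midpoint, count).
    """
    low, high = min(values), max(values)
    data_range = high - low                      # Step 2: range
    width = data_range / n_cells                 # Step 3: cell interval
    counts = [0] * n_cells
    for v in values:
        # The highest value is placed in the last cell.
        index = min(int((v - low) / width), n_cells - 1)
        counts[index] += 1                       # Step 5: post the values
    cells = []
    for i, count in enumerate(counts):
        lower = low + i * width                  # cell boundaries
        upper = lower + width
        cells.append((lower, upper, (lower + upper) / 2, count))
    return cells

# Hypothetical measurements used only to exercise the function.
data = [12, 15, 11, 18, 21, 14, 16, 19, 30, 22, 17, 13, 25, 16, 20]
for lower, upper, mid, count in build_histogram(data, 4):
    print(f"{lower:5.2f} to {upper:5.2f} (mid {mid:5.2f}): {'#' * count}")
```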

• Analyze histogram by studying :

❑ Shape

❑ Location

❑ Spread

• Shape : the form that the values of the measurable characteristics take on
when graphed. Shape is based on the distribution’s symmetry, skewness, and
kurtosis.

• Location : Where is the distribution in relation to the target?

• Spread : the distance between the highest and lowest values.

• Analytical methods of describing histograms exist.

• Though shape is easily seen from a picture, the location and spread can be
more clearly identified mathematically.

• Location is described by measures of central tendency: the mean, mode, and
median. Spread is defined by measures of dispersion: the range and standard
deviation.

• Mathematical description of histogram : measures of central tendency:

• Mean – is determined by adding the values together and then dividing this sum
by the total number of values

• Median – is the value that divides an ordered series of numbers so that there is
an equal number of values on either side of the center, or median value

• Mode – is the most frequently occurring number in a group of values.

• Mathematical description of histogram : measures of dispersion:

• Range – is the difference between the highest value in a series of values or
sample and the lowest value in the same series

• Standard deviation – shows the dispersion of the data within the distribution

Mean = average

Median = middle
• Put numbers in order from lowest to highest and find the number that is exactly
in the middle

20, 15, 10, 10, 10, 1

• Since there is an even number of values, the median is 10 years (the average of
the 2 middle values)

• For an odd number of values, the median is the number in the middle.

Mode = most frequently occurring number
• Number in data set that occurs most often : 20, 15, 10, 10, 10, 1

• Sometimes there will not be a mode : 20, 17, 15, 8, 3

• Record answer as “none” or “no mode” – NOT “0”

• Sometimes there will be more than one mode

• 20, 15, 15, 10, 10, 1 (both 15 and 10 occur twice)

Range = difference between the lowest and highest

20, 15, 10, 10, 1

20-1 = 19 years

• The range tells you how spread out the data points are.
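
The four measures can be checked with Python's standard `statistics` module. This is a minimal sketch: the helper name `summarize` is an assumption, and the rule that a data set has "no mode" when every distinct value occurs equally often is written out by hand, because `multimode` alone would return every value in that case.

```python
from statistics import mean, median, multimode

def summarize(values):
    """Return mean, median, list of modes (empty if none), and range."""
    modes = multimode(values)
    # If every distinct value is tied for most frequent, report "no mode".
    if len(modes) == len(set(values)) and len(modes) > 1:
        modes = []
    return {
        "mean": mean(values),
        "median": median(values),
        "modes": modes,
        "range": max(values) - min(values),
    }

print(summarize([20, 15, 10, 10, 10, 1]))   # data set used for the median and mode
print(summarize([20, 17, 15, 8, 3]))        # data set with no mode
```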

The mean of four numbers is 50.5

101 99 1 1

• What is the median?

• What is the mode?

Measured Values

• When making a set of repetitive measurements, the standard deviation
(S.D.) can be determined to
❑ indicate how much the samples differ from the mean
❑ indicate how spread out the values of the samples are

Variance vs Standard Deviation

• The smaller the standard deviation, the higher the quality of the measuring
instrument and your technique

• Also indicates that the data points are fairly close together, with a small
value for the range.

• Indicates that your measurements were made with good precision.

Standard Deviation

A high or large standard deviation

• Indicates that the values or measurements are not similar

• There is a high value for the range

• Indicates a low level of precision (you didn’t make measurements that were
close to the same)

• The standard deviation will be “0” if all the values or measurements are the
same.

You and your friends have just measured the heights of your dogs (in
millimetres).

The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and
300mm.

Find out the Mean, the Variance, and the Standard Deviation.

• Our example was for a Population (the 5 dogs were the only dogs we were
interested in).

• But if the data is a Sample (a selection taken from a bigger Population), then
the calculation changes!

• When you have "N" data values that are:

• The Population: divide by N when calculating Variance (like we did)

• A Sample: divide by N-1 when calculating Variance

The "Population Standard Deviation": σ = √( Σ(xᵢ − µ)² / N )

The "Sample Standard Deviation": s = √( Σ(xᵢ − x̄)² / (N − 1) )
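
The effect of dividing by N or by N-1 can be seen with Python's standard `statistics` module, which provides both versions. This sketch uses the five dog heights from the example. Treating the same five values as a sample is done purely for comparison.

```python
from statistics import pvariance, pstdev, variance, stdev

heights_mm = [600, 470, 170, 430, 300]  # the five dog heights from the example

# Population: divide the sum of squared differences by N.
print("population variance:", pvariance(heights_mm))
print("population SD:      ", round(pstdev(heights_mm), 2))

# Sample: divide by N - 1 instead.
print("sample variance:    ", variance(heights_mm))
print("sample SD:          ", round(stdev(heights_mm), 2))
```

Dividing by N-1 always gives the larger result, and the gap shrinks as N grows.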

Example :

The frequency table of the monthly salaries of 20 people is shown below

Salary ($) frequency

3500 5
4000 8
4200 5
4300 2

a) Calculate the mean of the salaries of the 20 people

b) Calculate the standard deviation of the salaries of the 20 people

a) Mean = (3500 * 5 + 4000 * 8 + 4200 * 5 + 4300 * 2) / 20 = 79100 / 20 = $3955

b) Variance = [(3500 - 3955)^2 * 5 + (4000 - 3955)^2 * 8 + (4200 - 3955)^2 * 5 +
(4300 - 3955)^2 * 2] / 20 = [(-455)^2 * 5 + (45)^2 * 8 + (245)^2 * 5 + (345)^2 *
2] / 20 = (207025 * 5 + 2025 * 8 + 60025 * 5 + 119025 * 2) / 20 = 79475

Standard Deviation = √(Variance) = √(79475) ≈ $281.91
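
The arithmetic for a frequency table can be re-computed with a short Python sketch. It assumes the 20 people are treated as the whole population, so the variance is divided by N = 20. The function name `grouped_mean_and_variance` is an assumption.

```python
from math import sqrt

# Salary ($) -> frequency, copied from the table in the example.
salary_table = {3500: 5, 4000: 8, 4200: 5, 4300: 2}

def grouped_mean_and_variance(table):
    """Mean and population variance of values given as {value: frequency}."""
    n = sum(table.values())
    mean = sum(value * freq for value, freq in table.items()) / n
    variance = sum(freq * (value - mean) ** 2 for value, freq in table.items()) / n
    return mean, variance

mean, variance = grouped_mean_and_variance(salary_table)
print(f"mean = {mean}, variance = {variance}, SD = {sqrt(variance):.2f}")
```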

• Mean = (600 + 470 + 170 + 430 + 300)/5 = 1970/5 = 394

• Now we calculate each dog's difference from the Mean: 206, 76, -224, 36, -94

• To calculate the Variance, take each difference, square it, and then average
the results: Variance = 108,520/5 = 21,704

• The standard deviation = sqrt (21,704) ≈ 147

• Now, we can show which heights are within one S.D (147) of the mean.

• Using the S.D, we have a standard way of knowing what is normal, and what is
extra large or extra small
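
The check for which heights fall within one standard deviation of the mean can be sketched in Python. It uses the population standard deviation because the five dogs are the whole population of interest. The labels printed are illustrative wording only.

```python
from statistics import mean, pstdev

heights_mm = [600, 470, 170, 430, 300]
mu = mean(heights_mm)
sigma = pstdev(heights_mm)
lower, upper = mu - sigma, mu + sigma   # one S.D. either side of the mean

for h in heights_mm:
    label = "within one SD" if lower <= h <= upper else "outside one SD"
    print(f"{h} mm: {label}")
```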

The graphs show three normal distributions with the same
mean, but the taller graph is less “spread out.”
Therefore, the data represented by the taller graph has a
smaller standard deviation

Statistic : Central Limit Theorem
• The central limit theorem states that a group of sample averages tends to be
normally distributed; as the sample size n increases, this tendency toward
normality improves.

• This means that the population from which the samples are taken does not
need to be normally distributed for the sample averages to tend to be normally
distributed.

• In the field of quality, the central limit theorem supports the use of sampling to
analyze the population. The mean of the sample averages will approximate
the mean of the population.
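
The behaviour described by the central limit theorem can be observed in a small simulation. This is a sketch under stated assumptions: the population is a hypothetical, deliberately skewed set of 10,000 exponentially distributed values, samples are drawn with replacement, and the sample sizes 2, 10, and 50 are arbitrary choices.

```python
import random
from statistics import fmean, pstdev

def sample_means(population, sample_size, n_samples, rng):
    """Draw n_samples random samples and return the mean of each one."""
    return [fmean(rng.choices(population, k=sample_size)) for _ in range(n_samples)]

rng = random.Random(0)  # fixed seed so the run is repeatable
# A deliberately non-normal (skewed) hypothetical population.
population = [rng.expovariate(1.0) for _ in range(10_000)]

for n in (2, 10, 50):
    means = sample_means(population, n, 2_000, rng)
    print(f"n = {n:>2}: mean of sample means = {fmean(means):.3f}, "
          f"SD of sample means = {pstdev(means):.3f}")
print(f"population mean = {fmean(population):.3f}")
```

The mean of the sample averages stays close to the population mean, while their spread shrinks as the sample size n increases.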

Statistic : Normal Frequency Distribution
• The normal frequency distribution, the familiar bell-shaped curve, is
commonly called a normal curve. A normal frequency distribution is
described by the normal density function:

f(x) = (1 / (σ√(2π))) · e^(−(x − µ)² / (2σ²))

• Normal Frequency Distribution (the Normal Curve)
❑ A normal curve is symmetrical about µ
❑ The mean, mode, and median are equal
❑ The curve is unimodal and bell-shaped
❑ Data values concentrate around the mean and decrease in number further
from the mean
❑ The area under the normal curve equals 1
❑ The distribution can be described in terms of the mean and standard
deviation

Percentage of Measurements Falling Within
Each Standard Deviation
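
The percentages for a normal distribution can be computed directly instead of being read from a figure. The sketch below uses the identity that the fraction within k standard deviations of the mean equals erf(k/√2). The function name `fraction_within` is an assumption.

```python
from math import erf, sqrt

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} SD: {100 * fraction_within(k):.2f}%")
```

This gives roughly 68.27%, 95.45%, and 99.73% for one, two, and three standard deviations.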

Standard Normal Probability Distribution : Z tables

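• To find the area under the normal curve, convert the value of interest Xᵢ to a
Z value, Z = (Xᵢ − X̄) / s, and look up the corresponding area in the Z table.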


The engineers working with the clutch plate thickness data have determined that their
data approximates a normal curve. They would like to determine what percentage of
parts from the samples taken is below 0.0624 inch and above 0.0629 inch.

They calculated an average of 0.0627 and a standard deviation of 0.00023. They used
the Z tables to determine the percentage of parts under 0.0624 inch thick.

• Z = (0.0624 - 0.0627) / 0.00023 = -1.30
• Area = 0.0968, so 9.68 percent of the parts are thinner than 0.0624 inch
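
The Z table lookup can be approximated in Python with the error function. This is a sketch, not the engineers' own procedure: the helper `standard_normal_cdf` is an assumption, and because it uses the unrounded Z value, its result differs slightly from the table value read at Z = -1.30. The same helper also estimates the share of parts above 0.0629 inch.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mean_thickness = 0.0627   # inch
std_dev = 0.00023         # inch

z_low = (0.0624 - mean_thickness) / std_dev
z_high = (0.0629 - mean_thickness) / std_dev

below = standard_normal_cdf(z_low)          # thinner than 0.0624 inch
above = 1 - standard_normal_cdf(z_high)     # thicker than 0.0629 inch
print(f"Z = {z_low:.2f}: {100 * below:.2f}% below 0.0624 inch")
print(f"Z = {z_high:.2f}: {100 * above:.2f}% above 0.0629 inch")
```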

The Rockwell hardness of specimens of an alloy shipped by your supplier
varies according to a normal distribution with mean 70 and standard deviation 3.
Specimens are acceptable for machining only if their hardness is greater than
65. What percentage of specimens will be acceptable? Draw the normal curve
diagram associated with this problem.
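
One way to check an answer to this problem is with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The sketch assumes "acceptable" means hardness strictly greater than 65, which for a continuous distribution is the area to the right of 65.

```python
from statistics import NormalDist

hardness = NormalDist(mu=70, sigma=3)   # Rockwell hardness model from the problem
minimum_acceptable = 65

z = (minimum_acceptable - hardness.mean) / hardness.stdev
acceptable = 1 - hardness.cdf(minimum_acceptable)   # area to the right of 65
print(f"Z = {z:.2f}, acceptable fraction = {acceptable:.4f}")
```

On the normal curve diagram, this is the shaded region to the right of 65, which lies about 1.67 standard deviations below the mean of 70.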

