
BACHELOR OF SCIENCE IN ENGINEERING

(BSCE/BSEE/BSME):
ENGINEERING DATA ANALYSIS
COURSE MODULE 1 | COURSE UNIT 7-8 | WEEK 12

SAMPLING DISTRIBUTION

CHECKLIST
✓ Read course and unit objectives
✓ Read study guide prior to class attendance
✓ Read required learning resources; refer to unit
terminologies for jargon
✓ Proactively participate in classroom discussions
✓ Participate in weekly discussion board (Google Classroom)
✓ Answer and submit course unit tasks

UNIT EXPECTED OUTCOMES (UEOs)


At the end of this unit, the students are expected to:

Cognitive:
1. Learn and understand sampling distributions and the estimation of parameters
2. Understand the applications of central limit theorem

REQUIRED READINGS
Walpole, R. (2012), Probability and Statistics for Engineers and Scientists, 9th Ed., Chapter 8
(pp. 226-264)
STUDY GUIDE
SAMPLING DISTRIBUTIONS AND POINT ESTIMATION OF PARAMETERS

Populations and Samples


POPULATION
- consists of the totality of the observations with which we are concerned.

Each observation in a population is a value of a random variable X having some
probability distribution f(x).

SAMPLE
- is a subset of a population.

If our inferences from the sample to the population are to be valid, we must
obtain samples that are representative of the population.

BIASED
-Any sampling procedure that produces inferences that consistently
overestimate or consistently underestimate some characteristic of the population is
said to be biased.

RANDOM SAMPLING
- the observations are made independently and at random to eliminate bias.

Note:
Our main purpose in selecting random samples is to elicit information about the
unknown population parameters.

STATISTIC
-Any function of the random variables constituting a random sample is called a
statistic.

Location Measures of a Sample: The Sample Mean, Median, and Mode

The most commonly used statistics for measuring the center of a set of data, arranged in
order of magnitude, are the mean, median, and mode.

MEAN

Let X1, X2, . . . , Xn represent n random variables. The sample mean is

𝑋̅ = (X1 + X2 + · · · + Xn)/n

The term sample mean is applied to both the statistic 𝑋̅ and its computed value 𝑥̅.

MEDIAN

-The sample median is the location measure that shows the middle value of the
sample.

MODE

-The sample mode is the value of the sample that occurs most often.

Example:

Suppose a data set consists of the following observations:


0.32 0.53 0.28 0.37 0.47 0.43 0.36 0.42 0.38 0.43
Determine the mean, median, and mode.
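As a quick check of the hand computation, the three location measures for the data set above can be obtained with Python's built-in statistics module (a sketch; working the example by hand is the intended exercise):

```python
import statistics

# Observations from the example above
data = [0.32, 0.53, 0.28, 0.37, 0.47, 0.43, 0.36, 0.42, 0.38, 0.43]

mean = statistics.mean(data)      # arithmetic average of the 10 values
median = statistics.median(data)  # middle value: here the average of the
                                  # 5th and 6th sorted observations
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)  # mean ≈ 0.399, median = 0.40, mode = 0.43
```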
Variability Measures of a Sample: The Sample Variance, Standard Deviation,
and Range

The variability or the dispersion in a sample displays how the observations spread out from the
average.

SAMPLE VARIANCE:

S² = Σ(Xi − 𝑋̅)² / (n − 1)   or, equivalently,   S² = [n ΣXi² − (ΣXi)²] / [n(n − 1)]

SAMPLE STANDARD DEVIATION:

S = √S²

SAMPLE RANGE:

R = Xmax − Xmin
Example
A comparison of coffee prices at 4 randomly selected grocery stores in San Diego
showed increases from the previous month of 12, 15, 17, and 20 cents for a 1-pound
bag. Find the variance of this random sample of price increases.
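The variance of the four price increases can be verified with the standard library; note that statistics.variance uses the n − 1 divisor, matching the sample-variance definition above:

```python
import statistics

increases = [12, 15, 17, 20]  # monthly price increases, in cents

xbar = statistics.mean(increases)     # sample mean = 16
s2 = statistics.variance(increases)   # sample variance = 34/3 ≈ 11.33
s = statistics.stdev(increases)       # sample standard deviation ≈ 3.37
r = max(increases) - min(increases)   # sample range = 8

print(xbar, s2, s, r)
```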
Sampling Distributions

• The field of statistical inference is basically concerned with generalizations and predictions.

• Inference about the Population from Sample Information

• We computed a statistic from a sample selected from the population, and from this statistic
we made various statements concerning the values of population parameters that may or
may not be true.

• Since a statistic is a random variable that depends only on the observed sample, it must have
a probability distribution.

SAMPLING DISTRIBUTION
• The probability distribution of a statistic is called a sampling distribution. The sampling
distribution of a statistic depends on the distribution of the population, the size of the samples,
and the method of choosing the samples.

• The sampling distribution of 𝑋̅ with sample size n is the distribution that results when an
experiment is conducted over and over (always with sample size n) and the many values of
𝑋̅ result. This sampling distribution, then, describes the variability of sample averages around
the population mean μ.

The Central Limit Theorem


If 𝑋̅ is the mean of a random sample of size n taken from a population with mean μ
and finite variance σ², then the limiting form of the distribution of

Z = (𝑋̅ − μ) / (σ/√n),

as n → ∞, is the standard normal distribution n(z; 0, 1).


“The presumption of normality on the distribution of 𝑋̅ becomes more accurate as n grows larger.”

If we are sampling from a population with unknown distribution, either finite or infinite, the sampling
distribution of 𝑋̅ will still be approximately normal with mean μ and variance σ²/n, provided that the
sample size is large.

The normal approximation for 𝑋̅ will generally be good if n ≥ 30, provided the population distribution
is not terribly skewed. If n < 30, the approximation is good only if the population is not too different
from a normal distribution and, if the population is known to be normal, the sampling distribution of
𝑋̅ will follow a normal distribution exactly, no matter how small the size of the samples.
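The theorem can be illustrated by simulation. The sketch below uses only the standard library; the exponential population (with μ = 1 and σ² = 1) is an arbitrary choice of a skewed distribution, and the seed and trial count are likewise arbitrary. It repeats an experiment many times and checks that the sample means cluster around μ with variance near σ²/n:

```python
import random
import statistics

random.seed(1)
n = 36          # sample size for each experiment
trials = 20000  # number of repeated experiments

# Each trial: draw n values from a skewed exponential population
# (mean 1, variance 1) and record the sample mean.
means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

print(statistics.mean(means))      # close to mu = 1
print(statistics.variance(means))  # close to sigma^2 / n = 1/36 ≈ 0.028
```

A histogram of `means` would look roughly bell-shaped even though the underlying population is strongly skewed.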
Inferences on the Population Mean
• One very important application of the Central Limit Theorem is the determination of
reasonable values of the population mean μ. Topics such as hypothesis testing, estimation,
quality control, and many others make use of the Central Limit Theorem.

In the case study below, an illustration is given which draws an inference that makes use of the
sampling distribution of 𝑋̅ .

Example:
An important manufacturing process produces cylindrical component parts for the automotive
industry. It is important that the process produce parts having a mean diameter of 5.0 millimeters.
The engineer involved conjectures that the population mean is 5.0 millimeters. An experiment is
conducted in which 100 parts produced by the process are selected randomly and the diameter
measured on each. It is known that the population standard deviation is σ = 0.1
millimeter. The experiment indicates a sample average diameter of 𝑥̅ = 5.027 millimeters. Does this
sample information appear to support or refute the engineer’s conjecture?

Solution:

Whether the data support or refute the conjecture depends on the probability that data similar to
those obtained in this experiment (𝑥̅ = 5.027) can readily occur when in fact μ = 5.0. In other words,
how likely is it that one can obtain 𝑥̅ ≥ 5.027 with n = 100 if the population mean is μ = 5.0? If this
probability suggests that 𝑥̅ = 5.027 is not unreasonable, the conjecture is not refuted. If the probability
is quite low, one can certainly argue that the data do not support the conjecture that μ = 5.0. The
probability that we choose to compute is given by P(|𝑋̅ − 5| ≥ 0.027).
In other words, if the mean μ is 5, what is the chance that 𝑋̅ will deviate by as much as 0.027
millimeter? Here

z = 0.027 / (0.1/√100) = 2.7,   so   P(|𝑋̅ − 5| ≥ 0.027) = 2P(Z ≥ 2.7) = 2(0.0035) = 0.007.

Therefore, one would experience by chance an 𝑥̅ that is 0.027 millimeter from the mean in only 7 in
1000 experiments. As a result, this experiment with 𝑥̅ = 5.027 certainly does not give supporting
evidence to the conjecture that μ = 5.0. In fact, it strongly refutes the conjecture!
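This tail computation can be reproduced with the standard library alone; `phi` below is the standard normal CDF expressed through math.erf (the 0.007 in the text is the rounded table value):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma, n = 5.0, 0.1, 100
xbar = 5.027

z = (xbar - mu) / (sigma / sqrt(n))  # standardized deviation: z = 2.7
p = 2.0 * (1.0 - phi(z))             # two-sided tail: P(|Xbar - 5| >= 0.027)

print(round(z, 2), round(p, 3))      # 2.7 and about 0.007
```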

Example:

Traveling between two campuses of a university in a city via shuttle bus takes, on average, 28
minutes with a standard deviation of 5 minutes. In a given week, a bus transported passengers 40
times. What is the probability that the average transport time was more than 30 minutes? Assume
the mean time is measured to the nearest minute.

Solution:

In this case, μ = 28 and σ = 5. We need to calculate the probability P(𝑋̅ > 30) with n = 40. Since the
time is measured on a continuous scale to the nearest minute, an 𝑥̅ greater than 30 is equivalent
to 𝑥̅ ≥ 30.5. Hence,

P(𝑋̅ > 30) = P(Z > (30.5 − 28)/(5/√40)) = P(Z > 3.16) = 0.0008.

There is only a slight chance that the average time of one bus trip will exceed 30
minutes.
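The same erf-based normal CDF gives the bus-trip probability; the adjustment from 30 to 30.5 follows from the times being recorded to the nearest minute:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma, n = 28.0, 5.0, 40

# Time is recorded to the nearest minute, so "average above 30"
# corresponds to xbar >= 30.5 (continuity adjustment).
z = (30.5 - mu) / (sigma / sqrt(n))  # z ≈ 3.16
p = 1.0 - phi(z)                     # upper-tail probability

print(round(z, 2), round(p, 4))      # approximately 3.16 and 0.0008
```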

Sampling Distribution of the Difference between Two Means

The Central Limit Theorem can be easily extended to the two-sample, two-population
case.

If independent samples of size n1 and n2 are drawn at random from two populations,
discrete or continuous, with means μ1 and μ2 and variances σ1² and σ2², respectively, then the
sampling distribution of the difference of means, 𝑋̅1 − 𝑋̅2, is approximately normally distributed
with mean and variance given by

μ(𝑋̅1 − 𝑋̅2) = μ1 − μ2   and   σ²(𝑋̅1 − 𝑋̅2) = σ1²/n1 + σ2²/n2;

hence

Z = [(𝑋̅1 − 𝑋̅2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2)

is approximately a standard normal variable.

Example (Case study on Paint Drying Time):


Two independent experiments are run in which two different types of paint are compared. Eighteen
specimens are painted using type A, and the drying time, in hours, is recorded for each. The same
is done with type B. The population standard deviations are both known to be 1.0. Assuming that the
mean drying time is equal for the two types of paint, find P(𝑋̅A − 𝑋̅B > 1.0), where 𝑋̅A and 𝑋̅B are
average drying times for samples of size nA = nB = 18.

Solution:
From the sampling distribution of 𝑋̅A − 𝑋̅B, we know that the distribution is
approximately normal with mean μ(𝑋̅A − 𝑋̅B) = μA − μB = 0 and variance
σ²(𝑋̅A − 𝑋̅B) = σA²/nA + σB²/nB = 1/18 + 1/18 = 1/9. The desired probability is

P(𝑋̅A − 𝑋̅B > 1.0) = P(Z > (1.0 − 0)/√(1/9)) = P(Z > 3.0) = 0.0013.
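A short standard-library check of this two-sample probability, using the mean 0 (equal drying times are assumed) and the variance 1/18 + 1/18 = 1/9:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

nA = nB = 18
sigmaA = sigmaB = 1.0

mean_diff = 0.0                                  # muA - muB = 0 (assumed equal)
sd_diff = sqrt(sigmaA**2 / nA + sigmaB**2 / nB)  # sqrt(1/9) = 1/3

z = (1.0 - mean_diff) / sd_diff                  # z = 3.0
p = 1.0 - phi(z)                                 # P(Xbar_A - Xbar_B > 1.0)

print(round(z, 1), round(p, 4))                  # 3.0 and about 0.0013
```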


Example:
The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a standard
deviation of 0.9 year, while those of manufacturer B have a mean lifetime of 6.0 years and a standard
deviation of 0.8 year. What is the probability that a random sample of 36 tubes from manufacturer A
will have a mean lifetime that is at least 1 year more than the mean lifetime of a sample of 49 tubes
from manufacturer B?

Sampling Distribution of the Variance S²

• Sampling distributions of important statistics allow us to learn information about parameters.

• If S² is the variance of a random sample of size n taken from a normal population having the
variance σ², then the statistic

χ² = (n − 1)S²/σ² = Σ(Xi − 𝑋̅)²/σ²

has a chi-squared distribution with ν = n − 1 degrees of freedom.

Note: A chi-squared distribution is a special case of a gamma distribution.


The chi-squared distribution plays a vital role in statistical inference. It has considerable applications
in both methodology and theory.
The values of the random variable χ² are calculated from each sample by the formula
χ² = (n − 1)s²/σ².
The probability that a random sample produces a χ2 value greater than some specified value is equal
to the area under the curve to the right of this value.

The chi-squared distribution (Table A.5)

Note:
• Exactly 95% of a chi-squared distribution lies between χ²0.975 and χ²0.025.
• A χ² value falling to the right of χ²0.025 is not likely to occur unless our assumed value of
σ² is too small. Similarly, a χ² value falling to the left of χ²0.975 is unlikely unless our assumed
value of σ² is too large.

Example:
A manufacturer of car batteries guarantees that the batteries will last, on average,
3 years with a standard deviation of 1 year. If five of these batteries have lifetimes of 1.9, 2.4, 3.0,
3.5, and 4.2 years, should the manufacturer still be convinced that the batteries have a standard
deviation of 1 year? Assume that the battery lifetime follows a normal distribution.

Solution:
We first find the sample variance. With n = 5 and 𝑥̅ = (1.9 + 2.4 + 3.0 + 3.5 + 4.2)/5 = 3.0,

s² = Σ(xi − 𝑥̅)²/(n − 1) = 3.26/4 = 0.815,

and the chi-squared statistic is χ² = (n − 1)s²/σ² = (4)(0.815)/1 = 3.26 with 4 degrees of freedom.

Since 95% of the χ² values with 4 degrees of freedom fall between 0.484 and 11.143, the
computed value with σ² = 1 is reasonable, and therefore the manufacturer has no reason to suspect
that the standard deviation is other than 1 year.
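The arithmetic in this solution is easy to confirm in Python; statistics.variance uses the n − 1 divisor, as the chi-squared statistic requires:

```python
import statistics

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]  # battery lifetimes in years
sigma2 = 1.0                           # claimed population variance
n = len(lifetimes)

s2 = statistics.variance(lifetimes)    # sample variance: 3.26 / 4 = 0.815
chi2 = (n - 1) * s2 / sigma2           # chi-squared statistic with 4 d.f.

print(round(s2, 3), round(chi2, 2))    # 0.815 and 3.26
# 3.26 falls between the tabled 4-d.f. values 0.484 and 11.143, so the
# claimed standard deviation of 1 year is not contradicted.
```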

Example:
Assume the sample variances to be continuous measurements. Find the probability that a random
sample of 25 observations, from a normal population with variance σ² = 6, will have a sample
variance S²
(a) greater than 9.1;
(b) between 3.462 and 10.745.

Solution:
With n = 25, the statistic χ² = (n − 1)S²/σ² = 24S²/6 has ν = 24 degrees of freedom.
(a) P(S² > 9.1) = P(χ² > (24)(9.1)/6) = P(χ² > 36.4) = 0.05.
(b) P(3.462 < S² < 10.745) = P(13.848 < χ² < 42.980) = 0.95 − 0.01 = 0.94.
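Part (a) can be spot-checked by simulation rather than by the chi-squared table. The Monte Carlo sketch below (standard library only; the seed and trial count are arbitrary choices) draws many samples of n = 25 from a normal population with σ² = 6 and estimates how often S² exceeds 9.1:

```python
import random

random.seed(2)
n, sigma = 25, 6 ** 0.5   # population variance sigma^2 = 6
trials = 50_000

def sample_var(xs):
    """Sample variance with the n - 1 divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

hits = sum(
    sample_var([random.gauss(0.0, sigma) for _ in range(n)]) > 9.1
    for _ in range(trials)
)
print(hits / trials)   # near the table answer P(chi^2 > 36.4) = 0.05
```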

t-Distribution

• Use of the Central Limit Theorem and the normal distribution is certainly helpful for
inferences on a population mean or the difference between two population means. However,
it was assumed that the population standard deviation is known.
• In many experimental scenarios, knowledge of σ is no more reasonable than
knowledge of the population mean μ. Often, in fact, an estimate of σ must be supplied by the
same sample information that produced the sample average 𝑥̅. As a result, a natural statistic
to consider for inferences on μ is

T = (𝑋̅ − μ) / (S/√n).

Theorem
Let Z be a standard normal random variable and V a chi-squared random variable with ν degrees of
freedom. If Z and V are independent, then the distribution of the random variable T, where

T = Z / √(V/ν),

is given by the density function

h(t) = Γ((ν + 1)/2) / [Γ(ν/2)√(πν)] · (1 + t²/ν)^(−(ν+1)/2),   −∞ < t < ∞.

This is known as the t-distribution with ν degrees of freedom.

What Is the t-Distribution Used For?


The t-distribution is used extensively in problems that deal with inference about the population mean
or in problems that involve comparative samples (i.e., in cases where one is trying to determine if
means from two samples are significantly different).

Example:
A chemical engineer claims that the population mean yield of a certain batch process is 500 grams
per milliliter of raw material. To check this claim he samples 25 batches each month. If the computed
t-value falls between −t0.05 and t0.05, he is satisfied with this claim. What conclusion should he draw
from a sample that has a mean 𝑥̅ = 518 grams per milliliter and a sample standard deviation s =
40 grams? Assume the distribution of yields to be approximately normal.

Solution:
From Table A.4 we find that t0.05 = 1.711 for 24 degrees of freedom. Therefore, the engineer can be
satisfied with his claim if a sample of 25 batches yields a t-value between −1.711 and 1.711.

If μ = 500, then

t = (518 − 500) / (40/√25) = 2.25,

a value well above 1.711. The probability of obtaining a t-value, with ν = 24, equal to or greater than
2.25 is approximately 0.02. If μ > 500, the value of t computed from the sample is more reasonable.
Hence, the engineer is likely to conclude that the process produces a better product than he thought.
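The t-value itself is one line of arithmetic; a minimal check:

```python
from math import sqrt

mu0 = 500.0   # claimed population mean (grams per milliliter)
n = 25        # number of batches sampled
xbar = 518.0  # observed sample mean
s = 40.0      # sample standard deviation

t = (xbar - mu0) / (s / sqrt(n))
print(t)  # 2.25, well outside the acceptance interval (-1.711, 1.711)
```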

F-Distribution
The statistic F is defined to be the ratio of two independent chi-squared random variables, each divided
by its number of degrees of freedom:

F = (U/ν1) / (V/ν2),

where U and V are independent random variables having chi-squared distributions with ν1 and ν2 degrees
of freedom, respectively. We shall now state the sampling distribution of F.
Theorem
Let U and V be two independent random variables having chi-squared distributions with ν1 and ν2
degrees of freedom, respectively. Then the distribution of the random variable F = (U/ν1)/(V/ν2) is given
by the density function

h(f) = [Γ((ν1 + ν2)/2) (ν1/ν2)^(ν1/2) / (Γ(ν1/2)Γ(ν2/2))] · f^(ν1/2 − 1) / (1 + ν1f/ν2)^((ν1+ν2)/2),   f > 0.

This is known as the F-distribution with ν1 and ν2 degrees of freedom (d.f.).

Typical F-distributions

Illustration of the fα for the F-distribution.


What Is the F-Distribution Used For?
• The F-distribution is used in two-sample situations to draw inferences about the population
variances.
• The F-distribution can also be applied to many other types of problems involving sample
variances. In fact, the F-distribution is also called the variance ratio distribution.
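The variance-ratio interpretation can be illustrated by simulation (standard library only; the sample sizes, seed, and trial count are arbitrary choices). For two independent samples from normal populations with equal variances, S1²/S2² follows an F-distribution with ν1 = n1 − 1 and ν2 = n2 − 1 d.f., whose mean is ν2/(ν2 − 2):

```python
import random

random.seed(3)

def sample_var(xs):
    """Sample variance with the n - 1 divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n1, n2 = 6, 13    # degrees of freedom v1 = 5, v2 = 12
trials = 50_000

# Ratio of two independent sample variances from N(0, 1) populations:
# distributed as F with (v1, v2) degrees of freedom.
ratios = [
    sample_var([random.gauss(0.0, 1.0) for _ in range(n1)])
    / sample_var([random.gauss(0.0, 1.0) for _ in range(n2)])
    for _ in range(trials)
]
print(sum(ratios) / trials)  # near E[F] = v2 / (v2 - 2) = 1.2
```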

TERMINOLOGIES
Refer to terms in the study guide

FURTHER READINGS
What is the F-distribution and what is it used for?
What is Quantile Plot and Probability Plot?
How to construct Quantile Plots?

UNIT TASK
• Answer Module 1-CU 6 Unit Task 1 - 3 in our MathEng3 Workbook

REFERENCES
[1] Walpole, R. (2012), Probability and Statistics for Engineers and Scientists, 9th Ed.

[2] Soong, T. T. (2004), Fundamentals of Probability and Statistics for Engineers
