Session 5


Probability

IPM – Term II, January 2023

Dr. Landis Conrad Felix Michel


7.1
Sampling Plans and Experimental Designs

The way a sample is selected is called the sampling plan or experimental design. Knowing the sampling plan used in a particular situation will often allow you to measure the reliability or goodness of your inference.

Simple random sampling is a commonly used sampling plan in which every sample of size n has the same chance of being selected. For example, suppose you want to select a sample of size n = 2 from a population containing N = 4 objects.


If the four objects are identified by the symbols x1, x2, x3, and x4, there are six distinct pairs that could be selected, as listed in Table 7.1.
Sample   Observations in Sample
1        x1, x2
2        x1, x3
3        x1, x4
4        x2, x3
5        x2, x4
6        x3, x4

Ways of Selecting a Sample of Size 2 from 4 Objects

Table 7.1
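
The six pairs in Table 7.1 can be enumerated mechanically. The following is a minimal Python sketch, not part of the original slides; the variable names are illustrative only.

```python
from itertools import combinations

population = ["x1", "x2", "x3", "x4"]  # the N = 4 objects
n = 2                                   # sample size

# Every distinct sample of size n, ignoring the order of selection
samples = list(combinations(population, n))

for number, sample in enumerate(samples, start=1):
    print(number, ", ".join(sample))

# Under simple random sampling every sample has the same chance
chance_of_each_sample = 1 / len(samples)
print("chance of each sample:", chance_of_each_sample)
```

The loop prints the pairs in the same order as the table, and the chance of each sample is one divided by the number of samples.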


If the sample of n = 2 observations is selected so that each of these six samples has the same chance of selection (one out of six, or 1/6), then the resulting sample is called a simple random sample, or just a random sample.

DEFINITION
If a sample of n elements is selected from a population of
N elements using a sampling plan in which each of the
possible samples has the same chance of selection, then
the sampling is said to be random and the resulting
sample is a simple random sample.


Remember that nonrandom samples can be described but cannot be used for making statistical inferences!

7.2
Statistics and Sampling Distributions

When you select a random sample from a population, the numerical descriptive measures you calculate from the sample are called statistics.

These statistics vary or change for each different random sample you select; that is, they are random variables.


The probability distributions for statistics are called sampling distributions because, in repeated sampling, they tell us:
• What values of the statistic can occur.
• How often each value occurs.

DEFINITION
The sampling distribution of a statistic is the probability
distribution for the possible values of the statistic that
results when random samples of size n are repeatedly
drawn from the population.

Example 7.3
A population consists of N = 5 numbers: 3, 6, 9, 12, 15. If a random sample of size n = 3 is selected without replacement, find the sampling distributions for the sample mean x̄ and the sample median m.

Example 7.3 – Solution
We are sampling from the population shown in Figure 7.1.

Figure 7.1

It contains five distinct numbers and each is equally likely, with probability p(x) = 1/5. We can easily find the population mean and median as

μ = (3 + 6 + 9 + 12 + 15)/5 = 9 and M = 9

To find the sampling distribution, we need to know what values of x̄ and m can occur when the sample is taken.

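There are 5!/(3! 2!) = 10 possible random samples of size n = 3 and each is equally likely, with probability 1/10. These samples, along with the calculated values of x̄ and m for each, are listed in Table 7.3.

Sample   Sample Values   x̄    m
1        3, 6, 9         6    6
2        3, 6, 12        7    6
3        3, 6, 15        8    6
4        3, 9, 12        8    9
5        3, 9, 15        9    9
6        3, 12, 15       10   12
7        6, 9, 12        9    9
8        6, 9, 15        10   9
9        6, 12, 15       11   12
10       9, 12, 15       12   12

Values of x̄ and m for Simple Random Sampling when n = 3 and N = 5
Table 7.3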

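You will notice that some values of x̄ are more likely than others because they occur in more than one sample. For example, x̄ = 8 occurs in two of the ten samples, so P(x̄ = 8) = 2/10 = .2, and m = 6 occurs in three of them, so P(m = 6) = 3/10 = .3.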

Using the values in Table 7.3, we can find the sampling distributions of x̄ and m, shown in Table 7.4 and graphed in Figure 7.2 on the next slide.

(a)
x̄    p(x̄)
6    .1
7    .1
8    .2
9    .2
10   .2
11   .1
12   .1

(b)
m    p(m)
6    .3
9    .4
12   .3

Sampling Distributions for (a) the Sample Mean and (b) the Sample Median
Table 7.4
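
The enumeration in Example 7.3 is small enough to check by machine. The sketch below is an illustration rather than the textbook's method: it lists every sample of size 3 from the population, then tallies how often each mean and each median occurs. Exact fractions are used so the probabilities are not rounded.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import median

population = [3, 6, 9, 12, 15]
n = 3

# All equally likely samples of size n drawn without replacement
samples = list(combinations(population, n))
chance = Fraction(1, len(samples))

mean_distribution = Counter()
median_distribution = Counter()
for sample in samples:
    mean_distribution[Fraction(sum(sample), n)] += chance
    median_distribution[median(sample)] += chance

print("x-bar  p(x-bar)")
for value in sorted(mean_distribution):
    print(value, float(mean_distribution[value]))

print("m  p(m)")
for value in sorted(median_distribution):
    print(value, float(median_distribution[value]))
```

The median tally agrees with part (b) of the table: 6, 9, and 12 occur with probabilities .3, .4, and .3.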


Figure 7.2

7.3
The Central Limit Theorem and the Sample Mean
The Central Limit Theorem
Under rather general conditions, this theorem states that
sums and means of random samples of measurements
drawn from a population tend to have an approximately
normal distribution.

For example, suppose you toss a die n = 1 time. The random variable x is the number observed on the upper face.

This familiar random variable can take six values, each with probability 1/6, and its probability distribution is shown in Figure 7.3.

Probability distribution for x, the number appearing on a single toss of a die
Figure 7.3

The shape of the distribution is flat—generally called a
discrete uniform distribution—and is symmetric about the
mean μ = 3.5, with a standard deviation σ = 1.71.
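
The values μ = 3.5 and σ = 1.71 follow from the definitions of the mean and standard deviation of a discrete random variable. A short Python check, added here as an illustration:

```python
from math import sqrt

faces = range(1, 7)   # possible values of x
p = 1 / 6             # each face is equally likely

# mu = sum of x * p(x); sigma^2 = sum of (x - mu)^2 * p(x)
mu = sum(x * p for x in faces)
sigma = sqrt(sum((x - mu) ** 2 * p for x in faces))

print(round(mu, 2), round(sigma, 2))
```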

Now, take a sample of size n = 2 from this population; that is, toss two dice and record the sum of the numbers on the two upper faces, Σxi = x1 + x2.

Table 7.5(a) shows the 36 possible outcomes, each with probability 1/36.

Sums of the Upper Faces of Two Dice
Table 7.5(a)

The sums are tabulated, and each of the possible sums is divided by n = 2 to obtain an average.

When all of the 36 possible averages are consolidated into a statistical table, the result is the sampling distribution of x̄ shown in Table 7.5(b) and graphed in Figure 7.4.

Sampling Distribution of x̄ for n = 2 Dice
Table 7.5(b)

Sampling distribution of x̄ for n = 2 dice
Figure 7.4
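
Since the table bodies did not survive in these notes, the two-dice tabulation can be regenerated directly. The following Python sketch (illustrative names, exact fractions) lists the 36 outcomes and consolidates the averages into a sampling distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The 36 equally likely outcomes (x1, x2) of tossing two dice
outcomes = list(product(range(1, 7), repeat=2))

# Consolidate the averages x-bar = (x1 + x2) / 2 into a distribution
xbar_distribution = Counter()
for x1, x2 in outcomes:
    xbar_distribution[Fraction(x1 + x2, 2)] += Fraction(1, 36)

for xbar in sorted(xbar_distribution):
    print(float(xbar), xbar_distribution[xbar])
```

The most likely average is 3.5, which occurs in 6 of the 36 outcomes, and the probabilities fall away symmetrically toward 1 and 6.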

Notice the dramatic difference in the shape of the sampling
distribution. It is now roughly mound-shaped but still
symmetric about the mean μ = 3.5.

Using a similar procedure, we generated the sampling distributions of x̄ when n = 3 and n = 4.

For n = 3, the sampling distribution in Figure 7.5 clearly shows the mound shape of the normal probability distribution, still centered at μ = 3.5.

Sampling distribution of x̄ for n = 3 dice
Figure 7.5

Notice also that the spread of the distribution is slowly decreasing as the sample size n increases.

Figure 7.6 dramatically shows that the distribution of x̄ is approximately normally distributed based on a sample as small as n = 4.

Sampling distribution of x̄ for n = 4 dice
Figure 7.6

This phenomenon is the result of an important statistical theorem called the Central Limit Theorem (CLT).
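
The steady center and shrinking spread seen in the dice figures can be confirmed exactly for n = 1 to 4. This is a minimal sketch with hypothetical helper names (`mean_distribution`, `center_and_spread`); it enumerates all 6^n outcomes, so it is only practical for small n.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

def mean_distribution(n):
    """Exact sampling distribution of the mean of n fair dice."""
    chance = Fraction(1, 6 ** n)
    distribution = {}
    for faces in product(range(1, 7), repeat=n):
        xbar = Fraction(sum(faces), n)
        distribution[xbar] = distribution.get(xbar, 0) + chance
    return distribution

def center_and_spread(distribution):
    """Mean and standard deviation of a discrete distribution."""
    mu = sum(x * p for x, p in distribution.items())
    variance = sum((x - mu) ** 2 * p for x, p in distribution.items())
    return float(mu), sqrt(variance)

results = {n: center_and_spread(mean_distribution(n)) for n in (1, 2, 3, 4)}
for n, (center, spread) in results.items():
    print(n, center, round(spread, 3))
```

The center stays at 3.5 for every n, while the spread falls from about 1.71 at n = 1 to about 0.85 at n = 4, which is the single-die standard deviation divided by √n.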

Central Limit Theorem
If random samples of n observations are drawn from a nonnormal population with finite mean μ and standard deviation σ, then, when n is large, the sampling distribution of the sample mean x̄ is approximately normally distributed, with mean μ and standard deviation

σ/√n

The approximation becomes more accurate as n becomes large.
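
A simulation can illustrate the theorem for a skewed population. The sketch below is not from the slides: the exponential population (mean 1, standard deviation 1), the sample size n = 30, the 5000 replications, and the seed are all assumptions chosen for the demonstration.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(2023)   # fixed seed so the run is repeatable (arbitrary choice)

n = 30                      # sample size
replications = 5000         # number of repeated samples
mu = sigma = 1.0            # exponential population with rate 1: mean 1, sd 1

# Draw many samples of size n and keep the mean of each one
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(replications)
]

print("mean of the sample means:", round(mean(sample_means), 3))
print("sd of the sample means:  ", round(pstdev(sample_means), 3))
print("sigma / sqrt(n):         ", round(sigma / sqrt(n), 3))
```

If the theorem holds, the first printed value should be close to μ = 1 and the second close to σ/√n, which is about 0.183 here. A histogram of `sample_means` would be expected to look roughly mound-shaped even though the population is strongly right-skewed.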

n = sample size
The Central Limit Theorem can be restated to apply to the sum of the sample measurements Σxi, which, as n becomes large, also has an approximately normal distribution with mean nμ and standard deviation

σ√n

Note: Σxi is approximately N(nμ, σ√n), that is, normal with mean nμ and standard deviation σ√n.

When the Sample Size Is Large Enough to Use the Central Limit Theorem
• If the sampled population is normal, then the sampling distribution of x̄ will also be normal, no matter what sample size you choose.

• When the sampled population is approximately symmetric, the sampling distribution of x̄ becomes approximately normal for relatively small values of n. Remember how rapidly the discrete uniform distribution in the dice example became mound-shaped (n = 3).
• When the sampled population is skewed, the sample size n must be larger, with n at least 30 before the sampling distribution of x̄ becomes approximately normal.

These guidelines suggest that, for many populations, the sampling distribution of x̄ will be approximately normal for moderate sample sizes, but as specific applications of the Central Limit Theorem arise, we will give you the appropriate sample size n.

The Sampling Distribution of the Sample Mean

The Sampling Distribution of the Sample Mean, x̄

• If a random sample of n measurements is selected from a population with mean μ and standard deviation σ, the sampling distribution of the sample mean x̄ will have mean μ and standard deviation σ/√n.

• If the population has a normal distribution, the sampling distribution of x̄ will be exactly normally distributed, regardless of the sample size, n.


• If the population distribution is nonnormal, the sampling distribution of x̄ will be approximately normally distributed for large samples (by the Central Limit Theorem). Conservatively, we require n ≥ 30.

Standard Error of the Sample Mean

DEFINITION
The standard deviation of a statistic used as an estimator of a population parameter is also called the standard error of the estimator (abbreviated SE) because it refers to the precision of the estimator. Therefore, the standard deviation of x̄, given by σ/√n, is referred to as the standard error of the mean (abbreviated as SEM, or sometimes just SE).
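
As a small illustration, the standard error of the mean can be wrapped in a helper. The function name `standard_error` is hypothetical, and the two calls reuse numbers that appear elsewhere in these slides (σ = 1.71 for a single die, and σ = 2 with n = 50 from Example 7.6).

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if sigma < 0 or n < 1:
        raise ValueError("sigma must be nonnegative and n must be at least 1")
    return sigma / sqrt(n)

se_four_dice = standard_error(1.71, 4)    # mean of four dice
se_example_76 = standard_error(2, 50)     # sigma = 2, n = 50

print(round(se_four_dice, 3), round(se_example_76, 3))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.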

7.4

Assessing Normality
• Histogram. Construct a histogram of the data. If the
histogram departs significantly from a bell-shaped
distribution you can conclude that the data do not have a
normal distribution.

• Box Plot. Construct a box plot and check for outliers. One or more outliers may indicate that the data do not have a normal distribution. The box plot also shows whether the distribution is skewed (left or right).

• Normal Probability Plot. If the histogram is relatively symmetric and there are no extreme outliers, use a statistical computer package to generate a normal probability plot, a scatter plot in which the ordered data points are plotted against their z-values (not z-scores).
Note: a z-score measures the distance of a value from the mean in units of the standard deviation. A z-value here is the inverse of the standard normal cumulative distribution, evaluated at the cumulative proportion that corresponds to each ordered data point.
• Normal Distribution. If the data have been drawn from a
normal population, the normal probability plot should be
reasonably close to a straight line and the plotted data
points should not show a systematic departure from this
straight line pattern.

• Nonnormal Distribution. If the normal probability plot is not reasonably close to a straight line and/or the plotted points exhibit some systematic pattern that is not a straight line, the data are not normal.
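
The construction of a normal probability plot can be sketched in a few lines of Python. Several details here are assumptions rather than anything the slides specify: the plotting position (i - 0.5)/n is one common convention (statistical packages use slightly different ones), the helper names are hypothetical, and the correlation between the two axes is used only as a rough numerical stand-in for "close to a straight line".

```python
from math import sqrt
from statistics import NormalDist

def normal_scores(n):
    """z-values for n ordered points, assuming positions (i - 0.5) / n."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def plot_points(data):
    """Pairs (z-value, ordered data value) that would be plotted."""
    ordered = sorted(data)
    return list(zip(normal_scores(len(ordered)), ordered))

def straightness(points):
    """Correlation between z-values and ordered data (1 = perfect line)."""
    n = len(points)
    mean_z = sum(z for z, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    szx = sum((z - mean_z) * (x - mean_x) for z, x in points)
    szz = sum((z - mean_z) ** 2 for z, _ in points)
    sxx = sum((x - mean_x) ** 2 for _, x in points)
    return szx / sqrt(szz * sxx)

# Idealized discrete uniform data: each integer 1 to 10 appears ten times
uniform_data = [value for value in range(1, 11) for _ in range(10)]
r_uniform = straightness(plot_points(uniform_data))
print(round(r_uniform, 3))
```

Passing the pairs from `plot_points` to any plotting tool gives the picture itself. For the flat data above the correlation falls short of 1 because both tails bend away from the line.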

Example 7.6
The histogram and normal probability plot in Figure 7.10 were constructed based upon a sample of n = 50 observations from a normal population with mean μ = 10 and standard deviation σ = 2.

Histogram and normal probability plot for data from a normal distribution
Figure 7.10

Comment on the shape of the histogram and whether the normal probability plot can reasonably be described as a straight line.

Solution:
The histogram is almost symmetrical and displays the
mound shape (or bell shape) of the normal curve.

The probability plot shows the ordered data points lying
almost in a straight line.

Although not all normal plots based on normal data will look this good, these are the characteristics that you look for.

What happens if the data is from a distribution that is not
normal? Let’s investigate some non-normal situations.

Example 7.7
Suppose the data are selected from a discrete uniform
distribution on the integers 1 to 10.

A sample of n = 100 observations produced the histogram and normal probability plot in Figure 7.11.


Histogram and normal probability plot for data from a discrete uniform distribution
Figure 7.11

How do they differ from those produced by a normal sample?

Example 7.7 – Solution
The histogram is far from mound-shaped, and is relatively
flat, characteristic of a discrete uniform distribution, and
hence not normal.

The normal probability plot shows a downturn in the lower area of the plot and an upturn in the upper area of the plot.

This reflects the fact that the tails do not taper off like the
normal curve, but rather, both tails are cut off, the lower at
1 and the upper at 10. This is not characteristic of the
normal distribution.

Example 7.9
The data are n = 48 sea-level pressures measured monthly for 4 years. Discuss the nonnormal aspects of the graphs in Figure 7.13.

Figure 7.13

Example 7.9 – Solution
The data are not normal based upon the histogram in
Figure 7.13(a).

The probability plot in Figure 7.13(b) has the appearance of a wavy line: first below the centerline, then above, then below again, and finally ending above the centerline, indicating a periodic pattern in the data.

References – Additional Readings
• Chapter 7, "Introduction to Probability and Statistics", 15th Edition, 2020, William Mendenhall, Robert J. Beaver, Barbara M. Beaver, Cengage Learning, ISBN: 1337554421

Random variables on [a, b]
a and b are the lower limit and upper limit.
Spreadsheet function: = RAND()*(b - a) + a
This gives a random value uniformly distributed between a and b.
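
The same scaling idea carries over to Python. This is a minimal sketch with a hypothetical helper name, `uniform_between`, in which `random.random()` plays the role of the spreadsheet's RAND(); the seed and the limits 2 and 5 are arbitrary.

```python
import random

def uniform_between(a, b, rng=random):
    """Scale a uniform draw on [0, 1) to the interval from a to b."""
    if a > b:
        raise ValueError("a must not exceed b")
    return a + (b - a) * rng.random()

rng = random.Random(5)   # fixed seed so the draws are repeatable (arbitrary)
draws = [uniform_between(2.0, 5.0, rng) for _ in range(1000)]
print(min(draws), max(draws))
```

Python's standard library also provides `random.uniform(a, b)`, which does the same job.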
