Finals Math14
Introduction:
In statistics, point estimation involves the use of sample data to calculate a single
value (known as a point estimate since it identifies a point in some parameter
space) which is to serve as a "best guess" or "best estimate" of an unknown
population parameter (for example, the population mean). More formally, it is the
application of a point estimator to the data to obtain a point estimate.
Suppose that we draw all possible samples of size n from a given population.
Suppose further that we compute a statistic (e.g., a mean, proportion, standard
deviation) for each sample. The probability distribution of this statistic is called
a sampling distribution. And the standard deviation of this statistic is called
the standard error.
If the population size is much larger than the sample size, then the sampling
distribution has roughly the same standard error, whether we
sample with or without replacement. On the other hand, if the sample represents
a significant fraction (say, 1/20) of the population size, the standard error will be
meaningfully smaller when we sample without replacement.
We know the following about the sampling distribution of the mean. The mean of
the sampling distribution (μx̄) is equal to the mean of the population (μ). And the
standard error of the sampling distribution (σx̄) is determined by the standard
deviation of the population (σ), the population size (N), and the sample size (n).
These relationships are shown in the equations below:
μx̄ = μ
σx̄ = [ σ / sqrt(n) ] * sqrt[ (N - n) / (N - 1) ]
In the standard error formula, the factor sqrt[ (N - n ) / (N - 1) ] is called the finite
population correction or fpc. When the population size is very large relative to the
sample size, the fpc is approximately equal to one; and the standard error
formula can be approximated by:
σx̄ = σ / sqrt(n)
You often see this "approximate" formula in introductory statistics texts. As a
general rule, it is safe to use the approximate formula when the sample size is no
bigger than 1/20 of the population size.
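The 1/20 rule of thumb can be checked directly by computing the finite population correction itself; a minimal Python sketch with illustrative values:

```python
from math import sqrt

def fpc(N, n):
    """Finite population correction factor sqrt[(N - n)/(N - 1)]."""
    return sqrt((N - n) / (N - 1))

# n = N/200: the correction is negligible, so sigma/sqrt(n) alone suffices
print(round(fpc(10_000, 50), 4))   # → 0.9975
# n = N/5: the correction is no longer negligible
print(round(fpc(1_000, 200), 4))   # → 0.8949
```

When the fpc is this close to one, multiplying by it changes the standard error only in the third decimal place, which is why the approximate formula is safe for small sampling fractions.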
We find that the mean of the sampling distribution of the proportion (μp) is equal
to the probability of success in the population (P). And the standard error of the
sampling distribution (σp) is determined by the standard deviation of the
population (σ), the population size, and the sample size. These relationships are
shown in the equations below:
μp = P
σp = [ σ / sqrt(n) ] * sqrt[ (N - n) / (N - 1) ]
σp = sqrt[ P(1 - P) / n ] * sqrt[ (N - n) / (N - 1) ]
where σ = sqrt[ P(1 - P) ]
Like the formula for the standard error of the mean, the formula for the standard
error of the proportion uses the finite population correction, sqrt[ (N - n ) / (N - 1)
]. When the population size is very large relative to the sample size, the fpc is
approximately equal to one; and the standard error formula can be approximated
by:
σp = sqrt[ P(1 - P) / n ]
The central limit theorem states that the sampling distribution of the mean of
any independent, random variable will be normal or nearly normal, if the sample
size is large enough.
In practice, some statisticians say that a sample size of 30 is large enough when
the population distribution is roughly bell-shaped. Others recommend a sample
size of at least 40. But if the original population is distinctly not normal (e.g., is
badly skewed, has multiple peaks, and/or has outliers), researchers like the
sample size to be even larger.
The t distribution and the normal distribution can both be used with statistics that
have a bell-shaped distribution. This suggests that we might use either the t-
distribution or the normal distribution to analyze sampling distributions. Which
should we choose?
Guidelines exist to help you make that choice. Some focus on the population
standard deviation.
If the sample size is large, use the normal distribution. (See the
discussion above in the section on the Central Limit Theorem to
understand what is meant by a "large" sample.)
Assume that a school district has 10,000 6th graders. In this district, the average
weight of a 6th grader is 80 pounds, with a standard deviation of 20 pounds.
Suppose you draw a random sample of 50 students. What is the probability that
the average weight of a sampled student will be less than 75 pounds?
Solution: To solve this problem, we need to define the sampling distribution of the
mean. Because our sample size is greater than 30, the Central Limit Theorem
tells us that the sampling distribution will approximate a normal distribution.
To define our normal distribution, we need to know both the mean of the
sampling distribution and the standard deviation. Finding the mean of the
sampling distribution is easy, since it is equal to the mean of the population.
Thus, the mean of the sampling distribution is equal to 80.
The standard deviation of the sampling distribution can be computed using the
following formula.
σx̄ = [ σ / sqrt(n) ] * sqrt[ (N - n) / (N - 1) ]
σx̄ = [ 20 / sqrt(50) ] * sqrt[ (10000 - 50) / (10000 - 1) ]
σx̄ = 2.81
Let's review what we know and what we want to know. We know that the
sampling distribution of the mean is normally distributed with a mean of 80 and a
standard deviation of 2.81. We want to know the probability that a sample mean
is less than or equal to 75 pounds.
Because we know the population standard deviation and the sample size is
large, we'll use the normal distribution to find probability. To solve the problem,
we plug these inputs into the Normal Probability Calculator: mean = 80, standard
deviation = 2.81, and normal random variable = 75. The Calculator tells us that
the probability that the average weight of a sampled student is less than 75
pounds is equal to 0.038.
Note: Since the population size is more than 20 times greater than the sample
size, we could have used the "approximate" formula σ x = [ σ / sqrt(n) ] to compute
the standard error. Had we done that, we would have found a standard error
equal to [ 20 / sqrt(50) ] or 2.83.
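The normal-probability lookup in this example can be reproduced with Python's standard library; `statistics.NormalDist` plays the role of the Normal Probability Calculator, and the standard error 2.81 is taken from the example above:

```python
from statistics import NormalDist

# Sampling distribution of the mean: mu = 80, standard error ≈ 2.81
p = NormalDist(mu=80, sigma=2.81).cdf(75)
print(round(p, 3))  # → 0.038
```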
Example: Find the probability that of the next 120 births, no more than 40% will
be boys. Assume equal probabilities for the births of boys and girls. Assume also
that the number of births in the population (N) is very large, essentially infinite.
Solution: The Central Limit Theorem tells us that the proportion of boys in 120
births will be approximately normally distributed.
The mean of the sampling distribution will be equal to the mean of the population
distribution. In the population, half of the births result in boys; and half, in girls.
Therefore, the probability of boy births in the population is 0.50. Thus, the mean
proportion in the sampling distribution should also be 0.50.
The standard deviation of the sampling distribution (i.e., the standard error) can
be computed using the following formula.
σp = sqrt[ P(1 - P) / n ] * sqrt[ (N - n) / (N - 1) ]
Here, the finite population correction is equal to 1.0, since the population size (N)
was assumed to be infinite. Therefore, the standard error formula reduces to:
σp = sqrt[ P(1 - P) / n ]
σp = sqrt[ (0.5)(0.5) / 120 ]
σp = 0.04564
Let's review what we know and what we want to know. We know that the
sampling distribution of the proportion is normally distributed with a mean of 0.50
and a standard deviation of 0.04564. We want to know the probability that no
more than 40% of the sampled births are boys.
Because we know the population standard deviation and the sample size is
large, we'll use the normal distribution to find probability. To solve the problem,
we plug these inputs into the Normal Probability Calculator: mean = .5, standard
deviation = 0.04564, and the normal random variable = .4. The Calculator tells us
that the probability that no more than 40% of the sampled births are boys is equal
to 0.014.
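The same calculation can be sketched in a few lines of Python using only the standard library:

```python
from math import sqrt
from statistics import NormalDist

n, P = 120, 0.5
se = sqrt(P * (1 - P) / n)                # standard error of the proportion
p = NormalDist(mu=P, sigma=se).cdf(0.40)  # P(proportion of boys <= 0.40)
print(round(se, 5), round(p, 3))  # → 0.04564 0.014
```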
We'll start the lesson with some formal definitions. In doing so, recall that we
denote the n random variables arising from a random sample as subscripted
uppercase letters:
X1, X2, ..., Xn
The corresponding observed values of a specific random sample are then
denoted as subscripted lowercase letters:
x1, x2, ..., xn
For example, if μ denotes the mean grade point average of all college students,
then the parameter space (assuming a 4-point grading scale) is:
Ω = {μ: 0 ≤ μ ≤ 4}
And, if p denotes the proportion of students who smoke cigarettes, then the
parameter space is:
Ω = {p: 0 ≤ p ≤ 1}
Definition. The function of X1, X2, ..., Xn, that is, the
statistic u(X1, X2, ..., Xn), used to estimate θ is called a point
estimator of θ.
For example, the functions:
X̄ = (1/n) ∑ Xi   (a point estimator of the population mean μ)
p̂ = (1/n) ∑ Xi   (for Bernoulli Xi, a point estimator of the proportion p)
S² = (1/(n-1)) ∑ (Xi - X̄)²   (a point estimator of the population variance σ²)
are all point estimators.
Definition. The function u(x1, x2, ..., xn) computed from a set of
data is an observed point estimate of θ.
x̄ = (1/n) ∑ xi
is a point estimate of μ, the mean grade point average of all the students in the
population.
And, if xi = 0 if a student has no tattoo, and xi = 1 if a student has a tattoo, then:
p̂ = 0.11
is a point estimate of p, the proportion of all students in the population who have
a tattoo.
On the previous page, we showed that if Xi are Bernoulli random variables with
parameter p, then:
p̂ = (1/n) ∑ Xi
7
Definition. If the following holds:
E[u(X1, X2, ..., Xn)] = θ
then the statistic u(X1, X2, ..., Xn) is an unbiased estimator of the parameter θ.
Recall that:
μ̂ = (1/n) ∑ Xi = X̄   and   σ̂² = (1/n) ∑ (Xi - X̄)²
are the maximum likelihood estimators of μ and σ2, respectively. A natural question then
is whether or not these estimators are "good" in any sense. One measure of "good" is
"unbiasedness."
Example. Show that p̂ = (1/n) ∑ Xi is an unbiased estimator of p.
Solution. Recall that if Xi is a Bernoulli random variable with parameter p, then E(Xi) = p.
Therefore:
E(p̂) = E[ (1/n) ∑ Xi ] = (1/n) ∑ E(Xi) = (1/n) ∑ p = (1/n)(np) = p
The first equality holds because we've merely replaced p-hat with its definition. The
second equality holds by the rules of expectation for a linear combination. The third
equality holds because E(Xi) = p. The fourth equality holds because when you add the
value p up n times, you get np. And, of course, the last equality is simple algebra.
In summary, we have shown that:
E(p̂) = p
Example. Show that the sample variance S² = (1/(n-1)) ∑ (Xi - X̄)² is an unbiased
estimator of σ2.
Solution. Recall that if Xi is a normally distributed random variable with mean μ and
variance σ2, then:
(n - 1)S² / σ² ~ χ²(n - 1)
Also, recall that the expected value of a chi-square random variable is its degrees of
freedom. That is, if X ~ χ²(r), then E(X) = r. Therefore:
E(S²) = E[ (σ²/(n-1)) · ((n-1)S²/σ²) ] = (σ²/(n-1)) · E[ (n-1)S²/σ² ] = (σ²/(n-1)) · (n-1) = σ²
The first equality holds because we effectively multiplied the sample variance by 1. The
second equality holds by the law of expectation that tells us we can pull a constant
through the expectation. The third equality holds because of the two facts we recalled
above. That is:
E[ (n-1)S²/σ² ] = n - 1
Example: Let T be the time that is needed for a specific task in a factory to be
completed. In order to estimate the mean and variance of T, we observe a random
sample T1, T2, ..., T6. Thus, the Ti are i.i.d. and have the same distribution as T. We obtain
the following values (in minutes):
18, 21, 17, 16, 24, 20.
Find the values of the sample mean, the sample variance, and the sample standard
deviation for the observed sample.
The sample mean is T̄ = (18 + 21 + 17 + 16 + 24 + 20) / 6 = 19.33. The sample
variance is given by S² = (1/(n-1)) ∑ (Ti - T̄)² = 8.67, and the sample standard
deviation is
S = √8.67 = 2.94
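These values can be verified with Python's `statistics` module, which uses the n − 1 divisor for the sample variance, matching the definitions above:

```python
from statistics import mean, stdev, variance

times = [18, 21, 17, 16, 24, 20]
# sample mean, sample variance (n-1 divisor), sample standard deviation
print(round(mean(times), 2), round(variance(times), 2), round(stdev(times), 2))
# → 19.33 8.67 2.94
```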
The standard error of the sample mean is σ / sqrt(n), where σ is the standard
deviation std(X) being estimated. We don't know the standard deviation σ of X,
but we can approximate the standard error based upon some estimated value s
for σ. Irrespective of the value of σ, the standard error decreases with the square
root of the sample size n. Quadrupling the sample size halves the standard error.
We seek estimators that are unbiased and have minimal standard error.
Sometimes these goals are incompatible. Consider Exhibit 4.2, which indicates
PDFs for two estimators of a parameter θ. One is unbiased. The other is biased
but has a lower standard error. Which estimator should we use?
Exhibit 4.2: PDFs are indicated for two estimators of a parameter θ. One is unbiased.
The other is biased but has lower standard error.
Mean squared error (MSE) combines the notions of bias and standard error. It is
defined as
MSE(θ̂) = E[ (θ̂ - θ)² ] = bias(θ̂)² + SE(θ̂)²
Since we have already determined the bias and standard error of an estimator,
calculating its mean squared error is easy.
Faced with alternative estimators for a given parameter, it is generally reasonable to use
the one with the smallest MSE.
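A small simulation can make this trade-off concrete. The sketch below (illustrative, not from the source) compares the simulated MSE of the unbiased sample variance S² against the biased maximum likelihood estimator that divides by n; for normal data the biased estimator typically achieves the smaller MSE:

```python
import random
from statistics import fmean

random.seed(0)
mu, sigma, n, trials = 0.0, 1.0, 10, 20_000
sq_err_s2, sq_err_mle = [], []
for _ in range(trials):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = fmean(x)
    ss = sum((xi - xbar) ** 2 for xi in x)
    sq_err_s2.append((ss / (n - 1) - sigma ** 2) ** 2)   # unbiased S^2
    sq_err_mle.append((ss / n - sigma ** 2) ** 2)        # biased MLE
mse_s2, mse_mle = fmean(sq_err_s2), fmean(sq_err_mle)
print(mse_mle < mse_s2)   # the biased estimator wins on MSE here
```

This is exactly the situation in Exhibit 4.2: a little bias is traded for a larger reduction in variance.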
References:
The Pennsylvania State University (2018). Probability Theory and Mathematical Statistics
[Available online] Retrieved from:
https://newonlinecourses.science.psu.edu/stat414/node/192/
Statistical Intervals
Confidence Intervals
The common notation for the parameter in question is θ. Often, this parameter is the
population mean μ, which is estimated through the sample mean x̄.
The level C of a confidence interval gives the probability that the interval produced by
the method employed includes the true value of the parameter θ.
Example
Suppose a student measuring the boiling temperature of a certain liquid observes the
readings (in degrees Celsius) 102.5, 101.7, 103.1, 100.9, 100.5, and 102.2 on 6 different
samples of the liquid. He calculates the sample mean to be 101.82. If he knows that the
standard deviation for this procedure is 1.2 degrees, what is the confidence interval for
the population mean at a 95% confidence level?
In other words, the student wishes to estimate the true mean boiling temperature of the
liquid using the results of his measurements. If the measurements follow a normal
distribution, then the sample mean will have the distribution N(μ, σ/sqrt(n)). Since the
sample size is 6, the standard deviation of the sample mean is equal to
1.2/sqrt(6) = 0.49.
For a 95% confidence level, the area in each tail of the standard normal curve is
equal to 0.05/2 = 0.025.
The value z* representing the point on the standard normal density curve such that the
probability of observing a value greater than z* is equal to p is known as the
upper p critical value of the standard normal distribution. For example, if p = 0.025, the
value z* such that P(Z > z*) = 0.025, or P(Z < z*) = 0.975, is equal to 1.96. For a
confidence interval with level C, the value p is equal to (1-C)/2. A 95% confidence
interval for the standard normal distribution, then, is the interval (-1.96, 1.96), since 95%
of the area under the curve falls within this interval.
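Putting the pieces together for the boiling-temperature example, a short Python sketch using the values given above:

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n, C = 101.82, 1.2, 6, 0.95
z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)  # upper 0.025 critical value, 1.96
m = z_star * sigma / sqrt(n)                    # margin of error
print(round(xbar - m, 2), round(xbar + m, 2))   # → 100.86 102.78
```

So the student can be 95% confident that the true mean boiling temperature lies between about 100.86 and 102.78 degrees.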
For a population with unknown mean μ and known standard deviation σ, a confidence
interval for the population mean, based on a simple random sample (SRS) of size n, is
x̄ ± z* · σ / sqrt(n), where z* is the upper (1-C)/2 critical value for the standard normal
distribution.
An increase in sample size will decrease the length of the confidence interval without
reducing the level of confidence. This is because the standard deviation σ/sqrt(n)
decreases as n increases. The margin of error m of a confidence interval is defined to be
the value added or subtracted from the sample mean which determines the length of the
interval:
m = z* · σ / sqrt(n)
In most practical research, the standard deviation for the population of interest is not
known. In this case, the standard deviation σ is replaced by the estimated standard
deviation s, and the quantity s/sqrt(n) is known as the standard error. Since the standard
error is an estimate for the true value of the standard deviation, the distribution of the
sample mean is no longer normal with mean μ and standard deviation σ/sqrt(n). Instead,
the sample mean follows the t distribution with mean μ and standard deviation s/sqrt(n).
The t distribution is also described by its degrees of freedom. For a sample of size n, the
t distribution will have n-1 degrees of freedom. The notation for a t distribution with k
degrees of freedom is t(k). As the sample size n increases, the t distribution becomes
closer to the normal distribution, since the standard error approaches the true standard
deviation for large n.
For a population with unknown mean μ and unknown standard deviation, a confidence
interval for the population mean, based on a simple random sample (SRS) of size n, is
x̄ ± t* · s / sqrt(n), where t* is the upper (1-C)/2 critical value for the t distribution with
n-1 degrees of freedom, t(n-1).
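A sketch of a t-based interval, reusing the six task times from the earlier example; the critical value t* = 2.571 for t(5) is taken from a t table and is an assumption of this illustration, not a value from the source:

```python
from math import sqrt
from statistics import mean, stdev

data = [18, 21, 17, 16, 24, 20]
n = len(data)
xbar, s = mean(data), stdev(data)
t_star = 2.571            # upper 0.025 critical value of t(5), from a t table
m = t_star * s / sqrt(n)  # margin of error
print(round(xbar - m, 2), round(xbar + m, 2))  # → 16.24 22.42
```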
Prediction Intervals
Predicting the next future observation with a 100(1-α)% prediction interval
Suppose that X1, X2, ..., Xn is a random sample from a normal population. We wish to
predict the value of Xn+1, a single future observation. A point prediction of Xn+1 is the
sample mean X̄. The prediction error is Xn+1 - X̄. The expected value of the prediction
error is
E(Xn+1 - X̄) = μ - μ = 0
and the variance of the prediction error is
V(Xn+1 - X̄) = σ² + σ²/n = σ²(1 + 1/n)
because the future observation Xn+1 is independent of the mean of the current sample
X̄. The prediction error is normally distributed. Therefore
T = (Xn+1 - X̄) / (S · sqrt(1 + 1/n))
has a t distribution with n - 1 degrees of freedom. Manipulating T as we have done
previously in the development of a CI leads to a prediction interval on the future
observation
Definition: A 100(1 - α)% prediction interval on a single future observation from a
normal distribution is given by
x̄ - t(α/2, n-1) · s · sqrt(1 + 1/n) ≤ Xn+1 ≤ x̄ + t(α/2, n-1) · s · sqrt(1 + 1/n)
Notice that the prediction interval is considerably longer than the CI.
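To see how much longer, the sketch below computes a 95% prediction interval for the same six task times used earlier; pairing that data with this formula is an illustration, not from the source, and the critical value 2.571 for t(5) is taken from a t table:

```python
from math import sqrt
from statistics import mean, stdev

data = [18, 21, 17, 16, 24, 20]
n = len(data)
xbar, s = mean(data), stdev(data)
t_star = 2.571                    # t critical value for t(5), from a t table
m = t_star * s * sqrt(1 + 1 / n)  # prediction-interval half-width
print(round(xbar - m, 2), round(xbar + m, 2))  # → 11.16 27.51
```

The half-width uses sqrt(1 + 1/n) rather than sqrt(1/n), so the prediction interval is several times wider than the confidence interval for the mean computed from the same data.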
Tolerance Interval
Consider a population of semiconductor processors. Suppose that the speed of these
processors has a normal distribution with mean μ = 600 megahertz and standard
deviation σ = 30 megahertz. Then the interval from 600 - 1.96(30) = 541.2 to 600 +
1.96(30) = 658.8 megahertz captures the speed of 95% of the processors in this
population because the interval from -1.96 to 1.96 captures 95% of the area under the
standard normal curve. The interval from μ - z(α/2)·σ to μ + z(α/2)·σ is called a tolerance
interval.
If μ and σ are unknown, we can substitute the sample statistics x̄ and s, but an interval
intended to capture a specific percentage of values of a population will (probably)
contain less than this percentage because of sampling variability in x̄ and s.
A tolerance interval for capturing at least γ% of the values in a normal distribution with
confidence level 100(1 - α)% is
x̄ - ks, x̄ + ks
where k is a tolerance interval factor found in Appendix Table XI. Values are given for
γ = 90%, 95%, and 99% and for 90%, 95%, and 99% confidence.
One-sided tolerance bounds can also be computed. The tolerance factors for these
bounds are also given in Appendix Table XI.
Chapter VIII
Statistical Hypotheses
The best way to determine whether a statistical hypothesis is true would be to
examine the entire population. Since that is often impractical, researchers typically
examine a random sample from the population. If sample data are not consistent with
the statistical hypothesis, the hypothesis is rejected.
Hypothesis testing in statistics is a way for you to test the results of a survey or
experiment to see if you have meaningful results. You're basically testing whether your
results are valid by figuring out the odds that your results have happened by chance. If
your results may have happened by chance, the experiment won't be repeatable and so
has little use.
Hypothesis testing can be one of the most confusing aspects for students, mostly
because before you can even perform a test, you have to know what your null
hypothesis is. Often, those tricky word problems that you are faced with can be difficult
to decipher. But it's easier than you think; all you need to do is work out the claim being
tested. For example: taking Vioxx can increase your risk of heart problems (a drug now
taken off the market).
How do I State the Null Hypothesis?
1. The significance level is the probability of rejecting a null hypothesis that is correct.
2. The sampling distribution for a test statistic assumes that the null hypothesis is
correct.
When a test statistic falls in either critical region, your sample data are sufficiently
incompatible with the null hypothesis that you can reject it for the population.
In a two-tailed test, the generic null and alternative hypotheses are the following:
Null: the effect equals zero. Alternative: the effect does not equal zero.
The specifics of the hypotheses depend on the type of test you perform because
you might be assessing means, proportions, or rates.
In the examples below, I use an alpha of 5%. Each distribution has one shaded
region of 5%. When you perform a one-tailed test, you must determine whether the
critical region is in the left tail or the right tail. The test can detect an effect only in the
direction that has the critical region. It has absolutely no capacity to detect an effect in
the other direction.
In a one-tailed test, you have two options for the null and alternative hypotheses,
which correspond to where you place the critical region.
Null: the effect is less than or equal to zero. Alternative: the effect is greater than zero.
Or:
Null: the effect is greater than or equal to zero. Alternative: the effect is less than zero.
Again, the specifics of the hypotheses depend on the type of test you perform.
Notice how for both possible null hypotheses the tests can't distinguish between
zero and an effect in a particular direction. For example, in the second option above,
the null lumps "the effect is greater than or equal to zero" into a single category. That
test can't differentiate between zero and greater than zero.
2. Effects can exist in both directions but the researchers only care about an
effect in one direction. There is no drawback to failing to detect an effect in the other
direction. (Not recommended.)
If your P value is less than the chosen significance level, then you reject the null
hypothesis, i.e., you accept that your sample gives reasonable evidence to support the
alternative hypothesis. It does NOT imply a "meaningful" or "important" difference; that is
for you to decide when considering the real-world relevance of your result.
Type I error is the false rejection of the null hypothesis and type II error is the
false acceptance of the null hypothesis. As an aide-mémoire: think that our cynical
society rejects before it accepts.
The significance level (alpha) is the probability of type I error. The power of a
test is one minus the probability of type II error (beta). Power should be maximised when
selecting statistical methods.
The following table shows the relationship between power and error in
hypothesis testing:
DECISION        H0 is true                   H0 is false
Accept H0       correct decision, P = 1 - α  type II error, P = β
Reject H0       type I error, P = α          correct decision (power), P = 1 - β
H0 = null hypothesis
P = probability
beta gets smaller as the sample size gets larger
beta gets smaller as the number of tests or end points increases
Example:
Aircrew escape systems are powered by a solid propellant. The burning rate of
this propellant is an important product characteristic. Specifications require that the
mean burning rate must be 50 centimeters per second. We know that the standard
deviation of burning rate is σ = 2 centimeters per second. The experimenter decides to
specify a type I error probability or significance level of α = 0.05 and selects a random
sample of n = 25 and obtains a sample average burning rate of x̄ = 51.3 centimeters per
second. What conclusions should be drawn?
2. H0: µ = 50
3. H1: µ ≠ 50
4. α = 0.05
5. Test statistic: z0 = (x̄ - µ0) / (σ / sqrt(n))
6. Reject H0 if z0 > 1.96 or if z0 < -1.96. Note that this results from step 4, where we
specified α = 0.05, and so the boundaries of the critical region are at z0.025 = 1.96 and -
z0.025 = -1.96.
7. Computations: z0 = (51.3 - 50) / (2 / sqrt(25)) = 3.25
8. Conclusion: Since z0 = 3.25 > 1.96, we reject H0: µ = 50 at the 0.05 level of
significance. Stated more completely, we conclude that the mean burning rate differs
from 50 centimeters per second, based on a sample of 25 measurements. In fact, there
is strong evidence that the mean burning rate exceeds 50 centimeters per second.
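The test statistic and a two-sided P-value can be reproduced in Python. The values σ = 2 and x̄ = 51.3 are assumptions here (parts of the problem statement were lost), chosen to be consistent with the z0 = 3.25 reported above:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 50, 2, 25, 51.3  # sigma and xbar assumed (see text)
z0 = (xbar - mu0) / (sigma / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))  # two-sided P-value
print(round(z0, 2), round(p_value, 5))  # → 3.25 0.00115
```

The tiny P-value mirrors the conclusion above: z0 falls far inside the critical region.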
Example:
The increased availability of light materials with high strength has revolutionized
the design and manufacture of golf clubs, particularly drivers. Clubs with hollow heads
and very thin faces can result in much longer tee shots, especially for players of modest
skills. This is due partly to the “spring-like effect” that the thin face imparts to the ball.
Firing a golf ball at the head of the club and measuring the ratio of the outgoing velocity
of the ball to the incoming velocity can quantify this spring-like effect. The ratio of
velocities is called the coefficient of restitution of the club. An experiment was performed
in which 15 drivers produced by a particular club maker were selected at random and
their coefficients of restitution measured. In the experiment the golf balls were fired from
an air cannon so that the incoming velocity and spin rate of the ball could be precisely
controlled. It is of interest to determine if there is evidence (with α = 0.05) to support a
claim that the mean coefficient of restitution exceeds 0.82. The observations follow:
The sample mean and sample standard deviation are x̄ = 0.83725 and s =
0.02456. The normal probability plot of the data supports the assumption that the
coefficient of restitution is normally distributed. Since the objective of the experiment is to
demonstrate that the mean coefficient of restitution exceeds 0.82, a one-sided
alternative hypothesis is appropriate.
2. H0: µ = 0.82
3. H1: µ > 0.82. We want to reject H0 if the mean coefficient of restitution exceeds 0.82.
4. α = 0.05
5. Test statistic: t0 = (x̄ - µ0) / (s / sqrt(n))
6. Reject H0 if t0 > t0.05,14 = 1.761
7. Computations: t0 = (0.83725 - 0.82) / (0.02456 / sqrt(15)) = 2.72
8. Conclusions: Since t0 = 2.72 > 1.761, we reject H0 and conclude at the 0.05 level of
significance that the mean coefficient of restitution exceeds 0.82.
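The computation can be checked with a few lines of Python using the summary statistics given above (1.761 is the stated critical value t0.05,14):

```python
from math import sqrt

xbar, s, n, mu0 = 0.83725, 0.02456, 15, 0.82
t0 = (xbar - mu0) / (s / sqrt(n))
print(round(t0, 2), t0 > 1.761)  # → 2.72 True
```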
Example:
An automatic filling machine is used to fill bottles with liquid detergent. A random
sample of 20 bottles results in a sample variance of fill volume of s² = 0.0153 (fluid
ounces)². If the variance of fill volume exceeds 0.01 (fluid ounces)², an unacceptable
proportion of bottles will be underfilled or overfilled. Is there evidence in the sample data
to suggest that the manufacturer has a problem with underfilled or overfilled bottles? Use
α = 0.05, and assume that fill volume has a normal distribution.
2. H0: σ² = 0.01
3. H1: σ² > 0.01
4. α = 0.05
5. Test statistic: χ0² = (n - 1)s² / σ0²
6. Reject H0 if χ0² > χ²0.05,19 = 30.14.
7. Computations: χ0² = 19(0.0153) / 0.01 = 29.07
8. Conclusions: Since χ0² = 29.07 < 30.14, we fail to reject H0. There is no strong
evidence that the variance of fill volume exceeds 0.01 (fluid ounces)².
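A sketch of the computations, taking the critical value χ²0.05,19 = 30.14 from a chi-square table:

```python
n, s2, sigma0_sq = 20, 0.0153, 0.01
chi2_0 = (n - 1) * s2 / sigma0_sq        # chi-square test statistic
print(round(chi2_0, 2), chi2_0 > 30.14)  # → 29.07 False (fail to reject H0)
```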
Example:
2. H0: p = 0.05
This formulation of the problem will allow the manufacturer to make a strong
claim about process capability if the null hypothesis H 0: p = 0.05 is rejected.
3. H1: p < 0.05
4. α = 0.05
5. Test statistic: z0 = (x - np0) / sqrt[ np0(1 - p0) ]
7. Computations: with a sample of n = 200 units containing x = 4 defectives,
z0 = (4 - 200(0.05)) / sqrt[ 200(0.05)(0.95) ] = -1.95
8. Conclusions: Since z0 = -1.95 < -z0.05 = -1.645, we reject H0 and conclude that the
process fraction defective p is less than 0.05. The P-value for this value of the test
statistic z0 is P = 0.0256, which is less than α = 0.05. We conclude that the process is
capable.
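The reported z0 = -1.95 is consistent with a sample of n = 200 units containing x = 4 defectives; those counts are an assumption here, since part of the problem statement was lost. A Python sketch:

```python
from math import sqrt
from statistics import NormalDist

n, x, p0 = 200, 4, 0.05  # assumed counts, consistent with z0 = -1.95
z0 = (x - n * p0) / sqrt(n * p0 * (1 - p0))
p_value = NormalDist().cdf(z0)  # lower-tail P-value for H1: p < 0.05
print(round(z0, 2), round(p_value, 3))  # → -1.95 0.026
```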
References:
Frost, J. (2019). One-Tailed and Two-Tailed Hypothesis Tests Explained [Available
online]