03 Hypothesis Testing
Postgraduate Program
Department of Geomatics Engineering
Faculty of Civil, Planning, and Geo Engineering
Institut Teknologi Sepuluh Nopember
Sampling distribution
- If many different samples are selected from the same population, we get a range of different sample means.
The sampling distribution of the mean is the probability distribution for all
possible values of the sample mean, x̄.
If we take enough samples, the mean of these sample means will tend towards the
population mean, i.e.:
E[x̄] = µ (1)
or, the expected value of x̄ = the population mean.
The standard deviation of all the sample means measures the dispersion of all
possible values of x̄ for all possible samples.
The standard deviation of all the sample means is called the standard error of the
mean, and is defined by:

σx̄ = √[(N − n)/(N − 1)] · (σ/√n)    (2)
where σ is the standard deviation of the population being sampled, n is the sample
size, and N is the population size (often unknown).
If n/N < 0.05, i.e. the sample size is much smaller than the population size, then eq. (2) reduces to:

σx̄ = σ/√n    (3)
From now on, we will assume that the n/N < 0.05 condition is always met.
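As a rough numerical check of eqs. (2) and (3) (a sketch with illustrative numbers; the function name is ours), the finite-population correction barely changes the standard error when n/N < 0.05:

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard error of the mean; applies the finite-population
    correction from eq. (2) when the population size N is known."""
    se = sigma / sqrt(n)          # eq. (3)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))  # eq. (2) correction factor
    return se

print(standard_error(21, 49))           # sigma/sqrt(n) = 3.0
print(standard_error(21, 49, N=10000))  # ~2.993, almost identical
```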
Assume that many random samples of size n = 49 are to be taken from a large
population with mean µ = 100 and standard deviation σ = 21. What are the mean
and standard deviation of the values of all the sample means?
We know that repeating the sampling process will generate different sample means
due to the different samples selected. The mean of these x̄ values is E[x̄] = µ = 100.
Since the population is large relative to the sample, the standard deviation of the x̄
values is:
σx̄ = σ/√n = 21/√49 = 3
This is true for any population distribution, not just normally-distributed ones.
Furthermore, if the population being sampled itself has a normal distribution, then
the sampling distribution of the mean is normally distributed for any value of n.
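These properties can be illustrated by simulation (a sketch; the number of repetitions is arbitrary). Drawing many samples of size n = 49 from a normal population with µ = 100 and σ = 21, the mean of the sample means should come out close to 100 and their standard deviation close to 3:

```python
import random
from math import sqrt

random.seed(1)
mu, sigma, n = 100, 21, 49

# Draw many samples and record each sample mean
sample_means = [
    sum(random.gauss(mu, sigma) for _ in range(n)) / n
    for _ in range(20000)
]

mean_of_means = sum(sample_means) / len(sample_means)
sd_of_means = sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means)
    / (len(sample_means) - 1)
)
print(round(mean_of_means, 1))  # close to mu = 100
print(round(sd_of_means, 1))    # close to sigma/sqrt(n) = 3
```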
Now we can ask: what is p(x̄ < 1340)? [Figure: normal curve with the area below x̄ = 1340 shaded in red.]
-IM Anjasmara, 2022-
We can use the sampling distribution of the mean to compute the probability of
selecting a sample that will provide a value of x̄ within any specified distance from
the population mean.
And now we can find p(x̄ < 1340) = p(z < −1.4).
For the Statistics mid-semester test the mean score of 85 students was 48% with a
standard deviation of 25%. What is the probability that the mean of a sample of 30
students will be greater than 55% ?
From the standard normal tables, the area enclosed between z = 0 (the mean) and
z = 1.53 is A = 0.4370, so p(x̄ > 55) = 0.5 − 0.4370 = 0.0630.
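The same probability can be checked numerically (a sketch using only the standard library; note it follows the slides in using σ/√n without the finite-population correction, even though n/N = 30/85 exceeds 0.05 here):

```python
from math import sqrt, erf

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 48, 25, 30
z = (55 - mu) / (sigma / sqrt(n))  # test value in standard units
p = 1 - phi(z)                     # upper-tail probability
print(round(z, 2))  # 1.53
print(round(p, 3))  # 0.063
```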
H0 : innocent
i.e., the null hypothesis states that he will still be innocent after the trial, preserving
the status quo.
Think in terms of a courtroom:
- the null hypothesis is like the defence lawyer, pleading innocence;
- the alternative hypothesis is like the prosecution lawyer, attempting to prove guilt.
- When testing a hypothesis, the aim is to use sample data to refute the null hypothesis (and not to prove the alternative hypothesis). Then, if there is any doubt as to the validity of the alternative hypothesis, we revert to the null hypothesis.
- In mathematical terms, the null hypothesis is a statement about a population statistic, not a sample statistic, because the population statistic represents the accepted wisdom, while the sample statistic represents the new evidence. In this chapter, null hypothesis statements will concern µ (and never x̄).
Because our results are based only on a sample, our decision to reject or accept
a hypothesis may be incorrect.
There are two types of errors:
- Type I error: reject H0 when it is true (being too sceptical)
- Type II error: accept H0 when it is false (being too gullible)

               H0 true            H0 false
Accept H0      Correct decision   Type II error
Reject H0      Type I error       Correct decision
A type II error is more difficult to detect than a type I error. Recall, this is when we
accept H0 when it is false (i.e., we free the guilty man).
In an experiment, if we accept H0 it may be because H0 is actually true, but it may
also be because we did not have enough evidence to reject it. This latter case is like
the police bungling an investigation, so that the jury have no choice but to free the
guilty man.
To avoid making a type II error, we say “do not reject H0 ” rather than “accept H0 ”.
- This statement does not discard the possibility that we have mistakenly accepted H0.
- It embodies the subtle difference between saying "this man is innocent" and "we cannot prove this man guilty".
So a statistical test can either reject or fail to reject a null hypothesis, but can never
prove it.
The value of α gives the area under the probability distribution curve corresponding
to the probability of making a type I error. An example for the normal distribution is:
As a null hypothesis can never be rejected with 100% certainty, we test at various
levels of significance:
a small value of α means a small chance of mistakenly rejecting a true H0 (a type I
error), and thus a large chance of making the right decision about H0 when it is true.
This is the statement of the “new” result, i.e., the result that is contended to alter
the status quo; or, the case for the prosecution.
Decide what we are testing for:
- are we testing whether the new results are "less than" or "greater than" the established results (use < or > in the formulation for Ha);
- or whether they are "just different" (use ≠ in the formulation for Ha)?
The clue comes from the wording of the question. If the question doesn't specifically
state "less than" or "greater than" (or wording to that effect), use ≠.
This will be the opposite or inverse of the alternative hypothesis; i.e., what is the
status quo?
Use the opposite sign to Ha (≥ , ≤ , or = ):
In summary, a hypothesis test concerning the value of a population mean (µ) can
take one of three forms:
H0          Ha
µ ≤ µ0      µ > µ0
µ ≥ µ0      µ < µ0
µ = µ0      µ ≠ µ0
It is known that a certain quantity has a value µ0 . Recent tests find that this
quantity actually has a value x̄.
Recall, the significance level is the probability of rejecting a true null hypothesis. This
value is usually given to you as a fraction between (but not including) 0 and 1.
If it is not given, it is up to you to choose a value. Common choices are α= 0.1,
0.05, 0.01 and 0.001, but 0.05 is most widely used.
The value of α equals the area of the rejection region: this is the part of the normal
distribution where the sample data indicate that H0 should be rejected.
This critical value will be used to test the null hypothesis - see Step 7.
For a given value of α the value of zα will depend on whether we have a 2-tailed or
1-tailed test:
- in a 1-tailed test, all of α is in one rejection region, so find zα
- in a 2-tailed test, α is split into two rejection regions, each one with area α/2, so find zα/2
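Instead of normal tables, the critical values zα and zα/2 can be found with Python's statistics.NormalDist (available from Python 3.8); a sketch for α = 0.05:

```python
from statistics import NormalDist

alpha = 0.05
z_one_tailed = NormalDist().inv_cdf(1 - alpha)      # all of alpha in one tail
z_two_tailed = NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
print(round(z_one_tailed, 3))  # 1.645
print(round(z_two_tailed, 2))  # 1.96
```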
Its location is determined by the form of the alternative hypothesis (<, >, or ≠):
1-tailed:
H0 : µ ≤ µ0
Ha : µ > µ0
1-tailed:
H0 : µ ≥ µ0
Ha : µ < µ0
2-tailed:
H0 : µ = µ0
Ha : µ ≠ µ0
i.e., plot the position of z on the z-axis, and check its position relative to zα and the
rejection region:
- if z lies in the rejection region, reject H0;
- if z does not lie in the rejection region, do not reject H0.
Always state the significance level at which you make your decision.
Step 2
Step 3
Determine level of significance: We are told that the confidence level is 99%,
therefore α = 0.01.
Step 4
Determine the rejection region: The null hypothesis will be rejected if the sample
provides sufficient evidence that µ < 25, so we have the following situation:
Since we are testing µ < 25, we are in the LHS of the normal curve, therefore the
rejection region is z < –2.33.
Step 7
Compare the test statistic against its critical value: –2.21 > –2.33, therefore z, and
hence the sample mean x̄, does not lie in the rejection region.
Hence, we do not reject H0 at the 0.01 significance level.
Step 8
Our sample measurement is compatible with the supposed population mean at the 99%
confidence level. We therefore have no evidence, at this level, that the true mean is
less than 25 vehicles per 30 seconds.
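The comparison in Step 7 can be sketched numerically (the sample details behind z = –2.21 are not shown in this excerpt, so the test statistic is taken as given):

```python
from statistics import NormalDist

alpha = 0.01
z = -2.21                             # test statistic from the worked example
z_crit = NormalDist().inv_cdf(alpha)  # lower-tail critical value
print(round(z_crit, 2))               # -2.33
print("reject H0" if z < z_crit else "do not reject H0")
```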
The value of a well-observed angle was known to be 30°15′30″. A new theodolite
was tested against this angle for calibration. A sample of 36 arcs produced a mean
of 30°15′32″, with an SD of 6″. Is this value significantly different from the standard
value at the 5% level of significance?
Take 30°15′30″ as the population mean. We therefore want to test whether the
sample mean from the new data (30°15′32″) indicates that this value is incorrect.
We have: µ = 30°15′30″, s = 6″, x̄ = 30°15′32″, n = 36, α = 0.05.
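A sketch of the calculation (working in arc-seconds past 30°15′ to keep the numbers small):

```python
from math import sqrt
from statistics import NormalDist

# Angles in arc-seconds past 30°15': 30°15'30" -> 30, 30°15'32" -> 32
mu0, xbar = 30.0, 32.0
s, n, alpha = 6.0, 36, 0.05

z = (xbar - mu0) / (s / sqrt(n))              # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
print(z)                 # 2.0
print(round(z_crit, 2))  # 1.96
print(abs(z) > z_crit)   # True: reject H0 at the 5% level
```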
The CI also represents the region in which we are 100(1 − α)% likely to find the
population mean, µ:
CI = x̄ ± zα/2 σx̄
or
p(x̄ − zα/2 σx̄ ≤ µ ≤ x̄ + zα/2 σx̄ ) = 1 − α
This means that we don’t need to know µ in order to determine the confidence
interval.
For instance, for α = 0.05, the CL = 95%, and zα/2 = ±1.96:
- The quantity zα/2 σx̄ is called the margin of error. This is not the precision of the data (which is just σx̄); rather it gives the maximum allowable error.
- It is obviously desirable to have a low margin of error, because a low margin of error indicates that we have pinned down the mean quite precisely. However, a low margin of error implies a low zα/2, and thus a low confidence level. Conversely, a high confidence percentage gives a large margin of error (because zα/2 is larger).
- So having high confidence does not imply that we have good data: it just means that we have allocated a wider range in which to place the measurement.
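The interval x̄ ± zα/2 σx̄ can be computed directly (a sketch with illustrative numbers; the function name is ours), showing how the margin of error grows with the confidence level:

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(xbar, sigma, n, alpha):
    """CI = xbar +/- z_{alpha/2} * sigma/sqrt(n)."""
    se = sigma / sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * se  # margin of error
    return xbar - margin, xbar + margin

# Higher confidence -> wider interval, not better data:
print(confidence_interval(100, 21, 49, 0.05))  # roughly (94.1, 105.9)
print(confidence_interval(100, 21, 49, 0.01))  # roughly (92.3, 107.7)
```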
Since the confidence interval gives a range of values where we are 100(1 − α)% likely
to find µ or x̄, we can use it to perform a 2-tailed hypothesis test (but not a 1-tailed one).
Obviously, real-world examples would never have a zero significance level. But this
example above shows that as α decreases, it becomes harder to reject H0 .
Now consider:
H0 : µ = µ0
Ha : µ 6= µ0
By analogy with the 8 steps to hypothesis testing for a 2-tailed test, we can see that
the do-not-reject-H0 region is given by:
µ0 ± zα/2 σx̄
So if the sample mean does not fall in this region, we must reject H0 .
We have: µ = 30°15′30″, s = 6″, x̄ = 30°15′32″, n = 36, α = 0.05.
We can use the confidence interval method because this is a 2-tailed test.
The sample mean (30°15′32″) does not lie within this interval.
Hence, we reject H0 with 95% confidence.
Our sample measurement is incompatible with the supposed population mean at 0.05
significance. Therefore it follows that the true mean is not 30°15′30″ at this level.
The sample mean (30°15′32″) does lie within this interval. Hence,
we do not reject H0 with 99% confidence.
Our sample measurement is now compatible with the supposed population mean at 0.01
significance. Therefore we cannot conclude, at this level, that the true mean differs from 30°15′30″.
- The data haven't changed, but the outcomes have. In Example 1 we rejected H0, but in Example 2 we didn't.
- Whereas in Example 1 we had a 5% chance of mistakenly rejecting H0, in Example 2 we only had a 1% chance of mistakenly rejecting it. So in Example 2 we "accepted" H0, not because the data got better or worse, but because we were allowed more freedom via a larger margin of error.
- In terms of statistical theory, while setting a low significance level means a low probability of mistakenly rejecting H0 (making a type I error), it raises the probability of making a type II error, i.e., mistakenly "accepting" H0 (and thus accepting any old rubbish!).
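The two examples above can be reproduced with the confidence-interval method (a sketch; angles again in arc-seconds past 30°15′, and the function name is ours):

```python
from statistics import NormalDist

def ci_test(xbar, mu0, se, alpha):
    """Two-tailed test via the CI method: build the do-not-reject
    region mu0 +/- z_{alpha/2} * se and check whether xbar lies inside."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lo, hi = mu0 - z * se, mu0 + z * se
    return (lo, hi), lo <= xbar <= hi

# mu0 = 30", xbar = 32", se = 6/sqrt(36) = 1 arc-second
for alpha in (0.05, 0.01):
    (lo, hi), inside = ci_test(32.0, 30.0, 1.0, alpha)
    verdict = "do not reject H0" if inside else "reject H0"
    print(f"alpha={alpha}: ({lo:.2f}, {hi:.2f}) -> {verdict}")
```

At α = 0.05 the interval is roughly (28.04, 31.96), excluding 32″, so H0 is rejected; at α = 0.01 it widens to roughly (27.42, 32.58), so H0 is not rejected.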