Hypothesis Tests

You might also like

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 36

Introduction

Hypothesis tests
Learning outcomes:

• understand the nature of a hypothesis test, the difference between one-tailed and two-tailed tests, and the
terms null hypothesis, alternative hypothesis, significance level, rejection region (or critical region),
acceptance region and test statistic (Outcomes of hypothesis tests are expected to be interpreted in terms
of the contexts in which questions are set.)

• formulate hypotheses and carry out a hypothesis test in the context of a single observation from a
population which has a binomial or Poisson distribution, using

o direct evaluation of probabilities

o a normal approximation to the binomial or the Poisson distribution, where appropriate


Cont’d

• formulate hypotheses and carry out a hypothesis test concerning the population mean in cases where the
population is normally distributed with known variance or where a large sample is used

• understand the terms Type I error and Type II error in relation to hypothesis tests

• calculate the probabilities of making Type I and Type II errors in specific situations involving tests based on a
normal distribution or direct evaluation of binomial or Poisson probabilities.
What is hypothesis testing?
Consider this situation.

An opinion poll asks a sample of voters who will win they vote for if there are two candidates. So e.g. if 53%
voters say they will vote for candidate A, how will you be certain that candidate A will win the election?

Evidence is provided in various courtrooms. A jury then has to make a decision based on this evidence. But
how can a judge be so certain that this evidence is true?

• A Hypothesis test analyses the data to find how likely it is that the results could happen by chance.
Suppose that you have a set of dice numbered 1 to 6. Some of the dice are fair and some are biased. The biased
dice have bias towards the number six.
Now if you pick one of the dice and roll it 16 times. How many times would you need to roll a six to convince
yourself that this die is biased?
It is possible, although, highly unlikely, to roll 16 sixes with 16 rolls of a fair die. If is the number of sixes obtained
with 16 rolls of the die then .
We can calculate theoretical probabilities of rolling different numbers of sixes.
These probabilities tell us that in 16 rolls of a die, almost 89% of the time we would expect to roll at most 4 sixes, and for over
96% of the time we would expect to roll at most 5 sixes. This means that rolling 6 sixes or more will occur by chance less than 4%
of the time.
The shaded regions on these graphs show the regions for the probabilities. The first graph shows and the second shows .
In both graphs, the region to the right of the dotted line represents 5% of the total probability.

The second of the previous two graphs shows that is completely to the right of the dotted line, which we would
expect as is less than 5%. However, the first graph shows the dotted line dividing into two parts, which is as
expected since is greater than 5%.

• If we choose 5% as the critical percentage at which the number of sixes rolled is significant; that is, the
critical value at which the number of sixes rolled is unlikely to occur by chance, then rolling 6 sixes is the critical
value we use to decide if the die is biased.

• The percentage value of 5% is known as the significance level. The significance level determines where we draw a
dotted line on the graph of the probabilities.

• This region at one end of the graph is the critical region. The other region of the graph is the acceptance region.
Key facts

• The significance level is the probability of rejecting a claim. A claim cannot be proven by
scientific testing and analysis, but if the probability of it occurring by chance is very small it is
said that there is sufficient evidence to reject the claim.

• The range of values at which you reject the claim is the critical region or rejection region.

• The value at which you change from accepting to rejecting the claim is the critical value.
Sample data, such as the data you obtain in the dice experiment, may only partly support or refute a supposition or
claim. A hypothesis is a claim believed or suspected to be true.
A hypothesis test is the investigation that aims to find out if the claim could happen by chance or if the probability
of it happening by chance is statistically significant.
It is very unlikely in any hypothesis test to be able to be absolutely certain that a claim is true; what you can say is
that the test shows the probability of the claim happening by chance is so small that you are statistically confident in
your decision to accept or reject the claim.

• In a hypothesis test, the claim is called the null hypothesis, abbreviated to .

• If you are not going to accept the null hypothesis, then you must have an alternative hypothesis to accept; the
alternative hypothesis abbreviation is .

• Both the null and alternative hypotheses are expressed in terms of a parameter, such as a probability or a mean
value.
Example
Experience has shown that drivers on a Formula 1 racetrack simulator crash 40% of the time. Zander decides to run a simulator
training program to reduce the number of crashes. To evaluate the effectiveness of the training program, Zander allows 20 drivers,
one by one, to use the simulator. Zander counts how many of the drivers crash.
Four drivers crash. Test the effectiveness of Zander’s simulation training program at the 5% level of significance.

Solution:
Let be ‘the number of drivers that crash’. Then .

Four drivers crash; you calculate the test statistic, the region

As , the result is not significant. 4 is not the critical value so will be accepted. There is insufficient evidence at the 5% level of
significance to suggest that the proportion of drivers crashing their simulators has decreased.
In this example:
• is the assumption that there is no difference between the usual outcome and what you are
testing.

• since Zander’s training program aims to reduce the probability of crashes.

• Test statistic is the calculated region/probability using sample data in a hypothesis test. You
compare your calculated test statistic with what is the expected outcome from the null
hypothesis.

• In this case, the calculated probability must be less than the percentage significance level to say
there is sufficient evidence to reject the null hypothesis.

• Your conclusion should be in context of the original question.


Alternative case
Suppose only three of the 20 drivers crashed after attending Zander’s simulator training programme. At the 5% level
of significance, what would you conclude?

Solution:

As , the result not significant. 3 is the critical value so will be rejected. There is sufficient evidence at the 5% level
of significance to suggest that the proportion of drivers crashing their simulators has decreased and the simulator
program is effective.
• When you investigate a claim using a hypothesis test, you are not interested in the probability of an exact number
occurring, but the probability of a range of values up to and including that number occurring. In effect that is the
probability of a region. The test statistic is the calculated region using sample data in a hypothesis test.

• The range of values for which you reject the null hypothesis is the critical region or rejection region.

• The value at which you change from accepting the null hypothesis to rejecting it is the critical value.
Example
Studies suggest that 10% of the world’s population is left-handed. Bailin suspects that being left-handed is less common amongst
basketball
players and plans to test this by asking a random sample of 50 basketball players if they are left-handed.

a)Find the rejection region at the 5% significance level and state the critical value.
b)If the critical value is 2, what would be the least integer percentage significance level for you to conclude that Bailin’s suspicion is
correct?

Solution:
c)Let X be the random variable ‘number of basketball players’ and p be the probability of being left-handed; then

Significance level: 5%
The rejection region is such that and

The rejection region is ; the critical value is 1.

b) For Bailin to reject the null hypothesis the probability of the region must be less than the significance level.
, so with a critical value of 2 the significance level needs to be 12% to reject the null hypothesis.
Summary
To carry out a hypothesis test:

• Define the random variable and its parameters.

• Define the null and alternative hypotheses.

• Determine the critical region.

• Calculate the test statistic.

• Compare the test statistic with the critical region.

• Write your conclusion in context.


Note:
The lower the percentage significance level, the smaller the rejection region and the more confident you can be of the result.
Hypothesis testing using approximate distributions
•For a large sample, you can approximate a binomial distribution to a normal distribution and then carry
out a hypothesis test.
Consider this example.
Arra is elected club president with the support of 52% of the club members. One year later, the club
members claim that Arra does not have as much support any more. In a survey of 200 club members, 91 said
they would vote for Arra. Using a suitable approximating distribution, test this claim at the 5% significance
level.
Solution:
Let be ‘the number of people who would vote for Arra’. Then .
and
Hence
The conditions for the binomial are suitable to approximate to a normal distribution.
When stating the null and alternative hypotheses you can use either the parameter or but you must be
consistent for both and .
Cont'd

As , so will be rejected. There is sufficient evidence at the 5% level of significance


to suggest that Arra does not have as much support as he used to.
Example
Past records show that 60% of the population react positively to a cream to treat a skin condition. The manufacturers
of a new cream to treat this condition claim that more of the population will react positively to the new cream. A
random sample of 120 people is tested and 79 people are found to react positively.

a) Does this provide evidence at the 5% significance level of an increase in the percentage of the population that
react positively to this new treatment?
b) Find the critical value.

Solution:
c) Let be ‘the number of people who react positively to the new cream’. Then

Hence
Cont'd

As , so will be accepted. There is insufficient evidence at the 5% level of significance to support the
manufacture’s claim.

b)Let be the critical value, then:

, is the critical value.


The critical value at 5% significance occurs where the probability is less than 5%, or 0.05.
One-tailed and two-tailed hypothesis tests
• We have seen previously about direction of a bias; the die was biased towards colour red. In this case, the critical
region is at one end, or tail of the graph. These are one – tailed tests.
Let’s look at an example of a spinner.
If the claim is that the spinner is not working correctly, rather than the spinner is biased towards a particular colour,
then you will apply a two-tailed test
Here is the working:
Let X be the random variable ‘the number of spins landing on red’ then .

5% level of significance, two-tailed test

; accept the null hypothesis. There is insufficient evidence to support the claim that the spinner is not working
correctly.
The graph shows the distribution of the spinner landing on red when it is spun ten times.

Note that the alternative hypothesis uses to show that neither an increase nor a decrease is expected; and the significance level
percentage has been split to show two critical regions at either end of the graph.
• If a claim is for neither an increase nor a decrease, just a different value of the parameter, then the hypothesis test is two-tailed.
• In a two-tailed test use to express the alternative hypothesis and share the significance level percentage between the two tails;
that is’ between the two critical regions.
Example
In a particular coastal area, on average, 35% of the rocks contain fossils. Jamila selects 12 rocks at random from this area and
breaks them open. She finds fossils in two of the rocks.

a) Test at the 10% level of significance if there is evidence to show the average of 35% is incorrect.
b) How many fossils would Jamila need to find to reject the null hypothesis?
c) Would Jamila’s conclusion be different if the test was one – tailed?

Solution:
Let X be the random variable ‘the proportion of rocks containing fossils’. Then .

Significance level: 10%


Two-tailed test
Cont’d

Therefore, accept the null hypothesis. Jamila must have been unlucky to find so few fossils. She expected to find about four fossils
in 12 rocks . She found only two fossils, yet this still was too many to have any evidence to reject the null hypothesis.

b) and
Since is greater than 5% and is less than 5%, the critical value is 1.

c) Since , the answer is still to accept the null hypothesis: there is insufficient evidence to show that

the average number of fossils is incorrect.


Hypothesis testing with the Poisson distribution

• For a single observation from a population that has a Poisson distribution, we can directly
compare Poisson probabilities or use a normal approximation to the Poisson distribution.

• To carry out hypothesis testing with the Poisson distribution, the process is the same as
before.
Example
Records show that 1% of the population has a positive reaction to a test for a particular allergy. In a village, 120 people are tested
and four people have a positive reaction. Test at the 5% level of significance if there is any evidence of an increase in the
population with a positive reaction for this particular allergy.
Solution:
State the distribution; is large and , so approximate to Poisson.

As , reject the null hypothesis. There is evidence at the 5% level of significance to suggest that in this village there is an increase
in the population who react positively with this particular allergy.
Example
The number of accidents on a stretch of road occurs at the rate of seven each month. New traffic measures are put in place to
reduce the number of accidents. In the following month, there are only two accidents.

a)Test at the 5% level of significance if there is evidence that the new traffic measures have significantly reduced the number of
accidents.
b)Over a period of 6 months, there are 32 accidents. It is claimed the new traffic measures are no longer reducing accidents.
I.Test this claim at the 5% level of significance.
II.What would your conclusion be if you test the claim at the 10% level of significance?

Solution:
a)

As reject the null hypothesis. There is evidence to suggest the new traffic measures reduce the number of accidents.
Cont’d
b. I)

As accept the null hypothesis. There is insufficient evidence to suggest the new traffic measures have reduced the number of
accidents.

b. II) As , reject the null hypothesis. There is evidence at the 10% significance level that the new traffic measures reduce the
number of accidents.
Hypothesis testing of the Population mean
• Sample data are often collected to test a statistical hypothesis about a population. Such a sample, even if
it is a random sample, may or may not be representative of the population.

• Using the central limit theorem, estimates of the sample mean and sample variance can be calculated
from the sample and these estimates can be used to see if they support or reject the null hypothesis.

• For a sample data, ; that is , is referred to as the standard error.


• If the population mean is unknown, but the population variance is known, sample data can be used to carry
out a hypothesis test that the population mean has a particular value, as follows:

For a sample size drawn from a normal distribution with known variance, , and sample mean , the test statistic
is:

where is the population mean.


Example
The masses of cucumbers grown at a smallholding are normally distributed with mean 310 g and standard deviation 22 g.
Producers of a new plant food claim that its use increases the masses of cucumbers. To test this claim, some cucumber plants are
grown using the new plant food and a random sample of 40 cucumbers from these plants are selected and weighed. The mean
mass of these cucumbers is 316 g.
Assuming the standard deviation of the masses of the sample is the same as the standard deviation of the population, test the claim
at a 5% level of significance.
Solution:
Let be the mean mass of cucumbers. Then .

so the masses are in the critical region. We reject , because there is some evidence to support the plant food producer’s claim.
Alternative way of writing the solution

• An alternative approach is to compare the test statistic z with the critical value from tables.

One-tailed test at 5% level of significance, critical value z is .


As , reject . There is some evidence to accept the plant food producer’s claim.
Hypothesis test of population mean using a large sample

• It is possible to carry out a hypothesis test of a population mean when the population variance is unknown.
Provided the sample is large, we follow the same process as for a hypothesis test of a population mean from a
normal population with known variance. For the variance, we use , an unbiased estimate of the population
variance.

If the population mean and population variance are unknown, sample data can be used to conduct a hypothesis
test that the population mean has a particular value, as follows:
For a large sample size drawn with unknown and sample mean , the test statistic is:

where is the population mean and .


Example
A researcher believes that students underestimate how long 1 minute is. To test his belief, 42 students are chosen at
random. Each student, in turn, closes their eyes and estimates 1 minute. The results for their times, seconds, are
summarised as follows:

Investigate at the 10% level of significance if there is any evidence to support the researcher’s claim. What advice
would you give to the researcher based on your findings?
Solution:

As , reject There is evidence to support the researcher’s claim.


The value of the test statistic and the critical value are very close; advise the researcher to do more tests.
Type I and Type II errors
• We have seen previously, in hypothesis testing, null and alternative hypotheses are set up and conclusions are drawn based on
calculations of sample information.
• Consequently, there are two ways to reach an incorrect decision based on the results of calculating probabilities in hypothesis
testing, a Type I error and Type II error.

The table shows how these errors arise.

Accept Reject
The null hypothesis, is true Correct decision Type I error
The null hypothesis, is false Type II error Correct decision
Cont’d
• For example, in UK and many other countries the law states that a person is innocent until ‘proven’ guilty. A
Type I error happens when an innocent person is found guilty; a Type II error happens when a guilty person is
found innocent.

• In both these situations, the evidence was either incorrect or not interpreted correctly.

• E.g. for the experiment to determine whether a die is biased – a Type I error occurs when we conclude the die
is biased given that it is fair; and a Type II error occurs when we conclude the die is fair given that it is
biased.
In statistics, an error may be made if:

• there is bias in the sample data

• the probability model is not the correct model to use

• the chosen significance level is not appropriate for the situation.


Notes:
• There is a greater chance of making a Type II error when the rejection/critical regions in
each tail is small, leading us to accept when it is, in fact, false.

• However, simply changing the significance level to a higher percentage increases the chance
of a Type I error.

You might also like