Unit 4-Hypothesis Testing

HYPOTHESIS TESTING By

Shailaja K.P
A quality control manager must decide whether the machines producing a product
are working properly on the basis of samples taken of the product.
Similarly, if a new drug is introduced to cure a certain disease and this drug is tested
on a sample of patients, a decision must be made based upon the response of these
patients whether the drug should be introduced in the general market or not.
In order to make these statistical decisions, we make certain assumptions about the
population parameters to be tested.
These assumptions or claims are known as hypotheses.
Hypothesis is a premise or a claim that we want to test.
Then, a sample is taken to estimate the values of these population parameters.
If the estimates favor the hypothesis, then we accept the hypothesis as being correct.
If the estimates do not favor the hypothesis made about the population parameter,
then a decision must be made as to whether this difference is purely a matter of
chance or whether it is significant enough to be a real difference, meaning our
assumption about the population parameter is not correct.
Since we are testing for our assumptions or hypothesis being correct or not, this field
of decision making is known as Hypothesis Testing.
Hypothesis testing is a form of statistical inference that uses data from a sample
to draw conclusions about a population parameter or a population probability
distribution.
As an example, let us look at the famous murder mystery of “The butler did it….”
Suppose that a detective at the scene of a murder makes some assumptions or inferences
about the murderer, based upon the initial observation and analysis of the scene of crime.
If the detective believes that the victim was struck from behind and by a left-handed
person, then the killer, according to his assumption would be tall and left-handed.
Let us assume that the detective's first suspicion is on the butler.
The detective can assume or hypothesize that the butler is innocent, then check his height
and whether he is left-handed or not and then make a decision.
If the butler is short and right-handed, then the detective will accept his own assumption
or hypothesis that the butler is innocent, as being correct.
If the butler is tall and left-handed, then the detective could decide that the butler was
guilty, thus rejecting his own hypothesis that the butler was innocent.
However, it is quite possible that the butler could be tall and left-handed and still may not
have committed the murder.
On the other hand, the butler could be short and right-handed and may have
committed the murder by planning it to look this way.
Hence the detective could still make the mistake of accepting his assumption when it
was not correct or of rejecting his assumption when in fact it was correct.
Hence, there are four possibilities that must be considered. These are:
a) The butler is accused by the detective and, in fact, he did commit the crime - a
correct decision.
b) The butler is accused by the detective when, in fact, he did not commit the crime -
an incorrect decision.
c) The butler is considered innocent and he, in fact, did not commit the crime - a correct
decision.
d) The butler is considered innocent when, in fact, he did commit the crime - a wrong
decision.
These conditions and the decisions can be summarized in a tabular form as follows:

Decision \ Reality        Butler guilty         Butler innocent
Accuse the butler         Correct decision      Incorrect decision
Consider him innocent     Incorrect decision    Correct decision
Null Hypothesis
A claim or hypothesis about the currently accepted values or population parameters
is known as the Null Hypothesis and is written as H0.
For the example discussed, our assumption that the butler is innocent would form the
null hypothesis and would be stated as follows:
H0: The butler is innocent
This hypothesis is then tested with the available evidence and the decision is made
whether to accept this hypothesis or reject it.
If this hypothesis is rejected, then we accept the alternate hypothesis which is, that
the butler is not innocent.
This alternate hypothesis is denoted as H1 and is stated as:
H1: The butler is not innocent
The process involves testing of the null hypothesis.
If the null hypothesis is rejected, then the alternate hypothesis is accepted.
There are two types of errors that can be committed in making decisions regarding
accepting or rejecting the null hypothesis.
The first type of error known as Type I error is committed when the null hypothesis
is rejected when in fact it is true.
The second type of error, known as Type II error is committed when a null
hypothesis is accepted when in fact it was not true and should have been rejected.
1. Null hypothesis H0. An assertion about the population parameter that is being tested
by the sample results.
2. Alternate hypothesis H1. A claim about the population parameter that is accepted
when the null hypothesis is rejected.
3. Type I error. An error made in rejecting the null hypothesis, when in fact it is true.
4. Type II error. An error made in accepting the null hypothesis, when in fact it is
false.
Type I error is denoted by α (Alpha) and is expressed as a probability of rejecting a true
hypothesis.
It is also known as the level of significance.
(1 - α) expresses the level of confidence.
For example, α = 0.05 means confidence level is 95% or 0.95.
Type II error is denoted by β (Beta) and is expressed as the probability of accepting a
false hypothesis.
Procedure For Hypothesis Testing
The general procedure for hypothesis testing consists of the following steps
1. State the null hypothesis as well as the alternate hypothesis. This means stating
the assumed value of the population parameter which is to be tested. For example,
suppose that we want to test the hypothesis that the average IQ of our college
students is 130. Then this would become our null hypothesis, and the alternate
hypothesis would be that this average IQ is not 130. These statements are expressed
as follows:
H0: μ = 130
H1: μ ≠ 130
2. Establish a level of significance prior to sampling. The level of significance
signifies the probability of committing Type I error α(alpha) and is generally taken
as equal to 5%(α = 0.05), or 1% (α= 0.01).
For example, if we select a 5% level of significance, we expect that the probability of
making the error of rejecting the hypothesis when it is true is 5%. In other words, we
are about 95% confident that we will make a correct decision, although we could be
wrong with a probability of 5%.
It is at the discretion of the investigator to select its value, depending upon the
sensitivity of the study.
3. Determine a suitable test statistic. This means the choice of appropriate
probability distribution to use with the particular available information under
consideration. The normal distribution using the Z score table or the t-distribution is
most often used.
4. Define the rejection (critical) regions. The critical region is established on the
basis of the chosen value of the level of significance α. For example, if we select
α = 0.05 and use the standard normal distribution as our test statistic for testing the
population mean μ, then the difference between the null-hypothesis value μ and the
value x̄ obtained from the analysis of sample results is not expected to exceed
1.96 standard errors at α = 0.05.
This relationship can be shown by the following diagram.
In the diagram, if the sample statistic x̄ falls within 1.96 standard errors of the assumed
value of μ under the assumption of the null hypothesis H0, then we accept the null
hypothesis as being correct at the 95% confidence level (or 0.05 level of significance).
The difference between x̄ and μ, where x̄ may be any value between X1 and X2, is
considered to be accidental or due to chance and is not considered
significant or real enough to reject the null hypothesis, so that for all practical
purposes the value of x̄ is considered equal to μ even though x̄ can have any value
between X1 and X2 as shown above.
However, if the value of x̄ falls beyond X2 on the upper side or beyond X1 on the
lower side, then this difference between the values of x̄ and μ is considered
significant and it will lead to rejection of the null hypothesis.
This area of rejection is known as the critical region.
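The critical values that mark off the rejection region can be looked up from the standard normal distribution. A minimal sketch, using Python's standard-library NormalDist (the function name two_tailed_critical is ours, chosen for illustration):

```python
from statistics import NormalDist

def two_tailed_critical(alpha):
    """Critical z value for a two-tailed test: cuts off alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(two_tailed_critical(0.05), 2))  # 1.96
print(round(two_tailed_critical(0.01), 2))  # 2.58
```

At α = 0.05 this recovers the ±1.96 boundaries referred to above.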
5. Data collection and sample analysis. This involves the actual collection and
computation of the sample data. A sample of the pre-established size n is collected
and the estimate of the population parameter is calculated. This estimate is the value
of the test statistic. For example, if we are testing a hypothesis about the value of the
population mean μ, then the test statistic would be the sample mean x̄. We then test
this statistic to see if it falls in the critical region or in the region of acceptance. For
instance, if we want to test that the average IQ of the college students is 130, then
that would be our population mean to be tested. We take a random sample of a given
size n, calculate its mean x̄, and then test whether the value of this x̄ falls in
the area of acceptance or in the area of rejection at a given level of significance.
6. Making the decision. Before the statistical decision is made, a decision rule must
be established. Such decision rule will form the basis on which the null hypothesis
will be accepted or rejected. This decision rule is really a formal statement of the
obvious purpose of the test. For example, this rule could be:
Accept the null hypothesis if the value of sample statistic falls within the area of
acceptance, otherwise reject the null hypothesis.
Based upon this established decision rule, a decision can be made whether to accept
or reject the null hypothesis.
One-Tailed And Two-Tailed Tests
So far we have discussed situations in which the null hypothesis is rejected if the
sample statistic is either too far above or too far below the population parameter
which means that the area of rejection is at both ends (or tails) of the normal curve.
For example, if we are testing for the average IQ of the college students being equal
to 130, then the null hypothesis H0: μ = 130 will be rejected if a sample gives a
mean x̄ which is either too high or too low compared to μ. This can be expressed as
follows:

This means that with α = 0.05 (or a 95% confidence level), the value of x̄ must be
within 1.96 standard errors (1.96 σx̄) of the assumed value of μ under H0 in order to
accept the null hypothesis.
In other words, the critical ratio

CR = (x̄ − μ) / σx̄

must be less than 1.96 in absolute value.
It means that: at α = 0.05, accept H0 if the critical ratio CR falls within ±1.96 and reject
H0 if CR is less than −1.96 or greater than +1.96.
If it happens to be exactly 1.96, then we can accept H0.
On the other hand, there are situations in which the area of rejection lies entirely on
one extreme of the curve, which is either the right end of the tail or the left end of the
tail.
Tests concerning such situations are known as one-tailed tests, and the null
hypothesis is rejected only if the value of the sample statistic falls into this single
rejection region.
For example, let us assume that we are manufacturing 9-volt batteries and we claim
that our batteries last on an average (μ) 100 hours.
If somebody wants to test the accuracy of our claim, he can take a random sample of
our batteries and find the average (x̄) of this sample.
He will reject our claim only if the value of x̄ so calculated is considerably lower
than 100 hours, but will not reject our claim if the value of x̄ is considerably higher
than 100 hours.
Hence in this case, the rejection area will only be on the left end tail of the curve.
Similarly, if we are making a low-calorie diet ice cream and claim that it has on an
average only 500 calories per pound and an investigator wants to test our claim, he
can take a sample and compute x̄.
If the value of x̄ is much higher than 500 calories, then he will reject our claim.
But he will not reject our claim if the value of x̄ is much lower than 500 calories.
Hence the rejection region in this case will be only on the right end tail of the curve.
These rejection and acceptance areas are shown in the normal curves as follows:
Tests Involving A Population Mean (Large Sample)
This type of testing involves decisions to check whether a reported population mean
is reasonable or not, compared to the sample mean computed from the sample taken
from the same population.
A random sample is taken from the population and its statistic x̄ is computed.
An assumption is made about the population mean μ as being equal to the sample
mean, and a test is conducted to see if the difference (x̄ − μ) is significant or not.
(This difference is not significant if it falls within the acceptance region, and it is
considered significant if it falls within the rejection region or the critical
region at a given level of significance α.)
It must also be noted that if the population is not known to be normally distributed,
then the sample size should be large enough, generally more than 30.
However, if the population is known to be normally distributed and the population
standard deviation is known, then even a smaller sample size would be acceptable.
Example - A principal at a certain school claims that the students in his school are of above
average intelligence. A random sample of thirty students' IQ scores has a mean score of
112.5. Is there sufficient evidence to support the principal's claim? The mean population IQ
is 100 with a standard deviation of 15.
Step 1: State the Null hypothesis. The accepted fact is that the population mean is 100, so:
H0: μ = 100.
Step 2: State the Alternate Hypothesis. The claim is that the students have above average IQ
scores, so:
H1: μ > 100.
The fact that we are looking for scores “greater than” a certain point means that this is a one-
tailed test.
Step 3: Draw a picture to help you visualize the problem.

Step 4: State the alpha level. If you aren’t given an alpha level, use 5% (0.05).
Step 5: Find the rejection region area (given by your alpha level above) from the z-table. An
upper-tail area of 0.05 corresponds to a z-score of 1.645.
Step 6: Find the test statistic using this formula:

z = (x̄ − μ) / (σ/√n)
For this set of data: z= (112.5 – 100) / (15/√30) = 4.56


Step 7: If Step 6 value is greater than Step 5, reject the null hypothesis. If it’s less than Step
5, you cannot reject the null hypothesis. In this case, it is more (4.56 > 1.645), so you can
reject the null.
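The seven steps above can be sketched in a few lines of Python; a minimal sketch, where the helper name z_statistic is ours for illustration:

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu, sigma, n):
    """Z test statistic for a population mean with known sigma."""
    return (sample_mean - mu) / (sigma / sqrt(n))

z = z_statistic(112.5, 100, 15, 30)      # IQ example: x-bar = 112.5, mu = 100
z_critical = NormalDist().inv_cdf(0.95)  # one-tailed test at alpha = 0.05
print(round(z, 2))                       # 4.56
print(round(z_critical, 3))              # 1.645
print(z > z_critical)                    # True, so reject H0
```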

Example- Blood glucose levels for obese patients have a mean of 100 with a standard
deviation of 15. A researcher thinks that a diet high in raw cornstarch will have a positive or
negative effect on blood glucose level. A sample of 30 patients who have tried the raw
cornstarch diet have a mean glucose level of 140. Test the hypothesis that the raw cornstarch
had an effect.
Solution - μ = 100, α = 0.05, x̄ = 140, n = 30 and σ = 15
Example - Claim: μ = 75.
Test the claim if α = 0.05, x̄ = 73.6, σ = 3.1 and n = 80.
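For both exercises the same z formula applies. A minimal sketch of the computations (the decision step is left to the reader; both |z| values far exceed the two-tailed critical value 1.96 at α = 0.05):

```python
from math import sqrt

def z_statistic(sample_mean, mu, sigma, n):
    # (x-bar - mu) / (sigma / sqrt(n))
    return (sample_mean - mu) / (sigma / sqrt(n))

# Blood glucose example: mu = 100, x-bar = 140, sigma = 15, n = 30
print(round(z_statistic(140, 100, 15, 30), 2))   # 14.61

# Claim example: mu = 75, x-bar = 73.6, sigma = 3.1, n = 80
print(round(z_statistic(73.6, 75, 3.1, 80), 2))  # -4.04
```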
Tests involving A single proportion
So far, we have dealt with the population mean μ, which reflects quantitative data.
It cannot be used for qualitative data.
For such qualitative data, the parameter of interest is the population proportion
favoring one of the outcomes of the event.
For example, if a politician claims that 60% of the population supports his viewpoint on a
given issue, we can test this claim by taking random samples of people, asking their
opinions, finding the percentage of people who support the politician's viewpoint, and
then testing whether this sample percentage is significantly different from his claimed
population percentage.
The mortgage department of a large bank is interested in the nature of loans of first-
time borrowers. This information will be used to tailor their marketing strategy. They
believe that 50% of first-time borrowers take out smaller loans than other borrowers.
They perform a hypothesis test to determine if the percentage is the same or different
from 50%. They sample 100 first-time borrowers and find 53 of these loans are
smaller than other borrowers. For the hypothesis test, they choose a 5% level of
significance.

The Sponsor of a TV show believes that his studio audience is divided equally between men
and women. Out of 400 persons attending the show one day, there were 230 men. At 5%
level of significance, test if the belief of the sponsor is correct.
H0 : П= 0.5
H1 : П ≠ 0.5
the sample proportion p = x/n = 230/400 = 0.575
population proportion П = 0.5
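For a single proportion, the test statistic divides (p − П) by the standard error √(П(1 − П)/n). A minimal sketch for the TV-show example (the helper name z_proportion is ours):

```python
from math import sqrt

def z_proportion(p_hat, pi, n):
    """Z statistic for a single-proportion test."""
    return (p_hat - pi) / sqrt(pi * (1 - pi) / n)

z = z_proportion(230 / 400, 0.5, 400)
print(round(z, 2))  # 3.0
```

Since 3.0 > 1.96 at α = 0.05 (two-tailed), the sponsor's belief of an equal split is rejected.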
Example
Test the hypothesis that more than 30% of U.S. households have internet access (with a
significance level of 5%). We collect a sample of 150 households and find that 57
have access.
H0 : P ≤ 0.3
H1 : P > 0.3

Z computed = 2.14
Z from table = 1.645
Since 2.14 > 1.645, we reject H0 and conclude that more than 30% of households have
internet access.
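The same proportion formula reproduces the computed value above; a minimal sketch:

```python
from math import sqrt

def z_proportion(p_hat, pi, n):
    # (p-hat - pi) / sqrt(pi * (1 - pi) / n)
    return (p_hat - pi) / sqrt(pi * (1 - pi) / n)

z = z_proportion(57 / 150, 0.3, 150)  # sample proportion 0.38 vs claimed 0.30
print(round(z, 2))  # 2.14
```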
Chi Square Test (χ²)
So far, our estimate of population parameters or comparison of sample and
population characteristics have involved certain assumptions about the populations
from which the samples have been drawn.
In most statistical tests, we have based our decisions on the assumption that the
population was normally distributed.
Even in the case of the binomial distribution, we approximated it to a normal
distribution so that Z-score test could be used to make certain decisions.
When this assumption about the population cannot be made, then it becomes
necessary to use other procedures.
One of the tests used in such situations is known as the Chi Square (χ²) test.
The chi-square test is an important test amongst the several tests of significance
developed by statisticians.
It was developed by Karl Pearson in 1900.
The chi-square test is a non-parametric test, making no assumptions about the
distribution of any variable.
This statistical test follows a specific distribution known as chi-square distribution.
In general, the test we use to measure the difference between what is observed and
what is expected according to an assumed hypothesis is called chi-square test.
The χ² distribution has the following properties:
1. It involves squared observations and hence it is always positive. Its value is
always greater than or equal to zero.
2. The distribution is not symmetrical. It is skewed to the right, so its skewness
is positive. However, as the number of degrees of freedom increases, the chi-square
distribution approaches a symmetric distribution.
The degrees of freedom for the χ² distribution are determined by the number of
categories in which the various attributes of the sample are placed, so that if there
are n categories, the degrees of freedom are (n - 1).
The following graph illustrates the family of χ² curves with varying degrees
of freedom; it can be seen that as the number of degrees of freedom increases, the
χ² distribution approaches the normal curve.
The χ² test is used to test whether there is a significant difference between the
observed number of responses in each category and the expected number of
responses for that category under the assumption of the null hypothesis.
In other words, the objective is to find out how well the distribution of observed
frequencies (fo) fits the distribution of expected frequencies (fe).
Hence this test is also called the goodness-of-fit test.
This test can best be illustrated with an example.
Suppose that we take a sample of 200 businessmen from New York, out of which
100 believe that the economic conditions would improve, then we take a sample of
300 businessmen from Washington out of which 100 believe that the economic
conditions would improve and we take a sample of 250 businessmen from Chicago
and 120 of them believe that economic conditions would improve.
Now we want to test whether there is a significant difference between the opinions of the businessmen
from these three different cities at a given level of significance.
The null hypothesis will assume that there is no difference between opinions of businessmen in these
three cities, which can be considered as categories.
The alternative hypothesis would be that all these categories are not similar.
The random variable whose sampling distribution is approximated by the χ² distribution is given by

χ² = Σ (fo − fe)² / fe

where
fo = observed frequency of responses in a given category
fe = expected frequency of responses in the same category under the assumption of the null
hypothesis.
The calculated value of χ² is then compared with the critical value of χ² from the table
at the pre-established level of significance α and the appropriate value of
the degrees of freedom.
The degrees of freedom (df) are calculated as follows:
 df = (k - 1), where k is the number of categories, in a one-sample test
 df = (k - 1)(r - 1), where k is the number of columns and r is the number of rows in a
cross-classification table (known as a contingency table) for various categories of two
or more independent samples.
As stated before, the χ² distribution is a family of distributions with a different
distribution for each value of df.
One-sample Test
Assume that a die was rolled 30 times to check if the die was fair or loaded. If it is
a fair and balanced die, then we should expect each face to come up five times, since
the probability (p) of each face of a fair die coming up is 1/6 and the expected value of
each face coming up in 30 rolls is np = (30 × 1/6) = 5.
In the experiment conducted, the actual number of times each face came up in
sequence is as follows:
4, 7, 3, 6, 8, 2
A comparison of observed frequency and expected frequency of each face coming
up in 30 rolls is tabulated below.
Steps involved in the Process
Step 1. State the null hypothesis and the alternate hypothesis.
H0 : All faces are equally likely to occur. In other words,
p1=p2=p3=p4=p5=p6
H1 : All probabilities are not equal or at least two of the probabilities differ from each other.
Under null hypothesis, all proportions must be equal. Even if one of these proportions is not
equal to any of the others, the null hypothesis cannot be accepted.
Step 2. A level of significance is selected.
Assume α = 0.05. This is the probability of making a Type I error. This means that when
α = 0.05, we will make the error of rejecting the null hypothesis when in fact it is true
5% of the time.
Step 3. Calculate the expected frequency fe for each category.
In our case fe = 5
Step 4. Use an appropriate test statistic.
In our case, test is selected because we are comparing observed frequencies with
expected frequencies in discrete categories. (The categories are the six faces of the
die.)
test measures the discrepancy between the observed values and expected values
for decision making purposes about the null hypothesis, so that:
Step 5 A decision rule is formulated.
We check the critical value of χ² from the table against α = 0.05 and df = (k - 1) =
(6 - 1) = 5.
This value is given as 11.07.
We compare our computed value of χ² with the critical value of χ² from the table.
Since our computed value of χ² = 5.6 is less than the critical value of χ² = 11.07, we
cannot reject the null hypothesis.
The following diagram of the χ² distribution illustrates this point.
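The die computation is a one-liner once the observed and expected counts are listed; a minimal sketch:

```python
observed = [4, 7, 3, 6, 8, 2]  # counts of each face in 30 rolls
expected = [5] * 6             # fair die: 30 * (1/6) = 5 per face

chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
print(chi_sq)  # 5.6
```

Since 5.6 < 11.07 (the critical value at df = 5, α = 0.05), the null hypothesis of a fair die stands.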
Example
Suppose that 60 children were asked as to which ice-cream flavour they liked out of the three flavours
of vanilla, strawberry and chocolate. The answers are recorded as follows:

Our objective is to determine whether children favour any particular flavour compared to other flavours.
Solution
The null hypothesis states that there is no difference among the tastes of children as far as the ice-cream
flavours are concerned.
Under the null hypothesis, equal number of children are expected to prefer each flavour.
This means that the expected frequencies should be 20 for vanilla, 20 for strawberry and 20 for
chocolate.
If we assume the level of significance α = 0.05 and knowing the degrees of freedom
df = (k - 1), where k is the number of categories, which in our case is 3, so that df =
(3 - 1) = 2, then we can compare our computed value of χ² with the critical value of χ²
from the table at α = 0.05 and df = 2 and reach a decision whether to accept or reject
the null hypothesis.
The critical value of χ² is given as 5.991.
Since our computed value of χ² is less than the critical value, we cannot reject
the null hypothesis.
χ² Test - Contingency Tables
In the previous section, we discussed the χ² test for goodness-of-fit for a single trait only.
However, we may be confronted with a situation where we want to determine the significance
of differences in characteristics between two or more groups.
For example, we may want to test if there are any significant differences between males and
females in adjustment to old age where adjustment can be classified into categories of good,
fair, average, poor and so on.
The data concerning the relative frequencies with which the group members fall into various
categories, is then presented in the form of a table consisting of rows and columns and the
format is known as the contingency table.
The rows and columns are used to summarize and display the results of data collected and are
categorized on the basis of classification of categories.
Testing Hypothesis For Independence Of Two Categories
Although the χ² test is used as a goodness-of-fit test, it is most often used as a test of
independence, to determine whether the paired observations obtained on two or more
nominal variables are independent of each other or not.
It is sometimes necessary to deal with the idea of two variables being related to one
another in the sense that the value of one variable depends upon the value of the
other corresponding variable.
For example, if testing for the degree of association between the height and the
weight of persons, it would be easy to recognize that tall people are expected to
weigh more than the short people.
Hence the variables height and weight are not independent of each other.
Similarly, the variables of income and education seem to be related to each other.
Similarly, there may or may not be any association between the opinion on nuclear
disarmament and the gender of the person. Such dependence or independence can be
tested by the χ² test.
For example, in testing for any relationship between the opinion on nuclear
disarmament and the gender of the person, let us assume that 100 persons including
60 males were asked about their opinion and their responses were classified into two
categories yes and no as follows:

Now, if the opinions were totally independent of the gender then same proportion of
males and females would favour nuclear disarmament.
The above table shows that 60 out of 100 (or 60%) of the persons surveyed favoured
nuclear disarmament.
Under the null hypothesis, if the opinions are independent of gender, then 60% of males as
well as 60% of females are expected to favour disarmament.
This means that, of the 60 males, 36 are expected to favour disarmament, and of the
40 females, 24 are expected to favour it.
These expected frequencies are calculated as follows:


Let us call the small rectangles with observed frequencies as cells.
Then the expected frequency for each cell can be calculated by multiplying the total of
the row containing that cell with the total of the column containing that cell and
dividing this product by the grand total of the table.
For example, if we want to know the expected frequency of the first cell in the
North-West corner of the table, representing males who favour disarmament, then the
total of that row is 60, the total of that column is 60, and the grand total of the table
is 100, hence:

fe = (60 × 60) / 100 = 36
Thus the frequency of each cell can be calculated.
These expected frequencies are then written on the right hand corner (or any corner)
in a small rectangle to identify these as expected frequencies against the observed
frequencies which are written in the middle of the cell as follows:

It must be clear that the sum of the expected frequencies in each category must be
the same as the sum of the observed frequencies.
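The row-total × column-total / grand-total rule can be applied to every cell at once. A minimal sketch for this example (the function name expected_frequencies is ours):

```python
def expected_frequencies(row_totals, col_totals, grand_total):
    """Expected count for each cell under independence:
    (row total * column total) / grand total."""
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: 60 males, 40 females; columns: 60 favour, 40 oppose; 100 persons in all
exp = expected_frequencies([60, 40], [60, 40], 100)
print(exp)  # [[36.0, 24.0], [24.0, 16.0]]
```

Note that each row and column of expected counts sums to the corresponding observed total, as required.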
Then the null hypothesis for independence can be tested by computing

χ² = Σ (fo − fe)² / fe

Solving the above problem, we get the computed value of χ².
Looking up the value of χ² from the table for 95% confidence (or α = 0.05) and df =
(2 - 1)(2 - 1) = 1, we get the critical value χ² = 3.841.
Since our calculated value of χ² is less than the critical value of 3.841, we
cannot reject the null hypothesis that the opinion is independent of gender.