Advanced Statistics Hypothesis Testing

CHAPTER 2

HYPOTHESIS TESTING

A hypothesis is a general statement regarding certain descriptions or conditions about
the subject under consideration. It is an assumption, a conjecture, or an inferential
statement concerning a quantitative characteristic of the population involved. It is a
statement of claim or assertion about a population parameter or a causal relationship
among groups of subjects, based on the merits of sample information.

Hypothesis testing is a process of finding enough evidence to conclude whether the
rejection or non-rejection of a belief or hypothesis about a parameter is reasonable.

Ten Steps of Hypothesis Testing

1. Formulate or establish the null hypothesis. (Ho)
2. Formulate or establish the alternative hypothesis. (Ha)
3. Determine the type of test.
4. Determine the level of significance/degrees of freedom.
5. Determine the appropriate test statistic.
6. Select the appropriate formula.
7. Calculate the value of the test statistic.
8. Determine the tabular or critical values.
9. State the decision.
10. Make the statistical conclusion.

Formulation of Hypothesis

Types of Hypothesis (Step 1&2)

1. Null hypothesis “Ho” is the hypothesis you hope to reject. This is the hypothesis used for testing
and is the starting point of the testing process. The null hypothesis must always express the idea
of no significant difference or relationship.

2. Alternative hypothesis “Ha” is the opposite of the null hypothesis. It specifies the existence of a
difference or relationship.

How to establish the null & alternative hypotheses

Forms of establishing hypotheses

1. Statement form is the literal or textual method of formulating hypotheses.

2. Quantitative form is a numerical method of expressing the mathematical relationship of the
hypotheses (commonly using equality and directional inequality such as > or < or ≠).

For symbolism purposes, we will make use of the following:

                      Parameter    Sample
Mean                  µ (mu)       X̄
Standard deviation    σ            s

Formulation of Hypothesis Involving Single Sample

An inference based on a single sample shows how to use the information taken from
the sample to test whether a population parameter is equal to, less than, or greater than a specified
value (µo).

Ho:    µ = µo    µ ≥ µo    µ ≤ µo
Ha:    µ ≠ µo    µ < µo    µ > µo

Formulate the null and alternative hypothesis in each of the following cases:

Example 1: A scientific journal on corn farming published that with the use of new technology, the
average yield of corn per hectare is 1 500 cavans. A sample of 115 corn producers in
Cagayan showed an average yield of 1 450 cavans with a standard deviation of 105
cavans. Can we assume that the yield of corn in Cagayan is significantly different from
the research finding?
Solution:

Ho: There is no significant difference in the average yield of corn in Cagayan from the research
finding.
Ho: µ = 1 500 (quantitative form)

Ha: There is a significant difference in the average yield of corn in Cagayan from the research
finding.
Ha: µ ≠ 1 500

Example 2: A liquor company claims that the new billboard display featuring a well-known actress
will increase product sales in retail stores by an average of 20 one-liter bottles a week.

Ho: There is no significant increase in the product sales of retail stores.
Ho: µ ≤ 20 one-liter bottles

Ha: There is a significant increase in the product sales of retail stores.
Ha: µ > 20 one-liter bottles

Seatwork:

I. Determine whether or not the pair would be appropriate for a hypothesis test. Answer Yes or No

1. Ho : µ = 100, Ha : µ ≠ 100

2. Ho : µ = 16, Ha : µ ≤ 16

3. Ho : µ ≥ 73, Ha : µ < 73

4. Ho : µ ≠ 30, Ha : µ ≠ 30

5. Ho : µ > 5, Ha : µ < 5
Type of test (Step 3)

Hypothesis testing can be done with either of two types of test: one-tailed or two-tailed.
The type refers to the direction of the change or difference, as established by the
alternative hypothesis. It is therefore determined by the following conditions:

1. It is a one-tailed test if the established alternative hypothesis is directional, that is, it
indicates the predicted direction of the difference (greater than or less than).

For example:

a. Ha: µ > 50 → one-tailed test
b. Ha: µ < 50 → one-tailed test

2. It is a two-tailed test if the established alternative hypothesis is non-directional, that is, the
direction is not indicated.

For example: Ha : µ ≠ 50 → two-tailed test


Level of Significance (Step 4)

The concept of the level of significance is based on the following:

1. It measures the risk in decision-making using hypothesis testing methodology.

2. It is denoted by “α” and determines the size of the rejection region. The value (1 – α)
represents the level of confidence.

The rejection region consists of the numerical values of the test statistic for which
the null hypothesis will be rejected.

3. It is the probability of committing a “Type I error”, which is rejecting the null
hypothesis when it is in fact true, while a “Type II error” is committed if the null
hypothesis is accepted when it is false.
4. It is directly under the control of the individual performing the test. Traditionally,
levels from 0.01 to 0.10 are selected, which correspond to confidence levels from
99% to 90%, respectively.

CRITICAL VALUES OF z AT DIFFERENT LEVELS OF SIGNIFICANCE

Test Type        Level of Significance (α)
                 0.01      0.025     0.05      0.10
One-tailed       ±2.33     ±1.96     ±1.645    ±1.28
Two-tailed       ±2.575    ±2.24     ±1.96     ±1.645
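
The critical values of z can also be computed rather than read from a table. The following is a minimal sketch using only the Python standard library; the function name z_critical is an illustrative choice, not part of the original material, and printed tables may differ from the computed values in the last rounded digit.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed):
    """Return the positive critical z for a given level of significance."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    # inv_cdf gives the z value with the requested area to its left.
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.01, 0.025, 0.05, 0.10):
    one = z_critical(alpha, two_tailed=False)
    two = z_critical(alpha, two_tailed=True)
    print(f"alpha = {alpha}: one-tailed ±{one:.3f}, two-tailed ±{two:.3f}")
```

For a left-tailed test the same magnitude is used with a negative sign.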

Test Statistics (Step 5)

The test statistic is the basis for deciding whether to reject or to accept the hypothesis being tested. It is based on the
estimator of the parameter being tested, and it corresponds to the specific distribution to be used, as defined by
its basic assumptions.

For simple test of hypothesis, we may use the z-test or the t-test.
A. The z –Test

When to use the z – Test

The z – test may be used to test hypotheses involving a single sample mean or two sample means. It
makes use of the normal distribution. It may also be used as a test for the difference between two
proportions.

Basic Assumptions for the z – test

1. Population standard deviation (σ) is known.
2. Sample size is large. (n > 30)
3. Population is truly or approximately normally distributed.
4. Samples are randomly selected.
5. For two samples, the sample sizes n1 and n2 are large.
6. For two samples, the samples are independent.
B. The t – test

The t-test is normally used if the population standard deviation is not known and has to be
estimated using the sample standard deviation. Thus, certain assumptions have to be established
to be assured that the result obtained by employing the test is valid. A paired t-test is used to compare
samples resulting from before-and-after experimentation.
Basic Assumptions for t-test
1. Samples are randomly selected.
2. Samples come from a normally distributed population.
3. Sample size is small. (n < 30)
4. For inference based on a single sample, the population standard deviation (σ) is unknown.
5. For inference based on two samples, the populations have equal variances.
Examples: For each of the following, state the appropriate test to be used & state the reason.

1. The quality control department of Lamoyan Industries, maker of toothpaste, wants to test the
performance of one of its filling machines. The machine is expected to discharge an average amount
of 10 mg per tube. The study calls for a sampling of 100 tubes and aims to detect any departure
from the setting.

Answer: z – test
Reason: testing population mean, large sample. (n > 30)

2. A businessman is considering the purchase of a vending machine. The seller claims that over the
past 3 years, the average daily revenue was P950.00. An observation for 28 days reveals a daily
revenue of P925.00 with a standard deviation of P55.00. What can be concluded?

Answer: t – test
Reason: testing population mean, small sample. (n < 30)
3. Given:
                  Group A    Group B
   Mean           100        105
   Sample size    20         30
   Variance       25         25

Answer: z – test
Reason: sample size of Group is ≥ 30
Test of Hypothesis (Step 6)

I. Sampling Distribution of a Mean

1. Z = (X̄ - µ) / (σ/√n) = (X̄ - µ)√n / σ      for a large sample, n > 30

2. t = (X̄ - µ) / (s/√n)                       for a small sample, n < 30
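
Both one-sample formulas have the same shape and differ only in which standard deviation is used (σ for z, s for t). A small sketch of that computation follows; the function name and the numbers in the usage line are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def one_sample_statistic(sample_mean, mu0, sd, n):
    """Compute (sample_mean - mu0) * sqrt(n) / sd.

    The result is a z value when sd is the population σ (large sample)
    and a t value with n - 1 degrees of freedom when sd is the sample s.
    """
    return (sample_mean - mu0) * math.sqrt(n) / sd

# Hypothetical values: mean 52 against a claimed 50, sd 8, n = 64.
# (52 - 50) * 8 / 8 = 2.0
print(one_sample_statistic(52.0, 50.0, 8.0, 64))
```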
II. Sampling Distribution of a Proportion

Z = (P - P₀) / √(P₀q/n)

where: P₀ = hypothesized proportion
       P = proportion of a given sample
       q = 1 - P₀
       n = sample size
III. A. Difference of Two Means

1. Z = (X̄1 - X̄2) / √(σ1²/n1 + σ2²/n2)

2. t = (X̄1 - X̄2) / √{ [((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 – 2)] (1/n1 + 1/n2) }

   df = n1 + n2 – 2
B. Difference of Two Proportions

Z = (P1 - P2) / σ(P1 - P2)

where:
σ(P1 - P2) = √[ pq (1/n1 + 1/n2) ]

p = (X1 + X2) / (n1 + n2)

q = 1 - p

P1 = X1/n1        P2 = X2/n2
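
As an illustration of the pooled-proportion formula, here is a short sketch. The function name and the counts (60 of 100 versus 50 of 100) are made up for this illustration and do not come from the handout.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference of two proportions (pooled estimate)."""
    p1 = x1 / n1                      # first sample proportion
    p2 = x2 / n2                      # second sample proportion
    p = (x1 + x2) / (n1 + n2)         # pooled proportion
    q = 1 - p
    std_error = math.sqrt(p * q * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error

# Hypothetical data: 60 successes out of 100 versus 50 out of 100.
z = two_proportion_z(60, 100, 50, 100)
print(f"z = {z:.2f}")
```

The resulting z is then compared with the tabular z value for the chosen type of test and level of significance.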
IV. t-test for dependent or correlated samples

t = d̄ / (sd/√n)

where:
d̄ = ∑d / n
sd = √[ (n∑d² - (∑d)²) / (n(n - 1)) ]

df = n - 1
Tabular (Critical) Value (Step 8)

A. z Value

How to determine the tabular value of z

The tabular value of z depends on two things: the type of test and the level of significance (α). Thus,
it is represented by “Zα” for a one-tailed test and “Zα/2” for a two-tailed test.

Note:

1. If one-tailed test and Ha: µ1 < µ2 ; the tabular value is negatively signed.

2. If one-tailed test and Ha: µ1 > µ2 ; the tabular value is positively signed.

3. If two-tailed test, µ1 ≠ µ2 , the value is ± signed.


Example 1: Given: α = 0.05 ; two-tailed test.
What is the tabular value of z?

Z = ± 1.96

Example 2: Given: (a) level of confidence 90%, (b) one-tailed test


What is the critical value of Z?

Since, level of confidence at 90%, α = 0.10. Thus, Z = +1.28 or -1.28


B. t value

Degrees of freedom is a property of the t distribution. It is the number of values that are allowed to vary
without changing the mean.

For a single sample, degrees of freedom (d.f.) = n – 1, while for two samples, d.f. = n1 + n2 – 2. The tabular
value is then read from the t distribution table. (Table III)

How to determine the Tabular Value of t

1. Specify the level of significance


2. Determine the type of test
3. Find the degrees of freedom
4. Refer to t distribution table (Table III)
Note:

1. If one-tailed test and Ha: µ1 < µ2 ; the tabular value is negatively signed.

2. If one-tailed test and Ha: µ1 > µ2 ; the tabular value is positively signed.

3. If two-tailed test, get the value of α/2 before looking at the table, and the value is ± signed.

Example 1: Given: Ha: µ1 ≠ µ2 ; α = 0.01 and n = 25


Find the tabular value of t.

Solution: Based on Ha, type of test is two-tailed. Since, it is two-tailed , level of significance will be
divided by 2 (α/2 = 0.01/2 = 0.005), and d.f. = 25-1 = 24, therefore, referring to the table

t0.005, 24 = ±2.797
Example 2: The average number of applicants for a trainee position in a call center for the past years has been
44.6. A recent study of 10 call centers suggests that the attractiveness of this type of job
may be increasing. If the hypothesis will be tested using the t-test with a 97.5% confidence level,
what is the tabular value of t?

Solution: Since Ha: µ > 44.6, the test is one-tailed. With α = 0.025 and n = 10, d.f. = 10 - 1 = 9,
therefore, t0.025, 9 = +2.262

Note: Since confidence level is 97.5%, α = 1 - 0.975 = 0.025


Decision Making (Step 9)

The decision on whether to accept or reject a hypothesis basically depends on two values:
the tabular value and the computed value. As mentioned, the computed value is determined by
calculation using the appropriate statistical test formula, while the tabular value is taken from the table.

For the two-tailed test (z-test at α = 0.05, tabular value ±1.96):

Decision Rule:
1) If the computed value is between -1.96 and +1.96, the null hypothesis is accepted.

2) If the computed value lies or falls in the rejection region, the null hypothesis is rejected.

Or (this is applicable for positive computed and tabular values)

1. If the computed value is greater than the tabular value, accept Ha, reject Ho.

2. If the computed value is less than the tabular value, accept Ho, reject Ha.
Example:
Given: (a) Zcv = 2.19 (b) Ztv = ± 1.65
What is the decision?

Solution: Since 2.19 > 1.65, the computed value is greater than the tabular value. The decision is to
reject the null hypothesis H o & accept alternative hypothesis Ha.
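
The decision rule can be written as a small function. This is a sketch under the assumption that the tabular value is supplied as a magnitude and that the type of test is named explicitly; the function name and the labels "two-tailed", "right-tailed" and "left-tailed" are illustrative choices.

```python
def decide(computed, tabular, test_type):
    """Apply the decision rule to a computed and a tabular value.

    tabular is treated as a magnitude; its sign is set by test_type.
    """
    critical = abs(tabular)
    if test_type == "two-tailed":
        reject = abs(computed) > critical      # either tail
    elif test_type == "right-tailed":
        reject = computed > critical           # Ha uses ">"
    elif test_type == "left-tailed":
        reject = computed < -critical          # Ha uses "<"
    else:
        raise ValueError("unknown test type: " + test_type)
    return "Reject Ho, accept Ha" if reject else "Accept Ho, reject Ha"

# The example above: Zcv = 2.19 against Ztv = ±1.65.
print(decide(2.19, 1.65, "two-tailed"))
```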
I. Sampling Distribution of a Mean

Example 1. For z test (n > 30)

A company that makes chocolates claims that the mean weight of a bag of chocolates is 240
grams with a standard deviation of 20.5 grams. Using a 0.05 significance level, would you agree with the
company if a random sample of 50 bags of chocolates was found to have a mean weight of 230 grams?

Solution:

1. Ho : (µ = 240 grams) There is no significant difference in the average weight per pack of
chocolates.
2. Ha : (µ ≠ 240 grams) There is a significant difference in the average weight per pack of
chocolates.

3. Type of test: two tailed test

4. Level of significance: α = 0.05

5. Test Statistics: Z – test

6. Formula: Z = (X̄ - µ)√n / σ = (230 - 240)(√50) / 20.5 = (-10)(7.07107) / 20.5 = -3.45

7. Computed value: Zcv = -3.45

8. Tabular value: Ztv = ± 1.96

9. Decision: Reject null hypothesis

10. Conclusion: There is a significant difference in the average weight per pack of chocolates.
(Reject Ho, accept Ha)
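
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the critical value is computed from the standard normal distribution instead of being read from the table.

```python
import math
from statistics import NormalDist

# Values from the chocolate example.
x_bar, mu0, sigma, n, alpha = 230, 240, 20.5, 50, 0.05

z = (x_bar - mu0) * math.sqrt(n) / sigma
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed test
reject = abs(z) > z_crit

print(f"Zcv = {z:.2f}, Ztv = ±{z_crit:.2f}, reject Ho: {reject}")
```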
Example 2

The mean & standard deviation in years of the lifetime of cameras produced by a manufacturer are
4 and 1.6 respectively. The manufacturer claims that the mean lifetime of the cameras has increased due
to a new manufacturing technique. To determine the validity of such claim, a sample of 30 cameras was
taken, the mean lifetime of which is 5 years. Is the claim valid at α = 0.05?

Solution:

1. H0 :

2. Ha :

3. Type of test:

4. Level of Significance:

5. Test Statistics:

6. Formula:
7. Computed value:

8. Tabular value:

9. Decision:

10. Conclusion:

Example 3: For t – test (n < 30) One Sample

It is claimed that the calorie contents of a certain brand of powdered milk does not exceed 99.2
per bag. A sample of 10 bags has a mean calories contents of 102 & a standard deviation of 17.5. At an
α = 0.05, determine whether or not the claim is true.

Solution:

1. Ho : µ ≤ 99.2 The calorie content of a certain brand of powdered milk does not exceed 99.2 per bag.

2. Ha : µ > 99.2 The calorie content of a certain brand of powdered milk exceeds 99.2 per bag.

3. Type of test: one-tailed test, right tail

4. Level of Significance: α = 0.05

5. Test Statistics: t – test; d.f. = n - 1 = 10 - 1 = 9

6. Formula: t = (X̄ - µ)√n / s

7. Computed value: t = (102 - 99.2)√10 / 17.5 = (2.8)(3.16228) / 17.5 = 8.85438 / 17.5 = 0.51

8. Tabular value: t = 1.833

9. Decision: Since the computed value of t is less than the tabular value, reject Ha, accept Ho.

10. Conclusion: Therefore, the calorie content of a certain brand of powdered milk does not exceed
99.2 per bag.
Example 4:
The National Steel Co. is manufacturing steel wire with an average strength of 50 kilos. The
laboratory tests a random sample of 18 pieces of wires and finds that the mean strength is 48 kilos and the
standard deviation is 10 kilos. Are the results in accordance with the hypothesis that the company produces
steel wire with an average strength of 50 kilos? (Use 10% level of significance)
Solution:
1. Ho : µ = 50 kilos There is no significant difference in the average strength.
2. Ha : µ ≠ 50 kilos There is a significant difference in the average strength.
3. Type of Test: 2 tailed test
4. Level of significance: α = 0.10; d.f. = 18 – 1 = 17
5. Test Statistics: t - test
6. Formula: t = (X̄ - µ)√n / s = (48 - 50)√18 / 10 = (-2)(4.24264) / 10 = -8.48528 / 10 = -0.848
7. Computed value: tcv = -0.848
8. Tabular value: ttv = ±1.740

9. Decision: Accept Ho , Reject Ha

10. Conclusion: Therefore, there is no significant difference in the average strength.
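
The two one-sample t-test examples (powdered milk with n = 10, steel wire with n = 18) can be reproduced as follows. This is a sketch, not a full t-test routine: the tabular values 1.833 and 1.740 are entered by hand from the t table, and the helper name t_statistic is an illustrative choice.

```python
import math

def t_statistic(sample_mean, mu0, s, n):
    """One-sample t = (sample_mean - mu0) * sqrt(n) / s, with n - 1 d.f."""
    return (sample_mean - mu0) * math.sqrt(n) / s

# Powdered milk: right-tailed test, tabular t = 1.833 (alpha 0.05, d.f. 9).
t_milk = t_statistic(102, 99.2, 17.5, 10)
reject_milk = t_milk > 1.833

# Steel wire: two-tailed test, tabular t = ±1.740 (alpha 0.10, d.f. 17).
t_wire = t_statistic(48, 50, 10, 18)
reject_wire = abs(t_wire) > 1.740

print(f"Powdered milk: t = {t_milk:.3f}, reject Ho: {reject_milk}")
print(f"Steel wire:    t = {t_wire:.3f}, reject Ho: {reject_wire}")
```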

II. Sampling Distribution of a Proportion

Example 1.

In a consumer taste test, 100 regular Pepsi drinkers are given blind samples of Coke and Pepsi,
48 of these subjects preferred Coke. At an 0.05 level significance, test the claim that Coke is preferred by
50% of Pepsi drinkers who participated in such blind taste tests.
Solution:

1. Ho : p = 0.50 The proportion of Pepsi drinkers who prefer Coke is 50%.

2. Ha : p ≠ 0.50 The proportion of Pepsi drinkers who prefer Coke is not 50%.

3. Type of Test: 2 tailed test

4. Level of significance : α = 0.05

5. Test Statistics: z - test for proportion

6. Formula: Z = (p - Po) / √(Po q / n)

   where: p = x/n = 48/100 = 0.48
          q = 1 – Po = 1 - 0.50 = 0.50

7. Computed value: Z = (0.48 - 0.50) / √((0.50)(0.50)/100) = -0.02 / √(0.25/100) = -0.02 / 0.05 = -0.40

8. Tabular value: Z = ±1.96

9. Decision: The computed Z = -0.40 is greater than the critical value of Z = -1.96 (equivalently,
│-0.40│ is less than │-1.96│), so it falls in the non-rejection region of Ho.

10. Conclusion: Fail to reject the null hypothesis, since there is no sufficient evidence to reject the claim that 50% of
Pepsi drinkers prefer Coke.
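
A short sketch of the same proportion test in Python follows. Variable names are illustrative, and the critical value is computed from the standard normal distribution.

```python
import math
from statistics import NormalDist

# Values from the taste-test example.
x, n, p0, alpha = 48, 100, 0.50, 0.05

p_hat = x / n                         # sample proportion
q0 = 1 - p0
z = (p_hat - p0) / math.sqrt(p0 * q0 / n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed test
reject = abs(z) > z_crit

print(f"Z = {z:.2f}, critical value = ±{z_crit:.2f}, reject Ho: {reject}")
```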
III. A. Difference of Two Means

Example 1. Z - test for two sample

A study was made to determine if there is a significant difference in the salaries of professors in
private and state colleges in Cebu. The results, taken from 80 professors in each group, are as
follows:

              Private    State
Mean          12,500     11,275
Std. Dev.     120        120

What can be concluded from this?


Solution:

1. Ho : µp = µs There is no significant difference in the average salaries between the professors in private
and state colleges.
2. Ha : µp ≠ µs There is a significant difference in the average salaries between the professors in private
and state colleges.
3. Type of test: two-tailed test
4. Level of Significance: α = 0.05
5. Test Statistics: z – test
6. Formula: Z = (X̄1 - X̄2) / √(σ1²/n1 + σ2²/n2)

7. Computed value:

   Zcv = (12,500 - 11,275) / √((120)²/80 + (120)²/80)
       = 1,225 / √(14,400/80 + 14,400/80)
       = 1,225 / √(180 + 180)
       = 1,225 / 18.9737

   Zcv = 64.56

8. Tabular value: Ztv = ± 1.96

9. Decision: Reject Ho, since │Zcv│ ≥ │Ztv│

10. Conclusion; There is a significant difference in the average salaries between the professors in private
and state colleges.
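
The two-sample z computation can be verified with a brief sketch such as the one below; the function name two_sample_z is an illustrative choice.

```python
import math

def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """z statistic for the difference of two means with known sds."""
    std_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / std_error

# Private versus state colleges, 80 professors in each group.
z = two_sample_z(12_500, 11_275, 120, 120, 80, 80)
print(f"Zcv = {z:.2f}")
```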
Example 2. t - test for two samples

Sybulle Korean Restaurant is attempting to determine whether the average daily revenue for the new
menu exceeds that of the old menu. A sample of 15 randomly selected days for the old menu showed a mean revenue of P3,600
with a standard deviation of P250, while the data for the new menu over 15 days revealed a mean revenue of
P3,750 with a standard deviation of P230. What can be concluded at the 1% level of significance?

Solution:

1. Ho : µ1 ≤ µ2 The revenue for the new menu (µ1) does not exceed the revenue for the old menu (µ2).

2. Ha : µ1 > µ2 The revenue for the new menu exceeds the revenue for the old menu.

3. Type of test: one-tailed test

4. Level of Significance: α = 0.01

5. Test Statistics: t – test; d.f. = n1 + n2 - 2 = 15 + 15 - 2 = 28


6. Formula: t = (X̄1 - X̄2) / √{ [((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 – 2)] (1/n1 + 1/n2) }

   df = n1 + n2 – 2

7. Computed value:

   tcv = (3,750 - 3,600) / √{ [((15 – 1)(250)² + (15 – 1)(230)²) / (15 + 15 - 2)] (1/15 + 1/15) }
       = 150 / √{ [((14)(62,500) + (14)(52,900)) / 28] (0.0667 + 0.0667) }
       = 150 / √{ [(875,000 + 740,600) / 28] (0.1334) }
       = 150 / √{ (1,615,600 / 28)(0.1334) }
       = 150 / √7,697.18
       = 150 / 87.73357

   tcv = 1.710

8. Tabular value: ttv = +2.467 (one-tailed, α = 0.01, d.f. = 28)

9. Decision: Accept Ho, since tcv ≤ ttv

10. Conclusion: Therefore, the revenue for the new menu did not exceed the revenue for the old menu.
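
The pooled-variance computation is the step most prone to arithmetic slips, so a sketch of it may help. The function name pooled_t is illustrative; it returns the statistic and the degrees of freedom and leaves the table lookup to the reader.

```python
import math

def pooled_t(mean1, mean2, s1, s2, n1, n2):
    """Two-sample t with pooled variance (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df

# New menu (sample 1) versus old menu (sample 2), 15 days each.
t, df = pooled_t(3750, 3600, 230, 250, 15, 15)
print(f"tcv = {t:.3f}, d.f. = {df}")
```

Because the code carries 1/15 + 1/15 at full precision instead of rounding it to 0.1334, intermediate values differ slightly from the hand computation, but the statistic agrees to three decimal places.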

IV. t-test for dependent or correlated samples

Example:

Using a reaction timer, 14 persons were tested for reaction times with their left and right hands. Only right-handed
persons were used. The results (in thousandths of a second) are given in the table below:

Subject    1    2    3    4    5    6    7    8    9    10   11   12   13   14
Right      191  97   116  165  116  129  171  155  112  102  188  158  121  133
Left       224  171  191  207  196  165  177  165  140  188  155  219  177  174

Use a significance level of 0.05 to test the claim that there is a difference between the mean of the right and
left hand reaction times.

If an aeronautical engineer is designing a fighter-jet cockpit and must locate the ejection-seat activator to be
accessible to either the right or the left hand, does it make a difference which hand he/she chooses?
Solution:

1. Ho : µd = 0 There is no sufficient evidence to support the claim that there is a difference between the
right & left hand reaction times.

2. Ha : µd ≠ 0 There is a sufficient evidence to support the claim that there is a difference between the
right & left hand reaction times.

3. Type of test: two-tailed test

4. Level of Significance: α = 0.05

5. Test Statistics: t – test; d.f. = n – 1 = 14 - 1 = 13

6. Formula: t = (d̄ - µd) / (sd/√n)

   where:
   d̄ = ∑d / n
   sd = √[ (n∑d² - (∑d)²) / (n(n - 1)) ]

   df = n - 1

7. Computed value: t = -4.79

Solution:

Subject      1     2     3     4     5     6     7    8    9    10    11    12    13    14    TOTAL
d = R - L    -33   -74   -75   -42   -80   -36   -6   -10  -28  -86   33    -61   -56   -41   -595
d²           1089  5476  5625  1764  6400  1296  36   100  784  7396  1089  3721  3136  1681  39,593

d̄ = ∑d / n = -595 / 14 = -42.5

sd = √[ (n∑d² - (∑d)²) / (n(n - 1)) ] = √[ (14(39,593) - (-595)²) / (14(13)) ] = √[ (554,302 - 354,025) / 182 ] = √1,100.4231

sd = 33.17

Therefore: t = (d̄ - µd) / (sd/√n) = (-42.5 - 0) / (33.17/√14) = -42.5 / (33.17/3.74166) = -42.5 / 8.86505 = -4.79
8. Tabular value: ttv = ±2.160 (two-tailed, α = 0.05, d.f. = 13)

9. Decision: Reject Ho, since the computed value of t = -4.79 is less than the critical value of t = -2.160
and falls in the critical region at the left tail; equivalently, │-4.79│ is greater than
│-2.160│.

10. Conclusion: There is a sufficient evidence to support the claim that there is a difference between the
right & left hand reaction times.
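
The paired computation can be checked from the row of differences d = R - L. The sketch below uses only the standard library; the variable names are illustrative, and the critical value 2.160 is entered by hand from the t table for 13 degrees of freedom.

```python
import math

# Differences d = Right - Left for the 14 subjects.
d = [-33, -74, -75, -42, -80, -36, -6, -10, -28, -86, 33, -61, -56, -41]

n = len(d)
sum_d = sum(d)
sum_d2 = sum(value * value for value in d)

d_bar = sum_d / n
s_d = math.sqrt((n * sum_d2 - sum_d**2) / (n * (n - 1)))
t = d_bar / (s_d / math.sqrt(n))

# Two-tailed test at alpha = 0.05 with n - 1 = 13 d.f.; tabular t = 2.160.
reject = abs(t) > 2.160

print(f"sum d = {sum_d}, sum d^2 = {sum_d2}")
print(f"d_bar = {d_bar}, sd = {s_d:.2f}, t = {t:.2f}, reject Ho: {reject}")
```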
