BUS-End Term Merged


Interval Estimation- Confidence Interval of µ

What- In interval estimation we give an interval about the point estimator together with some measure of assurance of how
close it is to the true parameter value. E.g. we say population mean height lies between 68±2 with 95% confidence.
Construction of 95% confidence interval (C.I.) for µ of N(µ,σ)-
X̄ ~ N(mean = µ, S.E. = σ/√n) ⟹ (X̄ − µ)/(σ/√n) = Z ~ N(mean = 0, S.D. = 1).
95% = P(−1.96 ≤ Z ≤ 1.96) = P(−1.96 ≤ (X̄ − µ)/(σ/√n) ≤ 1.96) = P[X̄ − 1.96(σ/√n) ≤ µ ≤ X̄ + 1.96(σ/√n)].
Therefore, 95% C.I. for µ is X̄ ± 1.96(σ/√n).
In general, (1−α)100% C.I. for µ is X̄ ± Zα/2(σ/√n).
C.I.- P.E. ± (Table value)S.E.(P.E.)
Lower Confidence Limit(LCL)/LB,
Upper Confidence Limit(UCL)/UB
Note- For 90% C.I.- Z.05 = 1.645 (rounded to 1.64 in some of the examples below).
How to do problem-

Ex- A random sample of size 16 from a population of height with S.D. 4” yields X̄ = 68”. i) Find a 95% C.I. for the
population mean height, ii) find its width, iii) Can you say population mean height is 68”?
Ans.- i) a) (1−α)100% = 95% ⇒ 1−α = .95 ⇒ α/2 = .025, b) Z.025 = 1.96,
c) ⇒ 95% C.I. for µ is X̄ ± 1.96(σ/√n) = 68 ± 1.96(4/√16) = 68 ± 1.96 ⇒ C.I. is (66.04, 69.96).
ii) Width = UCL − LCL = 2*1.96 = 3.92.
iii) µ = 68 ∈ C.I. This implies the population mean height can be 68”; the sample does not contradict it.
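Sketch- The C.I. steps above can be checked with a few lines of Python. This is only an illustrative sketch: the function name z_interval is an assumption (not from the notes), and it uses the exact normal quantile from the standard library instead of the rounded table value 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Return (LCL, UCL) for mu when sigma is known (hypothetical helper)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # table value Z_{alpha/2}
    margin = z * sigma / sqrt(n)              # (table value) * S.E. of the P.E.
    return xbar - margin, xbar + margin

lcl, ucl = z_interval(68, 4, 16)
print(round(lcl, 2), round(ucl, 2), round(ucl - lcl, 2))
```

For the height example this should reproduce the limits (66.04, 69.96) and the width 3.92.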
Interval Estimation-C.I. of 𝜇
Interpretation of 95% C.I. for µ- If we do repeated sampling and for each sample compute the C.I. given by the formula
X̄ ± 1.96(σ/√n), then 95% of such intervals will contain the true value of µ.
Determination of sample size for given precision and confidence in the estimation of µ-
Want to est. µ within 2” of its true value with 95% confidence.
1.96(σ/√n) = 2 ⇒ n = (1.96σ/2)² = (1.96*4/2)² = 15.366 ≅ 16 (since σ = 4” given).
General format- Want to determine sample size n required to estimate the population mean µ within error bound B of its
true value with (1−α)100% confidence.
Instead of table value 1.96 use Zα/2 and for 2” use error bound B. Then, Zα/2(σ/√n) = B ⇒ n = (Zα/2 σ/B)². Discuss.
Note- If σ is unknown, then it can be replaced by its estimated value in the following manner.
i) If the population range = R = (Max − Min) is available, then σ̂ = R/6.
ii) Undertake a pilot study with small sample size, then σ̂ = S, where S² = ∑(X − X̄)²/(n−1) = (∑X² − nX̄²)/(n−1).
Ex- The minimum and maximum height in the population are 56” and 82” respectively. Determine the sample size required
to estimate the population mean height within 1.5” of its true value with 90% confidence.
Ans. σ̂ = R/6 = (82−56)/6 = 4.333. Implies n = (Zα/2 σ̂/B)² = (Z.05 σ̂/B)² = (1.645*4.333/1.5)² = 22.58 ≅ 23.
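Sketch- The formula n = (Zα/2 σ/B)² with upward rounding can be written as a small function. The name sample_size_for_mean is an assumption for illustration; the quantile comes from the standard library rather than the table.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, bound, confidence=0.95):
    """Smallest n with Z_{alpha/2} * sigma / sqrt(n) <= bound (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / bound) ** 2)     # always round upwards

print(sample_size_for_mean(4, 2))                  # sigma = 4", B = 2", 95%
sigma_hat = (82 - 56) / 6                          # range rule: sigma-hat = R/6
print(sample_size_for_mean(sigma_hat, 1.5, 0.90))  # B = 1.5", 90%
```

The two calls correspond to the two worked examples above (n = 16 and n = 23).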
C.I. for 𝜇- 𝜎 known and unknown- t distribution
Table for C.I. of 𝜇- P.E. ∓ (Table value)(S.E. or Est. S.E. of P.E.)
t-distribution- Unlike Z, we have different t-distributions for different degrees of freedom (d.f.) K. For our purpose
K = n−1. If n = 16 then d.f. = K = 15. Graph of t-curve is given below.

Properties of t- curve-
Notation- tα,K = t-value such that the right tail area (RTA) = α under t-curve with d.f. K.
Ex- t.05,4 = t-value such that RTA = .05 under t-curve with K = 4 is 2.132. t.05,5 = t-value such that RTA = .05 under
t-curve with K = 5 is 2.015.
t.05,∞ = Z.05 = 1.645
Confidence Interval for 𝜇- Examples

Ex- A random sample of size 100 yields X̄ = 16000, S = 4000. i) Construct a 90% C.I. for the population mean income.
ii) Find its width. iii) Can you say population mean income is 18000?
Ans.- i) n = 100 ≥ 30 implies can use Z-table value.
90% C.I. for µ is X̄ ± Z.05(S/√n) or 16000 ± 1.64(4000/√100) = 16000 ± 656.
LCL = 16000 − 656 = 15344 and UCL = 16000 + 656 = 16656.
ii) Width = 2*656 = 1312.
iii) µ = 18000 ∉ C.I. ⇒ at 90% confidence, population mean income is not 18000.
Ex- A random sample of size 4 from the population of height yields 64,66,68,70. I) Construct a 95% C.I. for the
population mean height. Ii) Find its width. Iii) Can you say population mean height is 66”?
Ans.- Normal population, σ unknown. Use t-table value.
i) X̄ = ∑X/n = 67, S² = ∑(X − X̄)²/(n−1) = 20/3 = 6.667. S = 2.582.
95% C.I. is X̄ ± t.025,3(S/√n) OR 67 ± 3.182(2.582/√4) = 67 ± 4.108 OR (62.892, 71.108).
ii) Width = 2*4.108 = 8.216.
iii) µ = 66 ∈ C.I. ⟹ The population mean can be 66”.
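Sketch- The small-sample C.I. can be checked as below. The Python standard library has no t quantile function, so the table value t.025,3 = 3.182 is passed in by hand; the function name t_interval is an assumption for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(sample, t_value):
    """C.I. for mu with sigma unknown; t_value is read from the t-table (hypothetical helper)."""
    n = len(sample)
    margin = t_value * stdev(sample) / sqrt(n)   # stdev uses the (n-1) divisor
    return mean(sample) - margin, mean(sample) + margin

lcl, ucl = t_interval([64, 66, 68, 70], t_value=3.182)   # t.025,3 from the table
print(round(lcl, 3), round(ucl, 3))
```

For the four heights this should agree with the interval (62.892, 71.108) obtained above.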
Estimation of population proportion of success p- C.I.
Estimation of population proportion of success (p)-
Ex- Population proportion below poverty line (BPL), cure by a drug, credit card holders, A-type blood, defective
Point estimator- p̄ = X/n where X = number of successes in n trials. Note X ~ Bin(n,p). Hence E(X) = np, Var(X) = npq, which
implies E(p̄) = µ_p̄ = p, Var(p̄) = σ²_p̄ = pq/n.

Ex- In a random sample of 100 families 60 are BPL. Then the sample proportion of success (BPL) = p̄ = 60/100 = .6 = 60%.
Interval estimation- Interval about p̄, such that the probability of p lying in that interval is 95% or (1−α)100% in general.

C.I. for p is P.E. ∓ (Table value)(S.E. or Est. S.E. of P.E.)

Ex- In a random sample of 100 families 60 are BPL. I) Construct a 95% C.I. for the population proportion BPL, ii) find its width,
iii) Can you say population proportion BPL is 70%?
Estimation of population proportion p- Determination of n
[Figure: normal curve with central area (1−α) and area α/2 in each tail.]

Want to estimate p within .02 of its true value with 95% confidence.
1.96√(pq/n) = .02. Implies n = (1.96/.02)²(pq).

Replace pq by its maximum value = 0.25. In that case, n = (1.96/.02)²(.25) = 2401. In case of a
decimal, always round upwards.

General format- We want to estimate p within error bound B of its true value with (1−α)100% confidence. Replacing
1.96 = Z.025 by Zα/2 and error bound .02 by B, we get n = (Zα/2/B)²(.25).
Estimation of proportion of success- Finite population WOR
For sampling from Finite Population WOR (without replacement),
E(p̄) = p, Var(p̄) = ((N−n)/(N−1))(pq/n), where (N−n)/(N−1) = Finite Population Correction (fpc) factor OR
Finite Population Multiplier (FPM), which can be ignored if the sampling fraction n/N < 5%.
Interval Estimate of a Population Mean: σ Known
• Example: Discount Sounds
Discount Sounds has 260 retail outlets throughout the United States. The
firm is evaluating a potential location for a new outlet, based in part, on the
mean annual income of the individuals in the marketing area of the new
location.
A sample of size n = 36 was taken; the sample mean income is $41,100.
The population is not believed to be highly skewed. The population standard
deviation is estimated to be $4,500, and the confidence coefficient to be used
in the interval estimate is .95.

Interval Estimate of a Population Mean: σ Known
• Example: Discount Sounds
95% of the sample means that can be observed are within ±1.96 σ_x̄ of the
population mean µ. The margin of error is:

zα/2(σ/√n) = 1.96(4,500/√36) = 1,470

Thus, at 95% confidence, the margin of error is $1,470.

Interval Estimate of a Population Mean: σ Known
• Example: Discount Sounds
Interval estimate of µ is:
$41,100 ± $1,470
or
$39,630 to $42,570

We are 95% confident that the interval contains the population mean.

Interval Estimate of a Population Mean: σ Known
• Example: Discount Sounds
Confidence Level    Margin of Error    Interval Estimate
90%                 1,234              39,866 to 42,334
95%                 1,470              39,630 to 42,570
99%                 1,932              39,168 to 43,032

In order to have a higher degree of confidence, the margin of error
and thus the width of the confidence interval must be larger.

Interval Estimate of a Population Mean: σ Unknown
• If an estimate of the population standard deviation σ cannot be developed
prior to sampling, we use the sample standard deviation s to estimate σ.
• This is the σ unknown case.
• In this case, the interval estimate for µ is based on the t distribution.
• (We’ll assume for now that the population is normally distributed.)

t Distribution
[Figure: standard normal distribution compared with t distributions with 20 and 10 degrees of freedom; horizontal axis z, t, centered at 0.]

Interval Estimate of a Population Mean: σ Unknown
• Example: Apartment Rents
A reporter for a student newspaper is writing an article on the
cost of off-campus housing. A sample of 16 one-bedroom
apartments within a half-mile of campus resulted in a sample
mean of $750 per month and a sample standard deviation of $55.
Let us provide a 95% confidence interval estimate of the mean rent per
month for the population of one-bedroom apartments within a half-mile of
campus. We will assume this population to be normally distributed.

Interval Estimate of a Population Mean: σ Unknown
• Interval Estimate
x̄ ± t.025,15(s/√n)
750 ± 2.131(55/√16) = 750 ± 29.30

We are 95% confident that the mean rent per month
for the population of one-bedroom apartments within
a half-mile of campus is between $720.70 and $779.30.

Sample Size for an Interval Estimate of a Population Mean
• Example: Discount Sounds
Recall that Discount Sounds is evaluating a potential location
for a new retail outlet, based in part, on the mean annual income
of the individuals in the marketing area of the new location.
Suppose that Discount Sounds’ management team wants an estimate of
the population mean such that there is a .95 probability that the sampling
error is $500 or less.
How large a sample size is needed to meet the required precision?

Sample Size for an Interval Estimate of a Population Mean
E = zα/2(σ/√n) = 500

At 95% confidence, z.025 = 1.96. Recall that σ = 4,500.

n = (1.96)²(4,500)²/(500)² = 311.17 ≅ 312

A sample of size 312 is needed to reach a desired
precision of ± $500 at 95% confidence.

Interval Estimate of a Population Proportion
• Example: Political Science, Inc.
Political Science, Inc. (PSI) specializes in voter polls and surveys
designed to keep political office seekers informed of their position in a
race.
Using telephone surveys, PSI interviewers ask registered voters who
they would vote for if the election were held that day.

Interval Estimate of a Population Proportion
• Example: Political Science, Inc.
In a current election campaign, PSI has just found that 220
registered voters, out of 500 contacted, favor a particular candidate.
PSI wants to develop a 95% confidence interval estimate for the
proportion of the population of registered voters that favor the
candidate.

Interval Estimate of a Population Proportion

n p̄ = 220 ≥ 5, n q̄ = 280 ≥ 5. Hence can use Z-table value.

p̄ ± zα/2 √(p̄(1 − p̄)/n)

where: n = 500, p̄ = 220/500 = .44, zα/2 = 1.96

.44 ± 1.96√(.44(1 − .44)/500) = .44 ± .0435

PSI is 95% confident that the proportion of all voters
that favor the candidate is between .3965 and .4835.
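
Sketch- The interval estimate for a proportion can be checked as follows. The function name proportion_interval is an assumption for illustration, and the exact normal quantile is used instead of the rounded table value.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """C.I. for p based on the normal approximation (hypothetical helper)."""
    p_bar = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_bar * (1 - p_bar) / n)   # est. S.E. uses p-bar
    return p_bar - margin, p_bar + margin

low, high = proportion_interval(220, 500)
print(round(low, 4), round(high, 4))
```

With 220 successes out of 500 this should agree with the PSI interval .3965 to .4835.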

Sample Size for an Interval Estimate of
a Population Proportion
• Example: Political Science, Inc.
Suppose that PSI would like a .99 probability that the sample
proportion is within ± .03 of the population proportion.
How large a sample size is needed to meet the required precision?
(A previous sample of similar units yielded .44 for the sample
proportion.)

Sample Size for an Interval Estimate of
a Population Proportion
zα/2 √(p*(1 − p*)/n) = .03

At 99% confidence, z.005 = 2.576. Recall that p* = .44.

n = (zα/2)² p*(1 − p*)/E² = (2.576)²(.44)(.56)/(.03)² = 1816.73 ≅ 1817

A sample of size 1817 is needed to reach a
desired precision of ± .03 at 99% confidence.

Sample Size for an Interval Estimate of
a Population Proportion

Note: We used .44 as the best estimate of p in the preceding expression.
If no information is available about p, then .5 is often assumed
because it provides the highest possible sample size. If we had
used p = .5, the computed value would have been 1843.27, so the recommended n (rounding upwards) would have been 1844.
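
Sketch- The sample size formula for a proportion, n = (Zα/2/B)²·p(1−p) rounded upwards, can be written as below. The function name sample_size_for_proportion is an assumption for illustration; p_guess defaults to .5, the value that maximizes p(1−p).

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(bound, confidence=0.95, p_guess=0.5):
    """Smallest n giving margin of error <= bound (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p_guess * (1 - p_guess) / bound ** 2)

print(sample_size_for_proportion(0.02))              # B = .02, 95%, pq = .25
print(sample_size_for_proportion(0.03, 0.99, 0.44))  # PSI example with p* = .44
print(sample_size_for_proportion(0.03, 0.99))        # PSI example with p = .5
```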

Testing of Hypothesis-
Ex- Report says population mean height µ is 66”. Researcher thinks it has increased. He takes n=16. Computes X̄.
Terms- Hypothesis- Null hypothesis (H0)- Sources- Alternative hypothesis (H1 or HA)- H0: µ = 66” (µ0) versus H1: µ > 66”.
Intuition says, if X̄ ≫ 66”, then reject H0, otherwise accept H0. Cut-off point/Critical Value and Critical Region-
determined such that errors are minimized- 1) Type-I (Rejection) error, 2) Type-II (Acceptance) error.
P(Type I error) = P(Rejecting H0|H0) = α (Level of Significance).
P(Type II error) = P(Accepting H0 | H1) = β.
Note- The critical value/cut off point is determined such that α and β are minimum. But as one decreases the other
increases. Usual practice is for a given value of α (to be decided by the management) we look for a C.V. and C.R. for which β is
minimum.
Ex- Comparing average yield of existing process with a new process. The management will not change from the old process
to the new process unless it is very sure that the new process is a better one because it involves lot of money and risk. So,
the probability of wrongly rejecting the old (H0) should be kept at a minimum, Say α = 1% to 5%. Will take the problem
forward with α = .05. This level of significance α plays a very important role in testing. C.V. and C.R./R.R. depend on it.
Determination of C.V. and C.R. for α = .05- α = 5%. So, reject H0 for the top 5% samples or X̄-values under the H0 (µ = 66”)
dist. of X̄.
X̄ ~ N(mean=µ, S.D.=σ/√n = 4/√16 = 1), given σ=4 and n=16.
Therefore, under H0 (µ=66), X̄ ~ N(mean=µ=66, S.D.=σ/√n = 4/√16 = 1) shown below.
Type I and Type II Errors
                                  Population Condition
Conclusion using sample           H0 True (µ ≤ 12)     H0 False or H1 true (µ > 12)
Accept H0 (Conclude µ ≤ 12)       Correct Decision     Type II Error
Reject H0 (Conclude µ > 12)       Type I Error         Correct Decision

Testing of hypothesis-

C.V. is X̄ = 66 + 1.64(1) = 67.64 and C.R. = {X̄ | X̄ > 66 + 1.64(1)}.
C.R. in the general format is C.R. = {X̄ | X̄ > µ0 + Zα(σ/√n)}.
This is not readily table accessible, but the standardized X̄, which is Z = (X̄ − 66)/1, follows a Z-distribution.
In table accessible form, C.R. = {(X̄ − 66)/1 = Z | Z > 1.64}. In the general format, C.R. = {(X̄ − µ0)/(σ/√n) = Z | Z > Zα}.
Note- (X̄ − µ0)/(σ/√n) = Z is called the test statistic = T.S.

Note- We will show as α decreases β increases. Need area interpretation of α and β. X̄ ~ N(mean=µ, S.D.=σ/√n = 1).

α = P(rejecting H0 | H0) = Area of the C.R. under H0-distribution of X̄.
β = P(accepting H0 | H1) = Area of the A.R. under H1-distribution of X̄.
Under H0 (µ=66), X̄ ~ N(mean=µ=66, S.D.=1); under H1 (µ=68), X̄ ~ N(mean=µ=68, S.D.=1). Draw both the curves in same graph.
Testing of hypothesis-
Note- As α (area of C.R. under H0-curve) decreases, β (area of the A.R. under H1-curve) increases.
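Sketch- The trade-off between α and β can be seen numerically for the height example (µ0 = 66, µ1 = 68, σ = 4, n = 16). The function name beta_right_tailed is an assumption for illustration; it uses exact normal quantiles, so the cut-off for α = .05 is 67.645 rather than the rounded 67.64.

```python
from math import sqrt
from statistics import NormalDist

def beta_right_tailed(mu0, mu1, sigma, n, alpha):
    """P(accepting H0 | mu = mu1) for the right-tailed C.R. (hypothetical helper)."""
    se = sigma / sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # critical value C.V.
    return NormalDist(mu1, se).cdf(cutoff)                # area of A.R. under H1 curve

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(beta_right_tailed(66, 68, 4, 16, alpha), 4))
```

The printed β values should grow as α shrinks, which is the statement in the note above.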

Case 2 (H1: µ < µ0)- Researcher thinks the population mean has decreased, then H1 now becomes H1: µ < 66”.
In that case, if X̄ ≪ 66” then reject H0. So will reject for bottom 5% X̄-values under the H0-distribution of X̄.

C.V. is X̄ = 66 − 1.64(1) = 64.36, C.R. = {X̄ | X̄ < 66 − 1.64(1)}. General format is C.R. = {X̄ | X̄ < µ0 − Zα(σ/√n)}.
In table accessible form C.R. = {(X̄ − 66)/1 = Z | Z < −1.64}. General format C.R. = {(X̄ − µ0)/(σ/√n) = Z | Z < −Zα}.
Testing of hypothesis-
Case 3 (H1: µ ≠ µ0)- Researcher thinks the population mean now is different. Then H1 now becomes H1: µ ≠ 66.
If X̄ ≪ 66 or X̄ ≫ 66, then reject H0. So, reject H0 for bottom 2.5% samples or top 2.5% samples under H0-distribution of X̄.

C.R. = {X̄ | X̄ < 66 − 1.96(1) OR X̄ > 66 + 1.96(1)} and General format (GF) is C.R. = {X̄ | X̄ < µ0 − Zα/2(σ/√n) OR X̄ > µ0 + Zα/2(σ/√n)}.

C.R. = {(X̄ − 66)/1 = Z | Z < −1.96 OR Z > 1.96} = {(X̄ − 66)/1 = Z | |Z| > 1.96}.
GF C.R. = {(X̄ − µ0)/(σ/√n) = Z | Z < −Zα/2 OR Z > Zα/2} = {(X̄ − µ0)/(σ/√n) = Z | |Z| > Zα/2}.

Note- Case 1- right tailed C.R./test, Case 2- left tailed C.R./test, In general- one tailed C.R./test. Case 3- two tailed C.R./test.

How to do problems in testing of hypothesis? – 1) write H0 and H1, 2) compute T.S. under H0, 3) draw the C.R., 4)
accept/reject H0, 5) conclusion with reference to the question.
Testing single population mean 𝜇
Ex- Height is normally distributed with S.D. 4”. A random sample of size 16 yields sample mean as 68”. Can you say the
population mean height is more than 66” at level of significance .05?
Ans.- 1) H0: µ=66” vs H1: µ > 66”. (Right tailed C.R.)
2) T.S.|H0 = (X̄ − µ0)/(σ/√n) = Z = (68 − 66)/(4/√16) = 2.
3) C.R.- Right tailed, Z > Z.05 = 1.64.
4) Reject H0.
5) The population mean height is more than 66”.
Power of a test/C.R. for given value of the parameter µ-
Power γ = 1 − β = 1 − P(accepting H0|H1) = P(rejecting H0|H1) = Area of the C.R. under H1-distribution of X̄.
Note- i) It is the probability of a correct decision, ii) It is the probability of ascertaining the correctness of the H1.
p-value- Smallest level of significance at which H0 is rejected for the given
sample data. H0 is rejected for the given sample data if α > p-value
and accepted if α < p-value.
Ex (continued)- Find p-value of the above test.
Ans.- P(Z > 2) = 1 -.9772 = .0228. Implies p-value is .0228.
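Sketch- The five steps for the right-tailed Z-test and its p-value can be checked as below. The function name right_tailed_z_test is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Return (T.S., p-value, reject H0?) for H1: mu > mu0 (hypothetical helper)."""
    z = (xbar - mu0) / (sigma / sqrt(n))    # test statistic under H0
    p_value = 1 - NormalDist().cdf(z)       # area to the right of the T.S.
    return z, p_value, p_value < alpha      # reject H0 when alpha > p-value

z, p, reject = right_tailed_z_test(68, 66, 4, 16)
print(z, round(p, 4), reject)
```

For the height example this gives T.S. = 2 and a p-value of about .0228, so H0 is rejected at α = .05.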
Testing of hypothesis- Power of a Test/ C.R.
Ex(continued)- Ht. = X ~ N(mean=µ, S.D.=4). A random sample of size 16 is drawn for testing H0:µ=66 vs H1:µ>66 at α=.05.
We have already seen the C.R. = {X̄ | X̄ > 67.64}. Compute power at µ = 68, 69, 70.

X̄ ~ N(mean = µ, S.E. = σ/√n = 4/√16 = 1)
For µ = 66, X̄ ~ N(mean = 66, S.E. = 1)
For µ = 68, X̄ ~ N(mean = 68, S.E. = 1)
For µ = 69, X̄ ~ N(mean = 69, S.E. = 1)
For µ = 70, X̄ ~ N(mean = 70, S.E. = 1)
[Figure: the four curves of X̄ with the AR to the left and the CR to the right of 67.64.]

γ(µ=68) = P(X̄ > 67.64 | µ=68) = P(Z > (67.64 − 68)/1) = P(Z > −.36) = 1 − .3594 = .6406
γ(µ=69) = P(X̄ > 67.64 | µ=69) = P(Z > (67.64 − 69)/1) = P(Z > −1.36) = 1 − .0869 = .9131
γ(µ=70) = P(X̄ > 67.64 | µ=70) = P(Z > (67.64 − 70)/1) = P(Z > −2.36) = 1 − .0091 = .9909.
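Sketch- Power at a given µ is the area of the C.R. under the curve of X̄ centered at that µ. The function name power_right_tailed is an assumption for illustration; the default cut-off 67.64 and S.E. 1 are taken from the example above.

```python
from statistics import NormalDist

def power_right_tailed(mu, cutoff=67.64, se=1.0):
    """P(X-bar > cutoff) when X-bar ~ N(mu, se) (hypothetical helper)."""
    return 1 - NormalDist(mu, se).cdf(cutoff)

for mu in (68, 69, 70):
    print(mu, round(power_right_tailed(mu), 4))
```

The power rises towards 1 as the true µ moves further above µ0 = 66.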
Testing about 𝜇- Determination of n for given 𝛼 and 𝛽
Ex (continued) Ht. = X ~ N(mean=µ, S.D.=σ). Want to determine n required to test H0: µ = µ0 vs H1: µ > µ0 (say µ = µ1 > µ0)
for given values of α and β.
Note- X̄ ~ N(mean=µ, S.E.=σ/√n)
[Figure: H0 and H1 curves of X̄ centered at µ0 and µ1; the cut-off C separates the AR (left) from the CR (right); β is the area under the H1 curve to the left of C and α is the area under the H0 curve to the right of C.]
C = µ0 + Zα(σ/√n) = µ1 − Zβ(σ/√n) ⟹ (Zα + Zβ)(σ/√n) = (µ1 − µ0) ⟹ n = σ²(Zα + Zβ)²/(µ1 − µ0)²
Ex- Ht. = X ~ N(mean=µ, S.D.=σ=4). Find the sample size required to test H0: µ = µ0 = 66 vs H1: µ > µ0 (say µ1 = 68)
given α = 0.05 and β = 0.10.
Ans.- n = σ²(Zα + Zβ)²/(µ1 − µ0)² = 4²(1.645 + 1.28)²/(68 − 66)² = 34.22 ≅ 35
Note- For a two-tailed test/C.R., replace Zα by Zα/2.
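Sketch- The sample size formula n = σ²(Zα + Zβ)²/(µ1 − µ0)² can be coded directly. The function name sample_size_for_test and the two_tailed flag are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_test(sigma, mu0, mu1, alpha, beta, two_tailed=False):
    """n needed to test H0: mu = mu0 against mu = mu1 with errors alpha, beta (hypothetical helper)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - (alpha / 2 if two_tailed else alpha))
    z_beta = std_normal.inv_cdf(1 - beta)
    return ceil(sigma ** 2 * (z_alpha + z_beta) ** 2 / (mu1 - mu0) ** 2)

print(sample_size_for_test(4, 66, 68, alpha=0.05, beta=0.10))
```

For the height example this reproduces n = 35; passing two_tailed=True applies the replacement of Zα by Zα/2 mentioned in the note.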
Testing µ- σ known and unknown
T.S.|H0 = (P.E. − Parameter under H0)/(S.E. or Est. S.E. of the P.E.)

Ex- A random sample of size 100 yields mean = 18000 and S.D. = 4000. Can you say the population mean
income has decreased from 19000 at α = .05? Find the p-value.

Ans.- n = 100 > 30, σ unknown. Use Sl. No. 2.
1) H0: µ=19000 vs H1: µ < 19000. (Left tailed C.R.)
2) T.S.|H0 = (X̄ − µ0)/(S/√n) = Z = (18000 − 19000)/(4000/√100) = −1000/400 = −2.5.
3) C.R.- Left tailed, Z < −Z.05 = −1.64.
4) Reject H0.
5) The population mean is less than 19000.
p-value = Area from −∞ to −2.5 = 0.0062.
Testing µ- Examples
Ex- A random sample of size 4 from the population of height yields the following observations: 64, 66, 68, 70. Can you say the
population mean height is 70” at α = .05?
Ans.- n small, σ unknown, population normal. Can use t- T.S. at Sl. No. 4. X̄ = 67, S² = ∑(X − X̄)²/(n−1) = 20/3. S = 2.582.
1) H0: µ=70” vs H1: µ ≠ 70”. (Two tailed C.R.)
2) T.S.|H0 = (X̄ − µ0)/(S/√n) = tn−1 = t3 = (67 − 70)/(2.582/√4) = −2.32.
3) C.R.- Two tailed, |t| > t.025,3 = 3.182.
t.10,3 = 1.638
t.05,3 = 2.353
So area in the lower tail of the computed T.S. value −2.32 lies between 5% and 10%.
P-value lies between 2*5% = 10% and 2*10% = 20%.
4) Do not reject H0.
5) Can’t say the population mean height is not 70”.
Inference about population proportion p
Ex- Population proportion BPL, cure by a drug, defective in a production line, A-type blood, credit card holders.
Point estimator = p̄ = X/n where X is the number of successes in n trials or in a random sample of size n. X ~ Bin(n,p).
Note- E(p̄) = µ_p̄ = p, S.E.(p̄) = σ_p̄ = √(pq/n).
Theorem- p̄ ~ N(mean = µ_p̄ = p, S.E.(p̄) = σ_p̄ = √(pq/n)) OR (p̄ − p)/√(pq/n) = Z if np ≥ 5 and nq ≥ 5.

Testing p- 1) H0: p = p0 vs H1: p > p0 OR p < p0 OR p ≠ p0.
2) T.S.|H0 = (p̄ − p0)/√(p0q0/n) = Z if np0 ≥ 5 and nq0 ≥ 5.
3) C.R.- For α = .05.
4) Reject/accept H0
5) Conclusion
Testing Population Proportion p- Example
Ex- In a random sample of 100 families 60 are BPL. Can you say the population proportion BPL is 65% at α = .05?
Ans.- np0 = 100*.65 = 65 ≥ 5 and nq0 = 35 ≥ 5. Can use Z-test statistic. p̄ = X/n = 60/100 = .6.
1) H0: p = .65 vs H1: p ≠ .65. (Two tailed C.R.)
2) T.S.|H0 = (p̄ − p0)/√(p0q0/n) = Z = (.6 − .65)/√((.65*.35)/100) = −1.05.
3) C.R.- Two tailed, |Z| > 1.96.
4) Accept H0.
5) The population proportion BPL can be 65%.
p-value = 2 times the area from −∞ to −1.05
= 2*(.1469) = .2938.
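Sketch- The two-tailed test for a proportion can be checked as below. The function name two_tailed_proportion_test is an assumption for illustration. Because it does not round the T.S. to two decimals before looking up the area, its p-value (about .294) differs slightly from the table-based .2938.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_proportion_test(successes, n, p0, alpha=0.05):
    """Return (T.S., p-value, reject H0?) for H1: p != p0 (hypothetical helper)."""
    p_bar = successes / n
    z = (p_bar - p0) / sqrt(p0 * (1 - p0) / n)    # S.E. uses p0, the value under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails
    return z, p_value, p_value < alpha

z, p, reject = two_tailed_proportion_test(60, 100, 0.65)
print(round(z, 2), round(p, 4), reject)
```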
Developing Null and Alternative Hypotheses
• Alternative Hypothesis as a Research Hypothesis
• Example:
A new teaching method is developed that is believed to be better than
the current method.
• Alternative Hypothesis: 𝑯𝟏 : 𝝁 > 𝝁𝟎
The new teaching method is better.
• Null Hypothesis:
The new method is no better than the old method.

Developing Null and Alternative Hypotheses
• Alternative Hypothesis as a Research Hypothesis
• Example:
A new sales force bonus plan is developed in an attempt to increase
sales.
• Alternative Hypothesis: 𝑯𝟏 : 𝝁 > 𝝁𝟎
The new bonus plan increases sales.
• Null Hypothesis:
The new bonus plan does not increase sales.

Developing Null and Alternative Hypotheses
• Alternative Hypothesis as a Research Hypothesis
• Example:
A new drug is developed with the goal of lowering blood pressure more
than the existing drug.
• Alternative Hypothesis: 𝑯𝟏 : 𝝁 > 𝝁𝟎
The new drug lowers blood pressure more than the existing drug.
• Null Hypothesis:
The new drug does not lower blood pressure more than the existing
drug.

Developing Null and Alternative Hypotheses
• Null Hypothesis as an Assumption to be Challenged
• Example:
The label on a soft drink bottle states that it contains 67.6 fluid ounces.
• Null Hypothesis:
The label is correct. µ ≥ 67.6 ounces.
• Alternative Hypothesis:
The label is incorrect. µ < 67.6 ounces.

Summary of Forms for Null and Alternative Hypotheses
about a Population Mean
• The equality part of the hypotheses always appears in the null hypothesis.
• In general, a hypothesis test about the value of a population mean µ must
take one of the following three forms (where µ0 is the hypothesized value of
the population mean).
H0: µ ≥ µ0        H0: µ ≤ µ0        H0: µ = µ0
Ha: µ < µ0        Ha: µ > µ0        Ha: µ ≠ µ0

One-tailed        One-tailed        Two-tailed
(lower-tail)      (upper-tail)

Null and Alternative Hypotheses
• Example: Metro EMS
A major west coast city provides one of the most comprehensive
emergency medical services in the world. Operating in a multiple hospital
system with approximately 20 mobile medical units, the service goal is to
respond to medical emergencies with a mean time of 12 minutes or less.
The director of medical services wants to formulate a hypothesis test that
could use a sample of emergency response times to determine whether or
not the service goal of 12 minutes or less is being achieved.

Null and Alternative Hypotheses
H0: µ ≤ 12 The emergency service is meeting the response goal;
no follow-up action is necessary.

Ha: µ > 12 The emergency service is not meeting the response
goal; appropriate follow-up action is necessary.

where: µ = mean response time for the population
of medical emergency requests

One-Tailed Tests About a Population Mean: σ Known
• Example: Metro EMS
The response times for a random sample of 40 medical emergencies were
tabulated. The sample mean is 13.25 minutes. The population standard
deviation is believed to be 3.2 minutes.
The EMS director wants to perform a hypothesis test, with a .05 level of
significance, to determine whether the service goal of 12 minutes or less is
being achieved.

One-Tailed Tests About a Population Mean: σ Known
• p-Value and Critical Value Approaches

1. Develop the hypotheses. H0: µ ≤ 12, Ha: µ > 12 (Right Tailed CR)

2. Specify the level of significance. α = .05

3. Compute the value of the test statistic.

z = (x̄ − µ0)/(σ/√n) = (13.25 − 12)/(3.2/√40) = 2.47

• CR
[Figure: sampling distribution of z = (x̄ − µ0)/(σ/√n); rejection region of area α = .05 to the right of zα = 1.645; the computed z = 2.47 lies in the rejection region (p-value < α, so reject H0).]
• T.S. computed = 2.47 > 1.645, so we reject H0.
• We conclude Metro EMS is not meeting the response goal of 12 minutes.

• p-Value Approach

Compute the p-value.

For z = 2.47, cumulative probability = .9932.
p-value = 1 − .9932 = .0068

Because p-value = .0068 < α = .05, we reject H0.

Two-Tailed Tests About a Population Mean: σ Known
• Example: Glow Toothpaste
The production line for Glow toothpaste is designed to fill tubes with a
mean weight of 6 oz. Periodically, a sample of 30 tubes will be selected in
order to check the filling process.
Quality assurance procedures call for the continuation of the filling
process if the sample results are consistent with the assumption that the
mean filling weight for the population of toothpaste tubes is 6 oz.; otherwise
the process will be adjusted.

Two-Tailed Tests About a Population Mean: σ Known
• Example: Glow Toothpaste
Assume that a sample of 30 toothpaste tubes provides a sample mean of
6.1 oz. The population standard deviation is believed to be 0.2 oz.
Perform a hypothesis test, at the .03 level of significance, to help
determine whether the filling process should continue operating or be
stopped and corrected.

Two-Tailed Tests About a Population Mean: σ Known
• p-Value and Critical Value Approaches

1. Determine the hypotheses. H0: µ = 6, Ha: µ ≠ 6 (Two Tailed CR)

2. Specify the level of significance. α = .03

3. Compute the value of the test statistic.

z = (x̄ − µ0)/(σ/√n) = (6.1 − 6)/(.2/√30) = 2.74

• Critical Value Approach
[Figure: sampling distribution of z = (x̄ − µ0)/(σ/√n); reject H0 in each tail of area α/2 = .015, below −2.17 and above 2.17; do not reject H0 in between; the computed z = 2.74 lies in the upper rejection region.]
• |Z-computed| = 2.74 > 2.17 = z.015. So, we reject H0.
• The mean filling weight of toothpaste is not 6 ounces.

Compute the p-value.

For z = 2.74, cumulative probability = .9969
p-value = 2(1 − .9969) = .0062

Because p-value = .0062 < α = .03, we reject H0.

• CR
[Figure: half of the p-value (.0031) lies below z = −2.74 and half above z = 2.74, inside the tails of area α/2 = .015 that begin at −zα/2 = −2.17 and zα/2 = 2.17.]

Confidence Interval Approach to
Two-Tailed Tests About a Population Mean
• The 97% confidence interval for µ is
x̄ ± zα/2(σ/√n) = 6.1 ± 2.17(.2/√30) = 6.1 ± .07924
or 6.02076 to 6.17924
• Because the hypothesized value for the population mean, µ0 = 6, is not in
this interval, the hypothesis-testing conclusion is that the null hypothesis,
H0: µ = 6, can be rejected.

Tests About a Population Mean: σ Unknown
• Test Statistic: t = (x̄ − µ0)/(s/√n)
• This test statistic has a t distribution with n − 1 degrees of freedom.

Example: Highway Patrol
• One-Tailed Test About a Population Mean: σ Unknown
A State Highway Patrol periodically samples vehicle speeds at various
locations on a particular roadway. The sample of vehicle speeds is used to
test the hypothesis H0: µ ≤ 65.
The locations where H0 is rejected are deemed the best locations for radar
traps. At Location F, a sample of 64 vehicles shows a mean speed of 66.2 mph
with a standard deviation of 4.2 mph. Use α = .05 to test the hypothesis.

One-Tailed Test About a Population Mean: σ Unknown
• p-Value and Critical Value Approaches

1. Determine the hypotheses. H0: µ ≤ 65, Ha: µ > 65 (Right Tailed CR)

2. Specify the level of significance. α = .05

3. Compute the value of the test statistic.

t = (x̄ − µ0)/(s/√n) = (66.2 − 65)/(4.2/√64) = 2.286

[Figure: t curve with 63 d.f.; reject H0 to the right of t.05,63 = 1.669 (area α = .05), do not reject H0 to the left; the computed t = 2.286 lies between t.025,63 = 1.998 and t.01,63 = 2.387, so the p-value is below .025.]

4. t-computed = 2.286 > 1.669 = t.05,63, so we reject H0.

5. Mean speed at location F is more than 65 mph. Hence it is a good location for a radar trap.

Compute the p-value.

t.025,63 = 1.998 < 2.286 < t.01,63 = 2.387
Hence, .01 < p-value < .025
Because p-value < α = .05, we reject H0.

Two-Tailed Test About a Population Proportion
• Example: National Safety Council (NSC)
For a Christmas and New Year’s week, the National Safety Council
estimated that 500 people would be killed and 25,000 injured on the nation’s
roads. The NSC claimed that 50% of the accidents would be caused by drunk
driving.
A sample of 120 accidents showed that 67 were caused by drunk driving.
Use these data to test the NSC’s claim with α = .05.

Two-Tailed Test About a Population Proportion
• p-Value and Critical Value Approaches
1. Determine the hypotheses. H0: p = .5 and Ha: p ≠ .5 (Two Tailed CR)

2. Specify the level of significance. α = .05

3. Compute the value of the test statistic. np0 = 120*.5 = 60, nq0 = 60

σ_p̄ = √(p0(1 − p0)/n) = √(.5(1 − .5)/120) = .045644

T.S. = z = (p̄ − p0)/σ_p̄ = (67/120 − .5)/.045644 = 1.28

4. CR-
[Figure: standard normal curve with CR of area 0.025 in each tail, below −1.96 and above 1.96, and AR in between; the computed T.S. = Z = 1.28 falls in the AR.]

5. T.S. computed falls in the AR. Hence do not reject H0.

6. The sample does not contradict the NSC’s claim.

• p-Value Approach
4. Compute the p-value.
For z = 1.28, cumulative probability = .8997
p-value = 2(1 − .8997) = .2006

5. Determine whether to reject H0.

Because p-value = .2006 > .05 = α, we cannot reject H0.

Comparison of Two Population Means µ1, µ2- Indep. samples
Ex- 1) Comparing population mean heights of two tribes
2) Mean income of two cities.
3) Mean wages of two firms.
4) Mean weights before and after a weight reducing program.
5) Mean income before and after a poverty alleviation program.

Testing / comparing µ1 and µ2 for independent samples-
Parameter of interest is (µ1 − µ2)
Point estimator = (X̄1 − X̄2) where X̄1 = ∑X1/n1, X̄2 = ∑X2/n2.
E(X̄1 − X̄2) = (µ1 − µ2), S.E.(X̄1 − X̄2) = √(σ1²/n1 + σ2²/n2)

Theorem- Let X11, …, X1n1 be a random sample of size n1 from a N(µ1,σ1) and X21, …, X2n2 be a random sample of size n2 from
N(µ2,σ2). Then [(X̄1 − X̄2) − (µ1 − µ2)]/√(σ1²/n1 + σ2²/n2) = Z.
C.I. for (µ1 − µ2) is (X̄1 − X̄2) ± Zα/2 √(σ1²/n1 + σ2²/n2)

Testing (µ1 − µ2)-
1) H0: µ1 − µ2 = 0 vs H1: µ1 − µ2 > 0, µ1 − µ2 < 0, µ1 − µ2 ≠ 0
2) T.S.|H0 = [(X̄1 − X̄2) − (µ1 − µ2)]/√(σ1²/n1 + σ2²/n2) |H0 = Z.
3) CR
4) Reject/accept H0
5) Conclusion
Comparison of Two Means (µ1, µ2)- Independent Samples
Ex- A random sample of size 25 from the population of heights of tribe1 with S.D. 4” yields mean height 68”; a random sample of
size 16 from the population of heights of tribe2 with S.D. 3” yields mean 65”. Can you say population mean height of tribe1 is
different from that of tribe2 at α = .05? Compute the p-value.
Ans.- 1) H0: µ1−µ2 = 0 vs H1: µ1−µ2 ≠ 0. (Two tailed C.R.)
2) T.S.|H0 = [(X̄1 − X̄2) − (µ1 − µ2)]/√(σ1²/n1 + σ2²/n2) |H0
= Z = [(68 − 65) − 0]/√(16/25 + 9/16) = 2.74
3) CR- Two tailed, |Z| > 1.96.
4) Reject H0.
5) The population mean height of tribe1 is different from that of tribe2.
p-value = 2 times the area from 2.74 to ∞ = 2*(1 − .9969) = .0062.

Ex (modified)- Can you say the population mean heights of the two tribes differ by 2” at α = .05? Use same data.
1) H0: µ1−µ2 = 2 vs H1: µ1−µ2 ≠ 2. (Two tailed C.R.)
2) T.S.|H0 = [(X̄1 − X̄2) − (µ1 − µ2)]/√(σ1²/n1 + σ2²/n2) |H0
= Z = [(68 − 65) − 2]/√(16/25 + 9/16) = .91.
4) Accept H0.
5) Yes, the population mean heights can differ by 2”.
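Sketch- The two-sample Z statistic and its two-tailed p-value can be checked as below. The function name two_sample_z and the parameter diff0 (the value of µ1 − µ2 under H0) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, sd1, sd2, n1, n2, diff0=0.0):
    """Return (T.S., two-tailed p-value) for independent samples (hypothetical helper)."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)          # S.E. of (X1-bar - X2-bar)
    z = ((xbar1 - xbar2) - diff0) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

z, p = two_sample_z(68, 65, 4, 3, 25, 16)             # H0: mu1 - mu2 = 0
print(round(z, 2), round(p, 4))
z_mod, _ = two_sample_z(68, 65, 4, 3, 25, 16, diff0=2)  # H0: mu1 - mu2 = 2
print(round(z_mod, 2))
```

The same function applies to large samples with estimated S.D.s; for example two_sample_z(16000, 17000, 3000, 4000, 100, 120) gives a T.S. of about −2.12.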
Comparison of means µ1, µ2 – Independent samples
Table for inference about two population means µ1, µ2- (µ1 − µ2)
T.S.|H0 = (P.E. − Parameter under H0)/(S.E. or Est. S.E. of the P.E.)

C.I. for parameter is P.E. ± (Table value)(S.E. or Est. S.E. of P.E.)

Note- For normal populations with σ1 and σ2 unknown use t-distribution


Comparison of means in Independent Samples- Examples
Ex- A random sample of size 100 from city1 yields mean income 16000 and S.D. 3000; a random sample of size 120 from city2 yields mean income 17000 and S.D. 4000. Can you say the population mean income of city1 is less than that of city2 at α=.05?
Ans.- n1 = 100 ≥ 30 and n2 = 120 ≥ 30, so we can use the Z- T.S. given in Sl. No. 2. Given X̄1 = 16000, S1 = 3000, X̄2 = 17000, S2 = 4000.
1) H0: µ1 - µ2 = 0 vs H1: µ1 - µ2 < 0. (Left tailed C.R.)
2) T.S.|H0 = $[(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)] / \sqrt{S_1^2/n_1 + S_2^2/n_2}$ |H0 = Z
= $[(16000 - 17000) - 0] / \sqrt{3000^2/100 + 4000^2/120}$ = -1000/472.58 = -2.12.
3) CR- Reject H0 if Z < -Z.05 = -1.645.
4) Reject H0.
5) Population mean income of city1 is less than that of city2.
6) 95% C.I. for (µ1 - µ2) is $(\bar X_1 - \bar X_2) \mp Z_{\alpha/2}\sqrt{S_1^2/n_1 + S_2^2/n_2}$ OR -1000 ∓ 1.96(472.58) OR -1000 ∓ 926.26 OR (-1926.26, -73.74)
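As a rough numerical check of the city income example, the sketch below evaluates the large-sample Z statistic and the 95% C.I. from the summary statistics. The helper name `diff_means_z` is hypothetical, and the table value 1.96 is passed in rather than computed. Evaluating the square root directly gives an estimated S.E. of about 472.58.

```python
from math import sqrt


def diff_means_z(xbar1, s1, n1, xbar2, s2, n2, z_crit=1.96):
    """Large-sample Z statistic (H0: mu1 - mu2 = 0) and C.I. for mu1 - mu2."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)  # estimated S.E. of (xbar1 - xbar2)
    diff = xbar1 - xbar2                # point estimate of mu1 - mu2
    z = diff / se
    ci = (diff - z_crit * se, diff + z_crit * se)
    return z, se, ci


z, se, (lo, hi) = diff_means_z(16000, 3000, 100, 17000, 4000, 120)
print(round(se, 2), round(z, 2), round(lo, 2), round(hi, 2))
```

The interval lies entirely below 0, which agrees with rejecting H0 in the left-tailed test.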
Ex- Random samples of sizes 5 and 4 respectively from the populations of height of tribe1 and tribe2 are given. Tribe1- 64”, 66”, 68”, 70”, 72”; Tribe2- 62”, 64”, 66”, 68”. Can you say on an average tribe1 is taller than tribe2 at α=.05?
Ans.- n1=5<30, n2=4<30 and σ1, σ2 unknown. Can use t- T.S. X̄1 = ∑X1/n1 = 68, X̄2 = ∑X2/n2 = 65. $S_1^2$ = ∑(X1 − X̄1)²/(n1-1) = 40/4 = 10, $S_2^2$ = ∑(X2 − X̄2)²/(n2-1) = 20/3 = 6.667.
1) H0: µ1 - µ2 = 0 vs H1: µ1 - µ2 > 0. (Right tailed C.R.)
2) T.S.|H0 = $[(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)] / \sqrt{S_1^2/n_1 + S_2^2/n_2}$ = $t_K$, where
K = $\dfrac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\frac{1}{n_1-1}\left(S_1^2/n_1\right)^2 + \frac{1}{n_2-1}\left(S_2^2/n_2\right)^2}$ = $\dfrac{\left(10/5 + 6.667/4\right)^2}{\frac{1}{5-1}\left(10/5\right)^2 + \frac{1}{4-1}\left(6.667/4\right)^2}$ = 6.98 ≅ 7
T.S. computed = $[(68-65) - 0] / \sqrt{10/5 + 6.667/4}$ = 3/1.915 = 1.567.
3) CR- Reject H0 if t > t.05,7 = 1.895 (area .05 in the right tail). T.S. computed = 1.567 lies in the A.R.
4) Accept H0.
5) The population average height of tribe1 is not more than that of tribe2.

95% C.I. for (µ1 - µ2) is $(\bar X_1 - \bar X_2) \mp t_{.025,7}\sqrt{S_1^2/n_1 + S_2^2/n_2}$ OR 3 ∓ (2.365)(1.915) OR 3 ∓ 4.529 OR -1.529 to 7.529.
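The t statistic and the degrees of freedom K for the small-sample tribe example can be reproduced from the raw heights. This is a sketch under the assumption that the unequal-variance formula for K shown above is the intended one; `welch_t` is an illustrative name, and the critical value 2.365 is taken from the t-table rather than computed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample1, sample2, d0=0.0):
    """t statistic, approximate d.f. (K) and estimated S.E. for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1  # S1^2 / n1
    v2 = variance(sample2) / n2  # S2^2 / n2
    se = sqrt(v1 + v2)
    t = (mean(sample1) - mean(sample2) - d0) / se
    k = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, k, se


tribe1 = [64, 66, 68, 70, 72]
tribe2 = [62, 64, 66, 68]
t, k, se = welch_t(tribe1, tribe2)

t_table = 2.365  # t.025,7 from the t-table (not computed here)
ci = (3 - t_table * se, 3 + t_table * se)
print(round(t, 3), round(k, 2), [round(x, 3) for x in ci])
```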
Comparison of µ1, µ2 in Dependent (paired) Samples

2) T.S.|H0 = $(\bar d - \mu_0)/(S_d/\sqrt{n})$ = $t_{n-1}$, with n - 1 = 5 - 1 = 4 d.f.
d̄ = ∑d/n = 20/5 = 4. $S_d^2$ = (∑d² - n d̄²)/(n-1) = (146 - 5*4²)/4 = 16.5.

T.S. computed = $(4 - 0)/\sqrt{16.5/5}$ = 2.20.

4) Reject H0.
5) Program is effective.
Interval Estimation of µ1 - µ2: σ1 and σ2 Known
• Example: Par, Inc.
Par, Inc. is a manufacturer of golf equipment and has developed a new golf ball that has been designed to provide “extra distance.”
In a test of driving distance using a mechanical driving device, a sample of Par golf balls was compared with a sample of golf balls made by Rap, Ltd., a competitor. The sample statistics appear below.

               Sample #1    Sample #2
               Par, Inc.    Rap, Ltd.
Sample Size    120 balls    80 balls
Sample Mean    295 yards    278 yards

Based on data from previous driving distance tests, the two population standard deviations are known with σ1 = 15 yards and σ2 = 20 yards.
Let us develop a 95% confidence interval estimate of the difference between the mean driving distances of the two brands of golf ball.

Point Estimate of µ1 - µ2
Point estimate of µ1 - µ2 = x̄1 − x̄2 = 295 - 278 = 17 yards

where:
µ1 = mean distance for the population of Par, Inc. golf balls
µ2 = mean distance for the population of Rap, Ltd. golf balls

Interval Estimation of µ1 - µ2: σ1 and σ2 Known
$\bar x_1 - \bar x_2 \pm z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ = $17 \pm 1.96\sqrt{(15)^2/120 + (20)^2/80}$

17 ± 5.14 or 11.86 yards to 22.14 yards

We are 95% confident that the difference between the mean driving distances of Par, Inc. balls and Rap, Ltd. balls is 11.86 to 22.14 yards.

Hypothesis Tests About µ1 - µ2: σ1 and σ2 Known
• Example: Par, Inc.
Can we conclude, using α = .01, that the mean driving distance of Par, Inc. golf balls is greater than the mean driving distance of Rap, Ltd. golf balls?

• p-Value and Critical Value Approaches

1. Develop the hypotheses. H0: µ1 - µ2 ≤ 0
Ha: µ1 - µ2 > 0 (Right-tailed CR)
where:
µ1 = mean distance for the population of Par, Inc. golf balls
µ2 = mean distance for the population of Rap, Ltd. golf balls

2. Specify the level of significance. α = .01

Critical Value Approach
3. Compute the value of the test statistic.
T.S.|H0 = $\dfrac{(\bar x_1 - \bar x_2) - D_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$ = Z
Z computed = $\dfrac{(295 - 278) - 0}{\sqrt{(15)^2/120 + (20)^2/80}}$ = 17/2.62 = 6.49

4. CR

Z-computed = 6.49 > 2.33 = z.01. Hence, reject H0.

5. We can conclude that the mean driving distance of Par golf ball is greater than that of Rap golf ball.

• p-Value Approach

4. Compute the p-value.
For z = 6.49, the p-value < .0001.

5. Determine whether to reject H0.
Because p-value < .01 = α, we reject H0.

Difference Between Two Population Means: σ1 and σ2 Unknown
• Example: Specific Motors
Specific Motors of Detroit has developed a new automobile known as the M car. 24 M cars and 28 J cars (from Japan) were road tested to compare miles-per-gallon (mpg) performance. The sample statistics are shown below.

                    Sample #1    Sample #2
                    M Cars       J Cars
Sample Size         24 cars      28 cars
Sample Mean         29.8 mpg     27.3 mpg
Sample Std. Dev.    2.56 mpg     1.81 mpg

Let us develop a 90% confidence interval estimate of the difference between the mpg performances of the two models of automobile.

Point Estimate of µ1 - µ2
Point estimate of µ1 - µ2 = x̄1 − x̄2 = 29.8 - 27.3 = 2.5 mpg

where:
µ1 = mean miles-per-gallon for the population of M cars
µ2 = mean miles-per-gallon for the population of J cars

Interval Estimation of µ1 - µ2: σ1 and σ2 Unknown
The degrees of freedom for $t_{\alpha/2}$ are:
df = $\dfrac{\left((2.56)^2/24 + (1.81)^2/28\right)^2}{\frac{1}{24-1}\left((2.56)^2/24\right)^2 + \frac{1}{28-1}\left((1.81)^2/28\right)^2}$ = 40.59 ≅ 41

with α/2 = .05 and df = 41, t.05,41 = 1.683

$\bar x_1 - \bar x_2 \pm t_{\alpha/2}\sqrt{s_1^2/n_1 + s_2^2/n_2}$

$29.8 - 27.3 \pm 1.683\sqrt{(2.56)^2/24 + (1.81)^2/28}$

2.5 ± 1.051 or 1.449 to 3.551 mpg

We are 90% confident that the difference between the miles-per-gallon performances of M cars and J cars is 1.449 to 3.551 mpg.
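The degrees of freedom, margin of error, and t statistic for the Specific Motors example can be reproduced from the summary statistics alone. The following sketch uses only the standard library; `welch_from_summary` is a hypothetical helper name, and the table value t.05,41 = 1.683 is supplied by hand because the standard library has no t-distribution.

```python
from math import sqrt


def welch_from_summary(xbar1, s1, n1, xbar2, s2, n2):
    """Point estimate, estimated S.E. and approximate d.f. from summary statistics."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return xbar1 - xbar2, se, df


diff, se, df = welch_from_summary(29.8, 2.56, 24, 27.3, 1.81, 28)

t_table = 1.683             # t.05,41 from the t-table (not computed here)
margin = t_table * se       # half-width of the 90% C.I.
t_stat = (diff - 0) / se    # test statistic for H0: mu1 - mu2 <= 0

print(round(df, 2), round(margin, 3), round(t_stat, 3))
```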

Hypothesis Tests About µ1 - µ2: σ1 and σ2 Unknown
• Example: Specific Motors
Can we conclude, using a .05 level of significance, that the miles-per-gallon (mpg) performance of M cars is greater than the miles-per-gallon performance of J cars?

• p-Value and Critical Value Approaches
1. Develop the hypotheses.
H0: µ1 - µ2 ≤ 0
Ha: µ1 - µ2 > 0 (right-tailed test)

where:
µ1 = mean mpg for the population of M cars
µ2 = mean mpg for the population of J cars

Critical Value Approach

2. Specify the level of significance. α = .05

3. Compute the value of the test statistic.

t = $\dfrac{(29.8 - 27.3) - 0}{\sqrt{(2.56)^2/24 + (1.81)^2/28}}$ = 4.003

4. CR
t-computed = 4.003 > 1.683 = t.05,41. Hence reject H0.

5. The mean mpg of M cars is more than that of J cars.

• p-Value Approach
4. Compute the p-value.
df = K = 41

Because t computed = 4.003 > t.05,41 = 1.683, the p-value < .05.

Hence, we reject H0.

Inferences About the Difference Between Two Population Means: Matched Samples
• Example: Express Deliveries
A Chicago-based firm has documents that must be quickly distributed to district offices throughout the U.S. The firm must decide between two delivery services, UPX (United Parcel Express) and INTEX (International Express), to transport its documents.
In testing the delivery times of the two services, the firm sent two reports to a random sample of its district offices with one report carried by UPX and the other report carried by INTEX. Do the data below indicate a difference in mean delivery times for the two services? Use a .05 level of significance.

                   Delivery Time (Hours)
District Office    UPX    INTEX    Difference
Seattle            32     25        7
Los Angeles        30     24        6
Boston             19     15        4
Cleveland          16     15        1
New York           15     13        2
Houston            18     15        3
Atlanta            14     15       -1
St. Louis          10      8        2
Milwaukee           7      9       -2
Denver             16     11        5

• p-Value and Critical Value Approaches
1. Develop the hypotheses.
H0: µd = µU - µI = 0
Ha: µd = µU - µI ≠ 0 (Two-tailed test)

Let µd = the mean of the difference values for the two delivery services for the population of district offices

• Critical Value Approach
2. Specify the level of significance. α = .05
3. Compute the value of the test statistic.
T.S.|H0 = $(\bar d - \mu_d)/(s_d/\sqrt{n})$ = $t_{n-1}$, with n - 1 = 10 - 1 = 9 d.f.
d̄ = ∑di/n = (7 + 6 + ⋯ + 5)/10 = 2.7, $s_d = \sqrt{\sum (d_i - \bar d)^2/(n-1)} = \sqrt{76.1/9}$ = 2.9.
t-computed = $(2.7 - 0)/(2.9/\sqrt{10})$ = 2.94

4. CV and CR

|t-computed| = 2.94 > 2.262 = t.025,9. Hence, reject H0.

5. The mean delivery times of UPX and INTEX are different.

• p-Value Approach

Compute the p-value.

t.01,9 = 2.821 < 2.94 < t.005,9 = 3.250,

For t = 2.94 and df = 9, the p-value is between .02 and .01.

(This is a two-tailed test, so we double the upper-tail areas of .01 and .005.)

Because p-value < α = .05, we reject H0.
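The matched-sample computation can be sketched directly from the delivery-time table. The variable names below are illustrative; the critical value 2.262 is the table value quoted in the slides, not something computed by the code.

```python
from math import sqrt
from statistics import mean, stdev

# Delivery times (hours) for the 10 district offices, in table order
upx = [32, 30, 19, 16, 15, 18, 14, 10, 7, 16]
intex = [25, 24, 15, 15, 13, 15, 15, 8, 9, 11]

# Paired differences d = UPX - INTEX
d = [u - i for u, i in zip(upx, intex)]
n = len(d)

d_bar = mean(d)                       # sample mean of the differences
s_d = stdev(d)                        # sample S.D. of the differences (divisor n - 1)
t_stat = (d_bar - 0) / (s_d / sqrt(n))

t_table = 2.262                       # t.025,9 from the t-table
reject_h0 = abs(t_stat) > t_table
print(d_bar, round(s_d, 2), round(t_stat, 2), reject_h0)
```

Carrying s_d at full precision gives t of about 2.936, which rounds to the 2.94 reported above.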

Inference about population variance σ²
Ex- 1) Population variation of height of a tribe
Parameter = σ².
Point estimator = S² = ∑(Xi − X̄)²/(n-1).

Thm- (n−1)S²/σ² = $\chi^2_{n-1}$

Notation- $\chi^2_{\alpha,k}$ is the value of the χ² distribution with k d.f. that has area α to its right.
.95 = P[$\chi^2_{.975,n-1}$ ≤ (n−1)S²/σ² ≤ $\chi^2_{.025,n-1}$]
(1-α)100% C.I. for σ² is $(n-1)S^2/\chi^2_{\alpha/2,(n-1)}$ to $(n-1)S^2/\chi^2_{1-\alpha/2,(n-1)}$

Ex- A random sample of size 4 from the population of height yields the following observations: 64, 66, 68, 70.
i) Construct a 95% C.I. for the population variance.
ii) Find its width.
iii) Can you say the population S.D. of height is 5”?
Ans.- n=4, X̄ = 67, S² = ∑(Xi − X̄)²/(n-1) = 20/3.
i) LCL = (n-1)S²/$\chi^2_{.025,3}$ = (4 – 1)(20/3)/9.348 = 2.139.
UCL = (n-1)S²/$\chi^2_{.975,3}$ = 20/.216 = 92.593.
ii) Width = 92.593 – 2.139 = 90.454.
iii) σ² = (5)² = 25 ∈ C.I. This implies population S.D. can be assumed to be 5”.
Testing of hypothesis about population variance σ²
1) H0: σ² = $\sigma_0^2$ vs H1: σ² > $\sigma_0^2$, σ² < $\sigma_0^2$, σ² ≠ $\sigma_0^2$.
2) T.S.|H0 = (n-1)S²/$\sigma_0^2$ = $\chi^2_{n-1}$.
3) CR-

4) Reject/accept H0

5) Conclusion w.r.t the question

Ex- (continued) A random sample of size 4 from the population of height yields the following observations: 64, 66, 68, 70. Can you say the population S.D. of height is 5” at α = .05?
Ans.-
1) H0: σ² = 25 vs H1: σ² ≠ 25. (Two tailed C.R.)
2) T.S.|H0 = (n-1)S²/σ² = $\chi^2_{n-1}$ with n - 1 = 4 - 1 = 3 d.f.; T.S. computed = 20/25 = .8
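The confidence limits and the test statistic for the four height observations can be checked with a few lines of code. This is a sketch only: `variance_ci` is a hypothetical helper, and the chi-square table values 9.348 and 0.216 are typed in from the table because the standard library does not provide chi-square quantiles.

```python
from statistics import variance


def variance_ci(sample, chi2_upper, chi2_lower):
    """C.I. for sigma^2: (n-1)S^2 / chi2_upper  to  (n-1)S^2 / chi2_lower."""
    n = len(sample)
    ss = (n - 1) * variance(sample)  # (n-1)S^2 = sum of squared deviations
    return ss / chi2_upper, ss / chi2_lower


heights = [64, 66, 68, 70]
# Table values for 3 d.f.: chi2(.025, 3) = 9.348 and chi2(.975, 3) = 0.216
lcl, ucl = variance_ci(heights, chi2_upper=9.348, chi2_lower=0.216)
width = ucl - lcl

# Test statistic for H0: sigma^2 = 25
ts = (len(heights) - 1) * variance(heights) / 25
print(round(lcl, 3), round(ucl, 3), round(width, 3), round(ts, 2))
```

Since .216 < .8 < 9.348, the computed statistic falls in the acceptance region of the two-tailed test at α = .05, consistent with 25 lying inside the 95% C.I.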
Inferences About a Population Variance
• A variance can provide important decision-making information.
• Consider the production process of filling containers with a liquid detergent
product.
• The mean filling weight is important, but also is the variance of the filling
weights.
• By selecting a sample of containers, we can compute a sample variance for the
amount of detergent placed in a container.
• If the sample variance is excessive, overfilling and underfilling may be occurring
even though the mean is correct.

Examples of Sampling Distribution of (n - 1)s²/σ²
[Figure: chi-square density curves with 2, 5, and 10 degrees of freedom; horizontal axis (n − 1)s²/σ², starting at 0]

Interval Estimation of σ²
• Example: Buyer’s Digest (A)
Buyer’s Digest rates thermostats manufactured for home temperature control. In a recent test, 10 thermostats manufactured by ThermoRite were selected and placed in a test room that was maintained at a temperature of 68°F. The temperature readings of the ten thermostats are shown below.
We will use the 10 readings below to develop a 95% confidence interval estimate of the population variance.

Thermostat     1     2     3     4     5     6     7     8     9     10
Temperature    67.4  67.8  68.2  69.3  69.5  67.0  68.1  68.6  67.9  67.2

For n - 1 = 10 - 1 = 9 d.f. and α = .05
Selected Values from the Chi-Square Distribution Table
Degrees                          Area in Upper Tail
of Freedom   .99     .975    .95     .90     .10      .05      .025     .01
5            0.554   0.831   1.145   1.610   9.236    11.070   12.832   15.086
6            0.872   1.237   1.635   2.204   10.645   12.592   14.449   16.812
7            1.239   1.690   2.167   2.833   12.017   14.067   16.013   18.475
8            1.647   2.180   2.733   3.490   13.362   15.507   17.535   20.090
9            2.088   2.700   3.325   4.168   14.684   16.919   19.023   21.666
10           2.558   3.247   3.940   4.865   15.987   18.307   20.483   23.209

• Sample variance s² provides a point estimate of σ².
s² = ∑(xi − x̄)²/(n − 1) = 6.3/9 = .70
• A 95% confidence interval for the population variance is given by:

(10 − 1)(.70)/19.02 ≤ σ² ≤ (10 − 1)(.70)/2.70, using $\chi^2_{.025,9}$ = 19.02 and $\chi^2_{.975,9}$ = 2.70

.33 < σ² < 2.33

Hypothesis Testing About a Population Variance
• Example: Buyer’s Digest (B)
Recall that Buyer’s Digest is rating ThermoRite thermostats. Buyer’s Digest gives an “acceptable” rating to a thermostat with a temperature variance of 0.5 or less.
Using the 10 readings, we will conduct a hypothesis test (with α = .10) to determine whether the ThermoRite thermostat’s temperature variance is “acceptable”.

Thermostat     1     2     3     4     5     6     7     8     9     10
Temperature    67.4  67.8  68.2  69.3  69.5  67.0  68.1  68.6  67.9  67.2

• Hypotheses: H0: σ² ≤ 0.5
Ha: σ² > 0.5 (right-tailed test)

• T.S.|H0 = (n-1)S²/$\sigma_0^2$ = $\chi^2_{n-1}$, with n - 1 = 10 - 1 = 9 d.f.

χ² computed = (n−1)s²/$\sigma_0^2$ = 9s²/.5 = 9(0.7)/.5 = 12.6
• CV and CR
For n - 1 = 9 d.f. and α = .10, the chi-square table gives CV = $\chi^2_{.10,9}$ = 14.684.
χ² computed = 12.6 < 14.684 = CV. Hence do not reject H0.
[Figure: rejection region is the upper tail of area .10 to the right of 14.684; the computed value 12.6 lies to its left]

• We can conclude ThermoRite thermostat’s temperature variance is acceptable.

• Using the p-Value
• The rejection region for the ThermoRite thermostat example is in the upper tail. Because 12.6 lies between χ² = 4.168 (upper-tail area .90) and χ² = 14.684 (upper-tail area .10), the p-value is less than .90 and greater than .10.
• Because the p-value > α = .10, we cannot reject the null hypothesis.

(The exact p-value is .18156.)
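The sample variance, the 95% C.I., and the test statistic for the ThermoRite readings can be recomputed from the raw data. The sketch below assumes the chi-square table values for 9 d.f. (19.023, 2.700, 14.684) as inputs; the variable names are illustrative.

```python
from statistics import variance

readings = [67.4, 67.8, 68.2, 69.3, 69.5, 67.0, 68.1, 68.6, 67.9, 67.2]
n = len(readings)
s2 = variance(readings)                      # sample variance s^2

# 95% C.I. for sigma^2 with table values for 9 d.f.
ci = ((n - 1) * s2 / 19.023, (n - 1) * s2 / 2.700)

# Right-tailed test of H0: sigma^2 <= 0.5 at alpha = .10
sigma0_sq = 0.5
chi2_stat = (n - 1) * s2 / sigma0_sq
critical_value = 14.684                      # chi2(.10, 9) from the table
reject_h0 = chi2_stat > critical_value

print(round(s2, 2), [round(x, 2) for x in ci], round(chi2_stat, 1), reject_h0)
```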

Inferences About Two Population Variances
• We may want to compare the variances in:
• product quality resulting from two different production processes,
• temperatures for two heating devices, or
• assembly times for two assembly methods.
• We use data collected from two independent random samples, one from
population 1 and another from population 2.
• The two sample variances will be the basis for making inferences about the two
population variances.

Comparing two population variances $\sigma_1^2$, $\sigma_2^2$
Ex- Population variation of heights of two tribes. Variation of weights of two tribes. Variation of income of two towns.

Parameter of interest = $\sigma_1^2/\sigma_2^2$. Point estimator = $S_1^2/S_2^2$.
Theorem- $(S_1^2/S_2^2)/(\sigma_1^2/\sigma_2^2)$ = $F_{n_1-1,\,n_2-1}$.
Notation- $F_{\alpha,(k_1,k_2)}$

Testing $\sigma_1^2$, $\sigma_2^2$ – (α = .05)
1) H0: $\sigma_1^2/\sigma_2^2$ = 1 vs H1: $\sigma_1^2/\sigma_2^2$ > 1, $\sigma_1^2/\sigma_2^2$ < 1, $\sigma_1^2/\sigma_2^2$ ≠ 1
2) T.S.|H0 = $(S_1^2/S_2^2)/(\sigma_1^2/\sigma_2^2)$ = $F_{n_1-1,\,n_2-1}$.
3) CR-

Ex- Random samples of sizes 5 and 4 respectively from the populations of height of tribe1 and tribe2 are given. Tribe1- 64”, 66”, 68”, 70”, 72”; Tribe2- 62”, 64”, 66”, 68”. Can you say the population variance of the heights of tribe1 is the same as that of tribe2 at α=.02?
Ans.- Already computed- X̄1 = ∑X1/n1 = 68, X̄2 = ∑X2/n2 = 65. $S_1^2$ = ∑(X1 − X̄1)²/(n1-1) = 40/4 = 10, $S_2^2$ = ∑(X2 − X̄2)²/(n2-1) = 20/3 = 6.667
1) H0: $\sigma_1^2 = \sigma_2^2$ vs H1: $\sigma_1^2 \ne \sigma_2^2$. (Two tailed C.R.)
2) T.S.|H0 = $(S_1^2/S_2^2)/(\sigma_1^2/\sigma_2^2)$ |H0 = $F_{n_1-1,\,n_2-1}$ = $F_{4,3}$; T.S. computed = (10/6.667)/1 = 1.5.
3) CR- Reject H0 if F > $F_{.01,(4,3)}$ = 28.7 or F < $F_{.99,(4,3)}$ = 0.06. Note- $F_{\alpha,(K_1,K_2)}$ = 1/$F_{1-\alpha,(K_2,K_1)}$
4) Accept H0
5) Variation of heights same.

(1-α)100% C.I. for $\sigma_1^2/\sigma_2^2$ - LCL = $(S_1^2/S_2^2)/F_{\alpha/2,\,n_1-1,\,n_2-1}$, UCL = $(S_1^2/S_2^2)/F_{1-\alpha/2,\,n_1-1,\,n_2-1}$
Ex (Continued)- For the height data, find the 98% C.I. for $\sigma_1^2/\sigma_2^2$.
Ans.- Here α/2 = .01. LCL = $(S_1^2/S_2^2)/F_{.01,(4,3)}$ = (10/6.667)/28.7 = 0.052
UCL = $(S_1^2/S_2^2)/F_{.99,(4,3)}$ = (10/6.667)/0.06 = 25
Hypothesis Testing About the Variances of Two Populations
• Example: Buyer’s Digest (C)
Buyer’s Digest has conducted the same test, as was described earlier, on another 10 thermostats, this time manufactured by TempKing. The temperature readings of the ten thermostats are listed below.
We will conduct a hypothesis test with α = .10 to see if the variances are equal for ThermoRite’s thermostats and TempKing’s thermostats.

ThermoRite Sample
Thermostat     1     2     3     4     5     6     7     8     9     10
Temperature    67.4  67.8  68.2  69.3  69.5  67.0  68.1  68.6  67.9  67.2

TempKing Sample
Thermostat     1     2     3     4     5     6     7     8     9     10
Temperature    67.7  66.4  69.2  70.1  69.5  69.7  68.1  66.6  67.3  67.5

• Hypotheses
H0: $\sigma_1^2 = \sigma_2^2$ (TempKing and ThermoRite thermostats have the same temperature variance)
Ha: $\sigma_1^2 \ne \sigma_2^2$ (Their variances are not equal) (Two-tailed test)
• Test statistic
T.S.|H0 = $(S_1^2/S_2^2)/(\sigma_1^2/\sigma_2^2)$ |H0 = $F_{n_1-1,\,n_2-1}$ = $F_{9,9}$
TempKing’s sample variance is 1.768
ThermoRite’s sample variance is .700

F-computed = $s_1^2/s_2^2$ = 1.768/.700 = 2.53

• CV and CR
F-computed = 2.53 < 3.18 = F.05,(9,9). Hence, do not reject H0.

• Thermostats of TempKing and ThermoRite have same variation of temperature.

Selected Values from the F Distribution Table

Denominator   Area in
Degrees       Upper     Numerator Degrees of Freedom
of Freedom    Tail      7      8      9      10     15
8             .10       2.62   2.59   2.56   2.54   2.46
              .05       3.50   3.44   3.39   3.35   3.22
              .025      4.53   4.43   4.36   4.30   4.10
              .01       6.18   6.03   5.91   5.81   5.52

9             .10       2.51   2.47   2.44   2.42   2.34
              .05       3.29   3.23   3.18   3.14   3.01
              .025      4.20   4.10   4.03   3.96   3.77
              .01       5.61   5.47   5.35   5.26   4.96

• Determining and Using the p-Value
Area in Upper Tail           .10    .05    .025   .01
F Value (df1 = 9, df2 = 9)   2.44   3.18   4.03   5.35

• Because F = 2.53 is between 2.44 and 3.18, the area in the upper tail of the distribution is between .10 and .05.
• But this is a two-tailed test; after doubling the upper-tail area, the p-value is between .20 and .10.
• Because α = .10, we have p-value > α and therefore we cannot reject the null hypothesis.
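The two sample variances and the F ratio can be recomputed from the thermostat readings. In this sketch the larger sample variance (TempKing) is placed in the numerator, as in the slides, and the table value F.05,(9,9) = 3.18 is entered by hand; the variable names are illustrative.

```python
from statistics import variance

thermorite = [67.4, 67.8, 68.2, 69.3, 69.5, 67.0, 68.1, 68.6, 67.9, 67.2]
tempking = [67.7, 66.4, 69.2, 70.1, 69.5, 69.7, 68.1, 66.6, 67.3, 67.5]

s2_tempking = variance(tempking)      # population 1 in the slides
s2_thermorite = variance(thermorite)  # population 2 in the slides

f_stat = s2_tempking / s2_thermorite
df = (len(tempking) - 1, len(thermorite) - 1)

f_table = 3.18                        # F.05,(9,9): upper-tail area alpha/2 = .05
reject_h0 = f_stat > f_table
print(round(s2_tempking, 3), round(s2_thermorite, 3), round(f_stat, 2), df, reject_h0)
```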

Comparison of two population proportions p1, p2
Ex- BPL of two villages. In a random sample of n1 = 100 households X1 = 60 are BPL in village1, in a random sample of n2 = 80 households from village2 X2 = 36 are BPL. p̄1 = 60/100 = .6, p̄2 = 36/80 = .45.
Parameter of interest = p1 - p2,
Point estimator = p̄1 - p̄2, where p̄1 = X1/n1, p̄2 = X2/n2.
X1 ~ Bin(n1, p1) and X2 ~ Bin(n2, p2).
$\mu_{\bar p_1 - \bar p_2}$ = E(p̄1 - p̄2) = p1 - p2,
S.E.(p̄1 - p̄2) = $\sigma_{\bar p_1 - \bar p_2}$ = $\sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}$
Thm.- $[(\bar p_1 - \bar p_2) - (p_1 - p_2)] / \sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}$ = Z, provided n1p1, n1q1, n2p2, n2q2 ≥ 5.
95% C.I. for (p1 - p2) is $(\bar p_1 - \bar p_2) \mp Z_{\alpha/2}\sqrt{\bar p_1 \bar q_1/n_1 + \bar p_2 \bar q_2/n_2}$, provided n1p̄1, n1q̄1, n2p̄2, n2q̄2 ≥ 5.
Testing of hypothesis for p1, p2-
1) H0: p1 - p2 = 0 vs H1: p1 - p2 > 0, < 0, ≠ 0.
2) T.S.|H0 = $[(\bar p_1 - \bar p_2) - (p_1 - p_2)] / \sqrt{\bar p \bar q\,(1/n_1 + 1/n_2)}$ = Z, provided n1p̄1, n1q̄1, n2p̄2, n2q̄2 ≥ 5, where p̄ = pooled sample proportion of success = (X1 + X2)/(n1 + n2), q̄ = 1 - p̄.
3) CR-
Comparison of two population proportions- p1, p2

Ex- In a random sample of 100 households from village-1 60 are BPL, in a random sample of 80 households from village-2 36 are BPL. Can you say population proportion BPL for village1 is more than that of village 2 at α = .05? Construct a 95% C.I. for (p1 - p2).
Ans.- n1p̄1 = 60 ≥ 5, n1q̄1 = 40 ≥ 5, n2p̄2 = 36 ≥ 5, n2q̄2 = 44 ≥ 5. Use Z- T.S.
1) H0: p1 – p2 = 0 vs H1: p1 – p2 > 0 (Right tailed C.R.)
2) T.S.|H0 = $[(\bar p_1 - \bar p_2) - (p_1 - p_2)] / \sqrt{\bar p \bar q\,(1/n_1 + 1/n_2)}$ = Z.
p̄1 = 60/100 = .6, p̄2 = 36/80 = .45, p̄ = (60+36)/(100+80) = .533.
T.S. computed = $(.60 - .45)/\sqrt{.533(.467)(1/100 + 1/80)}$ = .15/.0748 = 2.00.
3) CR- Reject H0 if Z > Z.05 = 1.645.
4) Reject H0.
5) The population proportion BPL of village 1 is more than that of village 2.
P-value- Area from Z = 2.00 to ∞ under Z-curve = 1 - .9772 = .0228.

95% C.I. for (p1 - p2) is $(\bar p_1 - \bar p_2) \mp Z_{\alpha/2}\sqrt{\bar p_1 \bar q_1/n_1 + \bar p_2 \bar q_2/n_2}$ OR $.15 \mp 1.96\sqrt{.6*.4/100 + .45*.55/80}$ OR .15 ∓ 1.96(.074) OR .15 ∓ .145
OR .005 to .295.
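The pooled test statistic and the unpooled standard error used in the C.I. are easy to mix up, so a short numerical sketch may help. The function name `two_proportion_z` is hypothetical. Evaluating the pooled formula for the village data gives a standard error of about .0748 and Z of about 2.00.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic for H0: p1 - p2 = 0, right-tail p-value, and both S.E.s."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Pooled S.E. is used for the test (H0 says p1 = p2)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    # Unpooled S.E. is used for the confidence interval
    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se_pooled
    p_right = 1 - NormalDist().cdf(z)
    return z, p_right, se_pooled, se_unpooled


# BPL households: 60 of 100 in village 1, 36 of 80 in village 2
z, p_right, se_pooled, se_unpooled = two_proportion_z(60, 100, 36, 80)
ci = (0.15 - 1.96 * se_unpooled, 0.15 + 1.96 * se_unpooled)
print(round(z, 2), round(p_right, 4), [round(x, 3) for x in ci])
```

Because Z is about 2.00 and exceeds 1.645, the right-tailed test still rejects H0 at α = .05.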
Interval Estimation of p1 - p2
• Example: Market Research Associates
Market Research Associates is conducting research to evaluate the effectiveness of a client’s new advertising campaign. Before the new campaign began, a telephone survey of 150 households in the test market area showed 60 households “aware” of the client’s product.
The new campaign has been initiated with TV and newspaper advertisements running for three weeks.
A survey conducted immediately after the new campaign showed 120 of 250 households “aware” of the client’s product. Does the data support the position that the advertising campaign has provided an increased awareness of the client’s product?

Point Estimator of the Difference Between Two Population Proportions
p1 = proportion of the population of households “aware” of the product after the new campaign
p2 = proportion of the population of households “aware” of the product before the new campaign
p̄1 = sample proportion of households “aware” of the product after the new campaign
p̄2 = sample proportion of households “aware” of the product before the new campaign
p̄1 − p̄2 = 120/250 − 60/150 = .48 − .40 = .08

Interval Estimation of p1 - p2
For α = .05, z.025 = 1.96:

$.48 - .40 \pm 1.96\sqrt{.48(.52)/250 + .40(.60)/150}$
.08 ± 1.96(.0510)
.08 ± .10

Hence, the 95% confidence interval for the difference in after and before awareness of the product is -.02 to +.18.

Hypothesis Tests about p1 - p2
• Example: Market Research Associates
Can we conclude, using a .05 level of significance, that the proportion of households aware of the client’s product increased after the new advertising campaign?

• p-Value and Critical Value Approaches

1. Develop the hypotheses. H0: p1 - p2 ≤ 0
Ha: p1 - p2 > 0 (Right-tailed test)
p1 = proportion of the population of households “aware” of the product after the new campaign
p2 = proportion of the population of households “aware” of the product before the new campaign

• Critical Value Approach
2. Compute the value of the test statistic.
p̄ = [250(.48) + 150(.40)]/(250 + 150) = 180/400 = .45
$s_{\bar p_1 - \bar p_2}$ = $\sqrt{.45(.55)(1/250 + 1/150)}$ = .0514

T.S.|H0 = $[(\bar p_1 - \bar p_2) - (p_1 - p_2)] / \sqrt{\bar p \bar q\,(1/n_1 + 1/n_2)}$ = Z

Z-computed = [(.48 − .40) − 0]/.0514 = .08/.0514 = 1.56

3. With α = .05, Z-computed = 1.56 < 1.645 = z.05. Hence, do not reject H0.

4. We cannot conclude that the proportion of households aware of the client’s product increased after the new campaign.

• p-Value Approach
Compute the p-value.
For z = 1.56, the p-value = .0594

Because p-value > α = .05, we cannot reject H0.

Chi-square tests for Categorical data
1) Test of goodness of fit.
2) Test of independence.
3) Testing homogeneity of several populations.
Testing goodness of fit-
It is used to test whether or not the given probability distribution or frequency distribution of several categories (Multinomial population) is in agreement with the empirical/experimental probability distribution or frequency distribution of these categories. We have already done it for two categories in binomial distribution (testing population proportion of success). We start with an example of testing fairness of a die.
H0: The die is a fair one, OR equivalently each of the 6 faces has probability 1/6.
T.S.|H0 = ∑(O−E)²/E = $\chi^2_{K-1}$, where $E_i$ = n*($p_i$|H0).

Categories   Observed Frequency   Expected/theoretical        (O-E)²/E
             (Oi or fi)           Frequency (Ei) = n*pi
1            70                   600*1/6 = 100               900/100
2            102                  100                         4/100
3            95                   100                         25/100
4            105                  100                         25/100
5            98                   100                         4/100
6            130                  100                         900/100
Total        600                  600                         1858/100
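The goodness-of-fit statistic in the die table can be computed with a small helper. This is a minimal sketch: `chi_square_gof` is an illustrative name, and the comparison value 11.070 is the chi-square table entry for 5 d.f. and upper-tail area .05, entered by hand as an assumption about the significance level.

```python
def chi_square_gof(observed, expected):
    """Return sum((O - E)^2 / E) and the d.f. K - 1 (no parameters estimated)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# Die example: 600 throws, expected frequency 600 * 1/6 = 100 per face
die_observed = [70, 102, 95, 105, 98, 130]
die_expected = [600 / 6] * 6
die_stat, die_df = chi_square_gof(die_observed, die_expected)

chi2_table = 11.070  # chi-square table value for 5 d.f., upper-tail area .05
print(round(die_stat, 2), die_df, die_stat > chi2_table)
```

The statistic equals the table total 1858/100 = 18.58. At α = .05 it exceeds 11.070, so under that assumed level the fairness hypothesis would be rejected.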


Chi-square test of independence
Testing independence of attributes- We will explain by taking an example. We want to test whether or not the two attributes Drinking habit and Smoking habit are independent.
1) H0: Drinking habit and Smoking habit are independent.
H1: Not independent

E(SD) = P(SD)|H0*n = (40/100)(45/100)*100 = 40*45/100 = RT*CT/n = 18
2) T.S.|H0 = ∑(O−E)²/E = $\chi^2_{(r-1)(c-1)}$, with (r-1)(c-1) = (2-1)(2-1) = 1 d.f., where
r = no. of rows (no. of categories of the 1st attribute) and
c = no. of columns (no. of categories of the 2nd attribute).
T.S. computed = (25-18)²/18 + (15-22)²/22 + (20-27)²/27 + (40-33)²/33 = 8.25

Note-
1) If the theoretical or expected frequency of a category is less than 5, then the category can be clubbed/merged with the adjacent (preceding or succeeding) category(s) such that the frequency of the new merged category is at least 5.
2) In the fitting of a theoretical probability distribution to a given empirical data, the d.f. of the χ²-test statistic = K – 1 - δ, where K = number of categories after merger (if any), δ = number of parameters of the theoretical probability distribution to be estimated from the sample data to compute expected frequency $E_i$.
Chi-square test of goodness of fit
Ex- The number of accidents occurring at an intersection during last 100 months is given below. Can you say the data follow a Poisson distribution at α = .05?

No. of accidents (X)   No. of months (f or O)   P|H0    E|H0 = n*p|H0   Merged O   Merged E   (O-E)²/E
0                      4                        .0369   3.69
1                      7                        .1217   12.17           11         15.86      1.4893
2                      20                       .2008   20.08                                 0.0003
3                      28                       .2209   22.09                                 1.5812
4                      18                       .1823   18.23                                 0.0029
5                      12                       .1203   12.03                                 0.0000
6                      6                        .0662   6.62                                  0.0581
≥7                     5                        .0509   5.09                                  0.0016
Total                  100 = n                  1       100                                   3.1334

Ans.- 1) H0: The data follow a Poisson distribution.
H1: The data do not follow a Poisson distribution.
λ̂ = µ̂ = X̄ = ∑Xf/n = 334/100 = 3.34 ≅ 3.3.
2) T.S.|H0 = ∑(O – E)²/E = $\chi^2_{K-1-\delta}$, with K - 1 - δ = (8-1) - 1 - 1 = 5 d.f. (categories 0 and 1 are merged, and one parameter is estimated); T.S. computed = 3.13.

Ex-(modified) For the accident data, can you say the data follow a Poisson distribution with mean 3.3 at α = 0.05?
Ans.-
1) H0: The data follow a Poisson distribution with mean 3.3
H1: The data do not follow a Poisson distribution with mean 3.3
2) T.S.|H0 = ∑(O – E)²/E = $\chi^2_{K-1-\delta}$, with K - 1 - δ = (8-1) - 1 - 0 = 6 d.f.; T.S. computed = 3.13.
Fitting Normal distribution
3) CR- Reject H0 if T.S. > $\chi^2_{.05,6}$ = 12.592 (area 0.05 in the right tail). T.S. computed = 3.13 lies in the A.R.

4) Accept H0
5) Data follows a Poisson distribution with mean 3.3
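The Poisson fit can be reproduced step by step: compute the probabilities under H0, form expected frequencies, merge the classes 0 and 1 because the expected frequency for 0 is below 5, and sum (O − E)²/E. The sketch below uses exact Poisson probabilities rather than the 4-decimal table values, so the result differs slightly in the third decimal; all variable names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


lam = 3.3
counts = {0: 4, 1: 7, 2: 20, 3: 28, 4: 18, 5: 12, 6: 6}  # months with X accidents
tail_count = 5                                           # months with 7 or more
n = sum(counts.values()) + tail_count

probs = [poisson_pmf(k, lam) for k in range(7)]
tail_prob = 1 - sum(probs)                               # P(X >= 7)

# Merge classes 0 and 1 so that every expected frequency is at least 5
observed = [counts[0] + counts[1]] + [counts[k] for k in range(2, 7)] + [tail_count]
expected = [n * (probs[0] + probs[1])] + [n * probs[k] for k in range(2, 7)] + [n * tail_prob]

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df_mean_given = len(observed) - 1          # delta = 0: mean 3.3 specified in H0
df_mean_estimated = len(observed) - 1 - 1  # delta = 1: mean estimated from the data

print(round(stat, 2), df_mean_given, df_mean_estimated)
```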
Testing Homogeneity of several populations
Testing Homogeneity of several populations- It means homogeneity of several populations with respect to a particular characteristic.
Ex- Want to test the homogeneity of three types of seeds with respect to their germinating power.
100 seeds from Variety A, 120 from Variety B, 80 from Variety C are selected at random. They are put to germinate. Numbers germinated from each variety are given below.

                  Variety A   Variety B   Variety C   Total
Germinated        90          90          56          236
Not germinated    10          30          24          64
Total             100         120         80          300

1) H0: pA = pB = pC
H1: Above not true.
2) T.S.|H0 = ∑[(O−E)²/E] = $\chi^2_{(r-1)(c-1)}$, with (r-1)(c-1) = K - 1 = 2 d.f.,
where E = RT*CT/GT, r = No. of rows, c = No. of columns, K = No. of populations.
T.S. computed = (90−78.67)²/78.67 + (10−21.33)²/21.33 + (90−94.40)²/94.40 + (30−25.60)²/25.60 + (56−62.93)²/62.93 + (24−17.07)²/17.07 = 12.188
3) CR- Reject H0 if T.S. > $\chi^2_{.05,2}$ = 5.991.
4) Reject H0.
5) The populations are different with respect to the germinating power.
We can investigate further to know which pairs of populations are different using multiple comparison procedure where we compare the absolute value of the difference between a pair of sample proportions of success with the Marascuilo pairwise comparison critical value as shown below.
p̄A = 90/100 = .9, p̄B = 90/120 = .75, p̄C = 56/80 = .7.
|p̄A - p̄B| = .15, |p̄A - p̄C| = .2, |p̄B - p̄C| = .05.
Marascuilo Critical Value = $CV_{ij}$ = $\sqrt{\chi^2_{\alpha,K-1}}\,\sqrt{\bar p_i(1-\bar p_i)/n_i + \bar p_j(1-\bar p_j)/n_j}$
For Variety A and Variety B = $CV_{AB}$ = $\sqrt{5.991}\,\sqrt{.9*.1/100 + .75*.25/120}$ = .121
For Variety A and Variety C = $CV_{AC}$ = $\sqrt{5.991}\,\sqrt{.9*.1/100 + .7*.3/80}$ = .145
For Variety B and Variety C = $CV_{BC}$ = $\sqrt{5.991}\,\sqrt{.75*.25/120 + .7*.3/80}$ = .158
Comparison between Variety A and Variety B- |p̄A - p̄B| = .15 > .121 ⇒ Difference between A and B
Comparison between Variety A and Variety C- |p̄A - p̄C| = .2 > .145 ⇒ Difference between A and C
Comparison between Variety B and Variety C- |p̄B - p̄C| = .05 < .158 ⇒ No difference between B and C.
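The homogeneity statistic and the Marascuilo pairwise critical values can be sketched as follows. The counts are the ones appearing in the T.S. computation above; `marascuilo_cv` is a hypothetical helper name and 5.991 is the chi-square table value for 2 d.f. at α = .05. Using unrounded expected frequencies gives a statistic of about 12.195 instead of 12.188, and evaluating the A versus C critical value gives about .145.

```python
from math import sqrt

germinated = {"A": 90, "B": 90, "C": 56}
sample_size = {"A": 100, "B": 120, "C": 80}

# Chi-square statistic with E = RT * CT / GT
grand_total = sum(sample_size.values())
total_germinated = sum(germinated.values())
stat = 0.0
for v in germinated:
    for obs, row_total in (
        (germinated[v], total_germinated),
        (sample_size[v] - germinated[v], grand_total - total_germinated),
    ):
        exp_freq = row_total * sample_size[v] / grand_total
        stat += (obs - exp_freq) ** 2 / exp_freq


def marascuilo_cv(p_i, n_i, p_j, n_j, chi2_table):
    """Marascuilo critical value for the pair (i, j)."""
    return sqrt(chi2_table) * sqrt(p_i * (1 - p_i) / n_i + p_j * (1 - p_j) / n_j)


chi2_table = 5.991  # chi-square table value, 2 d.f., upper-tail area .05
p = {v: germinated[v] / sample_size[v] for v in germinated}
results = {}
for i, j in (("A", "B"), ("A", "C"), ("B", "C")):
    cv = marascuilo_cv(p[i], sample_size[i], p[j], sample_size[j], chi2_table)
    results[i + j] = (abs(p[i] - p[j]), cv, abs(p[i] - p[j]) > cv)

print(round(stat, 3))
for pair, (diff, cv, differs) in results.items():
    print(pair, round(diff, 2), round(cv, 3), differs)
```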
Multinomial Distribution Goodness of Fit Test
• Example: Finger Lakes Homes (A)

Finger Lakes Homes manufactures four models of prefabricated homes, a two-story colonial, a log cabin, a split-level, and an A-frame. To help in production planning, management would like to determine if previous customer purchases indicate that there is a preference in the style selected.

The number of homes sold of each model for 100 sales over the past two years is shown below.

Model     Colonial   Log   Split-Level   A-Frame
# Sold    30         20    35            15

• Hypotheses

H0: pC = pL = pS = pA = .25

Ha: The population proportions are not pC = .25, pL = .25, pS = .25, and pA = .25

where:
pC = population proportion that purchase a colonial
pL = population proportion that purchase a log cabin
pS = population proportion that purchase a split-level
pA = population proportion that purchase an A-frame

• Expected Frequencies

e1 = .25(100) = 25 e2 = .25(100) = 25
e3 = .25(100) = 25 e4 = .25(100) = 25
• Test Statistic
T.S.|H0 = ∑[(O−E)²/E] = $\chi^2_{k-1}$, with k - 1 = 4 - 1 = 3 d.f.

χ² computed = (30−25)²/25 + (20−25)²/25 + (35−25)²/25 + (15−25)²/25

= 1 + 1 + 4 + 4
= 10

• CR

With α = .05 and k - 1 = 4 - 1 = 3 degrees of freedom, reject H0 if χ² > $\chi^2_{.05,3}$ = 7.815.
χ² = 10 > 7.815 = $\chi^2_{.05,3}$. Reject H0. Hence, preference is indicated in the style selected.
[Figure: chi-square curve with the "Do Not Reject H0" region to the left of 7.815 and the "Reject H0" region to the right; the computed value 10 lies in the rejection region]

• Conclusion Using the p-Value Approach

Area in Upper Tail   .10     .05     .025    .01      .005
χ² Value (df = 3)    6.251   7.815   9.348   11.345   12.838

Because χ² = 10 is between 9.348 and 11.345, the area in the upper tail of the distribution is between .025 and .01.

The p-value < α = .05. We can reject the null hypothesis.

Test of Independence
• Example: Finger Lakes Homes (B)

Each home sold by Finger Lakes Homes can be classified according to price and to style. Finger Lakes’ manager would like to determine if the price of the home and the style of the home are independent variables.

The number of homes sold for each model and price for the past two years is shown below. For convenience, the price of the home is listed as either less than $200,000 or more than or equal to $200,000.

n = 100 homes are cross-classified according to price level and style

• Hypotheses

H0: Price of the home is independent of the style of the home that is purchased

Ha: Price of the home is not independent of the style of the home that is purchased

• Observed Frequencies

Price      Colonial   Log   Split-Level   A-Frame   Total
< $200K    18         6     19            12        55
≥ $200K    12         14    16            3         45
Total      30         20    35            15        100

• Test Statistic
T.S.|H0 = ∑[(O−E)²/E] = $\chi^2_{(r-1)(c-1)}$, with (r-1)(c-1) = (2-1)(4-1) = 3 d.f.

χ² computed = (18 − 16.5)²/16.5 + (6 − 11)²/11 + ⋯ + (3 − 6.75)²/6.75

= .1364 + 2.2727 + . . . + 2.0833 = 9.149

• Rejection Rule/CR

T.S. computed = 9.149 > 7.815 = $\chi^2_{.05,3}$. Hence reject H0.

At the .05 level of significance, we conclude that the price of the home is not independent of the style of home that is purchased.

• Conclusion Using the p-Value Approach

Area in Upper Tail   .10     .05     .025    .01      .005
χ² Value (df = 3)    6.251   7.815   9.348   11.345   12.838

Because χ² = 9.149 is between 7.815 and 9.348, the area in the upper tail of the distribution is between .05 and .025.

The p-value < α = .05. We can reject the null hypothesis.

(Actual p-value is .0274)

Testing the Equality of Population Proportions
for Three or More Populations
• Example: Finger Lakes Homes

Finger Lakes Homes manufactures three models of prefabricated homes, a two-story colonial, a log
cabin, and an A-frame. To help in product-line planning, management would like to compare the customer
satisfaction with the three home styles.

p1 = proportion likely to repurchase a Colonial for the population of Colonial owners


p2 = proportion likely to repurchase a Log Cabin for the population of Log Cabin owners
p3 = proportion likely to repurchase an A-Frame for the population of A-Frame owners

$H_0$: p1 = p2 = p3
$H_1$: Not all population proportions are equal

Testing the Equality of Population Proportions
for Three or More Populations
• We begin by taking a sample of owners from each of the three populations.

• Each sample contains categorical data indicating whether the respondents are likely or not likely to
repurchase the home.

Testing the Equality of Population Proportions
for Three or More Populations
• Observed Frequencies (sample results)

                               Home Owner
Likely to Repurchase   Colonial    Log    A-Frame   Total
Yes                       97        83       80      260
No                        38        18       44      100
Total                    135       101      124      360

Testing the Equality of Population Proportions
for Three or More Populations
• Expected Frequencies (computed)

                               Home Owner
Likely to Repurchase   Colonial    Log     A-Frame   Total
Yes                     97.50     72.94     89.56     260
No                      37.50     28.06     34.44     100
Total                    135       101       124      360

Testing the Equality of Population Proportions
for Three or More Populations
• Next, compute the value of the chi-square test statistic.
$T.S. \mid H_0 = \sum \frac{(O-E)^2}{E} = \sum_i \sum_j \frac{(f_{ij}-e_{ij})^2}{e_{ij}} \sim \chi^2_{k-1}$, here with $k - 1 = 3 - 1 = 2$ degrees of freedom

where:
$f_{ij}$ = observed frequency for the cell in row i and column j
$e_{ij}$ = expected frequency for the cell in row i and column j under the assumption $H_0$ is true

Note: The test statistic has a chi-square distribution with k – 1 degrees of freedom, provided the expected frequency is 5 or more for each cell.
Testing the Equality of Population Proportions
for Three or More Populations
• Computation of the Chi-Square Test Statistic.
Likely to     Home        Obs. Freq.   Exp. Freq.   Diff.               Sqd. Diff.            Sqd. Diff. / Exp. Freq.
Repurchase    Owner       $f_{ij}$     $e_{ij}$     $(f_{ij}-e_{ij})$   $(f_{ij}-e_{ij})^2$   $(f_{ij}-e_{ij})^2/e_{ij}$
Yes           Colonial       97          97.50         -0.50               0.2500                0.0026
Yes           Log Cab.       83          72.94         10.06             101.1142                1.3862
Yes           A-Frame        80          89.56         -9.56              91.3086                1.0196
No            Colonial       38          37.50          0.50               0.2500                0.0067
No            Log Cab.       18          28.06        -10.06             101.1142                3.6041
No            A-Frame        44          34.44          9.56              91.3086                2.6509
Total                       360         360                                     $\chi^2$ = T.S. computed = 8.6700

Testing the Equality of Population Proportions
for Three or More Populations
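• CR (using $\alpha$ = .05)

With $\alpha$ = .05 and k - 1 = 3 - 1 = 2 degrees of freedom, reject $H_0$ if $\chi^2 > \chi^2_{.05,2} = 5.991$.

[Figure: chi-square distribution with 2 degrees of freedom. The critical value $\chi^2_{.05,2} = 5.991$ separates the "Do Not Reject $H_0$" region from the "Reject $H_0$" region; the computed value 8.67 falls in the rejection region.]

• Reject $H_0$. Hence, customer satisfaction with respect to home style is not homogeneous.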
Testing the Equality of Population Proportions
for Three or More Populations
• Conclusion Using the p-Value Approach

Area in Upper Tail           .10     .05     .025    .01     .005
$\chi^2$ Value (df = 2)      4.605   5.991   7.378   9.210   10.597

Because $\chi^2 = 8.670$ is between 7.378 and 9.210, the area in the upper tail of the distribution is between .025 and .01.

The p-value < $\alpha$ = .05. We can reject the null hypothesis.

(Actual p-value is .0131)
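The whole test for equal proportions can be reproduced in a few lines. This is a sketch under the assumption that expected frequencies come from the pooled "Yes" proportion 260/360; names are illustrative. For 2 degrees of freedom the upper-tail area has the simple form exp(-x/2), which is used here instead of a table.

```python
import math

# Observed (yes, no) counts per home style, from the sample results
observed = {"Colonial": (97, 38), "Log": (83, 18), "A-Frame": (80, 44)}

total_yes = sum(yes for yes, _ in observed.values())
total_n = sum(yes + no for yes, no in observed.values())
pooled_yes = total_yes / total_n  # common proportion under H0

chi2 = 0.0
for yes, no in observed.values():
    size = yes + no
    exp_yes = size * pooled_yes          # expected "Yes" count under H0
    exp_no = size * (1 - pooled_yes)     # expected "No" count under H0
    chi2 += (yes - exp_yes) ** 2 / exp_yes + (no - exp_no) ** 2 / exp_no

df = len(observed) - 1
# For df = 2 only, P(chi-square > x) = exp(-x / 2)
p_value = math.exp(-chi2 / 2)
print(round(chi2, 3), df, round(p_value, 4))
```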

Testing the Equality of Population Proportions
for Three or More Populations
• We have concluded that the population proportions for the three populations of home-owners are
not equal.

• To identify where the differences between population proportions exist, we will rely on a multiple
comparison procedure.

Regression
Studying relationship between variables
Ex- Relationship between Weight (Y) and Height (X).
Nature of relationship- From scatter diagram. Obtained by plotting 𝑌𝑖 vs 𝑋𝑖 of n pairs of sample data (𝑋𝑖 , 𝑌𝑖 ), i=1,2, … , n.
Concentration about a line indicates linear relation, about a parabola – quadratic or second-degree relation.
Ex- Suppose scatter diagram indicates linear relation between Wt. (Y) and Ht. (X)
Statistical relationship – $Y = \alpha + \beta X + \varepsilon$, where $\varepsilon$ = random error term = …, $\alpha$ = Y-intercept, $\beta$ = slope = rate of change (ROC) of Y w.r.t. X
Model- (We will explain it for the weight (Y) and height (X) example)
$Y_i = \alpha + \beta X_i + \varepsilon_i$, i = 1, 2, …, n.
With $X = X_1$, we get $Y_1 = \alpha + \beta X_1 + \varepsilon_1$. It means the weight $Y_1$ of a person chosen at random from people with height $X_1$ (= 62", say) is a linear function of the height $X_1$ plus some random error term $\varepsilon_1$.
Assumptions-
Ex- Sample data of income and consumption-
Income (X)   Consumption (Y)
   10               9
   12              11
   14              13
   16              15
   18              16
Two-variable Linear Regression model (LRM)
Assumptions-
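$Y \mid X = 62 \sim N(\text{mean} = \alpha + \beta(62),\ \text{Var} = \sigma^2)$
$Y \mid X = 64 \sim N(\text{mean} = \alpha + \beta(64),\ \text{Var} = \sigma^2)$
$Y_i \mid X_i \sim N(\text{mean} = \alpha + \beta X_i,\ \text{Var} = \sigma^2)$
OR $\varepsilon_i \sim N(\text{mean} = 0,\ \text{Var} = \sigma^2)$

PRL (population regression line)- $Y = \alpha + \beta X$ gives the average relationship between Y and X in the population.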
Two-variable LRM
Objectives/issues-
i) To estimate the model or PRL.
ii) Study properties of the estimators.
iii) Inference about the parameters.
iv) Analysis of Variance (ANOVA).
v) Prediction.

Estimation of PRL-
Let the PRL be $Y = \alpha + \beta X$, where $\alpha$ and $\beta$ are to be determined on the basis of sample data $(X_i, Y_i)$, i = 1, 2, …, n of size n. Let the sample regression line (SRL) be $\hat{Y} = \hat{\alpha} + \hat{\beta}X$, where $\hat{\alpha}$ and $\hat{\beta}$ are to be determined such that the error in estimation E is minimum.
$E = \sum(Y - \hat{Y})^2 = \sum(Y - \hat{\alpha} - \hat{\beta}X)^2 = \sum e^2$ is minimum, where $e = Y - \hat{\alpha} - \hat{\beta}X$ = residual.
$\frac{\partial E}{\partial \hat{\alpha}} = 0 \Rightarrow \sum(Y - \hat{\alpha} - \hat{\beta}X) = \sum e = 0 \Rightarrow \sum Y = n\hat{\alpha} + \hat{\beta}\sum X$ …………. Normal eq. 1
$\frac{\partial E}{\partial \hat{\beta}} = 0 \Rightarrow \sum(Y - \hat{\alpha} - \hat{\beta}X)X = \sum eX = 0 \Rightarrow \sum YX = \hat{\alpha}\sum X + \hat{\beta}\sum X^2$ …. Normal eq. 2
Equations 1) and 2) are called Normal Equations obtained by the method of Least Squares. Solving, we get
$\hat{\beta} = \sum(X - \bar{X})(Y - \bar{Y}) / \sum(X - \bar{X})^2 = \sum xy / \sum x^2$ and $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$, where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.

Note- For any computation, we need $\sum X$, $\sum Y$, $\sum X^2$, $\sum Y^2$, $\sum XY$.
$\sum x^2 = \sum X^2 - n\bar{X}^2 = \sum X^2 - (\sum X)^2/n$, $\sum y^2 = \sum Y^2 - n\bar{Y}^2 = \sum Y^2 - (\sum Y)^2/n$,
$\sum xy = \sum XY - n\bar{X}\bar{Y} = \sum XY - (\sum X)(\sum Y)/n$

Ex-
Income (X)   Consumption (Y)
   10               9
   12              11
   14              13
   16              15
   18              16

i) Compute the SRL/LBF (line of best fit).
ii) Interpret the slope parameter value.
iii) Verify the normal equations.
Two-variable LRM- Least Square Estimation
Income (X)   Consumption (Y)   $X^2$   $Y^2$   XY    $\hat{Y}$ = 0.2 + .9X   e = Y - $\hat{Y}$   eX
   10               9           100     81     90         9.2                 -.2            -2.0
   12              11           144    121    132        11.0                   0               0
   14              13           196    169    182        12.8                  .2             2.8
   16              15           256    225    240        14.6                  .4             6.4
   18              16           324    256    288        16.4                 -.4            -7.2
Total = 70         64          1020    852    932                              0               0

[Figure: scatter plot of the data with the SRL/LBF $\hat{Y} = \hat{\alpha} + \hat{\beta}X$, marking an observed point $(X_4, Y_4)$ and the corresponding fitted point $(X_4, \hat{Y}_4)$.]

n = 5, $\sum X$ = 70, $\sum Y$ = 64, $\sum X^2$ = 1020, $\sum Y^2$ = 852, $\sum XY$ = 932. $\bar{X}$ = 70/5 = 14, $\bar{Y}$ = 64/5 = 12.8.
$\sum x^2 = \sum X^2 - (\sum X)^2/n = 1020 - (70)^2/5 = 40$, $\sum y^2 = \sum Y^2 - (\sum Y)^2/n = 852 - (64)^2/5 = 32.8$,
$\sum xy = \sum XY - (\sum X)(\sum Y)/n = 932 - (70)(64)/5 = 36$.

i) $\hat{\beta} = \sum xy / \sum x^2 = 36/40 = 0.9$. $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} = 12.8 - (.9)(14) = 0.2$.
SRL is $\hat{Y} = \hat{\alpha} + \hat{\beta}X = 0.2 + .9X$.
ii) $\hat{\beta}$ = 0.9. It means that for a 1 rupee increase in income X, consumption Y is estimated to increase by 0.9 rupees on average.
iii) It can be seen from the table above that the normal equations $\sum e = 0$ and $\sum eX = 0$ are satisfied.
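The least squares computation above can be checked with a short script. This is a minimal sketch in plain Python using the deviation-form formulas from the slides; the variable names are illustrative.

```python
# Sample data: income (X) and consumption (Y)
X = [10, 12, 14, 16, 18]
Y = [9, 11, 13, 15, 16]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_xx = sum(x * x for x in X)
sum_xy = sum(x * y for x, y in zip(X, Y))

# Deviation-form sums: sum(x^2) and sum(xy)
Sxx = sum_xx - sum_x ** 2 / n
Sxy = sum_xy - sum_x * sum_y / n

# Least squares estimates of slope and intercept
beta_hat = Sxy / Sxx
alpha_hat = sum_y / n - beta_hat * sum_x / n

# Residuals and the two normal-equation checks
residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(X, Y)]
sum_e = sum(residuals)
sum_ex = sum(e * x for e, x in zip(residuals, X))

print(round(beta_hat, 4), round(alpha_hat, 4))
```

Both `sum_e` and `sum_ex` should be zero up to floating-point rounding, which is the numerical form of verifying the normal equations.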
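Two-variable LRM-
Properties of the Least Square Estimators (LSE)-
Theorem- $E(\hat{\beta}) = \beta$, $E(\hat{\alpha}) = \alpha$, $Var(\hat{\beta}) = \sigma^2_{\hat{\beta}} = \sigma^2/\sum x^2$, $Var(\hat{\alpha}) = \sigma^2_{\hat{\alpha}} = \sigma^2 \sum X^2 / (n\sum x^2)$. It means the LSE are UBE (unbiased estimators) of the parameters.

Note- $Var(\hat{\beta})$ and $Var(\hat{\alpha})$ involve the population parameter $\sigma^2$, which has to be estimated from the sample data. $\sigma^2$ is called the Variance of Regression. It represents the variation of the population Y (consumption) values about the PRL. It is estimated by the variation of the sample Y (consumption) values about the SRL, i.e. by the residuals $e = Y - \hat{Y}$.
Theorem- $\hat{\sigma}^2 = S^2 = \sum e^2/(n-2)$ is an UBE of $\sigma^2$.

Computational form of Residual Sum of Squares (RSS) = $\sum e^2 = \sum Y^2 - \hat{\alpha}\sum Y - \hat{\beta}\sum XY$.
Note- Est. $Var(\hat{\beta}) = \hat{\sigma}^2/\sum x^2$, Est. $Var(\hat{\alpha}) = \hat{\sigma}^2 \sum X^2/(n\sum x^2)$, Est. S.E.$(\hat{\beta}) = \sqrt{\hat{\sigma}^2/\sum x^2}$, Est. S.E.$(\hat{\alpha}) = \sqrt{\hat{\sigma}^2 \sum X^2/(n\sum x^2)}$.

Ex (continued)- Compute i) RSS $\sum e^2$, ii) $\hat{\sigma}^2$, iii) Est. $Var(\hat{\beta})$, iv) Est. S.E.$(\hat{\beta})$.
i) RSS = $\sum e^2 = \sum Y^2 - \hat{\alpha}\sum Y - \hat{\beta}\sum XY$ = 852 - .2*64 - .9*932 = .4.
ii) $\hat{\sigma}^2 = \sum e^2/(n-2)$ = .4/3 = .13333.
iii) Est. $Var(\hat{\beta}) = \hat{\sigma}^2/\sum x^2$ = .13333/40 = .00333.
iv) Est. S.E.$(\hat{\beta}) = \hat{\sigma}_{\hat{\beta}} = S_{\hat{\beta}} = \sqrt{\hat{\sigma}^2/\sum x^2} = \sqrt{.00333}$ = .0577.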
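A short sketch of the same variance calculations, starting from the sums already computed in the example. The estimated standard error of the intercept is not part of the worked example; it is included only to illustrate the formula Est. S.E.(α̂) = sqrt(σ̂² ΣX² / (n Σx²)).

```python
import math

# Quantities taken from the worked example
n = 5
sum_y, sum_yy, sum_xy, sum_xx = 64, 852, 932, 1020
Sxx = 40.0                      # sum of squared deviations of X
alpha_hat, beta_hat = 0.2, 0.9

# Computational form of the residual sum of squares
rss = sum_yy - alpha_hat * sum_y - beta_hat * sum_xy

# Unbiased estimate of the variance of regression
sigma2_hat = rss / (n - 2)

# Estimated variance and standard error of the slope
var_beta = sigma2_hat / Sxx
se_beta = math.sqrt(var_beta)

# Estimated standard error of the intercept (illustration only)
se_alpha = math.sqrt(sigma2_hat * sum_xx / (n * Sxx))

print(round(rss, 4), round(sigma2_hat, 5), round(se_beta, 4))
```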
Two-variable LRM, Inference about slope $\beta$
Theorem- $(\hat{\beta} - \beta)/\text{Est. S.E.}(\hat{\beta}) = (\hat{\beta} - \beta)/\sqrt{\hat{\sigma}^2/\sum x^2} \sim t_{n-2}$
C.I. – 95% C.I. for $\beta$ is $\hat{\beta} \mp t_{.025,n-2}\ \text{Est. S.E.}(\hat{\beta})$

Ex (continued)- i) Construct a 95% C.I. for the population m.p.c. $\beta$, ii) can you say the population m.p.c. is .8?, iii) can you say the m.p.c. is 0?
Ans- i) 95% C.I. for $\beta$ is $\hat{\beta} \mp t_{.025,n-2}\ \text{Est. S.E.}(\hat{\beta})$ or $.9 \mp t_{.025,3}(.0577)$
or $.9 \mp (3.182)(.0577)$ or $.9 \mp .184$ or (.716, 1.084).
ii) $\beta$ = .8 ∈ C.I. ⇒ The sample data are consistent with a population m.p.c. of .8.
iii) $\beta$ = 0 ∉ C.I. ⇒ At the 5% level, the population m.p.c. is not 0. It means the regression is significant.
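The interval above can be reproduced as follows. This is a sketch that takes the table value t(.025, 3) = 3.182 from the slides as given rather than computing it; names are illustrative.

```python
import math

beta_hat = 0.9
sigma2_hat = 0.4 / 3           # RSS / (n - 2) from the example
Sxx = 40.0                     # sum of squared deviations of X
t_table = 3.182                # t_{.025, 3}, table value quoted in the slides

se_beta = math.sqrt(sigma2_hat / Sxx)
half_width = t_table * se_beta
ci = (beta_hat - half_width, beta_hat + half_width)

def in_interval(value, interval):
    """True when the hypothesised value lies inside the confidence interval."""
    return interval[0] <= value <= interval[1]

print(round(ci[0], 3), round(ci[1], 3))
print(in_interval(0.8, ci), in_interval(0.0, ci))
```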

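Testing $\beta$ –
1) $H_0$: $\beta = \beta_0$ vs $H_1$: $\beta > \beta_0$, $\beta < \beta_0$, or $\beta \ne \beta_0$
2) $T.S. \mid H_0 = (\hat{\beta} - \beta_0)/\text{Est. S.E.}(\hat{\beta}) = (\hat{\beta} - \beta_0)/\sqrt{\hat{\sigma}^2/\sum x^2} \sim t_{n-2}$
3) C.R.- Reject $H_0$ if T.S. computed > $t_{\alpha,n-2}$ (for $H_1$: $\beta > \beta_0$), if T.S. computed < $-t_{\alpha,n-2}$ (for $H_1$: $\beta < \beta_0$), or if |T.S. computed| > $t_{\alpha/2,n-2}$ (for $H_1$: $\beta \ne \beta_0$).

Note- In regression we usually test $H_0$: $\beta$ = 0 vs $H_1$: $\beta \ne 0$. When $\beta$ = 0, the PRL becomes $Y = \alpha$. It means Y doesn't change when X changes, i.e. the regression is not significant.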
Two-variable LRM- Testing slope 𝛽

Ex (continued)- Can you say the regression is significant at level of significance $\alpha$ = .05, using the sample data on income and consumption?
Ans.- 1) $H_0$: $\beta$ = 0 (regression is not significant) vs $H_1$: $\beta \ne 0$ (regression is significant).
2) $T.S. \mid H_0 = (\hat{\beta} - 0)/\text{Est. S.E.}(\hat{\beta}) \sim t_{n-2} = t_{5-2} = t_3$.
T.S. computed = (.9 – 0)/.0577 = 15.598.
3) C.R.- Reject $H_0$ if |T.S. computed| > $t_{.025,3}$ = 3.182.
4) 15.598 > 3.182, so reject $H_0$.
5) Regression is significant.

Analysis of variance (ANOVA)-
It means analyzing the variation of the consumption (Y) values for the model $Y = \alpha + \beta X + \varepsilon$ on the basis of sample data $(X_i, Y_i)$, i = 1, …, n of size n.
$(Y - \bar{Y}) = (Y - \hat{Y}) + (\hat{Y} - \bar{Y})$
$\sum(Y - \bar{Y})^2 = \sum(Y - \hat{Y})^2 + \sum(\hat{Y} - \bar{Y})^2$.

$\sum(Y - \bar{Y})^2$ = Measures the total variation in Y-values = Sum of squares total = SST.
$\sum(\hat{Y} - \bar{Y})^2$ = Measures the variation due to regression or due to X = SSR.
$\sum(Y - \hat{Y})^2$ = Measures the variation unexplained by regression or X = Sum of squares error = SSE.

SST = SSE + SSR, so $1 = \frac{SSE}{SST} + \frac{SSR}{SST}$.
$\frac{SSR}{SST}$ = Proportion of total variation explained by regression = Coefficient of determination = CD = $R^2$

Note- Correlation coefficient = r = $\sum xy/\sqrt{\sum x^2 \sum y^2}$. It can be shown that r = (sign of $\hat{\beta}$)$\sqrt{CD}$.
Two-variable LRM- ANOVA
ANOVA Table for regression
Source of variation (SV) | Sum of squares (SS) | d.f. | MS = SS/d.f. | F-ratio to test $H_0$: Reg. not sig. vs $H_1$: Reg. sig.
Due to regression (X) | SSR = $\sum(\hat{Y} - \bar{Y})^2$ | 2 – 1 | MSR = SSR/(2-1) | MSR/MSE $\sim F_{(2-1),(n-2)}$
Due to error or unexplained by regression (X) | SSE = $\sum(Y - \hat{Y})^2 = \sum e^2$ | n - 2 | MSE = SSE/(n-2) = $\hat{\sigma}^2$ | If F comp. > $F_{\alpha,(1,n-2)}$, then reject $H_0$.
Total variation | SST = $\sum(Y - \bar{Y})^2$ | n - 1 | |

Ex (Contd.)-
Compute i) SST, ii) SSE, iii) SSR, iv) MSR, v) MSE, vi) F-ratio, vii) $F_{.01,(1,3)}$, viii) conclusion from F-test, ix) $R^2$, x) interpret $R^2$, xi) correlation coefficient r.
Ans.-
i) SST = $\sum y^2$ = 32.8
ii) SSE = $\sum e^2$ = .4
iii) SSR = SST – SSE = 32.8 - .4 = 32.4
iv) MSR = SSR/(2-1) = 32.4
v) MSE = $\hat{\sigma}^2$ = .13333
vi) F-ratio computed = MSR/MSE = 32.4/.13333 = 243.006
vii) $F_{.01,(1,3)}$ = 34.1
viii) F computed = 243.006 > 34.1 = $F_{.01,(1,3)}$ ⇒ Reject $H_0$ ⇒ Regression is significant.
ix) $R^2$ = SSR/SST = 32.4/32.8 = .9878 = 98.78%
x) 98.78% of the total variation in consumption is explained by income.
xi) r = $\sqrt{CD} = \sqrt{.9878}$ = .9939 (positive, since $\hat{\beta}$ > 0).
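The ANOVA decomposition for the income and consumption data can be sketched directly from the raw observations. Names are illustrative; the critical value 34.1 is the table value quoted in the slides.

```python
import math

X = [10, 12, 14, 16, 18]
Y = [9, 11, 13, 15, 16]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# Least squares fit
Sxx = sum((x - x_bar) ** 2 for x in X)
Sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
beta_hat = Sxy / Sxx
alpha_hat = y_bar - beta_hat * x_bar
fitted = [alpha_hat + beta_hat * x for x in X]

# Sums of squares: total, error, regression
sst = sum((y - y_bar) ** 2 for y in Y)
sse = sum((y - f) ** 2 for y, f in zip(Y, fitted))
ssr = sst - sse

msr = ssr / (2 - 1)
mse = sse / (n - 2)
f_ratio = msr / mse

r_squared = ssr / sst
r = math.copysign(math.sqrt(r_squared), beta_hat)  # r takes the sign of the slope

f_table = 34.1  # F_{.01,(1,3)}, table value quoted in the slides
reject_h0 = f_ratio > f_table
print(round(f_ratio, 3), round(r_squared, 4), round(r, 4))
```

Without intermediate rounding the F-ratio comes out as 243.0; the slide's 243.006 reflects MSE being rounded to .13333 before dividing.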
Correlation
Degree of linear association between variables X and Y
Ex- Height and weight, Income and consumption
Quantitative measure of the linear association- C.C.
Population data- N units. Observations are $(X_1,Y_1), (X_2,Y_2), \ldots, (X_N,Y_N)$.
$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{\sum(X-\mu_X)(Y-\mu_Y)/N}{\sqrt{\frac{\sum(X-\mu_X)^2}{N} \cdot \frac{\sum(Y-\mu_Y)^2}{N}}} = \frac{\sum(X-\mu_X)(Y-\mu_Y)}{\sqrt{\sum(X-\mu_X)^2 \sum(Y-\mu_Y)^2}}$

Note- $\sum(X-\mu_X)^2 = \sum X^2 - N\mu_X^2$, $\sum(Y-\mu_Y)^2 = \sum Y^2 - N\mu_Y^2$, $\sum(X-\mu_X)(Y-\mu_Y) = \sum XY - N\mu_X\mu_Y$.

Sample data- n units. Observations are $(X_1,Y_1), (X_2,Y_2), \ldots, (X_n,Y_n)$.
$r_{XY} = \frac{S_{XY}}{S_X S_Y} = \frac{\sum(X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum(X-\bar{X})^2 \sum(Y-\bar{Y})^2}} = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$, where $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

Ex (continued) C.C. between income and consumption = $\sum xy/\sqrt{\sum x^2 \sum y^2} = 36/\sqrt{40 \times 32.8}$ = 0.9939.

Regression and correlation-

1) Regression coefficient of Y on X = $b_{YX} = \hat{\beta} = \frac{\sum xy}{\sum x^2} = \frac{S_{XY}}{S_X^2} = \frac{r_{XY} S_Y}{S_X}$
2) SRE of Y on X is $\hat{Y} = \hat{\alpha} + b_{YX}X = (\bar{Y} - b_{YX}\bar{X}) + b_{YX}X \Rightarrow (\hat{Y} - \bar{Y}) = b_{YX}(X - \bar{X})$
3) $-1 \le r \le 1$
Note – Correlation does not necessarily mean causation. E.g. income and expenditure on alcoholic beverages.
Two-variable LRM-Prediction
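Prediction-

1) Predicting the mean of Y-values given $X = X_0$. In notation, predicting $\mu_0 = \mu_{Y|X=X_0} = \alpha + \beta X_0$.
2) Predicting a single Y-value given $X = X_0$. In notation, predicting $Y \mid (X = X_0) = Y_0 = (\alpha + \beta X_0) + \varepsilon_0$
for the model $Y_i = \alpha + \beta X_i + \varepsilon_i$, i = 1, 2, …, n, based on sample data.

Prediction of $\mu_0 = \mu_{Y|X=X_0} = \alpha + \beta X_0$-
Point estimator $\hat{\mu}_0 = \hat{\mu}_{Y|X=X_0} = \hat{\alpha} + \hat{\beta}X_0$
95% C.I. for $\mu_0$ is $\hat{\mu}_0 \mp t_{.025,n-2}\sqrt{\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x^2}\right]\hat{\sigma}^2}$.

Prediction of single $Y \mid (X = X_0) = Y_0 = (\alpha + \beta X_0) + \varepsilon_0$-
Point predictor of $Y_0$ is $\hat{Y}_0 = \hat{\alpha} + \hat{\beta}X_0$
95% P.I. for $Y_0$ is $\hat{Y}_0 \mp t_{.025,n-2}\sqrt{\left[1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x^2}\right]\hat{\sigma}^2}$

Note- 1) Prediction is more precise when $X_0$ is close to $\bar{X}$.
2) The P.I. of $Y_0$ is wider than the C.I. of $\mu_0$.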
Two-variable LRM- Prediction
Ex (continued) – For the income and consumption example, given that income (X) is 15, i) find the point estimator of the population mean consumption, ii) find the 90% C.I. of the population mean consumption, iii) find the point predictor of a single consumption value (Y), iv) find the 90% P.I. for the single consumption value (Y).
Ans.-

i) The point estimator of the mean consumption = $\hat{\mu}_0 = \hat{\mu}_{Y|X=15} = \hat{\alpha} + \hat{\beta}(15)$ = .2 + .9*15 = 13.7

ii) The 90% C.I. of the mean consumption $\mu_0$ is $\hat{\mu}_0 \mp t_{.05,5-2}\sqrt{\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x^2}\right]\hat{\sigma}^2}$
= $13.7 \mp 2.353\sqrt{\left[\frac{1}{5} + \frac{(15-14)^2}{40}\right](.133)}$ or $13.7 \mp .407$ or (13.293, 14.107)

iii) The point predictor of a single consumption value given income = 15 is $\hat{Y}_0 = \hat{\mu}_0$ = 13.7.

iv) The 90% P.I. for $Y_0$ is $\hat{Y}_0 \mp t_{.05,5-2}\sqrt{\left[1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x^2}\right]\hat{\sigma}^2}$
= $13.7 \mp 2.353\sqrt{\left[1 + \frac{1}{5} + \frac{(15-14)^2}{40}\right](.133)}$ or $13.7 \mp .95$ or (12.75, 14.65)
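A minimal sketch of both intervals at income X0 = 15, using the estimates from the earlier parts of the example and the table value t(.05, 3) = 2.353 quoted above. Names are illustrative.

```python
import math

# Estimates from the income and consumption example
n = 5
x_bar = 14.0
Sxx = 40.0                   # sum of squared deviations of X
alpha_hat, beta_hat = 0.2, 0.9
sigma2_hat = 0.4 / (n - 2)   # estimated variance of regression
t_table = 2.353              # t_{.05, 3}, table value quoted in the slides
x0 = 15.0

# Point estimate of the mean / point predictor of a single value
y0_hat = alpha_hat + beta_hat * x0

# Common term 1/n + (X0 - X_bar)^2 / sum(x^2)
spread_term = 1 / n + (x0 - x_bar) ** 2 / Sxx

# Confidence interval for the mean and prediction interval for a single value
ci_half = t_table * math.sqrt(spread_term * sigma2_hat)
pi_half = t_table * math.sqrt((1 + spread_term) * sigma2_hat)
ci = (y0_hat - ci_half, y0_hat + ci_half)
pi = (y0_hat - pi_half, y0_hat + pi_half)

print([round(v, 3) for v in ci], [round(v, 3) for v in pi])
```

The extra 1 inside the square root is what makes the prediction interval wider than the confidence interval. Small differences in the third decimal relative to the slides come from rounding in the hand calculation.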
Design of Experiment- CRD
Experimental research is necessary to establish a cause-and-effect relationship between variables, which descriptive research cannot accomplish. In experimental research we control the effect of the other variables in the environment that influence the variable under study. E.g. when we want to study the effect of income on the buying behavior for a product, we should eliminate the effect of education and age on the buying behavior, as they also influence it.
Completely randomized design (CRD)- Usually it is used to compare more than two population means or treatments. In the testing of hypotheses, we have seen how to compare two population means. Three population means can be compared by comparing the three pairs (µ1,µ2), (µ1,µ3) and (µ2,µ3) separately. But this does not take into account the simultaneous variation of all three samples, so it is not an efficient way of comparing three means simultaneously. CRD can be used for this.
Ex- We want to compare three teaching methods (treatments).
$H_0$: µ1 = µ2 = µ3 vs $H_1$: Above not true, i.e. at least one inequality.
The estimator of the population mean is the sample mean. If the sample means are close to one another, then we can say $H_0$ is true. If instead the variation between them is large compared with the other sources of variation, then we reject $H_0$. So, we have to look at the different components of the total variation. Now we explain the sampling procedure and sample data.

Teaching Method (TM)/Treatment   Scores                                                           Treatment mean $\bar{X}_{i.}$
TM1                              10 = $X_{11}$   12 = $X_{12}$   14 = $X_{13}$   16 = $X_{14}$    $\bar{X}_{1.}$ = 13
TM2                              12 = $X_{21}$   14 = $X_{22}$   16 = $X_{23}$   18 = $X_{24}$    $\bar{X}_{2.}$ = 15
TM3                              20 = $X_{31}$   21 = $X_{32}$   24 = $X_{33}$   26 = $X_{34}$    $\bar{X}_{3.}$ = 22.75
                                                                                                  $\bar{X}_{..}$ = 16.917

$S_1^2$ = 20/3 = 6.667, $S_2^2$ = 20/3 = 6.667, $S_3^2$ = 22.75/3 = 7.583

Completely Randomized Design (CRD)
$X_{ij}$ = Score of the jth student receiving the ith TM (treatment), j = 1, 2, …, $n_i$; i = 1, 2, 3 (= K).
Here n1 = 4, n2 = 4, n3 = 4. Total sample size = n = n1 + n2 + n3 = 12.

$\bar{X}_{i.}$ = Sample mean of the ith treatment = $\sum_j X_{ij}/n_i$.

$\bar{X}_{..}$ = Overall sample mean = $\sum_i \sum_j X_{ij}/n$.

Model- $X_{ij} = \mu_i + \varepsilon_{ij} = \mu + \alpha_i + \varepsilon_{ij}$.
$\mu$ = overall population mean.
$\mu_i$ = mean of the ith population.
$\alpha_i$ = effect of the ith treatment.
$\varepsilon_{ij}$ = error term = deviation of any observation from the corresponding population mean.
Assumptions-
$\varepsilon_{ij} \sim$ IID N(mean = 0, Variance = $\sigma^2_\varepsilon$).

$H_0$: $\mu_1 = \mu_2 = \mu_3$ OR $\alpha_1 = \alpha_2 = \alpha_3$ (no difference between treatments) vs $H_1$: At least one inequality (treatment effects are different)
The total variation in X-values can be split into two terms as shown below.
$\sum_i \sum_j (X_{ij} - \bar{X}_{..})^2 = \sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2 + \sum_i \sum_j (\bar{X}_{i.} - \bar{X}_{..})^2$.
$\sum_i \sum_j (\bar{X}_{i.} - \bar{X}_{..})^2$ = Variation between treatments or sum of squares due to treatments = SSTr.
$\sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2$ = Variation within treatments or sum of squares due to error/residual = SSE
CRD- Analysis of Variance (ANOVA)
Source of variation (SV) | SS | d.f. | MS = SS/d.f. | F-ratio to test $H_0$: $\alpha_1 = \alpha_2 = \alpha_3$ (no difference between treatments)
Due to treatment (between group variation) | SSTr. = $\sum_i \sum_j (\bar{X}_{i.} - \bar{X}_{..})^2$ = 212.1667 | K-1 = 3-1 | MSTr. = SSTr./(K-1) = 106.0833 | $T.S. \mid H_0$ = MSTr./MSE $\sim F_{K-1,n-K}$; computed value = 15.215
Due to error/residual (within group variation) | SSE = $\sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2$ = 62.75 | n-K = 12-3 | MSE = SSE/(n-K) = 6.9722 | If $F_{Comp} > F_{Tabulated}$, then reject $H_0$
Total variation | SST = $\sum_i \sum_j (X_{ij} - \bar{X}_{..})^2$ = 274.9167 | n-1 = 12-1 | |

Computational form of SST, SSE, SSTr.-
SST = $\sum_i \sum_j (X_{ij} - \bar{X}_{..})^2 = \sum_i \sum_j X_{ij}^2 - n\bar{X}_{..}^2 = \sum_i \sum_j X_{ij}^2$ – CF, where CF is the correction factor given by
CF = $(\sum_i \sum_j X_{ij})^2/n = T^2/n$ (T = grand total of all observations)
SSTr. = $\sum_i \sum_j (\bar{X}_{i.} - \bar{X}_{..})^2 = \sum_i \sum_j \bar{X}_{i.}^2$ – CF = $\sum_i n_i \bar{X}_{i.}^2$ – CF
SSE = SST – SSTr.
SSE = $(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2 + \ldots + (n_K - 1)S_K^2$, where $S_i^2$ = Sample variance of the ith treatment = $\sum_j (X_{ij} - \bar{X}_{i.})^2/(n_i - 1)$

Ex (Contd.)
CF = $(\sum_i \sum_j X_{ij})^2/n$ = $(203)^2/12$ = 3434.0833
SST = $\sum_i \sum_j X_{ij}^2$ – CF = 3709 – 3434.0833 = 274.9167
SSTr. = $\sum_i \sum_j \bar{X}_{i.}^2$ – CF = 3646.25 – 3434.0833 = 212.1667
SSE = SST - SSTr. = 274.9167 – 212.1667 = 62.75
MSTr. = SSTr./(K-1) = 212.1667/2 = 106.08335
MSE = SSE/(n – K) = 62.75/9 = 6.97222
$T.S. \mid H_0$ = MSTr./MSE $\sim F_{K-1,n-K} = F_{2,9}$
F computed = 106.08335/6.97222 = 15.215 > 8.02 = $F_{.01,(2,9)}$ ⟹ Reject $H_0$.
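The one-way ANOVA computations above can be sketched with the correction-factor formulas. Group labels and variable names are illustrative; the critical value 8.02 is the table value quoted in the slides.

```python
# Scores for the three teaching methods (treatments)
groups = {
    "TM1": [10, 12, 14, 16],
    "TM2": [12, 14, 16, 18],
    "TM3": [20, 21, 24, 26],
}

n = sum(len(scores) for scores in groups.values())
K = len(groups)

# Correction factor: (grand total)^2 / n
grand_total = sum(sum(scores) for scores in groups.values())
cf = grand_total ** 2 / n

# Total, treatment and error sums of squares
sst = sum(x * x for scores in groups.values() for x in scores) - cf
sstr = sum(sum(scores) ** 2 / len(scores) for scores in groups.values()) - cf
sse = sst - sstr

mstr = sstr / (K - 1)
mse = sse / (n - K)
f_ratio = mstr / mse

f_table = 8.02  # F_{.01,(2,9)}, table value quoted in the slides
reject_h0 = f_ratio > f_table
print(round(sstr, 4), round(sse, 4), round(f_ratio, 3))
```

The treatment term uses (group total)² / n_i, which equals n_i times the squared group mean, so it matches the double-sum form given above.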
CRD- Multiple comparison procedure
Multiple comparison procedure- From the F-test of the analysis of variance, we find that the population mean scores of the three populations are not all equal. We may like to know where this difference arises. We can use a multiple pairwise comparison procedure for this, as given by Fisher.
Fisher's Least Significant Difference (LSD) method-
We compare one pair of population means at a time. There are three pairs (1,2), (1,3) and (2,3). The procedure is given below.
1) $H_0$: $\mu_i = \mu_j$ vs $H_1$: $\mu_i \ne \mu_j$.
2) $T.S. \mid H_0 = \frac{\bar{X}_{i.} - \bar{X}_{j.}}{\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}} \sim t_{n-K}$
3) If |t-computed| > t-tabulated = $t_{\alpha/2,n-K}$, then reject $H_0$.

Ex- (Continued) Comparing TM1 with TM2- Use $\alpha$ = .05.
1) $H_0$: $\mu_1 = \mu_2$ ($\alpha_1 = \alpha_2$) vs $H_1$: $\mu_1 \ne \mu_2$ ($\alpha_1 \ne \alpha_2$)
2) $T.S. \mid H_0 = \frac{\bar{X}_{1.} - \bar{X}_{2.}}{\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim t_{n-K} = t_{12-3} = t_9$
3) T.S. computed = $\frac{13 - 15}{\sqrt{6.972\left(\frac{1}{4} + \frac{1}{4}\right)}} = \frac{-2}{1.867}$ = -1.071. |t-comp.| = 1.071 < 2.262 = $t_{.025,9}$
⟹ Do not reject $H_0$. ⟹ Population mean scores of TM1 and TM2 are not significantly different.
In a similar manner, comparisons can be made for the pairs (1,3) and (2,3).

An alternative approach can be adopted using Fisher's LSD method, where the T.S. $(\bar{X}_{i.} - \bar{X}_{j.})$ is compared with the LSD.
1) $H_0$: $\mu_i = \mu_j$ vs $H_1$: $\mu_i \ne \mu_j$
2) T.S. = $(\bar{X}_{i.} - \bar{X}_{j.})$
3) LSD = $t_{\alpha/2,n-K}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$
4) If $|\bar{X}_{i.} - \bar{X}_{j.}|$ > LSD, then reject $H_0$.
CRD- Multiple comparison
Ex- (Continued) Since we have the same sample size $n_i$ = 4 from all the three populations, the LSD will be the same for all the three pairs (1,2), (1,3), (2,3). Let us use $\alpha$ = .05.

LSD = $t_{\alpha/2,n-K}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} = t_{.025,9}\sqrt{6.972\left(\frac{1}{4} + \frac{1}{4}\right)}$ = 2.262(1.867) = 4.223.

For (TM1,TM2) pair - $|\bar{X}_{1.} - \bar{X}_{2.}|$ = |13-15| = 2 < 4.223 ⟹ Do not reject $H_0$. ⟹ Means not different.
For (TM1,TM3) pair - $|\bar{X}_{1.} - \bar{X}_{3.}|$ = |13-22.75| = 9.75 > 4.223 ⟹ Reject $H_0$. ⟹ Means different.
For (TM2,TM3) pair - $|\bar{X}_{2.} - \bar{X}_{3.}|$ = |15-22.75| = 7.75 > 4.223 ⟹ Reject $H_0$. ⟹ Means different.
From the above, we can conclude that the population mean score of TM3 is different from both TM1 and TM2.

Confidence interval approach-
(1 - $\alpha$) 100% C.I. for $(\mu_i - \mu_j)$ is $(\bar{X}_{i.} - \bar{X}_{j.}) \mp$ LSD.
For (TM1,TM2) pair- $(\bar{X}_{1.} - \bar{X}_{2.}) \mp$ LSD OR $-2 \mp 4.223$ OR (-6.223, 2.223). 0 ∈ C.I. ⟹ Population means of TM1 and TM2 are not different.
For (TM1,TM3) pair- $(\bar{X}_{1.} - \bar{X}_{3.}) \mp$ LSD OR $-9.75 \mp 4.223$ OR (-13.973, -5.527). 0 ∉ C.I. ⟹ Population means of TM1 and TM3 are different.
For (TM2,TM3) pair- $(\bar{X}_{2.} - \bar{X}_{3.}) \mp$ LSD OR $-7.75 \mp 4.223$ OR (-11.973, -3.527). 0 ∉ C.I. ⟹ Population means of TM2 and TM3 are different.
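The three pairwise comparisons can be sketched in one loop. This assumes equal sample sizes of 4 per treatment, as in the example, and takes the table value t(.025, 9) = 2.262 from the slides; names are illustrative.

```python
import math
from itertools import combinations

# Treatment means and ANOVA quantities from the example
means = {"TM1": 13.0, "TM2": 15.0, "TM3": 22.75}
n_per_group = 4
mse = 62.75 / 9        # SSE / (n - K)
t_table = 2.262        # t_{.025, 9}, table value quoted in the slides

# Fisher's least significant difference (same for every pair here)
lsd = t_table * math.sqrt(mse * (1 / n_per_group + 1 / n_per_group))

significant = {}
intervals = {}
for a, b in combinations(means, 2):
    diff = means[a] - means[b]
    significant[(a, b)] = abs(diff) > lsd          # reject H0 when True
    intervals[(a, b)] = (diff - lsd, diff + lsd)   # C.I. for mu_a - mu_b

print(round(lsd, 3))
for pair, flag in significant.items():
    print(pair, flag, [round(v, 3) for v in intervals[pair]])
```

A pair is flagged as different exactly when its interval excludes 0, so the test and the confidence interval approach always agree.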
