Two Sample Updated Test

Two Sample

Gauranga C. Samanta

Assistant Professor
Department of Mathematics
BITS Pilani K K Birla Goa Campus, Goa

26th November 2019


Test Concerning a Population Proportion

Let p denote the proportion of individuals or objects in a
population who possess a specified property (e.g., cars with
manual transmissions or smokers who smoke a filter cigarette).
If an individual or object with the property is labeled a success
(S), then p is the population proportion of successes.
Tests concerning p will be based on a random sample of size n
from the population. Provided that n is small relative to the
population size, X (the number of S's in the sample) has
(approximately) a binomial distribution.
Furthermore, if n itself is large [np ≥ 10 and n(1 − p) ≥ 10],
both X and the estimator p̂ = X/n are approximately normally
distributed.

Gauranga C. Samanta BITS Pilani, K K Birla Goa Campus, Goa


Test Concerning a Population Proportion Cont.

Null hypothesis: H0 : p = p0
Test statistic value: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
Alternative Hypothesis: Rejection Region
H1 : p > p0    z ≥ z_α (upper-tailed)
H1 : p < p0    z ≤ −z_α (lower-tailed)
H1 : p ≠ p0    either z ≥ z_{α/2} or z ≤ −z_{α/2} (two-tailed)
These test procedures are valid provided that np0 ≥ 10 and
n(1 − p0) ≥ 10.


Example

Example
Natural cork in wine bottles is subject to deterioration, and as a
result wine in such bottles may experience contamination. The
article “Effects of Bottle Closure Type on Consumer Perceptions of
Wine Quality” (Amer. J. of Enology and Viticulture, 2007:
182-191) reported that, in a tasting of commercial chardonnays, 16
of 91 bottles were considered spoiled to some extent by
cork-associated characteristics. Does this data provide strong
evidence for concluding that more than 15% of all such bottles are
contaminated in this way? Let’s carry out a test of hypotheses
using a significance level of 0.10.


Solution
p = the true proportion of all commercial chardonnay bottles
considered spoiled to some extent by cork-associated
characteristics.
The null hypothesis is H0 : p = 0.15.
The alternative hypothesis is H1 : p > 0.15, the assertion that
the population percentage exceeds 15%.
Since np0 = 91(0.15) = 13.65 > 10 and
nq0 = 91(0.85) = 77.35 > 10, the large sample z test can
be used.
The test statistic value is $z = \dfrac{\hat{p} - 0.15}{\sqrt{(0.15)(0.85)/91}}$.
The form of H1 implies that an upper-tailed test is
appropriate: Reject H0 if z ≥ z_{0.10} = 1.28.
p̂ = 16/91 = 0.1758, z = 0.69
Since 0.69 < 1.28, z is not in the rejection region. At
significance level 0.10, the null hypothesis cannot be rejected.
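
The arithmetic in this solution can be checked with a short script. The sketch below is only an illustration: the function name `one_proportion_z` is introduced here and is not part of the lecture, and the code relies only on Python's standard library.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(x, n, p0):
    """Return (p_hat, z) for the large-sample test of H0: p = p0.

    Hypothetical helper for illustration; assumes n*p0 >= 10 and n*(1 - p0) >= 10.
    """
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return p_hat, z

# Cork example from the text: 16 spoiled bottles out of 91, H0: p = 0.15
p_hat, z = one_proportion_z(16, 91, 0.15)

alpha = 0.10
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value z_alpha
reject = z >= z_crit                      # upper-tailed rejection rule

print(f"p_hat={p_hat:.4f}, z={z:.2f}, z_crit={z_crit:.2f}, reject={reject}")
# prints: p_hat=0.1758, z=0.69, z_crit=1.28, reject=False
```

Computing the critical value from `NormalDist().inv_cdf` rather than a table makes it easy to rerun the check at a different significance level.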

Test on single proportion for Binomial RV

To test the hypothesis H0 : p = p0 against H1 : p < p0, we use the
binomial distribution to compute the P-value
P = P(X ≤ x when p = p0).
The value x is the number of successes in our sample of size n.
If this P-value is less than or equal to α, our test is significant
at the α level and we reject H0 in favor of H1. Similarly,
to test the hypothesis H0 : p = p0 against H1 : p > p0, we use the
binomial distribution to compute the P-value
P = P(X ≥ x when p = p0)
and reject H0 in favor of H1 if this P-value is less than or
equal to α.


Test on single proportion for Binomial RV
To test the hypothesis H0 : p = p0 against H1 : p ≠ p0 at level of significance (LOS) α:
P = 2P(X ≤ x when p = p0) if x < np0
or
P = 2P(X ≥ x when p = p0) if x > np0
Reject H0 in favor of H1 if the computed P-value is less than
or equal to α.
The steps for testing a null hypothesis about a proportion against
various alternatives are as follows:
H0 : p = p0
One of the alternatives H1 : p < p0, p > p0, or p ≠ p0
Choose a level of significance equal to α.
Test statistic: Binomial variable X with p = p0.
Computations: Find x, the number of successes, and compute
the appropriate P-value.
Decision: Draw appropriate conclusions based on the P-value.

Example

Example
A builder claims that heat pumps are installed in 70% of all homes
being constructed today in the city of Goa. Would you agree with
this claim if a random survey of new homes in this city showed
that 8 out of 15 had heat pumps installed? Use a 0.10 level of
significance.


Solution

H0 : p = 0.7
H1 : p ≠ 0.7
α = 0.10
Test statistic: binomial variable X with p = 0.7 and n = 15.
Computations: x = 8 and np0 = 15(0.7) = 10.5. Since x < np0,
the computed P-value is
$P = 2P(X \le 8 \text{ when } p = 0.7) = 2\sum_{x=0}^{8} b(x; 15, 0.7) = 0.2622 > 0.10$
Decision: Do not reject H0. Conclude that there is
insufficient reason to doubt the builder’s claim.
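
The binomial P-value in this solution can be reproduced without tables. The following is a minimal sketch: the names `binom_pmf` and `two_sided_binom_p` are introduced here for illustration, and two details are assumptions not covered by the slides: the case x = np0 is treated like x > np0, and the doubled tail is capped at 1.

```python
from math import comb

def binom_pmf(k, n, p):
    """b(k; n, p): probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_binom_p(x, n, p0):
    """Doubled-tail P-value for H0: p = p0 against H1: p != p0.

    Assumption: x == n*p0 is handled by the upper-tail branch,
    and the result is capped at 1.
    """
    if x < n * p0:
        tail = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))   # P(X <= x)
    else:
        tail = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))   # P(X >= x)
    return min(1.0, 2 * tail)

# Heat pump example: x = 8 successes in n = 15 homes, H0: p = 0.7
p_value = two_sided_binom_p(8, 15, 0.7)
alpha = 0.10
reject = p_value <= alpha
print(f"P-value = {p_value:.3f}, reject H0: {reject}")
```

The exact sum agrees with the tabulated 0.2622 to about three decimal places; the small difference comes from doubling a table value that was already rounded.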


Example

Example
A commonly prescribed drug for relieving nervous tension is
believed to be only 60% effective. Experimental results with a new
drug administered to a random sample of 100 adults who were
suffering from nervous tension show that 70 received relief. Is this
sufficient evidence to conclude that the new drug is superior to the
one commonly prescribed? Use a 0.05 level of significance.


Solutions

H0 : p = 0.6
H1 : p > 0.6
α = 0.05
Critical region: z > 1.645
Computations: x = 70, n = 100, p̂ = 70/100 = 0.7, and
$z = \dfrac{0.7 - 0.6}{\sqrt{(0.6)(0.4)/100}} = 2.04$,
P = P(Z > 2.04) = 0.0207 < 0.05
Decision: Reject H0 and conclude that the new drug is
superior.


Inferences Based on Two Samples

Basic Assumptions:
X1, X2, ..., Xm is a random sample from a distribution with
mean µ1 and variance σ1².
Y1, Y2, ..., Yn is a random sample from a distribution with
mean µ2 and variance σ2².
The X and Y samples are independent of one another.

Theorem
The expected value of X̄ − Ȳ is µ1 − µ2, so X̄ − Ȳ is an unbiased
estimator of µ1 − µ2. The sd of X̄ − Ȳ is
$\sigma_{\bar{X} - \bar{Y}} = \sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}$


Test Procedures for Normal Population with Known
Variances

Null Hypothesis: H0 : µ1 − µ2 = d0
Test statistic value: $z = \dfrac{\bar{x} - \bar{y} - d_0}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$

Alternative Hypothesis: Rejection Region
H1 : µ1 − µ2 > d0    z ≥ z_α (upper-tailed)
H1 : µ1 − µ2 < d0    z ≤ −z_α (lower-tailed)
H1 : µ1 − µ2 ≠ d0    either z ≥ z_{α/2} or z ≤ −z_{α/2} (two-tailed)


Example

Example
Analysis of a random sample consisting of m = 20 specimens of
cold-rolled steel to determine yield strengths resulted in a sample
average strength of x̄ = 29.8 ksi. A second random sample of
n = 25 two-sided galvanized steel specimens gave a sample average
strength of ȳ = 34.7 ksi. Assuming that the two yield-strength
distributions are normal with σ1 = 4.0 and σ2 = 5.0, does the
data indicate that the corresponding true average yield strengths
µ1 and µ2 are different? Let’s carry out a test at significance level
α = 0.01.


Solutions

The parameter of interest is µ1 − µ2, the difference between
the true average strengths for the two types of steel.
The null hypothesis is H0 : µ1 − µ2 = 0.
The alternative hypothesis is H1 : µ1 − µ2 ≠ 0; if H1 is true,
then µ1 and µ2 are different.
With d0 = 0, the test statistic value is
$z = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$
The inequality in H1 implies that the test is two-tailed. For
α = 0.01, α/2 = 0.005, and z_{α/2} = z_{0.005} = 2.58, H0 will be
rejected if z ≥ 2.58 or if z ≤ −2.58.
Computations:
m = 20, x̄ = 29.8, σ1² = 16, n = 25, ȳ = 34.7, σ2² = 25
$z = \dfrac{29.8 - 34.7}{\sqrt{\dfrac{16}{20} + \dfrac{25}{25}}} = \dfrac{-4.90}{1.34} = -3.66$

Decision: Since −3.66 ≤ −2.58, z falls in the rejection region, and H0 is therefore rejected.
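
As a cross-check of this computation, here is a small sketch of the known-variance two-sample z statistic. The function name `two_sample_z` and its argument order are assumptions made for this illustration only.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar, ybar, var1, var2, m, n, d0=0.0):
    """z statistic for H0: mu1 - mu2 = d0 when both population variances are known."""
    return (xbar - ybar - d0) / sqrt(var1 / m + var2 / n)

# Steel example: m = 20, xbar = 29.8, sigma1^2 = 16; n = 25, ybar = 34.7, sigma2^2 = 25
z = two_sample_z(29.8, 34.7, 16.0, 25.0, 20, 25)

alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value z_{alpha/2}
reject = abs(z) >= z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

Without intermediate rounding the statistic is about −3.65; the slide's −3.66 results from first rounding the denominator to 1.34. The decision is the same either way.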


Large Sample Tests

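When the population variances are unknown and both samples are large, use the test statistic value
$z = \dfrac{\bar{x} - \bar{y} - d_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$
with the same rejection regions as in the known-variance case.

These tests are usually appropriate if both m > 40 and n > 40.
A P-value is computed exactly as it was for our earlier z
tests.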


Example

Example
What impact does fast-food consumption have on various dietary
and health characteristics? The article “Effects of Fast-Food
Consumption on Energy Intake and Diet Quality Among Children
in a National Household Study” (Pediatrics, 2004: 112-118)
reported the accompanying summary data on daily calorie intake
both for a sample of teens who said they did not typically eat fast
food and another sample of teens who said they did usually eat
fast food:
m = 663, n = 413, x̄ = 2258, ȳ = 2637, s1 = 1519, s2 = 1138.
Does this data provide strong evidence for concluding that true
average calorie intake for teens who typically eat fast food exceeds
by more than 200 calories per day the true average intake for those
who don’t typically eat fast food? Let’s investigate by carrying out
a test of hypotheses at a significance level of approximately 0.05.


Solutions
The parameter of interest is µ1 − µ2, where µ1 is the true
average calorie intake for teens who don’t typically eat fast
food and µ2 is the true average intake for teens who do typically
eat fast food. The hypotheses of interest are
H0 : µ1 − µ2 = −200 versus H1 : µ1 − µ2 < −200
The alternative hypothesis asserts that true average daily
intake for those who typically eat fast food exceeds that for
those who don’t by more than 200 calories. The test statistic
value is
$z = \dfrac{\bar{x} - \bar{y} - (-200)}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$
The inequality in H1 implies that the test is lower-tailed; H0
should be rejected if z ≤ −z_{0.05} = −1.645. The calculated
test statistic value is
$z = \dfrac{2258 - 2637 + 200}{\sqrt{\dfrac{(1519)^2}{663} + \dfrac{(1138)^2}{413}}} = \dfrac{-179}{81.34} = -2.20$
Since −2.20 < −1.645, the null hypothesis is rejected.
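
The same conclusion can be reached through a P-value. The sketch below recomputes the statistic from the summary data and adds the lower-tailed P-value, which the slide does not report; the variable names are chosen here purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from the fast-food example
m, n = 663, 413
xbar, ybar = 2258.0, 2637.0   # sample means
s1, s2 = 1519.0, 1138.0       # sample standard deviations
d0 = -200.0                   # null value of mu1 - mu2

std_err = sqrt(s1**2 / m + s2**2 / n)
z = (xbar - ybar - d0) / std_err

# Lower-tailed test: the P-value is the area to the left of z
p_value = NormalDist().cdf(z)
alpha = 0.05
reject = p_value <= alpha

print(f"std_err = {std_err:.2f}, z = {z:.2f}, P-value = {p_value:.3f}, reject H0: {reject}")
```

The P-value comes out at roughly 0.014, which is below 0.05 and therefore consistent with rejecting H0 via the critical-value rule.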

Confidence Intervals for µ1 − µ2

When both population distributions are normal, standardizing
X̄ − Ȳ gives a random variable Z with a standard normal
distribution. Since the area under the z curve between −z_{α/2} and
z_{α/2} is 1 − α, it follows that
$P\left(-z_{\alpha/2} < \dfrac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}} < z_{\alpha/2}\right) = 1 - \alpha$
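
Rearranging the inequalities inside this probability statement to isolate µ1 − µ2 gives the corresponding interval. The step is written out here as a sketch, under the same assumptions of normal populations and known σ1 and σ2:

```latex
\bar{X} - \bar{Y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}
\;<\; \mu_1 - \mu_2 \;<\;
\bar{X} - \bar{Y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}
```

Substituting the observed sample means, a 100(1 − α)% confidence interval for µ1 − µ2 is x̄ − ȳ ± z_{α/2} √(σ1²/m + σ2²/n).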


Two Sample t Test and Confidence Interval

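When the population distributions are both normal, the
standardized variable
$T = \dfrac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$
has approximately a t distribution with ν df,
where ν can be defined as
$\nu = \dfrac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m - 1} + \dfrac{(s_2^2/n)^2}{n - 1}}$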
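
Because the degrees-of-freedom formula is easy to mistype by hand, a short sketch of it may help. The function name `welch_df` is an assumption of this illustration; the sample values are borrowed from the server exercise that appears later in these slides, where the first sample has 30 observations and the second has 20.

```python
def welch_df(s1, m, s2, n):
    """Approximate degrees of freedom for the two-sample t procedures.

    s1, s2 are the sample standard deviations; m, n are the sample sizes.
    """
    v1 = s1**2 / m   # estimated variance of the first sample mean
    v2 = s2**2 / n   # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (m - 1) + v2**2 / (n - 1))

# s1 = 0.6 with 30 observations, s2 = 1.2 with 20 observations
nu = welch_df(0.6, 30, 1.2, 20)
print(f"nu = {nu:.1f}")
# prints: nu = 25.4
```

The result always lies between min(m − 1, n − 1) and m + n − 2, which gives a quick sanity check on any hand calculation.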


Two Sample t Test and Confidence Interval

The two-sample t confidence interval for µ1 − µ2 with
confidence level 100(1 − α)% is then
$\bar{x} - \bar{y} \pm t_{\alpha/2,\nu}\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}$
The two-sample t test for testing H0 : µ1 − µ2 = d0 is as
follows:
Test statistic value: $t = \dfrac{\bar{x} - \bar{y} - d_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$
Alternative Hypothesis: Rejection Region
H1 : µ1 − µ2 > d0    t ≥ t_{α,ν} (upper-tailed)
H1 : µ1 − µ2 < d0    t ≤ −t_{α,ν} (lower-tailed)
H1 : µ1 − µ2 ≠ d0    either t ≥ t_{α/2,ν} or t ≤ −t_{α/2,ν} (two-tailed)


Confidence Interval for a Population Proportion

Let p denote the proportion of "successes" in a population,
where success identifies an individual or object that has a
specified property.
A random sample of n individuals or objects is to be selected,
and X is the number of successes in the sample. X can be
regarded as a binomial rv with E(X) = np and
σ_X = √(np(1 − p)).
Furthermore, if both np ≥ 10 and nq ≥ 10 (with q = 1 − p), X has
approximately a normal distribution.
The natural estimator of p is p̂ = X/n, the sample fraction of
successes.
A confidence interval for a population proportion is given by
$\left(\hat{p} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}\right)$, where q̂ = 1 − p̂.


Questions

Example
The pressure (Y) is determined by measurements as a function of
(x). A sample of size 25 was taken and the resulting data gave the
following: $\sum x_i = 1225$, $\sum y_i = 3673$, $\sum x_i y_i = 242393$,
$\sum x_i^2 = 80825$, $\sum y_i^2 = 726939$. Test the hypothesis that the slope is 3
at the 5% level of significance. Also find the estimated regression line
(round to four decimal places).

Example
Find the maximum likelihood estimator for θ in the probability
density function
$f(x) = \begin{cases} 3\theta^3 x^{-4}, & \text{if } x \ge \theta > 0 \\ 0, & \text{otherwise,} \end{cases}$
and check whether the estimator is biased or unbiased.


Questions

Example
The data regarding the production of wheat in tons (X) and the
price of the flour in kilos (Y) in Spain during the 1980s
were

X 30 28 32 25 25 25 22 24 35 40
Y 25 30 27 40 42 40 50 45 30 25

Find the regression line using the method of least squares and
compute a 95% confidence interval for the slope of the regression
line.


Questions

Example
The temperature Y is determined by measurements as a function
of depth x. A sample of size 20 was taken and the resulting data
gave:
$\sum_{i=1}^{20} x_i = 1050$, $\sum_{i=1}^{20} y_i = 5184$, $\sum_{i=1}^{20} x_i y_i = 355755$,
$\sum_{i=1}^{20} x_i^2 = 71750$, $\sum_{i=1}^{20} y_i^2 = 1764182$.
(a) Find the estimated regression line.
(b) Test the hypothesis that the slope is 5 at the 5% level of significance.


Questions

Example
To estimate the average time required for certain repairs, an
automobile manufacturer engaged 40 mechanics, a random sample,
and measured the time taken by each of them in the performance
of this task. If it took them on an average 24.05 minutes with a
standard deviation of 2.68 minutes, what can the manufacturer
assert with 95% confidence about the maximum error?


Questions

Example
An account on server A is more expensive than an account on
server B. However, server A is faster. To decide whether it is
optimal to go with the faster but more expensive server, a manager
needs to know how much faster it is. A certain computer algorithm
is executed 30 times on server A and 20 times on server B with the
following results

Server- A Server-B
Sample mean 6.7 7.5
Sample sd 0.6 1.2

Construct a 95% confidence interval for the difference µ1 − µ2
between the mean execution times on server A and server B.


Solutions

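We have n = 30, m = 20, X̄ = 6.7, Ȳ = 7.5, s_X = 0.6, and
s_Y = 1.2. We use the method for unknown, unequal variances.
We find the degrees of freedom:
$\nu = \dfrac{\left(\dfrac{(0.6)^2}{30} + \dfrac{(1.2)^2}{20}\right)^2}{\dfrac{(0.6)^4}{(30)^2(29)} + \dfrac{(1.2)^4}{(20)^2(19)}} = 25.4$
and find t_{0.025,25} = 2.060.

The confidence interval is
$\bar{X} - \bar{Y} \pm t_{\alpha/2}\sqrt{\dfrac{s_X^2}{n} + \dfrac{s_Y^2}{m}} = 6.7 - 7.5 \pm (2.060)\sqrt{\dfrac{0.36}{30} + \dfrac{1.44}{20}} = -0.8 \pm 0.6$
ANS: [−1.4, −0.2]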
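
The interval can be verified numerically. This sketch takes the critical value 2.060 from the solution as a given constant instead of computing a t quantile, since Python's standard library has no t distribution; the variable names are chosen here for illustration.

```python
from math import sqrt

# Summary data as used in the solution: X sample has n = 30, Y sample has m = 20
n, m = 30, 20
x_bar, y_bar = 6.7, 7.5
s_x, s_y = 0.6, 1.2
t_crit = 2.060  # t_{0.025,25}, taken from the solution (table value)

half_width = t_crit * sqrt(s_x**2 / n + s_y**2 / m)
lower = (x_bar - y_bar) - half_width
upper = (x_bar - y_bar) + half_width

print(f"[{lower:.1f}, {upper:.1f}]")
# prints: [-1.4, -0.2]
```

Both endpoints are negative, so at the 95% level the data point to a shorter mean execution time on server A.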


Questions

Example
Internet connections are often slowed by delays at nodes. Let us
determine if the delay time increases during heavy-volume times.
Five hundred packets are sent through the same network between
5pm and 6pm (sample X), and three hundred packets are sent
between 10pm and 11pm (sample Y). The early sample has a
mean delay time of 0.8 sec with a standard deviation of 0.1 sec,
whereas the second sample has a mean delay time of 0.5 sec with a
standard deviation of 0.08 sec. Construct a 99.5% confidence
interval for the difference between the mean delay times.


Questions

Example
A manager evaluates the effectiveness of a major hardware upgrade by
running a certain process 50 times before the upgrade and 50 times
after it. Based on these data, the average running time is 8.5
minutes before the upgrade, 6.2 minutes after it. Historically, the
standard deviation has been 1.8 minutes, and presumably it has not
changed. Construct a 90% confidence interval showing how much
the mean running time reduced due to the hardware upgrade.


Questions

Example
The number of concurrent users for some internet service provider
has always averaged 5000 with a standard deviation of 800. After
an equipment upgrade, the average number of users at 100
randomly selected moments of time is 5200. Does it indicate, at a
5% level of significance, that the mean number of concurrent users
has increased? Assume that the standard deviation of the number
of concurrent users has not changed.


Questions

Example
A quality inspector finds 10 defective parts in a sample of 500
parts received from manufacturer A. Out of 400 parts from
manufacturer B, she finds 12 defective ones. A computer-making
company uses these parts in their computers and claims that the
quality of parts produced by A and B is the same. At the 5% level
of significance, do we have enough evidence to disprove this claim?


THANK YOU
