
CHI-SQUARE APPLICATIONS

Karlina Sari, SE., MA.


Statistics II
November 18th, 2010
Introduction
• Chi-square is a family of probability
distributions
$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
• Χ² ≥ 0
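As a rough illustration, the statistic (n − 1)s²/σ², which is the one used when testing a standard deviation, can be evaluated as below. The function name and the numbers are hypothetical and are not taken from the slides.

```python
def variance_chi_square(n, s, sigma0):
    """Chi-square statistic (n - 1) * s^2 / sigma0^2.

    n      : sample size
    s      : sample standard deviation
    sigma0 : hypothesized population standard deviation
    """
    if n < 2 or sigma0 <= 0:
        raise ValueError("need n >= 2 and sigma0 > 0")
    return (n - 1) * s ** 2 / sigma0 ** 2


# Hypothetical values, chosen only for illustration.
statistic = variance_chi_square(n=10, s=3.0, sigma0=2.0)
print(statistic)
```

Because s² and σ² are both squared quantities, the result can never be negative, which matches Χ² ≥ 0.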
The Purpose of Chi-Square Test
• Goodness-of-fit test
• Testing the independence of two variables
• Comparing proportions from k independent
samples
• Testing the standard deviation
Goodness-of-fit Test
1. H0 : The sample is from the specified population
H1 : The sample is not from the specified
population
2. $\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$

k = number of categories/cells in the table


Oj = observed frequency in cell j
Ej = expected frequency in cell j
3. Find X²α where df = k-1-m (m is the number of
population parameters estimated from the sample, e.g., µ or σ)
4. Testing criteria :
Calculated X² > X²α → H0 is rejected
Calculated X² ≤ X²α → H0 is not rejected
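The four steps above can be sketched in a few lines of Python. This is a minimal sketch, not a full implementation: the function names are made up for illustration, the data are hypothetical, and the critical value is read from a chi-square table rather than computed.

```python
def chi_square_gof(observed, expected):
    """Return the sum over cells of (O_j - E_j)^2 / E_j."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def gof_degrees_of_freedom(k, m=0):
    """df = k - 1 - m, with m parameters estimated from the sample."""
    return k - 1 - m


def reject_h0(statistic, critical_value):
    """Testing criterion: reject H0 when the calculated X^2 exceeds X^2_alpha."""
    return statistic > critical_value


# Hypothetical data: 60 observations over three equally likely cells.
observed = [10, 20, 30]
expected = [20.0, 20.0, 20.0]

statistic = chi_square_gof(observed, expected)
df = gof_degrees_of_freedom(k=len(observed))  # no estimated parameters here
critical = 5.991  # chi-square table value for alpha = 0.05, df = 2

print(statistic, df, reject_h0(statistic, critical))
```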
Example 1
• A researcher in the department of education has
collected data consisting of a random sample of 300
SAT scores of high school seniors in her state who
took the college entrance examination last year. The
sample mean is 945.04 and the standard deviation is
142.61. A frequency distribution for the scores shows
the following distribution :
Category    <800  800-899  900-999  1000-1099  1100-1199  1200-1299  1300-1399  >1399
f             36       96       78         48         25         10          3      4
Based on these sample data, use the 0.01 level
of significance in determining whether the
sample could have been drawn from a
population in which the scores are normally
distributed.
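One possible way to set up Example 1 is sketched below. It assumes the class boundaries fall at 800, 900, ..., 1400, treats the first category as everything below 800, and uses the sample mean and standard deviation as estimates of µ and σ, so m = 2. The critical value is taken from a chi-square table.

```python
from statistics import NormalDist

observed = [36, 96, 78, 48, 25, 10, 3, 4]
# Assumed boundaries between adjacent categories (not stated in the slides).
boundaries = [800, 900, 1000, 1100, 1200, 1300, 1400]
n = sum(observed)  # 300 scores

# Normal distribution with the sample mean and standard deviation.
dist = NormalDist(mu=945.04, sigma=142.61)

# Cumulative probabilities at the boundaries, padded with 0 and 1
# so that the two open-ended categories are covered.
cdf = [0.0] + [dist.cdf(b) for b in boundaries] + [1.0]

# Expected frequency of each cell = n * P(score falls in that cell).
expected = [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 2  # k - 1 - m, two estimated parameters
critical = 15.086  # chi-square table value for alpha = 0.01, df = 5

for o, e in zip(observed, expected):
    print(f"O = {o:3d}   E = {e:7.2f}")
print("chi-square:", round(chi2, 2), " df:", df, " reject H0:", chi2 > critical)
```

The upper categories have very small expected frequencies under this setup. Many textbooks merge such cells with their neighbors before computing the statistic, which reduces k and therefore df; this sketch does not do that.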
Testing the Independence of Two Variables

1. H0 : The variables are independent of each other


H1 : The variables are not independent of each
other
2. $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
r = number of rows; k = number of columns;
Oij = observed frequency in row i column j;
Eij = expected frequency in row i column j
3. Find X²α where df = (r-1)(k-1)
Example 2
A traffic-safety researcher has observed 500 vehicles at
a stop sign in a suburban neighborhood and recorded
the type of vehicle and driver behavior at the stop
sign. At the 0.05 level of significance, could there be
some relationship between driver behavior and the
type of vehicle being driven?
                     Behavior at Stop Sign
Type of Vehicle   Stopped   Coasted   Ran it   Total
Sedan                 183       107       60     350
Wagon                  54        27       19     100
Pickup                 14        20       16      50
Total                 251       154       95     500
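A possible setup for Example 2 is sketched below. The slides do not say how Eij is obtained; the sketch uses the usual estimate under H0, Eij = (row i total × column j total) / n. The function names are hypothetical and the critical value is read from a chi-square table.

```python
def expected_frequencies(table):
    """E_ij = (row i total) * (column j total) / n for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_table(table):
    """Sum of (O_ij - E_ij)^2 / E_ij over all cells of the table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: Sedan, Wagon, Pickup. Columns: Stopped, Coasted, Ran it.
observed = [
    [183, 107, 60],
    [54, 27, 19],
    [14, 20, 16],
]

chi2 = chi_square_table(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r-1)(k-1)
critical = 9.488  # chi-square table value for alpha = 0.05, df = 4

print("chi-square:", round(chi2, 2), " df:", df, " reject H0:", chi2 > critical)
```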
Comparing Proportions from k
Independent Samples
1. H0 : π1 = π2 = π3 = …. = πk
H1 : At least one of the πj values is different
2. r k (O  E ) 2
  
2 ij ij

i 1 j 1 Eij
r = number of rows; k = number of columns;
Oij = observed frequency in row i column j;
Eij = expected frequency in row i column j
3. Find X²α where df = (r-1)(k-1)
Example 3
The contingency table below was obtained from survey data of
the Bureau of the Census:
                                       Period
Moved to a Different State   ’70-’74   ’75-’79   ’80-’84   Total
Yes                               93        91       174     358
No                               907       909      1826    3642
Total                           1000      1000      2000    4000

Use the 0.05 level in testing whether the proportions from the
three periods could be the same.
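Example 3 can also be approached through the pooled proportion, which gives the same expected frequencies as the row-total × column-total rule. The sketch below is one way to do this; the variable names are illustrative and the critical value is read from a chi-square table.

```python
# "Yes" counts and sample sizes for the three periods.
moved = [93, 91, 174]
sample_sizes = [1000, 1000, 2000]

# Under H0 all periods share one proportion, estimated by pooling the samples.
pooled = sum(moved) / sum(sample_sizes)  # 358 / 4000

chi2 = 0.0
for yes, n_j in zip(moved, sample_sizes):
    no = n_j - yes
    e_yes = n_j * pooled        # expected "Yes" count in this sample
    e_no = n_j * (1 - pooled)   # expected "No" count in this sample
    chi2 += (yes - e_yes) ** 2 / e_yes + (no - e_no) ** 2 / e_no

df = (2 - 1) * (len(moved) - 1)  # (r-1)(k-1) with r = 2 rows, k = 3 samples
critical = 5.991  # chi-square table value for alpha = 0.05, df = 2

print("pooled proportion:", pooled)
print("chi-square:", round(chi2, 3), " df:", df, " reject H0:", chi2 > critical)
```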
HOMEWORK
1. At the 0.05 level of significance, test whether
the data represented in the following
frequency distribution could have been
drawn from a population that is normally
distributed with µ = 80 and σ = 5
<66  66-69  70-73  74-77  78-81  82-85  86-89  >89
  1      2     20     53     52     46     24    2
2. A pharmaceutical firm, studying the selection of
“name brand” versus “generic equivalent” on
prescription forms, has been given a sample of 150
recent prescriptions submitted to a local pharmacy.
Of the 44 under-40 patients in the sample, 16
submitted a prescription form with the “generic
equivalent” box checked. Of the 52 patients in the
40-60 age group, 28 submitted a prescription form
specifying “generic equivalent”, and for the 54
patients in the 61-or-over age group, 32 submitted a
prescription form specifying “generic equivalent”. At
the 0.01 level, is age group independent of name
brand/generic specification?
3. It has been reported that 18.3% of all US
households were heated by electricity in
1980, compared to 26.5% in 1993 and 30.7%
in 2001. At the 0.05 level, and assuming a
sample size of 1000 US households for each
year, test whether the population
percentages could be equal for these years.
