Chi-Squared Tests
A Common Theme…
One data type… …Two techniques

What to do?                                  Data type?   Number of categories?   Statistical technique
Describe a population                        Nominal      Two or more             χ² goodness-of-fit test
Compare two populations                      Nominal      Two or more             χ² test of a contingency table (test of homogeneity)
Compare two or more populations              Nominal      --                      χ² test of a contingency table (test of homogeneity)
Analyze relationship between two variables   Nominal      --                      χ² test of a contingency table (test of independence)
Introduction
15.1 Chi-Squared Goodness-of-Fit Test
• The hypothesis tested involves the probabilities p1, p2, …, pk of a multinomial distribution.
• The multinomial experiment is an extension of the binomial experiment.
– There are n independent trials.
– The outcome of each trial can be classified into one of k categories, called cells.
– The probability pi that the outcome falls into cell i remains constant for each trial. Moreover, p1 + p2 + … + pk = 1.
• We test whether there is sufficient evidence to reject a pre-specified set of values a1, a2, …, ak for the pi.
• The hypotheses:
$$H_0: p_1 = a_1,\ p_2 = a_2,\ \ldots,\ p_k = a_k$$
$$H_1: \text{At least one } p_i \neq a_i$$
The multinomial goodness of fit test - Example
• Example 15.1 – continued
– To study the effect of the campaign on the market shares, a survey was conducted.
– 200 customers were asked to indicate their preference regarding the product advertised.
– Survey results:
• 102 customers preferred company A's product,
• 82 customers preferred company B's product,
• 16 customers preferred a competitor's product.
• Solution
– The population investigated is the brand preferences.
– The data are nominal (A, B, or other)
– This is a multinomial experiment (three categories).
– The question of interest: Are p1, p2, and p3 different
after the campaign from their values before the
campaign?
• The hypotheses are:
H0: p1 = .45, p2 = .40, p3 = .15
H1: At least one pi changed.
• The expected frequency for each category (cell) if the null hypothesis is true is ei = n·pi; for example, e3 = 200(.15) = 30, whereas the sample returned f3 = 16.

Company   fi (observed)   ei (expected)   fi − ei   (fi − ei)²/ei
A         102             90              12        1.60
B         82              80              2         0.05
Others    16              30              −14       6.53
Total     200             200                       χ² = 8.18
• The statistic is
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}, \qquad \text{where } e_i = n p_i$$
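The statistic above can be checked numerically. The following is a minimal sketch using the survey counts and the null-hypothesis market shares from Example 15.1; the function name `chi_squared_statistic` is a hypothetical helper introduced only for this illustration.

```python
# Observed counts from the survey in Example 15.1 (A, B, others).
observed = [102, 82, 16]
# Market shares under H0: p1 = .45, p2 = .40, p3 = .15.
null_probs = [0.45, 0.40, 0.15]


def chi_squared_statistic(observed, probs):
    """Return (expected frequencies, chi-squared statistic) for a multinomial sample."""
    n = sum(observed)
    expected = [n * p for p in probs]  # e_i = n * p_i
    chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return expected, chi2


expected, chi2 = chi_squared_statistic(observed, null_probs)
print([round(e, 2) for e in expected])
print(round(chi2, 2))  # about 8.18, matching the table total
```

Most of the statistic comes from the "Others" cell, where the observed count (16) is far below the expected count (30).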
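Example: raining days
The number of rainy days in each week was recorded over the past 90 weeks; the data are shown in Table 13.1.
Table 13.1 Number of rainy days per week
Rainy days per week   0   1    2    3    4    5   6   7
Number of weeks       6   12   27   18   21   6   0   0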
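Step 2: Organize the data and compute the test statistic
Table 13.2 Observed and expected frequencies
x       fi   pi       ei
0       6    0.0394   3.54    (merged with x = 1)
1       12   0.1619   14.58
2       27   0.2853   25.68
3       18   0.2793   25.14
4       21   0.1640   14.76
5       6    0.0578   5.19
6       0    0.0113   1.02    (merged with x = 5 and x = 7)
7       0    0.0010   0.09
Total   90   1        90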
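Table 13.3 Observed and expected frequencies after merging cells
x       fi   pi       ei
0~1     18   0.2013   18.12
2       27   0.2853   25.68
3       18   0.2793   25.14
4       21   0.1640   14.76
5~7     6    0.0701   6.3
Total   90   1        90
The χ² value is
$$\chi^2 = \frac{(18 - 18.12)^2}{18.12} + \frac{(27 - 25.68)^2}{25.68} + \cdots + \frac{(6 - 6.3)^2}{6.3} = 4.75$$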
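Step 3: Conclusion
Because k = 5 and p is estimated by p̂, the degrees of freedom are k − 1 − 1 = 3, so the critical value is $\chi^2_{0.05,(5-1-1)} = \chi^2_{0.05,3} = 7.815$. Since $\chi^2 = 4.75 < 7.815 = \chi^2_{0.05,3}$, we do not reject the null hypothesis H0 at the 0.05 significance level; that is, the data are consistent with X following a binomial distribution.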
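The rainy-days calculation can be reproduced with a short script. This is a sketch under one stated assumption: the slides do not show the estimate of p, so it is computed here as the mean number of rainy days divided by 7 and rounded to 0.37, which is the value that reproduces the probabilities in Table 13.2. The helper name `binomial_pmf` is introduced for illustration only.

```python
from math import comb

# Weeks observed with 0..7 rainy days (Table 13.1).
weeks = [6, 12, 27, 18, 21, 6, 0, 0]
n_weeks = sum(weeks)  # 90
trials = 7            # days per week

# Estimate p from the data: mean rainy days per week divided by 7.
p_hat = sum(x * f for x, f in enumerate(weeks)) / (n_weeks * trials)
p_used = round(p_hat, 2)  # assumption: 0.37 reproduces Table 13.2


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial(n, p) random variable."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


probs = [binomial_pmf(x, trials, p_used) for x in range(trials + 1)]

# Pool cells as in Table 13.3: {0,1}, {2}, {3}, {4}, {5,6,7}.
groups = [(0, 1), (2,), (3,), (4,), (5, 6, 7)]
observed = [sum(weeks[x] for x in g) for g in groups]
expected = [n_weeks * sum(probs[x] for x in g) for g in groups]

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(groups) - 1 - 1  # k - 1 - (one estimated parameter)
critical_value = 7.815    # chi-squared_{0.05, 3}, taken from the text
print(round(p_hat, 4), observed, [round(e, 2) for e in expected])
print(round(chi2, 2), df, chi2 > critical_value)
```

Pooling is done before the statistic is computed so that no cell has a very small expected frequency; the degrees of freedom are then based on the number of pooled cells, not the original eight.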
Exercise #1: raining days-2
• At significance level α = 0.05, does the number of rainy days in a week follow a binomial distribution b(7, 0.25)?
x       pi   fi   ei   fi − ei   (fi − ei)²/ei
0            6
1            12
2            27
3            18
4            21
5            6
6            0
7            0
Total   1    90   90
Exercise: raining days-2
Table 13.5 Observed and expected frequencies
x       fi   pi       ei
0       6    0.1335   12
1       12   0.3114   28.02
2       27   0.3115   28.05
3       18   0.1730   15.57
4~7     27   0.0706   6.36
Total   90   1        90
The χ² value is
$$\chi^2 = \frac{(6 - 12)^2}{12} + \frac{(12 - 28.02)^2}{28.02} + \cdots + \frac{(27 - 6.36)^2}{6.36} = 79.56$$
X     pi     ei      fi   (fi − ei)²/ei
0     0.11   10.74   9    0.28
1     0.27   26.84   31   0.64
2     0.30   30.20   29   0.05
3     0.20   20.13   18   0.23
>3    0.12   12.09   13   0.07
chi-square = 1.267, p-value = 0.867, $\chi^2 < \chi^2_{0.05,5-1} = 9.487$
Since 0.867 > 0.05, do not reject H0.
Chi-Squared test for Poisson Distribution [Supplementary]
1. Set up the null and alternative hypotheses.
2. Select a random sample and
a. Record the observed frequency, fi, for each of the k values of the Poisson random variable.
b. Compute the mean number of occurrences, μ.
3. Compute the expected frequency of occurrences, ei, for each value of the Poisson random variable.
4. Compute the value of the test statistic.
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
5. Reject H0 if $\chi^2 > \chi^2_{\alpha}$ (where α is the significance level and there are k − m − 1 degrees of freedom, with m the number of parameters estimated from the sample).
Example: Troy Parking Garage
• Poisson Distribution Goodness of Fit Test
In studying the need for an additional entrance to a city parking garage, a consultant has recommended an approach that is applicable only in situations where the number of cars entering during a specified time period follows a Poisson distribution. (Does the number of cars entering the garage follow a Poisson distribution?)
A random sample of 100 one-minute time intervals resulted in the customer arrivals listed below. A statistical test must be conducted to see if the assumption of a Poisson distribution is reasonable.
# Arrivals   0   1   2   3    4    5    6    7    8   9   10   11   12
Frequency    0   1   4   10   14   20   12   12   9   8   6    3    1
– Hypotheses
H0: Number of cars entering the garage during
a one-minute interval is Poisson distributed.
H1: Number of cars entering the garage during
a one-minute interval is not Poisson
distributed
– Estimate of Poisson Probability Function
Total Arrivals = 0(0) + 1(1) + 2(4) + . . . + 12(1) = 600
Total Time Periods = 100
Estimate of μ = 600/100 = 6
Hence, $f(x) = \dfrac{\mu^x e^{-\mu}}{x!} = \dfrac{6^x e^{-6}}{x!}$
– Expected Frequencies (ei = n·f(x), with n = 100)
x   f(x)    ei      x            f(x)     ei
0   .0025   .25     7            .1377    13.77
1   .0149   1.49    8            .1033    10.33
2   .0446   4.46    9            .0688    6.88
3   .0892   8.92    10           .0413    4.13
4   .1339   13.39   11           .0225    2.25
5   .1606   16.06   12 or more   .0201    2.01
6   .1606   16.06   Total        1.0000   100.00
– Observed and Expected Frequencies
i fi ei fi - ei
0 or 1 or 2 5 6.20 -1.20
3 10 8.92 1.08
4 14 13.39 .61
5 20 16.06 3.94
6 12 16.06 -4.06
7 12 13.77 -1.77
8 9 10.33 -1.33
9 8 6.88 1.12
10 or more 10 8.39 1.61
– Test Statistic
$$\chi^2 = \frac{(-1.20)^2}{6.20} + \frac{(1.08)^2}{8.92} + \cdots + \frac{(1.61)^2}{8.39} = 3.27$$
– Rejection Rule
With α = .05 and k − p − 1 = 9 − 1 − 1 = 7 d.f. (where k = number of categories and p = number of population parameters estimated), $\chi^2_{.05,7} = 14.07$.
Reject H0 if χ² > 14.07.
– Conclusion
We cannot reject H0. There's no reason to doubt the assumption of a Poisson distribution.
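The Troy Parking Garage test can be reproduced as follows. This is a sketch rather than the consultant's actual procedure: it estimates μ from the sample, pools the cells into the nine categories used in the observed/expected table ("0 or 1 or 2", 3 through 9, "10 or more"), and uses unrounded Poisson probabilities. The helper name `poisson_pmf` is introduced for illustration only.

```python
from math import exp, factorial

# Observed number of one-minute intervals with 0..12 arrivals.
freq = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]
n = sum(freq)                                     # 100 intervals
mu = sum(x * f for x, f in enumerate(freq)) / n   # 600 / 100 = 6


def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return mu ** x * exp(-mu) / factorial(x)


# Cells: {0,1,2}, 3, 4, ..., 9, and "10 or more".
low = sum(poisson_pmf(x, mu) for x in range(3))
middle = [poisson_pmf(x, mu) for x in range(3, 10)]
high = 1.0 - low - sum(middle)                    # upper tail P(X >= 10)
probs = [low] + middle + [high]

observed = [sum(freq[:3])] + freq[3:10] + [sum(freq[10:])]
expected = [n * p for p in probs]

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 1 - 1                        # k - p - 1, one estimated parameter
critical_value = 14.07                            # chi-squared_{.05, 7}, from the text
print([round(e, 2) for e in expected])
print(round(chi2, 2), df, chi2 > critical_value)
```

With unrounded probabilities the statistic comes out at roughly 3.27, well below 14.07, so the conclusion not to reject H0 does not depend on rounding in the hand calculation. The last cell is computed as an upper tail so that the probabilities sum to 1.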
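Exercise #2
• A random sample of the past 100 weeks gives the following frequency distribution of the number of machine components replaced per week. At α = 0.01, test whether the data fit a Poisson distribution with μ = 4.
Components replaced (X)   0   1   2    3    4    5    6    7   8
Number of weeks           1   4   18   22   14   17   18   3   3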
Solution
X   Probability   Expected frequency   Observed frequency
0   0.02          1.83                 1
1   0.07          7.33                 4
2   0.15          14.65                18
3   0.20          19.54                22
4   0.20          19.54                14
5   0.16          15.63                17
6   0.10          10.42                18
7   0.06          5.95                 3
8   0.03          2.98                 3
Chi-Squared test for Normality [15.4]
• The goodness-of-fit chi-squared test can be used to determine whether data were drawn from any specified distribution.
• The general procedure:
– Hypothesize the parameter values of the distribution being tested (i.e. μ0, σ0 for the normal distribution).
– For the tested variable X, specify disjoint ranges that cover all its possible values.
– Build a chi-squared statistic that compares, in aggregate, the expected frequency under H0 with the actual frequency of observations that fall in each range.
– Run a goodness-of-fit test based on the multinomial experiment.
Chi-Squared test for Normality
• For a sample of size n = 50, the sample mean was 460.38 with a standard deviation of 38.83. Can we infer from the data provided that this sample was drawn from a normal distribution with μ = 460.38 and σ = 38.83? Use a 5% significance level.
• Interval 1: X ≤ 421.55, f1 = 10
• Interval 2: 421.55 ≤ X ≤ 460.38, f2 = 13
• Interval 3: 460.38 ≤ X ≤ 499.21, f3 = 19
• Interval 4: X ≥ 499.21, f4 = 8
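Solution
• First let us select z values that define the expected proportions and frequency in each cell (expected frequency > 5 for each cell).
$$P_1 = P(X \le 421.55) = P\left(Z \le \frac{421.55 - 460.38}{38.83}\right) = P(Z \le -1) = 0.1587$$
$$P_2 = P(421.55 \le X \le 460.38) = P\left(\frac{421.55 - 460.38}{38.83} \le Z \le \frac{460.38 - 460.38}{38.83}\right) = P(-1 \le Z \le 0) = 0.3413$$
$$P_3 = P(460.38 \le X \le 499.21) = P\left(\frac{460.38 - 460.38}{38.83} \le Z \le \frac{499.21 - 460.38}{38.83}\right) = P(0 \le Z \le 1) = 0.3413$$
$$P_4 = P(X \ge 499.21) = P\left(Z \ge \frac{499.21 - 460.38}{38.83}\right) = P(Z \ge 1) = 0.1587$$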
χ² test for normality
Solution
Expected frequency
z1 = −1; P(z < −1) = p1 = .1587; e1 = np1 = 50(.1587) = 7.94
z2 = 0; P(−1 < z < 0) = p2 = .3413; e2 = np2 = 50(.3413) = 17.07
z3 = 1; P(0 < z < 1) = p3 = .3413; e3 = 17.07
P(z > 1) = p4 = .1587; e4 = 7.94
[Figure: observed vs. expected frequencies per interval: f1 = 10, e1 = 7.94; f2 = 13, e2 = 17.07; f3 = 19, e3 = 17.07; f4 = 8, e4 = 7.94]
– The test statistic
$$\chi^2 = \frac{(10 - 7.94)^2}{7.94} + \frac{(13 - 17.07)^2}{17.07} + \frac{(19 - 17.07)^2}{17.07} + \frac{(8 - 7.94)^2}{7.94} = 1.72$$
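The normality calculation can be sketched in a few lines. The standard normal probabilities are obtained from `math.erf` instead of a z table, and the interval boundaries are taken as the mean minus one standard deviation, the mean, and the mean plus one standard deviation, which matches the four intervals in the example. The helper name `normal_cdf` is introduced for illustration only.

```python
from math import erf, sqrt


def normal_cdf(x, mean, sd):
    """Cumulative probability P(X <= x) for a normal distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


n, mean, sd = 50, 460.38, 38.83
observed = [10, 13, 19, 8]
# Interval boundaries at mean - sd, mean, mean + sd (z = -1, 0, 1).
cuts = [mean - sd, mean, mean + sd]

# Cumulative probabilities at the boundaries, padded with 0 and 1 for the tails.
cdf = [0.0] + [normal_cdf(c, mean, sd) for c in cuts] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
expected = [n * p for p in probs]

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
print([round(p, 4) for p in probs])
print([round(e, 2) for e in expected])
print(round(chi2, 2))
```

With unrounded probabilities the statistic is about 1.73; the hand calculation with expected frequencies rounded to 7.94 and 17.07 gives 1.72.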
Example - χ² test for normality
• H0: The age of customers is normally distributed, N(15, 16).
H1: The age of customers is not normally distributed N(15, 16).
• Let α = 0.05.
• $RR = \{\chi^2 > \chi^2_{0.05,\,7-1} = 12.59\}$
• Test statistic
$$\chi^2 = \frac{(20 - 26.72)^2}{26.72} + \frac{(80 - 36.76)^2}{36.76} + \frac{(120 - 59.62)^2}{59.62} + \cdots + \frac{(4 - 42.24)^2}{42.24} = 294.02$$
Conclusion: There is sufficient evidence to conclude at the 5% significance level that the data are not normally distributed.
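$$P_1 = P(0 \le X \le 9) = P\left(\frac{0 - 15}{4} \le Z \le \frac{9 - 15}{4}\right) = P(-3.75 \le Z \le -1.5) = 0.0668$$
$$P_2 = P(9 \le X \le 11) = P\left(\frac{9 - 15}{4} \le Z \le \frac{11 - 15}{4}\right) = P(-1.5 \le Z \le -1.0) = 0.0919$$
$$P_3 = P(11 \le X \le 13) = P\left(\frac{11 - 15}{4} \le Z \le \frac{13 - 15}{4}\right) = P(-1.0 \le Z \le -0.5) = 0.1498$$
…
$$P_6 = P(17 \le X \le 20) = P\left(\frac{17 - 15}{4} \le Z \le \frac{20 - 15}{4}\right) = P(0.5 \le Z \le 1.25) = 0.2029$$
$$P_7 = P(X \ge 20) = P\left(Z \ge \frac{20 - 15}{4}\right) = P(Z \ge 1.25) = 0.1056$$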
15.2 Chi-squared Test of a Contingency Table
• This test is used to test whether…
– two nominal variables are related (test of independence).
– there are differences between two or more populations of a nominal variable (test of homogeneity).
• To accomplish the test objectives, we need to classify the data according to two different criteria.
Contingency table χ² test – Example
• Example 15.2
– In an effort to better predict the demand for courses offered by a certain MBA program, it was hypothesized that students' academic background affects their choice of MBA major and, thus, their course selection.
– A random sample of last year's MBA students was selected. The following contingency table summarizes the relevant data.
Degree               Accounting   Finance   Marketing   Total
BA (arts)            31           13        16          60
BENG (engineering)   8            16        7           31
BBA                  12           10        17          39
Other                10           5         7           22
Total                61           44        47          152
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} = \frac{(31 - 24.08)^2}{24.08} + \cdots + \frac{(5 - 6.39)^2}{6.39} + \cdots + \frac{(7 - 6.80)^2}{6.80} = 14.70$$
• Solution – continued
– The critical value in our example is:
$$\chi^2_{\alpha,(r-1)(c-1)} = \chi^2_{.05,(4-1)(3-1)} = 12.5916$$
• Conclusion:
Since χ² = 14.70 > 12.5916, there is sufficient evidence to infer at the 5% significance level that students' academic background and their choice of MBA major are related.
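The contingency-table test for Example 15.2 can be sketched as below. Expected frequencies are computed as (row total × column total) / n, and the row and column totals are recomputed from the cell counts rather than copied from the table. The variable names are chosen for this illustration only.

```python
# Observed counts: rows = undergraduate degree (BA, BENG, BBA, Other),
# columns = MBA major (Accounting, Finance, Marketing).
observed = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency under independence: e_ij = (row i total)(column j total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (f - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for f, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (r - 1)(c - 1)
critical_value = 12.5916                            # chi-squared_{.05, 6}, from the text
print(row_totals, col_totals, n)
print(round(expected[0][0], 2), round(chi2, 2), df, chi2 > critical_value)
```

The recomputed row totals are 60, 31, 39 and 22, which sum to the sample size of 152; the statistic is about 14.70 with 6 degrees of freedom, so H0 is rejected at the 5% level.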
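Solution
Hypotheses:
$$H_0: p_1\ (\text{South}) = p_2\ (\text{North})$$
$$H_1: p_1 \neq p_2$$
The χ² value is
$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} = \frac{(88 - 74)^2}{74} + \frac{(12 - 26)^2}{26} + \frac{(60 - 74)^2}{74} + \frac{(40 - 26)^2}{26} = 20.374$$
Since $\chi^2 = 20.374 > 6.635 = \chi^2_{0.01,(2-1)(2-1)}$, we reject the null hypothesis H0 at the 0.01 significance level; that is, there is sufficient evidence that the proportion of farmers in favor of the farmers' pension system differs between the southern and northern regions.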
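Exercise #3
The student association of a university has resolved that, starting next academic year, on-campus parking will be allocated by lottery and charged a fee. A sample of 200 students was surveyed; the results are shown in Table 13.14.
College        In favor   Opposed   No opinion
Liberal Arts   30         23        7
Law            32         24        8
Business       37         30        9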
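Solution:
The null and alternative hypotheses are:
H0: College and opinion are independent
H1: College and opinion are related
Table 13.15 Expected frequencies (row total × column total / 200)
College (i)    In favor              Opposed               No opinion           Row total
Liberal Arts   (60×99)/200 = 29.70   (60×77)/200 = 23.10   (60×24)/200 = 7.20   60
Law            (64×99)/200 = 31.68   (64×77)/200 = 24.64   (64×24)/200 = 7.68   64
Business       (76×99)/200 = 37.62   (76×77)/200 = 29.26   (76×24)/200 = 9.12   76
Column total   99                    77                    24                   200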
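Solution
The χ² value is
$$\chi^2 = \sum_{i=1}^{3}\sum_{j=1}^{3} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} = \frac{(30 - 29.70)^2}{29.70} + \frac{(23 - 23.10)^2}{23.10} + \cdots + \frac{(9 - 9.12)^2}{9.12}$$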
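Exercise #2-Solution
• A random sample of the past 100 weeks gives the frequency distribution of the number of machine components replaced per week. At α = 0.05, test whether the data fit a Poisson distribution with μ = 4.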