Chapter 15
Chi-Squared Tests
A Common Theme…

What to do?                                  Data type?   Number of categories?   Statistical technique
Describe a population                        Nominal      Two or more             χ² goodness-of-fit test
Compare two populations                      Nominal      Two or more             χ² test of a contingency table (test of homogeneity)
Compare two or more populations              Nominal      --                      χ² test of a contingency table (test of homogeneity)
Analyze relationship between two variables   Nominal      --                      χ² test of a contingency table (test of independence)

One data type… …Two techniques
Introduction
• Two statistical techniques are presented for analyzing nominal data:
– A goodness-of-fit test for the multinomial experiment.
– A contingency table test of independence.
• Both tests use the χ² distribution as the sampling distribution of the test statistic.
15.1 Chi-Squared Goodness-of-Fit Test
• The hypothesis tested involves the probabilities $p_1, p_2, \dots, p_k$ of a multinomial distribution.
• The multinomial experiment is an extension of the binomial experiment.
– There are n independent trials.
– The outcome of each trial can be classified into one of k categories, called cells.
– The probability $p_i$ that the outcome falls into cell i remains constant for each trial. Moreover, $p_1 + p_2 + \dots + p_k = 1$.
Chi-Squared Goodness-of-Fit Test
• We test whether there is sufficient evidence to reject a pre-specified set of values for the $p_i$.
• The hypotheses:
$H_0: p_1 = a_1,\ p_2 = a_2,\ \dots,\ p_k = a_k$
$H_1$: At least one $p_i \neq a_i$
• The test compares the actual (observed) frequency $f_i$ with the expected frequency $e_i$ of occurrences in each cell.
The multinomial goodness-of-fit test - Example
• Example 15.1
– Two competing companies, A and B, have enjoyed a dominant position in the market. Both companies conducted aggressive advertising campaigns.
– Market shares before the campaigns were:
• Company A = 45%
• Company B = 40%
• Other competitors = 15%.
Example
• Example 15.1 – continued
– To study the effect of the campaigns on the market shares, a survey was conducted.
– 200 customers were asked to indicate their preference regarding the product advertised.
– Survey results:
• 102 customers preferred company A's product,
• 82 customers preferred company B's product,
• 16 customers preferred the competitors' products.
Can we conclude at the 5% significance level that the market shares were affected by the advertising campaigns?
Example
• Solution
– The population investigated is the brand preferences.
– The data are nominal (A, B, or other)
– This is a multinomial experiment (three categories).
– The question of interest: Are p1, p2, and p3 different
after the campaign from their values before the
campaign?
Example
• The hypotheses are:
$H_0: p_1 = .45,\ p_2 = .40,\ p_3 = .15$
$H_1$: At least one $p_i$ changed.
• The expected frequency for each category (cell) if the null hypothesis is true:
$e_1 = 200(.45) = 90$, $e_2 = 200(.40) = 80$, $e_3 = 200(.15) = 30$
• The actual frequencies returned by the sample:
$f_1 = 102$, $f_2 = 82$, $f_3 = 16$

Company   f_i (observed)   e_i (expected)   f_i − e_i   (f_i − e_i)²/e_i
A         102              90               12          1.60
B         82               80               2           0.05
Others    16               30               −14         6.53
Total     200              200                          χ² = 8.18
Example
• The statistic is
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}, \quad \text{where } e_i = n p_i$$
• The rejection region is
$$\chi^2 > \chi^2_{\alpha,\,k-1}$$
Example
$$\chi^2 = \frac{(102-90)^2}{90} + \frac{(82-80)^2}{80} + \frac{(16-30)^2}{30} = 8.18$$
$$\chi^2_{\alpha,\,k-1} = \chi^2_{.05,\,3-1} = 5.99147$$
The p-value $= P(\chi^2 > 8.18) \approx .0167$ [from Excel: CHIDIST(8.18, 2)]
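The arithmetic of Example 15.1 can be checked with a short Python sketch. It uses only the standard library; the variable names are illustrative, and the p-value relies on the fact that for 2 degrees of freedom the χ² upper-tail probability equals exp(−x/2).

```python
import math

observed = [102, 82, 16]          # survey counts for A, B, and others
null_probs = [0.45, 0.40, 0.15]   # market shares before the campaigns (H0)
n = sum(observed)

# Expected frequencies under H0: e_i = n * p_i
expected = [n * p for p in null_probs]

# Chi-squared statistic: sum of (f_i - e_i)^2 / e_i over all cells
chi2_stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

# With 2 degrees of freedom the upper tail has the closed form exp(-x/2).
p_value = math.exp(-chi2_stat / 2)

critical_value = 5.99147          # chi-squared(.05, 2), as quoted in the example
print(round(chi2_stat, 2), round(p_value, 4), chi2_stat > critical_value)
```

The closed-form tail is specific to 2 degrees of freedom; for other cases a table or a statistics package would be needed.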
The goodness-of-fit test for distributions
• The statistic is
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}, \quad \text{where } e_i = n p_i$$
• The rejection region is
$$\chi^2 > \chi^2_{\alpha,\,k-m-1}$$
where m is the number of unknown population parameters.
Example
• Example 15.1 – continued
$$\chi^2_{.05,\,3-1} = \chi^2_{.05,\,2} = 5.99$$
[Figure: χ² density with 2 degrees of freedom. The rejection region lies to the right of 5.99 (area = alpha); the p-value is the area to the right of the test statistic 8.18.]
Conclusion: Since 8.18 > 5.99, there is sufficient evidence at the 5% significance level to reject the null hypothesis. At least one of the probabilities $p_i$ is different. Thus, at least two market shares have changed.
Required conditions – the rule of five
• The test statistic used to perform the test is only approximately chi-squared distributed.
• For the approximation to apply, the expected cell frequency has to be at least 5 for all the cells ($n p_i \ge 5$).
• If the expected frequency in a cell is less than 5, combine it with other cells.
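Example: rainy days
The number of rainy days in each of the past 90 weeks was recorded; the data are shown in Table 13.1.
Table 13.1 Number of rainy days per week
Rainy days per week   0   1    2    3    4    5   6   7
Number of weeks       6   12   27   18   21   6   0   0
Let X denote the number of rainy days in a week. At significance level α = 0.05, does X follow a binomial distribution?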
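Step 2: Organize the data and compute the test statistic
Table 13.2 Observed and expected frequencies
x       f_i   p_i      e_i
0       6     0.0394   3.54    (combine)
1       12    0.1619   14.58
2       27    0.2853   25.68
3       18    0.2793   25.14
4       21    0.1640   14.76
5       6     0.0578   5.19
6       0     0.0113   1.02    (combine)
7       0     0.0010   0.09    (combine)
Total   90    1        90
The cell x = 0 and the cells x = 6 and x = 7 have expected frequencies that are too small ($e_i < 5$), so cells must be combined. The cells x = 6 and x = 7 together still have too small an expected frequency, so they are further combined with the cell x = 5. This leaves only 5 cells. The regrouped data are shown in Table 13.3.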
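Table 13.3 Observed and expected frequencies after combining cells
x       f_i   p_i      e_i
0~1     18    0.2013   18.12
2       27    0.2853   25.68
3       18    0.2793   25.14
4       21    0.1640   14.76
5~7     6     0.0701   6.30
Total   90    1        90
The $\chi^2$ value is
$$\chi^2 = \frac{(18-18.12)^2}{18.12} + \frac{(27-25.68)^2}{25.68} + \dots + \frac{(6-6.3)^2}{6.3} = 4.75$$
Step 3: Conclusion
Because $k = 5$ and one parameter ($\hat{p}$) is estimated from the data, the degrees of freedom are $k - 1 - 1 = 3$, so $\chi^2_{0.05,\,(5-1-1)} = \chi^2_{0.05,\,3} = 7.815$. Since $\chi^2 = 4.75 < 7.815 = \chi^2_{0.05,\,3}$, we do not reject the null hypothesis $H_0$ at significance level $\alpha = 0.05$; that is, the data are consistent with X following a binomial distribution.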
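The regrouping and the statistic in the rainy-day example can be re-derived with a short Python sketch. It assumes $\hat{p} = 0.37$, a value that is not stated in the surviving text but that reproduces the tabulated probabilities (for instance $0.63^7 \approx 0.0394$); the variable names and the cell grouping list are illustrative.

```python
from math import comb

weeks = [6, 12, 27, 18, 21, 6, 0, 0]   # observed weeks with x = 0..7 rainy days
n_days, p_hat = 7, 0.37                # p_hat is an assumption (see lead-in)
total = sum(weeks)

# Binomial probabilities and expected frequencies for x = 0..7
probs = [comb(n_days, x) * p_hat**x * (1 - p_hat)**(n_days - x)
         for x in range(n_days + 1)]
expected = [total * p for p in probs]

# Combine cells as in the example: {0,1}, {2}, {3}, {4}, {5,6,7}
groups = [(0, 1), (2,), (3,), (4,), (5, 6, 7)]
f = [sum(weeks[x] for x in g) for g in groups]
e = [sum(expected[x] for x in g) for g in groups]

chi2_stat = sum((fi - ei) ** 2 / ei for fi, ei in zip(f, e))
df = len(groups) - 1 - 1               # k - 1 - m, with m = 1 estimated parameter
critical_value = 7.815                 # chi-squared(0.05, 3), as quoted in the example
print(round(chi2_stat, 2), df, chi2_stat > critical_value)
```

Because the sketch keeps full precision instead of the rounded table entries, the statistic may differ from the slide value in the third decimal place.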
Exercise #1: rainy days-2
• At significance level α = 0.05, does the number of rainy days in a week follow the binomial distribution b(7, 0.25)?
x       p_i   f_i   e_i   f_i − e_i   (f_i − e_i)^2/e_i
0             6
1             12
2             27
3             18
4             21
5             6
6             0
7             0
Total   1     90    90
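Exercise: rainy days-2
x       f_i   p_i      e_i
0       6     0.1335   12
1       12    0.3114   28.02
2       27    0.3115   28.05
3       18    0.1730   15.57
4~7     27    0.0706   6.36
Total   90    1        90
$$\chi^2 = \frac{(6-12)^2}{12} + \frac{(12-28.02)^2}{28.02} + \dots + \frac{(27-6.36)^2}{6.36} = 79.56$$
Also, $\chi^2_{0.05,\,(5-1)} = \chi^2_{0.05,\,4} = 9.48773$. Since $\chi^2 = 79.56 > 9.48773$ falls in the rejection region, we reject the null hypothesis $H_0$ at significance level $\alpha = 0.05$; that is, the evidence shows that X does not follow the binomial distribution b(7, 0.25).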
χ² Test on Probability Distribution --- Binomial Distribution
Example:
Blitz laundry is well known for its obnoxious commercials, which advertise that 20% of all Blitz boxes contain a valuable discount coupon. The commercials also claim that the boxes containing coupons are randomly distributed across all stores carrying the product. A recent study obtained a random sample of ten Blitz boxes from each of 100 different stores. The results were
Of 10 boxes, number containing coupons   Number of stores
0                                        9
1                                        31
2                                        29
3                                        18
>3                                       13
Do these data appear to come from a binomial distribution? (α = 0.05)
Solution
• $H_0$: X follows a binomial distribution (p = 0.2, n = 10)
• $H_1$: X does not follow a binomial distribution (p = 0.2, n = 10)
X     p_i    e_i     f_i   (f_i − e_i)²/e_i
0     0.11   10.74   9     0.28
1     0.27   26.84   31    0.64
2     0.30   30.20   29    0.05
3     0.20   20.13   18    0.23
>3    0.12   12.09   13    0.07
χ² = 1.267, p-value = 0.867
Critical value: $\chi^2_{0.05,\,5-1} = 9.487$
Since 1.267 < 9.487 (equivalently, 0.867 > 0.05), do not reject $H_0$.
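The Blitz calculation can be reproduced with a small Python sketch. Variable names are illustrative. No parameter is estimated here (p = 0.2 is given by the claim), so the degrees of freedom are 5 − 1 = 4, and for 4 degrees of freedom the χ² upper-tail probability has the closed form exp(−x/2)(1 + x/2).

```python
from math import comb, exp

stores = [9, 31, 29, 18, 13]     # observed stores with 0, 1, 2, 3, >3 coupon boxes
n_boxes, p = 10, 0.2             # binomial parameters claimed by the commercials
total = sum(stores)

# Binomial probabilities for x = 0..3; the last cell collects P(X > 3)
pmf = [comb(n_boxes, x) * p**x * (1 - p)**(n_boxes - x) for x in range(4)]
probs = pmf + [1 - sum(pmf)]
expected = [total * q for q in probs]

chi2_stat = sum((f - e) ** 2 / e for f, e in zip(stores, expected))

# Upper-tail probability for 4 degrees of freedom (closed form)
half = chi2_stat / 2
p_value = exp(-half) * (1 + half)
print(round(chi2_stat, 3), round(p_value, 3))
```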
Chi-Squared test for Poisson Distribution [Supplement]
1. Set up the null and alternative hypotheses.
2. Select a random sample and
a. Record the observed frequency, $f_i$, for each of the k values of the Poisson random variable.
b. Compute the mean number of occurrences, $\mu$.
3. Compute the expected frequency of occurrences, $e_i$, for each value of the Poisson random variable.
4. Compute the value of the test statistic.
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
5. Reject $H_0$ if $\chi^2 > \chi^2_{\alpha}$ (where $\alpha$ is the significance level and there are $k - m - 1$ degrees of freedom).
Example: Troy Parking Garage
• Poisson Distribution Goodness of Fit Test
In studying the need for an additional entrance to a city parking garage, a consultant has recommended an approach that is applicable only in situations where the number of cars entering during a specified time period follows a Poisson distribution. (Does the number of cars entering the garage follow a Poisson distribution?)
A random sample of 100 one-minute time intervals resulted in the customer arrivals listed below. A statistical test must be conducted to see if the assumption of a Poisson distribution is reasonable.
# Arrivals   0   1   2   3    4    5    6    7    8   9   10   11   12
Frequency    0   1   4   10   14   20   12   12   9   8   6    3    1
– Hypotheses
$H_0$: Number of cars entering the garage during a one-minute interval is Poisson distributed.
$H_1$: Number of cars entering the garage during a one-minute interval is not Poisson distributed.
– Estimate of Poisson Probability Function
Total Arrivals = 0(0) + 1(1) + 2(4) + . . . + 12(1) = 600
Total Time Periods = 100
Estimate of $\mu$ = 600/100 = 6
Hence, $f(x) = \dfrac{\mu^x e^{-\mu}}{x!} = \dfrac{6^x e^{-6}}{x!}$
– Expected Frequencies ($e_i = n f(x)$, with n = 100)
x    f(x)     e_i        x            f(x)     e_i
0    .0025    .25        7            .1377    13.77
1    .0149    1.49       8            .1033    10.33
2    .0446    4.46       9            .0688    6.88
3    .0892    8.92       10           .0413    4.13
4    .1339    13.39      11           .0225    2.25
5    .1606    16.06      12 or more   .0201    2.01
6    .1606    16.06      Total        1.0000   100.00
– Observed and Expected Frequencies
i fi ei fi - ei
0 or 1 or 2 5 6.20 -1.20
3 10 8.92 1.08
4 14 13.39 .61
5 20 16.06 3.94
6 12 16.06 -4.06
7 12 13.77 -1.77
8 9 10.33 -1.33
9 8 6.88 1.12
10 or more 10 8.39 1.61
– Test Statistic
$$\chi^2 = \frac{(-1.20)^2}{6.20} + \frac{(1.08)^2}{8.92} + \dots + \frac{(1.61)^2}{8.39} = 3.27$$
– Rejection Rule
With $\alpha = .05$ and $k - p - 1 = 9 - 1 - 1 = 7$ d.f. (where k = number of categories and p = number of population parameters estimated), $\chi^2_{.05,\,7} = 14.07$.
Reject $H_0$ if $\chi^2 > 14.07$.
– Conclusion
Since 3.27 < 14.07, we cannot reject $H_0$. There is no reason to doubt the assumption of a Poisson distribution.
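The five steps of the Poisson goodness-of-fit procedure can be sketched in Python for the parking garage data. This is a minimal sketch with illustrative names: it recomputes the expected frequencies directly from the Poisson formula rather than copying them from the tables, and it takes the critical value 14.07 from the example instead of computing it.

```python
from math import exp, factorial

arrivals = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]  # frequency of 0..12 arrivals
n = sum(arrivals)
mu = sum(x * f for x, f in enumerate(arrivals)) / n       # estimated mean

def poisson_pmf(x, mean):
    """Poisson probability f(x) = mean^x * e^(-mean) / x!"""
    return mean**x * exp(-mean) / factorial(x)

# Cells as in the example: {0,1,2}, 3, 4, ..., 9, and "10 or more"
observed = [sum(arrivals[:3])] + arrivals[3:10] + [sum(arrivals[10:])]
probs = [sum(poisson_pmf(x, mu) for x in range(3))]
probs += [poisson_pmf(x, mu) for x in range(3, 10)]
probs.append(1 - sum(probs))                              # upper tail P(X >= 10)
expected = [n * p for p in probs]

chi2_stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 1 - 1     # k - p - 1 with one estimated parameter
critical_value = 14.07         # chi-squared(.05, 7), as quoted in the example
print(round(chi2_stat, 2), df, chi2_stat > critical_value)
```

Treating the last cell as an open-ended tail keeps the probabilities summing to 1, which is why its probability is obtained by subtraction rather than from the formula.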
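Exercise #2
• A random sample of the past 100 weeks gives the following frequency distribution of the number of machine components replaced per week. At α = 0.01, test whether the data fit a Poisson distribution with μ = 4.
Components replaced (X)   0   1   2    3    4    5    6    7   8
Number of weeks           1   4   18   22   14   17   18   3   3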
Solution
X   Probability   Expected frequency   Observed frequency
0   0.02          1.83                 1
1   0.07          7.33                 4
2   0.15          14.65                18
3   0.20          19.54                22
4   0.20          19.54                14
5   0.16          15.63                17
6   0.10          10.42                18
7   0.06          5.95                 3
8   0.03          2.98                 3
Chi-Squared test for Normality [15.4]
• The goodness-of-fit chi-squared test can be used to determine whether data were drawn from any given distribution.
• The general procedure:
– Hypothesize the parameter values of the distribution being tested (i.e., $\mu_0$, $\sigma_0$ for the normal distribution).
– For the tested variable X, specify disjoint ranges that cover all its possible values.
– Build a chi-squared statistic that compares, in aggregate, the expected frequency under $H_0$ with the actual frequency of observations that fall in each range.
– Run a goodness-of-fit test based on the multinomial experiment.
Chi-Squared test for Normality
• For a sample of size n = 50, the sample mean was 460.38 with a standard deviation of 38.83. Can we infer from the data provided that this sample was drawn from a normal distribution with $\mu$ = 460.38 and $\sigma$ = 38.83? Use a 5% significance level.
• Interval 1: X ≤ 421.55, $f_1$ = 10
• Interval 2: 421.55 ≤ X ≤ 460.38, $f_2$ = 13
• Interval 3: 460.38 ≤ X ≤ 499.21, $f_3$ = 19
• Interval 4: X ≥ 499.21, $f_4$ = 8
Solution
• First, select z values that define the expected proportions and frequencies in each cell (the expected frequency must be at least 5 for each cell).
$$p_1 = P(X \le 421.55) = P\left(Z \le \frac{421.55 - 460.38}{38.83}\right) = P(Z \le -1) = 0.1587$$
$$p_2 = P(421.55 \le X \le 460.38) = P\left(\frac{421.55 - 460.38}{38.83} \le Z \le \frac{460.38 - 460.38}{38.83}\right) = P(-1 \le Z \le 0) = 0.3413$$
$$p_3 = P(460.38 \le X \le 499.21) = P\left(\frac{460.38 - 460.38}{38.83} \le Z \le \frac{499.21 - 460.38}{38.83}\right) = P(0 \le Z \le 1) = 0.3413$$
$$p_4 = P(X \ge 499.21) = P\left(Z \ge \frac{499.21 - 460.38}{38.83}\right) = P(Z \ge 1) = 0.1587$$
χ² test for normality
Solution
Expected frequency
$z_1 = -1$; $P(z < -1) = p_1 = .1587$; $e_1 = np_1 = 50(.1587) = 7.94$
$z_2 = 0$; $P(-1 < z < 0) = p_2 = .3413$; $e_2 = np_2 = 50(.3413) = 17.07$
$z_3 = 1$; $P(0 < z < 1) = p_3 = .3413$; $e_3 = 17.07$
$P(z > 1) = p_4 = .1587$; $e_4 = 7.94$
The cell boundaries are calculated from the corresponding z values under $H_0$: $z_1 = (x_1 - 460.38)/38.83 = -1$, so $x_1 = 421.55$. The expected frequencies can now be determined for each cell.
[Figure: normal curve with cell boundaries at 421.55, 460.38, and 499.21; cell probabilities .1587, .3413, .3413, .1587 and expected frequencies e1 = 7.94, e2 = 17.07, e3 = 17.07, e4 = 7.94.]
– The test statistic
$$\chi^2 = \frac{(10-7.94)^2}{7.94} + \frac{(13-17.07)^2}{17.07} + \frac{(19-17.07)^2}{17.07} + \frac{(8-7.94)^2}{7.94} = 1.72$$
– The rejection region
$$\chi^2 > \chi^2_{\alpha,\,k-1-m}$$
where m is the number of parameters estimated from the data ($\mu$, $\sigma$).
$$\chi^2_{\alpha,\,k-3} = \chi^2_{.05,\,4-3} = 3.84146$$
Conclusion: Since 1.72 < 3.84, there is insufficient evidence to conclude at the 5% significance level that the data are not normally distributed.
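The normality example can be retraced with a short Python sketch. It is a sketch under stated assumptions: the normal CDF is built from the standard library's `math.erf`, the helper name `normal_cdf` is illustrative, and the critical value 3.84146 is taken from the example rather than computed.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and std. dev. sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, mu, sigma = 50, 460.38, 38.83
boundaries = [421.55, 460.38, 499.21]   # cell boundaries at z = -1, 0, 1
observed = [10, 13, 19, 8]

# Cell probabilities are differences of the CDF at consecutive boundaries
cdf = [0.0] + [normal_cdf(b, mu, sigma) for b in boundaries] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
expected = [n * p for p in probs]

chi2_stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 1 - 2              # k - 1 - m, with mu and sigma estimated
critical_value = 3.84146                # chi-squared(.05, 1), as quoted in the example
print(round(chi2_stat, 2), df, chi2_stat > critical_value)
```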
Example - χ² test for normality
• The owner of a record company specializing in rock music thinks that the ages of the company's customers are normally distributed with mean 15 years and variance 16.
• A sample of 400 randomly chosen customers is obtained.
• With the following information, test the null hypothesis that the sample comes from a normally distributed population with mean = 15 and variance = 16.
Age   0-9   9-11   11-13   13-15   15-17   17-20   20+
f_i   20    80     120     140     20      16      4
Example - χ² test for normality
• $H_0$: The age of customers is normally distributed, N(15, 16).
$H_1$: The age of customers is not normally distributed, N(15, 16).
• Let $\alpha$ = 0.05.
• RR = $\{\chi^2 > \chi^2_{0.05,\,7-1} = 12.59\}$
• Test statistic
Age   0-9      9-11     11-13    13-15    15-17    17-20    20+
f_i   20       80       120      140      20       16       4
p_i   0.0668   0.0919   0.1498   0.1915   0.1915   0.2029   0.1056
e_i   26.72    36.76    59.92    76.60    76.60    81.16    42.24
$$\chi^2 = \frac{(20-26.72)^2}{26.72} + \frac{(80-36.76)^2}{36.76} + \frac{(120-59.92)^2}{59.92} + \dots + \frac{(4-42.24)^2}{42.24} = 294.02$$
Conclusion: Since 294.02 > 12.59, there is sufficient evidence to conclude at the 5% significance level that the data are not normally distributed.
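The cell probabilities are computed with $\sigma = \sqrt{16} = 4$:
$$p_1 = P(0 \le X \le 9) = P\left(\frac{0-15}{4} \le Z \le \frac{9-15}{4}\right) = P(-3.75 \le Z \le -1.5) = 0.0668$$
$$p_2 = P(9 \le X \le 11) = P\left(\frac{9-15}{4} \le Z \le \frac{11-15}{4}\right) = P(-1.5 \le Z \le -1.0) = 0.0919$$
$$p_3 = P(11 \le X \le 13) = P\left(\frac{11-15}{4} \le Z \le \frac{13-15}{4}\right) = P(-1.0 \le Z \le -0.5) = 0.1498$$
$$\vdots$$
$$p_6 = P(17 \le X \le 20) = P\left(\frac{17-15}{4} \le Z \le \frac{20-15}{4}\right) = P(0.5 \le Z \le 1.25) = 0.2029$$
$$p_7 = P(X \ge 20) = P\left(Z \ge \frac{20-15}{4}\right) = P(Z \ge 1.25) = 0.1056$$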
15.2 Chi-Squared Test of a Contingency Table
• This test is used to test whether…
– two nominal variables are related (test of independence).
– there are differences between two or more populations of a nominal variable (test of homogeneity).
• To accomplish the test objectives, we need to classify the data according to two different criteria.
Contingency table χ² test – Example
• Example 15.2
– In an effort to better predict the demand for courses offered by a certain MBA program, it was hypothesized that students' academic background affects their choice of MBA major and, thus, their course selection.
– A random sample of last year's MBA students was selected. The following contingency table summarizes the relevant data.
Example
Degree               Accounting   Finance   Marketing   Total
BA (Arts)            31           13        16          60
BENG (Engineering)   8            16        7           31
BBA                  12           10        17          39
Other                10           5         7           22
Total                61           44        47          152
The observed values
There are two ways to address the problem:
– If each classification is considered a nominal variable, are these two variables dependent?
– If each undergraduate degree is considered a population, do these populations differ (test of homogeneity)?
Example
• Solution
– The hypotheses are:
$H_0$: The two variables are independent
$H_1$: The two variables are dependent
– The test statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
where k is the number of cells in the contingency table.
– The rejection region
$$\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$$
where r and c are the numbers of rows and columns.
Since $e_i = n p_i$ but $p_i$ is unknown, we need to estimate the unknown probability from the data, assuming $H_0$ is true.
Estimating the expected frequencies
Undergraduate                MBA Major
Degree        Accounting   Finance   Marketing   Total   Probability
BA                                               60      60/152
BENG                                             31      31/152
BBA                                              39      39/152
Other                                            22      22/152
Total         61           44        47          152
Probability   61/152       44/152    47/152
Under the null hypothesis the two variables are independent:
P(Accounting and BA) = P(Accounting) × P(BA) = [61/152][60/152].
The number of students expected to fall in the cell "Accounting - BA" is
$e_{\text{Acct-BA}} = n\,p_{\text{Acct-BA}} = 152(61/152)(60/152) = [61 \times 60]/152 = 24.08$
The number of students expected to fall in the cell "Finance - BBA" is
$e_{\text{Finance-BBA}} = n\,p_{\text{Finance-BBA}} = 152(44/152)(39/152) = [44 \times 39]/152 = 11.29$
The expected frequencies for a contingency table
• The expected frequency of the cell in row i and column j of the contingency table is calculated by
$$e_{ij} = \frac{(\text{Column } j \text{ total})(\text{Row } i \text{ total})}{\text{Sample size}}$$
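The expected-frequency rule translates directly into a few lines of Python. The function name `expected_frequencies` is illustrative, and the observed counts are those of the MBA example.

```python
def expected_frequencies(table):
    """Return e_ij = (row i total) * (column j total) / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: BA, BENG, BBA, Other; columns: Accounting, Finance, Marketing
observed = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]
expected = expected_frequencies(observed)
print(round(expected[0][0], 2), round(expected[2][1], 2))  # Accounting-BA, Finance-BBA
```

By construction, the expected table has the same row and column totals as the observed table.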
Calculation of the χ² statistic
• Solution – continued
Observed frequencies, with expected frequencies in parentheses:
Undergraduate               MBA Major
Degree    Accounting    Finance       Marketing     Total
BA        31 (24.08)    13 (17.37)    16 (18.55)    60
BENG      8 (12.44)     16 (8.97)     7 (9.59)      31
BBA       12 (15.65)    10 (11.29)    17 (12.06)    39
Other     10 (8.83)     5 (6.37)      7 (6.80)      22
Total     61            44            47            152
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} = \frac{(31-24.08)^2}{24.08} + \dots + \frac{(5-6.37)^2}{6.37} + \dots + \frac{(7-6.80)^2}{6.80} = 14.70$$
Example
• Solution – continued
– The critical value in our example is:
$$\chi^2_{\alpha,\,(r-1)(c-1)} = \chi^2_{.05,\,(4-1)(3-1)} = 12.5916$$
• Conclusion:
Since χ² = 14.70 > 12.5916, there is sufficient evidence to infer at the 5% significance level that students' undergraduate degree and MBA students' course selection are dependent.
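The whole contingency table test for the MBA example can be sketched in Python. The names are illustrative and the critical value is the one quoted in the example. The p-value uses the closed form of the χ² upper tail for 6 degrees of freedom, exp(−h)(1 + h + h²/2) with h = x/2, which holds only for that number of degrees of freedom.

```python
from math import exp

# Rows: BA, BENG, BBA, Other; columns: Accounting, Finance, Marketing
observed = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (f_ij - e_ij)^2 / e_ij over all cells, with e_ij = row total * column total / n
chi2_stat = 0.0
for i, row in enumerate(observed):
    for j, f in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2_stat += (f - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)   # (r - 1)(c - 1) = 6

# Upper-tail probability for 6 degrees of freedom (closed form)
h = chi2_stat / 2
p_value = exp(-h) * (1 + h + h * h / 2)

critical_value = 12.5916    # chi-squared(.05, 6), as quoted in the example
print(round(chi2_stat, 4), df, round(p_value, 4), chi2_stat > critical_value)
```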
Using the computer
Define a code to specify each nominal value. Input the data in columns, one column for each category (Degree, MBA Major). Select the Chi squared / raw data option from Data Analysis Plus under Tools. See Xm15-02.
Code:
Undergraduate degree: 1 = BA, 2 = BENG, 3 = BBA, 4 = OTHERS
MBA Major: 1 = ACCOUNTING, 2 = FINANCE, 3 = MARKETING
Contingency Table
Degree   1    2    3    Total
1        31   13   16   60
2        8    16   7    31
3        12   10   17   39
4        10   5    7    22
Total    61   44   47   152
Test Statistic CHI-Squared = 14.7019
P-Value = 0.0227
Required condition - Rule of five
– The χ² distribution provides an adequate approximation to the sampling distribution under the condition that $e_{ij} \ge 5$ for all the cells.
– When $e_{ij} < 5$, rows or columns must be combined such that the condition is met.
Example (observed frequencies, with expected frequencies in parentheses)
10 (10.1)   14 (12.8)   4 (5.1)
12 (12.7)   16 (16.0)   7 (6.3)
8 (7.2)     8 (9.2)     4 (3.6)
We combine columns 2 and 3:
10 (10.1)   14 + 4 = 18 (12.8 + 5.1 = 17.9)
12 (12.7)   16 + 7 = 23 (16.0 + 6.3 = 22.3)
8 (7.2)     8 + 4 = 12 (9.2 + 3.6 = 12.8)
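Example
$$e_{11} = \frac{n_{1\cdot}\, n_{\cdot 1}}{N} = \frac{100 \times 148}{200} = 74, \quad e_{21} = \frac{n_{2\cdot}\, n_{\cdot 1}}{N} = \frac{100 \times 148}{200} = 74,$$
$$e_{12} = \frac{n_{1\cdot}\, n_{\cdot 2}}{N} = \frac{100 \times 52}{200} = 26, \quad e_{22} = \frac{n_{2\cdot}\, n_{\cdot 2}}{N} = \frac{100 \times 52}{200} = 26$$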
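Solution
Hypotheses:
$H_0: p_1 \text{ (southern region)} = p_2 \text{ (northern region)}$
$H_1: p_1 \neq p_2$
$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} = \frac{(88-74)^2}{74} + \frac{(12-26)^2}{26} + \frac{(60-74)^2}{74} + \frac{(40-26)^2}{26} = 20.374$$
Since $\chi^2 = 20.374 > 6.635 = \chi^2_{0.01,\,(2-1)(2-1)}$, we reject the null hypothesis $H_0$ at significance level $\alpha = 0.01$; that is, there is sufficient evidence that the proportion of farmers in favor of the farmers' pension system differs between the southern and northern regions.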
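Exercise #3
The student association of a university resolved to introduce a lottery-based paid parking system on campus starting next academic year. A sample of 200 students was surveyed; the results are shown in Table 13.14.
Table 13.14 3×3 contingency table
College        In favor   Opposed   No opinion
Liberal Arts   30         23        7
Law            32         24        8
Business       37         30        9
At significance level α = 0.05, are college and opinion independent?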
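Solution:
The null and alternative hypotheses are:
$H_0$: College and opinion are independent
$H_1$: College and opinion are dependent
The expected frequencies $e_{ij}$ are computed as shown in Table 13.15:
Table 13.15 Expected frequencies
College (i)     Opinion (j)                                                       Row total
                In favor              Opposed               No opinion
Liberal Arts    60×99/200 = 29.70     60×77/200 = 23.10     60×24/200 = 7.20      60
Law             64×99/200 = 31.68     64×77/200 = 24.64     64×24/200 = 7.68      64
Business        76×99/200 = 37.62     76×77/200 = 29.26     76×24/200 = 9.12      76
Column total    99                    77                    24                    200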
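Solution
$$\chi^2 = \sum_{i=1}^{3}\sum_{j=1}^{3} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} = \frac{(30-29.70)^2}{29.70} + \frac{(23-23.10)^2}{23.10} + \dots + \frac{(9-9.12)^2}{9.12} = 0.0727$$
Because $r = 3$ and $c = 3$, and the population parameters $p_{i\cdot}$ and $p_{\cdot j}$ are unknown, the critical value is $\chi^2_{0.05,\,(3-1)(3-1)} = \chi^2_{0.05,\,4} = 9.4877$.
Since $\chi^2 = 0.0727 < 9.4877$, we do not reject the null hypothesis $H_0$ at significance level $\alpha = 0.05$; that is, there is insufficient evidence to conclude that college and opinion are dependent.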
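HW
• 15.1, 15.23, 15.35, A15.1 (p.611)
Exercise #2-Solution
• A random sample of the past 100 weeks gives the frequency distribution of the number of machine components replaced per week. At α = 0.01, test whether the data fit a Poisson distribution with μ = 4.
Critical value: $\chi^2_{0.01,\,7-1} = 16.8$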
