Sardilla's Report On Advance Statistic

ANALYSIS OF ENUMERATION DATA
The tests of enumeration data considered in this book are the
chi-square goodness-of-fit test and the chi-square test of
independence. These are nonparametric tests and are widely used in
research. The chi-square test is denoted by the symbol X². Perhaps
the main reason for its wide use is its simplicity in terms of
computation. As a nonparametric test, it makes no assumption about
the distribution of the data, so it is classified under
distribution-free statistics.
IMPORTANT TERMS
• PARAMETRIC TEST: a test that uses population constants such as the mean, standard deviation, standard error,
correlation coefficient, proportion, etc., and in which the data tend to follow an assumed or established distribution
such as the normal, binomial, Poisson, etc.
• NONPARAMETRIC TEST: a test in which no population constant is used. The data do not
follow any specific distribution and no assumptions about the distribution are made. E.g. to classify good,
better and best we just allocate arbitrary numbers or marks to each category.
• HYPOTHESIS: a definite statement about the population parameters.
• NULL HYPOTHESIS: states that no association exists between the two cross-tabulated variables in
the population, and therefore the variables are statistically independent.
• ALTERNATIVE HYPOTHESIS: proposes that the two variables are related in the population.
• CONTINGENCY TABLE: a table prepared by enumerating qualitative data and
entering the actual frequencies. When such a table represents the occurrence of two sets of events, it is
called a contingency table.
TOPIC OUTLINE
• Goodness-of-Fit Test
• Chi-Square Test of Independence
• Factors to Be Considered in Using the X² Test
INTRODUCTION
• The chi-square test is an important test among the several tests of significance developed by
statisticians.
• It was developed by Karl Pearson in 1900.
• The chi-square test is a nonparametric test not based on any assumption about the distribution of any
variable.
• The test statistic follows a specific distribution known as the chi-square distribution.
• In general, the test we use to measure the differences between what is observed and what is expected
according to an assumed hypothesis is called the chi-square test.
IMPORTANT CHARACTERISTICS OF A CHI-SQUARE TEST
• This test (as a nonparametric test) is based on frequencies and not on parameters like the mean and
standard deviation.
• The test is used for testing hypotheses and is not useful for estimation.
• This test can also be applied to a complex contingency table with several classes and as such is a very
useful test in research work.
• This test is an important nonparametric test because no rigid assumptions are necessary regarding the
type of population, and relatively few mathematical details are involved.
• The chi-square statistic is commonly used for testing relationships between categorical variables. The null
hypothesis of the chi-square test is that no relationship exists between the categorical variables in the population;
they are independent. An example research question that could be answered using a chi-square analysis
would be:
*Is there a significant relationship between voter intent and political party membership?*
There are two types of chi-square (X²) tests; both involve categorical data.
1. The chi-square test for goodness of fit
2. The chi-square test for independence
GOODNESS-OF-FIT TEST
The goodness-of-fit test is a one-sample test. It is used to compare the observed or actual
frequencies with the expected or theoretical frequencies. The test statistic determines whether the
observed data follow a certain distribution or not. In other words, it is used to
test whether the observed frequencies are equal to the expected frequencies. As we have already seen, results
obtained from samples do not always agree exactly with the theoretical results expected according
to the rules of probability.
The formula is:
$$X^2 = \sum \frac{(O - E)^2}{E}$$
where:
O = Observed Frequency
E = Expected Frequency
GOODNESS-OF-FIT TEST
Also referred to as the one-sample chi-square test.
It explores the proportion of cases that fall into the various categories of a single
variable and compares these with hypothesized values.
It tests the null hypothesis that the observed frequencies, proportions, or percentage distribution for
an experiment or a survey follow a certain pattern or a given theoretical distribution
(the hypothesized values).
EXAMPLE 1
In 200 tosses of a coin, 115 heads and 85 tails were observed. Test the
hypothesis that the coin is fair using a .05 level of significance.
Step 1. Hypotheses
H0: O = E (the coin is fair)
H1: O ≠ E (the coin is not fair)
Step 2. Level of Significance
α = .05
Step 3. Reject H0 if the computed value is greater than the critical value.
Step 4. Test Statistic: Goodness-of-fit X² test
df = k – 1 = 2 – 1 = 1, so the critical value is 3.84.
Worksheet Compilation
O      E      (O – E)²   (O – E)² / E
115    100    225        2.25
85     100    225        2.25
200    200    450        4.50
Step 5. Since the computed value of 4.50 is greater than the critical value of 3.84, we reject the
null hypothesis; thus a significant difference exists.
Step 6. Conclusion. The coin is not fair at the .05 level of significance.
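The worksheet above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `goodness_of_fit` and the constant name are illustrative choices, and the critical value 3.84 is simply copied from the example rather than computed.

```python
def goodness_of_fit(observed, expected):
    """Return the X² statistic and the degrees of freedom (k - 1)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum (O - E)² / E over all categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


# Coin example: 115 heads and 85 tails against 100 expected of each.
coin_statistic, coin_df = goodness_of_fit([115, 85], [100, 100])

CRITICAL_VALUE_05_DF1 = 3.84  # critical value quoted in the example
reject_h0 = coin_statistic > CRITICAL_VALUE_05_DF1

print(coin_statistic, coin_df, reject_h0)  # 4.5 1 True
```

The decision rule is kept as an explicit comparison so that it mirrors Step 3 of the example.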
EXAMPLE 2
A die is rolled 60 times. Test whether the observed frequencies of the faces 1, 2, 3, 4,
5 and 6 agree with the expected frequencies, assuming that the expected frequency is 10 for each face of the die.
X      1    2    3    4    5    6    Total
O      8    6    9    7    16   14   60
E      10   10   10   10   10   10   60
Step 1. Hypotheses
H0: O = E
H1: O ≠ E
Step 2. Level of Significance
α = .01
Step 3. Reject H0 if the computed value is greater than the critical value.
Step 4. Test Statistic: Goodness-of-fit X² test
df = k – 1 = 6 – 1 = 5, so the critical value is 15.09.
Worksheet Compilation
X               1     2     3     4     5     6     X²
O – E           -2    -4    -1    -3    6     4
(O – E)² / E    0.4   1.6   0.1   0.9   3.6   1.6   8.2
Step 5. Since the computed value of 8.2 is less than the critical value of 15.09, we fail to reject the
null hypothesis; thus no significant difference exists.
Step 6. Conclusion. The die can be considered fair at the .01 level of significance.
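For the die data it can help to see how much each face contributes to X². The sketch below computes the six contributions and their sum. The variable names are illustrative, and the critical value used (about 15.09 for df = 5 at the .01 level) is taken from a standard chi-square table, so it is an assumption rather than something the code derives.

```python
faces = [1, 2, 3, 4, 5, 6]
observed = [8, 6, 9, 7, 16, 14]
expected = [60 / len(faces)] * len(faces)  # fair die: 10 rolls expected per face

# Contribution of each face to the statistic: (O - E)² / E.
contributions = {
    face: (o - e) ** 2 / e
    for face, o, e in zip(faces, observed, expected)
}
die_statistic = sum(contributions.values())
largest_face = max(contributions, key=contributions.get)

CRITICAL_VALUE_01_DF5 = 15.09  # assumed tabled value for df = 5, alpha = .01
reject_h0 = die_statistic > CRITICAL_VALUE_01_DF5

print(round(die_statistic, 2), largest_face, reject_h0)
```

Face 5 contributes the most, yet the total stays well below the critical value, so the null hypothesis is not rejected.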
CHI-SQUARE TEST OF INDEPENDENCE
• The chi-square test of independence is used to analyze two variables, each with several categories.
It is used to determine whether the two variables are independent (not related). When the
researcher looks into whether the choice of profession depends on gender, or into
attitudes of the students toward the teacher and level of academic performance, research skills
and research attitudes, etc., the variables considered shall be categorized in nominal form to
determine their respective frequencies before the computations are made. Since the
frequencies can be easily identified when they are arranged in a contingency table, this test is
sometimes referred to as contingency analysis.
• The formula is:
$$X^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
where:
X² = chi-square
fo = observed frequency
fe = expected frequency
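The expected frequency of each cell is obtained from the margins of the contingency table, and the degrees of freedom from its size. Written out as formulas (the symbols R_i, C_j, N, r and c are labels introduced here for the row total, column total, grand total, number of rows and number of columns; they are not notation from the report):

$$f_e(i,j) = \frac{R_i \times C_j}{N}, \qquad df = (r - 1)(c - 1)$$

For instance, a cell whose row total is 32 and whose column total is 30 in a table with N = 72 has an expected frequency of 32 x 30 / 72 = 13.33.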
EXAMPLE 1.
For example, in a research study on "Research Skills and Highest Educational Attainment", the research skills are
categorized into very skillful, moderately skillful and not skillful, while the categories of highest educational
attainment include bachelor's, master's and doctorate degrees. With this categorization, a 3 x 3 contingency table is
used. The data are as follows.
RESEARCH SKILLS AND HIGHEST EDUCATIONAL ATTAINMENT

Variable 1             Variable 2: Highest Educational Attainment     Total
Research Skills        Bachelor's    Master's    Doctorate
Very Skillful          15            9           8                    32
Moderately Skillful    10            7           6                    23
Not Skillful           5             6           6                    17
Total                  30            22          20                   72
Step 1. Hypotheses
H0: There is no significant relationship between research skills and highest educational
attainment.
H1: There is a significant relationship between research skills and highest educational
attainment.
Step 2. Alpha is 0.05.
Step 3. Reject H0 if the computed value is greater than 9.49.
Step 4. Compute the value of the test statistic from the given data.
df = (number of columns – 1) (number of rows – 1) = (3 – 1) (3 – 1) = 4

1st Category (Bachelor's)
30 x 32/72 = 13.33;  (15 – 13.33)² / 13.33 = .21
30 x 23/72 = 9.58;   (10 – 9.58)² / 9.58 = .018
30 x 17/72 = 7.08;   (5 – 7.08)² / 7.08 = .61
2nd Category (Master's)
22 x 32/72 = 9.78;   (9 – 9.78)² / 9.78 = .06
22 x 23/72 = 7.03;   (7 – 7.03)² / 7.03 = .00013
22 x 17/72 = 5.19;   (6 – 5.19)² / 5.19 = .13
3rd Category (Doctorate)
20 x 32/72 = 8.89;   (8 – 8.89)² / 8.89 = .09
20 x 23/72 = 6.39;   (6 – 6.39)² / 6.39 = .024
20 x 17/72 = 4.72;   (6 – 4.72)² / 4.72 = .35
Thus:
1st Category = .21 + .018 + .61 = .838
2nd Category = .06 + .00013 + .13 = .190
3rd Category = .09 + .024 + .35 = .464
Total = 1.492
X² = 1.492
Step 5. Since the computed X² value of 1.492 is less than the critical value of 9.49, we fail to reject the null hypothesis.
Step 6. Since the computed X² value of 1.492 is less than the critical value of 9.49, it means that
there is no significant relationship between research skills and highest educational attainment.
Thus, based on the hypothetical data, we could say that highest educational attainment is independent of
research skills.
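The same computation can be sketched in Python for any contingency table. The function name `chi_square_independence` is a hypothetical helper, not part of any library, and the critical value 9.49 is the one quoted in Step 3. Because the code keeps full precision for the expected frequencies, it gives about 1.48, which differs slightly from the hand total obtained with rounded intermediate values.

```python
def chi_square_independence(observed):
    """Return (statistic, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency of each cell: row total x column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Rows: very, moderately, not skillful; columns: bachelor's, master's, doctorate.
skills_by_degree = [
    [15, 9, 8],
    [10, 7, 6],
    [5, 6, 6],
]
statistic, df, expected = chi_square_independence(skills_by_degree)

CRITICAL_VALUE = 9.49  # quoted in Step 3 for df = 4, alpha = .05
reject_h0 = statistic > CRITICAL_VALUE

print(round(statistic, 2), df, reject_h0)
```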
EXERCISE
Find out if there is a significant relationship between the extent of cultural
practices and religious involvement of the Bagos in the hinterland municipalities of
Ilocos Sur. Cultural practices are categorized into three scales (Always Practiced,
Moderately Practiced, Not Practiced), while the categories of religious involvement are Catholic,
Non-Catholic and none at all. The data are as follows:

Religious        Cultural Practices                           Total
Involvement      Always       Moderately    Not Practiced
                 Practiced    Practiced
Catholic         112          120           29                261
Non-Catholic     131          101           90                322
Not at all       16           25            13                54
Total            259          246           132               637

SOLUTION:
Step 1.
H0: There is no significant relationship between the extent of cultural practices and religious involvement
of the Bagos in the hinterland municipalities of Ilocos Sur.
H1: There is a significant relationship between the extent of cultural practices and religious involvement
of the Bagos in the hinterland municipalities of Ilocos Sur.
Step 2. Alpha is 0.05.
Step 3. Reject H0 if the computed value is greater than the critical value.
Step 4. Compute the test statistic.
Religious Involvement and Cultural Practice – B (observed minus expected frequencies)

Religious        Cultural Practices
Involvement      Always Practiced    Moderately Practiced    Not Practiced
Catholic         112 – 106.12        120 – 100.79            29 – 54.08
Non-Catholic     131 – 130.92        101 – 124.35            90 – 66.73
Not at all       16 – 21.96          25 – 20.85              13 – 11.19

Always Practiced:
X² = (112-106.12)² /106.12 + (131-130.92)² /130.92 + (16-21.96)² /21.96
= 0.33 + .000049 + 1.62 = 1.95
Moderately Practiced:
X² = (120-100.79)² /100.79 + (101-124.35)² /124.35 + (25-20.85)² /20.85
= 3.66 + 4.38 + .83 = 8.87
Not Practiced:
X² = (29-54.08)² /54.08 + (90-66.73)² /66.73 + (13-11.19)² /11.19
= 11.63 + 8.11 + .29 = 20.03
Total: X² = 1.95 + 8.87 + 20.03 = 30.85
Step 5. Since the computed value of 30.85 is greater than the critical value of 9.488 at the .05 level of
significance (df = 4), we reject the null hypothesis.
Step 6. Conclusion:
Religious involvement is dependent on the extent of cultural practices of the respondents, or vice versa.
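A column-by-column check of this exercise can be sketched as follows. The variable names are illustrative. The code keeps the expected frequencies at full precision, so its column sums and total may differ from the hand-computed figures in the last decimal place, where the worksheet rounds each expected frequency to two decimals.

```python
observed = [
    [112, 120, 29],  # Catholic
    [131, 101, 90],  # Non-Catholic
    [16, 25, 13],    # Not at all
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Accumulate the X² contribution of each column
# (Always, Moderately, Not Practiced).
column_contributions = []
for j, col_total in enumerate(col_totals):
    contribution = 0.0
    for i, row_total in enumerate(row_totals):
        e = row_total * col_total / n  # expected frequency of cell (i, j)
        contribution += (observed[i][j] - e) ** 2 / e
    column_contributions.append(contribution)

chi_square = sum(column_contributions)
CRITICAL_VALUE = 9.488  # quoted in Step 5 for df = 4, alpha = .05
reject_h0 = chi_square > CRITICAL_VALUE

print([round(c, 2) for c in column_contributions], round(chi_square, 2), reject_h0)
```

The "Not Practiced" column accounts for roughly two thirds of the statistic, which shows where the observed and expected counts disagree most.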
FACTORS TO BE CONSIDERED IN USING THE
X² TEST
When the frequency is small in one of the categories in a contingency table, Yates'
correction shall be used to avoid erroneous findings and conclusions. According to Cooper and
Schindler (2001), Yates' correction for continuity is often applied in a 2 x 2 table when the sample
size is greater than 40, or when the sample is between 20 and 40 and the values of the expected
frequencies are five or more.
The formula is:
$$X^2 = \frac{N\left(|AD - BC| - \frac{N}{2}\right)^2}{(A+B)(C+D)(A+C)(B+D)}$$
where N = A + B + C + D and the contingency table is arranged as:
A B
C D
Yates' correction for continuity
• Frank Yates (1902-1994), who proposed the correction, was one of the pioneers of 20th-century
statistics.
• In statistics, Yates' correction for continuity (or Yates' chi-square test) is used
in certain situations when testing for independence in a contingency table.
• In some cases, Yates' correction may adjust too far, and so its current use is
limited.
• When sample sizes are small, the use of X² introduces some bias into the
calculation, so that the X² value tends to be a little too large.
• To remove the bias, we use the continuity correction (Yates' correction).
When do we use the correction for continuity when performing a chi-square
analysis on a 2 x 2 table?
• Using the chi-squared distribution to interpret Pearson's chi-squared statistic requires one
to assume that the discrete probability of observed binomial frequencies in the table can
be approximated by the continuous chi-squared distribution. This assumption is not quite
correct, and introduces some error.
• To reduce the error in approximation, Frank Yates, an English statistician, suggested a
correction for continuity which adjusts the formula for Pearson's chi-square test by
subtracting 0.5 from the absolute difference between each observed value and its expected value in
a 2 × 2 contingency table (Yates, 1934). This reduces the chi-square value obtained and
thus increases its p-value.
• The effect of Yates' correction is to prevent overestimation of statistical significance for
small data. This formula is chiefly used when at least one cell of the table has an expected
count smaller than 5. Unfortunately, Yates' correction may tend to overcorrect. This can
result in an overly conservative result that fails to reject the null hypothesis when it
should. So it is suggested that Yates' correction is unnecessary even with quite low sample
sizes (Sokal and Rohlf, 1981), such as total sample sizes less than or equal to 20.
EXERCISE
Let us consider the data below to illustrate the computation using the common chi-square formula and the
correction made. The sex of the respondents and their attitudes toward multimedia usage in teaching are as
follows (expected frequencies in parentheses):
Attitudes Towards Multimedia Usage in Teaching and the Gender Factor

Sex        Attitudes                     Total
           Positive       Negative
Male       45 (44.74)     6 (6.26)       51
Female     55 (55.26)     8 (7.74)       63
Total      100            14             114
• Illustration using the general formula
X² = (45-44.74)² /44.74 + (6-6.26)² /6.26 + (55-55.26)² /55.26 + (8-7.74)² /7.74
= .0015 + .0108 + .0012 + .0087 = .022
Illustration using the Yates' Correction Formula
Attitudes Towards Multimedia Usage in Teaching and the Gender Factor - B

Sex        Attitudes                  Total
           Positive     Negative
Male       A = 45       B = 6         51
Female     C = 55       D = 8         63
Total      100          14            114
Solution:
$$X^2 = \frac{114\,(|45(8) - 6(55)| - 114/2)^2}{(45+6)(55+8)(45+55)(6+8)}$$
$$X^2 = \frac{114\,(|360 - 330| - 57)^2}{(51)(63)(100)(14)}$$
$$X^2 = \frac{114\,(30 - 57)^2}{4498200}$$
$$X^2 = \frac{83106}{4498200}$$
$$X^2 = 0.0185$$
Yates' correction can also be applied using the formula:
$$X^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$$
To illustrate the computation using the revised formula, we use the previous data and observe the result.
X² = (|45-44.74|-.5)² /44.74 + (|6-6.26|-.5)² /6.26 + (|55-55.26|-.5)² /55.26 + (|8-7.74|-.5)² /7.74
= (.26-.5)² /44.74 + (.26-.5)² /6.26 + (.26-.5)² /55.26 + (.26-.5)² /7.74
= .0013 + .009 + .001 + .0074
X² = .0187
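Both forms of the correction can be compared numerically. The sketch below uses hypothetical function names and assumes the 2 x 2 layout A, B / C, D described earlier. At full precision the shortcut formula and the cell-by-cell formula give the same value (about 0.0185), so small differences between hand results come from rounding the expected frequencies.

```python
def pearson_2x2(a, b, c, d):
    """Uncorrected X² for a 2 x 2 table laid out as A B / C D."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def yates_shortcut(a, b, c, d):
    """Yates-corrected X² using N(|AD - BC| - N/2)² over the product of margins."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))


def yates_cellwise(a, b, c, d):
    """Yates-corrected X² as the sum of (|O - E| - 0.5)² / E over the four cells."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            total += (abs(observed[i][j] - e) - 0.5) ** 2 / e
    return total


# Multimedia attitude data: Male 45 / 6, Female 55 / 8.
uncorrected = pearson_2x2(45, 6, 55, 8)
corrected_shortcut = yates_shortcut(45, 6, 55, 8)
corrected_cellwise = yates_cellwise(45, 6, 55, 8)

print(round(uncorrected, 4), round(corrected_shortcut, 4), round(corrected_cellwise, 4))
```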
Another factor to be considered in the use of chi-square is the presence of categories with
small frequencies. When there are more than two categories (a table larger than the 2 x 2 contingency table),
Yates' correction is not applicable; instead, collapse the table or reduce the categories, particularly
when the data are measured on a Likert scale. However, when the categories cannot be collapsed or
reduced because they are entirely different, it is advisable to discard them so that the chi-square result is
not affected. Let us consider the following data, the first table containing a 5-point scale with three
groups of respondents, and observe the improved one.
A. Original data on the ratings given by the teachers, administrators and students on the
extent of implementation of guidance services of a certain college.
EXTENT OF INFORMATION AMONG THREE DIFFERENT ACADEMIC GROUPS

Groups            EI    VMI    MI    SI    NI    Total
Administration    2     8      9     7     1     27
Teachers          3     10     12    10    4     39
Students          5     16     10    15    3     49
Total             10    34     31    32    8     115

B. Collapsed/Reduced Table. Since the original data contain small frequencies along the scales
of "excellently implemented" (EI) and "not implemented" (NI), the table should be reduced as
presented below before the computation is made.
EXTENT OF INFORMATION AMONG THREE DIFFERENT ACADEMIC GROUPS - B

Groups            Very Much      Moderately     Slightly       Total
                  Implemented    Implemented    Implemented
Administration    10             9              8              27
Teachers          13             12             14             39
Students          21             10             18             49
Total             44             31             40             115
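Collapsing a table amounts to adding adjacent columns. In the sketch below, `collapse_columns` and `merge_plan` are hypothetical names: the plan merges the first two scale points (EI with VMI) and the last two (SI with NI) and leaves MI alone, which reproduces the reduced table.

```python
def collapse_columns(table, groups):
    """Sum the columns listed in each group; groups is a list of column-index lists."""
    return [[sum(row[i] for i in group) for group in groups] for row in table]


# Rows: Administration, Teachers, Students; columns: EI, VMI, MI, SI, NI.
original = [
    [2, 8, 9, 7, 1],
    [3, 10, 12, 10, 4],
    [5, 16, 10, 15, 3],
]
merge_plan = [[0, 1], [2], [3, 4]]  # EI + VMI, MI, SI + NI
collapsed = collapse_columns(original, merge_plan)

# Row totals must be unchanged by collapsing.
assert [sum(row) for row in collapsed] == [sum(row) for row in original]

print(collapsed)  # [[10, 9, 8], [13, 12, 14], [21, 10, 18]]
```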

LIMITATIONS OF A CHI-SQUARE TEST
• The data must come from a random sample.
• Applied to a fourfold (2 x 2) table, which has one degree of freedom, this test will not give a reliable
result if the expected value in any cell is less than 5. In such a case, Yates' correction is necessary, i.e.
reduction of the absolute value of (O – E) by half (0.5).
• Even with Yates' correction, the test may be misleading if any expected frequency is much below 5. In that case
another appropriate test should be applied.
• In contingency tables larger than 2 x 2, Yates' correction cannot be applied.
• Interpret this test with caution if the sample total, i.e. the total of the values in all the cells, is less than 50.
• This test tells the presence or absence of an association between the events but does not measure the strength
of the association.
• This test does not indicate cause and effect; it only tells the probability of occurrence of the association by
chance.
• The test is to be applied only when the individual observations of the sample are independent, which means
that the occurrence of one individual observation (event) has no effect upon the occurrence of any other
observation (event) in the sample under consideration.
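The rule of thumb about expected frequencies below 5 can be checked before running the test. This is a minimal sketch with a hypothetical helper name, `low_expected_cells`. It is applied to the 5-point guidance-services table and to its collapsed version from the previous section.

```python
def low_expected_cells(table, threshold=5.0):
    """Return (row, column, expected) for every cell whose expected count is below threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    flagged = []
    for i, row_total in enumerate(row_totals):
        for j, col_total in enumerate(col_totals):
            expected = row_total * col_total / n
            if expected < threshold:
                flagged.append((i, j, round(expected, 2)))
    return flagged


original = [
    [2, 8, 9, 7, 1],
    [3, 10, 12, 10, 4],
    [5, 16, 10, 15, 3],
]
collapsed = [
    [10, 9, 8],
    [13, 12, 14],
    [21, 10, 18],
]

sparse_before = low_expected_cells(original)
sparse_after = low_expected_cells(collapsed)

print(len(sparse_before), len(sparse_after))  # 6 0
```

All six flagged cells lie in the first and last columns of the original table, which is why those two scale points are the ones merged.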
SUMMARY
• Tests of enumeration data are among the statistical tools that researchers use to investigate the relationship
between variables. The chi-square goodness-of-fit test is used to compare the observed frequencies with the
theoretical frequencies. The chi-square test of independence is used in comparing two or more
variables which are classified into two or more categories.
• These tests are considered nonparametric because the data are not assumed to follow a normal or any other
specific distribution.
• When the frequency is small in one of the categories in a contingency table, Yates' correction shall
be used to avoid erroneous findings and conclusions. When there are more than two categories (a table larger
than 2 x 2), collapse the table or reduce the categories, particularly when the data are
measured on a Likert scale.
