Sardilla's Report On Advance Statistic
ENUMERATION DATA
The tests of enumeration data considered in this book are the
chi-square goodness-of-fit test and the chi-square test of
independence. These are nonparametric tests that are widely used
in research. The chi-square test is denoted by the symbol X².
Perhaps the main reason for its wide use is its computational
simplicity. As a nonparametric test, it makes no assumption about
the distribution of the data, and it is therefore classified under
distribution-free statistics.
IMPORTANT TERMS
PARAMETRIC TEST: A test that uses population constants such as the mean, standard deviation, standard error,
correlation coefficient, or proportion, and in which the data are assumed to follow an established distribution
such as the normal, binomial, or Poisson.
NON-PARAMETRIC TEST: A test in which no population constant is used. The data are not assumed to
follow any specific distribution, and no distributional assumptions are made. E.g., to classify good,
better and best we just allocate arbitrary numbers or marks to each category.
HYPOTHESIS: A definite statement about the population parameters.
NULL HYPOTHESIS: States that no association exists between the two cross-tabulated variables in
the population, and therefore the variables are statistically independent.
ALTERNATIVE HYPOTHESIS: Proposes that the two variables are related in the population.
CONTINGENCY TABLE: A table prepared by enumerating qualitative data and entering the actual
frequencies. When that table represents the occurrence of two sets of events, it is
called a contingency table.
TOPIC OUTLINE
Goodness-of-fit Test
Chi-Square Test of Independence
Factors to be considered in using the X² test
INTRODUCTION
The chi-square test is an important test among the several tests of significance developed by
statisticians.
It was developed by Karl Pearson in 1900.
The chi-square test is a non-parametric test that is not based on any assumption about the
distribution of any variable.
The test statistic follows a specific distribution known as the chi-square distribution.
In general, the test we use to measure the differences between what is observed and what is
expected according to an assumed hypothesis is called the chi-square test.
IMPORTANT CHARACTERISTICS OF A CHI-SQUARE TEST
• This test (as a non-parametric test) is based on frequencies and not on parameters like the mean
and standard deviation.
• The test is used for testing the hypothesis and is not useful for estimation.
• This test can also be applied to a complex contingency table with several classes and as such is
a very useful test in research work.
• This test is an important non-parametric test as no rigid assumptions are necessary in regard to
the type of population, and relatively less mathematical detail is involved.
The chi-square statistic is commonly used for testing relationships between categorical variables. The null
hypothesis of the chi-square test is that no relationship exists between the categorical variables in the population;
they are independent. An example research question that could be answered using a chi-square analysis
would be:
*Is there a significant relationship between voter intent and political party membership?*
There are two different types of chi-square (X²) tests; both involve categorical data.
1. The chi-square test for goodness of fit
2. The chi-square test for independence
GOODNESS-OF-FIT TEST
The goodness-of-fit test is a one-sample test. It is used to compare the observed or actual
frequencies with the expected or theoretical frequencies. The test statistic determines whether the
observed data follow a certain distribution or not. In other words, it is used to
test whether the observed frequencies are equal to the expected frequencies. As we have already seen, the
results obtained in samples do not always agree exactly with the theoretical results expected according
to the rules of probability.
The formula is:
X² = Σ (O – E)² / E
where
O = Observed Frequency
E = Expected Frequency
The goodness-of-fit test is also referred to as the one-sample chi-square test.
It explores the proportion of cases that fall into the various categories of a single
variable, and compares these with hypothesized values.
It tests the null hypothesis that the observed frequencies, proportions, or percentage distribution from
an experiment or a survey follow a given theoretical distribution
(the hypothesized values).
EXAMPLE 1
In 200 tosses of a coin, 115 heads and 85 tails were observed. Test the
hypothesis that the coin is fair using a .05 level of significance.
Step 1. Hypotheses
H0: O = E
H1: O ≠ E
Step 2. Level of Significance
α = .05
Step 3. Reject H0 if the computed value is greater than the critical value.
Step 4. Test Statistic: Goodness-of-fit X² test
df = k - 1 = 2 - 1 = 1
Worksheet Compilation
O      E      (O – E)²    (O – E)² / E
115    100    225         2.25
85     100    225         2.25
200    200    450         4.50
Step 5. Since the computed value of 4.50 is greater than the critical value of 3.84, we reject the
null hypothesis; a significant difference exists.
Step 6. Conclusion. The coin is not fair at the .05 level of significance.
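The steps of Example 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `chi_square_gof` is made up for this example, and the critical value 3.84 is simply copied from the text rather than computed.

```python
def chi_square_gof(observed, expected):
    """Return X², the sum of (O - E)² / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Example 1: 200 tosses; a fair coin expects 100 heads and 100 tails.
observed = [115, 85]
expected = [100, 100]

x2 = chi_square_gof(observed, expected)
df = len(observed) - 1      # k - 1, where k is the number of categories
critical_value = 3.84       # taken from the text (alpha = .05, df = 1)

reject_h0 = x2 > critical_value
print(f"X² = {x2:.2f}, df = {df}, reject H0: {reject_h0}")
```

The decision rule is the same as in Step 3: H0 is rejected only when the computed value exceeds the critical value.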
EXAMPLE 2
A die is rolled 60 times. Test whether the observed outcomes for faces 1, 2, 3, 4,
5 and 6 agree with the expected frequencies, assuming that the expected frequency is 10 for each
face of the die.
X     1    2    3    4    5    6    Total
O     8    6    9    7   16   14    60
E    10   10   10   10   10   10    60
Step 1. Hypotheses
H0: O = E
H1: O ≠ E
Step 2. Level of Significance
α = .01
Step 3. Reject H0 if the computed value is greater than the critical value.
Step 4. Test Statistic: Goodness-of-fit X² test
Worksheet Compilation
X               1     2     3     4     5     6     X²
O – E          -2    -4    -1    -3     6     4
(O – E)² / E    0.4   1.6   0.1   0.9   3.6   1.6   8.2
Step 5. With df = 6 - 1 = 5 and α = .01, the critical value is 15.09. Since the computed value of
8.2 is less than the critical value, we fail to reject the null hypothesis; no significant difference exists.
Step 6. Conclusion. The observed frequencies are consistent with a fair die at the .01 level of significance.
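The worksheet for the die data can be rebuilt row by row, which makes it easy to check each cell. The sketch below uses the observed and expected frequencies from the table in Example 2; the critical value 15.09 is an assumption taken from a standard chi-square table (α = .01, df = 5) and is not computed by the code.

```python
faces = [1, 2, 3, 4, 5, 6]
observed = [8, 6, 9, 7, 16, 14]
expected = [10] * len(faces)

# Row 1 of the worksheet: O - E for each face.
deviations = [o - e for o, e in zip(observed, expected)]
# Row 2 of the worksheet: (O - E)² / E for each face.
contributions = [d ** 2 / e for d, e in zip(deviations, expected)]

x2 = sum(contributions)
df = len(faces) - 1
critical_value = 15.09      # assumed table value for alpha = .01, df = 5

for face, d, c in zip(faces, deviations, contributions):
    print(f"face {face}: O - E = {d:>3}, (O - E)²/E = {c:.1f}")
print(f"X² = {x2:.1f}, df = {df}, reject H0: {x2 > critical_value}")
```

Printing the per-face contributions shows which categories drive the statistic; here faces 5 and 6 contribute the most.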
CHI-SQUARE TEST OF INDEPENDENCE
The chi-square test of independence is used to analyze two variables, each with several categories.
It is used to determine whether the two variables are independent (not related). When the
researcher looks into whether the choice of profession depends on gender, or into the
attitudes of the students toward the teacher and level of academic performance, research skills
and research attitudes, etc., the variables considered must be categorized in nominal form to
determine their respective frequencies before the computations are made. Since the
frequencies can be easily identified when they are arranged in a contingency table, this test is
sometimes referred to as contingency analysis.
The formula is:
X² = Σ (fo – fe)² / fe
where
X² = chi-square
fo = observed frequency
fe = expected frequency
EXAMPLE 1.
In a research study on "Research Skills and Highest Educational Attainment", research skills are
categorized as very skillful, moderately skillful and not skillful, while the categories of highest educational
attainment are bachelor's, master's and doctorate degrees. With this categorization, a 3 x 3 contingency table is
used. The data are as follows.
RESEARCH SKILLS AND HIGHEST EDUCATIONAL ATTAINMENT
Research Skills         Highest Educational Attainment (Variable 2)
(Variable 1)            Bachelor’s   Master’s   Doctorate   Total
Very Skillful           15           9          8           32
Moderately Skillful     10           7          6           23
Not Skillful            5            6          6           17
Total                   30           22         20          72
Step 1. Hypotheses
H0: There is no significant relationship between research skills and educational
attainment.
H1: There is a significant relationship between research skills and educational
attainment.
Step 2. Alpha is 0.05, two-tailed test
Step 3. Reject H0 if the computed value is greater than 9.49 (df = (3 - 1)(3 - 1) = 4).
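Step 4. Compute the value of the test statistic from the given data. The expected frequency of each
cell is (column total x row total) / N, and its contribution to X² is (O - E)² / E.
1st Category
30 x 32/72 = 13.33; (15 - 13.33)² / 13.33 = .21
30 x 23/72 = 9.58; (10 - 9.58)² / 9.58 = .018
30 x 17/72 = 7.08; (5 - 7.08)² / 7.08 = .61
2nd Category
22 x 32/72 = 9.78; (9 - 9.78)² / 9.78 = .06
22 x 23/72 = 7.03; (7 - 7.03)² / 7.03 = .00013
22 x 17/72 = 5.19; (6 - 5.19)² / 5.19 = .13
3rd Category
20 x 32/72 = 8.89; (8 - 8.89)² / 8.89 = .09
20 x 23/72 = 6.39; (6 - 6.39)² / 6.39 = .024
20 x 17/72 = 4.72; (6 - 4.72)² / 4.72 = .35
Thus;
1st Category = .21 + .018 + .61 = .838
2nd Category = .06 + .00013 + .13 = .190
3rd Category = .09 + .024 + .35 = .464
Total = 1.492
X² = 1.492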
Step 5. The computed X² value of 1.492 is less than the critical value, so we fail to reject the null hypothesis.
Step 6. Since the computed X² value of 1.492 is less than the critical value of 9.49,
there is no significant relationship between research skills and highest educational attainment.
Thus, based on the hypothetical data, we can say that highest educational attainment is independent of
research skills.
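The same computation can be sketched in Python for any contingency table. The helper names below (`expected_frequencies`, `chi_square_independence`) are invented for this illustration, and the critical value 9.49 is the one quoted in Step 3. Because the code does not round the expected frequencies, its result (about 1.48) differs slightly from the hand-computed 1.492.

```python
def expected_frequencies(table):
    """Expected count of each cell: row total x column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (X², df) for a table given as a list of rows."""
    expected = expected_frequencies(table)
    x2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(table, expected)
             for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df

# Rows: very, moderately, not skillful; columns: bachelor's, master's, doctorate.
table = [[15, 9, 8],
         [10, 7, 6],
         [5, 6, 6]]

x2, df = chi_square_independence(table)
critical_value = 9.49       # from the text (alpha = .05, df = 4)
print(f"X² = {x2:.3f}, df = {df}, reject H0: {x2 > critical_value}")
```

The conclusion is unchanged by the rounding difference: the statistic is far below 9.49, so H0 is not rejected.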
EXERCISE
Step 1.
Ho: There is no significant relationship between the extent of cultural practices and religious involvement
of the Bagos in the hinterland municipalities of Ilocos Sur.
H1: There is a significant relationship between the extent of cultural practices and religious involvement
of the Bagos in the hinterland municipalities of Ilocos Sur.
Step 2. Alpha is 0.05 (directional test)
Step 3. Reject Ho if the computed value is greater than the critical value.
Step 4. Compute the test statistic.
Religious Involvement and Cultural Practice – B
(each cell shows observed - expected frequency)
Religious        Cultural Practices
Involvement      Always Practiced   Moderately Practiced   Not Practiced   Total
Catholic         112 - 106.12       120 - 100.79           29 - 54.08      261
Non-Catholic     131 - 130.92       101 - 124.35           90 - 66.73      322
Not at all       16 - 21.96         25 - 20.85             13 - 11.19      54
Cell labels of a 2 x 2 contingency table:
A    B
C    D
Yates's correction for continuity
Frank Yates (1902-1994) was one of the pioneers of 20th-century
statistics.
In statistics, Yates's correction for continuity (or Yates's chi-square test) is used
in certain situations when testing for independence in a contingency table.
In some cases, Yates's correction may adjust too far, and so its current use is
limited.
When sample sizes are small, the use of X² introduces some bias into the
calculation, so that the X² value tends to be a little too large.
To remove the bias, we use the continuity correction (Yates's correction).
When do we use the correction for continuity when performing a chi-square
analysis on a 2x2 table?
Using the chi-squared distribution to interpret Pearson's chi-squared statistic requires one
to assume that the discrete probability of observed binomial frequencies in the table can
be approximated by the continuous chi-squared distribution. This assumption is not quite
correct, and introduces some error.
To reduce the error in approximation, Frank Yates, an English statistician, suggested a
correction for continuity which adjusts the formula for Pearson's chi-square test by
subtracting 0.5 from the difference between each observed value and its expected value in
a 2 × 2 contingency table (Yates, 1934). This reduces the chi-square value obtained and
thus increases its p-value.
The effect of Yates' correction is to prevent overestimation of statistical significance for
small data. This formula is chiefly used when at least one cell of the table has an expected
count smaller than 5. Unfortunately, Yates' correction may tend to overcorrect. This can
result in an overly conservative result that fails to reject the null hypothesis when it
should. So it is suggested that Yates' correction is unnecessary even with quite low sample
sizes (Sokal and Rohlf, 1981), such as total sample sizes less than or equal to 20.
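The effect of the correction can be seen by computing both versions of the statistic on the same 2 x 2 table. The sketch below follows the description above (subtract 0.5 from each absolute difference between observed and expected). The function name `chi_square_2x2`, the guard that stops the adjusted difference from going below zero, and the cell counts are all assumptions made for illustration; the counts are hypothetical and do not come from the report.

```python
def chi_square_2x2(table, yates=False):
    """X² for a 2 x 2 table, optionally with Yates's continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Subtract 0.5 from |O - E|; the max() guard is an added safeguard.
                diff = max(diff - 0.5, 0.0)
            x2 += diff ** 2 / expected
    return x2

# Hypothetical counts, used only to show the direction of the adjustment.
table = [[12, 5],
         [6, 9]]

plain = chi_square_2x2(table)
corrected = chi_square_2x2(table, yates=True)
print(f"uncorrected X² = {plain:.3f}, Yates-corrected X² = {corrected:.3f}")
```

The corrected value is always the smaller of the two, which is why the correction makes the test more conservative.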
EXERCISE
Let us consider the data below to illustrate the computation using the common chi-square formula and the
correction made. The sex of the respondents and their attitudes toward multimedia usage in teaching are illustrated as follows:
Attitudes Towards Multimedia Usage in Teaching and the Gender Factors
X² = 144(30 - 57)² / 4498200
To illustrate the computation using the revised formula, we use the previous data and observe the result.
X² = .0187
Another factor to be considered in the use of chi-square is the presence of categories with
small frequencies. When the table is larger than a 2 x 2 contingency table,
Yates's correction is not applicable; instead, collapse the table or reduce the categories, particularly
when the data are measured on a Likert scale. However, when the categories cannot be collapsed or
reduced because they are entirely different, it is advisable to discard them so that the chi-square result is
not affected. Let us consider the following data: the first table contains a 5-point scale with three
groups of respondents, and the second is the improved one.
A. Original data on the ratings given by the teachers, administrators and students on the
extent of implementation of guidance services of a certain college.
EXTENT OF INFORMATION AMONG THREE DIFFERENT ACADEMIC GROUPS
Group            Ratings (5-point scale)     Total
Administration   2    8    9    7    1       27
Teachers         3    10   12   10   4       39
Students         5    16   10   15   3       49
B. Collapsed/Reduced Table
Since the original data contain small frequencies along the scales of "excellently implemented"
and "not implemented", the table should be reduced as presented below before the computation
is made.
EXTENT OF INFORMATION AMONG THREE DIFFERENT ACADEMIC GROUPS - B
Group            Ratings (collapsed)     Total
Administration   10   9    8             27
Teachers         13   12   14            39
Students         21   10   18            49
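The collapsing step can be sketched as follows. The grouping used here (merge the first two rating columns, keep the middle one, merge the last two) is inferred from the numbers in the two tables; the names `collapse` and `groups` are made up for this example.

```python
# Original 5-point ratings for each group (row totals: 27, 39, 49).
original = {
    "Administration": [2, 8, 9, 7, 1],
    "Teachers": [3, 10, 12, 10, 4],
    "Students": [5, 16, 10, 15, 3],
}

# Column indexes to merge: first two, middle, last two.
groups = [(0, 1), (2,), (3, 4)]

def collapse(row, groups):
    """Sum the frequencies of the columns listed in each group."""
    return [sum(row[i] for i in group) for group in groups]

collapsed = {name: collapse(row, groups) for name, row in original.items()}

for name, row in collapsed.items():
    print(f"{name:<15}{row}  total = {sum(row)}")
```

Collapsing leaves every row total unchanged, so only the number of categories (and therefore the degrees of freedom) is reduced.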