L24 - Example of Chi-Square Goodness-of-Fit Test, Chi-Square Cross Table, Association


Examples of Chi-Square Goodness-of-Fit Test, Cross Tabulation and Association
Chi-Square Test
Chi-Square Test: Know about the Inferential Statistical test
that Operates on Categorical Variables

• Chi-Square goodness-of-fit test – This test is performed for one categorical variable and
begins by hypothesizing that the variable follows a specific distribution. For example, to
identify the daily staffing needs of a store, the manager might wish to know whether the
number of customers is the same on each day of the week.
• Chi-Square test for independence – This test is conducted for two categorical variables. It
compares the variables in a contingency table and investigates whether they are related. If
the observed data do not fit the model of independence, the evidence that the variables are
dependent strengthens and the null hypothesis of independence is rejected. Rejection is
evidence against the null hypothesis, not proof that it is false.
1. Case
• A buyer for a t-shirt shop wants to compare the proportion of t-shirts of each size that
are sold to the proportion that were ordered. The buyer counts the number of t-shirts of
each size that are sold in a week.
The buyer performs a chi-square goodness-of-fit test to determine whether the proportions
of t-shirt sizes sold are consistent with the proportions of t-shirt sizes ordered.
Chi-Square Goodness-of-Fit Test
1. Open the sample data, TshirtSales.MTW.
2. Choose Stat > Tables > Chi-Square Goodness-of-Fit Test (One Variable).
3. In Observed counts, enter Counts.
4. In Category names (optional), enter Size.
5. Under Test, select Specific proportions, and enter Proportions.
6. Click OK.

Interpret the results (1)

• In these results, the observed count for each t-shirt size is not very different from the
expected count. The breakdown by size is as follows:
• 25 small shirts were sold, while 22.5 were expected to be sold.
• 41 medium shirts were sold, while 45 were expected to be sold.
• 91 large shirts were sold, while 90 were expected to be sold.
• 68 extra-large shirts were sold, while 67.5 were expected to be sold.

Interpret the results (2)

• The largest difference between observed and expected sales is in the medium category.
Consequently, this category has the largest contribution to the chi-square statistic, 0.355.
• The overall chi-square statistic is 0.648 and has a p-value of 0.885. Because the p-value
is greater than the significance level of 0.05, the buyer fails to reject the null
hypothesis. The buyer concludes that there is not a significant difference between the
observed t-shirt sales and the expected t-shirt sales.
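
The arithmetic behind these results can be retraced with a short sketch. This is not the Minitab procedure itself; it is a minimal illustration in Python that assumes only the observed and expected counts quoted above. The variable names are made up for the example, and the p-value line uses a closed form that is valid only for 3 degrees of freedom.

```python
import math

sizes = ["Small", "Medium", "Large", "Extra-large"]
observed = [25, 41, 91, 68]
expected = [22.5, 45.0, 90.0, 67.5]  # expected counts quoted in the results

# Contribution of each category: (observed - expected)^2 / expected
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)
df = len(observed) - 1  # 4 categories give 3 degrees of freedom

# Upper-tail probability of the chi-square distribution.
# Assumption: this closed form holds for df = 3 only.
half = chi_square / 2
p_value = math.erfc(math.sqrt(half)) + math.sqrt(2 * chi_square / math.pi) * math.exp(-half)

for size, c in zip(sizes, contributions):
    print(f"{size}: contribution = {c:.4f}")
print(f"chi-square = {chi_square:.3f}, df = {df}, p-value = {p_value:.3f}")
```

Each contribution depends on the squared difference relative to the expected count, so a category with a modest raw difference can still dominate when its expected count is small.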
Minitab Result
Graphical Interpretation
2. Chi-squared test for nominal (categorical) data

2 Variables
• Note: in the case of 2 variables being compared, the test can also be interpreted as
determining if there is an association (or relationship) between the two variables.
• Case: The maternity wards of two hospitals had different preparation-for-childbirth
schemes. A study of mothers who had participated in the schemes asked them to assess their
satisfaction with the scheme, with the following results:
(Observed counts)          Hospital
                           A      B      Total
Very satisfied             38     72     110
Satisfied                  33     57     90
Neutral                    42     38     80
Dissatisfied a little      26     44     70
Dissatisfied a lot         11     29     40
Total                      150    240    390
Hypothesis: 2 variables of Categorical data
• To answer the question 'is there any evidence of a difference in the
satisfaction of the mothers between the two schemes at the two
hospitals?', the chi-square test is used.
• Suitable null and alternative hypotheses might be:
• H0: There is no difference in satisfaction of the mothers between the
two schemes, and
• H1: There is a difference in satisfaction of the mothers between the
two schemes.
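
As a rough illustration of how the test statistic for this case could be computed by hand, the sketch below derives the expected counts under H0 (row total × column total / grand total) and sums the chi-square contributions. It assumes only the observed cell counts from the table; the row, column and grand totals are recomputed from those cells, the variable names are invented for the example, and the p-value line uses a closed form that is valid only for 4 degrees of freedom.

```python
import math

levels = ["Very satisfied", "Satisfied", "Neutral",
          "Dissatisfied a little", "Dissatisfied a lot"]
observed = [(38, 72), (33, 57), (42, 38), (26, 44), (11, 29)]  # (Hospital A, Hospital B)

# Totals are recomputed from the cells rather than copied from the table
row_totals = [a + b for a, b in observed]
col_totals = [sum(row[j] for row in observed) for j in range(2)]
grand_total = sum(row_totals)

chi_square = 0.0
for level, row, r in zip(levels, observed, row_totals):
    # Expected count under H0: row total * column total / grand total
    expected = [r * c / grand_total for c in col_totals]
    chi_square += sum((o - e) ** 2 / e for o, e in zip(row, expected))
    print(f"{level}: expected A = {expected[0]:.2f}, expected B = {expected[1]:.2f}")

df = (len(observed) - 1) * (len(col_totals) - 1)  # (5 - 1) * (2 - 1) = 4

# Upper-tail probability of the chi-square distribution.
# Assumption: this closed form holds for df = 4 only.
p_value = math.exp(-chi_square / 2) * (1 + chi_square / 2)
print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.3f}")
```

The p-value is then compared with the chosen significance level (0.05 in the earlier example) to decide whether to reject H0.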
Cross Tabulation and Chi-Square
3. Statistical Association between Categorical Variables Like Gender and Political Affiliation (Raw Data)
• Let's go back into the Stat > Tables > Cross
Tabulation and Chi-Square dialog. This time,
click on the Chi-Square... button. Check the
Minitab options for Chi-Square analysis and Expected
cell counts, then press OK, and OK again to run
the analysis. Minitab gives the following
output:
Minitab
4. Example of Cross Tabulation and Chi-Square
Defective Umbrella Handles
Interpretation (1)

• The engineer uses the Pearson test and the Likelihood-Ratio test to determine whether an
association between machine and shift exists.
• Because the p-values for the Pearson test and the Likelihood-Ratio test are less than 0.05,
the engineer rejects the null hypothesis and concludes that there is an association
between the variables of machine and shift.
Interpretation (2)

• In these results, there were a total of 408 defective umbrella handles. 143 defective
handles were made by Machine 1, 155 defective handles were made by Machine 2, and
110 defective handles were made by Machine 3.
• Also, more of the defective handles were made during the first shift than during the other
shifts. A total of 160 defective handles were made on the first shift, 134 defective handles
were made on the second shift, and 114 defective handles were made on the third shift.
Interpretation (3)

In each cell, Minitab displays the actual count, the expected count, and
the standardized residual, which indicates the magnitude and direction of
the difference between the actual and expected counts.

For instance, from Machine 3, during the 3rd shift, 34 defective handles
were made, and 30.74 were expected. The small positive residual
indicates that the actual and expected counts are fairly close.

From Machine 2, however, during the 3rd shift, 32 defective handles were
made, and 43.31 were expected. The larger negative residual indicates
that fewer defective handles were produced than expected.
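
The expected counts quoted above can be checked from the machine and shift totals alone. The sketch below is a small illustration, not Minitab output. It assumes the standardized residual is computed as (observed - expected) / sqrt(expected), and it uses only the two cell counts quoted in the text, since the full table is not reproduced here. The function and variable names are invented for the example.

```python
import math

total = 408
machine_totals = {1: 143, 2: 155, 3: 110}  # defective handles per machine
shift_totals = {1: 160, 2: 134, 3: 114}    # defective handles per shift

def expected_count(machine, shift):
    """Expected cell count under independence: row total * column total / grand total."""
    return machine_totals[machine] * shift_totals[shift] / total

def standardized_residual(observed, expected):
    """Assumed definition: raw residual divided by the square root of the expected count."""
    return (observed - expected) / math.sqrt(expected)

# Only the two observed cell counts quoted in the text: (machine, shift) -> count
cells = {(3, 3): 34, (2, 3): 32}
for (machine, shift), obs in cells.items():
    e_count = expected_count(machine, shift)
    resid = standardized_residual(obs, e_count)
    print(f"Machine {machine}, shift {shift}: observed = {obs}, "
          f"expected = {e_count:.2f}, standardized residual = {resid:.2f}")
```

The sign of the residual shows the direction of the difference (positive when more defects occurred than expected), and its size shows how far the cell departs from independence.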
Criteria to run the regression model
Assumption #1:

Your dependent variable should be measured at the continuous level (i.e., it is an interval or ratio variable).

Examples of such continuous variables include height (measured in feet and inches), temperature (measured in °C),
salary (measured in US dollars), revision time (measured in hours), intelligence (measured using IQ score), firm size
(measured in terms of the number of employees), age (measured in years), reaction time (measured in milliseconds),
grip strength (measured in kg), power output (measured in watts), test performance (measured from 0 to 100), sales
(measured in number of transactions per month), academic achievement (measured in terms of GMAT score), and so
forth.
Assumption #2:

Your independent variable should be measured at the continuous or categorical level.

In case you are unsure, examples of categorical variables include gender (e.g., two groups: male and female), ethnicity
(e.g., three groups: Caucasian, African American and Hispanic), physical activity level (e.g., four groups: sedentary, low,
moderate and high), and profession (e.g., five groups: surgeon, doctor, nurse, dentist, therapist). In this guide, we show
you the linear regression procedure and Minitab output when both your dependent and independent variables are
measured at the continuous level.
