
ALLAMA IQBAL OPEN UNIVERSITY, ISLAMABAD

ASSIGNMENT NO: 2

NAME: SABA NAZEER

SEMESTER: 3RD

USER ID: 0000351

PROGRAM: B.ED. (1.5)

COURSE CODE: 8614

Q.1 Explain three major measures of central tendency. Also explain the
procedure to calculate them.

Central Tendency
In statistics, central tendency is a descriptive summary of a data set: a single value that reflects the centre of the data distribution. It does not provide information about individual observations in the dataset; rather, it gives a summary of the dataset as a whole. The central tendency of a dataset is generally described using one of a small set of statistical measures.

Definition
The central tendency is stated as the statistical measure that represents the single
value of the entire distribution or a dataset. It aims to provide an accurate
description of the entire data in the distribution.

Measures of Central Tendency


The central tendency of the dataset can be found out using the three important
measures namely mean, median and mode.
Mean
The mean represents the average value of the dataset. It can be calculated as the
sum of all the values in the dataset divided by the number of values. In general, it
is considered as the arithmetic mean. Some other measures of mean used to find
the central tendency are as follows:

• Geometric Mean

• Harmonic Mean

• Weighted Mean

It is observed that if all the values in the dataset are the same, then the geometric, arithmetic and harmonic means are all equal. If there is variability in the data, then the mean values differ. Calculating the arithmetic mean is straightforward. The formula to calculate the mean value is:

Mean (x̄) = Σx / n = (sum of all values) / (number of values)
In symmetric data distribution, the mean value is located accurately at the centre.
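To make the procedure concrete, here is a minimal Python sketch using a made-up list of values (not part of any dataset in this assignment):

```python
from statistics import geometric_mean, harmonic_mean

scores = [10, 20, 30, 40, 50]  # hypothetical dataset

# Arithmetic mean: sum of all values divided by the number of values.
arithmetic = sum(scores) / len(scores)

print(arithmetic)              # 30.0
print(geometric_mean(scores))  # about 26.05
print(harmonic_mean(scores))   # about 21.90
```

For identical values all three means coincide; with variability they differ, as the output shows (arithmetic > geometric > harmonic).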

Median
Median is the middle value of the dataset in which the dataset is arranged in the
ascending order or in descending order. When the dataset contains an even number of
values, then the median value of the dataset can be found by taking the mean of the
middle two values.

Consider the given dataset with the odd number of observations arranged in descending
order – 23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, and 2

Here 12 is the middle or median number that has 6 values above it and 6 values below
it.
Now, consider another example with an even number of observations that are arranged
in descending order – 40, 38, 35, 33, 32, 30, 29, 27, 26, 24, 23, 22, 19, and
17

When you look at the given dataset, the two middle values obtained are 27 and 29.

Now, find out the mean value for these two numbers.

i.e.,(27+29)/2 =28

Therefore, the median for the given data distribution is 28.
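The median procedure for both the odd and even datasets above can be sketched in Python:

```python
def median(values):
    # Sort the data, then take the middle value (odd n)
    # or the mean of the two middle values (even n).
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

odd_data = [23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, 2]
even_data = [40, 38, 35, 33, 32, 30, 29, 27, 26, 24, 23, 22, 19, 17]
print(median(odd_data))   # 12
print(median(even_data))  # 28.0
```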

Mode

The mode represents the frequently occurring value in the dataset. Sometimes the
dataset may contain multiple modes and in some cases, it does not contain any
mode at all.

Consider the given dataset 5, 4, 2, 3, 2, 1, 5, 4, 5

Since the mode is the most common value, the mode of this dataset is 5, which occurs three times.
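Counting frequencies with Python's collections.Counter confirms this:

```python
from collections import Counter

data = [5, 4, 2, 3, 2, 1, 5, 4, 5]
counts = Counter(data)                 # frequency of each value
mode, freq = counts.most_common(1)[0]  # most frequent value and its count
print(mode, freq)  # 5 3
```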

The measure of central tendency is selected based on the properties of the data.

• If you have a symmetrical distribution of continuous data, all three measures of central tendency apply. Most of the time, however, analysts use the mean because it takes every value in the distribution or dataset into account.

• If you have a skewed distribution, the best measure of central tendency is the median.

Measures of Central Tendency and Dispersion

The central tendency measure is defined as the number used to represent the center or middle of a set of data values. The three commonly used measures of central tendency are the mean, median, and mode.

Definition
A measure of central tendency (also referred to as measures of centre or central
location) is a summary measure that attempts to describe a whole set of data with a
single value that represents the middle or centre of its distribution.

There are three main measures of central tendency:

• mode

• median

• mean

Each of these measures describes a different indication of the typical or central value in the distribution.
Mode

The mode is the most commonly occurring value in a distribution.

Consider this dataset showing the retirement age of 11 people, in whole years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

This table shows a simple frequency distribution of the retirement age data.

Age (years) Frequency

54 3

55 1

56 1

57 2

58 2

60 2

The most commonly occurring value is 54, therefore the mode of this distribution is 54 years.

Advantage of the mode


The mode has an advantage over the median and the mean as it can be found for both
numerical and categorical (non-numerical) data.

Limitations of the mode


There are some limitations to using the mode. In some distributions, the mode may not reflect the centre of the distribution very well. When the distribution of retirement age is ordered from lowest to highest value, it is easy to see that the centre of the distribution is 57 years, but the mode is lower, at 54 years.

It is also possible for there to be more than one mode for the same distribution of data (bi-modal, or multi-modal).

Median
The median is the middle value in distribution when the values are arranged in
ascending or descending order.

The median divides the distribution in half (there are 50% of observations on either side
of the median value). In a distribution with an odd number of observations, the median
value is the middle value.

Advantage of the median


The median is less affected by outliers and skewed data than the mean and is usually the
preferred measure of central tendency when the distribution is not symmetrical.

Limitation of the median


The median cannot be identified for categorical nominal data, as it cannot be logically
ordered.

Mean
The mean is the sum of the value of each observation in a dataset divided by the number
of observations. This is also known as the arithmetic average.

Looking at the retirement age distribution again:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

The mean is the sum of these 11 values divided by 11: 623 / 11 ≈ 56.6 years.

Advantage of the mean

The mean can be used for both continuous and discrete numeric data.
Limitations of the mean

The mean cannot be calculated for categorical data, as the values cannot be summed.

As the mean includes every value in the distribution, it is influenced by outliers and skewed distributions.
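A short Python sketch with the retirement-age data above (plus one hypothetical outlier of 92) shows how an outlier pulls the mean but barely moves the median:

```python
from statistics import mean, median

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]  # retirement ages above
print(round(mean(ages), 1), median(ages))  # 56.6 57

# Add one hypothetical outlier: the mean shifts noticeably, the median stays put.
ages_with_outlier = ages + [92]
print(round(mean(ages_with_outlier), 1), median(ages_with_outlier))  # 59.6 57.0
```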

Another thing about the mean

The population mean is indicated by the Greek symbol µ (mu). When the mean is calculated from a sample, it is indicated by the symbol x̄ (pronounced X-bar).

Q.2 What do you mean by inferential statistics? How is it important in educational research?

Inferential Statistics | An Easy Introduction & Examples

Published on September 4, 2020 by Pritha Bhandari. Revised on June 22, 2023.

While descriptive statistics summarize the characteristics of a data set, inferential


statistics help you come to conclusions and make predictions based on your data.

Inferential statistics have two main uses:

• making estimates about populations (for example, the mean SAT score of all 11th
graders in the US).
• testing hypotheses to draw conclusions about populations (for example, the
relationship between SAT scores and family income).

Descriptive versus inferential statistics
Descriptive statistics allow you to describe a data set, while inferential statistics allow you to
make inferences based on a data set.

Descriptive statistics

Using descriptive statistics, you can report characteristics of your data:

• The distribution concerns the frequency of each value.


• The central tendency concerns the averages of the values.

• The variability concerns how spread out the values are.

In descriptive statistics, there is no uncertainty – the statistics precisely describe the data
that you collected. If you collect data from an entire population, you can directly compare
these descriptive statistics to those from other populations.

Example: Descriptive statistics. You collect data on the SAT scores of all 11th graders in a school for three years.

Inferential statistics
Most of the time, you can only acquire data from samples, because it is too difficult or expensive to collect data from the whole population you're interested in.

Example: Inferential statistics. You randomly select a sample of 11th graders in your state and collect data on their SAT scores and other characteristics.

You can use inferential statistics to make estimates and test hypotheses about the whole
population of 11th graders in the state based on your sample data.
Sampling error in inferential statistics
Since the size of a sample is always smaller than the size of the population, some of the population is not captured by sample data. This creates sampling error, which is the difference between the true population values (called parameters) and the measured sample values (called statistics).


Estimating population parameters from sample statistics
The characteristics of samples and populations are described by numbers called statistics
and parameters:

• A statistic is a measure that describes the sample (e.g., sample mean).


• A parameter is a measure that describes the whole population (e.g., population
mean).

Sampling error is the difference between a parameter and a corresponding statistic.


Since parameters are not measured directly, you can use inferential statistics to estimate these parameters in a way that takes sampling error into account.

There are two important types of estimates you can make about the population: point
estimates and interval estimates.

• A point estimate is a single value estimate of a parameter. For instance, a sample


mean is a point estimate of a population mean.
• An interval estimate gives you a range of values where the parameter is expected
to lie. A confidence interval is the most common type of interval estimate.
Both types of estimates are important for gathering a clear idea of where a parameter is
likely to lie.

Confidence intervals

A confidence interval uses the variability around a statistic to come up with an


interval estimate for a parameter. Confidence intervals are useful for estimating
parameters because they take sampling error into account.

While a point estimate gives you a precise value for the parameter you are interested in,
a confidence interval tells you the uncertainty of the point estimate.
They are best used in combination with each other.
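As a sketch of the two kinds of estimate, the following Python snippet computes a point estimate and an approximate 95% confidence interval from a hypothetical sample (the scores and the z value of 1.96 are illustrative assumptions, not part of the source):

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical sample of SAT scores (illustration only).
sample = [1100, 1180, 1250, 1050, 1300, 1210, 1140, 1090, 1230, 1170]
n = len(sample)

point_estimate = mean(sample)            # single best guess for the population mean
margin = 1.96 * stdev(sample) / sqrt(n)  # approximate 95% margin of error (z = 1.96)
interval = (point_estimate - margin, point_estimate + margin)
print(point_estimate, interval)
```

The point estimate gives one precise value; the interval expresses the uncertainty around it.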

Hypothesis testing

Hypothesis testing is a formal process of statistical analysis using inferential


statistics. The goal of hypothesis testing is to compare populations or assess
relationships between variables using samples.

Hypotheses, or predictions, are tested using statistical tests. Statistical tests also

estimate sampling errors so that valid inferences can be made. Parametric tests

make assumptions that include the following:

• the population that the sample comes from follows a normal distribution of scores
• the sample size is large enough to represent the population
• the variances, a measure of variability, of each group being compared are similar
Comparison tests

Comparison tests assess whether there are differences in means, medians or


rankings of scores of two or more groups.
Comparison test Parametric? What’s being compared? Samples
t test Yes Means 2 samples

ANOVA Yes Means 3+ samples

Mood’s median No Medians 2+ samples

Wilcoxon signed-rank No Distributions 2 samples

Wilcoxon rank-sum (Mann-Whitney U) No Sums of rankings 2 samples

Kruskal-Wallis H No Mean rankings 3+ samples


Means can only be found for interval or ratio data, while medians and rankings are more
appropriate measures for ordinal data.
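To illustrate the first row of the table, a pooled two-sample t statistic can be computed by hand; the groups here are hypothetical, and a real analysis would compare the statistic to a t distribution to obtain a p-value:

```python
from statistics import mean, variance
from math import sqrt

def pooled_t(a, b):
    # Two-sample t statistic with a pooled variance estimate
    # (assumes roughly equal variances and normal populations).
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))

group_a = [1, 2, 3, 4, 5]    # hypothetical scores, group A
group_b = [2, 4, 6, 8, 10]   # hypothetical scores, group B
print(round(pooled_t(group_a, group_b), 3))  # -1.897
```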

Correlation tests

Correlation tests determine the extent to which two variables are associated.

Although Pearson's r is the most statistically powerful test, Spearman's r is appropriate for interval and ratio variables when the data doesn't follow a normal distribution.

The chi square test of independence is the only test that can be used with nominal
variables.

Correlation test Parametric? Variables

Pearson's r Yes Interval/ratio variables

Spearman's r No Ordinal/interval/ratio variables

Chi square test of independence No Nominal/ordinal variables
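Pearson's r can be computed directly from its definition; the hours and scores below are hypothetical:

```python
from math import sqrt

def pearson_r(x, y):
    # r = covariance(x, y) / (std(x) * std(y)), computed from deviation sums.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

hours = [1, 2, 3, 4, 5]        # hypothetical study hours
scores = [52, 55, 61, 64, 68]  # hypothetical test scores
print(round(pearson_r(hours, scores), 3))  # 0.994
```

A value near +1 indicates a strong positive linear relationship.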

Regression tests
Regression tests demonstrate whether changes in predictor variables cause
changes in an outcome variable. You can decide which regression test to use based
on the number and types of variables you have as predictors and outcomes.

Data transformations help you make your data normally distributed using
mathematical operations, like taking the square root of each value.

Regression test Predictor Outcome

Simple linear regression 1 interval/ratio variable 1 interval/ratio variable

Multiple linear regression 2+ interval/ratio variable(s) 1 interval/ratio variable

Logistic regression 1+ any variable(s) 1 binary variable

Nominal regression 1+ any variable(s) 1 nominal variable

Ordinal regression 1+ any variable(s) 1 ordinal variable


Inferential Statistics
Inferential statistics is a branch of statistics that makes use of various analytical tools to draw inferences about population data from sample data. Apart from inferential statistics, descriptive statistics forms the other branch of statistics.

What is Inferential Statistics?


Inferential statistics helps to develop a good understanding of the population data
by analyzing the samples obtained from it. It helps in making generalizations about
the population by using various analytical tests and tools.
Q.3 When and where do we use correlation and regression in research?
Correlation is the relationship or association between two variables. There are multiple
ways to measure correlation, but the most common is Pearson's correlation coefficient (r),
which tells you the strength of the linear relationship between two variables.

Spurious Relationships
It's important to remember that correlation does not always indicate causation. Two
variables can be correlated without either variable causing the other. For instance, ice
cream sales and drownings might be correlated, but that doesn't mean that ice cream
causes drownings—instead, both ice cream sales and drownings increase when the
weather is hot. Relationships like this are called spurious correlations.
Positive correlation: The variables change in the same direction. As height increases, weight also increases.

Negative correlation: The variables change in opposite directions. As coffee consumption increases, tiredness decreases.

Zero correlation: There is no relationship between the variables. Coffee consumption is not correlated with height.


When to use correlational research


Correlational research is ideal for gathering data quickly from natural settings. That
helps you generalize your findings to real-life situations in an externally valid way.

There are a few situations where correlational research is an appropriate choice.

To investigate non-causal relationships


You want to find out if there is an association between two variables, but you don't expect to find a causal relationship between them.

Correlational research can provide insights into complex real-world relationships,


helping researchers develop theories and make predictions.

To explore causal relationships between variables


You think there is a causal relationship between two variables, but it is impractical,
unethical, or too costly to conduct experimental research that manipulates one of
the variables.

To test new measurement tools


You have developed a new instrument for measuring your variable, and you need to
test its reliability or validity.
Correlational research can be used to assess whether a tool consistently or accurately captures the concept it aims to measure.

How to collect correlational data

There are many different methods you can use in correlational research. In the
social and behavioral sciences, the most common data collection methods for this
type of research include surveys, observations, and secondary data.

Surveys
In survey research, you can use questionnaires to measure your variables of
interest.
You can conduct surveys online, by mail, by phone, or in person.

Surveys are a quick, flexible way to collect standardized data from many participants, but it's important to ensure that your questions are worded in an unbiased way and capture relevant insights.

Naturalistic observation
Naturalistic observation is a type of field research where you gather data about a
behavior or phenomenon in its natural environment.

This method often involves recording, counting, describing, and categorizing


actions and events. Naturalistic observation can include both qualitative and
quantitative elements, but to assess correlation, you collect data that can be
analyzed quantitatively (e.g., frequencies, durations, scales, and amounts).

Secondary data
Instead of collecting original data, you can also use data that has already been
collected for a different purpose, such as official records, polls, or previous studies.
Using secondary data is inexpensive and fast, because data collection is complete.
However, the data may be unreliable, incomplete or not entirely relevant, and you
have no control over the reliability or validity of the data collection procedures.

How to analyze correlational data


After collecting data, you can statistically analyze the relationship between
variables using correlation or regression analyses, or both. You can also visualize
the relationships between variables with a scatterplot.

Correlation analysis
Using a correlation analysis, you can summarize the relationship between variables into a correlation coefficient: a single number that describes the strength and direction of the relationship between variables. With this number, you can quantify the degree of the relationship between variables.

Regression analysis
With a regression analysis, you can predict how much a change in one variable will
be associated with a change in the other variable. The result is a regression
equation that describes the line on a graph of your variables.
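A minimal sketch of fitting such a regression equation by least squares (hypothetical data; a real analysis would also report fit statistics such as R²):

```python
def linear_regression(x, y):
    # Least-squares fit: slope = cov(x, y) / var(x); intercept = my - slope * mx.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

x = [1, 2, 3, 4, 5]       # hypothetical predictor
y = [52, 55, 61, 64, 68]  # hypothetical outcome
slope, intercept = linear_regression(x, y)
print(round(slope, 2), round(intercept, 2))  # 4.1 47.7
```

The resulting equation, y = 4.1x + 47.7, describes the line through these points and predicts how the outcome changes per unit of the predictor.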


Correlation and causation


It's important to remember that correlation does not imply causation. Just because you find a correlation between two things doesn't mean you can conclude that one of them causes the other, for a few reasons.

Directionality problem
If two variables are correlated, it could be because one of them is a cause and the other is an effect. But the correlational research design doesn't allow you to infer which is which. To err on the side of caution, researchers don't conclude causality from correlational studies.

Third variable problem


A confounding variable is a third variable that influences other variables to make
them seem causally related even though they are not. Instead, there are separate
causal links between the confounder and each variable.

In practice, it's impossible to account for all possible extraneous variables. Even if you statistically control for some potential confounders, there may still be other hidden variables that disguise the relationship between your study variables.

Example: You find a strong positive correlation between working hours and work-related stress: people with lower working hours report lower levels of work-related stress. However, this correlation alone doesn't prove that lower working hours cause lower stress.

Q.4 How is the F distribution helpful in making conclusions in educational research? Briefly discuss the interpretation of the F distribution.

Why do we need F-distribution?


The F-distribution is useful in hypothesis testing. Hypothesis testing is used by
scientists to statistically compare data from two or more populations. The
F-distribution is needed to determine whether the F-value for a study indicates any
statistically significant differences between two populations.

What is F-distribution and what is an example of it?


The F-distribution contains all of the possible values for a test statistic. It is determined by its degrees of freedom and is skewed to the right; all of its values are greater than zero.
What does an F-test tell you?
An F-test is a statistical method for comparing the variances of two populations.
This can be used to determine whether statistically significant differences occur
between two populations.

What is an example of an F-test?


The F-test can be used in a variety of experimental settings. For example, if a
scientist wants to determine whether statistically significant weight loss exists
between two groups based on the amount of time spent exercising, the F-test could
be used.

What is a two sample F-test?


The two sample F-test is used when comparing the variances of two populations.
This allows the researcher to determine whether there are statistically significant
differences between the two populations.
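A sketch of the two-sample F statistic as a ratio of sample variances (the weight-loss figures below are invented for illustration; a real test would compare F to a critical value from the F-distribution):

```python
from statistics import variance

# Hypothetical weight-loss data (kg) for two exercise groups.
group_1 = [3.1, 4.2, 2.8, 5.0, 3.9]
group_2 = [2.0, 2.4, 2.1, 2.6, 2.2]

# F is the ratio of the two sample variances (larger over smaller by convention).
v1, v2 = variance(group_1), variance(group_2)
f_stat = max(v1, v2) / min(v1, v2)
print(round(f_stat, 2))  # 13.36
```

A large F value suggests the two groups' variances differ; an F near 1 suggests they are similar.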

7 types of statistical distributions with practical examples

Data Science Dojo Staff

Statistical distributions help us understand a problem better by assigning a range


of possible values to the variables, making them very useful in data science and
machine learning. Here are 7 types of distributions with intuitive examples that
often occur in real-life data.

Whether you're betting on a sports team to win an away match, framing a policy for an insurance company, or simply trying your luck at blackjack in the casino, probability and distributions come into action in all aspects of life to determine the likelihood of events.

Having a sound statistical background can be incredibly beneficial in the daily life of a data scientist. Probability is one of the main building blocks of data science and machine learning. While the concept of probability gives us mathematical calculations, statistical distributions help us visualize what's happening underneath.


Having a good grip on statistical distribution makes exploring a new dataset and
finding patterns within a lot easier. It helps us choose the appropriate machine
learning model to fit our data on and speeds up the overall process.


In this blog, we will be going over diverse types of data, the common distributions
for each of them, and compelling examples of where they are applied in real life.


Common types of data


Explaining various distributions becomes more manageable if we are familiar with
the type of data they use. We encounter two different outcomes in day-to-day
experiments: finite and infinite outcomes.

Difference between Discrete and Continuous Data

When you roll a die or pick a card from a deck, you have a limited number of
outcomes possible. This type of data is called Discrete Data, which can only take a
specified number of values. For example, in rolling a die, the specified values are
1, 2, 3, 4, 5, and 6.
Types of statistical distributions
Depending on the type of data we use, we have grouped distributions into two
categories, discrete distributions for discrete data (finite outcomes) and continuous
distributions for continuous data (infinite outcomes).

Discrete distributions

Discrete uniform distribution: All outcomes are equally likely


In statistics, uniform distribution refers to a statistical distribution in which all
outcomes are equally likely. Consider rolling a six-sided die. You have an equal
probability of obtaining all six numbers on your next roll, i.e., obtaining precisely
one of 1, 2, 3, 4, 5, or 6, equaling a probability of 1/6, hence an example of a
discrete uniform distribution.
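The die example can be written out explicitly, with each face assigned the same probability:

```python
from fractions import Fraction

# A fair six-sided die: each face has the same probability, 1/6.
faces = [1, 2, 3, 4, 5, 6]
p = {face: Fraction(1, len(faces)) for face in faces}
print(p[3])             # 1/6
print(sum(p.values()))  # 1 -- probabilities sum to one
```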

Fair Dice Uniform Distribution Graph


Uniform distribution is represented by the function U(a, b), where a and b represent

the starting and ending values, respectively. Similar to a discrete uniform

distribution, there is a continuous uniform distribution for continuous variables.

Bernoulli Distribution: Single-trial with two possible outcomes


The Bernoulli distribution is one of the easiest distributions to understand. It can be
used as a starting point to derive more complex distributions. Any event with a
single trial and only two outcomes follows a Bernoulli distribution.

Binomial Distribution: A sequence of Bernoulli events


The Binomial Distribution can be thought of as the sum of outcomes of an event
following a Bernoulli distribution. Therefore, Binomial Distribution is used in
binary outcome events, and the probability of success and failure is the same in all
successive trials. An example of a binomial event would be flipping a coin multiple
times to count the number of heads and tails.

Binomial vs Bernoulli distribution.

The difference between these distributions can be explained through an example.


Consider that you're attempting a quiz of 10 True/False questions. Attempting a single T/F question would be considered a Bernoulli trial, whereas attempting the entire quiz of 10 T/F questions would be categorized as a Binomial trial. The main characteristics of Binomial Distribution are:

• Given multiple trials, each of them is independent of the others; that is, the outcome of one trial doesn't affect the outcome of another one.

• Each trial can lead to just two possible results (e.g., winning or losing), with
probabilities p and (1 – p).

A binomial distribution is represented by B(n, p), where n is the number of trials and p is the probability of success in a single trial. A Bernoulli distribution can be shaped as a binomial trial as B(1, p) since it has only one trial. The expected value of a binomial variable x is E(x) = np. Similarly, the variance is represented as Var(x) = np(1 − p).
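These formulas can be checked with a small sketch (10 hypothetical fair coin flips):

```python
from math import comb

def binomial_pmf(k, n, p):
    # P(X = k) for B(n, p): C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5                 # 10 fair coin flips
print(binomial_pmf(5, n, p))   # probability of exactly 5 heads: 0.24609375
print(n * p)                   # E(x) = np = 5.0
print(n * p * (1 - p))         # Var(x) = np(1 - p) = 2.5
```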

Binomial Distribution Graph


Poisson Distribution: The probability that an event may or may
not occur
Poisson distribution deals with the frequency with which an event occurs within a
specific interval. Instead of the probability of an event, Poisson distribution
requires knowing how often it happens in a particular period or distance. For
example, a cricket chirps two times in 7 seconds on average.

The main characteristics which describe the Poisson Processes are:

• The events are independent of each other.

• An event can occur any number of times (within the defined period).
• Two events can't take place simultaneously.

Poisson Distribution Graph

The graph of Poisson distribution plots the number of instances an event occurs in
the standard interval of time and the probability of each one.
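Using the cricket example above (an average of two chirps per interval), the Poisson probabilities plotted in such a graph can be sketched as:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # P(X = k) = lam^k * e^(-lam) / k!
    return lam**k * exp(-lam) / factorial(k)

lam = 2  # average of two chirps per 7-second interval, from the example above
for k in range(5):
    print(k, round(poisson_pmf(k, lam), 4))
```

The output shows that 1 and 2 chirps are the most likely counts, with probabilities tailing off for larger k.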

Continuous distributions

Normal Distribution: Symmetric distribution of values around the mean

Normal distribution is the most used distribution in data science. In a normal


distribution graph, data is symmetrically distributed with no skew.

Normal Distribution Bell Curve Graph

Here, you can witness the bell-shaped curve around the central value, indicating that most data points exist there. The normal distribution is written as N(µ, σ2), where µ represents the mean and σ2 the variance. The expected value of a normal distribution is equal to its mean. The curve is symmetric about the center, so the mean, mode, and median are all equal, distributing the values symmetrically around the mean.
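A small sketch with Python's statistics.NormalDist (hypothetical mean 100 and standard deviation 15) illustrates the concentration of values around the mean:

```python
from statistics import NormalDist

# N(mu, sigma^2) with hypothetical mean 100 and standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# About 68% of values fall within one standard deviation of the mean.
within_one_sd = dist.cdf(115) - dist.cdf(85)
print(round(within_one_sd, 3))  # 0.683
```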

Conclusion
Data is an essential component of the data exploration and model development
process. The first thing that springs to mind when working with continuous
variables is looking at the data distribution. We can adjust our Machine Learning
models to best match the problem if we can identify the pattern in the data
distribution, which reduces the time to get to an accurate outcome.

Q.5 Discuss, in detail, Chi-square as a test of independence and as a goodness-of-fit test.

Chi-Square (Χ²) Tests | Types, Formula & Examples

Published on May 23, 2022 by Shaun Turney. Revised on June 22, 2023.

A Pearson's chi-square test is a statistical test for categorical data. It is used to determine whether your data are significantly different from what you expected. There are two types of Pearson's chi-square tests:

• The chi-square goodness of fit test is used to test whether the frequency
distribution of a categorical variable is different from your expectations.

• The chi-square test of independence is used to test whether two categorical


variables are related to each other.

Chi-square is often written as Χ² and is pronounced "kai-square" (rhymes with "eye-square"). It is also sometimes called chi-squared.
What is a chi-square test?

Pearson's chi-square (Χ²) tests, often referred to simply as chi-square tests, are among the most common nonparametric tests.

Note: Pearson's chi-square tests are used only with categorical variables, but other tests can involve a categorical variable as an independent variable (e.g., ANOVAs).

Test hypotheses about frequency distributions

There are two types of Pearson's chi-square tests, but they both test whether the observed frequency distribution of a categorical variable is significantly different from its expected frequency distribution. A frequency distribution describes how observations are distributed between different groups.

Frequency of visits by bird species at a bird feeder during a 24-hour period

Bird species Frequency

House sparrow 15

House finch 12

Black-capped chickadee 9

Common grackle 8

European starling 8

Mourning dove 6

Example: Bird species at a bird feeder


A chi-square test (a chi-square goodness of fit test) can test whether these observed
frequencies are significantly different from what was expected, such as equal
frequencies.

Example: Handedness and nationality

Contingency table of the handedness of a sample of Americans and Canadians


Right-handed Left-handed

American 236 19

Canadian 157 16

A chi-square test (a test of independence) can test whether these observed


frequencies are significantly different from the frequencies expected if handedness
is unrelated to nationality.


The chi-square formula

Both of Pearson's chi-square tests use the same formula to calculate the test statistic, chi-square (Χ²):

Χ² = Σ (O − E)² / E

Where:

• Χ² is the chi-square test statistic

• Σ is the summation operator (it means "take the sum of")

• O is the observed frequency

• E is the expected frequency

The larger the difference between the observations and the expectations (O − E in the equation), the bigger the chi-square will be.
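Applying the formula to the bird feeder counts above, with equal expected frequencies as the null hypothesis:

```python
observed = [15, 12, 9, 8, 8, 6]  # bird feeder visits from the table above
expected = [sum(observed) / len(observed)] * len(observed)  # equal frequencies

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))  # 5.52
```

This value would then be compared to a critical value with 5 degrees of freedom (number of categories minus one).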

When to use a chi-square test


A Pearson's chi-square test may be an appropriate option for your data if all of the following are true:

1. You want to test a hypothesis about one or more categorical variables. If


one or more of your variables is quantitative, you should use a different
statistical test. Alternatively, you could convert the quantitative variable into
a categorical variable by separating the observations into intervals.

2. The sample was randomly selected from the population.

3. There are a minimum of five observations expected in each group or


combination of groups.

Types of chi-square tests

The two types of Pearson's chi-square tests are:

• Chi-square goodness of fit test

• Chi-square test of independence

Mathematically, these are actually the same test. However, we often think of them as different tests because they're used for different purposes.

Chi-square goodness of fit test


You can use a chi-square goodness of fit test when you have one categorical
variable. It allows you to test whether the frequency distribution of the categorical
variable is significantly different from your expectations.

Example: Hypotheses for a chi-square goodness of fit test

Expectation of equal proportions:

• Null hypothesis (H0): The bird species visit the bird


feeder in equal proportions.

• Alternative hypothesis (HA): The bird species visit the bird feeder in
different proportions.

Expectation of different proportions

• Null hypothesis (H0): The bird species visit the bird feeder in the same
proportions as the average over the past five years.

• Alternative hypothesis (HA): The bird species visit the bird feeder in
different proportions from the average over the past five years.

Chi-square test of independence


You can use a chi-square test of independence when you have two categorical
variables.

Example: Chi-square test of independence

• Null hypothesis (H0): The proportion of people who are left-handed is


the same for Americans and Canadians.

• Alternative hypothesis (HA): The proportion of people who are left-handed differs between nationalities.

Other types of chi-square tests


Some consider the chi-square test of homogeneity to be another variety of Pearson's chi-square test. It tests whether two populations come from the same distribution by determining whether the two populations have the same proportions as each other.

McNemar's test is a test that uses the chi-square test statistic. It isn't a variety of Pearson's chi-square test, but it's closely related. You can use it when you have a related pair of categorical variables that each have two groups.

Example: McNemar's test. Suppose 100 people were offered two flavors of ice cream and asked whether they like the taste of each.

Contingency table of ice cream flavor preference


Like chocolate Dislike chocolate

Like vanilla 47 32

Dislike vanilla 8 13

• Null hypothesis (H0): The proportion of people who like chocolate is the
same as the proportion of people who like vanilla.

• Alternative hypothesis (HA): The proportion of people who like


chocolate is different from the proportion of people who like vanilla.

There are several other types of chi-square tests that are not Pearson's chi-square tests, including the test of a single variance and the likelihood ratio chi-square test.

How to perform a chi-square test

The exact procedure for performing a Pearson's chi-square test depends on which test you're using, but it generally follows these steps:

1. Create a table of the observed and expected frequencies. This


can sometimes be the most difficult step because you will need to carefully
consider which expected values are most appropriate for your null
hypothesis.

2. Calculate the chi-square value from your observed and expected


frequencies using the chi-square formula.

3. Find the critical chi-square value in a chi-square critical value table

or using statistical software.

4. Compare the chi-square value to the critical value to determine

which is larger.

5. Decide whether to reject the null hypothesis. You should reject the null hypothesis if the chi-square value is greater than the critical value. If you reject the null hypothesis, you can conclude that your data are significantly different from what you expected.
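The steps above can be sketched for the handedness example, computing expected counts from the row and column totals (the critical value, 3.841 for one degree of freedom at α = 0.05, would normally come from a table or software):

```python
# Handedness table from the example above: rows = nationality, cols = handedness.
observed = [[236, 19],   # American: right-handed, left-handed
            [157, 16]]   # Canadian: right-handed, left-handed

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count for each cell if handedness and nationality are independent:
# (row total * column total) / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand
        chi2 += (o - e) ** 2 / e

print(round(chi2, 3))  # 0.444 -- well below the critical value of 3.841
```

Since 0.444 < 3.841, we fail to reject the null hypothesis: these data give no evidence that handedness is related to nationality.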

How to report a chi-square test

If you decide to include a Pearson's chi-square test in your research paper, dissertation or thesis, you should report it in your results section. You can follow these rules if you want to report statistics in APA Style:

• You don't need to provide a reference or formula since the chi-square test is a commonly used statistic.

• Refer to chi-square using its Greek symbol, Χ². Although the symbol looks similar to an "X" from the Latin alphabet, it's actually a different symbol. Greek symbols should not be italicized.

Chi-Square Goodness of Fit Test

• What is the Chi-square goodness of fit test?


• The Chi-square goodness of fit test is a statistical hypothesis test used to
determine whether a variable is likely to come from a specified distribution
or not.

• When can I use the test?

• You can use the test when you have counts of values for a categorical
variable.

• The Chi-square goodness of fit test checks whether your sample data is
likely to be from a specific theoretical distribution.

• What do we need?
For the goodness of fit test, we need one variable. We also need an idea, or
hypothesis, about how that variable is distributed. Here are a couple of
examples:

We have bags of candy with five flavors in each bag. The bags should
contain an equal number of pieces of each flavor. The idea we'd like to test is
that the proportions of the five flavors in each bag are the same.

Understanding results

Let's look at the data. A simple bar chart of the data shows the observed counts for the flavors of candy.