
Statistical Evaluation of Data

Descriptive / inferential
• Descriptive statistics are methods that help
researchers organize, summarize, and simplify
the results obtained from research studies.

• Inferential statistics are methods that use the
results obtained from samples to help make
generalizations about populations.
Statistic / parameter
• A summary value that describes a sample is
called a statistic (e.g., M = 25, s = 2).

• A summary value that describes a population
is called a parameter (e.g., µ = 25, σ = 2).
Frequency Distributions
One method of simplifying and organizing a set
of scores is to group them into an organized
display that shows the entire set.

4 /52
Example

5 /52
Histogram & Polygon

6 /52
Bar Graphs

7 /52
Bar Graph

8 /52
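A frequency distribution like the one on the example slide can be built in a few lines of Python; the scores below are made-up illustration data, not the slide's.

```python
from collections import Counter

# Hypothetical set of quiz scores (illustration only)
scores = [8, 9, 8, 7, 10, 9, 8, 6, 9, 8, 7, 10]

# A frequency distribution: each distinct score with its count
freq = Counter(scores)

# Display as a simple table, highest score first, with a text "histogram"
for score in sorted(freq, reverse=True):
    print(f"{score:>2} | {'*' * freq[score]} ({freq[score]})")
```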
Central tendency
The goal of central tendency is to identify the
value that is most typical or most representative
of the entire group.

Central tendency
• The mean is the arithmetic average.
• The median measures central tendency by
identifying the score that divides the
distribution in half.
• The mode is the most frequently occurring
score in the distribution.

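All three measures can be computed directly with Python's standard statistics module; the scores below are hypothetical illustration data.

```python
import statistics

# Hypothetical distribution of scores (illustration only)
scores = [1, 3, 4, 5, 6, 7, 7, 8, 9, 10]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # score dividing the distribution in half
mode = statistics.mode(scores)      # most frequently occurring score

print(mean, median, mode)
```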
Variability
Variability is a measure of the spread of scores
in a distribution.

1. Range: the difference between the minimum and
maximum scores.
2. Standard deviation: the average distance of
scores from the mean.
3. Variance: the average squared distance from
the mean.
Variance = the index of variability.
SD = √(Variance)
Variance = (Sum of Squares) / N

 X    X − M   (X − M)²
10      4       16
 7      1        1
 9      3        9
 8      2        4
 7      1        1
 6      0        0
 5     −1        1
 4     −2        4
 3     −3        9
 1     −5       25

Total = 60, Mean = 6, SS = 70
Variance = 70 / 10 = 7
SD = √7 ≈ 2.65
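The slide's computation can be reproduced in Python; note that, like the slide, it uses the population formula (SS divided by N).

```python
import math

scores = [10, 7, 9, 8, 7, 6, 5, 4, 3, 1]  # data from the slide

n = len(scores)
mean = sum(scores) / n                     # 60 / 10 = 6
ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations = 70
variance = ss / n                          # population variance: SS / N = 7
sd = math.sqrt(variance)                   # standard deviation ≈ 2.65

print(mean, ss, variance, round(sd, 2))
```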
Non-numerical Data
Proportion or percentage in each category.
For example,
• 43% prefer Democrat candidate,
• 28% prefer Republican candidate,
• 29% are undecided

Hypothesis testing
• A hypothesis test is a statistical procedure that
uses sample data to evaluate the credibility of
a hypothesis about a population.

5 elements of a hypothesis test
1. The Null Hypothesis
The null hypothesis is a statement about the
population, or populations, being examined, and
always says that there is no effect, no change, or no
relationship.

2. The Sample Statistic
The data from the research study are used to
compute the sample statistic.
5 elements of a hypothesis test
3. The Standard Error
Standard error is a measure of the average, or standard, distance
between a sample statistic and the corresponding population
parameter. The "standard error of the mean", sM, refers to the
standard deviation of the distribution of sample means taken
from a population.

4. The Test Statistic
A test statistic is a mathematical technique for comparing the
sample statistic with the null hypothesis, using the standard
error as a baseline. For example, for the difference between two
sample means:

t = (M1 − M2) / sM
5 elements of a hypothesis test
5. The Alpha Level (Level of Significance)
The alpha level, or level of significance, for a
hypothesis test is the maximum probability that the
research result was obtained simply by chance.

A hypothesis test with an alpha level of .05, for
example, demands that there is less than a 5% (.05)
probability that the results are caused only by chance.
Reporting Results from a Hypothesis Test
• In the literature, significance levels are
reported as p values.

For example, a research paper may report a
significant difference between two treatments
with p < .05. The expression p < .05 simply means
that there is less than a .05 probability that the
result is caused by chance.
Errors in Hypothesis Testing
If a researcher is misled by the results from the
sample, it is likely that the researcher will reach
an incorrect conclusion.
Two kinds of errors can be made in hypothesis
testing.

Type I Errors
• A Type I error occurs when a researcher finds evidence
for a significant result when, in fact, there is no effect
(no relationship) in the population.
• The error occurs because the researcher has, by chance,
selected an extreme sample that appears to show the
existence of an effect when there is none.
• The consequence of a Type I error is a false report. This
is a serious mistake.
• Fortunately, the likelihood of a Type I error is very
small, and the exact probability of this kind of mistake
is known to everyone who sees the research report.
Type II error
• A Type II error occurs when sample data do
not show evidence of a significant effect
when, in fact, a real effect does exist in the
population.
• This often occurs when the effect is so small
that it does not show up in the sample.
Types of Errors

• H0 is true (the difference does NOT exist in the population):
  – Study reports NO difference (do not reject H0): correct decision.
  – Study reports a difference (reject H0): Type I error.
    Risk typically restricted to 5%, the level of significance.
• HA is true (the difference DOES exist in the population):
  – Study reports NO difference (do not reject H0): Type II error.
    Its probability = 1 − power of the test; controlled via sample size.
  – Study reports a difference (reject H0): correct decision.
    Its probability = the power of the test.


Type I and Type II Errors – Example
Your null hypothesis is that the battery for a heart pacemaker
has an average life of 300 days, with the alternative
hypothesis that the average life is more than 300 days. You are
the quality control manager for the battery manufacturer.
(a) Would you rather make a Type I error or a Type II error?
(b) Based on your answer to part (a), should you use a high or
low significance level?
Type I and Type II Errors – Example
Given H0: average life of pacemaker = 300 days, and HA:
average life of pacemaker > 300 days:
(a) It is better to make a Type II error (where H0 is false, i.e.
the average life is actually more than 300 days, but we accept H0
and assume that the average life is equal to 300 days).
(b) As we increase the significance level (α), we increase the
chances of making a Type I error. Since here it is better to
make a Type II error, we shall choose a low α.
Two Tail Test
A two tailed test will reject the null hypothesis if the sample mean
is significantly higher or lower than the hypothesized mean.
Appropriate when H0: µ = µ0 and HA: µ ≠ µ0.
e.g. The manufacturer of light bulbs wants to produce light bulbs
with a mean life of 1000 hours. If the lifetime is shorter he will
lose customers to the competition, and if it is longer he will
incur a high cost of production. He does not want to deviate
significantly from 1000 hours in either direction. Thus he
selects the hypotheses as
H0: µ = 1000 hours and HA: µ ≠ 1000 hours, and uses a
two tail test.
One Tail Test
A one-sided test is a statistical hypothesis test in which the
values for which we can reject the null hypothesis, H0, are
located entirely in one tail of the probability distribution.
A lower tailed test will reject the null hypothesis if the sample
mean is significantly lower than the hypothesized mean.
Appropriate when H0: µ = µ0 and HA: µ < µ0.
e.g. A wholesaler buys light bulbs from the manufacturer in large
lots and decides not to accept a lot unless the mean life is at
least 1000 hours.
H0: µ = 1000 hours and HA: µ < 1000 hours, using a lower
tail test.
i.e. he rejects H0 only if the mean life of sampled bulbs is
significantly below 1000 hours (he accepts HA and rejects the lot).
One Tail Test
An upper tailed test will reject the null hypothesis if the sample
mean is significantly higher than the hypothesized mean.
Appropriate when H0: µ = µ0 and HA: µ > µ0.
e.g. A highway safety engineer decides to test the load
bearing capacity of a 20 year old bridge. The minimum
load-bearing capacity of the bridge must be at least 10 tons.
H0: µ = 10 tons and HA: µ > 10 tons,
using an upper tail test.
i.e. he rejects H0 only if the mean load bearing capacity of the
bridge is significantly higher than 10 tons.
Hypothesis test for population mean
H0: µ = µ0. With sample mean x̄, sample standard deviation s,
and hypothesized population mean µ0, the test statistic is

t = √n (x̄ − µ0) / s

For HA: µ > µ0, reject H0 if t > t(n−1, α)
For HA: µ < µ0, reject H0 if t < −t(n−1, α)
For HA: µ ≠ µ0, reject H0 if |t| > t(n−1, α/2)

For n ≥ 30, t(n−1, α) may be replaced by z(α).
Hypothesis test for population mean
A weight reducing program that includes a strict diet and exercise
claims in its online advertisement that an average overweight
person loses 10 pounds in three months. Following the program's
method, a group of twelve overweight persons lost
8.1, 5.7, 11.6, 12.9, 3.8, 5.9, 7.8, 9.1, 7.0, 8.2, 9.3, and 8.0 pounds
in three months. Test at the 5% level of significance whether the
program's advertisement is overstating the reality.
Hypothesis test for population mean
Solution:
H0: µ = 10 (µ0), HA: µ < 10 (µ0)
n = 12, x̄ = 8.117, s = 2.489, α = 0.05

t = √12 (8.117 − 10) / 2.489 = −6.52 / 2.489 = −2.62

Critical t-value = −t(n−1, α) = −t(11, 0.05) = −1.796 (one-tailed)
Since t < −t(n−1, α), we reject H0 and conclude that the
program is overstating the reality.
(What happens if we take α = 0.01? Is the program
overstating the reality at the 1% significance level?)
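A sketch of the same test in Python, recomputing x̄ and s directly from the twelve raw values:

```python
import math
import statistics

# Weight losses of the 12 participants (from the slide)
data = [8.1, 5.7, 11.6, 12.9, 3.8, 5.9, 7.8, 9.1, 7.0, 8.2, 9.3, 8.0]
mu0 = 10.0  # claimed average loss under H0

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)

# Test statistic: t = sqrt(n) * (xbar - mu0) / s
t = math.sqrt(n) * (xbar - mu0) / s
print(round(t, 2))  # about -2.62
```

Since t falls below the one-tailed critical value −1.796 for 11 degrees of freedom, H0 is rejected, matching the slide's conclusion.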
Hypothesis test for population proportion
H0: p = p0. Test statistic:

z = √n (p̂ − p0) / √(p0 (1 − p0))

For HA: p > p0, reject H0 if z > z(α)
For HA: p < p0, reject H0 if z < −z(α)
For HA: p ≠ p0, reject H0 if |z| > z(α/2)
Hypothesis test for population
proportion
A ketchup manufacturer is in the process of deciding whether to
produce an extra spicy brand. The research
department used a national telephone survey of 6000
households and found the extra spicy ketchup would be
purchased by 335 of them. A much more extensive study made
two years ago showed that 5% of the households would
purchase the brand then. At a 2% significance level, should the
company conclude that there is an increased interest in the
extra-spicy flavor?

Hypothesis test for population proportion
n = 6000, p̂ = 335/6000 = 0.05583
H0: p = 0.05 (p0), HA: p > 0.05

z = √n (p̂ − p0) / √(p0 (1 − p0)) = √6000 × 0.00583 / √(0.05 × 0.95)
  = 77.46 × 0.00583 / 0.218 = 2.07

α = 0.02
z(α) (the critical value of z) = 2.05 (NORMSINV)
Since z > z(α), we reject H0, i.e. the current interest is significantly
greater than the interest of two years ago.
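The same computation in Python, using statistics.NormalDist for the critical value instead of a printed table (the NORMSINV equivalent):

```python
import math
from statistics import NormalDist

n, successes, p0 = 6000, 335, 0.05
p_hat = successes / n  # about 0.0558

# Test statistic: z = sqrt(n) * (p_hat - p0) / sqrt(p0 * (1 - p0))
z = math.sqrt(n) * (p_hat - p0) / math.sqrt(p0 * (1 - p0))

alpha = 0.02
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value, about 2.05

print(round(z, 2), round(z_crit, 2), z > z_crit)
```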
Hypothesis test for population standard deviation
H0: σ = σ0. Test statistic:

χ² = (n − 1)s² / σ0²

For HA: σ > σ0, reject H0 if χ² > χ²(n−1, α)
For HA: σ < σ0, reject H0 if χ² < χ²(n−1, 1−α)
For HA: σ ≠ σ0, reject H0 if χ² < χ²(n−1, 1−α/2) or χ² > χ²(n−1, α/2)
Hypothesis test for comparing two population means
Consider two populations with means µ1, µ2 and standard deviations σ1
and σ2. x̄1 and x̄2 are the means of the sampling distributions of
population 1 and population 2 respectively, and σx̄1 and σx̄2 denote the
standard errors of those sampling distributions.

x̄1 − x̄2 is the difference between the sample means, and

σ(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2)

is the corresponding standard error.

H0: µ1 = µ2, with test statistic

z = ((x̄1 − x̄2) − (µ1 − µ2)H0) / σ(x̄1 − x̄2)

Here z denotes the standardized difference of sample means.

For HA: µ1 > µ2, reject H0 if z > z(α)
For HA: µ1 < µ2, reject H0 if z < −z(α)
For HA: µ1 ≠ µ2, reject H0 if |z| > z(α/2)

(Decision makers may be concerned with parameters of two populations,
e.g. do female employees receive a lower salary than their male
counterparts for the same job?)
Hypothesis test for comparing population means
A sample of 32 money market mutual funds was chosen on
January 1, 1996, and the average annual rate of return over the
past 30 days was found to be 3.23%, with a sample standard
deviation of 0.51%. A year earlier a sample of 38 money-market
funds showed an average rate of return of 4.36%, with a sample
standard deviation of 0.84%. Is it reasonable to conclude (at
α = 0.05) that money-market interest rates declined during 1995?
Hypothesis test for comparing population means
n1 = 32, x̄1 = 3.23, σ1 = 0.51; n2 = 38, x̄2 = 4.36, σ2 = 0.84
H0: µ1 = µ2, HA: µ1 < µ2

σ(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2) = √(0.26/32 + 0.71/38) = √0.0267 = 0.163

z = ((x̄1 − x̄2) − (µ1 − µ2)H0) / σ(x̄1 − x̄2) = (−1.13 − 0) / 0.163 = −6.92

α = 0.05
Critical value of z = −z(α) = −1.64
Since z < −z(α), we reject H0 and conclude that there has
been a decline.
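The calculation can be checked with a few lines of Python:

```python
import math

# Sample summaries from the slide (rates of return, in percent)
n1, x1, s1 = 32, 3.23, 0.51
n2, x2, s2 = 38, 4.36, 0.84

# Standard error of the difference between sample means
se = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Test statistic under H0: mu1 - mu2 = 0
z = ((x1 - x2) - 0) / se
print(round(se, 3), round(z, 2))
```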
Hypothesis test for comparing population proportions
Consider two samples of sizes n1 and n2 with p1 and p2 as the respective
proportions of successes. Then

p̂ = (n1 p1 + n2 p2) / (n1 + n2)

is the estimated overall proportion of successes in the two populations,
and

σ̂(p1 − p2) = √(p̂q̂/n1 + p̂q̂/n2)

is the estimated standard error of the difference between the two
proportions (with q̂ = 1 − p̂).

H0: p1 = p2, with test statistic

z = ((p1 − p2) − (p1 − p2)H0) / σ̂(p1 − p2)

For HA: p1 > p2, reject H0 if z > z(α)
For HA: p1 < p2, reject H0 if z < −z(α)
For HA: p1 ≠ p2, reject H0 if |z| > z(α/2)

(A training director may wish to determine if the proportion of
promotable employees at one office is different from that of another.)
Hypothesis test for comparing population proportions
A large hotel chain is trying to decide whether to convert more of
its rooms into non-smoking rooms. In a random sample of 400
guests last year, 166 had requested non-smoking rooms. This year
205 guests in a sample of 380 preferred the non-smoking rooms.

• Would you recommend that the hotel chain convert more rooms
to non-smoking? Support your recommendation by testing the
appropriate hypotheses at a 0.01 level of significance.
Hypothesis test for comparing population proportions
n1 = 400, p1 = 166/400 = 0.415; n2 = 380, p2 = 205/380 = 0.5395
H0: p1 = p2, HA: p1 < p2

p̂ = (n1 p1 + n2 p2) / (n1 + n2) = (400 × 0.415 + 380 × 0.5395) / (400 + 380)
  = 0.4757 (the estimated overall proportion of successes in the two
  populations)

σ̂(p1 − p2) = √(p̂q̂ (1/n1 + 1/n2)) = √(0.4757 × 0.5243 × (1/400 + 1/380))
  = 0.0358

α = 0.01
Critical value of z = −z(α) = −2.32

z = ((p1 − p2) − (p1 − p2)H0) / σ̂(p1 − p2) = (−0.1245 − 0) / 0.0358 = −3.48

Since z < −z(α), we reject H0. The hotel chain should convert more
rooms to non-smoking rooms, as there has been a significant increase
in the number of guests seeking non-smoking rooms.
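A Python sketch of the pooled two-proportion test above:

```python
import math

n1, x1 = 400, 166  # last year: guests requesting non-smoking rooms
n2, x2 = 380, 205  # this year

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion, about 0.476
q_pool = 1 - p_pool

# Estimated standard error of the difference between the two proportions
se = math.sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))

# Test statistic under H0: p1 = p2
z = (p1 - p2) / se
print(round(p_pool, 4), round(se, 4), round(z, 2))
```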
Steps to undertaking a Hypothesis test
Hypothesis Testing Flowchart:
1. Define the study question
2. Set the null and alternative hypotheses
3. Choose a suitable test
4. Calculate a test statistic
5. Calculate a p-value
6. Make a decision and interpret your conclusions
HYPOTHESIS TESTING

Null hypothesis, H0:
▪ States the hypothesized value of the parameter before sampling.
▪ The assumption we wish to test (or the assumption we are trying
to reject).
▪ E.g. population mean µ = 20; there is no difference between coke
and diet coke.

Alternative hypothesis, HA:
▪ All possible alternatives other than the null hypothesis.
▪ E.g. µ ≠ 20, µ > 20, µ < 20; there is a difference between coke
and diet coke.
Null Hypothesis
The null hypothesis H0 represents a theory that has been put
forward either because it is believed to be true or because it
is used as a basis for an argument, but has not been proven.
For example, in a clinical trial of a new drug, the null
hypothesis might be that the new drug is no better, on average,
than the current drug. We would write
H0: there is no difference between the two drugs on average.
Alternative Hypothesis
The alternative hypothesis, HA, is a statement of what a
statistical hypothesis test is set up to establish. For example,
in the clinical trial of a new drug, the alternative hypothesis
might be that the new drug has a different effect, on average,
compared to that of the current drug. We would write
HA: the two drugs have different effects, on average, or
HA: the new drug is better than the current drug, on average.

The result of a hypothesis test:
'Reject H0 in favour of HA' OR 'Do not reject H0'.
Procedure of Hypothesis Testing
Hypothesis testing comprises the following steps:

Step 1
Set up a hypothesis.

Step 2
Set up a suitable significance level.
The confidence with which an experimenter rejects or accepts the null
hypothesis depends on the significance level adopted. The level of
significance, usually denoted by α, is the rejection region (which is
outside the confidence, or acceptance, region).
Selecting and interpreting significance level
1. Deciding on a criterion for accepting or rejecting the null
hypothesis.
2. Significance level refers to the percentage of sample means that
is outside certain prescribed limits. E.g. testing a hypothesis at
the 5% level of significance means
▪ that we reject the null hypothesis if it falls in the two regions
of area 0.025;
▪ that we do not reject the null hypothesis if it falls within the
region of area 0.95.
3. The higher the level of significance, the higher is the
probability of rejecting the null hypothesis when it is true
(the acceptance region narrows).
(figure slide: two-tailed distribution with the critical values
marked on either tail)
If our sample statistic (calculated value) falls in the non-
shaded region (acceptance region), this simply means that
there is no evidence to reject the null hypothesis. It does not
prove that the null hypothesis (H0) is true; we merely fail to
reject it. Otherwise, H0 will be rejected.

Step 3
Determine a suitable test statistic: for example
Z, t, Chi-square, or F-statistic.

Step 4
Determine the critical value from the table.
Step 5
After doing the computation, check the sample result.
Compare the calculated value (sample result) with the value
obtained from the table (tabulated or critical value).

Step 6
Making Decisions
Making a decision means either accepting or rejecting the
null hypothesis. If the computed value (in absolute value) is
more than the tabulated or critical value, then it falls in the
critical region; in that case, reject the null hypothesis,
otherwise accept it.
Type I and Type II Errors
When a statistical hypothesis is tested, there are 4 possible
results:
(1) The hypothesis is true and our test accepts it.
(2) The hypothesis is false and our test rejects it.
(3) The hypothesis is true but our test rejects it.
(4) The hypothesis is false but our test accepts it.

Obviously, the last 2 possibilities lead to errors.

Rejecting a null hypothesis when it is true is called a Type I
error.
Accepting a null hypothesis when it is false is called a Type II
error.
Example 1 - Court Room Trial
In court room, a defendant is considered not guilty as
long as his guilt is not proven. The prosecutor tries to
prove the guilt of the defendant. Only when there is
enough charging evidence the defendant is condemned.
In the start of the procedure, there are two hypotheses
H0: "the defendant is not guilty", and H1: "the
defendant is guilty". The first one is called null
hypothesis, and the second one is called alternative
(hypothesis).

• Null hypothesis (H0) is true (he is not guilty), and we accept H0:
  right decision.
• Alternative hypothesis (H1) is true (he is guilty), but we accept H0:
  wrong decision — Type II error.
• Null hypothesis (H0) is true (he is not guilty), but we reject H0:
  wrong decision — Type I error.
• Alternative hypothesis (H1) is true (he is guilty), and we reject H0:
  right decision.
Chi squared Test?
• Null: There is NO association between class and survival.
• Alternative: There IS an association between class and survival.
(3×2 contingency table)

www.statstutor.ac.uk
What would be expected if the null is true?
• Same proportion of people would have died in each class!
• Overall, 809 people died out of 1309 = 61.8%

Chi-squared test statistic
• The chi-squared test is used when we want to see if
two categorical variables are related.
• The test statistic for the chi-squared test uses the
sum of the squared differences between each pair of
observed (O) and expected (E) values:

χ² = Σᵢ (Oᵢ − Eᵢ)² / Eᵢ, summed over all n cells.
The following steps are required to calculate the value of chi-square:

1. Identify the problem.
2. Make a contingency table and note the observed frequencies (O) in
each class of one event, row wise (i.e. horizontally), and then the
numbers in each group of the other event, column wise (i.e. vertically).
3. Set up the null hypothesis (H0): according to the null hypothesis,
no association exists between the attributes. This needs setting up of
an alternative hypothesis (HA), which assumes that an association
exists between the attributes.
4. Calculate the expected frequencies (E).
5. Find the difference between observed and expected frequency in
each cell (O − E).
6. Calculate the chi-square value by applying the formula. The value
ranges from zero to infinity.
Solved Example
Two varieties of snapdragons, one with red flowers and the other with
white flowers, were crossed. The results obtained in the F2 generation
are: 22 red, 52 pink, and 23 white flower plants. It is desired to
ascertain whether these figures show that segregation occurs in the
simple Mendelian ratio of 1:2:1.

Solution
Null hypothesis H0: the genes carrying the red color and white color
characters are segregating in the simple Mendelian ratio of 1:2:1.

Expected frequencies:
Red = (1/4) × 97 = 24.25
Pink = (2/4) × 97 = 48.50
White = (1/4) × 97 = 24.25
                         Red    Pink   White   Total
Observed frequency (O)   22     52     23      97
Expected frequency (E)   24.25  48.50  24.25   97
Deviation (O − E)       −2.25   3.50  −1.25

χ² = (−2.25)²/24.25 + (3.50)²/48.50 + (−1.25)²/24.25
   = 5.06/24.25 + 12.25/48.50 + 1.56/24.25
   = 0.21 + 0.25 + 0.06
   = 0.53

Conclusion –
The calculated chi-square value (0.53) is less than the
tabulated chi-square value (5.99) at the 5% level of
probability for 2 d.f. The hypothesis is, therefore, in
agreement with the recorded facts.
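The whole goodness-of-fit computation can be done in a few lines of Python:

```python
# Observed F2 counts and expected counts under the 1:2:1 Mendelian ratio
observed = [22, 52, 23]
total = sum(observed)                               # 97 plants
ratio = [1, 2, 1]
expected = [r / sum(ratio) * total for r in ratio]  # [24.25, 48.5, 24.25]

# Chi-square statistic: sum of (O - E)^2 / E over the three classes
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(chi2, 2))  # about 0.53, below the 5% critical value 5.99 for 2 d.f.
```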
ANOVA
• If the number of samples is more than two, the Z-test and t-test
cannot be used.
• The technique of variance analysis developed by Fisher is very
useful in such cases, and with its help it is possible to study the
significance of the differences of mean values of a large number of
samples at the same time.
• ANOVA, or Analysis of Variance, is a statistical method used to
compare the means of three or more samples to determine if
at least one of the sample means significantly differs from the
others.
• ANOVA tests the null hypothesis that all groups have the same
population mean against the alternative hypothesis that at
least one group is different.
Applications and Benefits of ANOVA in Data Science
Applications of ANOVA
• A/B Testing: ANOVA is widely used in A/B testing where multiple versions of a web page
or app are compared to determine which one performs better on a given metric.
• Machine Learning Model Evaluation: It can be used to compare the performance of
different models or algorithms by treating the performance metric (e.g., accuracy, F1
score) as the dependent variable.
• Feature Selection: ANOVA can help in identifying which categorical variables (features)
have a significant impact on the target variable, which is crucial in model building and
optimization.

Benefits of ANOVA
• Efficiency: Allows for the simultaneous comparison of more than two groups, which is
more efficient than conducting multiple t-tests.
• Insightful: Provides insights into data by identifying variables that significantly impact
the outcome, aiding in understanding relationships in the data.
• Versatility: ANOVA can be applied to a wide range of data science projects, from
exploratory data analysis to complex experimental designs.

Terminologies of ANOVA
1. Independent Variable (Factor)
The variable that is being manipulated or categorized in an experiment. In ANOVA, it refers to
the groups or treatments being compared. If an ANOVA has one independent variable, it's
called a one-way ANOVA; if it has two, it's called a two-way ANOVA, and so on.

2. Dependent Variable
The outcome or response variable. This is the variable that is measured in the experiment and
is believed to be influenced by the independent variable(s).

3. Levels
The different categories or groups within an independent variable. For example, if the
independent variable is "fertilizer type," the levels would be the specific types of fertilizers
being tested.

4. Between-Group Variance (Between-Subjects Variance)
The variability among the different groups being compared. It measures
how much the group means deviate from the overall mean, indicating the
effect of the independent variable on the dependent variable.

5. Within-Group Variance (Within-Subjects Variance)
The variability within each group. It measures how much the individual
scores in each group deviate from their respective group mean. This
variance is attributed to random error or individual differences not
due to the independent variable.
Terminologies of ANOVA
6. Total Variance
•The overall variability in the dependent variable. It's the sum of the between-group variance
and the within-group variance.
7. F-Ratio (F-Statistic)
•The ratio of between-group variance to within-group variance. It is used to determine
whether the differences among group means are statistically significant. A higher F-ratio
suggests a greater likelihood that significant differences exist among the group means.
8. Degrees of Freedom
•A concept used in statistical tests that refers to the number of independent pieces of
information used in the calculation of a statistic. For ANOVA, there are degrees of freedom for
the numerator (related to between-group variance) and the denominator (related to within-
group variance).
9. P-Value
•The probability of observing the results given that the null hypothesis is true. A small p-value
(typically ≤ 0.05) indicates strong evidence against the null hypothesis, leading to its rejection
in favor of the alternative hypothesis.
10. Null Hypothesis (H0)
•The hypothesis that there is no effect or no difference between groups. In ANOVA, it suggests
that all group means are equal.
11. Alternative Hypothesis (H1 or Ha)
•The hypothesis that there is an effect or a difference between groups.
In ANOVA, it suggests that at least one group mean is different from
the others.
Terminologies of ANOVA - Example
let's consider a study that aims to evaluate the effect of different study methods on exam
performance. In this study, students are divided into four groups, each group using a
different study method: Group A uses flashcards, Group B participates in group study
sessions, Group C reviews lecture recordings, and Group D reads textbooks only. The exam
scores of the students after using these methods are recorded to analyze the effectiveness
of each study method.
1. Independent Variable (Factor)
•Example: The study method is the independent variable, with four levels (flashcards,
group study, lecture recordings, textbooks).
2. Dependent Variable
•Example: The exam score of each student is the dependent variable, which is believed to
be influenced by the study method.
3. Levels
•Example: The levels of the independent variable (study method) are flashcards, group
study, lecture recordings, and textbooks.
4. Between-Group Variance (Between-Subjects Variance)
•Example: This would measure how much the average exam scores of each study method
group (A, B, C, D) deviate from the overall average exam score of all groups combined. A
high between-group variance suggests that the study method significantly affects exam
scores.
Terminologies of ANOVA - Example
5. Within-Group Variance (Within-Subjects Variance)
•Example: This measures the variance in exam scores within each group of students using
the same study method. For instance, it measures how much individual scores in the
flashcard group vary from the average score of that group, indicating variability due to
factors other than the study method.
6. Total Variance
•Example: The total variance in exam scores would be the sum of the variance between the
groups (due to different study methods) and the variance within each group (due to
individual differences).
7. F-Ratio (F-Statistic)
•Example: If the between-group variance (effect of study methods on exam scores) is
significantly larger than the within-group variance (individual differences in exam scores), the
F-ratio would be high, suggesting that the choice of study method has a significant effect on
exam performance.
8. Degrees of Freedom
•Example: In this study, the degrees of freedom for the numerator (between-group) would
be the number of groups minus one (4-1=3), and for the denominator (within-group), it
would be the total number of observations minus the number of groups.

Terminologies of ANOVA - Example

9. P-Value
•Example: If the p-value is less than 0.05, it suggests that there is a statistically significant
difference in exam scores among at least some of the study methods.
10. Null Hypothesis (H0)
•Example: The null hypothesis would state that there is no difference in average exam scores
between the four study methods (flashcards, group study, lecture recordings, textbooks).
11. Alternative Hypothesis (H1 or Ha)
•Example: The alternative hypothesis suggests that at least one study method leads to
significantly different average exam scores compared to the others.

Degree of Freedom df
Degrees of freedom refer to the number of independent pieces of information used in the
calculation of a statistic.
Suppose a teacher wants to compare the final exam scores of students who used three
different study methods: Method A, Method B, and Method C. Each method is used by a
different group of students. Here are the group sizes:
•Method A: 5 students
•Method B: 6 students
•Method C: 4 students

Total Number of Observations
First, calculate the total number of observations across all groups.
This is simply the sum of the students in all groups.
Total number of observations = 5 + 6 + 4 = 15

Degrees of Freedom Between Groups (df Between)
The degrees of freedom between groups (df Between) is calculated
from the number of groups: it is the number of groups minus one.
df Between = Number of groups − 1
Given there are 3 groups (A, B, C):
df Between = 3 − 1 = 2
Degree of Freedom df

Degrees of Freedom Within Groups (df Within)
The degrees of freedom within groups (df Within) is calculated as the
total number of observations minus the number of groups.
df Within = Total number of observations − Number of groups
With 15 total observations across 3 groups:
df Within = 15 − 3 = 12

Total Degrees of Freedom
The total degrees of freedom (df Total) is calculated by subtracting 1
from the total number of observations.
df Total = Total number of observations − 1
Thus:
df Total = 15 − 1 = 14

These degrees of freedom are used in the ANOVA test to determine the
critical values from the F-distribution.
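The three degree-of-freedom counts above can be computed mechanically:

```python
group_sizes = [5, 6, 4]  # Methods A, B, C

n_total = sum(group_sizes)  # 15 observations in all
k = len(group_sizes)        # 3 groups

df_between = k - 1          # 2
df_within = n_total - k     # 12
df_total = n_total - 1      # 14

# The between and within df always add up to the total df
print(df_between, df_within, df_total)
```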
F-statistic in ANOVA
The F-statistic in ANOVA is calculated to determine whether there are any statistically
significant differences between the means of three or more groups.
It is derived from dividing the variance between the groups by the variance within the groups.

Imagine we have three diet plans (A, B, C) and we're interested in understanding their effect
on weight loss over a month. We have the following weight loss data (in pounds) for each
group:
•Diet A: 2, 4, 3, 5
•Diet B: 3, 5, 4, 6
•Diet C: 5, 7, 6, 8
Step 1: Calculate the mean weight loss for each group and the overall mean
•Mean of Diet A: (2+4+3+5)/4=3.5(2+4+3+5)/4=3.5
•Mean of Diet B: (3+5+4+6)/4=4.5(3+5+4+6)/4=4.5
•Mean of Diet C: (5+7+6+8)/4=6.5(5+7+6+8)/4=6.5
•Overall Mean (Grand Mean): (3.5+4.5+6.5)/3=4.833(3.5+4.5+6.5)/3=4.833
Step 2: Calculate the Between-Group Variance (SSB: Sum of Squares Between)
This measures how much each group mean deviates from the grand mean, weighted by the
number of observations in each group.
SSB = n(mean of Diet A − grand mean)² + n(mean of Diet B − grand mean)² + n(mean of Diet C − grand mean)²
Where n is the number of observations per group (in this case, 4).
SSB = 4(3.5 − 4.833)² + 4(4.5 − 4.833)² + 4(6.5 − 4.833)² ≈ 18.667
71 /52
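Recomputing SSB from the raw data in a few lines of Python confirms the hand calculation; the exact value is 56/3 ≈ 18.667 (tiny differences from the figure above come from rounding the grand mean to 4.833):

```python
# Sum of Squares Between for the three diet groups (pure Python).
groups = {"A": [2, 4, 3, 5], "B": [3, 5, 4, 6], "C": [5, 7, 6, 8]}
n_total = sum(len(g) for g in groups.values())
grand_mean = sum(sum(g) for g in groups.values()) / n_total
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
print(round(ssb, 3))  # 18.667
```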
F-statistic in ANOVA
Step 3: Calculate the Within-Group Variance (SSW: Sum of Squares Within)
SSW is the sum of squared deviations of each observation from its group mean. It's calculated
for each group and then summed up.

•Mean of Diet A: 3.5


•Mean of Diet B: 4.5
•Mean of Diet C: 6.5

For Diet A:
SSWA = (2−3.5)² + (4−3.5)² + (3−3.5)² + (5−3.5)² = 5
For Diet B:
SSWB = (3−4.5)² + (5−4.5)² + (4−4.5)² + (6−4.5)² = 5
For Diet C:
SSWC = (5−6.5)² + (7−6.5)² + (6−6.5)² + (8−6.5)² = 5
Total SSW:
SSW = SSWA + SSWB + SSWC = 15
72 /52
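The same SSW figure falls out of a short Python loop over the three diet groups (a sketch, mirroring the per-group sums above):

```python
# Sum of Squares Within: squared deviation of each observation
# from its own group mean, summed over all groups.
groups = [[2, 4, 3, 5], [3, 5, 4, 6], [5, 7, 6, 8]]
ssw = sum(
    sum((x - sum(g) / len(g)) ** 2 for x in g)  # one group's SSW
    for g in groups
)
print(ssw)  # 15.0
```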
F-statistic in ANOVA
Step 4: Calculate the Mean Square Between (MSB) and Mean Square Within (MSW)
•MSB (Mean Square Between Groups):

MSB = SSB / df Between
Where df Between = number of groups − 1 = 3 − 1 = 2
MSB = 18.667 / 2 ≈ 9.334

•MSW (Mean Square Within Groups):


MSW = SSW / df Within
Where df Within = total number of observations − number of groups = 12 − 3 = 9
MSW = 15 / 9 ≈ 1.667

Step 5: Calculate the F-Statistic


F = MSB / MSW
F = 9.334 / 1.667
F ≈ 5.6

F-statistic for given example is 5.6
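Steps 4 and 5 can be checked in Python from the sums of squares computed earlier (ssb is the exact 56/3; the ratio comes out to exactly 5.6):

```python
# Mean squares and F-statistic for the diet example (pure Python).
ssb, ssw = 56 / 3, 15.0      # sums of squares from the worked example
k, n_total = 3, 12           # 3 diets, 4 observations each
msb = ssb / (k - 1)          # ≈ 9.333
msw = ssw / (n_total - k)    # ≈ 1.667
f_stat = msb / msw
print(round(f_stat, 1))  # 5.6
```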

73 /52
Steps of calculating ANOVA

Step 1: Formulate the Hypotheses


•Null Hypothesis (H0): All group means are equal (μ1 = μ2 = … = μk).
•Alternative Hypothesis (Ha): At least one group mean is different from the others.
Step 2: Calculate the Means
•Find the mean of each group and the overall mean of all observations.
Step 3: Calculate the Sum of Squares Between Groups (SSB)

SSB = Σ ni(x̄i − x̄)², summed over the k groups
Where ni is the number of observations in group i, x̄i is the mean of group i, x̄ is the grand mean, and k is the number of groups.

Step 4: Calculate the Sum of Squares Within Groups (SSW)

SSW = Σi Σj (xij − x̄i)², summing each observation's squared deviation from its group mean
Here, xij represents individual observations within each group.

Step 5: Calculate the Total Sum of Squares (SST)


SST=SSB+SSW

Alternatively, it can be calculated directly from the variance of all observations from the
overall mean.
74 /52
Steps of calculating ANOVA

Step 6: Determine Degrees of Freedom


•Degrees of Freedom Between Groups (dfB): k−1.
•Degrees of Freedom Within Groups (dfW): N−k, where N is the total number of
observations.

Step 7: Calculate the Mean Squares


•Mean Square Between Groups (MSB): MSB=SSB/dfB.
•Mean Square Within Groups (MSW): MSW=SSW/dfW.

Step 8: Calculate the F-Statistic


F = MSB / MSW. This ratio tests whether the variance between group means is significantly
greater than the variance within the groups.

Step 9: Compare F-Statistic to Critical Value


•Find the critical value of F for your degrees of freedom and chosen significance level
(usually 0.05) in an F-distribution table.
•If the calculated F is greater than the critical F, reject the null hypothesis.

75 /52
Types of ANOVA
1. One-Way ANOVA
•Description: Tests the effect of a single factor (independent variable) on a continuous
outcome variable across two or more groups.
•Use Case: Comparing the effectiveness of different diets on weight loss, where the diets are
the only variable being tested.
2. Two-Way ANOVA
•Description: Evaluates the impact of two independent variables on a continuous outcome,
allowing for the examination of interactions between factors.
•Use Case: Examining how diet (Factor A) and exercise regimen (Factor B) together affect
weight loss, including any interaction effects between diet and exercise.
3. Repeated Measures ANOVA
•Description: Used when the same subjects are measured multiple times under different
conditions or over time.
•Use Case: Measuring the cognitive performance of a group of individuals before and after
consuming different types of beverages (e.g., water, coffee, tea) at several intervals.
4. Multivariate ANOVA (MANOVA)
•Description: Extends ANOVA to test for the effect of independent variables on multiple
dependent variables simultaneously.
•Use Case: Investigating the effect of teaching methods on students' performance in different
subjects (e.g., math, science, literature) at once.
76 /52
Types of ANOVA
5. Mixed-Design ANOVA
•Description: Combines elements of between-subjects (involving different groups of subjects)
and within-subjects (involving the same subjects over time or conditions) designs, suitable for
more complex experiments.
•Use Case: Studying the impact of a training program (between-subjects factor) on skill
improvement over time (within-subjects factor), with some participants receiving training and
others not.
6. ANCOVA (Analysis of Covariance)
•Description: Blends ANOVA and regression, allowing for the examination of the impact of
one or more factors on a dependent variable while statistically controlling for the variability
associated with one or more covariates.
•Use Case: Evaluating the effectiveness of different teaching methods on final exam scores
while controlling for students' initial knowledge levels.
7. Factorial ANOVA
•Description: Designed to explore the effects of two or more independent variables at
multiple levels and their interactions on a dependent variable.
•Use Case: Assessing the impact of various levels of temperature and humidity on the growth
rate of a plant species.

77 /52
Difference between ANOVA and Hypothesis Testing

•ANOVA is a specialized form of hypothesis testing designed for comparing means across
three or more groups, using the F-statistic to test the null hypothesis of no difference
between group means. Hypothesis testing, on the other hand, is a more general approach
used to determine if sample data is significantly different from what is expected under the
null hypothesis, employing a variety of statistical tests depending on the nature of the data
and the hypothesis.

Key Difference
•Scope: Hypothesis testing is a broad concept encompassing various statistical tests,
including ANOVA, used to make inferences about population parameters. ANOVA is a
specific type of hypothesis test focused on comparing means across multiple groups.
•Application: Hypothesis testing can be applied to a wide range of questions and data
types (e.g., means, proportions, variances), while ANOVA specifically addresses questions
about the differences in means across groups.
•Statistical Test: While hypothesis testing uses various test statistics depending on the test
(e.g., t-statistic for t-tests, chi-square statistic for chi-square tests), ANOVA specifically uses
the F-statistic.

78 /52
Difference between ANOVA and T-Test

1. Purpose and Use Case


•T-test: Used to compare the means of two groups or samples. There are different types of
t-tests (independent samples t-test, paired sample t-test, and one-sample t-test)
depending on the study design and data structure.
•ANOVA: Used to compare the means of three or more groups or samples. ANOVA can be
extended to more complex designs, such as Two-Way ANOVA, which examines the effect
of two independent variables on a dependent variable.

2. Hypothesis Testing
•T-test: Tests the null hypothesis that there is no difference between the two group
means.
•ANOVA: Tests the null hypothesis that all group means are equal. If ANOVA indicates
significant differences, it means that at least one group mean differs from the others, but
it does not specify which groups are different.

3. Statistical Model
•T-test: Compares the means of two groups and considers the variability within those
groups.
•ANOVA: Compares the means across multiple groups by partitioning the total variance
into variance between groups and variance within groups.
79 /52
Difference between ANOVA and T-Test

4. Output and Interpretation


•T-test: Provides a t-statistic, degrees of freedom, and a p-value from which the
significance of the difference between two means can be directly interpreted.
•ANOVA: Provides an F-statistic and a p-value. The F-statistic is the ratio of the variance
between groups to the variance within groups. If the ANOVA is significant, further post hoc
tests are needed to determine which specific means are different.

5. Multiple Comparisons
•T-test: When multiple t-tests are used to compare more than two groups, it increases the
risk of Type I error (false positive). This is not an efficient or recommended approach for
comparing multiple groups.
•ANOVA: Designed to compare multiple groups in one go, thus controlling for Type I error
rate. However, if ANOVA shows a significant difference, post hoc pairwise comparisons are
needed to determine which specific groups differ.

The t-test is suitable for comparing two groups, while ANOVA is designed for three or
more groups. Using multiple t-tests to perform the job of an ANOVA not only increases the
computational burden but also inflates the chance of making a Type I error.
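The Type I error inflation is easy to quantify: with m independent comparisons each run at level α, the familywise error rate is 1 − (1 − α)^m. A minimal sketch (the function name is my own):

```python
# Familywise Type I error rate for m independent tests at level alpha.
def familywise_error(alpha: float, m: int) -> float:
    return 1 - (1 - alpha) ** m

# Three groups require 3 pairwise t-tests; the chance of at least one
# false positive grows from 5% to roughly 14.3%.
print(round(familywise_error(0.05, 3), 4))  # 0.1426
```

This is why a single ANOVA followed by post hoc tests (which control the familywise rate) is preferred over repeated t-tests.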

80 /52
