Biostats - Inferential


GE 2113 | BIO STATISTICS AND EPIDEMIOLOGY - LECTURE

INFERENTIAL STATISTICS
TRANSCRIBED BY: ELLEMA, ARIANNE KENDRA P. | BMLS – 2D | RTRMF | Date Transcribed: 10/09/23

Inferential statistics involves drawing conclusions about populations by examining samples. It allows us to make inferences about the entire set, including specific examples within it, based on information obtained from a subset of examples. These inferences rely on the principles of evidence and utilize sample statistics as a basis for drawing broader conclusions.

The accuracy of inferential statistics depends largely on the accuracy of the sample data and how well it represents the larger population. This can be effectively done by obtaining a random sample. Results that are based on non-random samples are usually discarded. Random sampling - though not always straightforward - is extremely important for carrying out inferential techniques.

A. Hypothesis Testing
B. Level of Significance
C. Calculated and Critical Value
D. Statistical Tools

HYPOTHESIS TESTING
a. Null
b. Alternative

Hypothesis testing, sometimes called significance testing, is an act in statistics whereby an analyst tests an assumption regarding a population parameter. Hypothesis testing is used to assess the plausibility of a hypothesis by using sample data.

Such data may come from a larger population, or from a data-generating process. The word "population" will be used for both of these cases in the following descriptions.

Hypothesis testing is a decision-making process for evaluating claims about a population. Moreover, it is basically testing an assumption that we can make about a population.

Examples
1. Does the mean height of Second Year Medtech students differ from 66 inches?
2. Is the proportion of Second Year Medtech male students' height significantly higher than that of the Second Year Medtech female students?

Null Hypothesis
- It is denoted by H0.
- It is the initial claim.
- It shows no significant difference, no change, nothing happened, no relationship between two parameters.
- The independent variable has no effect on the dependent variable.
- H0: μ1 = μ2 (Note: μ - population parameter)

The null hypothesis is usually a hypothesis of equality between population parameters; e.g., a null hypothesis may state that the population mean return is equal to zero.

Null hypothesis symbols used:
(=) equal to, the same as, not changed from, is

Alternative Hypothesis
- It is denoted by Ha or H1.
- It is the contrary of the null hypothesis.
- It shows that there is a significant difference, an effect, a change, or a relationship between a parameter and a specific value.
- The independent variable has an effect on the dependent variable.
- Ha: μ1 ≠ μ2, Ha: μ1 > μ2, Ha: μ1 < μ2

The alternative hypothesis is effectively the opposite of the null hypothesis (e.g., the population mean return is not equal to zero). Thus, they are mutually exclusive, and only one can be true. However, one of the two hypotheses will always be true.

Alternative hypothesis symbols used:
(≠) not equal, different from, changed from, not the same as
(>) greater than, above, higher than, longer than, bigger than, increased, at least

AE | 2D 1
(<) less than, below, lower than, smaller than, shorter than, decreased or reduced from, at most

Stating/Formulating the Null and Alternative Hypothesis for Each Statement

Example 1: The average age of bus drivers in Tacloban City is 38.8 years.
H0: μ = 38.8 years
Ha: μ ≠ 38.8 years

Example 2: The average number of calories of a low-calorie meal is at most 300.
H0: μ = 300 calories
Ha: μ < 300 calories

Example 3: The school, at a 99% confidence level, claims that the mean score in Math of the Second Year Medtech students is 8.1. The teacher wishes to find out if the claim is true. She tests if there is a significant difference between the batch mean score and the mean score of students in her class.

Null and Alternative Hypotheses in Words:
H0: The mean score in Math of the Second Year Medtech students is 8.1.
Ha: The mean score in Math of the Second Year Medtech students is not 8.1.
Null and Alternative Hypotheses in Symbols:
H0: μ = 8.1
Ha: μ ≠ 8.1

Example 4: A social worker wants to test (at α = 0.05) whether the average Body Mass Index (BMI) of the students under the feeding program is different from 18.2 kg.

Null and Alternative Hypotheses in Words:
H0: The average BMI of the students under the feeding program is not different from 18.2 kg.
Ha: The average BMI of the students under the feeding program is different from 18.2 kg.
Null and Alternative Hypotheses in Symbols:
H0: μ = 18.2
Ha: μ ≠ 18.2

Example 5: In 2015, it was recorded that around 34% of the population were not married. A researcher surveyed a random sample of 500 couples. He found out that 18% of them were living together but unmarried. Test at the 5% significance level if the current percentage of unmarried couples is different from 34%.

H0: The current percentage of unmarried couples is 34%. (or: The current percentage of unmarried couples is not different from 34%.)
Ha: The current percentage of unmarried couples is different from 34%.
or
H0: μ = 34%
Ha: μ ≠ 34%

Example 6: A chemist invented an additive to increase the lifespan of a rechargeable battery. The said additive will extend the battery's lifespan to 48 months.

H0: The average lifespan extension of the rechargeable battery is 48 months.
Ha: The average lifespan extension of the rechargeable battery is not 48 months.
or
H0: μ = 48 months
Ha: μ ≠ 48 months, or μ < 48 months, or μ > 48 months

Example 7: We want to test whether the mean height of Second Year Medtech students is 60 inches.

H0: The mean height of Second Year Medtech students is 60 inches.
Ha: The mean height of Second Year Medtech students is not (or less than, or higher than) 60 inches.
or
H0: μ = 60 inches
Ha: μ ≠ 60 inches, or μ < 60 inches, or μ > 60 inches

BENEFITS OF HYPOTHESIS TESTING
• Helps assess the accuracy of new ideas or theories by testing them against data.

• Allows researchers to determine whether the evidence supports their hypothesis, helping to avoid false claims and conclusions.
• Provides a framework for decision-making based on data rather than personal beliefs or biases.
• Helps to reduce the effects of chance and confounding variables, providing a robust framework for making informed conclusions.

LIMITATIONS OF HYPOTHESIS TESTING
• Relies exclusively on data and doesn't provide a comprehensive understanding of the subject being studied.
• The accuracy of the results depends on the quality of the available data and the statistical methods used.
• Inaccurate data or inappropriate hypothesis formulation may lead to incorrect conclusions or failed tests.
• Can lead to errors, such as an analyst accepting or rejecting a null hypothesis when they shouldn't have, which can result in false conclusions or missed opportunities to identify significant patterns or relationships in the data.

LEVEL OF SIGNIFICANCE
Level of significance means how sure a researcher is that the results found are not accidental (not by chance).

A level of significance of p = 0.05 means that there is a 95% probability that the results found in the study are the result of a true relationship/difference between the groups being compared. It also means that there is a 5% chance that the results were found by chance alone and no true relationship exists between the groups.

For example, a level of significance of 0.05 means that there is a 5% chance that the result is insignificant, or that it just happened by chance alone.

Significant difference means that the difference between the groups could not have happened by chance. It can happen that a difference is seen between two groups; however, if the difference is significant, then the researchers accept that it was caused by the specific experimental intervention used during the experiment.

Note: The lower the level of significance, the less likely the result was found by chance; or, to state it another way, the more likely the researchers are correct in stating that the given intervention is effective.

In summary, level of significance:
• It is denoted by alpha (α), which refers to the degree of significance at which we accept or reject the null hypothesis.
• 100% accuracy is not possible in accepting or rejecting a hypothesis.
• It is the probability of making the wrong decision when the null hypothesis is true.
• In public health research, α is usually 0.01 or 1%. In social science research, α is usually 0.05 or 5%, and 0.10 or 10% in other studies.
• This implies that there is a 1%, 5%, or 10% probability of rejecting a true null hypothesis.

Level of Significance in Symbols:
α = 0.01, α = 0.05, or α = 0.10
Note: If the alternative hypothesis used is ≠, then the alpha level will be divided by 2:
α/2 = 0.005, α/2 = 0.025, or α/2 = 0.05

Level of Significance (α) | One-Tailed       | Two-Tailed
0.05                      | +1.645 or −1.645 | ±1.96
0.01                      | +2.33 or −2.33   | ±2.58
0.001                     | +3.09 or −3.09   | ±3.30

Example 1: Maria uses a 5% level of significance in proving that there is no significant change in the average number of enrollees in the 10 sections for the last two years. Find the level of significance.
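The α bookkeeping used in these examples follows a simple rule: α = 1 − confidence level, halved when the alternative hypothesis is two-sided (≠). A minimal sketch in Python (the function name is mine, not from the lecture):

```python
def significance_level(confidence: float, two_tailed: bool = False) -> float:
    """Return alpha (or alpha/2 for a two-sided Ha) from a confidence level."""
    alpha = round(1.0 - confidence, 10)  # round off floating-point dust
    return alpha / 2 if two_tailed else alpha

print(significance_level(0.95))                   # 5% level -> alpha = 0.05
print(significance_level(0.95, two_tailed=True))  # two-sided Ha -> alpha/2 = 0.025
print(significance_level(0.93))                   # 93% confidence -> alpha = 0.07
```

The same helper reproduces the answers to the worked α examples that follow (0.05, 0.025, 0.10, and 0.07).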

Answer: α = 0.05. It means that the chance that the null hypothesis (H0) would be rejected when it is true is 5%. (Or: α = 0.05 is the area under the normal curve within the rejection region.)

Example 2: In 2015, it was recorded that around 34% of the population were not married. A researcher surveyed a random sample of 500 couples. He found out that 18% of them were living together but unmarried. Test at the 5% significance level if the current percentage of unmarried couples is different from 34%.
Answer: α/2 = 0.025

Example 3: The average construction worker's hourly rate pay in the Philippines is Php 62.50 with a standard deviation of Php 6.01. A random sample of 20 manufacturing workers were asked their hourly rate, and it was found that they had an average hourly rate pay of Php 50.00 with a standard deviation of Php 5.00. Construct a 90% confidence interval that the average hourly rate for construction workers is higher than the average hourly rate for manufacturing workers.
Answer: α = 0.10

Example 4: A sample of 100 private school students were surveyed on whether they planned to transfer to public school in the incoming school year. Results showed that 40% of these students will transfer to public school. Test that the average number of private school students that will transfer to public school is not 40%, at the 5% level of significance.
Answer: α/2 = 0.025

Example 5: The average number of years to finish basic education is 14. A sample of 30 senior high school students were asked, and it was found that the mean number of years to finish their basic education is 12 with a standard deviation of 2 years. Test the hypothesis at a 93% confidence level that the average number of years to finish basic education is less than 14 years.
Answer: α = 0.07

CALCULATED AND CRITICAL VALUE

Critical Value
 It can be defined as a value that is compared to a test statistic in hypothesis testing to determine whether the null hypothesis is to be rejected or not.
 It is the cut-off value that marks the start of a region where the test statistic, obtained in hypothesis testing, is unlikely to fall. In hypothesis testing, the critical value is compared with the obtained test statistic to determine whether the null hypothesis has to be rejected or not.

Interpretation:
▪ If the value of the test statistic is less extreme than the critical value, then the null hypothesis cannot be rejected.
▪ If the test statistic is more extreme than the critical value, the null hypothesis is rejected and the alternative hypothesis is accepted.
▪ In other words, the critical value divides the distribution graph into the acceptance region and the rejection region. If the value of the test statistic falls in the rejection region, then the null hypothesis is rejected; otherwise it cannot be rejected.

Areas
Rejection Region: also known as the critical region, is the set of all values of the test statistic that cause us to reject the null hypothesis.
Non-rejection Region: also known as the acceptance region, is the set of all values of the test statistic that cause us to fail to reject the null hypothesis.
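The interpretation bullets above amount to a mechanical decision rule. A minimal sketch (a helper of my own, not from the lecture), assuming the critical value has already been looked up:

```python
def decide(test_stat: float, critical: float, tails: str = "two") -> str:
    """Classify a test statistic against a critical value.

    tails: "right" (reject if stat > critical), "left" (reject if stat < critical),
    or "two" (reject if |stat| exceeds |critical|).
    """
    if tails == "right":
        rejected = test_stat > critical
    elif tails == "left":
        rejected = test_stat < critical
    else:  # two-tailed
        rejected = abs(test_stat) > abs(critical)
    return "reject H0" if rejected else "fail to reject H0"

print(decide(1.593, 1.833, "right"))  # fail to reject H0
print(decide(2.715, 1.96, "two"))     # reject H0
```

The same function covers the worked region examples later in these notes; only the test statistic and critical value change.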

STEPS TO DETERMINE THE CRITICAL VALUE
• Step 1: Subtract the confidence level from 100%. E.g., 100% − 95% = 5%.
• Step 2: Convert this value to decimals to get α. Thus, α = 0.05.
• Step 3: If it is a one-tailed test, the alpha level will be the same value as in Step 2. However, if it is a two-tailed test, the alpha level will be divided by 2.
• Step 4: Depending on the type of test conducted, the critical value can be looked up from the corresponding distribution table using the alpha value.

T-CRITICAL VALUE
A t-test is used when the population standard deviation is not known and the sample size is less than 30. A t-test is conducted when the population data follows a Student's t distribution.

STEPS TO CALCULATE THE T-CRITICAL VALUE
• Determine the alpha level.
• Subtract 1 from the sample size. This gives the degrees of freedom (df).
• If the hypothesis test is one-tailed, use the one-tailed t distribution table. Otherwise, use the two-tailed t distribution table for a two-tailed test.
• Match the corresponding df value (left side) and the alpha value (top row) of the table. The intersection of this row and column gives the t critical value.

Test Statistic for a 1-Sample t-test:
t = (x̄ − μ) / (s / √n)
Where:
x̄ is the sample mean
μ is the population mean
s is the sample standard deviation
n is the size of the sample

Test Statistic for a 2-Sample t-test: (formula not captured in the transcription)

Decision rules:
 For a one-tailed (directional) test:
- Reject the null hypothesis if the test statistic > t-critical value (right-tailed hypothesis test).
- Reject the null hypothesis if the test statistic < t-critical value (left-tailed hypothesis test).
 For a two-tailed (non-directional) test:
- Reject the null hypothesis if the test statistic does not lie in the acceptance region.

The same decision criterion is used for all tests. Only the test statistic and the critical value change.

Z-CRITICAL VALUE
A z test is conducted on a normal distribution when the population standard deviation is known and the sample size is greater than or equal to 30.

STEPS ON CALCULATING THE Z-CRITICAL VALUE
• Find the alpha level.
• Subtract the alpha level from 1 for a two-tailed test. For a one-tailed test, subtract the alpha level from 0.5.
• Look up the area in the z distribution table to obtain the z critical value. For a left-tailed test, a negative sign needs to be added to the critical value at the end of the calculation.

Test Statistic for a one-sample z-test:
z = (x̄ − μ) / (σ / √n), where σ is the population standard deviation.
Test Statistic for a two-sample z-test: (formula not captured in the transcription; it uses the variance of the first sample and the variance of the second sample.)
Test Statistic for large samples and for small samples: (formulas not captured in the transcription.)

F-CRITICAL VALUE
The F test is largely used to compare the variances of two samples. The test statistic so obtained is also used for regression analysis.

STEPS ON CALCULATING THE F-CRITICAL VALUE
• Find the alpha level.
• Subtract 1 from the size of the first sample. This gives the first degrees of freedom, say x.
• Similarly, subtract 1 from the second sample size to get the second df, say y.
• Using the F distribution table, the intersection of the x column and the y row gives the F critical value.

CHI-SQUARE CRITICAL VALUE
The chi-square test is used to check if the sample data matches the population data. It can also be used to compare two variables to see if they are related.

THE CHI-SQUARE CRITICAL VALUE IS OBTAINED AS FOLLOWS:
• Identify the alpha level.
• Subtract 1 from the sample size to determine the degrees of freedom (df).
• Using the chi-square distribution table, the intersection of the row of the df and the column of the alpha value yields the chi-square critical value.

Test statistic for the chi-square test: (formula not captured in the transcription.)

CRITICAL VALUE CALCULATION
Example 1: Suppose a right-tailed z test is being conducted. The critical value needs to be calculated for a 0.0079 alpha level. The steps are as follows:
• Subtract the alpha level from 0.5. Thus, 0.5 − 0.0079 = 0.4921.
• Using the z distribution table, find the area closest to 0.4921. The closest area is 0.4922. As this value is at the intersection of 2.4 and 0.02, the z critical value = 2.42.

ILLUSTRATING THE REJECTION REGION
Example 1: According to a survey, the average daily usage of social media worldwide of global internet users amounts to 142 minutes per day. Sofia conducts her own survey among her 10 friends to find out if their time spent on social media is significantly higher than the global survey.
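The z-table lookup steps above can be reproduced with the standard normal inverse CDF from Python's standard library (a sketch; the exact inverse gives about 2.41 for Example 1, while the printed table, which snaps to the nearest tabulated area, gives 2.42):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Right-tailed z critical value for alpha = 0.0079 (Example 1 above):
z_star = std_normal.inv_cdf(1 - 0.0079)
print(round(z_star, 2))  # ~2.41 (the z table rounds this to 2.42)

# The familiar entries of the critical-value table:
print(round(std_normal.inv_cdf(1 - 0.05), 3))    # one-tailed 0.05 -> 1.645
print(round(std_normal.inv_cdf(1 - 0.05/2), 2))  # two-tailed 0.05 -> 1.96
```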

Friend's Name | Minutes per Day Spent on Social Media
Allen   | 132
Bryan   | 148
Ellen   | 165
Jake    | 157
Mindie  | 120
Shamsie | 144
Candice | 136
Dory    | 160
Mitch   | 185
Mila    | 173

Formulating Hypotheses:
Null Hypothesis (H0): The average daily usage of social media of her friends is the same as the global usage.
Alternative Hypothesis (Ha): The average daily usage of social media of her friends is higher than the global usage.

Given: μ = 142, n = 10, x̄ = 152, s = 19.855
Asked: t = ?
Solution:
t = (152 − 142) / (19.855 / √10) = 1.593

Determine the critical value. Use df = n − 1 = 9, one-tailed test at the 5% level of significance. To find the critical value, use the t distribution table (not attached).
The computed t-value = 1.593
The critical t-value = 1.833

Important Note:
- If the computed value is less than the critical value, the computed value is in the non-rejection area (or: the computed value is in the acceptance region); thus, we fail to reject the null hypothesis (or: we accept the null hypothesis).
- If the computed value is greater than the critical value, the computed value is in the rejection area; thus, we reject the null hypothesis.

Interpretation of Data:
H0: The average daily usage of social media of her friends is the same as the global usage.
Ha: The average daily usage of social media of her friends is higher than the global usage.
Since the computed value is less than the critical value, the computed t-value is in the non-rejection region. We fail to reject the null hypothesis, H0 (or: we accept the null hypothesis).

Example 2: A medical trial is conducted to test whether a certain drug reduces cholesterol level. Upon trial, the computed z-value of 2.715 lies in the rejection area.
Null Hypothesis (H0): The certain drug is effective in reducing cholesterol level by 60%.
Alternative Hypothesis (Ha): The certain drug is not effective in reducing cholesterol level by 60%.
The computed z-value is greater than the critical value.
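Example 1 can be checked end to end with Python's standard library (a sketch; `statistics.stdev` computes the sample standard deviation used in the lecture):

```python
import math
import statistics

minutes = [132, 148, 165, 157, 120, 144, 136, 160, 185, 173]
mu = 142            # global average (population mean under H0)
n = len(minutes)

x_bar = statistics.mean(minutes)       # 152
s = statistics.stdev(minutes)          # ~19.855 (sample standard deviation)
t = (x_bar - mu) / (s / math.sqrt(n))  # ~1.593

t_critical = 1.833  # one-tailed, alpha = 0.05, df = 9 (from a t table)
print(round(t, 3), "reject H0" if t > t_critical else "fail to reject H0")
```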

Interpretation of Data:
H0: The certain drug is effective in reducing cholesterol level by 60%.
Ha: The certain drug is not effective in reducing cholesterol level by 60%.
Since the computed z-value lies in the rejection area, we reject the null hypothesis, H0, in favor of the alternative hypothesis, Ha (or: we accept the alternative hypothesis).

Example 3: Sketch the rejection region of the test hypothesis with critical values of ±1.753 and determine if the computed t-value of −1.52 lies in that region.
Answer: Acceptance (non-rejection) region.

Important Note:
 If the computed value is in the non-rejection area, we fail to reject the null hypothesis, H0 (or: we accept the null hypothesis, H0).
 If the computed value is in the rejection area, we reject the null hypothesis, H0.

Example 4: Given a critical t-value of 1.318 and a computed t-value of 1.1, the computed t-value is at the _______________ region.
Answer: Acceptance

Example 5: Given a critical t-value of −1.671 and a computed t-value of −2.45, the computed t-value is at the _______________ region.
Answer: Rejection

Example 6: Given a critical t-value of ±1.311 and a computed t-value of −1.134, the computed t-value is at the _______________ region.
Interpretation of Data: The computed t-value is at the non-rejection area. Therefore, we fail to reject the null hypothesis, H0 (or: we accept the null hypothesis, H0).

STATISTICAL TOOLS

Correlation
A correlation is a statistical measure of the relationship between two variables. The measure is best used on variables that demonstrate a linear relationship with each other. The fit of the data can be visually represented in a scatterplot. Using a scatterplot, we can generally assess the relationship between the variables and determine whether they are correlated or not.

Formula:
rxy = Σ(xi − x̄)(yi − ȳ) / √[ Σ(xi − x̄)² · Σ(yi − ȳ)² ]
Where:
• rxy – the correlation coefficient of the linear relationship between the variables x and y
• xi – the values of the x-variable in a sample
• x̄ – the mean of the values of the x-variable
• yi – the values of the y-variable in a sample
• ȳ – the mean of the values of the y-variable

STEPS TO CALCULATE THE CORRELATION COEFFICIENT
1. Obtain a data sample with the values of the x-variable and the y-variable.
2. Calculate the means (averages) x̄ for the x-variable and ȳ for the y-variable.
3. For the x-variable, subtract the mean from each value of the x-variable (let's call this new variable "a"). Do the same for the y-variable (let's call this variable "b").
4. Multiply each a-value by the corresponding b-value and find the sum of these products (this sum is the numerator in the formula).
5. Square each a-value and each b-value, and calculate the sum of each set of squares.
6. Multiply the two sums from step 5 and find the square root of the result (this is the denominator in the formula).
7. Divide the value obtained in step 4 by the value obtained in step 6.

Example: John is an investor. His portfolio primarily tracks the performance of the S&P 500, and John wants to add the stock of Apple Inc. Before adding Apple to his portfolio, he wants to assess the correlation between the stock and the S&P 500 to ensure that adding the stock won't increase the systematic risk of his portfolio.
Step 1: Gather the prices for the last five years.
Step 2: Find the means of the S&P 500 and Apple prices.
Step 3: Subtract each price from its mean to obtain the deviations a and b.
Step 4: Multiply the values of a and b and sum the products.
Step 5: Square and sum a and b, multiply the sums, and take the square root.

Calculate the coefficient: (the computation table is not captured in the transcription)

Interpretation: The coefficient indicates that the prices of the S&P 500 and Apple Inc. have a high positive correlation. This means that their respective prices tend to move in the same direction. Therefore, adding Apple to his portfolio would, in fact, increase the level of systematic risk.

Pearson R Correlation
The Pearson correlation coefficient (r) is the most common way of measuring a linear correlation. It is a number between −1 and 1 that measures the strength and direction of the relationship between two variables.

What is the Pearson correlation coefficient?
The Pearson correlation coefficient (r) is the most widely used correlation coefficient and is known by many names:
• Pearson's r
• Bivariate correlation
• Pearson product-moment correlation coefficient (PPMCC)
• The correlation coefficient

The Pearson correlation coefficient is a descriptive statistic, meaning that it summarizes the characteristics of a dataset. Specifically, it describes the strength and direction of the linear relationship between two quantitative variables.
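The seven steps above translate directly into code. A minimal sketch (the function name is mine), shown on a tiny made-up dataset rather than John's price data, which is not reproduced in these notes:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation via the deviation-product steps described above."""
    n = len(xs)
    x_bar = sum(xs) / n                       # step 2: means
    y_bar = sum(ys) / n
    a = [x - x_bar for x in xs]               # step 3: deviations
    b = [y - y_bar for y in ys]
    numerator = sum(ai * bi for ai, bi in zip(a, b))  # step 4
    denominator = math.sqrt(                  # steps 5-6
        sum(ai ** 2 for ai in a) * sum(bi ** 2 for bi in b))
    return numerator / denominator            # step 7

# Perfectly correlated toy data gives r = 1; reversing one variable gives r = -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```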

Although interpretations of the relationship strength (also known as effect size) vary between disciplines, general rules of thumb exist. (The reference table is not captured in the transcription.)

The Pearson correlation coefficient is also an inferential statistic, meaning that it can be used to test statistical hypotheses. Specifically, we can test whether there is a significant relationship between two variables.

Visualizing the Pearson correlation coefficient
Another way to think of the Pearson correlation coefficient (r) is as a measure of how close the observations are to a line of best fit.

The Pearson correlation coefficient also tells you whether the slope of the line of best fit is negative or positive. When the slope is negative, r is negative. When the slope is positive, r is positive.
• When r is 1 or −1, all the points fall exactly on the line of best fit.
• When r is greater than .5 or less than −.5, the points are close to the line of best fit.
• When r is between 0 and .3 or between 0 and −.3, the points are far from the line of best fit.
• When r is 0, a line of best fit is not helpful in describing the relationship between the variables.

When to use the Pearson correlation coefficient
The Pearson correlation coefficient is a good choice when all of the following are true:
• Both variables are quantitative: You will need to use a different method if either of the variables is qualitative.
• The variables are normally distributed: You can create a histogram of each variable to verify whether the distributions are approximately normal. It's not a problem if the variables are a little non-normal.
• The data have no outliers: Outliers are observations that don't follow the same patterns as the rest of the data. A scatterplot is one way to check for outliers: look for points that are far away from the others.
• The relationship is linear: "Linear" means that the relationship between the two variables can be described reasonably well by a straight line. You can use a scatterplot to check whether the relationship between two variables is linear.
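The rough bands above can be sketched as a small labelling helper. This is my own mapping of those rules of thumb (the exact cut-offs vary between disciplines, as noted):

```python
def strength(r: float) -> str:
    """Rough rule-of-thumb label for a correlation coefficient r in [-1, 1]."""
    a = abs(r)
    if a >= 0.5:
        return "strong"    # points close to the line of best fit
    if a >= 0.3:
        return "moderate"
    return "weak"          # points far from the line of best fit

print(strength(0.47))   # moderate
print(strength(-0.82))  # strong
print(strength(0.12))   # weak
```

This is consistent with the newborn example later in the notes, where r = .47 is described as a moderate correlation.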

Calculating the Pearson correlation coefficient
Below is a formula for calculating the Pearson correlation coefficient (r) from raw sums:

r = [ nΣxy − (Σx)(Σy) ] / √[ (nΣx² − (Σx)²)(nΣy² − (Σy)²) ]

Example: Dataset
Imagine that you're studying the relationship between newborns' weight and length. You have the weights and lengths of the 10 babies born last month at your local hospital. After you convert the imperial measurements to metric, you enter the data in a table:
Weight = x: 3.63, 3.02, 3.82, 3.42, 3.59, 2.87, 3.03, 3.46, 3.36, 3.30
Length = y: 53.1, 49.7, 48.4, 54.2, 54.9, 43.7, 47.2, 45.2, 54.4, 50.4
n = 10

Step 1: Calculate the sums of x and y
- Start by renaming the variables to "x" and "y." It doesn't matter which variable is called x and which is called y; the formula will give the same answer either way.
- Next, add up the values of x and y. (In the formula, this step is indicated by the Σ symbol, which means "take the sum of".)
Σx = 3.63 + 3.02 + 3.82 + 3.42 + 3.59 + 2.87 + 3.03 + 3.46 + 3.36 + 3.30 = 33.5
Σy = 53.1 + 49.7 + 48.4 + 54.2 + 54.9 + 43.7 + 47.2 + 45.2 + 54.4 + 50.4 = 501.2

Step 2: Calculate x² and y² and their sums
Create two new columns that contain the squares of x and y. Take the sums of the new columns. (The computed columns are not captured in the transcription.)

Step 3: Calculate the cross product and its sum
In a final column, multiply together x and y (this is called the cross product). Take the sum of the new column.

Step 4: Calculate r
Use the formula and the numbers you calculated in the previous steps to find r.
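Steps 1-4 can be carried out in a few lines of Python on the same dataset (a sketch; the x and y values are the ones summed in Step 1):

```python
import math

x = [3.63, 3.02, 3.82, 3.42, 3.59, 2.87, 3.03, 3.46, 3.36, 3.30]  # weight (kg)
y = [53.1, 49.7, 48.4, 54.2, 54.9, 43.7, 47.2, 45.2, 54.4, 50.4]  # length (cm)
n = len(x)

sum_x, sum_y = sum(x), sum(y)              # Step 1: 33.5 and 501.2
sum_x2 = sum(v ** 2 for v in x)            # Step 2: sums of squares
sum_y2 = sum(v ** 2 for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))  # Step 3: cross product sum

# Step 4: plug the sums into the formula for r
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(round(r, 2))  # 0.47
```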

Testing for the significance of the Pearson correlation coefficient
The Pearson correlation coefficient can also be used to test whether the relationship between two variables is significant.

The Pearson correlation of the sample is r. It is an estimate of rho (ρ), the Pearson correlation of the population. Knowing r and n (the sample size), we can infer whether ρ is significantly different from 0.
• Null hypothesis (H0): ρ = 0
• Alternative hypothesis (Ha): ρ ≠ 0

To test the hypotheses, you can either use software like R or Stata, or you can follow the steps below.

Step 1: Calculate the t value
Calculate the t value (a test statistic) using this formula:
t = r√(n − 2) / √(1 − r²)

Step 2: Find the critical value of t
You can find the critical value of t (t*) in a t table. To use the table, you need to know three things:
• The degrees of freedom (df): For Pearson correlation tests, the formula is df = n − 2.
• Significance level (α): By convention, the significance level is usually .05.
• One-tailed or two-tailed: Most often, two-tailed is an appropriate choice for correlations.

Example: Finding the critical value of t
For a two-tailed test of significance at α = .05 and df = 8, the critical value of t (t*) is 2.306.

Step 3: Compare the t value to the critical value
Determine whether the absolute t value is greater than the critical value of t. "Absolute" means that if the t value is negative, you should ignore the minus sign.

Example: Comparing the t value to the critical value of t (t*)
t = 1.506
t* = 2.306
The t value is less than the critical value of t.

Step 4: Decide whether to reject the null hypothesis
• If the t value is greater than the critical value, then the relationship is statistically significant (p < α). The data allow you to reject the null hypothesis and provide support for the alternative hypothesis.
• If the t value is less than the critical value, then the relationship is not statistically significant (p > α). The data don't allow you to reject the null hypothesis and don't provide support for the alternative hypothesis.

Example: Deciding whether to reject the null hypothesis
For the correlation between weight and length in a sample of 10 newborns, the t value is less than the critical value of t. Therefore, we don't reject the null hypothesis that the Pearson correlation coefficient of the population (ρ) is 0. There is no significant relationship between weight and length (p > .05). (Note that a sample size of 10 is very small. It's possible that you would find a significant relationship if you increased the sample size.)

Reporting the Pearson correlation coefficient
If you decide to include a Pearson correlation (r) in your paper or thesis, you should report it in your results section. You can follow these rules if you want to report statistics in APA Style:
• You don't need to provide a reference or formula since the Pearson correlation coefficient is a commonly used statistic.
• You should italicize r when reporting its value.
• You shouldn't include a leading zero (a zero before the decimal point) since the Pearson correlation coefficient can't be greater than one or less than negative one.
• You should provide two significant digits after the decimal point.

When Pearson's correlation coefficient is used as an inferential statistic (to test whether the relationship is significant), r is reported alongside its degrees of freedom and p value. The degrees of freedom are reported in parentheses beside r.

Example: Reporting the Pearson correlation coefficient in APA Style
Newborns' weight and length were moderately correlated, although the relationship was not statistically significant, r(8) = .47, p > .17.

Spearman Rank Order
The Spearman rank order correlation coefficient, in short Spearman's rank correlation, measures the strength and direction of association between two ranked variables. It basically gives the measure of monotonicity of the relation between two variables, i.e., how well the relationship between two variables could be represented using a monotonic function.

The formula for Spearman's rank coefficient is:
ρ = 1 − (6Σdi²) / (n(n² − 1))
Where:
ρ = Spearman's rank correlation coefficient
di = difference between the two ranks of each observation
n = number of observations

The Spearman rank correlation can take a value from +1 to −1, where:
• A value of +1 means a perfect association of ranks
• A value of 0 means that there is no association between ranks
• A value of −1 means a perfect negative association of ranks

Example
Consider the scores of 5 students in Math and Science mentioned in the table. (The score table is not captured in the transcription.)
Step 1: Create a table for the given data.
Step 2: Rank both sets of data in descending order. The highest marks will get a rank of 1 and the lowest marks will get a rank of 5.
Step 3: Calculate the difference between the ranks (d) and the square of each d.
Step 4: Add all the d² values. (Here, Σd² = 14.)
Step 5: Insert these values into the formula.
ρ = 1 − (6 × 14) / (5(25 − 1)) = 0.3

Interpretation: The Spearman's rank correlation for the given data is 0.3. The value is near 0, which means that there is a weak correlation between the two ranks.

AE | 2D 13
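The Spearman calculation above is simple enough to sketch in a few lines of Python. The marks table is not reproduced in the transcription, so the two rank columns below are hypothetical, chosen so that the sum of squared rank differences is 14 with n = 5, as in the worked example:

```python
# Spearman's rank correlation: rs = 1 - (6 * sum(d_i^2)) / (n * (n^2 - 1))

def spearman_rho(ranks_x, ranks_y):
    n = len(ranks_x)
    # Steps 3-4: difference between the ranks, squared, then summed
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    # Step 5: insert the values into the formula
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

math_ranks = [1, 2, 3, 4, 5]      # hypothetical ranks in Math
science_ranks = [3, 1, 5, 2, 4]   # hypothetical ranks in Science (sum of d^2 = 14)

rho = spearman_rho(math_ranks, science_ranks)
print(round(rho, 2))  # 0.3, matching the worked example
```

Identical rank orderings give rs = 1, and reversed orderings give rs = -1, matching the interpretation scale above.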
PHI COEFFICIENT
A Phi Coefficient (sometimes called a Mean Square Contingency Coefficient) is a measure of the association between two binary variables.
For a given 2×2 table for two random variables x and y:

The Phi Coefficient can be calculated as:

Φ = (AD - BC) / √((A+B)(C+D)(A+C)(B+D))

Example.
Suppose we want to know whether or not gender is associated with political party preference. We take a simple random sample of 25 voters and survey them on their political party preference. The following table shows the results of the survey:

We can calculate the Phi Coefficient between the two variables as:

Φ = (4*4 - 9*8) / √((4+9)(8+4)(4+8)(9+4)) = (16 - 72) / √24336 = -0.3589

How to Interpret a Phi Coefficient
Similar to a Pearson Correlation Coefficient, a Phi Coefficient takes on values between -1 and 1 where:
• -1 indicates a perfectly negative relationship between the two variables.
• 0 indicates no association between the two variables.
• 1 indicates a perfectly positive relationship between the two variables.

In general, the further away a Phi Coefficient is from zero, the stronger the relationship between the two variables. In other words, the further away a Phi Coefficient is from zero, the more evidence there is for some type of systematic pattern between the two variables.

KENDALL RANK CORRELATION COEFFICIENT
The Kendall Rank Correlation Coefficient, also known as Kendall's Tau, is a non-parametric measure of relationships between columns of ranked data. The Tau correlation coefficient returns a value of 0 to 1, where:
• 0 is no relationship,
• 1 is a perfect relationship.

A quirk of this test is that it can also produce negative values (i.e. from -1 to 0). Unlike a linear graph, a negative relationship doesn't mean much with ranked columns (other than that you perhaps switched the columns around), so just remove the negative sign when you're interpreting Tau.

Several versions of Tau exist.
• Tau-A and Tau-B are usually used for square tables (with equal columns and rows). Tau-B will adjust for tied ranks.
• Tau-C is usually used for rectangular tables. For square tables, Tau-B and Tau-C are essentially the same.

Formula:
Kendall's Tau = (C - D) / (C + D)
Where: C is the number of concordant pairs and D is the number of discordant pairs.

Example.
Two interviewers ranked 12 candidates (A through L) for a position. The results from most preferred to least preferred are:
• Interviewer 1: ABCDEFGHIJKL.
• Interviewer 2: ABDCFEHGJILK.

Calculate the Kendall Tau Correlation.
Step 1: Make a table of rankings. The first column, "Candidate", is optional and for reference only. The rankings for Interviewer 1 should be in ascending order (from least to greatest).
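The Phi Coefficient calculation above can be checked with a short Python sketch; the cell counts A = 4, B = 9, C = 8, D = 4 are the ones from the voter survey:

```python
from math import sqrt

# Phi = (AD - BC) / sqrt((A+B)(C+D)(A+C)(B+D)) for a 2x2 table
def phi_coefficient(a, b, c, d):
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Cell counts from the gender vs. party-preference example
phi = phi_coefficient(4, 9, 8, 4)
print(round(phi, 4))  # -0.359
```

The negative sign indicates the direction of the association; its distance from zero (about 0.36) is what indicates the strength.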
Step 2: Count the number of concordant pairs, using the second column. Concordant pairs are how many larger ranks are below a certain rank. For example, the first rank in the second interviewer's column is a "1", so all 11 ranks below it are larger.

However, going down the list to the third row (a rank of 4), the rank immediately below (3) is smaller, so it doesn't count for a concordant pair.

When all concordant pairs have been counted, it looks like this:

Step 3: Count the number of discordant pairs and insert them into the next column. The number of discordant pairs is similar to Step 2, only you're looking for smaller ranks, not larger ones.

Step 4: Sum the values in the two columns:

Step 5: Insert the totals into the formula:

Kendall's Tau = (C - D) / (C + D) = (61 - 5) / (61 + 5) = 56 / 66 = .85.

Interpretation: The Tau coefficient is .85, suggesting a strong relationship between the rankings.

Perfect Correlation
Counting how many values are below the second column seems very odd when you first do it. But it does work. Just as a thought experiment, here's what the spreadsheet looks like if both interviewers were in perfect agreement:
And, inserting the totals into the formula we get:
Tau = (66 - 0) / (66 + 0) = 1, which is (as we expect) perfect agreement.

Calculating Statistical Significance
If you want to calculate statistical significance for your result, use this formula to get a z-value:

z = 3τ√(n(n - 1)) / √(2(2n + 5))

Inserting the values from our results:

z = 3(.85)√(12(12 - 1)) / √(2(2(12) + 5)) ≈ 3.85

Interpretation: Finding the area for a z-score of 3.85 on a z-table gives an area of .0001 — a tiny probability value which tells you this result is statistically significant.

T-Test
A t test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another.

The t test tells you how significant the differences between group means are. It lets you know if those differences in means could have happened by chance. The t test is usually used when data sets follow a normal distribution but you don't know the population variance.

For example, you might flip a coin 1,000 times and find the number of heads follows a normal distribution for all trials. So you can calculate the sample variance from this data, but the population variance is unknown. Or, a drug company may want to test a new cancer drug to find out if it improves life expectancy. In an experiment, there's always a control group (a group who are given a placebo, or "sugar pill"). So while the control group may show an average life expectancy of +5 years, the group taking the new drug might have a life expectancy of +6 years. It would seem that the drug might work. But it could be due to a fluke. To test this, researchers would use a Student's t-test to find out if the results are repeatable for an entire population.

In addition, a t test uses a t-statistic and compares this to t-distribution values to determine if the results are statistically significant. However, note that you can only use a t test to compare two means. If you want to compare three or more means, use an ANOVA instead.

The T Score
The t score is a ratio between the difference between two groups and the difference within the groups.
- Larger t scores = more difference between groups.
- Smaller t scores = more similarity between groups.
A t score of 3 tells you that the groups are three times as different from each other as they are within each other. So when you run a t test, bigger t-values equal a greater probability that the results are repeatable.

T-Values and P-values
How big is "big enough"? Every t-value has a p-value to go with it. A p-value from a t test is the probability that the results from your sample data occurred by chance. P-values are from 0% to 100% and are usually written as a decimal (for example, a p value of 5% is 0.05). Low p-values indicate your data did not occur by chance. For example, a p-value of .01 means there is only a 1% probability that the results from an experiment happened by chance.

Calculating the Statistic / Test Types
There are three main types of t-test:
• An Independent Samples t-test compares the means for two groups.
• A Paired sample t-test compares means from the same group at different times (say, one year apart).
• A One sample t-test tests the mean of a single group against a known mean.

Paired T Test (Paired Samples T Test or Dependent Samples T Test)
A paired t test (also called a correlated pairs t-test, a paired samples t test or dependent samples t test) is where you run a t test on dependent samples. Dependent samples are essentially connected — they are tests on the same person or thing. For example:
• Knee MRI costs at two different hospitals,
• Two tests on the same person before and after training,
• Two blood pressure measurements on the same person using different equipment.

When to Choose a Paired T Test / Paired Samples T Test / Dependent Samples T Test
Choose the paired t-test if you have two measurements on the same item, person or thing. But you should also choose this test if you have two items that are being measured with a unique condition. For example, you might be measuring car safety performance in vehicle research and testing and subject the cars to a series of crash tests. Although the manufacturers are different, you might be subjecting them to the same conditions.

Example.
Calculate a paired t test by hand for the following data:

Step 1: Subtract each Y score from each X score.

Step 2: Add up all of the values from Step 1 then set this number aside for a moment.

Step 3: Square the differences from Step 1.

Step 4: Add up all of the squared differences from Step 3.
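Steps 1–4 above, together with the t-score formula in Step 5 below, can be sketched in Python. The X/Y data table is not reproduced in the transcription, so the scores here are hypothetical, chosen so that n = 11 and the result matches the t ≈ -2.74 and df = 10 used in Steps 6–8:

```python
from math import sqrt

# By-hand paired t test:
# t = (sum(D)/n) / sqrt((sum(D^2) - (sum(D))^2 / n) / ((n - 1) * n))
x = [3, 3, 3, 12, 15, 16, 17, 19, 23, 24, 32]     # hypothetical X scores
y = [20, 13, 13, 20, 29, 32, 23, 20, 25, 15, 30]  # hypothetical Y scores

n = len(x)
d = [xi - yi for xi, yi in zip(x, y)]      # Step 1: X - Y for each pair
sum_d = sum(d)                             # Step 2: sum of the differences
sum_d2 = sum(di ** 2 for di in d)          # Steps 3-4: squared differences, summed

t = (sum_d / n) / sqrt((sum_d2 - sum_d ** 2 / n) / ((n - 1) * n))  # Step 5
df = n - 1                                 # Step 6: degrees of freedom
print(df, round(t, 2))  # 10 -2.74
```

With df = 10 and α = 0.05, |t| = 2.74 exceeds the table value of 2.228, so the null hypothesis of no difference between means is rejected, as in Step 8.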
Step 5: Use the following formula to calculate the t-score:

t = (ΣD/n) / √[(ΣD² - (ΣD)²/n) / ((n - 1)(n))]

Where:
1. ΣD is the sum of X - Y from Step 2.
2. ΣD²: Sum of the squared differences (from Step 4).
3. (ΣD)²: Sum of the differences (from Step 2), squared.
n is the number of pairs.

If you're unfamiliar with the Σ notation used in the t test, it basically means to "add everything up". You may find this article useful: summation notation.

Step 6: Subtract 1 from the sample size to get the degrees of freedom. We have 11 items. So 11 - 1 = 10.

Step 7: Find the p-value in the t-table, using the degrees of freedom (df) in Step 6. But if you don't have a specified alpha level, use 0.05 (5%). So, for this example t test problem, with df = 10, the t-value is 2.228.

Step 8: In conclusion
Compare your t-table value from Step 7 (2.228) to your calculated t-value (-2.74). The calculated t-value is greater than the table value at an alpha level of .05. In addition, note that the p-value is less than the alpha level: p < .05. So we can reject the null hypothesis that there is no difference between means. However, note that you can ignore the minus sign when comparing the two t-values as ± indicates the direction; the p-value remains the same for both directions.

Independent T Test
The independent t-test, also called the two sample t-test, independent-samples t-test or Student's t-test, is an inferential statistical test that determines whether there is a statistically significant difference between the means in two unrelated groups.

The independent sample t-test is a statistical technique that is used to analyze the mean comparison of two independent groups. In the independent samples t-test, when we take two samples from the same population, the means of the two samples may be identical. But when samples are taken from two different populations, the means of the samples may differ. In this case, the test is used to draw conclusions about the means of two populations, and to tell whether or not they are similar.

Assumptions in independent samples t-test:
• Assumes that the dependent variable is normally distributed.
• Assumes that the variance of the two groups is the same for the dependent variable.
• Assumes that the two samples are independent of each other.
• Samples are drawn from the population at random.
• In the independent sample t-test, all observations must be independent of each other.
• In the independent sample t-test, dependent variables must be measured on an interval or ratio scale.

Procedures for independent sample t-test:
1. Set up the hypothesis.
a. Null Hypothesis: It is assumed that the means of the two groups are not significantly different.
b. Alternative Hypothesis: Assumes that the means of the two groups are significantly different.
2. Calculate the standard deviation for the independent sample t-test by using this formula:

s = √[((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2)]

3. Calculate the value of the independent sample t-test by using this formula:

t = (x̄1 - x̄2) / (s√(1/n1 + 1/n2))

4. Degree of freedom for independent sample t-test:

V = N1 + N2 - 2

where
V = degree of freedom
N1 + N2 = number of observations in both samples of the independent sample t-test.

5. Hypothesis testing: In hypothesis testing for the independent sample t-test, statistical decisions are made about whether or not the two population means are identical. Compare the calculated value of the independent sample t-test with the table value of the sample t-test. If the calculated value of the independent sample t-test is greater than the table value at the predetermined significance level, we will reject the null hypothesis and say that the means of the two groups are different. If the calculated value of the independent sample t-test is less than the table value, then we will say that the means of the two groups are the same.

b.1 Mann-Whitney U Test
The Mann-Whitney U Test, also known as the Wilcoxon Rank Sum Test, is a non-parametric statistical test used to compare two samples or groups.

The Mann-Whitney U Test assesses whether two sampled groups are likely to derive from the same population, and essentially asks: do these two populations have the same shape with regards to their data? In other words, we want evidence as to whether the groups are drawn from populations with different levels of a variable of interest. It follows that the hypotheses in a Mann-Whitney U Test are:
• The null hypothesis (H0) is that the two populations are equal.
• The alternative hypothesis (H1) is that the two populations are not equal.

Some researchers interpret this as comparing the medians between the two populations (in contrast, parametric tests compare the means between two independent groups). In certain situations, where the data are similarly shaped (see assumptions), this is valid – but it should be noted that the medians are not actually involved in calculation of the Mann-Whitney U test statistic. Two groups could have the same median and be significantly different according to the Mann-Whitney U test.

When to use the Mann-Whitney U Test
Non-parametric tests (sometimes referred to as 'distribution-free tests') are used when you assume the data in your populations of interest do not have a Normal distribution. You can think of the Mann-Whitney U-test as analogous to the unpaired Student's t-test, which you would use when assuming your two populations are normally distributed, as defined by their means and standard deviation (the parameters of the distributions).

The Mann-Whitney U Test is a common statistical test that is used in many fields including economics, biological sciences and
epidemiology. It is particularly useful when you are assessing the difference between two independent groups with low numbers of individuals in each group (usually less than 30), which are not normally distributed, and where the data are continuous. If you are interested in comparing more than two groups which have skewed data, a Kruskal-Wallis One-Way analysis of variance (ANOVA) should be used.

Mann-Whitney U Test Assumptions
Some key assumptions for the Mann-Whitney U Test are detailed below:
• The variable being compared between the two groups must be continuous (able to take any number in a range – for example age, weight, height or heart rate). This is because the test is based on ranking the observations in each group.
• The data are assumed to take a non-Normal, or skewed, distribution. If your data are normally distributed, the unpaired Student's t-test should be used to compare the two groups instead.
• While the data in both groups are not assumed to be Normal, the data are assumed to be similar in shape across the two groups.
• The data should be two randomly selected independent samples, meaning the groups have no relationship to each other. If samples are paired (for example, two measurements from the same group of participants), then a paired samples t-test should be used instead.
• Sufficient sample size is needed for a valid test, usually more than 5 observations in each group.

Example.
Consider a randomized controlled trial evaluating a new anti-retroviral therapy for HIV. A pilot trial randomly assigned participants to either the treated or untreated groups (N=14). We want to assess the viral load (quantity of virus per milliliter of blood) in the treated versus the untreated groups. The data are shown below:

These data are both skewed with a sample size of n=7 in each treatment arm, and so a non-parametric test is appropriate. Before we calculate the test, we choose a significance level (usually α=0.05). The first step is to assign ranks to the values from the full sample (both treatment groups pooled together) in order from smallest to largest. We can then generate a test statistic based on the ranks.

The table below shows the viral load values in the treated and untreated groups ranked smallest to largest, along with the summed ranks of each group:

After summing the ranks for each group, the Mann-Whitney U test statistic is selected as the smallest of the two following calculated U values:

U1 = n1n2 + n1(n1 + 1)/2 - R1
U2 = n1n2 + n2(n2 + 1)/2 - R2
Where we let 1 denote the treated group and 2 denote the untreated group (denotation of groups is arbitrary), where n1 and n2 are the number of participants and where R1 and R2 are the sums of the ranks in the treated and untreated groups, respectively. In this example, U1 = 41 and U2 = 8. We therefore select U = 8 as the test statistic.

Important Note on Normal Approximation
There are situations where the sample size may be too large for the reference table to be used to calculate the exact probability distribution – in which case we can use a Normal approximation instead. Since U is found by adding together independent, similarly distributed random samples, the central limit theorem applies when the sample is large (usually >20 in each group). The standard deviation of the sum of the ranks can be used to generate a z-statistic and a significance value generated this way. If the null hypothesis is true, the distribution of U approximates to a Normal distribution.

Next we determine a 'critical value' of U with which to compare our calculated test statistic, which we can do using a reference table of critical values and using our sample sizes (n=7 in both groups) and two-sided level of significance (α=0.05).

In our current example, the critical value can be determined from the reference table as 8. Finally, we can use this to accept or reject the null hypothesis using the following decision rule: Reject H0 if U ≤ 8.

Given that our U statistic is equal to the critical value, we can reject the null hypothesis that the two groups are equal and accept the alternative hypothesis that there is evidence of a difference in viral load between the groups treated with the new therapy versus untreated.

Wilcoxon Signed Rank
The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used either to test the location of a population based on a sample of data, or to compare the locations of two populations using two matched samples. The one-sample version serves a purpose similar to that of the one-sample Student's t-test.

The Wilcoxon test, which can refer to either the rank sum test or the signed rank test version, is a nonparametric statistical test that compares two paired groups. The tests essentially calculate the difference between sets of pairs and analyze these differences to establish if they are statistically significantly different from one another.

KEY TAKEAWAYS
• The Wilcoxon test compares two paired groups and comes in two versions, the rank sum test and the signed rank test.
• The goal of the test is to determine if two or more sets of pairs are different from one another in a statistically significant manner.
• Both versions of the model assume that the pairs in the data come from dependent populations, i.e., following the same person or share price through time or place.

Understanding the Wilcoxon Test
The rank sum and signed rank tests were both proposed by American statistician Frank Wilcoxon in a groundbreaking research paper published in 1945. The tests laid the foundation for hypothesis testing of nonparametric statistics, which are used for population data that can be ranked but do not have numerical values, such as customer satisfaction or music reviews. Nonparametric distributions do not have parameters and cannot be defined by an equation as parametric distributions can.

The types of questions that the Wilcoxon test can help answer include things like:
• Are test scores different from 5th grade to 6th grade for the same students?
• Does a particular drug have an effect on health when tested on the same individuals?

These models assume that the data comes from two matched, or dependent, populations, following the same person or stock through time or place. The data is also assumed to be continuous as opposed to discrete. Because it is a nonparametric test, it does not require a particular probability distribution of the dependent variable in the analysis.

Types of the Wilcoxon Test
• The Wilcoxon rank sum test can be used to test the null hypothesis that two populations have the same continuous distribution. A null hypothesis is a statistical statement that says there's no significant difference between two populations or variables. The base assumptions necessary to employ the rank sum test are that the data are from the same population and are paired, the data can be measured on at least an interval scale, and the data were chosen randomly and independently.
• The Wilcoxon signed rank test assumes that there is information in the magnitudes and signs of the differences between paired observations. As the nonparametric equivalent of the paired Student's t-test, the signed rank test can be used as an alternative to the t-test when the population data does not follow a normal distribution.

Calculating a Wilcoxon Test Statistic
The steps for arriving at a Wilcoxon signed rank test statistic, W, are as follows:
1. For each item in a sample of n items, obtain a difference score, Di, between two measurements (i.e., subtract one from the other).
2. Neglect the positive or negative signs and obtain a set of n absolute differences |Di|.
3. Omit difference scores of zero, giving you a set of n′ non-zero absolute difference scores, where n′ ≤ n. Thus, n′ becomes the actual sample size.
4. Then, assign ranks Ri from 1 to n′ to each of the |Di| such that the smallest absolute difference score gets rank 1 and the largest gets rank n′. If two or more |Di| are equal, they are each assigned the average rank of the ranks they would have been assigned individually had ties in the data not occurred.
5. Now reassign the symbol "+" or "–" to each of the n′ ranks Ri, depending on whether Di was originally positive or negative.
6. The Wilcoxon test statistic W is subsequently obtained as the sum of the positive ranks.

Chi square test
A Pearson's chi-square test is a statistical test for categorical data. It is used to determine whether your data are significantly different from what you expected. There are two types of Pearson's chi-square tests:
• The chi-square goodness of fit test is used to test whether the frequency distribution of a categorical variable is different from your expectations.
• The chi-square test of independence is used to test whether two categorical variables are related to each other.

Chi-square is often written as Χ2 and is pronounced "kai-square" (rhymes with "eye-square"). It is also called chi-squared.

What is a chi-square test?
Pearson's chi-square (Χ2) tests, often referred to simply as chi-square tests, are among the most common nonparametric tests. Nonparametric tests are used for data that don't follow the assumptions of parametric tests, especially the assumption of a normal distribution.

If you want to test a hypothesis about the distribution of a categorical variable you'll need to use a chi-square test or another nonparametric test. Categorical variables can be nominal or ordinal and represent groupings such as species or nationalities. Because they can only have a few specific values, they can't have a normal distribution.

Note: Parametric tests can't test hypotheses about the distribution of a categorical variable, but they can involve a categorical variable as an independent variable (e.g., ANOVAs).
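The six-step Wilcoxon signed-rank procedure above can be sketched in Python. The paired measurements here are hypothetical, and include a zero difference and tied |Di| values so that Steps 3 and 4 are exercised:

```python
def wilcoxon_w(x, y):
    """Wilcoxon signed-rank statistic W, following Steps 1-6 above."""
    diffs = [xi - yi for xi, yi in zip(x, y)]       # Step 1: difference scores
    nonzero = [d for d in diffs if d != 0]          # Step 3: drop zero differences
    abs_sorted = sorted(abs(d) for d in nonzero)    # Step 2: absolute differences
    # Step 4: rank |Di| from 1 to n', averaging the ranks of tied values
    rank_of = {}
    i = 0
    while i < len(abs_sorted):
        j = i
        while j < len(abs_sorted) and abs_sorted[j] == abs_sorted[i]:
            j += 1
        rank_of[abs_sorted[i]] = (i + 1 + j) / 2    # mean of ranks i+1 .. j
        i = j
    # Steps 5-6: reattach the signs and sum only the positive ranks
    return sum(rank_of[abs(d)] for d in nonzero if d > 0)

before = [125, 115, 130, 140, 140, 115, 140, 125, 140, 135]  # hypothetical
after  = [110, 122, 125, 120, 140, 124, 123, 137, 135, 145]  # hypothetical
print(wilcoxon_w(before, after))  # 27.0
```

Here n = 10 pairs but n′ = 9 after the zero difference is dropped, and the two tied |Di| values of 5 each receive the average rank 1.5.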
Test hypotheses about frequency distributions
There are two types of Pearson's chi-square tests, but they both test whether the observed frequency distribution of a categorical variable is significantly different from its expected frequency distribution. A frequency distribution describes how observations are distributed between different groups.

Frequency distributions are often displayed using frequency distribution tables. A frequency distribution table shows the number of observations in each group. When there are two categorical variables, you can use a specific type of frequency distribution table called a contingency table to show the number of observations in each combination of groups.

Example 1. Bird species at a bird feeder.
Frequency of visits by bird species at a bird feeder during a 24-hour period.

A chi-square test (a chi-square goodness of fit test) can test whether these observed frequencies are significantly different from what was expected, such as equal frequencies.

Example 2. Handedness and nationality
Contingency table of the handedness of a sample of Americans and Canadians

A chi-square test (a test of independence) can test whether these observed frequencies are significantly different from the frequencies expected if handedness is unrelated to nationality.

Chi-square formula
Both of Pearson's chi-square tests use the same formula to calculate the test statistic, chi-square (Χ2):

Χ2 = Σ((O - E)² / E)

Where:
• Χ2 is the chi-square test statistic
• Σ is the summation operator (it means "take the sum of")
• O is the observed frequency
• E is the expected frequency

The larger the difference between the observations and the expectations (O - E in the equation), the bigger the chi-square will be. To decide whether the difference is big enough to be statistically significant, you compare the chi-square value to a critical value.

When to use a chi-square test
A Pearson's chi-square test may be an appropriate option for your data if all of the following are true:
1. You want to test a hypothesis about one or more categorical variables. If one or more of your variables is quantitative, you should use a different statistical test. Alternatively, you could convert the quantitative variable into a categorical variable by separating the observations into intervals.
2. The sample was randomly selected from the population.
3. There are a minimum of five observations expected in each group or combination of groups.

Types of chi-square tests
The two types of Pearson's chi-square tests are:
• Chi-square goodness of fit test
• Chi-square test of independence
Mathematically, these are actually the same test. However, we often think of them as different tests because they're used for different purposes.

Chi-square goodness of fit test
You can use a chi-square goodness of fit test when you have one categorical variable. It allows you to test whether the frequency distribution of the categorical variable is
significantly different from your expectations. Often, but not always, the expectation is that the categories will have equal proportions.

Example: Hypotheses for chi-square goodness of fit test
Expectation of equal proportions
• Null hypothesis (H0): The bird species visit the bird feeder in equal proportions.
• Alternative hypothesis (HA): The bird species visit the bird feeder in different proportions.

Expectation of different proportions
• Null hypothesis (H0): The bird species visit the bird feeder in the same proportions as the average over the past five years.
• Alternative hypothesis (HA): The bird species visit the bird feeder in different proportions from the average over the past five years.

Chi-square test of independence
You can use a chi-square test of independence when you have two categorical variables. It allows you to test whether the two variables are related to each other. If two variables are independent (unrelated), the probability of belonging to a certain group of one variable isn't affected by the other variable.

Example: Chi-square test of independence
• Null hypothesis (H0): The proportion of people who are left-handed is the same for Americans and Canadians.
• Alternative hypothesis (HA): The proportion of people who are left-handed differs between nationalities.

Other types of chi-square tests
Some consider the chi-square test of homogeneity to be another variety of Pearson's chi-square test. It tests whether two populations come from the same distribution by determining whether the two populations have the same proportions as each other. You can consider it simply a different way of thinking about the chi-square test of independence.

McNemar's test is a test that uses the chi-square test statistic. It isn't a variety of Pearson's chi-square test, but it's closely related. You can conduct this test when you have a related pair of categorical variables that each have two groups. It allows you to determine whether the proportions of the variables are equal.

Example: McNemar's test
Suppose that a sample of 100 people is offered two flavors of ice cream and asked whether they like the taste of each.

Contingency table of ice cream flavor preference
                   Like chocolate   Dislike chocolate
Like vanilla             47                32
Dislike vanilla           8                13

• Null hypothesis (H0): The proportion of people who like chocolate is the same as the proportion of people who like vanilla.
• Alternative hypothesis (HA): The proportion of people who like chocolate is different from the proportion of people who like vanilla.

There are several other types of chi-square tests that are not Pearson's chi-square tests, including the test of a single variance and the likelihood ratio chi-square test.

How to perform a chi-square test
The exact procedure for performing a Pearson's chi-square test depends on which test you're using, but it generally follows these steps:
1. Create a table of the observed and expected frequencies. This can sometimes be the most difficult step because you will need to carefully consider which expected values are most appropriate for your null hypothesis.
2. Calculate the chi-square value from your observed and expected frequencies using the chi-square formula.
3. Find the critical chi-square value in a chi-square critical value table or using statistical software.
4. Compare the chi-square value to the critical value to determine which is larger.
5. Decide whether to reject the null hypothesis. You should reject the null hypothesis if the chi-square value is
You should reject the null hypothesis if the chi-square value is greater than the critical value. If you reject the null hypothesis, you can conclude that your data are significantly different from what you expected.

How to report a chi-square test
If you decide to include a Pearson’s chi-square test in your research paper, dissertation or thesis, you should report it in your results section. You can follow these rules if you want to report statistics in APA Style:
• You don’t need to provide a reference or formula since the chi-square test is a commonly used statistic.
• Refer to chi-square using its Greek symbol, Χ2. Although the symbol looks very similar to an “X” from the Latin alphabet, it’s actually a different symbol. Greek symbols should not be italicized.
• Include a space on either side of the equal sign.
• If your chi-square is less than one, you should include a leading zero (a zero before the decimal point), since the chi-square can be greater than one.
• Provide two significant digits after the decimal point.
• Report the chi-square alongside its degrees of freedom, sample size, and p value, following this format: Χ2 (degrees of freedom, N = sample size) = chi-square value, p = p value.

Example: Reporting a chi-square test
There was no significant relationship between handedness and nationality, Χ2 (1, N = 428) = 0.44, p = .505.

Linear Regression
Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable’s value is called the independent variable.
Regression models describe the relationship between variables by fitting a line to the observed data. Linear regression models use a straight line, while logistic and nonlinear regression models use a curved line. Regression allows you to estimate how a dependent variable changes as the independent variable(s) change.

Simple Linear Regression
Simple linear regression is used to estimate the relationship between two quantitative variables. You can use simple linear regression when you want to know:
1. How strong the relationship is between two variables (e.g., the relationship between rainfall and soil erosion).
2. The value of the dependent variable at a certain value of the independent variable (e.g., the amount of soil erosion at a certain level of rainfall).

Simple linear regression example: You are a social researcher interested in the relationship between income and happiness. You survey 500 people whose incomes range from 15k to 75k and ask them to rank their happiness on a scale from 1 to 10. Your independent variable (income) and dependent variable (happiness) are both quantitative, so you can do a regression analysis to see if there is a linear relationship between them.

Note: If you have more than one independent variable, use multiple linear regression instead.

Assumptions of simple linear regression
Simple linear regression is a parametric test, meaning that it makes certain assumptions about the data. These assumptions are:
1. Homogeneity of variance (homoscedasticity): the size of the error in our prediction doesn’t change significantly across the values of the independent variable.
2. Independence of observations: the observations in the dataset were collected using statistically valid sampling methods, and there are no hidden relationships among observations.
3. Normality: The data follows a normal distribution.
Linear regression makes one additional assumption:
4. The relationship between the independent and dependent variable is linear: the line of best fit through the data points is a straight line (rather than a curve or some sort of grouping factor).
If your data do not meet the assumptions of homoscedasticity or normality, you may be able to use a nonparametric test instead, such as the Spearman rank test.

Example: Data that doesn’t meet the assumptions.
You think there is a linear relationship between cured meat consumption and the incidence of colorectal cancer in the U.S. However, you find that much more data has been collected at high rates of meat consumption than at low rates of meat consumption, with the result that there is much more variation in the estimate of cancer rates at the low range than at the high range. Because the data violate the assumption of homoscedasticity, they don’t work for regression, so you perform a Spearman rank test instead.
If your data violate the assumption of independence of observations (e.g., if observations are repeated over time), you may be able to perform a linear mixed-effects model that accounts for the additional structure in the data.

How to perform a simple linear regression
Simple linear regression formula
The formula for a simple linear regression is:

Y = Β0 + Β1X + e

• y is the predicted value of the dependent variable (y) for any given value of the independent variable (x).
• B0 is the intercept, the predicted value of y when x is 0.
• B1 is the regression coefficient – how much we expect y to change as x increases.
• x is the independent variable (the variable we expect is influencing y).
• e is the error of the estimate, or how much variation there is in our estimate of the regression coefficient.
Linear regression finds the line of best fit through your data by searching for the regression coefficient (B1) that minimizes the total error (e) of the model.

Example 1.
You have to study the relationship between the monthly e-commerce sales and the online advertising costs. You have the survey results for 7 online stores for the last year. Your task is to find the equation of the straight line that fits the data best.
The following table represents the survey results from the 7 online stores. [Table not reproduced in this transcription.]
We can see that there is a positive relationship between the monthly e-commerce sales (Y) and the online advertising costs (X).
“The positive correlation means that the values of the dependent variable (y) increase when the values of the independent variable (x) rise.”
So, if we want to predict the monthly e-commerce sales from the online advertising costs, the higher the value of advertising costs, the higher our prediction of sales.
We will use the above data to build our Scatter diagram. Now, let’s see how the Scatter diagram looks: [Scatter diagram not reproduced in this transcription.]
The Scatter plot shows how much one variable affects another. In our example, the above Scatter plot shows how much online advertising costs affect the monthly e-commerce sales. It shows their correlation.
Let’s see the simple linear regression equation:

Y = 125.8 + 171.5*X
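A slope and intercept like those quoted above come from an ordinary least-squares fit. A minimal sketch of that computation in plain Python follows; since the survey table itself is not reproduced in this transcription, the numbers below are made-up illustrative values, not the original data:

```python
# Ordinary least-squares fit for simple linear regression (Y = b0 + b1*X).
# The advertising/sales figures here are invented for illustration only.

def fit_simple_linear(xs, ys):
    """Return (b0, b1) minimizing the total squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b1 = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x  # the fitted line passes through the means
    return b0, b1

ad_costs = [1, 2, 3, 4, 5, 6, 7]                   # hypothetical advertising costs
sales = [300, 480, 640, 820, 1000, 1150, 1330]     # hypothetical monthly sales

b0, b1 = fit_simple_linear(ad_costs, sales)
print(f"Y = {b0:.1f} + {b1:.1f}*X")
```

Applied to the document’s actual table, this same procedure is how a line such as Y = 125.8 + 171.5*X would be obtained.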
Note: You can easily find the values for Β0 and Β1 with the help of paid or free statistical software, online linear regression calculators, or Excel. All you need are the values for the independent (x) and dependent (y) variables (as those in the above table).
Now, we have to see our regression line.
Graph of the Regression Line (Diagram 2): [diagram not reproduced in this transcription]
Linear regression aims to find the best-fitting straight line through the points. The best-fitting line is known as the regression line. If the data points lie closer to a straight line when plotted, the correlation between the two variables is higher. In our example, the relationship is strong.
The orange diagonal line in Diagram 2 is the regression line and shows the predicted score on e-commerce sales for each possible value of the online advertising costs.

Interpretation of the Results
The slope of 171.5 shows that for each increase of one unit in X, we predict the average of Y to increase by an estimated 171.5 units. The formula estimates that for each increase of 1 dollar in online advertising costs, the expected monthly e-commerce sales are predicted to increase by $171.5.
This was a simple linear regression example for a positive relationship in business. Let’s see an example of the negative relationship.

Example 2.
You have to examine the relationship between the age and price of used cars sold in the last year by a car dealership company. Here is the table of the data: [table not reproduced in this transcription]
Now, we see that we have a negative relationship between the car price (Y) and car age (X) – as car age increases, price decreases.
When we use the simple linear regression equation, we have the following results:

Y = Β0 + Β1X
Y = 7836 – 502.4*X

Let’s use the data from the table and create our Scatter plot and linear regression line (Diagram 3): [diagram not reproduced in this transcription]

Interpretation of the Results
With an estimated slope of –502.4, we can conclude that the average car price decreases by $502.4 for each year a car increases in age.

a. Multiple Linear Regression
Multiple linear regression is used to estimate the relationship between two or more independent variables and one dependent variable. You can use multiple linear regression when you want to know:
1. How strong the relationship is between two or more independent variables and one dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added affect crop growth).
2. The value of the dependent variable at a certain value of the independent variables (e.g. the expected yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).

Multiple Linear Regression Example
You are a public health researcher interested in social factors that influence heart disease. You survey 500 towns and gather data on the percentage of people in each town who smoke, the percentage of people in each town who bike to work, and the percentage of people in each town who have heart disease. Because you have two independent variables and one dependent variable, and all your variables are quantitative, you can use multiple linear regression to analyze the relationship between them.

Assumptions of Multiple Linear Regression
Multiple linear regression makes all of the same assumptions as simple linear regression:
Homogeneity of variance (homoscedasticity): the size of the error in our prediction doesn’t change significantly across the values of the independent variable.
Independence of observations: the observations in the dataset were collected using statistically valid sampling methods, and there are no hidden relationships among variables.
Normality: The data follows a normal distribution.
Linearity: the line of best fit through the data points is a straight line, rather than a curve or some sort of grouping factor.
In multiple linear regression, it is possible that some of the independent variables are actually correlated with one another, so it is important to check these before developing the regression model. If two independent variables are too highly correlated (r2 > ~0.6), then only one of them should be used in the regression model.

How to perform a Multiple Linear Regression
Multiple linear regression formula
The formula for a multiple linear regression is:

y = B0 + B1X1 + … + BnXn + e

• y = the predicted value of the dependent variable
• B0 = the y-intercept (value of y when all other parameters are set to 0)
• B1X1 = the regression coefficient (B1) of the first independent variable (X1) (a.k.a. the effect that increasing the value of the independent variable has on the predicted y value)
• … = do the same for however many independent variables you are testing
• BnXn = the regression coefficient of the last independent variable
• e = model error (a.k.a. how much variation there is in our estimate of y)

To find the best-fit line for each independent variable, multiple linear regression calculates three things:
• The regression coefficients that lead to the smallest overall model error.
• The t statistic of the overall model.
• The associated p value (how likely it is that the t statistic would have occurred by chance if the null hypothesis of no relationship between the independent and dependent variables was true).
It then calculates the t statistic and p value for each regression coefficient in the model.

Example.
An analyst may want to know how the movement of the market affects the price of ExxonMobil (XOM). In this case, the linear equation will have the value of the S&P 500 index as the independent variable, or predictor, and the price of XOM as the dependent variable.
In reality, multiple factors predict the outcome of an event. The price movement of ExxonMobil, for example, depends on more than just the performance of the overall market. Other predictors such as the price of oil, interest rates, and the price movement of oil futures can affect the price of ExxonMobil (XOM) and the stock prices of other oil companies. To understand a relationship in which more than two variables are present, multiple linear regression is used.
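As a sketch of how the least-squares coefficients of a multiple linear regression can be obtained, the normal equations (XᵀX)b = Xᵀy can be solved directly. The data below are invented purely for illustration (they are not the XOM figures discussed in the text), and the y values are generated exactly as 1 + 2·x1 − 3·x2 so the fit should recover those coefficients:

```python
# Least-squares estimation for multiple linear regression
# (y = b0 + b1*x1 + b2*x2 + e) by solving the normal equations
# (X^T X) b = X^T y with Gaussian elimination. Illustrative data only.

def solve(A, b):
    """Solve A·x = b by Gaussian elimination with partial pivoting
    (A is a small square matrix given as a list of row lists)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_mlr(rows, ys):
    """rows: list of [x1, x2, ...] observations; returns [b0, b1, b2, ...]."""
    X = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(len(X))) for b in range(k)]
           for a in range(k)]
    Xty = [sum(X[i][a] * ys[i] for i in range(len(X))) for a in range(k)]
    return solve(XtX, Xty)

rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [2, 3]]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]  # exact linear data
print(fit_mlr(rows, ys))  # approximately [1.0, 2.0, -3.0]
```

In practice, as the text notes, these estimates are computed by statistical software rather than by hand.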
Multiple linear regression (MLR) is used to determine a mathematical relationship among several random variables. In other terms, MLR examines how multiple independent variables are related to one dependent variable. Once each of the independent factors has been determined to predict the dependent variable, the information on the multiple variables can be used to create an accurate prediction of the level of effect they have on the outcome variable. The model creates a relationship in the form of a straight line (linear) that best approximates all the individual data points.

Referring to the MLR equation above, in our example:
• yi = dependent variable (the price of XOM)
• xi1 = interest rates
• xi2 = oil price
• xi3 = value of S&P 500 index
• xi4 = price of oil futures
• B0 = y-intercept at time zero
• B1 = regression coefficient that measures a unit change in the dependent variable when xi1 changes (the change in XOM price when interest rates change)
• B2 = coefficient value that measures a unit change in the dependent variable when xi2 changes (the change in XOM price when oil prices change)

The least-squares estimates B0, B1, B2…Bp are usually computed by statistical software. Many variables can be included in the regression model, with each independent variable differentiated by a number: 1, 2, 3, 4...p. The multiple regression model allows an analyst to predict an outcome based on information provided on multiple explanatory variables.
Still, the model is not always perfectly accurate, as each data point can differ slightly from the outcome predicted by the model. The residual value, E, which is the difference between the actual outcome and the predicted outcome, is included in the model to account for such slight variations.
Assuming we run our XOM price regression model through statistics computation software, it returns this output: [regression output not reproduced in this transcription]

Interpretation of the Results
An analyst would interpret this output to mean that, if other variables are held constant, the price of XOM will increase by 7.8% if the price of oil in the markets increases by 1%. The model also shows that the price of XOM will decrease by 1.5% following a 1% rise in interest rates. R2 indicates that 86.5% of the variations in the stock price of ExxonMobil can be explained by changes in the interest rate, oil price, oil futures, and S&P 500 index.

Logistic Regression
Logistic regression is a type of statistical model (also known as a logit model) often used for classification and predictive analytics. Logistic regression estimates the probability of an event occurring, such as voted or didn’t vote, based on a given dataset of independent variables.
Logistic regression is a supervised machine learning algorithm that accomplishes binary classification tasks by predicting the probability of an outcome, event, or observation. The model delivers a binary or dichotomous outcome limited to two possible outcomes: yes/no, 0/1, or true/false.
Logistic regression analyzes the relationship between one or more independent variables and classifies data into discrete classes. It is extensively used in predictive modeling, where the model estimates the mathematical probability of whether an instance belongs to a specific category or not. For example, 0 represents a negative class; 1 represents a positive class. Logistic regression is commonly used in binary classification problems where the outcome variable reveals either of the two categories (0 and 1).

Some examples of such classifications, and instances where the binary response is expected or implied, are:
1. Determining the probability of heart attacks: With the help of a logistic model, medical practitioners can determine the relationship between variables such as the weight, exercise, etc., of an individual and use it to predict whether the person will suffer from a heart attack or any other medical complication.
2. Possibility of enrolling into a university: Application aggregators can determine the probability of a student getting accepted to a particular university or a degree course in a college by studying the relationship between the estimator variables, such as GRE, GMAT, or TOEFL scores.
3. Identifying spam emails: Email inboxes are filtered to determine if the email communication is promotional/spam by understanding the predictor variables and applying a logistic regression algorithm to check its authenticity.

Key Advantages of Logistic Regression
The logistic regression analysis has several advantages in the field of machine learning:
1. Easier to implement than other machine learning methods
2. Suitable for linearly separable datasets
3. Provides valuable insights

Logistic Regression Equation and Assumptions
Logistic regression uses a logistic function called a sigmoid function to map predictions and their probabilities. The sigmoid function refers to an S-shaped curve that converts any real value to a range between 0 and 1.
Moreover, if the output of the sigmoid function (estimated probability) is greater than a predefined threshold on the graph, the model predicts that the instance belongs to that class. If the estimated probability is less than the predefined threshold, the model predicts that the instance does not belong to the class. For example, if the output of the sigmoid function is above 0.5, the output is considered as 1. On the other hand, if the output is less than 0.5, the output is classified as 0. Also, if the graph goes further to the negative end, the predicted value of y will be 0, and vice versa. In other words, if the output of the sigmoid function is 0.65, it implies that there is a 65% chance of the event occurring (a coin toss, for example).
The sigmoid function is referred to as an activation function for logistic regression and is defined as:

f(value) = 1 / (1 + e^(-value))

where,
• e = base of natural logarithms
• value = numerical value one wishes to transform

The following equation represents logistic regression:

y = e^(b0 + b1x) / (1 + e^(b0 + b1x))

here,
• x = input value
• y = predicted output
• b0 = bias or intercept term
• b1 = coefficient for input (x)

This equation is similar to linear regression, where the input values are combined linearly to predict an output value using weights or coefficient values. However, unlike linear regression, the output value modeled here is a binary value (0 or 1) rather than a numeric value.
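The sigmoid function and the logistic regression equation described above can be sketched directly in code. The coefficients b0 and b1 below are assumed values chosen for illustration, not estimates fitted to any dataset:

```python
import math

def sigmoid(value):
    """S-shaped curve mapping any real value into (0, 1):
    1 / (1 + e^(-value)), with e the base of natural logarithms."""
    return 1.0 / (1.0 + math.exp(-value))

def predict(x, b0, b1, threshold=0.5):
    """Logistic regression: p = e^(b0+b1x) / (1 + e^(b0+b1x)),
    which is algebraically identical to sigmoid(b0 + b1*x).
    Returns (estimated probability, predicted class)."""
    p = sigmoid(b0 + b1 * x)
    return p, (1 if p >= threshold else 0)

# Assumed coefficients for illustration; in practice they come from
# maximum-likelihood estimation on training data.
b0, b1 = -4.0, 1.5
for x in (1.0, 3.0, 5.0):
    p, label = predict(x, b0, b1)
    print(f"x={x}: probability={p:.3f}, class={label}")
```

Note how the 0.5 threshold mentioned in the text appears as the default cut-off that turns the estimated probability into a 0/1 class.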
While implementing logistic regression, one needs to keep in mind the following key assumptions:
1. The dependent/response variable is binary or dichotomous
2. Little or no multicollinearity between the predictor/explanatory variables
3. Linear relationship of independent variables to log odds
4. Prefers large sample size
5. Problem with extreme outliers
6. Consider independent observations
The independence assumption can be verified by plotting residuals against time, which signifies the order of observations. The plot helps in determining the presence or absence of a random pattern. If a random pattern is absent (that is, the residuals show a systematic pattern), this assumption may be considered violated.

Key properties of the logistic regression equation
Typical properties of the logistic regression equation include:
• Logistic regression’s dependent variable obeys the ‘Bernoulli distribution’
• Estimation/prediction is based on ‘maximum likelihood’
• Logistic regression does not evaluate the coefficient of determination (or R squared) as observed in linear regression. Instead, the model’s fitness is assessed through a concordance.
For example, KS or Kolmogorov-Smirnov statistics look at the difference between cumulative events and cumulative non-events to determine the efficacy of models through credit scoring.

Types of Logistic Regression with Examples
Logistic regression is classified into binary, multinomial, and ordinal. Each type differs from the others in execution and theory. Let’s understand each type in detail.
1. Binary logistic regression
Binary logistic regression predicts the relationship between the independent and binary dependent variables.
2. Multinomial logistic regression
A categorical dependent variable has two or more discrete outcomes in a multinomial regression type. This implies that this regression type has more than two possible outcomes.
Examples:
1. Let’s say you want to predict the most popular transportation type for 2040. Here, transport type equates to the dependent variable, and the possible outcomes can be electric cars, electric trains, electric buses, and electric bikes.
2. Predicting whether a student will join a college, vocational/trade school, or corporate industry.
3. Estimating the type of food consumed by pets; the outcome may be wet food, dry food, or junk food.
3. Ordinal logistic regression
Ordinal logistic regression applies when the dependent variable is in an ordered state (i.e., ordinal). The dependent variable (y) specifies an order with two or more categories or levels.
Examples: Dependent variables represent,
1. Formal shirt size: Outcomes = XS/S/M/L/XL
2. Survey answers: Outcomes = Agree/Disagree/Unsure
3. Scores on a math test: Outcomes = Poor/Average/Good

Logistic Regression Best Practices for 2022
Logistic regression can produce an accurate model if some best practices are followed, from independent variable selection and choice of model building strategy to validating the model results.
1. Identify dependent variables to ensure the model’s consistency
2. Discover the technical requirements of the model
3. Estimate the model and evaluate the goodness of the fit
Representation of Two Logistic Regression Models: [figure not reproduced in this transcription]
In this figure, model B represents a better fit than model A. This is because, although model A shows high variability, model B seems to be more precise.
4. Appropriately interpret the results
A logistic model is accurate when it has a fine-tuned build strategy and when the interpretation of the results produced by it is made right. Generally, a model is rated purely by analyzing the statistical significance of the estimates. However, not much attention is given to the magnitude of the coefficients. Thus, interpreting the coefficients and discussing how the results relate to the research hypothesis or question is one of the good practices for logistic regression.
Coefficients are easy to interpret in linear regression but not in logistic regression, as the estimates produced in the latter are not as intuitive. In logistic regression, the logit transformation reveals the independent variable’s impact on the natural logarithm of the odds of the dependent variable.
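The logit interpretation described above can be made concrete: because a logistic coefficient is the change in the natural logarithm of the odds per one-unit increase in the predictor, exponentiating it gives the multiplicative change in the odds (an odds ratio). A small sketch, using an assumed coefficient value chosen only for illustration:

```python
import math

def logit(p):
    """Natural log of the odds for probability p."""
    return math.log(p / (1 - p))

b1 = 0.40                  # assumed fitted coefficient (log-odds scale)
odds_ratio = math.exp(b1)  # multiplicative change in the odds per unit of x
print(f"A one-unit increase in x multiplies the odds by {odds_ratio:.2f}")

# Consistency check: raising the logit by b1 multiplies the odds by exp(b1)
p = 0.30
new_odds = math.exp(logit(p) + b1)
print(f"The odds go from {p / (1 - p):.3f} to {new_odds:.3f}")
```

Reporting odds ratios in this way is one route to the kind of coefficient interpretation that best practice 4 recommends.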
5. Validate observed results
Another critical practice that researchers can implement is validating the observed results with a subsample of the original dataset. This practice makes the model results more reliable, especially when working with smaller samples.

6. Analysis of Variance (ANOVA)
ANOVA stands for Analysis of Variance. It is a statistical method used to analyze the differences between the means of two or more groups or treatments. It is often used to determine whether there are any statistically significant differences between the means of different groups.
Analysis of variance (ANOVA) is an analysis tool used in statistics that splits an observed aggregate variability found inside a data set into two parts: systematic factors and random factors. The systematic factors have a statistical influence on the given data set, while the random factors do not. Analysts use the ANOVA test to determine the influence that independent variables have on the dependent variable in a regression study.
The t- and z-test methods developed in the 20th century were used for statistical analysis until 1918, when Ronald Fisher created the analysis of variance method. ANOVA is also called the Fisher analysis of variance, and it is the extension of the t- and z-tests. The term became well-known in 1925, after appearing in Fisher’s book, “Statistical Methods for Research Workers.” It was employed in experimental psychology and later expanded to subjects that were more complex.

Key Takeaways
• Analysis of variance, or ANOVA, is a statistical method that separates observed variance data into different components to use for additional tests.
• A one-way ANOVA is used for three or more groups of data, to gain information about the relationship between the dependent and independent variables.
• If no true variance exists between the groups, the ANOVA’s F-ratio should be close to 1.

Formula for ANOVA

F = MST / MSE

where:
F = ANOVA coefficient
MST = Mean sum of squares due to treatment
MSE = Mean sum of squares due to error

What Does the Analysis of Variance Reveal?
The ANOVA test is the initial step in analyzing factors that affect a given data set. Once the test is finished, an analyst performs additional testing on the methodical factors that measurably contribute to the data set’s inconsistency. The analyst utilizes the ANOVA test results in an f-test to generate additional data that aligns with the proposed regression models.
The ANOVA test allows a comparison of more than two groups at the same time to determine whether a relationship exists between them. The result of the ANOVA formula, the F statistic (also called the F-ratio), allows for the analysis of multiple groups of data to determine the variability between samples and within samples.
If no real difference exists between the tested groups, which is called the null hypothesis, the result of the ANOVA’s F-ratio statistic will be close to 1. The distribution of all possible values of the F statistic is the F-distribution. This is actually a group of distribution functions, with two characteristic numbers, called the numerator degrees of freedom and the denominator degrees of freedom.

Example of How to Use ANOVA
A researcher might, for example, test students from multiple colleges to see if students from one of the colleges consistently outperform students from the other colleges. In a business application, an R&D researcher might test two different processes of creating a product to see if one process is better than the other in terms of cost efficiency.
The type of ANOVA test used depends on a number of factors. It is applied when data needs to be experimental. Analysis of variance is employed if there is no access to statistical software, resulting in computing ANOVA by hand. It is simple to use and best suited for small samples. With many experimental designs, the sample sizes have to be the same for the various factor level combinations.
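The formula F = MST / MSE given above can be computed by hand for a one-way ANOVA. The three groups below are invented illustrative samples (e.g., exam scores from three colleges, as in the example above):

```python
# One-way ANOVA F-ratio computed directly from F = MST / MSE
# (mean square due to treatment over mean square due to error).
# The groups below are invented illustrative samples.

def one_way_anova_f(groups):
    """Return (F, numerator df, denominator df) for a list of samples."""
    k = len(groups)                    # number of groups
    n = sum(len(g) for g in groups)    # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # between-group (treatment) sum of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # within-group (error) sum of squares
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    mst = ss_between / (k - 1)         # numerator degrees of freedom: k - 1
    mse = ss_within / (n - k)          # denominator degrees of freedom: n - k
    return mst / mse, k - 1, n - k

groups = [[82, 85, 88, 90], [75, 78, 80, 79], [91, 93, 89, 94]]
f, df1, df2 = one_way_anova_f(groups)
print(f"F({df1}, {df2}) = {f:.2f}")
```

Consistent with the text, identical groups would give an F-ratio near 1 or below; a large F suggests real between-group differences, to be judged against the F-distribution with the two degrees-of-freedom values shown.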
ANOVA is helpful for testing three or more variables. It is similar to multiple two-sample t-tests. However, it results in fewer type I errors and is appropriate for a range of issues. ANOVA groups differences by comparing the means of each group and includes spreading out the variance into diverse sources. It is employed with subjects, test groups, between groups and within groups.

One-Way ANOVA Versus Two-Way ANOVA
There are two main types of ANOVA: one-way (or unidirectional) and two-way. There are also variations of ANOVA. For example, MANOVA (multivariate ANOVA) differs from ANOVA as the former tests for multiple dependent variables simultaneously while the latter assesses only one dependent variable at a time. One-way or two-way refers to the number of independent variables in your analysis of variance test. A one-way ANOVA evaluates the impact of a sole factor on a sole response variable. It determines whether all the samples are the same. The one-way ANOVA is used to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups.
A two-way ANOVA is an extension of the one-way ANOVA. With a one-way, you have one independent variable affecting a dependent variable. With a two-way ANOVA, there are two independents. For example, a two-way ANOVA allows a company to compare worker productivity based on two independent variables, such as salary and skill set. It is utilized to observe the interaction between the two factors and tests the effect of two factors at the same time.

How does ANOVA differ from a T test?
ANOVA differs from T tests in that ANOVA can compare three or more groups while T tests are only useful for comparing two groups at one time.

What is Analysis of Covariance (ANCOVA)?
Analysis of Covariance combines ANOVA and regression. It can be useful for understanding within-group variance that ANOVA tests do not explain.

Does ANOVA rely on any assumptions?
Yes, ANOVA tests assume that the data is normally distributed and that the levels of variance in each group are roughly equal. Finally, it assumes that all observations are made independently. If these assumptions are not accurate, ANOVA may not be useful for comparing groups.

The Bottom Line
ANOVA is a good way to compare more than two groups to identify relationships between them. The technique can be used in scholarly settings to analyze research or in the world of finance to try to predict future movements in stock prices. Understanding how ANOVA works and when it may be a useful tool can be helpful for advanced investors.

a. Post Hoc Test
A priori comparisons are performed before the data are collected, and post-hoc (or a posteriori) comparisons are done after the data have been collected. When the null hypothesis of an analysis of variance (ANOVA) model is rejected, post-hoc tests are used to identify the population means that are different.
Post hoc in Latin means ‘after this’. Simply put, a post-hoc analysis refers to a statistical analysis specified after a study has been concluded and the data collected. A post-hoc test is done to identify exactly which groups differ from each other. Therefore, such tests are also called multiple comparison tests.

Purpose of post-hoc test
When the null hypothesis is rejected in an omnibus test (a test that provides overall results for study data, e.g. ANOVA), it means that at least one parameter is significant. The question to be asked at this stage is: Which groups significantly differ from the others in terms of the mean?

Types of studies that commonly use post-hoc analysis
A post-hoc analysis can be conducted for proportions and frequencies, but it is mostly used for testing mean differences. The following types of research involve post-hoc analyses.
A. In any discipline, studies investigating differences between groups will use post-hoc tests when the null hypothesis of an ANOVA model is rejected.
Here is an example. A researcher wants to investigate differences in the effectiveness of TikTok, Instagram and Facebook influencers in promoting a nutraceutical brand. Let’s say that, by ANOVA, the null hypothesis (that all three influencer types have similar effectiveness) is rejected. A
post-hoc pairwise comparison may then reveal that Instagram influencers have a significantly higher effectiveness in promoting the brand than TikTok and Facebook influencers, while the latter two are similar.
B. In medicine, post-hoc analyses may be used in clinical trials if the original hypothesis does not hold (e.g. the primary outcome being the antidiabetic effect of a drug). Trialists then re-examine the dataset for other outcomes (not originally planned, e.g. improvement in renal outcomes in diabetes patients) and perform statistical analysis to determine other valuable results from the trial.
Note: For most clinical trials, the research questions and statistical tests must be defined before observing the research outcomes, even before the first patient is enrolled. Primary, secondary and exploratory outcome measures should be established beforehand, while post-hoc outcome measures can be specified after the trial has started. This ‘pre-registration’ avoids the practice of outcome switching (reporting something different from what was originally planned). Pre-specified and post-hoc outcome measures must be clearly indicated in the analysis section, in a way that makes it possible to readily distinguish between them.
C. Analyses of pooled data from completed trials comprise a type of post-hoc study as well.

a.1 Bonferroni Test
The Bonferroni test is a type of multiple comparison test used in statistical analysis. When performing a hypothesis test with multiple comparisons, eventually a result could occur that appears to demonstrate statistical significance in the dependent variable, even when there is none.
If a particular test, such as a linear regression, thus yields correct results 99% of the time, running the same regression on 100 different samples could lead to at least one false-positive result at some point. The Bonferroni test attempts to prevent data from incorrectly appearing to be statistically significant like this by making an adjustment during comparison testing.

Key Takeaways
• The Bonferroni test is a statistical test used to reduce the instance of a false positive.
• In particular, Bonferroni designed an adjustment to prevent data from incorrectly appearing to be statistically significant.
• An important limitation of Bonferroni correction is that it may lead analysts to miss actual significant results.

Understanding the Bonferroni Test
The Bonferroni test, also known as the “Bonferroni correction” or “Bonferroni adjustment”, suggests that the p-value for each test must be equal to its alpha divided by the number of tests performed.
The Bonferroni test is a multiple-comparison correction used when several dependent or independent statistical tests are being performed simultaneously. The reason is that while a given alpha value may be appropriate for each individual comparison, it is not appropriate for the set of all comparisons. In order to eliminate multiple spurious positives, the alpha value needs to be lowered to account for the number of comparisons being performed.
The test is named for the Italian mathematician who developed it, Carlo Emilio Bonferroni (1892–1960). Other types of multiple comparison tests include Scheffé’s test and the Tukey-Kramer method test. A criticism of the Bonferroni test is that it is too conservative and may fail to catch some significant findings.
In statistics, a null hypothesis is essentially the belief that there’s no statistical difference between two data sets being compared. Hypothesis testing involves testing a statistical sample to confirm or reject a null hypothesis. The test is performed by taking a random sample of a population or group. While the null hypothesis is tested, the alternative hypothesis is also tested, whereby the two results are mutually exclusive.
However, with any testing of a null hypothesis, there’s the expectation that a false-positive result could occur. This is formally called a Type I error, and as a result, an error rate that reflects the likelihood of a Type I error is assigned to the test. In other words, a certain percentage of the results will likely yield a false positive.

Using Bonferroni Correction
For example, an error rate of 5% might typically be assigned to a statistical test, meaning that 5% of the time there will likely be a false positive. This 5% error rate is called the alpha level. However, when many comparisons are being made in an analysis, the error rate for

AE | 2D 35
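The error-rate inflation and the alpha-divided-by-number-of-tests rule described above can be illustrated numerically. A minimal sketch (not part of the transcript; the function names are mine):

```python
# Family-wise error rate: the chance of at least one false positive
# across m independent tests, each run at significance level alpha.
def family_wise_error_rate(alpha: float, m: int) -> float:
    return 1 - (1 - alpha) ** m

# Bonferroni adjustment: divide alpha by the number of tests performed.
def bonferroni_alpha(alpha: float, m: int) -> float:
    return alpha / m

# One test at alpha = 0.05 risks a false positive 5% of the time,
# but 100 such tests make at least one false positive almost certain.
print(round(family_wise_error_rate(0.05, 100), 4))   # 0.9941

# The adjusted per-test alpha keeps the family-wise rate near 5%.
adjusted = bonferroni_alpha(0.05, 100)               # 0.0005
print(round(family_wise_error_rate(adjusted, 100), 4))
```

With two tests the same rule gives an adjusted error rate of 0.025 (.05/2), and with four tests 0.0125 (.05/4).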
Bonferroni designed his method of correcting for the increased error rates in hypothesis testing that had multiple comparisons. Bonferroni's adjustment is calculated by taking the number of tests and dividing it into the alpha value. Using the 5% error rate from our example, two tests would yield an error rate of 0.025 or (.05/2) while four tests would therefore have an error rate of .0125 or (.05/4). Notice that the error rate decreases as the number of tests increases.

a.2 Scheffe Test
The Scheffé test is used to make unplanned comparisons, rather than pre-planned comparisons, among group means in an analysis of variance (ANOVA) experiment. The Scheffé test has the advantage of giving the experimenter the flexibility to test any comparisons that appear interesting.
The Scheffe Test (also called Scheffe's procedure or Scheffe's method) is a post-hoc test used in Analysis of Variance. It is named for the American statistician Henry Scheffe. After you have run ANOVA and got a significant F-statistic (i.e. you have rejected the null hypothesis that the means are the same), then you run Scheffe's test to find out which pairs of means are significant. The Scheffe test corrects alpha for simple and complex mean comparisons. Complex mean comparisons involve comparing more than one pair of means simultaneously.
Out of the three mean comparison tests you can run (the other two are Fisher's LSD and Tukey's HSD), the Scheffe test is the most flexible, but it is also the test with the lowest statistical power. Deciding which test to run largely depends on what comparisons you're interested in:
• If you only want to make pairwise comparisons, run the Tukey procedure because it will have a narrower confidence interval.
• If you want to compare all possible simple and complex pairs of means, run the Scheffe test as it will have a narrower confidence interval.

Calculating the Test
Only run this test if you have rejected the null hypothesis in an ANOVA test, indicating that the means are not the same. Otherwise, the means are equal and so there is no point in running this test.
• The null hypothesis for the test is that all means are the same: H0: μi = μj.
• The alternate hypothesis is that the means are not the same: Ha: μi ≠ μj.
Like most statistical tests, you find a critical value and then compare it with a test statistic. Reject the null hypothesis if the Scheffe test statistic is greater than the critical value.
The steps that follow are for calculating simple pairwise comparisons by hand for four samples A, B, C and D. You'll need descriptive statistics data (the sample means) plus data from ANOVA (like mean squared error).
Sample Means for Samples ABCD:
• A = 36.00,
• B = 34.50,
• C = 35.60,
• D = 36.21.

ANOVA Data (from the ANOVA output)
• F-value = 3.24 (from the f-table, with 3 numerator degrees of freedom and 16 denominator degrees of freedom at an α of 0.05).
• Mean Squared Error: 0.028.

Steps
Step 1: Calculate the absolute values of pairwise differences between sample means. You'll have to figure out all the possible combinations. For four samples, there are 6 possible combinations of two: AB AC AD BC BD and CD. For example, for AB the absolute difference (|A-B|) is 36.00 – 34.50 = 1.50.
Step 2: Use the following formula to find a set of Scheffe formula values (you will use these in the next Step):
Scheffe value = √[(k – 1) · F(k – 1, n – k; α)] · √[MSE · (1/ni + 1/nj)]
Where:
• (k – 1) is the between samples degrees of freedom. For this set of data, that's 3 (4 – 1 = 3).
• F(k – 1, n – k; α) = the f-value (from ANOVA). For this set of data, the f-value is 3.24.
• MSE = mean square error (from ANOVA).
• ni, nj = the sizes of the two groups being compared.
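Steps 1 and 2 can be sketched in code with the example's sample means, the group sizes given with the example (A = 5, B = 6, C = 4, D = 5), and the ANOVA output; a pair is flagged significant when its absolute difference exceeds its comparison value:

```python
import math
from itertools import combinations

# Example data from the lecture: four sample means and group sizes,
# plus the ANOVA output (critical F-value and mean squared error).
means = {"A": 36.00, "B": 34.50, "C": 35.60, "D": 36.21}
sizes = {"A": 5, "B": 6, "C": 4, "D": 5}
k, f_value, mse = 4, 3.24, 0.028

for g1, g2 in combinations(sorted(means), 2):
    diff = abs(means[g1] - means[g2])                    # Step 1
    critical = math.sqrt((k - 1) * f_value) * math.sqrt(
        mse * (1 / sizes[g1] + 1 / sizes[g2]))           # Step 2
    verdict = "significant" if diff > critical else "not significant"
    print(f"{g1}{g2}: diff={diff:.2f} critical={critical:.2f} {verdict}")
```

Run with these numbers, only the AD pair (difference 0.21 against a comparison value of about 0.33) comes out not significant, matching the conclusion reached in Step 3 of the notes.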
It's easier than it looks! Once you've found the df and MSE from your ANOVA output, you only need to change the sample sizes in the formula for each comparison. (The sample sizes are A = 5, B = 6, C = 4, D = 5).
For example, comparing A and B we get: [calculation not reproduced in the transcript]
And comparing A and C we get: [calculation not reproduced in the transcript]
Step 3: Compare the values in Step 1 to the values in Step 2 and find any values that are larger than the comparison values. Larger values are statistically significant at your chosen alpha level (in this case, 0.05). For this example, the only value that isn't significant is AD, because 0.21 is smaller than 0.33.

a.3 Fisher LSD
Fisher's least significant difference (LSD) procedure is a two-step testing procedure for pairwise comparisons of several treatment groups.
A one-way ANOVA is used to determine whether or not there is a statistically significant difference between the means of three or more independent groups. The hypotheses used in an ANOVA are as follows:
H0: The means are equal for each group.
HA: At least one of the means is different from the others.
If the p-value from the ANOVA is less than some significance level (like α = .05), we can reject the null hypothesis and conclude that at least one of the group means is different from the others. But in order to find out exactly which groups are different from each other, we must conduct a post-hoc test. One commonly used post-hoc test is Fisher's least significant difference test.
To perform this test, we first calculate the following test statistic:
LSD = t.025, DFw · √[MSW · (1/n1 + 1/n2)]
where:
• t.025, DFw: The t-critical value from the t-distribution table with α = .025, where DFw is the degrees of freedom within groups from the ANOVA table.
• MSW: The mean squares within groups from the ANOVA table.
• n1, n2: The sample sizes of each group.
We can then compare the mean difference between each group to this test statistic. If the absolute value of the mean difference between two groups is greater than the test statistic, we can declare that there is a statistically significant difference between the group means.
The following example shows how to perform Fisher's least significant difference test in practice.

Example: Fisher's Least Significant Difference Test
Suppose a professor wants to know whether or not three different studying techniques lead to different exam scores among students. To test this, she randomly assigns 10 students to use each studying technique and records their exam scores.
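The LSD decision rule above can be sketched in code. Because the example's exam-score table and ANOVA output appear only as images in the transcript, the t-critical value, MSW and group means below are hypothetical stand-ins, not the professor's actual data:

```python
import math

# Hypothetical stand-in values (the worked example's tables are images):
t_crit = 2.052          # t-critical value, alpha = .025, DFw = 27
msw = 18.5              # mean squares within groups (ANOVA table)
n1 = n2 = 10            # students per studying technique

# Fisher's least significant difference for a pair of groups.
lsd = t_crit * math.sqrt(msw * (1 / n1 + 1 / n2))

# A pair of group means differs significantly when the absolute
# mean difference exceeds the LSD value.
def differs(mean1: float, mean2: float) -> bool:
    return abs(mean1 - mean2) > lsd

print(round(lsd, 2))
print(differs(80.0, 85.5))   # True: the gap of 5.5 exceeds the LSD
print(differs(85.5, 86.0))   # False: the gap of 0.5 is below the LSD
```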
The following table shows the exam scores for each student based on the studying technique they used: [table not reproduced in the transcript]
The professor performs a one-way ANOVA and gets the following results: [ANOVA output not reproduced in the transcript]
Since the p-value in the ANOVA table (.018771) is less than .05, we can conclude that not all of the mean exam scores between the three groups are equal. Thus, we can proceed to perform Fisher's least significant difference test to determine which group means are different.
Using values from the output of the ANOVA, we can calculate Fisher's test statistic as: [calculation not reproduced in the transcript]
We can then calculate the absolute mean difference between each group: [differences not reproduced in the transcript]

Interpretation of the Result
The absolute mean differences between technique 1 vs. technique 2 and technique 1 vs. technique 3 are greater than Fisher's test statistic, thus we can conclude that these techniques lead to statistically significantly different mean exam scores. We can also conclude that there is no significant difference in mean exam scores between technique 2 and technique 3.

a.4 Tukey HSD/Tukey's Range Test
Tukey's range test, also known as Tukey's test, Tukey method, Tukey's honest significance test, or Tukey's HSD (honestly significant difference) test, is a single-step multiple comparison procedure and statistical test. It can be used to find means that are significantly different from each other.
Named after John Tukey, it compares all possible pairs of means, and is based on a studentized range distribution (q) (this distribution is similar to the distribution of t from the t-test; see below).
Tukey's test compares the means of every treatment to the means of every other treatment; that is, it applies simultaneously to the set of all pairwise comparisons and identifies any difference between two means that is greater than the expected standard error. The confidence coefficient for the set, when all sample sizes are equal, is exactly 1 − α for any 0 ≤ α ≤ 1. For unequal sample sizes, the confidence coefficient is greater than 1 − α. In other words, the Tukey method is conservative when there are unequal sample sizes.
This test is often followed by the Compact Letter Display (CLD) statistical procedure to render the output of this test more transparent to non-statistician audiences.

Assumptions
1. The observations being tested are independent within and among the groups.
2. The groups associated with each mean in the test are normally distributed.
3. There is equal within-group variance across the groups associated with each mean in the test (homogeneity of variance).
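A small sketch (mine, not the transcript's) of the idea that Tukey's test covers the set of all pairwise comparisons at once: for k groups there are k(k − 1)/2 pairs, and a pair is flagged when its gap exceeds a critical value. The critical gap below is an assumed placeholder; in practice it is derived from the studentized range distribution q.

```python
from itertools import combinations

# Number of pairwise comparisons Tukey's test handles simultaneously.
def n_pairwise(k: int) -> int:
    return k * (k - 1) // 2

print(n_pairwise(3), n_pairwise(5))  # 3 10

# Flag every pair whose mean difference exceeds a critical gap
# (hypothetical value here, for illustration only).
group_means = {"T1": 80.0, "T2": 85.5, "T3": 86.0}
critical_gap = 4.0  # assumed placeholder, not a tabulated q value
for g1, g2 in combinations(group_means, 2):
    if abs(group_means[g1] - group_means[g2]) > critical_gap:
        print(f"{g1} vs {g2}: significantly different")
```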
The test statistic
Tukey's test is based on a formula very similar to that of the t-test. In fact, Tukey's test is essentially a t-test, except that it corrects for family-wise error rate. The formula for Tukey's test is
qs = (YA − YB) / SE
where YA is the larger of the two means being compared, YB is the smaller of the two means being compared, and SE is the standard error of the sum of the means.
This qs value can then be compared to a q value from the studentized range distribution. If the qs value is larger than the critical value qα obtained from the distribution, the two means are said to be significantly different at level α: 0 ≤ α ≤ 1.
Since the null hypothesis for Tukey's test states that all means being compared are from the same population (i.e. μ1 = μ2 = μ3 = ... = μk), the means should be normally distributed (according to the central limit theorem). This gives rise to the normality assumption of Tukey's test.

The studentized range (q) distribution
The Tukey method uses the studentized range distribution. Suppose that we take a sample of size n from each of k populations with the same normal distribution N(μ, σ²) and suppose that ymin is the smallest of these sample means and ymax is the largest of these sample means, and suppose S² is the pooled sample variance from these samples. Then the following random variable has a Studentized range distribution:
q = (ymax − ymin) / √(S²/n)
This value of q is the basis of the critical value of q, based on three factors:
1. α (the Type I error rate, or the probability of rejecting a true null hypothesis)
2. k (the number of populations)
3. df (the number of degrees of freedom (N – k) where N is the total number of observations)
The distribution of q has been tabulated and appears in many textbooks on statistics. In some tables the distribution of q has been tabulated without the √2 factor. To understand which table it is, we can compute the result for k = 2 and compare it to the result of the Student's t-distribution with the same degrees of freedom and the same α. In addition, R offers a cumulative distribution function (ptukey) and a quantile function (qtukey) for q.

Confidence limits
The Tukey confidence limits for all pairwise comparisons with confidence coefficient of at least 1 − α are [formula not reproduced in the transcript], where ni and nj are the sizes of groups i and j respectively. The degrees of freedom for the whole design is also applied.

Comparing ANOVA and Tukey–Kramer tests
Both ANOVA and Tukey–Kramer tests are based on the same assumptions. However, these two tests for k groups (i.e. μ1 = μ2 = ... = μk) may result in logical contradictions when k > 2, even if the assumptions hold. It is possible to generate a set of pseudorandom samples of strictly positive measure such that the hypothesis μ1 = μ2 is rejected at significance level 1 − α > 0.95 while μ1 = μ2 = μ3 is not rejected even at 1 − α = 0.975.

a.5 Dunnett Test
Dunnett's Test (also called Dunnett's Method or Dunnett's Multiple Comparison) compares means from several experimental groups against a control group mean to see if there is a difference. When an ANOVA test has significant findings, it doesn't report which pairs of means are different. Dunnett's can be used after the ANOVA has been run to identify the pairs with significant differences.
One fixed "control" group is compared to all of the other samples, so it should only be used when you have a control group. If you don't have a control group, use Tukey's Test.

Calculation
As Dunnett's compares two groups, it acts similarly to a t-test. The following formula gives you a value that you can use to compare mean differences. The formula is:
DDunnett = tDunnett · √(2 · MSwithin / n)
Step 1: Look up the tDunnett critical value in the Dunnett critical value table. You'll need:
• Your chosen alpha level (usually 5%),
• Sample size, n,
• Degrees of freedom from the ANOVA "Within Groups" output. This is the "n" in the left hand column of the table.
For example, let's say you had an alpha level of 5%, a sample size of 5 and 20 df. The Dunnett critical value is 2.65. Plug the value into the formula.
Step 2: Find the mean squares (MS) of the within group in the ANOVA source table. Plug that value into the above formula. For example, let's say your MS within is 15.39.
Step 3: Find "n", the number in one group. For this example, that's 5.
Solving the formula, we get:
DDunnett = 2.65 * 2.481 = 6.575.
The answer (6.575) is the critical distance between means. If the distance between a control group mean and an experimental group mean is greater than 6.575, then that distance is significant. For example, let's say you had a control group mean of 52 and three experimental group means of 55, 64 and 72. The differences between means (absolute values) are:
• Group 1: 55 – 52 = 3
• Group 2: 64 – 52 = 12
• Group 3: 72 – 52 = 20

Interpretation of the Result
Only groups 2 and 3 exceed the Dunnett's distance of 6.575, so they are the significant pairs.
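The Dunnett walk-through above can be reproduced in code with the transcript's own numbers (critical value 2.65, MS within 15.39, n = 5) and the formula DDunnett = tDunnett · √(2 · MSwithin / n):

```python
import math

# Values from the worked example in the notes.
t_dunnett = 2.65     # Dunnett critical value (alpha 5%, n = 5, 20 df)
ms_within = 15.39    # mean squares within, from the ANOVA source table
n = 5                # number of observations in one group

# Critical distance between a treatment mean and the control mean.
d_dunnett = t_dunnett * math.sqrt(2 * ms_within / n)
print(round(d_dunnett, 3))   # 6.575

# Control mean of 52 against the three experimental group means.
control = 52
for name, mean in {"Group 1": 55, "Group 2": 64, "Group 3": 72}.items():
    distance = abs(mean - control)
    verdict = "significant" if distance > d_dunnett else "not significant"
    print(name, distance, verdict)
```

As in the notes, only groups 2 and 3 (distances 12 and 20) exceed the critical distance of 6.575.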
