Download as pdf or txt
Download as pdf or txt
You are on page 1of 35

Confounding

variables
E X P E R I M E N TA L D E S I G N I N P Y T H O N

Luke Hayden
Instructor
Obvious conclusion?
print(p9.ggplot(df)+ p9.aes(x= 'Team', y= 'Weight')+ p9.geom_boxplot())

EXPERIMENTAL DESIGN IN PYTHON


Maybe not...
print(p9.ggplot(df)+ p9.aes(x= 'Team', y= 'Weight', fill="Event")+ p9.geom_boxplot())

EXPERIMENTAL DESIGN IN PYTHON


Interpretation
Possible options:

1. Differences due to country

2. Differences due to event

3. Differences due to both event & country

Hard to choose between these

EXPERIMENTAL DESIGN IN PYTHON


Confounding variables
Confounding variable

Extra variable that is not accounted for

Another variable that also affects your variable of interest

Example

Expensive cars and higher test scores in school

Both linked to family income

EXPERIMENTAL DESIGN IN PYTHON


Let's practice!
E X P E R I M E N TA L D E S I G N I N P Y T H O N
Blocking &
Randomization
E X P E R I M E N TA L D E S I G N I N P Y T H O N

Luke Hayden
Instructor
Making comparisons
Compare apples with apples

Only variable of interest should differ between groups

Remove sources of variation

See variation of interest

EXPERIMENTAL DESIGN IN PYTHON


Random sampling
Simple way to assign to treatments

import pandas as pd
from scipy import stats

seed= 1916
subset_A = df[df.Sample == "A"].sample(n= 30, random_state= seed)
subset_B = df[df.Sample == "B"].sample(n= 30, random_state= seed)

t_result = stats.ttest_ind(subset_A.value, subset_B.value)

EXPERIMENTAL DESIGN IN PYTHON


Other sources of variation
Example

Two potato varieties (Roosters & Records)

Two fertilizers tested (A & B)

Variety effect could confound

EXPERIMENTAL DESIGN IN PYTHON


Blocking
Solution to confounding Design

Control for confounding by balancing in Variety Fertilizer A Fertilizer B


respect of other variable
Records 10 10
In example
Roosters 10 10
10 elds of each variety treated with each
fertilizer

EXPERIMENTAL DESIGN IN PYTHON


Implementing a blocked design
import pandas as pd

block1 = df[(df.Variety == "Roosters") ].sample(n=15, random_state= seed)


block2 = df[(df.Variety == "Records") ].sample(n=15, random_state= seed)

fertAtreatment = pd.concat([block1, block2])

EXPERIMENTAL DESIGN IN PYTHON


Paired samples
Special case 2017 yield 2018 yield

Control for individual variation 60.2 63.2


Increase statistical power by reducing noise
12 15.6
Example
13.8 14.8
Yield of 5 elds before/after change of
91.8 96.7
fertilizer
50 53

EXPERIMENTAL DESIGN IN PYTHON


Implementing a paired t-test
from scipy import stats

yields2018= [60.2, 12, 13.8, 91.8, 50]


yields2019 = [63.2, 15.6, 14.8, 96.7, 53]

ttest = stats.ttest_rel(yields2018,yields2019)

print(ttest[1])

p-value:

0.007894143467973484

EXPERIMENTAL DESIGN IN PYTHON


Let's practice!
E X P E R I M E N TA L D E S I G N I N P Y T H O N
ANOVA
E X P E R I M E N TA L D E S I G N I N P Y T H O N

Full Name
Instructor
Factors
Independent variables (Factors)

Experimentally manipulated/examined

Dependent variables

Try to understand their patterns

Approaches

t-test: one dependent variable, one discrete independent variable with two levels

How to deal with more complexity?

EXPERIMENTAL DESIGN IN PYTHON


ANOVA
Analysis of variance

Use

Generalize t-test to broader context

Approach

Partition variation

Multiple simultaneous tests

EXPERIMENTAL DESIGN IN PYTHON


One-way ANOVA
Use

One factor with 3+ levels

Does factor affect sample mean?

Example:

Do these three fertilizers yield different


mean potato production?

EXPERIMENTAL DESIGN IN PYTHON


Implementing a one-way ANOVA
from scipy import stats

array_fertA = df[df.Fertilizer == "A"].Production


array_fertB = df[df.Fertilizer == "B"].Production
array_fertC = df[df.Fertilizer == "C"].Production

anova = stats.f_oneway(array_fertA, array_fertB, array_fertC)

print(anova[1])

0.00

EXPERIMENTAL DESIGN IN PYTHON


Two-way ANOVA
Use

Two factors with 2+ levels

Does each factor explain variation in the


dependent variable?

Example

Potato production experiment:


2 varieties, 2 fertilizers

Examine effect on production

EXPERIMENTAL DESIGN IN PYTHON


Implementing a two-way ANOVA
import statsmodels as sm

formula = 'Production ~ Fertilizer + Variety'


model = sm.api.formula.ols(formula, data=df).fit()

aov_table = sm.api.stats.anova_lm(model, typ=2)


print(aov_table)

sum_sq df F PR(>F)

Fertilizer 1.0 p-value

Variety 1.0 p-value

Residual NaN NaN

EXPERIMENTAL DESIGN IN PYTHON


Example

EXPERIMENTAL DESIGN IN PYTHON


Example output
sum_sq df F PR(>F)

Fertilizer 16268.819755 1.0 15860.400389 0.0

Variety 16080.781810 1.0 15677.083028 0.0

Residual 4098.900541 3996 NaN NaN

EXPERIMENTAL DESIGN IN PYTHON


Let's practice!
E X P E R I M E N TA L D E S I G N I N P Y T H O N
Interactive effects
E X P E R I M E N TA L D E S I G N I N P Y T H O N

Luke Hayden
Instructor
Additive model

EXPERIMENTAL DESIGN IN PYTHON


Interactive effects

EXPERIMENTAL DESIGN IN PYTHON


Implementing ANOVA with interactive effects
import statsmodels as sm
formula = 'Production ~ Fertilizer + Variety + Fertilizer:Variety'
model = sm.api.formula.ols(formula, data=df).fit()
aov_table = sm.api.stats.anova_lm(model, typ=2)
print(aov_table)

sum_sq df F PR(>F)

Fertilizer 1.0 p-value

Variety 1.0 p-value

Fertilizer:Variety 1.0 p-value

Residual NaN NaN

EXPERIMENTAL DESIGN IN PYTHON


Example 1

EXPERIMENTAL DESIGN IN PYTHON


Interactive effect
sum_sq df F PR(>F)

Fertilizer 56279.045164 1.0 56804.326681 0.0

Variety 56213.809280 1.0 56738.481917 0.0

Fertilizer:Variety 55215.630416 1.0 55730.986532 0.0

Residual 3959.048150 3996 NaN NaN

EXPERIMENTAL DESIGN IN PYTHON


Example 2

EXPERIMENTAL DESIGN IN PYTHON


No interactive effect
sum_sq df F PR(>F)

Fertilizer 15889.934273 1.0 15613.185329 0.0

Variety 16132.350924 1.0 15851.379901 0.0

Fertilizer:Variety 0.908174 1.0 0.892357 0.344897

Residual 4066.830440 3996 NaN NaN

EXPERIMENTAL DESIGN IN PYTHON


Beyond 2-way ANOVA
Two-way ANOVA Three-way ANOVA

formula = 'Production ~ formula = 'Production ~ Fertilizer +


Fertilizer + Variety + Fertilizer:Variety' Variety + Season + Fertilizer:Variety +
Fertilizer:Season + Variety:Season +

3 variables, 3 p-values Fertilizer:Variety:Season'

7 variables, 7 p-values

Complex!

EXPERIMENTAL DESIGN IN PYTHON


Let's practice!
E X P E R I M E N TA L D E S I G N I N P Y T H O N

You might also like