Assumptions in Hypothesis Testing: James Chapman
HYPOTHESIS TESTING IN PYTHON
James Chapman
Curriculum Manager, DataCamp
Randomness
Assumption
The samples are random subsets of larger
populations
Consequence
Sample is not representative of population
Consequence
Increased chance of false negative/positive error
Consequence
Wider confidence intervals
One sample: n ≥ 30. Two samples: n₁ ≥ 30, n₂ ≥ 30.
One sample: n × p̂ ≥ 10. Two samples: n₁ × p̂₁ ≥ 10.
Revisit data collection to check for randomness, independence, and sample size
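The sample-size thresholds above can be turned into a quick sanity check. The sketch below is illustrative only: the helper names `large_enough_for_mean` and `large_enough_for_proportion` are hypothetical, and it checks only the conditions that appear on the slide (at least 30 observations per sample, at least 10 expected successes).

```python
def large_enough_for_mean(*sample_sizes, minimum=30):
    """Return True when every sample has at least `minimum` observations."""
    return all(n >= minimum for n in sample_sizes)


def large_enough_for_proportion(n, p_hat, minimum=10):
    """Return True when the expected number of successes, n * p_hat, reaches `minimum`."""
    return n * p_hat >= minimum


# Hypothetical sample sizes and proportions, chosen only for illustration
print(large_enough_for_mean(45))               # one sample
print(large_enough_for_mean(45, 12))           # two samples, the second is too small
print(large_enough_for_proportion(200, 0.03))  # only 6 expected successes
```

If a check like this fails, a non-parametric test may be the safer choice.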
Parametric tests
z-test, t-test, and ANOVA are all parametric tests
alpha = 0.01
import pingouin
pingouin.ttest(x=repub_votes_potus_08_12_small['repub_percent_08'],
               y=repub_votes_potus_08_12_small['repub_percent_12'],
               paired=True,
               alternative="less")
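To see what a paired t-test works with, the statistic can be computed by hand from the per-row differences. This is a minimal sketch using only the standard library. The `before` and `after` values are invented for illustration and are not the election data used above.

```python
import math
import statistics

# Hypothetical paired percentages, invented for illustration only
before = [50.0, 62.5, 48.0, 71.0, 55.5]
after = [53.0, 63.0, 52.5, 70.0, 59.0]

# A paired test works on the per-row differences
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)

# t statistic under the null hypothesis that the mean difference is zero
t_stat = mean_diff / (sd_diff / math.sqrt(n))
dof = n - 1

print(t_stat, dof)
```

With `alternative="less"`, a strongly negative t statistic is evidence that the first column tends to be smaller than the second.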
repub_votes_small['diff'] = (repub_votes_small['repub_percent_08'] -
                             repub_votes_small['repub_percent_12'])
print(repub_votes_small)
repub_votes_small['abs_diff'] = repub_votes_small['diff'].abs()
print(repub_votes_small)
Incorporate the sum of the ranks for negative and positive differences
import numpy as np
T_minus = 1 + 4 + 5 + 2 + 3
T_plus = 0
W = np.min([T_minus, T_plus])
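The ranking steps above can be sketched end to end in plain Python. The differences below are hypothetical values chosen so that their absolute ranks come out as 1, 4, 5, 2, 3, matching the rank sums shown above. The helper `average_ranks` is an illustrative name, and dropping zero differences is one common convention rather than the only option.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the block
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


# Hypothetical paired differences (all negative), invented for illustration
diffs = [-0.5, -3.0, -4.2, -1.1, -2.4]

nonzero = [d for d in diffs if d != 0]            # assumption: zero differences are dropped
ranks = average_ranks([abs(d) for d in nonzero])  # rank the absolute differences

T_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
T_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
W = min(T_minus, T_plus)

print(ranks, T_minus, T_plus, W)
```

Because every difference is negative, all of the rank mass falls in `T_minus` and `W` is zero, the most extreme value the statistic can take.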
Wilcoxon-Mann-Whitney test
Also known as the Mann-Whitney U test
age_vs_comp_wide = age_vs_comp.pivot(columns='age_first_code_cut',
                                    values='converted_comp')
import pingouin
pingouin.mwu(x=age_vs_comp_wide['child'],
             y=age_vs_comp_wide['adult'],
             alternative='greater')
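One way to build intuition for the U statistic is to count, across every cross-group pair, how often a value from the first group exceeds a value from the second. The sketch below is not how pingouin computes the test. The function name `mann_whitney_u` and the compensation figures are hypothetical.

```python
def mann_whitney_u(x, y):
    """U statistic for x: count of (x, y) pairs with x > y, with ties counted as 0.5."""
    u = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                u += 1.0
            elif xi == yi:
                u += 0.5
    return u


# Hypothetical compensation values (in thousands), invented for illustration
child = [72, 95, 110, 130]
adult = [60, 80, 85, 100, 105]

u_child = mann_whitney_u(child, adult)
u_adult = mann_whitney_u(adult, child)

# The two U values always add up to the total number of cross-group pairs
print(u_child, u_adult, len(child) * len(adult))
```

A `u_child` well above half the number of pairs suggests that values in the first group tend to be larger, which is the direction tested by `alternative='greater'`.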
alpha=0.01
pingouin.kruskal(data=stack_overflow,
                 dv='converted_comp',
                 between='job_sat')
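The Kruskal-Wallis test extends the rank-based idea to more than two groups. A minimal sketch of the H statistic follows, assuming all values are distinct so that no tie correction is needed. The function name `kruskal_h`, the group labels, and the data are hypothetical.

```python
def kruskal_h(groups):
    """Kruskal-Wallis H statistic without tie correction (assumes distinct values)."""
    pooled = sorted(value for group in groups for value in group)
    rank_of = {value: position + 1 for position, value in enumerate(pooled)}
    n_total = len(pooled)
    # Sum over groups of (rank sum squared) / (group size)
    weighted = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical compensation values per satisfaction level, invented for illustration
samples = {
    "low": [48, 52, 55],
    "medium": [60, 63, 70],
    "high": [75, 81, 90],
}

h_stat = kruskal_h(list(samples.values()))
dof = len(samples) - 1  # H is compared with a chi-square distribution on k - 1 degrees of freedom

print(h_stat, dof)
```

Here the groups are perfectly separated by rank, so H is as large as it can be for three groups of three observations.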
Course recap
Chapter 1 Chapter 3
Chapter 2 Chapter 4
Bayesian statistics
Bayesian Data Analysis in Python
Applications
Customer Analytics and A/B Testing in Python