P Valuejaw 2
Jeff Witmer
21 April 2016
P-value ≠ Pr(H0 is true)!
Pr(A|B) ≠ Pr(B|A)
(Pr(B|A) is read “probability of B given A”)
Pr(data|H0) ≠ Pr(H0|data)
Pr(clouds|rain) ≠ Pr(rain|clouds)
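A small worked example of why Pr(data|H0) and Pr(H0|data) differ, using made-up numbers. Assume that 90% of the null hypotheses being tested are actually true, that tests use alpha = 0.05, and that power is 0.80. Bayes' rule then gives Pr(H0 true | H0 rejected), which is far from Pr(H0 rejected | H0 true) = 0.05. This sketch is about reject/retain decisions rather than one specific p-value, and the function name is hypothetical.

```python
def prob_h0_given_reject(prior_h0: float, alpha: float, power: float) -> float:
    """Pr(H0 true | H0 rejected), by Bayes' rule.

    prior_h0: assumed share of tested hypotheses for which H0 is true
    alpha:    Pr(reject H0 | H0 true), the Type I error rate
    power:    Pr(reject H0 | H0 false)
    """
    p_reject = prior_h0 * alpha + (1 - prior_h0) * power  # total probability of rejecting
    return prior_h0 * alpha / p_reject


# Hypothetical inputs: 90% of tested nulls are true, alpha = 0.05, power = 0.80.
print(round(prob_h0_given_reject(0.9, 0.05, 0.80), 2))  # 0.36
```

With these inputs more than a third of all rejections are rejections of a true H0, even though each test has only a 5% Type I error rate.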
Doing a hypothesis test means making a decision.
              Reject H0       Retain H0
H0 true       Type I error    OK
H0 false      OK              Type II error
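A minimal simulation sketch of the decision table: draw many samples of size 30 from a normal population with known standard deviation 1, run a two-sided z-test of H0: mean = 0, and count how often H0 is rejected. The sample size, the z-test with known sigma, and the helper names are assumptions chosen to keep the code free of third-party libraries.

```python
import math
import random


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0, assuming sigma is known (a z-test)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def rejection_rate(true_mean, n=30, alpha=0.05, reps=10_000, seed=1):
    """Fraction of simulated studies that reject H0: mean == 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("Type I error rate (H0 true):", rejection_rate(0.0))
    print("Type II error rate (true mean 0.5):", 1 - rejection_rate(0.5))
```

When H0 is true, every rejection is a Type I error, so `rejection_rate(0.0)` should land near alpha = 0.05. When H0 is false, every retention is a Type II error, so `1 - rejection_rate(0.5)` estimates the Type II error rate for that particular alternative.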
Examples (?)
George stands trial for a crime (e.g., burglary).
What is H0, Ha? Type I error? Type II error?
H0: George is innocent
Ha: He is guilty
Type I error: innocent man convicted
Type II error: guilty man set free
Note: “not guilty” ≠ “innocent”
Note: A grand jury might have looked at several possible
defendants and only agreed to let the DA bring forward
George’s case. I.e., George was not chosen randomly to
stand trial. If we were to randomly choose defendants, then we
would make lots of Type I errors over the many trials.
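Rough arithmetic behind the grand-jury note, with invented rates: assume a trial convicts an innocent defendant 2% of the time and a guilty defendant 80% of the time. Compare a screened pool of defendants (assumed 20% innocent) with defendants chosen at random (assumed 99.9% innocent). The function name and every rate here are hypothetical.

```python
def conviction_summary(n, frac_innocent, p_convict_innocent, p_convict_guilty):
    """Expected wrongful convictions (Type I errors) and their share of all convictions.

    All rates are hypothetical inputs, not estimates of any real court system.
    """
    innocent = n * frac_innocent
    guilty = n - innocent
    wrongful = innocent * p_convict_innocent   # innocent people convicted
    rightful = guilty * p_convict_guilty       # guilty people convicted
    return wrongful, wrongful / (wrongful + rightful)


for label, frac in [("screened by grand jury", 0.20), ("chosen at random", 0.999)]:
    wrongful, share = conviction_summary(1000, frac, 0.02, 0.80)
    print(f"{label}: {wrongful:.1f} wrongful convictions, {share:.1%} of all convictions")
```

Under these assumptions, 1000 screened defendants yield about 4 wrongful convictions (under 1% of convictions), while 1000 random defendants yield about 20, and roughly 96% of all convictions are wrongful.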
Susan goes to her doctor because she thinks she is ill.
What is H0, Ha? Type I error? Type II error?
H0: Susan is well
Ha: She is sick
Fred reads that aliens landed at Roswell, NM in 1947.
Should he believe this?
What is H0, Ha? Type I error? Type II error?
H0: No such thing happened
Ha: There is a conspiracy of silence
Expert witness work
Consider the question asked, then give one of
the six acceptable answers:
Yes
No
I don’t know
I don’t remember
Could you please repeat the question?
Green
Q: What color was the car? A: Green
Not “The car was a green Honda with a sunroof, NY license plates, and the radio was blaring.”
Hypothesis test
No matter what question you wish the test
would answer, a hypothesis test only answers
one question.
Q: Are the data consistent with the model (such that any deviation from the model could reasonably have happened by chance)? A: Yes (or No)
Not “This model is probably true.”
Not “The effect of the drug is large.”
Not “People should care about the difference I have found.”
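A permutation test makes the one question concrete. The model is "group labels don't matter"; the test asks how often shuffling the labels produces a difference in means at least as large as the observed one. The data below are made up, and the permutation test is only one example of a hypothesis test, chosen here because it needs no distributional formulas.

```python
import random
from statistics import mean


def permutation_p_value(a, b, reps=10_000, seed=0):
    """Two-sided Monte Carlo permutation test of 'no difference between groups'."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # reassign group labels at random, as the model allows
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Count the observed labelling itself, so the estimate is never exactly 0.
    return (extreme + 1) / (reps + 1)


# Made-up measurements for two groups of six.
group_a = [12, 15, 11, 14, 16, 13]
group_b = [10, 9, 12, 11, 8, 10]
print(permutation_p_value(group_a, group_b))
```

For these made-up data the estimate is small (around 0.01), so the answer to the one question is "No, the data are not consistent with the no-difference model." The output says nothing about whether the difference is large or whether anyone should care about it.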
See the Dance of the P-values
https://www.youtube.com/watch?v=ez4DgdurRPg
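A rough imitation of the idea behind the video, not a reproduction of it: run the identical two-group experiment many times and watch the p-value move. The settings (true effect 0.5 SD, 32 per group, known SD of 1, a z-test) are assumptions for illustration.

```python
import math
import random


def one_study_p_value(rng, effect=0.5, n=32):
    """Two-sided p-value from one simulated two-group study (z-test, sigma = 1 known)."""
    g1 = [rng.gauss(effect, 1.0) for _ in range(n)]
    g2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    se = math.sqrt(2.0 / n)  # SE of the difference in means when sigma = 1
    z = (sum(g1) / n - sum(g2) / n) / se
    return math.erfc(abs(z) / math.sqrt(2))


def dance(replications=25, seed=2016):
    """p-values from repeated, identical experiments."""
    rng = random.Random(seed)
    return [one_study_p_value(rng) for _ in range(replications)]


if __name__ == "__main__":
    ps = dance()
    print(" ".join(f"{p:.3f}" for p in ps))
    print(f"share with p < 0.05: {sum(p < 0.05 for p in ps) / len(ps):.2f}")
```

With these settings the power is only about 50%, so replications of the same experiment routinely give p-values anywhere from well below 0.01 to well above 0.2.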
Consider testing whether an effect is zero.
                      mean   SE   H0?
Group 1                25    10   Reject
Group 2                10    10   Retain
Group 1 vs Group 2     15    14   Retain!
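The table can be checked with a normal approximation. This assumes the two groups are independent, so the SE of the difference is sqrt(10² + 10²) ≈ 14.1 (the table rounds to 14); the helper name is hypothetical.

```python
import math


def two_sided_p(estimate, se):
    """Two-sided p-value for H0: true value = 0, using a normal approximation."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


g1_mean, g1_se = 25, 10
g2_mean, g2_se = 10, 10
diff = g1_mean - g2_mean
diff_se = math.sqrt(g1_se**2 + g2_se**2)  # assumes the two groups are independent

rows = [("Group 1", g1_mean, g1_se),
        ("Group 2", g2_mean, g2_se),
        ("Group 1 vs Group 2", diff, diff_se)]
for label, est, se in rows:
    print(f"{label:20s} z = {est / se:5.2f}  p = {two_sided_p(est, se):.3f}")
```

This gives z of 2.50, 1.00, and 1.06, with p-values of roughly 0.012, 0.317, and 0.289. "Group 1 is significant and Group 2 is not" does not imply that the two groups differ significantly from each other.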
Two (different) Ideas
Publication bias
One study looked at 10 years of papers
indexed in PubMed and identified 4970
observational studies of medical treatments.
82% of them had statistically significant results
at the 0.05 level.
Another study looked at 1046 research
articles in three clinical psychology journals.
86% of them used statistical tests; 94% of
these rejected H0 at the 0.05 level.
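A toy calculation of how a publication filter can produce such high rates. All numbers are assumed, not taken from the studies above: half of the hypotheses tested are truly null, alpha = 0.05, power = 0.5, every significant result is published, and only 10% of non-significant results are.

```python
def published_share_significant(frac_null, alpha, power, p_publish_nonsig):
    """Expected share of *published* studies that report p < alpha.

    Assumes every significant result is published and only a fraction
    p_publish_nonsig of non-significant results are.
    """
    p_sig = frac_null * alpha + (1 - frac_null) * power  # share of studies that are significant
    published_nonsig = (1 - p_sig) * p_publish_nonsig
    return p_sig / (p_sig + published_nonsig)


print(f"{published_share_significant(0.5, 0.05, 0.5, 0.1):.1%}")
```

Under these assumptions only 27.5% of the studies actually run are significant, yet about 79% of the published ones are.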
Ben Goldacre TEDMED talk
2005 paper in PLoS Medicine
2013 paper, Statistics in Medicine
“Our experiment provides evidence that the majority of
Garden of Forking Paths
(“researcher degrees of freedom”)
See Gelman and Loken (2014), American
Scientist, vol. 102, no. 6, page 460+
Do I include that outlier?
Should I do a separate analysis for women?
What about for people over age 40?
It makes sense to exclude participants we later
found out are not native English speakers, right?
Etc.
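A minimal simulation of the forking paths, under assumptions chosen for illustration: there is no true effect (the outcome ignores treatment), and for each dataset we try four reasonable-looking analyses: everyone, women only, people over 40, and everyone except the single most extreme outcome. A "finding" is reported if any path gives p < 0.05. The variable names, the age range, and the z-test with known SD are all hypothetical choices.

```python
import math
import random


def diff_p_value(rows):
    """Two-sided z-test p-value for treated vs. control means (sigma = 1 assumed known)."""
    t = [y for g, y in rows if g == 1]
    c = [y for g, y in rows if g == 0]
    if len(t) < 2 or len(c) < 2:
        return 1.0  # too few people to test; treat as not significant
    se = math.sqrt(1 / len(t) + 1 / len(c))
    z = (sum(t) / len(t) - sum(c) / len(c)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def forking_paths_rate(n=100, reps=4000, alpha=0.05, seed=7):
    """Share of null datasets where at least one analysis path gives p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        people = []
        for _ in range(n):
            group = rng.randint(0, 1)        # 1 = treated, 0 = control
            female = rng.random() < 0.5
            age = rng.uniform(20, 60)
            y = rng.gauss(0.0, 1.0)          # outcome ignores group: H0 is true
            people.append((group, female, age, y))
        all_rows = [(g, y) for g, f, a, y in people]
        paths = [
            all_rows,                                        # everyone
            [(g, y) for g, f, a, y in people if f],          # women only
            [(g, y) for g, f, a, y in people if a > 40],     # over age 40
            sorted(all_rows, key=lambda r: abs(r[1]))[:-1],  # drop the biggest "outlier"
        ]
        if any(diff_p_value(rows) < alpha for rows in paths):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print(forking_paths_rate())
```

The share of null datasets that yield a "finding" comes out noticeably above the nominal 5%, even though no single path looks unreasonable on its own.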
The Reproducibility Project
Attempting to reproduce 100 research findings published in
three major psychology journals. Only 39 of them
were classified as replications. (Of the 61 non-replications,
24 had “at least moderately similar findings” and 37
failed to meet even that standard.)
97 of the 100 original studies had “statistically
significant” results, but only 36 of the replications
did.
Note: The Reproducibility Project has its critics.
See http://science.sciencemag.org/content/351/6277/1037.2
And a response:
https://hardsci.wordpress.com/2016/03/03/evaluating-a-new-critique-of-the-reproducibility-project/
New NIH (2015) statistical guidelines
See
http://www.nih.gov/about/reporting-preclinical-research.htm
for a statement of “principles with the aim of
facilitating the interpretation and repetition of
experiments”