Never Start With A Hypothesis
Cassie Kozyrkov
Nov 30, 2018 · 8 min read
That’s why everything begins with a physical action/decision that you commit to doing if
you don’t gather any (more) evidence. This is called your default action.
What I’m asking you is, “What will you actually do if you walk away and remain ignorant
of the information?”
“Gather data” is not an appropriate answer. I’m prodding you to tell me which of the
options you’d go for if I forced you to choose RIGHT NOW. (Sorry I yelled.)
If binary feels too basic, the amazing variety of shapes on your screen speaks volumes about the
power of binary options put together. When you need to make a more complex decision, you can
compound several hypothesis tests. Let’s start with one at a time.
Bayesians are different when it comes to this, but if you’re feeling righteous Bayesian
rage because you’re at philosophical odds with the logic here, take a deep breath and
think of this as a lesson in knowing your enemy. We’ll talk about the Bayesian way of
life soon enough.
For now, the clue as to which kind of statistics you’re dealing with is in the jargon
floating about. If you hear “confidence interval” or “p-value”, hello Frequentist. If you
hear “credible interval” or “prior” or “posterior” (this is nothing rude, I promise), hello
Bayesian. If the first is more familiar, it’s because most educational programs teach
Frequentist thinking before/instead of Bayesian thinking.
I’m asking you what you’d prefer to do if you stay ignorant, so you don’t need data to
answer my question, though you may find a previous analysis inspiring. Exploratory
data analysis (EDA) is a sort of guided meditation, if you will. It’s a tool to help decision-
makers through this part. Read this if you’re keen to dive deeper into how analysts and
decision-makers work together.
EDA is pretty useful… if you can afford it. The price is that all the data you used for it has to be
nuked from orbit before you get to the statistics part. For teams that aren’t flush with
data, excluding any of it from inference is too expensive. They’re entirely at the mercy of
the mental span and brainstorming ability of their decision-maker.
Playing it safe
Imagine a decision about launching a new product. The typical choice among decision-
makers is to play it safe: don’t launch it unless the data give you a good reason to hit
the green button. If you don’t have data, you’d cheerfully mothball the project. Maybe
that’s a mistake, but hey — you can live with yourself. You picked the default in a way
that makes sticking to it the lesser evil as far as mistakes go.
The default action is the option that you find palatable under
ignorance.
Other examples where society considers the default to be fairly obvious are innocent-
until-proven-guilty (default = don’t convict if there’s no evidence), testing new
medications (default = don’t approve if there’s no evidence), and scientific publication
(default = don’t publish if there’s no evidence).
Although true indifference is fairly rare in the human animal, if you’d honestly be willing
to flip a coin in the absence of data, then you don’t need statistics. If your mind isn’t set,
it can’t be changed. Move along and read this instead. Statistical inference is for
decision-making under uncertainty. If you have the answer already, go home.
To be dry about it, the first move involves framing your decision under no information
and I hope you see that a decision-maker’s training is more relevant for this than a
mathematician’s.
This is one of the decision-making tasks on the tougher end of the spectrum. For non-
trivial examples (stuff that’s slightly more involved than the baby examples you’ll see in
class) it really takes a lot of mental discipline, creativity, flexibility, and concentration to
do it well.
Once you’ve imagined all possible parallel worlds, it’s time to put each in one of two
buckets: let’s call Bucket 1 “Worlds Where I’d Be Happy To Take My Default Action” and
Bucket 2 “All The Other Ones.”
You might have heard shorthand descriptions of the null hypothesis like “status quo” or
“the boring one” or “the thing we don’t want to prove.” All of these are subtly inaccurate,
lazy things a professor might teach a first-year college kid of untrustworthy mental
sophistication. But I trust you to handle the philosophical weirdness, so now you know
that the null hypothesis describes the full collection of universes in which you’d happily
choose your default action. Let’s have a few moments of silence out of respect for the
mental gymnastics we’re asking decision-makers to handle.
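If it helps to see the bucketing written down, here is a minimal sketch for the product launch example. Everything specific in it is an assumption made up for illustration: each "world" is boiled down to a single true purchase rate, the default action is "don't launch", and the 10% cutoff (`MIN_RATE_WORTH_LAUNCHING`) is a hypothetical value a decision-maker might choose.

```python
# Hypothetical setup: a "world" is one possible true purchase rate for the
# new product. The cutoff below is an illustrative assumption, not a rule.
MIN_RATE_WORTH_LAUNCHING = 0.10


def bucket_for(true_rate: float) -> str:
    """Return the bucket that a possible world belongs to."""
    if true_rate <= MIN_RATE_WORTH_LAUNCHING:
        # Bucket 1: worlds where I'd be happy to take my default action
        # (don't launch). Together these worlds form the null hypothesis.
        return "null"
    # Bucket 2: all the other ones (the alternative hypothesis).
    return "alternative"


# A handful of imagined worlds, sorted into the two buckets.
possible_worlds = [0.02, 0.08, 0.10, 0.12, 0.30]
buckets = {rate: bucket_for(rate) for rate in possible_worlds}

for rate, bucket in buckets.items():
    print(f"true rate {rate:.2f} -> {bucket}")
```

Notice that the null hypothesis here is a whole range of worlds (every rate up to the cutoff), not one "status quo" value.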
Let’s have a quick reminder of where we stand. The point here is that you’ve set
things up so you’re committed to doing your default action as long as you know
nothing, you know only a little, or you know with absolute certainty that you’re a
citizen of a null hypothesis universe.
You’d better switch from the default action to the alternative action: NOT doing your
default. This might spiral off into a series of other decisions, but one thing’s for sure:
you’re not touching the default with a bargepole. The data have changed your mind!
The default is the action you’re okay with falling into passively
whereas the alternative action is something you need to be
actively convinced to do.
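One way to picture this commitment is as a Frequentist decision rule. The sketch below is only an illustration under stated assumptions: it reuses the product launch example, tests a purchase rate with an exact one-sided binomial p-value, and treats the 10% boundary rate, the 5% significance threshold, and the names `binomial_p_value` and `choose_action` as hypothetical choices, not anything prescribed here.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, null_rate: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, null_rate)."""
    return sum(
        comb(trials, k) * null_rate**k * (1 - null_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def choose_action(
    successes: int,
    trials: int,
    null_rate: float = 0.10,  # assumed boundary of the null-hypothesis worlds
    alpha: float = 0.05,      # assumed tolerance for wrongly leaving the default
    default: str = "don't launch",
    alternative: str = "launch",
) -> str:
    """Stick with the default unless the data make the null look ridiculous."""
    p_value = binomial_p_value(successes, trials, null_rate)
    return alternative if p_value < alpha else default


print(choose_action(0, 0))     # no data at all
print(choose_action(10, 100))  # data consistent with the null worlds
print(choose_action(25, 100))  # data that would be surprising in a null world
```

With no data the p-value is 1, so the rule falls back to the default passively. Only strong evidence against every null world moves you to the alternative action.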
The point is that you won’t ever know for sure which of the worlds is your world. That’s
why it’s important that your default action is chosen in a way that accurately reflects
your values. How do you check? If you’ve framed things right, a Type I error should feel
worse than a Type II error. In other words, wrongly abandoning your default action should hurt more than wrongly sticking with it.
If that’s not true, you haven’t really been honest with yourself about which action is
which. Let’s take it again from the top!
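The honesty check can be written out as a small sketch. The labels follow the standard definitions (a Type I error is leaving the default in a null world, a Type II error is sticking with it in an alternative world). The regret scores are hypothetical numbers a decision-maker would supply, and the function names are made up for illustration.

```python
def classify_outcome(world: str, action: str) -> str:
    """Label the result of taking `action` when the truth is `world`.

    world:  "null" or "alternative" (which bucket reality falls in)
    action: "default" or "alternative" (what you ended up doing)
    """
    if world == "null" and action == "alternative":
        return "Type I error"   # wrongly left the default action
    if world == "alternative" and action == "default":
        return "Type II error"  # wrongly stuck with the default action
    return "correct"


def default_is_framed_honestly(regret: dict) -> bool:
    """True if a Type I error feels worse than a Type II error."""
    return regret["Type I error"] > regret["Type II error"]


# Hypothetical regret scores (bigger = more painful), chosen for illustration.
launch_regret = {"Type I error": 9, "Type II error": 4}

print(classify_outcome("null", "alternative"))
print(classify_outcome("alternative", "default"))
print(default_is_framed_honestly(launch_regret))
```

If the check comes back false, the two actions are probably swapped, and the action you labeled "alternative" is your real default.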
Unfortunately, picking your default action incorrectly is a common mistake among those
who learn the math without absorbing any of the philosophy. It’s also a symptom of a
team where the decision-maker is missing in action and the numbers nerds are out en
masse.
A surefire way to set yourself up for failure is to start with the hypotheses instead of the
actions. That’s a vestige of the way the class exercises are structured (because statistics
classes don’t teach you the decision-maker’s role, those things are almost always done
for you by the professor), but in real life it amounts to getting off on the wrong foot. With
all the effort you’re about to put into the rest of it, wouldn’t it be a shame to faceplant
barely out of the gate?
If you’re craving these ideas in example form (with aliens!), read on here.
Don’t faceplant right out of the gate by starting with the hypotheses. Always start with the
default action.