Professional Documents
Culture Documents
The Art Science of AB Testing For Business Decisions
The Art Science of AB Testing For Business Decisions
Alex P. Miller
Ph.D. Candidate, Information Systems
Department of Operations, Information, & Decisions
Welcome &
Introduction
Ph.D. Candidate
Information Systems,
OID Department
3
1. Core concepts
2. A/B testing
paradigms in
Overview: business
3. Simulation
exercise
4. Debrief
4
What will you get out of this workshop?
● A hands-on understanding of A/B testing:
○ What is it?
○ What types of business problems can it help you solve?
○ What does it look & feel like to use A/B testing for
decision making?
5
Core Concepts in
A/B Testing
Definition:
A
A/B testing is:
the practice of using of
randomized experiments B
for making business
decisions
7
Definition:
A
A/B testing is:
the practice of using of
randomized experiments B
for making business
decisions
9
When used properly:
● Randomized experiments are the “gold
standard” for measuring cause & effect
10
“Experimentation is the
least arrogant method
of gaining knowledge.”
— Isaac Asimov
A/B testing is for everyone
● Tech companies (Microsoft, Google, Amazon,
Facebook) are well-known for having
intensely experimental organizations
12
A/B testing is for everyone
● Tech companies (Microsoft, Google, Amazon,
Facebook) are well-known for having
intensely experimental organizations
13
Recommended Reading
15
Why run experiments?
● Randomized experimentation is a technique
of gathering data that is specifically designed
as a means of “causal inference”
16
Why run experiments?
● Randomized experimentation is a technique
of gathering data that is specifically designed
as a means of “causal inference”
Causal inference:
The process of understanding and measuring
cause & effect
18
Why is this problem hard?
It’s hard to separate your actions from other
factors that could affect customer behavior:
A B
Homepage
design
��💻
Customer
behavior
19
Why is this problem hard?
It’s hard to separate your actions from other
factors that could affect customer behavior:
💵
Day of
month/week
🌦
Weather
A B
��
Homepage Marketing
design campaigns
��💻 ��
Customer Competitor’s
behavior strategies
20
How does randomization help?
💵
Day of
month/week
🌦
Weather
A B
��
Homepage Marketing
design campaigns
��💻 ��
Customer Competitor’s
behavior strategies
21
How does randomization help?
Randomizing which homepage customers see allows you
to isolate the effect of that variable; with enough data,
other factors that affect behavior should be balanced
💵
Day of
month/week
🌦
A B Weather
🗣
Marketing
campaigns
Homepage
design ��💻
😈
Customer Competitor’s
behavior strategies
22
A/B testing is valuable in situations when:
You have multiple strategies/actions you can
implement and:
23
A/B testing is a particularly powerful tool
in digital business, relative to traditional
forms of commerce
● Randomization is easy
● Measurement is easy
24
What should you test?
26
Key Steps for Running an A/B Test
1. Develop a set of “hypotheses” to test
��
e.g., “variations”, “treatments” “arms”, “strategies”
27
Key Steps for Running an A/B Test
1. Develop a set of “hypotheses” to test
��
e.g., “variations”, “treatments” “arms”, “strategies”
28
Key Steps for Running an A/B Test
1. Develop a set of “hypotheses” to test
��
e.g., “variations”, “treatments” “arms”, “strategies”
29
Key Steps for Running an A/B Test
1. Develop a set of “hypotheses” to test
��
e.g., “variations”, “treatments” “arms”, “strategies”
30
Walkthrough: Optimize Nike product page
Suppose a UX designer has a new idea for how
the product page should look:
31
32
Hypotheses? ✅
A B
33
Hypotheses? ✅
A B
Evaluation criterion?
A B
A B
👤 💻
Web server
User
Run experiment: A/B Test in Action
User’s computer
requests website Testing software
randomly assigns user
💻
to treatment arm
👤
Web server
User A B
Run experiment: A/B Test in Action
User’s computer
requests website Testing software
randomly assigns user
💻
to treatment arm
👤
Web server
User A B
User sees assigned
treatment
Run experiment: A/B Test in Action
User’s computer
requests website Testing software
randomly assigns user
💻
to treatment arm
👤
Web server
User A B
User sees assigned
treatment
💻
to treatment arm
👤
Web server
User A B
User sees assigned
treatment
43
Evaluating the results from an A/B test
44
Evaluating the results from an A/B test
45
Evaluating the results from an A/B test
46
Evaluating the results from an A/B test
“Effect size”
47
Evaluating the results from an A/B test
48
Evaluating the results from an A/B test
Statistics provides a
principled way to quantify
how certain you should be
about your results given:
● the magnitude of effect
you observed and your
sample size
50
Common statistics can be difficult to interpret
51
Common statistics can be difficult to interpret
100%
“Confidence” 99.9% 99% 95%
100%
“Confidence” 99.9% 99% 95%
100%
“Confidence” 99.9% 99% 95%
100%
“Confidence” 99.9% 99% 95%
100%
“Confidence” 99.9% 99% 95%
100%
“Confidence” 99.9% 99% 95%
100%
“Confidence” 99.9% 99% 95%
60
Testing Paradigms
for Business
Decisions
The importance of...
Understanding
and Defining the
Goal of A/B Tests
Statistics in the real world
● There’s a fundamental trade-off in statistics:
Precision Speed
Hypothesis Metric
Testing Optimization
�� ��
64
Hypothesis Testing ��
● You come to the table with a set of
predetermined hypotheses
● Primary concerns:
Treatment A
Treatment B
test (random assignment) implement (all remaining customers given same treatment)
Treatment A
Treatment B
Deploy optimal treatment arm
71
Which paradigm is “correct”?
● However:
72
Sample size example using classical
“significance” and “power” levels
Suppose website conversion rate is 5%...
● To detect a
○ 0.5% absolute difference (~10% relative difference)
● You need: 90,000 observations
● To detect a
○ 0.1% absolute difference (2% relative difference)
● You need: 1 million+ observations
75
Why classical notions of “significance” may
be irrelevant for many A/B tests
● Classical “statistical significance” are based on
“false positive control” guarantees
77
Why classical notions of “significance” may
be irrelevant for many A/B tests
● For many business decisions, “false positives”
are not that costly
○ Often by the time some variation can be tested in an
experiment, most of the design/development work is
already done
79
Hypothesis Metric
Testing Optimization
“precision mindset” “risk mindset”
�� ��
● Precision matters
80
Hypothesis Metric
Testing Optimization
“precision mindset” “risk mindset”
�� ��
● Precision matters ● Precision is “nice to
have”, but maximizing
● False positives are costly profits is the primary
goal
81
Key insight #1 for using A/B testing within
a “metric optimization” framework:
82
Key insight #1 for using A/B testing within
a “metric optimization” framework:
83
Key insight #1 for using A/B testing within
a “metric optimization” framework:
85
Key insight #2 for using A/B testing within
a “metric optimization” framework:
Fraction of experiments by effect size
86
Key insight #2 for using A/B testing within
a “metric optimization” framework:
Fraction of experiments by effect size
88
Simulation
Exercise
● I’ve helped develop an interactive tool
designed to:
90
● I will give a brief demo of how to use the tool
91
Logistics
● I’ll be breaking you out into smaller rooms to
form teams
92
Practice Mode! (20min)
● Spend 5-10 minutes playing the game on your
own to familiarize yourself with interface
93
Competition Mode (15-20min)
96
Future of A/B Testing
● A/B testing + Machine Learning = Much more
sophisticated personalization
○ e.g., Moving from targeting customers
based on 2 variables (Location, Device) to
50 variables
○ Recent advances in ML make this
easy/automatable in principled ways
97