
The A/B Testing Simulation

Tear Sheet
This tear sheet is designed to help you in class and serves as a guide to your debrief.

To Start the Simulation: Practice Mode

After the simulation is set up (see Launching the Simulation), tell students that they will be
jumping right into an A/B testing simulation. Remind students to read their Student
Introduction to the Simulation and watch the Interface Walkthrough. You can introduce
the simulation ahead of Practice Mode. Watch how I do this here.

During the Simulation: Mid-Simulation Instructions

Once the simulation starts, keep an eye on its progress. You will be able to see how
students move through the simulation and how many have completed their experiments.
When all students have finished, you can give them instructions ahead of Tournament
Mode. Let students know that only one team member should input decisions and that
they should draw on the strategies they used in Practice Mode to help their team. These
are my instructions. You can then launch Tournament Mode,
configuring the simulation and assigning teams. You’ll have access to a number of reports
following Tournament Mode (see Configuring the Simulation: Tournament Mode).

Debrief

Your debrief will move through a number of phases as you open the discussion, cover key
topics, and wrap up. Think ahead about the arc of the discussion and your goals for it.
You can use the debrief screens to show students how each team did and ask students to
unpack their ideas during your debrief.

My strategy for debriefing is as follows: I begin with the student experience and ground
my discussion in that experience, inviting the top three teams to walk the class through
the strategies they used. I use student responses as a lead-in to highlight key ideas, which
I point out briefly as students talk about their strategies. I use this initial phase of the
discussion as a springboard to connect the lessons of the simulation for students. In the
larger discussion, I make reference to specific team strategies so that students make the
connection between their actions in the simulation and the key ideas I unpack.

In response to my initial question, some teams may tell me that they experimented for 9
or 10 weeks and then locked in their decisions; I point out that by continuing to
experiment, teams may have traded away their ability to maximize profits because they tried
to learn what would work best and spent most of their time on exploration. This key idea
– the explore-exploit tradeoff – is one that I will elaborate on later in the discussion.

Some teams tell me that they tried to rely on intuitions or real-life observations regarding
how people react to certain price points or marketing messages. But I point out another
key idea: the simulation is an artificial environment. Some lessons are generalizable and
some are not, and it is important to differentiate the two. In this case, I tell students
that these simulations are not based on any actual user behavior. In the real world,
intuition might serve as one tool in their arsenal; for instance, they may believe that their
customers would have a stronger emotional reaction to a particular tagline. They should
not assume that will be true in the simulation environment. Either way, they can test their
intuitions through an A/B test in the simulation just as they can in the real world.

Some teams tell me that they did well by experimenting for the first two or three weeks,
learning all they could, and then locking in decisions for the remaining ten-week period.
This response gives me an opening to discuss the explore-exploit tradeoff, the ways
student strategies resemble those an algorithm deploys, and our ability to automate the
process of A/B testing through AI.

Following this interaction, I connect the lessons and make the following points, starting
with the explore-exploit tradeoff. I tell students:

One of the most fundamental tradeoffs that you grappled with is the explore-
exploit tradeoff. Exploration is the idea of gathering more information about the
environment we are in. We can explore and learn for a long time – experimenting
with different treatments and the allocations of these treatments to different
subgroups. Exploitation is the idea of making the best decision given the
information we have. The fundamental decision you’re trying to make here is how much
to explore versus how much to exploit. Many students alluded to this tradeoff.
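
To make the tradeoff concrete for students, a short back-of-the-envelope calculation can help. The Python sketch below compares the expected profit of a team that explores for most of the horizon against one that locks in the best-looking treatment early. The horizon, traffic, and per-treatment profits are hypothetical illustration values, not numbers from the simulation, and the sketch ignores the statistical noise that determines whether a short exploration phase actually finds the best treatment.

    # A toy illustration of the explore-exploit tradeoff.
    # All numbers below are hypothetical; they are not taken from the simulation.

    WEEKS = 12                 # length of the decision horizon
    VISITORS_PER_WEEK = 1000   # assumed constant weekly traffic
    PROFIT = {"A": 0.10, "B": 0.25, "C": 0.15}   # expected profit per visitor by treatment

    def expected_profit(explore_weeks: int) -> float:
        """Expected total profit if traffic is split evenly while exploring,
        then sent entirely to the best treatment for the remaining weeks."""
        average = sum(PROFIT.values()) / len(PROFIT)   # payoff while splitting evenly
        best = max(PROFIT.values())                    # payoff once exploiting
        exploit_weeks = WEEKS - explore_weeks
        return VISITORS_PER_WEEK * (explore_weeks * average + exploit_weeks * best)

    for weeks in (10, 3):
        print(f"explore {weeks:2d} weeks -> expected profit {expected_profit(weeks):,.0f}")
    # Exploring for 10 of 12 weeks leaves only 2 weeks to cash in on what was learned;
    # exploring for 3 weeks captures most of the value of the best treatment.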

I then move the discussion to another key idea: the ability to automate the process of
A/B testing through AI, especially when the decision environment is complex.
Connecting this with my earlier point about the tradeoff, I tell students:

This tradeoff can actually be automated. There are machine learning algorithms
that are known as multi-armed bandit algorithms; these belong to a class of
machine learning algorithms known as reinforcement learning algorithms. What
those algorithms do is similar to what you did in this decision-making environment
– experiment and gain data by experimenting, then look at that data and adapt a
strategy. One of these algorithms is called an Epsilon-first algorithm; this
algorithm explores first, gets as much information as possible, locks in decisions,
and subsequently exploits.
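
For instructors who want to show what this looks like in code, here is a minimal sketch of an epsilon-first strategy on a simulated three-arm problem, written in Python. The arm conversion rates, horizon, and exploration budget are made-up values; the simulation’s own payoff structure is not modeled.

    import random

    # Epsilon-first sketch: explore uniformly for a fixed budget of visitors,
    # then exploit the arm with the best observed conversion rate.
    # The arm probabilities below are hypothetical, not taken from the simulation.

    ARM_PROBS = [0.05, 0.11, 0.08]   # true (hidden) conversion rate of each treatment
    HORIZON = 5000                   # total visitors available
    EXPLORE_BUDGET = 1000            # visitors spent in the exploration phase

    def pull(arm: int) -> int:
        """Show one visitor treatment `arm`; return 1 if they convert."""
        return 1 if random.random() < ARM_PROBS[arm] else 0

    counts = [0] * len(ARM_PROBS)     # visitors assigned to each arm
    successes = [0] * len(ARM_PROBS)  # conversions observed on each arm
    total_conversions = 0

    for t in range(HORIZON):
        if t < EXPLORE_BUDGET:
            arm = t % len(ARM_PROBS)                   # explore: rotate through the arms
        else:
            rates = [successes[i] / max(counts[i], 1) for i in range(len(ARM_PROBS))]
            arm = rates.index(max(rates))              # exploit: best observed arm
        reward = pull(arm)
        counts[arm] += 1
        successes[arm] += reward
        total_conversions += reward

    print("total conversions:", total_conversions)
    print("visitors per arm:", counts)

Epsilon-greedy and Thompson sampling variants keep exploring a little throughout the horizon instead of stopping cold, which is closer to how fully automated A/B testing systems tend to behave.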

I point out that the winning team had a similar strategy of exploring early on and then
exploiting for the majority of the time. I then move on to another key idea in the
simulation – the tradeoff between running many experiments at low precision versus
running few at high precision. I tell students:

The tradeoff you are grappling with is how much data you need to be statistically
confident in your decision. On the one hand, you could run fewer tests with very
large sample sizes and get greater statistical power, allocating more time and more
traffic to each test. However, if you do that, you can’t necessarily test every
alternative. This is the tradeoff of running fewer tests at high statistical power
versus more tests at lower statistical power. If the gains from testing come from
small incremental improvements, we need large sample sizes to detect those
changes. In contrast, if the primary gains come from rare successes, then those
gains can be detected using small sample sizes. It isn’t obvious which strategy to use.
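
To attach rough numbers to this tradeoff, the sketch below uses the standard two-proportion sample-size approximation to show how much more traffic a small incremental lift requires than a large one. The baseline conversion rate and lifts are assumptions chosen only for illustration, not values from the simulation.

    from scipy.stats import norm

    def visitors_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
        """Approximate visitors needed per arm for a two-proportion z-test
        to detect a change from rate p1 to rate p2 at the given alpha and power."""
        z_alpha = norm.ppf(1 - alpha / 2)
        z_power = norm.ppf(power)
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return int(round((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2))

    # Hypothetical 10% baseline conversion rate:
    print("10% -> 11% lift needs", visitors_per_arm(0.10, 0.11), "visitors per arm")  # ~15,000
    print("10% -> 15% lift needs", visitors_per_arm(0.10, 0.15), "visitors per arm")  # ~700

With a fixed amount of weekly traffic, that gap is exactly the choice between running one high-precision test and running many low-precision ones.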

I then zoom out of the simulation and discuss research in the field directly related to this
tradeoff and to lean experimentation strategies (see Concepts and Background). I
conclude by reinforcing the earlier lessons about the fundamental tradeoffs that
students will have to grapple with and focus on how students can apply these in the
field. I tell students that the key issues to consider are: deciding what to test (there are
so many things to test); deciding whether to look for big or incremental gains;
grappling with the exploration-exploitation tradeoff; and the importance of having a
point of view about your setting – that is, are you looking for large, unpredictable gains
that can be detected with smaller sample sizes, or incremental gains that must be
detected with large sample sizes and high statistical power?

At this point, I usually add to the discussion by exploring an additional topic:
Reinforcement Learning. You may or may not want to add this to your debrief,
depending on the circumstances of your class. You can watch both this debrief and the
additional Reinforcement Learning debrief.

© 2021, The Wharton School, The University of Pennsylvania
