Setting Up Your Command Center


Ankur Agrawal

Product | Business | Experience | VP@Ola | SVP@MakeMyTrip | Snapdeal

Published Apr 26, 2020

This is part 2 of a two-part article on business command centers. Part 1 established the need for command centers. In this part, I talk about what to put inside one.

There’s a lot of literature out there on measuring what matters, and managing by objectives (e.g., see Measure What Matters). But there isn’t a lot about how to find what matters.

I’ll propose both: an architecture of metrics, and a way to identify the metrics that matter.

Step 1: Identify the right metrics

Conceptually, there will be three layers.

1. The top-level business outcomes layer, defining the core business metrics. This layer could be the P&L statement, with each revenue and cost line being the basis for the second layer.

2. The next layer of business metrics, following from each line of the P&L statement. These would typically be operational metrics, but still outcome metrics, that together get us to the relevant P&L line item. Think carefully about what elements impact each revenue or cost line.

3. Lastly, the levers that contribute to these metrics. These are typically operational and product metrics, and would generally be input metrics.

Identifying the metrics in layers 2 and 3 will take some cross-functional effort, and some structured root-cause thinking. But you'll come out of that effort with a much clearer understanding of the business.
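
To make the layering concrete, here's a minimal sketch of how the three layers could be represented as a tree. All metric names are hypothetical illustrations drawn from the cab example, not a prescribed taxonomy:

```python
# A minimal sketch of the three-layer metric architecture.
# All metric names below are hypothetical examples for an
# app-based cab service.

metric_tree = {
    "Revenue: Gross Orders": {             # Layer 1: P&L line item
        "orders_per_user_session": [       # Layer 2: operational outcome
            "search_to_estimate_conversion",   # Layer 3: input levers
            "estimate_to_booking_conversion",
            "app_load_time",
        ],
    },
    "Cost: Call Masking": {                # Layer 1: direct cost line
        "calls_per_ride": [                # Layer 2: unit-level intensity
            "driver_arrival_accuracy",
            "in_app_chat_adoption",
        ],
        "cost_per_call": [
            "telephony_rate_per_minute",
            "avg_call_duration",
        ],
    },
}

# Walking the tree top-down gives a root-cause path from a P&L
# line to the input levers that move it.
for pnl_line, outcomes in metric_tree.items():
    print(pnl_line)
    for outcome, levers in outcomes.items():
        print(f"  {outcome}: levers -> {', '.join(levers)}")
```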

Step 2: Identify the right measures

Once we’ve identified the core metrics, we still need to figure out
the right measures. This is absolutely vital. For example, in the cab
industry, we might decide that time for a cab to arrive is very
important. So that’s a metric. But what specifically should we
measure? Avg time? Median time? 90th percentile time? Or
something else?

As this is primarily an experience metric, we go by what the users tell us. So we find out from our users what’s acceptable. Let’s say users are ok with a 5-minute wait time. Then the right measure is % of rides with ETA < 5 min.
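
As a quick illustration of how the candidate measures differ, here's a sketch over made-up wait-time data:

```python
# Sketch: comparing candidate measures for cab wait time.
# The wait times below are made-up sample data.
import statistics

wait_minutes = [2.5, 3.0, 3.2, 4.0, 4.5, 4.8, 5.5, 6.0, 8.0, 14.0]

mean_wait = statistics.mean(wait_minutes)
median_wait = statistics.median(wait_minutes)
p90_wait = statistics.quantiles(wait_minutes, n=10)[-1]  # ~90th percentile
pct_within_5 = 100 * sum(w < 5 for w in wait_minutes) / len(wait_minutes)

print(f"mean: {mean_wait:.1f} min, median: {median_wait:.1f} min, "
      f"p90: {p90_wait:.1f} min")
print(f"% of rides with ETA < 5 min: {pct_within_5:.0f}%")
# One long outlier (14 min) drags the mean up, while the
# user-anchored measure stays directly interpretable: 60% of
# rides met the 5-minute expectation.
```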

Let me take a different example. One of the direct costs in the P&L
in the same business is the cost of connecting a call between the
rider and the driver. (We need to mask the phone numbers for
privacy reasons.) As this is a direct cost, it’s best to look at it at a
unit level, or as intensity. So, number of calls/ride is the right metric.
(And we should probably add cost/call as well, just to be safe.)

Here are two simple examples of the approach, based on the app-based cab service. The first one shows a revenue-side example. In Layer 1, it breaks revenue into multiple stages, the first of which is gross orders. Gross orders themselves come about through a series of steps, one of which is orders/user session, an operational outcome metric. This itself is an outcome of the usual conversion funnel. Each stage of the conversion funnel has contributing input levers, some of which are listed in this example in Layer 3.

The second one shows a cost example: two specific direct costs, with sample input metrics.

Once we have the right metrics and measures, we need to decide how to look at them.

Step 3: Decide what to look for

I propose we look at two aspects:

Trends: these give you a perspective on the long-term health of the system. I recommend looking at them as 7-day moving averages, over a 12-month (or longer) period. And we should ask the following questions:

• What’s the right value for each metric? Why is it where it is? Can it be better? If it has varied in the past, what were the reasons? That will help identify the levers that will allow us to improve it.

• Is there a long-term decay or improvement? Often certain metrics will go bad slowly over the long term, and the new values will keep becoming the new normal unless we take a longer view. If they have worsened, what are the reasons? In trying to answer this question, you will discover new and important relationships between levers and outcomes that are probably worth tracking.

• Trends will also give us a good understanding of the natural variation or noise. If a metric moves outside of the ‘noise’ area, something is wrong and needs an intervention (see the sketch below).
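
Here's a minimal sketch of the trend view: a 7-day moving average with a simple noise band. The daily series is simulated, and the 3-sigma band width is an assumption you would tune to your own data:

```python
# Sketch: 7-day moving average and a simple "noise band".
# The daily series is simulated; the 3-sigma band is an assumption.
import random
import statistics

random.seed(42)
daily_values = [100 + random.gauss(0, 5) for _ in range(365)]
daily_values[-1] = 140  # inject an anomaly on the latest day

def moving_average(series, window=7):
    return [statistics.mean(series[i - window:i])
            for i in range(window, len(series) + 1)]

ma = moving_average(daily_values)
print(f"latest 7-day moving average: {ma[-1]:.1f}")

# Estimate the natural variation from history (excluding today),
# then flag today's value if it falls outside the band.
mu = statistics.mean(daily_values[:-1])
sigma = statistics.stdev(daily_values[:-1])
lo, hi = mu - 3 * sigma, mu + 3 * sigma

today = daily_values[-1]
if not (lo <= today <= hi):
    print(f"ALERT: {today:.1f} outside noise band [{lo:.1f}, {hi:.1f}]")
```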

In the call masking example from above: in one particular month, there was a spike in costs because of an error that resulted in phone calls being made across countries, rather than within the same country. It was a large, unpleasant surprise when we received the invoice the next month. The cost/call metric, tracked as part of the command center, would have alerted us within a day!

Exceptions: defined as something that simply should not happen. If it has happened, something is broken and needs to be fixed. In software engineering terms, think of these as unit test assertions that should never fail. These should be tautologies.

The dashboard, over time, can have tens of such metrics, all of which should have zero as the count. Every single one of them gives confidence about a certain process. One can think of the exceptions as ongoing black-box process governance: if nothing is going wrong, then the processes must be working right!
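
Here's a sketch of an exception check expressed in code, using a cancellation-refund rule of the kind described below. The records and field names are made up:

```python
# Sketch: exceptions as assertions over transaction data.
# Each count should always be zero. Records are hypothetical.
from datetime import datetime, timedelta

now = datetime(2020, 4, 26)
cancellations = [
    {"requested_at": now - timedelta(hours=60), "refund_at": None},
    {"requested_at": now - timedelta(hours=10), "refund_at": None},
    {"requested_at": now - timedelta(hours=72),
     "refund_at": now - timedelta(hours=70)},
]

exception_counts = {
    "cancellations with no refund after 48h": sum(
        1 for c in cancellations
        if c["refund_at"] is None
        and now - c["requested_at"] > timedelta(hours=48)
    ),
}

# In a real dashboard these checks run daily; any nonzero count
# means a process is broken and needs a root-cause fix.
for name, count in exception_counts.items():
    status = "OK" if count == 0 else f"BROKEN: {count}"
    print(f"{name}: {status}")
```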

Note that exceptions should be defined realistically, based on the current process performance, not based on your ideal end-state. Over time, as you fix underlying issues and the exceptions reduce, you can iteratively tighten the acceptable ranges.

Wherever people track any metric, trends are normally tracked in some form or the other. But I have rarely seen exceptions being tracked. When they are, they immediately add significant value. As soon as you get these set up, you start finding both transactional and structural issues that were not apparent till then, and fixing them results in significantly smoother operations.

At MakeMyTrip, one of the exceptions we defined was around cancellations. The exception was defined like this:

Number of cancellation requests that have not resulted in a refund more than 48 hours later.

As soon as we started tracking this, we found that this number was not zero, even though we expected it to be. So we dug deeper, and discovered and fixed multiple issues in product, tech, and the manual processes:

• Some of the automated cancellation requests would fail, but the software didn’t read the failure status properly, so the request would fall through the cracks.

• In some cases, automation tried to send the cancellation request to a manual processing queue, but a handover failure resulted in the request falling through the cracks again.

• The manual process was tracked just like a regular support ticket, which meant that there was no structured data being recorded. So if agents made errors in allocating the ticket, or closed it without doing the work, the system wouldn't find out.

Fixing all of these resulted in a significant reduction in customer escalations, and in losses that MakeMyTrip had to bear due to tickets not getting cancelled in time.

Not just that: this exercise resulted in two ideas that later became significant customer-facing value propositions. Instant Refunds (an industry-first initiative processing consumer refunds instantly), and MMTPromise (a resolution-time promise for customer support, backed by a monetary penalty paid into the user's MakeMyTrip wallet).

Once we have the core metrics, we can also look at them by various types of cuts, filtering them to get more insights. That’s pretty standard for any dashboard, so I won’t go into that aspect. (Some of these cuts would be: by market, by user cohorts, by app operating system, by network type, by product category, by time of day, by day of week….)

At TravelGuru, the command center dashboard for the flights business covered the customer journey from website visit to ticket purchase. By focusing on both the trend and exception metrics, we were able to find multiple interventions that allowed us to increase conversions by 33% in just 2 months, while also decreasing costs significantly!

Step 4: Define Alerts

We then define alerts on both trends and exceptions:

Trends: We define an acceptable range for each metric, and set up alerts for when a metric crosses the range. Note that this would be different for different time granularities: larger time buckets will naturally smooth out variations, so the acceptable range should be smaller for larger time buckets.
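
A quick sketch of why larger buckets warrant tighter ranges, using simulated data:

```python
# Sketch: natural variation shrinks as the time bucket grows,
# so acceptable ranges should tighten for larger buckets.
# The daily series is simulated.
import random
import statistics

random.seed(7)
daily = [100 + random.gauss(0, 10) for _ in range(364)]
weekly = [statistics.mean(daily[i:i + 7]) for i in range(0, 364, 7)]

print(f"daily std:  {statistics.stdev(daily):.1f}")
print(f"weekly std: {statistics.stdev(weekly):.1f}")
# Weekly averages vary roughly 1/sqrt(7) as much as daily values,
# so a +/- 3 sigma band on weekly data is correspondingly narrower.
```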

Exceptions: Any time even one transaction breaches the acceptable threshold, there is an underlying problem worth looking at. More often than not, you'll discover a structural issue as the root cause.

The ranges will keep evolving: as you understand your systems better, and make them better, you can make the ranges tighter.

Alerts can be defined in a hierarchical manner, so that every employee has their own dashboard, depending on their operating context. So, for instance, a refund delayed beyond 2 days becomes an alert for the Team Lead directly responsible for refunds; it escalates to the customer service head at a 5-day delay, and to the head of support at a 10-day delay.

Similarly, a delayed payment by a B2B customer becomes an alert for the relevant sales rep first, followed by the area manager, then the regional manager, the national head, and so on…
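
Both examples reduce to the same ladder structure. Here's a minimal sketch, with roles and thresholds taken from the refund example above:

```python
# Sketch: hierarchical alerting for delayed refunds.
# Thresholds mirror the example above (2 / 5 / 10 days).
ESCALATION_LADDER = [
    (2, "Team Lead, Refunds"),
    (5, "Customer Service Head"),
    (10, "Head of Support"),
]

def alert_recipients(delay_days):
    """Everyone whose threshold the delay has crossed sees the alert."""
    return [role for threshold, role in ESCALATION_LADDER
            if delay_days >= threshold]

print(alert_recipients(3))   # ['Team Lead, Refunds']
print(alert_recipients(11))  # all three levels
```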

At Next Education, we initially set up the operational exceptions dashboard only for me (I was heading operations). In my weekly reviews, we would discuss the exceptions with the team and resolve them. The teams found this dashboard very useful, and mid-level managers and team leads started looking at my dashboard every day to find and fix these issues. We then created the hierarchical dashboards for everyone, resulting in even better outcomes.

Step 5: Set up reviews

It’s very, very important that there be regular reviews of the dashboard by the operating leadership. Only then will we really get the full value out of this. A review of the command center dashboard should be a key standing agenda item of the weekly operational leadership meeting.

Do you have a business command center in your org? What do you think about the approach above? Do you have experience with the Exceptions type of metrics?

https://www.linkedin.com/pulse/setting-up-your-command-center-ankur-agrawal
