
Why AI/Data Science Projects Fail
How to Avoid Project Pitfalls

Synthesis Lectures on Computation and Analytics

This series focuses on advancing education and research at the interface of qualitative analysis and quantitative sciences. Current challenges and new opportunities are explored with an emphasis on the integration and application of mathematics and engineering to create computational models for understanding and solving real-world complex problems. Applied mathematical, statistical, and computational techniques are utilized to understand the actions and interactions of computational and analytical sciences. Various perspectives on research problems in data science, engineering, information science, operations research, and computational science, engineering, and mathematics are presented. The techniques and perspectives are designed for all those who need to improve or expand their use of analytics across a variety of disciplines and applications.
Why AI/Data Science Projects Fail: How to Avoid Project Pitfalls
Joyce Weiner

Copyright © 2021 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means—electronic, mechanical, photocopy, recording, or any
other except for brief quotations in printed reviews, without the prior permission of the publisher.

Why AI/Data Science Projects Fail: How to Avoid Project Pitfalls


Joyce Weiner
www.morganclaypool.com

ISBN: 9781636390383 (print)
ISBN: 9781636390390 (ebook)
ISBN: 9781636390406 (hardcover)

DOI: 10.2200/S01070ED1V01Y202012CAN001

A Publication in the Morgan & Claypool Publishers series


SYNTHESIS LECTURES ON COMPUTATION AND ANALYTICS
Lecture #1

Why AI/Data Science Projects Fail
How to Avoid Project Pitfalls

Joyce Weiner
Intel

SYNTHESIS LECTURES ON COMPUTATION AND ANALYTICS #1

MORGAN & CLAYPOOL PUBLISHERS

ABSTRACT
Recent data shows that 87% of Artificial Intelligence/Big Data projects don’t make it into production (VB Staff, 2019), meaning that most projects are never deployed. This book addresses five common pitfalls that prevent projects from reaching deployment and provides tools and methods to avoid those pitfalls. Along the way, stories from actual experience in building and deploying data science projects are shared to illustrate the methods and tools. While the book is primarily for data science practitioners, information for managers of data science practitioners is included in the Tips for Managers sections.

KEYWORDS
data science, project management, AI projects, data science projects, project planning,
agile applied to data science, Lean Six Sigma

Contents

Preface

1 Introduction and Background
2 Project Phases and Common Project Pitfalls
   2.1 Tips for Managers
3 Five Methods to Avoid Common Pitfalls
   3.1 Ask Questions
   3.2 Get Alignment
   3.3 Keep It Simple
   3.4 Leverage Explainability
   3.5 Have the Conversation
   3.6 Tips for Managers
4 Define Phase
   4.1 Project Charter
   4.2 Supplier-Input-Process-Output-Customer (SIPOC) Analysis
   4.3 Tips for Managers
5 Making the Business Case: Assigning Value to Your Project
   5.1 Data Analysis Projects
   5.2 Automation Projects
   5.3 Improving Business Processes
   5.4 Data Mining Projects
   5.5 Improved Data Science
   5.6 Metrics to Dollar Conversion
6 Acquisition and Exploration of Data Phase
   6.1 Acquiring Data
   6.2 Developing Data Collection Systems
   6.3 Data Exploration
   6.4 What Does the Customer Want to Know?
   6.5 Preparing for a Report or Model
   6.6 Tips for Managers
7 Model-Building Phase
   7.1 Keep It Simple
   7.2 Repeatability
   7.3 Leverage Explainability
   7.4 Tips for Managers
8 Interpret and Communicate Phase
   8.1 Know Your Audience
   8.2 Reports
   8.3 Presentations
   8.4 Models
   8.5 Tips for Managers
9 Deployment Phase
   9.1 Plan for Deployment from the Start
   9.2 Documentation
   9.3 Maintenance
   9.4 Tips for Managers
10 Summary of the Five Methods to Avoid Common Pitfalls
   10.1 Ask Questions
   10.2 Get Alignment
   10.3 Keep It Simple
   10.4 Leverage Explainability
   10.5 Have the Conversation

References
Author Biography

Figures

Figure 4.1: Example project charter
Figure 4.2: Supplier-input-process-output-customer (SIPOC) analysis table. The SIPOC is completed in three parts following the numbered steps
Figure 4.3: Example SIPOC for engineering dispositioning material
Figure 4.4: Example SIPOC with Part 1, Process, and Part 2, Output and Customer, completed
Figure 4.5: Example SIPOC with Part 3, Suppliers and Inputs, started
Figure 4.6: Example SIPOC with all parts completed
Figure 8.1: Example presentation slide

Tables

Table 1.1: Five project pitfalls
Table 1.2: Alignment between data science project phases and the Lean Six Sigma DMAIC framework
Table 3.1: Connection between the methods to avoid pitfalls and the five project pitfalls
Table 3.2: Questions to ask at retrospectives
Table 4.1: Key components of a project charter
Table 5.1: Deliverables and metrics for various types of data science projects
Table 5.2: Example calculation for time saved
Table 5.3: Types of waste with manufacturing and office examples
Table 5.4: Common metrics and dollar conversion
Table 8.1: Data science project types and typical final deliverables
Table 8.2: Data visualization reading list

Preface
Who is this book for? To answer that question, I need to give a little background. My degrees are in physics: an undergraduate degree in physics and a Master’s degree in optical science. In physics there are two disciplines, theoretical physics and experimental physics. Similarly, I have observed that in data science there are theorists, who focus on developing algorithms, and practitioners, who use algorithms and apply data science. This book is primarily for data science practitioners. I’ve also included information for managers of data science practitioners in the Tips for Managers sections.
The first two chapters introduce common project pitfalls and the methods to avoid them. The remaining chapters are organized around the phases of a data science project. In each of the project phase chapters, I’ll include tools you can use to help avoid the project pitfalls, highlighting which of the five methods each tool supports. To get you started, the five methods are: (1) ask questions; (2) get alignment; (3) keep it simple; (4) leverage explainability; and (5) have the conversation.
Throughout this book I use the term “data science projects” as an all-encompassing term that includes Artificial Intelligence (AI) and Big Data projects. AI itself is an inclusive term; machine learning and deep learning are both types of AI. AI is defined by the Oxford English Dictionary as the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages (Oxford Languages, 2020). That means it encompasses any method of computation that mimics human intelligence, not only what we often think of—machine learning and neural nets—but also expert systems and optimization algorithms. I’m using “customer” to mean the person getting value from the project and the end user of the project. I’m using “management” to mean your leadership and decision-making chain within your organization.

CHAPTER 1

Introduction and Background


At the INFORMS Business Analytics Conference in 2018, one of the speakers shared a statistic that 85% of Artificial Intelligence (AI) and Big Data projects failed. Looking into it, I learned that in this case failure was defined as “not being deployed.” So, starting a project that didn’t make it to production was the definition of the project failing. I was surprised by the number; 85% is a large percentage.
I could think of a few reasons why a project wouldn’t get deployed. Maybe you
didn’t like the results from the models you tried, or you needed to collect some more
data. But those reasons wouldn’t account for such a large percentage. That 85% of AI
and big data projects would fail indicates a systematic problem, not something specific
to a particular project.
In following up after the conference, I found an article by Venture Beat (VB
Staff, 2019) that increased the failure percentage to 87%. Now I was very interested
and started to think about why. Why is it that 87% of AI/big data projects would
fail before they reached deployment? What were the systematic problems that would
cause such a high rate of failure?
The Venture Beat article reports on a panel session at Transform 2019 with Deborah Leff, CTO for data science and AI at IBM, and Chris Chapo, SVP of data and analytics at Gap. During the panel session, Leff and Chapo discussed reasons why only 13% of data science projects make it into production. In the session, they mentioned the need for leadership support, access to data, collaboration across teams, long-term ownership of solutions, and keeping it simple.
The reasons the panelists shared resonated with me, but I wasn’t fully satisfied, because some of these problems are not within the control of the data scientists or others doing the projects. I started thinking about systematic challenges that can prevent a data science project from reaching deployment and that could be controlled by the people doing the work.
I’ve been a data scientist for 25 years. That is not to say that my job title has been “Data Scientist” for that amount of time. When I started, no one talked about Data Science, and I needed an entire paragraph to explain my area of technical expertise: data extraction, analysis, visualization, and modeling. Nowadays, I can just say I’m a data scientist and people understand what I do.

Over my career, I’ve worked on all sizes of data science projects: small reports, dashboards, and large predictive models. For any project there are risks, or pitfalls, that can cause it to fail and that are within the control of the people working on that project. The five pitfalls are:

1. the scope of the project is too big;

2. the project scope increased in size as the project progressed, i.e., scope creep;

3. the model couldn’t be explained, hence there was lack of trust in the solution;

4. the model was too complex and therefore difficult to maintain; and

5. the project solved the wrong problem (Table 1.1).


These are all systematic problems and can be addressed with a good framework.

Table 1.1: Five project pitfalls

1 The scope of the project is too big
2 The project scope increased in size as the project progressed (scope creep)
3 The model couldn’t be explained
4 The model was too complex
5 The project solved the wrong problem

An area of interest for me throughout my career has been to use data to drive efficiency improvements. I began my career working as a process engineer in manufacturing. While working in manufacturing, I became interested in process improvement and earned my Lean Six Sigma black belt.1 Lean Six Sigma uses the DMAIC framework of Define, Measure, Analyze, Improve, and Control as a strategy for improving processes. During a talk I attended on data science project phases, I realized that the phases for data science projects (1. Define project objectives; 2. Acquire and explore data; 3. Model data; 4. Interpret and communicate; and 5. Implement, document, and maintain) lined up with the DMAIC framework (Table 1.2). This made me think that some of the tools from Lean Six Sigma would be very helpful in overcoming the five pitfalls.

1. If you are interested in learning more about Lean Six Sigma, an excellent overview is What Is Lean Six Sigma? (George, Rowlands, and Kastle, 2003).

Table 1.2: Alignment between data science project phases and the Lean Six Sigma DMAIC framework

Data Science Project Phases          Lean Six Sigma DMAIC Framework
Define project objectives            Define
Acquire and explore data             Measure
Model data                           Analyze
Interpret and communicate            Improve
Implement, document, and maintain    Control
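Since Table 1.2 is essentially a lookup, it can be written down directly as a small data structure. The sketch below (in Python; the names are my own, purely for illustration) shows one way to tag project work with its corresponding DMAIC phase:

```python
# The phase alignment from Table 1.2 as a plain dictionary (illustrative only).
PHASE_TO_DMAIC = {
    "Define project objectives": "Define",
    "Acquire and explore data": "Measure",
    "Model data": "Analyze",
    "Interpret and communicate": "Improve",
    "Implement, document, and maintain": "Control",
}

def dmaic_phase(project_phase: str) -> str:
    """Look up the DMAIC phase matching a data science project phase."""
    return PHASE_TO_DMAIC[project_phase]

print(dmaic_phase("Model data"))  # → Analyze
```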

Getting back to the Transform 2019 panel session, the reasons the panelists gave for projects failing were: need for leadership support, access to data, collaboration across teams, long-term ownership of solutions, and keeping it simple. Of these reasons, leadership support, access to data, and collaboration across teams are related to communications and management alignment. Keeping it simple is just that—simple. Sometimes not easy, but simple. Long-term ownership of a solution is a matter of planning—planning for deployment and building for maintainability. There is one other potential problem that the panelists didn’t mention: lack of understanding of, and therefore discomfort with, a model. If you can’t explain why a model is predicting a particular outcome, and having that explanation is important to management or to the customer for the model, then your project will fail.
Fundamentally, treat a data science project the way you would any other project. A data science project is not a quick-and-dirty, one-off, skunk-works kind of thing—if you want it to be deployed. To get a project to production you need to plan it and do up-front work. If you go slowly at the start of a project and do the work of defining the problem and getting alignment with the end customer, then you will only have to solve the problem once. By going slower up front you can end up going faster in the end.
So, what exactly can you do to avoid project pitfalls? There are five methods altogether.

1. Ask questions.

2. Get alignment.

3. Keep it simple.

4. Leverage explainability.

5. Have the conversation.


Asking questions enables communication and helps start the process of getting management alignment. Asking questions up front ensures you are positioned to start a project that will deliver results the customer wants. Asking questions also fosters a collaborative atmosphere, which will help if you need assistance from other teams.
Explaining your intentions and documenting the project helps with alignment. Having metrics supports accountability: you can use them to request help and resources and to get support from management. Aligning with management and getting support from the start of the project will ensure that management is aware of your project and can help if needed later.
Keeping it simple prevents a project from becoming so big that it can never be
finished. It also prevents problems with maintaining a project. If it is simple to explain,
and simple to execute, it can be simple to transfer to a different owner who can sustain
it long term. Starting with a simple problem allows you to build a solution and then
decide if adding on is needed or desired. This is the crawl, walk, run methodology.
Leveraging explainability is tied to both asking questions and getting alignment. If your management isn’t that comfortable with AI, they may prefer models with very clear connections between the inputs and outputs. Knowing this in advance will allow you to select the type of model that works for the project and meets their criteria. In some cases, they may not care at all. This is important to know in advance, and asking questions and getting alignment helps ensure you will build something that your management and customer are happy with.
Lastly, all of these are about having the conversation. As a project goes along
there are many decision points. If you know what the items of key importance are for
your customer in advance, and have alignment with your management, you can make
these decisions quickly and move the project forward. Even with up-front engagement
and alignment you will need to continue to have conversations with management and
your customer as the project progresses. The best way to handle this is to understand
that this need will occur and plan for it by having regular check-ins with both your
customer and management.

CHAPTER 2

Project Phases and Common Project Pitfalls
Let’s take a more in-depth look at the reasons projects don’t get to production. While there are many reasons (and clearly this must be true, because 87% of projects don’t make it), I’ll cover five reasons that are systematic in nature. These reasons are:

1. the scope of the project is too big;

2. the project’s scope increased in size as the project progressed, i.e., scope creep;

3. the model couldn’t be explained, hence there was lack of trust in the solution;

4. the model was too complex; and

5. the project solved the wrong problem.


We’ll explore and discuss each one in detail.
The first reason for a project not to get deployed to production is that the scope is too big. If we can’t fully get our arms around a project, it becomes difficult to plan and hard to finish. If we are overly ambitious and get ahead of our competency, or beyond the ability of the team working on a project, we can be slow to finish or not finish at all. This is analogous to the do-it-yourself TV shows that rescue a couple with no building expertise after they have gotten in over their heads. Sometimes the couple jumps into demo without planning the project and runs into a problem. Sometimes they pull down a load-bearing wall because they didn’t do the work ahead of time to investigate. Then they must pay for experts to come in, assess the situation, and correct the problems. This leads to increased time and expense to complete the project. The same thing can happen with data science projects. We can bite off more than we can chew. Typically, we don’t pull the roof down on our heads. We just fail to deliver the project to production.
The problem with having too big a scope is that it is difficult to detect unless you are checking in with your customer on a regular basis. Having too big a scope is a common pitfall because everyone wants to deliver good work, and people are generally ambitious. “Sure,” we say, “we can have it do this and that, and that other feature, and automatically control everything.” The problem comes if you go off to build all of that as one piece. Doing so puts you at risk of not finishing within the customer’s desired timeframe, because the project has become really huge. It also puts you at risk of another pitfall: solving the wrong problem.
The solution to a big project is breaking it into smaller deliverable pieces that can be put into production as you go—incremental development. This gives you a chance to check in with the customer. A benefit of an incremental development framework like Agile development2 is that this check is built into the process. Avoid waiting until the very end to show your project to the customer and get feedback. If you build incrementally and share as you go, you can learn whether they will be satisfied by a smaller scope. I’ve only had two cases where I scoped a project too small, and each was easily corrected by increasing the scope based on what we had learned so far.
So, having too big a scope can be addressed by asking questions at the start of
the project, and by delivering incrementally and asking questions at each delivery. As
in the case of the do-it-yourself couple, ask the equivalent questions to “What’s in this
wall?”: Where are the boundaries of the project? Does it feel big? Can it be broken into
pieces that can be delivered separately?
To give you an example of breaking a project into pieces, let’s look at the case of
taking a manual process and adding AI. To do this, you need data and a data pipeline.
If those are not yet available, that needs to be the first part of the project. You need
to automate the process and establish data collection. Then, later, you can add AI.
Trying to do it all at once just sets you up for failure. It’s also clear from this example
why having alignment is necessary. Without that conversation there would likely be a
misalignment in expectations for the project deliverables. While you would know that
you need to first set up automation so data for a future AI model can be collected,
management might only see that this project is taking forever and may pull the plug.
By having the conversation with management and getting alignment on the project,
you not only prevent the project from being canceled, but also get recognition for the
value delivered from automating a manual process as part of the project deliverables.
The second reason for a project to fail is scope creep. In this case, you start the
project off fine and then decide to add some more features. Additional features lead to
complexity. Complexity can lead to bugs in code. Complexity can also cause problems
with deployment, which can cause a project to fail.
Scope creep can cause you to not finish a project because there is no defined “end.” If you decide in advance what the first-pass accuracy criteria will be for the model, you won’t fall into the trap of polishing it endlessly. The same applies to features of the overall project. The thing to keep in mind to fight scope creep is that you can always go back and make improvements later. Having a working model in deployment is better than not having anything. As my Lean sensei has quoted, “don’t let perfect be the enemy of good.” In other words, don’t hold off putting a working model into deployment because it isn’t perfect.

2. http://www.agilemanifesto.org/
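The idea of a pre-agreed first-pass accuracy criterion can be made concrete as a simple release gate. This sketch is illustrative only; the threshold and counts are made-up numbers, not from the book:

```python
# Illustrative release gate: agree on the accuracy bar up front, then stop
# polishing once the model clears it. The numbers here are hypothetical.

FIRST_PASS_ACCURACY = 0.85  # agreed with the customer before modeling began

def ready_to_deploy(correct: int, total: int) -> bool:
    """Return True when the model meets the pre-agreed first-pass bar."""
    accuracy = correct / total
    return accuracy >= FIRST_PASS_ACCURACY

# A model that is "good enough" ships; polish can come in a later iteration.
print(ready_to_deploy(correct=870, total=1000))  # → True
```

The point of the hard-coded threshold is that it was fixed before modeling started, so "one more round of tuning" has an objective stopping rule.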
To fight scope creep, it’s important to define the project’s objectives and scope
up front. You can always go back and revise these as you learn, but you then have
guidelines to keep you to a smaller project size that you can deliver to production.
Say, for example, I am going to be building a report. To help get alignment with
the customer, I’ll have a meeting to understand the decision they want to make based
on the report and what their needs are. Then, before I start pulling data or writing
code, I’ll draw out a mockup of the report and share it to get their feedback. I do this
drawing on paper or on a whiteboard so we can iterate and make changes real time in
the meeting. Once what the report should look like has been established, I have a good
feel for the scope of the project. If the customer calls me and asks for a new feature
before I’ve deployed the first version, I’ll suggest we wait for the first version to go out,
and then see if the feature is needed. Depending on what they are asking, we can have
a conversation about what currently planned feature to swap out for that new feature.
One way to establish a boundary on scope is to have a fixed timeline for delivering the project. In Agile development we learn about the iron triangle of project management: resources, time, and features. Two out of the three are fixed; the last one is flexible. Traditionally, resources and features are considered “fixed,” and then you have problems with slipping timelines and projects that never end. In Agile, resources and time are fixed; the features that go into a project are flexible. If you set a fixed timeline for the project, e.g., a two-week sprint with a product release at the end of each sprint, and adjust the features you deliver to fit inside that timeline, you can prevent scope creep. You also have to keep to the rule that you can’t add any features unless you have completed all the features you initially agreed to and still have time left, or you have learned something about what the customer wants and will swap one feature for another.
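The time-fixed, features-flexible rule can be sketched in code. In this toy example (feature names and day estimates are hypothetical, not from the book), a prioritized backlog is fit into a fixed two-week budget and everything that doesn't fit is deferred:

```python
# Toy illustration of "time fixed, features flexible": greedily fit feature
# estimates into a fixed sprint budget, in priority order. All numbers made up.

SPRINT_DAYS = 10.0  # two-week sprint, fixed

def plan_sprint(features: list[tuple[str, float]]) -> list[str]:
    """Take features in priority order until the fixed time budget is spent."""
    planned, used = [], 0.0
    for name, days in features:
        if used + days <= SPRINT_DAYS:
            planned.append(name)
            used += days
    return planned

backlog = [("load data", 3.0), ("basic report", 4.0),
           ("email alerts", 4.0), ("fancy dashboard", 2.0)]
print(plan_sprint(backlog))  # "email alerts" doesn't fit and waits
```

Swapping a feature, per the rule above, means replacing an entry in the backlog rather than growing it.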
The third reason for a project not to get to deployment is that the model couldn’t be explained, so there is concern (typically from management) about how it works and a lack of comfort with the solution. To avoid this pitfall, keep it simple.
The simpler a model is, the easier it is to explain. While it might be exciting for you, as the data scientist, to use the latest model you were recently reading about, it may not be the best solution for the project. It can also backfire if that latest model doesn’t have the explainability that your customer desires. To avoid this pitfall, have the conversation in advance with your customer. Do they need to know the reasoning behind, and the input values that influenced, a given prediction, or are they OK with just the predicted values? Will they want to know causality or just correlation? In my work, I prioritize machine learning over neural nets, and physical models over machine learning, because physical models are the easiest to explain, followed by machine learning algorithms, while neural nets can be a black box. If there is a known equation for the process you are working on, there is no need to get fancy—just use that physical model.
When I was working in semiconductor manufacturing, we used physical models to predict critical dimensions. We created a system that compared the calculated measurement, based on the input parameters, to actual measurements and made a feedback loop to automatically adjust the lithography machine. This is an example of AI using a simple physical model.
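A minimal sketch of that feedback idea, with an entirely made-up model: a toy linear "physical model" predicts a critical dimension (CD) from an exposure setting, and each cycle the gap between the measured CD and the target is fed back into the setting. The slope, gain, and numbers are hypothetical, not a real lithography model:

```python
# Hypothetical feedback loop in the spirit described above. The linear model,
# the gain, and every number here are illustrative, not real process values.

TARGET_CD = 100.0          # desired critical dimension (nm), made up
SLOPE_NM_PER_UNIT = 0.3    # toy model slope: CD change per unit of exposure
GAIN = 0.5                 # fraction of the error corrected each cycle

def predicted_cd(exposure: float) -> float:
    """Toy physical model: CD shrinks linearly as exposure increases."""
    return 130.0 - SLOPE_NM_PER_UNIT * exposure

def adjust_exposure(exposure: float, measured_cd: float) -> float:
    """Nudge exposure to shrink the gap between measured CD and target."""
    error = measured_cd - TARGET_CD            # positive => printed too large
    return exposure + GAIN * error / SLOPE_NM_PER_UNIT

exposure = 90.0
for _ in range(5):
    measured = predicted_cd(exposure)          # stand-in for a real measurement
    exposure = adjust_exposure(exposure, measured)
print(round(predicted_cd(exposure), 2))        # converges toward the target
```

Because the model is a one-line equation, every adjustment the loop makes is directly explainable, which is exactly the appeal of a physical model.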
Keeping with models, the fourth risk to your project getting to deployment is that the model is too complex. Again, the solution to this risk is to keep it simple. The simpler the model, the faster the calculation will be for inference. The simpler the model, the faster it will be to train, or, if it is really simple, no training will be needed at all. Additionally, simpler models are easier to maintain.
As you are building a model, think about the number of parameters needed for inference. Will that data be available every time your customer wants a prediction? How hard or easy is it to gather the data needed? The simpler the model, the fewer parameters it has. The fewer parameters needed, the easier it is to ensure all the required data is available.
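One way to make that data-availability question operational is to check, before serving a prediction, that every input the model needs is actually present. This sketch is illustrative; the feature names are invented:

```python
# Sketch of the point above: verify required inputs before inference.
# The fewer parameters a model has, the shorter this set gets and the
# fewer ways the check can fail. Feature names are hypothetical.

REQUIRED_FEATURES = {"temperature", "pressure", "flow_rate"}

def missing_inputs(record: dict) -> set:
    """Return the required features absent from an incoming record."""
    return REQUIRED_FEATURES - record.keys()

print(missing_inputs({"temperature": 21.5, "pressure": 1.0}))  # {'flow_rate'}
```

A model that needs three inputs can run whenever those three sensors report; a model that needs three hundred is hostage to all of them.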
Think about how long it takes to train your model (assuming you aren’t using a
physical model). How often will the model need to be retrained and how long will the
training take? If a model takes many hours to build—say a deep learning model—but
you only need to retrain infrequently, that might work. If it takes many hours to train
the model, and you’ll need to retrain monthly, that may not work. These are some of the questions to ask to gather this information: How often do the input parameters change? How often does the process that the model supports change? Understanding these factors, and having the conversation with the model’s customer about training frequency, is important to ensure your project gets to deployment.
The fifth risk for your project is that you solve the wrong problem. Of the five
reasons, solving the wrong problem is the most devastating for a project team. After
working long and hard on a project, getting to the end, only to learn that you haven’t
delivered a helpful solution is difficult. It is also not that uncommon.

A Principal Engineer where I work did a study of problem-solving task forces, looking at “why we were slow to get to root cause.” The number one reason was that the teams working to solve the problem jumped to a conclusion with insufficient data. The number two reason was that the teams solved the wrong problem.
The key to avoiding the pitfall of solving the wrong problem is to go slow. That’s
a challenging piece of advice if you are addressing why a team was slow to get to root
cause in the first place, but hear me out. When you solve the wrong problem, once you
realize it, you still need to continue to work to solve the actual problem. Most likely
what has happened is that you’ve addressed a symptom or superficial need, but not
addressed the root cause or fundamental want of the customer. So, you end up solving
the problem at least twice.
Solving the wrong problem can be avoided by having conversations and asking
questions up front to make sure you have alignment with your customer. Take time
at the beginning of a project to understand the problem and what the customer really
wants. I cover problem statements more deeply in Chapter 4.
Albert Einstein said, “If I had an hour to solve a problem, I’d spend 55 minutes
thinking about the problem and 5 minutes thinking about solutions.” I don’t intend
this quote to be a recommendation on the ratio of thinking about a problem versus
thinking about solutions. I do want to caution you about jumping into a project with-
out first spending time thinking about the problem. It may seem like you are going
too slowly by taking this up front time. However, if you can’t afford the time up front,
you certainly can’t afford the time to do it twice. This is called going slow to go fast.
I learned about going slow to go fast by participating in autocross. Autocross is a
timed race with only one car on the track at a time. Usually, autocross events take place
in an empty parking lot with the course marked by chalk lines and traffic cones. Your
score is determined by the time it takes for you to navigate the course, and a penalty is
given for any cones you hit, and for going off course. Autocross is more about handling
than about horsepower, so there are frequently tight turns, and almost always a slalom.
Hence, the going slow to go fast. If you try to take a curve too quickly, you will likely
take out a few cones and increase your time by quite a lot. If you go slower through
the curve, you can make it without hitting any cones and achieve a competitive time.
Applying the concept of going slow to go fast to data science projects means
thinking about, and planning for, deployment at the start of your project. As data
scientists, it is tempting to jump into the model building phase of a project as quickly
as possible. You know, just do the minimal data cleaning needed and get right to
modeling. The risk of this approach is that it sets your project up for all the pitfalls
and makes it likely to be one of the 87% that don’t make it to deployment. Instead,
deliberately proceed through the project phases: Define project objectives; Acquire
and explore data; Model data; Interpret and communicate; and Implement, document,
and maintain. Check in with your customer throughout the project to make sure you
have alignment, and keep management updated on your progress and what has been
delivered so far. Keep things simple and iterate. This ensures you have an achievable
scope to your project. Iterating on the project means you can deliver value as you go
without having the full scope of the project finished.

2.1 TIPS FOR MANAGERS


Two of the project pitfalls are related to scope. Scoping problems are easy to fall into.
Help your team manage the scope of a project properly. To do this, ask questions at
the start of a project to aid in scoping it correctly.
• Who is this for?

• What is the minimum that they need?

• What needs to be done to deliver that minimum?


Keep reminding your team that the customer gets a benefit from delivery of
a project into production, even if it is not perfect. Help them move a project to de-
ployment before beginning improvements so your organization can start to receive
the business value while your team works on the next version. If you are familiar with
Agile development and the Agile Manifesto, this is applying the first principle: “Our
highest priority is to satisfy the customer through early and continuous delivery of
valuable software” (Beck, 2001). Applying this principle helps prevent scope creep and
makes sure you are seeing business value as quickly as possible.
Coach your team to assess scope creep by asking the following questions.
• Is the report/model currently working?

• What is the minimum work needed to get it working?

• Are we maximizing value delivered for the time we are spending on the
project?

• Can we get value from it as is?

• Can we put this in production? If not, why not?


Asking why can help you gain understanding from your data science team and
coach them to think through whether they are over processing—meaning putting in
more effort than is actually required. Lean Six Sigma suggests asking why five times.
In practice, I have found it best to ask why until you get to a fundamental constraint
like, “that’s how physics works.”
Another two of the pitfalls relate to models. Make sure your team discusses
with you the type of model they intend to use. Help them evaluate the customer’s and
the business’ need for explainability and select an appropriate model that meets those
requirements. Ask questions to gain understanding about the model to be used, so
that you are comfortable with the model and can help explain to others the method
your team is using. Help your team avoid building models that are not tied to reality
and avoid over processing in model building. It can be tempting for data scientists to
work on improving model accuracy indefinitely. Help your team ensure that they are
not chasing diminishing returns.
The last pitfall relates to solving the wrong problem. Two things will assist in
avoiding this pitfall. (1) Work with your team to ask questions at the start of a project.
(2) Support your team in getting alignment with customers and stakeholders.
Guide your team to go slow to go fast. Make planning both what will be done
and how it will be done a normal part of working on a project. Ask questions about
how the team is planning the project and the work. Ask from the very start how they
plan to deploy the project. Make sure they are thinking about the end state as they
build things. Allow them to iterate rather than wait for perfection to deploy a solution.
CHAPTER 3

Five Methods to Avoid Common Pitfalls
So, what should we do up front to make sure we have fully defined the problem and
will not fall into one of the common pitfalls? There are five methods: Ask questions;
get alignment; keep it simple; leverage explainability; and have the conversation. Each
of the five methods addresses multiple pitfalls, as shown in Table 3.1.

Table 3.1: Connection between the methods to avoid pitfalls and the five project pitfalls

Method                   Avoids These Pitfalls
Ask Questions            Scope is too big; Scope creep; Model couldn’t be explained; Solved the wrong problem
Get Alignment            Scope is too big; Scope creep; Solved the wrong problem
Keep it Simple           Model couldn’t be explained; Model was too complex
Leverage Explainability  Model couldn’t be explained; Solved the wrong problem
Have the Conversation    Scope is too big; Scope creep; Model couldn’t be explained; Model was too complex; Solved the wrong problem
3.1 ASK QUESTIONS


You need to start a project by asking questions. For example, what is the desired out-
come? What is your customer looking for? What’s the goal of the project? Once you
understand the answers to these questions, you can build on them to ask additional
questions and gain clarity.
Asking questions isn’t limited to just the start of the project. Starting off by
asking questions gets you on the path to delivering a successful solution to production.
Continuing to ask questions as you work on the project keeps the project on that path.
Asking questions up front prevents having too big a scope and can prevent
scope creep. Asking questions about expectations can help ensure the model won’t be
too complex and will ensure you use a model that your customer is comfortable with.
Asking questions helps guarantee you will work on the correct problem. In Chapter
4, I’ll share two tools that will help you ask these questions in the first phase of your
project. These tools can be used to check in with your customer, management, and key
decision maker as your project progresses.

3.2 GET ALIGNMENT


In Chapter 4, I’ll share a tool that helps get alignment with your customer and with
your management before you start the project. Getting alignment with your customer
up front means that you will have agreement on what the project entails. Together
you’ll decide on the scope of the project, and what “done” looks like. I can’t emphasize
enough the benefit of having a clear definition of done that is agreed to by the
customer. This alignment reduces the risk of scope creep and
of solving the wrong problem.
Having alignment with your management makes sure they understand what
benefits and deliverables will come from the project. It means that you have prepared
them for future requests for support or resources. In Chapter 5, I provide examples
of business value delivered by different types of data science projects, and a table of
conversions to dollars for common metrics.

3.3 KEEP IT SIMPLE


The next method to avoid project pitfalls is to keep it simple. This is something to keep
in mind through all the phases of the project. It is true for both model building, and
for the overall project. Added complexity can be a form of scope creep and can prevent
your project from being deployed to production due to bugs.
If you can get a simple version of the project deployed, you can always go back
and refine the model or add features. You get a benefit and deliver business value from
having the project deployed. Added complexity delays that benefit. In a model, com-
plexity can cause difficulty in the ability to maintain the model. Increased complexity
often comes with increased time required to train a model. I’ll discuss this in more
detail in Chapter 7.
Keep deployment of the project simple. If there is an existing system, use it to
deploy your project. If there are existing business processes, make sure your project
works within them, unless the intent of the project is to change them. I’ll cover this
topic in more detail in Chapter 9.

3.4 LEVERAGE EXPLAINABILITY


Part of the reason for asking questions and getting alignment is to understand your
customer’s needs and wants. Then, you can ensure the model you use is in alignment,
especially with your customer’s need for explainability. This prevents the customer from
having second thoughts or last-minute concerns that put deployment of a solution at risk.
The same goes for management. Even if your end customer is ok with a “black
box” type solution, your management may not be comfortable providing it. This is less
of a concern if you are working for internal customers, but if your end customer is the
public, your management may want explainability. Or they may not. Ask them and
find out before you build things.
Take advantage of the industry interest in explainability. This is a hot topic
in the field. There is a lot of research in this area, and I anticipate there will be new
methods and solutions in the future (Royal Society, 2019).

3.5 HAVE THE CONVERSATION


Throughout the project, from the very beginning, it is important to have an ongoing
conversation with the customer. This will help prevent all five project pitfalls. You’ll be
able to set the scope of the project properly and avoid scope creep by checking in with
the customer as you go to share the status of the project and what can be done at in-
crements. You can make sure you are keeping the project at the right level of simplicity,
so it does what is needed without excess. Your customer will have working knowledge
of the model and familiarity with how the solution works through repeated exposure
during your conversations. Finally, you can be sure that you’ve solved the correct prob-
lem because you’ve checked in with the customer as you went.
Feature requests and other suggested improvements are opportunities to have
further conversations about the project. As part of these conversations, make sure you
are asking about timelines as well as the desired outcome from the change. All too
often we hear customer requests as “must dos” but frequently the customer doesn’t
really know exactly what they want, or even what is actually possible. A portion of the
responsibility in delivering a finished project is to help the customer articulate what it
is that they actually need. The best way to get this to happen is to have a conversation
with the customer. Ask clarifying questions and use a whiteboard or paper to create
mock-ups so they can see what to expect and correct you if you aren’t fully understand-
ing what they are trying to get.
At the end of a project, take time to reflect back and collect your thoughts on
what you learned. Review the project charter and compare what you expected would
happen to the actual outcomes. It is helpful to have a final review with your customer
to share the results of the project, including delivered business value. At that review,
you can get your customer’s feedback on the project to incorporate into your reflection.
Cover both what you would want to change in future projects and what went well on
this project. This helps you learn and grow as a data scientist.

3.6 TIPS FOR MANAGERS


Support your team in asking questions at the start of a project and throughout the
process. Facilitate meetings between stakeholders and your team or represent your
team in meetings with stakeholders.
Facilitate conversations between your team and the customer of the final output
of the project. Make sure your team is checking in with the end customer regularly
and involving them in decisions—like how the final report will look, what graphs will
be used, and how the customer will interact with the model when deployed. Ensure
that there is a strong connection to customers and at the same time that the team is
protected from excessive new feature and change requests.
The best way to ensure a strong customer connection while protecting your team
is to establish structured meeting times to share the latest iteration with the customer
and to collect feedback. Help your team in receiving feedback. Sometimes a critique
can be challenging to hear, and if your team becomes defensive, then they lose the
opportunity to learn and improve.
Communicate with the customer and stakeholders that your team needs un-
interrupted time to work and that requests for changes and new features will only
be accepted in the established review and feedback meetings. If you are using Agile
Development, have the team’s Scrum Master help with protecting the team from dis-
tractions, and require that your team only accept user stories from outside the team if
they come via the Product Owner.
Check in with your team—Are they keeping it simple? Ask them about what
the cheapest and quickest solution would be. Coach them to choose simple methods.
Make it OK to put a solution into production and then go back and improve it in
the future.
Train your team to learn from past projects by performing retrospectives. A
retrospective is time where the team reflects on a project that has been deployed and
compares the project charter to actual results. Ask what was supposed to happen
and what did happen during the project. Ask what they have learned, and what they
would like to change or continue in the future. Keep it positive. This is about learning,
not recriminations.

Table 3.2: Questions to ask at retrospectives


What was supposed to happen?
What did happen? Was there a difference?
Why or why not?
What did we learn?
What do we want to change or continue?

The first time a team does a retrospective, they will be nervous that you will
be looking to find fault. Finding fault is counterproductive and will not help the
team improve and grow. Make it safe to discuss failure and examine what happened
without placing blame on individuals. Keep in mind that your team is doing their
best. No one comes to work wanting to mess up. To quote W. Edwards Deming, “A
bad system will beat a good person every time” (Deming, 1993). Support your team
to think about where your organization’s systems and business processes are holding
them back and take steps to make improvements. Start with small things your team
controls. If you only change one thing after that first retrospective as a result of their
feedback, they will feel empowered and engaged in the process, and future retrospec-
tives will be very fruitful.
CHAPTER 4

Define Phase
The first phase of a project is where you set the scope and determine the deliver-
ables—what the outcomes of the project will be. As discussed previously, it is im-
portant to do this up-front work because it prevents future problems that can cause
your project to fail. Work done in this step ensures you have alignment with both
management and the end customer, and that the scope of the project is defined. This
protects against scope creep and having too big a scope. The questions you ask and the
conversations you have during the define phase of the project protect against solving
the wrong problem.
There are two tools from Lean Six Sigma you can use to help ensure you ask the
questions, get alignment, and have the needed conversations at the beginning of the
project. These tools are: (1) a project charter and (2) a Supplier-Input-Process-Output-Customer (SIPOC) analysis.

4.1 PROJECT CHARTER


The project charter is a living document that establishes the parameters around the
problem you are working on. It helps you identify who the stakeholders and final de-
cision makers are. Writing a project charter means that you get alignment on the fol-
lowing questions: What’s the problem you are solving? How will you know you solved
it? What is in and out of scope? What is the business benefit that will be delivered?
Who is involved in the project? Who is the decision maker?
It might be tempting to overlook this step or do a quick pass only. The benefit of
doing this work at the start of the project is that it forces you to have the conversations
and get alignment on boundary conditions, customers and stakeholders, and expected
outcomes. This will allow you to make decisions quickly as the project progresses, be-
cause you already have that information and don’t need to seek alignment during the
later phases of the project.
The reason the charter is a living document is that often there are things you
learn as you progress in the project that may require a change to the problem statement
or modification of the project scope. Having these written down means you notice
when the scope has changed, which triggers a conversation with the customer and
decision maker. This keeps the alignment between the project as delivered and the
customer’s expectations, which ensures your project will make it to production.
The format of the charter document isn’t as important as its content. The key
components of the charter are problem statement, scope, how to measure success of
the project, stakeholders, and decision maker (Table 4.1).

Table 4.1: Key components of a project charter


1  Problem Statement  What is the problem that needs to be solved? Why is it important to solve this problem?
2  Scope              What is included, what is excluded? What are the boundary conditions?
3  Metrics            How will we measure the project? How will we know we are done?
4  Stakeholders       Who will provide input to the project? Who will work on it? Who is our end customer?
5  Decision maker     Who will be the final decision maker? What type of decisions are they responsible for?

The project charter starts with the problem statement. Why are you undertaking
this project in the first place? What is the problem that needs to be solved? The prob-
lem statement should contain the basic facts of the problem. Include why the issue
matters. This ensures there is alignment for why we are doing the project.
In writing the problem statement ensure you are keeping to just the facts of the
problem and not sneaking in possible solutions. The problem is not that we don’t have
a report for monthly sales figures. The problem is that we want to know if our new
marketing campaign is working. To do that, maybe looking at monthly sales figures is
the correct approach. Maybe there is another metric that would be as good or better.
If you assume the answer in setting up the problem, you limit your thinking.
To make sure you aren’t unintentionally limiting your thinking by including
solutions in your problem statement, you can use a standard format for problem state-
ments. A good problem statement answers: who has the problem, what the problem
is, where it occurs, when it occurs, and what the impact of the problem is to the busi-
ness. You can use a fill-in-the-blanks style format like this: During (period of time for
baseline), the (primary business measure) for (business process) was (baseline). This is
a gap of (objective target vs. baseline) from (business objective) and represents (cost
impact of the gap) of cost impact.
Writing up my example of the new marketing campaign in this format, my
customer and I can start with: “During the new marketing campaign, the sales of
widgets were $50,000. This is a gap of $25,000 from our quarterly sales goal.” We can
see, though, that what we really will want to know is how the sales compared with the
new marketing campaign to sales without the campaign. We need to adjust the metric
we use accordingly. We rewrite the statement to say: “During the new marketing campaign the sales of widgets were $50,000, $5,000 more than the period of the same duration prior to the campaign.” Now we have a clearer picture of how we need to format
the report for the customer to answer the questions they have. We know we need to
be able to compare periods with marketing campaigns to similar periods with baseline
sales to see the impact of the campaign. We may still want to include the gap to the
overall sales goal, or the customer may decide that isn’t important to the problem they
are looking to solve. This reinforces the need to ask questions and get alignment with
the customer of a project before you start building things. Taking time now to align
on the problem statement prevents solving the wrong problem.
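The fill-in-the-blanks format is easy to capture as a reusable template. Here is a minimal sketch in Python; the field names are illustrative choices of mine, not part of the book’s format.

```python
# The fill-in-the-blanks problem statement as a reusable template.
# Field names are illustrative assumptions, not the book's wording.
PROBLEM_STATEMENT = (
    "During {baseline_period}, the {business_measure} for {business_process} "
    "was {baseline}. This is a gap of {gap} from {business_objective} and "
    "represents {cost_impact} of cost impact."
)

statement = PROBLEM_STATEMENT.format(
    baseline_period="the new marketing campaign",
    business_measure="sales of widgets",
    business_process="the widget product line",
    baseline="$50,000",
    gap="$25,000",
    business_objective="our quarterly sales goal",
    cost_impact="$25,000",
)
print(statement)
```

A useful side effect: if a keyword is missing, `str.format` raises a KeyError, a small nudge that a blank you cannot fill in is a part of the problem you do not yet understand.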
Sometimes filling in the problem statement helps highlight that perhaps the
problem is we don’t have defined metrics. You can then work with your customer to
define what those metrics would be and set up the project to deliver those metrics in
a dashboard.
The scope of the project is another important item contained in the project
charter—what is included and excluded? What are the boundary conditions you need
to work within? Is there a budget for datasets from external vendors? Does all data
need to be collected from internal sources? Is your report for a specific team or does it
have multiple users? What other resource or technical constraints do you need to work
within? Even if, and I should say, especially if, these are the “everyone knows that” type
of boundaries, make sure you include them in your project charter. The reason is that
you can then question whether they should be boundaries for your project and have
the conversation with management about what would be possible if that constraint
was removed. This sets up the ability to do a return on investment calculation and
potentially relax some constraints.
Setting up the scope helps you to assess, before you start the project, if you
will have the resources you need to be able to deliver the desired outcomes. If things
don’t line up, having the conversation before work on the project starts means you’ll
be able to adjust either the constraints or the expected outcomes to match what you
can get done.
The charter should also include documentation on how you will know the
project is successful. What are the measured criteria that allow you to know that you
have addressed the problem sufficiently? Are there customer requirements for model
accuracy? Are there customer requirements for timeliness? Have those requirements
been addressed satisfactorily by the project? These measurements help you know
when you are done and are a layer of protection against scope creep. The expected
return on investment from the project should be documented as well. That helps
with prioritizing work, and with ensuring management understands the benefit of
resourcing the project.
Lastly, the charter should include who will provide input to the project—the
stakeholders, and who will be the final decision maker. For each decision, there
should be only one person who decides. To go fast, it is helpful to have established
in advance who that will be.

Project Charter

Project Name: New Marketing Campaign Dashboard
Project Owner: J. Weiner
Problem Statement: “During the new marketing campaign the sales of widgets were $50,000, $5,000 more than the period of the same duration prior to the campaign.”
Scope: Design and deploy a report that shows the impact of marketing campaigns on widget sales
In Scope: Widgets, marketing campaigns for widgets, report presented as an interactive dashboard, design of visualizations to present data
Out of Scope: Non-widget products, other marketing campaigns, predictive models
Metrics:
    Item                                    Current Value  Goal        Note
    Baseline sales of widgets               $45,000.00     $75,000.00  Q3 2018 data
    Widget sales during marketing campaign  $50,000.00     $75,000.00  Q3 2019 data
Stakeholders: Widget marketing team, J. Smith (Widget marketing team manager), Data science team, D. Jones (Data science team manager), Widget sales department
Decision Maker: J. Smith

Figure 4.1: Example project charter
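Because the charter is a living document, some teams keep it as structured data alongside the project so that scope changes show up in version control. This is only a sketch; the class layout and field names loosely mirror Table 4.1 but are my own assumptions.

```python
from dataclasses import dataclass, field

# A minimal, illustrative way to keep a project charter as a living
# document in code. Fields loosely mirror Table 4.1; layout is my own.
@dataclass
class ProjectCharter:
    name: str
    owner: str
    problem_statement: str
    in_scope: list = field(default_factory=list)
    out_of_scope: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)      # metric -> (current, goal)
    stakeholders: list = field(default_factory=list)
    decision_maker: str = ""

charter = ProjectCharter(
    name="New Marketing Campaign Dashboard",
    owner="J. Weiner",
    problem_statement=("During the new marketing campaign the sales of widgets "
                       "were $50,000, $5,000 more than the prior period."),
    in_scope=["widgets", "widget marketing campaigns", "interactive dashboard"],
    out_of_scope=["non-widget products", "predictive models"],
    metrics={"widget sales during campaign": (50_000, 75_000)},
    stakeholders=["Widget marketing team", "Data science team"],
    decision_maker="J. Smith",
)
print(charter.decision_maker)  # J. Smith
```

Editing this file to widen the scope is a visible act, which is exactly the trigger for a conversation with the customer and decision maker that the chapter recommends.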


Having the stakeholders and decision maker written down in the charter docu-
ment does two things. One, at the beginning of the project it makes you think through
who the project stakeholders are so you can get their input and feedback at the start
of the project. They may be customers of the end result of the project, or they may
be suppliers of data or other input. Writing down who they are means you have a list
and can check in with them as the project progresses. Writing down the stakeholders
is also a check on project scope. If you start to have a large number of stakeholders,
then it might be good to scope the project down or break it into pieces that can be
delivered separately to different sub-sets of stakeholders. The other thing that writing
down the stakeholders and decision maker does is that it helps with alignment. Stake-
holders can see who else will be giving input into the project, and they can see who
the final decision maker will be. This prevents all your stakeholders from assuming,
naturally, that they are the key decision maker for the project. It avoids the problem
of too many bosses which will make finishing a project difficult due to expansion of
scope and misalignment.

4.2 SUPPLIER-INPUT-PROCESS-OUTPUT-CUSTOMER
(SIPOC) ANALYSIS
A SIPOC (pronounced “sigh-pock”) summarizes the inputs and outputs of a process
in a table format; see Figure 4.2. More importantly, it includes the requirements for
those inputs and outputs. For a data science project, this helps ensure that the model
selected provides the expected accuracy and establishes what data are needed to gen-
erate that model. For a project delivering a report or dashboard, it helps determine the
criteria for frequency of use, who the audience will be, and what data are needed to
create the report or dashboard.
A SIPOC is created in three parts. The first part establishes the start and end of
the process that is being worked on. This could be to fix a problem or to enhance a pro-
cess with automation or AI. For example, if you are intending to build a report to help
make a decision, the process that is used to make the decision would be in the center
of the SIPOC. The second part focuses on the output of the process and the customer,
including the customer requirements. The third part looks at the inputs needed for the
process to deliver the outputs, and what the requirements are for those inputs.
Figure 4.2: Supplier-input-process-output-customer (SIPOC) analysis table. The SIPOC is completed in three parts following the numbered steps.

Let’s use the following as an example and build the SIPOC. Say I am planning
to build a report to help engineers determine if material meets certain criteria for qual-
ity and can continue to be processed, or if it should be scrapped. That decision-making
process would go in the center of the SIPOC table. I would write down the start and
end points: start—material is on hold for engineering decision; end—material is dispositioned, as shown in Figure 4.3.

Figure 4.3: Example SIPOC for engineering dispositioning material.

The next phase is to examine the outputs. For this example, I write down the
outputs or deliverables from the process: dispositioned material. But let’s think that
through. Really, the output of the process is quality parts, so let’s make that change.
The other output of the process is timely decision making. Not only do we want quality
parts, we want to have our material flow through the factory and not be waiting for a
long time.
Next, I identify the customers who receive deliverables from the process: man-
ufacturing, engineering, and engineering management. Manufacturing is a customer
because they want to keep material flowing through the factory, and also want to make
sure the manufacturing process is making quality parts. Engineering is a customer be-
cause the reason that material needs to be dispositioned is useful for troubleshooting
the manufacturing process and making adjustments to ensure the process produces
quality parts. Engineering management is a customer because they are responsible for
their team to disposition material quickly and correctly and fully address problems, so
they don’t re-occur.
In thinking about the customers and why they are customers, we start to see
what their requirements might be. The best way to know for certain is to go ask the
customer. You can start this conversation by sharing what you think their requirements
might be, and then listen to their feedback and additions. Finally, I write down the
requirements for each output from each customer; see Figure 4.4.

Figure 4.4: Example SIPOC with both Part 1, Process, and Part 2, Output and Customer completed.

Think about requirements in terms of accuracy, timeliness, and completeness. If
there are ways to measure these qualities for the output, include those metrics. In this
example, the manufacturing line wants quick response time from the overall process, so
material waits a minimal amount of time for engineering to decide. They also want the
correct choice, so material is not wasted or the final customer is not upset by receiving
material of poor quality. Engineering wants the report to have all the necessary information in one place so they can quickly make a decision; they also want the report to
be accurate so they can disposition the material correctly. Knowing the outputs then
means I have a sense of what inputs are required.
The third part of completing a SIPOC analysis is to look at the inputs to the
process. This starts with writing down what inputs are required to enable the process
to occur. Then you look at who or what supplies each of those inputs. Finally, you
document the requirements for each input from each supplier.
In our example, the engineer needs to know what material is waiting for them to
disposition, why the material has been flagged for them to look at, and what happened
on the equipment when the material was being processed. The list of material is neces-
sary for timely decision making. Why the material was flagged and what happened on
the equipment provide the engineer with information so they can make corrections to
the process and prevent similar errors in the future. There is a clear connection between
the outputs that meet the customer’s needs and the inputs required to build the report.

Figure 4.5: Example SIPOC with Part 3 Suppliers, Inputs started.

Once we know the needed inputs, we can identify who (or what system) sup-
plies that information. Then we can determine the requirements for each of the inputs
so that our report can meet the needs of our customers.
In our example, as we complete the suppliers and requirements, we notice we
need one other input. We need to know how long the material has been waiting to
be able to meet the manufacturing requirement of not waiting longer than 12 hours.
The data need to be accurate and include detail on what was measured compared to
the goal for that parameter. An example of this is statistical process control limits.
This information is needed by the engineer to be able to answer why the material
was flagged. Suppliers can be teams or systems. In this example, manufacturing is a
supplier—if they have entered comments into the shop floor control system. Other
suppliers are the databases for the statistical process control and shop floor control
systems; see Figure 4.6.

Figure 4.6: Example SIPOC with all parts completed.

After completing the SIPOC, you might need to go back to your project charter
and update the list of stakeholders based on the findings from listing out the custom-
ers and suppliers. You may also need to adjust the scope of the project based on what
you’ve learned.
The SIPOC is helpful in the phase of your data science project where you are
acquiring and exploring data. From the SIPOC analysis you know the requirements
for the inputs to your process, report, or model which will help you select data sources.
From our example, we know we need to extract data for our report from the statistical
process control database and from the shop floor control database because we have
completed the SIPOC and understand the requirements to enable us to create the
output the customer desires. As you do the data acquisition and exploration, show the
results to your customer and get their feedback.
The project charter and SIPOC help you ask the questions at the start of a
project that set you up to succeed in the end and deliver a project to production. The
project charter establishes the problem you will be working on, so you don’t solve
the wrong problem, and sets the scope and definitions of success to prevent too big
a scope and scope creep. The SIPOC allows you to think through stakeholders and
scope the project requirements as well as get alignment on the expected deliverables
and requirements for those deliverables. This is useful for defining what “done” looks
like for your project.
The other factor in determining if you are done with a project is to compare
the expected business value to the business value delivered. Achieving the expected
business value early is a reason to re-assess the project scope and maybe stop working
on it, after having a conversation with your stakeholders and final decision maker. Not
meeting the expected business value after all the planned work is complete can be due
to factors outside your control and is worth assessing if further work should be done, or
if things are good enough as is. Again, this is a joint decision between the team work-
ing on the project and the final decision maker. In the next chapter, we’ll investigate
how to calculate business value for data science projects.

4.3 TIPS FOR MANAGERS


Ask to see a project charter for each project your team works on. This will frame the
work and provide metrics you can use to guide your team and assess the project status.
Make sure you have alignment with your team on what “done” means for the project
and hold them to it.
Request a SIPOC analysis for projects and coach your team to complete it early
in the define phase of the project. This analysis provides insight into the customer
needs and requirements to enable the project to meet those needs. Having done this
analysis, the team will know what data are required to build the report or model they
will deliver to the customer.
Have regular reviews with your team and make sure they are updating the proj-
ect charter and the SIPOC as they learn about the problem the project helps to solve.
Check in also to ensure the project isn’t suffering from scope creep. If you meet the ex-
pected business deliverables early, allow the team to be done with the project. Wrap up
all projects with a retrospective where you compare the expected business value to the
delivered value. Coach your team to investigate both cases of when you do not meet
the expected value, and when you exceed the expected value. This will help them learn
and refine their calculations and hone their ability to make more accurate estimates.
CHAPTER 5

Making the Business Case: Assigning Value to Your Project
Part of the project charter is to document the expected return on investment for
the project. Assessing the business value for your project will help get resources and
funding. It helps to answer management’s question of “what do I get?” Knowing the
expected deliverables for a project helps in getting support. It also helps with defining
when you are done with a project.
It can sometimes be difficult to assign a business value to a data science project.
Even so, the exercise is worthwhile: the benefit derived from calculating the value
outweighs the difficulty. To help, I've identified
business benefits by types of data science projects and created a table of conversions to
dollars for common metrics.

Table 5.1: Deliverables and metrics for various types of data science projects

Project Type             Deliverables                  Metrics
Data analysis            Root cause determination;     Productivity; time to
                         problem solving support;      decision; decision quality;
                         problem identification        risk reduction
Automation               Time savings; decision        Time to decision; waste
                         support                       eliminated; decision quality
Standardization and      Standards and business        Excursion prevention;
business process         processes                     quality improvement;
improvement                                            risk reduction
Data mining              New insights; learned         Improved model accuracy;
                         something new                 decision quality; risk
                                                       reduction
Improved data science    Increased capability;         Productivity; decision
                         advanced algorithms           quality; risk reduction
To build out these metrics, let’s look at the types of data science projects I’ve
been involved in over the course of my career. I can group the projects I’ve done into
five broad categories. I’ve done data analysis projects. I’ve built reports and automated
processes. I’ve done projects to devise standards and improve business processes. I’ve
delivered insights from data mining. I’ve done projects which improved my organi-
zation’s ability to do data science. Each type of project has different deliverables, and
different metrics to measure those deliverables (Table 5.1).

5.1 DATA ANALYSIS PROJECTS


Productivity and time to decision are similar metrics, but slightly different. Productiv-
ity measures how much effort is required to determine root cause or identify problems.
If your project makes this easier, you can measure the impact by assessing the produc-
tivity improvement and looking at the value of other work people can now do since
they have time freed up from the results of your project. Time to decision is a measure
of how long it takes to gather the information needed to make a decision.
Here’s how a time to decision calculation works. Say it used to take someone
two hours to pull together the information required when a particular decision needed
to be made. The information was presented at a meeting, and the decision was
made in that meeting. Say you were able to create a script to extract the data needed
and build a visual like a graph rather than a table of numbers, which let the meeting
attendees clearly see the information they needed to support the decision. It could be
possible that you were able to reduce the time needed from 2 hours plus the time in the
meeting (let’s say it would take 20 minutes in the meeting) to 10 minutes total. That’s
an improvement in time to decision of about 93%. Now, to convert to dollars, we take
the average salary of the participants (the person who usually collected the information
and the people making the decision) and multiply it by the hours saved. I'm using
example values here, so for ease let's say the average salary is $100,000 annually, which
works out to $50 per hour. The old process took 2 hours of preparation plus 0.3333
hours (20 minutes) in the meeting, a total of 2.3333 hours; the new process takes
0.1667 hours (10 minutes), so the time saved is 2.3333 - 0.1667 = 2.1667 hours. Since
in this case only one person did the work to gather the data, we multiply by one. You
save that amount every time this decision needs to be made, so if the decision is made
quarterly, the project delivers 2.1667 hours × 4 × $50 = $433.34 annually (Table 5.2).
That's just from saving about two hours of time for one person. If more than one
person performs the task, or the task is performed daily, the numbers really add up.
Table 5.2: Example calculation for time saved

Time Saved   # Times Task    # of People Who    Average Hourly   Total Value
Per Task     Performed       Perform the Task   Salary           Delivered
             Per Year
2.1667 hr    4 (quarterly)   1                  $50              $433.34
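The calculation above can be captured in a short helper; this is a minimal sketch, and the function name and arguments are my own rather than from any standard library:

```python
def annual_value_of_time_saved(hours_saved_per_task, times_per_year,
                               num_people, hourly_salary):
    """Dollar value delivered per year by a time-savings project."""
    return hours_saved_per_task * times_per_year * num_people * hourly_salary

# Values from Table 5.2: 2.1667 hours saved, quarterly task, one person, $50/hour
value = annual_value_of_time_saved(2.1667, 4, 1, 50)
print(round(value, 2))  # 433.34
```

The same function makes it easy to see how the numbers grow: a daily task performed by five people at the same rate would deliver tens of thousands of dollars per year.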

Let me take a little time here to talk about decision quality. The quality of a
decision is not defined by whether the outcome of the decision is good or bad. We
could make a good decision and still have a bad outcome. Say we are deciding whether
to plant a crop. We could make a quality decision and still have a bad outcome if the
weather deviates from its forecast.
A good quality decision has the following characteristics: it is framed appro-
priately; there are alternatives; data and information are used to decide; the value
and trade-offs are clear; logical reasoning is used; and there is commitment to follow
through and to take action based on the decision. We can use these metrics to gauge
the decision quality. If our analysis is providing the data and information needed to
decide, how can we assess how much we have improved the quality versus not having
that data?

5.2 AUTOMATION PROJECTS


For projects where you are adding automation, or building reports, the metrics are
time to decision, waste eliminated, and decision quality. I talked about time to deci-
sion and decision quality in the previous section. Let’s focus on the concept of waste.
Automation projects are often developed to save time and effort on repetitive tasks.
Frequently they result in getting rid of unnecessary tasks or removing the need to wait
for information. In Lean manufacturing there are seven types of waste. These have
analogs to office work, and you can use similar calculations to quantify the benefit from
eliminating the waste. I’ve summarized the types of waste in Table 5.3.
Table 5.3: Types of waste with manufacturing and office examples

Waste            Manufacturing Examples         Office Examples
Transportation   Moving things                  Moving information
Inventory        Work in progress, parts        Unread emails, reports to be
                 in storage                     read, approvals to be processed
Motion           Reaching for a tool,           Walking to get a copy, retrieving
                 walking to get a ladder        a file from a drawer, searching
                                                for a file on a drive
Waiting          Waiting for a part or          Waiting for information,
                 for a person                   waiting for an approval
Overproduction   Making more product than       Making reports no one reads
                 customers are willing to
                 pay for
Overprocessing   Doing more work to a product   Adding features to a report
                 than customers are willing     that don't have a customer
                 to pay for                     or a use case
Defects          Parts that fail quality        Errors and omissions on forms
                 criteria

5.3 IMPROVING BUSINESS PROCESSES


Some projects I’ve worked on resulted in defining standards for how work was done
and improving business processes. In those cases, the deliverables are standards and
improved business processes and the metrics are quality improvements and prevent-
ing process excursions. Improved quality can come from defining standards, such as
a standard report that everyone who does a particular task uses. It can come from
automating tasks to reduce errors, or from adding error checks to data entry. Process
excursions can be prevented through applying statistical process control. This is not
only for factory processes: I've applied statistical process control to change-approval
throughput times to identify data entry problems.
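As a sketch of the idea, control limits for a process metric such as change-approval throughput time can be computed as the mean plus or minus three standard deviations; the throughput values below are invented for illustration:

```python
import statistics

def control_limits(samples, sigmas=3):
    """Lower and upper statistical process control limits (mean ± k·sigma)."""
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)
    return mean - sigmas * sd, mean + sigmas * sd

# Hypothetical change-approval throughput times, in hours; the last point
# is an outlier of the kind a data entry problem might produce
throughput = [4.0, 5.2, 3.8, 4.5, 4.1, 5.0, 4.3, 9.8]

# Compute limits from the historical baseline (all but the newest point)
lcl, ucl = control_limits(throughput[:-1])

# Points outside the limits flag a possible problem worth investigating
flagged = [t for t in throughput if not lcl <= t <= ucl]
print(flagged)  # [9.8]
```

In a real deployment the baseline would come from a stable historical window, and flagged points would trigger a review rather than an automatic action.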
Lean Six Sigma projects fall into this category and frequently deliver very large
business value, often by using data science: data analysis, automated reporting, and
model building. My green belt project delivered over one million dollars in
savings and used advanced statistical techniques like measurement system analysis and
design of experiments. My black belt project delivered just about two million dollars
in savings and used hypothesis testing and survey design.
5.4 DATA MINING PROJECTS


For other projects, the key outcome is new insights. These can come in the form of
improved model accuracy, increased decision quality, gaining knowledge from mining
data, or just learning something new in general. Often insights from data mining result
in new projects which deliver further value. Learning something new is generally hard
to quantify but should be included as text in a final summary of your project.

5.5 IMPROVED DATA SCIENCE


Finally, one of the outcomes of a project can be improved data science. You can develop
new algorithms, and you can increase the organization’s ability to do data science as
a result of a project. I’ve done a number of projects that fall into this category, often
as a means to an end. In one project, we needed to access data stored on equipment
hard drives. Historically, collecting this data was completely manual, so it was not
done frequently. As part of the project, I worked with my IT department to set up a
shared file system where the data could be transferred automatically. This increased the
organization’s ability to do data science by making the previously orphaned data easy
to access and allowed it to be combined with other data sources to enable new insights.

5.6 METRICS TO DOLLAR CONVERSION


Converting these common metrics into dollar value enables comparisons between
different types of projects and helps with prioritization if there are multiple things
you could work on. Additionally, dollar value is easy to understand. Table 5.4 shows
common metrics and the conversion factor to translate the metric to dollar value. Keep
in mind that you frequently will want to annualize the value delivered. If you hadn’t
done the project, people would have to go back to doing the task the old manual way.
How often the task that has been improved is performed per year is a factor in how
much value your project has delivered.
One last point on assigning business value to a project. Data science projects
often result in freeing up people’s time by automating tasks or using AI to accelerate
work. Freeing up people’s time means that they can work on other things as a result of
the project. That additional work that your customer has been able to handle should
be monetized and included in your project’s results.
Table 5.4: Common metrics and dollar conversion

Metric                 Measure In                    Conversion Factor
Dollars saved          Dollars                       One time: none. Recurring:
                                                     $/year × # of years
Revenue gained         Dollars                       One time: none. Recurring:
                                                     $/year × # of years
Productivity; Lean     Time saved, typically         Value of other work people
waste reduction        hours saved per year          can now do: decision making,
                                                     influencing, etc.
Time to market         Weeks pulled in               (Value of increased market
                                                     segment share, note 3) ×
                                                     (number of weeks accelerated
                                                     to market)
Schedule               Weeks pulled in               (Value of increased market
acceleration           (for the limiter)             segment share, note 3) ×
                                                     (number of weeks accelerated
                                                     to market)
Time to decision       Time saved; Lean TIMWOOD      Multiply by average salary of
                       waste eliminated (note 4)     worker in role; calculate
                                                     savings due to waste
                                                     eliminated
Quality improvement    Yield improvement,            Cost per unit; cost (loss)
                       errors reduced                per error; time wasted per
                                                     error per year × average
                                                     salary
Excursion prevention   Units saved                   Cost per unit
Decision quality       Frame, alternatives,          Often decision alternatives
                       information, values and       are valued in dollars
                       trade-offs, logical
                       reasoning, commitment
                       to action
Risk reduction         Value of the risk metric      Dollars

Note 3: Increased market segment share can be difficult to quantify.
Note 4: See Table 5.3.
CHAPTER 6

Acquisition and Exploration of Data Phase
In Chapter 4, we introduced SIPOC analysis. One of the results of doing this analysis
is that you get a picture of the inputs needed to build the final output of the project,
whether that is a report, presentation, or model. In addition to listing the needed
inputs, the SIPOC includes the requirements for those inputs in order to achieve the
desired results for the outputs. This is useful in identifying data sources for the project.
Not only do you know what data are required, you also have information on how often
the data are needed, and what level of quality is desired. In the ideal case, the data you
want to use are automatically collected and stored in a database which you can easily
access. Unfortunately, this is not always the situation.

6.1 ACQUIRING DATA


In acquiring data, there are two cases. One, when the data are available already either
in internal systems or from external sources, and two, when you don’t have the data.
The first case is straightforward; you need to connect to the data source and then move
on to the next step of exploring the data. The second case is more challenging, so I will
spend time on it.
When you don’t have the data that you need, you must determine if it is cur-
rently being collected. From time to time I have found that the data are not being
collected. In some cases, there are existing systems to automatically collect and upload
data. Including a new parameter in these cases means working with the system owner to
add that parameter to the automated data collection. As far as data gathering goes,
that is fairly simple. In other cases, there is no system and no data being collected. In
this situation, you need to develop a data collection system.

6.2 DEVELOPING DATA COLLECTION SYSTEMS


If you are developing a system to collect data, and the data have not been collected
previously, start small and start simple. Is there an automated system that captures the
data? Can you store the data from that automated system in one standard location?
As an example, say we are testing systems in a lab. The collected data are kept in files
on the test system’s hard drive, then transferred to the engineer’s computer for analysis.
By switching the storage location for the collected data to a network drive, we can
more easily explore data across multiple test systems. It is a simple change, much more
straightforward than setting up a relational database to store the data. Long term,
we may want to move in the direction of storing the data in a database. Our simple
change has opened up the potential for analysis of data from multiple systems that we
can start using right away.
Another example of starting small and simple is when a team I worked with
used a SharePoint5 list to collect data. We needed to forecast hardware use and would
do the forecasts twice a year. These forecasts were kept in various office document
formats like presentations or spreadsheets. Our problem was that we weren’t able to
use the historic data from past forecasts because they were not saved anywhere sys-
tematically. By developing a standard location for the forecasts on our SharePoint and
designing the SharePoint list to match formats people had been previously using, we
made it easy for the team to enter the data and build a history.
When you need to collect data from people, make it easy for them to enter the
data. If a form or data collection tool has many fields, people tend to skip fields or
fill them in incompletely when it feels like too much work. When you make fields
required, people typically do only the minimum, even if they know more about the
situation and could add information in other fields. When you are collecting data from people,
think about the experience from their perspective. People are busy. People typically
will do the minimum unless they are passionate about something. For example, I've
had situations where a manufacturing technician, frustrated by an ongoing problem,
added a ton of helpful content into a comment field because they
were angry that the problem kept happening. In that case, our normal data collection
systems didn’t transfer the complete information well from the techs to the engineers.
Sometimes the people who enter the data don’t understand the value that can
be gained from using the data they provide, so they do the minimum, or do it quickly
and maybe with less attention to detail than you would like. Help them help you by
minimizing the burden to enter data. Help them understand the value they deliver in
collecting the data and entering it. Circle back to the people who provide the data with
results from your analyses that have been made possible by their data and share what
can be learned. Because of these difficulties with manual data collection, I advocate for
automating data collection wherever possible.

5 Other marks are the properties of their respective owners.
Starting small and simple is also helpful because you will have tested a system,
which gives you valuable information when you need to expand or grow it. For
example, if you have started some manual collection, and determined that there are valuable
insights that can be delivered from that data, it is easier to build a case to add sensors
or other measurement instruments and collect the data automatically. In cases like
this, I’ve collaborated with my IT department to develop systems that will automate
the data acquisition. By starting small and already having data collected, I then have
a good idea of the amount of effort that will be needed to automate the system, and
a sense of the benefit that will be delivered. You can then calculate a return on invest-
ment for the effort which, in addition to information from your project charter and
SIPOC, is useful for convincing the IT department to work on your project.

6.3 DATA EXPLORATION


Once you have collected the data you need for your project, the next step is to explore
the data and begin to understand it and what insights you can gain from it. I start by
graphing and visualizing the data. One of the first graphs I make is a distribution of
each of the columns of data in the data set. This helps me scope the amount of data
cleaning I will need to do. It also helps get a sense of the structure of the data, if there
are missing values, and if I might need to transform some data. Then, I begin to look
at relationships between the data using x-y graphs.
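As a minimal sketch of that first pass in pandas (the column names and values here are hypothetical), looking at each column's distribution and missing values might start like this:

```python
import pandas as pd

# Hypothetical equipment data; in practice this comes from your data sources
df = pd.DataFrame({
    "tool_id": ["A", "A", "B", "B", "C"],
    "temp_c": [81.2, 79.8, 80.5, None, 82.1],
})

# Distribution of each column: summary statistics for numeric columns,
# value counts for categorical ones
print(df["temp_c"].describe())
print(df["tool_id"].value_counts())

# Missing values per column help scope the data cleaning effort
missing = df.isna().sum()
print(missing)
```

Histograms of each numeric column (for example with `df.hist()`) give the same information visually and are often easier to discuss with a customer.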

6.4 WHAT DOES THE CUSTOMER WANT TO KNOW?


As you explore the data set, keep in mind what you have learned from the SIPOC.
Data exploration can be a time waster if you are looking at things in which the cus-
tomer isn’t interested. At the same time, insights from data exploration can be ex-
tremely beneficial when they are unexpected and surprising. Having had the up-front
conversations and the guidance from completing a project charter and SIPOC analysis
aids in striking the correct balance.
Focus your time on the primary questions the customer has of the data. Then,
as you explore, think about secondary questions a customer might have. Take, for ex-
ample, the case of building a report to look at a fleet of processing equipment. I might
first want to know which tools are running, then I might want to know which tools
had errors. From there I could ask about error frequency, or what are the most common
errors. While doing the exploration, I can capture these questions and build them into
the design of the report.
6.5 PREPARING FOR A REPORT OR MODEL


Once you have a sense of the data, you'll begin the process of data cleaning. When
working with manually entered data, you will have multiple spellings of the
same word that need to be consolidated. You will need to define or apply standards for
capitalization. You will need to decide how to deal with missing data. Data cleaning
frequently takes a long time, and just as frequently, isn’t mentioned. It’s important and
worth the effort.
When cleaning data think about how you will test that you are not over or
under cleaning. In a recent project, I used a simple comparison of the automatically
cleaned text to human cleaned text from the same input. This test became part of the
continuous integration for the project, so any time the text cleaning code was changed,
the tests to check for over or under cleaning were run automatically.
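The check described above can be written as an ordinary unit test: run the cleaning code over a small sample that a person has already cleaned by hand and compare the results. The `clean_text` function below is a stand-in for whatever cleaning pipeline a project actually uses, and the sample strings are invented:

```python
import re

def clean_text(raw):
    """Example cleaning step: lowercase, strip punctuation, collapse whitespace.
    A stand-in for a project's real cleaning pipeline."""
    text = raw.lower()
    text = re.sub(r"[^\w\s]", "", text)   # drop punctuation
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace

# A small sample cleaned by hand serves as the reference for the CI check
HUMAN_CLEANED = {
    "Pump  FAILED!! (again)": "pump failed again",
    "temp   out-of-range.": "temp outofrange",
}

def test_no_over_or_under_cleaning():
    # Over-cleaning (dropping real content) or under-cleaning (leaving noise)
    # both show up as a mismatch against the human-cleaned reference
    for raw, expected in HUMAN_CLEANED.items():
        assert clean_text(raw) == expected
```

Wiring a test like this into continuous integration means any future change to the cleaning code is checked automatically, as described above.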
When exploring the data and thinking about what the customer wants to know
you may find that the data set as is doesn’t have the exact right parameters. This is
where feature engineering comes into play. You will need to generate new features
from the data available in the data set to help answer your customers’ questions. As an
example, you may need to separate text fields into columns, or create columns through
manipulating data in other columns—like calculating the duration of a process step
from the timestamps for material moving in and out of that step.
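The timestamp example can be sketched in pandas; the lot names and times are invented:

```python
import pandas as pd

# Hypothetical move-in/move-out timestamps for a process step
df = pd.DataFrame({
    "lot": ["L1", "L2"],
    "move_in": pd.to_datetime(["2021-03-01 08:00", "2021-03-01 09:30"]),
    "move_out": pd.to_datetime(["2021-03-01 10:15", "2021-03-01 10:00"]),
})

# Engineered feature: step duration in hours, derived from the two timestamps
df["duration_hr"] = (df["move_out"] - df["move_in"]).dt.total_seconds() / 3600
print(df[["lot", "duration_hr"]])
```

The new `duration_hr` column can then be used directly in a report or as a model input.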
Other times, your data set has too much information. This is when feature
selection comes into play. For smaller sets, you can do this manually by looking at
correlations between parameters using x-y plots. If your data set is large, say over 1,000
columns, then use a machine learning-based method.
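Both approaches can be sketched as follows. The data here are synthetic, with `x1` built to drive the response and `x2` as unrelated noise, and scikit-learn's random forest stands in for the machine learning-based method:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: x1 drives y, x2 is unrelated noise
df = pd.DataFrame({
    "x1": range(100),
    "x2": [(i * 37) % 11 for i in range(100)],
})
y = 3 * df["x1"] + 1

# Small sets: inspect pairwise correlations (or x-y plots) by hand
print(df.assign(y=y).corr()["y"])

# Large sets: rank features with a model-based importance measure
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(df, y)
ranked = sorted(zip(df.columns, model.feature_importances_),
                key=lambda t: -t[1])
print(ranked)  # x1 should rank far above x2
```

Remember that either way the ranking reflects correlation with the output, not causation; that distinction comes up again in Chapter 7.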
The bulk of the time I spend on data science projects is in this phase of the
project. Getting the data together can take a long time and can be a project in itself
if you need to develop a data collection system. Cleaning the data can also be a major
undertaking, not to mention feature engineering and feature selection. Going slowly
in this step will make the rest of your project easier. This is yet another place where it
pays to go slow to go fast.

6.6 TIPS FOR MANAGERS


When reviewing the SIPOC with your team during the define phase of a project, ask
about data acquisition. Does the required data exist? What are their plans to acquire
the data they need? Do they need support from IT or other groups within your or-
ganization? Do they need to purchase data sets externally? Does your organization
have policy or guidelines in place regarding external data? Make sure your team is in
compliance with policies on privacy of personally identifiable information and other
policies regarding data collection and storage.
Have patience during this phase of a project. Ensure your team is taking the
necessary time to effectively collect and clean the data for the project. Recognize the
value that is being delivered in this phase of the project. New data collection methods
and solutions are a clear business benefit delivered in this phase. Cleaned and labeled
data sets are additional beneficial outputs from this phase.
Coach your team to think about reuse and standardization as they work through
data acquisition and exploration. Can they modularize data cleaning code so it can be
used in another project? Can they upload the cleaned and labeled data to a database
so it can be accessed by other teams in your organization?
Ensure your team is engaging with the project customer during the exploration of
the data so that they are answering the correct questions and solving the correct problem.
CHAPTER 7

Model-Building Phase
Two of the project pitfalls relate directly to the model building phase of a data science
project: couldn’t explain the model, and the model was too complex. The tools to ad-
dress these pitfalls are to keep things simple and leverage explainability.

7.1 KEEP IT SIMPLE


The simpler the model, the easier it is to explain to others. Find out early in the
project how important it is to your customer to understand how the model uses the
input data to make predictions; that knowledge will ensure you choose a type of model
that meets the need. By completing the project charter and SIPOC you will have identified the
customers and stakeholders for your project and understood their requirements. This
can then be translated into the model selection process.
The other thing the SIPOC can help with is to think about how you will main-
tain the model during the model selection process. By defining the input requirements
in the SIPOC, you can use that information to assess how data hungry the model will
be and that can help you decide between different choices.
Part of keeping things simple is to use the most basic type of model that meets
the project’s needs. Can you use classification and regression trees rather than a neural
net? Can you use a simple tree model rather than an ensemble model? Can you use a
physical model rather than machine learning? If there is already a known equation that
connects the inputs to the output you want to predict, use it! This is another benefit
from having done the work to document the SIPOC: you have thought through the
inputs that your model requires and thought through the requirements for the model
output. Having done that work, the task of model selection becomes easier.
Simple models are much easier to explain. For a physical model, you can explain
it by showing the equation. For a simple tree model, you can show the tree. Explaining
why a model made a particular prediction becomes harder when you use ensemble
methods or neural nets. This is an area of wide interest in the industry with a number
of researchers working on explainability.
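For instance, a shallow decision tree can be shown to a customer in its entirety; with scikit-learn, `export_text` prints the rules in plain language. This sketch uses the classic iris data set simply because it ships with the library:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# A shallow tree stays small enough to show to a customer in full
data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(
    data.data, data.target)

# The complete decision logic, printed as readable if/else rules
rules = export_text(tree, feature_names=list(data.feature_names))
print(rules)
```

Limiting `max_depth` is one concrete way of trading a little accuracy for a model you can explain in a single slide.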
Simple models are easier to connect to reality. There is a common trap in model
building: you add parameters to improve the fit but lose the connection
between the model and what is actually happening in the real world. To be useful, a
model needs to be tied to reality and based on real measurements.

7.2 REPEATABILITY
Models need to be repeatable. If I provide the same inputs, I should get the same
results. This is easy to ensure if my model is simple. If different people run my model,
they should get the same results given the same inputs. If the model is run on a
different machine, I should get the same results. To be useful, models should also be
transportable, meaning I can share a model with another team. This is much easier to
ensure if they are simple.
Good coding practices help with repeatability and the ability to share models
and code between teams. This is something to think about during the modeling phase
of your project. Can you modularize your code so that other teams can use pieces of
your project? How will you test your model? How will you verify that the predictions
from your model are accurate?
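One concrete way to get repeatability is to make every source of randomness in the model explicit and seeded. A minimal Python sketch, with an invented function and toy data for illustration:

```python
import random

def shuffled_training_order(samples, seed=42):
    """Return the order in which samples will be used for training.
    Using a local, seeded RNG (instead of hidden global state) means any
    person, on any machine, gets the same order from the same inputs."""
    rng = random.Random(seed)
    order = list(samples)
    rng.shuffle(order)
    return order

run_on_my_machine = shuffled_training_order(range(10))
run_on_your_machine = shuffled_training_order(range(10))
assert run_on_my_machine == run_on_your_machine  # same inputs, same results
```

Passing the seed as a parameter, rather than relying on a global random state, is what lets a different person or a different machine reproduce the result exactly.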

7.3 LEVERAGE EXPLAINABILITY


Something to keep in mind when it comes to explainability is how data science savvy
your customers are. Recently, I was discussing a project with another data scientist
whose customer wanted to know why the model was predicting that certain parameters
were important. In this case, the model was using classification and regression trees
to do feature selection. The difficulty is that a prediction of importance can only tell
you so much—it says that these parameters correlate with the output, but that doesn’t
itself tell you whether those parameters are causal. To be able to explain causality, you need to
consult with domain experts, or even perform additional experiments. The distinction
between causality and correlation can be a difficult one for people, even other
engineers, to grasp.
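The correlation-versus-causation point can be made concrete with a toy example. In this sketch (all numbers invented), two measurements are almost perfectly correlated only because a hidden third variable drives both, so a correlation or importance score would rank either one highly without either being causal:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# A hidden confounder (ambient temperature) drives both measurements.
temperature = [20, 22, 25, 28, 30, 33]
sensor_a = [2 * t + 1 for t in temperature]   # caused by temperature
sensor_b = [3 * t - 5 for t in temperature]   # also caused by temperature

r = pearson(sensor_a, sensor_b)
# r is essentially 1.0: perfectly correlated, yet neither sensor causes the other
```

Only a domain expert (or an experiment that varies one sensor while holding temperature fixed) can untangle which relationships are causal.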
I use caution with neural nets because of concerns around explainability. Neural
nets can be incredibly useful and are a valuable tool. They can also be a black box. You
can fool them and yourself if you are not careful in the selection of training data.
An example of fooling a neural net is given in the paper “Why Should I Trust
You?: Explaining the Predictions of Any Classifier” (Ribeiro, Singh, and Guestrin,
2016), where the researchers trained a model to distinguish between husky dogs and
wolves, but the training data set was set up so that all the wolves had snow in the
picture and all the husky dogs were photographed without snow, typically indoors.
Essentially, they trained a neural net to identify snow, not to differentiate between
husky and wolf. In this case, the training set was deliberately selected to be biased in
this way for the purposes of the paper. The risk in using a neural net comes when this
type of problem occurs unintentionally, and you are not aware of the problem in the
training set.
Between concerns around explainability and the desire to keep things simple, I
typically prioritize using models in this order from simplest to most complex:
1. physical models;

2. classification and regression trees;

3. ensemble methods; and

4. neural nets.
Of course, model selection is highly dependent on the type of data that will be used.
Neural nets are particularly useful for visual analytics and natural language processing.

7.4 TIPS FOR MANAGERS


Make sure your team is keeping it simple. Use a physical model if one exists. Check in
that they are not using the latest algorithm just because it is cool. Ask to see the data
on what models they have tried and ask why they have selected the model they have
chosen. Sometimes using an “old” method is the best, and sometimes there is a benefit
to be gained from using the newest methods. Make sure your team is selecting models
and algorithms to use based on business need.
Ask your team about the model’s predictions. Do the predictions match what is
really happening? Make certain that the model is tied to reality. Ask about the model
parameters—are they available when someone will want to use the model?
What are the assumptions in the model? Are they accurate? Help your team
think these assumptions through and request they have data to validate their assump-
tions. Make sure the assumptions are clearly documented and updated as the model
changes. Request that your team creates documentation on the model, what other
models were tested, and why this one was selected.
Beware of bias in training data, especially for neural nets but also for other machine
learning models. Ask your team about how they are protecting against bias in their training
data. Absolutely require two separate sets of data: a training set and a testing set. These
are randomly created from the full dataset acquired. Don’t allow your team to fool
themselves by testing the model against the data used to train it. Watch for overfitting.
Overfitting occurs when so many parameters are included in a model that its accuracy on
the training data set becomes very high, but the model becomes overly specific and its
accuracy on new data is lower than it could be. This is a
place where keeping it simple helps. Testing on data not used to train the model will
help your team detect overfitting.
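A held-out test set is what makes overfitting visible. In this toy, standard-library-only sketch (the data and both models are invented for illustration), a model that memorizes its training data scores perfectly on that data but poorly on data it has never seen, while a simpler rule holds up on both:

```python
import random

rng = random.Random(0)
# Toy dataset: label is 1 when x >= 50, with roughly 10% label noise.
data = [(x, int(x >= 50) if rng.random() > 0.1 else int(x < 50)) for x in range(100)]
rng.shuffle(data)
train, test = data[:70], data[70:]       # two separate, randomly created sets

memorized = dict(train)                  # a "model" that memorizes its training set

def memorizer(x):
    return memorized.get(x, 0)           # never-seen inputs: just guess 0

def simple_rule(x):
    return int(x >= 50)                  # the kind of simple model to prefer

def accuracy(model, dataset):
    return sum(model(x) == y for x, y in dataset) / len(dataset)

train_acc = accuracy(memorizer, train)        # 1.0 by construction
test_acc = accuracy(memorizer, test)          # much lower: it never generalized
simple_test_acc = accuracy(simple_rule, test) # the simple rule holds up
```

If your team only ever reports the training-set number, overfitting like the memorizer’s stays invisible; the test-set number exposes it immediately.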
Support your team in applying good coding practices. Provide a source code
control system and ensure your team uses it. Have them create standards and doc-
ument how they write code so there is consistency across the team. Ensure that all
dependencies for your team’s code are documented and included so other teams can
reuse code your team has created, and that your team can repurpose code from others.
Support your team in creating tests for code and models. Consider requiring continuous
integration, which uses automation tools to build and test code after each
change (Manturewicz, 2019).
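Tests for code and models can start very small. A hedged sketch, using an invented rule-based model and the kind of assert-style tests a continuous integration job would run after every change (the function name and thresholds are made up):

```python
def maintenance_due(hours_since_service, usage_cycles):
    """Hypothetical rule-based model: flag preventative maintenance when
    either limit is exceeded. The thresholds are invented for illustration."""
    return hours_since_service > 500 or usage_cycles > 10_000

def test_flags_overdue_service_hours():
    assert maintenance_due(501, 0) is True

def test_flags_excessive_cycles():
    assert maintenance_due(0, 10_001) is True

def test_quiet_within_both_limits():
    assert maintenance_due(500, 10_000) is False

# A CI job would discover and run tests like these automatically (e.g., with
# pytest); here we simply call them directly.
test_flags_overdue_service_hours()
test_flags_excessive_cycles()
test_quiet_within_both_limits()
```

Even a handful of tests like these catches the most common failure mode: a change to the model that silently alters its predictions at the boundaries.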
Ask about model maintenance and how the model will be supported for the
long term. What is the plan to maintain the model? What will trigger the need to
retrain the model? Who will own the model long term? What are the systems that are
in place to support the model? What does your team need to build and what business
processes need to be developed?
CHAPTER 8

Interpret and Communicate Phase
Every type of data science project ends at a different point. It occurs to me that one
reason 87% of AI/data science projects don’t get to deployment may be how
deployment is defined. If the nature of that type of project is not to deploy a
model, does that mean the project didn’t get to deployment? Depending on the project
type, the end deliverable might be a presentation in a meeting. For the purposes of
this book, I’ll consider deployment to be successful delivery of the final product of the
project whether that is a model or communication of the results of an analysis.
In Chapter 5, I listed the different types of data science projects I’ve worked on.
Each of these types has a different final deliverable as summarized in Table 8.1.

Table 8.1: Data science project types and typical final deliverables

Type of Project           | Typical Final Deliverables
Data analysis             | Presentation; automated report/dashboard
Automation                | Automated report/dashboard; deployed model
Improved business process | Automated report/dashboard; deployed model
Data mining               | Presentation
Improved data science     | Presentation

For data analysis projects, the deliverables are usually determining root cause for
a problem, supporting the problem-solving process, or identifying problems that need
to be fixed. This means that data analysis projects most often end by you presenting
your findings in a meeting. Sometimes you then create an automated report or dash-
board to enable your customer to continue to monitor various metrics resulting from
your analysis.
In projects where the goal is to automate a process, the typical deliverable is a
report or dashboard. Sometimes I am asked to automate an analysis, often one which
requires input from multiple sources. Sometimes I am automating the process of gath-
ering data by generating a single report which includes all the information needed to
make a given decision. The example I used in describing how the SIPOC works in
Chapter 4 is this type of project.
Automation projects can also result in deployed models. An example of this is
the process control project I mentioned in Chapter 7. The process we were automating
was a manual adjustment of equipment parameters based on statistical process control
values. We developed and deployed a physical model that would automatically make
the adjustments.
Improving a business process typically includes developing standards for how
work is done. The data science deliverable for this type of project is a report or dashboard
that either is itself the standard (for example, a report can provide one standard way
to extract and view certain data in order to make decisions) or measures the business
process and helps maintain the new systems. There can also be deployed models to support
this type of change, depending on the level of automation in the business process.
Data mining projects result in new insights. If those insights are not communi-
cated, no business value is generated. This communication is usually done in meetings
through presentations. It can also be accomplished via emailed reports or through
writing a paper.
Lastly, projects which improve data science should result in a presentation or
paper to share that increased capability or new algorithm. You may wind up with im-
proved data science capability as a side benefit to a project. No matter if it is the main
intent or an additional outcome, sharing what you have learned with your organization
increases the ability of the organization as a whole. It is worth spending the time to
write up the learning as a paper or presentation.

8.1  KNOW YOUR AUDIENCE


No matter if you are creating a presentation or building a model, it is important to un-
derstand who your audience is and to target your delivery to that audience. In the case
of a presentation, the audience is the attendees of the meeting or event where you will
be presenting. In the case of a report, your audience is the consumer of the report. For
a model, the audience is the user of the model. In each case, you need to understand
their needs and what information they hope to get.
To prepare, you can ask yourself the following questions. Check in with repre-
sentatives of your audience and ask them these questions as well. To start, you need
to define who the audience is for this particular project. Is there a forum you will be
presenting to? If that is the case, who is typically in the room? Are you providing
information to support a decision? What is that decision? Who are you trying to in-
8.2 REPORTS 47

fluence? What are the primary questions your audience will want to answer with the
information you are providing?
When I talk of reports, I mean automated reports or dashboards. When I talk
of presentations, I mean you sharing information typically in the form of a slide deck
to a group. When I talk of models, I mean AI models that make predictions. I will
tackle each one separately.

8.2  REPORTS
For reports there are three rules.
1. Keep it simple.

2. Keep it clear.

3. Use good visuals.


The reason to spend time to ask questions and understand your audience is that your
role is not just to communicate findings or data, but to guide your audience, interpret
the results of your findings, and highlight points of interest.
The first rule is to keep it simple, and there it is again, one of the ways to avoid
pitfalls for your project. For reports, put the most important information for your audience
at the top left of the screen. The reason is that this is where the eye goes first,
since we read top to bottom, left to right.6
The second rule is to keep it clear. Be consistent with colors in your report.
Users will come to associate particular colors with meaning, such as associating blue
with Tool 102. If suddenly there is data for Tool 101, and it is shown in blue and
Tool 102 is now green on graphs, your users may not notice that and will misread
the graphs.
Don’t confuse the person using the report with extra information. If you are not
sure they will want to see it, provide a way to view the information on demand. For
this reason, I prefer interactive reports. If a report is interactive, the user can influence
what information is provided to them. Clearly, you need to have an understanding of
what that might be in order to manage this. This is why it is important to have your
end users involved in the project from the start, to define the scope, and throughout
the design and development process.

6. If your report is for an Israeli audience, flip it and put the most important information on the
top right. This may also apply for Chinese or Japanese audiences. Ask about where your user
expects the most important information to be on a page.
When building reports, I work in an iterative mode, and check in frequently
with the audience for the report. How do they expect to see the information presented?
Can you enhance understanding by showing the information in a different
way? What common questions do they want to ask of the data after seeing the initial
visualization? How can you support answering those questions in your report? I typi-
cally start with a mock-up either on a whiteboard or as a pencil sketch on paper. This
is easy to modify and iterate before effort is placed into coding or working with a data
visualization software package.
The third rule of reports is to use good visuals. Since that’s not the primary focus
of this book, I will recommend some sources. I’ll highlight the three books I would
start with and have included a longer reading list in Table 8.2. The first book is Edward
Tufte’s The Visual Display of Quantitative Information, which outlines the principles of
data visualization (Tufte, 2001). Stephen Few’s book, Show Me the Numbers: Designing
Tables and Graphs to Enlighten, was the textbook for a course I took on data visualization
and has examples of both good and bad visualizations (Few, 2012). Finally,
Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole
Nussbaumer Knaflic gives practical tips and step-by-step examples of taking graphs
from complicated and cluttered to simple and clear (Knaflic, 2015).

Table 8.2: Data visualization reading list

Title | Author | ISBN
The Visual Display of Quantitative Information (2nd Edition) | Edward Tufte (2001) | 978-0961392147
Show Me the Numbers: Designing Tables and Graphs to Enlighten (2nd Edition) | Stephen Few (2012) | 978-0970601971
Storytelling with Data: A Data Visualization Guide for Business Professionals | Cole Nussbaumer Knaflic (2015) | 978-1119002253
Designing with the Mind in Mind: Simple Guide to Understanding User Interface Design Guidelines | Jeff Johnson (2014) | 978-0124079144
“Choosing Colors for Data Visualization” | Maureen Stone (2006) | Paper (Stone, 2006)
Envisioning Information | Edward Tufte (1990) | 978-0961392116
Beautiful Evidence | Edward Tufte (2006) | 978-0961392178
Information Dashboard Design (2nd Edition) | Stephen Few (2013) | 978-1938377006
“Visualizations That Really Work” | Scott Berinato (2016) | Paper (Berinato, 2016)
“Narrative Visualization: Telling Stories with Data” | Edward Segel and Jeffrey Heer (2010) | Paper (Segel and Heer, 2010)

8.3  PRESENTATIONS
For presentations, there are four rules. The first three are in common with reports: (1)
keep it simple; (2) keep it clear; and (3) use good visuals. There is one additional rule
for presentations and that is, (4) make your presentation tell a story. The goal of your
presentation is to guide your audience, interpret the results of your findings to meet
their requirements, and highlight points of interest. Spending time to understand who
it is you will be presenting to gets you set up to accomplish those tasks.
The first rule is to keep it simple. For presentations, put your interpretation of
your findings at the very start of your slide deck. Knowing your audience helps here.
Do you need to provide background information for them to understand the context?
Is the forum you will be presenting to one of those that doesn’t let speakers get past
the first slide without a ton of questions? If it is that type of forum, build your presen-
tation to accommodate their style, by having one slide with the key point you want to
communicate, and then links to backup information to answer anticipated questions.
No matter what type of audience they are, keep to one idea per slide.
Keep your presentation clear. Don’t confuse your audience with all the analysis
you did to get to your final conclusion. Save that information in the backup of your
presentation in case of questions. Retain only the key graphs you made which got you
to the conclusion in your presentation, not every graph you generated in exploring
the data.
Part of keeping things clear is to not clutter your slides with extra information.
If you have a dense slide, consider separating the information into multiple slides, or
use builds to walk the audience through the information. Match what you say to what
is on the screen at that moment. A very good piece of advice is to plan out what you
will say for each slide and write it up either in the speaker notes section or in a separate
word document. Depending on how formal the presentation will be, consider prac-
ticing your presentation. If your presentation will be timed, practicing your delivery is
particularly important. Practicing will allow you to gauge if you have too much or too
little content and will help get you used to working within the time limits.
Think about what the key point is that you want to communicate. Since each
slide only has one idea, make that idea the title of the slide. For maximum impact, I
express my titles as headlines. For example, “Increased marketing budget correlates to
increased sales” is the title of a slide with a graph showing marketing budget versus
sales (Figure 8.1). Notice also how I have included a text box with my analysis and
suggested course of action. Each slide should have a key takeaway message. To make
that message clear to the audience, I include it in a text box at the bottom of the slide
and use a build to allow them time to read the graph, before sharing my conclusion.

[Figure 8.1 shows an example presentation slide. Slide title: “Increased Marketing Budget Correlates to Increased Sales.” The slide contains a scatter plot of Sales Revenue (dollars) versus Marketing Budget (dollars) and a takeaway text box: “The more spent on marketing so far, the higher the sales revenue has been. It may be starting to top out at $30K. We should test that over the next two months by spending $32K each month.” Slide footnote: “Mocked-up data from marketing and sales database, extracted August 28, 2010 by Joyce Weiner.”]

Figure 8.1: Example presentation slide

The third rule of presentations is to use good visuals. Use graphics and visualizations
to underscore your point. As you present, verbally walk the audience through
your visualization in case they are not familiar with reading that type of graph or table.
Include the data source and when the data was extracted as footnotes. Good visuals
often “grow legs,” meaning that a graph that really explains something well is copied
and used in other presentations. This means you did an excellent job of capturing an
insight in a visual. Make sure your name is on the visuals you create, so that when they
are shared, you get credit.
The fourth rule is to make your presentation tell a story. Like a story, your presentation
should have a plot and a beginning, middle, and end. Telling a story makes your
presentation easier to follow and makes it memorable. In the beginning of the story,
let your audience know what to expect in your presentation. In the middle, deliver
the content and the value of the presentation, and in the end, summarize what you
covered. This is the classic, “Tell them what you’re going to tell them, tell them, and
tell them what you told them.” There is a reason it’s a classic—because it is effective.
Stories need plots. Some possible plots for presentations are: “My problem
and how I solved it,” “The current problem, options for solving it and the one I like,”
and “The current problem and the help I need from you.” In “my problem and how I
solved it” you are sharing data analysis used to identify solutions to a problem and ver-
ify that the problem has been solved. Or, you might be presenting on improvements
you made to model accuracy or a new algorithm. Another variation on this plot is
“my problem and how I found it” where you report on data mining analysis used to
uncover a problem.
In “the current problem, options for solving it and the one I like” you are pro-
viding analysis to support decision making. You are also providing analysis of possible
solutions and guiding the audience with your assessment of which is the best option
and why. In “The current problem and the help I need from you” you are presenting on
analysis of a problem and what resources are needed to move forward with a solution.
Of course, for all these stories, you are providing supporting evidence in the form of
charts, graphs, and tables.

8.4  MODELS
For models, interpretation of the information is built into the system that has been
put in place around the model and that uses the model’s output. For example, we have
a system that predicts the need for preventative maintenance. Based on the results of
the model’s prediction, the system will flag a user that maintenance is needed, or even
schedule maintenance through existing systems. If the model predicts a tool needs
adjustment, that might trigger a report with adjustment suggestions to engineering
or might trigger an automated adjustment. The method that is used depends on trust
level with that model, and how much experience the users have had with the model.
So, depending on the user’s level of comfort with the model, projects involving
a model might need reports, or they might need systems which interface with the
model. These systems need to provide the inputs required by the model to generate a
prediction and have rules or other methods to interpret that prediction.
8.5  TIPS FOR MANAGERS


Coach your team to understand their audience before they build presentations, re-
ports, or models. Check that they are keeping things simple, clear, and interpreting
their results with the audience in mind. Become familiar with good data visualization
practices and coach your team to apply them.
Support them in setting up meetings and collaboration sessions with the end
users of their reports and models so they can understand the users’ needs and work
with their users to develop output that works best for the customer. Support your
team in collaborating in an iterative way with the customer. Do this by having regular
check-in meetings with the customer during development. At these meetings, the
agenda is to review the progress, give a demonstration of the report or model to the
customer, and collect feedback.
For projects that end with a presentation, provide the presenter with informa-
tion about the expected audience, meeting attendees, and personalities. If there are
specific expectations for presenters in that meeting, share them with your team. Add
context to help build an effective presentation. For example, if there will be an execu-
tive in the room, ensure the presentation is tailored to match that executive’s preferred
style, whether that is having a single slide with backup information or having context
before making a decision. Help your team collect this information. Good sources are
the executive’s assistant, the meeting chair, and other people who have presented in
that forum.
Coach your team to practice giving their presentation before the meeting, especially
if they are presenting to senior executives. Provide an opportunity for a dry run with
you and give feedback on content, flow, and delivery.
CHAPTER 9

Deployment Phase
Begin planning for deployment from the beginning of your project. Excitement is a
common trap: when you start a new project, you may just want to get some data,
explore it, and do some model building. The trouble is that you have then started
down a path without fully thinking it through. Taking the time at the beginning of the
project to think about deployment gives your project an edge and can help you beat the
odds and be one of the 13% of projects that get to the deployment phase.
Although tempting, don’t fall into the trap of minimally cleaning the data and
rushing to build a model. This method of execution of a data science project makes
deployment difficult because you haven’t thought about or planned for maintaining
the model in deployment. Once you have something created it is really disappointing
when you realize it is all wrong and needs to be scrapped. It is a much easier decision
to make at the beginning before you have spent any time building.

9.1  PLAN FOR DEPLOYMENT FROM THE START


In planning for deployment, you need to think through a few things. One, how will
the model be deployed? By this I mean, how will the user access the model, and get a
prediction? Two, if the project deliverable is not a model but a report, how will the user
access and interact with the report? Three, who will maintain the project long term?
How will you know if it stops working? What is the expected lifetime of the project?
Four, how often will you need to update the extraction, transformation, and load for
the input data? How often will you need to retrain or otherwise update the model?
Any project in deployment has a cycle. This cycle goes like this: plan, implement,
monitor, review. This is a loop that repeats over and over throughout the lifetime of the
project. First, you plan any needed improvements, then you implement them, monitor
the results, and review and decide on any needed changes.
In deploying a model or report, using existing systems keeps things simple. It
also helps ensure that your user will actually use your new model or report. There is
nothing worse than spending time and effort in creating something that is then not
used at all. As an aside, I recommend including some way to track usage for reports
so that if you find a report you created is not used, you can circle back with your cus-
tomer and have a conversation about their current needs and how your report could
be enhanced, adjusted to meet those needs, or possibly that your report is no longer
needed and you can stop running it.
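Tracking usage can be as simple as recording an event each time the report is viewed. A minimal sketch (the report name and user names are placeholders):

```python
from datetime import datetime, timezone

class ReportUsageLog:
    """Records each view of a report so you can later see whether the
    report is still being used, and by whom."""
    def __init__(self, report_name):
        self.report_name = report_name
        self.events = []

    def record_view(self, user):
        self.events.append((datetime.now(timezone.utc), user))

    def views_by_user(self):
        counts = {}
        for _, user in self.events:
            counts[user] = counts.get(user, 0) + 1
        return counts

log = ReportUsageLog("daily_yield_report")
log.record_view("engineer_a")
log.record_view("engineer_a")
log.record_view("planner_b")
# views_by_user() -> {"engineer_a": 2, "planner_b": 1}
```

In a real deployment the events would be written to a file or database rather than kept in memory, but even this much is enough to notice when a report’s audience disappears.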
Using systems that are already in place means less work is needed to get your
report or model into production. Inserting a model into an existing system, or even
building the model in a spreadsheet is easier than creating an all new application or
building a website for your model. When I worked in manufacturing, all our factory
equipment had associated computers to control the equipment. It was very easy to
deploy models to that controlling computer and harness the existing systems.

9.2  DOCUMENTATION
A favorite quote of mine is, “documentation is a gift you give your future self.” Having
started with a project charter and a SIPOC, you have begun to document the project
at the very start. It is best to continue with this practice of documenting as you go,
rather than waiting until the very end to write the project documentation. When you
wait, the burden of remembering what you did and why can become a mountain of a
task and make documentation hard to complete.
Frequently, projects fail because of a lack of good documentation. A project may
be implemented once but can’t be easily maintained because there was no transfer of
knowledge. A good model can’t be reused because it doesn’t run on someone else’s
computer due to undocumented dependencies. What often happens in these
cases is that teams will redo a project, reinventing what had already existed because
they are unable to use it.
When thinking about deployment, think about the documentation for the proj-
ect. Where will you store the code and the documentation? Make sure all information
about the project is in one place. This can be a wiki, a shared drive, or a code repository.
While you are documenting your project, take the time to write up all the deci-
sions you made and why you made them. Did you investigate multiple models before
selecting the one that worked the best? That is great information to have for the future,
and for sharing with other teams in your organization. Did you select a particular lan-
guage to use for scripting because it was easy to interface into existing systems? Again,
great information to capture and keep for the long term.
At the end of the project, when it has been deployed, take time to reflect and
document the learnings you had over the course of the project. This should include
things like new insights that were gained, new algorithms that were developed, and
general learning that occurred. Also, take the time to go back and revisit the project
charter. Did you accomplish what you planned to do? Why or why not? Write up your
reflections and include them with the final business value delivered by your project.

9.3  MAINTENANCE
A big consideration in the deployment phase is who will maintain the report or model.
The SIPOC is helpful in defining this as it gives information about who will be using
the model and what they expect. If your user is expecting 24×7 support, you need to
plan for that before putting your solution into production.
I ran into the problem of not planning for maintenance early in my career. I
developed a report to make it easier for manufacturing to make some production deci-
sions, and my report stopped working at 2 am. Of course, I was called in the middle of
the night to fix the report and get production running again. While that was all right
as a one-time solution, it would not work in the long term. I needed to convert my
report to work with existing systems that were supported on a round-the-clock basis.
If I had thought this through at the beginning, it would have prevented a scramble
and a re-write of the report.
Before your project goes into deployment, think about how you will know if the
report or model has stopped working. For a model, this is about error checking and
verification. Will you have a defined testing cycle? Can you detect errors automatically?
For a report this can be as simple as having a programmatically generated time
stamp at the top. The user can then check to see that the report has updated before
using the information. There is nothing worse than learning that decisions have been
made based on a report or dashboard that hasn’t been updated in two weeks. I include
a timestamp and a support phone number or email in my reports, so if a user identifies
that the report has stopped, they can contact the report owner or support team and
notify them of the problem.
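Generating the timestamp and contact line programmatically means the header can never silently go stale. A small sketch (the support address is a placeholder):

```python
from datetime import datetime, timezone

def report_header(owner_contact):
    """Build the freshness line placed at the top of an automated report:
    when it was generated, and who to contact if it stops updating."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return f"Report generated {stamp} | support: {owner_contact}"

header = report_header("report-team@example.com")
# e.g., "Report generated 2021-03-01 14:05 UTC | support: report-team@example.com"
```

Because the line is built each time the report runs, a stale timestamp is itself the signal that the pipeline behind the report has stopped.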
Think about when you will update the model, and what will trigger an update.
The same goes for reports. Will you establish a time-based update cycle? Is there a
specific event that would trigger an update? For a model, you can measure accuracy,
and if it drops below a certain threshold, trigger retraining of the model. Sometimes
there are external factors that influence when you should retrain, such as changes to
the process that the model is built for, or changes to automation.
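The accuracy-threshold trigger can be sketched as a small monitor that watches a rolling window of accuracy measurements on fresh labeled data. The threshold and window size here are illustrative, not recommendations:

```python
from collections import deque
from statistics import mean

class RetrainMonitor:
    """Flags that a deployed model needs retraining when the rolling mean
    of its measured accuracy drops below an agreed threshold."""
    def __init__(self, threshold=0.90, window=5):
        self.threshold = threshold
        self.scores = deque(maxlen=window)

    def record(self, accuracy):
        self.scores.append(accuracy)

    def needs_retraining(self):
        # Wait until the window is full so one bad batch doesn't trigger it.
        return len(self.scores) == self.scores.maxlen and mean(self.scores) < self.threshold

monitor = RetrainMonitor()
for acc in [0.95, 0.93, 0.91, 0.88, 0.86]:
    monitor.record(acc)
ok_so_far = monitor.needs_retraining()      # False: rolling mean is 0.906

monitor.record(0.80)                        # another drop in measured accuracy
must_retrain = monitor.needs_retraining()   # True: rolling mean is now 0.876
```

The window smooths out single noisy batches; external triggers, like a change to the process the model serves, would bypass the monitor and force a retrain directly.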
Finally, think about the expected lifetime for your project. What will trigger
obsolescence? It is unlikely that you will be running the same report or model forever.
How will you know if it is no longer being used? For reports, I like to have some way
of telling if users are interacting with or viewing them. When that usage count falls off, it’s
time to have a conversation with the main decision maker about the usefulness of the
report. At this point you have two choices: you can update the report to meet the new
needs, or you can cancel the report because it is no longer useful.
When you no longer need a report or model, having documentation about all
the pieces and dependencies is incredibly helpful in completing end of life tasks. You
don’t want to remove or delete something that other models, reports, or teams rely on.
Cleaning up after a report or model frees up shared resources (compute, storage, etc.).

9.4  TIPS FOR MANAGERS


Work with your team to plan for deployment from the very start of a project. Differ-
ent projects have different end points and final deliverables (Table 8.1). Support your
team in recognizing which type of project they are working on and delivering to those
endpoints. Help them identify when a project is done, hold the retrospective, and
celebrate the completion and delivery of projects with them. Remember, when your
team deploys a data science project, they have beaten the odds and have avoided the
project pitfalls. Make sure they recognize the benefit of having structure for project
development and of going slow to go fast.
Check that your team is documenting the project. This includes the project
charter and SIPOC. Documentation should continue throughout the project, and
not only be done at the very end. Check in with your team and ask about the current
documentation. When decisions are made in the direction of the project, like which
model to use, or which system to use to deploy the model, make sure this is captured
in the project documentation. Capture not just the decision, but also which alternatives
were explored and why that one was selected. This is helpful in case a decision needs to be
revisited in the future. Support your team by having systems in place for collecting
documentation and storing it. Once a project is deployed, make sure the project char-
ter is updated to capture the delivered business value, and include the writeup from
the retrospective in the document repository.
Slow your team down and have them plan out the work at the start of a project,
or whenever you discover that it has not been done. Have them do a project charter
and SIPOC to be set up properly for deployment. Keep things simple by asking how
they can use existing systems to deploy projects. Make sure they are thinking about the
full lifetime of a project. Help them think through maintenance of models and reports.
Ask your team how they will monitor the report or model to know that it is working
properly. Ask how they will monitor the report or model to know that it is still being
used. Help them plan for obsolescence and have systems in place for decommissioning
reports and models. Either build the capability for long-term support of models and
reports within your team or establish systems and methods for transferring projects to
other teams for long-term support.

CHAPTER 10

Summary of the Five Methods to Avoid Common Pitfalls
The current statistic is that 87% of AI/big data projects fail. In this context, failing
means that the project never reaches deployment. By applying five methods to avoid
common pitfalls, you can give your project a better opportunity to beat the odds and
be one of the 13% that make it to production. The five pitfalls are:
1. the scope of the project is too big;

2. the project scope increased in size as the project progressed—i.e., scope creep;

3. the model couldn’t be explained, hence there was lack of trust in the
solution;

4. the model was too complex and therefore was difficult to maintain; and

5. the project solved the wrong problem.


The five methods are:
1. ask questions;

2. get alignment;

3. keep it simple;

4. leverage explainability; and

5. have the conversation.

10.1 ASK QUESTIONS


First of all, ask questions. This is something you need to do from the start of the project
and continue through the very end. Ask questions to get feedback during the define
phase of the project as you are starting to design the solution. Ensure that you understand
the problem that is to be solved and the process that is to be addressed, and ask
questions to verify your understanding. Ask the questions to get this input before you
start gathering data or building a model. Use the project charter and SIPOC analysis
tool to guide you in asking these questions.
Asking questions is not a one-and-done activity. Continue to check in with
the project stakeholders and customers as you go to ensure you are solving the right
problem and meeting their requirements. Update the charter and SIPOC as you gain
clarity and learn more about your customer’s needs. Show your customer the results of
your initial data exploration and ask for feedback. Ask about explainability, and how
the model will be used. Ask about long-term considerations like who will own main-
taining the model. Ask for feedback at the end of the project.

10.2 GET ALIGNMENT


Second, get alignment. Use the project charter to document the expected business
value your project will deliver. You can reassess this and revise the charter as you go
and realign with your stakeholders and decision maker. Make sure you have aligned
on what done looks like to prevent scope creep and enable you to actually finish the
project and get it into production.
Be sure you are aligned with your customer on explainability. Check in to be
sure they are not confused by terminology like correlation and causality. Make sure you
are using a model that fits their needs.
Get alignment on boundary conditions and long-term considerations. Share
your plans for deployment and maintenance with your customer and decision maker.

10.3 KEEP IT SIMPLE


Keep it simple. Use physical models and simple techniques. Use the minimum number
of input parameters to achieve your project goals.
Plan ahead for deployment and keep the deployment method simple. Use sys-
tems that already exist rather than creating new things. Think long term and think
about how the model will be maintained. Simple solutions are easier to maintain and
are easier to transfer from owner to owner.

10.4 LEVERAGE EXPLAINABILITY


Fourth, leverage explainability if it is important to your customer and their deci-
sion-making process. Ask about their process up front so that you establish their cri-
teria for explainability and use a model that your customer is comfortable with.
Consider simpler models like physical models or simple tree models to support
explainability. Take advantage of new techniques for explainability that are currently
being researched.

10.5 HAVE THE CONVERSATION


Lastly, have the conversation. An ongoing two-way conversation between you and the
users of your models and reports is a valuable thing. Keep your customer involved as
you go. This prevents scope creep and overbuilding.
Continue to check in with your customer throughout all the phases of the proj-
ect. Start involving your customer in the define phase by jointly creating the project
charter and getting their input in the SIPOC analysis. Keep them updated as you
acquire and explore the data and build models. Share the interpretation of the results
and your plans for deployment. This ongoing conversation will prevent you from
solving the wrong problem.
Continually update your project charter as you go to help facilitate the com-
munication. Finally, when the project is complete and in production, reflect on what
you learned overall, calculate the delivered business value, and share both with your
customer and decision maker.
By following these five methods, you will ensure your project doesn’t fall into
the pitfalls of too big a scope, scope creep, a model that can’t be explained, a model that is
too complex to maintain, or solving the wrong problem. This will set you up to deliver a project
that beats the odds and makes it into production.

References
Beck, K. and Beedle, M. (2001). Principles behind the Agile Manifesto. Retrieved
from agilemanifesto.org: https://agilemanifesto.org/principles.html. 10
Berinato, S. (2016). Visualizations that really work. Retrieved from Harvard Business
Review: https://hbr.org/2016/06/visualizations-that-really-work. 49
Deming, W. E. (1993). A bad system will beat a good person every time. Retrieved
from The W. Edwards Deming Institute: https://deming.org/a-bad-system-
will-beat-a-good-person-every-time/. 17
Few, S. (2012). Show Me the Numbers: Designing Tables and Graphs to Enlighten. El
Dorado Hills, CA: Analytics Press. 48
Few, S. (2013). Information Dashboard Design. Analytics Press. 49
George, M. L., Rowlands, D., and Kastle, B. (2003). What is Lean Six Sigma? Mc-
Graw-Hill Education. 2
Johnson, G. (2014). Designing with the Mind in Mind: Simple Guide to Understanding
User Interface Design Guidelines. 2nd Edition. Morgan Kaufmann. 48
Knaflic, C. N. (2015). Storytelling with Data: A Data Visualization Guide for
Business Professionals. Hoboken, NJ: John Wiley and Sons, Inc. DOI:
10.1002/9781119055259. 48
Manturewicz, M. (2019). What is CI/CD—all you need to know. Retrieved from
https://codilime.com/; https://codilime.com/what-is-ci-cd-all-you-need-
to-know/. 44
Oxford Languages. (2020). Artificial intelligence definition. Retrieved from google.
com: https://tinyurl.com/y6zwlnkw. xi
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). “Why should I trust you?”: Explain-
ing the predictions of any classifier. arXiv 1602.04938 [cs.LG]. Retrieved
from https://arxiv.org/abs/1602.04938. DOI: 10.18653/v1/N16-3020. 42
Royal Society. (2019). Explainable AI: the basics Policy Briefing. Retrieved from Royal
Society: https://www.exploreaiethics.com/reports/explainable-ai-the-ba-
sics/. 15
Segel, E. and Heer, J. (2010). Narrative visualization: Telling stories with data. IEEE
Transactions on Visualization and Computer Graphics (Proc. InfoVis). DOI:
10.1109/TVCG.2010.179. 49
Stone, M. (2006). Choosing colors for data visualization. Retrieved from Perceptual
Edge: https://www.perceptualedge.com/articles/b-eye/choosing_colors.pdf.
48
Tufte, E. (2001). The Visual Display of Quantitative Information. Cheshire, CT: Graph-
ics Press. 48
Tufte, E. (1990). Envisioning Information. Cheshire, CT: Graphics Press. 48
Tufte, E. (2006). Beautiful Evidence. Cheshire, CT: Graphics Press. 48
VB Staff. (2019). Why do 87% of data science projects never make it into production?
Retrieved from Venturebeat.com: https://venturebeat.com/2019/07/19/
why-do-87-of-data-science-projects-never-make-it-into-production/. vi, 1

Author Biography

Joyce Weiner is a Principal Engineer at Intel Corporation. Her area of technical expertise is
data science and using data to drive efficiency.
Joyce is a black belt in Lean Six Sigma. She has
a B.S. in Physics from Rensselaer Polytechnic
Institute, and an M.S. in Optical Sciences from
the University of Arizona. She lives with her
husband outside Phoenix, Arizona.
