
Quality Engineering

ISSN: 0898-2112 (Print) 1532-4222 (Online) Journal homepage: https://www.tandfonline.com/loi/lqen20

Statistics = Analytics?

Willis A. Jensen

To cite this article: Willis A. Jensen (2020) Statistics = Analytics?, Quality Engineering, 32:2,
133-144, DOI: 10.1080/08982112.2019.1633670

To link to this article: https://doi.org/10.1080/08982112.2019.1633670

Published online: 07 Aug 2019.

QUALITY ENGINEERING
2020, VOL. 32, NO. 2, 133–144
https://doi.org/10.1080/08982112.2019.1633670

STU HUNTER RESEARCH CONFERENCE ARTICLE

Statistics = Analytics?
Willis A. Jensen
W. L. Gore & Associates, Inc, Flagstaff, Arizona

ABSTRACT
In recent years, analytics has received increasing attention in the business world with more conferences and publications addressing how to obtain insights from data. We compare and contrast some definitions of statistics and analytics and provide a hybrid definition. We share an example of an analytics project which highlights some of the statistical and non-statistical tools required to be successful. We discuss the connection of statistics and analytics to other terms such as big data, data science, industrial statistics, statistical thinking, statistical engineering and Six Sigma. The connections we highlight result in implications, opportunities and challenges for the statistics profession.

KEYWORDS
big data; data analysis; data science; statistical engineering

CONTACT Willis A. Jensen wjensen@wlgore.com W. L. Gore & Associates, Inc, Flagstaff, AZ.
This article was presented at the seventh Stu Hunter Research Conference in Induno Olana, Italy, February 2019.
© 2019 Taylor & Francis Group, LLC

Introduction

In recent years, Analytics has received a lot of buzz and attention in the business world with many blogs, articles and popular books such as Competing on Analytics by Davenport and Harris (2007). A seminal paper by Davenport (2006) entitled “Competing on Analytics” was named as one of the top 12 ideas of the decade by the Harvard Business Review (Kirby 2010). Phrases such as “data is the new oil” and the “data economy” (Economist 2017) highlight the importance of data going into the future. Workers, managers and executives are all hearing about the power of data and analytics and want to know more. Many new conferences and publications have appeared covering the topics of data and analytics and how to unlock the insights contained in the data. Popular application areas are appearing under the names of people analytics, supply chain analytics, sales analytics, customer analytics and more.

Despite the hype and buzz, much of this content is not really all that new; the methods and techniques in analytics are often well known to statisticians. They have been described in connection with other terms and disciplines such as industrial statistics, applied statistics, statistical thinking, statistical engineering, Six Sigma, big data and data science. This buzz and proliferation of different terms raises some interesting questions. Is analytics really just the application of statistical methods to business problems? What role do statisticians play in analytics projects? Is the discipline of statistics the same as the discipline of analytics? How much overlap is there between the two? What are the implications of this buzz for the future of the statistics profession?

We discuss our perspectives on analytics based on our recent work experiences. We explore the different definitions of statistics and analytics and their related terms. We provide an example of a project involving people analytics (HR analytics), describing some of the tools used, including the relevant statistical methods. This project involved some tools and analysis that have not often been used in traditional industrial statistics, such as network analysis, algorithms and high-speed computing methods. Because of our experiences, much of our focus here is on business applications of analytics. Indeed, there is a field around “Business Analytics”, but we believe that the concepts we discuss here are relevant for any organization that wants to make decisions based on data.

For analytics projects, there are many different skill sets that are required to be successful. Because of the breadth of these skillsets it is rare for a single individual to possess all of them, requiring more teamwork and collaboration. These skillsets include statistical methods but also many that are non-statistical and require statisticians to adapt to be successful in this new environment. These non-statistical skillsets have implications for the traditional roles of statisticians and for statistics education. We conclude with our
take on the answer to the question of whether or not statistics and analytics are the same thing.

Some definitions

To understand the link and relationship between statistics and analytics, we start with some definitions. In our review of much of the literature in the area, we have found a broad variety of definitions of the terms and no real consistency in how people use them. Nonetheless we need to define what we mean for purposes of this article. Consider the following four definitions for the term “statistics” which come from a biased sample selected to illustrate the variety that is present in the literature.

• “A science, not a branch of mathematics but uses mathematical models as essential tools.” (American Statistical Association 2018)
• “A branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data.” (Wikipedia 2018)
• “The science that deals with the collection, classification, analysis, and interpretation of numerical facts or data, and that, by use of mathematical theories of probability, imposes order and regularity on aggregates of more or less disparate elements” (Random House 2018)
• “The practice or science of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample” (Oxford Dictionary 2018a)

The American Statistical Association (ASA) definition is incidentally attributed as a quote from John Tukey but as far as we can determine, it is actually a subset of a quote from a talk given by Tukey in 1951 and which is printed in Tukey and Jones (1986, p. 66). We do find it interesting that a definition of statistics is not clearly or officially given within the ASA content on their website. Statisticians in general recoil from the Wikipedia definition because they feel that statistics is a separate discipline from mathematics rather than a branch of mathematics, and we agree with that sentiment. The Random House definition is a little more complicated but agrees with the ASA definition in defining statistics as a science. Similarly, the Oxford dictionary definition also treats statistics as a science and emphasizes the importance of samples and inference. What is common to all these definitions? Wikipedia aside, there is something in all the definitions about the discipline of statistics as a science as well as including elements of the collection and analysis of data. There is some mathematical element to the discipline of statistics although that is becoming less fundamental as computing power becomes more prevalent and the data analysis becomes more dependent on software tools.

Box (1990, p. 252), noting the shift away from the math heavy focus of statistics in the past, stated “It seems a pity that while we statisticians have an opportunity to rate as first-class scientists we should settle for the rather dreary role of second-class mathematician”. One of our former university professors was fond of quoting this more concisely as “Why do we aspire to be 2nd rate mathematicians when we can be 1st rate scientists?” With the increase in computing power and data availability that continues today, mathematics is becoming less important for statisticians while savviness with a computer is becoming more important. Lenth (2014, p. 14) in a Youden address described this as the underlying tectonic plates shifting when he stated “Partly due to technology, the underpinnings of the theory of statistics have also evolved in the past few decades. Just as the Pacific tectonic plate has gradually shifted over the hot spot that created the Hawaiian Islands, statistical theory has shifted so that the underlying hot spot has changed largely from mathematics to computer science.” This shift in the underlying tools for the use of statistical methods does not change the fundamental focus of statistics on data and the science of extracting information from that data.

Now consider the definitions of analytics. In our opinion, there is even more variety and difference in how the term is used. This variety and the resulting confusion was noted by Rose (2016). Again, a biased sample of some definitions is given as:

• “The scientific process of transforming data into insight for making better decisions” (INFORMS 2018)
• “The discovery, interpretation, and communication of meaningful patterns in data and applying those patterns towards effective decision making” (Wikipedia 2018)
• “Information resulting from the systematic analysis of data” (Oxford 2018b)
• “It conceptually groups together the quantitative decision science [which includes statistics, data science, operations research, computer science, industrial engineering]” (Rose 2016)
• “A comprehensive, data-driven strategy for problem solving” (Nelson 2018)

There are some common themes in these definitions related to the use of data, information and decision-making. Another thing that seems to be common in these definitions is that analytics is something you do (i.e., a methodology or strategy) in a general sense with data. We find the grouping concept of Rose (2016) to be interesting. While Rose (2016) noted the many different ways analytics is used, they do provide a definition as a broad concept that includes a variety of other disciplines such as computer science and statistics. In their framework, analytics encompasses other disciplines much the same way that the broad term of science encompasses the disciplines of physics, biology and chemistry. Seen this way, analytics becomes not so much a discipline in itself but a way to group disciplines that have some element of data analysis and decision making in them. Treating it this way also allows you to have the alternative definition of analytics as an action. You can do analytics just as you can do science. This also allows one to talk about general principles for doing good analytics, just as you can talk about general principles to do good science. We note that many good principles for doing good science (e.g. starting with a question/hypothesis, reproducible research, independent confirmation of results) are also good principles for doing good analytics.

To share an alternative, hybrid definition of analytics, we take a brief detour to discuss some concepts related to Big Data. There has been a lot of discussion about Big Data and what makes data big enough to be considered “Big Data”. These discussions often include discussion of the 3 Vs of Big Data (Volume, Velocity and Variety), such as the discussion of Jain (2013), which were elaborated further in Jones-Farmer, Ezell, and Hazen (2014). We wish to shift the debate from semantics about the definition and size of Big Data to the fundamentals of any data no matter how big they are. In keeping with the theme of using terms beginning with the letter “V” we believe these fundamentals can be described with an additional set of terms which we call the “Fundamental V’s” and which are shown in Figure 1. We believe that a greater focus on these 4 fundamental V’s will lead to more success with Big Data than the focus on the 3 V’s of Volume, Velocity and Variety. As noted by Hoerl (2019), these fundamentals do matter!

Figure 1. Fundamental V’s of data analysis.

In Figure 1, we see the importance of using the right data with the concept of the veracity of the data, which has been described elsewhere as the 4th V of Big Data by ASA Working Group (2014). The veracity of something means that it is truthful and correct. Truthful data implies that we have the right data for the problem at hand. It means that the data are correct and do not have errors in them that would lead to incorrect conclusions. We all know the adage “Garbage in, garbage out”, which is true for all types of data. Systems for data gathering must be error proof. This is represented at the left of Figure 1 to show that the data is the raw material for data analysis; it is the input into the process of analyzing data.

The concept of the variation in data is well-recognized by statisticians and they often stress the
importance of accounting for variation in the use of any statistical tools. Variation is present in all processes that generate data, even when such generating processes are not recognized as such. All data is generated by processes. And the fundamental first tenet of “statistical thinking” as defined by ASQ is that “All processes have variation” (Britz et al. 1996). This is placed at the bottom of the figure to represent that the sources of variation may be hidden but variation is always present, underlying any data that are collected and used in an analysis.

The next V is Visualization. It is not enough to simply analyze the data and obtain conclusions. The information contained in the data must be communicated to those who are most able to act on it. Visualization is a crucial way to be able to communicate the analysis results and thereby obtain more value. If the data analyst is the only one who is able to understand the insights obtained from the data, then the full value and impact will not be realized. This V is placed at the top of Figure 1 as it is the visible piece of analysis in contrast with the variation.

The final V is Value. Value refers to the benefits obtained from the output of the data analysis. It should be emphasized that we do not mean Value to be strictly a monetary value although that is certainly one possible aspect. Rather we use a broader definition of value to include information or knowledge that can be acted upon in some way. The core element of value is the ability to make a decision using the data. Value comes from the answers to the questions that drive the data collection and analysis, but it is more than just the conclusions from a data analysis. It also includes the “so what?” that you ask after obtaining the conclusions. If you stop at conclusions, you have not gotten any value from the data. It is at the right side of Figure 1 because it represents the output, the end product that comes from the raw material of data.

We summarize the Vs with the simple equation of Right Data + Right Analysis = Right Decision which is shown at the bottom of Figure 1. The right data is the correct, relevant data. Beyond the statistical methods and modeling techniques for data analysis, the right analysis includes powerful visualizations that correctly account for the variation in the data and how it is collected. It is important to recognize that the efforts to do a good analysis can be wasted if the visualization is misleading or uninformative. The right decision creates the value to the organization that is using the data. An alternative viewpoint on Figure 1 with more detail is described as a framework of Information Quality (InfoQ) in Kenett and Shmueli (2014, 2016).

Thus, to define analytics we offer a hybrid definition of the previous definitions as “Creation of information for effective decision-making by the systematic analysis of data.” This definition captures the doing and action of analytics rather than considering analytics as just an outcome. In a sense we are arguing that analytics is synonymous with the general term of data analysis. Implicit in this definition are all of the 4 Vs that we have highlighted: veracity, variation, visualization and value. Also implicit in this definition is that it captures the work of many of the different disciplines around quantitative decision-making based on data that Rose (2016) described. Given this broader definition of analytics, there are many ways to segment different types of analytics.

In the analytics literature, one way to segment different types of analytics is by the type of question that we are seeking to answer. These different questions are shown in Figure 2. It is common to refer to different types of analytics such as descriptive, predictive or prescriptive analytics. A common picture was shown by Elliott (2013) as follows.

Figure 2. Types of analytics based on questions.

Although we don’t necessarily agree with the placement of the items within the axes of this graph, we do find it helpful to use these different types of questions to help individuals and teams focus their analytical efforts. In our experience, the different situations for using these types of analytics are not always as linear or clear cut as is shown in Figure 2. For example, we’ve had experiences where the difficulty of doing descriptive analytics in a situation with no existing data systems was much higher than the difficulty of doing predictive analytics in another situation where the data was more readily available. We’ve also found situations where descriptive analytics actually created very high value and generated significant returns on investment compared to more sophisticated types of analytics.
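
As a rough illustration of how the question types differ in practice, the sketch below answers a descriptive, a predictive and a prescriptive question from the same small data set. The demand figures, the function names and the capacity rule are all hypothetical and are not taken from any project discussed here; the predictive step is an ordinary least-squares trend line, chosen only because it is short.

```python
from statistics import mean

# Hypothetical monthly demand for one product (illustrative numbers only).
demand = [100, 104, 108, 112, 116, 120]
months = list(range(1, len(demand) + 1))


def descriptive(values):
    """Descriptive: what happened? Summarize the historical data."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_trend(xs, ys):
    """Least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predictive(xs, ys, next_x):
    """Predictive: what is likely to happen? Extrapolate the fitted trend."""
    intercept, slope = fit_trend(xs, ys)
    return intercept + slope * next_x


def prescriptive(forecast, capacity):
    """Prescriptive: what should we do? An assumed, deliberately simple rule."""
    return "add capacity" if forecast > capacity else "keep current capacity"


summary = descriptive(demand)
forecast = predictive(months, demand, next_x=7)
decision = prescriptive(forecast, capacity=120)
print(summary, forecast, decision)
```

All three functions start from the same data and end in information that supports a decision; only the question changes. In a real setting the forecast would also carry a statement of its uncertainty, which this sketch omits.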

Regardless of the type of question being asked, there is data involved, there is a systematic analysis, there is decision-making and there is information. So, our definition of analytics does not preclude these different types of analytics. Descriptive analytics is a more retrospective view of the data and often involves reports, summary statistics and dashboards with historical trends of data. Diagnostic analytics involves root cause analysis and data exploration to determine underlying reasons for why the data are the way they are. Predictive analytics involves building models to determine significant variables that impact the response and to make predictions when we are confident the models are sufficiently accurate representations of reality. Prescriptive analytics goes one step further to identify proactive actions that can be taken based on predictions in order to improve the outcomes before they actually occur.

This segmentation based on the questions or purpose of the analytics does not preclude other ways to segment analytics. Another way to segment is based on application areas of these types of analytics. That’s why it makes sense to refer to areas such as sports analytics, HR analytics, supply chain analytics, marketing analytics, web analytics, customer analytics and more. Many of these application areas use the same types of tools in descriptive, diagnostic, predictive and prescriptive analytics. So you can, for example, do predictive analytics in the marketing analytics arena as well as in the supply chain analytics arena. Regardless of the segmentation used, the fundamental concepts shown in Figure 1 still apply.

Statistics versus analytics

Given these definitions of statistics and analytics, what is the overlap and distinction? Clearly, both have something to do with data analysis. However, we would argue that statistics has a stronger science element while analytics has a stronger decision-making element. None of the definitions for statistics previously shared include elements around decision making. Therein lies a fundamental shift from statistics as a science to analytics that involves decision-making or action. One can do science for the sake of science just as you can analyze data solely for the sake of learning something from that data. But there is no value in the information obtained from data unless it impacts a decision or is acted upon. Note that the choice to not take a particular action as a consequence of a data analysis is a decision that can be just as valuable as the choice to act.

Given the general focus in business to make profit or for organizations to aspire to make a difference in the world, it should not be surprising that decision-making and action are important priorities. Insight, in and of itself, does not provide value unless it becomes actionable insight. Thus the “Value” dimension shown in Figure 1 is crucial to any analytics effort and we see that as a crucial distinction from traditional statistical methodology which focuses more on the analysis tools for the data at hand.

A project example

To illustrate an example of an analytics project, we share a project we worked on that combined a variety of tools and approaches at Gore. Gore is a large company of nearly 10,000 associates spread out in multiple divisions in multiple countries across the globe. The problem involved the peer-to-peer evaluation process at Gore that was described in Hamel and Breen (2007, Chapter 5). Individual associates at Gore are ranked by their peers in terms of the contributions that they are making to the company, and compensation decisions are made based on the evaluation results. However, over time this annual process became time-consuming and cumbersome because many individuals were working in more diverse and geographically dispersed teams. A more efficient way of gathering the peer ranking input that accounted for the changing work environment was needed.

An idea of a better way, using the network of connected associates, was proposed. In this idea, each associate nominated 5–20 other associates that they felt would best understand their contribution to the company and who could rank them. The nominations are pooled together and, based on the collective network map of all the nominations, pairs are selected to be ranked by those who were nominated. The pairs were selected using a sophisticated algorithm provided by an external vendor that seeks to find the optimal set based on some key criteria to maximize the quality of the information provided by inputters. A web-based tool was built to collect the ranking data from the nearly 10,000 associates who are asked to provide input. The data on ranked pairs are analyzed with statistical software to determine the relative ranking of associates and the data output is provided to the committees who determine the final rankings. The final rankings are used as an input into compensation decisions.

How did this work get started? The idea was pitched by a statistician to leadership who supported
the idea and a small prototype experiment involving a smaller group of associates was executed. After the successful execution of the prototype, we joined a cross-functional team of about 20 associates tasked with scaling up the concept to be used across the whole Enterprise. This team included individuals from IT, HR, Project Management, Leadership and Statistics. Our role included more than just the statistical analysis of the data; it also included education of the project team in statistical concepts, simulation of potential scenarios that could happen in the execution, testing of the algorithm results and involvement in the change management efforts. The project met its timelines, and the resulting process saves, by a conservative estimate, more than 10,000 labor hours every year.

Looking back on this project, how would one define the work? Is this an example of good statistical consulting or collaboration? Is it a big data or data science project? Ultimately, it required the use of many different skills and disciplines, so it is challenging to classify it as a project in a single discipline. The complexity of the project makes it difficult to categorize, yet there is no denying the value of statistical methods involved along with other methods from other disciplines. However, the statistical methods were just one small aspect of the project. Using the broader definition of analytics given in Rose (2016), we feel comfortable calling this an analytics project because it involves methods from many different disciplines. Ultimately, this work falls under our definition of analytics as “Creation of information for effective decision-making by the systematic analysis of data.”

Connections to other disciplines

In the preceding discussion on statistics and analytics, we have said little about other related terms. As the field of data analysis has evolved over time, many different terms, perspectives and movements have been proposed. We share some thoughts on some of these terms, including big data, data science, Six Sigma and statistical engineering.

As mentioned previously, another buzzword is that of Big Data. In reality, it doesn’t matter how big or little the data are; good data analysis is still crucial to be able to get the information from the data. Analytics is still analytics, regardless of how much data one has at their disposal. Yes, the computing techniques and analysis methods are different to account for large volume, velocity and variety of data but the fundamental Vs shown in Figure 1 are still highly relevant. We believe a focus on getting the right data needed to answer a question or solve a problem will be more successful than simply doing a Big Data effort. In spite of the increasing availability of data being generated by new technology, good analytics is still essential to be able to extract the value from that data.

Data science is another term that has been used with increasing frequency. There are plenty of debates about what data science is and how it is related to statistics and other disciplines, such as the arguments of Donoho (2017). From our perspective, data science should have been the term that statisticians used to describe their work. The definitions of statistics that we shared earlier generally incorporate the terms “data” and “science” and regardless of how the term is used now, it may have been a missed opportunity for the statistics profession. Tukey (1962) regarded data analysis as a science and suggested it should be its own discipline. Statisticians didn’t use the term and consequently the computer science profession later began to use the term to describe the computational tools used for data analysis. As a result, data science now has a predominant computer science flavor to it and is often seen more as a computationally heavy approach to data analysis as reflected in data mining techniques and machine learning algorithms for working with larger data sets. Some of those working as “data scientists” do great work with the data but don’t have as much experience with the science aspects of data analysis. A scientific approach to data analysis would always start with the question to be answered and then get the right data to match that question, rather than starting with the data and seeing what information can be extracted from it, as described by Leek (2013). We recognize that data science as a term has been evolving and that there is a wide variety in the mix of “data” and “science” focus that data scientists have in their work. Regardless of how one uses the term data science, it is still covered by our broad definition of analytics.

In general, we see statisticians as those who advocate for real understanding of what is happening in the relationships between the variables, who want to understand the causal effects, the science behind what is happening in the data. In contrast, many individuals in the newer data science arena who are doing algorithmic (e.g. machine learning) models are less concerned with the science and more concerned with getting good predictions in the models that are built. They seek less to understand the causal effects.

Breiman (2001) described these differences in two dif- technology and other relevant sciences to generate
ferent cultures of statistical modeling and argued that improved results.” Other definitions include “The
more emphasis needed to be placed on the algorith- study of systematic integration of statistical concepts,
mic modeling culture. Similarly, Shmueli (2010) dis- methods, and tools, often with other relevant disci-
tinguished between explanatory modeling which seeks plines, to solve important problems sustainably.”
to understand the causal effects of variables from pre- (International Statistical Engineering Association
dictive modeling which seeks to obtain better predic- 2018) and “The collaborative study and application of
tions. We don’t see this as an either/or issue. Rather the tactical links between statistical thinking and stat-
we that both cultures and both types of modeling are istical and discipline-specific tools with the objective
needed, depending on the problem at hand. Nelson of guiding better understanding of uncertainty in
(2018) described the synergy that can happen where knowledge and decision-making to generate improved
algorithmic models are used to help discover patterns results to benefit the organization and/or society.”
and relationships in the data and generate hypotheses. (ASQ Statistics Division 2018). These definitions have
The hypotheses can be tested and confirmed with substantial overlap with the broad definition of ana-
more traditional statistical models and experiments to ensure that the results are explainable and reproducible. Both exploratory data analysis to generate questions and confirmatory data analysis to understand the causal effects are important. Analytics as a broader term embraces both of these cultures and modeling types as necessary to solve problems.

Six Sigma and its variations (e.g. Design for Six Sigma and Lean Six Sigma) is an approach that received a lot of buzz in the 1990s and early 2000s in the business world in the United States. It continues to receive a lot of attention and focus throughout the world with journals, books and conferences. It is a more holistic approach to problem solving and has a lot of similarities with the broad concept of analytics as a systematic way to gain information for decision making. There are several deficiencies in the Six Sigma approach, although it has the right idea in focusing on real business value. The methodology is very structured, going by acronyms such as DMAIC (Define, Measure, Analyze, Improve, Control). This structure can be limiting for more complex kinds of problems. The statistical tools that are taught in Six Sigma tend to be more appropriate for smaller data problems in industrial statistics, as shown in Hoerl (2001), and often don’t reflect tools that are needed in areas outside the traditional manufacturing or industrial environments. Six Sigma has been very successful for a certain class of problems and can continue to be successful. As such, we believe that the systematic methodology of Six Sigma falls under the broad umbrella of analytics as a relevant methodology for a certain class of problems.

Hoerl and Snee (2017) elaborated on the concept of “statistical engineering”, which has gained traction in some circles as a discipline. Statistical engineering was defined by Hoerl and Snee (2010, p. 123) as “The study of how to best use statistical concepts, methods, and tools and integrate them with information

lytics as a grouping of many disciplines and also include the element of decision making. The methodology of statistical engineering is sound as its underlying focus is the use of the scientific method to solve problems. However, these definitions take a statistics perspective, focused on how to link statistics to other disciplines, rather than a more egalitarian, holistic perspective of multiple disciplines working together in an analytics effort. We do not believe the term “statistical engineering” resonates with those in the business world, although we do believe there is a strong desire to hire and retain employees who have good problem-solving skills. In our personal experience, acting as a statistical engineer has been very beneficial for career opportunities, even when not using the label as a job descriptor. We do see potential for statistical engineering as a sound approach within the analytics umbrella.

Roles of statisticians in analytics

Statisticians have felt undervalued for decades. The papers of Bross (1974), Pfeifer, Marquardt, and Snee (1988), Boroto and Zahn (1989), Hahn and Hoerl (1998), Hahn et al. (1999), Hahn (2002) and Meng (2009) have a common theme in addressing what statisticians can do to provide more value for their organizations. They emphasize the need to move beyond the tools to a broader skillset. The message hasn’t changed much over the years: statisticians could do so much more and are sometimes limited by what their organizations let them do. “If only our organizations really understood our unique skillsets,” goes the refrain, “we could really do something.” In some cases, it is already changing and changing rapidly. But for other organizations, why has the situation not changed over the years?

We believe that this is because statisticians have been too tool-centric in their focus. They love to get into the data and come up with cool ways to analyze that data. They’ve been too focused on the science aspects of data analysis and not enough on the value of the information coming from the data analysis and how that information translates to better decision-making. They focus more on innovative methodologies for analyzing the data and less on the innovative applications of methodologies to real problems. The statistics journals are full of incremental improvements to existing methodologies that have little impact on real world problems with data. Yes, it is important to develop new methods, but there is a far greater need for individuals who know how to apply the right methods to real problems than for individuals who develop new methods.

As mentioned previously, some have called for statisticians to move beyond the mathematical aspects of statistics to more of an applied focus. This is captured in a remarkable paper written more than a half-century ago. Tukey (1962, p. 2) argued for this broader focus on graphical methods and data analysis when saying “For a long time I thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have cause to wonder and to doubt … All in all, I have come to feel that my central interest is in data analysis.” He includes in his definition of data analysis the planning of data collection and the analysis and interpretation of the data. This is similar to the concept in the Box quote shared earlier of becoming first-rate scientists. This is an admirable goal for statisticians and a way that they can make contributions to their organizations.

However, even the shift to being first-rate scientists can still be limiting. The business world generally sees scientists as subject matter experts and consultants. Scientists are generally not seen as decision makers or problem solvers. So, we offer a possible alternative in the form of the earlier quote of Box, “Why do we aspire to limit ourselves as scientists when we can solve real problems and influence decision-making?” Being a first-rate problem solver means working on high impact challenges that provide value to the organization. It means supporting good decision making across all levels of the organization.

Note that we are not advocating that, as problem solvers, we dispense with the scientific method. On the contrary, the scientific method is foundational to the problem-solving methodology and to decision making. What we are advocating is a broader view of the profession beyond just a focus on a single methodology. A singular focus on statistical methods and the scientific method places more emphasis on hypothesis testing and statistical significance, which work well for a certain set of problems. However, hypothesis testing as a confirmatory analysis is limited in the types of problems for which it is relevant. We have to incorporate other tools to expand our toolbox and the different approaches to solve problems. These include techniques for data mining, machine learning, meta-data analysis and exploratory data analysis. Use of these methods has to be paired with more emphasis on data visualization. We believe that graphical methods for exploring data are more than just a way to communicate the results of an analysis; they are a form of analysis. They are a way to discover insights.

The business world and many other organizations need scientists, but they also need individuals who can translate the insights into decisions and actions. They need those who can drive change based on the data analysis results. There will continue to be a role for statisticians in the traditional sense as data analysts, consultants and teachers to help others learn how to analyze their data. But there are additional roles that are appearing to meet gaps that organizations currently experience in implementing analytics. These roles may encompass tools and approaches for a particular application area such as supply chain analytics or marketing analytics. In addition, other roles may involve broader skillsets. For example, roles such as analytics team leader or analytics project manager require individuals who have both leadership and project management skills with a strong understanding of the different disciplines within analytics. These roles require a more holistic view and experience with the broader strategy to solve problems and aid in decision making.

What does this mean for the future?

The rise in analytics is a continuation of the democratization of statistics which has been noted for some time (Hahn 2003). In some cases, individuals with a more limited statistical expertise have job titles such as Business Analyst, Data Scientist and others. The potential for misuse of statistical methods may be lessened by the wide availability of accessible training and software, but this is counteracted by the increasing potential for misuse that comes with the ease of use of the tools (Trikha 2015). It is easier than before to apply machine learning models to datasets. In some cases, it doesn’t take much more than a few clicks and the effort to get started is significantly lower than before.
This creates a different set of challenges going forward.

Some software packages have features for automated model building where all models are fit and the “best” one is output for the user to use, with little understanding of what the models are actually doing and without a real look as to whether or not the model makes sense for the data. Statisticians are well aware of the limitations of models and the masking of underlying differences and issues that can occur when making judgements solely based on a goodness-of-fit criterion or other predictive measures. This will create more opportunities to educate and teach others how to appropriately use analytics tools. In cases where very large data sets require automated model fitting procedures, there are opportunities for developing methodologies that will be robust for users with less experience in model building.

These opportunities in analytics mean new opportunities and new possible roles for those with skills in data analysis, including statisticians. These opportunities lead to some important implications for the future of the profession. We discuss three key implications in more detail. First, statisticians should define their specialties by application areas and not by tools and methods. Second, collaboration across multiple disciplines will become more crucial. And third, it is possible to grow our skillsets into other areas and be successful in other disciplines. We address each of these in order.

Specialization

Statisticians can no longer be the only experts in data analysis. The availability of data and the demand for data analysis skills far outstrip the supply. Statisticians have to specialize a little more; the field of data analysis and the quantitative data sciences is too big for any single discipline. But this specialization can be more than just specialization in the tools; it can be in the application areas. This is similar to the concept of being a scientist or engineer. Because the field of science is so broad, scientists work in a narrower capacity focusing on a specific set of problems. Similarly, in engineering, there are fewer generalists. Rather, you see engineers specializing in a particular area, such as electrical, mechanical, civil, aerospace, or computer engineering.

While statisticians love to be able to “play in many backyards”, the number of backyards where data analysis is needed is becoming so large that it is impossible to play in all of them. Rather, one must focus on a group of similar backyards in a neighborhood to be successful. For those who have been playing in the “industrial statistics” neighborhood, there is plenty of opportunity to contribute to the field even as it has become populated by many without degrees in statistics. It requires gaining more experience and knowledge of the subject matter, to become more “embedded” as noted by Hahn (2003).

Anderson-Cook, Lu, and Parker (2019) discussed the importance of being able to understand the context under which the data are collected. As the world gets more complex, it takes more time to understand the context and this leads to more specialization in application areas. So, it is not enough to just be a statistician who understands a little bit about certain business areas. A greater depth of understanding will be required to be able to successfully apply analytics in different areas. Certain areas tend to use different sets of tools so there will naturally be some specialization in tools when specializing in an application area. For example, marketing problems will use more tools to assess consumer preferences, finance problems will involve forecasting methods and people problems will involve network analysis. This specialization will open up different career paths that may not have been as available to statisticians in the past.

Collaboration

Analytics requires work across multiple disciplines. To continue the analogy with science, for broad complex scientific problems, it is not enough to just be able to use tools in physics or tools in chemistry. Multiple scientific disciplines can be used together to solve complex scientific problems. Likewise, for complex business decisions, it is not enough to be able to do statistics or build some computer code or use some machine learning algorithm. The tools across multiple disciplines have to be used together as described in Trikha (2015).

Anderson-Cook, Lu, and Parker (2019) and the ensuing discussion highlight the importance of collaboration for statisticians with those in other disciplines. While working with others has always been important for the profession, collaboration is more than just working well with others. A consulting relationship is not collaboration. True collaboration is a joint effort on a common goal and a real team effort. Collaboration skills will become increasingly crucial for analytics projects and teams to be able to tackle more complex problems. Some of these collaborative efforts will be more temporary in tackling specific problems. But we see a growing need for collaborative, multi-disciplinary, cross-functional teams that are fully dedicated to
tackling broad and complex analytics challenges. Statisticians can be a part of these teams and move beyond a consulting role to roles as core team members. Increased collaborative efforts in methodology development will also help ensure that those methodologies can be usefully applied to real problems.

Expanding skillsets

The skills required to be successful in broader uses of statistics have been noted previously by several authors. Hahn and Doganaksoy (2012a) give a comprehensive look at these skills that go beyond knowledge of statistical methods. These skills include communication skills, holistic thinking, flexibility, teamwork, leadership and lifelong learning. These skills are applicable and especially relevant for analytics work. Others who have discussed some of these skills include Hahn (2002, 2003), Steinberg et al. (2008), Hahn and Doganaksoy (2012b), Nelson (2018, Chapter 12) and Anderson-Cook, Lu, and Parker (2019). Nelson (2018) in particular discusses a number of practices and approaches to be successful with analytics efforts, well beyond the specific data analysis tools. These include skills related to understanding the source and quality of the data being used, the potential biases that can occur when using a set of data and how to translate the analysis result to actions to take or decisions to make. We believe the traditional statistics curriculum with its heavy focus on inferential procedures and p values is not adequate in preparing statisticians for the future opportunities that are appearing in the analytics arena.

The different elements described in Figure 1 correspond to different sets of skills needed. For example, data engineering skills are necessary to be able to bring different data sources together and prepare the data so that it is right for analysis. Data visualization skills with graphical layouts, dashboard design and storytelling are important for communicating the data analysis results.

Given the speed at which the world is evolving, it is no longer possible to stop learning and still be successful. Software is becoming more and more powerful and the toolbox must be adapted to match the data that is available and the problems that are present. Gaining these new skills opens up additional opportunities and allows more career mobility.

An opportunity to rebrand?

Different authors have noted the difficulty of statisticians gaining some level of respect and appreciation as a profession (Anderson-Cook, Lu, and Parker 2019). We have found in our own experiences that shifting our focus from statistics to analytics has created opportunities that would not have been possible otherwise. Actively seeking out business problems has allowed us to increase our impact on the organization. Business leaders are desperate for analytical talent and don’t often realize that statisticians have the skills that they are looking for, provided that statisticians are open to new problems, new applications, new tools and different ways of working.

Rather than bemoaning the fact that organizations don’t really understand statistics or statisticians or that they are following the hype of Big Data, why not jump onto the analytics bandwagon? There is a huge talent gap when it comes to data analysis skills. Companies are fighting for scarce talent. Statisticians can help with analytics problems if they are willing to step out of traditional statistician roles and embrace broader and more diversified roles that are available. Of course, we believe there will continue to be roles for statisticians well into the future, as long as there is data. But we also believe that there are opportunities to develop a broader set of skills beyond the statistical and increase one’s mobility to be better positioned for a rapidly changing workforce environment.

Some in the statistics profession have previously noted the importance of linking the information to action and decision making, such as those advocating for the concepts of Six Sigma or of Statistical Engineering. However, rather than trying to create another term or discipline and further muddy the waters, why not take an existing term, no matter how imperfect it is, and make it our own? Why not join forces with many others with non-statistical backgrounds who have an interest in data analysis?

The idea to expand the boundaries of statistics to a broader field has been noted by other authors such as Breiman (2001, p. 231), who shared “Oddly, we are in a period where there has never been such a wealth of new statistical problems and sources of data. The danger is that if we define the boundaries of our field in terms of familiar tools and familiar problems, we will fail to grasp the new opportunities.” We believe we can expand the boundaries by embracing the opportunities being presented by analytics.

Tukey (1962, p. 64) again provides a remarkable insight into the future and says it better than we can say it when he shared “The future of data analysis can involve great progress, the overcoming of real difficulties, and the provision of a great service to all fields of
science and technology. Will it? That remains to us, to our willingness to take up the rocky road of real problems in preference to the smooth road of unreal assumptions, arbitrary criteria, and abstract results without real attachments. Who is for the challenge?”

About the authors

Willis A. Jensen is a member of the HR Analytics team at W. L. Gore & Associates, where he works on data and analytics problems related to people data. He previously led the global statistics team that provided statistical support and training across the globe. He holds degrees in Statistics from Brigham Young University and a PhD in Statistics from Virginia Tech. He served for 5 years as an Associate Editor of Technometrics, currently serves as a member of the editorial board for Journal of Quality Technology and previously served as a member of the editorial board of Quality Engineering. He was a steering committee member of the ASQ/ASA Fall Technical Conference and is a three-time winner of the Shewell award for best presentation of that conference. He is a recipient of the Nelson and Bisgaard awards from ASQ and is an ASQ fellow.

Acknowledgments

The authors thank their colleagues Claire Crawford, Chris Chen and Cameron Willden at W.L. Gore & Associates as well as Roger Hoerl at Union College for their input and many discussions that have contributed to the ideas shared here.

References

Anderson-Cook, C. M., L. Lu, and P. Parker. 2019. Effective interdisciplinary collaboration between statisticians and other subject matter experts with discussion. Quality Engineering 31 (1):164–76.
American Statistical Association. 2018. What is statistics? Available at: https://www.amstat.org/asa/what-is-statistics.aspx (accessed December 14, 2018).
ASA Working Group. 2014. Discovery with data: Leveraging statistics with computer science to transform science and society. Available at: http://www.amstat.org/policy/pdfs/BigDataStatisticsJune2014.pdf (accessed November 6, 2014).
ASQ Statistics Division. 2018. Statistical engineering. Available at: http://asq.org/divisions-forums/statistics/quality-information/statistical-engineering (accessed December 20, 2018).
Boroto, D. R., and D. A. Zahn. 1989. Promoting statistics: On becoming valued and utilized. The American Statistician 43 (2):71–2. doi:10.1080/00031305.1989.10475618.
Box, G. E. P. 1990. Discussion of communications between statisticians and engineers/physical scientists. Technometrics 32 (3):251–2. doi:10.2307/1269094.
Breiman, L. 2001. Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science 16 (3):199–231. doi:10.1214/ss/1009213726.
Britz, G., D. Emerling, L. Hare, R. Hoerl, and J. Shade. 1996. Statistical thinking. ASQC Statistics Division Special Publication. Available at: http://asq.org/statistics/1996/03/statistical-thinking.pdf (accessed December 17, 2018).
Bross, I. D. 1974. The role of the statistician: Scientist or shoe clerk. The American Statistician 28 (4):126–7. doi:10.2307/2683335.
Davenport, T. H. 2006. Competing on analytics. Harvard Business Review 84 (1):98–107. Available at: https://hbr.org/2006/01/competing-on-analytics (accessed June 22, 2018).
Davenport, T. H., and J. G. Harris. 2007. Competing on analytics: The new science of winning. Boston, MA: Harvard Business Press.
Donoho, D. 2017. 50 years of data science. Journal of Computational and Graphical Statistics 26 (4):745–66. doi:10.1080/10618600.2017.1384734.
Economist. 2017. The world’s most valuable resource is no longer oil, but data. Available at: https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data (accessed December 14, 2018).
Elliott, T. 2013. #GartnerBI: Analytics moves to the core. Available at: http://timoelliott.com/blog/2013/02/gartnerbi-emea-2013-part-1-analytics-moves-to-the-core.html (accessed May 4, 2016).
Hahn, G., and R. Hoerl. 1998. Key challenges for statisticians in business and industry. Technometrics 40 (3):195–200. doi:10.1080/00401706.1998.10485516.
Hahn, G. J. 2002. Deming and the proactive statistician. The American Statistician 56 (4):290–8. doi:10.1198/000313002542.
Hahn, G. J. 2003. The embedded statistician. In WJ Youden Address, 47th Annual Fall Technical Conference, El Paso, TX.
Hahn, G. J., and N. Doganaksoy. 2012a. A career in statistics: Beyond the numbers. Hoboken, NJ: Wiley.
Hahn, G. J., and N. Doganaksoy. 2012b. Traits of a successful statistician. Available at: http://stattrak.amstat.org/2012/06/01/successfulstatistician/ (accessed December 26, 2018).
Hahn, G. J., W. J. Hill, R. W. Hoerl, and S. A. Zinkgraf. 1999. The impact of six sigma improvement—a glimpse into the future of statistics. The American Statistician 53 (3):208–15. doi:10.2307/2686099.
Hamel, G., and B. Breen. 2007. The future of management. Boston, MA: Harvard Business Press.
Hoerl, R. W. 2001. Six sigma black belts: What do they need to know? Journal of Quality Technology 33 (4):391–406. doi:10.1080/00224065.2001.11980094.
Hoerl, R. W. 2019. The integration of big data analytics into a more holistic approach. JMP whitepaper. Available at: https://www.jmp.com/en_us/whitepapers/jmp/integration-of-big-data-analytics-holistic-approach.html (accessed June 4, 2019).
Hoerl, R. W., and R. Snee. 2010. Statistical thinking and methods in quality improvement: A look to the future. Quality Engineering 22 (3):119–29. doi:10.1080/08982112.2010.481485.
Hoerl, R. W., and R. D. Snee. 2017. Statistical engineering: An idea whose time has come? The American Statistician 71 (3):209–19. doi:10.1080/00031305.2016.1247015.
INFORMS. 2018. Certified analytics professional handbook. Available at: https://www.certifiedanalytics.org/index.php (accessed December 28, 2018).
International Statistical Engineering Association. 2018. What is statistical engineering? Available at: https://isea-change.org/page-18073 (accessed December 20, 2018).
Jain, K. 2013. What is big data and how is big data architecture designed? Available at: http://www.analyticsvidhya.com/blog/2013/07/big-data (accessed December 17, 2018).
Jones-Farmer, L. A., J. D. Ezell, and B. T. Hazen. 2014. Applying control chart methods to enhance data quality. Technometrics 56 (1):29–41. doi:10.1080/00401706.2013.804437.
Kenett, R. S., and G. Shmueli. 2014. On information quality. Journal of the Royal Statistical Society: Series A (Statistics in Society) 177 (1):3–38. doi:10.1111/rssa.12007.
Kenett, R. S., and G. Shmueli. 2016. Information quality: The potential of data and analytics to generate knowledge. Hoboken, NJ: Wiley.
Kirby, J. 2010. The decade in management ideas. Available at: https://hbr.org/2010/01/the-decade-in-management-ideas?referral=03758&cm_vc=rr_item_page.top_right (accessed June 22, 2018).
Leek, J. 2013. The key word in “Data science” is not data, it is science. Available at: https://simplystatistics.org/2013/12/12/the-key-word-in-data-science-is-not-data-it-is-science/ (accessed December 20, 2018).
Lenth, R. V. 2014. The web of statistics. Statistics Division Newsletter 33 (1):12–19.
Meng, X. L. 2009. Desired and feared—What do we do now and over the next 50 years? The American Statistician 63 (3):202–10. doi:10.1198/tast.2009.09045.
Nelson, G. S. 2018. The analytics lifecycle toolkit: A practical guide for an effective analytics capability. Hoboken, NJ: Wiley.
Oxford University Press. 2018a. “Analytics” in Oxford Dictionary online. Available at: https://en.oxforddictionaries.com/definition/analytics (accessed December 17, 2018).
Oxford University Press. 2018b. “Statistics” in Oxford Dictionary online. Available at: https://en.oxforddictionaries.com/definition/statistics (accessed December 28, 2018).
Pfeifer, C. G., D. W. Marquardt, and R. D. Snee. 1988. A time for change. Chance: New Directions for Statistics and Computing 1 (1):39–42.
Random House. 2018. “Statistics” in Random House Unabridged Dictionary online as Dictionary.com. Available at: https://www.dictionary.com/browse/statistics (accessed December 14, 2018).
Rose, R. 2016. Defining analytics: A conceptual framework. Or/MS Today 43 (3):34–8.
Shmueli, G. 2010. To explain or to predict? Statistical Science 25 (3):289–310. doi:10.1214/10-STS330.
Steinberg, D. M., S. Bisgaard, N. Doganaksoy, N. Fisher, B. Gunter, G. Hahn, S. Keller-McNulty, J. Kettering, W. Q. Meeker, D. C. Montgomery, et al. 2008. The future of industrial statistics: A panel discussion. Technometrics 50 (2):103–27. doi:10.1198/004017008000000136.
Trikha, R. 2015. The risky eclipse of statisticians. Available at: https://blog.hackerrank.com/the-risky-eclipse-of-statisticians/ (accessed January 2, 2019).
Tukey, J. W. 1962. The future of data analysis. The Annals of Mathematical Statistics 33 (1):1–67. doi:10.1214/aoms/1177704711.
Tukey, J. W., and L. V. Jones. 1986. The collected works of John W. Tukey. Vol. 3, philosophy and principles of data analysis: 1949–1964. Belmont, CA: Wadsworth Advanced Books & Software.
Wikipedia Contributors. 2018. “Statistics.” In Wikipedia, The Free Encyclopedia. Available at: https://en.wikipedia.org/wiki/Statistics (accessed December 14, 2018).
Wikipedia Contributors. 2018. “Analytics.” In Wikipedia, The Free Encyclopedia. Available at: https://en.wikipedia.org/wiki/Analytics (accessed December 28, 2018).
