
Data Driven

Master's/Diploma in Data Science

June 25, 2022
Christopher Pope Schwartz
c.pope@udd.cl
Data Driven
• For an organization to be data-driven, there have to be
humans in the loop, humans who ask the right
questions of the data, humans who have the skills to
extract the right data and metrics, and humans who use
that data to inform next steps. In short, data alone is not
going to save your organization.
What do we mean by Data-Driven?
• Data-drivenness is about building tools, abilities, and,
most crucially, a culture that acts on data.
– Data Collection
– Data Access
– Reporting
– Alerting
– Analysis

Without data you’re just another person with an opinion.


What do we mean by Data-Driven?

• The analytics value chain (from Dykes, 2010). In a data-driven organization, the data feed
reports, which stimulate deeper analysis. These are fed up to the decision makers who
incorporate them into their decision-making process, influencing the direction that the
company takes and providing value and impact. Figure from http://bit.ly/dykes-reporting.
Data Collection
• Prerequisite #1: An organization must be collecting data
• Has to be the right data. The dataset has to be relevant to the
question at hand. It also has to be timely, accurate, clean, unbiased;
and perhaps most importantly, it has to be trustworthy.
• “Data Janitor Work”: Data scientists spend 80% of their time
obtaining, cleaning, and preparing data, and only 20% of their time
building models, analyzing, visualizing, and drawing conclusions
from that data
• Big data as a panacea: the belief that if you collect everything,
somewhere in there are diamonds
Data Collection
• Prerequisite #2: Data must be accessible and queryable.
– Joinable: relational databases, NoSQL, Hadoop, Excel?, etc.
– Shareable: data-sharing culture within the organization. Siloed
data is always going to inhibit the scope of what can be
achieved
– Queryable: There must be appropriate tools to query and slice
and dice the data
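
A minimal sketch of what "joinable" and "queryable" can look like in practice, using Python's built-in sqlite3 module. The table names, columns, and values are hypothetical and exist only for illustration; any store that supports joins and aggregation would serve the same purpose.

```python
import sqlite3

# Hypothetical in-memory tables; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'North'), (2, 'South'), (3, 'North');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 2, 80.0),
                              (12, 1, 40.0), (13, 3, 60.0);
""")


def revenue_by_region(connection):
    """Join orders to customers, then slice total revenue by region."""
    rows = connection.execute("""
        SELECT c.region, SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """).fetchall()
    return dict(rows)


result = revenue_by_region(conn)
print(result)
```

If the two tables lived in separate silos with no shared key, the join above would not be possible, which is the point the slide makes about siloed data.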
Data With Limited View
• The blind men and the giant
elephant: the localized
(limited) view of each blind
man leads to a biased
conclusion.
Reporting
• Is this useful? Is this enough?
Alerting
• Alerts are essentially reports about what is happening right
now. They typically provide very specific data with well-
designed metrics. But like reports, they don’t tell you why
you are seeing a spike in CPU utilization, and they don’t
tell you what to do, right now, to rectify the problem. As
such, like reports, they lack this crucial context. There is
no causal explanation.
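
A small sketch of a threshold alert on CPU utilization, assuming a hypothetical 90% threshold and a three-sample averaging window (neither value comes from the slides). Note that the alert only states what is happening; nothing in it explains why.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    index: int      # position of the sample that triggered the alert
    value: float    # mean utilization over the window
    message: str


def check_cpu(samples: List[float],
              threshold: float = 90.0,
              window: int = 3) -> Optional[Alert]:
    """Return an Alert the first time the mean of `window` consecutive
    samples exceeds `threshold`; return None if that never happens."""
    for i in range(window, len(samples) + 1):
        recent = samples[i - window:i]
        mean = sum(recent) / window
        if mean > threshold:
            return Alert(
                index=i - 1,
                value=mean,
                message=(f"CPU utilization averaged {mean:.1f}% "
                         f"over the last {window} samples"),
            )
    return None


# Hypothetical readings, in percent.
readings = [35.0, 40.0, 38.0, 92.0, 95.0, 97.0, 96.0]
alert = check_cpu(readings)
if alert is not None:
    print(alert.message)
```

Finding the cause of the spike (a runaway job, a traffic surge, a bad deploy) is the analysis step that the alert itself cannot provide.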
From Reporting and Alerting to Analysis
• Reporting especially is a highly valuable component of a data-
driven organization. You can’t have an effective one without it.
• Reporting => Data-Driven?
• Reporting is a fundamentally backward view of the world
• To be data-driven, you have to go beyond that. To be forward-
looking and engage in analysis, dig in, and find out why the
numbers are changing and, where appropriate, make testable
predictions or run experiments to gather more data that will
shed light on why.
From Reporting and Alerting to Analysis
• Reporting
– “The process of organizing data into informational summaries
in order to monitor how different areas of a business are
performing”
• Analysis
– “Transforming data assets into competitive insights that will
drive business decisions and actions using people, processes
and technologies”
From Reporting and Alerting to Analysis
• Key attributes of reporting versus analysis
From Reporting and Alerting to Analysis
• Davenport’s hypothesized key questions addressed by analytics (modified from
Davenport et al., 2010). D) is valuable analytics but only E) and F) are data-
driven and then if and only if information is acted upon (more explanation in
text).

Danger Zone: in Excel, click “Chart” and then “Add trendline”
Hallmarks of Data-Drivenness
• Types of activities that truly data-driven organizations engage in:
– A data-driven organization may be continuously testing (A/B testing).
– A data-driven organization may have a continuous improvement mindset. It
may be involved in repeated optimization of core processes.
– A data-driven organization may be involved in predictive modeling,
forecasting sales, stock prices, or company revenue, but importantly feeding
the prediction errors and other learning back into the models to help
improve them.
– A data-driven organization will almost certainly be choosing among future
options or actions using a suite of weighted variables.
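
As an illustration of the A/B testing item above, here is a minimal sketch of a two-proportion z-test using only the standard library. The slides do not name a specific test, so this choice and the conversion counts are assumptions made for the example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns (z, p_value), where p_value is two-sided and based on the
    normal approximation with a pooled standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 200 of 2000 visitors convert on A, 260 of 2000 on B.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be noise, but the organization is only data-driven if it then acts on the result.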
Analytics Maturity
• “Business Intelligence
and Analytics” of
Davenport and Harris’
Competing on Analytics.
HBR Press, previously
derived from Jim Davis’
levels of analytics.
Analytics Maturity
• In 2009, Jim Davis, the senior vice president and chief marketing officer of SAS Institute,
declared that there are eight levels of analytics:
• Standard reports
– What happened? When did it happen?
– Example: monthly financial reports.
• Ad hoc reports
– How many? How often? Where?
– Example: custom reports.
• Query drill down (or online analytical processing, OLAP)
– Where exactly is the problem? How do I find the answers?
– Example: data discovery about types of cell phone users and their calling behavior.
• Alerts
– When should I react? What actions are needed now?
– Example: CPU utilization mentioned earlier.
Analytics Maturity
• Statistical analysis
– Why is this happening? What opportunities am I missing?
– Example: why are more bank customers refinancing their homes?
• Forecasting
– What if these trends continue? How much is needed? When will it be needed?
– Example: retailers can predict demand for products from store to store.
• Predictive modeling
– What will happen next? How will it affect my business?
– Example: casinos predict which VIP customers will be more interested in particular vacation
packages.
• Optimization
– How do we do things better? What is the best decision for a complex problem?
– Example: what is the best way to optimize IT infrastructure given multiple, conflicting business and
resource constraints?
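
To make the forecasting level concrete, the sketch below uses simple exponential smoothing, chosen here only for illustration and not named in the slides. Each step corrects the previous forecast by a fraction `alpha` of its error, which is one simple way of feeding prediction errors back into a model. The sales figures and `alpha` are hypothetical.

```python
def exponential_smoothing_forecasts(actuals, alpha=0.3):
    """One-step-ahead forecasts with error feedback.

    Returns (forecasts, next_forecast): `forecasts[i]` is the forecast
    made for period i before seeing `actuals[i]`, and `next_forecast`
    is the forecast for the period after the last observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not actuals:
        raise ValueError("actuals must not be empty")

    forecast = actuals[0]  # assumption: seed with the first observation
    forecasts = []
    for actual in actuals:
        forecasts.append(forecast)
        error = actual - forecast            # prediction error for this period
        forecast = forecast + alpha * error  # feed the error back
    return forecasts, forecast


# Hypothetical monthly sales.
monthly_sales = [100.0, 110.0, 105.0, 120.0]
history, next_forecast = exponential_smoothing_forecasts(monthly_sales, alpha=0.5)
print(history, next_forecast)
```

A larger `alpha` reacts faster to recent errors; a smaller one smooths out noise. Choosing it is itself something a data-driven team would test rather than guess.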
How to Incubate
and Sustain an
Analytics-Driven
Culture

Reading 3: Tips for Building a Data Science Capability (Booz Allen Hamilton)


How to Incubate and Sustain an Analytics-
Driven Culture
• Foster curiosity and experimentation, and embrace failure for the
unexpected insights and learning that it can provide.
• Empower your data science team to investigate and discover; to
design and run their own experiments that blend inductive pattern
recognition with deductive hypothesis generation; and to explore
“rabbit holes” of interest.
• Give teams the time and space to “geek out,” and establish a series of
reward and recognition mechanisms focused on investing in passion projects.
• Fail fast in the discovery process. It does no good to toil on
something for 6 months and then learn that it doesn’t work.
• Experiment and fail quickly to discover more.



How to Incubate and Sustain an Analytics-
Driven Culture
• True curiosity and experimentation require an
enormous tent of skills, abilities, and perspectives to
achieve meaningful outcomes for data science.
• Include data scientists, technologists, domain experts,
organizational design and strategy practitioners, design
thinkers, and human capital specialists.
• Diversity of thought can help organizations unlock the
particularly thorny problems and extract value from their
data.



How to Incubate and Sustain an Analytics-
Driven Culture
• Avoid the added burden of unnecessary
hierarchical structures.
• Provide data science teams with access to
resources, tools, and talent; the freedom to scope
and run their own experiments; and the time and
space to collaborate with a diverse team and the
broader data science community.
• Examples: sandbox to store algorithms and code
so that good work can be replicated without
permission.
How to Incubate and Sustain an Analytics-
Driven Culture
• What’s important is to provide a variety of learning
opportunities that can cater to the different skill sets and
attitudes that exist within the organization.
• Design opportunities and learning programs at all levels to
help people get familiar with data science concepts and
become more comfortable in practicing them.
• Don’t cast data scientists as mythical rock stars and then
not offer a way into the “club.” All data scientists have a
role in providing mentorship and learning opportunities.
• Data scientists need to attend hackathons, teach classes,
and mentor junior staff whenever possible.



How to Incubate and Sustain an Analytics-
Driven Culture
• Analytics must be available and accessible to all
employees in some form or fashion.
• Example: drag-and-drop packages of code that help solve
analytical inquiries lower the barrier to entry and
empower “everyday employees” to perform data
analysis.
• Develop a Field Guide to Data Science, which
provides a baseline understanding about data
science and a common lexicon so that employees
can communicate with each other and engage in
solving analytical problems.
How to Incubate and Sustain an Analytics-
Driven Culture
• Reward the approach and thought process
that people take as well as the results
achieved.
• People need to believe in the analytical
process and thus be recognized for trying
it until it is customary.



Myths
1. Governance structures are the solution
2. Analytical solutions and operational realities are
mutually exclusive
3. Data Scientists provide a service to the business
4. Complex analytics and tools are better than simple
ones



To Build Buy-In



Organizational structure
• Common situation: analysts and analytics projects are scattered
across the organization.
• At some point those pockets of analytics need to be coordinated,
consolidated, or centralized.
When debating alternative
organizational structures for
analytical groups, it’s
important to keep in mind the
overriding goals for the
organization
• Different priorities for
these goals may lead to
different organizational
models
Goals:
• Supporting business decision-makers with analytical capabilities
• Fostering visibility for analytics throughout the organization and
ease in finding help with analytical problems and decisions
• Providing leadership and a “critical mass” home for analytical
people and the ability to easily share ideas and collaborate on
projects across analysts
• Building and monitoring analytical capabilities and expertise
• Creating standardized methodological approaches, tools, and processes
• Researching and adopting new analytical practices
• Reducing the cost to deliver analytical outcomes
Different priorities for these goals may lead to different
organizational models!!!

Centralized Model, Diffused Model, Deployed Model


The Centralized Model
• Centralized data science teams
serve the entire organization
• They report to a chief data
scientist, who decides which
projects the teams will work on,
and how to manage the projects.
• Business units work with the data
science teams to solve specific
challenges.
The Centralized Model: advantages
• Greater efficiency with limited resources, including flexibility to
modify team composition during the life of a project as needs change
• Access to data science is organization-wide, rather than limited to
individual business units
• Central management streamlines business processes, professional
development, and enabling tools, contributing to economies of scale
• Organizational separation between the business units and data
science teams promotes the perception that analytics are objective
• Project diversity motivates data science teams and contributes to
strong retention
The Centralized Model: challenges
• It can be difficult to enlist business units that have not yet bought in to data
science
• Business units often feel that they compete for data science resources and
projects
• Teams re-form for every new problem, requiring time to establish
relationships, trust, and collaboration
• Business units must provide another organization (i.e., the data science unit)
with access to their data, which they are often reluctant to do
• As a separate unit with rotating staff, data science teams may not develop the
intimate domain knowledge that can provide efficiency to future business
unit projects
The Centralized Model needs to put a focus on…
• Selling Analytics: Demonstrate tangible impacts of analytics to business
unit leaders; they are critical partners and need to buy in
• Portfolio Management: Create transparency into how the organization will
identify and select data science projects, including criteria to
prioritize opportunities and align resources
• Teamwork: Establish early partnerships between data science teams and
business units, which will be integral to framing problems and
translating analytics into business insights
• Education: Train business unit leaders on the fundamentals of data
science and the characteristics of a good data science problem, so
people across the organization can recognize opportunities
The Diffused Model
• Diffused, or decentralized, data
science teams are fully embedded
in business units such as
marketing, research and
development, operations, and
logistics.
• The teams report to individual
business unit leaders and perform
work under their leadership.
The Diffused Model: Advantages
• A benefit of this approach is that it allows data science teams to gain a deepened
understanding of how analytics can benefit a particular domain or business unit
• Data science teams can quickly react to high-priority business unit needs
• Business units are more likely to own the analytics—to be involved with the data
science effort, accept the output, and adopt some change as a result
• Data science teams learn the organization’s data and its context, reducing project
spin-up and helping them become equal partners in both solving problems and
identifying the possibilities
• A deepened understanding of the business inspires data science teams to ask new,
hard questions of the data, and they understand the right questions to ask
The Diffused Model: Challenges
• Business units with the most money often have full access to analytics while
others have none—this may not translate to the greatest organizational
impact
• Data science teams may face pressure to compromise their objectivity to
avoid making a business unit “look bad”
• Lack of central management may result in redundant software licenses and
tools, which drives up total costs to the organization
• The structure offers limited motivation for business units to integrate,
inhibiting collaboration in already siloed organizations
• Work may become stale to data scientists, driving them to seek new and
diverse challenges
The Diffused Model needs to put a focus on…
• Governance: Establish cross-functional group(s) responsible for guiding
organization-wide analytics standards, to include data, tool selection,
and means of prioritizing analytics efforts
• Peer Collaboration: Establish forums such as data science communities of
practice and mentorship circles to share best practices and lessons
learned (e.g., trends, algorithms, methods)
• Creative Outlets: Fund analytics competitions, crowdsourcing, and
conference attendance that allow data scientists to exercise their
minds, solve new problems, and explore techniques
The Deployed Model
• As with the diffused model, data science teams
are embedded in the business units.
• The difference is that the embedded teams in
the deployed model report to a single chief
data scientist as opposed to business unit
leaders.
• In this model, also called the matrixed
approach, teams are generally assigned to
individual business units, though they are
sometimes also assigned to broader product
lines, or to mission sets comprised of members
from several business units.
The Deployed Model: Advantages
• Shared benefits of both the centralized and diffused model
• Data science teams collectively develop knowledge across business
units, with central leadership as a bridging mechanism for addressing
organization-wide issues
• Access to data science is organization-wide, and close integration
with business units promotes analytics adoption
• Project diversity both motivates data science teams and improves
recruiting and retention
• Central leadership streamlines career management approaches, tool
selection, and business processes/approaches
The Deployed Model: Challenges
• Deployed teams are responsible to two bosses—staff may
become uncertain about to whom they are ultimately
accountable
• Data science teams may face difficulty being accepted into
business units, where long-time relationships have been
established
• Access to analytics resources may still feel competitive between
business units, and data science units risk alienating business
units whose proposed projects are not selected
The Deployed Model puts a focus on…
• Conflict Management: The chief data scientist should proactively engage
business unit leaders to prevent competing priorities from becoming the
data science teams’ responsibility to resolve
• Formal Performance Feedback: Agree to performance goals at the onset of
each project, and collect feedback during the life of the project,
including at its conclusion
• Rotation: Allow data science teams to work on projects across different
business units, rather than within a single business unit; this takes
advantage of one of the main benefits this model affords
• Pipeline: Regularly communicate the data science project pipeline,
allowing business units to see how their priorities are positioned
Activity No. 3
Read:
• Uber - Driven to Democratize Data
Thank you very much

https://www.linkedin.com/in/cpopesch/
