2-Data Analytics Lifecycle


Lecture 2: Data Analytics Lifecycle 1


Lecture 2:
Data Analytics Lifecycle
Upon completion of this module, you should know about:

○ Data Analytics Lifecycle
■ Discovery
■ Data Preparation
■ Model Planning
■ Model Building
■ Communicate Results
■ Operationalize



Key Roles for a Successful Analytic Project
Role: Description

Business User: Someone who benefits from the end results and can consult and advise the project team on the value of the end results and how these will be operationalized.

Project Sponsor: Person responsible for the project, providing the motives for the project and the core business problem; generally provides the funding and will assess the degree of value from the final outputs of the working team.

Project Manager: Ensures key milestones and objectives are met on time and at the expected quality.

Business Intelligence Analyst: Business domain expertise with a deep understanding of the data, KPIs, key metrics, and business intelligence from a reporting perspective.

Data Engineer: Deep technical skills to assist with tuning SQL queries for data management and extraction, and to support data ingest to the analytic sandbox.

Database Administrator (DBA): Provisions and configures the database environment to support the analytical needs of the working team.

Data Scientist: Provides subject matter expertise for analytical techniques and data modeling, applies valid analytical techniques to given business problems, and ensures overall analytical objectives are met.



Data Analytics Lifecycle
[Figure: the six phases arranged in a cycle: 1 Discovery, 2 Data Prep, 3 Model Planning, 4 Model Building, 5 Communicate Results, 6 Operationalize. Checkpoint questions between phases:]
◆ After Discovery: Do I have enough information to draft an analytic plan and share for peer review?
◆ After Data Prep: Do I have enough good quality data to start building the model?
◆ After Model Planning: Do I have a good idea about the type of model to try? Can I refine the analytic plan?
◆ After Model Building: Is the model robust enough? Have we failed for sure?
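
The cycle of phases and its checkpoint questions can be sketched as a small state machine. This is only an illustration: the names `Phase`, `CHECKPOINTS`, and `next_phase` are hypothetical, and stepping back one phase when a checkpoint fails is an assumption made for the sketch, not a rule stated on the slide.

```python
from enum import Enum


class Phase(Enum):
    DISCOVERY = 1
    DATA_PREP = 2
    MODEL_PLANNING = 3
    MODEL_BUILDING = 4
    COMMUNICATE_RESULTS = 5
    OPERATIONALIZE = 6


# Checkpoint question asked before leaving a phase (wording taken from the figure).
CHECKPOINTS = {
    Phase.DISCOVERY: "Do I have enough information to draft an analytic plan "
                     "and share for peer review?",
    Phase.DATA_PREP: "Do I have enough good quality data to start building the model?",
    Phase.MODEL_PLANNING: "Do I have a good idea about the type of model to try? "
                          "Can I refine the analytic plan?",
    Phase.MODEL_BUILDING: "Is the model robust enough? Have we failed for sure?",
}


def next_phase(current: Phase, checkpoint_passed: bool = True) -> Phase:
    """Return the phase to work on next.

    Assumption for this sketch: a failed checkpoint sends the team back one
    phase (Discovery has no earlier phase, so it repeats). Phases without a
    checkpoint question always advance, and Operationalize wraps to Discovery.
    """
    if current in CHECKPOINTS and not checkpoint_passed:
        return Phase(max(current.value - 1, 1))
    return Phase(current.value % len(Phase) + 1)
```

For example, `next_phase(Phase.MODEL_PLANNING, checkpoint_passed=False)` returns `Phase.DATA_PREP`, which mirrors the idea that the lifecycle is iterative rather than strictly linear.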



Data Analytics Lifecycle
Phase 1: Discovery

➔Learn the Business Domain
◆ Determine amount of domain knowledge needed to orient you to the data and interpret results downstream
◆ Determine the general analytic problem type (such as clustering, classification)
➔Learn from the past
◆ Have there been previous attempts in the organization to solve this problem?
◆ If so, why did they fail? Why are we trying again? How have things changed?



Data Analytics Lifecycle
Phase 1: Discovery

➔Resources
◆ Assess available technology
◆ Available data – sufficient to meet your needs
◆ People for the working team
◆ Assess scope of time for the project in calendar time and person-hours
◆ Do you have sufficient resources to attempt the project? If not, can you get more?



Data Analytics Lifecycle
Phase 1: Discovery

➔Frame the problem: framing is the process of stating the analytics problem to be solved
◆ State the analytics problem, why it is important, and to whom
◆ Identify key stakeholders and their interests in the project
◆ Clearly articulate the current situation and pain points
◆ Objectives – identify what needs to be achieved in business terms and what needs to be done to meet the needs
● What is the goal? What are the criteria for success? What’s “good enough”?
● What is the failure criterion (when do we just stop trying or settle for what we have)?
◆ Identify the success criteria, key risks, and stakeholders



Data Analytics Lifecycle
Phase 1: Discovery

➔Identify Data Sources – Begin Learning the Data
◆ Aggregate sources for previewing the data and provide high-level understanding
◆ Review the raw data
◆ Determine the structures and tools needed



Data Analytics Lifecycle
Phase 2: Data Preparation

➔Prepare Analytic Sandbox
◆ Workspace for the analytic team
◆ Determine needed transformations
● Assess data quality and structuring
● Derive statistically useful measures
◆ Determine and establish data connections for raw data

• Useful Tools for this phase:
• For Data Transformation & Cleansing: SQL, Hadoop, MapReduce, Alpine Miner



Data Analytics Lifecycle
Phase 2: Data Preparation

➔Familiarize yourself with the data
◆ List your data sources
◆ What’s needed vs. what’s available
➔Data Conditioning
◆ Clean and normalize data
➔Survey & Visualize
◆ Overview, zoom & filter
◆ Descriptive Statistics
◆ Data Quality

• Useful Tools for this phase:
• Descriptive Statistics on candidate variables for diagnostics & quality
• Visualization: R (base package, ggplot and lattice), GnuPlot, Ggobi/Rggobi, Spotfire, Tableau
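
As a minimal sketch of the data conditioning and descriptive statistics steps on this slide, the example below cleans a single column, rescales it, and summarizes it using only the Python standard library. The sample values, the use of `None` to mark missing entries, and the helper names `clean`, `min_max_normalize`, and `describe` are all assumptions for illustration; dropping missing values and min-max scaling are just one possible choice of cleaning and normalization rules.

```python
import statistics

# Hypothetical raw column with missing values marked as None.
raw = [12, 7, None, 3, 25, None, 9]


def clean(values):
    """Drop missing entries (imputation would be another option)."""
    return [v for v in values if v is not None]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def describe(values):
    """Basic descriptive statistics for a quick diagnostic pass."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


cleaned = clean(raw)
normalized = min_max_normalize(cleaned)
summary = describe(cleaned)

# Share of missing entries, one simple data-quality indicator.
missing_rate = (len(raw) - len(cleaned)) / len(raw)
```

Here `missing_rate` is 2/7, which is the kind of figure that feeds the "what's needed vs. what's available" question before model planning starts.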



Data Analytics Lifecycle
Phase 3: Model Planning 1/4

➔Determine Methods
◆ Select methods based on hypotheses, data structure and volume
◆ Ensure techniques and approach will meet business objectives
➔Techniques & Workflow
◆ Candidate sample tests and sequence
◆ Identify and document modeling assumptions

• Useful Tools for this phase: R/PostgreSQL, SQL Analytics, Alpine Miner, SAS/ACCESS, SPSS/ODBC



Data Analytics Lifecycle
Phase 3: Model Planning 2/4

➔Data Exploration
➔Variable Selection (attributes)
◆ Inputs from stakeholders and domain experts
◆ Leverage a technique for dimensionality reduction
◆ Iterative testing to confirm the most significant variables
➔Model Selection
◆ Choose technique based on the end goal
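
One simple way to start the variable selection step is to rank candidate variables by the strength of their linear relationship with the outcome. The sketch below uses absolute Pearson correlation as a screening heuristic; this is an assumption chosen for illustration, not the specific technique the slide has in mind, and it is not a dimensionality reduction method in itself. The toy data and the names `pearson`, `rank_variables`, `candidates`, and `churned` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        # A constant column carries no linear signal.
        return 0.0
    return cov / (spread_x * spread_y)


def rank_variables(candidates, target):
    """Sort candidate variables by |correlation| with the target, strongest first."""
    scores = {name: abs(pearson(column, target)) for name, column in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: three candidate variables for six customers.
candidates = {
    "debit_card_uses": [20, 15, 3, 2, 18, 1],
    "account_age_years": [5, 2, 4, 3, 1, 6],
    "branch_id": [7, 7, 7, 7, 7, 7],
}
churned = [0, 0, 1, 1, 0, 1]  # 1 = customer left

ranking = rank_variables(candidates, churned)
```

On this toy data `debit_card_uses` ranks first and the constant `branch_id` ranks last. In practice the ranking would be combined with input from stakeholders and domain experts and re-tested iteratively, as the bullets above describe.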



Data Analytics Lifecycle
Phase 4: Model Building

➔Develop data sets for testing, training, and production purposes
◆ Need to ensure that the model data is sufficiently robust for the model and analytical techniques
◆ Test sets for validating approach, training set for initial experiments
➔Get the best environment you can for building models and workflows: fast hardware, parallel processing

• Useful Tools for this phase: R, PL/R, SQL, Alpine Miner, SAS Enterprise Miner
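
Splitting the data into training and test sets can be sketched as follows. This is a minimal illustration using only the standard library; the function name `train_test_split`, the 30% test fraction, and the fixed seed are assumptions, not values given in the lecture.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into (train, test).

    A fixed seed makes the split reproducible, so experiments on the
    training set can be compared against the same held-out test set.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)  # copy, so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical record identifiers standing in for rows of model data.
records = list(range(10))
train, test = train_test_split(records)
```

With ten records this yields seven for training and three for testing, with no record appearing in both sets.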



Data Analytics Lifecycle
Phase 5: Communicate Results

Did we succeed? Did we fail?
➔Interpret the results
➔Identify key findings
➔Quantify business value
➔Summarize findings, depending on audience



Data Analytics Lifecycle
Phase 6: Operationalize

➔Run a pilot
➔Assess the benefits
➔Provide final deliverables
➔Implement the model in the production environment
➔Define process to update, retrain, and retire the model, as needed



Mini Case Study: Churn Prediction for Retail Banking

Components of Analytic Plan for Retail Banking: Yoyodyne Bank

➔Phase 1: Discovery (Business Problem Framed)
◆ How do we identify churn/no churn for a customer?
➔Phase 2: Data Prep
◆ 5 months of customer account history.
➔Phase 3: Model Planning (Analytic Technique)
◆ Regression to identify the most influential factors predicting churn.
➔Phase 4: Model Execution
◆ Apply the model on data.
➔Phase 5: Result & Key Findings
◆ Once customers stop using their accounts for gas and groceries, they will soon erode their accounts and churn.
◆ If customers use their debit card fewer than 5 times per month, they will leave the bank within 60 days.
➔Business Impact
◆ If we can target customers who are high-risk for churn, we can reduce customer attrition by 25%. This would save $3 million in lost customer revenue and avoid $1.5 million in new customer acquisition costs each year.
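
The debit card finding and the business impact figures can be turned into a short worked example. The threshold and the dollar amounts come from the case study; the customer records and the names `is_high_risk` and `flag_high_risk` are hypothetical, and a plain threshold rule is used here as a stand-in for the regression model, which the slide does not specify in detail.

```python
# Threshold from the key findings: fewer than 5 debit card uses per month.
DEBIT_USE_THRESHOLD = 5

# Figures from the business impact statement (US dollars per year).
LOST_REVENUE_SAVED = 3_000_000
ACQUISITION_COST_AVOIDED = 1_500_000


def is_high_risk(monthly_debit_uses: int) -> bool:
    """Apply the case-study rule to one customer's monthly usage count."""
    return monthly_debit_uses < DEBIT_USE_THRESHOLD


def flag_high_risk(customers):
    """Return the IDs of customers whose usage falls below the threshold."""
    return [cid for cid, uses in customers.items() if is_high_risk(uses)]


# Hypothetical customers: ID -> debit card uses in the last month.
customers = {"C001": 12, "C002": 4, "C003": 0, "C004": 5}

high_risk = flag_high_risk(customers)  # ["C002", "C003"]

# Combined yearly benefit if the 25% attrition reduction is achieved.
total_annual_benefit = LOST_REVENUE_SAVED + ACQUISITION_COST_AVOIDED  # 4_500_000
```

Customer `C004`, with exactly 5 uses, is not flagged because the finding says "fewer than 5". Adding the two stated amounts gives a combined benefit of $4.5 million per year.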



Check Your Knowledge

● In which phase would you expect to invest most of your project time, and why? Where would you expect to spend the least time?
● What are the benefits of doing a pilot program before a full-scale rollout of a new analytical methodology? Discuss this in the context of the mini case study.
● What kinds of tools would be used in the following phases, and for which kinds of use scenarios?
○ Phase 2: Data Preparation
○ Phase 4: Model Execution
● Now that you have completed the analytical project at Yoyodyne, you have an opportunity to repurpose this approach for an online eCommerce company. What phases of the lifecycle do you need to focus on to identify ways to do this?
