Introduction To Data Science: Week 1 Unit 1
4 Modeling (2): Classification Analysis with Decision Trees; Classification Analysis with KNN, NN, and SVM; Time Series Analysis; Ensemble Methods; Simulation & Optimization; Automated Modeling
5 Evaluation: Evaluation Phase Overview; Model Performance Metrics; Model Testing; Improving Model Performance
6 Deployment & Maintenance: Deployment Phase Overview; Deployment Options; Monitoring & Maintenance; Automating Deployment & Maintenance; Myths & Challenges; Data Science Applications and References
Final Exam
Data science is an interdisciplinary field about processes and systems that enable the extraction of knowledge or insights from data.
Data science employs techniques and theories drawn from a wide range of disciplines.
Figure: SAP HANA user groups, with analytics skills from low to high: Business Users; Data Analysts / Citizen Data Scientists; Data Scientists; Application Developers.
Analytics categories shown: Business User / Data Analyst Driven Analytics; Custom Analytics; Embedded Analytics.
Figure: SAP predictive analytics portfolio.
Applications (SAP Suite / Application Innovation / Industry / LoB / CDP): SAP Hybris Marketing, IoT Predictive Maintenance, Fraud
Tools: SAP Predictive Analytics; SAP Lumira; SAP HANA Studio / Application Function Modeler (AFM); SAP RDS Analytics Solutions; Partner Analytical BI & Tools; SAP Industry & LoB Solutions
SAP HANA: Predictive Analysis Library (PAL); Business Function Library; Automated Predictive Library; Simulation; Optimization; R; Text Analysis and Mining; Text Search; Spatial Analysis; Graph Engine; Rules Engine; Data types
Connect to SAP HANA directly or via Sybase IQ / Hadoop / ESP / Data Services
Figure: R integration with SAP HANA. Labels: Application; Calculation Engine with R Operator and R Client; Rserve; R Runtime; Trigger R; Write Results; Tables.
Contact information:
open@sap.com
2016 SAP SE or an SAP affiliate company. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate
company) in Germany and other countries. Please see http://global12.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.
Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its
affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate company products and
services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as
constituting an additional warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop
or release any functionality mentioned therein. This document, or any related presentation, and SAP SE's or its affiliated companies' strategy and possible future
developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time
for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-
looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place
undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
A project methodology:
Provides a framework for recording experience
Allows projects to be replicated
Provides an aid to project planning and management
Is a comfort factor for new adopters
Reduces dependency on stars
Figure: the project phases arranged as a cycle around the Data: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment.
Key: TASKS with their OUTPUTS
Determine Business Objectives: Background; Business Objectives; Business Success Criteria
Assess Situation: Inventory of Resources; Requirements, Assumptions & Constraints; Risks & Contingencies; Terminology; Costs & Benefits
Determine Data Science Goals: Data Science Goals; Data Science Success Criteria
Produce Project Plan: Project Plan; Initial Assessment of Tools & Techniques
Collect Initial Data: Initial Data Collection Report
Describe Data: Data Description Report
Explore Data: Data Exploration Report
Verify Data Quality: Data Quality Report
Select Data: Rationale for Inclusion/Exclusion
Clean Data: Data Cleaning Report
Format Data: Reformatted Data
Generate Test Design: Test Design
Build Model: Parameter Settings; Models; Model Description
Review Process: Review of Process
Plan Deployment: Deployment Plan
Produce Final Report: Final Report; Final Presentation
Figure: the phase cycle around the Data, extended with Monitoring: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment, Monitoring.
Task: Determine Business Objectives
The first objective of the data analyst is to thoroughly understand, from a business perspective, what the client really wants to accomplish.
Outputs
Background
Business Objectives
Business Success Criteria
Task: Assess Situation
In the previous task, your objective was to quickly get to the crux of the situation. Here, you want to flesh out the details.
Outputs
Inventory of Resources
Requirements, Assumptions, & Constraints
Risks & Contingencies
Terminology
Costs & Benefits
Task: Determine Data Science Goals
A business goal states objectives in business terminology.
A data science goal states project objectives in technical terms.
Outputs
Describe data science goals.
Define data science success criteria.
Task: Produce Project Plan
Describe the intended plan for achieving the data science goals and thereby achieving the business goals.
Outputs
Project plan with project stages, duration, resources, etc.
Initial assessment of tools & techniques.
In its Third Annual Data Miner Survey, Rexer Analytics, a renowned analytics and CRM consulting firm based in Winchester, Massachusetts, asked the BI community, "How do you evaluate project success in data mining?" Out of 14 different criteria, 58% of respondents ranked "Model performance (lift, R², etc.)" as the primary factor.
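The two performance measures named in the survey can be illustrated with a short sketch. It is not taken from the course material: the function names `r_squared` and `lift_at` and all of the sample numbers are hypothetical, chosen only to show how R² (for a continuous target) and lift (for a scored binary target) are typically computed.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def lift_at(labels: Sequence[int], scores: Sequence[float], top_n: int) -> float:
    """Response rate among the top_n highest-scored cases, divided by the overall rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    overall_rate = sum(labels) / len(labels)
    return top_rate / overall_rate


# Hypothetical regression results (for example, predicted revenue).
actual_revenue = [3.0, 5.0, 7.0, 9.0]
predicted_revenue = [2.5, 5.5, 7.0, 9.0]
model_r2 = r_squared(actual_revenue, predicted_revenue)

# Hypothetical classification results: 1 = churned, scores are model outputs.
churned = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
churn_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
top_two_lift = lift_at(churned, churn_scores, top_n=2)

print(f"R^2 = {model_r2:.3f}, lift in top 2 = {top_two_lift:.2f}")
```

A lift above 1 means the model concentrates positive cases among its highest scores better than random selection would.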
Anomalies: What anomalies or unusual values might exist? Are they errors or real changes in behavior?
Trends: What are the trends, both historical and emerging, and how might they continue?
Associations: What are the correlations in the data? What are the cross-sell opportunities?
Relationships: What are the main influencers, for example customer churn, employee turnover etc.?
Groupings: Are there any clear groupings of the data, for example customer segments for specific marketing campaigns?
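Two of these questions, anomalies and associations, can be sketched in a few lines. The example below is an illustration under stated assumptions: it flags anomalies with the common 1.5 × IQR rule and measures association with the Pearson correlation coefficient. The course does not prescribe either technique here, and the function names and data are hypothetical.

```python
import statistics
from typing import List, Sequence


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical daily order counts with one suspicious value.
daily_orders = [10, 12, 11, 13, 12, 11, 95]
suspicious = iqr_outliers(daily_orders)

# Hypothetical advertising spend and sales figures.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
spend_sales_corr = pearson(ad_spend, sales)

print(suspicious, round(spend_sales_corr, 3))
```

Whether a flagged value is an error or a real change in behavior still has to be decided with business knowledge; the rule only points at candidates.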
Association
Clustering
Classification: binary target variable
Regression: continuous target variable
Detect anomalies or outliers (data cleansing or decision support)
Forecasting with time series data
Classification: Who will (buy | commit fraud | churn) next (week | month | year)?
Regression: What will the (revenue | # churners) be next (week | month)?
Link Analysis: Analyze interactions to identify (communities | influencers).
Association or Recommendation Engines: Provide recommendations on web sites or to retailers (basket analysis).
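Basket analysis usually rests on three rule measures: support, confidence, and lift. The sketch below computes them for one rule on a handful of invented baskets. The helper names `support` and `rule_metrics` are hypothetical and no particular library or SAP function is implied.

```python
from typing import Dict, List, Set


def support(baskets: List[Set[str]], items: Set[str]) -> float:
    """Share of baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def rule_metrics(
    baskets: List[Set[str]], antecedent: Set[str], consequent: Set[str]
) -> Dict[str, float]:
    """Support, confidence and lift for the rule antecedent -> consequent."""
    both = support(baskets, antecedent | consequent)
    confidence = both / support(baskets, antecedent)
    lift = confidence / support(baskets, consequent)
    return {"support": both, "confidence": confidence, "lift": lift}


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"butter", "jam"},
    {"bread", "butter"},
]

bread_to_butter = rule_metrics(baskets, {"bread"}, {"butter"})
print({name: round(value, 4) for name, value in bread_to_butter.items()})
```

A lift close to 1 suggests the two items appear together about as often as chance would predict, so the rule would be a weak basis for a recommendation.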
Key: TASKS with their OUTPUTS
Collect Initial Data: Initial Data Collection Report
Describe Data: Data Description Report
Explore Data: Data Exploration Report
Verify Data Quality: Data Quality Report
Task: Collect Initial Data
Acquire the data (or access to the data) listed in the project resources.
This initial collection includes loading the data into the data exploration tool and integrating it if multiple data sources are acquired.
Task: Describe Data
Examine the gross or surface properties of the acquired data and report on the results.
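A minimal sketch of what a report on surface properties might contain: record count, field names, observed types, distinct values, and a basic numeric range. The `describe` function and the customer records are hypothetical, and a real project would normally use the exploration tool's own profiling features.

```python
from typing import Any, Dict, List


def describe(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Summarise each field: observed types, distinct count, numeric range and mean."""
    summary: Dict[str, Dict[str, Any]] = {}
    fields = sorted({name for row in records for name in row})
    for name in fields:
        present = [row[name] for row in records if row.get(name) is not None]
        info: Dict[str, Any] = {
            "types": sorted({type(value).__name__ for value in present}),
            "distinct": len(set(present)),
        }
        numeric = [
            value
            for value in present
            if isinstance(value, (int, float)) and not isinstance(value, bool)
        ]
        # Only report a range and mean when every present value is numeric.
        if numeric and len(numeric) == len(present):
            info.update(
                min=min(numeric), max=max(numeric), mean=sum(numeric) / len(numeric)
            )
        summary[name] = info
    return summary


# Hypothetical customer records.
customers = [
    {"age": 34, "segment": "A", "spend": 120.5},
    {"age": 45, "segment": "B", "spend": 80.0},
    {"age": 29, "segment": "A", "spend": 200.0},
]

profile = describe(customers)
print(f"{len(customers)} records, {len(profile)} fields")
for field, info in profile.items():
    print(field, info)
```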
Task: Explore Data
This task tackles the data mining questions that can be addressed using querying, visualization, and reporting.
Task: Verify Data Quality
Examine the quality of the data, addressing questions such as:
Is the data complete?
Is it correct or does it contain errors?
Are there missing values in the data?
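The three questions above can be turned into simple checks. The sketch below counts missing values per required field, derives a completeness ratio, and lists values that violate a plausibility range. The function `quality_report`, the field names, and the 0 to 120 age range are assumptions made for illustration only.

```python
from typing import Any, Dict, List, Tuple


def quality_report(
    records: List[Dict[str, Any]],
    required: List[str],
    valid_ranges: Dict[str, Tuple[float, float]],
) -> Dict[str, Dict[str, Any]]:
    """Count missing values, completeness and out-of-range values per field."""
    total = len(records)
    missing = {
        field: sum(1 for row in records if row.get(field) is None)
        for field in required
    }
    completeness = {field: 1 - missing[field] / total for field in required}
    out_of_range = {
        field: [
            row[field]
            for row in records
            if row.get(field) is not None and not (low <= row[field] <= high)
        ]
        for field, (low, high) in valid_ranges.items()
    }
    return {
        "missing": missing,
        "completeness": completeness,
        "out_of_range": out_of_range,
    }


# Hypothetical records: one missing age and one impossible age.
people = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": -5},
    {"id": 4, "age": 51},
]

report = quality_report(people, required=["id", "age"], valid_ranges={"age": (0, 120)})
print(report)
```

Range rules like the one for age only catch values that are impossible; values that are plausible but wrong need cross-checks against other sources.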
Descriptive statistics
Graphs
The data are modified according to the analysis: adjust extreme observations, estimate missing observations, transform variables, bin data, and form new variables.
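Each of these modifications can be sketched on one small column. The example assumes median imputation for missing observations, capping for extreme observations, a base-10 logarithm as the transformation, and fixed bin edges. All of these choices, the helper names, and the income figures are hypothetical; the right treatment depends on the analysis.

```python
import math
import statistics
from typing import List, Optional, Sequence


def impute_median(values: Sequence[Optional[float]]) -> List[float]:
    """Estimate missing observations with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def cap(values: Sequence[float], lower: float, upper: float) -> List[float]:
    """Adjust extreme observations by clamping them to [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def bin_value(value: float, edges: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label of the first bin whose upper edge exceeds value."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]  # values at or above the last edge fall in the final bin


# Hypothetical income column with a missing value and an extreme value.
incomes = [20_000, 35_000, None, 50_000, 1_000_000]

imputed = impute_median(incomes)
capped = cap(imputed, lower=0, upper=200_000)
log_income = [math.log10(v) for v in capped]  # transformed variable
income_band = [bin_value(v, [30_000, 100_000], ["low", "mid", "high"]) for v in capped]

print(capped)
print(income_band)
```

The order of the steps matters: here the median is computed before capping, so the extreme value still influences the imputed figure.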