
Developing and Deploying a Machine Learning Scenario for SAP HANA


DAT312
PUBLIC
Speakers

Las Vegas
September 24–27, 2019

Christoph Morgen
Frank Gottfried

Barcelona
October 8-10, 2019

Christoph Morgen
Frank Gottfried

Bangalore
November 13-15, 2019

Sathish Hariharan
Suriyanarayanan Balamurugan

© 2019 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 2


Take the session survey.
We want to hear from you!

Complete the session evaluation for DAT312 on the SAP TechEd mobile app.

Download the app from the iPhone App Store or Google Play.



Disclaimer

The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of SAP.
Except for your obligation to protect confidential information, this presentation is not subject to your license agreement or any other service
or subscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or any related
document, or to develop or release any functionality mentioned therein.
This presentation, or any related document and SAP's strategy and possible future developments, products and/or platforms directions and
functionality are all subject to change and may be changed by SAP at any time for any reason without notice. The information in this
presentation is not a commitment, promise or legal obligation to deliver any material, code or functionality. This presentation is provided
without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a
particular purpose, or non-infringement. This presentation is for informational purposes and may not be incorporated into a contract. SAP
assumes no responsibility for errors or omissions in this presentation, except if such damages were caused by SAP’s intentional or gross
negligence.
All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from
expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates,
and they should not be relied upon in making purchasing decisions.



Agenda

SAP HANA Machine Learning
• Overview
• Introduction to Predictive Analysis Library and Python client API for HANA ML

Data Science to ML scenario operation
• Workflow overview

Hands-on exercise scenario
• Overview and guidance


SAP HANA Machine Learning Overview
SAP HANA machine learning
In-database and external machine learning capabilities

Richest multimodal in-database applications
• Trending ML algorithms for HANA embedded use and processing with in-memory performance
• Interfaces for Data Scientists in R and Python
• Combine and enrich spatial, text analysis, graph processing with machine learning


SAP HANA machine learning – typical scenarios addressed
Enabling data scientists to build in-database Machine Learning scenarios

Typical example scenarios
• Predicting customer behavior like churn, fraud or buying behavior (classification)
• Forecasting future sales, demand, cost, etc. based on historic time related data (time series forecasting)
• Predicting car prices, based on model characteristics and market trends (regression)
• Analyzing shopping baskets to suggest product placements or additional purchases to a customer (association analysis)
• Enabling marketers to develop targeted marketing programs by grouping customers (clustering)
• Detecting anomalies in financial transactions for fraud analysis, or in machine sensor data for predictive maintenance (outlier detection)
• Provide personalized product recommendations by analyzing product associations, individual purchase history and external factors (recommender system)
• In a given social network, you seek to infer which new interactions among its members are likely to occur in the near future (link analysis / prediction)


SAP HANA machine learning – Predictive Analysis Library (PAL)
Native in-database Machine Learning

Predictive Analysis Library (PAL)

[Diagram: SAP HANA Platform containing the Predictive Analysis Library (PAL) with Classification, Regression, Time series forecasting, Cluster analysis, Association analysis, Outlier detection, Recommender System, Link prediction]

• Machine learning algorithm library, designed and optimized for massively parallel in-memory processing
  – Addresses key scenarios like Classification, Regression or Time Series Forecasting (and more)
  – More than 90 classic and trending ensemble algorithms
  – High-performance parallel mass prediction, real-time transactional speed prediction
  – Segmented modeling, like segmented forecasting
  – Automated cross validation and hyperparameter selection
• Easy to develop and simple to embed with applications
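To make "automated cross validation and hyperparameter selection" concrete, here is a minimal pure-Python sketch of the idea: score each candidate hyperparameter value with k-fold cross validation and keep the best one. It does not use PAL; the 1-D nearest-neighbour classifier, the toy data and every function name are assumptions chosen only for illustration.

```python
from statistics import mean

def knn_predict(train, x, k):
    """Majority label among the k training points closest to x (1-D feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(sorted(set(labels)), key=labels.count)

def cross_val_accuracy(data, k, folds):
    """Mean accuracy over `folds` interleaved hold-out folds."""
    scores = []
    for f in range(folds):
        test = data[f::folds]                                   # every folds-th row
        train = [row for i, row in enumerate(data) if i % folds != f]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return mean(scores)

def select_k(data, grid, folds=3):
    """Grid search: return the k with the best cross-validated accuracy."""
    results = {k: cross_val_accuracy(data, k, folds) for k in grid}
    best = max(grid, key=lambda k: results[k])
    return best, results

# Hypothetical toy data: (age, label)
DATA = [(18, "Retained"), (22, "Retained"), (25, "Retained"), (32, "Retained"),
        (35, "Retained"), (38, "Retained"), (42, "Churned"), (45, "Churned"),
        (51, "Churned"), (60, "Churned"), (66, "Churned"), (72, "Churned")]

best_k, cv_results = select_k(DATA, grid=[1, 3, 5])
print(best_k, cv_results)
```

The same loop generalizes to any model and parameter grid; an in-database implementation would run the folds next to the data instead of in the client.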



SAP HANA Predictive Analysis Library (PAL) – functional overview
Algorithm overview by category
Classification Analysis
• CART, C4.5 and CHAID Decision Tree Analysis
• K Nearest Neighbor
• Logistic Regression Elastic Net
• Back-Propagation (Neural Network)
• Naïve Bayes
• Support Vector Machine
• Random Decision Trees
• Hybrid Gradient Boosting Tree (HGBT)⁴, Gradient Boosting Decision Tree (GBDT)*
• Linear Discriminant Analysis (LDA)*
• Confusion Matrix, Area Under Curve
• Conditional Random Field⁴

Regression
• Multiple Linear Regression Elastic Net
• Polynomial, Exponential, Bi-Variate Geometric, Bi-Variate Logarithmic Regression
• Generalized Linear Model (GLM)*
• Cox Proportional Hazards Model*
• Random Decision Trees
• Hybrid Gradient Boosting Tree (HGBT)⁴, Gradient Boosting Decision Tree (GBDT)*

Cluster Analysis
• DBSCAN, K-Means/Accelerated K-Means**, K-Medoid Clustering, K-Medians, GEO DBSCAN⁴
• Kohonen Self-Organizing Maps
• Agglomerative Hierarchical
• Affinity Propagation
• Latent Dirichlet Allocation (LDA)
• Gaussian Mixture Model (GMM)
• Cluster Assignment

Time Series Analysis
• Single/Double/Brown/Triple Exponential Smoothing
• Forecast Smoothing
• Auto-ARIMA/Seasonal ARIMA
• Croston Method
• Forecast Accuracy Measure
• Linear Regression with Damped Trend and Seasonal Adjust
• Test for White Noise, Trend, Seasonality
• Fast Fourier Transform (FFT)*
• Hierarchical Forecasting****
• Change Point Detection⁴

Association Analysis
• Apriori, Apriori Lite
• FP-Growth
• KORD – Top K Rule Discovery
• Sequential Pattern Mining*

Probability Distribution
• Distribution Fit / Weibull analysis
• Cumulative Distribution Function
• Quantile Function
• Kaplan-Meier Survival Analysis

Outlier Detection
• Inter-Quartile Range Test (Tukey’s Test)
• Variance Test
• Anomaly Detection
• Grubbs Outlier Test

Recommender Systems
• Factorized Polynomial Regression Models**
• Alternating least squares****
• Field-aware Factorization Machines (FFM)****

Link Prediction
• Common Neighbors, Jaccard’s Coefficient, Adamic/Adar, Katz β
• PageRank****

Statistical Functions
• Mean, Median, Variance, Standard Deviation, Kurtosis, Skewness
• Weighted Scores Table, ABC Analysis
• Covariance Matrix
• Pearson Correlations Matrix
• Chi-squared Tests: Quality of Fit, Test of Independence
• F-test (variance equal test)
• Data Summary*
• Correlation Function*
• ANOVA**, One-sample Median Test**, T Test**, Wilcoxon Signed Rank Test**
• Kernel Density Estimation⁴
• Entropy⁴

Data Preparation
• Sampling, Binning, Scaling, Partitioning, Discretize⁴
• Substitute Missing Values, Missing Value Handling⁴
• Principal Component Analysis (PCA)/PCA Projection
• TSNE⁴
• Factor Analysis***
• Multi dimensional scaling***

* New HANA 2 SPS 00 | ** New HANA 2 SPS 01 | *** New HANA 2 SPS 02 | **** New HANA 2 SPS 03 | ⁴ New HANA 2 SPS 04

SAP HANA machine learning – Predictive Analysis Library (PAL)
Native in-database machine learning

Predictive Analysis Library – Key capabilities

• Addresses all key scenarios like Classification, Regression or Time Series Forecasting (and more)
  – All major machine learning scenarios on structured data can be addressed within the database
  – Algorithms fast and optimized for in-database execution
• More than 90 classic and trending algorithms
  – Random decision trees and gradient boosting decision trees outperform in most classification and regression use cases
• High-performance parallel mass prediction, real-time transactional speed prediction
  – Multi-node fastest big data predictions as well as real-time transactional <50ms speed prediction
• Automated cross validation, hyperparameter selection for key algorithms
  – Model development support and automation, higher productivity and faster results with best possible and stable models
• Easy to develop and simple to embed within applications
  – Supports both expert data scientists and developer personas
  – Simple SQL interface and Python and R client APIs
• Segmented ML model development and prediction
  – Supported with all PAL algorithms and scenarios
  – Like segmented time series forecasting (forecast segmented by store, product, etc.)
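Segmented modeling means fitting one independent model per segment key (for example per store) rather than one global model. The sketch below illustrates that idea in plain Python with single exponential smoothing per segment; it is not PAL code, and the data, the smoothing factor and the function names are hypothetical.

```python
from collections import defaultdict

def single_exp_smoothing(values, alpha):
    """Return the one-step-ahead forecast after smoothing `values`."""
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return level

def segmented_forecast(rows, alpha=0.5):
    """Fit one model per segment and return {segment: forecast}.

    `rows` holds (segment, period, value) tuples in any order.
    """
    series = defaultdict(list)
    for segment, _period, value in sorted(rows, key=lambda r: (r[0], r[1])):
        series[segment].append(value)
    return {seg: single_exp_smoothing(vals, alpha) for seg, vals in series.items()}

# Hypothetical sales history for two stores
ROWS = [("store_A", 1, 10.0), ("store_A", 2, 12.0), ("store_A", 3, 14.0),
        ("store_B", 1, 100.0), ("store_B", 2, 90.0), ("store_B", 3, 80.0)]

forecasts = segmented_forecast(ROWS)
print(forecasts)  # {'store_A': 12.5, 'store_B': 87.5}
```

Because the segments are independent, such a job parallelizes naturally, which is the property an in-database engine can exploit.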
SAP HANA machine learning – Automated Predictive Library (APL)
Native in-database automated predictive analytics

• SAP HANA embeds the Automated Predictive Library*
  – Addresses key scenarios like automated Classification, Regression or Time Series Forecasting (and more)
  – Automation is based on concepts of “Structural Risk Minimization” and covers analysis steps of automated variable selection, data preparation, variable encoding, missing value handling, outlier handling, binning and banding, model testing and best model selection
• Automation is the key to broad and fast adoption
  – Quick and easy to leverage for non-expert Data Scientists and to consume in applications built on HANA
  – The APL provides simple procedure functions for developers to Create, Train, Apply, Deploy and Query predictive models

[Diagram: SAP HANA Platform containing the Automated Predictive Library (APL) with Classification, Regression, Time series forecasting, Cluster analysis, Association analysis, Recommendation, Link analysis]

* https://blogs.sap.com/2019/04/23/automate-machine-learning-with-apl-now-part-of-sap-hana-sps04/
SAP HANA client APIs and external machine learning integration
Leverage open source machine learning with SAP HANA

Client-side APIs for HANA Machine Learning
• Native Python driver for SAP HANA
• Python and R APIs for HANA Machine Learning
  – Exposing HANA data via dataframes
  – Surfacing in-database algorithms for Data Scientists in their expert environment

External Machine Learning integration covers
• R Integration with SAP HANA
  – Connect and interoperate with the SAP HANA database from R Studio
  – R script-code to be processed as part of the overall query execution plan from SAP HANA
• TensorFlow Integration with SAP HANA
  – Easily extend deep learning from SAP HANA
  – Retain the familiar database development environment

[Diagram: Python Notebook exchanging ODBC data with the SAP HANA Platform; External Machine Learning Integration with R Integration (data + R-script sent to an R-Serve Server for R-Processing, result returned) and TensorFlow Integration (EML call to Active Model(s), prediction result returned)]


R and Python client APIs for SAP HANA machine learning

R and Python packages wrapping in-database machine learning functions


• Enabling Data Scientists to
  – Easily leverage data in SAP HANA and in-memory performance
  – Script in Python or R while processing data remotely in SAP HANA
  – Leverage SAP HANA ML algorithm libraries as easily as any other R or Python packages

[Diagram: Data Scientist using Python: Python Program with DataFrame and ML API over hdbcli for Python to HANA. Data Scientist using R: R Program with DataFrame and ML API over RODBC (HDBODBC) to HANA]

SAP HANA Core ML API
• Python / R wrapper for PAL / APL functions
• Typical Python / R ML-like interface
• Processes data based on dataframes

SAP HANA DataFrame
• Stores only references to data in SAP HANA
• No data transferred to the Python/R process except when explicitly requested
• Hides SQL statements
• Remote, in-database computation in SAP HANA
• Useful for data analysis and exploration
• Used as implicit input structure to the Core ML API
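The DataFrame behaviour listed above (hold a reference, hide the SQL, compute in the database, move rows only on request) can be illustrated with a small toy class. This is a sketch under stated assumptions, not the hana_ml implementation: `LazyFrame` is a hypothetical name, and the standard-library `sqlite3` module stands in for SAP HANA so the example runs anywhere.

```python
import sqlite3

class LazyFrame:
    """Toy stand-in for a database-backed dataframe.

    It stores a SELECT statement, not rows. This is NOT the hana_ml API.
    """

    def __init__(self, conn, select_statement):
        self.conn = conn
        self.select_statement = select_statement

    def filter(self, condition):
        # Builds a new statement; nothing is executed yet.
        # (Toy code: the condition is trusted, no SQL-injection protection.)
        sql = f"SELECT * FROM ({self.select_statement}) AS t WHERE {condition}"
        return LazyFrame(self.conn, sql)

    def count(self):
        # The aggregation runs in the database; only one number comes back.
        sql = f"SELECT COUNT(*) FROM ({self.select_statement}) AS t"
        return self.conn.execute(sql).fetchone()[0]

    def collect(self):
        # The only method that transfers rows to the Python process.
        return self.conn.execute(self.select_statement).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Mike", "Miami", 42), ("Jerry", "New York", 32), ("Patricia", "Miami", 45)],
)

customers = LazyFrame(conn, "SELECT * FROM customers")
miami = customers.filter("city = 'Miami'")
print(miami.select_statement)   # the SQL hidden behind the object
print(miami.count())            # computed in the database
print(miami.collect())          # explicit transfer of rows
```

Chaining `filter` calls only nests SELECT statements, which is why such an object can be passed around cheaply and used as the input structure for in-database algorithms.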



Python client API for SAP HANA machine learning
[Diagram: Data Scientist using Python; Python API for HANA ML on top of hdbcli (HANA client for Python)]

Key capabilities
• Allows scripting in Python while instructing remote processing of data in SAP HANA, leveraging HANA in-database embedded ML capabilities
• The HANA dataframe object serves as a virtual data reference in Python for data preprocessing, transformation and analysis
• Exploratory data analysis (EDA) visualization capabilities
• Large set of Predictive Analysis Library (PAL) functions for the expert Data Scientist, allowing simple conversion of Python-native ML scenarios to HANA embedded operationalization
• Automated Predictive Library (APL) functions exposing AutoML and non-expert predictive functions in Python
• Generally available release version 1.0.5 with the HANA client of SAP HANA 2 SPS04. Runs with SAP HANA 2 SPS03 or higher.
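As a rough sketch of how a session with the Python client might start, the function below opens a connection, creates a HANA DataFrame for a table and pulls a few rows to the client. It assumes the `hana_ml` package is installed and a HANA system is reachable; the function name `preview_table` and all argument values are placeholders, and details may differ between package versions, so check the documentation of the release you use.

```python
def preview_table(address, port, user, password, table, schema=None, rows=5):
    """Return the first `rows` rows of a HANA table as a pandas DataFrame.

    Needs the hana_ml package and a reachable SAP HANA system; every
    connection value is supplied by the caller (placeholders, not defaults).
    """
    # Imported lazily so this module can be loaded without hana_ml installed.
    from hana_ml import dataframe

    conn = dataframe.ConnectionContext(
        address=address, port=port, user=user, password=password
    )
    try:
        hana_df = conn.table(table, schema=schema)  # reference only, no rows moved
        print(hana_df.select_statement)             # SQL the DataFrame stands for
        return hana_df.head(rows).collect()         # explicit transfer to the client
    finally:
        conn.close()
```

A call would look like `preview_table("hana.example.com", 30015, "ML_USER", "secret", "CUSTOMER_CHURN")`, where host, port, credentials and table name are all hypothetical.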



Data Science to ML scenario operation
Data Science to machine learning scenario operation
End to end design to operation workflow of a SAP HANA machine learning scenario

Roles
• Data Scientist using Python: explores, experiments with and optimizes an ML scenario
• Developer / Integration Architect: technical integration
• Application User / Machine Learning operation responsible:
  – Train model version
  – Set active
  – Apply model / predict
  – Monitor model performance
  – Retrain model version
  – Validate candidate model, etc.

[Diagram: workflow across the three roles around SAP HANA, with the steps Ad-hoc SQL generation, HANA design-time artefact generation, Deployment / Transport of the ML scenario project, Use and operation of ML scenario]
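The operation steps named above (train a model version, set it active, validate a candidate, retrain) amount to a small version lifecycle. The following is a minimal in-memory sketch of that lifecycle; the class names, the single accuracy metric and the promotion rule are assumptions for illustration, whereas a real scenario would keep this state in HANA tables.

```python
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    version: int
    accuracy: float        # validation metric recorded at training time
    active: bool = False

@dataclass
class ModelRegistry:
    versions: list = field(default_factory=list)

    def train(self, accuracy):
        """Register a newly trained candidate (the training itself is out of scope)."""
        candidate = ModelVersion(version=len(self.versions) + 1, accuracy=accuracy)
        self.versions.append(candidate)
        return candidate

    def active(self):
        """Return the version currently used for predictions, or None."""
        return next((v for v in self.versions if v.active), None)

    def set_active(self, version):
        """Make exactly one version the active one."""
        for v in self.versions:
            v.active = (v.version == version)

    def promote_if_better(self, candidate):
        """Validate a candidate against the active version; switch if it wins."""
        current = self.active()
        if current is None or candidate.accuracy > current.accuracy:
            self.set_active(candidate.version)
            return True
        return False

registry = ModelRegistry()
v1 = registry.train(accuracy=0.81)
registry.promote_if_better(v1)   # first version becomes active
v2 = registry.train(accuracy=0.78)
registry.promote_if_better(v2)   # worse candidate, v1 stays active
v3 = registry.train(accuracy=0.86)
registry.promote_if_better(v3)   # better candidate replaces v1
print(registry.active())
```

Keeping old versions in the registry rather than overwriting them is what makes monitoring and rollback possible.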



SAP HANA application project – machine learning scenario
Components and structure

ML scenario project structure
• Datasources, security and configuration objects, e.g. synonyms
• Scenario local- and consumption-level data structures, e.g. model table, debriefing statistics, ...
• Base-level artefacts
  – Implement the core elements of the machine learning scenario
• Consumption-level artefacts
  – Bind the core artefacts into the actual application, bind the actual data, etc.
  – May have implementations on multiple application layers, e.g. HANA native objects, ABAP AMDPs, SAP Data Intelligence pipelines, ...



Exercise scenario and hands-on details
Exercise scenario
Scenario - train a PAL classification model to predict customer churn

Learn a classification model to predict the target column values. When we train the model, the outcome is known.

Explanatory variables: Name, City, Age
Target: ContractActivityLabel

Name      City      Age  ContractActivityLabel
Mike      Miami     42   Churned
Jerry     New York  32   Retained
Bryan     Orlando   18   Retained
Patricia  Miami     45   Churned
Elodie    Phoenix   35   Retained
Remy      Chicago   72   Churned

The rows are divided into a training sub-set and a validation sub-set. The result of training is the predictive model.
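The training and validation idea on this slide can be sketched in a few lines of plain Python. The example below is a deliberately tiny stand-in for a PAL classifier: it learns a single age threshold (a decision stump) on the training rows and checks it on the held-out rows. The 4/2 split, the choice of Age as the only feature and the rule "older means Churned" are assumptions made for illustration.

```python
ROWS = [
    ("Mike", "Miami", 42, "Churned"),
    ("Jerry", "New York", 32, "Retained"),
    ("Bryan", "Orlando", 18, "Retained"),
    ("Patricia", "Miami", 45, "Churned"),
    ("Elodie", "Phoenix", 35, "Retained"),
    ("Remy", "Chicago", 72, "Churned"),
]
# Assumed split: first four rows for training, last two for validation.
train, validation = ROWS[:4], ROWS[4:]

def predict(threshold, age):
    """One-rule model: customers at or above the threshold age churn."""
    return "Churned" if age >= threshold else "Retained"

def accuracy(threshold, rows):
    """Share of rows whose known label matches the prediction."""
    return sum(predict(threshold, r[2]) == r[3] for r in rows) / len(rows)

def fit_age_stump(rows):
    """Try the midpoints between neighbouring ages; keep the most accurate."""
    ages = sorted({r[2] for r in rows})
    candidates = [(a + b) / 2 for a, b in zip(ages, ages[1:])]
    return max(candidates, key=lambda t: accuracy(t, rows))

threshold = fit_age_stump(train)
val_acc = accuracy(threshold, validation)
print(threshold, val_acc)  # 37.0 1.0
```

The validation rows play no part in choosing the threshold, so their accuracy is an honest estimate of how the model behaves on unseen customers.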
Exercise scenario
Scenario - train a PAL classification model to predict customer churn

New data (label unknown)

Name      City      Age  ContractActivityLabel
Marine    Miami     45   ?
Julien    Miami     52   ?
Fred      Orlando   20   ?
Michelle  Boston    34   ?
Nicolas   Phoenix   90   ?
Marine    Miami     45   ?

Predict with model: “apply” the model to new data to calculate the “predicted classification” and its “probability” for each customer.

Predicted data

Name      City      Age  Predicted_Label  Probability
Marine    Miami     45   Churned          0.83
Julien    Miami     52   Churned          0.62
Fred      Orlando   20   Retained         0.65
Michelle  Boston    34   Retained         0.94
Nicolas   Phoenix   90   Retained         0.87
Marine    Miami     45   Retained         0.43
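The "apply" step produces two values per row: a predicted label and the probability attached to that label. A minimal sketch of one way to derive both from a binary classifier follows. The logistic model and its two coefficients are hypothetical, so the numbers it prints will not match the slide's table.

```python
import math

# Hypothetical coefficients for illustration only; a real model learns them.
INTERCEPT, AGE_COEF = -4.0, 0.1

def churn_probability(age):
    """Logistic score: estimated P(Churned | age)."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + AGE_COEF * age)))

def apply_model(rows):
    """Return (name, predicted_label, probability_of_that_label) per row."""
    scored = []
    for name, _city, age in rows:
        p = churn_probability(age)
        label = "Churned" if p >= 0.5 else "Retained"
        # Report the probability of the predicted class, not always P(Churned).
        scored.append((name, label, round(max(p, 1.0 - p), 2)))
    return scored

NEW_CUSTOMERS = [("Julien", "Miami", 52), ("Fred", "Orlando", 20), ("Michelle", "Boston", 34)]
predictions = apply_model(NEW_CUSTOMERS)
for row in predictions:
    print(row)
```

Reporting the probability of the predicted class is one common convention; other tools report the probability of a fixed positive class instead, so it is worth checking which one a given output column uses.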
Exercises tasks and timing overview



How to navigate in the instruction

• Select a chapter by clicking on it
• Select an exercise by clicking on it
• Navigate to the next instruction step by clicking on the arrow

After the last instruction step of an exercise
• Select Next Exercise
• Or navigate back to the chapter/exercise selection by clicking on the SAP TECHED icon



Wrap up
Continue your SAP TechEd 2019 Learning Experience
Join the digital SAP TechEd Learning Room 2019 in SAP Learning Hub

• Access SAP TechEd Learning Journeys
• Discover related learning content
• Watch webinars of SAP TechEd lectures
• Learn about SAP’s latest innovations with openSAP
• Collaborate with SAP experts
• Self-test your knowledge
• Earn a SAP TechEd knowledge badge


Engage with the SAP TechEd Community
Access replays and continue your SAP TechEd discussion after the event
within the SAP Community

Access replays
• Keynotes
• Live interviews
• Select lecture sessions
http://sapteched.com/online

Continue the conversation
• Read and reply to blog posts
• Ask questions
• Join discussions
sap.com/community

Check out the latest blogs
• See all SAP TechEd blog posts
• Learn from peers and experts
SAP TechEd blog posts


More information

Related SAP TechEd Learning Journeys
• DAT1 - Build your data-driven intelligent enterprise
• DAT4 - Develop Next Generation Cloud Native Applications with SAP HANA
• AIN2 - Transform your business processes with intelligent technologies
• AIN5 - Unleash your data's potential with smart analytics

Related SAP TechEd sessions
• DAT206 – Hybrid Deployment Agility with SAP HANA Cloud
• DAT260 – SAP Data Intelligence: Machine Learning Push-Down to SAP HANA with Python
• CAA387 – Powering the Intelligent Enterprise with Machine Learning
• AIN365 – Deep-Dive Hands-On into SAP S/4HANA–Based Machine Learning Scenarios
• DAT365 - End-to-End Application Development for SAP HANA

Public SAP Web sites
• SAP Community: www.sap.com/community ./topics/hana ./topics/machine-learning
• SAP Developers: https://developers.sap.com/topics/machine-learning.html#tutorials
• SAP products: www.sap.com/products/hana.html www.sap.com/products/hana/features/advanced-analytics.html https://saphanacloudservices.com/
• SAP Samples: https://github.com/SAP-samples/hana-ml-samples

Thanks for attending this session.

Feedback
Please complete your session evaluation for DAT312.

Contact for further topic inquiries
Christoph Morgen
SAP HANA Product Management
Christoph.Morgen@sap.com

Follow us
www.sap.com/contactsap

© 2019 SAP SE or an SAP affiliate company. All rights reserved.


No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of
SAP SE or an SAP affiliate company.
The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its
distributors contain proprietary software components of other software vendors. National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or
warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials.
The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty
statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional
warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or
any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation,
and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platforms, directions, and
functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason
without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or
functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ
materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, and they
should not be relied upon in making purchasing decisions.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered
trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names
mentioned are the trademarks of their respective companies.
See www.sap.com/copyright for additional trademark information and notices.
