intel-bluedata-ai-ml-webinar-62818-final_435477

AI and Machine Learning
Enterprise Use Cases and Challenges

Today’s Speakers
Radhika Rangarajan, Director, Data Analytics and AI Solutions, Intel
Nanda Vijaydev, Sr. Director, Solutions Management, BlueData Software
Agenda
• AI: The Path to Deeper Insight

• Example AI Use Cases and Case Studies with ML / DL
• Challenges in Building a Distributed ML / DL Pipeline
• Running Distributed ML / DL on Docker Containers
• Q & A (enter questions at any time in your BrightTALK web client)
AI and Machine Learning:
Enterprise Use Cases
Radhika Rangarajan
The path to deeper insight
• Cognitive Analytics (Self-Learning): How do I proceed?
• Prescriptive Analytics (Forecast): How should I proceed?
• Predictive Analytics (Foresight): What will happen, when, and why?
• Diagnostic Analytics (Insight): What happened and why?
• Descriptive Analytics (Hindsight): What happened?
AI is the driving force
AI will transform
• Consumer: Smart Assistants, Chatbots, Search, Personalization, Augmented Reality, Robots
• Health: Enhanced Diagnostics, Drug Discovery, Patient Care, Research, Sensory Aids
• Finance: Algorithmic Trading, Fraud Detection, Research, Personal Finance, Risk Mitigation
• Retail: Support, Experience, Marketing, Merchandising, Loyalty, Supply Chain, Security
• Government: Defense, Data Insights, Safety & Security, Resident Engagement, Smarter Cities
• Energy: Oil & Gas Exploration, Smart Grid, Operational Improvement, Conservation
• Transport: In-Vehicle Experience, Automated Driving, Aerospace, Shipping, Search & Rescue
• Industrial: Factory Automation, Predictive Maintenance, Precision Agriculture, Field Automation
• Other: Advertising, Education, Gaming, Professional & IT Services, Telco/Media, Sports
Source: Intel forecast

Smarter AI Through the Industry’s Most Comprehensive Platform
• Data: Intel analytics ecosystem to get your data ready from integration to analysis
• Tools: Portfolio of software tools to accelerate time-to-solution
• Hardware: Multi-purpose to purpose-built AI compute from cloud to device
• Community: Partner ecosystem to facilitate AI in finance, health, retail, industrial & more
• Future: Driving AI forward through R&D, investment and policy leadership
TOOLS: Portfolio of software tools to accelerate time-to-solution

TOOLKITS (for Application Developers)
Deep learning (inference)
• OpenVINO™ †: Open Visual Inference & Neural Network Optimization toolkit for inference deployment on CPU/GPU/FPGA for TensorFlow*, Caffe* & MXNet*
• Intel® Movidius™ SDK: Optimized inference deployment on Intel VPUs for TensorFlow* & Caffe*
Reasoning
• Intel® Saffron™ AI: Cognitive solutions on CPU for anti-money laundering, predictive maintenance, more
Deep learning
• Intel® Deep Learning Studio ‡ (coming soon): Open-source tool to compress deep learning development cycle

LIBRARIES (for Data Scientists)
Machine learning libraries
• Python: Scikit-learn, Pandas, NumPy
• R: Cart, Random Forest, e1071
• Distributed: MLlib (on Spark), Mahout
Deep learning frameworks
• Now optimized for CPU: TensorFlow*, MXNet*, Caffe*, BigDL (Spark)*
• Optimizations in progress (coming soon): Caffe2*, PyTorch*, CNTK*, PaddlePaddle*

FOUNDATION (for Library Developers)
Analytics, machine & deep learning primitives
• Python: Intel distribution optimized for machine learning
• DAAL: Intel® Data Analytics Acceleration Library (incl machine learning)
• MKL-DNN, clDNN: Open-source deep neural network functions for CPU / integrated graphics
Deep learning graph compiler
• Intel® nGraph™ Compiler (Alpha): Open-sourced compiler for deep learning model computations optimized for multiple devices from multiple frameworks

† Formerly the Intel® Computer Vision SDK
*Other names and brands may be claimed as the property of others.
Developer personas shown above represent the primary user base for each row, but are not mutually exclusive.
All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.
Deep learning frameworks
Many popular DL frameworks are now optimized for CPU [framework logos]
See installation guides at ai.intel.com/framework-optimizations/
More under optimization [framework logos], and more…
SEE ALSO: Machine Learning Libraries for Python (Scikit-learn, Pandas, NumPy), R (Cart, randomForest, e1071), Distributed (MLlib on Spark, Mahout)
*Limited availability today
Other names and brands may be claimed as the property of others.
Who is building what….across the Industry
• Consumer: Call center routing, image similarity search, smart job search
• Health: Analysis of 3D MRI models for knee degradation
• Finance: Fraud detection, recommendation, customer/merchant propensity
• Retail: Image feature extraction
• Manufacturing: Steel surface defect detection
• Scientific computing: Weather forecasting
And other emerging usages…

Result: >90%
Top model accuracy for training and test data showed >90% for no lesions, limited data sets for mild/severe lesions.
Client: Center for Digital Health Innovation (CDHI) at UCSF, leveraging new digital health technologies to transform healthcare.
Challenge: Projected by 2040: 78M adults with doctor-diagnosed OA & 35M with arthritis-attributable activity limitations. Need automated system that classifies menisci based on presence/absence of lesions, provides immediate objective results at MRI scan, & eliminates intra-user variability.
Solution: Apache Spark* with BigDL on Cloudera (CDH 5.9*), on Intel® Xeon® servers from Dell*. With 3D image convolution in BigDL, the CDHI team built an MRI classification system & deployed it on their CDH Dell cluster.
https://cdn.oreillystatic.com/en/assets/1/event/269/Automatic%203D%20MRI%20knee%20damage%20classification%20with%203D%20CNN%20using%20BigDL%20on%20Spark%20Presentation.pdf
Intel does not control or audit third-party benchmark data or the web sites referenced in this document.
You should visit the referenced web site and confirm whether referenced data are accurate.
Result: 60% increase
In coverage rate and 20% accuracy rate, better than the traditional rule-based approach.
Client: China UnionPay*, which specializes in banking services and payment systems. It is the 3rd largest payment network in the world.
Challenge: Detect fraudulent credit card transactions with more coverage and accuracy.
Solution: Using Cloudera, Apache Spark* with BigDL, running on Intel® Xeon® and 5th Gen Intel® Core™ Processors for credit card fraud detection. Historical data is stored on Apache Hive*. Data preprocessing done with Apache Spark SQL*.
https://www.intel.com/content/www/us/en/financial-services-it/union-pay-case-study.html
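The slide reports gains in "coverage rate" and "accuracy rate" without defining them. The sketch below assumes coverage corresponds to recall (the share of fraudulent transactions that get flagged) and accuracy to precision (the share of flagged transactions that are truly fraudulent). Both the mapping and the counts are assumptions made for illustration; none of the numbers are UnionPay data.

```python
def coverage_rate(true_pos: int, false_neg: int) -> float:
    """Recall: fraction of all fraudulent transactions that were flagged."""
    total_fraud = true_pos + false_neg
    return true_pos / total_fraud if total_fraud else 0.0


def accuracy_rate(true_pos: int, false_pos: int) -> float:
    """Precision: fraction of flagged transactions that were really fraud."""
    total_flagged = true_pos + false_pos
    return true_pos / total_flagged if total_flagged else 0.0


def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`; 0.6 means +60%."""
    return (new - old) / old


# Hypothetical counts on the same 150 fraudulent test transactions.
rule_based = {"true_pos": 75, "false_pos": 75, "false_neg": 75}
neural_net = {"true_pos": 120, "false_pos": 80, "false_neg": 30}

coverage_gain = relative_gain(
    coverage_rate(neural_net["true_pos"], neural_net["false_neg"]),
    coverage_rate(rule_based["true_pos"], rule_based["false_neg"]),
)
accuracy_gain = relative_gain(
    accuracy_rate(neural_net["true_pos"], neural_net["false_pos"]),
    accuracy_rate(rule_based["true_pos"], rule_based["false_pos"]),
)

print(f"coverage gain: {coverage_gain:.0%}, accuracy gain: {accuracy_gain:.0%}")
```

With these made-up counts the two gains come out at 60% and 20%, which shows how relative improvements of that size could arise under the assumed definitions.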
Result: 4X gain
In performance with Intel® Xeon® CPU cluster, per JD.com. Processing ~380M images with Intel Xeon CPU E5-2650 v4 @ 2.20GHz with 1200 YARN cores.
Client: JD.Com*, second largest online retailer in China, with approximately 25 million registered users.
Challenge: Building deep learning applications such as image similarity search on GPU cluster was costly & complex. Technical issues included high latency when downloading graphic data from Apache HBase* & complicated data pre-processing in GPU environment.
Solution: Switched from GPU to CPU cluster. Using Apache Spark* with BigDL, running on Intel® Xeon® processors. Intel delivered an image detection & extraction pipeline. BigDL used to build deep learning models for image recognition & feature extraction.
https://software.intel.com/en-us/articles/building-large-scale-image-feature-extraction-with-bigdl-at-jdcom
Result: 20x faster time-to-train
For processing a 10k image dataset, reducing the training time from 11 hours to 31 minutes with over 99% accuracy.
Collaborator: Novartis International AG, based in Switzerland, and one of the largest pharmaceutical companies in the world.
Challenge: High content screening of cellular phenotypes is a fundamental tool supporting early stage drug discovery. While analyzing whole microscopy images would be desirable, these images are more than 26x larger than images found typically in datasets such as ImageNet*. As a result, the high computational workload would be prohibitive in terms of deep neural network model training time.
Solution: Intel and Novartis teams were able to scale to more than 120 (3.9 megapixel) images per second with 32 TensorFlow* workers.
Configuration: A cluster consisting of eight Intel® Xeon® processor servers using an Intel® Omni-Path Fabric interconnect and TensorFlow* optimized for Intel architecture.
https://newsroom.intel.com/news/using-deep-neural-network-acceleration-image-analysis-drug-discovery/
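The figures quoted for this case study can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers on the slide; the variable names are illustrative, and "more than 120 images per second" is treated as a lower bound.

```python
# Figures quoted on the slide.
BASELINE_TRAIN_MINUTES = 11 * 60   # 11 hours
SCALED_TRAIN_MINUTES = 31          # 31 minutes
IMAGES_PER_SECOND = 120            # "more than 120", so a lower bound
MEGAPIXELS_PER_IMAGE = 3.9
WORKERS = 32


def speedup(before: float, after: float) -> float:
    """How many times faster `after` is than `before`."""
    return before / after


time_to_train_speedup = speedup(BASELINE_TRAIN_MINUTES, SCALED_TRAIN_MINUTES)
images_per_worker = IMAGES_PER_SECOND / WORKERS
megapixels_per_second = IMAGES_PER_SECOND * MEGAPIXELS_PER_IMAGE

print(f"time-to-train speedup: {time_to_train_speedup:.1f}x")
print(f"throughput per worker: at least {images_per_worker:.2f} images/s")
print(f"aggregate pixel rate:  at least {megapixels_per_second:.0f} MP/s")
```

The computed speedup is a little over 21x, consistent with the rounded "20x" headline.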
Result: $1 billion
Streamlined collection of US $1 billion in revenue by designing new APIs for car and license plate image recognition.
Client: SERPRO, Brazil’s largest government-owned IT services corporation, providing technology services to multiple public sector agencies.
Challenge: Across Brazil, 35,000 traffic enforcement cameras document 45 million violations every year, generating US $1 billion in revenue. Fully automating the complex, labor-intensive process for issuing tickets by integrating image recognition via AI could reduce costs and processing time.
Solution: Used deep learning techniques to optimize SERPRO’s code. With Brazilian student-partners, developed new algorithms, training and inference tests using Google TensorFlow* on Dell EMC PowerEdge R740*, running on Intel® Xeon® Scalable processor-based platforms.
http://www.decisionreport.com.br/destaque/serpro-desenvolve-produto-validador-cognitivo-de-infracoes/
BUILDING A DISTRIBUTED
ML / DL PIPELINE FOR AI
Distributed ML / DL – Requirements
• Access to common data, big and small

• Choice of modeling techniques
• Ability to build, share, and iterate
• Reproducibility
• Easy scaling to test on real data sets
• Support for different roles and actions
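The reproducibility requirement above can be made concrete with a small sketch. It assumes two common practices that the slides do not prescribe: drawing all randomness from an explicitly seeded generator, and storing a fingerprint of the run configuration next to the results. The function names and configuration keys are hypothetical.

```python
import hashlib
import json
import random
from typing import Dict, List, Tuple


def run_fingerprint(config: Dict) -> str:
    """Stable SHA-256 digest of a run configuration (key order does not matter)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seeded_split(n_rows: int, test_fraction: float, seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle row indices with a local seeded generator, then split train/test."""
    rng = random.Random(seed)          # local generator; global state is untouched
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = int(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


# Hypothetical run configuration.
config = {"model": "logistic_regression", "seed": 42, "test_fraction": 0.2, "n_rows": 100}

train_a, test_a = seeded_split(config["n_rows"], config["test_fraction"], config["seed"])
train_b, test_b = seeded_split(config["n_rows"], config["test_fraction"], config["seed"])

print("identical splits:", (train_a, test_a) == (train_b, test_b))
print("run fingerprint:", run_fingerprint(config)[:12])
```

Two runs with the same configuration produce the same split and the same fingerprint, so a collaborator can confirm they are rerunning the same experiment before comparing results.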
Distributed ML / DL – Challenges
• Scalability, repeatability, and reproducibility across environments (laptop, on-prem cluster, off-prem cluster)
• Deploying distributed platforms, libraries, applications, and versions
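One way to see why library versions make the move from laptop to cluster difficult is to compare package manifests across environments. The sketch below is a hypothetical helper, not part of any product mentioned here. The package names and versions are invented for illustration; in practice a manifest could be captured in each environment with `pip freeze` or `importlib.metadata`.

```python
from typing import Dict, Optional, Tuple


def version_drift(
    reference: Dict[str, str], candidate: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {package: (reference_version, candidate_version)} for every mismatch.

    A package that is missing on one side is reported with None on that side.
    """
    drift = {}
    for name in sorted(set(reference) | set(candidate)):
        ref_version = reference.get(name)
        cand_version = candidate.get(name)
        if ref_version != cand_version:
            drift[name] = (ref_version, cand_version)
    return drift


# Hypothetical manifests for two environments.
laptop = {"numpy": "1.14.5", "pandas": "0.23.1", "tensorflow": "1.8.0"}
cluster = {"numpy": "1.14.5", "pandas": "0.22.0", "pyspark": "2.3.1"}

for package, (on_laptop, on_cluster) in version_drift(laptop, cluster).items():
    print(f"{package}: laptop={on_laptop} cluster={on_cluster}")
```

Packaging the tested combination into a container image, as described later in the deck, removes this class of mismatch because every environment starts from the same image.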

ML / DL Stack: Tools + Infrastructure
• Role-based access control: Data Scientists, Developers, Data Engineers, Decision Makers
• Tools for distributed ML / DL pipeline: Pipeline Tools, BI Tools, ML / DL Tools (Intel optimized DL frameworks, other BI / DL / ML tools)
• Access to data and storage: Data Frameworks (Kafka, HDFS, HBase, Spark), Model Storage, Workflow Mgmt
Distributed ML / DL Workflow
Stages: Conceptualize, Model Build, Iterate, Deploy, Measure
• Sandbox / custom environments
• Multiple models
• Real-time / batch
• Access data
• Model storage
• Feedback loop
• Add libraries
• Save models
• Add features
• Publish API endpoints
• Rerun
Distributed ML / DL with BlueData
Users: Data Scientists, Developers, Data Engineers, Data Analysts
BlueData EPIC™ Software Platform
• Big Data Tools, ML / DL Tools, Data Science Tools, BI / Analytics Tools, Bring-Your-Own
• ElasticPlane™ – Self-service, multi-tenant clusters
• IOBoost™ – Extreme performance and scalability
• DataTap™ – In-place access to data on-prem or in the cloud
Infrastructure: Compute and Storage (NFS, HDFS), On-Premises or Public Cloud
App Store with Pre-Built Images
• Docker images for multiple applications and versions
• Ability to create and add new images, and save or restore tested combinations on demand
On-Demand Deployment
• On-demand creation of ML / DL environments with Jupyter, RStudio, and Zeppelin notebooks
• Environments can be provisioned with a Spark or Hadoop cluster if needed
Clusters Provisioned in Minutes
• Links to notebooks (with Active Directory authentication), integrated with multi-node Spark
• RStudio link for R users, or Jupyter link for Python users
Ready-to-Use Notebooks
• Notebooks with built-in local and distributed kernels for model building and scoring
Multiple Data Sources with ACL
• Access to HDFS data lake, with pass-through security from compute
• Save and restore models and data using shared NFS, mounted as local FS
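Because the shared NFS volume appears as a local filesystem, saving and restoring a model needs nothing beyond ordinary file I/O. The sketch below is a minimal illustration under stated assumptions: the "model" is a toy set of linear weights, JSON is used as the storage format, and a temporary directory stands in for the mount point. The function names and file layout are hypothetical and are not BlueData APIs.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def save_model(model: Dict, shared_dir: Path, name: str, version: int) -> Path:
    """Write model parameters to a temp file, then rename it into place,
    so a reader never opens a half-written model file."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    target = shared_dir / f"{name}-v{version}.json"
    fd, tmp_path = tempfile.mkstemp(dir=str(shared_dir), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(model, handle, sort_keys=True)
    os.replace(tmp_path, str(target))  # rename within the same directory
    return target


def restore_model(shared_dir: Path, name: str, version: int) -> Dict:
    """Load a previously saved model by name and version."""
    with open(shared_dir / f"{name}-v{version}.json", encoding="utf-8") as handle:
        return json.load(handle)


def score(model: Dict, features: List[float]) -> float:
    """Linear score: bias + dot(weights, features)."""
    return model["bias"] + sum(w * x for w, x in zip(model["weights"], features))


# A temporary directory stands in for the NFS mount point.
with tempfile.TemporaryDirectory() as mount_point:
    shared = Path(mount_point) / "models"
    saved_path = save_model({"bias": 0.5, "weights": [2.0, -1.0]}, shared, "demo", 1)
    restored = restore_model(shared, "demo", 1)
    prediction = score(restored, [3.0, 4.0])

print(saved_path.name, prediction)
```

Versioned file names let one notebook keep scoring with an older model while another saves a new one, which fits the build, iterate, and deploy cycle described earlier.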
BlueData AI / ML Accelerator
A new turnkey software + services solution targeted at AI use cases:
• Accelerated deployment of multi-node containerized sandbox
environments for ML / DL, using BlueData’s EPIC software platform
• Ready-to-run Docker application images for popular ML / DL tools
• Available either on-premises or in the public cloud (AWS, Azure, GCP)
• Packaged with professional services, training, and support
Faster time-to-value and reduced TCO with on-demand environments for distributed ML / DL
Key Takeaways: Distributed ML / DL
• Operationalizing distributed ML / DL is hard work

– Unique requirements for access to data, models, tools, etc.
– Need support for fast, iterative prototyping and reproducibility
– Requires ultimate flexibility as tools evolve and new options emerge
• Leverage a purpose-built solution (e.g. BlueData EPIC)
– Bring agility to building distributed ML / DL pipelines, powered by Docker
– Provide ability to share code, models, and data with secure multi-tenancy
– Enable self-service environments with on-demand provisioning
• Optimized for Intel Xeon architecture, on-prem or in the cloud
Intel® AI Academy
For developers, students, instructors and startups
• Learn: Get smarter using online tutorials, webinars, student kits and support forums
• Develop: Get 4 weeks of FREE access to the Intel® AI DevCloud or use your existing Intel® Xeon® Processor-based cluster
• Teach: Educate others using available course materials, hands-on labs, and more
• Share: Showcase your innovation at industry & academic events and online via the Intel AI community forum
software.intel.com/ai
Radhika Rangarajan Nanda Vijaydev
Thank You
For more information, visit:

www.bluedata.com
and
www.ai.intel.com
