
AI and Machine Learning

Enterprise Use Cases and Challenges


Today’s Speakers

Radhika Rangarajan
Director, Data Analytics and AI Solutions
Intel

Nanda Vijaydev
Sr. Director, Solutions Management
BlueData Software

Agenda

• AI: The Path to Deeper Insight


• Example AI Use Cases and Case Studies with ML / DL
• Challenges in Building a Distributed ML / DL Pipeline
• Running Distributed ML / DL on Docker Containers
• Q & A (enter questions at any time in your BrightTalk web client)
AI and Machine Learning:
Enterprise Use Cases

Radhika Rangarajan
The path to deeper insight
AI is the driving force

• Descriptive Analytics (Hindsight): What happened?
• Diagnostic Analytics (Insight): What happened and why?
• Predictive Analytics (Foresight): What will happen, when, and why?
• Prescriptive Analytics (Forecast): How should I proceed?
• Cognitive Analytics (Self-Learning): How do I proceed?
AI will transform

• Consumer: Smart Assistants, Chatbots, Search, Personalization, Augmented Reality, Robots
• Health: Enhanced Diagnostics, Drug Discovery, Patient Care, Research, Sensory Aids
• Finance: Algorithmic Trading, Fraud Detection, Research, Personal Finance, Risk Mitigation
• Retail: Support, Experience, Marketing, Merchandising, Loyalty, Supply Chain, Security
• Government: Defense, Data Insights, Safety & Security, Resident Engagement, Smarter Cities
• Energy: Oil & Gas Exploration, Smart Grid, Operational Improvement, Conservation
• Transport: In-Vehicle Experience, Automated Driving, Aerospace, Shipping, Search & Rescue
• Industrial: Factory Automation, Predictive Maintenance, Precision Agriculture, Field Automation
• Other: Advertising, Education, Gaming, Professional & IT Services, Telco/Media, Sports

Source: Intel forecast


Smarter AI Through the Industry’s Most Comprehensive Platform

• Data: Intel analytics ecosystem to get your data ready from integration to analysis
• Tools: Portfolio of software tools to accelerate time-to-solution
• Hardware: Multi-purpose to purpose-built AI compute from cloud to device
• Community: Partner ecosystem to facilitate AI in finance, health, retail, industrial & more
• Future: Driving AI forward through R&D, investment and policy leadership

Tools: Portfolio of software tools to accelerate time-to-solution

Toolkits (for application developers)
• Deep learning (inference): OpenVINO™, the Open Visual Inference & Neural Network Optimization toolkit for inference deployment on CPU/GPU/FPGA for TensorFlow*, Caffe* & MXNet*
• Deep learning (inference): Intel® Movidius™ SDK, optimized inference deployment on Intel VPUs for TensorFlow* & Caffe*
• Reasoning: Intel® Saffron™ AI, cognitive solutions on CPU for anti-money laundering, predictive maintenance, more
• Deep learning: Intel® Deep Learning Studio‡ (coming soon), an open-source tool to compress the deep learning development cycle

Libraries (for data scientists)
• Machine learning libraries: Python (Scikit-learn, Pandas, NumPy), R (Cart, Random Forest, e1071), Distributed (MLlib on Spark, Mahout)
• Deep learning frameworks, now optimized for CPU or with optimizations in progress (coming soon): TensorFlow*, MXNet*, Caffe*, BigDL (Spark)*, Caffe2*, PyTorch*, CNTK*, PaddlePaddle*

Foundation (for library developers)
• Analytics, machine & deep learning primitives: Python (Intel distribution optimized for machine learning), DAAL (Intel® Data Analytics Acceleration Library, incl machine learning), MKL-DNN and clDNN (open-source deep neural network functions for CPU / integrated graphics)
• Deep learning graph compiler: Intel® nGraph™ Compiler (Alpha), an open-sourced compiler for deep learning model computations optimized for multiple devices from multiple frameworks

Formerly the Intel® Computer Vision SDK
*Other names and brands may be claimed as the property of others.
Developer personas shown above represent the primary user base for each row, but are not mutually exclusive.
All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.
Deep learning frameworks
Many popular DL frameworks are now optimized for CPU
See installation guides at ai.intel.com/framework-optimizations/
More under optimization, and more…
SEE ALSO: Machine Learning Libraries for Python (Scikit-learn, Pandas, NumPy), R (Cart, randomForest, e1071), Distributed (MLlib on Spark, Mahout)
*Limited availability today
Who is building what… across the industry

• Consumer: Call center routing, image similarity search, smart job search
• Health: Analysis of 3D MRI models for knee degradation
• Finance: Fraud detection, recommendation, customer/merchant propensity
• Retail: Image feature extraction
• Manufacturing: Steel surface defect detection
• Scientific computing: Weather forecasting

And other emerging usages…


Result: >90%
Top model accuracy on training and test data exceeded 90% for the no-lesion class; data sets for mild and severe lesions were limited.

Client: Center for Digital Health Innovation (CDHI) at UCSF, leveraging new digital health technologies to transform healthcare.
Challenge: By 2040, a projected 78M adults will have doctor-diagnosed osteoarthritis (OA) and 35M will have arthritis-attributable activity limitations. Need an automated system that classifies menisci based on presence/absence of lesions, provides immediate objective results at MRI scan, & eliminates intra-user variability.
Solution: Apache Spark* with BigDL on Cloudera (CDH 5.9*), on Intel® Xeon® servers from Dell*. With 3D image convolution in BigDL, the CDHI team built an MRI classification system & deployed it on their CDH Dell cluster.

https://cdn.oreillystatic.com/en/assets/1/event/269/Automatic%203D%20MRI%20knee%20damage%20classification%20with%203D%20CNN%20using%20BigDL%20on%20Spark%20Presentation.pdf

Intel does not control or audit third-party benchmark data or the web sites referenced in this document.
You should visit the referenced web site and confirm whether referenced data are accurate.
Result: 60% increase
in coverage rate and 20% accuracy rate, better than the traditional rule-based approach.

Client: China UnionPay*, which specializes in banking services and payment systems. It is the 3rd largest payment network in the world.
Challenge: Detect fraudulent credit card transactions with more coverage and accuracy.
Solution: Using Cloudera, Apache Spark* with BigDL, running on Intel® Xeon® and 5th Gen Intel® Core™ Processors for credit card fraud detection. Historical data is stored on Apache Hive*. Data preprocessing done with Apache Spark SQL*.

https://www.intel.com/content/www/us/en/financial-services-it/union-pay-case-study.html
Result: 4x gain
in performance with Intel® Xeon® CPU cluster, per JD.com. Processing ~380M images with Intel Xeon CPU E5-2650 v4 @ 2.20GHz with 1200 YARN cores.

Client: JD.Com*, second largest online retailer in China, with approximately 25 million registered users.
Challenge: Building deep learning applications such as image similarity search on a GPU cluster was costly & complex. Technical issues included high latency when downloading graphic data from Apache HBase* & complicated data pre-processing in the GPU environment.
Solution: Switched from GPU to CPU cluster. Using Apache Spark* with BigDL, running on Intel® Xeon® processors. Intel delivered an image detection & extraction pipeline. BigDL was used to build deep learning models for image recognition & feature extraction.

https://software.intel.com/en-us/articles/building-large-scale-image-feature-extraction-with-bigdl-at-jdcom
Result: 20x faster time-to-train
for processing a 10k image dataset, reducing the training time from 11 hours to 31 minutes with over 99% accuracy.

Collaborator: Novartis International AG, based in Switzerland, and one of the largest pharmaceutical companies in the world.
Challenge: High content screening of cellular phenotypes is a fundamental tool supporting early stage drug discovery. While analyzing whole microscopy images would be desirable, these images are more than 26x larger than images found typically in datasets such as ImageNet*. As a result, the high computational workload would be prohibitive in terms of deep neural network model training time.
Solution: Intel and Novartis teams were able to scale to more than 120 (3.9-megapixel) images per second with 32 TensorFlow* workers.
Configuration: A cluster consisting of eight Intel® Xeon® processor servers using an Intel® Omni-Path Fabric interconnect and TensorFlow* optimized for Intel architecture.
https://newsroom.intel.com/news/using-deep-neural-network-acceleration-image-analysis-drug-discovery/
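
As a quick sanity check on the figures quoted for this case study, the arithmetic can be sketched in a few lines of Python. The numbers come from the slide; the variable names are illustrative, and the per-server figure assumes the eight-server configuration is the one that produced the quoted throughput. Because the slide says "more than 120" images per second, the throughput values are lower bounds.

```python
# Figures quoted on the slide.
baseline_minutes = 11 * 60   # training time before scaling out: 11 hours
scaled_minutes = 31          # training time after scaling out: 31 minutes
images_per_second = 120      # aggregate throughput ("more than 120")
workers = 32                 # TensorFlow workers
servers = 8                  # Intel Xeon servers in the cluster (assumed same run)

speedup = baseline_minutes / scaled_minutes
per_worker = images_per_second / workers
per_server = images_per_second / servers

print(f"speedup: {speedup:.1f}x")
print(f"throughput per worker: at least {per_worker:.2f} images/s")
print(f"throughput per server: at least {per_server:.1f} images/s")
```

The computed ratio is slightly above 21, which is consistent with the rounded "20x" headline.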
Result: $1 billion
Streamlined collection of US $1 billion in revenue by designing new APIs for car and license plate image recognition.

Client: SERPRO, Brazil’s largest government-owned IT services corporation, providing technology services to multiple public sector agencies.
Challenge: Across Brazil, 35,000 traffic enforcement cameras document 45 million violations every year, generating US $1 billion in revenue. Fully automating the complex, labor-intensive process for issuing tickets by integrating image recognition via AI could reduce costs and processing time.
Solution: Used deep learning techniques to optimize SERPRO’s code. With Brazilian student-partners, developed new algorithms, training and inference tests using Google TensorFlow* on Dell EMC PowerEdge R740*, running on Intel® Xeon® Scalable processor-based platforms.

http://www.decisionreport.com.br/destaque/serpro-desenvolve-produto-validador-cognitivo-de-infracoes/
BUILDING A DISTRIBUTED
ML / DL PIPELINE FOR AI
Distributed ML / DL – Requirements

• Access to common data, big and small


• Choice of modeling techniques
• Ability to build, share, and iterate
• Reproducibility
• Easy scaling to test on real data sets
• Support for different roles and actions
Distributed ML / DL – Challenges
• Scalability, repeatability, and reproducibility across environments (laptop, on-prem cluster, off-prem cluster)

• Deploying distributed platforms, libraries, applications, and versions
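
One way to make the reproducibility challenge concrete is to record which library versions an environment actually contains, so a laptop, an on-prem cluster, and an off-prem cluster can be compared. The following is a minimal sketch using only the Python standard library; the function names and the package list are illustrative assumptions, not part of any tool named in this deck.

```python
import hashlib
import json
import platform
from importlib import metadata


def environment_manifest(packages):
    """Collect interpreter, OS, and package versions for the given names."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "system": platform.system(),
        "machine": platform.machine(),
        "packages": versions,
    }


def fingerprint(manifest):
    """Hash the manifest so two environments can be compared by one string."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Hypothetical package list; substitute the libraries your pipeline uses.
manifest = environment_manifest(["numpy", "pandas", "scikit-learn"])
print(json.dumps(manifest, indent=2))
print("fingerprint:", fingerprint(manifest))
```

If two environments print different fingerprints, diffing the two manifests shows which version or platform detail drifted. Packaging the environment as a container image, as discussed later in the deck, aims to avoid that drift in the first place.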


ML / DL Stack: Tools + Infrastructure

• Users (role-based access control): Data Scientists, Developers, Data Engineers, Decision Makers
• Pipeline tools (tools for distributed ML / DL pipeline): BI tools, ML / DL tools, Intel optimized DL frameworks, other BI / DL / ML tools
• Data frameworks (access to data and storage): Kafka, HDFS, HBase, Spark, model storage, workflow management
Distributed ML / DL Workflow

Conceptualize
Model Build
• Sandbox / custom environments
• Access data
• Model storage
Iterate
• Add libraries
• Add features
• Rerun
Deploy
• Save models
• Publish API endpoints
Measure
• Multiple models
• Real-time / batch
• Feedback loop
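
The build, iterate, and measure stages of the workflow can be illustrated with a toy loop in plain Python. This is a sketch under stated assumptions: the data set, the feature names `f1` and `f2`, and the single-weight model are invented for illustration and do not come from the presentation.

```python
def fit_slope(xs, ys):
    """Least-squares slope for the model y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data set: each row has two candidate features and a target.
rows = [
    {"f1": 1.0, "f2": 3.0, "y": 2.1},
    {"f1": 2.0, "f2": 1.0, "y": 3.9},
    {"f1": 3.0, "f2": 4.0, "y": 6.2},
    {"f1": 4.0, "f2": 1.5, "y": 7.8},
    {"f1": 5.0, "f2": 2.0, "y": 10.1},
    {"f1": 6.0, "f2": 5.0, "y": 11.9},
]
train, test = rows[:4], rows[4:]

results = {}
for feature in ("f1", "f2"):  # iterate: rerun with a different feature
    # Build: fit one candidate model on the training rows.
    w = fit_slope([r[feature] for r in train], [r["y"] for r in train])
    # Measure: score the candidate on held-out rows.
    error = mse(w, [r[feature] for r in test], [r["y"] for r in test])
    results[feature] = {"w": w, "mse": error}

# Feedback loop: keep the candidate with the lowest held-out error.
best = min(results, key=lambda f: results[f]["mse"])
print("best feature:", best, results[best])
```

In a real pipeline each pass through the loop would typically add libraries or features and retrain on the cluster; the held-out score is what feeds the decision about which model to deploy.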
Distributed ML / DL with BlueData

• Users: Data Scientists, Developers, Data Engineers, Data Analysts
• Platform: BlueData EPIC™ Software Platform
• Tools: Big Data Tools, ML / DL Tools, Data Science Tools, BI / Analytics Tools, Bring-Your-Own
• ElasticPlane™: Self-service, multi-tenant clusters
• IOBoost™: Extreme performance and scalability
• DataTap™: In-place access to data on-prem or in the cloud
• Infrastructure: Compute, Storage (NFS, HDFS)
• Deployment: On-Premises, Public Cloud
App Store with Pre-Built Images

Docker images for multiple applications and versions

Ability to create and add new images, and save or restore tested combinations on demand
On-Demand Deployment

On-demand creation of ML / DL environments with Jupyter, RStudio, and Zeppelin notebooks

Environments can be provisioned with a Spark or Hadoop cluster if needed
Clusters Provisioned in Minutes

Links to notebooks (with Active Directory authentication), integrated with multi-node Spark

RStudio link for R users, or Jupyter link for Python users
Ready-to-Use Notebooks

Notebooks with built-in local and distributed kernels for model building and scoring
Multiple Data Sources with ACL

Access to HDFS data lake, with pass-through security from compute

Save and restore models and data using shared NFS, mounted as local FS
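
Because the shared NFS volume appears as a local file system, saving and restoring a model needs nothing beyond ordinary file I/O. The sketch below illustrates the idea with the Python standard library; a temporary directory stands in for the NFS mount point, and the directory layout, function names, and model parameters are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_model(model, shared_dir, name, version):
    """Write model parameters as JSON under <shared_dir>/<name>/<version>.json."""
    target = Path(shared_dir) / name / f"{version}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(model, sort_keys=True))
    return target


def restore_model(shared_dir, name, version):
    """Read back the parameters written by save_model."""
    source = Path(shared_dir) / name / f"{version}.json"
    return json.loads(source.read_text())


# A temporary directory stands in for the NFS mount point (an assumption).
with tempfile.TemporaryDirectory() as shared_dir:
    model = {"feature": "f1", "weight": 1.99}
    path = save_model(model, shared_dir, "demo-model", "v1")
    restored = restore_model(shared_dir, "demo-model", "v1")
    print(path.name, restored == model)
```

Keeping one file per version makes it straightforward for another container or user with access to the same mount to restore a specific tested model.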
BlueData AI / ML Accelerator
A new turnkey software + services solution targeted at AI use cases:
• Accelerated deployment of multi-node containerized sandbox
environments for ML / DL, using BlueData’s EPIC software platform
• Ready-to-run Docker application images for popular ML / DL tools
• Available either on-premises or in the public cloud (AWS, Azure, GCP)
• Packaged with professional services, training, and support

Faster time-to-value and reduced TCO with on-demand environments for distributed ML / DL
Key Takeaways: Distributed ML / DL

• Operationalizing distributed ML / DL is hard work


– Unique requirements for access to data, models, tools, etc.
– Need support for fast, iterative prototyping and reproducibility
– Requires ultimate flexibility as tools evolve and new options emerge
• Leverage a purpose-built solution (e.g. BlueData EPIC)
– Bring agility to building distributed ML / DL pipelines, powered by Docker
– Provide ability to share code, models, and data with secure multi-tenancy
– Enable self-service environments with on-demand provisioning
• Optimized for Intel Xeon architecture, on-prem or in the cloud
Intel® AI Academy
For developers, students, instructors and startups

• Learn: Get smarter using online tutorials, webinars, student kits and support forums
• Develop: Get 4-weeks FREE access to the Intel® AI DevCloud or use your existing Intel® Xeon® Processor-based cluster
• Teach: Educate others using available course materials, hands-on labs, and more
• Share: Showcase your innovation at industry & academic events and online via the Intel AI community forum

software.intel.com/ai
Radhika Rangarajan Nanda Vijaydev
Thank You

For more information, visit:


www.bluedata.com
and
www.ai.intel.com
