intel-bluedata-ai-ml-webinar-62818-final_435477
Radhika Rangarajan
The path to deeper insight
• Descriptive Analytics (Hindsight): What happened?
• Diagnostic Analytics (Insight): What happened and why?
• Predictive Analytics (Foresight): What will happen, when, and why?
• Prescriptive Analytics (Forecast): How should I proceed?
• Cognitive Analytics (Self-Learning): How do I proceed?
AI is the driving force
AI will transform
TOOLKITS for Application Developers
DEEP LEARNING (INFERENCE)
• OpenVINO™: Open Visual Inference & Neural Network Optimization toolkit for inference deployment on CPU/GPU/FPGA for TensorFlow*, Caffe* & MXNet*
• Intel® Movidius™ SDK: Optimized inference deployment on Intel VPUs for TensorFlow* & Caffe*
REASONING
• Intel® Saffron™ AI: Cognitive solutions on CPU for anti-money laundering, predictive maintenance, more
DEEP LEARNING
• Intel® Deep Learning Studio‡ (coming soon): Open-source tool to compress deep learning development cycle
LIBRARIES for Data Scientists
MACHINE LEARNING LIBRARIES
• Python: Scikit-learn, Pandas, NumPy
• R: Cart, randomForest, e1071
• Distributed: MlLib (on Spark), Mahout
DEEP LEARNING FRAMEWORKS (now optimized for CPU, or with optimizations in progress and coming soon)
• TensorFlow*, MXNet*, Caffe*, BigDL (Spark)*, Caffe2*, PyTorch*, CNTK*, PaddlePaddle*
FOUNDATION
• ANALYTICS, MACHINE & DEEP LEARNING PRIMITIVES
• DEEP LEARNING GRAPH COMPILER
More under optimization: and more…
*Limited availability today
Other names and brands may be claimed as the property of others.
Who is building what… across the industry
Client: Center for Digital Health Innovation (CDHI) at UCSF, leveraging new digital health technologies to transform healthcare.
Challenge: By 2040, a projected 78M adults will have doctor-diagnosed osteoarthritis (OA) and 35M will have arthritis-attributable activity limitations. CDHI needs an automated system that classifies menisci based on presence/absence of lesions, provides immediate objective results at MRI scan, & eliminates intra-user variability.
Solution: Apache Spark* with BigDL on Cloudera (CDH 5.9*), on Intel® Xeon® servers from Dell*. With 3D image convolution in BigDL, the CDHI team built an MRI classification system & deployed it on their CDH Dell cluster.
https://cdn.oreillystatic.com/en/assets/1/event/269/Automatic%203D%20MRI%20knee%20damage%20classification%20with%203D%20CNN%20using%20BigDL%20on%20Spark%20Presentation.pdf
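As a rough illustration of what "3D image convolution" means for volumetric scans, here is a minimal pure-Python sketch. It is not the BigDL API and not the CDHI model: the function name `conv3d_valid`, the toy volume, and the averaging kernel are all assumptions made for this example.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as [depth][row][col]


def conv3d_valid(volume: Volume, kernel: Volume) -> Volume:
    """Slide `kernel` over `volume` with stride 1 and no padding.

    Like most deep learning layers, this is a cross-correlation
    (the kernel is not flipped).
    """
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for z in range(d - kd + 1):
        plane = []
        for y in range(h - kh + 1):
            row = []
            for x in range(w - kw + 1):
                acc = 0.0
                for dz in range(kd):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical toy data: a 3x3x3 "scan" of ones and a 2x2x2 averaging kernel.
scan = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
mean_kernel = [[[1.0 / 8] * 2 for _ in range(2)] for _ in range(2)]
features = conv3d_valid(scan, mean_kernel)  # 2x2x2 feature volume
```

A real 3D CNN stacks many such learned kernels with nonlinearities and pooling; the sketch only shows the sliding-window arithmetic over depth, height, and width.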
Intel does not control or audit third-party benchmark data or the web sites referenced in this document.
You should visit the referenced web site and confirm whether referenced data are accurate.
Result: 60% increase in coverage rate and 20% in accuracy rate, better than the traditional rule-based approach.
Client: China UnionPay*, which specializes in banking services and payment systems. It is the 3rd largest payment network in the world.
Challenge: Detect fraudulent credit card transactions with more coverage and accuracy.
Solution: Cloudera and Apache Spark* with BigDL, running on Intel® Xeon® and 5th Gen Intel® Core™ processors, for credit card fraud detection. Historical data is stored on Apache Hive*. Data preprocessing is done with Apache Spark SQL*.
https://www.intel.com/content/www/us/en/financial-services-it/union-pay-case-study.html
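The slide does not define "coverage rate" or "accuracy rate". A minimal sketch, assuming coverage means the share of fraudulent transactions that get flagged (recall) and accuracy means the share of all transactions classified correctly; the labels and flags below are made-up toy data, not UnionPay results.

```python
def coverage_and_accuracy(labels, flagged):
    """Return (coverage, accuracy) for binary fraud labels and flags.

    coverage: flagged frauds / all frauds (assumed meaning, i.e. recall)
    accuracy: correct decisions / all transactions
    """
    true_pos = sum(1 for y, f in zip(labels, flagged) if y == 1 and f == 1)
    frauds = sum(labels)
    correct = sum(1 for y, f in zip(labels, flagged) if y == f)
    coverage = true_pos / frauds if frauds else 0.0
    accuracy = correct / len(labels) if labels else 0.0
    return coverage, accuracy


# Hypothetical toy data: 1 = fraud, 0 = legitimate.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
rule_based_flags = [1, 0, 0, 0, 0, 0, 0, 1]
model_flags = [1, 1, 0, 0, 0, 0, 0, 0]

rule_coverage, rule_accuracy = coverage_and_accuracy(labels, rule_based_flags)
model_coverage, model_accuracy = coverage_and_accuracy(labels, model_flags)
```

Comparing both detectors on the same labeled history is what makes a "better than rule-based" statement checkable.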
Result: 4X gain in performance with Intel® Xeon® CPU cluster, per JD.com. Processing ~380M images with Intel Xeon CPU E5-2650 v4 @ 2.20GHz with 1200 YARN cores.
Client: JD.com*, second largest online retailer in China, with approximately 25 million registered users.
Challenge: Building deep learning applications such as image similarity search on a GPU cluster was costly & complex. Technical issues included high latency when downloading graphic data from Apache HBase* & complicated data pre-processing in the GPU environment.
Solution: Switched from GPU to CPU cluster, using Apache Spark* with BigDL running on Intel® Xeon® processors. Intel delivered an image detection & extraction pipeline. BigDL was used to build deep learning models for image recognition & feature extraction.
https://software.intel.com/en-us/articles/building-large-scale-image-feature-extraction-with-bigdl-at-jdcom
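Once a model has turned each image into a feature vector, image similarity search reduces to ranking vectors by distance. A minimal sketch, assuming cosine similarity as the metric (the slide does not name one); the catalog entries and three-element vectors are hypothetical, and real feature vectors would come from the extraction model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, catalog, top_k=2):
    """Rank catalog images by similarity to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical feature vectors keyed by image name.
catalog = {
    "shoe_a": [0.9, 0.1, 0.0],
    "shoe_b": [0.8, 0.2, 0.1],
    "lamp": [0.0, 0.1, 0.9],
}
results = most_similar([1.0, 0.0, 0.0], catalog)
```

At the scale mentioned on the slide, a brute-force scan like this would be replaced by a distributed or approximate nearest-neighbor search.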
Result: 20x faster time-to-train for processing a 10k image dataset, reducing the training time from 11 hours to 31 minutes with over 99% accuracy.
Collaborator: Novartis International AG, based in Switzerland, and one of the largest pharmaceutical companies in the world.
Challenge: High content screening of cellular phenotypes is a fundamental tool supporting early stage drug discovery. While analyzing whole microscopy images would be desirable, these images are more than 26x larger than images found typically in datasets such as ImageNet*. As a result, the high computational workload would be prohibitive in terms of deep neural network model training time.
Solution: Intel and Novartis teams were able to scale to more than 120 (3.9 megapixel) images per second with 32 TensorFlow* workers.
Configuration: A cluster consisting of eight Intel® Xeon® processor servers using an Intel® Omni-Path Fabric interconnect and TensorFlow* optimized for Intel architecture.
https://newsroom.intel.com/news/using-deep-neural-network-acceleration-image-analysis-drug-discovery/
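The headline figures can be checked with a few lines of arithmetic. This sketch only uses the numbers quoted on the slide; the variable names are ours, and since the slide says "more than 120" images per second, the throughput values are lower bounds.

```python
# Time-to-train: 11 hours down to 31 minutes.
baseline_minutes = 11 * 60
optimized_minutes = 31
speedup = baseline_minutes / optimized_minutes  # about 21x, quoted as "20x"

# Throughput: 120 images/s at 3.9 megapixels each, across 32 workers.
images_per_second = 120
megapixels_per_image = 3.9
workers = 32

megapixels_per_second = images_per_second * megapixels_per_image
images_per_second_per_worker = images_per_second / workers

print(f"speedup: {speedup:.1f}x")
print(f"throughput: {megapixels_per_second:.0f} megapixels/s")
print(f"per worker: {images_per_second_per_worker:.2f} images/s")
```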
Result: Streamlined collection of US $1 billion in revenue by designing new APIs for car and license plate image recognition.
Client: SERPRO, Brazil’s largest government-owned IT services corporation, providing technology services to multiple public sector agencies.
Challenge: Across Brazil, 35,000 traffic enforcement cameras document 45 million violations every year, generating US $1 billion in revenue. Fully automating the complex, labor-intensive process for issuing tickets by integrating image recognition via AI could reduce costs and processing time.
Solution: Used deep learning techniques to optimize SERPRO’s code. With Brazilian student-partners, developed new algorithms, training and inference tests using Google TensorFlow* on Dell EMC PowerEdge R740*, running on Intel® Xeon® Scalable processor-based platforms.
http://www.decisionreport.com.br/destaque/serpro-desenvolve-produto-validador-cognitivo-de-infracoes/
BUILDING A DISTRIBUTED ML / DL PIPELINE FOR AI
Distributed ML / DL – Requirements
Lifecycle stages: Conceptualize, Model Build, Iterate, Deploy, Measure
• Sandbox / custom environments
• Multiple models
• Real-time / batch
• Access data
• Model storage
• Feedback loop
• Add libraries
• Save models
• Add features
• Publish API endpoints
• Rerun
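The requirements above can be sketched end to end with a toy model. This is an illustration using only the Python standard library, not BlueData's platform or any specific serving stack: the threshold "model", the JSON model store, and every function name are assumptions made for the example.

```python
import json
import os
import tempfile
from statistics import mean


def build_model(samples):
    """Model build: put a threshold midway between the two class means."""
    positives = [value for value, label in samples if label == 1]
    negatives = [value for value, label in samples if label == 0]
    return {"threshold": (mean(positives) + mean(negatives)) / 2}


def save_model(model, path):
    """Model storage: persist the model so it can be deployed or rerun."""
    with open(path, "w") as handle:
        json.dump(model, handle)


def load_model(path):
    with open(path) as handle:
        return json.load(handle)


def predict(model, value):
    """Deploy: stand-in for what a published API endpoint would call."""
    return int(value >= model["threshold"])


def measure(model, samples):
    """Measure: fraction of labeled samples predicted correctly."""
    hits = sum(1 for value, label in samples if predict(model, value) == label)
    return hits / len(samples)


# Hypothetical data: (feature value, label) pairs.
training = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
feedback = [(4.0, 0), (6.0, 1), (5.2, 0)]  # labeled results from production

with tempfile.TemporaryDirectory() as model_store:
    path = os.path.join(model_store, "model_v1.json")
    save_model(build_model(training), path)
    deployed = load_model(path)
    first_accuracy = measure(deployed, feedback)

    # Feedback loop: iterate by retraining with the new labeled data.
    # A real pipeline would score the retrained model on fresh data instead.
    retrained = build_model(training + feedback)
    second_accuracy = measure(retrained, feedback)
```

Keeping build, storage, serving, and measurement as separate steps is what lets each one run in its own environment, in real time or in batch.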
Distributed ML / DL with BlueData
Big Data Tools ML / DL Tools Data Science Tools BI / Analytics Tools Bring-Your-Own
ElasticPlane™ – Self-service, multi-tenant clusters
IOBoost™ – Extreme performance and scalability
DataTap™ – In-place access to data on-prem or in the cloud
Compute
On-demand creation of ML / DL environments with Jupyter, RStudio, and Zeppelin notebooks
Environments can be provisioned with a Spark or Hadoop cluster if needed
Clusters Provisioned in Minutes
Links to notebooks (with Active Directory authentication), integrated with multi-node Spark
software.intel.com/ai
Radhika Rangarajan Nanda Vijaydev
Thank You