
AI Computing

Trends - Challenges - Innovations


1 AI Industry Trend Analysis

2 AI Challenges and Innovations


Computing: Foundation for the Development of AI Industry

Facebook AI Research, Baidu Cloud, Horizon Robotics, IBM Cloud

Computing: the core that facilitates and advances the development of the entire AI industry!
Inspur: Leading AI Computing

Inspur Full-Stack AI System:
- End-to-End Vertical AI Solutions
- Optimized Frameworks: TensorFlow-Opt, Caffe-MPI
- Comprehensive Management Suite: AIStation, T-Eye
- Leading AI Computing Platform: GPU Server, CPU Server, FPGA Accelerator

Inspur AI Market Share:
[Pie chart: China AI server market share: Inspur 51%; Huawei, Sugon, Powerleader, Enginetech, and others share the remainder]
Inspur: Leading AI Computing
ResNet-50 Training Performance (MLPerf v0.7, time in minutes, lower is better)

Rank | Vendor  | Server   | CPU                        | CPU count | GPU                    | Time (minutes)
1    | Inspur  | NF5488A5 | AMD EPYC 7742              | 2         | A100-SXM4-40GB (400W)  | 33.37
2    | NVIDIA  | DGX2H    | Intel Xeon Platinum 8174   | 2         | V100-SXM3-32GB (450W)  | 35.25
3    | NVIDIA  | DGXA100  | AMD EPYC 7742              | 2         | A100-SXM4-40GB (400W)  | 39.78
4    | Alibaba | HJBOG    | Intel Xeon Platinum 8269CY | 2         | A100-SXM4-40GB (400W)  | 40.49
5    | Fujitsu | GX2570M5 | Intel Xeon Platinum 8268   | 2         | V100-SXM3-32GB (300W)  | 68.82
6    | NVIDIA  | DGX1     | Intel Xeon E5-2698 v4      | 2         | V100-SXM2-16GB (300W)  | 71.28
7    | DellEMC | DSS8440  | Intel Xeon Gold 6248       | 2         | V100S-PCIe-32GB (300W) | 78.54
8    | DellEMC | C4140    | Intel Xeon Gold 6148       | 2         | V100-SXM2-32GB (300W)  | 154.04
9    | Intel   | -        | Intel Xeon Platinum 8380H  | 8         | -                      | 1104.53

NF5488A5 configuration: 2x AMD EPYC CPUs (PCIe Gen4), 8x NVIDIA A100 GPUs, 32x 64GB DDR4-2933, 4x 3.2TB NVMe SSD.
Ranked 1st in the MLPerf v0.7 ResNet-50 training benchmark, with ~20% higher performance than other A100 systems (MLPerf v0.7 training results released 2020/07/29).
Broke 18 world records in the MLPerf ResNet-50 inference scenario (MLPerf v0.7 inference results to be released 2020/10/22).
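For orientation, the sketch below shows the kind of data-parallel ResNet-50 training these benchmark entries measure, written with TensorFlow's MirroredStrategy so that one replica runs per GPU in the server. The synthetic data, batch size, and optimizer settings are illustrative assumptions only; this is not the MLPerf submission code or Inspur's tuned stack.

```python
# Minimal sketch: data-parallel ResNet-50 training across the GPUs in one server.
# Synthetic data and hyperparameters are illustrative only, not MLPerf settings.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()          # one replica per visible GPU
print("Replicas in sync:", strategy.num_replicas_in_sync)

global_batch = 64 * strategy.num_replicas_in_sync    # scale batch with GPU count

# Synthetic ImageNet-shaped data stands in for the real training set.
images = tf.random.uniform((global_batch, 224, 224, 3))
labels = tf.random.uniform((global_batch,), maxval=1000, dtype=tf.int32)
dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .batch(global_batch)
           .repeat())

with strategy.scope():
    model = tf.keras.applications.ResNet50(weights=None, classes=1000)
    model.compile(optimizer=tf.keras.optimizers.SGD(0.1, momentum=0.9),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

model.fit(dataset, steps_per_epoch=10, epochs=1)      # a few steps, just to run
```

In a real submission, the per-GPU batch size, learning-rate schedule, and input pipeline are what vendors tune to reach the times in the table above.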
Trends in the Development of AI Industry

- Shorten the AI application R&D cycle
- Accelerate AI application launch
- Improve innovation efficiency
- Reduce operational CAPEX
- Integration and operation of AI with IT infrastructure
- Fast AI empowerment accelerates the AI transformation of traditional industries
Trend 1: AI Innovation Speed Becomes the Core of Competitiveness

[Chart: training compute in GPU-hours: ResNet (human-designed) ~256 vs. NasNet-A (AutoML) ~2,000]

New models and algorithms require exponential increases in computing power.
Faster growth in computing power translates into faster AI R&D cycles!


Trend 2: AI Business Implementation Acceleration

- Voice Assistant: 400 million DuerOS voice wake-ups per month
- Smart Speaker: >5 million annual shipments of Tmall Genie
- NLP: 409 billion Baidu Brain calls per day
- AR/VR: >1 billion SenseAR terminal users
- Face Recognition: large-scale applications such as Super Counter, face-scan payment, new retail, "face scanning" check-in, and Baby Come Home
- Autonomous Driving: 400+ passengers in Waymo's free self-driving taxi tests; Waymo plans to start charging later this year
- Smart Retail: China's smart retail market will exceed 100 billion
- xPU: various manufacturers offering AI-oriented FPGAs and custom chips
Trend 3: Convergence of AI with IT Infrastructure

Hardware & software architecture (reduced enterprise access costs):
- Microsoft deploys network virtualization on an FPGA-accelerated architecture for large-scale processing in its FPGA cloud, making it flexible and scalable.

Cloud platform (more data sharing, more intelligent AI):
- Intelligent robots are remotely supported and updated; virtual intelligent assistant inference applications are supported and updated through a powerful cloud platform with AI/DL technologies in the background.
- Baidu's powerful cloud database technology supports the Apollo research program and enables the development of the DuerOS system.
- Alibaba Cloud combines systems and algorithms to achieve the fastest model building with AI technology, supporting simultaneous multi-user development and A/B testing.
Trend 4: Acceleration of Traditional Industry AI Transformation

- AI-Enabled Virtual Customer Assistant (Suning AI Customer Service): 200 million members, accuracy rate reaches 92%
- AI-Enabled Policy Engine (CITIC Securities AI Policy Engine): 10X quantitative trading strategy generation efficiency
- AI-Enabled Medical System (Philips AI Intelligent Diagnosis): lung nodule detection error rate < 1%
1 AI Industry Trend Analysis

2 AI Challenges and Innovations


Challenges for AI R&D and Innovation

- Server Computing Power: demand for AI computing power grows rapidly; current AI servers do not meet the requirements.
- Accelerator Interconnection: DL performance requires more GPUs in a single server and higher-bandwidth connectivity between GPU servers.
- DL Model Development: DL models require higher performance than servers can provide; DL models and algorithms need to be improved.
- Resource Optimization: investigate and alleviate performance bottlenecks during training; improve overall system resource efficiency.
Challenges of AI Integration into Cloud Computing

- Maximize AI computing efficiency on the cloud platform.
- AI software evolves fast and is difficult to keep updated within the cloud stack.
- Heterogeneous AI computing has become a mainstream trend; the cloud platform has to be fully compatible with it.
- With the explosion of AI training data, cloud platform computing and cloud storage performance have to be improved simultaneously.
Inspur AI OpenStack Private Cloud Platform

Key characteristics:
- Converged cloud + AI infrastructure
- Fast AI development
- Flexible heterogeneous resources in the cloud environment
- Packaged delivery
- Extremely simplified AI cluster management

Architecture:
- AI layer (AIStation): data management, model development, model training, and model visualization for training efficiency improvement; frameworks include TensorFlow, CNTK, and PaddlePaddle; AI Docker provides GPU scheduling, GPU management, GPU monitoring, and framework image management (a container-launch sketch follows below).
- Cloud platform layer: OpenStack, providing computing, storage, and network resources.
- Physical resource layer: CPU servers, GPU/FPGA servers, 10G and InfiniBand networks, and distributed shared cloud storage.
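To make the AI Docker layer concrete, here is a minimal sketch of launching a framework image with GPU passthrough via the Docker Python SDK. The image name, training command, and volume path are hypothetical placeholders; this is not AIStation's actual interface, which adds GPU scheduling, management, and monitoring on top of container launches like this.

```python
# Illustrative sketch only: launching a containerized framework with GPU access
# through the Docker SDK. Image, command, and paths are hypothetical examples;
# this is not AIStation's actual scheduling interface.
import docker

client = docker.from_env()

container = client.containers.run(
    image="tensorflow/tensorflow:latest-gpu",          # framework image ("mirror")
    command="python /workspace/train.py",               # hypothetical training script
    device_requests=[                                    # request GPU devices
        docker.types.DeviceRequest(count=1, capabilities=[["gpu"]])
    ],
    volumes={"/data/imagenet": {"bind": "/workspace/data", "mode": "ro"}},
    detach=True,
)
print("Started training container:", container.short_id)
```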
Inspur AIStation + OpenStack Cloud Platform Example

- Computing resource pool (1000+ CPUs/GPUs): resource pooling with fine-grained scheduling increased resource utilization 3X; Docker-based resource pooling and isolation; load balancing and on-demand scheduling.
- Data management: unified cloud storage integrates the data flow and improves data-flow efficiency; a high-performance cloud storage system links data processing, labeling, standardization, training, etc., and provides training-data read-ahead and caching (see the input-pipeline sketch below).
- PaaS: with dozens of algorithm groups and more than 100 developers, the trained model size increased 3X; lightning-fast deployment of the deep learning development platform; quick start of network model training tasks; full deep learning training service.
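As a rough illustration of what a read-ahead and caching mechanism does for training, the sketch below builds a TensorFlow tf.data input pipeline that caches decoded samples and prefetches batches ahead of the accelerator. The TFRecord path and example schema are placeholder assumptions, not Inspur's storage layer or APIs.

```python
# Sketch of a training input pipeline with caching and read-ahead (prefetch).
# The TFRecord path and parse schema are placeholders, not Inspur's storage API.
import tensorflow as tf

def parse_example(record):
    # Hypothetical schema: a JPEG image plus an integer label.
    features = tf.io.parse_single_example(record, {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    image = tf.image.resize(tf.io.decode_jpeg(features["image"], channels=3),
                            (224, 224))
    return image, features["label"]

dataset = (tf.data.TFRecordDataset(tf.io.gfile.glob("/data/train-*.tfrecord"))
           .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
           .cache()                              # keep decoded samples in memory
           .shuffle(10_000)
           .batch(256)
           .prefetch(tf.data.AUTOTUNE))          # read ahead of the GPU
```

The same idea, overlapping storage reads and preprocessing with GPU compute, is what keeps a pooled GPU cluster from stalling on I/O.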
Building up the AI Ecosystem
Summary

Inspur AI Computing:
- Accelerating R&D innovation
- Optimizing business operation
- Leveraging cloud infrastructure
- Shaping the industry ecosystem


Thank you

COMPUTING INSPIRES FUTURE
