Lez.b-06 - nVIDIA GPU and Servers
THE TIME HAS COME FOR GPU COMPUTING
[Figure: performance over time, 1980 to 2020, vertical axis from 10^3 to 10^7. Curves: GPU-Accelerated Computing; Single-threaded perf (1.5X per year).]
DEEP LEARNING
IS SWEEPING ACROSS INDUSTRIES
INTERNET SERVICES: Image/Video classification, Speech recognition, Natural language processing
MEDICINE: Cancer cell detection, Diabetic grading, Drug discovery
MEDIA & ENTERTAINMENT: Video captioning, Content based search, Real time translation
SECURITY & DEFENSE: Face recognition, Video surveillance, Cyber security
AUTONOMOUS MACHINES: Pedestrian detection, Lane tracking, Recognize traffic signs
DEFINITIONS
A NEW COMPUTING MODEL
Algorithms that learn from examples
MACHINE LEARNING
DEEP LEARNING
DEEP NEURAL NETWORKS
Learn from data
Easy to extend
Accelerated with GPUs
[Figure: example classification labels: Vehicle, Car, Coupe]
WHAT PROBLEM ARE YOU SOLVING?
Defining the AI/DL Task
INPUTS: [Figure: example inputs]
QUESTION: Is “it” present or not?
AI/DL TASK: Detection
BUSINESS EXAMPLE OUTPUTS
HEALTHCARE: Cancer Detection
RETAIL: Targeted ads
FINANCE: Cybersecurity
VOLTA AND NVLINK
TESLA V100
WORLD’S MOST ADVANCED DATA CENTER GPU
REVOLUTIONARY AI PERFORMANCE
3X Faster DL Training Performance
[Figure, left panel: Exponential Performance over time (GoogleNet). Vertical axis: Speedup vs K80, 0x to 100x. Horizontal axis: Q1 15, Q3 15, Q2 16, Q2 17. Bars: 1x K80 cuDNN2, 4x M40 cuDNN3, 8x P100 cuDNN6, 8x V100 cuDNN7.]
[Figure, right panel: Relative Time to Train Improvements (LSTM). CPU: 15 Days. P100: 18 Hours. V100: 6 Hours.]
GoogleNet Training Performance on versions of cuDNN vs 1x K80 cuDNN2
Neural Machine Translation Training for 13 Epochs | German -> English, WMT15 subset | CPU = 2x Xeon E5 2699 V4
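The "3X Faster" headline can be checked against the training times printed on the LSTM chart (15 Days, 18 Hours, 6 Hours). The sketch below is only an illustration: it assumes those three times belong to the CPU, P100 and V100 configurations respectively, and the names `train_hours` and `speedup` are made up for this example, not taken from the slides.

```python
# Training times read from the slide. Pairing each time with a
# configuration is an assumption made for this illustration.
HOURS_PER_DAY = 24

train_hours = {
    "CPU": 15 * HOURS_PER_DAY,  # 15 days, converted to hours
    "P100": 18,                 # 18 hours
    "V100": 6,                  # 6 hours
}


def speedup(baseline: str, target: str) -> float:
    """Return how many times faster `target` trains than `baseline`."""
    return train_hours[baseline] / train_hours[target]


if __name__ == "__main__":
    for base, tgt in [("P100", "V100"), ("CPU", "P100"), ("CPU", "V100")]:
        print(f"{tgt} vs {base}: {speedup(base, tgt):.0f}x faster")
```

Under that pairing, V100 trains 3 times faster than P100, which is consistent with the slide's headline figure.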
END-TO-END PRODUCT FAMILY
TRAINING
DESKTOP: TITAN V
DATA CENTER: TESLA V100
INFERENCE
DATA CENTER: TESLA P4
EMBEDDED: Jetson
AUTOMOTIVE: Drive PX
Server shown: Dell PowerEdge C4140
POWERING THE DEEP LEARNING ECOSYSTEM
NVIDIA SDK Accelerates Every Major Framework
COMPUTER VISION: OBJECT DETECTION, IMAGE CLASSIFICATION
SPEECH & AUDIO: VOICE RECOGNITION, LANGUAGE TRANSLATION
NATURAL LANGUAGE PROCESSING: RECOMMENDATION ENGINES, SENTIMENT ANALYSIS
Mocha.jl
developer.nvidia.com/deep-learning-software
DELL AI SOLUTIONS
PowerEdge C4140 Server
Faster time to insights with an ultra-dense, accelerator-optimized server platform
TARGETED WORKLOADS
Key Capabilities
14 of 21 THE BEDROCK OF THE MODERN DATACENTER
C4140 – Now with NVIDIA Volta and NVLink™
Volta V100 delivers a 2.6X average speedup over Pascal P100 for DL workloads
Delivers 44X more throughput than CPU nodes, with lower latency
C4140 and NVLink™
[Figure: two diagrams, PCIe Topology and NVLink Topology]
INDUSTRY'S #1* Server Portfolio
PowerEdge: Towers, Racks, Modular Infrastructure, Extreme Scale
Now Introducing C4140
*Based on units sold (tie). IDC Worldwide Quarterly Server Tracker, Q1-Q3, 2016.
Dell - Internal Use - Confidential