Lez.b-06 - NVIDIA GPUs and Servers


DELL AND NVIDIA FOR YOUR AI WORKLOADS IN THE DATA CENTER


Helge Gose, NVIDIA Solution Architect, June 7, 2018
AGENDA
• What is Deep Learning?
• Volta and NVLink
• Inference to Training – Dell solutions
THE TIME HAS COME FOR GPU COMPUTING

[Chart: single-threaded CPU performance vs. GPU-accelerated computing, 1980–2020, log scale. Single-threaded performance growth has slowed from roughly 1.5X per year to about 1.1X per year, while GPU-accelerated computing continues on a far steeper curve.]
DEEP LEARNING IS SWEEPING ACROSS INDUSTRIES
• INTERNET SERVICES: image/video classification, speech recognition, natural language processing
• MEDICINE: cancer cell detection, diabetic grading, drug discovery
• MEDIA & ENTERTAINMENT: video captioning, content-based search, real-time translation
• SECURITY & DEFENSE: face recognition, video surveillance, cyber security
• AUTONOMOUS MACHINES: pedestrian detection, lane tracking, traffic sign recognition
DEFINITIONS
A NEW COMPUTING MODEL

MACHINE LEARNING: algorithms that learn from examples (e.g., labeling an image as Vehicle, Car, or Coupe)

TRADITIONAL APPROACH
• Requires domain experts
• Time-consuming experimentation
• Custom algorithms
• Not scalable to new problems

DEEP LEARNING (DEEP NEURAL NETWORKS)
• Learn from data
• Easy to extend
• Accelerated with GPUs
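As a concrete illustration of the "learn from data" approach, here is a minimal sketch of a small neural-network classifier trained on labeled examples. It is an illustrative assumption, not material from the deck: the synthetic features, layer sizes, and three output classes (standing in for the Vehicle/Car/Coupe labels above) are all made up.

```python
# Minimal sketch: a network learns the input-to-label mapping from examples
# instead of relying on hand-crafted rules. Runs on GPU when one is available.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 3),                 # 3 output classes (illustrative)
).to(device)

x = torch.randn(512, 128, device=device)        # synthetic "feature" inputs
y = torch.randint(0, 3, (512,), device=device)  # synthetic labels

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):               # the model improves purely from the examples
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print("final training loss:", loss.item())
```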
WHAT PROBLEM ARE YOU SOLVING?
Defining the AI/DL Task

Inputs: text, data, images, video, audio

Business question → AI/DL task (example outputs in Healthcare / Retail / Finance):
• Is "it" present or not? → Detection (cancer detection / targeted ads / cybersecurity)
• What type of thing is "it"? → Classification (image classification / basket analysis / credit scoring)
• To what extent is "it" present? → Segmentation (tumor size/shape analysis / 360º customer view / credit risk analysis)
• What is the likely outcome? → Prediction (survivability prediction / sentiment & behavior recognition / fraud detection)
• What will likely satisfy the objective? → Recommendations (therapy recommendation / recommendation engine / algorithmic trading)
VOLTA AND NVLINK
TESLA V100
WORLD'S MOST ADVANCED DATA CENTER GPU
• 5,120 CUDA cores
• 640 new Tensor Cores
• 7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS | 125 Tensor TFLOPS
• 20MB SM register file | 16MB cache
• 16GB/32GB HBM2 @ 900GB/s | 300GB/s NVLink
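The Tensor Cores deliver the quoted 125 TFLOPS on FP16 matrix math, which frameworks typically reach through mixed-precision training. The sketch below is a hedged illustration using PyTorch automatic mixed precision (the model, shapes, and hyperparameters are placeholder assumptions), showing the usual pattern that lets eligible operations run on Tensor Cores.

```python
# Hedged sketch: PyTorch automatic mixed precision routes eligible matrix
# multiplies and convolutions to FP16 so Volta Tensor Cores can be used.
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()      # loss scaling keeps FP16 gradients stable

data = torch.randn(256, 1024, device="cuda")
target = torch.randn(256, 1024, device="cuda")

for step in range(10):
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = nn.functional.mse_loss(model(data), target)  # runs in FP16 where safe
    scaler.scale(loss).backward()          # gradients computed with the scaled loss
    scaler.step(optimizer)                 # unscales, then applies the update
    scaler.update()
```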

REVOLUTIONARY AI PERFORMANCE
3X Faster DL Training Performance

[Chart 1: Exponential performance over time. GoogLeNet training speedup vs. 1x K80 with cuDNN2 (Q1 2015): 4x M40 with cuDNN3 (Q3 2015), 8x P100 with cuDNN6 (Q2 2016), 8x V100 with cuDNN7 (Q2 2017) — over 80X DL training performance in 3 years.]

[Chart 2: Relative time to train an LSTM, neural machine translation for 13 epochs, German→English WMT15 subset, CPU = 2x Xeon E5-2699 v4: CPU 15 days, 8x P100 18 hours, 8x V100 6 hours — a 3X reduction in time to train over P100.]
END-TO-END PRODUCT FAMILY

TRAINING
• Desktop: TITAN V, DGX Station
• Data center: Dell PowerEdge C4140, Tesla V100

INFERENCE
• Data center: Tesla P4, Tesla V100
• Embedded: Jetson (JetPack SDK)
• Automotive: Drive PX (DriveWorks SDK)
POWERING THE DEEP LEARNING ECOSYSTEM
NVIDIA SDK Accelerates Every Major Framework

• Computer vision: object detection, image classification
• Speech & audio: voice recognition, language translation
• Natural language processing: recommendation engines, sentiment analysis

DEEP LEARNING FRAMEWORKS (Mocha.jl and the other major frameworks)

NVIDIA DEEP LEARNING SDK

developer.nvidia.com/deep-learning-software
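The frameworks pick up the NVIDIA Deep Learning SDK libraries (such as cuDNN) underneath. As a hedged illustration, and not anything from the slide itself, the snippet below shows how a PyTorch user can confirm the cuDNN backend is active and let its autotuner pick the fastest convolution algorithms.

```python
# Hedged sketch: checking and tuning the cuDNN backend that the NVIDIA
# Deep Learning SDK provides underneath the framework.
import torch

print("CUDA available:", torch.cuda.is_available())
print("cuDNN enabled:", torch.backends.cudnn.enabled)
print("cuDNN version:", torch.backends.cudnn.version())

# Let cuDNN benchmark its convolution algorithms and cache the fastest choice;
# this helps when input shapes stay the same from batch to batch.
torch.backends.cudnn.benchmark = True
```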
DELL AI SOLUTIONS
PowerEdge C4140 Server
Faster time to insights with an ultra-dense, accelerator-optimized server platform

TARGETED WORKLOADS
• Machine learning and deep learning
• Technical computing (research / life sciences)
• Low-latency, high-performance applications (FSI)

Platform: Intel Xeon Scalable processors + NVIDIA Tesla GPUs

Key Capabilities
• Unthrottled performance and superior thermal efficiency with patent-pending interleaved GPU system design*
• No-compromise (CPU + GPU) acceleration technology, up to 500 TFLOPS per U+, using the NVIDIA® Tesla™ V100 with NVLink™
• 2.4kW PSUs help future-proof for next-generation GPUs
• Simplified deployment with pre-configured Ready Bundles

* Based on Dell internal analyses and Principled Technologies Report, Jan 2015.
+ Based on V100 NVLink Tensor Core performance.
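To drive the four NVLink-connected V100s in a C4140 from a single training job, the usual pattern is data-parallel training. The sketch below is an illustrative assumption (a one-node, four-GPU launch via torchrun with a placeholder model), not a Dell- or NVIDIA-provided recipe.

```python
# Hedged sketch: data-parallel training across the 4 GPUs of one node,
# launched for example with: torchrun --nproc_per_node=4 train.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # NCCL moves gradients over NVLink
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(nn.Linear(1024, 1024).cuda(), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    x = torch.randn(64, 1024, device=local_rank)   # placeholder batch per GPU
    y = torch.randn(64, 1024, device=local_rank)
    for _ in range(10):
        optimizer.zero_grad()
        nn.functional.mse_loss(model(x), y).backward()  # gradients all-reduced across GPUs
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```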


C4140 – Now with NVIDIA® Volta and NVLink™

Faster time to insights with an ultra-dense, accelerator-optimized server platform

The NVIDIA® Volta GPU has over 21 billion transistors and 640 Tensor Cores to deliver 100+ TFLOPS. NVIDIA® NVLink™ is a high-bandwidth interconnect enabling ultra-fast communication between CPU and GPU and between GPUs.

 Volta V100 delivers a 2.6X average speedup on DL workloads over Pascal P100
 Delivers 44X more throughput than CPU nodes, with lower latency
 NVLink is 5X–10X faster than the traditional PCIe Gen3 interconnect
 Volta-optimized software for important HPC applications

*Source: NVIDIA® Volta benchmarks for multiple applications, 2017

C4140 and NVLink™
PCIe topology vs. NVLink topology

 NVLink signaling runs at 25 Gb/s per lane versus 8 Gb/s per lane for PCIe Gen3
 Roughly 7% higher performance from the higher clock speed of the NVLink GPU modules
 A further 7%+ performance gain from peer-to-peer GPU communication over NVLink (see the sketch below)
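One quick way to see the NVLink topology and the peer-to-peer benefit on a machine like the C4140 is sketched below. It is a hedged illustration relying on `nvidia-smi topo -m` and a simple timed GPU-to-GPU copy in PyTorch; the transfer size and two-GPU assumption are placeholders, and the printed number is only a rough indicator, not a benchmark from the deck.

```python
# Hedged sketch: inspect the GPU interconnect topology, then time a direct
# GPU0 -> GPU1 tensor copy, which travels over NVLink when the GPUs are linked.
import subprocess
import time
import torch

assert torch.cuda.device_count() >= 2, "needs at least two GPUs"

# Prints the link matrix; NV# entries indicate NVLink, PIX/PHB indicate PCIe paths.
subprocess.run(["nvidia-smi", "topo", "-m"], check=True)

src = torch.randn(256 * 1024 * 1024 // 4, device="cuda:0")   # ~256 MB of FP32 data
torch.cuda.synchronize(0)

t0 = time.perf_counter()
dst = src.to("cuda:1")                    # device-to-device copy
torch.cuda.synchronize(0)
torch.cuda.synchronize(1)
elapsed = time.perf_counter() - t0

print(f"~{src.numel() * 4 / elapsed / 1e9:.1f} GB/s GPU0 -> GPU1")
```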

INDUSTRY'S #1 SERVER PORTFOLIO*
PowerEdge – now introducing the C4140

• Towers
• Racks
• Modular infrastructure
• Extreme Scale infrastructure

OpenManage Enterprise – intelligent automation systems management

*Based on units sold (tie). IDC Worldwide Quarterly Server Tracker, Q1–Q3 2016.
