CloudEngine 16800 Overview Presentation


A CloudEngine Built for the AI Era

The AI Era Is Approaching

Huawei GIV forecasts that the enterprise AI adoption rate will rise from 25% in 2018 to 86% in 2025.

• Cloud era: focus on applications and rapidly deploying services.
• AI era: focus on mining and monetizing data.
In the AI Era, Mining Data Is the Core

• Digital flood requires high processing capability: data generated annually grows from 8.6 ZB (2015) to 180 ZB (2025).
• Unstructured data requires AI for processing: audio/video accounts for 95% of all data.
• Computing and storage transformation (HDD → SSD → SCM for storage, CPU → GPU → AI chip for computing) improves data processing efficiency; the network bottleneck must now be eliminated.
If the data due to be generated in 2025 were stored as 1080p HD videos, it would take more than a year to watch them all. "Data is not the goal – knowledge and intelligence are the core."
Three Challenges for Data Centers in the AI Era

• A packet loss rate of 0.1% reduces AI computing power by 50%.
• 100 GE cannot support a 20x traffic increase.
• Manual troubleshooting takes hours.
Three Features of DC Switches in the AI Era

• Embedded AI chip: 100% AI computing power.
• 48 x 400 GE per slot: supports 5x traffic growth in the future.
• Autonomous driving: fault identification in seconds and fault location in minutes.
Industry's First Data Center Switch Built for the AI Era

CloudEngine 16800
Data Center Networks Advance Towards the AI Era
Embedded AI Chip | High-Density 400 GE | Autonomous Driving

Packet Loss Has Become a Major Bottleneck in the AI Era

• 10 PB of data collected in 1 day
• 500 GPUs trained for 7 days
• 40 GPUs trained once over 4 weeks
• GPU idle time > 50%

Zero Packet Loss, 100% AI Computing Power

• Unique intelligent lossless algorithm (iLossless™): eliminates packet loss on Ethernet networks.
• Industry's first switch with an embedded AI chip: creates intelligent engines for switches.

Unique Intelligent Lossless Switching Algorithm

• Per-flow service detection: detects network services such as AI training and high-performance databases.
• Millions of flows and tens of thousands of queues: monitors PFC frame counts, queue egress utilization, and queue waterlines.
• Intelligent and optimal matching of flows and queues: a deep neural network algorithm (iLossless™) matches flows to queues.

Result: 0 packet loss, 100% throughput, < 10 μs E2E latency.
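The slide names the telemetry signals (PFC frame counts, queue egress utilization, waterlines) and the outcome (matching flows to queues), but not the method. The minimal sketch below shows one plausible greedy flow-to-queue policy driven by those signals; the names (Flow, QueueState, rebalance_flows) and the 400 GE normalization are assumptions for illustration, not Huawei's iLossless implementation.

```python
# Hedged illustration only: the iLossless internals are not disclosed on the slide.
from dataclasses import dataclass, field

@dataclass
class Flow:
    flow_id: str
    rate_gbps: float          # measured rate from per-flow detection
    service: str              # e.g. "ai-training", "hpc-database"

@dataclass
class QueueState:
    queue_id: int
    egress_util: float = 0.0  # 0.0 .. 1.0, from telemetry
    pfc_frames: int = 0       # PFC pause frames seen in the last interval
    waterline_kb: int = 512   # buffer threshold before marking/pausing
    flows: list = field(default_factory=list)

def tune_waterline(q: QueueState) -> None:
    """Lower the waterline under PFC pressure, raise it when the queue is idle."""
    if q.pfc_frames > 0 or q.egress_util > 0.9:
        q.waterline_kb = max(128, q.waterline_kb // 2)
    elif q.egress_util < 0.5:
        q.waterline_kb = min(2048, q.waterline_kb * 2)

def rebalance_flows(flows: list, queues: list) -> None:
    """Greedy matching: heaviest flows first, onto the least-utilized queue."""
    for q in queues:
        q.flows.clear()
        tune_waterline(q)
    for f in sorted(flows, key=lambda f: f.rate_gbps, reverse=True):
        target = min(queues, key=lambda q: q.egress_util)
        target.flows.append(f.flow_id)
        target.egress_util += f.rate_gbps / 400.0   # assumes a 400 GE port

if __name__ == "__main__":
    flows = [Flow("f1", 120, "ai-training"), Flow("f2", 80, "hpc-database"),
             Flow("f3", 20, "ai-training")]
    queues = [QueueState(0, 0.30, 0), QueueState(1, 0.70, 3)]
    rebalance_flows(flows, queues)
    for q in queues:
        print(q.queue_id, q.flows, q.waterline_kb)
```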

Embedded AI Chip

• Highly efficient AI chip: 8 TFLOPS of computing power.
• 25x the ML/DL running efficiency of a dual-channel high-performance CPU server.
• The iLossless algorithm runs on the best platform: the CloudEngine 16800.

Note: based on an ML/DL running efficiency comparison.

AI Services Require High-Density 400 GE Switches

• AI training for voice recognition: 300 million parameters synchronized in real time every 20 ms.
• Moore's Law: server performance doubles every 24 months, and data center traffic keeps increasing, pushing the switching requirement from 100 GE to 400 GE.
• 400 GE brought to market three years earlier than expected, with 800G evolution capability.

Industry's Highest Performance

CloudEngine 16800 vs. other vendors:

• 50% lower power consumption per bit of data
• 5x better performance per rack unit
• 48 x 400 GE per slot vs. 36 x 100 GE per slot
• 768 x 400 GE per chassis vs. 576 x 100 GE per chassis
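The per-slot figures behind the 5x claim work out as follows (plain arithmetic, not taken from the slide):

```latex
\[
\frac{48 \times 400\ \text{Gbit/s}}{36 \times 100\ \text{Gbit/s}}
  = \frac{19.2\ \text{Tbit/s}}{3.6\ \text{Tbit/s}} \approx 5.3
\]
```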

Breakthroughs in High-Density 400 GE Technology

• Efficient power supply (SuperPower): < 0.15 W/GE. High integration pushes currents from 250 A to 700 A; how can the size of the power distribution unit be controlled?
• Powerful heat dissipation (SuperCooling): PUE < 1.1. Power consumption density increases three-fold in the 400 GE era; how can efficient heat dissipation be achieved to meet data center PUE requirements?

SuperPower™ – Efficient Power Supply

Intelligent power modules with magnetic blow-out technology:
• Space occupied by power supplies: 49%
• Power supply efficiency per unit: 95%

SuperCooling™ – Card-Level Phase Change Heat Dissipation

Carbon nanotube thermal pads and phase change heat dissipation:
• Heat dissipation efficiency improved 4x
• Temperature lowered by 19°C
• Card reliability improved by 20%

SuperCooling™ – Heat Dissipation for the Entire System

• Industry's first mixed-flow fans: power consumption per bit reduced by 50%.
• Mute deflector ring: noise reduced by 6 dB.

Each switch saves 320,000 kWh of electricity and reduces carbon emissions by more than 250 tons per year, lowering electricity costs by CN¥260,000.
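As a back-of-the-envelope check, the three savings figures are mutually consistent under assumed conversion factors (roughly 0.8 kg CO2 per kWh and CN¥0.8 per kWh, neither of which is stated on the slide):

```latex
% Assumed factors: ~0.8 kg CO2/kWh grid emission factor, ~CN¥0.8/kWh tariff.
\begin{align*}
320{,}000\ \text{kWh} \times 0.8\ \text{kg CO}_2/\text{kWh} &\approx 256\ \text{t CO}_2\ \text{per year} \\
320{,}000\ \text{kWh} \times \text{CN¥}0.8/\text{kWh} &\approx \text{CN¥}256{,}000\ \text{per year} \\
320{,}000\ \text{kWh} \div 8{,}760\ \text{h} &\approx 36.5\ \text{kW of average power saved}
\end{align*}
```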

New Challenges Faced by Intelligent O&M in DCs

The load on the intelligent O&M platform is increasingly high across the collection → storage → analysis → decision-making pipeline:
• Telemetry data collection: 1000x more data
• Analysis servers: approx. 100
• Decision-making time: hours

Note: Statistics are based on a data center with 10,000 servers.



Enabling Autonomous Driving: Identifying Faults in Seconds, Locating Faults in Minutes

• Manual O&M: fault identification in minutes, fault location in hours.
• Intelligent O&M: fault identification in seconds, fault location in minutes.

* Calculated in known DC maintenance scenarios.

Three Key Factors of Autonomous Driving Networks

The distributed AI, two-level intelligent O&M architecture allows faults to be identified and located rapidly, realizing real-time self-healing. A sketch of the edge/central split follows this list.

• AI algorithms: 20+ iNetOps algorithms in the analytics and intelligence layer
• Telemetry: ms-level data collection from the DCN for intent and automation
• Edge AI: local inference and execution for 60% of faults, on the CloudEngine 16800
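The slide describes the two-level split only at a high level: edge AI on the switch handles roughly 60% of faults locally, and the rest are escalated for cross-device analysis. The sketch below is a hypothetical illustration of that split; EdgeAI, CentralAnalyzer, and the signature set are invented names, not Huawei software.

```python
# Hedged sketch of a two-level (edge + central) O&M split. The set of locally
# handled fault signatures is made up for illustration.
LOCAL_SIGNATURES = {"optical-power-low", "ecn-threshold-exceeded", "port-flap"}

class CentralAnalyzer:
    """Upper tier: cross-device correlation and intent-level decisions."""
    def analyze(self, event: dict) -> str:
        return f"central: correlating {event.get('signature')} across the fabric"

class EdgeAI:
    """Runs on the switch: ms-level telemetry in, local inference/execution out."""
    def __init__(self, central: CentralAnalyzer):
        self.central = central

    def handle_event(self, event: dict) -> str:
        sig = event.get("signature")
        if sig in LOCAL_SIGNATURES:
            # Local inference and execution, e.g. adjust a queue threshold or
            # shut down a flapping port, without a round trip to the analyzer.
            return f"edge: remediated {sig} locally"
        return self.central.analyze(event)

if __name__ == "__main__":
    edge = EdgeAI(CentralAnalyzer())
    print(edge.handle_event({"signature": "port-flap", "port": "100GE1/0/1"}))
    print(edge.handle_event({"signature": "unknown-latency-spike"}))
```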

Intelligent O&M Algorithms

• 72+ types of typical faults covered
• 90% automatic fault location rate

Algorithm families: Gaussian process regression, reinforcement learning, network correlation analysis, logistic regression, non-linear regression, decision-tree learning, DBSCAN, and more. A classification sketch follows this list.
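The slide lists algorithm families without saying how they are applied. As a purely illustrative example, the sketch below uses one of the named families (decision-tree learning, via scikit-learn) to map made-up telemetry features to made-up fault labels; none of the features, labels, or thresholds come from the slide.

```python
# Hypothetical "telemetry features -> fault type" classification sketch.
# Requires scikit-learn; data and labels are invented for illustration.
from sklearn.tree import DecisionTreeClassifier

# Features per sample: [crc_error_rate, pfc_pause_rate, queue_utilization]
X = [
    [0.20, 0.00, 0.30],   # dirty/failing optics
    [0.25, 0.01, 0.40],
    [0.00, 0.80, 0.95],   # congestion / incast
    [0.01, 0.70, 0.90],
    [0.00, 0.00, 0.10],   # healthy
    [0.00, 0.01, 0.20],
]
y = ["optical-fault", "optical-fault", "congestion", "congestion",
     "healthy", "healthy"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict([[0.18, 0.0, 0.35]]))   # expected: ['optical-fault']
print(clf.predict([[0.0, 0.75, 0.92]]))   # expected: ['congestion']
```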
CloudEngine 16800 Series Models and Cards

Key features: programmable core components, microsegmentation-based fine-grained isolation, NSH-based service chain orchestration.

CloudEngine 16800 vs. industry average:
• Slot bandwidth: 19.2 Tbit/s vs. 3.6 Tbit/s
• Port density: 48 x 400 GE vs. 36 x 100 GE
• Power consumption per data bit: 0.13 W vs. 0.27 W

Cards: 48 x 400 GE, 18/36 x 100 GE, 24 x 40 GE, 48 x 100 GE, 36 x 40 GE, 48 x 10 GE
Huawei’s Vision and Mission

Bring digital to every person, home and organization for a fully connected, intelligent world
