05 Huawei MindSpore AI Development Framework - ALEX
Foreword
2 Huawei Confidential
Objectives
Contents
1. Development Framework
   ▫ Architecture
   ▫ Key Features
How will deep learning develop?
[Figure: model parameter counts growing from 90M to 340M to 15B]

Positioning in the chip and framework stack (TBE operator development, CCE, CUDA, Ascend chips, third-party chips):
① Equivalent to open-source frameworks in the industry; MindSpore preferentially serves self-developed chips and cloud services.
② Supports upward interconnection with third-party frameworks and can interconnect with third-party ecosystems through Graph IR, including training frontends and inference models. Developers can extend the capabilities of MindSpore.
Overall Solution: Core Architecture

MindSpore
 Unified APIs for all scenarios. Easy development: AI Algorithm As Code.
 MindSpore intermediate representation (IR) for computational graphs. Efficient execution: optimized for Ascend, with GPU support.
 Deep graph optimization, on-device execution, and pipeline parallelism.
MindSpore Design: Automatic Differentiation

[Figure: technical paths of automatic differentiation]
Example of Differentiation
Graph-Based AD

[Figure: forward graph (x, y → mul → add → pow → f) and its backward graph (brop_pow → brop_mul, brop_add)]

Pros:
1. Static graph: easy to optimize and parallelize

Cons:
1. Extra learning cost
2. Easily causes the problem of graph expansion
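As a pure-Python illustration of the graph idea above (a toy sketch, not MindSpore code; all names are invented for the example), forward operators can record a small graph whose nodes carry their own backward rules, mirroring the brop_* boxes in the figure. We differentiate f = (x*y + x)**2:

```python
# Toy graph-based reverse-mode AD: each forward op builds a Node that
# remembers its parents and the local backward rule for each parent.

class Node:
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

def mul(a, b):
    return Node(a.value * b.value, [(a, lambda g: g * b.value),
                                    (b, lambda g: g * a.value)])

def add(a, b):
    return Node(a.value + b.value, [(a, lambda g: g), (b, lambda g: g)])

def pow2(a):
    return Node(a.value ** 2, [(a, lambda g: g * 2 * a.value)])

def backward(node, g=1.0):
    node.grad += g                       # accumulate incoming gradient
    for parent, brop in node.parents:    # apply each local backward rule
        backward(parent, brop(g))

x, y = Node(3.0), Node(2.0)
f = pow2(add(mul(x, y), x))              # f = (x*y + x)**2 = 81
backward(f)
print(f.value, x.grad, y.grad)           # 81.0 54.0 54.0
```

Because the graph is recorded explicitly, it can be inspected and optimized before execution, which is the "easy to optimize and parallelize" advantage; the cost is that users must learn the graph abstraction.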
Source Code Transformation

[Figure: MindSpore's source-code-transformation approach to automatic differentiation]
Auto Parallelism

[Figure: the network is automatically partitioned; Subgraph 1 (MatMul, Dense) and Subgraph 2 (MatMul, Dense) are scheduled across CPUs and Ascend devices]
Segmentation Strategy Search

Highlight 2: high-efficiency search algorithm

[Figure: candidate operator segmentation strategies are fed through a cost model; rearrangement and strategy simplification prune the search space, and dynamic programming over the target function selects the final strategy]
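The dynamic-programming idea behind the strategy search can be sketched in miniature (hypothetical costs and helper names; this is not the real MindSpore cost model): pick one sharding strategy per operator in a chain so that compute cost plus the rearrangement (resharding) cost between adjacent operators is minimized.

```python
# Minimal DP over an (operator, strategy) grid: best[s] holds the cheapest
# total cost of a plan that ends with strategy s at the current operator.

def search(compute_cost, reshard_cost):
    """compute_cost[i][s]: cost of op i under strategy s;
    reshard_cost[a][b]: cost of switching strategy a -> b between ops."""
    n, k = len(compute_cost), len(compute_cost[0])
    best = list(compute_cost[0])
    choice = []
    for i in range(1, n):
        prev, new = [], []
        for s in range(k):
            costs = [best[p] + reshard_cost[p][s] for p in range(k)]
            p = min(range(k), key=costs.__getitem__)
            prev.append(p)
            new.append(costs[p] + compute_cost[i][s])
        choice.append(prev)
        best = new
    s = min(range(k), key=best.__getitem__)
    total = best[s]
    plan = [s]                       # backtrack the cheapest sequence
    for prev in reversed(choice):
        s = prev[s]
        plan.append(s)
    return total, plan[::-1]

# Two strategies (0: split rows, 1: split cols) over three operators.
cost, plan = search(compute_cost=[[4, 1], [4, 1], [1, 4]],
                    reshard_cost=[[0, 3], [3, 0]])
print(cost, plan)                    # 6 [1, 1, 0]
```

Real auto-parallel search works over full graphs rather than chains, but the trade-off it balances is the same: cheap per-operator strategies versus the rearrangement cost of switching between them.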
On-Device Execution (1)

Challenges
Challenges for model execution with extreme chip computing power: the memory wall, high interaction overhead, and difficult data supply. Some operations run on the host while others run on the device; the interaction overhead can far exceed the execution overhead, resulting in low accelerator utilization.

Key Technologies
Chip-oriented deep graph optimization reduces synchronization waiting time and maximizes the parallelism of data, computing, and communication. Data preprocessing and computation are integrated into the Ascend chip.

[Figure: a CPU/GPU-scheduled graph (conv, bn, relu6, add, dwconv blocks) with data copies, conditional jump tasks, and dependency notification tasks suffers large data interaction overhead and difficult data supply; on-device scheduling pipelines kernel1, kernel2, ...]

Effect: elevates training performance tenfold compared with on-host graph scheduling.
On-Device Execution (2)

Challenges
Challenges for distributed gradient aggregation with extreme chip computing power: the synchronization overhead of central control and the communication overhead of frequent synchronization, for ResNet-50 at a single-iteration time of 20 ms. The traditional method needs three rounds of synchronization to complete All Reduce, while the data-driven method performs All Reduce autonomously without incurring control overhead.

Key Technologies
Adaptive graph segmentation optimization driven by gradient data realizes decentralized All Reduce and synchronizes gradient aggregation, boosting computing and communication efficiency.

[Figure: 1024 workers; a leader-based Broadcast/All Gather scheme versus decentralized All Reduce among devices for gradient synchronization]

Effect: a tail synchronization overhead of less than 2 ms.
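The decentralized All Reduce mentioned above can be illustrated with the classic ring algorithm, simulated here in plain Python (an illustrative sketch, not MindSpore internals): each device exchanges gradient chunks only with its ring neighbor, yet every device ends up holding the full gradient sum, with no central leader.

```python
# Ring All Reduce simulation: reduce-scatter, then all-gather.

def ring_all_reduce(grads):
    """grads: one gradient list per device; length divisible by #devices."""
    n, chunk = len(grads), len(grads[0]) // len(grads)

    def span(c):                             # index range of chunk c
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. After n-1 steps, device d holds the fully
    # reduced chunk (d+1) % n.
    for step in range(n - 1):
        for d in range(n):
            c = (d - step) % n
            for i in span(c):
                grads[(d + 1) % n][i] += grads[d][i]

    # Phase 2: all-gather. Pass the reduced chunks around the ring.
    for step in range(n - 1):
        for d in range(n):
            c = (d + 1 - step) % n
            for i in span(c):
                grads[(d + 1) % n][i] = grads[d][i]
    return grads

devices = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [5, 6, 7, 8]]
expected = [sum(col) for col in zip(*devices)]       # [116, 228, 340, 452]
out = ring_all_reduce([g[:] for g in devices])
print(all(g == expected for g in out))               # True
```

Each device sends and receives the same amount of data per step regardless of the worker count, which is why decentralized schemes scale better than a leader that must talk to all 1024 workers.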
Distributed Device-Edge-Cloud Synergy Architecture
Effect: consistent model deployment performance across all scenarios thanks to the unified
architecture, and improved precision of personalized models
Contents
1. Development Framework
   ▫ Architecture
   ▫ Features
AI Computing Framework: Challenges
New Programming Paradigm

[Figure: with other frameworks, an algorithm scientist plus an experienced system developer write about 2,550 LOC; with MindSpore, an algorithm scientist alone writes about 2,050 LOC, with one-line automatic parallelism]
Code Example

TensorFlow code snippet: XX lines, manual parallelism. MindSpore code snippet: two lines, automatic parallelism.

class DenseMatMulNet(nn.Cell):
    def __init__(self):
        super(DenseMatMulNet, self).__init__()
        self.matmul1 = ops.MatMul().set_strategy(((4, 1), (1, 1)))
        self.matmul2 = ops.MatMul().set_strategy(((1, 1), (1, 4)))

    def construct(self, x, w, v):
        y = self.matmul1(x, w)
        z = self.matmul2(y, v)
        return z
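To see what a strategy such as ((4, 1), (1, 1)) means numerically, here is a pure-Python sketch (hypothetical helper names, not MindSpore code) that splits the left matrix into four row blocks, replicates the weight, and checks that the sharded result matches the unsharded matmul:

```python
# Strategy ((4, 1), (1, 1)) for y = x @ w: x is split into 4 row blocks,
# w is replicated, and each "device" computes its own slice of y.

def matmul(a, b):
    """Plain dense matrix multiply over nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

x = [[i + j for j in range(3)] for i in range(8)]            # 8x3 input
w = [[(i * 3 + j) % 5 for j in range(2)] for i in range(3)]  # 3x2 weight

# Shard x row-wise into 4 blocks of 2 rows each; w is replicated.
shards = [x[i:i + 2] for i in range(0, 8, 2)]
y_parallel = [row for shard in shards for row in matmul(shard, w)]

assert y_parallel == matmul(x, w)   # same result as the unsharded matmul
```

With the strategy declared on the operator, the framework performs this splitting and the device placement automatically; the model code itself stays serial.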
New Execution Mode (1)
New Execution Mode (2)

[Figure: ResNet-50 v1.5 on ImageNet 2012, with the best batch size for each card: 1,802 images/second versus 965 images/second; detecting objects in 60 ms, tracking objects in 5 ms]
New Collaboration Mode

Development vs. deployment: varied requirements, objectives, and constraints across device, edge, and cloud application scenarios.
Execution vs. saving the model.
High Performance
Cross-layer memory
Vision and Value

 Efficient development: previously, profound expertise in algorithms and programming was required.
 Flexible deployment: unified development replaces long deployment cycles.
 Outstanding performance: across diverse computing units and models (CPU + NPU, Graph + Matrix).
Contents
1. Development Framework
Installing MindSpore
Method 1: source code compilation and installation
Installation commands:
Getting Started

In MindSpore, data is stored in tensors. Common tensor operations include asnumpy() and __str__ (conversion into strings).

Module         Description
model_zoo      Defines common network models.
communication  Collective communication module.
dataset        Data loading module, which defines the dataloader and dataset and processes data such as images and texts.
train          Training model and summary function modules.
Programming Concept: Operation

Softmax operator definition:
1. Name
2. Base class
3. Comment
4. Attributes of the operator are initialized here.
5. The shape of the output tensor can be derived based on the input parameter shapes of the operator.
6. The data type of the output tensor can be derived based on the data types of the input parameters.

Common operations in MindSpore:
 array: array-related operators
  ExpandDims, Squeeze, Concat, OnesLike, Select, StridedSlice, ScatterNd ...
 math: math-related operators
  AddN, Cos, Sub, Sin, Mul, LogicalAnd, MatMul, LogicalNot, RealDiv, Less, ReduceMean, Greater ...
 nn: network operators
  Conv2D, MaxPool, Flatten, AvgPool, Softmax, TopK, ReLU, SoftmaxCrossEntropy, Sigmoid, SmoothL1Loss, Pooling, SGD, BatchNorm, SigmoidCrossEntropy ...
 control: control operators
  ControlDepend
 random: random operators
Programming Concept: Cell

A cell defines the basic module for calculation. Cell objects can be executed directly.
 __init__: initializes and verifies modules such as parameters, cells, and primitives.
 construct: defines the execution process. In graph mode, a graph is compiled for execution and is subject to certain syntax restrictions.
 bprop (optional): the customized backward computation of the module. If this function is undefined, automatic differentiation is used to compute the backward of the construct part.
Cells predefined in MindSpore mainly include: common losses (SoftmaxCrossEntropyWithLogits and MSELoss), common optimizers (Momentum, SGD, and Adam), and common network wrappers, such as TrainOneStepCell (network gradient calculation and update) and WithGradCell (gradient calculation).
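The __init__/construct contract can be sketched with a plain-Python stand-in (a hypothetical toy; the real base class is mindspore.nn.Cell and does far more, including graph compilation):

```python
# Minimal stand-in for the Cell contract: parameters are set up in
# __init__, forward logic lives in construct, and executing the object
# runs construct.

class Cell:
    def __call__(self, *args):
        return self.construct(*args)

class ScaleShift(Cell):
    """y = a * x + b, expressed as a tiny custom cell."""
    def __init__(self, a, b):
        self.a, self.b = a, b          # "parameters" initialized here
    def construct(self, x):
        return self.a * x + self.b     # execution process defined here

net = ScaleShift(2.0, 1.0)
print(net(3.0))                        # 7.0
```

In MindSpore, keeping all forward logic inside construct is what allows graph mode to compile the method into an optimizable graph.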
Programming Concept: MindSporeIR
MindSporeIR is a compact, efficient, and
flexible graph-based functional IR that can
represent functional semantics such as free
variables, high-order functions, and
recursion. It is a program carrier in the
process of AD and compilation optimization.
Each graph represents a function definition
graph and consists of ParameterNode,
ValueNode, and ComplexNode (CNode).
The figure shows the def-use relationship.
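The node taxonomy and def-use bookkeeping can be sketched in a few lines of Python (a toy model, not the actual MindSporeIR data structures; all class and field names here are invented):

```python
# Tiny graph-based functional IR: ParameterNode, ValueNode, and CNode,
# with each CNode recording def-use edges back to its inputs.

class Node:
    def __init__(self, name):
        self.name, self.users = name, []

class ParameterNode(Node):             # formal parameter of the graph
    pass

class ValueNode(Node):                 # embedded constant
    def __init__(self, name, value):
        super().__init__(name)
        self.value = value

class CNode(Node):                     # application: op(inputs...)
    def __init__(self, name, op, inputs):
        super().__init__(name)
        self.op, self.inputs = op, inputs
        for i in inputs:               # record the def-use relationship
            i.users.append(self)

# Build the graph for f(x) = x * x + 1
x = ParameterNode("x")
one = ValueNode("1", 1)
sq = CNode("mul", lambda a, b: a * b, [x, x])
f = CNode("add", lambda a, b: a + b, [sq, one])

print([u.name for u in x.users])       # x is used twice, by the mul node
```

Explicit def-use edges like these are what let a compiler pass find every use of a definition in constant time, which is central to AD and graph optimization.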
Development Case
Let’s take the recognition of MNIST handwritten digits as an example to
demonstrate the modeling process in MindSpore.
Data → Network → Model → Application
Model Evaluation

net = LeNet5()
loss = nn.loss.SoftmaxCrossEntropyWithLogits(is_grad=False, sparse=True, reduction='mean')
opt = nn.Momentum(net.trainable_params(), lr, momentum)
loss_cb = LossMonitor(per_print_times=ds_train.get_dataset_size())

LeNet5 tutorial: https://gitee.com/mindspore/course/tree/master/checkpoint
Checkpoint
Checkpoint tutorial: https://gitee.com/mindspore/course/tree/master/checkpoint
Model Inference

• Use MindSpore for inference:
  > Load the checkpoint into your net.
  > Call model.predict for inference.

CKPT_2 = 'ckpt/lenet_1-2_1875.ckpt'

def infer(data_dir):
    ds = create_dataset(data_dir, training=False).create_dict_iterator()
    data = ds.get_next()
    images = data['image']
    labels = data['label']
    net = LeNet5()
    load_checkpoint(CKPT_2, net=net)
    model = Model(net)
    output = model.predict(Tensor(images))
    preds = np.argmax(output.asnumpy(), axis=1)
    return preds, labels
Parameter                  Attribute  Description                    Type  Default  Range
--help                     optional   Displays command help.         -     -        -
--model_path <MODEL_PATH>  required   Sets the model path.           str   empty    -
--model_name <MODEL_NAME>  required   Sets the model name.           str   empty    -
--port <PORT>              optional   Sets the service port number.  int   5500     1-65535
--device_id <DEVICE_ID>    optional   Sets the device ID.            int   0        0-7

https://www.mindspore.cn/tutorial/zh-CN/r0.6/advanced_use/serving.html
Interfaces Comparison

MindSpore shares a similar API and code style with tf.keras and PyTorch, making it easy to use and migrate to.

TensorFlow (tf.keras):

class Net(tf.keras.Model):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D()
        self.relu = tf.keras.layers.ReLU()
        self.fc1 = tf.keras.layers.Dense()

    def call(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.fc1(x)
        return x

model = Net()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
opt = tf.keras.optimizers.Adam()
data = tf.keras.datasets.cifar10.load_data()
model.compile(optimizer=opt, loss=loss_fn, metrics=['accuracy'])
model.fit(data[0], data[1], 5)
model.evaluate(data[0], data[1])

MindSpore:

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d()
        self.relu = nn.ReLU()
        self.fc1 = nn.Dense()

    def construct(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.fc1(x)
        return x

net = Net()
loss_fn = nn.loss.SoftmaxCrossEntropyWithLogits()
opt = nn.Momentum()
dataset = dataset.Cifar10Dataset()
model = train.Model(net, loss_fn, opt, metrics={'acc'})
model.train(5, dataset)
model.eval(dataset)

PyTorch:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d()
        self.fc1 = nn.Linear()

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.fc1(x)
        return x

net = Net()
loss_fn = nn.CrossEntropyLoss()
opt = optim.SGD(momentum=0.9)

dataset = torchvision.datasets.CIFAR10()
loader = torch.utils.data.DataLoader(dataset)

for epoch in range(5):
    for i, data in enumerate(loader, 0):
        opt.zero_grad()
        outputs = net(data[0])
        loss = loss_fn(outputs, data[1])
        loss.backward()
        opt.step()

correct, total = 0, 0
with torch.no_grad():
    for data in loader:
        outputs = net(data[0])
        _, pred = torch.max(outputs.data, 1)
        total += data[1].size(0)
        correct += (pred == data[1]).sum().item()
acc = 100 * correct / total
Quiz
B. Network
C. Control
D. Others
Summary
This chapter describes the framework, design, and features of MindSpore, as well as its environment setup process and development procedure.
More Information
TensorFlow: https://www.tensorflow.org/
PyTorch: https://pytorch.org/
MindSpore: https://www.mindspore.cn/en
Thank you.

Bring digital to every person, home, and organization for a fully connected, intelligent world.