Practical Techniques for Machine Learning Automation and Optimization on AWS

남궁영환, Data Scientist SA, Amazon Web Services
김대근, Data Scientist SA, Amazon Web Services
Agenda
• AI/ML at AWS
• Summary
AWS ML Stack: the deepest and broadest set of capabilities and technologies
AI SERVICES

AMAZON SAGEMAKER (ML developers and data scientists)
• Pre-built algorithms & notebooks
• Data labeling (GROUND TRUTH)
• One-click model training & tuning
• Optimization (NEO)
• One-click deployment & hosting
• Reinforcement learning
• Algorithms & models (AWS MARKETPLACE FOR MACHINE LEARNING)

ML FRAMEWORKS & INFRASTRUCTURE (ML researchers and academics)
• EC2 P3 & P3dn
• EC2 C5
• FPGAs
• GREENGRASS
• ELASTIC INFERENCE
• INFERENTIA
Scaling TensorFlow near-linearly to 256 GPUs (2018)

                                    Stock TensorFlow   AWS-Optimized TensorFlow
Scaling efficiency with 256 GPUs:   65%                90%
Training time:                      30 min             14 min

Available in Amazon SageMaker and the AWS Deep Learning AMIs.
https://aws.amazon.com/about-aws/whats-new/2018/11/tensorflow-scalability-to-256-gpus/
Why large-scale machine learning matters (1/3)
• Andrew Ng (https://www.slideshare.net/ExtractConf)
• Uber (https://eng.uber.com/horovod/)
Why large-scale machine learning matters (2/3)
Scaling to Very Very Large Corpora for Natural Language Disambiguation, Banko and Brill, Microsoft Research (2001)
http://www.aclweb.org/anthology/P01-1005
Why large-scale machine learning matters (3/3)

The right approach to large-scale machine learning depends on the problem and how you tackle it.

• Common goals
  ✓ Compute, networking, containers, distributed training performance tuning, . . .
  ✓ Let ML engineers focus on building models that contribute to business success, using their preferred ML/DL framework
• Data management
  ✓ Data volume ∝ complexity of the problem and the algorithm
  ✓ Data durability and availability
Where to train and deploy deep learning models
“Choose the ML/DL model training and deployment environment that fits the workload you are trying to solve.”

Amazon EC2 | Amazon Elastic Container Service | Amazon Elastic Container Service for Kubernetes
P3 instance

3 instance types (P3.2xlarge, P3.8xlarge, P3.16xlarge), available in 14 regions
Well suited to workloads that require large-scale parallel processing
• Machine learning model training
• Supports every ML framework and model type
• Available through a variety of purchasing options (up to 70% cost savings with Spot Instances; see the sketch after the link below)
https://aws.amazon.com/ko/ec2/instance-types/p3/
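As a rough illustration of the Spot savings mentioned above, the sketch below requests a p3.2xlarge Spot Instance with boto3. The AMI ID, key pair, and security group are placeholders (not from the original slide), and a real training job would add interruption handling and checkpointing.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",            # placeholder: a Deep Learning AMI ID
    InstanceType="p3.2xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",              # placeholder key pair
    SecurityGroupIds=["sg-xxxxxxxx"],  # placeholder security group
    InstanceMarketOptions={            # request Spot capacity instead of On-Demand
        "MarketType": "spot",
        "SpotOptions": {"SpotInstanceType": "one-time"},
    },
)
print(response["Instances"][0]["InstanceId"])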
P3dn.24xlarge instance
https://aws.amazon.com/ko/ec2/instance-types/p3/#Amazon_EC2_P3dn.24xlarge_Instances
Amazon FSx for Lustre
(https://aws.amazon.com/ko/fsx/lustre/)
Infrastructure for ML on AWS (1/3)

[Architecture diagram: multi-node parallel deep learning in a placement group. Auto Scaling worker nodes run the deep learning application stack (multi-node TensorFlow) with the kernel driver and images from a container registry; cluster-wide persistent storage (Amazon EFS, Lustre) is hydrated with the training data; a bastion host, a BeeGFS management node, and cluster monitoring complete the cluster.]
Infrastructure for ML on AWS (2/3)
https://aws.amazon.com/ko/blogs/compute/distributed-deep-learning-made-easy/
Infrastructure for ML on AWS (3/3)

[Architecture diagram: a cloud-native AWS deep learning cluster, with an Amazon CloudWatch event trigger in the AWS Cloud driving NVIDIA GPU-backed instances running containers.]
https://aws.amazon.com/ko/blogs/compute/scalable-deep-learning-training-using-multi-node-parallel-jobs-with-aws-batch-and-amazon-fsx-for-lustre/
Horovod (1/9)
• An open-source framework for distributed deep learning
• Works with stock TensorFlow, Keras, PyTorch, and more
• Easy, simple installation: `pip install horovod`
• Advanced algorithms available
• Supports high-performance networking (RDMA, GPUDirect)
• Separates ML engineers from the infrastructure
  ✓ The infrastructure team provides the container and MPI environment
  ✓ ML engineers use their preferred deep learning framework
  ✓ Shared expectations for distributed training on top of the framework (infrastructure team & ML engineers)
horovod.ai
https://eng.uber.com/horovod/
Horovod (2/9)

• Ring-AllReduce, synchronous updates
  [Diagram: workers A, B, and C pass gradient slices around a ring, accumulating partial sums until every worker holds the fully reduced values]
  ✓ HOROVOD_HIERARCHICAL_ALLREDUCE=1
  ✓ Tensor Fusion
    HOROVOD_FUSION_THRESHOLD=67108864
    HOROVOD_CYCLE_TIME=5
  ✓ FP16 all-reduce
    hvd.DistributedOptimizer(..., compression=hvd.Compression.fp16)
(A configuration sketch follows the reference link below.)
https://eng.uber.com/horovod/
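As a hedged sketch of how these knobs are typically wired together in a training script (the environment variable values are the ones shown on the slide; in multi-node runs they are usually exported through mpirun/horovodrun instead, and the momentum value here is an assumed placeholder):

import os

# Horovod performance knobs from the slide; set before Horovod starts all-reducing.
os.environ["HOROVOD_HIERARCHICAL_ALLREDUCE"] = "1"
os.environ["HOROVOD_FUSION_THRESHOLD"] = "67108864"  # fuse tensors up to 64 MB
os.environ["HOROVOD_CYCLE_TIME"] = "5"               # fusion cycle time in ms

import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

# Scale the learning rate by the number of workers, as in the later slides.
opt = tf.train.MomentumOptimizer(learning_rate=0.01 * hvd.size(), momentum=0.9)

# FP16 compression halves all-reduce traffic at a small numerical cost.
opt = hvd.DistributedOptimizer(opt, compression=hvd.Compression.fp16)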
Horovod (3/9)

1. Initialize the library

   import horovod.tensorflow as hvd
   hvd.init()

3. Scale the learning rate and add the Horovod distributed optimizer

   opt = tf.train.MomentumOptimizer(lr=0.01 * hvd.size())
   opt = hvd.DistributedOptimizer(opt)

4. Synchronize the initial state between workers

   hooks = [hvd.BroadcastGlobalVariablesHook(0)]
   with tf.train.MonitoredTrainingSession(hooks=hooks, ...) as mon_sess:
       ...
   # OR
   bcast_op = hvd.broadcast_global_variables(0)
   sess.run(bcast_op)

* Horovod for TensorFlow, Keras, and PyTorch

   import horovod.tensorflow as hvd
   import horovod.keras as hvd
   import horovod.tensorflow.keras as hvd
   import horovod.torch as hvd
   # more frameworks coming
Horovod (6/9)
[Reference] Example code: Estimator API

import tensorflow as tf
import horovod.tensorflow as hvd

# Initialize Horovod
hvd.init()

# Pin GPU to be used
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

# Build model...
def model_fn(features, labels, mode):
    loss = ...
    opt = tf.train.MomentumOptimizer(lr=0.01 * hvd.size())

    # Add Horovod Distributed Optimizer
    opt = hvd.DistributedOptimizer(opt)
    return tf.estimator.EstimatorSpec(...)

# Broadcast initial variable state.
hooks = [hvd.BroadcastGlobalVariablesHook(0)]

# Only checkpoint on rank 0
ckpt_dir = "/tmp/train_logs" if hvd.rank() == 0 else None

# Create the Estimator
mnist_classifier = tf.estimator.Estimator(
    model_fn=cnn_model_fn,
    model_dir=ckpt_dir,
    config=tf.estimator.RunConfig(session_config=config))

mnist_classifier.train(
    input_fn=train_input_fn,
    steps=100,
    hooks=hooks)
( source code from https://github.com/horovod/horovod )
Horovod (7/9)
[Reference] Example code: Horovod for MXNet

import mxnet as mx
import horovod.mxnet as hvd
from mxnet import autograd

# Initialize Horovod
hvd.init()

# Pin GPU to be used to process local rank
context = mx.gpu(hvd.local_rank())
num_workers = hvd.size()

# Build model
model = ...
model.hybridize()

# Create optimizer
optimizer_params = ...
opt = mx.optimizer.create('sgd', **optimizer_params)

# Initialize parameters
model.initialize(initializer, ctx=context)

# Fetch and broadcast parameters
params = model.collect_params()
if params is not None:
    hvd.broadcast_parameters(params, root_rank=0)

# Create DistributedTrainer, a subclass of gluon.Trainer
trainer = hvd.DistributedTrainer(params, opt)

# Create loss function
loss_fn = ...

# Train model
for epoch in range(num_epoch):
    train_data.reset()
    for nbatch, batch in enumerate(train_data, start=1):
        data = batch.data[0].as_in_context(context)
        label = batch.label[0].as_in_context(context)
        with autograd.record():
            output = model(data.astype(dtype, copy=False))
            loss = loss_fn(output, label)
        loss.backward()
        trainer.step(batch_size)
( source code from https://github.com/horovod/horovod )
Horovod (8/9)
[Reference] Example code: Horovod for Keras

import keras
from keras import backend as K
import tensorflow as tf
import horovod.keras as hvd

# Initialize Horovod
hvd.init()

# Pin GPU to be used
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())
K.set_session(tf.Session(config=config))

# Build model...
model = ...
opt = keras.optimizers.Adadelta(lr=1.0 * hvd.size())

# Add Horovod Distributed Optimizer.
opt = hvd.DistributedOptimizer(opt)

model.compile(
    loss='categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy'])

# Broadcast initial variable state.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

...
model.fit(
    x_train,
    y_train,
    callbacks=callbacks,
    epochs=10,
    validation_data=(x_test, y_test))
( source code from https://github.com/horovod/horovod )
Horovod (9/9)
[Reference] Example code: Horovod for PyTorch

import torch
import horovod.torch as hvd

# Initialize Horovod
hvd.init()

# Horovod: pin GPU to local rank
torch.cuda.set_device(hvd.local_rank())

# Build model...
model = Net()
model.cuda()
optimizer = optim.SGD(model.parameters())

# Horovod: broadcast parameters
hvd.broadcast_parameters(
    model.state_dict(),
    root_rank=0)

for epoch in range(100):
    for batch_idx, (data, target) in ...:
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
Scaling performance using distributed training
TensorFlow & Horovod on Amazon EC2
Scaling performance using distributed training
TensorFlow & Horovod on Amazon EC2
https://docs.aws.amazon.com/ko_kr/dlami/latest/devguide/tutorial-horovod-tensorflow.html
Scaling performance using distributed training
TensorFlow & Horovod on Amazon EC2

Training using P3 instances: time-to-train of 47 min ~ 50 min
[Chart: throughput in Images/Second (y-axis up to 50,000) versus number of GPUs (1 to 64)]

Configuration (ResNet-50 & ImageNet)
• 8 * P3.16xlarge instances
• DL Framework: TensorFlow, MXNet
• ML model: ResNet-50
• Dataset: ImageNet (1.2 million images)
• Top-1 validation accuracy: 76%
https://aws.amazon.com/blogs/machine-learning/scalable-multi-node-deep-learning-training-using-gpus-in-the-aws-cloud/
Scaling performance using distributed training
TensorFlow & Horovod on Amazon EC2
• 32 * P3.16xlarge instances
• DL Framework: TensorFlow
• ML model: ResNet-50
• Dataset: ImageNet
• Top-1 validation accuracy: 75.4%
• Top-5 validation accuracy: 92.6%
https://aws.amazon.com/ko/blogs/machine-learning/scalable-multi-node-training-with-tensorflow/
Optimizing distributed deep learning performance on Amazon EKS (1/11)
https://aws.amazon.com/ko/quickstart/architecture/amazon-eks/
Optimizing distributed deep learning performance on Amazon EKS (2/11)
Using Horovod in Amazon EKS
https://docs.aws.amazon.com/dlami/latest/devguide/deep-learning-containers-eks-tutorials-distributed-gpu-training.html
• STEP 4. Create an MPI Job template; define the number of nodes (replicas) and
the number of GPUs each node has (gpusPerReplica)
Optimizing distributed deep learning performance on Amazon EKS (3/11)
EKS Deep Learning Benchmark Utility
https://github.com/aws-samples/aws-eks-deep-learning-benchmark
Optimizing distributed deep learning performance on Amazon EKS (4/11)
EKS Deep Learning Benchmark Utility
https://github.com/aws-samples/aws-eks-deep-learning-benchmark
• Setup NFS
  kubectl create -f deploy/benchmark-nfs-svc.yaml
  kubectl get svc benchmark-nfs-svc -o=jsonpath={.spec.clusterIP}

• Run the benchmark jobs
  1. Update your workflow setting using the ks command
  2. Update the benchmark workflow manifest directly

  s3ResultPath: 's3://kubeflow-pipeline-data/benchmark/',
  s3DatasetPath: 's3://eks-dl-benchmark/imagenet/',
  clusterConfig: 's3://kubeflow-pipeline-data/benchmark/cluster_config.yaml',
  experiments: [{
    experiment: 'experiment-20190415-01',
    trainingJobConfig: 's3://kubeflow-pipeline-data/benchmark/mpi-job-imagenet.yaml',
    trainingJobPkg: 'mpi-job',
    trainingJobPrototype: 'mpi-job-custom',
    // Change to upstream once https://github.com/kubeflow/kubeflow/pull/3062 is merged
    trainingJobRegistry: 'github.com/jeffwan/kubeflow/tree/make_kubebench_reporter_optional/kubeflow',
  }],
  githubSecretName: 'github-token',
  githubSecretTokenKeyName: 'GITHUB_TOKEN',
  image: 'seedjeffwan/benchmark-runner:20190424',
  name: '20190424-00',
  namespace: 'default',
  nfsVolume: 'benchmark-pv',
  nfsVolumeClaim: 'benchmark-pvc',
  region: 'us-west-2',
  trainingDatasetVolume: 'dataset-claim',
  s3SecretName: 'aws-secret',
  s3SecretAccesskeyidKeyName: 'AWS_ACCESS_KEY_ID',
  s3SecretSecretaccesskeyKeyName: 'AWS_SECRET_ACCESS_KEY',
  storageBackend: 'fsx',
  kubeflowRegistry: 'github.com/jeffwan/kubeflow/tree/make_kubebench_reporter_optional/kubeflow'
Optimizing distributed deep learning performance on Amazon EKS (6/11)
EKS Deep Learning Benchmark Utility
https://github.com/aws-samples/aws-eks-deep-learning-benchmark
Optimizing distributed deep learning performance on Amazon EKS (7/11)
( Amazon EKS + Kubeflow + AWS FSx CSI driver )
• Kubernetes
  ✓ Supports a wide range of container-based ML/DL frameworks
  ✓ Elastic and easy to scale
  ✓ Increasingly adopted as a training environment for deep neural networks
• Amazon EKS
  ✓ Fully managed Kubernetes service
  ✓ Runs Kubernetes workloads easily on EC2 P2 and P3 instances
• Kubeflow
  ✓ A Kubernetes-native platform for efficiently developing, managing, and deploying machine learning workloads
  ✓ Supports distributed training (native TensorFlow architecture or MPI AllReduce (NVIDIA NCCL library or Horovod))
https://aws.amazon.com/ko/blogs/opensource/optimizing-distributed-deep-learning-performance-amazon-eks/
Optimizing distributed deep learning performance on Amazon EKS (8/11)
( Amazon EKS + Kubeflow + AWS FSx CSI driver )
https://aws.amazon.com/ko/blogs/opensource/optimizing-distributed-deep-learning-performance-amazon-eks/
Optimizing distributed deep learning performance on Amazon EKS (9/11)
( Amazon EKS + Kubeflow + AWS FSx CSI driver )

Amazon EKS-optimized AMI with GPU support
• Kubernetes v1.11.8
• MPI Operator Alpha from Kubeflow 0.4.1
• CUDA 10 with NVIDIA Tesla 410.104 driver
• Docker 18.06.1-ce (incl. nvidia-docker2)

Amazon FSx for Lustre filesystem (for ImageNet TFRecords)
• FSx CSI Driver v0.1
• Hydrated from an S3 bucket

TensorFlow (customized image)
• TENSORFLOW_VERSION: v1.13.1
• HOROVOD_VERSION: 0.16.0
• CUDNN_VERSION: 7.4.2.24-1+cuda10.0
• NCCL_VERSION: 2.4.2-1+cuda10.0
• OPENMPI 4.0.0

Dataset (ImageNet)
• 1.28 million images (1,000 classes)
• 1,024 training files & 128 validation files (TFRecords)

Relevant tools
• awscli, eksctl, ksonnet, and aws-iam-authenticator
https://aws.amazon.com/ko/blogs/opensource/optimizing-distributed-deep-learning-performance-amazon-eks/
Optimizing distributed deep learning performance on Amazon EKS (10/11)
( Amazon EKS + Kubeflow + AWS FSx CSI driver )
✓ Use the latest version of the AWS VPC CNI plugin (so that all NICs in the EKS cluster use jumbo frames by default)
https://aws.amazon.com/ko/blogs/opensource/optimizing-distributed-deep-learning-performance-amazon-eks/
Optimizing distributed deep learning performance on Amazon EKS (11/11)
( Amazon EKS + Kubeflow + AWS FSx CSI driver )
✓ MPI processor
✓ Thread pool adjustment and CPU performance tuning
https://aws.amazon.com/ko/blogs/opensource/optimizing-distributed-deep-learning-performance-amazon-eks/
Distributed TensorFlow training in Amazon SageMaker (1/6)
Amazon SageMaker
• Amazon SageMaker provides pre-built TensorFlow containers (TensorFlow v1.11+)
• Configure the hardware resources and hyperparameters for ML model training
• Training instances: a cost-effective, automated cluster for ML model training
• Approaches for distributed training
  ✓ TensorFlow's native parameter server (TF v1.11+)
  ✓ Horovod (TF v1.12+)
https://aws.amazon.com/ko/blogs/machine-learning/launching-tensorflow-distributed-training-easily-with-horovod-or-parameter-servers-in-amazon-sagemaker/
Distributed TensorFlow training in Amazon SageMaker (2/6)
Parameter servers
• Multiple dedicated processes that
  ✓ Collect gradients (computed by “worker” processes)
  ✓ Aggregate gradients
  ✓ Distribute the updated gradients back to the workers asynchronously
  ✓ Use an all-to-all communication model
• In Amazon SageMaker
  ✓ No need to set up and manage the parameter server cluster manually
  ✓ A built-in script mode option (see the sketch after the link below)
https://aws.amazon.com/ko/blogs/machine-learning/launching-tensorflow-distributed-training-easily-with-horovod-or-parameter-servers-in-amazon-sagemaker/
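A minimal sketch of that built-in script-mode option, assuming the SageMaker Python SDK as it looked around 2019 (v1-style arguments); the entry point, IAM role, and S3 path are placeholders rather than values from the slide:

from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="train.py",                  # placeholder training script
    role="<SageMakerExecutionRoleArn>",      # placeholder IAM role
    train_instance_count=2,
    train_instance_type="ml.p3.16xlarge",
    framework_version="1.12",
    py_version="py3",
    script_mode=True,
    distributions={"parameter_server": {"enabled": True}},
)

estimator.fit("s3://<bucket>/<training-data>")  # placeholder dataset location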
Distributed TensorFlow training in Amazon SageMaker (3/6)
https://aws.amazon.com/ko/blogs/machine-learning/launching-tensorflow-distributed-training-easily-with-horovod-or-parameter-servers-in-amazon-sagemaker/
Distributed TensorFlow training in Amazon SageMaker (4/6)
Horovod
• Horovod cluster setup and execution are easily automated on Amazon SageMaker
• The SageMaker TensorFlow container
  ✓ sets up the MPI environment
  ✓ runs the mpirun command to start jobs on the cluster nodes
  (a launch sketch follows the link below)
https://aws.amazon.com/ko/blogs/machine-learning/launching-tensorflow-distributed-training-easily-with-horovod-or-parameter-servers-in-amazon-sagemaker/
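Along the same lines, a hedged sketch of the Horovod (MPI) path with the same 2019-era SageMaker Python SDK assumptions; processes_per_host is typically set to the number of GPUs per instance:

from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="train_hvd.py",              # placeholder Horovod training script
    role="<SageMakerExecutionRoleArn>",      # placeholder IAM role
    train_instance_count=2,
    train_instance_type="ml.p3.16xlarge",
    framework_version="1.12",
    py_version="py3",
    script_mode=True,
    distributions={
        "mpi": {
            "enabled": True,
            "processes_per_host": 8,          # 8 GPUs on an ml.p3.16xlarge
            "custom_mpi_options": "-verbose"  # optional extra mpirun flags
        }
    },
)

estimator.fit("s3://<bucket>/<training-data>")  # placeholder dataset location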
Distributed TensorFlow training in Amazon SageMaker (5/6)
https://aws.amazon.com/ko/blogs/machine-learning/launching-tensorflow-distributed-training-easily-with-horovod-or-parameter-servers-in-amazon-sagemaker/
Distributed TensorFlow training in Amazon SageMaker (6/6)

Choosing between parameter servers and Horovod:
• Long training steps, larger number of gradients / bigger model size: Parameter Server, or Horovod on a single instance with multiple GPUs
• Short training steps, smaller number of gradients / lesser model size: Parameter Server or Horovod
https://aws.amazon.com/ko/blogs/machine-learning/launching-tensorflow-distributed-training-easily-with-horovod-or-parameter-servers-in-amazon-sagemaker/
fast.ai: Now anyone can train ImageNet in 18 minutes (1/5)
• Collaborators
  ✓ Yaroslav Bulatov
  ✓ Jeremy Howard
  ✓ Andrew Shaw
fast.ai: Now anyone can train ImageNet in 18 minutes (2/5)
• Step 1
  ✓ Find a good baseline for a single machine
• Step 2
  ✓ Scale to multiple machines
fast.ai: Now anyone can train ImageNet in 18 minutes (3/5)
Single-machine training
• Trained ImageNet in 30 epochs (instead of 90)
• A single p3.16xlarge instance trains to 93% in 1.5 hours
[Chart: training progress versus number of steps]
fast.ai: Now anyone can train ImageNet in 18 minutes (4/5)
Distributed architecture

[Diagram: each GPU (GPU0, GPU1, GPU3, GPU5, ...) runs forward and backprop on its own batch shard (batch0_0, batch0_1, batch0_3, batch0_5, ...), then gradients are synchronized across GPUs (data sync)]
fast.ai: Now anyone can train ImageNet in 18 minutes (5/5)
Other considerations

[Chart: throughput in images/sec versus number of steps]
• aws configure
• AMI, python train.py
• 10k IOPS Io2 volume
• P3 instance
Distributed Training of MnasNet on AWS (1/4)
MnasNet
• An automated mobile NAS (neural architecture search) approach
• Trades off accuracy against latency (the objective is sketched after the link below)
https://www.youtube.com/watch?v=4uDZxefPd-I
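For reference, the accuracy/latency trade-off above is expressed in the MnasNet paper as a multi-objective reward; a sketch of that published objective (notation from the paper, not recovered from the slide image), with target latency $T$ and exponent $w$:

\[
\max_{m} \; ACC(m) \times \left[ \frac{LAT(m)}{T} \right]^{w},
\qquad
w =
\begin{cases}
\alpha, & \text{if } LAT(m) \le T \\
\beta, & \text{otherwise}
\end{cases}
\]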
Distributed Training of MnasNet on AWS (2/4)
Execution example (1/2)
# Naming
Distributed Training of MnasNet on AWS (3/4)
Execution example (2/2)
...
I0923 16:15:22.086650 140202663954176 saver.py:1276] Restoring parameters from ./results_hvd/model.ckpt-62560
I0923 16:15:22.418808 140202663954176 session_manager.py:491] Running local_init_op.
I0923 16:15:22.426828 140202663954176 session_manager.py:493] Done running local_init_op.
I0923 16:15:47.475176 140202663954176 evaluation.py:277] Finished evaluation at 2019-09-23-16:15:47
I0923 16:15:47.475430 140202663954176 estimator.py:1979] Saving dict for global step 62560: global_step = 62560, loss =
2.1191003, top_1_accuracy = 0.74759614, top_5_accuracy = 0.9215545
I0923 16:15:47.475846 140202663954176 estimator.py:2039] Saving 'checkpoint_path' summary for global step 62560:
./results_hvd/model.ckpt-62560
I0923 16:15:47.476232 140202663954176 error_handling.py:93] evaluation_loop marked as finished
I0923 16:15:47.476345 140202663954176 mnasnet_main_hvd.py:1041] Eval results at step 62560: {'loss': 2.1191003,
'top_1_accuracy': 0.74759614, 'top_5_accuracy': 0.9215545, 'global_step': 62560}. Hvd rank 0
I0923 16:15:47.476416 140202663954176 mnasnet_main_hvd.py:1051] Finished training up to step 62560. Elapsed seconds 40649.
Distributed Training of MnasNet on AWS (4/4)
Example performance test results

Machines
• p3dn.24xlarge instances
TensorFlow (p3dn.24xlarge)
• TENSORFLOW_VERSION: v1.13.1
• CUDNN_VERSION: 7.4.2.24-1+cuda10.0
• NCCL_VERSION: 2.4.2-1+cuda10.0
• OPENMPI 4.0.0
Dataset (ImageNet)
• 1.28 million images (1,000 classes)
• 1,024 training files & 128 validation files (TFRecords)
Amazon SageMaker

Num. of instances | Time-to-train (hours) | Top-1 validation accuracy (%)
1                 | 29                    | 75.2
2                 | 24.3                  | 74.5
4                 | 9.0                   | 74.67
Thank you
We look forward to your feedback!
#AWSDEVDAYSEOUL