Train Neural Networks Using AMD GPU and Keras - by Mattia Varile - Towards Data Science
AMD is developing a new HPC platform called ROCm. Its ambition is to create a common, open-source environment capable of interfacing with both Nvidia (via CUDA) and AMD GPUs (further information).
This tutorial explains how to set up a neural-network environment using AMD GPUs, in single- or multi-GPU configurations.
On the software side, we will run Tensorflow v1.12.0 as a backend to Keras, on top of the ROCm kernel, using Docker.
Hardware requirements
The official documentation (ROCm v2.1) suggests the following hardware solutions.
Supported CPUs
CPUs that currently support PCIe Gen3 + PCIe Atomics include Intel Xeon E3/E5/E7 v3 or newer, Intel Core i3/i5/i7 (Haswell or newer), and AMD Ryzen and EPYC processors.
Supported GPUs
ROCm officially supports AMD GPUs that use the following chips:
GFX8 GPUs
“Fiji” chips, such as on the AMD Radeon R9 Fury X and Radeon Instinct MI8
“Polaris 10” chips, such as on the AMD Radeon RX 480/580 and Radeon Instinct MI6
“Polaris 11” chips, such as on the AMD Radeon RX 470/570 and Radeon Pro WX 4100
“Polaris 12” chips, such as on the AMD Radeon RX 550 and Radeon RX 540
GFX9 GPUs
“Vega 10” chips, such as on the AMD Radeon RX Vega 64 and Radeon Instinct MI25
HARDWARE
CPU: Intel Xeon E5-2630L
RAM: 2 x 8 GB
SOFTWARE
OS: Ubuntu 18.04 LTS
ROCm installation
In order to get everything working properly, it is recommended to start the installation process from a freshly installed operating system. The following steps refer to Ubuntu 18.04 LTS; for other operating systems, please refer to the official documentation.
Install ROCm
It is now required to update the apt repository list and install the rocm-dkms meta-package:
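The exact commands are not reproduced in this copy of the article; based on the ROCm 2.x installation guide for Ubuntu, they presumably look like the following (the repository URL and release name are assumptions — verify against the official documentation for your ROCm version):

```shell
# Add the ROCm apt repository and its signing key (assumed from the
# ROCm 2.x install guide; check the current official docs)
wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | \
    sudo tee /etc/apt/sources.list.d/rocm.list

# Update the repository list and install the rocm-dkms meta-package
sudo apt update
sudo apt install -y rocm-dkms
```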
Set permissions
The official documentation suggests creating a new video group in order to have access to GPU resources using the current user. You can check which groups your user currently belongs to with:

groups

You may want to ensure that any future users you add to your system are put into the “video” group by default. To do that, you can run the following commands:
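A sketch of these commands, following the ROCm documentation of that era (they are not reproduced in this copy of the article):

```shell
# Add the current user to the video group
sudo usermod -a -G video $LOGNAME

# Make new users join the video group by default
echo 'ADD_EXTRA_GROUPS=1' | sudo tee -a /etc/adduser.conf
echo 'EXTRA_GROUPS=video' | sudo tee -a /etc/adduser.conf
```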
Reboot the system:

reboot

To verify the ROCm installation, issue the following commands, which should report information about the available CPU and GPU devices:

/opt/rocm/bin/rocminfo
/opt/rocm/opencl/bin/x86_64/clinfo

It is also recommended to add the ROCm binaries to the PATH:

echo 'export PATH=$PATH:/opt/rocm/bin:/opt/rocm/profiler/bin:/opt/rocm/opencl/bin/x86_64' | sudo tee -a /etc/profile.d/rocm.sh

Finally, check that the GPU is detected:

rocm-smi
Tip: Take a look at the rocm-smi -h command to explore more functionality and overclocking tools.
Tensorflow Docker
The fastest and most reliable method to get the ROCm + Tensorflow backend working is to use the Docker image provided by AMD developers.
Install Docker CE
First, Docker needs to be installed. To do that, please follow the instructions for Ubuntu systems:
Tip: To avoid typing sudo docker <command> instead of docker <command>, it’s useful to provide access to non-root users: Manage Docker as a non-root user.
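The pull command itself is not shown in this copy of the article; it is presumably:

```shell
# Pull the Tensorflow-ROCm image published by AMD on Docker Hub
docker pull rocm/tensorflow
```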
After a few minutes, the image will be installed on your system, ready to go.
For this reason, it is useful to create a persistent space on the physical drive for storing files and Jupyter notebooks. The simplest method is to create a folder to initialize with a Docker container. To do that, issue the command:

mkdir /home/$LOGNAME/tf_docker_share

This command creates a folder named tf_docker_share, useful for storing and reviewing data created within the Docker container.
Starting Docker
Now, execute the image in a new container session. Simply send the following
command:
docker run -i -t \
--network=host \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--workdir=/tf_docker_share \
-v $HOME/tf_docker_share:/tf_docker_share rocm/tensorflow:latest /bin/bash
The container runs with /tf_docker_share as its working directory, and you should see a prompt similar to the one below, meaning that you are now operating inside the Tensorflow-ROCm virtual system.
Installing Jupyter
Jupyter is a very useful tool for the development, debugging, and testing of neural networks. Unfortunately, it is not installed by default on the Tensorflow-ROCm Docker image published by the ROCm team, so it is necessary to install Jupyter manually.
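1. The installation command is not reproduced in this copy of the article; from the container’s terminal it presumably amounts to:

```shell
# Install Jupyter inside the running Tensorflow-ROCm container
pip3 install jupyter
```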
This installs the Jupyter package into the virtual system. Leave this terminal open.
2. Open a new terminal ( CTRL + ALT + T ) and list the running containers:

docker ps
The first column is the Container ID of the running container. Copy it, because it is necessary for the next step.
3. It’s time to commit, permanently writing the modifications into a new image. From the same terminal, execute:
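The commit command is not shown in this copy of the article; assuming the image is tagged rocm/tensorflow:personal (the tag used later in this tutorial), it would be:

```shell
# <container-id> is the ID copied from `docker ps` in the previous step
docker commit <container-id> rocm/tensorflow:personal
```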
4. To double check that the image has been generated correctly, from the same terminal,
issue the command:
docker images
docker run -i -t \
--network=host \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--workdir=/tf_docker_share \
-v $HOME/tf_docker_share:/tf_docker_share rocm/tensorflow:<tag> /bin/bash
Cleaning
First, close all the previously running Docker containers. List the open containers:

docker ps
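Then stop each of them, presumably with:

```shell
# Stop a running container using the ID reported by `docker ps`
docker container stop <container-id>
```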
docker run -i -t \
--network=host \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--workdir=/tf_docker_share \
-v $HOME/tf_docker_share:/tf_docker_share rocm/tensorflow:personal /bin/bash
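The Jupyter launch command is not reproduced in this copy of the article; a typical invocation from inside the container would be (the flags are assumptions):

```shell
# Serve Jupyter on all interfaces so it is reachable from the host browser
jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root
```

Jupyter then prints a URL containing an access token.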
Follow the printed link (press: CTRL + left mouse button on it); a new tab in your browser will redirect you to the Jupyter root directory.
Notebook renaming
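After renaming the notebook, the verification cell — not reproduced in this copy of the article — presumably imports Tensorflow and prints its version:

```python
# Print the Tensorflow version to verify the backend is working
import tensorflow as tf

print('Tensorflow V{}'.format(tf.__version__))
```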
Then press SHIFT + ENTER to execute. The output should look like:
Tensorflow V1.12.0
batch_size = 128
num_classes = 10
epochs = 10
We will now download and preprocess inputs, loading them into system memory.
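The preprocessing cell is not shown in this copy of the article; a standard Keras MNIST preparation, matching the 784-dimensional input of the network below, would be:

```python
import keras
from keras.datasets import mnist

num_classes = 10  # defined earlier in the notebook

# Download MNIST and load it into system memory
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Flatten the 28x28 images into 784-dimensional vectors, scaled to [0, 1]
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

# One-hot encode the class labels
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
```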
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))
We will use a very simple two-layer fully-connected network with 512 neurons per layer. A 20% dropout probability on the neuron connections is also included, in order to prevent overfitting.
model.summary()
Network architecture
model.compile(loss='categorical_crossentropy',
optimizer=RMSprop(),
metrics=['accuracy'])
and start training:
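The training cell is not reproduced in this copy of the article; following the standard Keras MNIST example, it presumably resembles the following (this continues the notebook, reusing model, x_train, batch_size, etc. from the previous cells):

```python
# Train the network and keep the per-epoch metrics
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))

# Evaluate on the held-out test set
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
```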
Get started Open in app
Training process
The neural network has been trained on a single RX 480 at a respectable 47us/step. For comparison, an Nvidia Tesla K80 reaches 43us/step but is roughly 10x more expensive.
Multi-GPU Training
As an additional step, if your system has multiple GPUs, it is possible to leverage Keras capabilities to reduce training time by splitting the batch among different GPUs. To do that, first specify the number of GPUs to use for training by declaring an environment variable (put the following command in a single cell and execute it):
!export HIP_VISIBLE_DEVICES=0,1,...
The numbers from 0 onward define which GPUs to use for training. In case you want to disable GPU acceleration, simply issue:

!export HIP_VISIBLE_DEVICES=-1

As an example, if you have 3 GPUs, the previous command becomes !export HIP_VISIBLE_DEVICES=0,1,2.
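Beyond the environment variable, Keras at the time offered multi_gpu_model to replicate a model across GPUs and split each batch among them; a hypothetical sketch, not shown in this article, reusing the model compiled above:

```python
from keras.utils import multi_gpu_model
from keras.optimizers import RMSprop

# Replicate the model on 3 GPUs; each batch of 128 is split into three
# sub-batches, one per GPU (requires 3 visible GPUs)
parallel_model = multi_gpu_model(model, gpus=3)
parallel_model.compile(loss='categorical_crossentropy',
                       optimizer=RMSprop(),
                       metrics=['accuracy'])
```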
Conclusions
That concludes this tutorial. The next step will be to test a Convolutional Neural Network on the MNIST dataset, comparing performance in both single- and multi-GPU configurations. What’s relevant here is that AMD GPUs perform quite well under computational load at a fraction of the price. The GPU market is changing rapidly, and ROCm has given researchers, engineers, and startups very powerful, open-source tools to adopt, lowering the upfront cost of hardware equipment.
Please feel free to comment on this article in order to improve its quality and effectiveness.