RED HAT®

TRAINING

Comprehensive, hands-on training that solves real world problems

Red Hat OpenStack


Administration II
Student Workbook

© 2017 Red Hat, Inc. CL210-RHOSP10.1-en-2-20171006

Red Hat OpenStack Administration II

Red Hat OpenStack Platform 10.1 CL210


Red Hat OpenStack Administration II
Edition 2 20171006

Authors: Adolfo Vazquez, Snehangshu Karmakar, Razique Mahroua,
         Morgan Weetman, Victor Costea, Michael Jarrett, Philip Sweany,
         Fiona Allen, Prasad Mukhedkar
Editors: Seth Kenlon, David O'Brien, Forrest Taylor, Robert Locke

Copyright © 2017 Red Hat, Inc.

The contents of this course and all its modules and related materials, including handouts to
audience members, are Copyright © 2017 Red Hat, Inc.

No part of this publication may be stored in a retrieval system, transmitted or reproduced in
any way, including, but not limited to, photocopy, photograph, magnetic, electronic or other
record, without the prior written permission of Red Hat, Inc.

This instructional program, including all material provided herein, is supplied without any
guarantees from Red Hat, Inc. Red Hat, Inc. assumes no liability for damages or legal action
arising from the use or misuse of contents or details contained herein.

If you believe Red Hat training materials are being used, copied, or otherwise improperly
distributed, please e-mail training@redhat.com or phone toll-free (USA) +1 (866) 626-2994
or +1 (919) 754-3700.

Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, Hibernate, Fedora, the
Infinity Logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and
other countries.

Linux® is the registered trademark of Linus Torvalds in the United States and other
countries.

Java® is a registered trademark of Oracle and/or its affiliates.

XFS® is a registered trademark of Silicon Graphics International Corp. or its subsidiaries in
the United States and/or other countries.

The OpenStack® Word Mark and OpenStack Logo are either registered trademarks/service
marks or trademarks/service marks of the OpenStack Foundation, in the United States
and other countries and are used with the OpenStack Foundation's permission. We are not
affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack
community.

All other trademarks are the property of their respective owners.

Document Conventions ............................................................................. vii
Notes and Warnings ................................................................................................ vii

Introduction ....................................................................................... ix
Red Hat OpenStack Administration II ......................................................................... ix
Orientation to the Classroom Environment ................................................................. x
Internationalization ................................................................................................ xix

1. Managing an Enterprise OpenStack Deployment ...................................................... 1
Describing Undercloud and Overcloud Architectures .................................................... 2
Quiz: Describing Undercloud and Overcloud Architectures ............................................ 8
Describing Undercloud Components ......................................................................... 10
Guided Exercise: Describing Undercloud Components ................................................. 17
Verifying the Functionality of Overcloud Services ...................................................... 20
Guided Exercise: Verifying the Functionality of Overcloud Services .............................. 28
Lab: Managing an Enterprise OpenStack Deployment ................................................. 35
Summary ............................................................................................................... 41

2. Managing Internal OpenStack Communication ....................................................... 43
Describing the Identity Service Architecture ............................................................. 44
Quiz: Describing the Identity Service Architecture ...................................................... 51
Administering the Service Catalog ........................................................................... 53
Guided Exercise: Administering the Service Catalog ................................................... 57
Managing Message Brokering ................................................................................... 61
Guided Exercise: Managing Message Brokering ......................................................... 66
Lab: Managing Internal OpenStack Communication .................................................... 70
Summary .............................................................................................................. 76

3. Building and Customizing Images ................................................................. 77
Describing Image Formats ....................................................................................... 78
Quiz: Describing Image Formats .............................................................................. 80
Building an Image .................................................................................................. 82
Guided Exercise: Building an Image .......................................................................... 87
Customizing an Image ............................................................................................. 91
Guided Exercise: Customizing an Image .................................................................... 95
Lab: Building and Customizing Images .................................................................... 102
Summary ............................................................................................................. 109

4. Managing Storage ............................................................................... 111
Describing Storage Options ..................................................................................... 112
Quiz: Describing Storage Options ............................................................................ 116
Configuring Ceph Storage ....................................................................................... 118
Guided Exercise: Configuring Ceph Storage .............................................................. 124
Managing Object Storage ....................................................................................... 128
Guided Exercise: Managing Object Storage .............................................................. 135
Lab: Managing Storage .......................................................................................... 138
Summary ............................................................................................................. 143

5. Managing and Troubleshooting Virtual Network Infrastructure .................................... 145
Managing SDN Segments and Subnets .................................................................... 146
Guided Exercise: Managing SDN Segments and Subnets ............................................ 156
Tracing Multitenancy Network Flows ....................................................................... 164
Guided Exercise: Tracing Multitenancy Network Flows ............................................... 184
Troubleshooting Network Issues ............................................................................. 197

Guided Exercise: Troubleshooting Network Issues ...................................................... 211


Lab: Managing and Troubleshooting Virtual Network Infrastructure ........................... 220
Summary ............................................................................................................. 232
6. Managing Resilient Compute Resources ........................................................... 233
Configuring an Overcloud Deployment ................................................................... 234
Guided Exercise: Configuring an Overcloud Deployment ........................................... 246
Scaling Compute Nodes ........................................................................................ 254
Guided Exercise: Scaling Compute Nodes ................................................................ 264
Migrating Instances using Block Storage ................................................................. 267
Guided Exercise: Migrating Instances using Block Storage ......................................... 273
Migrating Instances with Shared Storage ................................................................ 280
Guided Exercise: Migrating Instances with Shared Storage ........................................ 284
Lab: Managing Resilient Compute Resources ............................................................ 291
Summary ............................................................................................................. 302
7. Troubleshooting OpenStack Issues ............................................................... 303
Troubleshooting Compute Nodes ........................................................................... 304
Guided Exercise: Troubleshooting Compute Nodes ................................................... 309
Troubleshooting Authentication and Messaging ........................................................ 314
Guided Exercise: Troubleshooting Authentication and Messaging ................................ 318
Troubleshooting OpenStack Networking, Image, and Volume Services ........................ 322
Guided Exercise: Troubleshooting OpenStack Networking, Image, and Volume
Services .............................................................................................................. 329
Lab: Troubleshooting OpenStack ............................................................................ 339
Summary ............................................................................................................ 349
8. Monitoring Cloud Metrics for Autoscaling ....................................................... 351
Describing OpenStack Telemetry Architecture ......................................................... 352
Quiz: Describing OpenStack Telemetry Architecture ................................................. 358
Analyzing Cloud Metrics for Autoscaling ................................................................. 360
Guided Exercise: Analyzing Cloud Metrics for Autoscaling .......................................... 371
Lab: Monitoring Cloud Metrics for Autoscaling ........................................................ 378
Summary ............................................................................................................ 388
9. Orchestrating Deployments ...................................................................... 389
Describing Orchestration Architecture .................................................................... 390
Quiz: Describing Orchestration Architecture ............................................................ 394
Writing Heat Orchestration Templates .................................................................... 396
Guided Exercise: Writing Heat Orchestration Templates ............................................ 406
Configuring Stack Autoscaling ................................................................................ 418
Quiz: Configuring Stack Autoscaling ....................................................................... 427
Summary ............................................................................................................. 431


Document Conventions
Notes and Warnings

Note
"Notes" are tips, shortcuts or alternative approaches to the task at hand. Ignoring a
note should have no negative consequences, but you might miss out on a trick that
makes your life easier.

Important
"Important" boxes detail things that are easily missed: configuration changes that
only apply to the current session, or services that need restarting before an update
will apply. Ignoring a box labeled "Important" will not cause data loss, but may cause
irritation and frustration.

Warning
"Warnings" should not be ignored. Ignoring warnings will most likely cause data loss.

References
"References" describe where to find external documentation relevant to a subject.

Introduction
Red Hat OpenStack Administration II
Red Hat OpenStack Administration II (CL210) is designed for system administrators who intend
to implement a cloud computing environment using OpenStack. Students will learn how to
configure, use, and maintain Red Hat OpenStack Platform.

The focus of this course is managing OpenStack using the unified command-line interface,
managing instances, and maintaining an enterprise deployment of OpenStack. Exam
competencies covered in the course include: expand compute nodes on Red Hat OpenStack
Platform using the undercloud (Red Hat OpenStack Platform director); manage images,
networking, object storage, and block storage; provide orchestration and autoscaling (scale-out
and scale-in); and build a customized image.

Objectives
• Expand compute nodes on the overcloud.

• Customize instances.

• Troubleshoot individual services as well as OpenStack holistically.

• Manage the migration of live instances.

• Create templates and configure autoscaling of stacks.

Audience
• Cloud administrators, cloud operators, and system administrators interested in, or responsible
for, maintaining a private cloud.

Prerequisites
• Red Hat Certified System Administrator (RHCSA in Red Hat Enterprise Linux) certification or
equivalent experience.

• Red Hat OpenStack Administration I (CL110) course or equivalent experience.

Orientation to the Classroom Environment

Figure 0.1: CL210 classroom architecture

Student systems share an external IPv4 network, 172.25.250.0/24, with a gateway of
172.25.250.254 (workstation.lab.example.com). DNS services for the private network
are provided by 172.25.250.254. The OpenStack overcloud virtual machines share internal
IPv4 networks, 172.24.X.0/24, and connect to the undercloud virtual machine and power
interfaces on the 172.25.249.0/24 network. The networks used by instances include the
192.168.Y.0/24 IPv4 networks and allocate from 172.25.250.0/24 for public access.

The workstation virtual machine is the only one that provides a graphical user interface.
In most cases, students should log in to the workstation virtual machine and use ssh to
connect to the other virtual machines. A web browser can also be used to log in to the Red Hat
OpenStack Platform Dashboard web interface. The following table lists the virtual machines that
are available in the classroom environment:

Classroom Machines
Machine name                          IP addresses                      Role
workstation.lab.example.com,          172.25.250.254,                   Graphical workstation
workstationN.example.com              172.25.252.N
director.lab.example.com              172.25.250.200,                   Undercloud node
                                      172.25.249.200
power.lab.example.com                 172.25.250.100,                   IPMI power management of
                                      172.25.249.100,                   nodes
                                      172.25.249.101+
controller0.overcloud.example.com     172.25.250.1,                     Overcloud controller node
                                      172.25.249.P,
                                      172.24.X.1
compute0.overcloud.example.com        172.25.250.2,                     Overcloud first compute
                                      172.25.249.R,                     node
                                      172.24.X.2
compute1.overcloud.example.com        172.25.250.12,                    Overcloud additional
                                      172.25.249.S,                     compute node
                                      172.24.X.12
ceph0.overcloud.example.com           172.25.250.3,                     Overcloud storage node
                                      172.25.249.T,
                                      172.24.X.3
classroom.example.com                 172.25.254.254,                   Classroom utility server
                                      172.25.252.254,
                                      172.25.253.254

The environment runs a central utility server, classroom.example.com, which acts as a
NAT router for the classroom network to the outside world. It provides DNS, DHCP, HTTP,
and other content services to the student lab machines. It uses two alternative names,
content.example.com and materials.example.com, to provide course content used in the
hands-on exercises.

Note
Access to the classroom utility server is restricted; shell access is unavailable.

System and Application Credentials


System credentials User name Password
Unprivileged shell login student student
Privileged shell login root redhat

OpenStack Packages and Documentation


Repositories suitable for package installation are available at
http://content.example.com/rhosp10.1/x86_64/dvd/. This URL also provides a docs
subdirectory, containing a documentation snapshot in PDF format (docs/pdfs) and HTML
format (docs/html).
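
For example, the HTML documentation snapshot can be opened in a browser directly from
workstation, the only machine with a graphical interface (shown here with Firefox, which is
assumed to be available on workstation):

[student@workstation ~]$ firefox http://content.example.com/rhosp10.1/x86_64/dvd/docs/html/ &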

Lab Exercise Setup and Grading


Most activities use the lab command, executed on workstation, to prepare and evaluate
exercises. The lab command takes two arguments: the activity's name and a verb of setup,
grade, or cleanup.

• The setup verb is used at the beginning of an exercise or lab. It verifies that the systems are
ready for the activity, possibly making some configuration changes to them.
• The grade verb is executed at the end of a lab. It provides external confirmation that the
activity's requested steps were performed correctly.
• The cleanup verb can be used to selectively undo elements of the activity before moving on
to later activities.
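
A typical activity therefore begins and ends as follows; the activity name shown here is only a
placeholder, since each exercise states its own name, and guided exercises omit the grade step:

[student@workstation ~]$ lab example-activity setup
... perform the exercise steps ...
[student@workstation ~]$ lab example-activity grade
[student@workstation ~]$ lab example-activity cleanup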

Instructor-Led Training (ILT)


In an Instructor-Led Training classroom, students are assigned a physical computer
(foundationX.ilt.example.com), to access the virtual machines running on that host.
Students are automatically logged in to the host as user kiosk with the password redhat.

Controlling the Virtual Machines


On foundationX, the rht-vmctl command is used to work with the virtual machines. The
rht-vmctl commands in the following table must be run as kiosk on foundationX, and can
be used with controller0 (as in the examples) or any virtual machine.

rht-vmctl Commands
Action                                                   Command
Start controller0 machine.                               rht-vmctl start controller0
View physical console to log in and work with            rht-vmctl view controller0
the controller0 machine.
Reset controller0 machine to its previous state          rht-vmctl reset controller0
and restart the virtual machine.
Caution: Any work generated on the disk will be lost.

At the start of a lab exercise, if instructed to reset a single virtual machine node, then you are
expected to run rht-vmctl reset nodename on the foundationX system as the kiosk
user.

At the start of a lab exercise, if instructed to reset all virtual machines, then run the rht-vmctl
reset all command on the foundationX system as the kiosk user. In this course, however,
"resetting all virtual machines" normally refers to resetting only the overcloud nodes and the
undercloud node, as described in the following section.

Starting the Overcloud from a New Provision


The course lab environment automatically starts only the foundation lab nodes workstation,
power and director. If not yet started, then first start the course lab environment.

Use the rht-vmctl command.

[kiosk@foundationX ~]$ rht-vmctl start all

Wait sufficiently to ensure that all nodes have finished booting and initializing services. The
rht-vmctl output displays RUNNING as soon as the nodes have started, but this is not an
indication that the nodes have completed their startup procedures.

When ready, open a workstation console to continue. Log in as student, password student.
Confirm that the nova-compute service is running.

[student@workstation ~]$ ssh stack@director


[stack@director ~]$ openstack compute service list
+----+----------------+--------------------------+----------+---------+-------+
| ID | Binary | Host | Zone | Status | State |
+----+----------------+--------------------------+----------+---------+-------+
| 1 | nova-cert | director.lab.example.com | internal | enabled | up |
| 2 | nova-scheduler | director.lab.example.com | internal | enabled | up |
| 3 | nova-conductor | director.lab.example.com | internal | enabled | up |
| 4 | nova-compute | director.lab.example.com | nova | enabled | down |
+----+----------------+--------------------------+----------+---------+-------+

Verify that the nova-compute service is up, or comes up within 60 seconds. Uncommonly,
after environment resets, nova-compute can appear to remain in a down state. Restart nova-
compute to resolve this issue. Although the openstack-service restart nova-compute
command works correctly, using the systemctl command may be faster because it is a lower-
level operating system request. Use of sudo, for root privilege, is required.

[stack@director ~]$ sudo systemctl restart openstack-nova-compute


[stack@director ~]$ openstack compute service list
+----+----------------+--------------------------+----------+---------+-------+
| ID | Binary | Host | Zone | Status | State |
+----+----------------+--------------------------+----------+---------+-------+
| 1 | nova-cert | director.lab.example.com | internal | enabled | up |
| 2 | nova-scheduler | director.lab.example.com | internal | enabled | up |
| 3 | nova-conductor | director.lab.example.com | internal | enabled | up |
| 4 | nova-compute | director.lab.example.com | nova | enabled | up |
+----+----------------+--------------------------+----------+---------+-------+

Do not continue until the nova-compute service is up.
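
If you prefer not to rerun the list command by hand, a short polling loop such as the following
sketch waits up to 60 seconds for the service; the loop and its column filtering are a convenience,
not part of the course scripts:

[stack@director ~]$ for i in $(seq 12); do
>   openstack compute service list -f value -c Binary -c State \
>     | grep -q 'nova-compute up' && break
>   sleep 5
> done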

Determine whether the overcloud nodes are actually running, from the viewpoint of the
hypervisor environment underneath the virtual machines, not from the viewpoint of the
openstack server list. For a newly provisioned environment, the overcloud nodes will still
be off, but it is recommended practice to always check.

Use the rht-vmctl command. Are the overcloud nodes controller0, ceph0 and compute0
still DEFINED as expected?

[kiosk@foundationX ~]$ rht-vmctl status all

Return to the director system and start each node using the openstack command. Under all
normal circumstances, do not use rht-vmctl to start overcloud nodes!

Include compute1 only when working in the chapter where the second compute node is built and
used. In all other chapters, compute1 is powered off and ignored.

[stack@director ~]$ openstack server list -c Name -c Status -c Networks


+-------------------------+---------+------------------------+
| Name | Status | Networks |
+-------------------------+---------+------------------------+
| overcloud-compute-0 | SHUTOFF | ctlplane=172.25.249.P |
| overcloud-cephstorage-0 | SHUTOFF | ctlplane=172.25.249.Q |
| overcloud-controller-0 | SHUTOFF | ctlplane=172.25.249.R |
+-------------------------+---------+------------------------+
[stack@director ~]$ openstack server start overcloud-controller-0
[stack@director ~]$ openstack server start overcloud-cephstorage-0
[stack@director ~]$ openstack server start overcloud-compute-0

Stopping Cleanly at the End of a Session


When finished for the day or whenever you are done practicing for a while, you may shut down
your course lab environment safely. Start by shutting down the overcloud nodes.

[stack@director ~]$ openstack server list -c Name -c Status -c Networks


+-------------------------+--------+------------------------+
| Name | Status | Networks |
+-------------------------+--------+------------------------+
| overcloud-compute-0 | ACTIVE | ctlplane=172.25.249.P |
| overcloud-cephstorage-0 | ACTIVE | ctlplane=172.25.249.Q |
| overcloud-controller-0 | ACTIVE | ctlplane=172.25.249.R |
+-------------------------+--------+------------------------+
[stack@director ~]$ openstack server stop overcloud-controller-0
[stack@director ~]$ openstack server stop overcloud-cephstorage-0
[stack@director ~]$ openstack server stop overcloud-compute-0

Wait until OpenStack has stopped the overcloud nodes, then shut down the rest of the
environment.
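
One convenient way to watch for this is the watch utility; press Ctrl+c once every overcloud
node reports SHUTOFF:

[stack@director ~]$ watch -n 5 'openstack server list -c Name -c Status'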

Use the rht-vmctl command to stop the remaining virtual machines.

[kiosk@foundationX ~]$ rht-vmctl stop all

Starting After an Unclean Shutdown


If your classroom system or environment was shut down without using the clean shutdown
procedure above, you may experience an environment where the OpenStack knowledge
about the nodes does not match the physical or running status of the nodes. This is simple to
determine and resolve.

As with a clean startup, verify that the nova-compute service is up. Use the sudo systemctl
command if necessary. Do not continue until the nova-compute service is up.

[stack@director ~]$ sudo systemctl restart openstack-nova-compute


[stack@director ~]$ openstack compute service list
+----+----------------+--------------------------+----------+---------+-------+
| ID | Binary | Host | Zone | Status | State |
+----+----------------+--------------------------+----------+---------+-------+
| 1 | nova-cert | director.lab.example.com | internal | enabled | up |
| 2 | nova-scheduler | director.lab.example.com | internal | enabled | up |
| 3 | nova-conductor | director.lab.example.com | internal | enabled | up |
| 4 | nova-compute | director.lab.example.com | nova | enabled | up |
+----+----------------+--------------------------+----------+---------+-------+

At this point, it is expected that the overcloud nodes are not running yet, because the course
lab environment only auto-starts workstation, power and director. Check using the
rht-vmctl command.

[kiosk@foundationX ~]$ rht-vmctl status all

This is an important step


The node status at the hypervisor level determines the correct command to use from director
such that both the hypervisor and director agree about the overcloud nodes’ state. Return to
director and determine the overcloud node status from an OpenStack viewpoint.

[stack@director ~]$ openstack server list

Use the following table to determine the correct command to use to either start or synchronize
the state of the overcloud nodes. The hypervisor state is down the left column. The OpenStack
state is along the top row.

Action choice depends on the hypervisor and OpenStack states

                 SHUTOFF                             ACTIVE
DEFINED          openstack server start <node>       openstack server reboot <node>
RUNNING          nova reset-state --active <node>    No action needed
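
For example, if rht-vmctl status reports controller0 as DEFINED while the openstack
server list output shows overcloud-controller-0 as ACTIVE, the table calls for a reboot
from director; if a node is RUNNING at the hypervisor but listed as SHUTOFF in OpenStack,
reset its state instead. The node names below are only examples:

[stack@director ~]$ openstack server reboot overcloud-controller-0
[stack@director ~]$ nova reset-state --active overcloud-compute-0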

The exception to the rule, for critical scenarios


Starting overcloud nodes from director physically starts nodes at the hypervisor level. On rare
occasions, nodes appear to start in OpenStack, but remain DEFINED in the rht-vmctl status.
To resolve, use rht-vmctl start for each affected node.

This is also the resolution for hung or unresponsive nodes: power off the affected node using
rht-vmctl poweroff, and once it powers off, use rht-vmctl start to boot only that node.

Checking the Health of the Overcloud Environment


The overcloud-health-check script checks the general health of the course overcloud
environment. It is invoked, with a prompt that allows skipping, at the beginning of every exercise
setup script. Invoke this script manually at any time to verify the overcloud.

[student@workstation ~]$ lab overcloud-health-check setup

Checking the health of the overcloud:

This script's initial task thoroughly validates the overcloud environment,
taking a minute or more, but checking is not required before each exercise.
If you are without overcloud problems, with a stable environment, say (n)o.
Pressing 'Enter' or allowing the 20 second timeout will default to (n)o.

You should *always* say (y)es if any of the following conditions are true:
- You have just reset the overcloud nodes using "rht-vmctl reset" in ILT.
- You have just reset the overcloud nodes from the Online course dashboard.
- You have restarted or rebooted overcloud nodes or any critical services.
- You suspect your environment has a problem and would prefer to validate.

[?] Check the overcloud environment? (y|N)

Verifying overcloud nodes

· Retrieving state for overcloud-compute-0.................... SUCCESS


· Retrieving state for overcloud-cephstorage-0................ SUCCESS
· Retrieving state for overcloud-controller-0................. SUCCESS
· Waiting for overcloud-compute-0 to be available............. SUCCESS
· Waiting for overcloud-cephstorage-0 to be available......... SUCCESS
· Waiting for overcloud-controller-0 to be available.......... SUCCESS
· Verifying ceph0 access...................................... SUCCESS
· Starting ceph0 disk arrays and restarting ceph.target....... SUCCESS
· Verifying ceph0 service, please wait........................ SUCCESS

· Checking RabbitMQ (5m timer)................................ SUCCESS


· Ensuring the Downloads directory exists..................... SUCCESS
· Ensuring OpenStack services are running, please wait........ SUCCESS

Ceph Command Summary


As a general troubleshooting technique, these commands can be performed at any time if the
Ceph services are found to be down or unresponsive. These commands are also built into the
overcloud-health-check script and are performed when that script is run.

[student@workstation ~]$ ssh stack@director


[stack@director ~]$ ssh heat-admin@ceph0
[heat-admin@overcloud-ceph-storage-0 ~]$ systemctl list-units ceph\*
UNIT LOAD ACTIVE SUB DESCRIPTION
ceph-osd@0.service loaded active running Ceph object storage daemon
ceph-osd@1.service loaded active running Ceph object storage daemon
ceph-osd@2.service loaded active running Ceph object storage daemon
ceph-mon.target loaded active running ceph target allowing to start...
ceph-osd.target loaded active running ceph target allowing to start...
ceph-radosgw.target loaded active running ceph target allowing to start...
ceph.target loaded active running ceph target allowing to start...
... output omitted ...

You should see three ceph-osd@# services. If these services do not exist at all, then the systemd
services that were to create the OSD services for each disk device did not complete successfully.
In this scenario, manually create the OSDs by starting these device services:

[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl start ceph-disk@dev-vdb1


[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl start ceph-disk@dev-vdb2
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl start ceph-disk@dev-vdc1
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl start ceph-disk@dev-vdc2
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl start ceph-disk@dev-vdd1
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl start ceph-disk@dev-vdd2

These ceph-disk services will complete and then exit when their corresponding OSD service is
created. If the ceph-disk services exist in a failed state, then an actual problem exists with the
physical or virtual storage devices used as the ceph storage: /dev/vdb, /dev/vdc, and /dev/vdd.
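
A few commands can help inspect such a failure; this is a diagnostic sketch, not part of the
overcloud-health-check script, and the device unit shown in the last command is only one of
the instances listed above:

[heat-admin@overcloud-ceph-storage-0 ~]$ systemctl list-units --all 'ceph-disk@*'
[heat-admin@overcloud-ceph-storage-0 ~]$ lsblk /dev/vdb /dev/vdc /dev/vdd
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo journalctl -u ceph-disk@dev-vdb1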

If the ceph-osd@# services exist in a failed state, they can usually be fixed by restarting them.

[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl restart ceph-osd@0
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl restart ceph-osd@1
[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl restart ceph-osd@2

The above three commands are equivalent to the single command below. Target services are
designed to simplify starting sets of services and to declare the services that represent a
functional state. After starting the OSDs, use the ceph -s command to verify that Ceph has a
status of HEALTH_OK.

[heat-admin@overcloud-ceph-storage-0 ~]$ sudo systemctl restart ceph.target


[heat-admin@overcloud-ceph-storage-0 ~]$ sudo ceph -s
cluster 8b57b9ee-a257-11e7-bac9-52540001fac8
health HEALTH_OK
monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0}
election epoch 5, quorum 0 overcloud-controller-0
osdmap e47: 3 osds: 3 up, 3 in

flags sortbitwise,require_jewel_osds
pgmap v1004: 224 pgs, 6 pools, 4751 MB data, 1125 objects
5182 MB used, 53152 MB / 58334 MB avail
224 active+clean

To Reset Your Environment


Critical concept
When the CL210 coursebook instructs you to reset virtual machines, the intention is to reset
only the overcloud to an initial state. Unless something else is wrong with any physical system or
online environment that is deemed unfixable, there is no reason to reset all virtual machines or
to re-provision a new lab environment.

What "resetting the overcloud" means


Whether you are working in a physical or online environment, certain systems never need to
be reset, because they remain materially unaffected by exercises and labs. This table lists the
systems never to be reset and those intended to be reset as a group during this course:

Which systems normally should or should not be reset

never to be reset        always reset as a group
classroom                controller0
workstation              compute0
power                    compute1
                         ceph0
                         director

Technically, the director system is the undercloud. However, in the context of "resetting the
overcloud", director must be included because director's services and databases are full of
control, management and monitoring information about the overcloud it is managing. Therefore,
to reset the overcloud without resetting director is to load a fresh overcloud with director
still retaining stale information about the previous overcloud just discarded.

In a physical classroom, use the rht-vmctl command to reset only the relevant nodes.
Although you can type one rht-vmctl command per node, which is tedious, there is an
interactive option to choose which nodes to reset and which nodes to skip. Don't forget the -i
option or else you will inadvertently reset all of your virtual machines. While not catastrophic, it
can be an annoying time-waster.

[kiosk@foundation ~]$ rht-vmctl reset -i all


Are you sure you want to reset workstation? (y/n) n
Are you sure you want to reset director? (y/n) y
Are you sure you want to reset controller0? (y/n) y
Are you sure you want to reset compute0? (y/n) y
Are you sure you want to reset compute1? (y/n) n
Are you sure you want to reset ceph0? (y/n) y
Are you sure you want to reset power? (y/n) n
Powering off director.
Resetting director.
Creating virtual machine disk overlay for cl210-director-vda.qcow2
Starting director.

Powering off controller0.


Resetting controller0.
Creating virtual machine disk overlay for cl210-controller0-vda.qcow2
Powering off compute0.
Resetting compute0.
Creating virtual machine disk overlay for cl210-compute0-vda.qcow2
Powering off ceph0.
Resetting ceph0.
Creating virtual machine disk overlay for cl210-ceph0-vda.qcow2
Creating virtual machine disk overlay for cl210-ceph0-vdb.qcow2
Creating virtual machine disk overlay for cl210-ceph0-vdc.qcow2
Creating virtual machine disk overlay for cl210-ceph0-vdd.qcow2

The director node is configured to start automatically, while the overcloud nodes are
configured to not start automatically. This is the same behavior as a newly provisioned lab
environment. Give director sufficient time to finish booting and initializing services, then ssh
to director to complete the normal overcloud nodes startup tasks.

[student@workstation ~]$ ssh stack@director


[stack@director ~]$ openstack compute service list
[stack@director ~]$ openstack server list
[stack@director ~]$ openstack server start overcloud-controller-0
[stack@director ~]$ openstack server start overcloud-cephstorage-0
[stack@director ~]$ openstack server start overcloud-compute-0
[stack@director ~]$ openstack server start overcloud-compute-1

Wait sufficiently to allow overcloud nodes to finish booting and initializing services. Then use
the health check script to validate the overcloud lab environment.

[stack@director ~]$ exit


[student@workstation ~]$ lab overcloud-health-check setup

What if "resetting the overcloud" does not result in a stable anvironment?


Resetting the overcloud properly always creates a stable environment. There could be further
technical issues, beyond simple user control, in a physical system or the online environment. It is
possible to have improper disks, foundation domain configuration, images, blueprints, or virtual
volumes, or damage caused by misconfiguration, typing mistakes, misuse of setup scripts, or
neglect to use cleanup scripts.

Resetting everything, if deemed necessary, takes time but results in a fresh environment.

Use rht-vmctl fullreset to pull down and start clean disk images from the classroom
system.

[kiosk@foundationX ~]$ rht-vmctl fullreset all

After the environment is re-provisioned, start again with the instructions for a new environment.

Internationalization

Language support
Red Hat Enterprise Linux 7 officially supports 22 languages: English, Assamese, Bengali, Chinese
(Simplified), Chinese (Traditional), French, German, Gujarati, Hindi, Italian, Japanese, Kannada,
Korean, Malayalam, Marathi, Odia, Portuguese (Brazilian), Punjabi, Russian, Spanish, Tamil, and
Telugu.

Per-user language selection


Users may prefer to use a different language for their desktop environment than the system-
wide default. They may also want to set their account to use a different keyboard layout or input
method.

Language settings
In the GNOME desktop environment, the user may be prompted to set their preferred language
and input method on first login. If not, then the easiest way for an individual user to adjust their
preferred language and input method settings is to use the Region & Language application. Run
the command gnome-control-center region, or from the top bar, select (User) > Settings.
In the window that opens, select Region & Language. The user can click the Language box and
select their preferred language from the list that appears. This will also update the Formats
setting to the default for that language. The next time the user logs in, these changes will take
full effect.

These settings affect the GNOME desktop environment and any applications, including gnome-
terminal, started inside it. However, they do not apply to that account if accessed through an
ssh login from a remote system or a local text console (such as tty2).

Note
A user can make their shell environment use the same LANG setting as their graphical
environment, even when they log in through a text console or over ssh. One way to do
this is to place code similar to the following in the user's ~/.bashrc file. This example
code will set the language used on a text login to match the one currently set for the
user's GNOME desktop environment:

i=$(grep 'Language=' /var/lib/AccountsService/users/${USER} \
    | sed 's/Language=//')
if [ "$i" != "" ]; then
    export LANG=$i
fi

Japanese, Korean, Chinese, or other languages with a non-Latin character set may not
display properly on local text consoles.

Individual commands can be made to use another language by setting the LANG variable on the
command line:

[user@host ~]$ LANG=fr_FR.utf8 date

jeu. avril 24 17:55:01 CDT 2014

Subsequent commands will revert to using the system's default language for output. The locale
command can be used to check the current value of LANG and other related environment
variables.
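
For example (output abbreviated; the values shown depend on the current settings):

[user@host ~]$ locale
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
... output omitted ...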

Input method settings


GNOME 3 in Red Hat Enterprise Linux 7 automatically uses the IBus input method selection
system, which makes it easy to change keyboard layouts and input methods quickly.

The Region & Language application can also be used to enable alternative input methods. In the
Region & Language application's window, the Input Sources box shows what input methods are
currently available. By default, English (US) may be the only available method. Highlight English
(US) and click the keyboard icon to see the current keyboard layout.

To add another input method, click the + button at the bottom left of the Input Sources window.
An Add an Input Source window will open. Select your language, and then your preferred input
method or keyboard layout.

Once more than one input method is configured, the user can switch between them quickly by
typing Super+Space (sometimes called Windows+Space). A status indicator will also appear
in the GNOME top bar, which has two functions: It indicates which input method is active, and
acts as a menu that can be used to switch between input methods or select advanced features of
more complex input methods.

Some of the methods are marked with gears, which indicate that those methods have advanced
configuration options and capabilities. For example, the Japanese Japanese (Kana Kanji) input
method allows the user to pre-edit text in Latin and use Down Arrow and Up Arrow keys to
select the correct characters to use.

US English speakers may also find this useful. For example, under English (United States) is the
keyboard layout English (international AltGr dead keys), which treats AltGr (or the right Alt)
on a PC 104/105-key keyboard as a "secondary-shift" modifier key and dead key activation key
for typing additional characters. There are also Dvorak and other alternative layouts available.

Note
Any Unicode character can be entered in the GNOME desktop environment if the user
knows the character's Unicode code point, by typing Ctrl+Shift+U, followed by the
code point. After Ctrl+Shift+U has been typed, an underlined u will be displayed to
indicate that the system is waiting for Unicode code point entry.

For example, the lowercase Greek letter lambda has the code point U+03BB, and can be
entered by typing Ctrl+Shift+U, then 03bb, then Enter.

System-wide default language settings


The system's default language is set to US English, using the UTF-8 encoding of Unicode as its
character set (en_US.utf8), but this can be changed during or after installation.

From the command line, root can change the system-wide locale settings with the localectl
command. If localectl is run with no arguments, it will display the current system-wide locale
settings.
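
For example (the exact values reflect the system's configuration):

[root@host ~]# localectl
   System Locale: LANG=en_US.utf8
       VC Keymap: us
      X11 Layout: us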

To set the system-wide language, run the command localectl set-locale LANG=locale,
where locale is the appropriate $LANG from the "Language Codes Reference" table in this
chapter. The change will take effect for users on their next login, and is stored in /etc/
locale.conf.

[root@host ~]# localectl set-locale LANG=fr_FR.utf8

In GNOME, an administrative user can change this setting from Region & Language by clicking
the Login Screen button at the upper-right corner of the window. Changing the Language of
the login screen will also adjust the system-wide default language setting stored in the /etc/
locale.conf configuration file.

Important
Local text consoles such as tty2 are more limited in the fonts that they can display
than gnome-terminal and ssh sessions. For example, Japanese, Korean, and Chinese
characters may not display as expected on a local text console. For this reason, it may
make sense to use English or another language with a Latin character set for the
system's text console.

Likewise, local text consoles are more limited in the input methods they support, and
this is managed separately from the graphical desktop environment. The available
global input settings can be configured through localectl for both local text virtual
consoles and the X11 graphical environment. See the localectl(1), kbd(4), and
vconsole.conf(5) man pages for more information.

Language packs
When using non-English languages, you may want to install additional "language packs" to
provide additional translations, dictionaries, and so forth. To view the list of available langpacks,
run yum langavailable. To view the list of langpacks currently installed on the system,
run yum langlist. To add an additional langpack to the system, run yum langinstall
code, where code is the code in square brackets after the language name in the output of yum
langavailable.
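
For example, to add French language support (assuming fr is the code shown by yum
langavailable):

[root@host ~]# yum langinstall fr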

References
locale(7), localectl(1), kbd(4), locale.conf(5), vconsole.conf(5),
unicode(7), utf-8(7), and yum-langpacks(8) man pages

Conversions between the names of the graphical desktop environment's X11 layouts and
their names in localectl can be found in the file /usr/share/X11/xkb/rules/
base.lst.

Language Codes Reference


Language Codes
Language $LANG value
English (US) en_US.utf8
Assamese as_IN.utf8
Bengali bn_IN.utf8
Chinese (Simplified) zh_CN.utf8
Chinese (Traditional) zh_TW.utf8
French fr_FR.utf8
German de_DE.utf8
Gujarati gu_IN.utf8
Hindi hi_IN.utf8
Italian it_IT.utf8
Japanese ja_JP.utf8
Kannada kn_IN.utf8
Korean ko_KR.utf8
Malayalam ml_IN.utf8
Marathi mr_IN.utf8
Odia or_IN.utf8
Portuguese (Brazilian) pt_BR.utf8
Punjabi pa_IN.utf8
Russian ru_RU.utf8
Spanish es_ES.utf8
Tamil ta_IN.utf8
Telugu te_IN.utf8

TRAINING
CHAPTER 1

MANAGING AN ENTERPRISE
OPENSTACK DEPLOYMENT

Overview

Goal        Manage the Undercloud, the Overcloud, and related services.

Objectives  • Describe the Undercloud architecture and the Overcloud architecture.
            • Describe the Undercloud components used for building the Overcloud.
            • Verify the functionality of the Undercloud and the Overcloud services.

Sections    • Describing Undercloud and Overcloud Architectures (and Quiz)
            • Describing Undercloud Components (and Guided Exercise)
            • Verifying the Functionality of Undercloud and Overcloud Services (and Guided Exercise)

Lab         • Managing an Enterprise OpenStack Deployment

Describing Undercloud and Overcloud Architectures

Objectives
After completing this section, students should be able to:

• Describe the OpenStack overcloud architecture and terminology.

• Describe the OpenStack undercloud architecture and terminology.

• Describe the benefits of using OpenStack to install OpenStack.

Introducing Red Hat OpenStack Platform


The Red Hat OpenStack Platform consists of interacting components implemented as services
that control computing, storage, and networking resources. Cloud administrators manage their
infrastructure to configure, control, and automate the provisioning and monitoring of OpenStack
resources. Figure 1.1: OpenStack core components provides an overview of the OpenStack
architecture as presented in the prerequisite OpenStack Administration I (CL110) course.

Figure 1.1: OpenStack core components

The following table reviews the OpenStack core services. Together, these components provide
the services necessary to deploy either tenant workload systems or OpenStack infrastructure
systems.

OpenStack Core Component Services

Dashboard
    Provides a modular, graphical user interface to manage OpenStack. It can launch server
    instances, configure networking topology, set role-based access controls, provision
    persistent and ephemeral storage, monitor run-time metrics, and organize projects and users.

Identity
    Provides user authentication and authorization to OpenStack components. Identity supports
    multiple authentication mechanisms, including user name and password credentials, tokens,
    and other authentication protocols. As the central user and service account catalog,
    Identity acts as a single sign-on (SSO) for command line and graphical end user activity
    and the inter-component service API.

OpenStack Networking
    Provides the creation and management of a virtual networking infrastructure in an OpenStack
    cloud, including networks, subnets, routers, firewalls, and virtual private networks (VPN).
    Designed as a pluggable architecture, OpenStack Networking supports multiple vendors and
    networking technologies.

Block Storage
    Provides persistent block storage and management to create and delete virtual disk devices,
    and to attach and detach server instance block devices. It also manages snapshots, backups,
    and boot functionality.

Compute
    Provides and schedules on-demand virtual machines deployed and run on preconfigured
    compute nodes operating on nested virtualization or bare metal hardware. The Compute
    service scales by adding additional virtualization resources, such as hypervisor hosts
    utilizing libvirtd, Qemu, and KVM technologies.

Image Storage
    Provides a registry service for virtual disk images, storing prebuilt images, system
    snapshots, and vendor-supplied appliances for retrieval and use as templates to deploy
    server instances and applications.

Object Storage
    Provides HTTP-accessible, redundant, distributed, and replicated storage for large amounts
    of data, including static entities such as pictures, videos, email messages, files, disk
    images, and backups.

Telemetry
    Provides central collection, storage and retrieval for user-level usage metrics on
    OpenStack clouds. Data is collected from component-aware agent notifications or
    infrastructure polling, used for alerting, system monitoring, customer billing, and
    implementing advanced features such as auto scaling.

Orchestration
    Provides a template-based methodology for creating and managing OpenStack cloud storage,
    networking and compute resources. A heat orchestration template (HOT) defines a collection
    of resources, known as a stack, to be provisioned and deployed as a single, repeatable,
    running entity. In addition to recognizing essential resource types such as server
    instances, subnets, volumes, security groups, and floating IPs, templates provide
    additional configuration for advanced functionality, such as high availability,
    auto-scaling, authentication, and nested stacks.

The OpenStack core components provide a comprehensive set of services to provision end user
cloud workloads consisting of deployed server instances organized by tenant projects. With
orchestration, arrangements of complex multi-server applications have become easy to define
and deploy with push-button simplicity. Still, the installation and management of OpenStack
cloud infrastructure itself has remained difficult to master and maintain, until the introduction of
Red Hat OpenStack Platform (RHOSP) director.

The RHOSP director is a standalone OpenStack all-in-one installation, providing a tool set
for installing and managing a complete OpenStack infrastructure environment. It is based
primarily on the OpenStack Deployment component developed in the TripleO project, which
is an abbreviation for "OpenStack-On-OpenStack". The Deployment service uses OpenStack
components running on the dedicated all-in-one installation (the undercloud) to install an
operational OpenStack cloud (the overcloud), utilizing extended core components, plus new
components, to locate, provision, deploy and configure bare metal systems as OpenStack
controller, compute, networking and storage nodes. The following table describes the OpenStack
deployment component services.

OpenStack Component Services for OpenStack-On-OpenStack

Orchestration for TripleO
    Provides a set of YAML-based templates to define configuration and provisioning
    instructions to deploy OpenStack infrastructure servers. Orchestration, defined previously
    as a core component, defines server roles to provision OpenStack infrastructure.

Bare Metal Provisioning
    Enables provisioning server instance deployments to physical (bare metal) machines using
    hardware-specific drivers. Bare Metal Provisioning integrates with the Compute service to
    provision the bare metal machines in the same way as virtual machines, first introspecting
    the physical machines to obtain hardware attributes and configuration.

Workflow
    Managed by the Mistral workflow service. A user typically writes a workflow using a
    workflow language based on YAML and uploads the workflow definition to Mistral with its
    REST API. The user can then start this workflow manually using the same API, or configure
    a trigger to start the workflow on some event. Provides a set of workflows for certain
    RHOSP director-specific actions, such as importing and deploying plans.

Messaging
    Provides a secure and scalable messaging service for providing asynchronous communication
    for intra-cloud applications. Other OpenStack components integrate with Messaging to
    provide functional equivalence to third-party Simple Queue Service (SQS) and Simple
    Notification Service (SNS) services. Messaging provides the communication for the Workflow
    service.

Deployment
    Provides a tool set for installing, upgrading and operating OpenStack clouds using
    OpenStack components and methods.

Introducing the Undercloud and Overcloud


The Red Hat OpenStack Platform uses a pair of terms to distinguish between the standalone
RHOSP director cloud used to deploy and manage production clouds, and the production cloud
or clouds used to deploy and manage end-user production workloads: undercloud and overcloud.

Figure 1.2: An undercloud deploys an overcloud

The undercloud is the Red Hat OpenStack Platform director machine itself, plus the provisioning
network and resources required to perform undercloud tasks. During the building process for the
overcloud, the machine nodes being provisioned to become controller, compute, network, and
storage systems are considered to be the workload of the undercloud. When deployment and all
configuration stages are complete, these nodes reboot to become the overcloud.

The overcloud is a Red Hat OpenStack Platform environment resulting from a template
configuration deployed from the undercloud. Prior to the introduction of the undercloud, any
similar Red Hat OpenStack Platform environment would have simply been called the cloud.
Using the terms undercloud and overcloud provides a distinction between the two Red Hat
OpenStack Platform installations. Each cloud has a complete set of component services,
endpoints, authentication, and purpose. To access and manage the undercloud, connect to the
Identity service endpoint of the RHOSP director system. To access and manage the overcloud,
connect to the Identity service endpoint on a controller system in the overcloud.
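
For example, the two sets of endpoints can be compared from the director system by switching
between the undercloud and overcloud credential files (stackrc and overcloudrc, which are used
in the exercises later in this chapter); a minimal sketch, with output omitted:

[stack@director ~]$ source ~/stackrc
[stack@director ~]$ openstack catalog list -c Name -c Endpoints
[stack@director ~]$ source ~/overcloudrc
[stack@director ~]$ openstack catalog list -c Name -c Endpoints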

Stated again: the undercloud installs the overcloud. However, the undercloud is not only an
installation tool set. It is a comprehensive platform for managing, monitoring, upgrading, scaling
and deleting overclouds. Currently, the undercloud supports deploying and managing a single
overcloud. In the future, the undercloud will allow an administrator to deploy and manage many
tenant overclouds.

What about Packstack?


The Puppet-based Packstack installer was the original tool for effective installations of Red Hat
OpenStack Platform. Packstack is deprecated, and will be discontinued in a future Red Hat
OpenStack Platform release. Packstack is no longer the preferred tool for common cloud
installations, but remains useful for limited use cases. Packstack was an internal tool developed
to create proof-of-concept (POC) deployments of one or possibly a few systems. First-adopter
RHOSP clients and analysts popularized it, and some have pushed the tool beyond recommended
use. Compared to RHOSP director, there are advantages and disadvantages:

Packstack advantages and disadvantages


Advantages                                     | Disadvantages
Easy to use command-line interface             | Command line only, no GUI, no web UI
Permits installations to preinstalled hosts    | Requires preinstalled hosts, no bare metal
Puppet-driven configuration is powerful        | Requires Puppet mastery to extend
One install host drives multi-host deployment  | Does not scale well to larger deployments
Permits limited changes by rerunning tool      | Not guaranteed safe or idempotent
Simple to use, debugged through log files      | No workflow or orchestration
Single controller POC installation             | No multiple controller or HA installs
Single customizable answer file                | Complex configurations are difficult
Installs controller and compute nodes          | No custom roles, no storage nodes
Simple configure-and-run implementation        | No validation, no deploy monitoring
Single interface implementation                | No composable roles
Installation only                              | No upgrading, monitoring or management

Undercloud Recommended Practices


Red Hat OpenStack Platform director is a tool used to install and manage the deployment and
lifecycle of Red Hat OpenStack Platform 7 (Kilo) and later versions. It is targeted for cloud
operator use cases where managed updates, upgrades and infrastructure control are critical for
underlying OpenStack operations. It also provides an API-driven framework for hardware
introspection, environment monitoring, capacity planning, utilization metrics, service allocation,
and stack management.

Lifecycle management for cloud infrastructure has operational tasks similar to legacy enterprise
management, but also incorporates new interpretations of Continuous Integration and DevOps.
The cloud industry differentiates stages of lifecycle management by categorizing tasks as Day 0
(Planning), Day 1 (Deploying), and Day 2 (Operations).

• Planning - introspection, network topology, service parameters, resource capacity.
• Deployment - deployment orchestration, service configuration, sanity checks, testing.
• Operations - updates and upgrades, scaling up and down, change management, compliance.

As a Day 0 Planning tool, director provides default, customizable configuration files to define
cloud architecture, including networking and storage topologies, OpenStack service parameters,
and third party plugin integration. These default files and templates implement Red Hat's highly
available reference architecture and recommended practices.

Director is most commonly recognized as a Day 1 Deployment tool, performing orchestration,
configuration, and validation for building overclouds. Tasks include hardware preparation,
software deployment using Puppet manifests, and validation using Tempest scripts, making it
easier for operators to learn and implement customizations within the director framework, which
is a recommended practice for consistency and reusability.


Director is designed as central management for ongoing Day 2 Operations. It can perform
environment health checks, auto-scale an overcloud by adding or replacing nodes, apply minor
release updates and major version upgrades, and handle patching, monitoring, and regulatory
compliance.

To retain Day 2 management of the overcloud, all management must be accomplished using the
undercloud CLI or APIs. Currently, there is no reasonable expectation that the undercloud can
detect, interpret, or reconcile manual changes not implemented through the undercloud. Using
outside tool sets loses the ability to perform safe and predictable updates, upgrades, and scaling.

Integration with third party tools that exclusively call undercloud APIs is recommended, and does
not break Day 2 operation support. Recommended examples include integration between the
undercloud and Red Hat CloudForms, Red Hat Satellite, and Ansible Tower by Red Hat.

The undercloud uses a variety of popular and stable OpenStack components to provide required
services, including the Deployment Service for image deployment, creation, and environment
templating, Bare Metal for bare metal introspection, Orchestration for component definition,
ordering, and deployment, and Puppet for post-instantiation configuration. The undercloud
includes tools that help with hardware testing, and is architected to facilitate future functionality
for automated OpenStack upgrades and patch management, centralized log collection, and
problem identification.

Overcloud nodes are deployed from the undercloud machine using a dedicated, isolated
provisioning network. Overcloud nodes must be configured to PXE boot on this provisioning
network, with network booting on other NICs disabled. These nodes must also support the
Intelligent Platform Management Interface (IPMI). Each candidate system needs to have a single
NIC on the provisioning network. This NIC must not be used for remote connectivity, because the
deployment process will reconfigure NICs for Open vSwitch bridging.

Minimal information must be gathered about candidate nodes before beginning deployment
configuration, including the MAC address of the appropriate provisioning NIC, the IP address of
the IPMI NIC, and the IPMI user name and password.
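
Before registering candidate nodes, it is worth confirming that the gathered IPMI details actually
work. A minimal check, using the classroom IPMI emulator address for controller0 shown later in
this chapter, might look like the following:

[stack@director ~]$ ipmitool -I lanplus -U admin -P password -H 172.25.249.101 power status
Chassis Power is on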

Later in this course, you will view and learn the undercloud configuration used to build the
classroom overcloud on your student system. No previous undercloud knowledge is required, but
it is recommended to become proficient with the technologies mentioned in this section before
using the undercloud to deploy and manage a production environment.

References
Further information is available about RHOSP Director at
Red Hat OpenStack Platform Director Life Cycle
https://access.redhat.com/support/policy/updates/openstack/platform/director

TripleO Architecture
https://docs.openstack.org/tripleo-docs/latest/install/introduction/architecture.html

TripleO documentation
https://docs.openstack.org/tripleo-docs/latest/


Quiz: Describing Undercloud and Overcloud Architectures

Choose the correct answer(s) to the following questions:

1. Which tool is recommended for all production Red Hat OpenStack Platform installs?

a. The overcloud
b. Foreman
c. Packstack
d. RHOSP director (undercloud)
e. Manual package install

2. Which four of these components are services of the undercloud? (Choose four.)

a. Data Processing
b. Deployment
c. Bare Metal
d. Database
e. Orchestration
f. Workflow

3. Which four of these capabilities are part of the undercloud's duties? (Choose four.)

a. Application scaling
b. Automated upgrades
c. Patch management
d. Central log collection
e. Monitoring


Solution
Choose the correct answer(s) to the following questions:

1. Which tool is recommended for all production Red Hat OpenStack Platform installs?

a. The overcloud
b. Foreman
c. Packstack
d. RHOSP director (undercloud) (correct)
e. Manual package install

2. Which four of these components are services of the undercloud? (Choose four.)

a. Data Processing
b. Deployment (correct)
c. Bare Metal (correct)
d. Database
e. Orchestration (correct)
f. Workflow (correct)

3. Which four of these capabilities are part of the undercloud's duties? (Choose four.)

a. Application scaling
b. Automated upgrades (correct)
c. Patch management (correct)
d. Central log collection (correct)
e. Monitoring (correct)


Describing Undercloud Components

Objectives
After completing this section, students should be able to:

• Describe the OpenStack components performing undercloud services.

• Describe the technologies that implement bare metal deployment.

• Start the overcloud from the undercloud.

Undercloud Services
Red Hat OpenStack Platform director is a deployment cloud for OpenStack infrastructure,
in which the cloud workload is the overcloud systems themselves: controllers, compute
nodes, and storage nodes. Since infrastructure nodes are commonly built directly on physical
hardware systems, the undercloud may be referred to as a bare metal cloud. However, as you
will experience in this course, an undercloud can deploy infrastructure to virtual systems for
learning, testing, and specific use cases. Similarly, overclouds almost exclusively deploy virtual
machines and containers but can be used to deploy tenant workloads directly to dedicated,
physical systems, such as blade servers or enterprise rack systems, by incorporating bare metal
drivers and methods. Therefore, the terms bare metal cloud and tenant workload cloud are only a
convenient frame of reference.

Deployment Service Architecture


The Deployment Service is an architecture designed to use native OpenStack component APIs
to configure, deploy, and manage OpenStack environments using other existing, supported
OpenStack components. By utilizing the technology of other current projects, the Deployment
Service developers can focus on creating additional technology required to manage the
deployment process instead of attempting to reimplement services already provided by other
components. When these other components receive feature requests, patches and bug fixes,
the undercloud automatically inherits these enhancements. System administrators will find the
Deployment Service architecture relatively easy to learn, because they are already experienced
with the standard OpenStack components that it uses. For example, the Deployment Service:

• stores its images in the Image service.

• creates Heat templates for resource deployment by the Orchestration service.

• obtains physical machine configuration using the Bare Metal service.

• performs complex post-deployment configuration using Puppet manifests.

• manages task interaction and prerequisite ordering using the Workflow service.

• configures network interfaces using the Networking service.

• obtains provisioning volumes from the Block Storage service.

The Deployment service generates the data required to instruct subordinate services to perform
deployment and installation tasks. It comes preconfigured with custom configurations and
sample templates for common deployment scenarios. The following table describes the primary
concepts and tasks being introduced in the Deployment service.


Deployment Service Terminology


Bare metal provisioning
    Building a node starting with a machine that has no operating system installed, or
    re-purposing an existing system by replacing its boot disk and configuration. Bare
    metal provisioning methodology also works for building virtual machine nodes from
    scratch, as is demonstrated in this course.

Introspection
    Discovering the attributes of physical or virtual nodes to determine whether they
    meet deployment requirements. This process requires PXE network booting candidate
    nodes using prebuilt images designed to query IPMI attributes and communicate the
    information back to a database-recording service on the undercloud.

overcloud-full
    A prebuilt boot-disk image containing an unconfigured but completely installed set of
    OpenStack and Ceph software packages. Used to create overcloud nodes quickly.

Orchestration
    Deploying the foundation configuration to the overcloud nodes from predefined
    overcloud role templates, tailored for a specific use case by additional environment
    files.

High availability
    Controller nodes can be built with redundancy by creating more than one, using
    Pacemaker clustering to provide failover for each component service between the
    controller nodes. Compute nodes, by design, are already redundantly scalable.

Deployment roles
    The Deployment service comes preconfigured with a handful of well-defined and
    customizable overcloud node deployment roles: Controller (API service node), Compute
    (hypervisor node), CephStorage (Ceph block (RADOS) and object (RGW) storage node),
    BlockStorage (Cinder block storage node), and ObjectStorage (Swift object storage
    node). Node roles may be allocated by manual tagging, or by configuring automated
    detection using the Automated Health Check (AHC) tools.

Composable services
    A pattern-based design architecture for OpenStack node roles, allowing custom service
    placement, collocation, and new service integration beyond the five predefined
    deployment roles.

Workflow
    A predeployment plan generated to manage task ordering and inter-communication.
    Workflow allows administrators to monitor the provisioning process, troubleshoot,
    customize, and restart provisioning tasks.

Orchestration Service (Heat)


The orchestration service provides a template-based engine for the undercloud, used to create
and manage resources such as storage, networking, instances, and applications as a repeatable
running environment. The default Heat templates are located at /usr/share/openstack-
tripleo-heat-templates. Templates create stacks. Stacks are collections of resources such
as server instances, virtual disk volumes, fixed and floating IP addresses, users, projects, and
configuration files. The packaged templates include working examples of multiple configurations
for tailoring a custom infrastructure stack. The following table describes the primary concepts
and entities of the Orchestration service templates.


Orchestration Terminology
Resources
    A template section that defines infrastructure elements to deploy, such as virtual
    machines, network ports, and storage disks.

Parameters
    A template section that defines deployment-specific parameter settings provided to
    satisfy template resource requirements. Most templates define default values for all
    parameters.

Outputs
    Output parameters dynamically generated during deployment and specified as
    information required to be passed back to the administrator. For example, public IP
    addresses, instance names, and other deployment results.

Template directory
    A location for storing and invoking modified templates, allowing the default
    templates to remain unmodified and reusable.

Environment directory
    A location for storing environment files. Environment files are specific to a
    deployment event, containing parameter settings that define this particular
    deployment. The design allows a specific overcloud design to be reused with new
    resource names and settings, without modifying the underlying templates. Environment
    files affect the runtime behavior of a template, overriding resource implementations
    and parameters.

An overcloud deployment is invoked by specifying a template directory and a location for the
environment files:

[user@undercloud]$ openstack overcloud deploy \
--templates /my_template_dir --environment-directory /my_environment_files_dir
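
After a deployment completes, the resulting stack and the resources it created can be inspected
from the undercloud. A minimal sketch, assuming the default stack name overcloud, with output
omitted:

[user@undercloud]$ openstack stack list -c "Stack Name" -c "Stack Status"
[user@undercloud]$ openstack stack resource list overcloud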

Bare Metal Service (Ironic)


The Bare Metal provisioning service first performs introspection on each candidate node, to
query and record node-specific hardware capabilities and configuration. It also provides a
mechanism, through the use of PXE and iSCSI, to install a boot disk image on to a qualified node,
as depicted in the Figure 6.1: Bare Metal boot disk provisioning image later in this course. The
default image is called overcloud-full, which has a full set of OpenStack and Ceph software
packages preinstalled on it, ready to be configured as any of the overcloud deployment roles.
A correctly built, custom image may also be used as a deployment disk, to create a specialized
enterprise or cloud server instance.

Workflow Service (Mistral)


The Workflow service creates and implements task execution plans called workflows. Complex
multi-step deployments have task sets and interconnected task relationships that determine
order of execution and prioritization. The Workflow service provides state management, correct
execution order, parallelism, synchronization and high availability. Cloud administrators can
define and modify plans to coordinate resource building and redeployment. The Workflow service
does not perform the actual tasks, but acts as a coordinator for worker processes and manages
asynchronous event messaging and notification to track task execution. The design allows for the
creation of custom work processes and the ability to scale and be highly available.
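
The workflows and executions that the undercloud uses can be listed with the Workflow service
command-line plugin. A minimal sketch, assuming the client plugin is installed on the undercloud,
with output omitted:

[stack@director ~]$ openstack workflow list -c Name
[stack@director ~]$ openstack workflow execution list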

Intelligent Platform Management Interface


The Intelligent Platform Management Interface (IPMI) is an industry-standard specification for
out-of-band management, monitoring, and configuration of computer systems. It is independent
of any installed operating system, providing access to a hardware-implemented, message-based
network management interface. IPMI provides the ability to perform power management tasks
(power down, power up, reboot) on systems even when an operating system or CPU is
non-functional. Management can be used to interact with a system during failures, or as part of boot
or initialization procedures. IPMI can also be used to gather run-time information about hardware
state, including component status, temperatures, and voltages, and may include the ability to send
alerts. A baseboard management controller (BMC) provides the chip-level functionality of IPMI.
Commonly implemented as an embedded micro-controller, BMC manages the interaction and
reporting between relevant IPMI and system buses.

IPMI is designed as a server remote access and control interface specification. It remains
consistent across a variety of vendor hardware implementations, including CIMC, DRAC, iDRAC,
iLO, ILOM, and IMM hardware platform interfaces. The primary functions of the specification
include monitoring, power control, logging, and inventory management. IPMI is intended to be
used with systems management software, although it can be invoked directly through simple
command line utilities.

Note
In this course, the overcloud is deployed on virtual machines possessing no hardware
or IPMI layer. Instead, a single virtual machine named power emulates a separate
IPMI interface for each overcloud virtual machine. IPMI commands are sent to a
node-specific IP address on power, where virtual BMC software performs power
management activities by communicating with the hypervisor to perform platform
management requests. A subset of the IPMI specification is implemented: to power up,
power down, and obtain configuration and state notifications.

Completed Classroom Topology


On the following page, Figure 1.3: Completed classroom overcloud portrays four deployed nodes:
controller0, compute0, compute1, and ceph0. The compute1 node will be deployed later in
this chapter as an overcloud stack upgrade. Use this diagram as a reference when verifying the
live overcloud configuration.


Figure 1.3: Completed classroom overcloud


Overcloud Management
Following deployment, the overcloud can be managed from the undercloud.

Use the OpenStack CLI to start, stop, and monitor the status of the overcloud nodes. Use
openstack server list to determine the servers' current status.

[stack@director ~]$ openstack server list -c Name -c Status
+-------------------------+---------+
| Name | Status |
+-------------------------+---------+
| overcloud-compute-0 | SHUTOFF |
| overcloud-controller-0 | SHUTOFF |
| overcloud-cephstorage-0 | SHUTOFF |
+-------------------------+---------+

Use openstack server start to boot each node. The servers should be started in the order
shown. The servers may take many minutes to display an ACTIVE status, so be patient and
continue to recheck until all servers are running.

[stack@director ~]$ openstack server start overcloud-controller-0
[stack@director ~]$ openstack server start overcloud-cephstorage-0
[stack@director ~]$ openstack server start overcloud-compute-0
[stack@director ~]$ openstack server list -c Name -c Status
+-------------------------+--------+
| Name | Status |
+-------------------------+--------+
| overcloud-compute-0 | ACTIVE |
| overcloud-controller-0 | ACTIVE |
| overcloud-cephstorage-0 | ACTIVE |
+-------------------------+--------+

You may experience a scenario where the status of nodes is ACTIVE, but checking the virtual
machine power state from the online environment or the hypervisor shows the nodes are
actually powered off. In this scenario, the undercloud must instruct the nodes to be stopped first
(to synchronize the recognized node state, even though the nodes are already off) before the
nodes are started again. This can all be accomplished with one command; enter openstack
server reboot for each node.

[stack@director ~]$ openstack server reboot overcloud-controller-0
[stack@director ~]$ openstack server reboot overcloud-cephstorage-0
[stack@director ~]$ openstack server reboot overcloud-compute-0
[stack@director ~]$ openstack server list -c Name -c Status
+-------------------------+--------+
| Name | Status |
+-------------------------+--------+
| overcloud-compute-0 | REBOOT |
| overcloud-controller-0 | REBOOT |
| overcloud-cephstorage-0 | REBOOT |
+-------------------------+--------+

The nodes will first display a status of REBOOT, but will quickly switch to ACTIVE while they
continue to start.
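
Rather than rerunning the list command by hand, one convenient option (not part of the course
steps) is to poll the status until all nodes report ACTIVE:

[stack@director ~]$ watch -n 30 'openstack server list -c Name -c Status'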


References
The Director Installation & Usage guide for Red Hat OpenStack Platform 10
https://access.redhat.com/documentation/en-US/index.html

The Architecture Guide for Red Hat OpenStack Platform 10


https://access.redhat.com/documentation/en-US/index.html

Intelligent Platform Management Interface Specification


https://www.intel.com/content/www/us/en/servers/ipmi/ipmi-second-gen-interface-
spec-v2-rev1-1.html


Guided Exercise: Describing Undercloud Components

In this exercise, you will connect to the undercloud node, director, to launch the predefined
overcloud. You will use the OpenStack CLI on the undercloud to manage the overcloud nodes.

Outcomes
You should be able to:

• Connect to and observe the undercloud system.

• Launch the overcloud from the undercloud.

Steps
1. Confirm that the infrastructure and undercloud virtual machines (workstation, power,
and director) are started and accessible.

1.1. Log in to workstation as student with a password of student.

1.2. Log in to power as student, using SSH, then exit.

[student@workstation ~]$ ssh power.lab.example.com
[student@power ~]$ exit

1.3. Log in to director as the stack user, using SSH. The login is passwordless when
coming from workstation.

[student@workstation ~]$ ssh stack@director.lab.example.com
[stack@director ~]$

2. As the stack user on director, check the status of the undercloud. If the nova-compute
service displays as down, wait until the status changes to up before continuing. The wait
should be no more than a minute or two.

2.1. Use the OpenStack CLI to list the status of the undercloud compute services.

[stack@director ~]$ openstack compute service list -c Binary -c Status -c State
+----------------+---------+-------+
| Binary | Status | State |
+----------------+---------+-------+
| nova-cert | enabled | up |
| nova-scheduler | enabled | up |
| nova-conductor | enabled | up |
| nova-compute | enabled | up |
+----------------+---------+-------+

Wait until nova-compute displays as up before trying to start the overcloud nodes.

3. As the stack user on director, check the overcloud status. If necessary, start the
overcloud.


3.1. Use the OpenStack CLI to list the overcloud server names and current status.

[stack@director ~]$ openstack server list -c Name -c Status
+-------------------------+---------+
| Name | Status |
+-------------------------+---------+
| overcloud-compute-0 | SHUTOFF |
| overcloud-controller-0 | SHUTOFF |
| overcloud-cephstorage-0 | SHUTOFF |
+-------------------------+---------+

In the above output, the overcloud nodes are SHUTOFF and need to be started.

3.2. Use the OpenStack CLI to start the overcloud nodes in the order shown.

[stack@director ~]$ openstack server start overcloud-controller-0
[stack@director ~]$ openstack server start overcloud-cephstorage-0
[stack@director ~]$ openstack server start overcloud-compute-0

3.3. Use the OpenStack CLI to confirm that the overcloud nodes have transitioned to ACTIVE.
When done, log out from director.

[stack@director ~]$ openstack server list -c Name -c Status
+-------------------------+--------+
| Name | Status |
+-------------------------+--------+
| overcloud-compute-0 | ACTIVE |
| overcloud-controller-0 | ACTIVE |
| overcloud-cephstorage-0 | ACTIVE |
+-------------------------+--------+
[stack@director ~]$ exit



Note
The classroom environment uses virtualization and imaging techniques that would
not be appropriate if used for a production OpenStack infrastructure. Due to these
techniques, it is possible for the node status reported by the undercloud, the
power state determined by the IPMI service, and the actual virtual machine state
to initially be out of sync.

If an initial openstack server list command displays all nodes as ACTIVE,
but the actual virtual machines are shut down, run openstack server reboot
for each node.

[stack@director ~]$ openstack server reboot overcloud-compute-0
[stack@director ~]$ openstack server reboot overcloud-cephstorage-0
[stack@director ~]$ openstack server reboot overcloud-controller-0

If openstack server start or openstack server reboot commands
generate errors, or the nodes fail to become ACTIVE, first confirm that the nova-
compute service is up, then run the openstack server set command for
each node, followed by the openstack server reboot command for each
node. Allow each set of commands, for all three nodes, to show the expected state
before continuing with the next set of commands:

[stack@director ~]$ openstack compute service list

[stack@director ~]$ openstack server set --state active \
overcloud-compute-0
[stack@director ~]$ openstack server set --state active \
overcloud-cephstorage-0
[stack@director ~]$ openstack server set --state active \
overcloud-controller-0

[stack@director ~]$ openstack server reboot overcloud-compute-0
[stack@director ~]$ openstack server reboot overcloud-cephstorage-0
[stack@director ~]$ openstack server reboot overcloud-controller-0


Verifying the Functionality of Overcloud Services

Objectives
After completing this section, students should be able to:

• Locate and view output from overcloud provisioning

• Test specific overcloud functionality

• Run tests on overcloud components

Verifying an Undercloud
The undercloud is architected to be more than an installation tool. This course discusses
orchestration for both initial install and for compute node scaling. RHOSP director also performs
numerous Day 2 activities. Therefore, the undercloud system is not intended to be uninstalled
or decommissioned after the overcloud is installed. The undercloud can be checked for proper
configuration:

• view service and network configuration

• view introspection results to confirm accurate node capability assessment

• view workflow configurations

Currently, the undercloud is capable of installing a single overcloud with the stack name
overcloud. The Workflow Service is capable of managing multiple plans and stacks. In a future
release, the undercloud will be able to install, access, and manage multiple overclouds. Currently
supported ongoing activities for the undercloud include:

• monitoring the health of an overcloud

• gathering and storing metrics from an overcloud

• validating and introspecting new nodes for overcloud scaling

• performance testing of nodes, components, and scenarios

• performing minor release updates, such as security fixes

• performing automated major version upgrades to Red Hat OpenStack Platform

• auto-scaling or replacing HA controllers and compute nodes; currently, scaling storage nodes is
handled by the storage platform, not the undercloud

• managing platform-level access to infrastructure nodes, including power management

Verifying an Overcloud
Once built, an overcloud is a production infrastructure with many interacting components.
To avoid damaging live data and applications, verify installation operation before deploying
production workloads. Verifying involves multiple levels of checking:


• view introspection results to confirm accurate node capability assessment
• compare compute, storage and network configuration to original templates
• perform power management testing on every node, not just selected nodes
• install and run the Testing service to confirm component-specific operation
• deploy an internal-only server instance to validate console access

Viewing introspection results


The bare metal introspection process returned data to the ironic-inspector listener to store
as baremetal node parameters, which may be viewed using openstack baremetal node show. Extra data was
stored in the Object Store, one text file object per node, in an object container called ironic-
inspector. The container is owned by the ironic user in the service project. To view this
data, download the object file and parse it with a JSON tool such as jq.

Downloading an object file created by a service account (or, more broadly, to run any OpenStack
command as a service user) requires using that service user's authentication. It is not
necessary to create a permanent authentication rc file for a service account, since running
commands as a service user is not a typical or regular task. Instead, override the current
authentication environment by prepending only the service account's environment variables to
that one command. For example:

[user@demo ~]$ OS_TENANT_NAME=service OS_USERNAME=service account \
OS_PASSWORD=service password openstack object action

The container name in which the files are stored matches the name of the service which created
them; for the introspection process the service is ironic-inspector. The password for the
ironic service user is found in the undercloud-passwords.conf file. Use the openstack
baremetal node show command to locate the file name used to store introspection results for a node.

[user@demo ~]$ openstack baremetal node show 5206cc66-b513-4b01-ac1b-cd2d6de06b7d -c extra
+-------+----------------------------------------------------------------+
| Field | Value |
+-------+----------------------------------------------------------------+
| extra | {u'hardware_swift_object': |
| | 'extra_hardware-5206cc66-b513-4b01-ac1b-cd2d6de06b7d'} |
+-------+----------------------------------------------------------------+

The information required to download introspection results is summarized in the following table.

Locating introspection node data objects


Parameter                  Value or location
service user               ironic
service passwords          /home/stack/undercloud-passwords.conf
baremetal node field name  extra
parameter name in field    'hardware_swift_object'
container name             ironic-inspector
object name                'extra_hardware-node-id'

The following example downloads the extra_hardware-node-id file:


[user@demo ~]$ OS_TENANT_NAME=service OS_USERNAME=ironic \
OS_PASSWORD=260f5ab5bd24adc54597ea2b6ea94fa6c5aae326 \
openstack object save ironic-inspector \
extra_hardware-5206cc66-b513-4b01-ac1b-cd2d6de06b7d

Parse the resulting JSON structure with the jq command:

[user@demo ~]$ jq . < extra_hardware-5206cc66-b513-4b01-ac1b-cd2d6de06b7d
[
[
"disk",
"logical",
"count",
"1"
],
[
"disk",
"vda",
"size",
"42"
],
...output omitted...

The displayed data are attributes of the introspected node. This data can be used to verify
that the introspection process correctly analyzed this node, or to customize the introspection
process. Such customization is an advanced RHOSP director installation topic and is beyond the
scope of this course.
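
For example, a jq filter can narrow the output to particular attributes; the following hypothetical
query (not part of the course steps) selects only the disk size entries:

[user@demo ~]$ jq '.[] | select(.[0] == "disk" and .[2] == "size")' \
extra_hardware-5206cc66-b513-4b01-ac1b-cd2d6de06b7d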

Viewing orchestration results


The orchestration process deployed each of the registered nodes as one of the standard server
roles. Compare the orchestration templates and environment files to those finished servers.
To browse those servers, use the heat-admin user, the same Linux user account used by
orchestration to access and configure the systems using SSH. When using the provisioning
network for direct access from director, the stack user has password-less SSH access. The
heat-admin user has sudo privileges; use sudo -i to switch user to root password-less. View
the following resources to verify the configuration of your course-specific overcloud:

• list services on each node to view which systemd-configured services are running on each
type of deployment role server.

• compare the static IP addresses set in the orchestration network template files to the network
addresses on each overcloud node

• compare the NIC configuration of the controller deployment role network orchestration
template to the network interfaces and OpenvSwitch bridges on controller0

• compare the NIC configuration of the compute deployment role network orchestration
template to the network interfaces on compute0

• compare the NIC configuration of the ceph-storage deployment role network orchestration
template to the network interfaces on ceph0

• compare the disk configuration in the orchestration storage template file to the output of
ceph osd and ceph status commands
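
For example, the first check in the list above can be performed from the undercloud with the same
commands used in the guided exercise later in this section (a sketch; output omitted):

[stack@director ~]$ ssh heat-admin@controller0
[heat-admin@overcloud-controller-0 ~]$ systemctl -t service list-units open\* neutron\* ceph\*
[heat-admin@overcloud-controller-0 ~]$ exit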


Testing IPMI power management


The power virtual machine acts like an IPMI hardware layer for each of the overcloud nodes.
One virtual address, per node, is added to the provisioning network interface on power, and
is configured to listen on port 623. Properly structured IPMI commands sent to a listener are
translated by the IPMI emulator into requests for the underlying hypervisor system, which
performs the action on the requested node.

IPMI IP addresses
Node name and        IP address on           Virtual IP address on
KVM domain name      provisioning network    power IPMI emulator
controller0          172.25.249.1            172.25.249.101
compute0             172.25.249.2            172.25.249.102
compute1             172.25.249.12           172.25.249.112
ceph0                172.25.249.3            172.25.249.103

This classroom does not require the full IPMI set of capabilities, only the ability to power cycle or
start nodes programmatically on demand. The command-line utility to test the functionality of
the power IPMI emulation uses this syntax:

[user@demo ~]$ ipmitool -I lanplus -U admin -P password -H IP power status|on|off

The -I interface options are compiled into the command and may be seen with ipmitool -h.
The lanplus choice indicates the use of the IPMI v2.0 RMCP+ LAN Interface.

For example, to view the power status of the controller0 node, run the following command.

[user@demo ~]$ ipmitool -I lanplus -U admin -P password -H 172.25.249.101 power status
Chassis Power is on

Testing OpenStack components


Red Hat OpenStack Platform includes a Testing Service module (codenamed Tempest) with
preconfigured per-module tests to perform rigorous testing prior to beginning production. The
standard tests are designed to load the overcloud and run for many hours to prove readiness.
The Testing service also includes a shorter and simpler set of tests known as the smoke
tests. These tests also perform standard OpenStack operations, but are designed to confirm a
working configuration. A failure in these atomic tests, which represent an inability to perform
typical OpenStack project user tasks, indicates a probable misconfiguration or inoperable hardware.

Using the Testing service requires some preparatory tasks:

• Testing is invoked as the admin user of the overcloud to be tested. The current environment
file must be loaded before starting.
• The system running the tests must have access to the internal API network. This can be a
temporary interface configured only for the duration of testing.
• An external network and subnet must exist before running the tests.
• Internet access is expected by default, to obtain a CirrOS image to use in testing. In our
classroom, we specify a local image from the command line to avoid this requirement.
• The heat_stack_user role must exist in the tested overcloud.
• Installing the openstack-tempest-all package installs all component tests, including tests for
components not installed on the overcloud. Manual editing of the tempest configuration file
can turn off unneeded components.


The Testing service API tests are designed to use only the OpenStack API, and not one of the
Python client interfaces. The intent is for this testing to validate the API, by performing both
valid and invalid API invocations against component APIs to ensure stability and proper error
handling. The Testing Service can also be used to test client tool implementations if they can
operate in a raw testing mode which allows passing JSON directly to the client. Scenario tests
are also included. These tests are a related series of steps that create more complex objects and
project states, confirm their functionality, and then remove them.

The Testing service runs the full-length tests by default. However, the service also provides a
method for running only the shorter smoke tests, or for skipping tests, by creating a text file
listing tests by name, then including the file as an option when testing is run. This is useful for including or
excluding tests as required, such as skipping tests that may be inoperable due to component
updates or customization, or where individual features have been disabled. Adding *.smoke to
the skip list limits tests to the smoke tests.

One method for running tests is the tools/run-tests.sh script, which uses a skip list file with
both include and exclude regular expression syntax for selecting tests. This course uses this
method because the Testing service CLI in RHOSP10 is not yet feature complete. However, the
tempest run command is available as another simple test invocation method.

The newer Testing service CLI also includes the useful tempest cleanup command, which
can find and delete resources created by the Testing service, even if tests have aborted or
completed with a failed status and left orphaned resources. To use this tool, first run the
command with the --init-saved-state option before running any tests. This option creates
a saved_state.json file containing a list of existing resources from the current cloud
deployment that will be preserved from subsequent cleanup commands. The following example
demonstrates the correct order in which to use the tempest cleanup commands.

[user@demo ~]$ tempest cleanup --init-saved-state
[user@demo ~]$ tempest run --smoke
[user@demo ~]$ tempest cleanup

Using VNC to access an internal-only instance console


An internal-only instance, by definition, is a server available on an internal project network
without external access. Because an internal-only server requires the absolute minimum number
of prerequisite objects, it is common to use one to test basic cloud functionality. The objects
required include a flavor, an image, a network and a subnet available for use in this user's project.
The objects may be owned by this project or shared from another project. No external networks,
routers, floating IPs, security groups, key pairs, persistent volumes or other non-core resources
are required. The subnet to which the instance is deployed may or may not use DHCP; however, an IP
address must be available.
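
A minimal sketch of launching such a test instance, reusing the m1.web flavor and rhel7 image
referenced in the guided exercise for this section (the network ID placeholder is illustrative only):

[user@demo ~]$ openstack server create --flavor m1.web --image rhel7 \
--nic net-id=<internal-network-id> --wait demo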

An internal-only server instance is not accessible from any system other than an authorized
controller node for that overcloud. To gain access to the server's console, a user may access the
controller through a VNC- or Spice-enabled browser, or a websockets-implemented VNC or Spice
client. Since Red Hat OpenStack Platform support for Spice is not yet released, this course uses
and describes VNC console components and configuration.

Each compute node runs a vncserver process, listening on the internal API network at one or
more ports starting at 5900 and going up, depending on the number of instances deployed on
that compute node. Each controller node runs a novncproxy process, listening at port 6080
on the same internal API network. The remaining services belong to the Compute Service
(codenamed Nova) with components on both the controller and compute nodes.
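
These listeners can be spot-checked directly on the nodes; a quick sketch (not part of the course
steps), using the heat-admin access described earlier:

[heat-admin@overcloud-controller-0 ~]$ sudo ss -tlnp | grep 6080
[heat-admin@overcloud-compute-0 ~]$ sudo ss -tlnp | grep ':59'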


To access the console, a user clicks the server's instance name from the Dashboard Project/
Compute/Instances screen to reach the instance detail screen, which has 4 sub-tabs. Clicking
the Console sub-tab initiates a request for a VNC console connection. The following list describes
the resulting Compute service and VNC interactions to build the instance-specific URL. Each
component named is followed by its access location in parentheses (node name, network name,
port number):

• A client browser (workstation, external), configured with the NoVNC plug-in, connects to
the Dashboard haproxy (controller0, external, port 80) to request to open a console to a
specific running server instance.

• Haproxy passes an access URL request to nova-api (controller0, internal API, port 8774).

• Nova-api passes a get_vnc_console request to nova-compute (compute0, internal API, AMQP).

• Nova-compute passes the get_vnc_console request to libvirt (compute0), which returns a
host IP and port.

• Nova-compute returns a generated token and a connect_info object to nova-api
(controller0, internal API, AMQP).

• Nova-api passes an authorize_console request to nova-consoleauth (controller0,
internal API, AMQP), which caches the connect_info object with the token as the index,
waiting for the actual connection request to occur.

• Nova-api returns a nova-novncproxy URL and the instance-specific token to Dashboard
(controller0, internal API), which passes the URL and token to the browser (workstation,
external).

In Figure 1.4: The constructed nova-novncproxy instance-specific URL, notice the URL, which
includes inline parameters for the token and instance ID for the requested server instance demo,
at the bottom of the Dashboard screen as the mouse hovers over the click-able link in the blue
message area.

Figure 1.4: The constructed nova-novncproxy instance-specific URL


Note
The requirement that a user clicks the link titled Click here to show only console, plus
any messages about keyboard non-response, is not an error. It is the result of browser
settings forbidding cross domain scripts from running automatically. A user could
select settings, such as show all content or load unsafe scripts, that disable protective
security policies, but it is not recommended. Instead, manually click the link.

The Compute service has obtained connection information that it has cached with the console
authorization service, to be requested and used by any user who provides the correct token.
The URL passed to the browser is not the direct address of the demo instance, but instead is the
novncproxy address, which constructs a connection reverse proxy, to allow the demo instance
to initiate console screen refreshes. The following list describes the remaining interactions to
complete the reverse proxy VNC connection when the URL is clicked:

• The browser (workstation, external) connects to the URL, proxied by haproxy
(controller0, external, port 6080) to reach nova-novncproxy (controller0, internal
API, port 6080). nova-novncproxy parses the token and instance ID from the URL.

• Using the token, nova-novncproxy retrieves the connect_info object from nova-
consoleauth (controller0, internal API, AMQP).

• nova-novncproxy connects directly to vncserver (compute0, internal API, 5900+) at
the port designated for the requested VM and creates a reverse proxy to send graphics back
through the Dashboard haproxy (controller0, internal API, port 80) to the user's browser
(workstation, external).

Deploying and connecting to the VNC console of an internal-only server instance validates core
Compute service, Messaging service and network access functionality.
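
The same console URL can also be retrieved from the command line instead of the Dashboard; a
minimal sketch, assuming a running instance named demo:

[user@demo ~]$ openstack console url show demo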

Verifying an Overcloud
The following steps outline the process to verify an overcloud deployment.

1. On the undercloud, add a port to the control plane interface br-ctlplane and assign it an
IP address.

2. Install the openstack-tempest package and component test packages.

3. Create a testing configuration directory and populate it with configuration files.

4. Create a provider network on the overcloud and retrieve its ID.

5. Run the config_tempest tool configuration script using the external network ID as an
argument.

6. Optionally, edit the /etc/tempest.conf file to select or clear the services to be tested.

7. Use the tempest-smoke-skip-sample sample file to create the tempest-smoke-skip
file. The file lists tests to run and tests to skip.

8. Run the tools/run-tests.sh --skip-file ./tempest-smoke-skip command to
test the environment.


References
Intelligent Platform Management Interface
https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface

Access an instance from console


https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/
console-url.html

How to select virtual consoles


https://docs.openstack.org/security-guide/compute/how-to-select-virtual-
consoles.html

Further information is available in the OpenStack Integration Test Suite Guide for
Red Hat OpenStack Platform 10; at
https://access.redhat.com/documentation/en-US/index.html


Guided Exercise: Verifying the Functionality of Overcloud Services

In this exercise, you will view the results of the deployment tasks that created the overcloud on
your system. You will verify the operation and configuration of the undercloud, then verify the
operation and configuration of the overcloud to compare and contrast the differences. Finally, to
validate that the overcloud is functional, you will install and run the Testing service.

Outcomes
You should be able to:

• Connect to and observe the undercloud virtual machine.

• Connect to and observe the overcloud virtual machines.

• Install and run the Testing service.

Before you begin


Log in to workstation as student with password student.

On workstation, run the lab deployment-overcloud-verify setup command. The
script checks that the m1.web flavor, the rhel7 image, and the default admin account are
available.

[student@workstation ~]$ lab deployment-overcloud-verify setup

Steps
1. Log in to director as the stack user. Observe that the stackrc environment file is
automatically loaded. You will use the stack user's authentication environment to query
and manage the undercloud.

1.1. SSH to the stack user on the director system. No password is required. View the
stack user's environment, which is used to connect to the undercloud.

[student@workstation ~]$ ssh stack@director
[stack@director ~]$ env | grep OS_
OS_IMAGE_API_VERSION=1
OS_PASSWORD=96c087815748c87090a92472c61e93f3b0dcd737
OS_AUTH_URL=https://172.25.249.201:13000/v2.0
OS_USERNAME=admin
OS_TENANT_NAME=admin
OS_NO_CACHE=True
OS_CLOUDNAME=undercloud

1.2. View the current overcloud server list to find the provisioning network address for each
node. The IP addresses shown here may differ from yours.

[stack@director ~]$ openstack server list -c Name -c Status -c Networks
+-------------------------+---------+------------------------+
| Name | Status | Networks |
+-------------------------+---------+------------------------+
| overcloud-controller-0 | ACTIVE | ctlplane=172.25.249.52 |
| overcloud-compute-0 | ACTIVE | ctlplane=172.25.249.53 |
| overcloud-cephstorage-0 | ACTIVE | ctlplane=172.25.249.58 |
+-------------------------+---------+------------------------+

2. Log in to each overcloud system to view the unique services running on each node type,
using the heat-admin account that was provisioned during deployment. The heat-admin account
on each node is configured with the SSH keys for the stack user from director to allow
password-less access.

2.1. Using SSH, log in to the controller0 service API node. List relevant services and
network configuration, then log out.

[stack@director ~]$ ssh heat-admin@controller0
[heat-admin@overcloud-controller-0 ~]$ ip addr | grep -E 'eth0|vlan|br-ex'
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
inet 172.25.249.59/24 brd 172.25.249.255 scope global eth0
inet 172.25.249.50/32 brd 172.25.249.255 scope global eth0
9: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
inet 172.25.250.1/24 brd 172.25.250.255 scope global br-ex
inet 172.25.250.50/32 brd 172.25.250.255 scope global br-ex
10: vlan40: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
inet 172.24.4.1/24 brd 172.24.4.255 scope global vlan40
inet 172.24.4.50/32 brd 172.24.4.255 scope global vlan40
11: vlan20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
inet 172.24.2.1/24 brd 172.24.2.255 scope global vlan20
12: vlan10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
inet 172.24.1.1/24 brd 172.24.1.255 scope global vlan10
inet 172.24.1.51/32 brd 172.24.1.255 scope global vlan10
inet 172.24.1.50/32 brd 172.24.1.255 scope global vlan10
13: vlan30: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
inet 172.24.3.1/24 brd 172.24.3.255 scope global vlan30
inet 172.24.3.50/32 brd 172.24.3.255 scope global vlan30
[heat-admin@overcloud-controller-0 ~]$ sudo ovs-vsctl list-br
...output omitted...
[heat-admin@overcloud-controller-0 ~]$ sudo ovs-vsctl list-ifaces br-trunk
...output omitted...
[heat-admin@overcloud-controller-0 ~]$ sudo ovs-vsctl list-ifaces br-ex
...output omitted...
[heat-admin@overcloud-controller-0 ~]$ systemctl -t service list-units \
open\* neutron\* ceph\*
...output omitted...
[heat-admin@overcloud-controller-0 ~]$ exit
[stack@director ~]$

2.2. Using SSH, log in to the compute0 hypervisor node. List relevant services and network
configuration, then log out.

[stack@director ~]$ ssh heat-admin@compute0
[heat-admin@overcloud-compute-0 ~]$ ip addr | grep -E 'eth0|vlan|eth2'
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
inet 172.25.249.57/24 brd 172.25.249.255 scope global eth0
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
inet 172.25.250.2/24 brd 172.25.250.255 scope global eth2
10: vlan20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
inet 172.24.2.2/24 brd 172.24.2.255 scope global vlan20
11: vlan10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
inet 172.24.1.2/24 brd 172.24.1.255 scope global vlan10
12: vlan30: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
inet 172.24.3.2/24 brd 172.24.3.255 scope global vlan30


[heat-admin@overcloud-compute-0 ~]$ sudo ovs-vsctl list-br
...output omitted...
[heat-admin@overcloud-compute-0 ~]$ sudo ovs-vsctl list-ifaces br-trunk
...output omitted...
[heat-admin@overcloud-compute-0 ~]$ systemctl -t service list-units \
open\* neutron\* ceph\*
...output omitted...
[heat-admin@overcloud-compute-0 ~]$ exit
[stack@director ~]$

2.3. Using SSH, log in to the ceph0 storage node. List relevant services and network
configuration, then log out.

[stack@director ~]$ ssh heat-admin@ceph0
[heat-admin@overcloud-cephstorage-0 ~]$ ip addr | grep -E 'eth0|vlan|eth2'
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
inet 172.25.249.56/24 brd 172.25.249.255 scope global eth0
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
inet 172.25.250.3/24 brd 172.25.250.255 scope global eth2
6: vlan40: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
inet 172.24.4.3/24 brd 172.24.4.255 scope global vlan40
7: vlan30: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
inet 172.24.3.3/24 brd 172.24.3.255 scope global vlan30
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ovs-vsctl show
...output omitted...
[heat-admin@overcloud-cephstorage-0 ~]$ systemctl -t service list-units \
open\* neutron\* ceph\*
...output omitted...
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph status
...output omitted...
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd lspools
...output omitted...
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd ls
...output omitted...
[heat-admin@overcloud-cephstorage-0 ~]$ lsblk -fs
...output omitted...
[heat-admin@overcloud-cephstorage-0 ~]$ exit
[stack@director ~]$

3. Test the IPMI emulation software which is performing power management for the
overcloud's virtual machine nodes.

3.1. Use the IPMI command-line tool to power the compute1 node on and off. The
compute1 node will be provisioned as the second compute node in a later chapter,
but is not currently in use. All other nodes are currently functioning cloud nodes; do
not perform these commands on any other nodes. The IPMI address for compute1 is
172.25.249.112. Start by checking the node's current power status.

[stack@director ~]$ ipmitool -I lanplus -U admin -P password \
-H 172.25.249.112 power status
Chassis Power is off

3.2. Toggle the compute1 power on and off. When you are finished practicing the IPMI
functionality, leave the compute1 node powered off.



You might receive a failure message, as in the example commands below. This can
indicate that the command request was received while the host was transitioning
between states. Wait, then submit the command request again.

[stack@director ~]$ ipmitool -I lanplus -U admin -P password \
-H 172.25.249.112 power on
Chassis Power Control: Up/On

[stack@director ~]$ ipmitool -I lanplus -U admin -P password \
-H 172.25.249.112 power off
Set Chassis Power Control to Down/Off failed: Command response could not be
provided

[stack@director ~]$ ipmitool -I lanplus -U admin -P password \
-H 172.25.249.112 power off
Chassis Power Control: Down/Off

4. Authenticate as the admin user in the admin project in the overcloud.

Source the overcloudrc authentication environment file. The loaded environment
provides admin user access in the overcloud.

[stack@director ~]$ source overcloudrc

5. Confirm that the heat_stack_user role is available in the overcloud.

[stack@director ~]$ openstack role list -c Name
+-----------------+
| Name |
+-----------------+
| heat_stack_user |
| ResellerAdmin |
| _member_ |
| swiftoperator |
| admin |
+-----------------+

6. Install the Tempest testing service and component tests. Create a test configuration
directory, and populate it with configuration files using the configure-tempest-
directory script. Run the config_tempest script to configure the tests
for the overcloud, using overcloudrc environment parameters, the external
provider-172.25.250 network ID, and the cirros-0.3.4-x86_64-disk.img image
from http://materials.example.com.

6.1. Install the tempest package and all available component test packages.

[stack@director ~]$ sudo yum -y install openstack-tempest{,-all}

6.2. Create a test configuration working directory. Run the configure-tempest-
directory script from the new directory. The script populates the working directory
with configuration files.

[stack@director ~]$ mkdir ~/tempest

[stack@director ~]$ cd ~/tempest
[stack@director tempest]$ /usr/share/openstack-tempest-13.0.0/tools/configure-tempest-directory

6.3. Locate the network ID for the provider-172.25.250 external network.

[stack@director tempest]$ openstack network show provider-172.25.250 \
-c id -f value
1eef8ec9-d4be-438b-bf18-381a40cbec60

6.4. Run the config_tempest setup script using the external network ID. This populates
the tempest configuration files based on components currently installed.

[stack@director tempest]$ tools/config_tempest.py \
--deployer-input ~/tempest-deployer-input.conf --debug \
--create identity.uri $OS_AUTH_URL identity.admin_password $OS_PASSWORD \
--image http://materials.example.com/cirros-0.3.4-x86_64-disk.img \
--network-id 1eef8ec9-d4be-438b-bf18-381a40cbec60
2017-06-19 16:02:56.499 10562 INFO tempest [-] Using tempest config file /etc/
tempest/tempest.conf
2017-06-19 16:02:57.415 10562 INFO __main__ [-] Reading defaults from file '/
home/stack/tempest/etc/default-overrides.conf'
2017-06-19 16:02:57.418 10562 INFO __main__ [-] Adding options from deployer-
input file '/home/stack/tempest-deployer-input.conf'
...output omitted...

7. Configure and run a smoke test. The dynamic configuration in the previous step included
mistral and designate component tests, which are not installed in this overcloud. Edit
the configuration to disable mistral and designate testing. Use the test skip file found
in student's Downloads directory on workstation to also exclude tests for API versions
not in use on this overcloud. Exit from director after the test run.

7.1. Edit the etc/tempest.conf testing configuration file to mark components as not
available. Locate and edit the service_available section to disable mistral and
designate testing. Leave existing entries; only add mistral and designate as
False. The section should appear as shown when done.

[stack@director tempest]$ cat ./etc/tempest.conf
...output omitted...
[service_available]
glance = True
manila = False
cinder = True
swift = True
sahara = False
nova = True
neutron = True
trove = False
ceilometer = True
ironic = False
heat = True
zaqar = False
horizon = True
mistral = False
designate = False
...output omitted...

7.2. Create a file named tempest-smoke-skip to list tests to run and tests to skip. Locate
the sample file named tempest-smoke-skip-sample in student's Downloads
directory on workstation. Copy the file to the Testing service working directory on
director and rename it. Review the entries in the skip file.

[stack@director tempest]$ scp \
student@workstation:Downloads/tempest-smoke-skip-sample ./tempest-smoke-skip
Warning: Permanently added 'workstation,172.25.250.254' (ECDSA) to the list of
known hosts.
student@workstation's password: student
tempest-smoke-skip 100% 998 1.0KB/s 00:00
[stack@director tempest]$ cat ./tempest-smoke-skip
+.*smoke
-ceilometer.*
-designate_tempest_plugin.*
-inspector_tempest_plugin.*
-manila_tempest_tests.*
-mistral_tempest_tests.*
-neutron.*
-neutron_fwaas.*
-neutron_vpnaas.*
-sahara_tempest_plugin.*
-tempest.api.data_processing.*
-tempest.api.identity.*
-tempest.api.image.*
-tempest.api.network.*
-tempest.api.object_storage.*
-tempest.api.orchestration.*
-tempest.api.volume.*
-tempest.scenario.*

7.3. Run the tempest cleanup command to save a list of pre-existing cloud resources.

[stack@director tempest]$ tempest cleanup --init-saved-state

7.4. Run the tests, specifying tempest-smoke-skip as the skip file. Although no test
failures are expected, view the output for any that occur to observe the troubleshooting
information provided by the Testing Service. This command may take 10 minutes or
longer to complete.

[stack@director tempest]$ tools/run-tests.sh --skip-file ./tempest-smoke-skip \
--concurrency 1
======
Totals
======
Ran: 13 tests in 93.0000 sec.
- Passed: 13
- Skipped: 0
- Expected Fail: 0
- Unexpected Success: 0
- Failed: 0
Sum of execute time for each test: 15.4832 sec.

==============
Worker Balance
==============
- Worker 0 (13 tests) => 0:01:26.541830

7.5. Run the tempest cleanup command to remove resources not listed in the earlier
save list. There may be none to delete, if all tests completed successfully and performed
their own cleanups.

[stack@director tempest]$ tempest cleanup

7.6. Finish the test results review, then exit director.

[stack@director tempest]$ exit
[student@workstation ~]$

Cleanup
On workstation, run the lab deployment-overcloud-verify cleanup script to clean up
this exercise.

[student@workstation ~]$ lab deployment-overcloud-verify cleanup

Lab: Managing an Enterprise OpenStack Deployment

In this lab, you will validate that the overcloud is functional by deploying a server instance as
a new user in a new project, creating the required resources. The lab is designed to be accomplished
using the OpenStack CLI, but you can also perform tasks using the dashboard (http://
dashboard.overcloud.example.com). You can find the admin password in the /home/
stack/overcloudrc file on director.

Outcomes
You should be able to:

• Create the resources required to deploy a server instance.

• Deploy and verify an external instance.

Before you begin


Log in to workstation as student using student as the password.

On workstation, run the lab deployment-review setup command. The script checks
that the m1.web flavor, the rhel7 image, and the provider-172.25.250 network exist to test
instance deployment. The script also checks that the default admin account is available.

[student@workstation ~]$ lab deployment-review setup

Steps
1. On workstation, load the admin user environment file. To prepare for deploying a server
instance, create the production project in which to work, and an operator1 user with the
password redhat. Create an authentication environment file for this new user.

2. The lab setup script preconfigured an external provider network and subnet, an image, and
multiple flavors. Working as the operator1 user, create the security resources required to
deploy this server instance, including a key pair named operator1-keypair1.pem placed
in student's home directory, and a production-ssh security group with rules for SSH
and ICMP.

3. Create the network resources required to deploy an external instance, including a
production-network1 network, a production-subnet1 subnet using the range
192.168.0.0/24, a DNS server at 172.25.250.254, and a production-router1
router. Use the external provider-172.25.250 network to provide a floating IP address.

4. Deploy the production-web1 server instance using the rhel7 image and the m1.web
flavor.

5. When deployed, use ssh to log in to the instance console. From the instance, verify network
connectivity by using ping to reach the external gateway at 172.25.250.254. Exit the
production-web1 instance when finished.

Evaluation
On workstation, run the lab deployment-review grade command to confirm the success
of this exercise.

[student@workstation ~(operator1-production)]$ lab deployment-review grade

Cleanup
On workstation, run the lab deployment-review cleanup script to clean up this
exercise.

[student@workstation ~(operator1-production)]$ lab deployment-review cleanup

Solution
In this lab, you will validate that the overcloud is functional by deploying a server instance as
a new user in a new project, creating the required resources. The lab is designed to be accomplished
using the OpenStack CLI, but you can also perform tasks using the dashboard (http://
dashboard.overcloud.example.com). You can find the admin password in the /home/
stack/overcloudrc file on director.

Outcomes
You should be able to:

• Create the resources required to deploy a server instance.

• Deploy and verify an external instance.

Before you begin


Log in to workstation as student using student as the password.

On workstation, run the lab deployment-review setup command. The script checks
that the m1.web flavor, the rhel7 image, and the provider-172.25.250 network exist to test
instance deployment. The script also checks that the default admin account is available.

[student@workstation ~]$ lab deployment-review setup

Steps
1. On workstation, load the admin user environment file. To prepare for deploying a server
instance, create the production project in which to work, and an operator1 user with the
password redhat. Create an authentication environment file for this new user.

1.1. On workstation, source the admin-rc authentication environment file in the
student home directory. View the admin password in the OS_PASSWORD variable.

[student@workstation ~]$ source admin-rc
[student@workstation ~(admin-admin)]$ env | grep "^OS_"
OS_REGION_NAME=regionOne
OS_PASSWORD=mbhZABea3qjUTZGNqVMWerqz8
OS_AUTH_URL=http://172.25.250.50:5000/v2.0
OS_USERNAME=admin
OS_TENANT_NAME=admin

1.2. As admin, create the production project and the operator1 user.

[student@workstation ~(admin-admin)]$ openstack project create \
--description Production production
...output omitted...
[student@workstation ~(admin-admin)]$ openstack user create \
--project production --password redhat --email student@example.com operator1

1.3. Create a new authentication environment file by copying the existing admin-rc file.

[student@workstation ~(admin-admin)]$ cp admin-rc operator1-production-rc

1.4. Edit the file with the new user's settings. Match the settings shown here.

unset OS_SERVICE_TOKEN
export OS_AUTH_URL=http://172.25.250.50:5000/v2.0
export OS_PASSWORD=redhat
export OS_REGION_NAME=regionOne
export OS_TENANT_NAME=production
export OS_USERNAME=operator1
export PS1='[\u@\h \W(operator1-production)]\$ '

2. The lab setup script preconfigured an external provider network and subnet, an image, and
multiple flavors. Working as the operator1 user, create the security resources required to
deploy this server instance, including a key pair named operator1-keypair1.pem placed
in student's home directory, and a production-ssh security group with rules for SSH
and ICMP.

2.1. Source the new environment file. Remaining lab tasks must be performed as this
production project member.

[student@workstation ~(admin-admin)]$ source operator1-production-rc

2.2. Create a keypair. Redirect the command output into the operator1-keypair1.pem
file. Set the required permissions on the key pair file.

[student@workstation ~(operator1-production)]$ openstack keypair create \
operator1-keypair1 > /home/student/operator1-keypair1.pem
[student@workstation ~(operator1-production)]$ chmod 600 operator1-keypair1.pem

2.3. Create a security group with rules for SSH and ICMP access.

[student@workstation ~(operator1-production)]$ openstack security group \
create production-ssh
...output omitted...
[student@workstation ~(operator1-production)]$ openstack security group \
rule create --protocol tcp --dst-port 22 production-ssh
...output omitted...
[student@workstation ~(operator1-production)]$ openstack security group \
rule create --protocol icmp production-ssh
...output omitted...

3. Create the network resources required to deploy an external instance, including a
production-network1 network, a production-subnet1 subnet using the range
192.168.0.0/24, a DNS server at 172.25.250.254, and a production-router1
router. Use the external provider-172.25.250 network to provide a floating IP address.

3.1. Create a project network and subnet.

[student@workstation ~(operator1-production)]$ openstack network create \
production-network1
...output omitted...
[student@workstation ~(operator1-production)]$ openstack subnet create \
--dhcp \
--subnet-range 192.168.0.0/24 \
--dns-nameserver 172.25.250.254 \
--network production-network1 \
production-subnet1
...output omitted...

3.2. Create a router. Set the gateway address. Add the internal network interface.

[student@workstation ~(operator1-production)]$ openstack router create \
production-router1
...output omitted...
[student@workstation ~(operator1-production)]$ neutron router-gateway-set \
production-router1 provider-172.25.250
...output omitted...
[student@workstation ~(operator1-production)]$ openstack router add subnet \
production-router1 production-subnet1
...output omitted...

3.3. Create a floating IP, taken from the external network. You will use this address to access
the server instance.

[student@workstation ~(operator1-production)]$ openstack floating ip \
create provider-172.25.250
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
...output omitted...
| floating_ip_address | 172.25.250.N |
...output omitted...

4. Deploy the production-web1 server instance using the rhel7 image and the m1.web
flavor.

4.1. Deploy the server instance, and verify the instance has an ACTIVE status.

[student@workstation ~(operator1-production)]$ openstack server create \
--nic net-id=production-network1 \
--security-group production-ssh \
--image rhel7 \
--flavor m1.web \
--key-name operator1-keypair1 \
--wait production-web1
...output omitted...
[student@workstation ~(operator1-production)]$ openstack server show \
production-web1 -c status -f value
ACTIVE

4.2. Attach the floating IP address to the active server.

[student@workstation ~(operator1-production)]$ openstack server add \
floating ip production-web1 172.25.250.N
...output omitted...

5. When deployed, use ssh to log in to the instance console. From the instance, verify network
connectivity by using ping to reach the external gateway at 172.25.250.254. Exit the
production-web1 instance when finished.

5.1. Use the ssh command with the key pair to log in to the instance as the cloud-user
user at the floating IP address.

[student@workstation ~(operator1-production)]$ ssh -i operator1-keypair1.pem \
cloud-user@172.25.250.N

5.2. Test for external network access. Ping the network gateway from production-web1.

[cloud-user@production-web1 ~]$ ping -c3 172.25.250.254
PING 172.25.250.254 (172.25.250.254) 56(84) bytes of data.
64 bytes from 172.25.250.254: icmp_seq=1 ttl=63 time=0.804 ms
64 bytes from 172.25.250.254: icmp_seq=2 ttl=63 time=0.847 ms
64 bytes from 172.25.250.254: icmp_seq=3 ttl=63 time=0.862 ms

--- 172.25.250.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.804/0.837/0.862/0.041 ms

5.3. When finished testing, exit the production-web1 server instance.

[cloud-user@production-web1 ~]$ exit
[student@workstation ~(operator1-production)]$

Evaluation
On workstation, run the lab deployment-review grade command to confirm the success
of this exercise.

[student@workstation ~(operator1-production)]$ lab deployment-review grade

Cleanup
On workstation, run the lab deployment-review cleanup script to clean up this
exercise.

[student@workstation ~(operator1-production)]$ lab deployment-review cleanup

Summary
In this chapter, you learned:

• Enterprise clouds today are built using multiple, interconnected cloud structures. The
undercloud is a provisioning and management cloud for building and managing the production
clouds. Red Hat OpenStack Platform director is the undercloud in Red Hat OpenStack Platform.

• An enterprise production cloud is known as an overcloud. Underclouds and overclouds
utilize the same technologies, but manage different workloads. Underclouds manage cloud
infrastructure, while overclouds manage production, tenant workloads.

• There are three major steps in overcloud provisioning. Introspection discovers and queries
available systems to gather node capabilities. Orchestration uses templates and environment
files to configure everything about the cloud deployment. Testing is designed to validate all the
standard functionality of the components that were installed.

• Common open technologies are used in physical and virtual clouds. Intelligent Platform
Management Interface (IPMI) is the power management technology used to control nodes.
Virtual Network Computing (VNC) is the remote access technology used to access deployed
instance consoles.

• The introspection process defines the basic technical characteristics of nodes to be deployed.
Using those characteristics, overcloud deployment can automatically assign deployment roles
to specific nodes.

• The orchestration process defines the specific configuration for each node's hardware and
software. The provided default templates cover a majority of common use cases and designs.

• OpenStack includes a testing component which has hundreds of tests to verify every
component in an overcloud. Tests and configuration are completely customizable, and include
short, validation smoke tests and longer running, more comprehensive full tests.

CHAPTER 2

MANAGING INTERNAL
OPENSTACK COMMUNICATION

Overview

Goal          Administer the Keystone identity service and the AMQP messaging service.

Objectives    • Describe the user and service authentication architecture.
              • Administer the service catalog.
              • Manage messages with the message broker.

Sections      • Describing the Identity Service Architecture (and Quiz)
              • Administering the Service Catalog (and Guided Exercise)
              • Managing Message Brokering (and Guided Exercise)

Lab           • Managing Internal OpenStack Communication

Describing the Identity Service Architecture

Objectives
After completing this section, students should be able to:

• Describe the Identity Service architecture

• Compare and contrast the available token providers

• Describe differences between Identity Service versions

Identity Service Architecture


The OpenStack Identity Service (code named Keystone) provides authentication, role-based
authorization, policy management, and token handling using internal service functions
categorized as identity, policy, token, resource, role assignment, and catalog. The Identity Service
API is available at configurable endpoints segregated by public and internal traffic. The API can
be provided redundantly by multiple Controller nodes using Pacemaker with a virtual IP (VIP)
address. The internal service functions manage different aspects of the Identity Service:

Identity
Identity encompasses authentication and authorization functions. Users are a digital
representation of a person, system, or service using other OpenStack services. Users are
authenticated before requesting services from OpenStack components. Users must be assigned a
role to participate in a project. Users may be managed using groups, introduced in Identity Service
v3, which can be assigned roles and attached to projects the same as individual users.

Projects (also referred to by the deprecated term tenant) are collections of owned
resources such as networks, images, servers, and security groups. These are structured
according to the development needs of an organization. A project can represent a customer,
account, or any organizational unit. With Identity Service v3, projects can contain sub-projects,
which inherit project role assignments and quotas from higher projects.

Resource
Resource functions manage domains, which are an Identity Service v3 entity for creating
segregated collections of users, groups and projects. Domains allow multiple organizations to
share a single OpenStack installation. Users, projects, and resources created in one domain
cannot be transferred to another domain; by design, they must be recreated. OpenStack creates
a single domain named default for a new installation. In Identity Service v2, multiple domains
are not recognized and all activities use the default domain.

Token
Token functions create, manage and validate time-limited tokens which users pass to other
OpenStack components to request service. A token is a structured enumeration of user access
rights designed to simplify the requirement that each individual OpenStack service request be
verified for sufficient user privilege. Token protocols have evolved since the early OpenStack
days, and are discussed further in this chapter.

Policy
Policy functions provide a rule-based authorization engine and an associated rule management
interface. Policy rules define the capabilities of roles. Default roles include admin, _member_,
swiftoperator, and heat_stack_user. Custom roles may be created by building policies.


Role Assignment
Role assignment functions are used to assign users to projects. Users do not belong to projects;
instead, they have a role in a project. Users may be assigned multiple roles for the same project,
and may also be assigned different roles in multiple projects.

Roles define a set of user privileges to perform specific operations on OpenStack services,
defined by policy definitions. The most commonly recognized roles are _member_, which can
perform all normal activities within a project, and admin, which adds additional permissions to
create users, projects, and other restricted resource objects.
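For example, project membership is typically granted by assigning a role with the OpenStack CLI.
The project and user names below are illustrative only; they are not objects that exist in this
classroom environment:

[user@demo ~(admin)]$ openstack role add --project demo --user developer1 _member_
[user@demo ~(admin)]$ openstack role assignment list --project demo --user developer1 --names
...output omitted...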

Catalog
Catalog functions store connection information about every other OpenStack service
component, in the form of endpoints. The catalog contains multiple endpoint entries for each
service, to allow service traffic to be segregated by public, internal, and administration tasks
for traffic management and security reasons. Since OpenStack services may be redundantly
installed on multiple controller and compute nodes, the catalog contains endpoints for each.
When users authenticate and obtain a token to use when accessing services, they are, at the
same time, being given the current URL of the requested service.

Note
Red Hat OpenStack Platform supports both Identity Service v2 and v3.
Identity v3 requires the use of the new authentication environment variables
OS_IDENTITY_API_VERSION and OS_DOMAIN_NAME, and a change to the
OS_AUTH_URL for the new version's endpoint. This OpenStack System Administration II
course only uses Identity Service v2.

Each listed Identity Service function supports multiple choices of back ends, defined through
plug-ins, which can be one of the following types (not all functions support all back-end types):

• Key Value Store: A file-based or in-memory dictionary using primary key lookups.

• Memcached: A distributed-memory shared caching structure.

• Structured Query Language: OpenStack uses SQLAlchemy as the default persistent data
store for most components. SQLAlchemy is a Python-based SQL toolkit.

• Pluggable Authentication Module: Using the Linux PAM authentication service.

• Lightweight Directory Access Protocol: Uses the LDAP protocol to connect to an
existing back-end directory, such as IdM or AD, for user authentication and role information.
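As an illustration, the back end for a function is selected by a driver option in the
corresponding section of /etc/keystone/keystone.conf. The snippet below is only a sketch of the
format; it is not necessarily the configuration used in this classroom:

[identity]
# "sql" is the default; "ldap" delegates users and groups to an external directory
driver = sql

[catalog]
# "sql" is the default; "templated" serves the catalog from default_catalog.templates
driver = sql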

Configuration files are located in the /etc/keystone directory:

Configuration and Log Files in /etc/keystone

File name                   Description
keystone.conf               The primary configuration file defines drivers, credentials,
                            token protocols, filters and policies, and security attributes.
keystone-paste.ini          Specified by the config_file parameter in the primary
                            configuration file, this file provides PasteDeploy configuration
                            entries. PasteDeploy is a method for configuring a WSGI pipeline
                            and server, specified from an INI-style file rather than being
                            hard-coded into program code. This configuration file defines
                            the WSGI server, the applications used, and the middleware
                            pipelines and filters that process requests.
logging.conf                Specifies the logging configuration for the Identity Service.
policy.json                 Specifies role-based access policies determining which user can
                            access which objects and how they can be accessed.
default_catalog.templates   The relative API endpoints for all OpenStack services are
                            defined in this template file, which is referenced in the
                            primary configuration file.

Authentication Tokens
The Identity Service confirms a user's identity through an authentication process specified
through plug-in configuration, then provides the user with a token that represents the user's
identity. A typical user token is scoped, meaning that it lists the resources and access for which
it may be used. Tokens have a limited time frame, allowing the user to perform service requests
without further authentication until the token expires or is revoked. A scoped token lists the user
rights and privileges, as defined in roles relevant to the current project. A requested OpenStack
service checks the provided roles and requested resource access, then either allows or denies the
request.

Any user may use the openstack token issue command to request a current scoped
token with output showing the user id, the (scope) project, and the new token expiration. This
token type is actually one of three types of authorization scope: unscoped, project-scoped, and
domain-scoped. Because domains are a new feature supported in the Identity Service v3, earlier
documentation may refer only to scoped and unscoped tokens, in which scope is project-based.
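For example, the sample output below shows a project-scoped token requested by an authenticated
user; the id value is the token itself, and the values shown are illustrative:

[user@demo ~(admin)]$ openstack token issue
+------------+----------------------------------+
| Field | Value |
+------------+----------------------------------+
| expires | 2017-05-26 09:21:38+00:00 |
| id | 1cdacca5070b44ada325f861007461c1 |
| project_id | fd0ce487ea074bc0ace047accb3163da |
| user_id | 15ceac73d7bb4437a34ee26670571612 |
+------------+----------------------------------+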

Token Scope      Description

Unscoped         Unscoped tokens are authentication-only tokens that do not contain a
                 project, role, and service information payload. For example, an unscoped
                 token may be used when authentication is provided by an Identity Provider
                 other than the Identity Service, such as an LDAP, RADIUS, or AD server.
                 The token is used to authenticate with the Identity Service, which then
                 exchanges the unscoped token for the authenticated user's appropriate
                 scoped token. An unscoped token may also be referred to as an Identity
                 Service default token, which is not associated with a project or domain
                 and may be exchanged for a scoped token.

Project-scoped   Project-scoped tokens provide authorization to perform operations on
                 a service endpoint utilizing the resources of a single project, allowing
                 activities specified by the user's role in that project. These tokens
                 contain the relevant service catalog, roles, and project information as
                 payload and are considered to be associated with a specific project.

Domain-scoped    Domain-scoped tokens apply to services that occur at the domain level,
                 not at the project or user level. This token's payload contains the
                 domain's service catalog, and is limited to services that do not require
                 per-project endpoints. The token payload also contains project and role
                 information for the user, within the specified domain.

Token Providers
There are four types of token providers: UUID, PKI, PKIZ, and the newest provider, Fernet
(pronounced fehr'nεt). All tokens consist of a payload, in JSON or randomly generated
UUID format, contained in a transport format, such as a URL-friendly hexadecimal or
cryptographic message syntax (CMS) packaging. The default OpenStack recommended token
provider has changed a few times, as the OpenStack developers have addressed token size,
security, and performance issues.

UUID Tokens
UUID tokens were the original and default token provider up until the Folsom release. They are
32 byte randomly generated UUIDs, which must be persistently stored in the Identity Service's
configured back end to permit the Identity Service to validate the UUID each time a user makes a
service request to any service endpoint. Although UUIDs are lightweight and easy to validate with
a simple lookup, they have two disadvantages.

First, because UUID tokens must be retained by the Identity Service back end for repetitive
lookups, the storage space used grows as new tokens are generated. Until recently, expired
tokens were not regularly purged from the back-end store, leading to service performance
degradation.

Second, every individual service API call must bundle the request and token together to send to
the service component, where the service unpacks the UUID and sends a validation request to
the Identity Service. The Identity Service looks up the token's identity to determine the roles and
authorizations of the user, sending the information back to the resource service to determine if
the service component will process the user request. This generates a tremendous amount of
network traffic and activity to and from the Identity Service, which creates a scaling limitation.

PKI and PKIZ Tokens


Public Key Infrastructure (PKI) tokens were introduced in the Grizzly release as a solution that
would decrease the scale-limiting overhead on the Identity Service back-end and increase the
security of tokens by using certificates and keys to sign and validate tokens. PKI uses a JSON
payload, asymmetric keys, and the cryptographic message syntax (CMS) transport format. PKIZ
tokens apply zlib compression after the JSON payload in an attempt to shrink the total token
size, which typically exceeds 1600 bytes. The payload contains the service catalog with a size
generally proportional to the number of service entries in the catalog.

The advantage of PKI tokens, because of the public key methodology, is the ability of the
requested resource service component to verify and read the payload authorizations without
needing to send the token back to the Identity Service for every request. To process request
tokens, the requested service is only required to obtain the Identity Service's signing certificate,
the current revocation list, and the CA public certificate that validates the signing certificate.
Validated and unencoded tokens and payloads can be stored and shared using memcache,
eliminating some repetitive token processing overhead.

The disadvantage of the PKI token provider method is unacceptable performance due to
oversized shared caches, increased load on the identity service back end, and other problems
associated with handling tokens with large payloads. PKI tokens take longer to create and to
validate than UUID tokens. Subsequently, UUID tokens again became the recommended token
provider. PKI/PKIZ token support was deprecated in the Mitaka release and was removed in the
Ocata release.

Fernet Tokens
Fernet tokens are an implementation of a symmetric key cryptographic authentication method,
which uses the same key to both encrypt and decrypt, designed specifically to process service
API request tokens. Fernet supports using multiple keys, always using the first key (the current
key) in the list to perform encryption and attempting other keys in the list (former keys and
about-to-become-current staged keys) to perform decryption. This technique allows Fernet keys
to be rotated regularly for increased security, while still allowing tokens created with previous
keys to be decrypted.

Fernet tokens do not exceed 250 bytes and are not persisted in the Identity Service back end.
Fernet token payloads use the MessagePack binary serialization format to efficiently carry the
authentication and authorization metadata, which is then encrypted and signed. Fernet tokens
do not require persistence nor do they require maintenance, as they are created and validated
instantaneously on any Identity Service node that can access the Fernet symmetric keys. The
symmetric keys are stored and shared on all Identity Service nodes in a key repository located by
default at /etc/keystone/fernet-keys/. The Fernet token provider was introduced in the
Kilo release and is the default token provider in the Ocata release. In earlier OpenStack developer
documentation, these tokens were referred to as authenticated encryption (AE) tokens.
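The token provider is selected in /etc/keystone/keystone.conf. The snippet below is a minimal
sketch of a Fernet configuration, shown for illustration only; it is not necessarily the provider
configured in this classroom:

[token]
provider = fernet

[fernet_tokens]
key_repository = /etc/keystone/fernet-keys/
max_active_keys = 3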

Warning
All of these token providers (UUID, PKI, PKIZ, and Fernet) are known as bearer tokens,
which means that anyone holding the token can impersonate the user represented in
that token without having to provide any authentication credentials. Bearer tokens
must be protected from unnecessary disclosure to prevent unauthorized access.

Identity Service Administration


Token providers typically require minimal management tasks after they have been properly
installed and configured. UUID tokens require flushing expired tokens regularly. Fernet tokens
require rotating keys regularly for security, and distributing the key repository among all Identity
Service nodes in a multi-node HA deployment. PKI token providers require maintenance of
certificates, expirations, and revocations, plus management of the persistent store. Since PKI
tokens are deprecated, this section only discusses UUID and Fernet token tasks.

Flushing Expired UUID Tokens


By default, the Identity Service's expired tokens remain stored in its database, increasing the
database size and degrading service performance. Red Hat recommends changing the daily
token_flush cron job to run hourly to find and flush expired tokens. In /var/spool/cron/
keystone, modify the task to be hourly (instead of the default daily) and redirect output to a log
file:

PATH=/bin:/usr/bin:/usr/sbin SHELL=/bin/sh
@hourly keystone-manage token_flush &> /var/log/keystone/keystone-tokenflush.log

If necessary, the tokens flushed in the last hour can be viewed in the log file /var/log/
keystone/keystone-tokenflush.log. The log file does not grow in size, since the cron
job overwrites the log file each hour. When the cron job is first modified, the token database will
be larger than it will need to be in the future, since it will now be flushed hourly. However, the
database will not automatically reclaim unused space and should be truncated to relinquish all
currently used disk space:

[user@demo ~]# echo "TRUNCATE TABLE token" | sudo mysql -D keystone

Rotating Fernet Tokens


Fernet tokens do not require persistence, but the Fernet symmetric keys must be shared by
all Identity Service nodes that may be asked to validate a Fernet token. Since tokens should
be replaced on a regular basis to minimize the ability to create impersonated Fernet tokens,
the Fernet token provider uses a rotation method to put new symmetric keys into use without
breaking the ability to decrypt Fernet tokens created with a previous key.

To understand key rotation, the terminology of Fernet key usage is descriptive:

• Primary Key: the primary key is considered to be the current key. There can only be one
primary key on a single Identity Service node, recognized because its file name always has the
highest index number. Primary keys are used to both encrypt and decrypt Fernet tokens.

• Secondary Key: a secondary key is the key that was formerly a primary key and has been
replaced (rotated out). It is only used to decrypt Fernet tokens; specifically, to decrypt any
remaining Fernet tokens that it had originally encrypted. A secondary key's file is named with
an index that is lower than the highest, but never has the index of 0.

• Staged Key: a staged key is a newly added key that will be the next primary key when the keys
are next rotated. Similar to a secondary key, it is only used to decrypt tokens, which seems
unnecessary since it has not yet been a primary key and has never encrypted tokens on this
Identity Service node. However, in a multi-node Identity Service configuration, after the key
repository has been updated with a new staged key and distributed to all Identity Service
nodes, those nodes will perform key rotation one at a time. A staged key on one node may be
needed to decrypt tokens created by another node where that key has already become the
primary key. The staged key is always recognized by having a file name with the index of 0.
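A minimal sketch of creating and rotating the key repository with keystone-manage follows; the
flags assume the default repository location and the keystone service account. After an initial
setup and one rotation, the repository contains a staged key (0), one secondary key, and the
primary key (the highest index):

[user@demo ~]$ sudo keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
[user@demo ~]$ sudo keystone-manage fernet_rotate --keystone-user keystone --keystone-group keystone
[user@demo ~]$ sudo ls /etc/keystone/fernet-keys/
0  1  2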

Service Account Deprecation


OpenStack users learn about the default service accounts that are created by a typical
installation and are role-assigned to the service project. These service users have names
for each of the OpenStack service components, such as keystone, nova, glance, neutron,
swift, and cinder. The primary purpose of these accounts is to be the service side of two-
way PKI certificate authentication protocols. As the OpenStack Identity Service developers move
towards future tokenless authentication methods in the Pike, Queen, and later releases, and the
removal of the PKI token provider in the Ocata release, these service accounts will no longer be
necessary and will also be removed in a future release.

Endpoint Deprecation for adminURL


Since the beginning of the Identity Service, there have always been three types of endpoints:
publicURL, internalURL, and adminURL. Originally, the endpoints were designed to
segregate traffic onto public or private networks for security reasons. The admin_url endpoint
was implemented only in the Identity Service, where a small set of additional API functions
allowed an admin to bootstrap the Identity Service. Other services did not implement admin-
only API distinctions. In later OpenStack releases, having a separate adminURL endpoint became
unnecessary because users could be checked for their role privileges no matter which endpoint
they used, and allowed access to admin-only privileges accordingly.

When the Identity Service v2 API becomes deprecated in some future release, the last remaining
adminURL distinction, that of the end user and admin CRUD PasteDeploy pipeline routines, will
no longer be necessary and the adminURL endpoint will also be deprecated and removed.

References
Keystone tokens
https://docs.openstack.org/keystone/latest/admin/identity-tokens.html

Quiz: Describing the Identity Service Architecture

Choose the correct answer(s) to the following questions:

1. Which service in the Keystone architecture is responsible for domains?

a. Policy
b. Resource
c. Catalog
d. Token
e. User

2. Which service in the Keystone architecture provides a rule-based authorization engine?

a. Policy
b. Resource
c. Catalog
d. Token
e. User

3. Which type of token authorization describes tokens that are not attached to a project?

a. Scoped Token
b. Domain Token
c. Unscoped Token
d. PKI Token

4. Which Keystone configuration file contains role-based access policy entries that determine
which user can access which objects and how they can be accessed?

a. policy.json
b. default_catalog.templates
c. keystone-paste.ini
d. keystone-env.conf

5. Which two token providers use cryptographic message syntax (CMS)? (Choose two.)

a. Fernet
b. PKI
c. PKIZ
d. Scoped token
e. UUID

Solution
Choose the correct answer(s) to the following questions:

1. Which service in the Keystone architecture is responsible for domains?

a. Policy
b. Resource (correct)
c. Catalog
d. Token
e. User

2. Which service in the Keystone architecture provides a rule-based authorization engine?

a. Policy (correct)
b. Resource
c. Catalog
d. Token
e. User

3. Which type of token authorization describes tokens that are not attached to a project?

a. Scoped Token
b. Domain Token
c. Unscoped Token (correct)
d. PKI Token

4. Which Keystone configuration file contains role-based access policy entries that determine
which user can access which objects and how they can be accessed?

a. policy.json (correct)
b. default_catalog.templates
c. keystone-paste.ini
d. keystone-env.conf

5. Which two token providers use cryptographic message syntax (CMS)? (Choose two.)

a. Fernet
b. PKI (correct)
c. PKIZ (correct)
d. Scoped token
e. UUID

Administering the Service Catalog

Objective
After completing this section, students should be able to administer the service catalog.

Keystone Service Catalog


The service catalog is a crucial element of the Keystone architecture. The service catalog
provides a list of endpoint URLs that can be dynamically discovered by API clients. The service
endpoints that can be accessed by a token are provided by the service catalog. Without a service
catalog, API clients would be unaware of which URL an API request should use. The openstack
catalog show command displays catalog information for a service. The service name is
passed as an argument to the command. For example, to view the service catalog data for nova
compute, use the following command:

[user@demo ~(admin)]$ openstack catalog show nova
+-----------+----------------------------------------------------+
| Field | Value |
+-----------+----------------------------------------------------+
| endpoints | regionOne |
| | publicURL: http://172.25.250.50:8774/v2.1 |
| | internalURL: http://172.24.1.50:8774/v2.1 |
| | adminURL: http://172.24.1.50:8774/v2.1 |
| | |
| name | nova |
| type | compute |
+-----------+----------------------------------------------------+

In this output, regionOne is the region for the URLs; the endpoints field lists the public,
internal, and admin URLs by which an API client request can access the Nova compute service;
name is the user-facing service name; and type is the OpenStack registered service type, such
as image-service and object-store.

Endpoints
An endpoint is a URL that an API client uses to access a service in OpenStack. Every service
has one or more endpoints. There are three types of endpoint URLs: adminURL, publicURL,
and internalURL. The adminURL should only be consumed by those who require administrative
access to a service endpoint. The internalURL is used by services to communicate with each
other on a network that is unmetered or free of bandwidth charges. The publicURL is intended
to be consumed by end users from a public network.

To list the services and their endpoints, use the openstack catalog list command as the
OpenStack admin user.

[user@demo ~(admin)]$ openstack catalog list
+---------+---------+------------------------------------------------+
| Name | Type | Endpoints |
+---------+---------+------------------------------------------------+
| nova | compute | regionOne |
| | | publicURL: https://172.25.249.201:13774/v2.1 |
| | | internalURL: http://172.25.249.200:8774/v2.1 |
| | | adminURL: http://172.25.249.200:8774/v2.1 |
| | | |
| neutron | network | regionOne |
| | | publicURL: https://172.25.249.201:13696 |
| | | internalURL: http://172.25.249.200:9696 |
| | | adminURL: http://172.25.249.200:9696 |
...output omitted...

To list the ID, region, service name, and service type of all the endpoints, use the openstack
endpoint list command.

[user@demo ~(admin)]$ openstack endpoint list
+----------------------------------+-----------+--------------+----------------+
| ID | Region | Service Name | Service Type |
+----------------------------------+-----------+--------------+----------------+
| d1812da138514794b27d266a22f66b15 | regionOne | aodh | alarming |
| b1484c933ba74028965a51d4d0aa9f04 | regionOne | nova | compute |
| 4c6117b491c243aabbf40d7dfdf5ce9a | regionOne | heat-cfn | cloudformation |
| eeaa5964c26042e38c632d1a12e001f3 | regionOne | heat | orchestration |
| 1aeed510fa9a433795a4ab5db80e19ec | regionOne | glance | image |
...output omitted...

Troubleshooting
A proper catalog and endpoint configuration is essential for the OpenStack environment to
function effectively. Common issues that require troubleshooting are misconfigured endpoints
and user authentication failures. There is a known issue, documented in BZ-1404324, where the
scheduled token flushing job is not effective enough for large deployments; the fix is reviewed
in the following guided exercise. When issues do arise, there are steps that can be taken to
investigate and resolve them. The following is a list of troubleshooting steps:

• Ensure the authentication credentials and token are appropriate using the curl command to
retrieve the service catalog.

[user@demo ~(admin)]$ curl -s -X POST http://172.25.250.50:35357/v2.0/tokens \
-d '{"auth": {"passwordCredentials": {"username":"admin", \
"password":"Y7Q72DfAjKjUgA2G87yHEJ2Bz"}, "tenantName":"admin"}}' \
-H "Content-type: application/json" | jq .
{
"access": {
"metadata": {
"roles": [
"f79b6d8bfada4ab89a7d84ce4a0747ff"
],
"is_admin": 0
},
"user": {
"name": "admin",
"roles": [
{
"name": "admin"
}
],
"id": "15ceac73d7bb4437a34ee26670571612",

"roles_links": [],
"username": "admin"
},
"serviceCatalog": [
{
"name": "nova",
"type": "compute",
"endpoints_links": [],
"endpoints": [
...output omitted...

• Inspect the /var/log/keystone/keystone.log for [Errno 111] Connection refused errors.
This indicates there is an issue connecting to a service endpoint.

2017-06-04 14:07:49.332 2855 ERROR oslo.messaging._drivers.impl_rabbit
[req-1b8d5196-d787-49db-be60-025ce0ab575d - - - - -] [73809126-9833-487a-
a69a-4a7d9dffd08c]
AMQP server on 172.25.249.200:5672 is unreachable: [Errno 111] Connection refused.
Trying again in 1 seconds. Client port: None

• Every service has an API log that should be inspected when troubleshooting endpoints. For
example, if an operator cannot retrieve Glance image data, an inspection of /var/log/
glance/api.log may provide useful information. Query the file for DiscoveryFailure.

DiscoveryFailure: Could not determine a suitable URL for the plugin
2017-05-30 04:31:17.650 277258 INFO eventlet.wsgi.server [-] 172.24.3.1 - - [30/
May/2017 04:31:17] "GET /v2/images HTTP/1.1" 500 139 0.003257

• Include the --debug option to the openstack catalog show command (or to any
openstack command) to view the HTTP request from the client and the responses from
the endpoints. For example, the following lists the HTTP request from nova compute and the
response from the endpoint.

[user@demo ~(admin)]$ openstack catalog show nova --debug
...output omitted...
Get auth_ref
REQ: curl -g -i -X GET http://172.25.250.50:5000/v2.0 -H "Accept: application/json" -H
"User-Agent:
osc-lib keystoneauth1/2.12.2 python-requests/2.10.0 CPython/2.7.5"
Starting new HTTP connection (1): 172.25.250.50
"GET /v2.0 HTTP/1.1" 200 230
RESP: [200] Date: Mon, 05 Jun 2017 08:11:19 GMT Server: Apache Vary: X-Auth-
Token,Accept-Encoding
x-openstack-request-id: req-64ed1753-5c56-4f61-b62a-46bf3097c912 Content-Encoding:
gzip Content-Length: 230 Content-Type: application/json
Making authentication request to http://172.25.250.50:5000/v2.0/tokens
"POST /v2.0/tokens HTTP/1.1" 200 1097

Administering the Service Catalog


The following steps outline the process for displaying the service catalog and service endpoints.

1. Use the command openstack token issue to retrieve a scoped token.

2. Verify the token by using the curl command with the token to list projects.

3. Display the service catalog using the openstack catalog list command.

4. Display endpoints and the ID for a particular service using the openstack catalog show
command, for instance, passing the service name nova as an argument.
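The sketch below combines these steps using commands shown elsewhere in this section; the token
ID is illustrative and will differ in your environment:

[user@demo ~(admin)]$ openstack token issue -c id -f value
1cdacca5070b44ada325f861007461c1
[user@demo ~(admin)]$ curl -H "X-Auth-Token: 1cdacca5070b44ada325f861007461c1" \
http://172.25.250.50:5000/v2.0/tenants
...output omitted...
[user@demo ~(admin)]$ openstack catalog list
...output omitted...
[user@demo ~(admin)]$ openstack catalog show nova
...output omitted...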

References
Identity Concepts
https://docs.openstack.org/keystone/latest/admin/identity-concepts.html

API endpoint configuration


https://docs.openstack.org/security-guide/api-endpoints/api-endpoint-configuration-
recommendations.html

Guided Exercise: Administering the Service Catalog

In this exercise, you will view the Keystone endpoints and catalog, issue a token, and manage
token expiration.

Outcomes
You should be able to:

• View the Keystone service catalog.

• View the Keystone service endpoints.

• Issue a Keystone token.

• Clear expired tokens from the database.

Before you begin


Log in to workstation as student using student as the password.

On workstation, run the lab communication-svc-catalog setup command. This script
will ensure the OpenStack services are running and the environment is properly configured for
this guided exercise.

[student@workstation ~]$ lab communication-svc-catalog setup

Steps
1. On workstation, source the Keystone admin-rc file and list the Keystone endpoints
registry. Take note of the available service names and types.

[student@workstation ~]$ source admin-rc
[student@workstation ~(admin-admin)]$ openstack endpoint list
+----------------------------------+-----------+--------------+----------------+
| ID | Region | Service Name | Service Type |
+----------------------------------+-----------+--------------+----------------+
| d1812da138514794b27d266a22f66b15 | regionOne | aodh | alarming |
| b1484c933ba74028965a51d4d0aa9f04 | regionOne | nova | compute |
| 4c6117b491c243aabbf40d7dfdf5ce9a | regionOne | heat-cfn | cloudformation |
| eeaa5964c26042e38c632d1a12e001f3 | regionOne | heat | orchestration |
| 1aeed510fa9a433795a4ab5db80e19ec | regionOne | glance | image |
| 77010d1ff8684b3292aad55e30a3db29 | regionOne | gnocchi | metric |
| 1d023037af8e4feea5e23ff57ad0cb77 | regionOne | keystone | identity |
| 30b535478d024416986a8e3cc52a7971 | regionOne | cinderv2 | volumev2 |
| 23fef1b434664188970e2e6b011eb3fa | regionOne | ceilometer | metering |
| 4cf973e0d1f34f2497f8c521b6128ca7 | regionOne | swift | object-store |
| 853e51122b5e490ab0b85289ad879371 | regionOne | cinderv3 | volumev3 |
| 7f2a3a364a7a4a608f8581aed3b7b9e0 | regionOne | neutron | network |
| ca01cf7bee8542b7bd5c068f873bcd51 | regionOne | cinder | volume |
+----------------------------------+-----------+--------------+----------------+

2. View the Keystone service catalog and notice the endpoint URLs (especially the IP
addresses), the version number, and the port number.

[student@workstation ~(admin-admin)]$ openstack catalog list -f value
...output omitted...
keystone identity regionOne
publicURL: http://172.25.250.50:5000/v2.0
internalURL: http://172.24.1.50:5000/v2.0
adminURL: http://172.25.249.50:35357/v2.0

3. Issue an admin token to manually (using curl) find information about OpenStack.

[student@workstation ~(admin-admin)]$ openstack token issue
+------------+----------------------------------+
| Field | Value |
+------------+----------------------------------+
| expires | 2017-05-26 09:21:38+00:00 |
| id | 1cdacca5070b44ada325f861007461c1 |
| project_id | fd0ce487ea074bc0ace047accb3163da |
| user_id | 15ceac73d7bb4437a34ee26670571612 |
+------------+----------------------------------+

4. Verify the token retrieved in the previous command. Use the curl command with the token
ID to retrieve the projects (tenants) for the admin user.

[student@workstation ~(admin-admin)]$ curl -H "X-Auth-Token:\
1cdacca5070b44ada325f861007461c1" http://172.25.250.50:5000/v2.0/tenants
{"tenants_links": [], "tenants": [{"description": "admin tenant", "enabled": true,
"id": "0b73c3d8b10e430faeb972fec5afa5e6", "name": "admin"}]}

5. Use SSH to connect to director as the user root. The database, MariaDB, resides on
director and provides storage for expired tokens. Accessing MariaDB enables you to
determine the amount of space used for expired tokens.

[student@workstation ~(admin-admin)]$ ssh root@director

6. Log in to MariaDB.

[root@director ~]# mysql -u root
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 981
Server version: 5.5.52-MariaDB MariaDB Server
...output omitted...

7. Use an SQL statement to list the tables and pay special attention to the size of the token
table.

MariaDB [(none)]> use keystone
MariaDB [keystone]> SELECT table_name, (data_length+index_length) tablesize \
FROM information_schema.tables;
+----------------------------------------------+-----------+
| table_name | tablesize |
+----------------------------------------------+-----------+
...output omitted...
token | 4308992 |

8. Use an SQL statement to view the amount of space used for expired Keystone tokens.

MariaDB [keystone]> SELECT COUNT(*) FROM token WHERE token.expires < \
CONVERT_TZ(NOW(), @@session.time_zone, '+00:00');
+----------+
| COUNT(*) |
+----------+
| 149 |
+----------+
1 row in set (0.00 sec)

9. Truncate the token table then ensure the amount of space used for expired tokens is zero.

MariaDB [keystone]> TRUNCATE TABLE token;
Query OK, 0 rows affected (0.04 sec)
MariaDB [keystone]> SELECT COUNT(*) FROM token WHERE token.expires < \
CONVERT_TZ(NOW(), @@session.time_zone, '+00:00');
+----------+
| COUNT(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)

10. Log out of MariaDB.

MariaDB [keystone]> exit
Bye
[root@director ~]#

11. Ensure that the Keystone user has a cron job to flush tokens from the database.

[root@director ~]# crontab -u keystone -l
...output omitted...
PATH=/bin:/usr/bin:/usr/sbin SHELL=/bin/sh
1 0 * * * keystone-manage token_flush >>/dev/null 2>1

12. Modify the cron job to run keystone-manage token_flush hourly.

[root@director ~]# crontab -u keystone -e
...output omitted...
PATH=/bin:/usr/bin:/usr/sbin SHELL=/bin/sh
@hourly keystone-manage token_flush >>/dev/null 2>1

13. Log out of director.

[root@director ~]# exit
[student@workstation ~(admin-admin)]$


Cleanup
From workstation, run the lab communication-svc-catalog cleanup script to clean up
the resources created in this exercise.

[student@workstation ~(admin-admin)]$ lab communication-svc-catalog cleanup


Managing Message Brokering

Objective
After completing this section, students should be able to manage messages and the message
broker.

RabbitMQ Overview
OpenStack software provides a collection of services covering all the functionality associated
with a private cloud solution. Those services are composed internally of different components,
allowing a flexible and scalable configuration. OpenStack services rely on two back-end
services: a database for persistence and a message broker for communication among the
components of each service. Any message broker solution that supports AMQP can be used
as the back end. Red Hat includes RabbitMQ as the message broker in its OpenStack
architecture because it provides enterprise-level features useful for setting up advanced
configurations.

The following table provides some common RabbitMQ terms and definitions.

Term Description
Exchange retrieves published messages from the producer and distributes them
to queues
Publisher/Producer applications that publish the message
Consumer applications that process the message
Queues stores the message
Routing Key used by the exchange to determine how to route the message
Binding the link between a queue and an exchange

A message broker allows messages to be sent and received between producer and consumer
applications. Internally, RabbitMQ implements this communication using exchanges, queues,
and the bindings between them. When an application produces a message intended for one or
more consumer applications, it places that message on an exchange to which one or more
queues are bound. Consumers subscribe to those queues to receive the message from the
producer. Routing is based on the routing key included in the transmitted message.

Exchange Overview
The exchange's interaction with a queue is based on the match between the routing key included
in the message and the binding key associated with the queue on that exchange. Depending on
how these two elements are used, RabbitMQ provides several types of exchanges.

• Direct

Consumers are subscribed to a queue with an associated binding key, and the producer sets
the routing key of the message to be the same as that of the binding key of the queue to which
the desired consumer is subscribed.

• Topic


Consumers are subscribed to a queue that has a binding key including wildcards, so producers
can send messages with different but related routing keys to that queue.

• Fanout

The message is broadcast to all the subscribed queues without regard for whether the routing
and binding keys match.

• Headers

This makes use of the header properties of the message to perform the match against the
binding arguments of the queue.
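
The following commands sketch how a direct exchange routes a message. The demo.direct exchange, demo.queue queue, demo.key routing key, and the demo user credentials are examples only, and they assume the RabbitMQ management plugin and the rabbitmqadmin utility are available.

[user@demo ~]$ rabbitmqadmin -u demo -p redhat declare exchange name=demo.direct type=direct
[user@demo ~]$ rabbitmqadmin -u demo -p redhat declare queue name=demo.queue
[user@demo ~]$ rabbitmqadmin -u demo -p redhat declare binding source=demo.direct \
destination=demo.queue routing_key=demo.key
[user@demo ~]$ rabbitmqadmin -u demo -p redhat publish exchange=demo.direct \
routing_key=demo.key payload="routed by matching key"
[user@demo ~]$ rabbitmqadmin -u demo -p redhat get queue=demo.queue

Because the routing key of the published message matches the binding key of demo.queue, the message is delivered to that queue. With a fanout exchange, the same message would reach every bound queue regardless of the key.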

Configuration Files and Logs


The following table provides a description of RabbitMQ configuration files.

File name Description


/etc/rabbitmq/enabled_plugins contains a list of the enabled plugins
/etc/rabbitmq/rabbitmq-env.conf overrides the defaults built in to the RabbitMQ startup
scripts
/etc/rabbitmq/rabbitmq.config provides the standard Erlang configuration file that allows
the RabbitMQ core application, Erlang services, and
RabbitMQ plugins to be configured
/var/log/rabbitmq/ contains logs of runtime events
rabbit@overcloud-controller-0.log
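
As a quick health check on a controller node, commands along the following lines can be used to confirm which plugins are enabled, report the broker status, and inspect recent log entries; the exact log file name depends on the node's host name.

[user@demo ~]$ sudo rabbitmq-plugins list -e
[user@demo ~]$ sudo rabbitmqctl status
[user@demo ~]$ sudo tail /var/log/rabbitmq/rabbit@overcloud-controller-0.log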

Troubleshooting
OpenStack services follow a component architecture. The functionalities of a service are split
into different components, and each component communicates with other components using the
message broker. In order to troubleshoot a problem with an OpenStack service, it is important
to understand the workflow a request follows as it moves through the different components
of the service. Generally, the OpenStack service architecture provides a unique component to
make each service’s API available. The Cinder block storage service, for example, is managed by
the cinder-api service. The API component is the entry point to the rest of the component
architecture of its service. When trying to isolate a problem with a service, check its API provider
first.

After the API component has been verified, and if no errors appear in the log files, confirm
that the remaining components can communicate without issue. Any error related to the
RabbitMQ message broker, or its configuration in the related service configuration file, should
appear in the log files of the service. For the Cinder block storage service, after the cinder-
api has processed the request through the Cinder API, the request is processed by both
the cinder-volume and cinder-scheduler processes. These components take care of
communicating among themselves using the RabbitMQ message broker to create the volume on
the most feasible storage back end location. Cinder block storage service components (cinder-
scheduler, for example) do not function correctly with a broken RabbitMQ back end that
crashes unexpectedly. Debug the issue by checking the component-related logs, such as /var/
log/cinder/scheduler.log. Then check for problems with the component as a client for the
RabbitMQ message broker. When a component crashes from RabbitMQ-related issues, it is usually
due to a misconfiguration of either authorization or encryption. These errors are in a related
configuration file, such as /etc/cinder/cinder.conf for Cinder components. Sometimes,
however, a crash occurs for reasons other than RabbitMQ, such as unavailable Cinder block
storage services.
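
As a sketch of this workflow for the Cinder block storage service, assuming the default log and configuration locations, you might review the API and scheduler logs and then the RabbitMQ-related settings in the service configuration file (exact parameter names vary by release):

[user@demo ~]$ sudo tail /var/log/cinder/api.log
[user@demo ~]$ sudo tail /var/log/cinder/scheduler.log
[user@demo ~]$ sudo grep -E 'rabbit|transport_url' /etc/cinder/cinder.conf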

RabbitMQ Utilities
RabbitMQ provides a suite of utilities to check the RabbitMQ daemon status and to execute
administrative operations. These tools are used to check the different configurable elements
on a RabbitMQ instance, including the queues used by the producers and consumers to share
messages, the exchanges to which those queues are connected to, and the bindings among the
components. The following table describes the RabbitMQ utility commands.

Utility Description
rabbitmqctl command line tool for managing a RabbitMQ broker
rabbitmqadmin provided by the management plugin, used to perform the same actions as
the web-based UI, and can be used with scripting

Commands for rabbitmqctl


The following is a list of typical commands that are used with the rabbitmqctl command.

• Use the report command to show a summary of the current status of the RabbitMQ daemon,
including the number and types of exchanges and queues.

[user@demo ~]$ rabbitmqctl report


{listeners,[{clustering,25672,"::"},{amqp,5672,"172.25.249.200"}]},
...output omitted...

• Use the add_user command to create RabbitMQ users. For example, to create a RabbitMQ
user named demo with redhat as the password, use the following command:

[user@demo ~]$ rabbitmqctl add_user demo redhat

• Use the set_permissions command to set the authorization for a RabbitMQ user. This
option sets the configure, write, and read permissions that correspond to the three wildcards
used in the command, respectively. For example, to set configure, write, and read permissions
for the RabbitMQ user demo, use the following command:

[user@demo ~]$ rabbitmqctl set_permissions demo ".*" ".*" ".*"

• Use the list_users command to list the RabbitMQ users.

[user@demo ~]$ rabbitmqctl list_users


Listing users ...
c65393088ebee0e2170b044f924f2d924ae78276 [administrator]
demo

• Use the set_user_tags command to enable authorization for the management back end. For
example, to assign the RabbitMQ user demo administrator access, use the following command.


[user@demo ~]$ rabbitmqctl set_user_tags demo administrator

• Use the list_exchanges command with rabbitmqctl to show the default configured
exchanges on the RabbitMQ daemon.

[user@demo ~]$ rabbitmqctl list_exchanges


Listing exchanges ...
amq.match headers
keystone topic
q-agent-notifier-security_group-update_fanout fanout
...output omitted...

• Use the list_queues command to list the available queues and their attributes.

[user@demo ~]$ rabbitmqctl list_queues


Listing queues ...
q-agent-notifier-port-delete.director.lab.example.com 0
...output omitted...

• Use the list_consumers command to list all the consumers and the queues to which they
are subscribed.

[user@demo ~]$ rabbitmqctl list_consumers


Listing consumers ...
q-agent-notifier-port-delete.director.lab.example.com rabbit@director.1.1143.0> 2 true
0 []
q-agent-notifier-port-update.director.lab.example.com rabbit@director.1.1098.0> 2 true
0 []
mistral_executor.0.0.0.0 rabbit@director.1.19045.35> 2 true 0 []
...output omitted...

Commands for rabbitmqadmin


The following is a list of typical commands that are used with the rabbitmqadmin command.
The rabbitmqadmin command must be executed as the root user or as a RabbitMQ user with
appropriate permissions. Before using the command, enable the rabbitmq_management
plugin with the rabbitmq-plugins enable rabbitmq_management command, then place
the rabbitmqadmin binary in the PATH environment variable and make it executable for the
root user.

• Use the declare queue command to create a queue. For example, to create a new queue
name demo.queue, use the following command:

[root@demo ~]# rabbitmqadmin -u demo -p redhat declare queue name=demo.queue

• Use the declare exchange command to create an exchange. For example, to create a topic
exchange named demo.topic, use the following command:

[root@demo ~]# rabbitmqadmin -u demo -p redhat declare exchange name=demo.topic \


type=topic


• Use the publish command to publish a message to a queue. For example, to publish the
message 'demo message!' to the demo.queue queue, execute the command, type the
message, then press Ctrl+D to publish the message.

[root@demo ~]# rabbitmqadmin -u demo -p redhat publish routing_key=demo.queue


'demo message!'
Ctrl+D
Message published

• Use the get command to display a message for a queue. For example, to display the message
published to the queue demo.queue use the following command:

[root@demo ~]# rabbitmqadmin -u rabbitmqauth -p redhat get queue=demo.queue -f json


[
{
"exchange": "",
"message_count": 0,
"payload": "'demo message!'\n",
"payload_bytes": 15,
"payload_encoding": "string",
"properties": [],
"redelivered": true,
"routing_key": "demo.queue"
}
]

Publishing a Message to a Queue


The following steps outline the process for publishing a message to a queue.

1. Create a RabbitMQ user using the rabbitmqctl add_user command.

2. Configure the user permissions using the rabbitmqctl set_permissions command.

3. Set the user tag to administrator or guest, using the rabbitmqctl set_user_tags
command.

4. Create a message queue using the rabbitmqadmin declare queue command.

5. Publish a message to a queue using the rabbitmqadmin publish command.

6. Display the queued message using the rabbitmqadmin get command.

References
Management CLI
https://www.rabbitmq.com/management-cli.html

Management Plugins
https://www.rabbitmq.com/management.html

Troubleshooting
https://www.rabbitmq.com/troubleshooting.html


Guided Exercise: Managing Message Brokering

In this exercise, you will enable the RabbitMQ Management Plugin to create an exchange and
queue, publish a message, and retrieve it.

Resources
Files: http://materials.example.com/cl210_producer, http://
materials.example.com/cl210_consumer

Outcomes
You should be able to:

• Authorize a RabbitMQ user.


• Enable the RabbitMQ Management Plugin.
• Create a message exchange.
• Create a message queue.
• Publish a message to a queue.
• Retrieve a published message.

Before you begin


Log in to workstation as student using student as the password.

On workstation, run the lab communication-msg-brokering setup command. This


ensures that the required utility is available on director.

[student@workstation ~]$ lab communication-msg-brokering setup

Steps
1. From workstation, use SSH to connect to director as the stack user. Use sudo to become
the root user.

[student@workstation ~]$ ssh stack@director


[stack@director ~]$ sudo -i

2. Create a rabbitmq user named rabbitmqauth with redhat as the password.

[root@director ~]# rabbitmqctl add_user rabbitmqauth redhat


Creating user "rabbitmqauth" ...

3. Configure permissions for the rabbitmqauth user. Use wildcard syntax to assign all
resources to each of the three permissions for configure, write, and read.

[root@director ~]# rabbitmqctl set_permissions rabbitmqauth ".*" ".*" ".*"


Setting permissions for user "rabbitmqauth" in vhost "/" ...

4. Set the administrator user tag to enable privileges for rabbitmqauth.

[root@director ~]# rabbitmqctl set_user_tags rabbitmqauth administrator


Setting tags for user "rabbitmqauth" to [administrator] ...


5. Verify that a RabbitMQ Management configuration file exists in root's home directory. The
contents should match as shown here.

[root@director ~]# cat ~/.rabbitmqadmin.conf


[default]
hostname = 172.25.249.200
port = 15672
username = rabbitmqauth
password = redhat

6. Verify that rabbitmqauth is configured as an administrator.

[root@director ~]# rabbitmqctl list_users


Listing users ...
c65393088ebee0e2170b044f924f2d924ae78276 [administrator]
rabbitmqauth [administrator]

7. Create an exchange topic named cl210.topic.

[root@director ~]# rabbitmqadmin -c ~/.rabbitmqadmin.conf declare exchange \


name=cl210.topic type=topic
exchange declared

8. Verify that the exchange topic is created.

[root@director ~]# rabbitmqctl list_exchanges | grep cl210.topic


cl210.topic topic

9. Download the scripts cl210_producer and cl210_consumer from http://


materials.example.com/ to /root and make them executable.

[root@director ~]# wget http://materials.example.com/cl210_producer


[root@director ~]# wget http://materials.example.com/cl210_consumer
[root@director ~]# chmod +x /root/cl210_producer /root/cl210_consumer

10. On workstation, open a second terminal. Using SSH, log in as the stack user
to director. Switch to the root user. Launch the cl210_consumer script using
anonymous.info as the routing key.

[student@workstation ~]$ ssh stack@director


[stack@director ~]$ sudo -i
[root@director ~]# python /root/cl210_consumer anonymous.info

11. In the first terminal, launch the cl210_producer script to send messages using the routing
key anonymous.info.

[root@director ~]# python /root/cl210_producer


[x] Sent 'anonymous.info':'Hello World!'

12. In the second terminal, the sent message(s) are received and displayed. Running the
cl210_producer script multiple times sends multiple messages.


[x] 'anonymous.info':'Hello World!'

Exit this cl210_consumer terminal after observing the message(s) being received. You are
finished with the example publisher-consumer exchange scripts.

13. The next practice is to observe a message queue. Create a queue named redhat.queue.

[root@director ~]# rabbitmqadmin -c ~/.rabbitmqadmin.conf declare queue \


name=redhat.queue
queue declared

14. Verify that the queue is created. The message count is zero.

[root@director ~]# rabbitmqctl list_queues | grep redhat


redhat.queue 0

15. Publish messages to the redhat.queue queue. These first two examples include the
message payload on the command line.

[root@director ~]# rabbitmqadmin -c ~/.rabbitmqadmin.conf \


publish routing_key=redhat.queue payload="a message"
Message published
[root@director ~]# rabbitmqadmin -c ~/.rabbitmqadmin.conf \
publish routing_key=redhat.queue payload="another message"
Message published

16. Publish a third message to the redhat.queue queue, but without using the payload
parameter. When executing the command without specifying a payload, rabbitmqadmin
waits for multi-line input. Press Ctrl+D when the cursor is alone at the first space of a new
line to end message entry and publish the message.

[root@director ~]# rabbitmqadmin -c ~/.rabbitmqadmin.conf \


publish routing_key=redhat.queue
message line 1
message line 2
message line 3
Ctrl+D
Message published

17. Verify that the redhat queue has an increased message count.

[root@director ~]# rabbitmqctl list_queues | grep redhat


redhat.queue 3

18. Display the first message in the queue. The message_count field indicates how many more
messages exist after this one.

[root@director ~]# rabbitmqadmin -c ~/.rabbitmqadmin.conf get queue=redhat.queue \


-f pretty_json
[
{

"exchange": "",
"message_count": 2,
"payload": "a message",
"payload_bytes": 9,
"payload_encoding": "string",
"properties": [],
"redelivered": false,
"routing_key": "redhat.queue"
}
]

19. Display multiple messages using the count option. Each displayed message indicates how
many more messages follow. The redelivered field indicates whether you have previously
viewed this specific message.

[root@director ~]# rabbitmqadmin -c ~/.rabbitmqadmin.conf get queue=redhat.queue \


count=2 -f pretty_json
[
{
"exchange": "",
"message_count": 2,
"payload": "a message",
"payload_bytes": 9,
"payload_encoding": "string",
"properties": [],
"redelivered": true,
"routing_key": "redhat.queue"
},
{
"exchange": "",
"message_count": 1,
"payload": "another message",
"payload_bytes": 15,
"payload_encoding": "string",
"properties": [],
"redelivered": false,
"routing_key": "redhat.queue"
}
]

20. When finished, delete the queue named redhat.queue. Return to workstation.

[root@director ~]# rabbitmqadmin -c ~/.rabbitmqadmin.conf delete queue \


name=redhat.queue
queue deleted
[root@director ~]# exit
[stack@director ~]$ exit
[student@workstation ~]$

Cleanup
From workstation, run lab communication-msg-brokering cleanup to clean up
resources created for this exercise.

[student@workstation ~]$ lab communication-msg-brokering cleanup


Lab: Managing Internal OpenStack Communication

In this lab, you will troubleshoot and fix issues with the Keystone identity service and the
RabbitMQ message broker.

Outcomes
You should be able to:

• Troubleshoot the Keystone identity service.

• Troubleshoot the RabbitMQ message broker.

Scenario
During a recent deployment of the overcloud, cloud administrators are reporting issues with the
Compute and Image services. Cloud administrators are not able to access the Image service nor
the Compute service APIs. You have been tasked with troubleshooting and fixing these issues.

Before you begin


Log in to workstation as student with a password of student.

On workstation, run the lab communication-review setup command. This ensures that
the OpenStack services are running and the environment has been properly configured for this
lab.

[student@workstation ~]$ lab communication-review setup

Steps
1. From workstation, verify the issue by attempting to list instances as the OpenStack
admin user. The command is expected to hang.

2. Use SSH to connect to controller0 as the heat-admin user to begin troubleshooting.

3. Check the Compute service logs for any applicable errors.

4. Investigate and fix the issue based on the error discovered in the log. Modify the incorrect
rabbitmq port value in /etc/rabbitmq/rabbitmq-env.conf and use the HUP signal to
respawn the beam.smp process. Log out of the controller0 node when finished.

5. From workstation, attempt again to list instances, to verify that the issue is fixed. This
command is expected to display instances or return to a command prompt without hanging.

6. Next, attempt to list images as well. The command is expected to fail, returning an internal
server error.

7. Use SSH to connect to controller0 as the heat-admin user to begin troubleshooting.

8. Inspect the Image service logs for any applicable errors.

9. The error in the Image service log indicates a communication issue with the Image service
API and the Identity service. In a previous step, you verified that the Identity service could

communicate with the Compute service API, so the next logical step is to focus on the Image
service configuration. Investigate and fix the issue based on the traceback found in the
Image service log.

10. From workstation, again attempt to list images to verify the fix. This command should
succeed, returning a command prompt without error.

Cleanup
From workstation, run the lab communication-review cleanup script to clean up the
resources created in this exercise.

[student@workstation ~]$ lab communication-review cleanup


Solution
In this lab, you will troubleshoot and fix issues with the Keystone identity service and the
RabbitMQ message broker.

Outcomes
You should be able to:

• Troubleshoot the Keystone identity service.

• Troubleshoot the RabbitMQ message broker.

Scenario
During a recent deployment of the overcloud, cloud administrators are reporting issues with the
Compute and Image services. Cloud administrators are not able to access the Image service nor
the Compute service APIs. You have been tasked with troubleshooting and fixing these issues.

Before you begin


Log in to workstation as student with a password of student.

On workstation, run the lab communication-review setup command. This ensures that
the OpenStack services are running and the environment has been properly configured for this
lab.

[student@workstation ~]$ lab communication-review setup

Steps
1. From workstation, verify the issue by attempting to list instances as the OpenStack
admin user. The command is expected to hang.

1.1. From workstation, source the admin-rc credential file. Attempt to list any running
instances. The command is expected to hang, and does not return to the command
prompt. Use Ctrl+C to escape the command.

[student@workstation ~]$ source admin-rc


[student@workstation ~(admin-admin)]$ openstack server list

2. Use SSH to connect to controller0 as the heat-admin user to begin troubleshooting.

2.1. From workstation, use SSH to connect to controller0 as the heat-admin user.

[student@workstation ~(admin-admin)]$ ssh heat-admin@controller0

3. Check the Compute service logs for any applicable errors.

3.1. Check /var/log/nova/nova-conductor.log on controller0 for a recent error


from the AMQP server.

[heat-admin@controller0 ~]$ sudo tail /var/log/nova/nova-conductor.log


2017-05-30 02:54:28.223 6693 ERROR oslo.messaging._drivers.impl_rabbit [-]
[3a3a6e2f-00bf-4a4a-8ba5-91bc32c381dc] AMQP server on 172.24.1.1:5672 is
unreachable:
[Errno 111] ECONNREFUSED. Trying again in 32 seconds. Client port: None


4. Investigate and fix the issue based on the error discovered in the log. Modify the incorrect
rabbitmq port value in /etc/rabbitmq/rabbitmq-env.conf and use the HUP signal to
respawn the beam.smp process. Log out of the controller0 node when finished.

4.1. Modify the incorrect rabbitmq port value in /etc/rabbitmq/rabbitmq-env.conf


by setting the variable NODE_PORT to 5672. Check that the variable is correct by
displaying the value again with the --get option. Because this file does not have a
section header, crudini requires specifying the section as "".

[heat-admin@controller0 ~]$ sudo crudini \


--set /etc/rabbitmq/rabbitmq-env.conf \
"" NODE_PORT 5672
[heat-admin@controller0 ~]$ sudo crudini \
--get /etc/rabbitmq/rabbitmq-env.conf \
"" NODE_PORT
5672

4.2. List the process ID for the beam.smp process. The beam.smp process is the Erlang virtual
machine that runs the RabbitMQ application. Sending this process a hangup signal causes
RabbitMQ to reload the corrected configuration.

[heat-admin@controller0 ~]$ sudo ps -ef | grep beam.smp


rabbitmq 837197 836998 10 03:42 ? 00:00:01 /usr/lib64/erlang/erts-7.3.1.2/bin/
beam.smp
-rabbit tcp_listeners [{"172.24.1.1",56721

4.3. Restart beam.smp by sending a hangup signal to the retrieved process ID.

[heat-admin@controller0 ~]$ sudo kill -HUP 837197

4.4. List the beam.smp process ID to verify the tcp_listeners port is now 5672.

[heat-admin@controller0 ~]$ sudo ps -ef |grep beam.smp


rabbitmq 837197 836998 10 03:42 ? 00:00:01 /usr/lib64/erlang/erts-7.3.1.2/bin/
beam.smp
-rabbit tcp_listeners [{"172.24.1.1",5672

4.5. Log out of controller0.

[heat-admin@controller0 ~]$ exit


[student@workstation ~(admin-admin)]$

5. From workstation, attempt again to list instances, to verify that the issue is fixed. This
command is expected to display instances or return to a command prompt without hanging.

5.1. From workstation, list the instances again.

[student@workstation ~(admin-admin)]$ openstack server list


[student@workstation ~(admin-admin)]$


6. Next, attempt to list images as well. The command is expected to fail, returning an internal
server error.

6.1. Attempt to list images.

[student@workstation ~(admin-admin)]$ openstack image list


Internal Server Error (HTTP 500)

7. Use SSH to connect to controller0 as the heat-admin user to begin troubleshooting.

7.1. From workstation, use SSH to connect to controller0 as the heat-admin user.

[student@workstation ~(admin-admin)]$ ssh heat-admin@controller0

8. Inspect the Image service logs for any applicable errors.

8.1. Inspect /var/log/glance/api.log on controller0 and focus on tracebacks that


involve auth and URL.

[heat-admin@controller0 ~]$ sudo tail /var/log/glance/api.log -n 30


raise exceptions.DiscoveryFailure('Could not determine a suitable URL '
DiscoveryFailure: Could not determine a suitable URL for the plugin
2017-05-30 04:31:17.650 277258 INFO eventlet.wsgi.server [-] 172.24.3.1 - - [30/
May/2017 04:31:17] "GET /v2/images HTTP/1.1" 500 139 0.003257

9. The error in the Image service log indicates a communication issue with the Image service
API and the Identity service. In a previous step, you verified that the Identity service could
communicate with the Compute service API, so the next logical step is to focus on the Image
service configuration. Investigate and fix the issue based on the traceback found in the
Image service log.

9.1. First, view the endpoint URL for the Identity service.

[student@workstation ~(admin-admin)]$ openstack catalog show identity


+-----------+---------------------------------------------+
| Field | Value |
+-----------+---------------------------------------------+
| endpoints | regionOne |
| | publicURL: http://172.25.250.50:5000/v2.0 |
| | internalURL: http://172.24.1.50:5000/v2.0 |
| | adminURL: http://172.25.249.50:35357/v2.0 |
| | |
| name | keystone |
| type | identity |
+-----------+---------------------------------------------+

9.2. The traceback in /var/log/glance/api.log indicated an issue determining the


authentication URL. Inspect /etc/glance/glance-api.conf to verify auth_url
setting, noting the incorrect port.

[heat-admin@controller0 ~]$ sudo grep 'auth_url' /etc/glance/glance-api.conf


#auth_url = None
auth_url=http://172.25.249.60:3535


9.3. Modify the auth_url setting in /etc/glance/glance-api.conf to use port 35357.


Check that the variable is correct by displaying the value again with the --get option.

[heat-admin@controller0 ~]$ sudo crudini \


--set /etc/glance/glance-api.conf keystone_authtoken \
auth_url http://172.25.249.50:35357
[heat-admin@controller0 ~]$ sudo crudini \
--get /etc/glance/glance-api.conf \
keystone_authtoken auth_url
http://172.25.249.50:35357

9.4. Restart the openstack-glance-api service. When finished, exit from controller0.

[heat-admin@controller0 ~]$ sudo systemctl restart openstack-glance-api


[heat-admin@controller0 ~]$ exit
[student@workstation ~(admin-admin)]$

10. From workstation, again attempt to list images to verify the fix. This command should
succeed, returning a command prompt without error.

10.1. From workstation, attempt to list images. This command should succeed, returning a
command prompt without error.

[student@workstation ~(admin-admin)]$ openstack image list


[student@workstation ~(admin-admin)]$

Cleanup
From workstation, run the lab communication-review cleanup script to clean up the
resources created in this exercise.

[student@workstation ~]$ lab communication-review cleanup


Summary
In this chapter, you learned:

• RabbitMQ provides a suite of utilities to check the RabbitMQ daemon status and to execute
administrative operations on it.

• Red Hat OpenStack Platform recommends creating a cron job that runs hourly to purge
expired Keystone tokens.

• The Keystone endpoint adminURL should only be consumed by those who require
administrative access to a service endpoint.

• PKIZ tokens add compression using zlib, making them smaller than PKI tokens.

• Fernet tokens have a maximum limit of 250 bytes, which makes them small enough to be ideal
for API calls and to minimize the data kept on disk.



CHAPTER 3

BUILDING AND CUSTOMIZING IMAGES

Overview
Goal Build and customize images
Objectives • Describe common image formats for OpenStack.

• Build an image using diskimage-builder.

• Customize an image using guestfish and virt-customize.


Sections • Describing Image Formats (and Quiz)

• Building an Image (and Guided Exercise)

• Customizing an Image (and Guided Exercise)


Lab • Building and Customizing Images


Describing Image Formats

Objective
After completing this section, students should be able to describe the common image formats
used within Red Hat OpenStack Platform.

Common Image Formats


Building custom images is a great way to implement items that are standard across your
organization. Instances may be short-lived, so adding them to a configuration management
system may or may not be desirable. Items that could be configured in a custom image include
security hardening, third-party agents for monitoring or backup, and operator accounts with
associated SSH keys.

Red Hat OpenStack Platform supports many virtual disk image formats, including RAW, QCOW2,
AMI, VHD, and VMDK. In this chapter we will discuss the RAW and QCOW2 formats, their features,
and their use in Red Hat OpenStack Platform.

Image Format Overview


Format Description
RAW A RAW format image usually has an img extension, and
contains an exact copy of a disk.
QCOW2 The Qemu Copy On Write v2 format.
AMI Amazon Machine Image format.
VHD Virtual Hard Disk format, used in Microsoft Virtual PC.
VMDK Virtual Machine Disk format, created by VMware but now an
open format.

The RAW format is a bootable, uncompressed virtual disk image, whereas the QCOW2 format
is more complex and supports many features. File systems that support sparse files allow RAW
images to be only the size of the used data. This means that a RAW image of a 20 GiB disk may
only be 3 GiB in size. The attributes of both are compared in the following table.
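
To observe this, you can compare the apparent size of a sparse RAW image with the space it actually consumes; demo.img is a hypothetical file name used only for illustration.

[user@demo ~]$ ls -lh demo.img    # apparent (virtual) size
[user@demo ~]$ du -h demo.img     # blocks actually allocated on disk
[user@demo ~]$ qemu-img info demo.img

The qemu-img info command reports both the virtual size and the disk size, which makes it a convenient way to check how sparse an image is.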

Comparison of RAW and QCOW2 Image Formats


Attribute RAW QCOW2
Image Size A RAW image will take up the same QCOW2 is a sparse representation of
amount of disk space as the data the virtual disk image. Consequently,
it contains as long as it is sparse. it is smaller than a RAW image of
Unused space in the source does not the same source. It also supports
consume space in the image. compression using zlib.
Performance Considered better than QCOW2 Considered not as good as RAW due
because disk space is all allocated to the latency of performing disk
on VM creation. This avoids the allocation as space is required.
latencies introduced by allocating
space as required.
Encryption Not applicable. Optional. Uses 128-bit AES.

Snapshots Not applicable. Supports multiple snapshots, which
are a read-only record of the image
at a particular point in time.
Copy-on-write Not applicable. Reduces storage consumption by
writing changes back to a copy of
the data to be modified. The original
is left unchanged.

When choosing between improved VM performance and reduced storage consumption, reduced
storage consumption is usually preferred. The performance difference between RAW and QCOW2
images is not great enough to outweigh the cost of allocated but underused storage.

Images in OpenStack Services


The OpenStack Compute service is the role that runs instances within Red Hat OpenStack
Platform. The required image format depends on the back-end storage system configured.
With the default file-based back end, QCOW2 is the preferred image format because libvirt does
not support snapshots of RAW images. However, when using Ceph, the image needs to be
converted to RAW in order to leverage the cluster's own snapshotting capabilities.
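
A QCOW2 image can be converted to RAW with the qemu-img command before it is uploaded; the file and image names below are examples only.

[user@demo ~]$ qemu-img convert -f qcow2 -O raw demo-web.qcow2 demo-web.raw
[user@demo ~]$ openstack image create --disk-format raw \
--file demo-web.raw demo-web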

References
Further information is available in the documentation for Red Hat OpenStack Platform
at
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform


Quiz: Describing Image Formats

Choose the correct answers to the following questions:

1. What is the correct image format when using Ceph as the back end for the OpenStack Image
service?

a. QCOW2
b. VHD
c. VMDK
d. RAW

2. Which four image formats are supported by Red Hat OpenStack Platform? (Choose four.)

a. VMDK
b. VBOX
c. VHD
d. QCOW2
e. RAW

3. Which three features are part of the QCOW2 format? (Choose three.)

a. Encryption
b. DFRWS support
c. Snapshots
d. Multi-bit error correction
e. Copy-on-write


Solution
Choose the correct answers to the following questions:

1. What is the correct image format when using Ceph as the back end for the OpenStack Image
service?

a. QCOW2
b. VHD
c. VMDK
d. RAW

2. Which four image formats are supported by Red Hat OpenStack Platform? (Choose four.)

a. VMDK
b. VBOX
c. VHD
d. QCOW2
e. RAW

3. Which three features are part of the QCOW2 format? (Choose three.)

a. Encryption
b. DFRWS support
c. Snapshots
d. Multi-bit error correction
e. Copy-on-write


Building an Image

Objective
After completing this section, students should be able to build an image using diskimage-
builder.

Building a Custom Image


The benefits of building custom images include: ensuring monitoring agents are present; aligning
with the organization's security policy; and provisioning a common set of troubleshooting tools.

diskimage-builder is a tool for building and customizing cloud images. It can output virtual disk
images in a variety of formats, such as QCOW2 and RAW. Elements are applied by diskimage-
builder during the build process to customize the image. An element is a code set that runs
within a chroot environment and alters how an image is built. For example, the docker
element exports a tar file from a named container, allowing other elements to build on top of it,
and the bootloader element installs grub2 on the boot partition of the system.

Diskimage-builder Architecture
diskimage-builder bind mounts /proc, /sys, and /dev in a chroot environment. The image-
building process produces minimal systems that possess all the required bits to fulfill their
purpose with OpenStack. Images can be as simple as a file system image or can be customized
to provide whole disk images. Upon completion of the file system tree, a loopback device with file
system (or partition table and file system) is built, and the file system tree is copied into it.

Diskimage-builder Elements
Elements are used to specify what goes into the image and any modifications that are desired.
Images are required to use at least one base distribution element, and there are multiple
elements for a given distribution. For example, the distribution element could be rhel7, and
then other elements are used to modify the rhel7 base image. Scripts are invoked and applied
to the image based on multiple elements.

Diskimage-builder Element Dependencies


Each element has the ability to use element-deps and element-provides to define or affect
dependencies. element-deps is a plain-text file containing a list of elements that will be added
to the list of elements built into the image when it is created. element-provides is a plain-text
file that contains a list of elements that are provided by this element. These particular elements
are not included with the elements built into the image at creation time.
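
For illustration, a hypothetical custom element named my-hardening could declare that it depends on the base and package-installs elements by shipping an element-deps file such as the following; the element name and dependency list are examples only.

[user@demo ~]$ mkdir -p ~/elements/my-hardening
[user@demo ~]$ cat > ~/elements/my-hardening/element-deps <<EOF
base
package-installs
EOF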

The diskimage-builder package includes numerous elements:

[user@demo ~]$ ls /usr/share/diskimage-builder/elements


apt-conf docker pip-and-virtualenv
apt-preferences dpkg pip-cache
apt-sources dracut-network pkg-map
architecture-emulation-binaries dracut-ramdisk posix
baremetal dynamic-login proliant-tools
base element-manifest pypi
bootloader enable-serial-console python-brickclient
cache-url epel ramdisk
centos fedora ramdisk-base
centos7 fedora-minimal rax-nova-agent

centos-minimal gentoo redhat-common


cleanup-kernel-initrd growroot rhel
cloud-init grub2 rhel7
cloud-init-datasources hpdsa rhel-common
cloud-init-disable-resizefs hwburnin rpm-distro
cloud-init-nocloud hwdiscovery runtime-ssh-host-keys
debian ilo select-boot-kernel-initrd
debian-minimal install-bin selinux-permissive
debian-systemd install-static serial-console
debian-upstart install-types simple-init
debootstrap ironic-agent source-repositories
deploy ironic-discoverd-ramdisk stable-interface-names
deploy-baremetal iso svc-map
deploy-ironic local-config sysctl
deploy-kexec manifests uboot
deploy-targetcli mellanox ubuntu
deploy-tgtadm modprobe-blacklist ubuntu-core
devuser no-final-image ubuntu-minimal
dhcp-all-interfaces oat-client ubuntu-signed
dib-init-system openssh-server vm
dib-python opensuse yum
dib-run-parts opensuse-minimal yum-minimal
disable-selinux package-installs zypper
dkms partitioning-sfdisk zypper-minimal

Each element has scripts that are applied to the images as they are built. The following example
shows the scripts for the base element.

[user@demo ~]$ tree /usr/share/diskimage-builder/elements/base


/usr/share/diskimage-builder/elements/base/
|-- cleanup.d
| |-- 01-ccache
| `-- 99-tidy-logs
|-- element-deps
|-- environment.d
| `-- 10-ccache.bash
|-- extra-data.d
| `-- 50-store-build-settings
|-- install.d
| |-- 00-baseline-environment
| |-- 00-up-to-date
| |-- 10-cloud-init
| |-- 50-store-build-settings
| `-- 80-disable-rfc3041
|-- package-installs.yaml
|-- pkg-map
|-- pre-install.d
| `-- 03-baseline-tools
|-- README.rst
`-- root.d
`-- 01-ccache

6 directories, 15 files

Diskimage-builder Phase Subdirectories


Phase subdirectories should be located under an element directory; they may or may not exist
by default, so create them as required. They contain executable scripts that have a two-digit
numerical prefix, and are executed in numerical order. The convention is to store data files in the
element directory, but to only store executable scripts in the phase subdirectory. If a script is
not executable it will not run. The phase subdirectories are processed in the order listed in the
following table:


Phase Subdirectories
Phase Subdirectory Description
root.d Builds or modifies the initial root file system content. This
is where customizations are added, such as building on an
existing image. Only one element can use this at a time unless
particular care is taken not to overwrite, but instead to adapt
the context extracted by other elements.
extra-data.d Include extra data from the host environment that hooks may
need when building the image. This copies any data such as
SSH keys, or HTTP proxy settings, under $TMP_HOOKS_PATH.
pre-install.d Prior to any customization or package installation, this code
runs in a chroot environment.
install.d In this phase the operating system and packages are installed,
this code runs in a chroot environment.
post-install.d This is the recommended phase to use for performing
tasks that must be handled after the operating system and
application installation, but before the first boot of the image.
For example, running systemctl enable to enable required
services.
block-device.d Customize the block device, for example, to make partitions.
Runs before the cleanup.d phase runs, but after the target
tree is fully populated.
finalize.d Runs in a chroot environment upon completion of the root
file system content being copied to the mounted file system.
Tuning of the root file system is performed in this phase, so
it is important to limit the operations to only those necessary
to affect the file system metadata and image itself. post-
install.d is preferred for most operations.
cleanup.d The root file system content is cleaned of temporary files.
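
For example, a script placed in a working copy of the rhel7 element's post-install.d directory could enable a service before the image's first boot; the script name and the chronyd service are illustrative choices.

[user@demo ~]$ mkdir -p ~/elements/rhel7/post-install.d
[user@demo ~]$ cat > ~/elements/rhel7/post-install.d/05-enable-chronyd <<EOF
#!/bin/bash
systemctl enable chronyd
EOF
[user@demo ~]$ chmod +x ~/elements/rhel7/post-install.d/05-enable-chronyd

Remember that scripts in phase subdirectories must be executable, or they will not run.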

Diskimage-builder Environment Variables


A number of environment variables must be exported, depending upon the required image
customization. Typically, at a minimum, the following variables will be exported:

Minimal Diskimage-builder Variables


Variable Description
DIB_LOCAL_IMAGE The base image to build from.
NODE_DIST The distribution of the base image, for example rhel7.
DIB_YUM_REPO_CONF The client yum repository configuration files to be copied into
the chroot environment during image building.
ELEMENTS_PATH The path to a working copy of the elements.


Important
Yum repository configuration files specified by DIB_YUM_REPO_CONF are copied into
/etc/yum.repos.d during the image build and removed when the build is done.
The intention is to provide the specified yum repository access only during the build
and not to leave that yum repository access in the final image. However, this removal
behavior may cause an unintended result; a yum repository configuration file specified
in DIB_YUM_REPO_CONF that matches an already existing configuration file in the
starting base image will result in that configuration file being removed from the final
image at the end of the build. Be sure to check for existing repository configuration
and exclude it from DIB_YUM_REPO_CONF if it should remain in the final built image.

Diskimage-builder Options
We will examine some of the options available in the context of the following example:

[user@demo ~]$ disk-image-create vm rhel7 -n \


-p python-django-compressor -a amd64 -o web.img 2>&1 | tee diskimage-build.log

The vm element provides sane defaults for virtual machine disk images. The next option is the
distribution; the rhel7 option is provided to specify that the image will be Red Hat Enterprise
Linux 7. The -n option skips the default inclusion of the base element, which might be desirable
if you prefer not to have cloud-init and package updates installed. The -p option specifies which
packages to install; here we are installing the python-django-compressor package. The -a option
specifies the architecture of the image. The -o option specifies the output image name.

Diskimage-builder Execution
Each element contains a set of scripts to execute. In the following excerpt from the diskimage-
build.log file, we see the scripts that were executed as part of the root phase.

Target: root.d

Script Seconds
--------------------------------------- ----------

01-ccache 0.017
10-rhel7-cloud-image 93.202
50-yum-cache 0.045
90-base-dib-run-parts 0.037

The run time for each script is shown on the right. Scripts that reside in the extra-data.d
phase subdirectory were then executed:

Target: extra-data.d

Script Seconds
--------------------------------------- ----------

01-inject-ramdisk-build-files 0.031
10-create-pkg-map-dir 0.114
20-manifest-dir 0.021
50-add-targetcli-module 0.038
50-store-build-settings 0.006
75-inject-element-manifest 0.040


98-source-repositories 0.041
99-enable-install-types 0.023
99-squash-package-install 0.221
99-yum-repo-conf 0.039

From these examples, you can confirm the order that the phases were executed in and the order
of script execution in each phase.

Red Hat-provided Images: Cloud-Init


Cloud-init is included in images provided by Red Hat and provides an interface for complex
customization early in the instance initialization process. It can accept customization data in
several formats, including standard shell scripts and cloud-config. The choice of customization
method must be considered carefully because each method supports different features.

To avoid the proliferation of images, you can add customization that is common across the
organization to images, and then perform more granular customization with cloud-init. If only
a small variety of system types is required, it might be simpler to perform all customization
using diskimage-builder.
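
As a minimal sketch of the cloud-config format, user data such as the following can be passed to an instance at creation time with the --user-data option of openstack server create; the file name, package, and instance attributes are examples only.

[user@demo ~]$ cat > user-data.yaml <<EOF
#cloud-config
packages:
  - chrony
runcmd:
  - systemctl enable --now chronyd
EOF
[user@demo ~]$ openstack server create --image demo-image --flavor m1.small \
--user-data user-data.yaml demo-instance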

Building an Image
The following steps outline the process for building an image with diskimage-builder.

1. Download a base image.

2. Open a terminal and create a working copy of the diskimage-builder elements.

3. Add a script to perform the desired customization under the working copy of the relevant
element phase directory.

4. Export the variables that diskimage-builder requires: ELEMENTS_PATH,


DIB_LOCAL_IMAGE, NODE_DIST, and DIB_YUM_REPO_CONF.

5. Build the image using the disk-image-create command and appropriate options.

6. Upload the image to the OpenStack Image service.

7. Launch an instance using the custom image.

8. Attach a floating IP to the instance.

9. Connect to the instance using SSH and verify the customization was executed.

References
Diskimage-builder Documentation
https://docs.openstack.org/diskimage-builder/latest/


Guided Exercise: Building an Image

In this exercise you will build and customize a disk image using diskimage-builder.

Resources
Base Image http://materials.example.com/osp-small.qcow2
Working Copy of diskimage- /home/student/elements
builder Elements

Outcomes
You should be able to:

• Build and customize an image using diskimage-builder.

• Upload the image into the OpenStack image service.

• Spawn an instance using the customized image.

Before you begin


Log in to workstation as student using student as the password.

On workstation, run the lab customization-img-building setup command. This


ensures that the required packages are installed on workstation, and provisions the environment
with a public network, a private network, a private key, and security rules to access the instance.

[student@workstation ~]$ lab customization-img-building setup

Steps
1. From workstation, retrieve the osp-small.qcow2 image from http://
materials.example.com/osp-small.qcow2 and save it under /home/student/.

[student@workstation ~]$ wget http://materials.example.com/osp-small.qcow2

2. Create a copy of the diskimage-builder elements directory to work with under /home/
student/.

[student@workstation ~]$ cp -a /usr/share/diskimage-builder/elements /home/student/

3. Create a post-install.d directory under the working copy of the rhel7 element.

[student@workstation ~]$ mkdir -p /home/student/elements/rhel7/post-install.d

4. Add three scripts under the rhel7 element post-install.d directory to enable the
vsftpd service, add vsftpd:ALL to /etc/hosts.allow, and disable anonymous ftp in /
etc/vsftpd/vsftpd.conf.

[student@workstation ~]$ cd /home/student/elements/rhel7/post-install.d/


[student@workstation post-install.d]$ cat <<EOF > 01-enable-services
#!/bin/bash


systemctl enable vsftpd


EOF
[student@workstation post-install.d]$ cat <<EOF > 02-vsftpd-hosts-allow
#!/bin/bash
if ! grep -q vsftpd /etc/hosts.allow
then
echo "vsftpd:ALL" >> /etc/hosts.allow
fi
EOF
[student@workstation post-install.d]$ cat <<EOF > 03-vsftpd-disable-anonymous
#!/bin/bash
sed -i 's|^anonymous_enable=.*|anonymous_enable=NO|' /etc/vsftpd/vsftpd.conf
EOF

5. Return to the student home directory. Set the executable permission on the scripts.

[student@workstation post-install.d]$ cd
[student@workstation ~]$ chmod +x /home/student/elements/rhel7/post-install.d/*

6. Export the following environment variables, which diskimage-builder requires.

Environment Variables
Variable Content
NODE_DIST rhel7
DIB_LOCAL_IMAGE /home/student/osp-small.qcow2
DIB_YUM_REPO_CONF /etc/yum.repos.d/openstack.repo
ELEMENTS_PATH /home/student/elements

[student@workstation ~]$ export NODE_DIST=rhel7


[student@workstation ~]$ export DIB_LOCAL_IMAGE=/home/student/osp-small.qcow2
[student@workstation ~]$ export DIB_YUM_REPO_CONF=/etc/yum.repos.d/openstack.repo
[student@workstation ~]$ export ELEMENTS_PATH=/home/student/elements

7. Build the finance-rhel-ftp.qcow2 image and include the vsftpd package. The scripts
created earlier are automatically integrated.

[student@workstation ~]$ disk-image-create vm rhel7 \


-t qcow2 \
-p vsftpd \
-o finance-rhel-ftp.qcow2
...output omitted...

8. As the developer1 OpenStack user, upload the finance-rhel-ftp.qcow2 image to the


image service as finance-rhel-ftp, with a minimum disk requirement of 10 GiB, and a
minimum RAM requirement of 2 GiB.

8.1. Source the developer1-finance-rc credentials file.

[student@workstation ~]$ source developer1-finance-rc


[student@workstation ~(developer1-finance)]$


8.2. Using the openstack command, upload the image finance-rhel-ftp.qcow2 into
the OpenStack image service.

[student@workstation ~(developer1-finance)]$ openstack image create \


--disk-format qcow2 \
--min-disk 10 \
--min-ram 2048 \
--file finance-rhel-ftp.qcow2 \
finance-rhel-ftp
...output omitted...

9. Launch an instance in the environment using the following attributes.

Instance Attributes
Attribute Value
flavor m1.web
key pair developer1-keypair1
network finance-network1
image finance-rhel-ftp
security group finance-ftp
name finance-ftp1

[student@workstation ~(developer1-finance)]$ openstack server create \


--flavor m1.web \
--key-name developer1-keypair1 \
--nic net-id=finance-network1 \
--image finance-rhel-ftp \
--security-group finance-ftp \
--wait finance-ftp1
...output omitted...

10. List the available floating IP addresses, then allocate one to finance-ftp1.

10.1. List the available floating IPs.

[student@workstation ~(developer1-finance)]$ openstack floating ip list \


-c "Floating IP Address" -c Port
+---------------------+------+
| Floating IP Address | Port |
+---------------------+------+
| 172.25.250.P | None |
+---------------------+------+

10.2.Attach an available floating IP to the instance finance-ftp1.

[student@workstation ~(developer1-finance)]$ openstack server add floating \


ip finance-ftp1 172.25.250.P


11. When the image build is successful, the resulting FTP server displays messages and
requests login credentials. If the following ftp command does not prompt for login
credentials, troubleshoot the image build or deployment.

Attempt to log in to the finance-ftp1 instance as student using the ftp command. Look
for the 220 (vsFTPd 3.0.2) message indicating server response. After login, exit at the
ftp prompt.

[student@workstation ~(developer1-finance)]$ ftp 172.25.250.P


Connected to 172.25.250.P (172.25.250.P).
220 (vsFTPd 3.0.2)
Name (172.25.250.P:student): student
331 Please specify the password.
Password: student
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> exit

Cleanup
From workstation, run the lab customization-img-building cleanup command to
clean up this exercise.

[student@workstation ~]$ lab customization-img-building cleanup


Customizing an Image

Objectives
After completing this section, students should be able to customize an image using guestfish
and virt-customize.

Making Minor Image Customizations


Building an image using diskimage-builder can take several minutes, and may require a
copy of the elements directory to be used for each image. If you only require a small number of
customizations, you could save time by using the guestfish or virt-customize commands
to modify a base image, such as the one provided by Red Hat in the rhel-guest-image-7 package.
The image provided in the rhel-guest-image-7 package has a minimal set of packages and
has cloud-init installed and enabled. You can download the rhel-guest-image-7 package from
https://access.redhat.com/downloads.

Guestfish and Virt-customize Internals


guestfish and virt-customize both use the libguestfs API to perform their functions.
Libguestfs needs a back end that can work with the various image formats, and by default it uses
libvirt. The process to open an image for editing with a libvirt back end includes creating
an overlay file for the image, creating an appliance, booting the appliance with or without
network support, and mounting the partitions. You can investigate the process in more detail by
exporting two environment variables, LIBGUESTFS_DEBUG=1 and LIBGUESTFS_TRACE=1, and
then executing guestfish or virt-customize with the -a option to add a disk.
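
For example, assuming a local image file named demo-rhel-base.qcow2, the debug and trace output can be captured while simply launching the appliance:

[user@demo ~]$ export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1
[user@demo ~]$ guestfish -a ~/demo-rhel-base.qcow2 run 2> guestfish-trace.log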

Using Guestfish to Customize Images


guestfish is a low-level tool that exposes the libguestfs API directly, which means that you
can manipulate images in a very granular fashion. The following example uses the -i option
to mount partitions automatically, the -a option to add the disk image, and the --network
option to enable network access. It then installs the aide package, sets the password for root, and
restores SELinux file contexts.

[user@demo ~]$ guestfish -i --network -a ~/demo-rhel-base.qcow2

Welcome to guestfish, the guest filesystem shell for


editing virtual machine filesystems and disk images.

Type: 'help' for help on commands


'man' to read the manual
'quit' to quit the shell

Operating system: Red Hat Enterprise Linux Server 7.3 (Maipo)


/dev/sda1 mounted on /

><fs> command "yum -y install aide"


...output omitted...
><fs> command "echo letmein | passwd --stdin root"
><fs> selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts /


Using Virt-customize to Customize Images


virt-customize is a high-level tool that also uses the libguestfs API, but eases image-
building by performing tasks using simple options that may have required multiple API calls to
achieve using guestfish or the libguestfs API directly. The following example shows virt-
customize using the -a option to add the disk, installing a package, setting the root password,
and resetting SELinux contexts.

[user@demo ~]$ virt-customize -a ~/demo-rhel-base.qcow2 \


--install aide \
--root-password password:letmein \
--selinux-relabel
...output omitted...

The following table compares the two tools.

Comparison of guestfish and virt-customize Commands


Feature: Complexity
guestfish: A low-level tool that exposes the guestfs API directly.
virt-customize: A high-level tool that is easier to use and simplifies common tasks.

Feature: SELinux support
guestfish: Use the selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts /
command to restore SELinux file contexts.
virt-customize: Use the --selinux-relabel option to restore file contexts. This option will
use the touch /.autorelabel command if relabeling is unsuccessful.

Feature: Options
guestfish: For low-level tasks such as manipulating partitions, scripting, and remote access.
virt-customize: For common tasks such as installing packages, changing passwords, setting the
host name and time zone, and registering with Subscription Manager.

The --selinux-relabel customization option relabels files in the guest so that they
have the correct SELinux label. This option tries to relabel files immediately. If unsuccessful,
/.autorelabel is created on the image. This schedules the relabel operation for the next time
the image boots.

Use Cases
For most common image customization tasks, virt-customize is the best choice. However,
as listed in the table above, the less frequent low-level tasks should be performed with the
guestfish command.

Important
When working with images that have SELinux enabled, ensure that the correct SELinux
relabeling syntax is used to reset proper labels on modified files. Files with an incorrect
SELinux context cause access denials. If the mislabeled files are critical
system files, the image may not boot until the labeling is fixed.

Customizing an Image with guestfish


The following steps outline the process for customizing an image with guestfish; a condensed
command sketch follows the list.

1. Download a base image.

2. Execute the guestfish command. Use -i to automatically mount the partitions and use -a
to add the image.

3. Perform the changes you require, using commands such as add, rm, and command.

Important
If your image will have SELinux enabled, ensure you relabel any affected files
using the selinux-relabel /etc/selinux/targeted/contexts/files/
file_contexts / command.

4. Exit the guestfish shell.

5. Upload the image to the OpenStack Image service.

6. Launch an instance using the custom image.

7. Attach a floating IP to the instance.

8. Connect to the instance using SSH and verify the customization was executed.
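
The following sketch condenses these steps into commands. The image, package, network, and
instance names are placeholders, and the flavor, network, floating IP, and key pair must already
exist in your environment.

[user@demo ~]$ guestfish -i --network -a ~/demo-rhel-base.qcow2
><fs> command "yum -y install aide"
><fs> selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts /
><fs> exit
[user@demo ~]$ openstack image create --disk-format qcow2 \
--file ~/demo-rhel-base.qcow2 demo-image
[user@demo ~]$ openstack server create --flavor m1.small --image demo-image \
--nic net-id=demo-network1 --wait demo-instance1
[user@demo ~]$ openstack server add floating ip demo-instance1 172.25.250.N
[user@demo ~]$ ssh cloud-user@172.25.250.N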

Customizing an Image with virt-customize


The following steps outline the process for customizing an image with virt-customize.

1. Download a base image.

2. Execute the virt-customize command. Use -a to add the image, and then use other
options such as --run-command, --install, --write and --root-password.

Important
If your image will have SELinux enabled, ensure you use the --selinux-
relabel option last. Running the restorecon command inside the image will
not work through virt-customize.

3. Upload the image to the OpenStack Image service.

4. Launch an instance using the custom image.

5. Attach a floating IP to the instance.

6. Connect to the instance using SSH and verify the customization was executed.


References
guestfish - the guest file system shell
http://libguestfs.org/guestfish.1.html

virt-customize - Customize a virtual machine


http://libguestfs.org/virt-customize.1.html

Guided Exercise: Customizing an Image

In this exercise you will customize disk images using guestfish and virt-customize.

Resources
Base Image http://materials.example.com/osp-small.qcow2

Outcomes
You should be able to:

• Customize an image using guestfish.

• Customize an image using virt-customize.

• Upload an image into Glance.

• Spawn an instance using a customized image.

Before you begin


Log in to workstation as student using student as the password.

On workstation, run the lab customization-img-customizing setup command. This


ensures that the required packages are installed on workstation, and provisions the environment
with a public network, a private network, a private key, and security rules to access the instance.

[student@workstation ~]$ lab customization-img-customizing setup

Steps
1. From workstation, retrieve the osp-small.qcow2 image from http://
materials.example.com/osp-small.qcow2 and save it as /home/student/
finance-rhel-db.qcow2.

[student@workstation ~]$ wget http://materials.example.com/osp-small.qcow2 \


-O ~/finance-rhel-db.qcow2

2. Using the guestfish command, open the image for editing and include network access.

[student@workstation ~]$ guestfish -i --network -a ~/finance-rhel-db.qcow2

Welcome to guestfish, the guest filesystem shell for


editing virtual machine filesystems and disk images.

Type: 'help' for help on commands


'man' to read the manual
'quit' to quit the shell

Operating system: Red Hat Enterprise Linux Server 7.3 (Maipo)


/dev/sda1 mounted on /

><fs>

3. Install the mariadb and mariadb-server packages.


><fs> command "yum -y install mariadb mariadb-server"


...output omitted...
Installed:
mariadb.x86_64 1:5.5.52-1.el7 mariadb-server.x86_64 1:5.5.52-1.el7

Dependency Installed:
libaio.x86_64 0:0.3.109-13.el7
perl-Compress-Raw-Bzip2.x86_64 0:2.061-3.el7
perl-Compress-Raw-Zlib.x86_64 1:2.061-4.el7
perl-DBD-MySQL.x86_64 0:4.023-5.el7
perl-DBI.x86_64 0:1.627-4.el7
perl-Data-Dumper.x86_64 0:2.145-3.el7
perl-IO-Compress.noarch 0:2.061-2.el7
perl-Net-Daemon.noarch 0:0.48-5.el7
perl-PlRPC.noarch 0:0.2020-14.el7

Complete!

4. Enable the mariadb service.

><fs> command "systemctl enable mariadb"

5. Because the previous command produced no output, verify that the mariadb service is enabled.

><fs> command "systemctl is-enabled mariadb"


enabled

6. Ensure the SELinux contexts for all affected files are correct.

Important
Files modified from inside the guestfish tool are written without a valid SELinux
context. Failure to relabel critical modified files, such as /etc/passwd, results
in an unusable image, because SELinux correctly denies access to improperly labeled
files during the boot process.

Although a relabel can be scheduled using touch /.autorelabel from within
guestfish, that flag file would persist in the image, resulting in a relabel being
performed during the first boot of every instance deployed using this image. Instead,
the following step performs the relabel just once, right now.

><fs> selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts /

7. Exit from the guestfish shell.

><fs> exit

[student@workstation ~]$


8. As the developer1 OpenStack user, upload the finance-rhel-db.qcow2 image to the
image service as finance-rhel-db, with a minimum disk requirement of 10 GiB, and a
minimum RAM requirement of 2 GiB.

8.1. Source the developer1-finance-rc credential file.

[student@workstation ~]$ source developer1-finance-rc


[student@workstation ~(developer1-finance)]$

8.2. As the developer1 OpenStack user, upload the finance-rhel-db.qcow2 image to


the image service as finance-rhel-db.

[student@workstation ~(developer1-finance)]$ openstack image create \


--disk-format qcow2 \
--min-disk 10 \
--min-ram 2048 \
--file finance-rhel-db.qcow2 \
finance-rhel-db
...output omitted...

9. Launch an instance in the environment using the following attributes:

Instance Attributes
Attribute Value
flavor m1.database
key pair developer1-keypair1
network finance-network1
image finance-rhel-db
security group finance-db
name finance-db1

[student@workstation ~(developer1-finance)]$ openstack server create \


--flavor m1.database \
--key-name developer1-keypair1 \
--nic net-id=finance-network1 \
--security-group finance-db \
--image finance-rhel-db \
--wait finance-db1
...output omitted...

10. List the available floating IP addresses, and then allocate one to finance-db1.

10.1. List the floating IPs; unallocated IPs have None listed as their Port value.

[student@workstation ~(developer1-finance)]$ openstack floating ip list \


-c "Floating IP Address" -c Port
+---------------------+------+
| Floating IP Address | Port |
+---------------------+------+
| 172.25.250.P | None |
| 172.25.250.R | None |
+---------------------+------+

10.2.Attach an unallocated floating IP to the finance-db1 instance.

[student@workstation ~(developer1-finance)]$ openstack server add floating \


ip finance-db1 172.25.250.P

11. Use ssh to connect to the finance-db1 instance. Ensure the mariadb-server package is
installed, and that the mariadb service is enabled and running.

11.1. Log in to the finance-db1 instance using ~/developer1-keypair1.pem with ssh.

[student@workstation ~(developer1-finance)]$ ssh -i ~/developer1-keypair1.pem \


cloud-user@172.25.250.P
Warning: Permanently added '172.25.250.P' (ECDSA) to the list of known hosts.
[cloud-user@finance-db1 ~]$

11.2. Confirm that the mariadb-server package is installed.

[cloud-user@finance-db1 ~]$ rpm -q mariadb-server


mariadb-server-5.5.52-1.el7.x86_64

11.3. Confirm that the mariadb service is enabled and running, and then log out.

[cloud-user@finance-db1 ~]$ systemctl status mariadb


...output omitted...
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor
preset: disabled)
Active: active (running) since Mon 2017-05-29 20:49:37 EDT; 9min ago
Process: 1033 ExecStartPost=/usr/libexec/mariadb-wait-ready $MAINPID
(code=exited, status=0/SUCCESS)
Process: 815 ExecStartPre=/usr/libexec/mariadb-prepare-db-dir %n (code=exited,
status=0/SUCCESS)
Main PID: 1031 (mysqld_safe)
...output omitted...
[cloud-user@finance-db1 ~]$ exit
logout
Connection to 172.25.250.P closed.
[student@workstation ~(developer1-finance)]$

12. From workstation, retrieve the osp-small.qcow2 image from http://


materials.example.com/osp-small.qcow2 and save it as /home/student/
finance-rhel-mail.qcow2.

[student@workstation ~(developer1-finance)]$ wget \


http://materials.example.com/osp-small.qcow2 -O ~/finance-rhel-mail.qcow2

13. Use the virt-customize command to customize the ~/finance-rhel-mail.qcow2


image. Enable the postfix service, configure postfix to listen on all interfaces, and relay
all mail to workstation.lab.example.com. Install the mailx package to enable sending a test
email. Ensure the SELinux contexts are restored.


[student@workstation ~(developer1-finance)]$ virt-customize \
-a ~/finance-rhel-mail.qcow2 \
--run-command 'systemctl enable postfix' \
--run-command 'postconf -e "relayhost = [workstation.lab.example.com]"' \
--run-command 'postconf -e "inet_interfaces = all"' \
--run-command 'yum -y install mailx' \
--selinux-relabel
[ 0.0] Examining the guest ...
[ 84.7] Setting a random seed
[ 84.7] Running: systemctl enable postfix
[ 86.5] Running: postconf -e "relayhost = [workstation.lab.example.com]"
[ 88.4] Running: postconf -e "inet_interfaces = all"
[ 89.8] Running: yum -y install mailx
[ 174.0] SELinux relabelling
[ 532.7] Finishing off

14. As the developer1 OpenStack user, upload the finance-rhel-mail.qcow2 image to


the image service as finance-rhel-mail, with a minimum disk requirement of 10 GiB, and
a minimum RAM requirement of 2 GiB.

14.1. Use the openstack command to upload the finance-rhel-mail.qcow2 image to


the image service.

[student@workstation ~(developer1-finance)]$ openstack image create \


--disk-format qcow2 \
--min-disk 10 \
--min-ram 2048 \
--file ~/finance-rhel-mail.qcow2 \
finance-rhel-mail
...output omitted...

15. Launch an instance in the environment using the following attributes:

Instance Attributes
Attribute Value
flavor m1.web
key pair developer1-keypair1
network finance-network1
image finance-rhel-mail
security group finance-mail
name finance-mail1

[student@workstation ~(developer1-finance)]$ openstack server create \


--flavor m1.web \
--key-name developer1-keypair1 \
--nic net-id=finance-network1 \
--security-group finance-mail \
--image finance-rhel-mail \
--wait finance-mail1
...output omitted...

16. List the available floating IP addresses, and allocate one to finance-mail1.


16.1. List the available floating IPs.

[student@workstation ~(developer1-finance)]$ openstack floating ip list \


-c "Floating IP Address" -c Port
+---------------------+--------------------------------------+
| Floating IP Address | Port |
+---------------------+--------------------------------------+
| 172.25.250.P | 1ce9ffa5-b52b-4581-a696-52f464912500 |
| 172.25.250.R | None |
+---------------------+--------------------------------------+

16.2.Attach an available floating IP to the finance-mail1 instance.

[student@workstation ~(developer1-finance)]$ openstack server add floating \


ip finance-mail1 172.25.250.R

17. Use ssh to connect to the finance-mail1 instance. Ensure the postfix service is
running, that postfix is listening on all interfaces, and that the relayhost directive is
correct.

17.1. Log in to the finance-mail1 instance using ~/developer1-keypair1.pem with


ssh.

[student@workstation ~(developer1-finance)]$ ssh -i ~/developer1-keypair1.pem \


cloud-user@172.25.250.R
Warning: Permanently added '172.25.250.R' (ECDSA) to the list of known hosts.
[cloud-user@finance-mail1 ~]$

17.2. Ensure the postfix service is running.

[cloud-user@finance-mail1 ~]$ systemctl status postfix


...output omitted...
Loaded: loaded (/usr/lib/systemd/system/postfix.service; enabled; vendor
preset: disabled)
Active: active (running) since Mon 2017-05-29 00:59:32 EDT; 4s ago
Process: 1064 ExecStart=/usr/sbin/postfix start (code=exited, status=0/
SUCCESS)
Process: 1061 ExecStartPre=/usr/libexec/postfix/chroot-update (code=exited,
status=0/SUCCESS)
Process: 1058 ExecStartPre=/usr/libexec/postfix/aliasesdb (code=exited,
status=0/SUCCESS)
Main PID: 1136 (master)
...output omitted...

17.3. Ensure postfix is listening on all interfaces.

[cloud-user@finance-mail1 ~]$ sudo ss -tnlp | grep master


LISTEN 0 100 *:25 *:* users:(("master",pid=1136,fd=13))
LISTEN 0 100 :::25 :::* users:(("master",pid=1136,fd=14))

17.4. Ensure the relayhost directive is configured correctly.

[cloud-user@finance-mail1 ~]$ postconf relayhost


relayhost = [workstation.lab.example.com]

17.5. Send a test email to student@workstation.lab.example.com.

[cloud-user@finance-mail1 ~]$ mail -s "Test" student@workstation.lab.example.com


Hello World!
.
EOT

17.6. Return to workstation. Use the mail command to confirm that the test email arrived.

[cloud-user@finance-mail1 ~]$ exit


[student@workstation ~]$ mail
Heirloom Mail version 12.5 7/5/10. Type ? for help.
"/var/spool/mail/student": 1 message 1 new
>N 1 Cloud User Mon May 29 01:18 22/979 "Test"
& q

Cleanup
From workstation, run the lab customization-img-customizing cleanup command
to clean up this exercise.

[student@workstation ~]$ lab customization-img-customizing cleanup


Lab: Building and Customizing Images

In this lab, you will build a disk image using diskimage-builder, and then modify it using
guestfish.

Resources
Base Image URL http://materials.example.com/osp-small.qcow2
Diskimage-builder elements directory /usr/share/diskimage-builder/elements

Outcomes
You will be able to:

• Build an image using diskimage-builder.

• Customize the image using the guestfish command.

• Upload the image to the OpenStack image service.

• Spawn an instance using the customized image.

Before you begin


Log in to workstation as student using student as the password.

On workstation, run the lab customization-review setup command. This ensures that
the required packages are installed on workstation, and provisions the environment with a public
network, a private network, a key pair, and security rules to access the instance.

[student@workstation ~]$ lab customization-review setup

Steps
1. From workstation, retrieve the osp-small.qcow2 image from http://
materials.example.com/osp-small.qcow2 and save it in the /home/student/
directory.

2. Create a copy of the diskimage-builder elements directory to work with in the /home/
student/ directory.

3. Create a post-install.d directory under the working copy of the rhel7 element.

4. Add a script under the rhel7/post-install.d directory to enable the httpd service.

5. Export the following environment variables, which diskimage-builder requires.

Environment Variables
Variable Content
NODE_DIST rhel7
DIB_LOCAL_IMAGE /home/student/osp-small.qcow2
DIB_YUM_REPO_CONF "/etc/yum.repos.d/openstack.repo"
ELEMENTS_PATH /home/student/elements

6. Build a RHEL 7 image named production-rhel-web.qcow2 using the diskimage-


builder elements configured previously. Include the httpd package in the image.

7. Add a custom web index page to the production-rhel-web.qcow2 image using


guestfish. Include the text production-rhel-web in the index.html file. Ensure the
SELinux context of /var/www/html/index.html is correct.

8. As the operator1 user, create a new OpenStack image named production-rhel-web


using the production-rhel-web.qcow2 image, with a minimum disk requirement of
10 GiB, and a minimum RAM requirement of 2 GiB.

9. As the operator1 user, launch an instance using the following attributes:

Instance Attributes
Attribute Value
flavor m1.web
key pair operator1-keypair1
network production-network1
image production-rhel-web
security group production-web
name production-web1

10. List the available floating IP addresses, and then allocate one to production-web1.

11. Log in to the production-web1 instance using operator1-keypair1.pem with ssh.


Ensure the httpd package is installed, and that the httpd service is enabled and running.

12. From workstation, confirm that the custom web page, displayed from production-
web1, contains the text production-rhel-web.

Evaluation
From workstation, run the lab customization-review grade command to confirm the
success of this exercise. Correct any reported failures and rerun the command until successful.

[student@workstation ~]$ lab customization-review grade

Cleanup
From workstation, run the lab customization-review cleanup command to clean up
this exercise.

[student@workstation ~]$ lab customization-review cleanup


Solution
In this lab, you will build a disk image using diskimage-builder, and then modify it using
guestfish.

Resources
Base Image URL http://materials.example.com/osp-small.qcow2
Diskimage-builder elements directory /usr/share/diskimage-builder/elements

Outcomes
You will be able to:

• Build an image using diskimage-builder.

• Customize the image using the guestfish command.

• Upload the image to the OpenStack image service.

• Spawn an instance using the customized image.

Before you begin


Log in to workstation as student using student as the password.

On workstation, run the lab customization-review setup command. This ensures that
the required packages are installed on workstation, and provisions the environment with a public
network, a private network, a key pair, and security rules to access the instance.

[student@workstation ~]$ lab customization-review setup

Steps
1. From workstation, retrieve the osp-small.qcow2 image from http://
materials.example.com/osp-small.qcow2 and save it in the /home/student/
directory.

[student@workstation ~]$ wget http://materials.example.com/osp-small.qcow2 \


-O /home/student/osp-small.qcow2

2. Create a copy of the diskimage-builder elements directory to work with in the /home/
student/ directory.

[student@workstation ~]$ cp -a /usr/share/diskimage-builder/elements /home/student/

3. Create a post-install.d directory under the working copy of the rhel7 element.

[student@workstation ~]$ mkdir -p /home/student/elements/rhel7/post-install.d

4. Add a script under the rhel7/post-install.d directory to enable the httpd service.

4.1. Add a script to enable the httpd service.


[student@workstation ~]$ cd /home/student/elements/rhel7/post-install.d/


[student@workstation post-install.d]$ cat <<EOF > 01-enable-services
#!/bin/bash
systemctl enable httpd
EOF

4.2. Set the executable permission on the script.

[student@workstation post-install.d]$ chmod +x 01-enable-services

4.3. Change back to your home directory.

[student@workstation post-install.d]$ cd
[student@workstation ~]$

5. Export the following environment variables, which diskimage-builder requires.

Environment Variables
Variable Content
NODE_DIST rhel7
DIB_LOCAL_IMAGE /home/student/osp-small.qcow2
DIB_YUM_REPO_CONF "/etc/yum.repos.d/openstack.repo"
ELEMENTS_PATH /home/student/elements

[student@workstation ~]$ export NODE_DIST=rhel7


[student@workstation ~]$ export DIB_LOCAL_IMAGE=/home/student/osp-small.qcow2
[student@workstation ~]$ export DIB_YUM_REPO_CONF=/etc/yum.repos.d/openstack.repo
[student@workstation ~]$ export ELEMENTS_PATH=/home/student/elements

6. Build a RHEL 7 image named production-rhel-web.qcow2 using the diskimage-


builder elements configured previously. Include the httpd package in the image.

[student@workstation ~]$ disk-image-create vm rhel7 \


-t qcow2 \
-p httpd \
-o production-rhel-web.qcow2

7. Add a custom web index page to the production-rhel-web.qcow2 image using


guestfish. Include the text production-rhel-web in the index.html file. Ensure the
SELinux context of /var/www/html/index.html is correct.

7.1. Open a guestfish shell for the production-rhel-web.qcow2 image.

[student@workstation ~]$ guestfish -i -a production-rhel-web.qcow2


...output omitted...
><fs>

7.2. Create a new /var/www/html/index.html file.


><fs> touch /var/www/html/index.html

7.3. Edit the /var/www/html/index.html file and include the required key words.

><fs> edit /var/www/html/index.html


This instance uses the production-rhel-web image.

7.4. To ensure the new index page works with SELinux in enforcing mode, restore the /var/
www/ directory context (including the index.html file).

><fs> selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts /var/www/

7.5. Exit the guestfish shell.

><fs> exit

[student@workstation ~]$

8. As the operator1 user, create a new OpenStack image named production-rhel-web


using the production-rhel-web.qcow2 image, with a minimum disk requirement of
10 GiB, and a minimum RAM requirement of 2 GiB.

8.1. Source the operator1-production-rc credentials file.

[student@workstation ~]$ source operator1-production-rc


[student@workstation ~(operator1-production)]$

8.2. Upload the production-rhel-web.qcow2 image to the OpenStack Image service.

[student@workstation ~(operator1-production)]$ openstack image create \


--disk-format qcow2 \
--min-disk 10 \
--min-ram 2048 \
--file production-rhel-web.qcow2 \
production-rhel-web
...output omitted...

9. As the operator1 user, launch an instance using the following attributes:

Instance Attributes
Attribute Value
flavor m1.web
key pair operator1-keypair1
network production-network1
image production-rhel-web
security group production-web
name production-web1

[student@workstation ~(operator1-production)]$ openstack server create \


--flavor m1.web \
--key-name operator1-keypair1 \
--nic net-id=production-network1 \
--image production-rhel-web \
--security-group production-web \
--wait production-web1
...output omitted...

10. List the available floating IP addresses, and then allocate one to production-web1.

10.1. List the floating IPs. Available IP addresses have the Port attribute set to None.

[student@workstation ~(operator1-production)]$ openstack floating ip list \


-c "Floating IP Address" -c Port
+---------------------+------+
| Floating IP Address | Port |
+---------------------+------+
| 172.25.250.P | None |
+---------------------+------+

10.2.Attach an available floating IP to the production-web1 instance.

[student@workstation ~(operator1-production)]$ openstack server add \


floating ip production-web1 172.25.250.P

11. Log in to the production-web1 instance using operator1-keypair1.pem with ssh.


Ensure the httpd package is installed, and that the httpd service is enabled and running.

11.1. Use SSH to log in to the production-web1 instance using operator1-


keypair1.pem.

[student@workstation ~(operator1-production)]$ ssh -i operator1-keypair1.pem \


cloud-user@172.25.250.P
Warning: Permanently added '172.25.250.P' (ECDSA) to the list of known hosts.
[cloud-user@production-web1 ~]$

11.2. Confirm that the httpd package is installed.

[cloud-user@production-web1 ~]$ rpm -q httpd


httpd-2.4.6-45.el7.x86_64

11.3. Confirm that the httpd service is running.

[cloud-user@production-web1 ~]$ systemctl status httpd


...output omitted...
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor
preset: disabled)
Active: active (running) since Wed 2017-05-24 23:55:42 EDT; 8min ago
Docs: man:httpd(8)

CL210-RHOSP10.1-en-2-20171006 107

Rendered for Nokia. Please do not distribute.


Chapter 3. Building and Customizing Images

man:apachectl(8)
Main PID: 833 (httpd)
...output omitted...

11.4. Exit the instance to return to workstation.

[cloud-user@production-web1 ~]$ exit

12. From workstation, confirm that the custom web page, displayed from production-
web1, contains the text production-rhel-web.

[student@workstation ~(operator1-production)]$ curl http://172.25.250.P/index.html


This instance uses the production-rhel-web image.

Evaluation
From workstation, run the lab customization-review grade command to confirm the
success of this exercise. Correct any reported failures and rerun the command until successful.

[student@workstation ~]$ lab customization-review grade

Cleanup
From workstation, run the lab customization-review cleanup command to clean up
this exercise.

[student@workstation ~]$ lab customization-review cleanup

Summary
In this chapter, you learned:

• The pros and cons of building an image versus customizing an existing one, such as meeting
organization security standards, including third-party agents, and adding operator accounts.

• When to use the guestfish or virt-customize tools. Use guestfish when you need
to perform low-level tasks such as partitioning disks. Use virt-customize for all common
customization tasks such as setting passwords and installing packages.

• Making changes to an image using these tools affects SELinux file contexts, because SELinux
is not supported directly in the chroot environment.


TRAINING
CHAPTER 4

MANAGING STORAGE

Overview
Goal Manage Ceph and Swift storage for OpenStack.
Objectives • Describe back-end storage options for OpenStack services.
• Configure Ceph as the back-end storage for OpenStack services.
• Manage Swift as object storage.
Sections • Describing Storage Options (and Quiz)
• Configuring Ceph Storage (and Guided Exercise)
• Managing Object Storage (and Guided Exercise)
Lab • Managing Storage


Describing Storage Options

Objectives
After completing this section, students should be able to describe back-end storage options for
OpenStack services.

Storage in Red Hat OpenStack Platform


A cloud environment such as Red Hat OpenStack Platform requires applications that take
advantage of the features provided by this environment. They should be designed to leverage
the scalability of the compute and storage resources that Red Hat OpenStack Platform offers
to users. Although some storage configurations can use simple back ends, such as a volume
group for the OpenStack block storage service, Red Hat OpenStack Platform also supports
enterprise-level back ends. This support includes the most common SAN infrastructures, as well
as support for DAS and NAS devices. This allows reuse of existing storage infrastructure as a
back end for OpenStack.

In a physical enterprise environment, servers are often installed with local storage drives
attached to them, and use external storage to scale that local storage. This is also true of a
cloud-based instance, where the instance has some associated local storage, and also some
external storage as a way to scale the local storage. In cloud environments, storage is a key
resource that needs to be managed appropriately so that the maximum number of users can
take advantage of those resources. Local storage for instances is based in the compute nodes
where those instances run, and Red Hat OpenStack Platform recycles this local storage when
an instance terminates. This type of storage is known as ephemeral storage, and it includes both
the effective storage space a user can use inside of an instance and the storage used for swap
memory by the instance. All the ephemeral storage resources are removed when the instance
terminates.

The disk drive space of the physical servers on which instances run limits the available local
storage. To scale the storage of an instance, Red Hat OpenStack Platform provisions additional
space with the OpenStack block storage service, object storage service, or file share service. The
storage resources provided by those services are persistent, so they remain after the instance
terminates.

Storage Options for OpenStack Services


OpenStack services require two types of storage: ephemeral storage and persistent storage.
Ephemeral storage uses the local storage available in the compute nodes on which instances run.
This storage usually provides better performance because it uses the same back end as the
instance's virtual disk. Because of this, ephemeral storage is usually the best option for storing
elements that require the best performance, such as the operating system or swap disks.

Although ephemeral storage usually provides better performance, sometimes users need to store
data persistently. Red Hat OpenStack Platform services provide persistent storage in the form
of block storage and object storage. The block storage service allows storing data on a device
available in the instance file system. The object storage service provides an external storage
infrastructure available to instances.
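
For example, attaching a block storage volume to an instance provides persistent storage that
survives instance termination; the volume and instance names used here are placeholders.

[user@demo ~]$ openstack volume create --size 1 demo-volume1
...output omitted...
[user@demo ~]$ openstack server add volume demo-instance1 demo-volume1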

Red Hat OpenStack Platform supports several storage systems as the back end for its services.
Those storage systems include:


LVM
The block storage service supports LVM as a storage back end. LVM is available but not officially
supported by Red Hat. An LVM-based back end requires a volume group. Each block storage
volume uses a logical volume as its back end.

Red Hat Ceph Storage


The block storage and image services both support Red Hat Ceph Storage as a storage back
end. Red Hat Ceph Storage provides petabyte-scale storage and has no single point of failure.
Red Hat OpenStack Platform uses RBD to access Red Hat Ceph Storage. Each new volume or
image created in Red Hat OpenStack Platform uses an RBD image on Red Hat Ceph Storage.

NFS
Red Hat OpenStack Platform services such as the block storage service support NFS as a storage
back end. Each volume back end resides in the NFS shares specified in the driver options in the
block storage service configuration file.
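
The exact settings depend on your deployment, but a minimal NFS back-end definition in the
block storage service configuration file, /etc/cinder/cinder.conf, resembles the following
sketch; the back-end name and the share list file are examples only.

[DEFAULT]
enabled_backends = nfs

[nfs]
volume_backend_name = nfs
volume_driver = cinder.volume.drivers.nfs.NfsDriver
nfs_shares_config = /etc/cinder/nfs_shares
nfs_mount_point_base = /var/lib/cinder/mnt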

Vendor-specific Storage
Supported storage hardware vendors provide drivers that allow Red Hat OpenStack Platform
services to use their storage infrastructure as a back end.

Note
Red Hat provides support for Red Hat Ceph Storage and NFS.

Benefits, Recommended Practices, and Use Cases


The undercloud currently supports both Red Hat Ceph Storage and NFS as storage back ends for
Red Hat OpenStack Platform systems. Most of the existing NAS and SAN solutions can export
storage using NFS. In addition, some storage hardware vendors provide drivers for the different
Red Hat OpenStack Platform services. These drivers can interact natively with the storage
systems provided by those vendors.

LVM is suitable for use in test environments. The storage volumes are created on the local
storage of the machine where the block storage service is running. This back end uses that
machine as an iSCSI target to export those storage volumes. This configuration is a bottleneck
when scaling up the environment.

Red Hat Ceph Storage is a separate infrastructure from Red Hat OpenStack Platform. This
storage system provides fault tolerance and scalability. Red Hat Ceph Storage is not the best
choice for some proof-of-concept environments, because of its hardware requirements. The
undercloud can collocate some Red Hat Ceph Storage services in the controller node. This
configuration reduces the number of resources needed.

Because of the growing demand for computing and storage resources, the undercloud now
supports hyper-converged infrastructures (HCI). These infrastructures use compute nodes where
both Red Hat OpenStack Platform and Red Hat Ceph Storage services run. The use of hyper-
converged nodes is pushing the need for better utilization of the underlying hardware resources.

Storage Architecture for OpenStack Services


The architectures of Red Hat Ceph Storage and the object storage service (Swift) are
the following:


Red Hat Ceph Storage Architecture


The Red Hat Ceph Storage architecture has two main elements: monitors (MONs) and object
storage devices (OSDs). The monitors manage the cluster metadata, and they are the front end
for the Ceph cluster. A client that wants to access a Ceph cluster needs at least one monitor IP
address or host name. Each object storage device has a disk device associated with it. A node
can have several object storage devices. The undercloud deploys Ceph with one monitor running
on the controller node. The Red Hat Ceph Storage architecture is discussed further in a later
section.

Note
The Red Hat OpenStack Platform block storage and image services support Red Hat
Ceph Storage as their storage back end.

Swift Architecture
The Red Hat OpenStack Platform Swift service architecture has a front-end service, the proxy
server (swift-proxy), and three back-end services: account server (swift-account); object
server (swift-object); and container server (swift-container). The proxy server maintains
the Swift API. Red Hat OpenStack Platform configures the Keystone endpoint for Swift with the
URI for this API.
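
For example, you can confirm the Keystone endpoint that points to the proxy server, and then
exercise the API through it; the container name used here is only an example.

[user@demo ~]$ openstack catalog show swift
...output omitted...
[user@demo ~]$ openstack container create demo-container
...output omitted...
[user@demo ~]$ openstack container list
...output omitted...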

Figure 4.1: Swift architecture


References
Further information is available in the Storage Guide for Red Hat OpenStack Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform


Quiz: Describing Storage Options

Choose the correct answers to the following questions:

1. Red Hat provides support for which two storage back ends? (Choose two.)

a. In-memory
b. NFS
c. Red Hat Ceph Storage
d. Raw devices
e. LVM

2. Which two benefits are provided by a Red Hat Ceph Storage-based back end over NFS?
(Choose two.)

a. Snapshots
b. No single point of failure
c. Petabyte-scale storage
d. Thin provisioning
e. Integration with Red Hat OpenStack Platform

3. What is an LVM-based back end suitable for in Red Hat OpenStack Platform?

a. Production-ready environments
b. Cluster environments
c. Proof of concept environments
d. High performance environments (local storage based)

4. Which method does the Red Hat OpenStack Platform block storage service use to access Ceph?

a. CephFS
b. Ceph Gateway (RADOSGW)
c. RBD
d. Ceph native API (librados)

5. Which two Red Hat OpenStack Platform services are supported to use Red Hat Ceph Storage
as their back end? (Choose two.)

a. Share file system service


b. Block storage service
c. Image service
d. Compute service
e. Object storage service

Solution
Choose the correct answers to the following questions:

1. Red Hat provides support for which two storage back ends? (Choose two.)

a. In-memory
b. NFS (correct)
c. Red Hat Ceph Storage (correct)
d. Raw devices
e. LVM

2. Which two benefits are provided by a Red Hat Ceph Storage-based back end over NFS?
(Choose two.)

a. Snapshots
b. No single point of failure (correct)
c. Petabyte-scale storage (correct)
d. Thin provisioning
e. Integration with Red Hat OpenStack Platform

3. What is an LVM-based back end suitable for in Red Hat OpenStack Platform?

a. Production-ready environments
b. Cluster environments
c. Proof of concept environments (correct)
d. High performance environments (local storage based)

4. Which method does the Red Hat OpenStack Platform block storage service use to access Ceph?

a. CephFS
b. Ceph Gateway (RADOSGW)
c. RBD (correct)
d. Ceph native API (librados)

5. Which two Red Hat OpenStack Platform services are supported to use Red Hat Ceph Storage
as their back end? (Choose two.)

a. Share file system service
b. Block storage service (correct)
c. Image service (correct)
d. Compute service
e. Object storage service


Configuring Ceph Storage

Objectives
After completing this section, students should be able to configure Ceph as the back-end storage
for OpenStack services.

Red Hat Ceph Storage Architecture


Hardware-based storage infrastructures have inherently limited scalability. Cloud
computing infrastructures require a storage system that can scale in parallel with the computing
resources. Software-defined storage systems, such as Red Hat Ceph Storage, can scale at the
same pace as the computing resources. Red Hat Ceph Storage also supports features such as
snapshotting and thin provisioning.

The Ceph architecture is based on the daemons listed in Figure 4.2: Red Hat Ceph storage
architecture. Multiple OSDs can run on a single server, but can also run across servers. These
daemons can be scaled out to meet the requirements of the architecture being deployed.

Figure 4.2: Red Hat Ceph storage architecture

Ceph Monitors
Ceph monitors (MONs) are daemons that maintain a master copy of the cluster map. The cluster
map is a collection of five maps that contain information about the Ceph cluster state and
configuration. Ceph daemons and clients can check in periodically with the monitors to be sure
they have the most recent copy of the map. In this way they provide consensus for distributed
decision making. The monitors must establish a consensus regarding the state of the cluster.
This means that an odd number of monitors is required to avoid a stalled vote, and a minimum
of three monitors must be configured. For the Ceph Storage cluster to be operational and
accessible, more than 50% of monitors must be running and operational. If the number of active
monitors falls below this threshold, the complete Ceph Storage cluster will become inaccessible
to any client. This is done to protect the integrity of the data.

Ceph Object Storage Devices


Ceph Object Storage Devices (OSDs) are the building blocks of a Ceph Storage cluster. OSDs
connect a disk to the Ceph Storage cluster. Each hard disk to be used for the Ceph cluster has
a file system on it, and an OSD daemon associated with it. Red Hat Ceph Storage currently only
supports using the XFS file system. Extended Attributes (xattrs) are used to store information
about the internal object state, snapshot metadata, and Ceph Gateway Access Control Lists (ACLs).
Extended attributes are enabled by default on XFS file systems.

The goal for the OSD daemon is to bring the computing power as close as possible to the physical
data to improve performance.

Each OSD has its own journal, which is not related to the file-system journal. Journals use raw
volumes on the OSD nodes and should be configured on a separate device, preferably a fast
device such as an SSD, for performance-oriented or write-heavy environments. Depending on the
Ceph deployment tool used, the journal is configured such that if a Ceph OSD, or the node where
a Ceph OSD is located, fails, the journal is replayed when the OSD restarts. The replay sequence
starts after the last sync operation, because earlier journal records were trimmed out.

Metadata Server
The Ceph Metadata Server (MDS) is a service that provides POSIX-compliant, shared file-system
metadata management, which supports both the directory hierarchy and file metadata, including
ownership, time stamps, and mode. The MDS uses RADOS to store metadata instead of local
storage, and has no access to file content, because it is only involved in metadata operations;
clients retrieve file content directly from the OSDs. RADOS is an object storage service and is
part of Red Hat Ceph Storage.

MDS also enables CephFS to interact with the Ceph Object Store, mapping an inode to an object,
and recording where data is stored within a tree. Clients accessing a CephFS file system first
make a request to an MDS, which provides the information needed to get files from the correct
OSDs.

Note
The metadata server is not deployed by the undercloud in the default Ceph
configuration.

Ceph Access Methods


The following methods are available for accessing a Ceph cluster (a short RBD example follows this list):

• The Ceph native API (librados): native interface to the Ceph cluster. Service interfaces built
on this native interface include the Ceph Block Device, the Ceph Gateway, and the Ceph File
System.

• The Ceph Gateway (RADOSGW): RESTful APIs for Amazon S3 and Swift compatibility. The Ceph
Gateway is referred to as radosgw.


• The Ceph Block Device (RBD, librbd): provides block-level access to Ceph Block Device (RBD)
images, through the librbd library and the rbd kernel module.

• The Ceph File System (CephFS, libcephfs): provides access to a Ceph cluster via a POSIX-like
interface.
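
As a minimal RBD example, assuming you are on a node that has the Ceph admin key ring in
/etc/ceph, the rbd command lists the RBD images backing OpenStack images and volumes:

[root@demo]# rbd -p images ls
...output omitted...
[root@demo]# rbd -p volumes ls
...output omitted...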

Red Hat Ceph Storage Terminology


Pools
Pools are logical partitions for storing objects under the same name tag, which support multiple
name spaces. The Controlled Replication Under Scalable Hashing (CRUSH) algorithm is used to
select the OSDs hosting the data for a pool. Each pool is assigned a single CRUSH rule for its
placement strategy. The CRUSH rule is responsible for determining which OSDs receive the data
for all the pools using that particular CRUSH rule. A pool name must be specified for each I/O
request.

When a cluster is deployed without creating a pool, Ceph uses the default pools for storing data.
By default, only the rbd pool is created when Red Hat Ceph Storage is installed.

The ceph osd lspools command displays the current pools in the cluster. This includes the
pools created by the undercloud to integrate Red Hat Ceph Storage with Red Hat OpenStack
Platform services.

[root@demo]# ceph osd lspools


0 rbd,1 metrics,2 images,3 backups,4 volumes,5 vms,

Users
A Ceph client, which can be either a user or a service, requires a Ceph user to access the Ceph
cluster. By default, Red Hat Ceph Storage creates the admin user. The admin user can create
other users and their associated key-ring files. Each user has an associated key-ring file. The
usual location of this file is the /etc/ceph directory on the client machine.

Permissions are granted at the pool level for each Ceph user, either for all pools or to one or
more specific pools. These permissions can be read, write, or execute. The users available in a
Ceph cluster can be listed using the ceph auth list command. These users include the admin
user created by default, and the openstack user created by the undercloud for integration with
Red Hat OpenStack Platform services.

[root@demo]# ceph auth list


installed auth entries:
[... output omitted ...]
client.admin
key: AQBELB9ZAAAAABAAt7mbiBwBA8H60Z7p34D6hA==
caps: [mds] allow *
caps: [mon] allow *
caps: [osd] allow *
[... output omitted ...]
client.openstack
key: AQBELB9ZAAAAABAAmS+6yVgIuc7aZA/CL8rZoA==
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes,
allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx
pool=metrics

Integration with Red Hat OpenStack Platform


The undercloud supports the deployment of Red Hat Ceph Storage as the back end for Red Hat
OpenStack Platform services, such as the block storage and the image services. Both the
block storage and image services use RBD images as the back end for volumes and images
respectively. Each service requires a Ceph user and a Ceph pool. The undercloud creates a pool
named images for the image service, and a pool named volumes for the block storage service.
The undercloud also creates by default the openstack user, who has access to both the block
storage service pools and the image service pools.

Hyper-converged Infrastructures
The demand for computing and storage resources in cloud computing environments is growing.
This growing demand is pushing for better utilization of the underlying hardware resources. The
undercloud supports this initiative by supporting the deployment of hyper-converged nodes.
These hyper-converged nodes include both compute and Red Hat Ceph Storage services.

The undercloud supports the deployment and management of Red Hat OpenStack Platform
environments that only use hyper-converged nodes, as well as Red Hat OpenStack Platform
environments with a mix of hyper-converged and compute nodes without any Ceph service.
Hyper-converged node configuration needs to be adjusted manually after deployment to avoid
degradation of either computing or storage services, because of shared hardware resources.

Troubleshooting Ceph
Red Hat Ceph Storage uses a configuration file, ceph.conf, under the /etc/ceph directory. All
machines running Ceph daemons, as well as Ceph clients, use this configuration file. Each Ceph
daemon creates a log file on the machine where it is running. These log files are located in the
/var/log/ceph directory.

The Red Hat Ceph Storage CLI tools provide several commands that you can use to determine
the status of the Ceph cluster. For example, the ceph health command determines the
current health status of the cluster. This status can be HEALTH_OK when no errors are present,
HEALTH_WARN, or HEALTH_ERR when the cluster has some issues.

[root@demo]# ceph health


HEALTH_OK

The ceph -s command provides more details about the Ceph cluster's status, such as the
number of MONs and OSDs and the status of the current placement groups (PGs).

[root@demo]# ceph -s
cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8
health HEALTH_OK
monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0}
election epoch 4, quorum 0 overcloud-controller-0
osdmap e53: 3 osds: 3 up, 3 in
flags sortbitwise
pgmap v1108: 224 pgs, 6 pools, 595 MB data, 404 objects
1897 MB used, 56437 MB / 58334 MB avail
224 active+clean

The ceph -w command, in addition to the Ceph cluster's status, returns Ceph cluster events.
Enter Ctrl+C to exit this command.


[root@demo]# ceph -w
cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8
health HEALTH_OK
monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0}
election epoch 4, quorum 0 overcloud-controller-0
osdmap e53: 3 osds: 3 up, 3 in
flags sortbitwise
pgmap v1108: 224 pgs, 6 pools, 595 MB data, 404 objects
1897 MB used, 56437 MB / 58334 MB avail
224 active+clean

2017-05-30 15:43:35.402634 mon.0 [INF] from='client.? 172.24.3.3:0/2002402609'


entity='client.admin'
cmd=[{"prefix": "auth list"}]: dispatch
...output omitted...

There are other commands available, such as the ceph osd tree command, which shows the
status of the OSD daemons, either up or down. This command also displays the machine where
those OSD daemons are running.

[root@demo]# ceph osd tree


ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.05499 root default
-2 0.05499 host overcloud-cephstorage-0
0 0.01799 osd.0 up 1.00000 1.00000
1 0.01799 osd.1 up 1.00000 1.00000
2 0.01799 osd.2 up 1.00000 1.00000

OSD daemons can be managed using systemd unit files. The systemctl stop ceph-
osd@osdid command supports the management of a single OSD daemon with the ID osdid.
This command has to be executed in the Ceph node where the OSD with the corresponding ID is
located. If the OSD with an ID of 0 is located on the demo server, the following command would
be used to stop that OSD daemon:

[root@demo]# systemctl stop ceph-osd@0

Troubleshooting OSD Problems


If the cluster is not healthy, the ceph -s command displays a detailed status report. This status
report contains the following information:

• Current status of the OSDs (up, down, out, in). An OSD's status is up if the OSD is running,
and down if the OSD is not running. An OSD's status is in if the OSD allows data read and
write, or out if the OSD does not.

• OSD capacity limit information (nearfull or full).

• Current status of the placement groups (PGs).

Although Ceph is built for seamless scalability, this does not mean that the OSDs cannot run out
of space. Space-related warning or error conditions are reported both by the ceph -s and ceph
health commands, and OSD usage details are reported by the ceph osd df command. When
an OSD reaches the full threshold, it stops accepting write requests, although read requests
are still served.
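
For example, the following commands display per-OSD usage details and expand on any
space-related health warnings; output is omitted here because it varies by cluster.

[root@demo]# ceph osd df
...output omitted...
[root@demo]# ceph health detail
...output omitted...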


Troubleshooting MON Problems


Monitor servers (MONs) maintain the cluster map to ensure cluster quorum and to avoid
typical split-brain situations. For a Ceph cluster to be healthy, it must have quorum, which
means that more than half of the configured MON servers are operational, and that the
operational MON servers communicate with each other. If a MON only sees half of all other
MONs or fewer, it becomes non-operational. This behavior prohibits normal operation, and can
lead to cluster downtime, affecting users. Usually a MON failure is related to network problems,
but additional information about what caused the crash can be gained using the ceph daemon
mon.monid quorum_status command, and by investigating the /var/log/ceph/ceph.log and
/var/log/ceph/ceph-mon.hostname.log files. In the previous command, a MON ID uses the
following format: mon.monid, where monid is the ID of the MON (a number starting at 0).
Recovery typically involves restarting any failed MONs.

If the MON with an ID of 1 is located on the demo server, the following command would be used
to get additional information about the quorum status for the MON:

[root@demo]# ceph daemon mon.1 quorum_status

Configuring Ceph Storage


The following steps outline the process for managing Ceph MON and OSD daemons and verifying
their status; a condensed command sketch follows the list.

1. Log in to an OpenStack controller node.

2. Verify the availability of the ceph client key rings.

3. Verify the monitor daemon and authentication settings in the Ceph cluster's configuration
file.

4. Verify that the Ceph cluster health is HEALTH_OK.

5. Verify the number of MON and OSD daemons configured in the Ceph cluster.

6. Verify that the MON daemon's associated service, ceph-mon, is running.

7. Locate the log file for the MON daemon.

8. Log in to a Ceph node.

9. Verify which two OSDs are in the up state.

10. Locate the log files for the three OSD daemons.
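
The following sketch condenses these checks into commands run on the overcloud nodes. The
controller and Ceph node names are taken from this classroom environment, and the ceph-mon
unit instance is assumed to match the controller's short host name.

[heat-admin@overcloud-controller-0 ~]$ ls /etc/ceph/
[heat-admin@overcloud-controller-0 ~]$ cat /etc/ceph/ceph.conf
[heat-admin@overcloud-controller-0 ~]$ sudo ceph health
[heat-admin@overcloud-controller-0 ~]$ sudo ceph -s
[heat-admin@overcloud-controller-0 ~]$ sudo systemctl status ceph-mon@overcloud-controller-0
[heat-admin@overcloud-controller-0 ~]$ ls /var/log/ceph/
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd tree
[heat-admin@overcloud-cephstorage-0 ~]$ ls /var/log/ceph/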

References
Further information is available in the Red Hat Ceph Storage for the Overcloud Guide for
Red Hat OpenStack Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/


Guided Exercise: Configuring Ceph Storage

In this exercise, you will verify the status of a Ceph cluster. You will also verify the Ceph cluster
configuration as the back end for OpenStack services. Finally you will troubleshoot and fix an
issue with a Ceph OSD.

Outcomes
You should be able to:

• Verify the status of a Ceph cluster.

• Verify Ceph pools and user for Red Hat OpenStack Platform services.

• Troubleshoot and fix an issue with a Ceph OSD.

Before you begin


Log in to workstation as student using student as the password.

From workstation, run lab storage-config-ceph setup to verify that OpenStack


services are running and the resources created in previous sections are available.

[student@workstation ~]$ lab storage-config-ceph setup

Steps
1. Verify that the Ceph cluster status is HEALTH_OK.

1.1. Log in to controller0 using the heat-admin user.

[student@workstation ~]$ ssh heat-admin@controller0

1.2. Verify Ceph cluster status using the sudo ceph health command.

[heat-admin@overcloud-controller-0 ~]$ sudo ceph health


HEALTH_OK

2. Verify the status of the Ceph daemons and the cluster's latest events.

2.1. Use the sudo ceph -s command. You will see one MON daemon and three OSD
daemons. The three OSD daemons' states will be up and in.

[heat-admin@overcloud-controller-0 ~]$ sudo ceph -s


cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8
health HEALTH_OK
monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0}
election epoch 4, quorum 0 overcloud-controller-0
osdmap e50: 3 osds: 3 up, 3 in
flags sortbitwise
pgmap v556: 224 pgs, 6 pools, 1358 kB data, 76 objects
121 MB used, 58213 MB / 58334 MB avail
224 active+clean


2.2. Display the Ceph cluster's latest events using the sudo ceph -w command. Press
Ctrl+C to break the event listing.

[heat-admin@overcloud-controller-0 ~]$ sudo ceph -w


cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8
health HEALTH_OK
monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0}
election epoch 4, quorum 0 overcloud-controller-0
osdmap e50: 3 osds: 3 up, 3 in
flags sortbitwise
pgmap v556: 224 pgs, 6 pools, 1358 kB data, 76 objects
121 MB used, 58213 MB / 58334 MB avail
224 active+clean

2017-05-22 10:48:03.427574 mon.0 [INF] pgmap v574: 224 pgs: 224 active+clean;
1359 kB data, 122 MB used, 58212 MB / 58334 MB avail
...output omitted...
Ctrl+C

3. Verify that the pools and the openstack user, required for configuring Ceph as the back
end for Red Hat OpenStack Platform services, are available.

3.1. Verify that the images and volumes pools are available using the sudo ceph osd
lspools command.

[heat-admin@overcloud-controller-0 ~]$ sudo ceph osd lspools


0 rbd,1 metrics,2 images,3 backups,4 volumes,5 vms,

3.2. Verify that the openstack user is available using the sudo ceph auth list
command. This user will have rwx permissions for both the images and volumes
pools.

[heat-admin@overcloud-controller-0 ~]$ sudo ceph auth list


...output omitted...
client.openstack
key: AQBELB9ZAAAAABAAmS+6yVgIuc7aZA/CL8rZoA==
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children,
allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms,
allow rwx pool=images, allow rwx pool=metrics
...output omitted...

4. Stop the OSD daemon with ID 0. Verify the Ceph cluster's status.

4.1. Verify that the Ceph cluster's status is HEALTH_OK, and the three OSD daemons are up
and in.

[heat-admin@overcloud-controller-0 ~]$ sudo ceph -s


cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8
health HEALTH_OK
monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0}
election epoch 4, quorum 0 overcloud-controller-0
osdmap e50: 3 osds: 3 up, 3 in
flags sortbitwise
pgmap v556: 224 pgs, 6 pools, 1358 kB data, 76 objects

121 MB used, 58213 MB / 58334 MB avail


224 active+clean

4.2. Log out of controller0. Log in to ceph0 as heat-admin.

[heat-admin@overcloud-controller-0 ~]$ exit


[student@workstation ~]$ ssh heat-admin@ceph0

4.3. Use the systemd unit file for ceph-osd to stop the OSD daemon with ID 0.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl stop ceph-osd@0

4.4. Verify that the OSD daemon with ID 0 is down.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd tree


ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
...output omitted...
0 0.01799 osd.0 down 1.00000 1.00000
...output omitted...

4.5. Verify that the Ceph cluster's status is HEALTH_WARN, and that only two of the three OSD
daemons are up.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph -w


cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8
health HEALTH_WARN
224 pgs degraded
224 pgs undersized
recovery 72/216 objects degraded (33.333%)
1/3 in osds are down
monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0}
election epoch 4, quorum 0 overcloud-controller-0
osdmap e43: 3 osds: 2 up, 3 in; 224 remapped pgs
flags sortbitwise
pgmap v153: 224 pgs, 6 pools, 1720 kB data, 72 objects
114 MB used, 58220 MB / 58334 MB avail
72/216 objects degraded (33.333%)
224 active+undersized+degraded

mon.0 [INF] pgmap v153: 224 pgs: 224 active+undersized+degraded;


1720 kB data, 114 MB used, 58220 MB / 58334 MB avail;
72/216 objects degraded (33.333%)
mon.0 [INF] osd.0 out (down for 304.628763)
mon.0 [INF] osdmap e44: 3 osds: 2 up, 2 in
...output omitted...
Ctrl+C

5. Start the OSD daemon with ID 0 to fix the issue. Verify that the Ceph cluster's status is
HEALTH_OK.

5.1. Use the systemd unit file for ceph-osd to start the OSD daemon with ID 0.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl start ceph-osd@0

5.2. Verify the Ceph cluster's status is HEALTH_OK. The three OSD daemons are up and in.
It may take some time until the cluster status changes to HEALTH_OK.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph -s


cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8
health HEALTH_OK
monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0}
election epoch 4, quorum 0 overcloud-controller-0
osdmap e50: 3 osds: 3 up, 3 in
flags sortbitwise
pgmap v556: 224 pgs, 6 pools, 1358 kB data, 76 objects
121 MB used, 58213 MB / 58334 MB avail
224 active+clean

5.3. Exit the ceph0 node to return to workstation.

[heat-admin@overcloud-cephstorage-0 ~]$ exit


[student@workstation ~]$

Cleanup
From workstation, run the lab storage-config-ceph cleanup script to clean up this
exercise.

[student@workstation ~]$ lab storage-config-ceph cleanup


Managing Object Storage

Objectives
After completing this section, students should be able to manage Swift as object storage.

Swift Architecture
Swift is a fully distributed storage solution, where both static data and binary objects are stored.
It is neither a file system nor a real-time data storage system. It can easily scale to multiple
petabytes or billions of objects.

The Swift components listed in the following table are all required for the architecture to work
properly.

Component Description
Proxy Server Processes all API calls and locates the requested object.
Encodes and decodes data if Erasure Code is being used.
Ring Maps the names of entities to their stored location on disk.
Accounts, containers, and object servers each have their own
ring.
Account Server Holds a list of all containers.
Container Server Holds a list of all objects.
Object Server Stores, retrieves, and deletes objects.

The proxy server interacts with the appropriate ring to route requests and locate objects. The
ring stores a mapping between stored entities and their physical location.

By default, each partition of the ring is replicated three times to ensure a fully distributed
solution. Data is evenly distributed across the capacity of the cluster. Zones ensure that data is
isolated. Because data is replicated across zones, failure in one zone does not impact the rest of
the cluster.

Removing and Rebalancing Zones


It is important to understand the concepts behind a storage system, to comprehend the policies,
and to design and plan carefully before production.

Zones are created so that a failure does not result in data loss. Each data replica should reside
within a different zone. Zone configuration ensures that, should one zone fail, there are still two
zones up and running that can either accept new objects or serve stored objects.

The recommended number of zones is five, on five separate nodes. As mentioned previously,
Swift, by default, writes three replicas. If there are only three zones and one becomes
unavailable, Swift cannot hand off the replica to another node. With five nodes, Swift has options
and can automatically write the replica to another node ensuring that eventually there will be
three replicas.

After Swift is set up and configured, it is possible to rectify or alter the storage policy. Extra
devices can be added at any time.


Storage rings can be built on any hardware that has the appropriate version of Swift installed.
Upon building or rebalancing (changing) the ring structure, the rings must be redistributed to
include all of the servers in the cluster. The swift-ring-builder utility is used to build and
manage rings.

To build the three rings for account, object, and container, the following syntax is used to add a
new device to a ring:

[root@demo]# swift-ring-builder account.builder add z<zone>-<ipaddress>:6202/<device> <weight>
[root@demo]# swift-ring-builder container.builder add z<zone>-<ipaddress>:6201/<device> <weight>
[root@demo]# swift-ring-builder object.builder add z<zone>-<ipaddress>:6200/<device> <weight>

Here, zone is a number identifying the rack (zone) to which the server belongs, and is prefixed
with the literal letter z. The ipaddress is the IP address of the server. The device is the name of
the device partition to add. The weight is a number reflecting the relative capacity of the device.
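
For reference, the following is a minimal sketch of creating and populating a new account ring;
the part power (10), replica count (3), minimum hours between partition moves (1), server
address, device name, and weight are example values, not recommendations. The same sequence
applies to the container and object builders.

[root@demo]# swift-ring-builder account.builder create 10 3 1
[root@demo]# swift-ring-builder account.builder add z1-172.24.4.1:6202/vdb 100
[root@demo]# swift-ring-builder account.builder rebalance

After rebalancing, the generated account.ring.gz file must be copied to every server in the
cluster, as described above.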

Note
Prior to the Newton release of OpenStack, the Object service used ports 6002, 6001,
and 6000 for the account, container, and object services. These earlier default Swift
ports overlapped with ports already registered with IANA for X-Server, causing SELinux
policy conflicts and security risks. Red Hat OpenStack Platform switched to the new
ports in the Juno release, and the upstream Swift project completed the switch in
Newton.

Swift Commands
There are two sets of commands for Swift, an older version and a newer version. The older
commands, for example, swift post, swift list, and swift stat, are still supported.
However, OpenStack is moving to the OpenStack Unified CLI described below.

Note
By default, the following commands require the OpenStack user to have either the
admin or swiftoperator roles.

The openstack container command is used to manage containers in OpenStack, and the
openstack object command manages the objects stored in them. The openstack container
create command is used to create containers:

[user@demo ~]$ openstack container create cont1

The openstack container list command displays all containers available to the user:

[user@demo ~]$ openstack container list


+------------+
| Name |
+------------+
| cont1 |
+------------+


The openstack container delete command deletes the specified container:

[user@demo ~]$ openstack container delete cont1

The openstack object create command uploads an existing object to the specified
container:

[user@demo ~]$ openstack object create cont1 object.cont1


+--------------+-----------+----------------------------------+
| object | container | etag |
+--------------+-----------+----------------------------------+
| object.cont1 | cont1 | d41d8cd98f00b204e9800998ecf8427e |
+--------------+-----------+----------------------------------+

The openstack container save command saves the contents of an existing container
locally:

[user@demo ~]$ openstack container save cont1


[user@demo ~]$ ls -al
-rw-rw-r--. 1 user user 0 May 29 09:45 object.cont1

The openstack object list command lists all of the objects stored in the specified
container:

[user@demo ~]$ openstack object list cont1


+--------------+
| Name |
+--------------+
| object.cont1 |
+--------------+

The openstack object delete command deletes an object from the specified container:

[user@demo ~]$ openstack object delete cont1 object.cont1

Comparing Ceph with Swift for Object Storage


Both Swift and Ceph are open source Object Storage systems. They both use standard hardware,
allow scale-out storage, and are easy to deploy in enterprises of all sizes.

This is perhaps where the similarities end. Ceph lends itself to block access storage, transactional
storage, and is recommended for single sites. Swift uses Object API access to storage, and is
recommended for unstructured data and geographical distribution. Applications that mostly use
block access storage are built in a different way from those that use object access storage. The
decision might come down to which applications need object storage and how they access it.

Swift protects written data first and can therefore take additional time to update the entire
cluster. Ceph does not do this, which makes it a better candidate for databases and real-time
data. Swift would be a better choice for large-scale, geographically dispersed, unstructured data.
This means that you might need or want both Ceph and Swift. This decision will depend on the
types of applications, the geographical structure of your data centers, the type of objects that
need to be stored, consistency of the data replicated, transactional performance requirements,
and the number of objects to be stored.


Benefits, Use Cases, and Recommended Practices


Comparing object storage with block storage
One of the main differences between block storage and object storage is that a volume can only
be accessed via instances, and by one instance at a time, whereas any instance or service can
access the objects stored in containers, because every object stored in Swift has an accessible
URL. Swift also supports the Amazon Simple Storage Service (S3) API.
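
Because every object has a URL, it can also be fetched directly over HTTP once an
authentication token is known. The following is a minimal sketch; the proxy endpoint, project ID,
container name, and object name are placeholders for illustration only.

[user@demo ~]$ TOKEN=$(openstack token issue -f value -c id)
[user@demo ~]$ curl -H "X-Auth-Token: $TOKEN" \
      http://swift-proxy.example.com:8080/v1/AUTH_<project_id>/cont1/object.cont1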

The Benefits of Using Swift


Object storage has several distinct advantages over volume storage. As previously mentioned, it
is accessible from any OpenStack service, it supports the Amazon S3 API, and is fully distributed.

Reduced cost can also be an advantage. With object storage you pay only for the amount of
storage that you use: upload 5 GB, pay for 5 GB. With volume storage, you pay for the size of the
disk you create; if you create a 50 GB volume, you pay for all 50 GB whether or not it is all used.
Be aware, however, that using Swift across multiple data centers can become expensive, because
large amounts of data are moved over the internet.

Swift is best used for large pools of small objects. It is easily scalable, whereas volumes are not.

Use Cases
A major university uses Swift to store videos of every men's and women's sporting event. All
events for an entire year are stored in a highly available and easily accessible storage solution.
Students, alumni, and fans can use any internet-enabled web browser to access the university's
web site and click a link to view their desired sporting event in its entirety.

Recommended Practice: Disk Failure


It is Friday night and a drive has failed. You do not want to start working on it before the
weekend. Swift starts an automatic, self-healing workaround by writing replicas to a hand-off
node. On Monday you replace the failed drive, format it, and mount it. The drive is, of course,
empty. Swift, however, automatically starts replicating the data that is supposed to be in that
zone. In this case, you do not even have to do anything to the ring, because the physical drive was
simply replaced; zones do not change, so there is no need to rebalance the ring.

Note
If you were to change the size of the physical drive, then you would have to rebalance
the ring.

Configuration and Log Files


File name                        Description
/var/log/swift/swift.log         Default location of all log entries.
/var/log/messages                Location of all messages related to HAProxy and
                                 Swift configuration, and Swift CLI tool requests.
/etc/swift/object-server.conf    Holds the configuration for the different back-end
                                 Swift services supporting replication
                                 (object-replicator), object information management
                                 in containers (object-updater), and object integrity
                                 (object-auditor).

Troubleshooting Swift
Swift logs all troubleshooting events in /var/log/swift/swift.log. You should start your
troubleshooting process here. Swift logging is very verbose and the generated logs can be used
for monitoring, audit records, and performance. Logs are organized by log level and syslog
facility. Log lines for the same request have the same transaction ID.
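
For example, once a failing request's transaction ID is known, all of its log lines can be collected
in one pass; the transaction ID below is a placeholder.

[root@demo]# grep tx1234567890abcdef /var/log/swift/swift.log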

Make sure that all processes are running; the basic ones required are Proxy Server, Account
Server, Container Server, Object Server, and Auth Server.

[user@demo ~]$ ps -aux | grep swift


swift 2249....../usr/bin/python2 /usr/bin/swift-container-updater /etc/swift/
container-server.conf
swift 2267....../usr/bin/python2 /usr/bin/swift-account-replicator /etc/swift/
account-server.conf
swift 2275....../usr/bin/python2 /usr/bin/swift-container-auditor /etc/swift/
container-server.conf
swift 2276....../usr/bin/python2 /usr/bin/swift-account-reaper /etc/swift/account-
server.conf
swift 2281....../usr/bin/python2 /usr/bin/swift-container-replicator /etc/swift/
container-server.conf
swift 2294....../usr/bin/python2 /usr/bin/swift-object-updater /etc/swift/object-
server.conf
swift 2303....../usr/bin/python2 /usr/bin/swift-account-auditor /etc/swift/account-
server.conf
swift 2305....../usr/bin/python2 /usr/bin/swift-object-replicator /etc/swift/object-
server.conf
swift 2306....../usr/bin/python2 /usr/bin/swift-object-auditor /etc/swift/object-
server.conf
swift 2311....../usr/bin/python2 /usr/bin/swift-container-server /etc/swift/
container-server.conf
swift 2312....../usr/bin/python2 /usr/bin/swift-account-server /etc/swift/account-
server.conf
swift 2313....../usr/bin/python2 /usr/bin/swift-object-server /etc/swift/object-
server.conf
swift 2314....../usr/bin/python2 /usr/bin/swift-proxy-server /etc/swift/proxy-
server.conf
swift 2948....../usr/bin/python2 /usr/bin/swift-account-server /etc/swift/account-
server.conf
swift 2954....../usr/bin/python2 /usr/bin/swift-container-server /etc/swift/
container-server.conf
swift 2988....../usr/bin/python2 /usr/bin/swift-object-server /etc/swift/object-
server.conf

Detecting Failed Drives


Swift has a script called swift-drive-audit, which you can run either manually or via cron.
This script checks for bad drives and unmounts them if any errors are found. Swift then works
around the bad drives by replicating data to another drive. The output of the script is written to
/var/log/kern.log.
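
A minimal sketch of running the script manually and scheduling it with cron follows; the
configuration file path shown is the usual package default and the 15-minute interval is only an
example.

[root@demo]# swift-drive-audit /etc/swift/drive-audit.conf
[root@demo]# cat /etc/cron.d/swift-drive-audit
*/15 * * * *  root  /usr/bin/swift-drive-audit /etc/swift/drive-audit.conf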

Drive Failure
It is imperative to unmount the failed drive; this should be the first step taken. This action makes
object retrieval by Swift much easier. Replace the drive, format it and mount it, and let the
replication feature take over. The new drive will quickly populate with replicas. If a drive cannot
be replaced immediately, ensure that it is unmounted, that the mount point is owned by root, and
the device weight is set to 0. Setting the weight to 0 is preferable to removing it from the ring
because it gives Swift the chance to try and replicate from the failing disk (it could be that some
data is retrievable), and after the disk has been replaced you can increase the weight of the disk,
removing the need to rebuild the ring.

The following commands show how to change the weight of a device using the
swift-ring-builder command. In the following command, service is either account, object, or
container, device is the device's partition name, and weight is the new weight.

[root@demo]# swift-ring-builder service.builder set_weight device weight

For example, to set the weight of a device named vdd to 0, the previous command must be
executed using the three rings, as follows:

[root@demo]# swift-ring-builder account.builder set_weight z1-172.24.4.1:6002/vdd 0


d1r1z1-172.24.4.1:6002R172.24.4.1:6002/vdd_"" weight set to 0.0
[root@demo]# swift-ring-builder container.builder set_weight z1-172.24.4.1:6001/vdd 0
d1r1z1-172.24.4.1:6002R172.24.4.1:6001/vdd_"" weight set to 0.0
[root@demo]# swift-ring-builder object.builder set_weight z1-172.24.4.1:6000/vdd 0
d1r1z1-172.24.4.1:6002R172.24.4.1:6000/vdd_"" weight set to 0.0

The three rings must then be rebalanced:

[root@demo]# swift-ring-builder account.builder rebalance


[root@demo]# swift-ring-builder container.builder rebalance
[root@demo]# swift-ring-builder object.builder rebalance

The device can be added back to Swift using the swift-ring-builder set_weight
command, with the new weight for the device. The device's weight has to be updated in the three
rings. For example, if a device's weight has to be changed to 100, the following commands must
be executed using the three rings, as follows:

[root@demo]# swift-ring-builder account.builder set_weight z1-172.24.4.1:6002/vdd 100


d1r1z1-172.24.4.1:6002R172.24.4.1:6002/vdd_"" weight set to 100.0
[root@demo]# swift-ring-builder container.builder set_weight z1-172.24.4.1:6001/vdd 100
d1r1z1-172.24.4.1:6002R172.24.4.1:6001/vdd_"" weight set to 100.0
[root@demo]# swift-ring-builder object.builder set_weight z1-172.24.4.1:6000/vdd 100
d1r1z1-172.24.4.1:6002R172.24.4.1:6000/vdd_"" weight set to 100.0

The three rings must then be rebalanced. The weight associated with each device on each ring
can then be obtained using the swift-ring-builder command. The following command
returns information for each device, including the weight associated with the device in that ring:

[root@demo]# swift-ring-builder /etc/swift/account.builder


/etc/swift/account.builder, build version 6
... output omitted ...
Devices: id region zone   ip address:port  replication ip:port  name  weight  partitions  balance  flags meta
          0      1    1   172.24.4.1:6002      172.24.4.1:6002   vdd  100.00        1024   100.00


Server Failure
Should a server be experiencing hardware issues, ensure that the Swift services are not running.
This guarantees that Swift will work around the failure and start replicating to another server. If
the problem can be fixed within a relatively short time, for example, a couple of hours, then let
Swift work around the failure automatically and get the server back online. When online again,
Swift will ensure that anything missing during the downtime is updated.

If the problem is more severe, or no quick fix is possible, it is best to remove the devices from
the ring. After repairs have been carried out, add the devices to the ring again. Remember
to reformat the devices before adding them to the ring, because they will almost certainly be
responsible for a different set of partitions than before.

Managing Object Storage


The following steps outline the process for managing object storage using the OpenStack unified
CLI; a command sketch following the list shows one possible sequence.

1. Source the keystone credentials environment file.

2. Create a new container.

3. Verify that the container has been correctly created.

4. Create a file to upload to the container.

5. Upload the file as an object to the container.

6. Verify that the object has been correctly created.

7. Download the object.
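
The following sketch shows one possible run-through of these steps; the credentials file,
container name, and file name are examples only.

[user@demo ~]$ source keystonerc_demo
[user@demo ~]$ openstack container create demo-container
[user@demo ~]$ openstack container list
[user@demo ~]$ echo "sample data" > demo-object.txt
[user@demo ~]$ openstack object create demo-container demo-object.txt
[user@demo ~]$ openstack object list demo-container
[user@demo ~]$ openstack object save demo-container demo-object.txt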

References
Further information is available in the Object Storage section of the Storage Guide for
Red Hat OpenStack Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/


Guided Exercise: Managing Object Storage

In this exercise, you will upload an object to the OpenStack object storage service, retrieve that
object from an instance, and then verify that the object has been correctly downloaded to the
instance.

Resources
Files /home/student/developer1-finance-rc

Outcomes
You should be able to:

• Upload an object to the OpenStack object storage service.

• Download an object from the OpenStack object storage service to an instance.

Before you begin


Log in to workstation as student using student as the password.

From workstation, run lab storage-obj-storage setup to verify that OpenStack services are
running and the resources created in previous sections are available.

[student@workstation ~]$ lab storage-obj-storage setup

Steps
1. Create a 10MB file named dataset.dat. As the developer1 user, create a container called
container1 in the OpenStack object storage service. Upload the dataset.dat file to this
container.

1.1. Create a 10MB file named dataset.dat.

[student@workstation ~]$ dd if=/dev/zero of=~/dataset.dat bs=10M count=1

1.2. Load the credentials for the developer1 user. This user has been configured by the lab
script with the role swiftoperator.

[student@workstation ~]$ source developer1-finance-rc

1.3. Create a new container named container1.

[student@workstation ~(developer1-finance)]$ openstack container create \


container1
+--------------------+------------+---------------+
| account | container | x-trans-id |
+--------------------+------------+---------------+
| AUTH_c968(...)020a | container1 | tx3b(...)e8f3 |
+--------------------+------------+---------------+

1.4. Upload the dataset.dat file to the container1 container.


[student@workstation ~(developer1-finance)]$ openstack object create \


container1 dataset.dat
+-------------+------------+----------------------------------+
| object | container | etag |
+-------------+------------+----------------------------------+
| dataset.dat | container1 | f1c9645dbc14efddc7d8a322685f26eb |
+-------------+------------+----------------------------------+

2. Download the dataset.dat object to the finance-web1 instance created by the lab
script.

2.1. Verify that the finance-web1 instance's status is ACTIVE. Verify the floating IP
address associated with the instance.

[student@workstation ~(developer1-finance)]$ openstack server show \


finance-web1
+------------------------+---------------------------------------------+
| Field | Value |
+------------------------+---------------------------------------------+
...output omitted...
| addresses | finance-network1=192.168.0.N, 172.25.250.P |
...output omitted...
| key_name | developer1-keypair1 |
| name | finance-web1 |
...output omitted...
| status | ACTIVE |
...output omitted...
+------------------------+---------------------------------------------+

2.2. Copy the credentials file for the developer1 user to the finance-web1 instance. Use
the cloud-user user and the /home/student/developer1-keypair1.pem key
file.

[student@workstation ~(developer1-finance)]$ scp -i developer1-keypair1.pem \


developer1-finance-rc \
cloud-user@172.25.250.P:~

2.3. Log in to the finance-web1 instance using cloud-user as the user and the /home/
student/developer1-keypair1.pem key file.

[student@workstation ~(developer1-finance)]$ ssh -i ~/developer1-keypair1.pem \


cloud-user@172.25.250.P

2.4. Load the credentials for the developer1 user.

[cloud-user@finance-web1 ~]$ source developer1-finance-rc

2.5. Download the dataset.dat object from the object storage service.

[cloud-user@finance-web1 ~(developer1-finance)]$ openstack object save \


container1 dataset.dat

2.6. Verify that the dataset.dat object has been downloaded. When done, log out from
the instance.

[cloud-user@finance-web1 ~(developer1-finance)]$ ls -lh dataset.dat


-rw-rw-r--. 1 cloud-user cloud-user 10M May 26 06:58 dataset.dat
[cloud-user@finance-web1 ~(developer1-finance)]$ exit

Cleanup
From workstation, run the lab storage-obj-storage cleanup script to clean up this
exercise.

[student@workstation ~]$ lab storage-obj-storage cleanup


Lab: Managing Storage

In this lab, you will fix an issue in the Ceph environment. You will also upload a MOTD file to the
OpenStack object storage service. Finally, you will retrieve that MOTD file inside an instance.

Resources
Files: http://materials.example.com/motd.custom

Outcomes
You should be able to:

• Fix an issue in a Ceph environment.

• Upload a file to the Object storage service.

• Download and implement an object in the Object storage service inside an instance.

Before you begin


Log in to workstation as student using student as the password.

From workstation, run lab storage-review setup, which verifies OpenStack services and
previously created resources. This script also misconfigures Ceph and launches a
production-web1 instance with OpenStack CLI tools.

[student@workstation ~]$ lab storage-review setup

Steps
1. The Ceph cluster has a status issue. Fix the issue to return the status to HEALTH_OK.

2. As the operator1 user, create a new container called container4 in the Object storage
service. Upload the custom MOTD file available at http://materials.example.com/
motd.custom to this container.

3. Log in to the production-web1 instance, and download the motd.custom object from
Swift to /etc/motd. Use the operator1 user credentials.

4. Verify that the MOTD file includes the message Updated MOTD message.

Evaluation
On workstation, run the lab storage-review grade command to confirm success of this
exercise.

[student@workstation ~]$ lab storage-review grade

Cleanup
From workstation, run the lab storage-review cleanup script to clean up this exercise.

[student@workstation ~]$ lab storage-review cleanup


Solution
In this lab, you will fix an issue in the Ceph environment. You will also upload a MOTD file to the
OpenStack object storage service. Finally, you will retrieve that MOTD file inside an instance.

Resources
Files: http://materials.example.com/motd.custom

Outcomes
You should be able to:

• Fix an issue in a Ceph environment.

• Upload a file to the Object storage service.

• Download and implement an object in the Object storage service inside an instance.

Before you begin


Log in to workstation as student using student as the password.

From workstation, run lab storage-review setup, which verifies OpenStack services and
previously created resources. This script also misconfigures Ceph and launches a
production-web1 instance with OpenStack CLI tools.

[student@workstation ~]$ lab storage-review setup

Steps
1. The Ceph cluster has a status issue. Fix the issue to return the status to HEALTH_OK.

1.1. Log in to ceph0 as the heat-admin user.

[student@workstation ~]$ ssh heat-admin@ceph0

1.2. Determine the Ceph cluster status. This status will be HEALTH_WARN.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph health


HEALTH_WARN 224 pgs degraded; 224 pgs stuck unclean; 224 pgs undersized;
recovery 501/870 objects degraded (57.586%)

1.3. Determine what the issue is by verifying the status of the Ceph daemons. Only two OSD
daemons will be reported as up and in, instead of the expected three up and three in.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph -s


health HEALTH_WARN
...output omitted...
osdmap e50: 3 osds: 2 up, 2 in; 224 remapped pgs
flags sortbitwise
...output omitted...

1.4. Determine which OSD daemon is down. The status of the OSD daemon with ID 0 on
ceph0 is down.


[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd tree


ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.05499 root default
-2 0.05499 host overcloud-cephstorage-0
0 0.01799 osd.0 down 0 1.00000
1 0.01799 osd.1 up 1.00000 1.00000
2 0.01799 osd.2 up 1.00000 1.00000

1.5. Start the OSD daemon with ID 0 using the systemd unit file.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl start ceph-osd@0

1.6. Verify that the Ceph cluster status is HEALTH_OK. Initial displays may show the Ceph
cluster in recovery mode, with the percentage still degraded shown in parentheses.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph health


HEALTH_WARN 8 pgs degraded; recovery 26/27975 objects degraded (0.093%)
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph health
HEALTH_OK

1.7. Exit the ceph0 node to return to workstation.

[heat-admin@overcloud-cephstorage-0 ~]$ exit

2. As the operator1 user, create a new container called container4 in the Object storage
service. Upload the custom MOTD file available at http://materials.example.com/
motd.custom to this container.

2.1. Download the motd.custom file from http://materials.example.com/motd.custom.

[student@workstation ~]$ wget http://materials.example.com/motd.custom

2.2. View the contents of the motd.custom file. This file contains a new MOTD message.

[student@workstation ~]$ cat ~/motd.custom


Updated MOTD message

2.3. Load the credentials for the operator1 user.

[student@workstation ~]$ source operator1-production-rc

2.4. Create a new container named container4.

[student@workstation ~(operator1-production)]$ openstack container create \


container4
+--------------------+------------+---------------+
| account | container | x-trans-id |
+--------------------+------------+---------------+
| AUTH_fd0c(...)63da | container4 | txb9(...)8011 |
+--------------------+------------+---------------+

2.5. Create a new object in the container4 container using the motd.custom file.

[student@workstation ~(operator1-production)]$ openstack object create \


container4 motd.custom
+-------------+------------+----------------------------------+
| object | container | etag |
+-------------+------------+----------------------------------+
| motd.custom | container4 | 776c9b861983c6e95da77499046113bf |
+-------------+------------+----------------------------------+

3. Log in to the production-web1 instance, and download the motd.custom object from
Swift to /etc/motd. Use the operator1 user credentials.

3.1. Verify the floating IP for the production-web1 instance.

[student@workstation ~(operator1-production)]$ openstack server list \


-c Name -c Networks
+-----------------+-------------------------------------------------+
| Name | Networks |
+-----------------+-------------------------------------------------+
| production-web1 | production-network1=192.168.0.N, 172.25.250.P |
+-----------------+-------------------------------------------------+

3.2. Copy the operator1 user credentials to the production-web1 instance. Use cloud-
user as the user and the /home/student/operator1-keypair1.pem key file.

[student@workstation ~(operator1-production)]$ scp \


-i ~/operator1-keypair1.pem \
operator1-production-rc \
cloud-user@172.25.250.P:~

3.3. Log in to the production-web1 instance as the cloud-user user. Use the /home/
student/operator1-keypair1.pem key file.

[student@workstation ~(operator1-production)]$ ssh \


-i ~/operator1-keypair1.pem \
cloud-user@172.25.250.P

3.4. Load the operator1 user credentials.

[cloud-user@production-web1 ~]$ source operator1-production-rc

3.5. Download the motd.custom object from the Object service using the operator1-
production-rc user credentials. Use the --file option to save the object as /etc/
motd.

Because writing /etc files requires root privileges, use sudo. Use the -E option to
carry the operator1 shell environment credentials into the new sudo root child shell,
because this command requires operator1's access to the Object storage container
while also requiring root privilege to write the /etc/motd file.


[cloud-user@production-web1 ~(operator1-production)]$ sudo -E \


openstack object save \
--file /etc/motd \
container4 \
motd.custom

4. Verify that the MOTD file includes the message Updated MOTD message.

4.1. Verify that the MOTD file was updated.

[cloud-user@production-web1 ~(operator1-production)]$ cat /etc/motd


Updated MOTD message

Evaluation
On workstation, run the lab storage-review grade command to confirm success of this
exercise.

[student@workstation ~]$ lab storage-review grade

Cleanup
From workstation, run the lab storage-review cleanup script to clean up this exercise.

[student@workstation ~]$ lab storage-review cleanup


Summary
In this chapter, you learned:

• Red Hat OpenStack Platform supports both Red Hat Ceph Storage and NFS as storage back
ends.

• The Red Hat Ceph Storage architecture is based on monitor (MON) daemons and object
storage device (OSD) daemons.

• Red Hat Ceph Storage features include seamless scalability and no single point of failure.

• The Red Hat OpenStack Platform block storage and image services use RBDs to access Ceph,
and require both a user and pool to access the cluster.

• The Red Hat OpenStack Platform object storage service (Swift) provides object storage for
instances.

• The Swift architecture includes a front-end service, the proxy server, and three back-end
services: the account server, the object server, and the container server.

• Users can create containers in Swift, and upload objects to those containers.

CHAPTER 5

MANAGING AND
TROUBLESHOOTING VIRTUAL
NETWORK INFRASTRUCTURE

Overview
Goal Manage and troubleshoot virtual network infrastructure
Objectives • Manage software-defined networking (SDN) segments and
subnets.

• Follow multi-tenant network paths.

• Troubleshoot software-defined network issues.


Sections • Managing SDN Segments and Subnets (and Guided
Exercise)

• Tracing Multitenancy Network Flows (and Guided Exercise)

• Troubleshooting Network Issues (and Guided Exercise)


Lab • Managing and Troubleshooting Virtual Network
Infrastructure


Managing SDN Segments and Subnets

Objectives
After completing this section, students should be able to:

• Discuss Software-defined networking (SDN).

• Discuss SDN implementation and use cases.

Software-defined Networking
Software-defined networking (SDN) is a networking model that allows network administrators to
manage network services through the abstraction of several networking layers. SDN decouples
the software that handles the traffic, called the control plane, from the underlying mechanisms
that route the traffic, called the data plane, and enables communication between the two. For
example, the OpenFlow project, combined with the OpenDaylight project, provides such an
implementation.

SDN does not change the underlying protocols used in networking; rather, it enables the
utilization of application knowledge to provision networks. Networking protocols, such as TCP/IP
and the Ethernet standards, rely on manual configuration by administrators and have no
awareness of the applications they carry: their network usage, their endpoint requirements, or
how much and how fast data needs to be transferred. The goal of SDN is to extract knowledge of
how an application is being used, either from the application administrator or from the
application's own configuration data.

History
The origins of SDN development can be traced to around the mid 1990s. Research and
development continued through the early 2000s by several universities and organizations. In
2011, the Open Networking Foundation (ONF) was founded to promote SDN and other related
technologies such as OpenFlow.

Benefits of SDN
Consumers continue to demand fast, reliable, secure, and omnipresent network connections
for personal mobile devices such as smartphones and tablets. Service providers are utilizing
virtualization and SDN technologies to better meet those needs.

Benefits of SDN include:

• The decoupling of the control plane and data plane enables both planes to evolve
independently, which results in several advantages such as high flexibility, vendor
independence, open programmability, and a centralized network view.

• Security features that allow administrators to route traffic through a single, centrally located,
firewall. One advantage of this is the ability to utilize intrusion detection methods on real-time
captures of network traffic.

• Automated load balancing in SDN improves server load-balancing performance and reduces
the complexity of implementation.


• Network scalability allows data centers to use features of software-defined networking along
with virtualized servers and storage to implement dynamic environments where computing
resources are added and removed as needed.

• Reduced operational costs by minimizing the need to deploy, maintain, and replace expensive
hardware such as many of the servers and network switches within a data center.

Benefits of SDN over Hardware for Networking


Hardware-based networking solutions require extensive manual deployment, configuration,
maintenance, and a replacement plan. Traditional network infrastructures are mostly static
configurations that mix proprietary hardware and software from multiple vendors, making it
difficult to scale to business needs.

The SDN architecture delivers an open technology that eliminates costly vendor lock-in and
proprietary networking devices.

Arguments for using SDN over hardware for networking are growing as the technology continues
to develop as a smart and inexpensive approach to deploy network solutions. Many companies
and organizations currently use SDN technology within their data centers, taking advantage of
cost savings, performance factors, and scalability.

SDN Architecture and Services


SDN is based on the concept of separation between controlled services and controllers that
control those services. Controllers manipulate services by way of interfaces. Interfaces are
mainly API invocations through some library or system call. However, such interfaces may be
extended with protocol definitions that use local inter-process communication (IPC) or a protocol
that can also act remotely. A protocol may be defined as an open standard or in a proprietary
manner.

Architectural Components
The following list defines and explains the architectural components:

• Application Plane: The plane where applications and services that define network behavior
reside.

• Management Plane: Handles monitoring, configuration, and maintenance of network devices,
such as making decisions regarding the state of a network device.

• Control Plane: Responsible for making decisions on how packets should be forwarded by one
or more network devices, and for pushing such decisions down to the network devices for
execution.

• Operational Plane: Responsible for managing the operational state of the network device, such
as whether the device is active or inactive, the number of ports available, the status of each
port, and so on.

• Forwarding Plane: Responsible for handling packets in the data path based on the instructions
received from the control plane. Actions of the forwarding plane include forwarding, dropping,
and modifying packets.


Figure 5.1: SDN Architecture

SDN Terminology
Term Definition
Application SDN applications are programs that communicate their
network requirements and desired network behavior to the
SDN controller over a northbound interface (NBI).
Datapath The SDN datapath is a logical network device that exposes
visibility control over its advertised forwarding and data
processing capabilities. An SDN datapath comprises a Control
to Data-Plane Interface (CDPI) agent and a set of one or more
traffic forwarding engines.
Controller The SDN controller is a logically centralized entity in charge of
translating the requirements from the SDN application layer
down to the SDN datapaths. SDN controllers provide a view of
the network to the SDN applications.
Control to Data-Plane The CDPI is the interface defined between an SDN controller
Interface (CDPI) and an SDN datapath that provides control of all forwarding
operations, capabilities advertisement, statistics reporting,
and event notification.
Northbound Interfaces (NBI) NBIs are interfaces between SDN applications and SDN
controllers. They typically provide network views and enable
expression of network behavior and requirements.

Introduction to Networking
Administrators should be familiar with networking concepts when working with Red Hat
OpenStack Platform. The Neutron networking service is the SDN networking project that
provides Networking-as-a-service (NaaS) in virtual environments. It implements traditional
networking features such as subnetting, bridging, VLANs, and more recent technologies, such as
VXLANs and GRE tunnels.

Network Bridges
A network bridge is a network device that connects multiple network segments. Bridges can
connect multiple devices, and each device can send Ethernet frames to other devices without
having the frame removed and replaced by a router. Bridges keep the traffic isolated, and in most
cases, the switch is aware of which MAC addresses are accessible at each port. Switches monitor
network activity and maintain a MAC learning table.
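
Outside of OpenStack, a Linux bridge can be created manually with the iproute2 tools. The
following is a minimal sketch, assuming an unused interface named eth1 is available on the host:

[root@demo]# ip link add name br0 type bridge
[root@demo]# ip link set dev eth1 master br0
[root@demo]# ip link set dev br0 up
[root@demo]# bridge link show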

Generic Routing Encapsulation (GRE)


The Generic Routing Encapsulation (GRE) protocol is an encapsulation protocol developed
by Cisco Systems that encapsulates a wide variety of network layer protocols inside virtual
point-to-point links, called tunnels, over an IP network. A point-to-point connection is a
connection between two nodes, or endpoints. The GRE protocol is used to run networks on top
of other networks, and within an existing TCP/IP network, two endpoints can be configured with
GRE tunnels. The GRE data is encapsulated in a header, itself encapsulated in the header of the
underlying TCP/IP network. The endpoints can either be bridged or routed if IP addresses are
manually assigned by administrators. For routed traffic, a single link is the next hop in a routing
table.
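
As an illustration, one end of a GRE tunnel can be configured manually with iproute2; the local
and remote addresses and the tunnel subnet below are example values. The other endpoint would
be configured symmetrically, with the local and remote addresses swapped.

[root@demo]# ip tunnel add gre1 mode gre local 192.0.2.1 remote 192.0.2.2 ttl 255
[root@demo]# ip link set dev gre1 up
[root@demo]# ip addr add 10.0.0.1/30 dev gre1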

Figure 5.2: GRE Ethernet header

Virtual LAN (VLAN)


You can partition a single layer 2 network to create multiple distinct broadcast domains that are
mutually isolated, so that packets can only pass between them through one or more routers.
Such segregation is referred to as a Virtual Local Area Network (VLAN). VLANs provide the
segmentation services traditionally provided only by routers in LAN configurations. VLANs
address issues such as scalability, security, and network management. Routers in VLAN
topologies provide broadcast filtering, security, address summary, and traffic-flow management.
VLANs can also help to create multiple layer 3 networks on a single physical infrastructure.
For example, if a DHCP server is plugged into a switch, it serves any host on that switch that is
configured for DHCP. By using VLANs, the network can be easily split up, so that some hosts do
not use that DHCP server and obtain link-local addresses, or an address from a different DHCP
server.

A VLAN is defined by the IEEE 802.1Q standard for carrying traffic on an Ethernet network.
802.1Q VLANs are distinguished by a 4-byte VLAN tag inserted in the Ethernet header. Within
this 4-byte tag, 12 bits represent the VLAN ID, which limits the number of VLAN IDs on a network
to 2^12 = 4096.
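
For example, a tagged subinterface for VLAN ID 500 can be created on a physical interface with
iproute2; the interface name and VLAN ID are examples only. The last command displays the
802.1Q details of the new interface.

[root@demo]# ip link add link eth0 name eth0.500 type vlan id 500
[root@demo]# ip link set dev eth0.500 up
[root@demo]# ip -d link show eth0.500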


Figure 5.3: VLAN header

VXLAN Tunnels
Virtual eXtensible LAN (VXLAN) is a network virtualization technology that solves the scalability
problems associated with large cloud computing deployments. It increases scalability up to 16
million logical networks and allows the adjacency of layer 2 links across IP networks. The VXLAN
protocol encapsulates L2 networks and tunnels them over L3 networks.
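
A minimal sketch of creating a VXLAN interface manually with iproute2 is shown below; the
VXLAN network identifier (VNI) of 42 and the parent device are examples, and 4789 is the
IANA-assigned VXLAN UDP port.

[root@demo]# ip link add vxlan42 type vxlan id 42 dev eth0 dstport 4789
[root@demo]# ip link set dev vxlan42 up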

Figure 5.4: VXLAN Ethernet header

Introducing the Neutron Networking Service


The OpenStack networking (Neutron) project provides networking as a service, which is
consumed by other OpenStack projects, such as the Nova compute service or the Designate DNS
as a Service (DNSaaS). Similar to the other OpenStack services, OpenStack Networking exposes
a set of various Application Program Interfaces (APIs) to programmatically build rich networking
topologies and implement networking policies, such as multi-tier application topologies or
highly-available web applications. OpenStack Networking ships with a set of core plug-ins that
administrators can install and configure based on their needs. Such implementation allows
administrators to utilize a variety of layer 2 and layer 3 networking technologies.

Figure 5.5: The OpenStack Networking service shows how OpenStack Networking services can
be deployed: the two compute nodes run the Open vSwitch agent, which communicates with the
network node, which itself runs a set of dedicated OpenStack Networking services. These services
include the metadata server and the Neutron networking server, as well as a set of extra
components, such as Firewall-as-a-Service (FWaaS) or Load Balancing-as-a-Service (LBaaS).


Figure 5.5: The OpenStack Networking service

OpenStack Networking Terminology and Concepts


OpenStack Networking defines two types of networks, tenant and provider networks.
Administrators can share any of these types of networks among projects as part of the network
creation process. The following lists some of the OpenStack Networking concepts administrators
should be familiar with.

• Tenant networks

OpenStack users create tenant networks for connectivity within projects. By default, these
networks are completely isolated and are not shared among projects. OpenStack Networking
supports the following types of network isolation and overlay technologies:
◦ Flat: All instances reside on the same network, which can also be shared with the underlying
hosts. Flat networks do not recognize the concepts of VLAN tagging or network segregation. Use
cases for flat networks are limited to testing or proof-of-concept because no overlap is allowed.
Only one network is supported, which limits the number of available IP addresses.

◦ VLAN: This type of networking allows users to create multiple tenant networks using
VLAN IDs, allowing network segregation. One use case is a web layer instance with traffic
segregated from database layer instances.

◦ GRE and VXLAN: These networks provide encapsulation for overlay networks to activate and
control communication between compute instances.

• Provider networks

These networks map to the existing physical network in a data center and are usually flat or
VLAN networks.

• Subnets

A subnet is a block of IP addresses provided by the tenant and provider networks whenever
new ports are created.

• Ports

A port is a connection for attaching a single device, such as the virtual NIC of an instance, to
the virtual network. Ports also provide the associated configuration, such as a MAC address
and IP address, to be used on that port.

• Routers

Routers forward data packets between networks. They provide L3 and NAT forwarding for
instances on tenant networks to external networks. A router is required to send traffic outside
of the tenant networks. Routers can also be used to connect the tenant network to an external
network using a floating IP address.

Routers are created by authenticated users within a project and are owned by that project.
When tenant instances require external access, users can assign networks that have been
declared external by an OpenStack administrator to their project-owned router.

Routers implement Source Network Address Translation (SNAT) to provide outbound external
connectivity and Destination Network Address Translation (DNAT) for inbound external
connectivity.

• Security groups

A security group is a virtual firewall allowing instances to control outbound and inbound traffic.
It contains a set of security group rules, which are parsed when data packets are sent out of or
into an instance. A command sketch covering routers and security group rules follows this list.
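
The following is a minimal sketch of the related CLI operations; the router, network, subnet, and
security group names are examples only. The security group rule shown allows inbound SSH
traffic on TCP port 22.

[user@demo ~]$ openstack router create demo-router1
[user@demo ~]$ openstack router set --external-gateway provider-demo demo-router1
[user@demo ~]$ openstack router add subnet demo-router1 demo-subnet1
[user@demo ~]$ openstack security group rule create --protocol tcp --dst-port 22 default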

Managing Networks
Before launching instances, the virtual network infrastructure to which instances will connect
must be created. Prior to creating a network, it is important to consider what subnets will be
used. A router is used to direct traffic from one subnet to another.

Create provider network


The provider network enables external connectivity for instances. Outbound access from
instances uses Network Address Translation (NAT), while inbound access to an instance requires
a floating IP address and suitable security group rules.

• To create a provider network, run the openstack network create command. Specify the
network type by using the --provider-network-type option.

[user@demo ~]$ openstack network create \


--external \
--provider-network-type vlan \
--provider-physical-network datacentre \
--provider-segment 500 \
provider-demo


• Similar to a physical network, the virtual network requires a subnet. The provider network
shares the same subnet and gateway associated with the physical network connected to the
provider network. To create a subnet for a provider network, run the openstack subnet
create command:

[user@demo ~]$ openstack subnet create \


--no-dhcp \
--subnet-range 172.25.250.0/24 \
--gateway 172.25.250.254 \
--dns-nameserver 172.25.250.254 \
--allocation-pool start=172.25.250.101,end=172.25.250.189 \
--network provider-demo \
provider-subnet-demo

Managing Tenant Networks


Tenant networks provide internal network access for instances of a particular project.

• To create a tenant network, run the openstack network create command.

[user@demo ~]$ openstack network create demo-network1

• Create the corresponding subnet for the tenant network, specifying the tenant network
CIDR. By default, this subnet uses DHCP so the instances can obtain IP addresses. The first IP
address of the subnet is reserved as the gateway IP address.

[user@demo ~]$ openstack subnet create \


--network demo-network1 \
--subnet-range=192.168.1.0/24 \
--dns-nameserver=172.25.250.254 \
--dhcp demo-subnet1

Layer 2 Traffic Flow


Figure 5.6: Layer 2 Traffic Flow describes the network flow for an instance running in an
OpenStack environment, using Open vSwitch as a virtual switch; an inspection sketch follows the
numbered steps.


Figure 5.6: Layer 2 Traffic Flow

1. Packets leaving the eth0 interface of the instance are routed to a Linux bridge.

2. The Linux bridge is connected to an Open vSwitch bridge by a vEth pair. The Linux bridge
is used for inbound and outbound firewall rules, as defined by the security groups. Packets
traverse the vEth pair to reach the integration bridge, usually named br-int.

3. Packets are then moved to the external bridge, usually br-ex, over patch ports. OVS flows
manage packet headers according to the network configuration. For example, flows are used
to strip VLAN tags from network packets before forwarding them to the physical interfaces.
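
To see these bridges and flows on a compute or network node, the Open vSwitch command-line
tools can be used. The following is a minimal sketch, assuming the default bridge names br-int
and br-ex:

[root@demo]# ovs-vsctl show
[root@demo]# ovs-vsctl list-br
[root@demo]# ovs-ofctl dump-flows br-int
[root@demo]# ovs-ofctl dump-flows br-ex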

Managing Networks and Subnets


The following steps outline the process for managing networks and subnets in OpenStack.

1. Create the provider network.

2. Create a subnet for a provider network, and specify the floating IP address slice using the
--allocation-pool option.

3. Create a tenant network; for example, a VXLAN-based network.

4. Create the corresponding subnet for the tenant network, specifying the tenant network
CIDR. The first IP address of the subnet is reserved as the gateway IP address.

References
Further information is available in the Networking Guide for Red Hat OpenStack
Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/


Guided Exercise: Managing SDN Segments and Subnets

In this exercise, you will manage networks and routers. You will also review the implementation of
the network environment.

Outcomes
You should be able to:

• Create networks

• Create routers

• Review the network implementation

Before you begin


Log in to workstation as student using student as the password.

Run the lab network-managing-sdn setup command. This script ensures that the
OpenStack services are running and the environment is properly configured for this exercise.
The script creates the OpenStack user developer1 and the OpenStack administrative user
architect1 in the research project. The script also creates the rhel7 image and the
m1.small flavor.

[student@workstation ~]$ lab network-managing-sdn setup

Steps
1. From workstation, source the developer1-research-rc credentials file. As the
developer1 user, create a network for the project. Name the network research-
network1.

[student@workstation ~]$ source developer1-research-rc


[student@workstation ~(developer1-research)]$ openstack network create \
research-network1
+-------------------------+--------------------------------------+
| Field | Value |
+-------------------------+--------------------------------------+
| admin_state_up | UP |
| availability_zone_hints | |
| availability_zones | |
| created_at | 2017-06-07T18:43:05Z |
| description | |
| headers | |
| id | b4b6cea6-51ed-45ae-95ff-9e67512a4fc8 |
| ipv4_address_scope | None |
| ipv6_address_scope | None |
| mtu | 1446 |
| name | research-network1 |
| port_security_enabled | True |
| project_id | 6b2eb5c2e59743b9b345ee54a7f87321 |
| project_id | 6b2eb5c2e59743b9b345ee54a7f87321 |
| qos_policy_id | None |
| revision_number | 3 |
| router:external | Internal |

| shared | False |
| status | ACTIVE |
| subnets | |
| tags | [] |
| updated_at | 2017-06-07T18:43:05Z |
+-------------------------+--------------------------------------+

2. Create the subnet research-subnet1 for the network in the 192.168.1.0/24 range. Use
172.25.250.254 as the DNS server.

[student@workstation ~(developer1-research)]$ openstack subnet create \


--network research-network1 \
--subnet-range=192.168.1.0/24 \
--dns-nameserver=172.25.250.254 \
--dhcp research-subnet1
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| allocation_pools | 192.168.1.2-192.168.1.254 |
| cidr | 192.168.1.0/24 |
| created_at | 2017-06-07T18:47:44Z |
| description | |
| dns_nameservers | 172.25.250.254 |
| enable_dhcp | True |
| gateway_ip | 192.168.1.1 |
| headers | |
| host_routes | |
| id | f952b9e9-bf30-4889-bb89-4303b4e849ae |
| ip_version | 4 |
| ipv6_address_mode | None |
| ipv6_ra_mode | None |
| name | research-subnet1 |
| network_id | b4b6cea6-51ed-45ae-95ff-9e67512a4fc8 |
| project_id | 6b2eb5c2e59743b9b345ee54a7f87321 |
| project_id | 6b2eb5c2e59743b9b345ee54a7f87321 |
| revision_number | 2 |
| service_types | [] |
| subnetpool_id | None |
| updated_at | 2017-06-07T18:47:44Z |
+-------------------+--------------------------------------+

3. Open another terminal and log in to the controller node, controller0, to review the ML2
configuration. Ensure that there are driver entries for VLAN networks.

3.1. Log in to the controller node as the heat-admin user and become root.

[student@workstation ~]$ ssh heat-admin@controller0


[heat-admin@overcloud-controller-0 ~]$ sudo -i
[root@overcloud-controller-0 ~]#

3.2. Go to the /etc/neutron/ directory. Use the crudini command to retrieve the values
for the type_drivers key in the ml2 group. Ensure that the vlan driver is included.

[root@overcloud-controller-0 heat-admin]# cd /etc/neutron


[root@overcloud-controller-0 neutron]# crudini --get plugin.ini ml2 type_drivers
vxlan,vlan,flat,gre


3.3. Retrieve the name of the physical network used by VLAN networks. ML2 groups are
named after the driver, for example, ml2_type_vlan.

[root@overcloud-controller-0 neutron]# crudini --get plugin.ini \


ml2_type_vlan network_vlan_ranges
datacentre:1:1000
[root@overcloud-controller-0 neutron]# exit
[heat-admin@overcloud-controller-0 ~]$ exit
[student@workstation ~]$ exit

4. On workstation, as the architect1 user, create the provider network


provider-172.25.250. The network will be used to provide external connectivity. Use
vlan as the provider network type with a segment ID of 500. Use datacentre as the
physical network name, as defined in the ML2 configuration file.

[student@workstation ~(developer1-research)]$ source architect1-research-rc


[student@workstation ~(architect1-research)]$ openstack network create \
--external \
--provider-network-type vlan \
--provider-physical-network datacentre \
--provider-segment 500 \
provider-172.25.250
+---------------------------+--------------------------------------+
| Field | Value |
+---------------------------+--------------------------------------+
| admin_state_up | UP |
| availability_zone_hints | |
| availability_zones | |
| created_at | 2017-06-07T20:33:50Z |
| description | |
| headers | |
| id | e4ab7774-8f69-4383-817f-e6e1d063c7d3 |
| ipv4_address_scope | None |
| ipv6_address_scope | None |
| is_default | False |
| mtu | 1496 |
| name | provider-172.25.250 |
| port_security_enabled | True |
| project_id | 6b2eb5c2e59743b9b345ee54a7f87321 |
| project_id | 6b2eb5c2e59743b9b345ee54a7f87321 |
| provider:network_type | vlan |
| provider:physical_network | datacentre |
| provider:segmentation_id | 500 |
| qos_policy_id | None |
| revision_number | 4 |
| router:external | External |
| shared | False |
| status | ACTIVE |
| subnets | |
| tags | [] |
| updated_at | 2017-06-07T20:33:50Z |
+---------------------------+--------------------------------------+

5. Create the subnet for the provider network provider-172.25.250 with an allocation
pool of 172.25.250.101 - 172.25.250.189. Name the subnet provider-
subnet-172.25.250. Use 172.25.250.254 for both the DNS server and the gateway.
Disable DHCP for this network.

[student@workstation ~(architect1-research)]$ openstack subnet create \
--no-dhcp \
--subnet-range 172.25.250.0/24 \
--gateway 172.25.250.254 \
--dns-nameserver 172.25.250.254 \
--allocation-pool start=172.25.250.101,end=172.25.250.189 \
--network provider-172.25.250 \
provider-subnet-172.25.250
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| allocation_pools | 172.25.250.101-172.25.250.189 |
| cidr | 172.25.250.0/24 |
| created_at | 2017-06-07T20:42:26Z |
| description | |
| dns_nameservers | 172.25.250.254 |
| enable_dhcp | False |
| gateway_ip | 172.25.250.254 |
| headers | |
| host_routes | |
| id | 07ea3c70-18ab-43ba-a334-717042842cf7 |
| ip_version | 4 |
| ipv6_address_mode | None |
| ipv6_ra_mode | None |
| name | provider-subnet-172.25.250 |
| network_id | e4ab7774-8f69-4383-817f-e6e1d063c7d3 |
| project_id | 6b2eb5c2e59743b9b345ee54a7f87321 |
| project_id | 6b2eb5c2e59743b9b345ee54a7f87321 |
| revision_number | 2 |
| service_types | [] |
| subnetpool_id | None |
| updated_at | 2017-06-07T20:42:26Z |
+-------------------+--------------------------------------+

6. As the developer1 user, create the router research-router1. Add an interface to


research-router1 in the research-subnet1 subnet. Set the provider-172.25.250
network as the external gateway for the router.

6.1. Source the developer1-research-rc credentials file and create the research-
router1 router.

[student@workstation ~(architect1-research)]$ source developer1-research-rc


[student@workstation ~(developer1-research)]$ openstack router create \
research-router1
+-------------------------+--------------------------------------+
| Field | Value |
+-------------------------+--------------------------------------+
| admin_state_up | UP |
| availability_zone_hints | |
| availability_zones | |
| created_at | 2017-06-07T20:56:46Z |
| description | |
| external_gateway_info | null |
| flavor_id | None |
| headers | |
| id | dbf911e3-c3c4-4607-b4e2-ced7112c7541 |
| name | research-router1 |
| project_id | 6b2eb5c2e59743b9b345ee54a7f87321 |
| project_id | 6b2eb5c2e59743b9b345ee54a7f87321 |
| revision_number | 3 |

| routes | |
| status | ACTIVE |
| updated_at | 2017-06-07T20:56:46Z |
+-------------------------+--------------------------------------+

6.2. Add an interface to research-router1 in the research-subnet1 subnet. The


command does not produce any output.

[student@workstation ~(developer1-research)]$ openstack router add \


subnet research-router1 research-subnet1

6.3. Use the neutron command to set the provider-172.25.250 network as the
external gateway for the router.

[student@workstation ~(developer1-research)]$ neutron router-gateway-set \


research-router1 provider-172.25.250
Set gateway for router research-router1

7. Create a floating IP in the provider network, provider-172.25.250.

[student@workstation ~(developer1-research)]$ openstack floating ip \


create provider-172.25.250
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
| created_at | 2017-06-07T22:44:51Z |
| description | |
| fixed_ip_address | None |
| floating_ip_address | 172.25.250.P |
| floating_network_id | e4ab7774-8f69-4383-817f-e6e1d063c7d3 |
| headers | |
| id | 26b0ab61-170e-403f-b67d-558b94597e08 |
| port_id | None |
| project_id | 6b2eb5c2e59743b9b345ee54a7f87321 |
| project_id | 6b2eb5c2e59743b9b345ee54a7f87321 |
| revision_number | 1 |
| router_id | None |
| status | DOWN |
| updated_at | 2017-06-07T22:44:51Z |
+---------------------+--------------------------------------+

8. Launch the research-web1 instance in the environment. Use the m1.small flavor and the
rhel7 image. Connect the instance to the research-network1 network.

[student@workstation ~(developer1-research)]$ openstack server create \


--image rhel7 \
--flavor m1.small \
--nic net-id=research-network1 \
--wait research-web1
+--------------------------------------+--------------------------------------+
| Field | Value |
+--------------------------------------+--------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |

| OS-SRV-USG:launched_at | 2017-06-07T22:50:55.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | research-network1=192.168.1.N |
| adminPass | CEkrjL8hKWtR |
...output omitted...
+--------------------------------------+--------------------------------------+

9. Associate the floating IP, created previously, to the instance.

9.1. View the floating IP created earlier.

[student@workstation ~(developer1-research)]$ openstack floating ip list \


-f value -c 'Floating IP Address'
172.25.250.P

9.2. Associate the IP to the research-web1 instance.

[student@workstation ~(developer1-research)]$ openstack server add \


floating ip research-web1 172.25.250.P

9.3. List the network ports. Locate the UUID of the port corresponding to the instance in the
research-network1 network.

In the output, f952b9e9-bf30-4889-bb89-4303b4e849ae is the ID of the subnet


for the research-network1 network.

[student@workstation ~(developer1-research)]$ openstack subnet list \


-c ID -c Name
+--------------------------------------+------------------+
| ID | Name |
+--------------------------------------+------------------+
| f952b9e9-bf30-4889-bb89-4303b4e849ae | research-subnet1 |
...output omitted..
+--------------------------------------+------------------+
[student@workstation ~(developer1-research)]$ openstack port list -f json
[
{
"Fixed IP Addresses": "ip_address='192.168.1.N', subnet_id='f952b9e9-
bf30-4889-bb89-4303b4e849ae'",
"ID": "1f5285b0-76b5-41db-9cc7-578289ddc83c",
"MAC Address": "fa:16:3e:f0:04:a9",
"Name": ""
},
...output omitted...

10. Open another terminal. Use the ssh command to log in to the compute0 virtual machine as
the heat-admin user.

[student@workstation ~]$ ssh heat-admin@compute0


[heat-admin@overcloud-compute-0 ~]$

11. List the Linux bridges in the environment. Ensure that there is a qbr bridge that uses the
first eleven characters of the Neutron port ID in its name. The bridge has two ports in it: the
TAP device that the instance uses and the qvb vEth pair, which connects the Linux bridge to
the integration bridge.

[heat-admin@overcloud-compute-0 ~]$ brctl show


qbr1f5285b0-76 8000.ce25a52e5a32 no qvb1f5285b0-76
tap1f5285b0-76

12. Exit from the compute node and connect to the controller node.

[heat-admin@overcloud-compute-0 ~]$ exit


[student@workstation ~]$ ssh heat-admin@controller0
[heat-admin@overcloud-controller-0 ~]$

13. To determine the port ID of the phy-br-ex port, use the ovs-ofctl command. The
output lists the ports in the br-ex bridge.

[heat-admin@overcloud-controller-0 ~]$ sudo ovs-ofctl show br-ex


OFPT_FEATURES_REPLY (xid=0x2): dpid:000052540002fa01
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst
mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(eth2): addr:52:54:00:02:fa:01
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
2(phy-br-ex): addr:1a:5d:d6:bb:01:a1
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
...output omitted...

14. Dump the flows for the external bridge, br-ex. Review the entries to locate the flow for the
packets passing through the tenant network. Locate the rule that handles packets in the
phy-br-ex port. The following output shows how the internal VLAN ID, 2, is replaced with
the VLAN ID 500 as defined by the --provider-segment 500 option.

[heat-admin@overcloud-controller-0 ~]$ sudo ovs-ofctl dump-flows br-ex


NXST_FLOW reply (xid=0x4):
cookie=0xbcb9ae293ed51406, duration=2332.961s, table=0, n_packets=297,
n_bytes=12530, idle_age=872, priority=4,in_port=2,dl_vlan=2
actions=mod_vlan_vid:500,NORMAL
...output omitted...

15. Exit from the controller0 node.

[heat-admin@overcloud-controller-0 ~]$ exit


[student@workstation ~]$

Cleanup
From workstation, run the lab network-managing-sdn cleanup script to clean up the
resources created in this exercise.

[student@workstation ~]$ lab network-managing-sdn cleanup


Tracing Multitenancy Network Flows

Objectives
After completing this section, students should be able to:

• Discuss network flow and network paths.

• Discuss VLAN translation in OpenStack.

• Discuss network tunneling.

• Discuss the usage of Netfilter in OpenStack.

• Discuss the various network devices used in OpenStack.

• Discuss security groups and floating IPs.

Introduction to Modular Layer 2 (ML2)


The Modular Layer 2 (ML2) plug-in is a framework that enables the usage of various networking
technologies. For instance, administrators can interact with Open vSwitch, a technology that
provides virtual switching, or with Cisco equipment, using the various plug-ins available for
OpenStack Networking.

ML2 Drivers and Network Types


Red Hat OpenStack Platform 4 (Havana) introduced the ML2 architecture, which allows users to
use more than one networking technology. Before the introduction of the ML2 plug-in, it was not
possible to simultaneously run multiple network plug-ins, such as Linux bridges and Open vSwitch
bridges. The ML2 framework creates a layer of abstraction that separates the management of
network types from the mechanisms used to access these networks, and allows multiple
mechanism drivers to access the same networks simultaneously. ML2 also makes it possible for
companies and manufacturers to develop their own plug-ins.

To date, more than 20 drivers are available from various manufacturers, including Cisco,
Microsoft, Nicira, Ryu, and Lenovo. Drivers implement a set of extensible mechanisms that allow
various network back ends to communicate with OpenStack Networking services. The
implementations can either use layer 2 agents with Remote Procedure Calls (RPC) or use the
OpenStack Networking mechanism drivers to interact with external devices or controllers. In
OpenStack, each network type is managed by an ML2 driver. Such drivers maintain any needed
network state, and can perform network validation or create networks for OpenStack projects.

The ML2 plug-in currently includes drivers for the following network types:

• Local: a network that can only be implemented on a single host. Local networks must only be
used in proof-of-concept or development environments.

• Flat: a network that does not support segmentation. A traditional layer 2 Ethernet network
can be considered a flat network. Servers that are connected to flat networks can listen to the
broadcast traffic and can contact each other. In OpenStack terminology, flat networks are used
to connect instances to existing layer 2 networks, or provider networks.

• VLAN: a network that uses VLANs for segmentation. When users create VLAN networks,
a VLAN identifier (ID) is assigned from the range defined in the OpenStack Networking
configuration. Administrators must configure the network switches to trunk the corresponding
VLANs.

• GRE and VXLAN: networks that are similar to VLAN networks. GRE and VXLAN are overlay
networks that encapsulate network traffic. Both networks receive a unique tunnel identifier.
However, unlike VLANs, overlay networks do not require any synchronization between the
OpenStack environment and layer 2 switches.
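
As a sketch of how these types differ when networks are created: a flat network only needs a physical network name, a VLAN network additionally needs a segment ID, and a VXLAN network only needs a tunnel identifier. The names and identifiers below are examples; provider attributes require administrative credentials.

[user@demo ~(keystone_admin)]$ openstack network create --external \
--provider-network-type flat --provider-physical-network datacentre demo-flat
[user@demo ~(keystone_admin)]$ openstack network create --external \
--provider-network-type vlan --provider-physical-network datacentre \
--provider-segment 171 demo-vlan171
[user@demo ~(keystone_admin)]$ openstack network create \
--provider-network-type vxlan --provider-segment 42 demo-vxlan42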

The following lists some of the available OpenStack Networking ML2 plug-ins:

• Open vSwitch
• Cisco UCS and Nexus
• Linux Bridge
• Nicira Network Virtualization Platform (NVP)
• Ryu and OpenFlow Controller
• NEC OpenFlow
• Big Switch Controller
• Cloudbase Hyper-V
• MidoNet
• PLUMgrid
• Embrane
• IBM SDN-VE
• Nuage Networks
• OpenContrail
• Lenovo Networking

OpenStack Networking Concepts


OpenStack Networking manages services such as network routing, DHCP, and the injection of
metadata into instances. OpenStack Networking services can either be deployed on a stand-alone
node, which is usually referred to as a network node, or alongside other OpenStack services. In
a stand-alone configuration, servers perform dedicated network tasks, such as managing layer 3
routing for the network traffic to and from the instances.

Note
Red Hat OpenStack Platform 10 adds support for composable roles. Composable roles
allow administrators to separate the network services into a custom role.

Layer 2 Population
The layer 2 (L2) population driver enables broadcast, multicast, and unicast traffic to scale
out on large overlay networks. By default, Open vSwitch GRE and VXLAN networks replicate
broadcasts to every agent, including those that do not host the destination network. This leads
to a significant network and processing overhead. L2 population is a mechanism driver for
OpenStack Networking ML2 plug-ins that leverages the implementation of overlay networks. The
service works by gaining full knowledge of the topology, which includes the MAC address and the
IP address of each port. As a result, forwarding tables can be programmed beforehand and the
processing of ARP requests is optimized. By populating the forwarding tables of virtual switches,

such as Linux bridges or Open vSwitch bridges, the driver decreases the broadcast traffic inside
the physical networks.
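
The L2 population driver is enabled as an additional ML2 mechanism driver. The following is a minimal sketch, assuming the plugin.ini layout used earlier in this chapter; in a director-based deployment this setting is normally applied through the overcloud templates rather than edited by hand, and the Open vSwitch agent has a matching l2_population option in its own configuration file.

[root@demo ~]# crudini --set /etc/neutron/plugin.ini ml2 mechanism_drivers \
openvswitch,l2population
[root@demo ~]# crudini --get /etc/neutron/plugin.ini ml2 mechanism_drivers
openvswitch,l2population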

Introduction to Layer 2 and Layer 3 Networking


When designing their virtual network, administrators need to anticipate where the majority
of traffic is going to be sent. In general, network traffic moves faster within the same logical
network than between different networks. This is explained by the fact that the traffic between
logical networks, which use different subnets, needs to pass through a router, which results in
additional latency and overhead.

Figure 5.7: Network routing on separate VLANs shows the network traffic flowing between
instances on separate VLANs:

Figure 5.7: Network routing on separate VLANs

Switching occurs at a lower level of the network, that is, on layer 2, which functions faster than
routing that occurs at layer 3. Administrators should consider having as few network hops as
possible between instances. Figure 5.8: Network switching shows a switched network that spans
two physical systems, allowing two instances to communicate directly without using a router.
The instances share the same subnet, which indicates that they are on the same logical
network:


Figure 5.8: Network switching

Introduction to Subnets
A subnet is a logical subdivision of an IP network. On TCP/IP networks, the logical subdivision
is defined as all devices whose IP addresses have the same prefix. For example, using a /24
subnet mask, all devices with IP addresses on 172.16.0.0/24 would be part of the same
subnet with 256 possible addresses. Addresses on the /24 subnet include a network address of
172.16.0.0 and a broadcast address of 172.16.0.255, leaving 254 available host addresses
on the same subnet. A /24 subnet can be split by using a /25 subnet mask: 172.16.0.0/25
and 172.16.0.128/25, with 126 hosts per subnet. The first subnet would have a range from
172.16.0.0 (network) to 172.16.0.127 (broadcast) leaving 126 available host addresses.
The second subnet would have a range from 172.16.0.128 (network) to 172.16.0.255
(broadcast) leaving 126 available host addresses. This demonstrates that networks can be divided
into one or more subnets depending on their subnet mask.
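
This arithmetic can be verified with the ipcalc utility shipped with Red Hat Enterprise Linux. The following is a quick sketch using the second /25 subnet from the example above; the exact output format may vary between versions.

[user@demo ~]$ ipcalc --network --broadcast --netmask 172.16.0.128/25
NETMASK=255.255.255.128
BROADCAST=172.16.0.255
NETWORK=172.16.0.128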

A subnet may be used to represent all servers present in the same geographic location, or on
the same Local Area Network (LAN). By using subnets to divide the network, administrators can
connect many devices spread across multiple segments to the Internet. Subnets are a useful way
to share a network and create subdivisions on segments. The practice of creating subnets is
called subnetting. Figure 5.9: Network subnets shows three subnets connected to the same router.


Figure 5.9: Network subnets

Subnets can be represented in two ways:

• Variable Length Subnet Mask (VLSM): subnet addresses are traditionally displayed using the
network address accompanied by the subnet mask. For example:

Network Address: 192.168.100.0


Subnet mask: 255.255.255.0

• Classless Inter-Domain Routing (CIDR): this format shortens the subnet mask into the
number of bits set in it. For example, in 192.168.100.0/24, the /24 is a shortened
representation of 255.255.255.0, because the mask contains 24 bits set to 1 when
converted to binary.

Management of Subnets in OpenStack


This same networking concept of subnetting applies in OpenStack. OpenStack Networking
provides the API for virtual networking capabilities, which includes not only subnet management,
but also routers and firewalls. The virtual network infrastructure allows instances to
communicate with each other, as well as externally using the physical network. In OpenStack, a
subnet is attached to a network, and a network can have one or multiple subnets. IP addresses
are generally first allocated in blocks of subnets.

For example, the IP address range of 192.168.100.0 - 192.168.100.255 with a subnet


mask of 255.255.255.0 allows for 254 IP addresses to be used. The first and last addresses are
reserved for the network and broadcast.


Note
Since all layer 2 plug-ins provide a total isolation between layer 2 networks,
administrators can use overlapping subnets. This is made possible by the use of
network namespaces that have their own routing tables. Routing tables manage the
routing of traffic. As each namespace has its own routing table, OpenStack Networking
is able to provide overlapping addresses in different virtual networks.

Administrators can use both the Horizon dashboard and the command-line interface to manage
subnets. The following output shows two subnets, each belonging to a network.

[user@demo ~]$ openstack subnet list -c Name -c Network -c Subnet


+--------------+--------------------------------------+-----------------+
| Name | Network | Subnet |
+--------------+--------------------------------------+-----------------+
| subinternal1 | 0062e02b-7e40-407f-ac43-49e84de096ed | 192.168.0.0/24 |
| subexternal1 | 8d633bda-3ef4-4267-878f-265d5845f20a | 172.25.250.0/24 |
+--------------+--------------------------------------+-----------------+

The subinternal1 subnet is an internal subnet, which provides internal networking for
instances. The openstack subnet show command allows administrators to review the details
for a given subnet.

[user@demo ~]$ openstack subnet show subinternal1


+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| allocation_pools | 192.168.0.2-192.168.0.254 |
| cidr | 192.168.0.0/24 |
| created_at | 2017-05-03T16:47:29Z |
| description | |
| dns_nameservers | |
| enable_dhcp | True |
| gateway_ip | 192.168.0.1 |
| host_routes | |
| id | 9f42ecca-0f8b-4968-bb53-a01350df7c7c |
| ip_version | 4 |
| ipv6_address_mode | None |
| ipv6_ra_mode | None |
| name | subinternal1 |
| network_id | 0062e02b-7e40-407f-ac43-49e84de096ed |
| project_id | c06a559eb68d4c5a846d9b7c829b50d2 |
| project_id | c06a559eb68d4c5a846d9b7c829b50d2 |
| revision_number | 2 |
| service_types | [] |
| subnetpool_id | None |
| updated_at | 2017-05-03T16:47:29Z |
+-------------------+--------------------------------------+

The Network Topology view in the Horizon dashboard allows administrators to review their
network infrastructure. Figure 5.10: Network topology shows a basic topology comprised of an
external network and a private network, connected by a router:


Figure 5.10: Network topology

Introduction to Network Namespaces


A Linux network namespace is a copy of the Linux network stack, which can be seen as a
container for a set of identifiers. Namespaces provide a level of indirection for specific identifiers
and make it possible to differentiate between identifiers with the exact same name. Namespaces
give administrators the possibility to have different and separate instances of network interfaces
and routing tables that operate independently of each other. Network namespaces have their
own network routes and their own firewall rules, as well as their own network devices. Linux
network namespaces are used to prevent collisions between the physical networks on the
network host and the logical networks used by the virtual machines. They also prevent collisions
across different logical networks that are not routed to each other.
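
The same isolation can be demonstrated outside of OpenStack with the ip command. The following is a small sketch; the namespace name demo-ns and the vEth device names are arbitrary examples, not names used by OpenStack Networking.

[root@demo ~]# ip netns add demo-ns
[root@demo ~]# ip link add veth-host type veth peer name veth-ns
[root@demo ~]# ip link set veth-ns netns demo-ns
[root@demo ~]# ip netns exec demo-ns ip addr add 10.0.0.1/24 dev veth-ns
[root@demo ~]# ip netns exec demo-ns ip link set veth-ns up
[root@demo ~]# ip netns exec demo-ns ip route
10.0.0.0/24 dev veth-ns proto kernel scope link src 10.0.0.1

The routing table displayed by the last command exists only inside demo-ns, independently of the routing table in the default namespace. This independence is what allows OpenStack Networking to support overlapping subnets across projects.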

Usage of Namespaces in OpenStack


Networks for OpenStack projects might overlap with those of the physical network. For example,
if a management network is implemented on the eth2 device using the 192.168.101.0/24
subnet, and a project network also uses that subnet, routing problems occur because the host
cannot determine whether to send a packet to the project network or out through eth2. If end
users are
permitted to create their own logical networks and subnets, then the system must be designed to
avoid the possibility of such collisions. OpenStack Networking uses Linux network namespaces to
prevent collisions between the physical networks on the network host, and the logical networks
used by the instances.

OpenStack Networking typically implements two namespaces:

• Namespaces for routers, named qrouter-UUID, where UUID is the router ID. The router
namespace contains TAP devices like qr-YYY, qr-ZZZ, and qg-VVV as well as the
corresponding routes.


• Namespaces for projects that use DHCP services, named qdhcp-UUID, where UUID is the
network ID. The project namespace contains the tapXXX interfaces and the dnsmasq process
that listens on that interface in order to provide DHCP services for project networks. This
namespace allows overlapping IPs between various subnets on the same network host.

The following output shows the implementation of network namespaces after the creation
of a project. In this setup, the namespaces are created on the controller, which also runs the
networking services.

[user@demo ~]$ ip netns list

qrouter-89bae387-396c-4b24-a064-241103bcdb14

qdhcp-0062e02b-7e40-407f-ac43-49e84de096ed

The UUID of an OpenStack Networking router.

The UUID of an OpenStack Networking network.
Administrators can access the various network devices in the namespace by running the ip
netns exec qdhcp-UUID command. The following output shows the TAP device that the
DHCP server uses for providing IP leases to the instances in a qdhcp namespace:

[user@demo ~]$ sudo ip netns exec qdhcp-0062e02b-7e40-407f-ac43-49e84de096ed ip a


...output omitted...
21: tapae83329c-91: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1446 qdisc noqueue state
UNKNOWN qlen 1000
link/ether fa:16:3e:f2:48:da brd ff:ff:ff:ff:ff:ff
inet 192.168.0.2/24 brd 192.168.0.255 scope global tapae83329c-91
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fef2:48da/64 scope link
valid_lft forever preferred_lft forever

Introduction to Floating IPs


In OpenStack terminology, a floating IP is an IP address allocated from a pool for a network. A
floating IP is a routable IP address that is publicly reachable. Floating IPs enable communication
from the external network to instances with a floating IP. Routing from a floating IP to a private
IP assigned to an instance is performed by the OpenStack Networking L3 agent, which manages
the routers as well as the floating IPs. The service generates a set of routing rules to create a
static one-to-one mapping, from a floating IP on the external network, to the private IP assigned
to an instance. The OpenStack Networking L3 agent interacts with the Netfilter service in order
to create a routing topology for the floating IPs.

Implementation of Floating IPs


Floating IP addresses are not directly assigned to instances. Rather, a floating IP is an IP address
attached to an OpenStack networking virtual device. They are IP aliases defined on router
interfaces.

The following sequence is a high-level description of how the floating IP address


172.25.250.28 is implemented when a user assigns it to an instance. It does not describe the
extra configuration performed by various network agents.

1. When a floating IP is attached to an instance, an IP alias is added to the qg-UUID


device, where UUID is the truncated identifier of the router port in the external network.
Administrators can view the IP address on the network node by listing the IP addresses in
the network namespace for the router:


[user@demo ~]$ ip netns exec qrouter-UUID ip addr sh dev qg-XXX


23: qg-9d11d7d6-45: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1496 qdisc noqueue state
UNKNOWN qlen 1000
link/ether fa:16:3e:56:28:e4 brd ff:ff:ff:ff:ff:ff
inet 172.25.250.25/24 brd 172.25.250.255 scope global qg-9d11d7d6-45
valid_lft forever preferred_lft forever
inet 172.25.250.28/32 brd 172.25.250.28 scope global qg-9d11d7d6-45
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe56:28e4/64 scope link
valid_lft forever preferred_lft forever

2. A set of Netfilter rules is created in the router namespace. These rules route packets between
the instance's IP and the floating IP. OpenStack Networking implements a rule for incoming
traffic (DNAT) as well as for outgoing traffic (SNAT). The following output shows the two
Netfilter rules in the router namespace.

[user@demo ~]$ ip netns exec qrouter-UUID iptables -L -nv -t nat | grep 250.28
24 1632 DNAT all -- * * 0.0.0.0/0 172.25.250.28
to:192.168.0.11
8 672 SNAT all -- * * 192.168.0.11 0.0.0.0/0
to:172.25.250.28

Note
The same network can be used to allocate floating IP addresses to instances even if
they have been added to private networks at the same time. The addresses allocated
as floating IPs from this network are bound to the qrouter namespace on the network
node, and perform both the Source Network Address Translation (SNAT) and Destination
Network Address Translation (DNAT) to the associated private IP address.

In contrast, the IP address allocated to the instance for direct external network access
is bound directly inside the instance, and allows the instance to communicate directly
with external networks.

Usage of Netfilter by OpenStack Networking


Netfilter, which is a framework provided by the Linux kernel, allows networking-related
operations to be implemented in the form of rules. Netfilter analyzes and inspects packets in
order to determine how to handle them. It uses user-defined rules to route the packet through
the network stack. OpenStack Networking uses Netfilter for handling network packets, managing
security groups, and routing network packets for the floating IPs allocated to instances. Figure
5.11: Netfilter packet inspection shows how network packets are handled by the framework.


Figure 5.11: Netfilter packet inspection (the figure shows the kernel inspection points: PREROUTING,
INPUT, FORWARD, OUTPUT, and POSTROUTING, with the routing decision between them)

OpenStack Networking uses Netfilter to:

• Set basic rules for various network services, such as NTP, VXLAN, or SNMP traffic.

• Allow source NAT on outgoing traffic, which is the traffic originating from instances.

• Set a default rule that drops any unmatched traffic.

• Create rules that direct traffic from the instance's network devices to the security group
chain.

• Set rules that allow traffic from a defined set of IP and MAC address pairs.

• Allow DHCP traffic from DHCP servers to the instances.

• Prevent DHCP spoofing by the instances.

• Drop any packet that is not associated with a state. States include NEW, ESTABLISHED,
RELATED, INVALID, and UNTRACKED.

• Direct packets that are associated with a known session to the RETURN chain.

The following output shows some of the rules implemented in a compute node. The neutron-
openvswi-FORWARD chain contains the two rules that direct the instance's traffic to the
security group chain. In the following output, the instance's security group chain is named
neutron-openvswi-scb2aafd8-b

...output omitted...
Chain neutron-openvswi-FORWARD (1 references)
pkts bytes target prot opt in out source destination
4593 387K neutron-openvswi-sg-chain all -- * * 0.0.0.0/0
0.0.0.0/0 PHYSDEV match --physdev-out tapcb2aafd8-b1 --physdev-is-bridged /*
Direct traffic from the VM interface to the security group chain. */

4647 380K neutron-openvswi-sg-chain all -- * * 0.0.0.0/0


0.0.0.0/0 PHYSDEV match --physdev-in tapcb2aafd8-b1 --physdev-is-bridged /*
Direct traffic from the VM interface to the security group chain. */
...output omitted...

Chain neutron-openvswi-scb2aafd8-b (1 references)


pkts bytes target prot opt in out source destination
4645 379K RETURN all -- * * 192.168.0.11 0.0.0.0/0
MAC FA:16:3E:DC:58:D1 /* Allow traffic from defined IP/MAC pairs. */
0 0 DROP all -- * * 0.0.0.0/0 0.0.0.0/0
/* Drop traffic without an IP/MAC allow rule. */

Virtual Network Devices


OpenStack Networking uses virtual devices for various purposes. Routers, floating IPs, instances,
and DHCP servers are the main virtual objects that require virtual network devices. Assuming the
usage of Open vSwitch as the network plug-in, there are four distinct types of virtual networking
devices: TAP devices, vEth pairs, Linux bridges, and Open vSwitch bridges.

A TAP device, such as vnet0, is how hypervisors such as KVM implement a virtual network
interface card. Virtual network cards are typically called VIFs or vNICs. An Ethernet frame sent to
a TAP device is received by the guest operating system.

A vEth pair is a pair of virtual network interfaces connected together. An Ethernet frame sent to
one end of a vEth pair is received by the other end of a vEth pair. OpenStack Networking makes
use of vEth pairs as virtual patch cables in order to make connections between virtual bridges.

A Linux bridge behaves like a hub: administrators can connect multiple network interface
devices, whether physical or virtual, to a Linux bridge. Any Ethernet frame that comes in from
one interface attached to the bridge is transmitted to all of the other devices. Moreover, bridges
are aware of the MAC addresses of the devices attached to them.

An Open vSwitch bridge behaves like a virtual switch: network interface devices connect to Open
vSwitch bridge's ports, and the ports can be configured like a physical switch's ports, including
VLAN configurations.

For an Ethernet frame to travel from eth0, which is the local network interface of an instance, to
the physical network, it must pass through six devices inside of the host:

1. A TAP device, such as vnet0.

2. A Linux bridge, such as qbrcb2aafd8-b1.

3. A project vEth pair, such as qvbcb2aafd8-b1 and qvocb2aafd8-b1.

4. The Open vSwitch integration bridge, br-int.

5. The provider vEth pair, int-br-eth1 and phy-br-eth1.

6. The physical network interface card; for example, eth1.
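
Each of these devices can be located on a compute node with standard tools, as the guided exercises in this chapter demonstrate. As a brief sketch (the device name suffixes are derived from the Neutron port ID and differ on every system):

[user@demo ~]$ sudo brctl show
[user@demo ~]$ sudo ovs-vsctl show

The first command lists the Linux bridges with their tap and qvb ports; the second lists the Open vSwitch bridges with the qvo ports, the patch or vEth ports between bridges, and the physical interface enslaved to the external bridge.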

Introduction to Security Groups


Security groups and security rules filter the type and direction of network traffic sent to, and
received from, an OpenStack Networking port. This provides an additional layer of security to
complement any firewall rules present on compute nodes. Security groups are containers for
one or more security rules. A single security group can manage traffic to multiple OpenStack
instances. Both the ports created for floating IP addresses and the instances are associated
with a security group. If none is specified, the port is associated with the
default security group. Additional security rules can be added to the default security group to
modify its behavior, or new security groups can be created as necessary.

Note
By default, the group drops all inbound traffic and allows all outbound traffic.

Implementation of Security Groups


When a new security group is created, OpenStack Networking and the Nova compute service
define an appropriate set of Netfilter rules. For example, if administrators add a security rule
that allows ICMP traffic to reach instances in a project, a sequence of rules is implemented to
route the traffic from the external network to the instance.
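
For example, security group rules such as the ones reflected in the Netfilter output below could be created with the openstack client. This is a sketch only; the default group name and TCP port 565 simply mirror the surrounding example.

[user@demo ~(keystone_admin)]$ openstack security group rule create \
--protocol icmp default
[user@demo ~(keystone_admin)]$ openstack security group rule create \
--protocol tcp --dst-port 565:565 default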

Netfilter rules are created on the compute node. Each time a new rule is created, a Netfilter rule
is inserted in the neutron-openvswi-XXX chain. The following output shows the Netfilter rule
that allows remote connections to TCP port 565 after the creation of a security group rule.

[user@demo ~]$ iptables -L -nv


Chain neutron-openvswi-icb2aafd8-b (1 references)
...output omitted...
0 0 RETURN tcp -- * * 0.0.0.0/0 0.0.0.0/0
tcp dpt:565
...output omitted...

br-tun and VLAN Translation


When creating virtual networks, the translation between VLAN IDs and tunnel IDs is performed
by OpenFlow rules running on the br-tun tunnel bridge. The tunnel bridge is connected to the
Open vSwitch integration bridge, br-int, through patch ports. The OpenFlow rules manage the
traffic in the tunnel, which translates VLAN-tagged traffic from the integration bridge into GRE
tunnels.

The following output shows the flow rules on the bridge before the creation of any instance. There
is a single rule that causes the bridge to drop all traffic.

[user@demo ~]$ ovs-ofctl dump-flows br-tun


NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=871.283s, table=0, n_packets=4, n_bytes=300, idle_age=862,
priority=1 actions=drop

After an instance is running on a compute node, the rules are modified to look something like the
following output.

[user@demo ~]$ ovs-ofctl dump-flows br-tun


NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=422.158s, table=0, n_packets=2, n_bytes=120,
idle_age=55, priority=3,tun_id=0x2,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00
actions=mod_vlan_vid:1,output:1
cookie=0x0, duration=421.948s, table=0, n_packets=64, n_bytes=8337, idle_age=31,
priority=3,tun_id=0x2,dl_dst=fa:16:3e:dd:c1:62 actions=mod_vlan_vid:1,NORMAL
cookie=0x0, duration=422.357s, table=0, n_packets=82, n_bytes=10443, idle_age=31,
priority=4,in_port=1,dl_vlan=1 actions=set_tunnel:0x2,NORMAL

cookie=0x0, duration=1502.657s, table=0, n_packets=8, n_bytes=596, idle_age=423,


priority=1 actions=drop

The Open vSwitch agent is responsible for configuring flow rules on both the integration bridge
and the external bridge for VLAN translation. For example, when br-ex receives a frame marked
with VLAN ID of 1 on the port associated with phy-br-eth1, it modifies the VLAN ID in the
frame to 101. Similarly, when the integration bridge, br-int receives a frame marked with
VLAN ID of 101 on the port associated with int-br-eth1, it modifies the VLAN ID in the frame
to 1.

OpenStack Networking DHCP


The OpenStack Networking DHCP agent manages the network namespaces as well as the IP
allocations for instances in projects. The DHCP agent uses the dnsmasq process to manage the
IP addresses allocated to the virtual machines.

Note
If the OpenStack Networking DHCP agent is enabled and running when a subnet is
created, then by default, the subnet has DHCP enabled.

The DHCP agent runs inside a network namespace, named qdhcp-UUID, where UUID is the UUID
of a project network.

[user@demo ~]$ ip netns list


qdhcp-0062e02b-7e40-407f-ac43-49e84de096ed

Inside the namespace, the dnsmasq process binds to a TAP device, such as tapae83329c-91.
The following output shows the TAP device on a network node, inside a namespace.

[user@demo ~]$ ip netns exec qdhcp-0062e02b-7e40-407f-ac43-49e84de096ed ip a


...output omitted...
21: tapae83329c-91: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1446 qdisc noqueue state
UNKNOWN qlen 1000
link/ether fa:16:3e:f2:48:da brd ff:ff:ff:ff:ff:ff
inet 192.168.0.2/24 brd 192.168.0.255 scope global tapae83329c-91
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fef2:48da/64 scope link
valid_lft forever preferred_lft forever

This interface is a port in the integration bridge.

[user@demo ~]$ ovs-vsctl show


Bridge br-int
...output omitted...
Port "tapae83329c-91"
tag: 5
Interface "tapae83329c-91"
type: internal
...output omitted...

Administrators can locate the dnsmasq process associated with the namespace by searching the
output of the ps command for the UUID of the network.


[user@demo ~]$ ps -fe | grep 0062e02b-7e40-407f-ac43-49e84de096ed


dnsmasq
--no-hosts
--no-resolv
--strict-order
--except-interface=lo

--pid-file=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed/pid
--dhcp-hostsfile=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed/host
--addn-hosts=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed/addn_hosts
--dhcp-optsfile=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed/opts
--dhcp-leasefile=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed/leases
--dhcp-match=set:ipxe,175
--bind-interfaces

--interface=tapae83329c-91
--dhcp-range=set:tag0,192.168.0.0,static,86400s
--dhcp-option-force=option:mtu,1446
--dhcp-lease-max=256
--conf-file=/etc/neutron/dnsmasq-neutron.conf
--domain=openstacklocal

The network identifier.


The TAP device that the dnsmasq process listens on.

Instance Network Flow


The following scenario describes the usage of a VLAN provider network that connects instances
directly to external networks. Each instance belongs to a different project. For this scenario,
Open vSwitch and Linux bridges are the two network back ends.

The scenario assumes the following:

• vlan is declared as an ML2 driver in /etc/neutron/plugins/ml2/ml2_conf.ini.

[ml2]
type_drivers = vlan

• A range of VLAN IDs that reflects the physical network is set in /etc/neutron/plugins/
ml2/ml2_conf.ini. For example, 171-172.

[ml2_type_vlan]
network_vlan_ranges=physnet1:171:172

• The br-ex bridge is set on the compute node, with eth1 enslaved to it.

• The physical network, physnet1 is mapped to the br-ex bridge in /etc/neutron/


plugins/ml2/openvswitch_agent.ini.

bridge_mappings = physnet1:br-ex

• The external_network_bridge has an empty value in /etc/neutron/l3_agent.ini.


This allows the usage of provider-based networks instead of bridge-based networks.

external_network_bridge =


Figure 5.12: Network flow between two VLANs shows the implementation of the various network
bridges, ports, and virtual interfaces.

Figure 5.12: Network flow between two VLANs

Such a scenario can be used by administrators for connecting multiple VLAN-tagged interfaces
on a single network device to multiple provider networks. This scenario uses the physical network
called physnet1 mapped to the br-ex bridge. The VLANs use the IDs 171 and 172; the network
nodes and compute nodes are connected to the physical network using eth1 as the physical
interface.

Note
The ports of the physical switch on which these interfaces are connected must be
configured to trunk the VLAN ranges. If the trunk is not configured, the traffic will be
blocked.

The following procedure shows the creation of the two networks and their associated subnets.


1. The following commands create the two networks. Optionally, administrators can mark the
networks as shared.

[user@demo ~(keystone_admin)]$ neutron net-create provider-vlan171 \


--provider:network_type vlan \
--router:external true \
--provider:physical_network physnet1 \
--provider:segmentation_id 171 \
--shared

[user@demo ~(keystone_admin)]$ neutron net-create provider-vlan172 \


--provider:network_type vlan \
--router:external true \
--provider:physical_network physnet1 \
--provider:segmentation_id 172 \
--shared

2. The following commands create a subnet for each external network.

[user@demo ~(keystone_admin)]$ openstack subnet create \


--network provider-vlan171 \
--subnet-range 10.65.217.0/24 \
--dhcp \
--gateway 10.65.217.254 \
subnet-provider-171

[user@demo ~(keystone_admin)]$ openstack subnet create \


--network provider-vlan172 \
--subnet-range 10.65.218.0/24 \
--dhcp \
--gateway 10.65.218.254 \
subnet-provider-172

Traffic Flow Implementation


The following describes the implementation of the traffic flow. The qbr bridge is connected
to the integration bridge, br-int, via a vEth pair. qvb is the endpoint connected to the Linux
bridge. qvo is the endpoint connected to the Open vSwitch bridge.

Run the brctl command to review the Linux bridges and their ports.

[user@demo ~]$ brctl show


bridge name bridge id STP enabled interfaces

qbr84878b78-63 8000.e6b3df9451e0 no qvb84878b78-63

tap84878b78-63

qbr86257b61-5d 8000.3a3c888eeae6 no qvb86257b61-5d


tap86257b61-5d

The project bridge.


The end point of the vEth pair connected to the project bridge.
The TAP device, which is the network interface of the instance.
Run the ovs-vsctl command to review the implementation of the Open vSwitch bridges.

[user@demo ~]$ ovs-vsctl show

Bridge br-int
fail_mode: secure
Port int-br-ex
Interface int-br-ex
type: patch

options: {peer=phy-br-ex}
Port br-int
Interface br-int
type: internal
Port patch-tun
Interface patch-tun
type: patch
options: {peer=patch-int}

Port "qvo86257b61-5d"
tag: 3
Interface "qvo86257b61-5d"

Port "qvo84878b78-63"
tag: 2
Interface "qvo84878b78-63"

The Open vSwitch integration bridge.


The patch port that connects the integration bridge, br-int, to the external bridge, br-ex.
The end point of the vEth pair that connects the project bridge to the integration bridge for
the second project.
The end point of the vEth pair that connects the project bridge to the integration bridge for
the first project.

Outgoing Traffic Flow


The following describes the network flow for the two instances for packets destined to an
external network.

1. The packets that leave the instances from the eth0 interface arrive at the Linux bridge,
qbr. The instances use the virtual tap device as their network device. The device is set as a
port in the qbr bridge.

2. Each qvo end point residing in the Open vSwitch bridge is tagged with the internal VLAN
tag associated with the VLAN provider network. In this example, the internal VLAN tag 2 is
associated with the VLAN provider network provider-vlan171, and VLAN tag 3 is associated
with the VLAN provider network provider-vlan172.

When a packet reaches the qvo end point, the VLAN tag is added to the packet header.

3. The packet is then moved to the Open vSwitch bridge br-ex using the patch between int-
br-ex and phy-br-ex.

Run the ovs-vsctl show command to view the ports in the br-ex and br-int bridges.

[user@demo ~]$ ovs-vsctl show


Bridge br-ex
Port phy-br-ex
Interface phy-br-ex
type: patch

options: {peer=int-br-ex}
...output omitted...
Bridge br-int

Port int-br-ex
Interface int-br-ex
type: patch

options: {peer=phy-br-ex}

Patch port in the br-ex bridge.


Patch port in the br-int bridge.

4. When the packet reaches the endpoint phy-br-ex on the br-ex bridge, an Open vSwitch
flow inside the br-ex bridge replaces the internal VLAN tag with the actual VLAN tag
associated with the VLAN provider network.

Run the ovs-ofctl show br-ex command to retrieve the port number of the phy-br-
ex port. In the following example, the port phy-br-ex has a value of 4.

[user@demo ~]$ ovs-ofctl show br-ex


4(phy-br-ex): addr:32:e7:a1:6b:90:3e
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max

5. The following output shows how OpenFlow handles packets arriving on the phy-br-ex port
(in_port=4) with a VLAN ID of 2 (dl_vlan=2). Open vSwitch replaces the VLAN
tag with 171 (actions=mod_vlan_vid:171,NORMAL), then forwards the packet.
The output also shows that packets arriving on the phy-br-ex port (in_port=4)
with the VLAN tag 3 (dl_vlan=3) have the VLAN tag replaced with 172
(actions=mod_vlan_vid:172,NORMAL) before being forwarded.

Note
These rules are automatically added by the OpenStack Networking Open vSwitch
Agent.

[user@demo ~]$ ovs-ofctl dump-flows br-ex


NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=6527.527s, table=0, n_packets=29211, n_bytes=2725576,
idle_age=0, priority=1 actions=NORMAL
cookie=0x0, duration=2939.172s, table=0, n_packets=117, n_bytes=8296, idle_age=58,
priority=4,in_port=4 actions=mod_vlan_vid:172,NORMAL
cookie=0x0, duration=6111.389s, table=0, n_packets=145, n_bytes=9368, idle_age=98,
priority=4,in_port=4,dl_vlan=2 actions=mod_vlan_vid:171,NORMAL
cookie=0x0, duration=6526.675s, table=0, n_packets=82, n_bytes=6700, idle_age=2462,
priority=2,in_port=4 actions=drop

The bridge identifier.


The identifier of the input VLAN.
The VLAN identifier to apply to the packet.

6. The packet is then forwarded to the physical interface, eth1.


Incoming Traffic Flow


The following describes the network flow for incoming traffic to the instances.

1. Incoming packets destined to instances from the external network first reach the eth1
network device. They are then forwarded to the br-ex bridge.

From the br-ex bridge, packets are moved to the integration bridge, br-int over the peer
patch that connects the two bridges (phy-br-ex and int-br-ex).

The following output shows the port with a number of 18.

[user@demo ~]$ ovs-ofctl show br-int


18(int-br-ex): addr:fe:b7:cb:03:c5:c1
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max

2. When the packet passes through the int-br-ex port, an Open vSwitch flow rule inside the
bridge adds the internal VLAN tag 2 if the packet belongs to the provider-vlan171 network,
or the VLAN tag 3 if the packet belongs to the provider-vlan172 network.

Run the ovs-ofctl dump-flows br-int command to view the flow in the integration
bridge:

[user@demo ~]$ ovs-ofctl dump-flows br-int


...output omitted...
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=6770.572s, table=0, n_packets=1239, n_bytes=127795,
idle_age=106, priority=1 actions=NORMAL
cookie=0x0, duration=3181.679s, table=0, n_packets=2605, n_bytes=246456, idle_age=0,
priority=3,in_port=18,dl_vlan=172 actions=mod_vlan_vid:3,NORMAL
cookie=0x0, duration=6353.898s, table=0, n_packets=5077, n_bytes=482582, idle_age=0,
priority=3,in_port=18,dl_vlan=171 actions=mod_vlan_vid:2,NORMAL
cookie=0x0, duration=6769.391s, table=0, n_packets=22301, n_bytes=2013101,
idle_age=0, priority=2,in_port=18 actions=drop
cookie=0x0, duration=6770.463s, table=23, n_packets=0, n_bytes=0, idle_age=6770,
priority=0 actions=drop
...output omitted...

The port identifier of the integration bridge.


The VLAN ID of the packet.
The tagging of the packet.
In the previous output, the second rule specifies that packets passing through the int-
br-ex port (in_port=18), with a VLAN tag of 172 (dl_vlan=172), have the VLAN tag
replaced with 3 (actions=mod_vlan_vid:3,NORMAL) and are then forwarded.

The third rule specifies that packets passing through the int-br-ex port (in_port=18),
with a VLAN tag of 171 (dl_vlan=171), have the VLAN tag replaced with 2
(actions=mod_vlan_vid:2,NORMAL) and are then forwarded. These rules are
automatically added by the OpenStack Networking Open vSwitch agent.

With the internal VLAN tag added to the packet, the qvo interface accepts it and forwards
it to the qvb interface after the VLAN tag has been stripped. The packet then reaches the
instance.


Tracing Multitenancy Network Flows


The following steps outline the process for tracing multitenancy network flows.

1. Create the provider network and its associated subnet.

2. Create a router and connect it to all the projects' subnets. This allows for connectivity
between two instances in separate projects.

3. Set the router as a gateway for the provider network.

4. Connect to the network node and use the tcpdump command against all network interfaces.

5. Connect to the compute node and use the tcpdump command against the qvb devices
associated with the instances' ports, as shown in the sketch after this list.
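
As a sketch of steps 4 and 5, tcpdump can capture traffic on the router's external port inside the qrouter namespace on the network node, and on the qvb end of an instance's vEth pair on the compute node. The UUID and the device names below are placeholders taken from earlier examples in this chapter.

[root@demo ~]# ip netns exec qrouter-UUID tcpdump -n -i qg-9d11d7d6-45 icmp
[root@demo ~]# tcpdump -n -e -i qvb1f5285b0-76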

References
Further information is available in the Networking Guide for Red Hat OpenStack
Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/

Highly recommended document called "Networking in too much detail"


https://www.rdoproject.org/networking/networking-in-too-much-detail/


Guided Exercise: Tracing Multitenancy Network Flows

In this exercise, you will manage network flow for two projects. You will review the network
implementation for multitenancy and trace packets between projects.

Outcomes
You should be able to:

• Create a router for multiple projects.

• Review the network implementation for multiple projects.

• Use Linux tools to trace network packets between multiple projects.

Before you begin


Log in to workstation as student using student as the password.

Run the lab network-tracing-net-flows setup command. The script ensures that
OpenStack services are running and the environment is properly configured for this guided
exercise. This script creates two projects: research and finance. The developer1 user is a
member of the research project, and the developer2 user is a member of the finance project.
The architect1 user is the administrative user for the two projects. The script also spawns one
instance in each project.

[student@workstation ~]$ lab network-tracing-net-flows setup

Steps
1. As the architect1 administrative user, review the instances for each of the two projects.

1.1. From workstation, source the credential file for the architect1 user in the
finance project, available at /home/student/architect1-finance-rc. List the
instances in the finance project.

[student@workstation ~]$ source architect1-finance-rc


[student@workstation ~(architect1-finance)]$ openstack server list -f json
[
{
"Status": "ACTIVE",
"Networks": "finance-network1=192.168.2.F",
"ID": "fcdd9115-5e05-4ec6-bd1c-991ab36881ee",
"Image Name": "rhel7",
"Name": "finance-app1"
}
]

1.2. Source the credential file of the architect1 user for the research project, available
at /home/student/architect1-research-rc. List the instances in the project.

[student@workstation ~(architect1-finance)]$ source architect1-research-rc


[student@workstation ~(architect1-research)]$ openstack server list -f json
[

{
"Status": "ACTIVE",
"Networks": "research-network1=192.168.1.R",
"ID": "d9c2010e-93c0-4dc7-91c2-94bce5133f9b",
"Image Name": "rhel7",
"Name": "research-app1"
}
]

2. As the architect1 administrative user in the research project, create a shared external
network to provide external connectivity for the two projects. Use provider-172.25.250
as the name of the network. The environment uses flat networks with datacentre as the
physical network name.

[student@workstation ~(architect1-research)]$ openstack network create \


--external --share \
--provider-network-type flat \
--provider-physical-network datacentre \
provider-172.25.250
+---------------------------+--------------------------------------+
| Field | Value |
+---------------------------+--------------------------------------+
| admin_state_up | UP |
| availability_zone_hints | |
| availability_zones | |
| created_at | 2017-06-09T21:03:49Z |
| description | |
| headers | |
| id | 56b18acd-4f5a-4da3-a83a-fdf7fefb59dc |
| ipv4_address_scope | None |
| ipv6_address_scope | None |
| is_default | False |
| mtu | 1496 |
| name | provider-172.25.250 |
| port_security_enabled | True |
| project_id | c4606deb457f447b952c9c936dd65dcb |
| project_id | c4606deb457f447b952c9c936dd65dcb |
| provider:network_type | flat |
| provider:physical_network | datacentre |
| provider:segmentation_id | None |
| qos_policy_id | None |
| revision_number | 4 |
| router:external | External |
| shared | True |
| status | ACTIVE |
| subnets | |
| tags | [] |
| updated_at | 2017-06-09T21:03:49Z |
+---------------------------+--------------------------------------+

3. Create the subnet for the provider network in the 172.25.250.0/24 range. Name the
subnet provider-subnet-172.25.250. Disable the DHCP service for the network and
use an allocation pool of 172.25.250.101 - 172.25.250.189. Use 172.25.250.254
as the DNS server and the gateway for the network.

[student@workstation ~(architect1-research)]$ openstack subnet create \


--network provider-172.25.250 \
--no-dhcp --subnet-range 172.25.250.0/24 \
--gateway 172.25.250.254 \
--dns-nameserver 172.25.250.254 \
--allocation-pool start=172.25.250.101,end=172.25.250.189 \
provider-subnet-172.25.250
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| allocation_pools | 172.25.250.101-172.25.250.189 |
| cidr | 172.25.250.0/24 |
| created_at | 2017-06-09T22:28:03Z |
| description | |
| dns_nameservers | 172.25.250.254 |
| enable_dhcp | False |
| gateway_ip | 172.25.250.254 |
| headers | |
| host_routes | |
| id | e5d37f20-c976-4719-aadf-1b075b17c861 |
| ip_version | 4 |
| ipv6_address_mode | None |
| ipv6_ra_mode | None |
| name | provider-subnet-172.25.250 |
| network_id | 56b18acd-4f5a-4da3-a83a-fdf7fefb59dc |
| project_id | c4606deb457f447b952c9c936dd65dcb |
| project_id | c4606deb457f447b952c9c936dd65dcb |
| revision_number | 2 |
| service_types | [] |
| subnetpool_id | None |
| updated_at | 2017-06-09T22:28:03Z |
+-------------------+--------------------------------------+

4. List the subnets present in the environment. Ensure that there are three subnets: one
subnet for each project and one subnet for the external network.

[student@workstation ~(architect1-research)]$ openstack subnet list -f json


[
{
"Network": "14f8182a-4c0f-442e-8900-daf3055e758d",
"Subnet": "192.168.2.0/24",
"ID": "79d5d45f-e9fd-47a2-912e-e1acb83c6978",
"Name": "finance-subnet1"
},
{
"Network": "f51735e7-4992-4ec3-b960-54bd8081c07f",
"Subnet": "192.168.1.0/24",
"ID": "d1dd16ee-a489-4884-a93b-95028b953d16",
"Name": "research-subnet1"
},
{
"Network": "56b18acd-4f5a-4da3-a83a-fdf7fefb59dc",
"Subnet": "172.25.250.0/24",
"ID": "e5d37f20-c976-4719-aadf-1b075b17c861",
"Name": "provider-subnet-172.25.250"
}
]

5. Create the research-router1 router and connect it to the two subnets, finance and
research.

5.1. Create the router.

[student@workstation ~(architect1-research)]$ openstack router create \
research-router1
+-------------------------+--------------------------------------+
| Field | Value |
+-------------------------+--------------------------------------+
| admin_state_up | UP |
| availability_zone_hints | |
| availability_zones | |
| created_at | 2017-06-09T23:03:15Z |
| description | |
| distributed | False |
| external_gateway_info | null |
| flavor_id | None |
| ha | False |
| headers | |
| id | 3fed0799-5da7-48ac-851d-c2b3dee01b24 |
| name | research-router1 |
| project_id | c4606deb457f447b952c9c936dd65dcb |
| project_id | c4606deb457f447b952c9c936dd65dcb |
| revision_number | 3 |
| routes | |
| status | ACTIVE |
| updated_at | 2017-06-09T23:03:15Z |
+-------------------------+--------------------------------------+

5.2. Connect the router to the research-subnet1 subnet.

[student@workstation ~(architect1-research)]$ openstack router add subnet \


research-router1 research-subnet1

5.3. Connect the router to the finance-subnet1 subnet.

[student@workstation ~(architect1-research)]$ openstack router add subnet \


research-router1 finance-subnet1

6. Define the router as a gateway for the provider network, provider-172.25.250.

[student@workstation ~(architect1-research)]$ neutron router-gateway-set \


research-router1 provider-172.25.250
Set gateway for router research-router1

7. Ensure that the router is connected to the three networks by listing the router ports.

[student@workstation ~(architect1-research)]$ neutron router-port-list \


research-router1 -f json
[
{
"mac_address": "fa:16:3e:65:71:68",
"fixed_ips": "{\"subnet_id\": \"0e6db9a7-40b6-4b10-b975-9ac32c458879\",
\"ip_address\": \"192.168.2.1\"}",
"id": "ac11ea59-e50e-47fa-b11c-1e93d975b534",
"name": ""
},
{
"mac_address": "fa:16:3e:5a:74:28",
"fixed_ips": "{\"subnet_id\": \"e5d37f20-c976-4719-aadf-1b075b17c861\",
\"ip_address\": \"172.25.250.S\"}",
"id": "dba2aba8-9060-4cef-be9f-6579baa016fb",

CL210-RHOSP10.1-en-2-20171006 187

Rendered for Nokia. Please do not distribute.


Chapter 5. Managing and Troubleshooting Virtual Network Infrastructure

"name": ""
},
{
"mac_address": "fa:16:3e:a1:77:5f",
"fixed_ips": "{\"subnet_id\": \"d1dd16ee-a489-4884-a93b-95028b953d16\",
\"ip_address\": \"192.168.1.1\"}",
"id": "fa7dab05-e5fa-4c2d-a611-d78670006ddf",
"name": ""
}
]

8. As the developer1 user, create a floating IP and attach it to the research-app1 virtual
machine.

8.1. Source the credentials for the developer1 user and create a floating IP.

[student@workstation ~(architect1-finance)]$ source developer1-research-rc


[student@workstation ~(developer1-research)]$ openstack floating ip create \
provider-172.25.250
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
| created_at | 2017-06-10T00:40:51Z |
| description | |
| fixed_ip_address | None |
| floating_ip_address | 172.25.250.N |
| floating_network_id | 56b18acd-4f5a-4da3-a83a-fdf7fefb59dc |
| headers | |
| id | d9c2010e-93c0-4dc7-91c2-94bce5133f9b |
| port_id | None |
| project_id | c4606deb457f447b952c9c936dd65dcb |
| project_id | c4606deb457f447b952c9c936dd65dcb |
| revision_number | 1 |
| router_id | None |
| status | DOWN |
| updated_at | 2017-06-10T00:40:51Z |
+---------------------+--------------------------------------+

8.2. Attach the floating IP to the research-app1 virtual machine.

[student@workstation ~(developer1-research)]$ openstack server add floating ip \


research-app1 172.25.250.N

9. As the developer2 user, create a floating IP and attach it to the finance-app1 virtual
machine.

9.1. Source the credentials for the developer2 user and create a floating IP.

[student@workstation ~(developer1-research)]$ source developer2-finance-rc


[student@workstation ~(developer2-finance)]$ openstack floating ip create \
provider-172.25.250
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
| created_at | 2017-06-10T00:40:51Z |
| description | |
| fixed_ip_address | None |
| floating_ip_address | 172.25.250.P |
| floating_network_id | 56b18acd-4f5a-4da3-a83a-fdf7fefb59dc |
| headers | |
| id | 797854e4-1253-4059-a6d1-3cb5a99a98ec |
| port_id | None |
| project_id | cd68b32fa14942d587a4be838ac722be |
| project_id | cd68b32fa14942d587a4be838ac722be |
| revision_number | 1 |
| router_id | None |
| status | DOWN |
| updated_at | 2017-06-10T00:40:51Z |
+---------------------+--------------------------------------+

9.2. Attach the floating IP to the finance-app1 virtual machine.

[student@workstation ~(developer2-finance)]$ openstack server add floating ip \


finance-app1 172.25.250.P

10. Source the credentials for the developer1 user and retrieve the floating IP attached to the
research-app1 virtual machine.

[student@workstation ~(developer2-finance)]$ source developer1-research-rc


[student@workstation ~(developer1-research)]$ openstack server list -f json
[
{
"Status": "ACTIVE",
"Networks": "research-network1=192.168.1.R, 172.25.250.N",
"ID": "d9c2010e-93c0-4dc7-91c2-94bce5133f9b",
"Image Name": "rhel7",
"Name": "research-app1"
}
]

11. Test the connectivity to the research-app1 instance, running in the research project, by
using the ping command.

[student@workstation ~(developer1-research)]$ ping -c 3 172.25.250.N


PING 172.25.250.N (172.25.250.N) 56(84) bytes of data.
64 bytes from 172.25.250.N: icmp_seq=1 ttl=63 time=1.77 ms
64 bytes from 172.25.250.N: icmp_seq=2 ttl=63 time=0.841 ms
64 bytes from 172.25.250.N: icmp_seq=3 ttl=63 time=0.861 ms

--- 172.25.250.N ping statistics ---


3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.841/1.159/1.776/0.437 ms

12. As the developer2 user, retrieve the floating IP attached to the finance-app1 virtual
machine so you can test connectivity.

[student@workstation ~(developer1-research)]$ source developer2-finance-rc


[student@workstation ~(developer2-finance)]$ openstack server list -f json
[
{
"Status": "ACTIVE",
"Networks": "finance-network1=192.168.2.F, 172.25.250.P",
"ID": "797854e4-1253-4059-a6d1-3cb5a99a98ec",
"Image Name": "rhel7",
"Name": "finance-app1"

CL210-RHOSP10.1-en-2-20171006 189

Rendered for Nokia. Please do not distribute.


Chapter 5. Managing and Troubleshooting Virtual Network Infrastructure

}
]

13. Use the ping command to reach the 172.25.250.P IP. Leave the command running, as
you will connect to the overcloud nodes to review how the packets are routed.

[student@workstation ~(developer2-finance)]$ ping 172.25.250.P


PING 172.25.250.P (172.25.250.P) 56(84) bytes of data.
64 bytes from 172.25.250.P: icmp_seq=1 ttl=63 time=1.84 ms
64 bytes from 172.25.250.P: icmp_seq=2 ttl=63 time=0.639 ms
64 bytes from 172.25.250.P: icmp_seq=3 ttl=63 time=0.708 ms
...output omitted...

14. Open another terminal. Use the ssh command to log in to controller0 as the heat-
admin user.

[student@workstation ~]$ ssh heat-admin@controller0


[heat-admin@overcloud-controller-0 ~]$

15. Run the tcpdump command against all interfaces. Notice the two IP addresses to which the
ICMP packets are routed: 192.168.2.F, which is the private IP of the finance-app1
virtual machine, and 172.25.250.254, which is the gateway for the provider network.

[heat-admin@overcloud-controller-0 ~]$ sudo tcpdump \


-i any -n -v \
'icmp[icmptype] = icmp-echoreply' \
or 'icmp[icmptype] = icmp-echo'
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535
bytes
16:15:09.301102 IP (tos 0x0, ttl 64, id 31032, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 172.25.250.P: ICMP echo request, id 24572, seq 10, length 64
16:15:09.301152 IP (tos 0x0, ttl 63, id 31032, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 192.168.2.F: ICMP echo request, id 24572, seq 10, length 64
16:15:09.301634 IP (tos 0x0, ttl 64, id 12980, offset 0, flags [none], proto ICMP
(1), length 84)
192.168.2.F > 172.25.250.254: ICMP echo reply, id 24572, seq 10, length 64
16:15:09.301677 IP (tos 0x0, ttl 63, id 12980, offset 0, flags [none], proto ICMP
(1), length 84)
172.25.250.P > 172.25.250.254: ICMP echo reply, id 24572, seq 10, length 64
16:15:10.301102 IP (tos 0x0, ttl 64, id 31282, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 172.25.250.P: ICMP echo request, id 24572, seq 11, length 64
16:15:10.301183 IP (tos 0x0, ttl 63, id 31282, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 192.168.2.F: ICMP echo request, id 24572, seq 11, length 64
16:15:10.301693 IP (tos 0x0, ttl 64, id 13293, offset 0, flags [none], proto ICMP
(1), length 84)
192.168.2.F > 172.25.250.254: ICMP echo reply, id 24572, seq 11, length 64
16:15:10.301722 IP (tos 0x0, ttl 63, id 13293, offset 0, flags [none], proto ICMP
(1), length 84)
172.25.250.P > 172.25.250.254: ICMP echo reply, id 24572, seq 11, length 64
...output omitted...

16. Cancel the tcpdump command by pressing Ctrl+C and list the network namespaces.
Retrieve the routes in the qrouter namespace to determine the network device that
handles the routing for the 192.168.2.0/24 network. The following output indicates that
packets destined to the 192.168.2.0/24 network are routed through the qr-ac11ea59-
e5 device (the IDs and names will be different in your output).

[heat-admin@overcloud-controller-0 ~]$ ip netns list


qrouter-3fed0799-5da7-48ac-851d-c2b3dee01b24
...output omitted...
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \
qrouter-3fed0799-5da7-48ac-851d-c2b3dee01b24 \
ip route
172.25.250.0/24 dev qg-dba2aba8-90 proto kernel scope link src 172.25.250.107
192.168.1.0/24 dev qr-fa7dab05-e5 proto kernel scope link src 192.168.1.1
192.168.2.0/24 dev qr-ac11ea59-e5 proto kernel scope link src 192.168.2.1

17. Within the qrouter namespace, run the ping command to confirm that the private IP of
the finance-app1 virtual machine, 192.168.2.F, is reachable.

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \


qrouter-3fed0799-5da7-48ac-851d-c2b3dee01b24 \
ping -c 3 -I qr-ac11ea59-e5 192.168.2.F
PING 192.168.2.F (192.168.2.F) from 192.168.2.1 qr-ac11ea59-e5: 56(84) bytes of
data.
64 bytes from 192.168.2.F: icmp_seq=1 ttl=64 time=0.555 ms
64 bytes from 192.168.2.F: icmp_seq=2 ttl=64 time=0.507 ms
64 bytes from 192.168.2.F: icmp_seq=3 ttl=64 time=0.601 ms

--- 192.168.2.F ping statistics ---


3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.507/0.554/0.601/0.042 ms

18. From the first terminal, cancel the ping command by pressing Ctrl+C. Rerun the ping
command against the floating IP of the research-app1 virtual machine, 172.25.250.N.
Leave the command running, as you will be inspecting the packets from the controller0.

[student@workstation ~(developer2-finance)]$ ping 172.25.250.N


PING 172.25.250.N (172.25.250.N) 56(84) bytes of data.
64 bytes from 172.25.250.N: icmp_seq=1 ttl=63 time=1.84 ms
64 bytes from 172.25.250.N: icmp_seq=2 ttl=63 time=0.639 ms
64 bytes from 172.25.250.N: icmp_seq=3 ttl=63 time=0.708 ms
...output omitted...

19. From the terminal connected to controller0, run the tcpdump command. Notice the
two IP addresses to which the ICMP packets are routed: 192.168.1.R, which is the private IP
of the research-app1 virtual machine, and 172.25.250.254, which is the IP address of
the gateway for the provider network.

[heat-admin@overcloud-controller-0 ~]$ sudo tcpdump \


-i any -n -v \
'icmp[icmptype] = icmp-echoreply' or \
'icmp[icmptype] = icmp-echo'
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535
bytes
16:58:40.340643 IP (tos 0x0, ttl 64, id 65405, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 172.25.250.N: ICMP echo request, id 24665, seq 47, length 64
16:58:40.340690 IP (tos 0x0, ttl 63, id 65405, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 192.168.1.R: ICMP echo request, id 24665, seq 47, length 64
16:58:40.341130 IP (tos 0x0, ttl 64, id 41896, offset 0, flags [none], proto ICMP
(1), length 84)
192.168.1.R > 172.25.250.254: ICMP echo reply, id 24665, seq 47, length 64
16:58:40.341141 IP (tos 0x0, ttl 63, id 41896, offset 0, flags [none], proto ICMP
(1), length 84)
172.25.250.N > 172.25.250.254: ICMP echo reply, id 24665, seq 47, length 64
16:58:41.341051 IP (tos 0x0, ttl 64, id 747, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 172.25.250.N: ICMP echo request, id 24665, seq 48, length 64
16:58:41.341102 IP (tos 0x0, ttl 63, id 747, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 192.168.1.R: ICMP echo request, id 24665, seq 48, length 64
16:58:41.341562 IP (tos 0x0, ttl 64, id 42598, offset 0, flags [none], proto ICMP
(1), length 84)
192.168.1.R > 172.25.250.254: ICMP echo reply, id 24665, seq 48, length 64
16:58:41.341585 IP (tos 0x0, ttl 63, id 42598, offset 0, flags [none], proto ICMP
(1), length 84)
172.25.250.N > 172.25.250.254: ICMP echo reply, id 24665, seq 48, length 64
...output omitted...

20. Cancel the tcpdump command by pressing Ctrl+C and list the network namespaces.
Retrieve the routes in the qrouter namespace to determine the network device that
handles routing for the 192.168.1.0/24 network. The following output indicates that
packets destined to the 192.168.1.0/24 network are routed through the qr-fa7dab05-
e5 device (the IDs and names will be different in your output).

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns list


qrouter-3fed0799-5da7-48ac-851d-c2b3dee01b24
...output omitted...
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \
qrouter-3fed0799-5da7-48ac-851d-c2b3dee01b24 \
ip route
172.25.250.0/24 dev qg-dba2aba8-90 proto kernel scope link src 172.25.250.107
192.168.1.0/24 dev qr-fa7dab05-e5 proto kernel scope link src 192.168.1.1
192.168.2.0/24 dev qr-ac11ea59-e5 proto kernel scope link src 192.168.2.1

21. Within the qrouter namespace, run the ping command to confirm that the private IP of
the research-app1 virtual machine, 192.168.1.R, is reachable.

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \


qrouter-3fed0799-5da7-48ac-851d-c2b3dee01b24 \
ping -c 3 -I qr-fa7dab05-e5 192.168.1.R
PING 192.168.1.R (192.168.1.R) from 192.168.1.1 qr-fa7dab05-e5:
56(84) bytes of data.
64 bytes from 192.168.1.R: icmp_seq=1 ttl=64 time=0.500 ms
64 bytes from 192.168.1.R: icmp_seq=2 ttl=64 time=0.551 ms
64 bytes from 192.168.1.R: icmp_seq=3 ttl=64 time=0.519 ms

--- 192.168.1.R ping statistics ---


3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.500/0.523/0.551/0.028 ms

22. Exit from controller0 and connect to compute0.

[heat-admin@overcloud-controller-0 ~]$ exit

[student@workstation ~]$ ssh heat-admin@compute0
[heat-admin@overcloud-compute-0 ~]$

23. List the Linux bridges. The following output indicates two bridges with two ports each.
Each bridge corresponds to an instance. The TAP devices in each bridge correspond to the
virtual NIC; the qvb devices correspond to the vEth pair that connect the Linux bridge to the
integration bridge, br-int.

[heat-admin@overcloud-compute-0 ~]$ brctl show


bridge name bridge id STP enabled interfaces
qbr03565cda-b1 8000.a2117e24b27b no qvb03565cda-b1
tap03565cda-b1
qbr92387a93-92 8000.9a21945ec452 no qvb92387a93-92
tap92387a93-92

24. Run the tcpdump command against either of the two qvb interfaces while the ping command
is still running against the 172.25.250.N floating IP. If the output does not show any
packets being captured, press Ctrl+C and rerun the command against the other qvb
interface.

[heat-admin@overcloud-compute-0 ~]$ sudo tcpdump -i qvb03565cda-b1 \


-n -vv 'icmp[icmptype] = icmp-echoreply' or \
'icmp[icmptype] = icmp-echo'
tcpdump: WARNING: qvb03565cda-b1: no IPv4 address assigned
tcpdump: listening on qvb03565cda-b1, link-type EN10MB (Ethernet), capture size
65535 bytes
CTRL+C

[heat-admin@overcloud-compute-0 ~]$ sudo tcpdump -i qvb92387a93-92 \


-n -vv 'icmp[icmptype] = icmp-echoreply' or \
'icmp[icmptype] = icmp-echo'
tcpdump: WARNING: qvb92387a93-92: no IPv4 address assigned
tcpdump: listening on qvb92387a93-92, link-type EN10MB (Ethernet), capture size
65535 bytes
17:32:43.781928 IP (tos 0x0, ttl 63, id 48653, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 192.168.1.R: ICMP echo request, id 24721, seq 1018, length 64
17:32:43.782197 IP (tos 0x0, ttl 64, id 37307, offset 0, flags [none], proto ICMP
(1), length 84)
192.168.1.R > 172.25.250.254: ICMP echo reply, id 24721, seq 1018, length 64
17:32:44.782026 IP (tos 0x0, ttl 63, id 49219, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 192.168.1.R: ICMP echo request, id 24721, seq 1019, length 64
17:32:44.782315 IP (tos 0x0, ttl 64, id 38256, offset 0, flags [none], proto ICMP
(1), length 84)
192.168.1.R > 172.25.250.254: ICMP echo reply, id 24721, seq 1019, length 64
...output omitted...

25. From the first terminal, cancel the ping command. Rerun the command against the
172.25.250.P IP, which is the IP of the finance-app1 instance.

[student@workstation ~(developer2-finance)]$ ping 172.25.250.P


PING 172.25.250.P (172.25.250.P) 56(84) bytes of data.
64 bytes from 172.25.250.P: icmp_seq=1 ttl=63 time=0.883 ms
64 bytes from 172.25.250.P: icmp_seq=2 ttl=63 time=0.779 ms
64 bytes from 172.25.250.P: icmp_seq=3 ttl=63 time=0.812 ms
64 bytes from 172.25.250.P: icmp_seq=4 ttl=63 time=0.787 ms
...output omitted...

26. From the terminal connected to the compute0 node, press Ctrl+C to cancel the tcpdump
command. Rerun the command against the other qvb interface, qvb03565cda-b1.
Confirm that the output indicates some activity.

[heat-admin@overcloud-compute-0 ~]$ sudo tcpdump -i qvb03565cda-b1 \


-n -vv 'icmp[icmptype] = icmp-echoreply' or \
'icmp[icmptype] = icmp-echo'
tcpdump: WARNING: qvb03565cda-b1: no IPv4 address assigned
17:40:20.596012 IP (tos 0x0, ttl 63, id 58383, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 192.168.2.F: ICMP echo request, id 24763, seq 172, length 64
17:40:20.596240 IP (tos 0x0, ttl 64, id 17005, offset 0, flags [none], proto ICMP
(1), length 84)
192.168.2.F > 172.25.250.254: ICMP echo reply, id 24763, seq 172, length 64
17:40:21.595997 IP (tos 0x0, ttl 63, id 58573, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 192.168.2.F: ICMP echo request, id 24763, seq 173, length 64
17:40:21.596294 IP (tos 0x0, ttl 64, id 17064, offset 0, flags [none], proto ICMP
(1), length 84)
192.168.2.F > 172.25.250.254: ICMP echo reply, id 24763, seq 173, length 64
17:40:22.595953 IP (tos 0x0, ttl 63, id 59221, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 192.168.2.F: ICMP echo request, id 24763, seq 174, length 64
17:40:22.596249 IP (tos 0x0, ttl 64, id 17403, offset 0, flags [none], proto ICMP
(1), length 84)
192.168.2.F > 172.25.250.254: ICMP echo reply, id 24763, seq 174, length 64
...output omitted...

27. From the first terminal, cancel the ping and confirm that the IP address 192.168.2.F is
the private IP of the finance-app1 instance.

27.1. Retrieve the private IP of the finance-app1 instance.

[student@workstation ~(developer2-finance)]$ openstack server show \


finance-app1 -f json
{
"OS-EXT-STS:task_state": null,
"addresses": "finance-network1=192.168.2.F, 172.25.250.P",
...output omitted...

28. Log in to the finance-app1 instance as the cloud-user user. Run the ping command
against the floating IP assigned to the research-app1 virtual machine, 172.25.250.N.

28.1.Use the ssh command as the cloud-user user to log in to finance-app1, with an
IP address of 172.25.250.P. Use the developer2-keypair1 located in the home
directory of the student user.

[student@workstation ~(developer2-finance)]$ ssh -i developer2-keypair1.pem \


cloud-user@172.25.250.P
[cloud-user@finance-app1 ~]$

28.2.Run the ping command against the floating IP of the research-app1 instance,
172.25.250.N.

[cloud-user@finance-app1 ~]$ ping 172.25.250.N

29. From the terminal connected to compute0, press Ctrl+C to cancel the tcpdump
command. Rerun the command without specifying any interface. Confirm that the output
indicates some activity.

[heat-admin@overcloud-compute-0 ~]$ sudo tcpdump -i any \


-n -v 'icmp[icmptype] = icmp-echoreply' or \
'icmp[icmptype] = icmp-echo'
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535
bytes
18:06:05.030442 IP (tos 0x0, ttl 64, id 39160, offset 0, flags [DF], proto ICMP (1),
length 84)

192.168.2.F > 172.25.250.N: ICMP echo request, id 12256, seq 309, length 64
18:06:05.030489 IP (tos 0x0, ttl 63, id 39160, offset 0, flags [DF], proto ICMP (1),
length 84)

172.25.250.P > 192.168.1.R: ICMP echo request, id 12256, seq 309, length 64
18:06:05.030774 IP (tos 0x0, ttl 64, id 32646, offset 0, flags [none], proto ICMP
(1), length 84)

192.168.1.R > 172.25.250.P: ICMP echo reply, id 12256, seq 309, length 64
18:06:05.030786 IP (tos 0x0, ttl 63, id 32646, offset 0, flags [none], proto ICMP
(1), length 84)

172.25.250.N > 192.168.2.F: ICMP echo reply, id 12256, seq 309, length 64
18:06:06.030527 IP (tos 0x0, ttl 64, id 40089, offset 0, flags [DF], proto ICMP (1),
length 84)
192.168.2.F > 172.25.250.N: ICMP echo request, id 12256, seq 310, length 64
18:06:06.030550 IP (tos 0x0, ttl 63, id 40089, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.P > 192.168.1.R: ICMP echo request, id 12256, seq 310, length 64
18:06:06.030880 IP (tos 0x0, ttl 64, id 33260, offset 0, flags [none], proto ICMP
(1), length 84)
192.168.1.R > 172.25.250.P: ICMP echo reply, id 12256, seq 310, length 64
18:06:06.030892 IP (tos 0x0, ttl 63, id 33260, offset 0, flags [none], proto ICMP
(1), length 84)
172.25.250.N > 192.168.2.F: ICMP echo reply, id 12256, seq 310, length 64
...output omitted...

The output indicates the following flow for the sequence ICMP 309 (seq 309):

• The private IP of the finance-app1 instance, 192.168.2.F, sends an echo request
to the floating IP of the research-app1 instance, 172.25.250.N.

• The floating IP of the finance-app1 instance, 172.25.250.P, sends an echo
request to the private IP of the research-app1 instance, 192.168.1.R.

• The private IP of the research-app1 instance, 192.168.1.R, sends an echo reply
to the floating IP of the finance-app1 instance, 172.25.250.P.

• The floating IP of the research-app1 instance, 172.25.250.N, sends an echo
reply to the private IP of the finance-app1 instance, 192.168.2.F.

30. Close the terminal connected to compute-0. Cancel the ping command, and log out of
finance-app1.

Cleanup
From workstation, run the lab network-tracing-net-flows cleanup script to clean up
the resources created in this exercise.

[student@workstation ~]$ lab network-tracing-net-flows cleanup


Troubleshooting Network Issues

Objectives
After completing this section, students should be able to:

• Troubleshoot common networking issues.

• Review OpenStack services configuration files.

Common Networking Issues


While Software-defined Networking may seem to introduce complexity at first glance, the
diagnostic process for troubleshooting network connectivity in OpenStack is similar to that of a
physical network. The OpenStack virtual infrastructure can be treated like a physical
infrastructure; administrators can therefore use the same tools and utilities they would use when
troubleshooting physical servers.

The following table lists some of the basic tools that administrators can use to troubleshoot their
environment.

Troubleshooting Utilities
Command Purpose
ping Sends packets to network hosts. The ping command is a useful tool for
analyzing network connectivity problems. The results serve as a basic
indicator of network connectivity. The ping command works by sending
traffic to specified destinations, and then reports back whether the attempts
were successful.
ip Manipulates routing tables, network devices and tunnels. The command allows
you to review IP addresses, network devices, namespaces, and tunnels.
traceroute Tracks the route that packets take from an IP network on their way to a given
host.
tcpdump A packet analyzer that allows users to display TCP/IP and other packets being
transmitted or received over a network to which a computer is attached.
ovs-vsctl High-level interface for managing the Open vSwitch database. The command
allows the management of Open vSwitch bridges, ports, tunnels, and patch
ports.
ovs-ofctl Administers OpenFlow switches. It can also show the current state of an
OpenFlow switch, including features, configuration, and table entries.
brctl Manages Linux bridges. Administrators can retrieve MAC addresses, device names,
and bridge configurations.
openstack The OpenStack unified CLI. The command can be used to review networks and
network ports.
neutron The Neutron networking service CLI. The command can be used to review
router ports and network agents.


Troubleshooting Scenarios
Troubleshooting procedures help mitigate issues and isolate them. There are some basic
recurring scenarios in OpenStack environments that administrators are likely to face. The
following potential scenarios include basic troubleshooting steps.

Note
Some of the resolution steps outlined in the following scenarios can overlap.

Instances are not able to reach the external network.


1. Use the ip command from within the instance to ensure that the DHCP provided an IP
address.

2. Review the bridges on the compute node to ensure that a vEth pair connects the project
bridge to the integration bridge.

3. Review the network namespaces on the network node. Ensure that the router namespace
exists and that routes are properly set.

4. Review the security group that the instance uses to make sure that there is a rule that
allows outgoing traffic.

5. Review the OpenStack Networking configuration to ensure that the mapping between the
physical interfaces and the provider network is properly set. (A sketch of some of these
checks follows this list.)
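
The following commands are a minimal sketch of checks 2, 3, and 5. The overcloud node names
are those used elsewhere in this chapter, the qrouter namespace UUID is a placeholder, and the
bridge-mapping file path is the one discussed later in this section.

[heat-admin@overcloud-compute-0 ~]$ brctl show
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns list
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec qrouter-UUID ip route
[heat-admin@overcloud-controller-0 ~]$ grep bridge_mappings \
/etc/neutron/plugins/ml2/openvswitch_agent.ini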

Instances do not retrieve an IP address.


1. Use the ps command to ensure that the dnsmasq service is running on the controller (or
network) node.

2. Review the namespaces to ensure that the qdhcp namespace exists and has the TAP device
that the dnsmasq service uses.

3. If the environment uses VLANs, ensure that the switch ports are set in trunk mode or that
the right VLAN ID is set for the port.

4. If a firewall manages the compute node, ensure that there are no conflicting rules that
prevent the DHCP traffic from passing.

5. Use the neutron command to review the state of the DHCP agent, as shown in the sketch
after this list.
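
The following commands are a minimal sketch of checks 1, 2, and 5, run from the controller (or
network) node and from a host with administrative credentials loaded:

[heat-admin@overcloud-controller-0 ~]$ ps -ef | grep dnsmasq
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns list | grep qdhcp
[user@demo ~]$ neutron agent-list | grep 'DHCP agent'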

Metadata is not injected into instances.


1. Ensure that the cloud-init package is installed in the source image.

2. Review the namespace to make sure that there is a route for the 169.254.169.254/32
address, and that it uses the right network interface. This IP address is used in Amazon
EC2 and other cloud computing platforms to distribute metadata to cloud instances. In
OpenStack, a Netfilter rule redirects packets destined for this IP address to the IP address of
the node that runs the metadata service.

3. Ensure that there is a Netfilter rule that redirects calls destined for the 169.254.169.254
IP address to the Nova metadata service. (A sketch of checks 2 and 3 follows this list.)
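
A minimal sketch of checks 2 and 3, run inside the router namespace on the controller node;
the namespace UUID is a placeholder:

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec qrouter-UUID \
ip route | grep 169.254
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec qrouter-UUID \
iptables -t nat -S | grep 169.254.169.254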


Using the ping Command for Troubleshooting


The ping command is a useful tool when troubleshooting an environment. By sending traffic
to specified destinations, and then reporting back the status, it helps administrators analyze
potential network connectivity problems. The results that are obtained are a good indicator of
network connectivity, or lack thereof. However, the command might not reveal some connectivity
issues, such as traffic being blocked by a firewall.

Note
As a general practice, it is not recommended to configure firewalls to block ICMP
packets. Doing so makes troubleshooting more difficult.

The ping command can be run from the instance, the network node, and the compute node. The
-I interface option allows administrators to send packets from a specific interface, as shown
in the short example after this list. The command allows the validation of multiple layers of the
network infrastructure, such as:

• Name resolution, which implies the availability of the DNS server.

• IP routing, which uses the routing table rules.

• Network switching, which implies proper connectivity between the various network devices.
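
For example, a minimal sketch of pinging an instance's private IP from a specific interface
inside a router namespace; the namespace, interface name, and target IP are placeholders:

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec qrouter-UUID \
ping -c 3 -I qr-XXXXXXXX-XX 192.168.1.10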

Results from a test using the ping command can reveal valuable information, depending on which
destination is tested. For example, in the following diagram, the instance VM1 is experiencing
connectivity issues. The possible destinations are numbered and the conclusions drawn from a
successful or failed result are presented below.

Figure 5.13: A basic troubleshooting scenario

1. Internet: a common first step is to send a packet to an external network, such as
www.redhat.com.

• If the packet reaches the Internet, it indicates that all the various network points are
working as expected. This includes both the physical and virtual infrastructures.

• If the packet does not reach the Internet, while other servers are able to reach it, it
indicates that an intermediary network point is at fault.

2. Physical router: this is the IP address of the physical router, as configured by the network
administrator to direct the OpenStack internal traffic to the external network.

• If the packet reaches the IP address of the router, it indicates that the underlying switches
are properly set. Note that the packets at this stage do not traverse the router, therefore,
this step cannot be used to determine if there is a routing issue present on the default
gateway.

• If the packet does not reach the router, it indicates a failure in the path between the
instance and the router. The router or the switches could be down, or the gateway could
be improperly set.

3. Physical switch: the physical switch connects the different nodes on the same physical
network.

• If the instance is able to reach an instance on the same subnet, this indicates that the
physical switch allows the packets to pass.

• If the instance is not able to reach an instance on the same subnet, this could indicate
that switch ports do not trunk the required VLANs.

4. OpenStack Networking router: the virtual OpenStack Networking router that directs the
traffic of the instances.

• If the instance is able to reach the virtual router, this indicates that there are rules that
allow the ICMP traffic. This also indicates that the OpenStack Networking network node is
available and properly synchronized with the OpenStack Networking server.

• If the instance is not able to reach the virtual router, this could indicate that the security
group that the instance uses does not allow ICMP packets to pass. This could also indicate
that the L3 agent is down or not properly registered to the OpenStack Networking server.

5. VM2: the instance running on the same compute node.

• If the instance VM1 is able to reach VM2, this indicates that the network interfaces are
properly configured.

• If the instance VM1 is not able to reach VM2, this could indicate that VM2 prevents the
ICMP traffic. This could also indicate that the virtual bridges are not set correctly.

Troubleshooting VLANs
OpenStack Networking trunks VLAN networks through SDN switches. The support of VLAN-
tagged provider networks means that the instances are able to communicate with servers
located in the physical network. To troubleshoot connectivity to a VLAN provider network,
administrators can use the ping command to reach the IP address of the gateway defined during
the creation of the network.


There are many ways to review the mapping of VLAN networks. For example, to discover which
internal VLAN tag is in use for a given external VLAN, administrators can use the ovs-ofctl
command.

The following scenario assumes the creation of a provider network with a segmentation ID of 6
and an associated subnet.

[user@demo ~]$ openstack network create \
--external \
--provider-network-type vlan \
--provider-physical-network datacentre \
--provider-segment 6 \
provider-net
[user@demo ~]$ openstack subnet create \
--network provider-net \
--subnet-range=192.168.1.0/24 \
--dns-nameserver=172.25.250.254 \
--allocation-pool start=192.168.1.100,end=192.168.1.199 \
--dhcp provider-subnet

1. Retrieve the VLAN ID of the network, referred to as provider:segmentation_id.

[user@demo ~]$ openstack network show provider-net


+---------------------------+--------------------------------------+
| Field | Value |
+---------------------------+--------------------------------------+
...output omitted...
| provider:segmentation_id | 6 |
...output omitted...
+---------------------------+--------------------------------------+

2. Connect to the compute node and run the ovs-ofctl dump-flows command against the
integration bridge. Review the flow to make sure that there is a matching rule for the VLAN
tag 6. The following output shows that packets received on port ID 1 with the VLAN tag 6
are modified to have the internal VLAN tag 15.

[user@demo ~]$ ovs-ofctl dump-flows br-int


NXST_FLOW reply (xid=0x4):
cookie=0xa6bd2d041ea176d1, duration=547156.698s, table=0, n_packets=1184,
n_bytes=145725, idle_age=82, hard_age=65534, priority=3,in_port=1,dl_vlan=6
actions=mod_vlan_vid:15,NORMAL
...

3. Run the ovs-ofctl show br-int command to access the flow table and the ports of the
integration bridge. The following output shows that the port with the ID of 1 is assigned to
the int-br-ex port.

[user@demo ~]$ ovs-ofctl show br-int


OFPT_FEATURES_REPLY (xid=0x2): dpid:0000828a09b0f949
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst
mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(int-br-ex): addr:82:40:82:eb:0a:58
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max


4. Use tools such as the ping command throughout the various network layers to detect
potential connectivity issues. For example, if packets are lost between the compute node
and the controller node, this may indicate network congestion on the equipment that
connects the two nodes. (A minimal sketch follows this list.)

5. Review the configuration of physical switches to ensure that ports through which the
project traffic passes allow the traffic for network packets tagged with the provider ID.
Usually, ports need to be set in trunk mode.
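
A minimal sketch of step 4; the address below is a placeholder for the IP of the controller
node on the network being tested:

[heat-admin@overcloud-compute-0 ~]$ ping -c 3 172.25.249.200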

Troubleshooting OpenStack Networking Agents


OpenStack Networking agents are services that perform a set of particular tasks. They can be
seen as software that handles data packets. Such agents include the DHCP agent, the L3 agent,
the metering agent, and the LBaaS agent.

The neutron agent-list command can be used to review the state of the agents. If an agent
is out of synchronization or not properly registered, this can lead to unexpected results. For
example, if the DHCP agent is not marked as alive, instances will not retrieve any IP address
from the agent.

The following output shows how the neutron agent-list and neutron agent-show
commands can be used to retrieve more information about OpenStack Networking agents.

[user@demo ~]$ neutron agent-list
+-----------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
| id                    | agent_type         | host                               | availability_zone | alive | admin_state_up | binary                    |
+-----------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
| 878cd9a3-addf11b7b302 | DHCP agent         | overcloud-controller-0.localdomain | nova              | :-)   | True           | neutron-dhcp-agent        |
| c60d8343-ed6ba0320a76 | Metadata agent     | overcloud-controller-0.localdomain |                   | :-)   | True           | neutron-metadata-agent    |
| cabd8fe5-a37f9aa68111 | L3 agent           | overcloud-controller-0.localdomain | nova              | :-)   | True           | neutron-l3-agent          |
| cc054b29-32b83bf41a95 | Open vSwitch agent | overcloud-compute-0.localdomain    |                   | xxx   | True           | neutron-openvswitch-agent |
| f7921f0a-6c89cea15286 | Open vSwitch agent | overcloud-controller-0.localdomain |                   | :-)   | True           | neutron-openvswitch-agent |
+-----------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+

[user@demo ~]$ neutron agent-show cabd8fe5-82e1-467a-b59c-a37f9aa68111


+---------------------+-----------------------------------------------+
| Field | Value |
+---------------------+-----------------------------------------------+
| admin_state_up | True |
| agent_type | L3 agent |
| alive | True |
| availability_zone | nova |
| binary | neutron-l3-agent |

| configurations | { |
| | "agent_mode": "legacy", |
| | "gateway_external_network_id": "", |
| | "handle_internal_only_routers": true, |
| | "routers": 0, |
| | "interfaces": 0, |
| | "floating_ips": 0, |

202 CL210-RHOSP10.1-en-2-20171006

Rendered for Nokia. Please do not distribute.


Troubleshooting Scenarios

| | "interface_driver":
"neutron.agent.
linux.interface.OVSInterfaceDriver",|
| | "log_agent_heartbeats": false, |
| | "external_network_bridge": "", |
| | "ex_gw_ports": 0 |
| | } |
| created_at | 2017-04-29 01:47:50 |
| description | |
| heartbeat_timestamp | 2017-05-10 00:56:15 |
| host | overcloud-controller-0.localdomain |
| id | cabd8fe5-82e1-467a-b59c-a37f9aa68111 |
| started_at | 2017-05-09 19:22:14 |
| topic | l3_agent |
+---------------------+-----------------------------------------------+

The agent type.


The host that the agent runs on.
The status of the agent. :-) indicates that the agent is alive and registered. xxx indicates
that the agent is not able to contact the OpenStack Networking server.
Extra information about the agent configuration.

Troubleshooting OpenStack Networking Configuration Files


OpenStack Networking configuration files orchestrate the behavior of OpenStack Networking
services. They allow administrators to configure each network service. For example,
dhcp_agent.ini is used by the DHCP agent. Most OpenStack Networking configuration files
use the INI file format. INI files are text files that specify options as key=value pairs. Each
entry belongs to a group, such as DEFAULT.

The following output shows the ovs_integration_bridge key with a value of br-int in
the DEFAULT group. The entry is commented out, as this is the default value that OpenStack
Networking defines.

[DEFAULT]
#
# From neutron.base.agent
#

# Name of Open vSwitch bridge to use (string value)


#ovs_integration_bridge = br-int

OpenStack Networking configuration files are configured automatically when deploying both
the undercloud and the overcloud. The installers parse values defined in undercloud.conf or
in the Heat template files. However, the tools do not check for environment-related errors, such
as missing connectivity to external networks or misconfigured interfaces. The following table
lists the configuration files for OpenStack Networking services, located in /etc/neutron:

OpenStack Networking Configuration Files


File Purpose
dhcp_agent.ini Used by the OpenStack Networking DHCP agent.
l3_agent.ini Used by the OpenStack Networking L3 agent.
lbass_agent.ini Used by the OpenStack Networking LBaaS agent.
metadata_agent.ini Used by the OpenStack Networking metadata agent.
metering_agent.ini Used by the OpenStack Networking metering agent.
neutron.conf Used by the OpenStack Networking server.
conf.d/agent The conf.d directory contains extra directories for each
OpenStack Networking agent. This directory can be used
to configure OpenStack Networking services with custom
user-defined configuration files.
plugins/ml2 The ml2 directory contains a configuration file for each
plugin. For example, the openvswitch_agent.ini
contains the configuration for the Open vSwitch plugin.
plugins/ml2/ml2_conf.ini Defines the configuration for the ML2 framework. In this
file, administrators can set the VLAN ranges or the drivers
to enable.

Most of the options in the configuration files are documented with a short comment explaining
how the option is used by the service. Therefore, administrators can understand what the option
does before setting the value. Consider the ovs_use_veth option in the dhcp_agent.ini,
which provides instructions for using vEth interfaces:

# Uses veth for an OVS interface or not. Support kernels with limited namespace
# support (e.g. RHEL 6.5) so long as ovs_use_veth is set to True. (boolean
# value)
#ovs_use_veth = false

Important
While some options use boolean values, such as true or false, other options require
a value. Even if the text above each value specifies the type (string value or
boolean value), administrators need to understand the option before changing it.

Note
Modified configuration files in the overcloud are reset to their default state when
the overcloud is updated. If custom options are set, administrators must update the
configuration files after each overcloud update.

Administrators are likely to need to troubleshoot the configuration files when some action
related to a service fails. For example, upon creation of a VXLAN network, if OpenStack
Networking complains about a missing provider, administrators need to review the configuration
of ML2. They would then make sure that the type_drivers key in the ml2 section of the /etc/
neutron/plugins/ml2/ml2_conf.ini configuration file has the proper value set.

[ml2]
type_drivers = vxlan

They also have to make sure that the VLAN range in the section dedicated to VLAN is set
correctly. For example:


[ml2_type_vlan]
network_vlan_ranges=physnet1:171:172

Common Configuration Errors


The following list describes some of the most common errors related to misconfigured files and
their resolution.

• Traffic does not reach the external network: administrators should review the bridge
mapping. Traffic that leaves the provider network from the router arrives in the integration
bridge. A patch port between the integration bridge and the external bridge allows the
traffic to pass through the bridge of the provider network and out to the physical network.
Administrators must ensure that there is an interface connected to the Internet which belongs
to the external bridge. The bridge mapping is defined in /etc/neutron/plugins/ml2/
openvswitch_agent.ini:

bridge_mappings = datacentre:br-ex

The bridge mapping configuration must correlate with that of the VLAN range. For the
example given above, the network_vlan_ranges should be set as follows:

network_vlan_ranges = datacentre:1:1000

• Packets in a VLAN network are not passing through the switch ports: administrators
should review the network_vlan_ranges in the /etc/neutron/plugin.ini
configuration file to make sure it matches the VLAN IDs allowed to pass through the switch
ports.

• The OpenStack Networking metadata server is unreachable by instances: administrators
should review the enable_isolated_metadata setting in the
/etc/neutron/dhcp_agent.ini file. If the instances are directly attached to a provider's
external network, and have an external router configured as their default gateway, OpenStack
Networking routers are not used. Therefore, the OpenStack Networking routers cannot be used
to proxy metadata requests from instances to the Nova metadata server. This can be resolved by
setting the enable_isolated_metadata key to True:

enable_isolated_metadata = True

• Support of overlapping IPs is disabled: overlapping IPs require the use of Linux
network namespaces. To enable the support of overlapping IPs, administrators must set the
allow_overlapping_ips key to True in the /etc/neutron/neutron.conf configuration
file:

# MUST be set to False if OpenStack Networking is being used in conjunction with Nova
# security groups. (boolean value)
# allow_overlapping_ips = True
allow_overlapping_ips=True


Troubleshooting Project Networks


When using namespaces, project traffic is contained within the network namespaces. As a result,
administrators can use both the openstack and ip commands to review the implementation of
the project network on the physical system that acts as a network node.

The following output shows the available project networks. In this example, there is only one
network, internal1.

[user@demo ~]$ openstack network list


+----------------------------+-----------+----------------------------+
| ID | Name | Subnets |
+----------------------------+-----------+----------------------------+
| 0062e02b-7e40-49e84de096ed | internal1 | 9f42ecca-0f8b-a01350df7c7c |
+----------------------------+-----------+----------------------------+

Notice the UUID of the network, 0062e02b-7e40-49e84de096ed. This value is appended to
the name of the network namespace, as shown in the following output.

[user@demo ~]$ ip netns list


qdhcp-0062e02b-7e40-49e84de096ed

This mapping allows for further troubleshooting. For example, administrators can review the
routing table for this project network.

[user@demo ~]$ ip netns exec qdhcp-0062e02b-7e40-49e84de096ed ip route


default via 192.168.0.1 dev tapae83329c-91
192.168.0.0/24 dev tapae83329c-91 proto kernel scope link src 192.168.0.2

The tcpdump command can also be used within the namespace. Administrators can, for example,
run the ping command from one terminal while capturing the traffic with tcpdump from another.

[user@demo ~]$ ip netns exec qdhcp-0062e02b-7e40-407f-ac43-49e84de096ed \
ping -c 3 172.25.250.254
PING 172.25.250.254 (172.25.250.254) 56(84) bytes of data.
64 bytes from 172.25.250.254: icmp_seq=1 ttl=63 time=0.368 ms
64 bytes from 172.25.250.254: icmp_seq=2 ttl=63 time=0.265 ms
64 bytes from 172.25.250.254: icmp_seq=3 ttl=63 time=0.267 ms

--- 172.25.250.254 ping statistics ---


3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.265/0.300/0.368/0.048 ms

[user@demo ~]$ ip netns exec qdhcp-0062e02b-7e40-407f-ac43-49e84de096ed \
tcpdump -qnntpi any icmp
IP 192.168.0.2 > 172.25.250.254: ICMP echo request, id 42731, seq 1, length 64
IP 172.25.250.254 > 192.168.0.2: ICMP echo reply, id 42731, seq 1, length 64
IP 192.168.0.2 > 172.25.250.254: ICMP echo request, id 42731, seq 2, length 64
IP 172.25.250.254 > 192.168.0.2: ICMP echo reply, id 42731, seq 2, length 64
IP 192.168.0.2 > 172.25.250.254: ICMP echo request, id 42731, seq 3, length 64
IP 172.25.250.254 > 192.168.0.2: ICMP echo reply, id 42731, seq 3, length 64

OpenStack Log Files


Each OpenStack service uses a log file located in /var/log/service/, where service is
the name of the service, such as nova. Inside the service's directory, there is one log file per
component. The following output lists the log files for OpenStack Networking services:


[root@overcloud-controller-0 neutron]# ls -al


total 512
drwxr-x---. 2 neutron neutron 144 May 30 20:34 .
drwxr-xr-x. 40 root root 4096 May 31 16:10 ..
-rw-r--r--. 1 neutron neutron 54540 May 31 16:16 dhcp-agent.log
-rw-r--r--. 1 neutron neutron 37025 May 31 16:17 l3-agent.log
-rw-r--r--. 1 neutron neutron 24094 May 31 16:16 metadata-agent.log
-rw-r--r--. 1 neutron neutron 91136 May 31 16:17 openvswitch-agent.log
-rw-r--r--. 1 neutron neutron 1097 May 31 16:13 ovs-cleanup.log
-rw-r--r--. 1 neutron neutron 298734 May 31 17:13 server.log

The log files use the standard logging levels defined by RFC 5424. The following table lists all log
levels and provides some examples that administrators are likely to encounter:

OpenStack Logging Levels


Level Explanation
TRACE Only logged if the service has a stack trace:

2015-09-18 17:32:47.156 649 TRACE neutron __import__(mod_str)
2015-09-18 17:32:47.156 649 TRACE neutron ValueError: Empty module name

DEBUG Logs all statements when debug is set to true in the service's configuration file.
INFO Logs informational messages. For example, an API call to the service:

2017-05-31 13:29:10.565 3537 INFO neutron.wsgi [req-d75c4ac3-
c338-410e-ae43-57f12fa34151 3b98aed2205547dca61dae9d774c228f
b51d4c2d48de4dc4a867a60ef1e24201 - - -] 172.25.249.200 - - [31/
May/2017 13:29:10] "GET /v2.0/ports.json?network_id=3772b5f7-
ee03-4ac2-9361-0119c15c5747&device_owner=network%3Adhcp HTTP/1.1" 200 1098
0.075020

AUDIT Logs significant events affecting server state or resources.


WARNING Logs non-fatal errors that prevent a request from executing successfully:

2017-05-30 16:33:10.628 3135 WARNING neutron.agent.securitygroups_rpc [-]
Driver configuration doesn't match with enable_security_group

ERROR Logs errors, such as miscommunication between two services:

2017-05-31 12:12:54.954 3540 ERROR oslo.messaging._drivers.impl_rabbit [-]
[0536dd37-7342-4763-a9b6-ec24e605ec1e] AMQP server on 172.25.249.200:5672
is unreachable: [Errno 111] ECONNREFUSED. Trying again in 6 seconds.
Client port: None

CRITICAL Logs critical errors that prevent a service from properly functioning:

2017-05-30 16:43:37.259 3082 CRITICAL nova
[req-4b5efa91-5f1f-4a68-8da5-8ad1f6b7e2f1 - - - - -]
MessagingTimeout: Timed out waiting for a reply to message ID
d778513388b748e5b99944aa42245f56

Most of the errors contain explicit statements about the nature of the problem, helping
administrators troubleshoot their environment. However, there are cases where the error that
is logged does not indicate the root cause of the problem. For example, if there is a critical error
being logged says nothing about what caused that error; such an error can be caused by a
blocking firewall rule or by a congested network. OpenStack services communicate through a
message broker, which provides a resilient communication mechanism between the services. This
allows most of the services to receive their messages even when there are network glitches.
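
When a log entry such as the AMQP ERROR shown in the previous table appears, one quick check
is whether the node can reach the message broker at all. The following is a minimal sketch only,
assuming RabbitMQ listens on its default port 5672 on the controller at 172.25.249.200 and that
the nc utility from nmap-ncat is installed; adjust the address and port to your environment:

[root@overcloud-controller-0 ~]# ss -tn state established '( dport = :5672 or sport = :5672 )'
[root@overcloud-controller-0 ~]# nc -z 172.25.249.200 5672 && echo "AMQP port reachable"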

Log files contain many entries, which can make it difficult to locate errors. Administrators can use
the grep command to filter on a specific log level. The following output indicates a network
timeout while a message was being exchanged between the DHCP agent and the OpenStack
Networking server.

[root@demo]# grep -R ERROR /var/log/neutron/dhcp-agent.log


ERROR neutron.agent.dhcp.agent [req-515b6204-73c5-41f3-8ac6-70561bbad73f - - - - -]
Failed reporting state!
ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/
dhcp/agent.py", line 688, in
ERROR neutron.agent.dhcp.agent ctx, self.agent_state, True)
ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/
rpc.py", line 88, in report_state
ERROR neutron.agent.dhcp.agent return method(context, 'report_state', **kwargs)
ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/
rpc/client.py", line 169, in
ERROR neutron.agent.dhcp.agent retry=self.retry)
ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/
transport.py", line 97, in _send
ERROR neutron.agent.dhcp.agent timeout=timeout, retry=retry)
ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/
_drivers/amqpdriver.py", line
ERROR neutron.agent.dhcp.agent retry=retry)
ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/
_drivers/amqpdriver.py", line
ERROR neutron.agent.dhcp.agent result = self._waiter.wait(msg_id, timeout)
ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/
_drivers/amqpdriver.py", line
ERROR neutron.agent.dhcp.agent message = self.waiters.get(msg_id, timeout=timeout)
ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/
_drivers/amqpdriver.py", line
ERROR neutron.agent.dhcp.agent 'to message ID %s' % msg_id)
ERROR neutron.agent.dhcp.agent MessagingTimeout: Timed out waiting for a reply to
message ID 486663d30ddc488e98c612363779f4be
ERROR neutron.agent.dhcp.agent
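
To scan for several severities at once, an extended regular expression works equally well. The
following is a minimal sketch; the file path is the server log shown earlier and the number of
lines kept is arbitrary:

[root@demo]# grep -E 'ERROR|CRITICAL' /var/log/neutron/server.log | tail -5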

Enabling Debug Mode


All OpenStack services use the same parameter, debug, in the DEFAULT section of their
configuration file to enable debug-level logging. To enable debug mode for a given service,
locate and open its configuration file. For example, to enable debug mode for the OpenStack
Networking DHCP agent, edit the /etc/neutron/dhcp_agent.ini configuration file, locate
the debug key, set it to True, and restart the service. To disable debug mode, give the key a
value of False.
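
The following is a minimal sketch of that change on a controller node, assuming the agent runs as
the neutron-dhcp-agent systemd service:

[root@overcloud-controller-0 ~]# vi /etc/neutron/dhcp_agent.ini
[DEFAULT]
debug = True
[root@overcloud-controller-0 ~]# systemctl restart neutron-dhcp-agent

Remember to set debug back to False and restart the agent again once the investigation is over,
because debug mode significantly increases the volume of log data.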

Troubleshooting Tips
When troubleshooting, administrators can start by drawing a diagram that details the network
topology. This helps in reviewing the network interfaces being used and how the servers are
connected to each other. They should also become familiar with the troubleshooting tools
presented in the table titled “Troubleshooting Utilities” of this section. When troubleshooting,
administrators can ask questions like the following (a short command sketch follows the list):

• Are the OpenStack Networking services running?

• Did the instance retrieve an IP address?

• If an instance failed to boot, was there a port binding issue?

• Are the bridges present on the network node?

• Can the instance be reached with the ping command from within the project's network namespace?
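
The following sketch maps each question to one possible command. The instance name, port UUID,
router UUID, and private IP are placeholders; substitute values from your own environment:

[user@demo ~(user-project)]$ neutron agent-list
[user@demo ~(user-project)]$ openstack console log show INSTANCE | grep -i ci-info
[user@demo ~(user-project)]$ openstack port show PORT_UUID | grep binding
[heat-admin@overcloud-controller-0 ~]$ sudo ovs-vsctl show
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec qrouter-ROUTER_UUID ping -c 3 PRIVATE_IP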

Introduction to easyOVS
easyOVS, available on GitHub, is an open source tool for OpenStack that lists and validates
the configuration of Open vSwitch bridges, Netfilter rules, and DVR setups. It can be used to map
the IP address of an instance to its virtual port, and to the VLAN tags and namespaces in use.
The tool is fully compatible with network namespaces.

The following output lists the Netfilter rules associated with a particular IP address:

EasyOVS > ipt vm 192.168.0.2


## IP = 192.168.0.2, port = qvo583c7038-d ##
PKTS SOURCE DESTINATION PROT OTHER
#IN:
672 all all all state RELATED,ESTABLISHED
0 all all tcp tcp dpt:22
0 all all icmp
0 192.168.0.4 all all
3 192.168.0.5 all all
8 10.0.0.2 all all
85784 192.168.0.3 all udp udp spt:67 dpt:68
#OUT:
196K all all udp udp spt:68 dpt:67
86155 all all all state RELATED,ESTABLISHED
1241 all all all
#SRC_FILTER:
59163 192.168.0.2 all all MAC FA:16:3E:9C:DC:3A

The following output shows information related to ports. In the following example, the query
matches both the port that uses the IP address 10.0.0.2 and the port whose UUID begins with
c4493802.

EasyOVS > query 10.0.0.2,c4493802


## port_id = f47c62b0-dbd2-4faa-9cdd-8abc886ce08f
status: ACTIVE
name:
allowed_address_pairs: []
admin_state_up: True
network_id: ea3928dc-b1fd-4a1a-940e-82b8c55214e6
tenant_id: 3a55e7b5f5504649a2dfde7147383d02
extra_dhcp_opts: []
binding:vnic_type: normal
device_owner: compute:az_compute
mac_address: fa:16:3e:52:7a:f2
fixed_ips: [{u'subnet_id': u'94bf94c0-6568-4520-aee3-d12b5e472128', u'ip_address':
u'10.0.0.2'}]
id: f47c62b0-dbd2-4faa-9cdd-8abc886ce08f
security_groups: [u'7c2b801b-4590-4a1f-9837-1cceb7f6d1d0']
device_id: c3522974-8a08-481c-87b5-fe3822f5c89c
## port_id = c4493802-4344-42bd-87a6-1b783f88609a
status: ACTIVE
name:
allowed_address_pairs: []
admin_state_up: True
network_id: ea3928dc-b1fd-4a1a-940e-82b8c55214e6

tenant_id: 3a55e7b5f5504649a2dfde7147383d02
extra_dhcp_opts: []
binding:vnic_type: normal
device_owner: compute:az_compute
mac_address: fa:16:3e:94:84:90
fixed_ips: [{u'subnet_id': u'94bf94c0-6568-4520-aee3-d12b5e472128', u'ip_address':
u'10.0.0.4'}]
id: c4493802-4344-42bd-87a6-1b783f88609a
security_groups: [u'7c2b801b-4590-4a1f-9837-1cceb7f6d1d0']
device_id: 9365c842-9228-44a6-88ad-33d7389cda5f

Troubleshooting Network Issues


The following steps outline a general process for troubleshooting network issues; a short command sketch follows the list.

1. Review the security group rules to ensure that, for example, ICMP traffic is allowed.

2. Connect to the network nodes to review the implementation of routers and network
namespaces.

3. Use the ping command within network namespaces to reach the various network devices,
such as the interface for the router in the internal network.

4. Review the list of OpenStack Networking agents and their associated processes by using the
ps command to make sure that they are running.
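
As a minimal sketch of steps 1 and 4, assuming the instance uses the default security group and
the agents run on the controller node:

[user@demo ~(user-project)]$ openstack security group rule list default
[heat-admin@overcloud-controller-0 ~]$ ps -ef | grep -E 'neutron-(dhcp|l3|metadata|openvswitch)-agent'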

References
Further information is available in the Networking Guide for Red Hat OpenStack
Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/

easyOVS GitHub
https://github.com/yeasy/easyOVS

easyOVS Launchpad
https://launchpad.net/easyovs

RFC 5424: The Syslog Protocol
https://datatracker.ietf.org/doc/rfc5424/

Guided Exercise: Troubleshooting Network Issues

In this exercise, you will troubleshoot network connectivity issues in a project network.

Outcomes
You should be able to:

• Review the network implementation for a project.

• Use Linux tools to troubleshoot network connectivity issues.

Scenario
Users are complaining that they cannot reach their instances using the floating IPs. A user has
provided a test instance named research-app1 that can be used to troubleshoot the issue.

Before you begin


Log in to workstation as student using student as the password.

On workstation, run the lab network-troubleshooting setup command. This script
creates the research project for the developer1 user and creates the /home/student/
developer1-research-rc credentials file. The SSH private key is available at /home/
student/developer1-keypair1.pem. The script deploys the instance research-app1 in the
research project with a floating IP in the provider-172.25.250 network.

[student@workstation ~]$ lab network-troubleshooting setup

Steps
1. From workstation, source the credentials for the developer1 user and review the
environment.

1.1. Source the credentials for the developer1 user located at /home/student/
developer1-research-rc. List the instances in the environment.

[student@workstation ~]$ source developer1-research-rc


[student@workstation ~(developer1-research)]$ openstack server list -f json
[
{
"Status": "ACTIVE",
"Networks": "research-network1=192.168.1.N, 172.25.250.P",
"ID": "2cfdef0a-a664-4d36-b27d-da80b4b8626d",
"Image Name": "rhel7",
"Name": "research-app1"
}
]

1.2. Retrieve the name of the security group that the instance uses.

[student@workstation ~(developer1-research)]$ openstack server show \


research-app1 -f json
{
...output omitted...


"security_groups": [
{
"name": "default"
}
],
...output omitted...
}

1.3. List the rules for the default security group. Ensure that there is one rule that allows
traffic for SSH connections and one rule for ICMP traffic.

[student@workstation ~(developer1-research)]$ openstack security group \


rule list default -f json
[
...output omitted...
{
"IP Range": "0.0.0.0/0",
"Port Range": "22:22",
"Remote Security Group": null,
"ID": "3488a2cd-bd85-4b6e-b85c-e3cd7552fea6",
"IP Protocol": "tcp"
},
...output omitted...
{
"IP Range": "0.0.0.0/0",
"Port Range": "",
"Remote Security Group": null,
"ID": "f7588545-2d96-44a0-8ab7-46aa7cfbdb44",
"IP Protocol": "icmp"
}
]

1.4. List the networks in the environment.

[student@workstation ~(developer1-research)]$ openstack network list -f json


[
{
"Subnets": "8647161a-ada4-468f-ad64-8b7bb6f97bda",
"ID": "93e91b71-402e-45f6-a006-53a388e053f6",
"Name": "provider-172.25.250"
},
{
"Subnets": "ebdd4578-617c-4301-a748-30b7ca479e88",
"ID": "eed90913-f5f4-4e5e-8096-b59aef66c8d0",
"Name": "research-network1"
}
]

1.5. List the routers in the environment.

[student@workstation ~(developer1-research)]$ openstack router list -f json


[
{
"Status": "ACTIVE",
"Name": "research-router1",
"Distributed": false,
"Project": "ceb4194a5a3c40839a5b9ccf25c6794b",
"State": "UP",
"HA": false,

"ID": "8ef58601-1b60-4def-9e43-1935bb708938"
}
]

The output indicates that there is one router, research-router1.

1.6. Ensure that the router research-router1 has an IP address defined as a gateway
for the 172.25.250.0/24 network and an interface in the research-network1
network.

[student@workstation ~(developer1-research)]$ neutron router-port-list \


research-router1 -f json
[
{
"mac_address": "fa:16:3e:28:e8:85",
"fixed_ips": "{\"subnet_id\": \"ebdd4578-617c-4301-a748-30b7ca479e88\",
\"ip_address\": \"192.168.1.S\"}",
"id": "096c6e18-3630-4993-bafa-206e2f71acb6",
"name": ""
},
{
"mac_address": "fa:16:3e:d2:71:19",
"fixed_ips": "{\"subnet_id\": \"8647161a-ada4-468f-ad64-8b7bb6f97bda\",
\"ip_address\": \"172.25.250.R\"}",
"id": "c684682c-8acc-450d-9935-33234e2838a4",
"name": ""
}
]

2. Retrieve the floating IP assigned to the research-app1 instance and run the ping
command against that floating IP, 172.25.250.P. The command should fail.

[student@workstation ~(developer1-research)]$ openstack server list -f json


[
{
"Status": "ACTIVE",
"Networks": "research-network1=192.168.1.N, 172.25.250.P",
"ID": "2cfdef0a-a664-4d36-b27d-da80b4b8626d",
"Image Name": "rhel7",
"Name": "research-app1"
}
]
[student@workstation ~(developer1-research)]$ ping -c 3 172.25.250.P
PING 172.25.250.P (172.25.250.P) 56(84) bytes of data.
From 172.25.250.254 icmp_seq=1 Destination Host Unreachable
From 172.25.250.254 icmp_seq=2 Destination Host Unreachable
From 172.25.250.254 icmp_seq=3 Destination Host Unreachable

--- 172.25.250.P ping statistics ---


3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 1999ms

3. Attempt to connect to the instance as the root user at its floating IP. The command should fail.

[student@workstation ~(developer1-research)]$ ssh root@172.25.250.P


ssh: connect to host 172.25.250.P port 22: No route to host


4. Retrieve the IP address assigned to the router in the provider network, 172.25.250.R, and verify that it responds to the ping command.

[student@workstation ~(developer1-research)]$ openstack router show \


research-router1 -f json
{
"external_gateway_info":
"{\"network_id\":
...output omitted...
\"ip_address\": \"172.25.250.R\"}]}",
...output omitted...
}

[student@workstation ~(developer1-research)]$ ping -c 3 172.25.250.R


PING 172.25.250.R (172.25.250.R) 56(84) bytes of data.
64 bytes from 172.25.250.R: icmp_seq=1 ttl=64 time=0.642 ms
64 bytes from 172.25.250.R: icmp_seq=2 ttl=64 time=0.238 ms
64 bytes from 172.25.250.R: icmp_seq=3 ttl=64 time=0.184 ms

--- 172.25.250.R ping statistics ---


3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.184/0.354/0.642/0.205 ms

5. Review the namespace implementation on controller0. Use the ping command within
the qrouter namespace to reach the router's private IP.

5.1. Retrieve the UUID of the router research-router1. You will compare this UUID with
the one used in the name of the qrouter namespace.

[student@workstation ~(developer1-research)]$ openstack router show \


research-router1 -f json
{
...output omitted...
"id": "8ef58601-1b60-4def-9e43-1935bb708938",
"name": "research-router1"
}

5.2. Open another terminal and use the ssh command to log in to controller0 as the
heat-admin user. Review the namespace implementation. Ensure that the qrouter
namespace uses the ID returned by the previous command.

[student@workstation ~]$ ssh heat-admin@controller0


[heat-admin@overcloud-controller-0 ~]$ sudo ip netns list
qrouter-8ef58601-1b60-4def-9e43-1935bb708938

5.3. List the network devices in the qrouter namespace.

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \


qrouter-8ef58601-1b60-4def-9e43-1935bb708938 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
52: qr-096c6e18-36: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1446 qdisc noqueue
state UNKNOWN qlen 1000
link/ether fa:16:3e:28:e8:85 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.S/24 brd 192.168.1.255 scope global qr-096c6e18-36

valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe28:e885/64 scope link
valid_lft forever preferred_lft forever
53: qg-c684682c-8a: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1496 qdisc noqueue
state UNKNOWN qlen 1000
link/ether fa:16:3e:d2:71:19 brd ff:ff:ff:ff:ff:ff
inet 172.25.250.R/24 brd 172.25.250.255 scope global qg-c684682c-8a
valid_lft forever preferred_lft forever
inet 172.25.250.P/32 brd 172.25.250.P scope global qg-c684682c-8a
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fed2:7119/64 scope link
valid_lft forever preferred_lft forever

The output indicates that there are three devices: the loopback interface lo, the qr device
with the internal IP 192.168.1.S, and the qg device with the IPs 172.25.250.R and 172.25.250.P.

5.4. Within the qrouter namespace, run the ping command against the private IP of the
router, 192.168.1.S.

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \


qrouter-8ef58601-1b60-4def-9e43-1935bb708938 ping -c 3 192.168.1.S
PING 192.168.1.S (192.168.1.S) 56(84) bytes of data.
64 bytes from 192.168.1.S: icmp_seq=1 ttl=64 time=0.070 ms
64 bytes from 192.168.1.S: icmp_seq=2 ttl=64 time=0.041 ms
64 bytes from 192.168.1.S: icmp_seq=3 ttl=64 time=0.030 ms

--- 192.168.1.S ping statistics ---


3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.030/0.047/0.070/0.016 ms

6. From the first terminal, retrieve the private IP of the research-app1 instance. From the
second terminal, run the ping command against the private IP of the instance within the
qrouter namespace.

6.1. From the first terminal, retrieve the private IP of the research-app1 instance.

[student@workstation ~(developer1-research)]$ openstack server list -f json


[
{
"Status": "ACTIVE",
"Networks": "research-network1=192.168.1.N, 172.25.250.P",
"ID": "2cfdef0a-a664-4d36-b27d-da80b4b8626d",
"Image Name": "rhel7",
"Name": "research-app1"
}
]

6.2. From the second terminal, run the ping command in the qrouter namespace against
192.168.1.N. The output indicates that the command fails.

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \


qrouter-8ef58601-1b60-4def-9e43-1935bb708938 ping -c 3 192.168.1.N
PING 192.168.1.N (192.168.1.N) 56(84) bytes of data.
From 192.168.1.S icmp_seq=1 Destination Host Unreachable
From 192.168.1.S icmp_seq=2 Destination Host Unreachable
From 192.168.1.S icmp_seq=3 Destination Host Unreachable

--- 192.168.1.N ping statistics ---

3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2000ms

7. The previous output that listed the namespaces indicated that the qdhcp namespace is
missing. Review the namespaces on controller0 to confirm that the namespace is missing.

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns list


qrouter-8ef58601-1b60-4def-9e43-1935bb708938

8. The qdhcp namespace is created for the DHCP agent. List the running processes on
controller0 and use the grep command to filter for dnsmasq processes. The output indicates
that no dnsmasq process is running on the server.

[heat-admin@overcloud-controller-0 ~]$ ps axl | grep dnsmasq


0 1000 579973 534047 20 0 112648 960 pipe_w S+ pts/1 0:00 grep --
color=auto dnsmasq

9. From the first terminal, source the credentials of the administrative user, architect1,
located at /home/student/architect1-research-rc. List the Neutron agents to
ensure that there is one DHCP agent.

[student@workstation ~(developer1-research)]$ source architect1-research-rc


[student@workstation ~(architect1-research)]$ neutron agent-list -f json
...output omitted...
{
"binary": "neutron-dhcp-agent",
"admin_state_up": true,
"availability_zone": "nova",
"alive": ":-)",
"host": "overcloud-controller-0.localdomain",
"agent_type": "DHCP agent",
"id": "98fe6c9b-3f66-4d14-a88a-bfd7d819ddb7"
},
...output omitted...

10. List the Neutron ports to check whether an IP address is assigned to the DHCP agent in the
192.168.1.0/24 network.

[student@workstation ~(architect1-research)]$ openstack port list \


-f json | grep 192.168.1
"Fixed IP Addresses": "ip_address='192.168.1.S', subnet_id='ebdd4578-617c-4301-
a748-30b7ca479e88'",
"Fixed IP Addresses": "ip_address='192.168.1.N', subnet_id='ebdd4578-617c-4301-
a748-30b7ca479e88'",

The output indicates that there are only two ports in the subnet, the router interface and the
instance port, and no port for a DHCP agent. This indicates that research-subnet1 does not
run a DHCP server.

11. Update the subnet to run a DHCP server and confirm the updates in the environment.

11.1. Review the subnet properties. Locate the enable_dhcp property and confirm that it
reads False.

[student@workstation ~(architect1-research)]$ openstack subnet show \


research-subnet1

+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| allocation_pools | 192.168.1.2-192.168.1.254 |
| cidr | 192.168.1.0/24 |
...output omitted...
| enable_dhcp | False |
...output omitted...

11.2. Run the openstack subnet set command to update the subnet. The command does
not produce any output.

[student@workstation ~(architect1-research)]$ openstack subnet \


set --dhcp research-subnet1

11.3. Review the updated subnet properties. Locate the enable_dhcp property and confirm
that it reads True.

[student@workstation ~(architect1-research)]$ openstack subnet show \


research-subnet1
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| allocation_pools | 192.168.1.2-192.168.1.254 |
| cidr | 192.168.1.0/24 |
...output omitted...
| enable_dhcp | True |
...output omitted...

11.4. From the terminal connected to controller0, rerun the ps command. Ensure that a
dnsmasq process is now running.

[heat-admin@overcloud-controller-0 ~]$ ps axl | grep dnsmasq


5 99 649028 1 20 0 15548 892 poll_s S ? 0:00 dnsmasq
--no-hosts \
--no-resolv \
--strict-order \
--except-interface=lo \
...output omitted...
--dhcp-match=set:ipxe,175 \
--bind-interfaces \
--interface=tapdc429585-22 \
--dhcp-range=set:tag0,192.168.1.0,static,86400s \
--dhcp-option-force=option:mtu,1446 \
--dhcp-lease-max=256 \
--conf-file= \
--domain=openstacklocal
0 1000 650642 534047 20 0 112648 960 pipe_w S+ pts/1 0:00 grep --
color=auto dnsmasq

11.5. From the first terminal, rerun the openstack port list command. Ensure that there
is a third IP in the research-subnet1 network.

[student@workstation ~(architect1-research)]$ openstack port list \


-f json | grep 192.168.1


"Fixed IP Addresses": "ip_address='192.168.1.S',


subnet_id='ebdd4578-617c-4301-a748-30b7ca479e88'",
"Fixed IP Addresses": "ip_address='192.168.1.N',
subnet_id='ebdd4578-617c-4301-a748-30b7ca479e88'"
"Fixed IP Addresses": "ip_address='192.168.1.2',
subnet_id='ebdd4578-617c-4301-a748-30b7ca479e88'",

11.6. From the terminal connected to controller0, list the network namespaces. Ensure
that there is a new namespace called qdhcp.

[heat-admin@overcloud-controller-0 ~]$ ip netns list


qdhcp-eed90913-f5f4-4e5e-8096-b59aef66c8d0
qrouter-8ef58601-1b60-4def-9e43-1935bb708938

11.7. List the interfaces in the qdhcp namespace. Confirm that there is an interface with an
IP address of 192.168.1.2.

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \


qdhcp-eed90913-f5f4-4e5e-8096-b59aef66c8d0 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
61: tap7e9c0f8b-a7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1446 qdisc noqueue
state UNKNOWN qlen 1000
link/ether fa:16:3e:7c:45:e1 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.2/24 brd 192.168.1.255 scope global tap7e9c0f8b-a7
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe7c:45e1/64 scope link
valid_lft forever preferred_lft forever

12. From the first terminal, stop then start the research-app1 instance to reinitialize IP
assignment and cloud-init configuration.

12.1. Stop the instance.

[student@workstation ~(architect1-research)]$ openstack server stop \


research-app1

12.2. Confirm the instance is down.

[student@workstation ~(architect1-research)]$ openstack server show \


research-app1 -c status -f value
SHUTOFF

12.3. Start the instance.

[student@workstation ~(architect1-research)]$ openstack server start \


research-app1

13. Confirm that the instance is reachable.


13.1. Verify the floating IP is assigned to the research-app1 instance.

[student@workstation ~(architect1-research)]$ openstack server list -f json


[
{
"Status": "ACTIVE",
"Networks": "research-network1=192.168.1.N, 172.25.250.P",
"ID": "2cfdef0a-a664-4d36-b27d-da80b4b8626d",
"Image Name": "rhel7",
"Name": "research-app1"
}
]

13.2. Run the ping command against the floating IP 172.25.250.P until it responds.

[student@workstation ~(architect1-research)]$ ping 172.25.250.P


PING 172.25.250.P (172.25.250.P) 56(84) bytes of data.
...output omitted...
From 172.25.250.P icmp_seq=22 Destination Host Unreachable
From 172.25.250.P icmp_seq=23 Destination Host Unreachable
From 172.25.250.P icmp_seq=24 Destination Host Unreachable
From 172.25.250.P icmp_seq=25 Destination Host Unreachable
64 bytes from 172.25.250.P: icmp_seq=26 ttl=63 time=1.02 ms
64 bytes from 172.25.250.P: icmp_seq=27 ttl=63 time=0.819 ms
64 bytes from 172.25.250.P: icmp_seq=28 ttl=63 time=0.697 ms
...output omitted...
^C
--- 172.25.250.P ping statistics ---
35 packets transmitted, 10 received, +16 errors, 71% packet loss, time 34019ms
rtt min/avg/max/mdev = 4.704/313.475/2025.262/646.005 ms, pipe 4

13.3. Use ssh to connect to the instance. When finished, exit from the instance.

[student@workstation ~(architect1-research)]$ ssh -i developer1-keypair1.pem \


cloud-user@172.25.250.P
[cloud-user@research-app1 ~]$ exit

Cleanup
From workstation, run the lab network-troubleshooting cleanup script to clean up the
resources created in this exercise.

[student@workstation ~]$ lab network-troubleshooting cleanup


Lab: Managing and Troubleshooting Virtual Network Infrastructure

In this lab, you will troubleshoot the network connectivity of OpenStack instances.

Outcomes
You should be able to:

• Use Linux tools to review the network configuration of instances.

• Review the network namespaces for a project.

• Restore the network connectivity of OpenStack instances.

Scenario
Cloud users reported issues reaching their instances via their floating IPs. Both ping and ssh
connections time out. You have been tasked with troubleshooting and fixing these issues.

Before you begin


Log in to workstation as student using student as the password.

On workstation, run the lab network-review setup command. This script creates the
production project for the operator1 user and creates the /home/student/operator1-
production-rc credentials file. The SSH private key is available at /home/student/
operator1-keypair1.pem. The script deploys the instance production-app1 in the
production project with a floating IP in the provider-172.25.250 network.

[student@workstation ~]$ lab network-review setup

Steps
1. As the operator1 user, list the instances present in the environment. The credentials file
for the user is available at /home/student/operator1-production-rc. Ensure that the
instance production-app1 is running and has an IP in the 192.168.1.0/24 network.

2. Attempt to reach the instance via its floating IP by using the ping and ssh commands.
Confirm that the commands time out. The private key for the SSH connection is available at
/home/student/operator1-keypair1.pem.

3. Review the security rules for the security group assigned to the instance. Ensure that there
is a rule that authorizes packets sent by the ping command to pass.

4. As the administrative user, architect1, ensure that the external network


provider-172.25.250 is present. The credentials file for the user is available at /home/
student/architect1-production-rc. Review the network type and the physical
network defined for the network. Ensure that the network is a flat network that uses the
datacentre provider network.

5. As the operator1 user, list the routers in the environment. Ensure that production-
router1 is present, has a private network port, and is the gateway for the external network.

220 CL210-RHOSP10.1-en-2-20171006

Rendered for Nokia. Please do not distribute.


6. From the compute node, review the network implementation by listing the Linux bridges and
ensure that the ports are properly defined. Ensure that there is one bridge with two ports in
it. The bridge and its ports are named after the first 10 characters of the UUID of the port
that the instance production-app1 uses in the private network.

7. From workstation, use the ssh command to log in to controller0 as the heat-admin
user. List the network namespaces to ensure that there is a namespace for the router and
for the internal network production-network1. Review the UUID of the router and the
UUID of the internal network to make sure they match the UUIDs of the namespaces.

List the interfaces in the network namespace for the internal network. Within the private
network namespace, use the ping command to reach the private IP address of the router.
Run the ping command within the qrouter namespace against the IP assigned as a
gateway to the router. From the tenant network namespace, use the ping command to reach
the private IP of the instance.

8. From controller0, review the bridge mappings configuration. Ensure that the provider
network named datacentre is mapped to the br-ex bridge. Review the configuration
of the Open vSwitch bridge br-int. Ensure that there is a patch port for the connection
between the integration bridge and the external bridge. Retrieve the name of the peer
port for the patch from the integration bridge to the external bridge. Make any necessary
changes.

9. From workstation, use the ping command to reach the IP defined as a gateway for the
router and the floating IP associated with the instance. Use the ssh command to log in to the
instance production-app1 as the cloud-user user. The private key is available at
/home/student/operator1-keypair1.pem.

Evaluation
From workstation, run the lab network-review grade command to confirm the success
of this exercise. Correct any reported failures and rerun the command until successful.

[student@workstation ~]$ lab network-review grade

Cleanup
From workstation, run the lab network-review cleanup command to clean up this
exercise.

[student@workstation ~]$ lab network-review cleanup


Solution
In this lab, you will troubleshoot the network connectivity of OpenStack instances.

Outcomes
You should be able to:

• Use Linux tools to review the network configuration of instances.

• Review the network namespaces for a project.

• Restore the network connectivity of OpenStack instances.

Scenario
Cloud users reported issues reaching their instances via their floating IPs. Both ping and ssh
connections time out. You have been tasked with troubleshooting and fixing these issues.

Before you begin


Log in to workstation as student using student as the password.

On workstation, run the lab network-review setup command. This script creates the
production project for the operator1 user and creates the /home/student/operator1-
production-rc credentials file. The SSH private key is available at /home/student/
operator1-keypair1.pem. The script deploys the instance production-app1 in the
production project with a floating IP in the provider-172.25.250 network.

[student@workstation ~]$ lab network-review setup

Steps
1. As the operator1 user, list the instances present in the environment. The credentials file
for the user is available at /home/student/operator1-production-rc. Ensure that the
instance production-app1 is running and has an IP in the 192.168.1.0/24 network.

1.1. From workstation, source the operator1-production-rc file and list the running
instances.

[student@workstation ~]$ source ~/operator1-production-rc


[student@workstation ~(operator1-production)]$ openstack server list -f json
[
{
"Status": "ACTIVE",
"Networks": "production-network1=192.168.1.N, 172.25.250.P",
"ID": "ab497ff3-0335-4b17-bd3d-5aa2a4497bf0",
"Image Name": "rhel7",
"Name": "production-app1"
}
]

2. Attempt to reach the instance via its floating IP by using the ping and ssh commands.
Confirm that the commands time out. The private key for the SSH connection is available at
/home/student/operator1-keypair1.pem.

2.1. Run the ping command against the floating IP 172.25.250.P. The command should
fail.


[student@workstation ~(operator1-production)]$ ping -c 3 172.25.250.P


PING 172.25.250.P (172.25.250.P) 56(84) bytes of data.
From 172.25.250.254 icmp_seq=1 Destination Host Unreachable
From 172.25.250.254 icmp_seq=2 Destination Host Unreachable
From 172.25.250.254 icmp_seq=3 Destination Host Unreachable

--- 172.25.250.P ping statistics ---


3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 1999ms

2.2. Attempt to connect to the instance using the ssh command. The command should fail.

[student@workstation ~(operator1-production)]$ ssh -i ~/operator1-keypair1.pem \


cloud-user@172.25.250.P
ssh: connect to host 172.25.250.P port 22: No route to host

3. Review the security rules for the security group assigned to the instance. Ensure that there
is a rule that authorizes packets sent by the ping command to pass.

3.1. Retrieve the name of the security group that the instance production-app1 uses.

[student@workstation ~(operator1-production)]$ openstack server show \


production-app1
+--------------------------------------+-------------------------+
| Field | Value |
+--------------------------------------+-------------------------+
...output omitted...
| security_groups | [{u'name': u'default'}] |
...output omitted...
+--------------------------------------+-------------------------+

3.2. List the rules in the default security group. Ensure that there is a rule for ICMP traffic.

[student@workstation ~(operator1-production)]$ openstack security group \


rule list default -f json
[
{
"IP Range": "0.0.0.0/0",
"Port Range": "",
"Remote Security Group": null,
"ID": "68baac6e-7981-4326-a054-e8014565be6e",
"IP Protocol": "icmp"
},
...output omitted...

The output indicates that there is a rule for ICMP traffic, so the security group is not
blocking the ping packets and the environment requires further troubleshooting.

4. As the administrative user, architect1, ensure that the external network


provider-172.25.250 is present. The credentials file for the user is available at /home/
student/architect1-production-rc. Review the network type and the physical
network defined for the network. Ensure that the network is a flat network that uses the
datacentre provider network.


4.1. Source the architect1 credentials. List the networks. Confirm that the
provider-172.25.250 network is present.

[student@workstation ~(operator1-production)]$ source ~/architect1-production-rc


[student@workstation ~(architect1-production)]$ openstack network list -f json
[
{
"Subnets": "2b5110fd-213f-45e6-8761-2e4a2bcb1457",
"ID": "905b4d65-c20f-4cac-88af-2b8e0d2cf47e",
"Name": "provider-172.25.250"
},
{
"Subnets": "a4c40acb-f532-4b99-b8e5-d1df14aa50cf",
"ID": "712a28a3-0278-4b4e-94f6-388405c42595",
"Name": "production-network1"
}
]

4.2. Review the provider-172.25.250 network details, including the network type and
the physical network defined.

[student@workstation ~(architect1-production)]$ openstack network \


show provider-172.25.250
+---------------------------+--------------------------------------+
| Field | Value |
+---------------------------+--------------------------------------+
| admin_state_up | UP |
| availability_zone_hints | |
| availability_zones | nova |
| created_at | 2017-06-02T16:37:48Z |
| description | |
| id | 905b4d65-c20f-4cac-88af-2b8e0d2cf47e |
| ipv4_address_scope | None |
| ipv6_address_scope | None |
| is_default | False |
| mtu | 1496 |
| name | provider-172.25.250 |
| port_security_enabled | True |
| project_id | 91f3ed0e78ad476495a6ad94fbd7d2c1 |
| project_id | 91f3ed0e78ad476495a6ad94fbd7d2c1 |
| provider:network_type | flat |
| provider:physical_network | datacentre |
| provider:segmentation_id | None |
| qos_policy_id | None |
| revision_number | 6 |
| router:external | External |
| shared | True |
| status | ACTIVE |
| subnets | 2b5110fd-213f-45e6-8761-2e4a2bcb1457 |
| tags | [] |
| updated_at | 2017-06-02T16:37:51Z |
+---------------------------+--------------------------------------+

5. As the operator1 user, list the routers in the environment. Ensure that production-
router1 is present, has a private network port, and is the gateway for the external network.

5.1. Source the operator1-production-rc credentials file and list the routers in the
environment.


[student@workstation ~(architect1-production)]$ source ~/operator1-production-rc


[student@workstation ~(operator1-production)]$ openstack router list -f json
[
{
"Status": "ACTIVE",
"Name": "production-router1",
"Distributed": "",
"Project": "91f3ed0e78ad476495a6ad94fbd7d2c1",
"State": "UP",
"HA": "",
"ID": "e64e7ed3-8c63-49ab-8700-0206d1b0f954"
}
]

5.2. Display the router details. Confirm that the router is the gateway for the external
network provider-172.25.250.

[student@workstation ~(operator1-production)]$ openstack router show \


production-router1
+-------------------------+-----------------------------------------+
| Field | Value |
+-------------------------+-----------------------------------------+
| admin_state_up | UP |
| availability_zone_hints | |
| availability_zones | nova |
| created_at | 2017-06-02T17:25:00Z |
| description | |
| external_gateway_info | {"network_id": "905b(...)f47e", |
| | "enable_snat": true, |
| | "external_fixed_ips": |
| | [{"subnet_id": "2b51(...)1457", |
| | "ip_address": |
| | "172.25.250.S"}]} |
| flavor_id | None |
| id | e64e7ed3-8c63-49ab-8700-0206d1b0f954 |
| name | production-router1 |
| project_id | 91f3ed0e78ad476495a6ad94fbd7d2c1 |
| project_id | 91f3ed0e78ad476495a6ad94fbd7d2c1 |
| revision_number | 7 |
| routes | |
| status | ACTIVE |
| updated_at | 2017-06-02T17:25:04Z |
+-------------------------+-----------------------------------------+

5.3. Use ping to test the IP defined as the router gateway interface. Observe that the command
times out.

[student@workstation ~(operator1-production)]$ ping -W 5 -c 3 172.25.250.S


PING 172.25.250.S (172.25.250.S) 56(84) bytes of data.

--- 172.25.250.S ping statistics ---


3 packets transmitted, 0 received, 100% packet loss, time 1999ms

The ping test was unable to reach the external gateway interface of the router from an
external host, but the root cause is still unknown, so continue troubleshooting.


6. From the compute node, review the network implementation by listing the Linux bridges and
ensure that the ports are properly defined. Ensure that there is one bridge with two ports in
it. The bridge and its ports are named after the first 10 characters of the UUID of the port
that the instance production-app1 uses in the private network.

From workstation, use ssh to connect to compute0 as the heat-admin user. Review
the configuration of the Open vSwitch integration bridge. Ensure that the vEth pair, which
has one port attached to the Linux bridge, has its other port in the integration bridge. Exit from
the virtual machine.

6.1. From the first terminal, list the network ports. Identify the port whose fixed IP
address matches the private IP of the instance. In this example, the port UUID is
04b3f285-7183-4673-836b-317d80c27904; its first characters appear in the bridge and
port names on the compute node.

[student@workstation ~(operator1-production)]$ openstack port list -f json


[
{
"Fixed IP Addresses": "ip_address='192.168.1.N',
subnet_id='a4c40acb-f532-4b99-b8e5-d1df14aa50cf'",
"ID": "04b3f285-7183-4673-836b-317d80c27904",
"MAC Address": "fa:16:3e:c8:cb:3d",
"Name": ""
},
...output omitted...

6.2. Use the ssh command to log in to compute0 as the heat-admin user. Use the brctl
command to list the Linux bridges. Ensure that there is a qbr bridge with two ports in it.
The bridge and the ports are named after the first 10 characters of the UUID of the
instance's port in the private network.

[student@workstation ~] $ ssh heat-admin@compute0


[heat-admin@overcloud-compute-0 ~]$ brctl show
bridge name bridge id STP enabled interfaces
qbr04b3f285-71 8000.9edbfc39d5a5 no qvb04b3f285-71
tap04b3f285-71

6.3. From the compute0 virtual machine, use sudo to list the network ports in the
integration bridge, br-int. Ensure that the qvo port of the vEth pair is present in the
integration bridge.

[heat-admin@overcloud-compute-0 ~]$ sudo ovs-vsctl list-ifaces br-int


int-br-ex
patch-tun
qvo04b3f285-71

The qvo port exists as expected, so continue troubleshooting.

6.4. Exit from compute0.

[heat-admin@overcloud-compute-0 ~]$ exit


[student@workstation ~]$


7. From workstation, use the ssh command to log in to controller0 as the heat-admin
user. List the network namespaces to ensure that there is a namespace for the router and
for the internal network production-network1. Review the UUID of the router and the
UUID of the internal network to make sure they match the UUIDs of the namespaces.

List the interfaces in the network namespace for the internal network. Within the private
network namespace, use the ping command to reach the private IP address of the router.
Run the ping command within the qrouter namespace against the IP assigned as a
gateway to the router. From the tenant network namespace, use the ping command to reach
the private IP of the instance.

7.1. Use the ssh command to log in to controller0 as the heat-admin user. List the
network namespaces.

[student@workstation ~] $ ssh heat-admin@controller0


[heat-admin@overcloud-controller-0 ~]$ ip netns list
qrouter-e64e7ed3-8c63-49ab-8700-0206d1b0f954
qdhcp-712a28a3-0278-4b4e-94f6-388405c42595

7.2. From the previous terminal, retrieve the UUID of the router production-router1.
Ensure that the output matches the qrouter namespace.

[student@workstation ~(operator1-production)]$ openstack router show \


production-router1
+-------------------------+-------------------------------------------+
| Field | Value |
+-------------------------+-------------------------------------------+
...output omitted... |
| flavor_id | None |
| id | e64e7ed3-8c63-49ab-8700-0206d1b0f954 |
| name | production-router1 |
...output omitted...
| updated_at | 2017-06-02T17:25:04Z |
+-------------------------+-------------------------------------------+

7.3. Retrieve the UUID of the private network, production-network1. Ensure that the
output matches the qdhcp namespace.

[student@workstation ~(operator1-production)]$ openstack network \


show production-network1
+-------------------------+--------------------------------------+
| Field | Value |
+-------------------------+--------------------------------------+
...output omitted...
| description | |
| id | 712a28a3-0278-4b4e-94f6-388405c42595 |
...output omitted...
+-------------------------+--------------------------------------+

7.4. Use the neutron command to retrieve the interfaces of the router production-
router1.

[student@workstation ~(operator1-production)]$ neutron router-port-list \


production-router1
+--------------------------------------+------+-------------------+-----------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                     |
+--------------------------------------+------+-------------------+-----------------------------------------------+
| 30fc535c-85a9-4be4-b219-e810deec88d1 |      | fa:16:3e:d4:68:d3 | {"subnet_id":                                 |
|                                      |      |                   | "a4c40acb-f532-4b99-b8e5-d1df14aa50cf",       |
|                                      |      |                   | "ip_address": "192.168.1.R"}                  |
| bda4e07f-64f4-481d-a0bd-01791c39df92 |      | fa:16:3e:90:4f:45 | {"subnet_id":                                 |
|                                      |      |                   | "2b5110fd-213f-45e6-8761-2e4a2bcb1457",       |
|                                      |      |                   | "ip_address": "172.25.250.S"}                 |
+--------------------------------------+------+-------------------+-----------------------------------------------+

7.5. From the terminal connected to the controller, use the ping command within the qdhcp
namespace to reach the private IP of the router.

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \


qdhcp-712a28a3-0278-4b4e-94f6-388405c42595 ping -c 3 192.168.1.R
PING 192.168.1.R (192.168.1.R) 56(84) bytes of data.
64 bytes from 192.168.1.R: icmp_seq=1 ttl=64 time=0.107 ms
64 bytes from 192.168.1.R: icmp_seq=2 ttl=64 time=0.041 ms
64 bytes from 192.168.1.R: icmp_seq=3 ttl=64 time=0.639 ms

--- 192.168.1.R ping statistics ---


3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.041/0.262/0.639/0.268 ms

7.6. Within the router namespace, qrouter, run the ping command against the IP defined
as a gateway in the 172.25.250.0/24 network.

[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \


qrouter-e64e7ed3-8c63-49ab-8700-0206d1b0f954 ping -c 3 172.25.250.S
PING 172.25.250.S (172.25.250.S) 56(84) bytes of data.
64 bytes from 172.25.250.S: icmp_seq=1 ttl=64 time=0.091 ms
64 bytes from 172.25.250.S: icmp_seq=2 ttl=64 time=0.037 ms
64 bytes from 172.25.250.S: icmp_seq=3 ttl=64 time=0.597 ms

--- 172.25.250.S ping statistics ---


3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.037/0.241/0.597/0.252 ms

7.7. Retrieve the IP of the instance in the internal network.

[student@workstation ~(operator1-production)]$ openstack server list -f json


[
{
"Status": "ACTIVE",
"Networks": "production-network1=192.168.1.N, 172.25.250.P",
"ID": "ab497ff3-0335-4b17-bd3d-5aa2a4497bf0",
"Image Name": "rhel7",
"Name": "production-app1"
}
]

7.8. Use the ping command in the qdhcp namespace to reach the private IP of the instance
production-app1.


[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec \


qdhcp-712a28a3-0278-4b4e-94f6-388405c42595 ping -c 3 192.168.1.N
PING 192.168.1.N (192.168.1.N) 56(84) bytes of data.
64 bytes from 192.168.1.N: icmp_seq=1 ttl=64 time=0.107 ms
64 bytes from 192.168.1.N: icmp_seq=2 ttl=64 time=0.041 ms
64 bytes from 192.168.1.N: icmp_seq=3 ttl=64 time=0.639 ms

--- 192.168.1.N ping statistics ---


3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.041/0.262/0.639/0.268 ms

8. From controller0, review the bridge mappings configuration. Ensure that the provider
network named datacentre is mapped to the br-ex bridge. Review the configuration
of the Open vSwitch bridge br-int. Ensure that there is a patch port for the connection
between the integration bridge and the external bridge. Retrieve the name of the peer
port for the patch from the integration bridge to the external bridge. Make any necessary
changes.

8.1. From controller0, as the root user, review the bridge mappings configuration.
Bridge mappings for Open vSwitch are defined in the /etc/neutron/plugins/ml2/
openvswitch_agent.ini configuration file. Ensure that the provider network name,
datacentre, is mapped to the br-ex bridge.

[heat-admin@overcloud-controller-0 ~]$ cd /etc/neutron/plugins/ml2/


[heat-admin@overcloud-controller-0 ml2]$ sudo grep \
bridge_mappings openvswitch_agent.ini
#bridge_mappings =
bridge_mappings =datacentre:br-ex

8.2. Review the ports in the integration bridge, br-int. Ensure that there is a patch port in
the integration bridge. The output lists phy-br-ex as the peer for the patch port.

[heat-admin@overcloud-controller-0 ml2]$ sudo ovs-vsctl show


...output omitted...
Bridge br-int
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
Port "tapfabe9e7e-0b"
tag: 1
Interface "tapfabe9e7e-0b"
type: internal
Port "qg-bda4e07f-64"
tag: 3
Interface "qg-bda4e07f-64"
type: internal
Port int-br-ex
Interface int-br-ex
type: patch
options: {peer=phy-br-ex}
Port patch-tun
Interface patch-tun
type: patch
options: {peer=patch-int}
Port "qr-30fc535c-85"
tag: 1
Interface "qr-30fc535c-85"


type: internal
Port br-int
Interface br-int
type: internal

8.3. List the ports in the external bridge, br-ex. The output indicates that the port phy-br-
ex is absent from the bridge.

[heat-admin@overcloud-controller-0 ml2]$ sudo ovs-vsctl show


...output omitted...
Bridge br-ex
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
Port "eth2"
Interface "eth2"
Port br-ex
Interface br-ex
type: internal

8.4. Patch ports are managed by the neutron-openvswitch-agent, which uses the
bridge mappings for Open vSwitch bridges. Restart the neutron-openvswitch-
agent.

[heat-admin@overcloud-controller-0 ml2]$ sudo systemctl restart \


neutron-openvswitch-agent.service

8.5. Wait a minute, then list the ports in the external bridge, br-ex. Ensure that the patch
port phy-br-ex is present in the external bridge.

[heat-admin@overcloud-controller-0 ml2]$ sudo ovs-vsctl show


...output omitted...
Bridge br-ex
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
Port "eth2"
Interface "eth2"
Port br-ex
Interface br-ex
type: internal
Port phy-br-ex
Interface phy-br-ex
type: patch
options: {peer=int-br-ex}

9. From workstation, use the ping command to reach the IP defined as a gateway for the
router and the floating IP associated with the instance. Use the ssh command to log in to the
instance production-app1 as the cloud-user user. The private key is available at
/home/student/operator1-keypair1.pem.

9.1. Use the ping command to reach the IP of the router defined as a gateway.

[student@workstation ~(operator1-production)]$ ping -W 5 -c 3 172.25.250.S


PING 172.25.250.S (172.25.250.S) 56(84) bytes of data.
64 bytes from 172.25.250.S: icmp_seq=1 ttl=64 time=0.658 ms


64 bytes from 172.25.250.S: icmp_seq=2 ttl=64 time=0.273 ms
64 bytes from 172.25.250.S: icmp_seq=3 ttl=64 time=0.297 ms

--- 172.25.250.S ping statistics ---


3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.273/0.409/0.658/0.176 ms

9.2. Retrieve the floating IP allocated to the production-app1 instance.

[student@workstation ~(operator1-production)]$ openstack server list -f json


[
{
"Status": "ACTIVE",
"Networks": "production-network1=192.168.1.N, 172.25.250.P",
"ID": "ab497ff3-0335-4b17-bd3d-5aa2a4497bf0",
"Image Name": "rhel7",
"Name": "production-app1"
}
]

9.3. Use the ping command to reach the floating IP allocated to the instance.

[student@workstation ~(operator1-production)]$ ping -W 5 -c 3 172.25.250.P


PING 172.25.250.P (172.25.250.P) 56(84) bytes of data.
64 bytes from 172.25.250.P: icmp_seq=1 ttl=63 time=0.658 ms
64 bytes from 172.25.250.P: icmp_seq=2 ttl=63 time=0.616 ms
64 bytes from 172.25.250.P: icmp_seq=3 ttl=63 time=0.690 ms

--- 172.25.250.P ping statistics ---


3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.616/0.654/0.690/0.042 ms

9.4. Use the ssh command to log in to the instance as the cloud-user user. The private
key is available at /home/student/operator1-keypair1.pem. Exit from the
instance.

[student@workstation ~(operator1-production)]$ ssh -i ~/operator1-keypair1.pem \


cloud-user@172.25.250.P
[cloud-user@production-app1 ~]$ exit
[student@workstation ~(operator1-production)]$

Evaluation
From workstation, run the lab network-review grade command to confirm the success
of this exercise. Correct any reported failures and rerun the command until successful.

[student@workstation ~]$ lab network-review grade

Cleanup
From workstation, run the lab network-review cleanup command to clean up this
exercise.

[student@workstation ~]$ lab network-review cleanup


Summary
In this chapter, you learned:

• Software-defined networking (SDN) is a networking model that allows network administrators to
manage network services through the abstraction of several networking layers. SDN decouples
the software that handles the traffic, called the control plane, from the underlying mechanisms
that route the traffic, called the data plane.

• OpenStack Networking (Neutron) is the SDN networking project that provides Networking-as-
a-service (NaaS) in virtual environments. It implements traditional networking features such as
subnetting, bridging, VLANs, and more recent technologies, such as VXLAN and GRE tunnels.

• The Modular Layer 2 (ML2) plug-in is a framework that enables the use of various
technologies. Administrators can interact with Open vSwitch or any vendor technology, such as
Cisco equipment, thanks to the various plug-ins available for OpenStack Networking.

• When troubleshooting, administrators can use a variety of tools, such as ping, ip,
traceroute, and tcpdump.


TRAINING
CHAPTER 6

MANAGING RESILIENT
COMPUTE RESOURCES

Overview

Goal         Add compute nodes, manage shared storage, and perform instance live
             migration.

Objectives   • View introspection data, orchestration templates, and configuration
               manifests used to build the Overcloud.
             • Add a compute node to the Overcloud using the Undercloud.
             • Perform instance live migration using block storage.
             • Configure shared storage for Nova compute services and perform
               instance live migration with shared storage.

Sections     • Configuring an Overcloud Deployment (and Guided Exercise)
             • Scaling Compute Nodes (and Guided Exercise)
             • Migrating Instances using Block Storage (and Guided Exercise)
             • Migrating Instances using Shared Storage (and Guided Exercise)

Lab          • Managing Resilient Compute Resources



Configuring an Overcloud Deployment

Objectives
After completing this section, students should be able to:

• Prepare to deploy an overcloud

• Describe the undercloud introspection process

• Describe the overcloud orchestration process

Red Hat OpenStack Platform director is the undercloud, with components for provisioning and
managing the infrastructure nodes that will become the overcloud. An undercloud is responsible
for planning overcloud roles, creating the provisioning network configuration and services,
locating and inventorying nodes prior to deployment, and running the workflow service that
facilitates the deployment process. The Red Hat OpenStack Platform director installation comes
complete with sample deployment templates and both command-line and web-based user
interface tools for configuring and monitoring overcloud deployments.

Note
Underclouds and tools for provisioning overclouds are relatively new technologies and
are still evolving. The choices for overcloud design and configuration are as limitless
as the use cases for which they are built. The following demonstration and lecture is
an introduction to undercloud tasks and overcloud preparation, and is not intended
to portray recommended practice for any specific use case. The cloud architecture
presented here is designed to satisfy the technical requirements of this classroom.

Introspecting Nodes
To provision overcloud nodes, the undercloud is configured with a provisioning network and IPMI
access information about the nodes it will manage. The provisioning network is a large-capacity,
dedicated, and isolated network, separate from the normal public network. During deployment,
orchestration will reconfigure nodes' network interfaces with Open vSwitch bridges, which would
cause the deployment process to disconnect if the provisioning and deployed networks shared
the same interface. After deployment, Red Hat OpenStack Platform director will continue to
manage and update the overcloud across this isolated, secure provisioning network, completely
segregated from both external and internal OpenStack traffic.
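
The IPMI access information is commonly supplied in a JSON node-definition file (instackenv.json
in the default director documentation) when the nodes are registered with the undercloud. The
fragment below is a hedged sketch only; the node name, credentials, addresses, and MAC address
are illustrative values, not the classroom configuration:

{
  "nodes": [
    {
      "name": "compute2",
      "pm_type": "pxe_ipmitool",
      "pm_user": "admin",
      "pm_password": "password",
      "pm_addr": "172.25.249.112",
      "mac": ["52:54:00:00:00:02"]
    }
  ]
}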

Verify the provisioning network


View the undercloud.conf file, created to build the undercloud, to verify the provisioning
network. In the output below, the DHCP address range, from dhcp_start to dhcp_end, is
the scope for the OpenStack Networking dnsmasq service managing the provisioning subnet.
Nodes deployed to the provisioning network are assigned an IP address from this scope for their
provisioning NIC. The inspection_iprange is the scope for the bare metal dnsmasq service,
for assigning addresses temporarily to registered nodes during the PXE boot at the start of the
introspection process.

[user@undercloud]$ head -12 undercloud.conf


[DEFAULT]
local_ip = 172.25.249.200/24
undercloud_public_vip = 172.25.249.201
undercloud_admin_vip = 172.25.249.202
local_interface = eth0
masquerade_network = 172.25.249.0/24
dhcp_start = 172.25.249.51
dhcp_end = 172.25.249.59
network_cidr = 172.25.249.0/24
network_gateway = 172.25.249.200
inspection_iprange = 172.25.249.150,172.25.249.180
generate_service_certificate = true

View the undercloud's configured network interfaces. The br-ctlplane bridge is the
172.25.249.0 provisioning network; the eth1 interface is the 172.25.250.0 public network.

[user@undercloud]$ ip a | grep -E 'br-ctlplane|eth1'


3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
inet 172.25.250.200/24 brd 172.25.250.255 scope global eth1
7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
inet 172.25.249.200/24 brd 172.25.249.255 scope global br-ctlplane
inet 172.25.249.202/32 scope global br-ctlplane
inet 172.25.249.201/32 scope global br-ctlplane

The provisioning subnet is configured for DHCP. The Networking service has configured
a dnsmasq instance to manage the scope. Verify that the subnet defines the DNS nameserver
location and the default gateway, which are handed out to DHCP clients as scope options.

[user@undercloud]$ openstack subnet list -c ID -c Name


+--------------------------------------+-----------------+
| ID | Subnet |
+--------------------------------------+-----------------+
| 5e627758-6ec6-48f0-9ea6-1d4803f0196d | 172.25.249.0/24 |
+--------------------------------------+-----------------+
[user@undercloud]$ openstack subnet show 5e627758-6ec6-48f0-9ea6-1d4803f0196d
+-------------------+------------------------------------------------------------+
| Field | Value |
+-------------------+------------------------------------------------------------+
| allocation_pools | 172.25.249.51-172.25.249.59 |
| cidr | 172.25.249.0/24 |
| dns_nameservers | 172.25.250.200 |
| enable_dhcp | True |
| host_routes | destination='169.254.169.254/32', gateway='172.25.249.200' |
...output omitted...

Confirm resources for the nodes


The nodes to be deployed are typically bare metal physical systems, such as blade servers or
rack systems, with IPMI management interfaces for remote power control and administration.
Access each node to verify that the systems are configured with multiple NICs, and the correct
configuration of CPU, RAM, and hard disk space for the assigned deployment role. In this course,
the nodes are virtual machines with a small-scale configuration.

Power management in a cloud environment normally uses the IPMI management NIC built into
a server chassis. However, virtual machines do not normally have a lights-out-management
platform interface. Instead, they are controlled by the appropriate virtualization management
software, which connects to the running virtual machine's hypervisor to request power
management actions and events. In this classroom, a Baseboard Management Controller (BMC)
emulator is running on the power virtual machine, configured with one unique IP address
per virtual machine node. Upon receiving a valid IPMI request at the correct listener, the BMC
emulator sends the request to the hypervisor, which performs the request on the corresponding
virtual machine.
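
For example, a node's power state can be queried from the undercloud with ipmitool, using the
BMC address and credentials registered for that node. This is a sketch based on the classroom
values shown in the registration file below; the lanplus interface is an assumption about the BMC
emulator:

[user@undercloud]$ ipmitool -I lanplus -H 172.25.249.101 -U admin -P password power status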

Define and verify the MAC address, IPMI address, power management user name and password,
for each node to be registered, in the instack configuration file instackenv-initial.json.
This node registration file can be either JSON or YAML format. The following example shows the
instack configuration file in JSON format.

[user@undercloud]$ cat instackenv-initial.json


{
"nodes": [
{
"name": "controller0",
"pm_user": "admin",
"arch": "x86_64",
"mac": [ "52:54:00:00:f9:01" ],
"cpu": "2",
"memory": "8192",
"disk": "40",
"pm_addr": "172.25.249.101",
"pm_type": "pxe_ipmitool",
"pm_password": "password"
}
...output omitted...

The next step is to register the nodes with the Bare Metal service. The Workflow service manages
this task set, which includes the ability to schedule and monitor multiple tasks and actions.

[user@undercloud]$ openstack baremetal import --json instackenv-initial.json


Started Mistral Workflow. Execution ID: 112b4907-2499-4538-af5d-37d3f934f31c
Successfully registered node UUID 5206cc66-b513-4b01-ac1b-cd2d6de06b7d
Successfully registered node UUID 099b3fd5-370d-465b-ba7d-e9a19963d0af
Successfully registered node UUID 4fef49a8-fe55-4e96-ac26-f23f192a6408
Started Mistral Workflow. Execution ID: 2ecd83b1-045d-4536-9cf6-74a2db52baca
Successfully set all nodes to available.

Single or multiple hosts may be introspected simultaneously. When building new clouds,
performing bulk introspection is common. After an overcloud is operational, it is best to
set a manageable provisioning state on selected nodes, then invoke introspection only on those
selected nodes. Introspection times vary depending on the number of nodes and the throughput
capacity of the provisioning network, because the introspection image must be pushed to each
node during the PXE boot. If introspection appears to not finish, check the Bare Metal services
logs for troubleshooting.
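
For example, the inspector and conductor logs can be followed with journalctl while introspection
runs. This is a sketch using the same service units referenced later in this chapter:

[user@undercloud]$ sudo journalctl -l -u openstack-ironic-inspector \
-u openstack-ironic-inspector-dnsmasq -u openstack-ironic-conductor -f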

[user@undercloud]$ openstack baremetal node manage controller0


[user@undercloud]$ openstack baremetal node manage compute0
[user@undercloud]$ openstack baremetal node manage ceph0
[user@undercloud]$ openstack baremetal node list -c Name -c "Power State" \
-c "Provisioning State" -c Maintenance
+-------------+-------------+--------------------+-------------+
| Name | Power State | Provisioning State | Maintenance |
+-------------+-------------+--------------------+-------------+
| controller0 | power off | manageable | False |
| compute0 | power off | manageable | False |
| ceph0 | power off | manageable | False |
+-------------+-------------+--------------------+-------------+

[user@undercloud]$ openstack overcloud node introspect --all-manageable --provide


Started Mistral Workflow. Execution ID: 28ea0111-fac8-4298-8d33-1aaeb633f6b7
Waiting for introspection to finish...
Introspection for UUID 4fef49a8-fe55-4e96-ac26-f23f192a6408 finished successfully.
Introspection for UUID 099b3fd5-370d-465b-ba7d-e9a19963d0af finished successfully.
Introspection for UUID 5206cc66-b513-4b01-ac1b-cd2d6de06b7d finished successfully.
Introspection completed.
Started Mistral Workflow. Execution ID: 8a3ca1c5-a641-4e4c-8ef9-95ff9e35eb33

What happened during Introspection?


The managed nodes are configured to PXE boot by default. When introspection starts, IPMI (in
your classroom, the BMC emulation on the power node) is contacted to reboot the nodes. Each
node requests a DHCP address, a kernel, and a RAM disk to network boot, seen in the listing
below as bm-deploy-ramdisk and bm-deploy-kernel. This boot image extensively queries
and benchmarks the node, then reports the results to a Bare Metal listener on the undercloud,
which updates the Bare Metal database and the Object Store. The node is then shut down and is
available for the orchestration provisioning steps.

[user@undercloud]$ openstack image list


+--------------------------------------+------------------------+--------+
| ID | Name | Status |
+--------------------------------------+------------------------+--------+
| 7daae61f-18af-422a-a350-d9eac3fe9549 | bm-deploy-kernel | active |
| 6cee6ed5-bee5-47ef-96b9-3f0998876729 | bm-deploy-ramdisk | active |
...output omitted...
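
The hardware data gathered during introspection can be retrieved from the undercloud for
review. The following is a sketch; the UUID is the controller0 node registered earlier, and piping
through python -m json.tool only formats the stored JSON for readability:

[user@undercloud]$ openstack baremetal introspection data save \
5206cc66-b513-4b01-ac1b-cd2d6de06b7d | python -m json.tool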

Introspecting Nodes
The following steps outline the process to introspect managed nodes from the undercloud.

1. Install the undercloud and verify available services.

2. Create separate provisioning and public networks.

3. Verify undercloud baremetal DHCP configuration and listeners.

4. Configure provisioning network with DNS nameserver location.

5. Upload baremetal and overcloud network boot images to the Image Service.

6. Check baremetal nodes for correct NIC and disk physical configuration.

7. Gather node MAC addresses, IPMI addresses, access user names and passwords.

8. Create and import an instack node registration file.

9. Set nodes to manageable status; invoke the introspection process.

10. Review reported node characteristics.

Orchestrating an Overcloud
The undercloud has obtained sizing and configuration information about each node through
introspection. Nodes can be dynamically assigned to overcloud roles (controller, compute,
ceph-storage, block-storage, or object-storage) by comparing each node to capability
conditions set by the cloud administrator. Different roles usually have recognizable sizing
distinctions. In this classroom, the nodes are small-scale virtual machines that could be assigned
automatically, but assigning deployment roles manually is useful in many cases.

Overcloud deployment roles can be assigned in the orchestration templates by including
scheduler hints that direct the orchestration engine how to assign roles. Instead, here we assign
roles by creating profile tags to attach to flavors and nodes. First, create flavors for each of the
deployment roles with sufficient CPU, RAM, and disk for the role. The baremetal flavor is used
for building infrastructure servers other than one of the predefined roles, such as a Hadoop data-
processing application cluster. It is not mandatory to use the role name as the flavor name but it
is recommended for simplicity.
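
Flavors such as those in the listing below are created with the openstack flavor create
command. As a sketch, the control flavor could be created as follows, using the sizing shown in
the listing and the --id auto pattern used for the baremetal flavor later in this chapter:

[user@undercloud]$ openstack flavor create --id auto --ram 4096 --disk 30 --vcpus 1 control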

[user@undercloud]$ openstack flavor list -c Name -c RAM -c Disk -c Ephemeral -c VCPUs


+---------------+------+------+-----------+-------+
| Name | RAM | Disk | Ephemeral | VCPUs |
+---------------+------+------+-----------+-------+
| ceph-storage | 2048 | 10 | 0 | 1 |
| compute | 4096 | 20 | 0 | 1 |
| swift-storage | 2048 | 10 | 0 | 1 |
| control | 4096 | 30 | 0 | 1 |
| baremetal | 4096 | 20 | 0 | 1 |
| block-storage | 2048 | 10 | 0 | 1 |
+---------------+------+------+-----------+-------+

Add the correct profile tag to each flavor as a property using the capabilities index. Use the same
tag names when setting a profile on each node.
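
As a sketch, the profile tag could be attached to the control flavor as follows, mirroring the
openstack flavor set example shown later in this chapter:

[user@undercloud]$ openstack flavor set \
--property "capabilities:boot_option"="local" \
--property "capabilities:profile"="control" control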

[user@undercloud]$ openstack flavor show control


+----------------------------+------------------------------------------------+
| Field | Value |
+----------------------------+------------------------------------------------+
| disk | 30 |
| id | a761d361-5529-4992-8b99-6f9b2f0a3a42 |
| name | control |
| properties | capabilities:boot_option='local', |
| | capabilities:profile='control', |
| | cpu_arch='x86_64', name='control' |
| ram | 4096 |
| vcpus | 1 |
...output omitted...

Add the correct matching profile tag to each node as a property using the capabilities index.
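
As a sketch, the matching tag could be set on the controller0 node as follows; the capabilities
string matches the properties shown in the node listing below:

[user@undercloud]$ openstack baremetal node set \
--property "capabilities=profile:control,boot_option:local" controller0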

[user@undercloud]$ openstack baremetal node show controller0


+------------------------+----------------------------------------------------+
| Field | Value |
+------------------------+----------------------------------------------------+
| console_enabled | False |
| created_at | 2017-06-05T04:05:23+00:00 |
| driver | pxe_ipmitool |
| driver_info | {u'ipmi_password': u'******', u'ipmi_address': |
| | u'172.25.249.101', u'deploy_ramdisk': |
| | u'6cee6ed5-bee5-47ef-96b9-3f0998876729', |
| | u'deploy_kernel': u'7daae61f-18af- |
| | 422a-a350-d9eac3fe9549', u'ipmi_username': |
| | u'admin'} |
| name | controller0 |
| properties | {u'memory_mb': u'8192', u'cpu_arch': u'x86_64', |
| | u'local_gb': u'39', u'cpus': u'2', |
| | u'capabilities': |
| | u'profile:control,boot_option:local'} |
| uuid | 5206cc66-b513-4b01-ac1b-cd2d6de06b7d |
...output omitted...

Organizing orchestration templates


Red Hat OpenStack Platform director ships with a full set of working overcloud templates,
including many optional configuration environment files, in the /usr/share/openstack-
tripleo-heat-templates/ directory. During provisioning, the top-level file invoked is the
overcloud.j2.yaml file, referencing objects defined in the top-level overcloud-resource-
registry-puppet.j2.yaml file. These files are constructed using the Jinja2 template engine
for Python. The remaining template files are either YAML files or Puppet manifests. These
main files reference the remaining resource files or scripts, depending on the environment files
chosen.

[user@undercloud]$ ls -l /usr/share/openstack-tripleo-heat-templates/
-rw-r--r--. 1 root root 808 Jan 2 19:14 all-nodes-validation.yaml
-rw-r--r--. 1 root root 583 Jan 2 19:14 bootstrap-config.yaml
-rw-r--r--. 1 root root 20903 Jan 2 19:14 capabilities-map.yaml
drwxr-xr-x. 5 root root 75 May 19 13:22 ci
-rw-r--r--. 1 root root 681 Jan 2 19:14 default_passwords.yaml
drwxr-xr-x. 3 root root 128 May 19 13:22 deployed-server
drwxr-xr-x. 4 root root 168 May 19 13:22 docker
drwxr-xr-x. 4 root root 4096 May 19 13:22 environments
drwxr-xr-x. 6 root root 73 May 19 13:22 extraconfig
drwxr-xr-x. 2 root root 162 May 19 13:22 firstboot
-rw-r--r--. 1 root root 735 Jan 2 19:14 hosts-config.yaml
-rw-r--r--. 1 root root 325 Jan 2 19:14 j2_excludes.yaml
-rw-r--r--. 1 root root 2594 Jan 2 19:14 net-config-bond.yaml
-rw-r--r--. 1 root root 1895 Jan 2 19:14 net-config-bridge.yaml
-rw-r--r--. 1 root root 2298 Jan 2 19:14 net-config-linux-bridge.yaml
-rw-r--r--. 1 root root 1244 Jan 2 19:14 net-config-noop.yaml
-rw-r--r--. 1 root root 3246 Jan 2 19:14 net-config-static-bridge-with-external-dhcp.yaml
-rw-r--r--. 1 root root 2838 Jan 2 19:14 net-config-static-bridge.yaml
-rw-r--r--. 1 root root 2545 Jan 2 19:14 net-config-static.yaml
drwxr-xr-x. 5 root root 4096 May 19 13:22 network
-rw-r--r--. 1 root root 25915 Jan 2 19:14 overcloud.j2.yaml
-rw-r--r--. 1 root root 13866 Jan 17 12:44 overcloud-resource-registry-puppet.j2.yaml
drwxr-xr-x. 5 root root 4096 May 19 13:22 puppet
-rw-r--r--. 1 root root 6555 Jan 17 12:44 roles_data.yaml
drwxr-xr-x. 2 root root 26 May 19 13:22 validation-scripts

Recommended practice is to copy this whole directory structure to a new working directory, to
ensure that local customizations are not overwritten by package updates. In this classroom, the
working directory is /home/stack/templates/. The environments subdirectory contains the
sample configuration files to choose features and configurations for this overcloud deployment.
Create a new environment working subdirectory and copy only the needed environment files into
it. Similarly, create a configuration working subdirectory and save any modified template files
into it. The subdirectories are cl210-environment and cl210-configuration.

The classroom configuration includes environment files to build trunked VLANs, statically
configured node IP addresses, an explicit Ceph server layout, and more. The need for 3 NICs per
virtual machine required customizing existing templates, which were copied to the configuration
subdirectory before modification. Browse these files of interest to correlate template settings to
the live configuration:


• templates/cl210-environment/30-network-isolation.yaml
• templates/cl210-environment/32-network-environment.yaml
• templates/cl210-configuration/single-nic-vlans/controller.yaml
• templates/cl210-configuration/single-nic-vlans/compute.yaml
• templates/cl210-configuration/single-nic-vlans/ceph-storage.yaml

The final step is to start the deployment, specifying the main working directories for templates
and environment files. Deployment time varies greatly, depending on the number of nodes
being deployed and the features selected. Orchestration processes tasks in dependency order.
Although many tasks may be running on different nodes simultaneously, some tasks must
finish before others can begin. This required structure is organized into a workflow plan, which
manages the whole provisioning orchestration process.

[user@undercloud]$ openstack overcloud deploy \
--templates /home/stack/templates \
--environment-directory /home/stack/templates/cl210-environment
Removing the current plan files
Uploading new plan files
Started Mistral Workflow. Execution ID: 7c29ea92-c54e-4d52-bfb1-9614a485fa2d
Plan updated
Deploying templates in the directory /tmp/tripleoclient-0_T1mA/tripleo-heat-templates
Started Mistral Workflow. Execution ID: 1edb7bb3-27f5-4b0a-a248-29bf949a4d57

The undercloud returns the status of the overcloud stack.

[user@undercloud]$ openstack stack list


+--------------------+------------+--------------------+----------------------+
| ID | Stack Name | Stack Status | Creation Time |
+--------------------+------------+--------------------+----------------------+
| 6ce5fe42-5d16-451d | overcloud | CREATE_IN_PROGRESS | 2017-06-05T06:05:35Z |
| -88f9-a89de206d785 | | | |
+--------------------+------------+--------------------+----------------------+

Monitor the orchestration process on the console where the deployment command was invoked.
Orchestration plans that do not complete can be corrected, edited and restarted. The following
text displays when the overcloud stack deployment is complete.

Stack overcloud CREATE_COMPLETE

Started Mistral Workflow. Execution ID: 6ab02187-fc99-4d75-8b45-8354c8826066


/home/stack/.ssh/known_hosts updated.
Original contents retained as /home/stack/.ssh/known_hosts.old
Overcloud Endpoint: http://172.25.250.50:5000/v2.0
Overcloud Deployed

Query the undercloud about the status of the overcloud stack.

[user@undercloud]$ openstack stack list


+--------------------+------------+-----------------+----------------------+
| ID | Stack Name | Stack Status | Creation Time |
+--------------------+------------+-----------------+----------------------+
| 6ce5fe42-5d16-451d | overcloud | CREATE_COMPLETE | 2017-06-05T06:05:35Z |
| -88f9-a89de206d785 | | | |
+--------------------+------------+-----------------+----------------------+


What happened during Orchestration?


The managed nodes are configured to PXE boot by default. When orchestration starts, IPMI
(in this classroom, the BMC emulation on the power node) is contacted to reboot the nodes. Each node
requests a DHCP address and network boot image, seen in the listing below as overcloud-
full-initrd and overcloud-full-vmlinuz. This boot image runs as an iSCSI target
server to configure and publish the node's boot disk as an iSCSI target. It then contacts the Bare
Metal conductor, which connects to that iSCSI LUN, partitions it, then overwrites it, copying
the overcloud-full image to become the node's boot disk. The system then prepares and
performs a reboot, coming up on this new, permanent disk.

[user@undercloud]$ openstack image list


+--------------------------------------+------------------------+--------+
| ID | Name | Status |
+--------------------------------------+------------------------+--------+
...output omitted...
| f5725232-7474-4d78-90b9-92f75fe84615 | overcloud-full | active |
| daca43d2-67a3-4333-896e-69761e986431 | overcloud-full-vmlinuz | active |
| 1e346cba-a7f2-4535-b9a6-d9fa0bf68491 | overcloud-full-initrd | active |
+--------------------------------------+------------------------+--------+

On the following page, Figure 6.1: Bare Metal boot disk provisioning visually describes the
procedure for delivering a new boot disk to a node being provisioned. The overcloud-full
image is a working Red Hat Enterprise Linux system with all of the Red Hat OpenStack Platform
and Red Hat Ceph Storage packages already installed but not configured. By pushing the
same overcloud-full image to all nodes, any node could be sent instructions to build any of
the supported deployment roles: Controller, Compute, Ceph-Storage, Object-Storage, or Block-
Storage. When the node boots this image for the first time, the image is configured to send
a callback message to the Orchestration service to say that it is ready to be configured.
Orchestration then coordinates the sending and processing of resource instructions and Puppet
invocations that accomplish the remainder of the build and configuration of the node. When
orchestration is complete, the result is a complete server running as one of the deployment roles.


Figure 6.1: Bare Metal boot disk provisioning


Orchestrating an Overcloud
The following steps outline the process to orchestrate an overcloud from the undercloud.

1. Create the flavors for each node deployment role.

2. Assign matching profile tags to specify which nodes will be selected for which flavors.

3. Copy the default template directory, located at /usr/share/openstack-tripleo-heat-templates/,
to a new work directory.

4. Create the environment files required to customize the overcloud deployment.

5. Run the openstack overcloud deploy command. Use the --templates parameter to
specify the template directory. Use the --environment-directory parameter to specify
the environment file directory.

6. Use ssh to connect to each deployed node as the heat-admin user, to verify deployment.

7. Review the network interfaces, bridges, and disks to verify that each is correctly configured.

Completed Classroom Topology


On the following page, Figure 6.2: Completed classroom overcloud portrays four deployed nodes:
controller0, compute0, compute1, and ceph0. The compute1 node will be deployed later in
this chapter as an overcloud stack upgrade. Use this diagram as a reference when verifying the
live overcloud configuration.


Figure 6.2: Completed classroom overcloud


References

The Ironic developer documentation page
https://docs.openstack.org/developer/ironic/

The Mistral documentation page
https://docs.openstack.org/developer/mistral/index.html

The Heat documentation page
https://docs.openstack.org/developer/heat/

Further information about Red Hat OpenStack Platform director is available in the
Director Installation & Usage guide for Red Hat OpenStack Platform 10; at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/

Further information about template customization is available in the Advanced
Overcloud Customization guide for Red Hat OpenStack Platform 10; at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/


Guided Exercise: Configuring an Overcloud Deployment

In this exercise, you will view the results of the deployment tasks that created the overcloud on
your virtual machines. You will verify the configuration and status of the undercloud, then verify
the configuration and status of the overcloud.

Outcomes
You should be able to:

• Connect to and observe the undercloud system.

• View an introspection configuration.

• View an orchestration configuration.

• View an overcloud deployment configuration.

Before you begin
Log in to workstation as student using student as the password.

On workstation, run the lab resilience-overcloud-depl setup command. The script
verifies that overcloud nodes are accessible and running the correct OpenStack services.

[student@workstation ~]$ lab resilience-overcloud-depl setup

Steps
1. Log in to director and review the environment.

1.1. Use the ssh command to connect to director. Review the environment variables for the
stack user. OpenStack environment variables all begin with OS_.

[student@workstation ~]$ ssh stack@director


[stack@director ~]$ env | grep "OS_"
OS_IMAGE_API_VERSION=1
OS_PASSWORD=9ee0904a8dae300a37c4857222b10fb10a2b6db5
OS_AUTH_URL=https://172.25.249.201:13000/v2.0
OS_USERNAME=admin
OS_TENANT_NAME=admin
OS_NO_CACHE=True
OS_CLOUDNAME=undercloud

1.2. View the environment file for the stack user. This file is automatically sourced when
the stack user logs in. The OS_AUTH_URL variable in this file defines the Identity
Service endpoint of the undercloud.

[stack@director ~]$ grep "^OS_" stackrc


OS_PASSWORD=$(sudo hiera admin_password)
OS_AUTH_URL=https://172.25.249.201:13000/v2.0
OS_USERNAME=admin
OS_TENANT_NAME=admin
OS_BAREMETAL_API_VERSION=1.15
OS_NO_CACHE=True
OS_CLOUDNAME=undercloud

2. Review the network configuration for the undercloud.

2.1. Inspect the /home/stack/undercloud.conf configuration file. Locate the variables
that define the provisioning network, such as undercloud_admin_vip.

[DEFAULT]
local_ip = 172.25.249.200/24
undercloud_public_vip = 172.25.249.201
undercloud_admin_vip = 172.25.249.202
local_interface = eth0
masquerade_network = 172.25.249.0/24
dhcp_start = 172.25.249.51
dhcp_end = 172.25.249.59
network_cidr = 172.25.249.0/24
network_gateway = 172.25.249.200
inspection_iprange = 172.25.249.150,172.25.249.180
...output omitted...

2.2. Compare the IP addresses in the configuration file with the IP address assigned to the
virtual machine. Use the ip command to list the devices.

[stack@director ~]$ ip addr | grep -E 'eth1|br-ctlplane'


3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
inet 172.25.250.200/24 brd 172.25.250.255 scope global eth1
7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
inet 172.25.249.200/24 brd 172.25.249.255 scope global br-ctlplane
inet 172.25.249.202/32 scope global br-ctlplane
inet 172.25.249.201/32 scope global br-ctlplane

2.3. List the networks configured in the undercloud. If an overcloud is currently deployed,
then approximately six networks are displayed. If the overcloud has been deleted or
has not been deployed, only one network will display. Look for the provisioning network
named ctlplane. This display includes the subnets configured within the networks
listed. You will use the ID for the provisioning network's subnet in the next step.

[stack@director ~]$ openstack network list --long -c Name -c Subnets \
-c "Network Type"
+--------------+--------------------------------------+--------------+
| Name | Subnets | Network Type |
+--------------+--------------------------------------+--------------+
| external | 52af2265-5c3f-444f-b595-5cbdb56f434f | flat |
| tenant | 6d6f5f79-ed32-4c3c-8147-b7d84fb1e02c | flat |
| internal_api | 0e703161-c389-47b3-b6ab-e984e9b15bef | flat |
| storage_mgmt | 2ee6fb45-77bb-46b5-bb66-978d687b9558 | flat |
| ctlplane | 64f6a0a6-dc27-4c92-a81a-b294d1bb22a4 | flat |
| storage | 2d146a94-effc-461d-a38b-f7e4da319a2e | flat |
+--------------+--------------------------------------+--------------+

2.4. Display the subnet for the ctlplane provisioning network using the subnet ID obtained
in the previous step. The allocation_pools field is the DHCP scope, and the
dns_nameservers and gateway_ip fields are DHCP options, for an overcloud node's
provisioning network interface.


[stack@director ~]$ openstack subnet show 64f6a0a6-dc27-4c92-a81a-b294d1bb22a4


+-------------------+------------------------------------------------------+
| Field | Value |
+-------------------+------------------------------------------------------+
| allocation_pools | 172.25.249.51-172.25.249.59 |
| cidr | 172.25.249.0/24 |
| created_at | 2017-06-12T19:37:40Z |
| description | |
| dns_nameservers | 172.25.250.200 |
| enable_dhcp | True |
| gateway_ip | 172.25.249.200 |
...output omitted...

3. List the services, and their passwords, installed for the undercloud.

3.1. List the undercloud services running on director.

[stack@director ~]$ openstack service list -c Name -c Type


+------------------+-------------------------+
| Name | Type |
+------------------+-------------------------+
| zaqar-websocket | messaging-websocket |
| heat | orchestration |
| swift | object-store |
| aodh | alarming |
| mistral | workflowv2 |
| ceilometer | metering |
| keystone | identity |
| nova | compute |
| zaqar | messaging |
| glance | image |
| ironic | baremetal |
| ironic-inspector | baremetal-introspection |
| neutron | network |
+------------------+-------------------------+

3.2. Review the admin and other component service passwords located in the /home/
stack/undercloud-passwords.conf configuration file. You will use various service
passwords in later exercises.

[auth]
undercloud_db_password=eb35dd789280eb196dcbdd1e8e75c1d9f40390f0
undercloud_admin_token=529d7b664276f35d6c51a680e44fd59dfa310327
undercloud_admin_password=96c087815748c87090a92472c61e93f3b0dcd737
undercloud_glance_password=6abcec10454bfeec6948518dd3de6885977f6b65
undercloud_heat_encryption_key=45152043171b30610cb490bb40bff303
undercloud_heat_password=a0b7070cd8d83e59633092f76a6e0507f85916ed
undercloud_neutron_password=3a19afd3302615263c43ca22704625db3aa71e3f
undercloud_nova_password=d59c86b9f2359d6e4e19d59bd5c60a0cdf429834
undercloud_ironic_password=260f5ab5bd24adc54597ea2b6ea94fa6c5aae326
...output omitted...

4. View the configuration used to prepare for deploying the overcloud and the resulting
overcloud nodes.

4.1. View the /home/stack/instackenv-initial.json configuration file. The file was
used to define each overcloud node, including power management access settings.


{
"nodes": [
{
"name": "controller0",
"pm_user": "admin",
"arch": "x86_64",
"mac": [ "52:54:00:00:f9:01" ],
"cpu": "2",
"memory": "8192",
"disk": "40",
"pm_addr": "172.25.249.101",
"pm_type": "pxe_ipmitool",
"pm_password": "password"
...output omitted...

4.2. List the provisioned nodes in the current overcloud environment. This command lists
the nodes that were created using the configuration file shown in the previous step.

[stack@director ~]$ openstack baremetal node list -c Name -c 'Power State' \
-c 'Provisioning State' -c 'Maintenance'
+-------------+-------------+--------------------+-------------+
| Name | Power State | Provisioning State | Maintenance |
+-------------+-------------+--------------------+-------------+
| controller0 | power on | active | False |
| compute0 | power on | active | False |
| ceph0 | power on | active | False |
+-------------+-------------+--------------------+-------------+

4.3. List the servers in the environment. Review the status and the IP address of the nodes.
This command lists the overcloud servers built on the bare-metal nodes defined in the
previous step. The IP addresses assigned to the nodes are reachable from the director
virtual machine.

[stack@director ~]$ openstack server list -c Name -c Status -c Networks


+-------------------------+--------+------------------------+
| Name | Status | Networks |
+-------------------------+--------+------------------------+
| overcloud-controller-0 | ACTIVE | ctlplane=172.25.249.52 |
| overcloud-compute-0 | ACTIVE | ctlplane=172.25.249.53 |
| overcloud-cephstorage-0 | ACTIVE | ctlplane=172.25.249.58 |
+-------------------------+--------+------------------------+

5. Using the controller0 node and the control role as an example, review the settings that
define how a node is selected to be built for a server role.

5.1. List the flavors created for each server role in the environment. These flavors were
created to define the sizing for each deployment server role. It is recommended practice
that flavors are named for the roles for which they are used. However, properties set on
a flavor, not the flavor's name, determine its use.

[stack@director ~]$ openstack flavor list -c Name -c RAM -c Disk -c Ephemeral \
-c VCPUs
+---------------+------+------+-----------+-------+
| Name | RAM | Disk | Ephemeral | VCPUs |
+---------------+------+------+-----------+-------+
| ceph-storage | 2048 | 10 | 0 | 1 |
| compute | 4096 | 20 | 0 | 1 |
| swift-storage | 2048 | 10 | 0 | 1 |
| control | 4096 | 30 | 0 | 1 |
| baremetal | 4096 | 20 | 0 | 1 |
| block-storage | 2048 | 10 | 0 | 1 |
+---------------+------+------+-----------+-------+

5.2. Review the control flavor's properties by running the openstack flavor show
command. The capabilities settings include the profile='control' tag.
When this flavor is specified, it will only work with nodes that match these requested
capabilities, including the profile='control' tag.

[stack@director ~]$ openstack flavor show control


+----------------------------+------------------------------------------------+
| Field | Value |
+----------------------------+------------------------------------------------+
| disk | 30 |
| id | a761d361-5529-4992-8b99-6f9b2f0a3a42 |
| name | control |
| properties | capabilities:boot_option='local', |
| | profile='control', cpu_arch='x86_64', |
| ram | 4096 |
| vcpus | 1 |
...output omitted...

5.3. Review the controller0 node's properties field. The capabilities settings
include the same profile:control tag as defined on the control flavor. When a
flavor is specified during deployment, only a node that matches a flavor's requested
capabilities is eligible to be selected for deployment.

[stack@director ~]$ openstack baremetal node show controller0


+------------------------+----------------------------------------------------+
| Field | Value |
+------------------------+----------------------------------------------------+
| console_enabled | False |
| created_at | 2017-06-05T04:05:23+00:00 |
| driver | pxe_ipmitool |
| driver_info | {u'ipmi_password': u'******', u'ipmi_address': |
| | u'172.25.249.101', u'deploy_ramdisk': |
| | u'6cee6ed5-bee5-47ef-96b9-3f0998876729', |
| | u'deploy_kernel': u'7daae61f-18af- |
| | 422a-a350-d9eac3fe9549', u'ipmi_username': |
| | u'admin'} |
| extra | {u'hardware_swift_object': u'extra_hardware- |
| | 5206cc66-b513-4b01-ac1b-cd2d6de06b7d'} |
| name | controller0 |
| properties | {u'memory_mb': u'8192', u'cpu_arch': u'x86_64', |
| | u'local_gb': u'39', u'cpus': u'2', |
| | u'capabilities': |
| | u'profile:control,boot_option:local'} |
| uuid | 5206cc66-b513-4b01-ac1b-cd2d6de06b7d |
...output omitted...

6. Review the template and environment files that were used to define the deployment
configuration.

6.1. Locate the environment files used for the overcloud deployment.

[stack@director ~]$ ls ~/templates/cl210-environment/
00-node-info.yaml 32-network-environment.yaml 50-pre-config.yaml
20-storage-environment.yaml 34-ips-from-pool-all.yaml 60-post-config.yaml
30-network-isolation.yaml 40-compute-extraconfig.yaml

6.2. Locate the configuration files used for the overcloud deployment.

[stack@director ~]$ ls ~/templates/cl210-configuration/single-nic-vlans/


ceph-storage.yaml compute.yaml controller.yaml

6.3. Review the /home/stack/templates/cl210-environment/30-network-isolation.yaml
environment file that defines the networks and VLANs for each
server. For example, this partial output lists the networks (port attachments) to be
configured on a node assigned the controller role.

...output omitted...
# Port assignments for the controller role
OS::TripleO::Controller::Ports::ExternalPort: ../network/ports/external.yaml
OS::TripleO::Controller::Ports::InternalApiPort: ../network/ports/internal...
OS::TripleO::Controller::Ports::StoragePort: ../network/ports/storage.yaml
OS::TripleO::Controller::Ports::StorageMgmtPort: ../network/ports/storage_...
OS::TripleO::Controller::Ports::TenantPort: ../network/ports/tenant.yaml
...output omitted...

6.4. Review the /home/stack/templates/cl210-environment/32-network-environment.yaml
environment file that defines the overall network configuration
for the overcloud. For example, this partial output lists the IP addressing used for the
Internal API VLAN.

...output omitted...
# Internal API - used for private OpenStack services traffic
InternalApiNetCidr: '172.24.1.0/24'
InternalApiAllocationPools: [{'start': '172.24.1.60','end': '172.24.1.99'}]
InternalApiNetworkVlanID: 10
InternalApiVirtualFixedIPs: [{'ip_address':'172.24.1.50'}]
RedisVirtualFixedIPs: [{'ip_address':'172.24.1.51'}]
...output omitted...

6.5. View the /home/stack/templates/cl210-configuration/single-nic-vlans/controller.yaml
configuration file that defines the network interfaces for the
controller0 node. For example, this partial output shows the Internal API interface,
using variables seen previously in the 32-network-environment.yaml file.

...output omitted...
type: vlan
# mtu: 9000
vlan_id: {get_param: InternalApiNetworkVlanID}
addresses:
  - ip_netmask: {get_param: InternalApiIpSubnet}
...output omitted...


6.6. View the /home/stack/templates/cl210-configuration/single-nic-vlans/compute.yaml
configuration file that defines the network interfaces for the compute0
node. This partial output shows that compute nodes also use the Internal API VLAN.

...output omitted...
type: vlan
# mtu: 9000
vlan_id: {get_param: InternalApiNetworkVlanID}
addresses:
  - ip_netmask: {get_param: InternalApiIpSubnet}
...output omitted...

6.7. View the /home/stack/templates/cl210-configuration/single-nic-vlans/ceph-storage.yaml
configuration file that defines the network interfaces for the
ceph0 node. This partial output shows that Ceph nodes connect to the Storage VLAN.

...output omitted...
type: vlan
# mtu: 9000
vlan_id: {get_param: StorageNetworkVlanID}
addresses:
-
ip_netmask: {get_param: StorageIpSubnet}
...output omitted...

7. Confirm the successful completion of the overcloud deployment.

7.1. Review the status of the stack named overcloud.

[stack@director ~]$ openstack stack list -c "Stack Name" -c "Stack Status" \
-c "Creation Time"
+------------+-----------------+----------------------+
| Stack Name | Stack Status | Creation Time |
+------------+-----------------+----------------------+
| overcloud | CREATE_COMPLETE | 2017-06-12T19:46:07Z |
+------------+-----------------+----------------------+

7.2. Source the overcloudrc authentication environment file. The OS_AUTH_URL variable
in this file defines the Identity Service endpoint of the overcloud.

[stack@director ~]$ source overcloudrc


[stack@director ~]$ env | grep "OS_"
OS_PASSWORD=y27kCBdDrqkkRHuzm72DTn3dC
OS_AUTH_URL=http://172.25.250.50:5000/v2.0
OS_USERNAME=admin
OS_TENANT_NAME=admin
OS_NO_CACHE=True
OS_CLOUDNAME=overcloud

7.3. List the services running on the overcloud.

[stack@director ~]$ openstack service list -c Name -c Type


+------------+----------------+
| Name | Type |
+------------+----------------+
| keystone | identity |
| cinderv3 | volumev3 |
| neutron | network |
| cinder | volume |
| cinderv2 | volumev2 |
| swift | object-store |
| aodh | alarming |
| glance | image |
| nova | compute |
| gnocchi | metric |
| heat | orchestration |
| ceilometer | metering |
+------------+----------------+

7.4. Review general overcloud configuration. This listing contains default settings, formats,
and core component version numbers. The networks field lists the networks created;
it is currently empty because none exist yet in this new overcloud.

[stack@director ~]$ openstack configuration show


+---------------------------------+--------------------------------+
| Field | Value |
+---------------------------------+--------------------------------+
| alarming_api_version | 2 |
| api_timeout | None |
| auth.auth_url | http://172.25.250.50:5000/v2.0 |
| auth.password | <redacted> |
| auth.project_name | admin |
| auth.username | admin |
| auth_type | password |
...output omitted...
| networks | [] |
...output omitted...

7.5. Exit from director.

[stack@director ~]$ exit


[student@workstation ~]$


Scaling Compute Nodes

Objective
After completing this section, students should be able to add a compute node to the overcloud
using the undercloud.

Scaling
An important feature of cloud computing is the ability to rapidly scale up or down an
infrastructure. Administrators can provision their infrastructure with nodes that can fulfill
multiple roles (for example, computing, storage, or controller) and can be pre-installed with a
base operating system. Administrators can then integrate these nodes into their environment
as needed. Cloud computing provides services that automatically account for increases or
decreases in load, and that warn administrators when the environment needs to be scaled. In
traditional computing models, new servers often must be installed, configured, and integrated
into existing environments manually, requiring extra time and effort to provision each node.
Autoscaling is one of the main benefits of the cloud-computing model because it permits, for
example, a quick response to load spikes.

Red Hat OpenStack Platform director, with the Heat orchestration service, implements scaling
features. Administrators can rerun the command used to deploy an overcloud, increasing
or decreasing the roles based on infrastructure requirements. For example, the overcloud
environment can scale by adding two additional compute nodes, bringing the total to three.
Red Hat OpenStack Platform director then automatically reviews the current configuration
and reconfigures the available services to provision the OpenStack environment with the three
compute nodes.

Heat Orchestration Service


The Orchestration service provides a template-based orchestration engine for the undercloud,
which can be used to create and manage resources such as storage, networking, instances, and
applications as a repeatable running environment.

Templates are used to create stacks, which are collections of resources (for example, instances,
floating IPs, volumes, security groups, or users). The Orchestration service offers access to all
the undercloud core services through a single modular template, with additional orchestration
capabilities such as autoscaling and basic high availability.

An Orchestration stack is a collection of multiple infrastructure resources deployed and managed
through the same interface. Stacks can be used to standardize and speed up delivery, by
providing a unified human-readable format. While the Heat project started as an analog of AWS
CloudFormation, making it compatible with the template formats used by CloudFormation (CFN),
it also supports its own native template format, called HOT, for Heat Orchestration Templates.
The undercloud provides a collection of Heat templates in order to deploy the different overcloud
elements.


Note
Administrators must give a special role to OpenStack users that allows them to manage
stacks. The role name is defined by the heat_stack_user_role variable in /etc/
heat/heat.conf. The default role name is heat_stack_user.

A Heat template is written using YAML syntax, and has three major sections (a minimal example
follows this list):

1. Parameters: Input parameters provided when deploying from the template.

2. Resources: Infrastructure elements to deploy, such as virtual machines or network ports.

3. Outputs: Output parameters dynamically generated by the Orchestration service; for
example, the public IP of an instance that has been deployed using the template.
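
A minimal HOT template showing the three sections might look like the following sketch; the
image and flavor defaults are placeholder names, not resources that necessarily exist in this
classroom:

heat_template_version: 2016-10-14

parameters:
  image:
    type: string
    default: rhel7        # placeholder image name
  flavor:
    type: string
    default: m1.small     # placeholder flavor name

resources:
  demo_server:
    type: OS::Nova::Server
    properties:
      image: {get_param: image}
      flavor: {get_param: flavor}

outputs:
  server_ip:
    description: First IP address assigned to the server
    value: {get_attr: [demo_server, first_address]}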

The openstack command supports stack management, including commands shown in the
following table.

Heat Stack Management with the openstack Command

Command                  Description
stack create             Create a stack.
stack list               List the user's stacks.
stack show               Show the details for a stack.
stack delete             Delete a stack.
resource list STACKNAME  Show the list of resources created by a stack. The -n option
                         is used to specify the depth of nested stacks for which
                         resources are to be displayed.
deployment list          List the software deployed and its deployment ID.
deployment show ID       Show the details for the software components being deployed.
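
For example, a stack could be created from the sketch template above and its resources
inspected with the following commands; demo.yaml and demo-stack are hypothetical names:

[user@undercloud]$ openstack stack create -t demo.yaml demo-stack
[user@undercloud]$ openstack stack resource list -n 1 demo-stack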

Troubleshooting the Heat orchestration service requires administrators to understand how the
underlying infrastructure has been configured, since Heat makes use of these resources in order
to create the stack. For example, when creating an instance, the Orchestration service invokes
the Compute service API the same way a user would, through Identity authentication. When a
network port is requested from the OpenStack Networking service, the request is also made to
the API through the Identity service. This means the infrastructure needs to be configured and
working. Administrators must ensure that the resources requested through Heat can also be
requested manually (see the example after the following list). Orchestration troubleshooting includes:

• Ensuring all undercloud services that the templates refer to are configured and running.

• Ensuring the resources, such as images or key pairs, exist.

• Ensuring the infrastructure has the capacity to deploy the stack.
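
For instance, the services and resources that a stack references could be checked manually with
commands such as these; the grep filter for failed resources is only an illustration:

[user@undercloud]$ openstack service list -c Name -c Type
[user@undercloud]$ openstack image list
[user@undercloud]$ openstack stack resource list -n 5 overcloud | grep -i failed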

After the troubleshooting has completed, administrators can review the configuration of
Orchestration services:

• Orchestration service API configuration.

• Identity service configuration.


Figure 6.3: The Orchestration service

Orchestration Terminology
The following table lists the terms that administrators should be familiar with to properly
administer their cloud with the Orchestration service.

Heat Terminology Table

Orchestration service API
    The Orchestration service API server provides a REST API that forwards
    orchestration requests to the Orchestration service engine using remote
    procedure calls (RPCs).

Orchestration service engine
    The service that applies templates and orchestrates the creation and launch
    of cloud resources. It reports event status back to the API customer.

YAML
    The YAML format is a human-readable data serialization language.
    Orchestration templates are YAML-based, providing administrators with a
    convenient way to manage their cloud infrastructure.

Heat Orchestration Template (HOT)
    A Heat Orchestration Template (HOT) is a YAML-based configuration file that
    administrators write and pass to the Orchestration service API to deploy
    their cloud infrastructure. HOT is a template format designed to replace
    the legacy Orchestration CloudFormation-compatible format (CFN).

CloudFormation template (CFN)
    CFN is a legacy template format used by Amazon AWS services. The
    heat-api-cfn service manages this legacy format.

Orchestration template parameters
    Orchestration template parameters are settings passed to the Orchestration
    service that provide a way to customize a stack. They are defined in a Heat
    template file, with optional default values used when values are not
    passed. These are defined in the parameters section of a template.

Orchestration template resources
    Orchestration template resources are the specific objects that are created
    and configured as part of a stack. OpenStack contains a set of core
    resources that span all components. These are defined in the resources
    section of a Heat template.

Orchestration template outputs
    Orchestration template outputs are values, defined in a Heat template file,
    that are returned by the Orchestration service after a stack is created.
    Users can access these values either through the Orchestration service API
    or client tools. These are defined in the output section of a template.

Bare Metal Provisioning Service (Ironic)


The Red Hat OpenStack Platform bare metal provisioning service, Ironic, supports the
provisioning of both virtual and physical machines to be used for the overcloud deployment.

All the information about a node is retrieved through a process called introspection. After
introspection has completed, the node is ready to be used to deploy overcloud services. The Bare Metal
service makes use of the different services included in the undercloud, to deploy the overcloud
services. The Bare Metal service supports different drivers to run the introspection process,
based on what the environment hardware supports (for example, IPMI, DRAC).

The following table includes the most common openstack baremetal commands for
provisioning a new node in Red Hat OpenStack Platform director.

Bare Metal Management with the openstack baremetal Command

Command               Description
node list             List nodes registered with Ironic
node show             Show node details
node set              Update node information
node maintenance set  Change the maintenance state for a node


Scaling Compute Nodes


To scale the overcloud with additional compute nodes, the following pre-deployment steps are
required:

• Import an appropriate overcloud environment JSON environment file.

• Run introspection of the additional compute node(s).

• Update the appropriate properties.

• Deploy the overcloud.

Import the Overcloud Environment JSON File


A node definition template file, instackenv.json, is required to define the overcloud node.
This file contains the hardware and power management details for the overcloud node.

{
"nodes": [
{
"pm_user": "admin",
"arch": "x86_64",
"name": "compute1",
"pm_addr": "172.25.249.112",
"pm_password": "password",
"pm_type": "pxe_ipmitool",
"mac": [
"52:54:00:00:f9:0c"
],
"cpu": "2",
"memory": "6144",
"disk": "40"
}
]
}

The node description contains the following required fields:

• pm_type: Power management driver to be used by the nodes

• pm_addr: Power management server address

• pm_user, pm_password: Power management server user name and password used to access it

The following fields are optional; they are populated when introspection has completed:

• mac: List of MAC addresses of the overcloud nodes

• cpu: Number of CPUs in these nodes

• arch: CPU architecture

• memory: Memory size in MiB

• disk: Hard disk size in GiB

• capabilities: Ironic node capabilities


"capabilities": "profile:compute,boot_option:local"

There are various Ironic drivers provided for power management, which include:

• pxe_ipmitool: Driver that uses the ipmitool utility to manage nodes.

• pxe_ssh: Driver which can be used in a virtual environment. It uses virtualized environment
commands to power on and power off the VMs over SSH.

• pxe_ilo: Used on HP servers with iLO interfaces.

• pxe_drac: Used on DELL servers with DRAC interfaces.

• fake_pxe: All power management for this driver requires manual intervention. It can be used as
a fallback for unusual or older hardware.

To import the instackenv.json file, use the openstack baremetal import command.

[user@undercloud]$ source stackrc


[user@undercloud]$ openstack baremetal import --json instackenv.json

Introspection of Overcloud Nodes


Introspection of nodes allows for collecting system information such as CPU count, memory,
disk space, and network interfaces. Introspection allows advanced role matching, which ensures
that correct roles are allocated to the most appropriate nodes. In cases where advanced role
matching with Advanced Health Check (AHC) is not performed, manual tagging can be used to set
the profile and extra capabilities for the nodes.

[user@undercloud]$ openstack baremetal node set \
--property "capabilities=profile:compute,boot_option:local" compute1

When overcloud nodes are booted into the introspection stage, they are provided with the
discovery images by the ironic-inspector service located under /httpboot. The import
process assigns each node the bm_deploy_kernel and bm_deploy_ramdisk images automatically.
Manual use of openstack baremetal configure boot is no longer needed. In the following
output, verify that deploy_kernel and deploy_ramdisk are assigned to the new nodes.

[user@undercloud]$ openstack baremetal node show compute2 | grep -A1 deploy


| driver_info | {u'ssh_username': u'stack',
u'deploy_kernel': u'7bfa6b9e-2d2a-42ab-ac5d- |
| | 7b7db9370982', u'deploy_ramdisk': |
| | u'd402e2a9-a782-486f-8934-6c20b31c92d3',
| | u'ssh_key_contents': u'----- |

To introspect the hardware attributes of all registered nodes, run the command openstack
baremetal introspection bulk start.

[user@undercloud]$ openstack baremetal introspection bulk start


Started Mistral Workflow. Execution ID: d9191784-e730-4179-9cc4-a73bc31b5aec
Waiting for introspection to finish...


To limit the introspection to nodes that are in the manageable provision state, use the
--all-manageable --provide options with the openstack overcloud node introspect
command.

[user@undercloud]$ openstack overcloud node introspect --all-manageable --provide

Monitor and troubleshoot the introspection process with the following command.

[user@undercloud]$ sudo journalctl -l -u openstack-ironic-inspector \
-u openstack-ironic-inspector-dnsmasq -u openstack-ironic-conductor -f

Creating Flavors and Setting Appropriate Properties


Red Hat OpenStack Platform director requires flavors to provision the overcloud nodes. The
overcloud Orchestration templates look for a fallback flavor named baremetal. A flavor must be
created, and can be used to specify the hardware used to create the overcloud nodes.

[user@undercloud]$ openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 \
baremetal

Capabilities must be set on the flavor: boot_option sets the boot mode for the flavor, and
profile defines the node profile to use with the flavor.

[user@undercloud]$ openstack flavor set \
--property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" \
--property "capabilities:profile"="compute" compute

Deploy Overcloud
Red Hat OpenStack Platform undercloud uses the Orchestration service to orchestrate the
deployment of the overcloud with a stack definition. These Orchestration templates can be
customized to suit various deployment patterns. The stack templates define all resources
required for the deployment, and maintain the dependencies for these resource deployments.

Red Hat OpenStack Platform can deploy these nodes:

• control: A node with the controller role.

• compute: A node on which the Compute instances are run.

• ceph-storage: A node that runs the Ceph OSDs. Monitors run on the controller node.

• block-storage: A dedicated node providing the Block Storage service (Cinder).

• object-storage: A dedicated node with the Object Storage service (Swift).

The overcloud is deployed using the openstack overcloud deploy command.

[user@undercloud]$ openstack overcloud deploy \
--templates ~/templates \
--environment-directory ~/templates/cl210-environment

• --templates: Specifies the template location. If no location is specified, the default template
location of /usr/share/openstack-tripleo-heat-templates is used.


• --environment-directory: Specifies the directory location of the environment Orchestration
templates to be processed. This reduces the complexity of the deployment syntax by not
requiring every template to be listed individually.

Note
The --compute-scale deployment option is deprecated in Red Hat OpenStack
Platform 10 (Newton) in favor of using an environment file. Administrators can define
the number of nodes to scale out in an environment file and supply that environment file
to the overcloud deployment stack. All of the --*-scale deployment parameters, which
include --compute-scale, --swift-storage-scale, --block-storage-scale, and
--ceph-storage-scale, will be discontinued in a future Red Hat
OpenStack Platform release.
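
For example, a minimal environment file sketch that scales the overcloud to two compute nodes
(ControllerCount and ComputeCount are standard Orchestration parameters in the director
templates; adjust the names and values to match your own templates):

parameter_defaults:
  ControllerCount: 1
  ComputeCount: 2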

Phases of Overcloud node deployment


Registration:
• The stack user uploads information about additional overcloud nodes. The information
includes credentials for power management.

• The information is saved in the Ironic database and used during the introspection phase.

Introspection:
• The Bare Metal service uses PXE (Preboot eXecution Environment) to boot nodes over a
network.

• The Bare Metal service connects to the registered nodes to gather more details about the
hardware resources.

• The discovery kernel and ramdisk images are used during this process.

Deployment:
• The stack user deploys overcloud nodes, allocating resources and nodes that were discovered
during the introspection phase.

• Hardware profiles and Orchestration templates are used during this phase.

Registering an Overcloud Node


Registering an overcloud node consists of adding it to the Bare Metal service list for possible
nodes for the overcloud. The undercloud needs the following information to register a node:

• The type of power management, such as IPMI or PXE over SSH, being used. The various
power management drivers supported by the Bare Metal service can be listed using ironic
driver-list.

• The power management IP address for the node.

• The credentials to be used for the power management interface.

• The MAC address for the NIC on the PXE/provisioning network.

• The kernel and ramdisk used for introspection.

All of this information can be passed using a JSON (JavaScript Object Notation) file or using a
CSV file. The openstack baremetal import command imports this file into the Bare Metal
service database.

[user@undercloud]$ openstack baremetal import --json instackenv.json
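
The following is a minimal instackenv.json sketch for registering a single node with IPMI power
management; the node name, MAC address, and power management address shown are placeholders
to be replaced with real values for your environment:

{
  "nodes": [
    {
      "name": "compute2",
      "arch": "x86_64",
      "cpu": "2",
      "memory": "6144",
      "disk": "40",
      "mac": [ "52:54:00:xx:xx:xx" ],
      "pm_type": "pxe_ipmitool",
      "pm_addr": "172.25.249.XX",
      "pm_user": "admin",
      "pm_password": "password"
    }
  ]
}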

For the introspection and discovery of overcloud nodes, the Bare Metal service uses PXE (Preboot
eXecution Environment), provided by the undercloud. The dnsmasq service is used to provide
DHCP and PXE capabilities to the Bare Metal service. The PXE discovery images are delivered
over HTTP. Prior to introspection, the registered nodes must have a valid kernel and ramdisk
assigned to them, and every node for introspection has the following settings:

• Power State set to power off.

• Provision State set to manageable.

• Maintenance set to False.

• Instance UUID set to None.

The openstack overcloud node introspect command is used to start the introspection, and
--all-manageable --provide informs the Bare Metal service to perform introspection on
nodes that are in the manageable provision state.

[user@undercloud]$ openstack overcloud node introspect --all-manageable --provide
Started Mistral Workflow. Execution ID: d9191784-e730-4179-9cc4-a73bc31b5aec
Waiting for introspection to finish...

Overcloud Deployment Roles


After introspection, the undercloud knows which nodes are used for the deployment of the
overcloud, but it may not know what overcloud node types are to be deployed. Flavors are used
to assign node deployment roles, and they correspond to the overcloud node types:

• control: for a controller node

• compute: for a compute node

• ceph-storage: for a Ceph storage node

• block-storage: for a Cinder storage node

• object-storage: for a Swift storage node

The undercloud uses the hard-coded baremetal flavor as the default for any roles that have no
role-specific flavor; otherwise, the role-specific flavors are used.

[user@undercloud]$ openstack flavor create --id auto --ram 6144 --disk 38 --vcpus 2 \
baremetal
[user@undercloud]$ openstack flavor create --id auto --ram 6144 --disk 38 --vcpus 2 \
compute

The undercloud performs automated role matching to apply appropriate hardware for each flavor
of node. When nodes are on identical hardware and no flavors are created, the deployment roles
are randomly chosen for each node. Manual tagging can also be used to tie the deployment role
to a node.

To use these deployment profiles, they need to be associated with the respective flavors using the
capabilities:profile property. The capabilities:boot_option property is required to
set the boot mode for flavors.
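
For example, a sketch that associates the compute profile with the compute flavor and manually
tags a node to that profile (the node name compute2 is an example):

[user@undercloud]$ openstack flavor set \
--property "capabilities:boot_option"="local" \
--property "capabilities:profile"="compute" compute
[user@undercloud]$ openstack baremetal node set compute2 \
--property "capabilities=profile:compute,boot_option:local"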

Scaling Overcloud Compute Nodes


The following steps outline the process for adding an additional compute node to the overcloud.

1. Modify any applicable Orchestration templates located in the /home/stack/templates/cl210-environment directory on the undercloud node.

2. On the undercloud node, create an instackenv.json file containing definitions for the
additional compute node.

3. Import the instackenv.json file using the command openstack baremetal import.

4. Assign boot images to the additional compute node using the command openstack
baremetal configure boot.

5. Set the provisioning state to manageable using the command openstack baremetal
node manage.

6. Use the command openstack overcloud node introspect --all-manageable --provide to begin introspection.

7. After introspection has completed successfully, update the node profile to use the compute
role.

8. Deploy the overcloud with the command openstack overcloud deploy --templates
~/templates --environment-directory ~/templates/cl210-environment.

References
Further information is available for Adding Additional Nodes in the Director Installation
and Usage guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/

Guided Exercise: Scaling Compute Nodes

In this exercise, you will add a compute node to the overcloud.

Resources
Files: http://materials.example.com/instackenv-onenode.json

Outcomes
You should be able to add a compute node to the overcloud.

Before you begin


Log into workstation as student with a password of student.

On workstation, run the lab resilience-scaling-nodes setup command. This ensures
that the OpenStack services are running and the environment has been properly configured for
this lab.

[student@workstation ~]$ lab resilience-scaling-nodes setup

Steps
1. Use SSH to connect to director as the user stack and source the stackrc credentials
file.

1.1. From workstation, use SSH to connect to director as the user stack and source
the stackrc credentials file.

[student@workstation ~]$ ssh stack@director
[stack@director ~]$ source stackrc

2. Prepare compute1 for introspection.

2.1. Download the instackenv-onenode.json file from http://materials.example.com, for introspection of compute1, to /home/stack.

[stack@director ~]$ wget http://materials.example.com/instackenv-onenode.json

2.2. Verify that the instackenv-onenode.json file is for compute1.

[stack@director ~]$ cat ~/instackenv-onenode.json
{
  "nodes": [
    {
      "pm_user": "admin",
      "arch": "x86_64",
      "name": "compute1",
      "pm_addr": "172.25.249.112",
      "pm_password": "password",
      "pm_type": "pxe_ipmitool",
      "mac": [
        "52:54:00:00:f9:0c"
      ],
      "cpu": "2",
      "memory": "6144",
      "disk": "40"
    }
  ]
}

2.3. Import instackenv-onenode.json into the Bare Metal service using the command
openstack baremetal import, and ensure that the node has been properly
imported.

[stack@director ~]$ openstack baremetal import --json \
/home/stack/instackenv-onenode.json
Started Mistral Workflow. Execution ID: 8976a32a-6125-4c65-95f1-2b97928f6777
Successfully registered node UUID b32d3987-9128-44b7-82a5-5798f4c2a96c
Started Mistral Workflow. Execution ID: 63780fb7-bff7-43e6-bb2a-5c0149bc9acc
Successfully set all nodes to available
[stack@director ~]$ openstack baremetal node list \
-c Name -c 'Power State' -c 'Provisioning State' -c Maintenance
+-------------+--------------------+-------------+-------------+
| Name | Provisioning State | Power State | Maintenance |
+-------------+--------------------+-------------+-------------+
| controller0 | active | power on | False |
| compute0 | active | power on | False |
| ceph0 | active | power on | False |
| compute1 | available | power off | False |
+-------------+--------------------+-------------+-------------+

2.4. Prior to starting introspection, set the provisioning state for compute1 to manageable.

[stack@director ~]$ openstack baremetal node manage compute1

3. Begin introspection of compute1.

3.1. Initiate introspection of compute1. Introspection may take a few minutes.

[stack@director ~]$ openstack overcloud node introspect \
--all-manageable --provide
Started Mistral Workflow. Execution ID: d9191784-e730-4179-9cc4-a73bc31b5aec
Waiting for introspection to finish...
...output omitted...

4. Update the node profile for compute1.

4.1. Update the node profile for compute1 to assign it the compute profile.

[stack@director ~]$ openstack baremetal node set compute1 \
--property "capabilities=profile:compute,boot_option:local"

5. Configure 00-node-info.yaml to scale two compute nodes.

5.1. Edit /home/stack/templates/cl210-environment/00-node-info.yaml to scale to two compute nodes.

...output omitted...
ComputeCount: 2
...output omitted...

6. Deploy the overcloud to scale out the compute nodes.

6.1. Deploy the overcloud to scale out the compute nodes by adding one more node.

The deployment may take 40 minutes or more to complete.

[stack@director ~]$ openstack overcloud deploy \
--templates ~/templates \
--environment-directory ~/templates/cl210-environment
Removing the current plan files
Uploading new plan files
Started Mistral Workflow. Execution ID: 6de24270-c3ed-4c52-8aac-820f3e1795fe
Plan updated
Deploying templates in the directory /tmp/tripleoclient-WnZ2aA/tripleo-heat-
templates
Started Mistral Workflow. Execution ID: 50f42c4c-d310-409d-8d58-e11f993699cb
...output omitted...

Cleanup
From workstation, run the lab resilience-scaling-nodes cleanup command to clean
up this exercise.

[student@workstation ~]$ lab resilience-scaling-nodes cleanup

Migrating Instances using Block Storage

Objectives
After completing this section, students should be able to:

• Describe the principle concepts and terminology of migration

• Describe use cases for implementing block-based live migration

• Configure block-based live migration

Introduction to Migration
Migration is the process of moving a server instance from one compute node to another. In this
and the following section of this chapter, the major lecture topic is live migration. Live migration
relocates a server instance (virtual machine) from one compute node hypervisor to another
while the server application is running, offering uninterrupted service. This section discusses
the method known as block-based live migration and the next section discusses an alternative
method known as shared storage live migration. First, however, it is important to define what
is meant by migration, because one of the primary design goals of cloud architecture is to
eliminate the need for legacy server management techniques, including many former use cases
for migration.

A major feature of cloud-designed applications is that they are resilient, scalable, distributed
and stateless; commonly implemented in what is known as a microservices architecture. A
microservice application scales, relocates, and self-repairs by deploying itself as replicated
components instantiated as virtual machines or containers across many compute nodes, cells,
zones, and regions. Applications designed this way share live state information such that the loss
of any single component instance has little or no effect on the application or the service being
offered. By definition, microservice cloud applications do not need to perform live migration. If a
microservices component is to be relocated for any reason, a new component is instantiated in
the desired location from the appropriate component image. The component joins the existing
application and begins work while the unwanted component instance is simply terminated.

Legacy applications, also referred to as enterprise applications, may also include resilient,
scalable, and distributed features, but are distinguished by their need to act stateful. Enterprise
application server instances cannot be terminated and discarded without losing application state
or data, or corrupting data storage structures. Such applications must be migrated to relocate
from one compute node to another.

The simplest form of migration is cold migration. In legacy computing, a virtual machine is
shut down, preserving configuration and state on its assigned disks, then rebooted on another
hypervisor or in another data center after relocating the physical or virtual disks. This same
concept remains available in OpenStack today. Cold migration is accomplished by taking an
instance snapshot on a running, quiesced instance, then saving the snapshot as an image. As
with legacy computing, the image is relocated and used to boot a new instance. The original
instance remains in service, but the state transferred to the new instance only matches that
which existed when the snapshot was taken.
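
As a sketch of this approach, a quiesced instance can be snapshotted to an image and a new
instance booted from that image; the instance, image, and flavor names here are examples:

[user@workstation ~]$ openstack server image create --name web1-snap web1
[user@workstation ~]$ openstack server create --flavor m1.web --image web1-snap web1-new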

Block Based Live Migration


Live migration transfers a running instance from its current location to a new compute
node, maintaining active client connections and performing work during the migration. The
current state of the original instance is transferred to the new instance. Applications and
users communicating with the applications and services on this instance should not detect
interruption, other than some slight delays discernible at the moment of the final hand off to
the new instance. Restated, live migration involves transferring memory-based active kernel and
process structures from one virtual machine to another, with the destination taking over the
activity while the source is eventually discarded.

What about the disks on the source virtual machine, such as the root disk, extra ephemeral disks,
swap disk, and persistent volumes? These disks also must be transferred and attached to the
destination virtual machine. The method used, block based or shared storage, is directly related
to the overcloud storage architecture that is implemented. With the shared storage method, if
both the source and destination compute nodes connect to and have sufficient access privileges
for the same shared storage locations containing the migrating instance's disks, then no physical
disk movement occurs. The source compute node stops using the disks while the destination
compute node takes over disk activity.

Block-based live migration is the alternate method used when shared storage is not
implemented. When the source and destination compute nodes do not share common-access
storage, the root, ephemeral, swap and persistent volumes must be transferred to the storage
location used by the destination compute node. When performance is a primary focus, block-
based live migration should be avoided. Instead, implement shared storage structures across
common networks where live migration occurs regularly.

Block-based Live Migration Use Cases


The Red Hat recommended practice for overcloud deployment is to install shared storage using a
Ceph RBD storage cluster, but earlier use cases offered various configurations for managing
instance disks:

• Original, proof of concept installations, such as default Packstack installations, used the
Compute service (Nova) to manage non-persistent root disks, ephemeral disks, and swap disks.
Instance virtual disks managed by the Compute service are found in subdirectories in /var/
lib/nova/instances on each compute node's own disk.

• Although /var/lib/nova/instances can be shared across compute nodes using GlusterFS
or NFS, the default configuration had each compute node maintaining disk storage for each
instance scheduled to its hypervisor. An instance rescheduled or redeployed to another
hypervisor would cause a duplicate set of that instance's disks to be deployed on the new
compute node.

• Different compute nodes, even when operating in the same networks, can be connected
to different storage arrays, Red Hat Virtualization data stores, or other back end storage
subsystems.

• Instances can be deployed using the Block Storage service volume-based transient or
persistent disks instead of using the Compute service ephemeral storage, but compute nodes
configured with different back ends require block-based migration.

Implementation Requirements for Block Based Live Migration


There are specific requirements for implementing block-based live migration; a brief verification sketch follows the list:

• Both source and destination compute nodes must be located in the same subnet.

• Both compute nodes must use the same processor type.

• All controller and compute nodes must have consistent name resolution for all other nodes.

• The UID and GID of the nova and libvirt users must be identical on all compute nodes.

• Compute nodes must be using KVM with libvirt, which is expected when using Red Hat
OpenStack Platform. The KVM with libvirt platform has the best coverage of features and
stability for live migration.

• The permissions and system access of local directories must be consistent across all nodes.

• libvirt must be able to securely communicate between nodes.

• Consistent multipath device naming must be used on both the source and destination compute
nodes. Instances expect to resolve multipath device names similarly in both locations.
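
A quick spot check for some of these requirements, run on every compute node and compared
across nodes (a sketch using standard Linux tools):

[heat-admin@overcloud-compute-0 ~]$ id nova
[heat-admin@overcloud-compute-0 ~]$ grep -E '^(nova|libvirt)' /etc/passwd /etc/group
[heat-admin@overcloud-compute-0 ~]$ lscpu | grep 'Model name'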

Configuring Block-based Live Migration


Preparing for block-based live migration requires configuration for secure transfer of disk blocks
over TCP, opening firewall ports for the block transfer, adding common users and access across
all compute nodes, and configuring controllers with the configured access information.

Secure TCP for Live Migration


There are three secure options for remote access over TCP that are typically used for live
migration. Use a libvirtd TCP socket with one of these methods, to match your environment's
authentication resources:

• TLS for encryption, X.509 client certificates for authentication

• GSSAPI/Kerberos for both encryption and authentication

• TLS for encryption, Kerberos for authentication

Edit the /etc/libvirt/libvirtd.conf file with the chosen strategy:

TCP Security Strategy Settings


TLS with X509:
    listen_tls = 1
    listen_tcp = 0
    auth_tls = "none"
    tls_no_verify_certificate = 0
    tls_allowed_dn_list = ["distinguished name"]

GSSAPI with Kerberos:
    listen_tls = 0
    listen_tcp = 1
    auth_tcp = "sasl"
    sasl_allowed_username_list = ["Kerberos principal name"]

TLS with Kerberos:
    listen_tls = 1
    listen_tcp = 0
    auth_tls = "sasl"
    sasl_allowed_username_list = ["Kerberos principal name"]

Inform libvirt about which security strategy is implemented.

• Update the /etc/sysconfig/libvirtd file to include: LIBVIRTD_ARGS="--listen"

• Update the access URI string in /etc/nova/nova.conf to match the strategy. Use
"live_migration_uri=qemu+ACCESSTYPE://USER@%s/system", where ACCESSTYPE is
tcp or tls and USER is nova, or use %s, which defaults to the root user (see the example after this list).

• Restart the libvirtd service.

[root@compute]# systemctl restart libvirtd.service
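
For example, assuming the TCP socket with SASL authentication and the nova user, the resulting
nova.conf entry might read:

live_migration_uri=qemu+tcp://nova@%s/system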

Configure Compute Nodes for Live Migration


On each compute node, make the following configuration changes:

• Ensure that OpenStack utilities and the VNC proxy are installed, using:

[root@compute]# yum -y install iptables openstack-utils \
openstack-nova-novncproxy

• Add the nova group to the /etc/group file with a line like the following:

nova:x:162:nova

• Add the nova user to the /etc/passwd file with a line like the following

nova:x:162:162:OpenStack Nova Daemons:/var/lib/nova:/sbin/nologin

• Allow the nova user access to the compute node's ephemeral directory:

[root@compute]# chown nova:nova /var/lib/nova/instances
[root@compute]# chmod 775 /var/lib/nova/instances

• Add rules for TCP, TLS, and the ephemeral ports to the firewall:

If using TCP:

[root@compute]# iptables -v -I INPUT 1 -p tcp --dport 16509 -j ACCEPT
[root@compute]# iptables -v -I INPUT -p tcp --dport 49152:49261 -j ACCEPT

If using TLS:

[root@compute]# iptables -v -I INPUT 1 -p tcp --dport 16514 -j ACCEPT
[root@compute]# iptables -v -I INPUT -p tcp --dport 49152:49261 -j ACCEPT

• Save the firewall rules.

[root@compute]# service iptables save

• Update Qemu with three settings in the /etc/libvirt/qemu.conf file:

user="root"
group="root"

270 CL210-RHOSP10.1-en-2-20171006

Rendered for Nokia. Please do not distribute.


Configuring Block-based Live Migration

vnc_listen="0.0.0.0"

• Restart the libvirtd service to reflect these changes:

[root@compute]# systemctl restart libvirtd.service

• Make the following changes to the compute service configuration file:

[root@compute]# crudini --set /etc/nova/nova.conf DEFAULT \
instances_path /var/lib/nova/instances
[root@compute]# crudini --set /etc/nova/nova.conf DEFAULT \
novncproxy_base_url http://controller0.overcloud.example.com:6080/vnc_auto.html
[root@compute]# crudini --set /etc/nova/nova.conf DEFAULT \
vncserver_listen 0.0.0.0
[root@compute]# crudini --set /etc/nova/nova.conf DEFAULT \
block_migration_flag \
VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,\
VIR_MIGRATE_NON_SHARED_INC

• Restart the firewall and the OpenStack services:

[root@compute]# systemctl restart iptables.service
[user@compute]$ sudo openstack-service restart

Configure Controller Nodes for Live Migration


On each controller node, make the following configuration changes:

• Ensure that OpenStack utilities and the VNC proxy are installed, using:

[root@controller]# yum -y install openstack-utils openstack-nova-novncproxy

• Make the following changes to the compute service configuration file:

[root@controller]# crudini --set /etc/nova/nova.conf DEFAULT \
vncserver_listen 0.0.0.0

• Restart the OpenStack services:

[user@controller]$ openstack-service restart

Migrate an Instance Using Block-based Live Migration


Locate the instance to be migrated, and verify the size and settings required. List available
compute nodes by checking for active nova-compute services. Ensure that the intended
destination compute node has sufficient resources for the migration. Invoke the migration using
the syntax for block based live migration:

[user@workstation ~]$ openstack server migrate --block-migration \
--live dest_compute_node instance
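
To list the compute nodes that have an active nova-compute service before choosing a
destination, a command such as the following can be used (a sketch; output omitted):

[user@workstation ~]$ openstack compute service list --service nova-compute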

Troubleshooting
When migrations fail or appear to take too long, check the activity in the compute service log
files on both the source and the destination compute node:

• /var/log/nova/nova-api.log
• /var/log/nova/nova-compute.log
• /var/log/nova/nova-conductor.log
• /var/log/nova/nova-scheduler.log

Migrating Instances with Block Storage


The following steps outline the process for the live migration of an instance using the block
storage method.

1. Ensure that the overcloud has more than one compute node added.

2. Configure block storage and live migration on all compute nodes. Ensure that SELinux is set
to permissive mode (see the sketch after this list), and appropriate iptables rules are configured.

3. On the controller node, update the vncserver_listen variable to listen for all
connections in the /etc/nova/nova.conf file.

4. As an administrator, ensure the instance to be migrated is in a running state.

5. Using the administrator credentials, live migrate the instance to the destination compute
node using the openstack server migrate command.

6. Verify that the instance was migrated successfully to the destination compute node.
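
A sketch for checking SELinux and switching it to permissive mode on a compute node (make the
change persistent in /etc/selinux/config if required):

[root@compute]# getenforce
[root@compute]# setenforce 0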

References
Further information is available for Configuring Block Migration in the Migrating
Instances guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/

Further information is available for Migrating Live Instances in the Migrating Instances
guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/

Guided Exercise: Migrating Instances using Block Storage

In this exercise, you will migrate a live instance using block storage.

Outcomes
You should be able to:

• Configure block storage.

• Live migrate an instance using block storage.

Before you begin


Log in to workstation as student with a password of student.

This guided exercise requires two compute nodes, as configured in a previous guided exercise
which added compute1 to the overcloud. If you did not successfully complete that guided
exercise, have reset your overcloud systems, or for any reason have an overcloud with only a
single installed compute node, you must first run the command lab resilience-block-
storage add-compute on workstation. The command's add-compute task adds the
compute1 node to the overcloud, taking between 40 and 90 minutes to complete.

Important
As described above, only run this command if you still need to install a second compute
node. If you already have two functioning compute nodes, skip this task and continue
with the setup task.

[student@workstation ~]$ lab resilience-block-storage add-compute

After the add-compute task has completed successfully, continue with the setup task
in the following paragraph.

Start with the setup task if you have two functioning compute nodes, either from having
completed the previous overcloud scaling guided exercise, or by completing the extra add-
compute task described above. On workstation, run the lab resilience-block-storage
setup command. This command verifies the OpenStack environment and creates the project
resources used in this exercise.

[student@workstation ~]$ lab resilience-block-storage setup

Steps
1. Configure compute0 to use block-based migration. Later in this exercise, you will repeat
these steps on compute1.

1.1. Log into compute0 as heat-admin and switch to the root user.

[student@workstation ~]$ ssh heat-admin@compute0
[heat-admin@overcloud-compute-0 ~]$ sudo -i
[root@overcloud-compute-0 ~]#

1.2. Configure iptables for live migration.

[root@overcloud-compute-0 ~]# iptables -v -I INPUT 1 -p tcp \
--dport 16509 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:16509
[root@overcloud-compute-0 ~]# iptables -v -I INPUT -p tcp \
--dport 49152:49261 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpts:49152:49261
[root@overcloud-compute-0 ~]# service iptables save

1.3. Configure user, group, and vnc_listen in /etc/libvirt/qemu.conf. Include the following lines at the bottom of the file.

user="root"
group="root"
vnc_listen="0.0.0.0"

1.4. The classroom overcloud deployment uses Ceph as shared storage by default.
Demonstrating block-based migration requires disabling shared storage for the
Compute service. Enable the compute0 node to store virtual disk images, associated
with running instances, locally under /var/lib/nova/instances. Edit the /etc/
nova/nova.conf file to set the images_type variable to default.

[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
libvirt images_type default

1.5. Configure /etc/nova/nova.conf for block-based live migration.

[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT instances_path /var/lib/nova/instances
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT novncproxy_base_url \
http://172.25.250.1:6080/vnc_auto.html
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT block_migration_flag VIR_MIGRATE_UNDEFINE_SOURCE,\
VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_NON_SHARED_INC

1.6. Restart OpenStack services and log out of compute0.

[root@overcloud-compute-0 ~]# openstack-service restart
[root@overcloud-compute-0 ~]# exit
[heat-admin@overcloud-compute-0 ~]$ exit
[student@workstation ~]$

2. Configure compute1 to use block-based migration.

2.1. Log into compute1 as heat-admin and switch to the root user.

[student@workstation ~]$ ssh heat-admin@compute1
[heat-admin@overcloud-compute-1 ~]$ sudo -i
[root@overcloud-compute-1 ~]#

2.2. Configure iptables for live migration.

[root@overcloud-compute-1 ~]# iptables -v -I INPUT 1 -p tcp \
--dport 16509 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:16509
[root@overcloud-compute-1 ~]# iptables -v -I INPUT -p tcp \
--dport 49152:49261 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpts:49152:49261
[root@overcloud-compute-1 ~]# service iptables save

2.3. Configure user, group, and vnc_listen in /etc/libvirt/qemu.conf. Include the following lines at the bottom of the file.

user="root"
group="root"
vnc_listen="0.0.0.0"

2.4. The classroom overcloud deployment uses Ceph as shared storage by default.
Demonstrating block-based migration requires disabling shared storage for the
Compute service. Enable the compute1 node to store virtual disk images, associated
with running instances, locally under /var/lib/nova/instances. Edit the /etc/
nova/nova.conf file to set the images_type variable to default.

[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
libvirt images_type default

2.5. Configure /etc/nova/nova.conf for block-based live migration.

[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT instances_path /var/lib/nova/instances
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT novncproxy_base_url \
http://172.25.250.1:6080/vnc_auto.html
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT block_migration_flag VIR_MIGRATE_UNDEFINE_SOURCE,\
VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_NON_SHARED_INC

2.6. Restart OpenStack services and log out of compute1.

[root@overcloud-compute-1 ~]# openstack-service restart
[root@overcloud-compute-1 ~]# exit
[heat-admin@overcloud-compute-1 ~]$ exit
[student@workstation ~]$

3. Configure controller0 for block-based live migration.

3.1. Log into controller0 as heat-admin and switch to the root user.

[student@workstation ~]$ ssh heat-admin@controller0
[heat-admin@overcloud-controller-0 ~]$ sudo -i
[root@overcloud-controller-0 ~]#

3.2. Update the vncserver_listen variable in /etc/nova/nova.conf.

[root@overcloud-controller-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0

3.3. Restart the OpenStack Compute services. Exit controller0.

[root@overcloud-controller-0 ~]# openstack-service restart nova
[root@overcloud-controller-0 ~]# exit
[heat-admin@overcloud-controller-0 ~]$ exit
[student@workstation ~]$

4. From workstation, source the /home/student/developer1-finance-rc environment file and launch an instance as the user developer1 using the following attributes:

Instance Attributes
Attribute Value
flavor m1.web
key pair developer1-keypair1
network finance-network1
image rhel7
security group finance-web
name finance-web1

[student@workstation ~]$ source ~/developer1-finance-rc
[student@workstation ~(developer1-finance)]$ openstack server create \
--flavor m1.web \
--key-name developer1-keypair1 \
--nic net-id=finance-network1 \
--security-group finance-web \
--image rhel7 finance-web1 --wait
...output omitted...

5. List the available floating IP addresses, then allocate one to the finance-web1 instance.

5.1. List the floating IPs. An available one has the Port attribute set to None.

[student@workstation ~(developer1-finance)]$ openstack floating ip list \
-c "Floating IP Address" -c Port
+---------------------+------+
| Floating IP Address | Port |
+---------------------+------+
| 172.25.250.N | None |
+---------------------+------+

5.2. Attach an available floating IP to the instance finance-web1.

[student@workstation ~(developer1-finance)]$ openstack server add \
floating ip finance-web1 172.25.250.N

5.3. Log in to the finance-web1 instance using /home/student/developer1-keypair1.pem with ssh to ensure it is working properly, then log out of the instance.

[student@workstation ~(developer1-finance)]$ ssh -i ~/developer1-keypair1.pem \
cloud-user@172.25.250.N
Warning: Permanently added '172.25.250.N' (ECDSA) to the list of known hosts.
[cloud-user@finance-web1 ~]$ exit
[student@workstation ~(developer1-finance)]$

6. Migrate the instance finance-web1 using block-based live migration.

6.1. To perform live migration, the user developer1 must have the admin role assigned for
the project finance. Assign the admin role to developer1 for the project finance.
The developer1 user may already have been assigned the admin role.

[student@workstation ~(developer1-finance)]$ source ~/admin-rc
[student@workstation ~(admin-admin)]$ openstack role add --user \
developer1 --project finance admin
[student@workstation ~(admin-admin)]$ source ~/developer1-finance-rc

6.2. Determine whether the instance is currently running on overcloud-compute-0 or overcloud-compute-1. This example starts with the instance running on overcloud-compute-1.

[student@workstation ~(developer1-finance)]$ openstack server show \
finance-web1 -f json | grep compute
"OS-EXT-SRV-ATTR:host": "overcloud-compute-1.localdomain",
"OS-EXT-SRV-ATTR:hypervisor_hostname": "overcloud-compute-1.localdomain",

6.3. Prior to migration, ensure the destination compute node has sufficient resources to host
the instance. In this example, the current server instance location node is overcloud-
compute-1.localdomain, and the destination to check is overcloud-compute-0.
Modify the command to reflect your actual source and destination compute nodes.
Estimate whether the total minus the amount used now is sufficient.

[student@workstation ~(developer1-finance)]$ openstack host show \
overcloud-compute-0.localdomain -f json
[
  {
    "Project": "(total)",
    "Disk GB": 39,
    "Host": "overcloud-compute-0.localdomain",
    "CPU": 2,
    "Memory MB": 6143
  },
  {
    "Project": "(used_now)",
    "Disk GB": 0,
    "Host": "overcloud-compute-0.localdomain",
    "CPU": 0,
    "Memory MB": 2048
  },
  {
    "Project": "(used_max)",
    "Disk GB": 0,
    "Host": "overcloud-compute-0.localdomain",
    "CPU": 0,
    "Memory MB": 0
  }
]

6.4. Migrate the instance finance-web1 to a new compute node. In this example, the
instance is migrated from overcloud-compute-1 to overcloud-compute-0. Your
scenario may require migrating in the reverse direction.

[student@workstation ~(developer1-finance)]$ openstack server migrate \
--block-migration \
--live overcloud-compute-0.localdomain \
--wait finance-web1
Complete

7. Use the command openstack server show to verify that the migration of finance-
web1 using block storage migration was successful. The compute node displayed should be
the destination node.

[student@workstation ~(developer1-finance)]$ openstack server show \
finance-web1 -f json | grep compute
"OS-EXT-SRV-ATTR:host": "overcloud-compute-0.localdomain",
"OS-EXT-SRV-ATTR:hypervisor_hostname": "overcloud-compute-0.localdomain",

Cleanup
From workstation, run the lab resilience-block-storage cleanup command to clean
up this exercise.

[student@workstation ~]$ lab resilience-block-storage cleanup

Important - your next step
After this Guided Exercise, if you intend to either continue directly to ending Chapter
Lab or skip directly to the next Chapter, you must first reset your virtual machines.
Save any data that you would like to keep from the virtual machines. After the data is
saved, reset all of the virtual machines. In the physical classroom environment, use the
rht-vmctl reset all command. In the online classroom environment, delete the
current lab environment and provision a new lab environment.

If you intend to repeat either of the two Live Migration Guided Exercises in this
chapter that require two compute nodes, do not reset your virtual machines. Because
your overcloud currently has two functioning compute nodes, you may repeat the
Live Migration Guided Exercises without running the add-compute task that was
required to build the second compute node.

Migrating Instances with Shared Storage

Objectives
After completing this section, students should be able to:

• Configure shared storage for the Compute services.

• Perform instance live migration with shared storage.

Shared Storage for Live Migration


Live migration using shared storage is one of the two methods used for live migration. With
the shared storage method, if both the source and destination compute nodes connect to and
have sufficient access privileges for the same shared storage locations containing the migrating
instance's disks, then no disk data transfer occurs. The source compute node stops using the
disks while the destination compute node takes over disk activity. When the openstack server
migrate command is issued, the source sends an instance's memory content to the destination.
During the transfer process, the memory pages on the source host are still being modified in
real time. The source host tracks the memory pages that were modified during the transfer and
retransmits them after the initial bulk transfer is completed. The instance's memory content
must be transferred faster than memory pages are written on the source virtual machine. After
all retransmittal is complete, an identical instance is started on the destination host. In parallel,
the virtual network infrastructure redirects the network traffic.

Live migration using block storage uses a similar process to shared storage live migration.
However, with block storage, disk content is copied before the memory content is transferred,
making live migration with shared storage quicker and more efficient.

Live migration configuration options


The following is a list of the live migration configuration options available for libvirt in /etc/
nova/nova.conf and their default values.

Live migration configuration options


live_migration_retry_count = 30
    Number of retries needed in live migration.

max_concurrent_live_migrations = 1
    Maximum number of concurrent live migrations to run.

live_migration_bandwidth = 0
    Maximum bandwidth to be used in MiB/s. If set to 0, a suitable value is chosen automatically.

live_migration_completion_timeout = 800
    Timeout value in seconds for a successful migration to complete before aborting the operation.

live_migration_downtime = 500
    Maximum permitted downtime, in milliseconds.

live_migration_downtime_delay = 75
    Time to wait, in seconds, between each step increase of the migration downtime.

live_migration_downtime_steps = 10
    Number of incremental steps to reach the maximum downtime value.

live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_TUNNELLED
    Migration flags to be set for live migration.

live_migration_progress_timeout = 150
    Time to wait, in seconds, for migration to make progress in transferring data before aborting the operation.

live_migration_uri = qemu+tcp://%s/system
    Migration target URI.

Configuring Shared Storage Live Migration


Preparing for shared storage based live migration requires configuring a shared file system,
opening firewall ports, ensuring common users and access across all compute nodes, and
configuring controllers with the required access information.

Secure TCP for Live Migration


There are three secure options for remote access over TCP that are typically used for live
migration. Use a libvirtd TCP socket with one of these methods, to match your environment's
authentication resources:

• TLS for encryption, X.509 client certificates for authentication

• GSSAPI/Kerberos for both encryption and authentication

• TLS for encryption, Kerberos for authentication

Edit the /etc/libvirt/libvirtd.conf file with the chosen strategy:

TCP Security Strategy Settings


TLS with X509:
    listen_tls = 1
    listen_tcp = 0
    auth_tls = "none"
    tls_no_verify_certificate = 0
    tls_allowed_dn_list = ["distinguished name"]

GSSAPI with Kerberos:
    listen_tls = 0
    listen_tcp = 1
    auth_tcp = "sasl"
    sasl_allowed_username_list = ["Kerberos principal name"]

TLS with Kerberos:
    listen_tls = 1
    listen_tcp = 0
    auth_tls = "sasl"
    sasl_allowed_username_list = ["Kerberos principal name"]

Inform libvirt about which security strategy is implemented.

• Update the /etc/sysconfig/libvirtd file to include: LIBVIRTD_ARGS="--listen"

• Update the access URI string in /etc/nova/nova.conf to match the strategy. Use
"live_migration_uri=qemu+ACCESSTYPE://USER@%s/system", where ACCESSTYPE is
tcp or tls and USER is nova or use '%s', which defaults to the root user.

• Restart the libvirtd service.

[user@compute ~]$ sudo systemctl restart libvirtd.service

Shared Storage Live Migration Configuration for Controllers


The following outlines the process for configuring controllers for live migration using shared
storage.

1. Ensure that the nfs-utils, openstack-nova-novncproxy, and openstack-utils packages are installed.

2. Configure /etc/sysconfig/nfs to set fixed ports for NFS server services.

3. Add firewall rules for NFS, TCP, TLS, and Portmap.

4. Configure the /etc/exports file to export /var/lib/nova/instances to the compute nodes (see the sketch after this list).

5. Start and enable the NFS service.

6. Export the NFS directory.

7. Update /etc/nova/nova.conf with vncserver_listen 0.0.0.0 to enable VNC access.

8. Restart the OpenStack services.
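
As a sketch of steps 4 and 6, assuming the compute nodes reach the controller over the
172.25.250.0/24 provisioning network, the /etc/exports entry and re-export might look like this:

/var/lib/nova/instances 172.25.250.0/24(rw,sync,fsid=0,no_root_squash)

[root@controller]# exportfs -avr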

Shared Storage Live Migration Configuration for Compute


The following outlines the process for configuring the compute nodes for live migration using
shared storage.

1. Ensure that the nfs-utils and openstack-utils packages are installed.

2. Add rules for TCP, TLS, and the ephemeral ports to the firewall.

3. Update qemu with three settings in /etc/libvirt/qemu.conf for user, group, and
vnc_listen.

4. Restart the libvirtd service to activate these changes.

5. Edit the nova.conf to set Compute service configuration parameters.

6. Restart the compute node services.
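
The compute nodes must also mount the directory exported by the controller, as shown in the
guided exercise that follows. A sketch of the /etc/fstab entry and mount, assuming the controller
exports NFSv4 at 172.25.250.1 (keep the entry on a single line in the file):

172.25.250.1:/ /var/lib/nova/instances nfs4 context="system_u:object_r:nova_var_lib_t:s0" 0 0

[root@compute]# mount -v /var/lib/nova/instances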

Migrating an Instance with Shared Storage


The following steps outline the process for live migrating an instance using the shared storage
method.

1. Determine which node the instance is currently running on.

2. Ensure the destination compute node has sufficient resources to host the instance.

3. Migrate the instance from one compute node to another by using the openstack
server migrate command (see the sketch after this list).

4. Verify that the instance has migrated successfully.
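
For shared storage, the migration command uses the --shared-migration and --live options;
a sketch with an example destination host and instance name:

[user@workstation ~]$ openstack server migrate --shared-migration \
--live overcloud-compute-0.localdomain --wait finance-web2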

Troubleshooting
When migration fails or takes too long, check the activity in the Compute service log files on both
the source and the destination compute nodes:

• /var/log/nova/nova-api.log

• /var/log/nova/nova-compute.log

• /var/log/nova/nova-conductor.log

• /var/log/nova/nova-scheduler.log

References
Further information is available for Configuring NFS Shared Storage in the Migrating
Instances guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/

Further information is available for Migrating Live Instances in the Migrating Instances
guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/

Guided Exercise: Migrating Instances with Shared Storage

In this exercise, you will configure shared storage and migrate a live instance.

Outcomes
You should be able to:

• Configure shared storage.

• Live migrate an instance using shared storage.

Before you begin


Log in to workstation as student with a password of student.

This guided exercise requires two compute nodes, as configured in a previous guided exercise
which added compute1 to the overcloud. If you did not successfully complete that guided
exercise, have reset your overcloud systems, or for any reason have an overcloud with only a
single installed compute node, you must first run the command lab resilience-shared-
storage add-compute on workstation. The command's add-compute task adds the
compute1 node to the overcloud, taking between 40 and 90 minutes to complete.

Important
As described above, only run this command if you still need to install a second compute
node. If you already have two functioning compute nodes, skip this task and continue
with the setup task.

[student@workstation ~]$ lab resilience-shared-storage add-compute

After the add-compute task has completed successfully, continue with the setup task
in the following paragraph.

Start with the setup task if you have two functioning compute nodes, either from having
completed the previous overcloud scaling guided exercise, or by completing the extra add-
compute task described above. On workstation, run the lab resilience-shared-
storage setup command. This command verifies the OpenStack environment and creates the
project resources used in this exercise.

[student@workstation ~]$ lab resilience-shared-storage setup

Steps
1. Configure controller0 for shared storage.

1.1. Log into controller0 as heat-admin and switch to the root user.

[student@workstation ~]$ ssh heat-admin@controller0
[heat-admin@overcloud-controller-0 ~]$ sudo -i
[root@overcloud-controller-0 ~]#

1.2. Install the nfs-utils package.

[root@overcloud-controller-0 ~]# yum -y install nfs-utils

1.3. Configure iptables for NFSv4 shared storage.

[root@overcloud-controller-0 ~]# iptables -v -I INPUT \
-p tcp --dport 2049 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:2049
[root@overcloud-controller-0 ~]# service iptables save

1.4. Configure /etc/exports to export /var/lib/nova/instances via NFS to compute0 and compute1. Add the following lines to the bottom of the file.

/var/lib/nova/instances 172.25.250.2(rw,sync,fsid=0,no_root_squash)
/var/lib/nova/instances 172.25.250.12(rw,sync,fsid=0,no_root_squash)

1.5. Enable and start the NFS service.

[root@overcloud-controller-0 ~]# systemctl enable nfs --now

1.6. Confirm the directory is exported.

[root@overcloud-controller-0 ~]# exportfs
/var/lib/nova/instances
                172.25.250.2
/var/lib/nova/instances
                172.25.250.12

1.7. Update the vncserver_listen variable in /etc/nova/nova.conf.

[root@overcloud-controller-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0

1.8. Restart OpenStack Compute services, then log out of controller0.

[root@overcloud-controller-0 ~]# openstack-service restart nova
[root@overcloud-controller-0 ~]# exit
[heat-admin@overcloud-controller-0 ~]$ exit
[student@workstation ~]$

2. Configure compute0 to use shared storage. Later in this exercise, you will repeat these
steps on compute1.

2.1. Log into compute0 as heat-admin and switch to the root user.

[student@workstation ~]$ ssh heat-admin@compute0
[heat-admin@overcloud-compute-0 ~]$ sudo -i
[root@overcloud-compute-0 ~]#

2.2. Configure /etc/fstab to mount the directory /var/lib/nova/instances, exported from controller0. Add the following line to the bottom of the file. Confirm that the entry is on a single line in the file; the two-line display here in the book is due to insufficient width.

172.25.250.1:/ /var/lib/nova/instances nfs4
context="system_u:object_r:nova_var_lib_t:s0" 0 0

2.3. Mount the export from controller0 on /var/lib/nova/instances.

[root@overcloud-compute-0 ~]# mount -v /var/lib/nova/instances

2.4. Configure iptables to allow shared storage live migration.

[root@overcloud-compute-0 ~]# iptables -v -I INPUT -p tcp \
--dport 16509 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:16509
[root@overcloud-compute-0 ~]# iptables -v -I INPUT -p tcp \
--dport 49152:49261 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpts:49152:49261
[root@overcloud-compute-0 ~]# service iptables save

2.5. Configure user, group, and vnc_listen in /etc/libvirt/qemu.conf. Add the following lines to the bottom of the file.

user="root"
group="root"
vnc_listen="0.0.0.0"

2.6. Configure /etc/nova/nova.conf virtual disk storage and other properties for live
migration. Use the nfs mounted /var/lib/nova/instances directory to store
instance virtual disks.

[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
libvirt images_type default
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT instances_path /var/lib/nova/instances
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT novncproxy_base_url http://172.25.250.1:6080/vnc_auto.html
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT live_migration_flag \
VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE

2.7. Restart OpenStack services and log out of compute0.

[root@overcloud-compute-0 ~]# openstack-service restart
[root@overcloud-compute-0 ~]# exit
[heat-admin@overcloud-compute-0 ~]$ exit
[student@workstation ~]$

3. Configure compute1 to use shared storage.

3.1. Log into compute1 as heat-admin and switch to the root user.

[student@workstation ~]$ ssh heat-admin@compute1
[heat-admin@overcloud-compute-1 ~]$ sudo -i
[root@overcloud-compute-1 ~]#

3.2. Configure /etc/fstab to mount the directory /var/lib/nova/instances, exported from controller0. Add the following line to the bottom of the file. Confirm that the entry is on a single line in the file; the two-line display here in the book is due to insufficient width.

172.25.250.1:/ /var/lib/nova/instances nfs4
context="system_u:object_r:nova_var_lib_t:s0" 0 0

3.3. Mount the export from controller0 on /var/lib/nova/instances.

[root@overcloud-compute-1 ~]# mount -v /var/lib/nova/instances

3.4. Configure iptables for live migration.

[root@overcloud-compute-1 ~]# iptables -v -I INPUT -p tcp \
--dport 16509 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:16509
[root@overcloud-compute-1 ~]# iptables -v -I INPUT -p tcp \
--dport 49152:49261 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpts:49152:49261
[root@overcloud-compute-1 ~]# service iptables save

3.5. Configure user, group, and vnc_listen in /etc/libvirt/qemu.conf. Add the following lines to the bottom of the file.

user="root"
group="root"
vnc_listen="0.0.0.0"

3.6. Configure /etc/nova/nova.conf virtual disk storage and other properties for live
migration. Use the nfs mounted /var/lib/nova/instances directory to store
instance virtual disks.

[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
libvirt images_type default
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT instances_path /var/lib/nova/instances
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT novncproxy_base_url http://172.25.250.1:6080/vnc_auto.html
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT live_migration_flag \
VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE

3.7. Restart OpenStack services and log out of compute1.

[root@overcloud-compute-1 ~]# openstack-service restart
[root@overcloud-compute-1 ~]# exit
[heat-admin@overcloud-compute-1 ~]$ exit
[student@workstation ~]$

4. From workstation, source the /home/student/developer1-finance-rc environment file, and launch an instance as the user developer1 using the following attributes:

Instance Attributes
Attribute Value
flavor m1.web
key pair developer1-keypair1
network finance-network1
image rhel7
security group finance-web
name finance-web2

[student@workstation ~]$ source ~/developer1-finance-rc
[student@workstation ~(developer1-finance)]$ openstack server create \
--flavor m1.web \
--key-name developer1-keypair1 \
--nic net-id=finance-network1 \
--security-group finance-web \
--image rhel7 finance-web2 --wait
...output omitted...

5. List the available floating IP addresses, then allocate one to the finance-web2 instance.

5.1. List the floating IPs. An available one has the Port attribute set to None.

[student@workstation ~(developer1-finance)]$ openstack floating ip list \
-c "Floating IP Address" -c Port
+---------------------+------+
| Floating IP Address | Port |
+---------------------+------+
| 172.25.250.N | None |
+---------------------+------+

5.2. Attach an available floating IP to the instance finance-web2.

[student@workstation ~(developer1-finance)]$ openstack server add \
floating ip finance-web2 172.25.250.N

5.3. Log in to the finance-web2 instance using /home/student/developer1-keypair1.pem with ssh to ensure it is working properly, then log out of the instance.

[student@workstation ~(developer1-finance)]$ ssh -i ~/developer1-keypair1.pem \
cloud-user@172.25.250.N
Warning: Permanently added '172.25.250.N' (ECDSA) to the list of known hosts.
[cloud-user@finance-web2 ~]$ exit
[student@workstation ~(developer1-finance)]$

6. Migrate the instance finance-web2 using shared storage migration.

6.1. To perform live migration, the developer1 user must have the admin role assigned for
the project finance. Assign the admin role to developer1 for the project finance.
The developer1 user may already have been assigned the admin role.

[student@workstation ~(developer1-finance)]$ source ~/admin-rc
[student@workstation ~(admin-admin)]$ openstack role add --user \
developer1 --project finance admin
[student@workstation ~(admin-admin)]$ source ~/developer1-finance-rc

6.2. Determine whether the instance is currently running on overcloud-compute-0 or overcloud-compute-1. In the following example the instance is running on overcloud-compute-1. However, your instance may be running on overcloud-compute-0.

[student@workstation ~(developer1-finance)]$ openstack server show \
finance-web2 -f json | grep compute
"OS-EXT-SRV-ATTR:host": "overcloud-compute-1.localdomain",
"OS-EXT-SRV-ATTR:hypervisor_hostname": "overcloud-compute-1.localdomain",

6.3. Prior to migration, ensure the destination compute node has sufficient resources to host the instance. In this example, the instance currently runs on overcloud-compute-1.localdomain, so the destination to check is overcloud-compute-0. Modify the command to reflect your actual source and destination compute nodes. Estimate whether the total, minus the amount used now, is sufficient. For example, 6143 MB of total memory minus the 2048 MB used now leaves roughly 4 GB free, which is more than the 2048 MB required by the m1.web flavor.

[student@workstation ~(developer1-finance)]$ openstack host show \
overcloud-compute-0.localdomain -f json
[
{
"Project": "(total)",
"Disk GB": 56,
"Host": "overcloud-compute-0.localdomain",
"CPU": 2,
"Memory MB": 6143
},
{
"Project": "(used_now)",
"Disk GB": 0,
"Host": "overcloud-compute-0.localdomain",
"CPU": 0,
"Memory MB": 2048
},
{
"Project": "(used_max)",
"Disk GB": 0,
"Host": "overcloud-compute-0.localdomain",
"CPU": 0,
"Memory MB": 0
}


6.4. Migrate the instance finance-web2 to a new compute node. In this example, the instance is migrated from overcloud-compute-1 to overcloud-compute-0. Your scenario may require migrating in the opposite direction.

[student@workstation ~(developer1-finance)]$ openstack server migrate \
--shared-migration \
--live overcloud-compute-0.localdomain \
--wait finance-web2
Complete

7. Use the command openstack server show to verify that finance-web2 is now running
on the other compute node.

[student@workstation ~(developer1-finance)]$ openstack server show \
finance-web2 -f json | grep compute
"OS-EXT-SRV-ATTR:host": "overcloud-compute-0.localdomain",
"OS-EXT-SRV-ATTR:hypervisor_hostname": "overcloud-compute-0.localdomain",

Cleanup
From workstation, run the lab resilience-shared-storage cleanup command to
clean up this exercise.

[student@workstation ~]$ lab resilience-shared-storage cleanup

Important - your next step

After this Guided Exercise, if you intend either to continue directly to the chapter-ending lab or to skip directly to the next chapter, you must first reset your virtual machines.
Save any data that you would like to keep from the virtual machines. After the data is
saved, reset all of the virtual machines. In the physical classroom environment, use the
rht-vmctl reset all command. In the online classroom environment, delete the
current lab environment and provision a new lab environment.

If you intend to repeat either of the two Live Migration Guided Exercises in this
chapter that require two compute nodes, do not reset your virtual machines. Because
your overcloud currently has two functioning compute nodes, you may repeat the
Live Migration Guided Exercises without running the add-compute task that was
required to build the second compute node.


Lab: Managing Resilient Compute Resources

In this lab, you will add compute nodes, manage shared storage, and perform instance live
migration.

Resources
Files: http://materials.example.com/instackenv-onenode.json

Outcomes
You should be able to:

• Add a compute node.

• Configure shared storage.

• Live migrate an instance using shared storage.

Before you begin


If you have not done so already, save any data that you would like to keep from the virtual
machines. After the data is saved, reset all of the virtual machines. In the physical classroom
environment, reset all of the virtual machines using the command rht-vmctl. In the online
environment, delete and provision a new classroom lab environment.

Log in to workstation as student with a password of student.

On workstation, run the lab resilience-review setup command. The script ensures
that OpenStack services are running and the environment has been properly configured for the
lab.

[student@workstation ~]$ lab resilience-review setup

Steps
1. Use SSH to connect to director as the user stack and source the stackrc credentials
file.

2. Prepare compute1 for introspection. Use the details available in the http://materials.example.com/instackenv-onenode.json file.

3. Initiate introspection of compute1. Introspection may take a few minutes.

4. Update the node profile for compute1 to use the compute profile.

5. Configure 00-node-info.yaml to scale to two compute nodes.

6. Deploy the overcloud, to scale compute by adding one node.

7. Prepare compute1 for the next part of the lab.

[student@workstation ~]$ lab resilience-review prep-compute1

8. Configure controller0 for shared storage.


9. Configure shared storage for compute0.

10. Configure shared storage for compute1.

11. Launch an instance named production1 as the user operator1 using the following
attributes:

Instance Attributes
Attribute Value
flavor m1.web
key pair operator1-keypair1
network production-network1
image rhel7
security group production
name production1

12. List the available floating IP addresses, then allocate one to the production1 instance.

13. Ensure that the production1 instance is accessible by logging in to the instance as the
user cloud-user, then log out of the instance.

14. Migrate the instance production1 using shared storage.

15. Verify that the migration of production1 using shared storage was successful.

Evaluation
From workstation, run the lab resilience-review grade command to confirm the
success of this exercise. Correct any reported failures and rerun the command until successful.

[student@workstation ~]$ lab resilience-review grade

Cleanup
Save any data that you would like to keep from the virtual machines. After the data is saved,
reset all of the overcloud virtual machines and the director virtual machine. In the physical
classroom environment, reset all of the overcloud virtual machines and the director virtual
machine using the rht-vmctl command. In the online environment, reset and start the director
and overcloud nodes.


Solution
In this lab, you will add compute nodes, manage shared storage, and perform instance live
migration.

Resources
Files: http://materials.example.com/instackenv-onenode.json

Outcomes
You should be able to:

• Add a compute node.

• Configure shared storage.

• Live migrate an instance using shared storage.

Before you begin


If you have not done so already, save any data that you would like to keep from the virtual
machines. After the data is saved, reset all of the virtual machines. In the physical classroom
environment, reset all of the virtual machines using the command rht-vmctl. In the online
environment, delete and provision a new classroom lab environment.

Log in to workstation as student with a password of student.

On workstation, run the lab resilience-review setup command. The script ensures
that OpenStack services are running and the environment has been properly configured for the
lab.

[student@workstation ~]$ lab resilience-review setup

Steps
1. Use SSH to connect to director as the user stack and source the stackrc credentials
file.

[student@workstation ~]$ ssh stack@director


[stack@director ~]$

2. Prepare compute1 for introspection. Use the details available in the http://materials.example.com/instackenv-onenode.json file.

2.1. Download the instackenv-onenode.json file from http://materials.example.com to /home/stack for introspection of compute1.

[stack@director ~]$ wget http://materials.example.com/instackenv-onenode.json

2.2. Verify that the instackenv-onenode.json file is for compute1.

[stack@director ~]$ cat ~/instackenv-onenode.json


{
"nodes": [
{
"pm_user": "admin",


"arch": "x86_64",
"name": "compute1",
"pm_addr": "172.25.249.112",
"pm_password": "password",
"pm_type": "pxe_ipmitool",
"mac": [
"52:54:00:00:f9:0c"
],
"cpu": "2",
"memory": "6144",
"disk": "40"
}
]
}

2.3. Import instackenv-onenode.json into the baremetal service using openstack baremetal import, and ensure that the node has been properly imported.

[stack@director ~]$ openstack baremetal import --json \
/home/stack/instackenv-onenode.json
Started Mistral Workflow. Execution ID: 8976a32a-6125-4c65-95f1-2b97928f6777
Successfully registered node UUID b32d3987-9128-44b7-82a5-5798f4c2a96c
Started Mistral Workflow. Execution ID: 63780fb7-bff7-43e6-bb2a-5c0149bc9acc
Successfully set all nodes to available
[stack@director ~]$ openstack baremetal node list \
-c Name -c 'Power State' -c 'Provisioning State' -c Maintenance
+-------------+--------------------+-------------+-------------+
| Name | Provisioning State | Power State | Maintenance |
+-------------+--------------------+-------------+-------------+
| controller0 | active | power on | False |
| compute0 | active | power on | False |
| ceph0 | active | power on | False |
| compute1 | available | power off | False |
+-------------+--------------------+-------------+-------------+

2.4. Prior to starting introspection, set the provisioning state for compute1 to manageable.

[stack@director ~]$ openstack baremetal node manage compute1

3. Initiate introspection of compute1. Introspection may take a few minutes.

[stack@director ~]$ openstack overcloud node introspect \
--all-manageable --provide
Started Mistral Workflow. Execution ID: d9191784-e730-4179-9cc4-a73bc31b5aec
Waiting for introspection to finish...
...output omitted...

4. Update the node profile for compute1 to use the compute profile.

[stack@director ~]$ openstack baremetal node set compute1 \
--property "capabilities=profile:compute,boot_option:local"

5. Configure 00-node-info.yaml to scale to two compute nodes.

Update the ComputeCount line as follows.


ComputeCount: 2

6. Deploy the overcloud, to scale compute by adding one node.

[stack@director ~]$ openstack overcloud deploy \
--templates ~/templates \
--environment-directory ~/templates/cl210-environment
Removing the current plan files
Uploading new plan files
Started Mistral Workflow. Execution ID: 6de24270-c3ed-4c52-8aac-820f3e1795fe
Plan updated
Deploying templates in the directory /tmp/tripleoclient-WnZ2aA/tripleo-heat-
templates
Started Mistral Workflow. Execution ID: 50f42c4c-d310-409d-8d58-e11f993699cb
...output omitted...

7. Prepare compute1 for the next part of the lab.

[student@workstation ~]$ lab resilience-review prep-compute1

8. Configure controller0 for shared storage.

8.1. Log into controller0 as heat-admin and switch to the root user.

[student@workstation ~]$ ssh heat-admin@controller0


[heat-admin@overcloud-controller-0 ~]$ sudo -i
[root@overcloud-controller-0 ~]#

8.2. Install the nfs-utils package.

[root@overcloud-controller-0 ~]# yum -y install nfs-utils

8.3. Configure iptables for NFSv4 shared storage.

[root@overcloud-controller-0 ~]# iptables -v -I INPUT \
-p tcp --dport 2049 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:2049
[root@overcloud-controller-0 ~]# service iptables save

8.4. Configure /etc/exports to export /var/lib/nova/instances via NFS to compute0 and compute1. Add the following lines to the bottom of the file.

/var/lib/nova/instances 172.25.250.2(rw,sync,fsid=0,no_root_squash)
/var/lib/nova/instances 172.25.250.12(rw,sync,fsid=0,no_root_squash)

8.5. Enable and start the NFS service.

[root@overcloud-controller-0 ~]# systemctl enable nfs --now


8.6. Confirm the directory is exported.

[root@overcloud-controller-0 ~]# exportfs


/var/lib/nova/instances
172.25.250.2
/var/lib/nova/instances
172.25.250.12

8.7. Update the vncserver_listen variable in /etc/nova/nova.conf.

[root@overcloud-controller-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0

8.8. Restart OpenStack Compute services, then log out of controller0.

[root@overcloud-controller-0 ~]# openstack-service restart nova


[root@overcloud-controller-0 ~]# exit
[heat-admin@overcloud-controller-0 ~]$ exit
[student@workstation ~]$

9. Configure shared storage for compute0.

9.1. Log into compute0 as heat-admin and switch to the root user.

[student@workstation ~]$ ssh heat-admin@compute0


[heat-admin@overcloud-compute-0 ~]$ sudo -i
[root@overcloud-compute-0 ~]#

9.2. Configure /etc/fstab to mount the directory /var/lib/nova/instances, exported from controller0. Add the following line to the bottom of the file, ensuring that the entry is entered as a single line.

172.25.250.1:/ /var/lib/nova/instances nfs4 context="system_u:object_r:nova_var_lib_t:s0" 0 0

9.3. Mount the export from controller0 on /var/lib/nova/instances.

[root@overcloud-compute-0 ~]# mount -v /var/lib/nova/instances

9.4. Configure iptables to allow shared storage live migration.

[root@overcloud-compute-0 ~]# iptables -v -I INPUT -p tcp \
--dport 16509 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:16509
[root@overcloud-compute-0 ~]# iptables -v -I INPUT -p tcp \
--dport 49152:49261 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpts:49152:49261
[root@overcloud-compute-0 ~]# service iptables save


9.5. Configure user, group, and vnc_listen in /etc/libvirt/qemu.conf. Add the following lines to the bottom of the file.

user="root"
group="root"
vnc_listen="0.0.0.0"

9.6. Configure virtual disk storage and other live migration properties in /etc/nova/nova.conf. Use the NFS-mounted /var/lib/nova/instances directory to store instance virtual disks.

[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
libvirt images_type default
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT instances_path /var/lib/nova/instances
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT novncproxy_base_url http://172.25.250.1:6080/vnc_auto.html
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0
[root@overcloud-compute-0 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT live_migration_flag \
VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE

9.7. Restart OpenStack services and log out of compute0.

[root@overcloud-compute-0 ~]# openstack-service restart


[root@overcloud-compute-0 ~]# exit
[heat-admin@overcloud-compute-0 ~]$ exit
[student@workstation ~]$

10. Configure shared storage for compute1.

10.1. Log into compute1 as heat-admin and switch to the root user.

[student@workstation ~]$ ssh heat-admin@compute1


[heat-admin@overcloud-compute-1 ~]$ sudo -i
[root@overcloud-compute-1 ~]#

10.2. Configure /etc/fstab to mount the directory /var/lib/nova/instances, exported from controller0. Add the following line to the bottom of the file, ensuring that the entry is entered as a single line.

172.25.250.1:/ /var/lib/nova/instances nfs4 context="system_u:object_r:nova_var_lib_t:s0" 0 0

10.3. Mount the export from controller0 on /var/lib/nova/instances.

[root@overcloud-compute-1 ~]# mount -v /var/lib/nova/instances

10.4. Configure iptables for live migration.


[root@overcloud-compute-1 ~]# iptables -v -I INPUT -p tcp \
--dport 16509 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpt:16509
[root@overcloud-compute-1 ~]# iptables -v -I INPUT -p tcp \
--dport 49152:49261 -j ACCEPT
ACCEPT tcp opt -- in * out * 0.0.0.0/0 -> 0.0.0.0/0 tcp dpts:49152:49261
[root@overcloud-compute-1 ~]# service iptables save

10.5. Configure user, group, and vnc_listen in /etc/libvirt/qemu.conf. Add the following lines to the bottom of the file.

user="root"
group="root"
vnc_listen="0.0.0.0"

10.6. Configure virtual disk storage and other live migration properties in /etc/nova/nova.conf. Use the NFS-mounted /var/lib/nova/instances directory to store instance virtual disks.

[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
libvirt images_type default
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT instances_path /var/lib/nova/instances
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT novncproxy_base_url http://172.25.250.1:6080/vnc_auto.html
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT vncserver_listen 0.0.0.0
[root@overcloud-compute-1 ~]# openstack-config --set /etc/nova/nova.conf \
DEFAULT live_migration_flag \
VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE

10.7. Restart OpenStack services and log out of compute1.

[root@overcloud-compute-1 ~]# openstack-service restart


[root@overcloud-compute-1 ~]# exit
[heat-admin@overcloud-compute-1 ~]$ exit
[student@workstation ~]$

11. Launch an instance named production1 as the user operator1 using the following
attributes:

Instance Attributes
Attribute Value
flavor m1.web
key pair operator1-keypair1
network production-network1
image rhel7
security group production
name production1


[student@workstation ~]$ source ~/operator1-production-rc


[student@workstation ~(operator1-production)]$ openstack server create \
--flavor m1.web \
--key-name operator1-keypair1 \
--nic net-id=production-network1 \
--security-group production \
--image rhel7 \
--wait production1
...output omitted...

12. List the available floating IP addresses, then allocate one to the production1 instance.

12.1. List the floating IPs. An available one has the Port attribute set to None.

[student@workstation ~(operator1-production)]$ openstack floating ip list \
-c "Floating IP Address" -c Port
+---------------------+------+
| Floating IP Address | Port |
+---------------------+------+
| 172.25.250.P | None |
+---------------------+------+

12.2. Attach an available floating IP to the instance production1.

[student@workstation ~(operator1-production)]$ openstack server add \
floating ip production1 172.25.250.P

13. Ensure that the production1 instance is accessible by logging in to the instance as the
user cloud-user, then log out of the instance.

[student@workstation ~(operator1-production)]$ ssh -i ~/operator1-keypair1.pem \
cloud-user@172.25.250.P
Warning: Permanently added '172.25.250.P' (ECDSA) to the list of known hosts.
[cloud-user@production1 ~]$ exit
[student@workstation ~(operator1-production)]$

14. Migrate the instance production1 using shared storage.

14.1. To perform live migration, the user operator1 must have the admin role assigned
for the project production. Assign the admin role to operator1 for the project
production.

Source the /home/student/admin-rc file to export the admin user credentials.

[student@workstation ~(operator1-production)]$ source ~/admin-rc


[student@workstation ~(admin-admin)]$ openstack role add --user \
operator1 --project production admin

14.2. Determine whether the instance is currently running on compute0 or compute1. In the example below, the instance is running on compute0, but your instance may be running on compute1.


Source the /home/student/operator1-production-rc file to export the operator1 user credentials.

[student@workstation ~(admin-admin)]$ source ~/operator1-production-rc


[student@workstation ~(operator1-production)]$ openstack server show \
production1 -f json | grep compute
"OS-EXT-SRV-ATTR:host": "overcloud-compute-0.localdomain",
"OS-EXT-SRV-ATTR:hypervisor_hostname": "overcloud-compute-0.localdomain",

14.3. Prior to migration, ensure that compute1 has sufficient resources to host the instance. The example below uses compute1; however, you may need to use compute0. The output should show 2 VCPUs and a 56 GB disk in total, with 2048 MB of memory currently used, leaving roughly 4 GB of RAM available.

[student@workstation ~(operator1-production)]$ openstack host show \
overcloud-compute-1.localdomain -f json
[
{
"Project": "(total)",
"Disk GB": 56,
"Host": "overcloud-compute-1.localdomain",
"CPU": 2,
"Memory MB": 6143
},
{
"Project": "(used_now)",
"Disk GB": 0,
"Host": "overcloud-compute-1.localdomain",
"CPU": 0,
"Memory MB": 2048
},
{
"Project": "(used_max)",
"Disk GB": 0,
"Host": "overcloud-compute-1.localdomain",
"CPU": 0,
"Memory MB": 0
}

14.4. Migrate the instance production1 using shared storage. In the example below, the
instance is migrated from compute0 to compute1, but you may need to migrate the
instance from compute1 to compute0.

[student@workstation ~(operator1-production)]$ openstack server migrate \
--shared-migration \
--live overcloud-compute-1.localdomain \
production1

15. Verify that the migration of production1 using shared storage was successful.

15.1. Verify that the migration of production1 using shared storage was successful. The
example below displays compute1, but your output may display compute0.

[student@workstation ~(operator1-production)]$ openstack server show \
production1 -f json | grep compute
"OS-EXT-SRV-ATTR:host": "overcloud-compute-1.localdomain",


"OS-EXT-SRV-ATTR:hypervisor_hostname": "overcloud-compute-1.localdomain",

Evaluation
From workstation, run the lab resilience-review grade command to confirm the
success of this exercise. Correct any reported failures and rerun the command until successful.

[student@workstation ~]$ lab resilience-review grade

Cleanup
Save any data that you would like to keep from the virtual machines. After the data is saved,
reset all of the overcloud virtual machines and the director virtual machine. In the physical
classroom environment, reset all of the overcloud virtual machines and the director virtual
machine using the rht-vmctl command. In the online environment, reset and start the director
and overcloud nodes.


Summary
In this chapter, you learned:

• The Red Hat OpenStack Platform Bare Metal provisioning service, Ironic, supports the
provisioning of both virtual and physical machines to be used for the overcloud deployment.

• Red Hat OpenStack Platform director (undercloud) uses the Orchestration service (Heat) to
orchestrate the deployment of the overcloud with a stack definition.

• Low level system information, such as CPU count, memory, disk space, and network interfaces
of a node is retrieved through a process called introspection.

• Block-based live migration is the alternate method used when shared storage is not
implemented.

• When migrating using shared storage, the instance's memory content must be transferred faster than memory pages are being written (dirtied) on the source instance.

• When using block-based live migration, disk content is copied before memory content is transferred, which makes shared storage live migration comparatively quicker and more efficient.

CHAPTER 7

TROUBLESHOOTING
OPENSTACK ISSUES

Overview
Goal: Holistically diagnose and troubleshoot OpenStack issues.
Objectives:
• Diagnose and troubleshoot instance launch issues on a compute node.
• Diagnose and troubleshoot the identity and messaging services.
• Diagnose and troubleshoot the OpenStack networking, image, and volume services.
Sections:
• Troubleshooting Compute Nodes (and Guided Exercise)
• Troubleshooting Authentication and Messaging (and Guided Exercise)
• Troubleshooting OpenStack Networking, Image, and Volume Services (and Guided Exercise)
Lab:
• Troubleshooting OpenStack Issues


Troubleshooting Compute Nodes

Objectives
After completing this section, students should be able to diagnose and troubleshoot instance
launch issues on a compute node.

The OpenStack Compute Service Architecture


The OpenStack compute service supports the deployment of instances on compute nodes. Like many other OpenStack services, the OpenStack compute service is modular, and its components are deployed on different machines, each playing a different role. The components deployed on the controller node provide a front end through the Compute API, which is served by the Nova API component. The Compute components on the controller node also handle the scheduling of instances based on configurable, customizable scheduling algorithms.

The scheduling is based on the data retrieved from the compute nodes, and is supported by the
Compute scheduler component. This data includes the hardware resources currently available
in the compute node, like the available memory or the number of CPUs. The Nova compute
component, which runs on each compute node, captures this data. This component uses the
RabbitMQ messaging service to connect to the Compute service core components deployed
on the controller node. The Nova compute component also gathers all the resources required to launch an instance, a task that includes placing the instance on the hypervisor running on the compute node.

In addition to the RabbitMQ messaging service, Compute uses the MariaDB database service to store its persistent state, such as instance records. Communication with both RabbitMQ and MariaDB is handled by the Compute conductor component, which runs on the controller node.

The log files for the Compute components are in the /var/log/nova directory on both the controller node and the compute node. Each Compute component logs its events to a different log file. The Nova compute component logs to the /var/log/nova/nova-compute.log file on the compute node. The Compute components running on the controller node log to the /var/log/nova directory on that node.

Nova Compute Service Log File

Scheduler: /var/log/nova/nova-scheduler.log
Conductor: /var/log/nova/nova-conductor.log
API: /var/log/nova/nova-api.log
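
When reproducing a failed instance launch, it is often useful to watch the scheduler and conductor logs in real time on the controller node. A minimal sketch; the file names match the list above, so adjust the paths if your deployment logs elsewhere:

[root@demo]# tail -f /var/log/nova/nova-scheduler.log /var/log/nova/nova-conductor.log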

Compute service commands provide additional visibility into the status of the different Compute components on each node. This status can help troubleshoot issues caused by the auxiliary services used by the Compute components, such as RabbitMQ or MariaDB. The openstack compute service list command displays the hosts where the Compute components are running on the controller and compute nodes, as follows:

[user@demo]$ openstack compute service list -c Binary -c Host


+------------------+------------------------------------+
| Binary | Host |
+------------------+------------------------------------+
| nova-consoleauth | overcloud-controller-0.localdomain |


| nova-scheduler | overcloud-controller-0.localdomain |
| nova-conductor | overcloud-controller-0.localdomain |
| nova-compute | overcloud-compute-0.localdomain |
+------------------+------------------------------------+

This command also shows the status, state, and last update of the Compute components, as
follows:

[user@demo]$ openstack compute service list -c Binary -c Status -c State -c "Updated At"
+------------------+---------+-------+----------------------------+
| Binary | Status | State | Updated At |
+------------------+---------+-------+----------------------------+
| nova-consoleauth | enabled | up | 2017-06-16T19:38:38.000000 |
| nova-scheduler | enabled | up | 2017-06-16T19:38:39.000000 |
| nova-conductor | enabled | up | 2017-06-16T19:38:35.000000 |
| nova-compute | enabled | up | 2017-06-16T19:38:39.000000 |
+------------------+---------+-------+----------------------------+

The previous output shows the node where each Compute component is deployed in the Host field, the status of the component in the Status field, and the state of the component in the State field. The Status field shows whether the Compute component is enabled or disabled. The command can also be used to detect RabbitMQ issues: when RabbitMQ is unavailable, all of the Compute components report a State of down.

Note
The openstack compute service list command requires admin credentials.

A Compute component can be enabled or disabled using the openstack compute service set command. This is useful, for example, when a compute node has to be put into maintenance, as follows:

[user@demo]$ openstack compute service set --disable \
overcloud-compute-0.localdomain \
nova-compute
[user@demo]$ openstack compute service list -c Binary -c Host -c Status
+------------------+------------------------------------+----------+
| Binary | Host | Status |
+------------------+------------------------------------+----------+
| nova-consoleauth | overcloud-controller-0.localdomain | enabled |
| nova-scheduler | overcloud-controller-0.localdomain | enabled |
| nova-conductor | overcloud-controller-0.localdomain | enabled |
| nova-compute | overcloud-compute-0.localdomain | disabled |
+------------------+------------------------------------+----------+

When the compute node maintenance finishes, the compute node can be enabled again, as
follows:

[user@demo]$ openstack compute service set --enable \
overcloud-compute-0.localdomain \
nova-compute
[user@demo]$ openstack compute service list -c Binary -c Host -c Status
+------------------+------------------------------------+---------+
| Binary | Host | Status |
+------------------+------------------------------------+---------+
| nova-consoleauth | overcloud-controller-0.localdomain | enabled |


| nova-scheduler | overcloud-controller-0.localdomain | enabled |
| nova-conductor | overcloud-controller-0.localdomain | enabled |
| nova-compute | overcloud-compute-0.localdomain | enabled |
+------------------+------------------------------------+---------+

All Compute components use the /etc/nova/nova.conf file as their configuration file, both on the controller node and on the compute nodes. This configuration file contains settings for the different Compute components, and also for connecting them to the back-end services. For example, the messaging-related settings are identified by the rabbit prefix (for RabbitMQ).
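
To quickly review the messaging-related settings currently applied on a node, you can search the configuration file for the rabbit prefix. A minimal sketch using standard tools (the second grep simply hides commented-out defaults):

[root@demo]# grep rabbit /etc/nova/nova.conf | grep -v '^#'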

On the compute node, other Nova Compute settings can be configured, such as the ratios between the physical and virtual resources provided by the compute node. The following settings specify these ratios:

• The ram_allocation_ratio parameter for the memory ratio.

• The disk_allocation_ratio parameter for the disk ratio.

• The cpu_allocation_ratio parameter for the CPU ratio.

For example, specifying a CPU ratio of 1.5 allows cloud users to use 1.5 times as many virtual CPUs as there are physical CPUs available.
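
As an illustration, these ratios can be set on a compute node with the openstack-config utility used elsewhere in this course; the values shown are examples only, and the Compute service must be restarted for them to take effect:

[root@demo]# openstack-config --set /etc/nova/nova.conf DEFAULT cpu_allocation_ratio 1.5
[root@demo]# openstack-config --set /etc/nova/nova.conf DEFAULT ram_allocation_ratio 1.0
[root@demo]# openstack-config --set /etc/nova/nova.conf DEFAULT disk_allocation_ratio 1.0
[root@demo]# openstack-service restart nova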

Compute Node Placement Algorithm


The Compute scheduler component uses a scheduling algorithm to select which compute node will be used to deploy an instance. This algorithm is configurable using the scheduler_driver parameter in the /etc/nova/nova.conf configuration file on the controller node. By default, the Compute scheduler component uses filter_scheduler, an algorithm based on filters. This algorithm uses a collection of filters to select a suitable host for deploying instances. These filters eliminate hosts based on facts such as the RAM available on each host. The remaining hosts are then sorted according to cost functions implemented in the Compute scheduler component. Finally, a list of suitable hosts, with their associated costs, is generated.
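
For example, the driver and the filter list can be set explicitly in /etc/nova/nova.conf on the controller node. This is a sketch only; the filter names used here are the ones discussed below, and the exact filter list shipped in your environment may differ:

[root@demo]# openstack-config --set /etc/nova/nova.conf DEFAULT scheduler_driver filter_scheduler
[root@demo]# openstack-config --set /etc/nova/nova.conf DEFAULT \
scheduler_default_filters RetryFilter,RamFilter,ComputeFilter
[root@demo]# openstack-service restart nova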

Some of the filters applied by Compute scheduler when using the filter-based algorithm are:

• The RetryFilter filter identifies the hosts not used previously.

• The RamFilter filter identifies the hosts with enough RAM memory to deploy the instance.

• The ComputeFilter filter identifies the compute nodes available to deploy the instance.

Note
The Compute scheduler component supports the usage of custom scheduling
algorithms.

Regions, Availability Zones, Host Aggregates, and Cells
The OpenStack compute service supports the usage of a hierarchy to define its architecture.
The top element of that hierarchy is a region. A region usually includes a complete Red Hat


OpenStack Platform environment. Inside a region, several availability zones are defined to group compute nodes. A user can specify in which availability zone an instance is to be deployed.

In addition to availability zones, the OpenStack compute service supports host aggregates to group compute nodes. Compute nodes can be grouped within a region using both availability zones and host aggregates. Host aggregates are visible only to cloud administrators.
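
As an illustration of how an administrator groups compute nodes, the following sketch creates a host aggregate exposed as an availability zone and adds a compute node to it. The aggregate and zone names are made up for this example:

[user@demo]$ openstack aggregate create --zone demo-az demo-aggregate
[user@demo]$ openstack aggregate add host demo-aggregate overcloud-compute-0.localdomain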

The use of auxiliary services to connect the different components, such as RabbitMQ or MariaDB, can cause issues affecting the availability of the OpenStack compute service. The OpenStack compute service supports a different hierarchy based on cells. This hierarchy groups compute nodes into cells. Each cell runs all of the Compute components except for the Compute API component, which runs on a top-level node. This configuration uses the nova-cells service to select the cell in which to deploy a new instance. The default OpenStack compute service configuration does not use cells.

Common Issues with Compute Nodes

Compute node issues are usually related to:

• A hardware failure on the compute node.

• A failure in the messaging service connecting the Nova compute service with the Compute
scheduler service.

• Lack of resources, for example CPU or RAM, on the available compute nodes.

These issues usually produce a no valid host error in the Compute conductor logs, because the Compute conductor and scheduler services cannot find a suitable Nova compute service on which to deploy the instance.

[root@demo]# cat /var/log/nova/nova-conductor.log
NoValidHost: No valid host was found. There are not enough hosts available.
WARNING [instance: 1685(...)02f8] Setting instance to ERROR state.

This error can also be related to the lack of resources on the available compute nodes. The
current resources available in the compute nodes running on the Red Hat OpenStack Platform
environment can be retrieved using the openstack host list and openstack host show
commands as follows.

[user@demo]$ openstack host list


+------------------------------------+-------------+----------+
| Host Name | Service | Zone |
+------------------------------------+-------------+----------+
| overcloud-controller-0.localdomain | consoleauth | internal |
| overcloud-controller-0.localdomain | scheduler | internal |
| overcloud-controller-0.localdomain | conductor | internal |
| overcloud-compute-0.localdomain | compute | nova |
+------------------------------------+-------------+----------+
[user@demo]$ openstack host show overcloud-compute-0.localdomain
+---------------------------------+------------+-----+-----------+---------+
| Host | Project | CPU | Memory MB | Disk GB |
+---------------------------------+------------+-----+-----------+---------+
| overcloud-compute-0.localdomain | (total) | 2 | 6143 | 56 |


| overcloud-compute-0.localdomain | (used_now) | 0 | 2048 | 0 |
| overcloud-compute-0.localdomain | (used_max) | 0 | 0 | 0 |
+---------------------------------+------------+-----+-----------+---------+

Note
If there is an instance deployed on a compute node, the openstack host show
command also shows the usage of CPU, memory, and disk for that instance.

The Compute conductor log file also includes the messages related to issues caused by those
auxiliary services. For example, the following message in the Compute conductor log file
indicates that the RabbitMQ service is not available:

[root@demo]# cat /var/log/nova/nova-conductor.log
ERROR oslo.messaging._drivers.impl_rabbit [-] [3cb7...857f] AMQP server on
172.24.1.1:5672 is unreachable:
[Errno 111] ECONNREFUSED. Trying again in 16 seconds. Client port: None

The following message indicates that the MariaDB service is not available:

[root@demo]# less /var/log/nova/nova-conductor.log
WARNING oslo_db.sqlalchemy.engines [req-(...)ac35 - - - - -]
SQL connection failed. -1 attempts left.

Troubleshooting Compute Nodes


The following steps outline the process for troubleshooting issues in compute nodes.

1. Log into an OpenStack controller node.

2. Locate the Compute services log files.

3. Review the log file for the Compute conductor service.

4. Review the log file for the Compute scheduler service.

5. Load admin credentials.

6. List the Compute services available.

7. Disable a Nova compute service.

8. Enable the previous Nova compute service.

References
Further information is available in the Logging, Monitoring, and Troubleshooting Guide
for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/


Guided Exercise: Troubleshooting Compute Nodes

In this exercise, you will fix an issue with the Nova compute service that prevents it from
launching instances. Finally, you will verify that the fix was correctly applied by launching an
instance.

Outcomes
You should be able to troubleshoot and fix an issue in the Nova compute service.

Before you begin


Log in to workstation as student using student as the password.

From workstation, run lab troubleshooting-compute-nodes setup to verify that OpenStack services are running, and that resources created in previous sections are available. This script also intentionally breaks the Nova compute service.

[student@workstation ~]$ lab troubleshooting-compute-nodes setup

Steps
1. Launch an instance named finance-web1 using the rhel7 image, the m1.web flavor, the
finance-network1 network, the finance-web security group, and the developer1-
keypair1 key pair. These resources were all created by the setup script. The instance
deployment will return an error.

1.1. Load the developer1 user credentials.

[student@workstation ~]$ source ~/developer1-finance-rc

1.2. Verify that the rhel7 image is available.

[student@workstation ~(developer1-finance)]$ openstack image list


+---------------+-------+--------+
| ID | Name | Status |
+---------------+-------+--------+
| 926c(...)4600 | rhel7 | active |
+---------------+-------+--------+

1.3. Verify that the m1.web flavor is available.

[student@workstation ~(developer1-finance)]$ openstack flavor list


+---------------+--------+------+------+-----------+-------+-----------+
| ID | Name | RAM | Disk | Ephemeral | VCPUs | Is Public |
+---------------+--------+------+------+-----------+-------+-----------+
| dd1b(...)6900 | m1.web | 2048 | 10 | 0 | 1 | True |
+---------------+--------+------+------+-----------+-------+-----------+

1.4. Verify that the finance-network1 network is available.

[student@workstation ~(developer1-finance)]$ openstack network list


+---------------+---------------------+---------------+
| ID | Name | Subnets |
+---------------+---------------------+---------------+
| b0b7(...)0db4 | finance-network1 | a29f(...)855e |
... output omitted ...

1.5. Verify that the finance-web security group is available.

[student@workstation ~(developer1-finance)]$ openstack security group list


+---------------+-------------+------------------------+---------------+
| ID | Name | Description | Project |
+---------------+-------------+------------------------+---------------+
| bdfd(...)b154 | finance-web | finance-web | d9cc(...)ae0f |
... output omitted ...

1.6. Verify that the developer1-keypair1 key pair, and its associated file located at /home/student/developer1-keypair1.pem, are available.

[student@workstation ~(developer1-finance)]$ openstack keypair list


+---------------------+-----------------+
| Name | Fingerprint |
+---------------------+-----------------+
| developer1-keypair1 | cc:59(...)0f:f9 |
+---------------------+-----------------+
[student@workstation ~(developer1-finance)]$ file ~/developer1-keypair1.pem
/home/student/developer1-keypair1.pem: PEM RSA private key

1.7. Launch an instance named finance-web1 using the rhel7 image, the m1.web
flavor, the finance-network1 network, the finance-web security group, and the
developer1-keypair1 key pair. The instance deployment will return an error.

[student@workstation ~(developer1-finance)]$ openstack server create \
--image rhel7 \
--flavor m1.web \
--security-group finance-web \
--key-name developer1-keypair1 \
--nic net-id=finance-network1 \
finance-web1
...output omitted...

1.8. Verify the status of the finance-web1 instance. The instance status will be ERROR.

[student@workstation ~(developer1-finance)]$ openstack server show \
finance-web1 -c name -c status
+--------+--------------+
| Field | Value |
+--------+--------------+
| name | finance-web1 |
| status | ERROR |
+--------+--------------+

2. Verify on which host the Nova scheduler and Nova conductor services are running. You will
need to load the admin credentials located at the /home/student/admin-rc file.

2.1. Load the admin credentials located at the /home/student/admin-rc file.



[student@workstation ~(developer1-finance)]$ source ~/admin-rc

2.2. Verify on which host the Nova scheduler and Nova conductor services are running. Both services are running on controller0.

[student@workstation ~(admin-admin)]$ openstack host list


+------------------------------------+-------------+----------+
| Host Name | Service | Zone |
+------------------------------------+-------------+----------+
| overcloud-controller-0.localdomain | scheduler | internal |
| overcloud-controller-0.localdomain | conductor | internal |
...output omitted...

3. Review the logs for the Compute scheduler and conductor services on controller0. Find the issue related to no valid host being found for the finance-web1 instance in the Compute conductor log file located at /var/log/nova/nova-conductor.log. Also find the issue related to no hosts being returned by the compute filter in the Compute scheduler log file located at /var/log/nova/nova-scheduler.log.

3.1. Log in to controller0 as the heat-admin user.

[student@workstation ~(admin-admin)]$ ssh heat-admin@controller0

3.2. Become root in controller0.

[heat-admin@overcloud-controller-0 ~]$ sudo -i

3.3. Locate the log message in the Compute conductor log file, which sets the finance-web1 instance's status to error, since no valid host is available to deploy the instance. The log file shows the instance ID.

[root@overcloud-controller-0 heat-admin]# cat /var/log/nova/nova-conductor.log


...output omitted...
NoValidHost: No valid host was found. There are not enough hosts available.
(...) WARNING (...) [instance: 168548c9-a7bb-41e1-a7ca-aa77dca302f8] Setting
instance to ERROR state.
...output omitted...

3.4. Locate the log message in the Compute scheduler log file which shows that the ComputeFilter returned zero hosts. When done, log out of the root account.

[root@overcloud-controller-0 heat-admin]# cat /var/log/nova/nova-scheduler.log


...output omitted...
(...) Filter ComputeFilter returned 0 hosts
(...) Filtering removed all hosts for the request with instance ID '168548c9-
a7bb-41e1-a7ca-aa77dca302f8'. (...)
[root@overcloud-controller-0 heat-admin]# exit

4. Verify how many Nova compute services are enabled.

4.1. Load the admin credentials.


[heat-admin@overcloud-controller-0 ~]$ source overcloudrc

4.2. List the Compute services. The nova-compute service running on compute0 is
disabled.

[heat-admin@overcloud-controller-0 ~]$ openstack compute service list \
-c Binary -c Host -c Status
+------------------+------------------------------------+---------+
| Binary | Host | Status |
+------------------+------------------------------------+---------+
| nova-compute | overcloud-compute-0.localdomain | disabled |
...output omitted...
+------------------+------------------------------------+---------+

5. Enable and verify the Nova compute service in compute0.

5.1. Enable the Nova compute service on compute0.

[heat-admin@overcloud-controller-0 ~]$ openstack compute service set \
--enable \
overcloud-compute-0.localdomain \
nova-compute

5.2. Verify that the Nova compute service has been correctly enabled on compute0. When
done, log out from the controller node.

[heat-admin@overcloud-controller-0 ~]$ openstack compute service list \
-c Binary -c Host -c Status
+------------------+------------------------------------+---------+
| Binary | Host | Status |
+------------------+------------------------------------+---------+
| nova-compute | overcloud-compute-0.localdomain | enabled |
...output omitted...
[heat-admin@overcloud-controller-0 ~]$ exit

6. Launch the finance-web1 instance again from workstation using the developer1 user
credentials. Use the rhel7 image, the m1.web flavor, the finance-network1 network, the
finance-web security group, and the developer1-keypair1 key pair. The instance will
be deployed without errors. You must first delete the previous failed instance before deploying the new one.

6.1. Load the developer1 user credentials.

[student@workstation ~(admin-admin)]$ source ~/developer1-finance-rc

6.2. Delete the previous finance-web1 instance, whose deployment failed with an error.

[student@workstation ~(developer1-finance)]$ openstack server delete \
finance-web1



6.3. Verify that the instance has been correctly deleted. The command should not return any
instances named finance-web1.

[student@workstation ~(developer1-finance)]$ openstack server list

6.4. Launch the finance-web1 instance again, using the rhel7 image, the m1.web
flavor, the finance-network1 network, the finance-web security group, and the
developer1-keypair1 key pair.

[student@workstation ~(developer1-finance)]$ openstack server create \
--image rhel7 \
--flavor m1.web \
--security-group finance-web \
--key-name developer1-keypair1 \
--nic net-id=finance-network1 \
--wait finance-web1
... output omitted ...

6.5. Verify the status of the finance-web1 instance. The instance status will be ACTIVE. It may take some time for the instance's status to become ACTIVE.

[student@workstation ~(developer1-finance)]$ openstack server show \
finance-web1 -c name -c status
+--------+--------------+
| Field | Value |
+--------+--------------+
| name | finance-web1 |
| status | ACTIVE |
+--------+--------------+

Cleanup
From workstation, run the lab troubleshooting-compute-nodes cleanup script to
clean up this exercise.

[student@workstation ~]$ lab troubleshooting-compute-nodes cleanup


Troubleshooting Authentication and Messaging

Objectives
After completing this section, students should be able to diagnose and troubleshoot the Identity
and Messaging services.

The OpenStack Identity Service Architecture


The Keystone identity service supports user authentication and authorization. This service is
the front-end service for a Red Hat OpenStack Platform environment. The cloud administrator
creates credentials for each user. These credentials usually include a user name, a password, and
an authentication URL. This authentication URL points to the Identity API. The Identity API is
used to authenticate to the Red Hat OpenStack Platform environment.

The Keystone identity service, like other OpenStack services, has three endpoints associated with it: the public endpoint, the admin endpoint, and the internal endpoint. The public endpoint, bound to port TCP/5000 by default, provides the API functionality required for an external user to use Keystone authentication. This endpoint is usually the one used as the authentication URL provided to cloud users. A user's machine needs access to the TCP/5000 port on the machine where the Keystone identity service is running in order to authenticate to the Red Hat OpenStack Platform environment. The Keystone identity service usually runs on the controller node.

The admin endpoint provides additional administrative functionality beyond the public endpoint. The other Red Hat OpenStack Platform services use the internal endpoint to run authentication and authorization queries against the Keystone identity service. The openstack catalog show identity command displays the list of endpoints available for the user credentials.

[user@demo]$ openstack catalog show identity


+-----------+---------------------------------------------+
| Field | Value |
+-----------+---------------------------------------------+
| endpoints | regionOne |
| | publicURL: http://172.25.250.50:5000/v2.0 |
| | internalURL: http://172.24.1.50:5000/v2.0 |
| | adminURL: http://172.25.249.50:35357/v2.0 |
| | |
| name | keystone |
| type | identity |
+-----------+---------------------------------------------+
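
To confirm that a client machine can actually reach the public endpoint, a simple HTTP request against the authentication URL can be used. A minimal sketch using the public URL shown above; a reachable endpoint returns Identity API version metadata, while a connection failure or an HTTP 503 points to the HAProxy or httpd issues described below:

[user@demo]$ curl -s http://172.25.250.50:5000/v2.0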

In the previous output, each endpoint uses a different IP address based on the availability
required for each of those endpoints. The HAProxy service manages all of these IP addresses.
This service runs on the controller node. The HAProxy configuration file includes two services
to manage the three endpoints' IP addresses: keystone_admin and keystone_public.
Both services include two IP addresses, one internal and one external. For example, the
keystone_public service serves the public endpoint using both an internal IP address and an
external IP address:

[user@demo]$ less /etc/haproxy/haproxy.cfg


...output omitted...
listen keystone_public


bind 172.24.1.50:5000 transparent
bind 172.25.250.50:5000 transparent
mode http
http-request set-header X-Forwarded-Proto https if { ssl_fc }
http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
server overcloud-controller-0.internalapi.localdomain 172.24.1.1:5000 check fall 5
inter 2000 rise 2
...output omitted...

In the previous definition of the keystone_public service, the first bind address, 172.24.1.50, is the internal IP address. Other OpenStack services use this IP address for the user authentication and authorization provided by the Keystone identity service. The second bind address configured for the keystone_public service, 172.25.250.50, is the external IP address. Cloud users use this IP address in their authentication URL.

The Keystone identity service runs on top of the httpd service. Issues in Keystone are usually
related to the configuration or availability of either the HAProxy or httpd service. If the httpd
service is not available, the following error message is displayed:

[user@demo]$ openstack volume create --size 1 demo-volume


Discovering versions from the identity service failed when creating the password plugin.
Attempting to determine version from URL.
Service Unavailable (HTTP 503)
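
When this error appears, a reasonable first check on the controller node is the status of both services; a minimal sketch:

[root@demo]# systemctl status haproxy httpd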

The OpenStack Messaging Service Architecture


Most of the OpenStack services are modular, so they can easily scale. These services run several
components that communicate using a messaging service. Red Hat OpenStack Platform supports
RabbitMQ as the default messaging service.

When a component wants to send a message to another component, it places that message in a queue. A user name and a password are required to send the message to that queue. All Red Hat OpenStack Platform services use the guest user to log in to RabbitMQ.
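
To see which credentials a service is configured to use, and which users exist in RabbitMQ, the following sketch can be run on the controller node. The grep pattern is illustrative; the rabbit_* settings usually live under the oslo_messaging_rabbit section of each service's configuration file:

[root@demo]# grep -E 'rabbit_(userid|password|hosts)' /etc/nova/nova.conf | grep -v '^#'
[root@demo]# rabbitmqctl list_users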

Pacemaker manages the RabbitMQ service as a resource. The name for the Pacemaker resource
is rabbitmq. An issue with RabbitMQ availability usually means a blocked request for the cloud
user.
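
To inspect how Pacemaker defines this resource before digging further, its configuration can be displayed; a minimal sketch, assuming the clone name rabbitmq-clone referenced later in this section:

[root@demo]# pcs resource show rabbitmq-clone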

The status of the RabbitMQ service can be obtained using the rabbitmqctl cluster_status
command. This command displays basic information about the RabbitMQ cluster status.

[root@demo]# rabbitmqctl cluster_status


Cluster status of node 'rabbit@overcloud-controller-0' ...
[{nodes,[{disc,['rabbit@overcloud-controller-0']}]},
{running_nodes,['rabbit@overcloud-controller-0']},
{cluster_name,<<"rabbit@overcloud-controller-0.localdomain">>},
{partitions,[]},
{alarms,[{'rabbit@overcloud-controller-0',[]}]}]

Additional information, like the IP address where RabbitMQ is listening, is available using the
rabbitmqctl status command.

[root@demo]# rabbitmqctl status


Cluster status of node 'rabbit@overcloud-controller-0' ...


[{nodes,[{disc,['rabbit@overcloud-controller-0']}]},
{running_nodes,['rabbit@overcloud-controller-0']},
{cluster_name,<<"rabbit@overcloud-controller-0.localdomain">>},
...output omitted...
{memory,[{total,257256704},
{connection_readers,824456},
{connection_writers,232456},
{connection_channels,1002976},
{connection_other,2633224},
{queue_procs,3842568},
{queue_slave_procs,0},
...output omitted...
{listeners,[{clustering,25672,"::"},{amqp,5672,"172.24.1.1"}]},
...output omitted...

The status of the Pacemaker resource for RabbitMQ can be viewed using the pcs status
command. This command shows the status and any error reports of all the resources configured
in the Pacemaker cluster.

[root@demo]# pcs status


Cluster name: tripleo_cluster
....output omitted...
Clone Set: haproxy-clone [haproxy]
Started: [ overcloud-controller-0 ]
... output omitted...

In case of failure of the rabbitmq resource, the resource can be restarted using the pcs resource cleanup and pcs resource debug-start commands, as follows:

[root@demo]# pcs resource cleanup rabbitmq


Cleaning up rabbitmq:0 on overcloud-controller-0, removing fail-count-rabbitmq
Waiting for 1 replies from the CRMd. OK
[root@demo]# pcs resource debug-start rabbitmq
Operation start for rabbitmq:0 (ocf:heartbeat:rabbitmq-cluster) returned 0
> stderr: DEBUG: RabbitMQ server is running normally
> stderr: DEBUG: rabbitmq:0 start : 0

Troubleshooting Authentication and Messaging


The following steps outline the process for troubleshooting issues in authentication and
messaging services.

1. Log into an OpenStack controller node.

2. Review the HAProxy configuration for the keystone_public and keystone_admin services.

3. Verify the RabbitMQ cluster's status.

4. Verify the Pacemaker cluster's status.

5. Verify the rabbitmq-clone resource's status.

6. Load admin credentials.

7. Verify that the cinder-scheduler and cinder-volume services are enabled.

8. Review the Cinder messaging configuration.


References
Further information is available in the Logging, Monitoring, and Troubleshooting Guide
for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/


Guided Exercise: Troubleshooting Authentication and Messaging

In this exercise, you will fix an issue with the authentication and messaging services.

Outcomes
You should be able to:

• Troubleshoot and fix an issue in the Keystone identity service.

• Troubleshoot and fix an issue related to RabbitMQ.

Before you begin


Log in to workstation as student using student as the password.

From workstation, run lab troubleshooting-authentication setup to verify that OpenStack
services are running, and the resources created in previous sections are available. This script
will also break the Keystone identity service and the RabbitMQ service.

[student@workstation ~]$ lab troubleshooting-authentication setup

Steps
1. Create a 1 GB volume named finance-volume1 using the developer1 user credentials. The
command will fail.

1.1. Load the developer1 user credentials.

[student@workstation ~]$ source ~/developer1-finance-rc

1.2. Create a 1 GB volume named finance-volume1. This command fails with a service
unavailable error.

[student@workstation ~(developer1-finance)]$ openstack volume create \


--size 1 finance-volume1
Discovering versions from the identity service failed when creating the password
plugin. Attempting to determine version from URL.
Service Unavailable (HTTP 503)

2. Verify that the IP address used in the authentication URL of the developer1 user
credentials file is the same one configured as a virtual IP in the HAProxy service for the
keystone_public service. The HAProxy service runs on controller0.

2.1. Find the authentication URL in the developer1 user credentials file.

[student@workstation ~(developer1-finance)]$ cat ~/developer1-finance-rc


unset OS_SERVICE_TOKEN
export OS_USERNAME=developer1
export OS_PASSWORD=redhat
export OS_AUTH_URL=http://172.25.250.50:5000/v2.0
export PS1='[\u@\h \W(developer1-finance)]\$ '
export OS_TENANT_NAME=finance
export OS_REGION_NAME=regionOne

2.2. Log in to controller0 as heat-admin.

[student@workstation ~(developer1-finance)]$ ssh heat-admin@controller0

2.3. Find the virtual IP address configured in the HAProxy service for the
keystone_public service.

[heat-admin@overcloud-controller-0 ~]$ sudo less /etc/haproxy/haproxy.cfg


...output omitted...
listen keystone_public
bind 172.24.1.50:5000 transparent
bind 172.25.250.50:5000 transparent
mode http
http-request set-header X-Forwarded-Proto https if { ssl_fc }
http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
server overcloud-controller-0.internalapi.localdomain 172.24.1.1:5000 check
fall 5 inter 2000 rise 2

2.4. Verify that the HAProxy service is active.

[heat-admin@overcloud-controller-0 ~]$ systemctl status haproxy


haproxy.service - Cluster Controlled haproxy
Loaded: loaded (/usr/lib/systemd/system/haproxy.service; disabled; vendor
preset: disabled)
Drop-In: /run/systemd/system/haproxy.service.d
└─50-pacemaker.conf
Active: active (running) since Thu 2017-06-15 08:45:47 UTC; 1h 8min ago
Main PID: 13096 (haproxy-systemd)
...output omitted...

2.5. Verify the status for the httpd service. The httpd service is inactive.

[heat-admin@overcloud-controller-0 ~]$ systemctl status httpd


httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor
preset: disabled)
Drop-In: /usr/lib/systemd/system/httpd.service.d
└─openstack-dashboard.conf
Active: inactive (dead) since Thu 2017-06-15 09:37:15 UTC; 21min ago
...output omitted...

3. Start the httpd service. It may take some time for the httpd service to start.

3.1. Start the httpd service.

[heat-admin@overcloud-controller-0 ~]$ sudo systemctl start httpd

3.2. Verify that the httpd service is active. When done, log out from the controller node.

[heat-admin@overcloud-controller-0 ~]$ systemctl status httpd


httpd.service - The Apache HTTP Server

Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor
preset: disabled)
Drop-In: /usr/lib/systemd/system/httpd.service.d
└─openstack-dashboard.conf
Active: active (running) since Thu 2017-06-15 10:13:15 UTC; 1min 8s ago

[heat-admin@overcloud-controller-0 ~]$ logout

4. On workstation, try to create a 1 GB volume named finance-volume1 again. The command
will hang because the Keystone identity service is not able to respond. Press Ctrl+C to get
back to the prompt.

[student@workstation ~(developer1-finance)]$ openstack volume create \


--size 1 \
finance-volume1
Ctrl+C

5. Verify that the previous issue is caused by the RabbitMQ service.

5.1. Log in to controller0 as heat-admin.

[student@workstation ~(developer1-finance)]$ ssh heat-admin@controller0

5.2. Verify that the log file for the Keystone identity service reports that the RabbitMQ
service is unreachable.

[heat-admin@overcloud-controller-0 ~]$ sudo less /var/log/keystone/keystone.log


...output omitted...
(...) AMQP server on 172.24.1.1:5672 is unreachable: [Errno 111] Connection
refused. (...)

5.3. Verify that the RabbitMQ cluster is down.

[heat-admin@overcloud-controller-0 ~]$ sudo rabbitmqctl cluster_status


Cluster status of node 'rabbit@overcloud-controller-0' ...
Error: unable to connect to node 'rabbit@overcloud-controller-0': nodedown
...output omitted...

6. Verify that the root cause for the RabbitMQ cluster unavailability is that the rabbitmq
Pacemaker resource is disabled. When done, enable the rabbitmq Pacemaker resource.

6.1. Verify that the root cause for the RabbitMQ cluster unavailability is that the rabbitmq
Pacemaker resource is disabled.

[heat-admin@overcloud-controller-0 ~]$ sudo pcs status


Cluster name: tripleo_cluster
Stack: corosync
...output omitted...
Clone Set: rabbitmq-clone [rabbitmq]
Stopped (disabled): [ overcloud-controller-0 ]
...output omitted...

6.2. Enable the rabbitmq resource in Pacemaker. When done, log out from the controller
node.

[heat-admin@overcloud-controller-0 ~]$ sudo pcs resource enable rabbitmq --wait


Resource 'rabbitmq' is running on node overcloud-controller-0.
[heat-admin@overcloud-controller-0 ~]$ logout

7. On workstation, try again to create a 1 GB volume named finance-volume1. The volume
will be created successfully.

7.1. On workstation, try again to create a 1 GB volume named finance-volume1.

[student@workstation ~(developer1-finance)]$ openstack volume create \


--size 1 finance-volume1
...output omitted...

7.2. Verify that the volume has been created successfully.

[student@workstation ~(developer1-finance)]$ openstack volume list


+---------------+-----------------+-----------+------+-------------+
| ID | Display Name | Status | Size | Attached to |
+---------------+-----------------+-----------+------+-------------+
| 9a21(...)2d1a | finance-volume1 | available | 1 | |
+---------------+-----------------+-----------+------+-------------+

Cleanup
From workstation, run the lab troubleshooting-authentication cleanup script to
clean up this exercise.

[student@workstation ~]$ lab troubleshooting-authentication cleanup


Troubleshooting OpenStack Networking, Image, and Volume Services

Objectives
After completing this section, students should be able to diagnose and troubleshoot the
OpenStack networking, image, and volume services.

Networking
This section discusses the different methods, commands, procedures and log files you can use to
troubleshoot OpenStack networking issues.

Unreachable Instances
Problem: You have created an instance but are unable to assign it a floating IP.

This problem can occur when the network is not set up correctly. If a router is not set as the
gateway for the external network, then users will not be able to assign a floating IP address to
an instance. Use the neutron router-gateway-set command to set the router as a gateway
for the external network. Then use the openstack server add floating ip command to
assign a floating IP address to the instance.

Note
Floating IPs can be created even if the router is not connected to the external gateway
but when the user attempts to associate a floating IP address with an instance, an
error will display.

[user@demo]$ openstack server add floating ip finance-web1 172.25.250.N


Error: External network 7aaf57c1-3c34-45df-94d3-dbc12754b22e is not reachable from
subnet cfc7ddfa-4403-41a7-878f-e8679596eafd.

If a router is not set as the gateway for the external network, then users will not be able to assign
a floating IP address to an instance.

[user@demo]$ openstack router show finance-router1


+--------------------------------------------------------------+
| Field | Value |
+--------------------------------------------------------------+
| admin_state_up | UP |
| availability_zone_hints | |
| availability_zones | nova |
| created_at | 2017-06-15T09:39:07Z |
| description | |
| external_gateway_info | null |
| flavor_id | None |

...output omitted ...

Use the neutron router-gateway-set command to set the router as a gateway for the
external network.


[user@demo]$ neutron router-gateway-set finance-router1 provider-172.25.250


[user@demo]$ openstack router show finance-router1 -f json
{
"external_gateway_info": "{\"network_id\": \"65606551-51f5-44f0-a389-1c96b728e05f
\", \"enable_snat\": true, \"external_fixed_ips\": [{\"subnet_id\":
\"9d12d02f-7818-486b-8cbf-015798e28a4d\", \"ip_address\": \"172.25.250.32\"}]}",

Use the openstack server add floating ip command to assign a floating IP address to
the instance.

[user@demo]$ openstack server add floating ip finance-web1 172.25.250.N

Use the openstack server list command to verify that a floating IP address has been
associated with the instance.

[user@demo]$ openstack server list -c Name -c Networks


+-----------------+---------------------------------------------+
| Name | Networks |
+-----------------+---------------------------------------------+
| finance-web1 | finance-network1=192.168.0.P, 172.25.250.N |
+-----------------+---------------------------------------------+

Problem: You cannot log in to an instance using SSH. Check that a security group has been
assigned to the instance and that a rule has been added to allow SSH traffic. SSH rules are not
included by default.

[user@demo]$ ssh -i developer1-keypair.pem cloud-user@172.25.250.N


Warning: Permanently added '172.25.250.N' (ECDSA) to the list of known hosts.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

Verify that a security group has been assigned to the instance and that it has a rule allowing
SSH. By default, it does not. The rule with the Port Range 22:22 should be associated with the
same security group as the instance. Verify this by comparing the IDs.

[user@demo]$ openstack server show finance-web1 -f json


...output omitted...
"security_groups": [
{
"name": "finance-web"
}
],
...output omitted...
[user@demo]$ openstack security group list
+---------------+-------------+-------------+---------------+
| ID | Name | Description | Project |
+---------------+-------------+-------------+---------------+
| 1728(...)443f | finance-web | | 1e7d(...)b191 |
...output omitted...
[user@demo]$ openstack security group rule list
+---------------+-------------+-----------+------------+---------------+
| ID | IP Protocol | IP Range | Port Range | Security Group|
+---------------+-------------+-----------+------------+---------------+
| 0049(...)dddc | None | None | | 98d4(...)43e5 |
| 31cf(...)aelb | tcp | 0.0.0.0/0 | 22:22 | 1728(...)443f |
+---------------+-------------+-----------+------------+---------------+


This problem can also occur if the internal network was attached to the router after the
instance was created. In this situation, the instance is not able to contact the metadata service
at boot; therefore, the key is not added to the authorized_keys file for the cloud-user user.

This can be verified by checking the /var/log/cloud-init.log log file on the instance itself.
Alternatively, check the contents of /home/cloud-user/.ssh/authorized_keys. You can
gain access to the instance via Horizon.

[root@host-192-168-0-N ~]# less /var/log/cloud-init.log


...output omitted...
[ 134.170335] cloud-init[475]: 2014-07-01 07:33:22,857 -
url_helper.py[WARNING]: Calling 'http://192.168.0.1//latest/meta-data/instance- id'
failed [0/120s]:
request error [HTTPConnectionPool(host='192.168.0.1', port=80): Max retries
exceeded with url: //latest/meta-data/instance-id (...)
[Errno 113] No route to host)]
...output omitted...
[root@host-192-168-0-N ~]# cat /home/cloud-user/.ssh/authorized_keys
[root@host-192-168-0-N ~]#

In this situation, attach the subnet to the router, and then either restart the instance so that
cloud-init can retrieve the key, or delete and re-create the instance.

[user@demo]$ openstack server delete finance-web1


[user@demo]$ openstack subnet list
+---------------+----------------------------+---------------+-----------------+
| ID | Name | Network | Subnet |
+---------------+----------------------------+---------------+-----------------+
| 72c4(...)cc37 | provider-subnet-172.25.250 | 8b00(...)5285 | 172.25.250.0/24 |
| a520(...)1d9a | finance-subnet1 | f33a(...)42b2 | 192.168.0.0/24 |
+---------------+----------------------------+---------------+-----------------+
[user@demo]$ openstack router add subnet finance-router1 finance-subnet1
[user@demo]$ neutron router-port-list finance-router1 -c fixed_ips
+-------------------------------------------------------------+
| fixed_ips |
+-------------------------------------------------------------+
| {"subnet_id": "dbac(...)673d", "ip_address": "192.168.0.1"} |
+-------------------------------------------------------------+

Problem: A key pair was not assigned to the instance at creation, so SSH key-based login is not
possible. In this scenario, the instance must be deleted and re-created with a key pair assigned
at creation.

[user@demo]$ openstack server delete finance-web1


[user@demo]$ openstack server create \
--flavor m1.web \
--nic net-id=finance-network1 \
--key-name developer1-keypair1 \
--security-group finance-web \
--image finance-rhel7-web finance-web1 --wait
[user@demo]$ openstack server show finance-web1 -f json
...output omitted...
"key_name": "developer1-keypair1",
...output omitted...


Images
The Glance image service stores images and metadata. Images can be created by users and
uploaded to the Image service. The Glance image service has a RESTful API that allows users to
query the metadata of an image, as well as to retrieve the image itself.
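
For example, an image's metadata can be queried from the command line, which uses this API
behind the scenes; the image name used here is only an example.

[user@demo]$ openstack image show rhel7-web -f json
...output omitted...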

Logging
The Image service has two log files. Logging can be configured by altering the [DEFAULT]
section of the /etc/glance/glance-api.conf configuration file. In this file, you can dictate
where and how logs should be stored, which storage method should be used, and its specific
configuration.

You can also configure the Glance image size limit by setting the image_size_cap=SIZE
parameter in the [DEFAULT] section of the file, and specify a storage capacity per user by
setting the user_storage_quota=SIZE parameter in the same section.
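
As a sketch, such settings could look like the following in the [DEFAULT] section; the values are
examples only, and image_size_cap is expressed in bytes.

[DEFAULT]
# Example: cap uploaded images at 10 GiB (value in bytes)
image_size_cap=10737418240
# Example: limit each user to 50 GB of image storage
user_storage_quota=50GB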

Service                                    Service Name                          Log Path
OpenStack Image Service API Server         openstack-glance-api.service          /var/log/glance/api.log
OpenStack Image Service Registry Server    openstack-glance-registry.service     /var/log/glance/registry.log

Managing Images
When creating a new image, a user can choose to protect that image from deletion with the
--protected option. This prevents an image from being deleted even by the administrator. It must
be unprotected first, then deleted.

[user@demo]$ openstack image delete rhel7-web


Failed to delete image with name or ID '21b3b8ba-e28e-4b41-9150-ac5b44f9d8ef':
403 Forbidden
Image 21b3b8ba-e28e-4b41-9150-ac5b44f9d8ef is protected and cannot be deleted.
(HTTP 403)
Failed to delete 1 of 1 images.
[user@demo]$ openstack image set --unprotected rhel7-web
[user@demo]$ openstack image delete rhel7-web

Volumes
Ceph
The OpenStack block storage service can use Ceph as a storage back end. Each volume created
in the block storage service has an associated RBD image in Ceph. The name of the RBD image is
the ID of the block storage volume.
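
One way to confirm this mapping is to compare the volume IDs reported by the block storage
service with the RBD images stored in the volumes pool. The rbd command must be run on a
node that has Ceph client access and a suitable keyring, such as a controller or Ceph node.

[user@demo]$ openstack volume list -c ID
...output omitted...
[user@demo]$ sudo rbd -p volumes ls
...output omitted...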

The OpenStack block storage service requires a user and a pool in Ceph in order to use it as a
back end. The user is openstack, the same user configured for other services that use Ceph as
their back end, like the OpenStack image service. The undercloud also creates a dedicated Ceph
pool for the block storage service, named volumes. The volumes pool contains all the RBD
images associated with volumes. These settings are included in the /etc/cinder/cinder.conf
configuration file.

[user@demo]$ grep rbd_ /etc/cinder/cinder.conf


rbd_pool=volumes
rbd_user=openstack
...output omitted...

Permissions within Ceph are known as capabilities, and are granted by daemon type, such as
MON or OSD. Three capabilities are available within Ceph: read (r) to view, write (w) to modify,
and execute (x) to execute extended object classes. All daemon types support these three
capabilities. For the OSD daemon type, permissions can be restricted to one or more pools, for
example osd 'allow rwx pool=rbd, allow rx pool=mydata'. If no pool is specified, the
permission is granted on all existing pools. The openstack user has capabilities on all the pools
used by OpenStack services.
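
For illustration, the following command would create a new client with read-only access to the
monitors and full access to a single pool; the client.demo name is hypothetical and not part of
an OpenStack deployment.

[user@demo]$ sudo ceph auth get-or-create client.demo mon 'allow r' osd 'allow rwx pool=volumes'
...output omitted...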

The openstack user requires read, write, and execute capabilities in both the volumes and
the images pools to be used by the OpenStack block storage service. The images pool is the
dedicated pool for the OpenStack image service.

[user@demo]$ ceph auth list


installed auth entries:
...output omitted...
client.openstack
key: AQCg7T5ZAAAAABAAI6ZtsCQEuvVNqoyRKzeNcw==
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children,
allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms,
allow rwx pool=images, allow rwx pool=metrics

Attach and Detach Workflow


There are three API calls for each attach and detach operation:

• Status of the volume is updated in the database

• Connection operations on the volume are handled

• Status of the volume is finalized and the resource is released

In order to attach a volume, it must be in the available state. Any other state results in an
error message. A volume can sometimes become stuck in the detaching state. The state can be
reset by an admin user.

[user@demo]$ cinder reset-state --state available volume_id

If you try to delete a volume and it fails, you can forcefully delete that volume using the --force
option.

[user@demo]$ openstack volume delete --force volume_id

Incorrect volume configurations cause the most common block storage errors. Consult the
Cinder block storage service log files in case of error.

Log Files

Service                                      Log Path
OpenStack Block Storage API Server           /var/log/cinder/api.log
OpenStack Block Storage Volume Server        /var/log/cinder/volume.log
OpenStack Block Storage Scheduler            /var/log/cinder/scheduler.log

The Block Storage service api.log is useful in determining whether the error is due to an
endpoint or connectivity error. That is, if you try to create a volume and it fails, then the
api.log is the one you should review. If the create request was received by the Block Storage
service, then you can verify the request in this api.log log file. Assuming the request is logged
in the api.log but there are no errors, check the volume.log for errors that may have
occurred during the create request.
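
A quick, if unrefined, way to check both log files for recorded errors is to search them directly
on the controller node; adjust the pattern to the operation you are investigating.

[user@demo]$ sudo grep -i error /var/log/cinder/api.log
...output omitted...
[user@demo]$ sudo grep -i error /var/log/cinder/volume.log
...output omitted...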

For the Cinder Block Storage services to function properly, they must be configured to use the
RabbitMQ messaging service. All Block Storage configuration can be found in the
/etc/cinder/cinder.conf configuration file, stored on the controller node. The default
rabbit_userid is guest. If that user is wrongly configured and the Block Storage services
are restarted, RabbitMQ will not respond to Block Storage service requests. Any volume created
during that period results in a status of ERROR. Any volume with a status of ERROR must be
deleted and re-created once the Cinder Block Storage service has been restarted and is running
properly.
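
For example, once the messaging configuration has been corrected and the services restarted, a
failed volume could be removed and re-created as follows; the volume ID and name here are
placeholders.

[user@demo]$ openstack volume list -c ID -c Status
...output omitted...
[user@demo]$ openstack volume delete volume_id
[user@demo]$ openstack volume create --size 1 demo-volume1
...output omitted...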

To determine the problem in this scenario, review the /var/log/cinder/scheduler.log log
file on the controller node. If the problem is RabbitMQ, you will see the following:

[user@demo]$ sudo less /var/log/cinder/scheduler.log


...output omitted...
201 (...) Failed to run task
cinder.scheduler.flows.create_volume.ScheduleCreateVolumeTask;volume:create:
No valid host was found. No weighed hosts available

Verify that both the RabbitMQ cluster and the rabbitmq-clone Pacemaker resource
are available. If both resources are available, the problem could be in the cinder.conf
configuration file. Check that all usernames, passwords, IP addresses, and URLs in the
/etc/cinder/cinder.conf configuration file are correct.

[user@demo]$ sudo rabbitmqctl status


Status of node 'rabbit@overcloud-controller-0' ...
...output omitted...
{listeners,[{clustering,25672,"::"},{amqp,5672,"172.24.1.1"}]},
...output omitted...
[user@demo]$ sudo pcs resource show rabbitmq-clone
Clone: rabbitmq-clone
Meta Attrs: interleave=true ordered=true
Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}"
Meta Attrs: notify=true
Operations: monitor interval=10 timeout=40 (rabbitmq-monitor-interval-10)
start interval=0s timeout=200s (rabbitmq-start-interval-0s)
stop interval=0s timeout=200s (rabbitmq-stop-interval-0s)

Troubleshooting OpenStack Networking, Image, and Volume Services


The following steps outline the process for troubleshooting issues in networking, image, and
volume services.

1. Load user credentials.

2. Try to delete a protected image.


3. Unprotect a protected image.

4. Delete the image.

5. Load admin credentials.

6. Verify that a router has an external network configured as a gateway.

7. Log into an OpenStack controller.

8. Verify the Ceph back end configuration for the Cinder volume service.

9. Verify the capabilities configured for the Cinder volume service user in Ceph, as shown in the example after this list.
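
The following example illustrates steps 6 and 9, using the router and Ceph client names seen
earlier in this chapter; adjust the names to your environment, and run the ceph command on a
node with Ceph admin access.

[user@demo]$ openstack router show finance-router1 -c external_gateway_info
...output omitted...
[user@demo]$ sudo ceph auth get client.openstack
...output omitted...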

References
Further information is available in the Logging, Monitoring, and Troubleshooting Guide
for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/


Guided Exercise: Troubleshooting OpenStack Networking, Image, and Volume Services

In this exercise, you will fix an issue related to image requirements. You will also fix an issue with
the accessibility of the metadata service. Finally, you will fix an issue with the Ceph back end for
the OpenStack Block Storage service.

Outcomes
You should be able to:

• Troubleshoot and fix an issue in the OpenStack Image service.

• Troubleshoot and fix an issue in the OpenStack Networking service.

• Troubleshoot and fix an issue in the OpenStack Block Storage service.

Before you begin


Log in to workstation as student using student as the password.

From workstation, run lab troubleshooting-services setup to verify that OpenStack
services are running, and resources created in previous sections are available. This script
also creates the m1.lite flavor, and detaches the finance-subnet1 subnetwork from the
finance-router1 router. Finally, the script will break the Ceph back end configuration for the
OpenStack Block Storage service.

[student@workstation ~]$ lab troubleshooting-services setup

Steps
1. Launch an instance named finance-web1. Use the rhel7 image, the finance-web
security group, the developer1-keypair1 key pair, the m1.lite flavor, and the
finance-network1 network. The instance's deployment will fail because the flavor does
not meet the image's minimal requirements.

1.1. Load the credentials for the developer1 user.

[student@workstation ~]$ source ~/developer1-finance-rc

1.2. Verify that the rhel7 image is available.

[student@workstation ~(developer1-finance)]$ openstack image list


+---------------+-------+--------+
| ID | Name | Status |
+---------------+-------+--------+
| 5864(...)ad03 | rhel7 | active |
+---------------+-------+--------+

1.3. Verify that the finance-web security group is available.

[student@workstation ~(developer1-finance)]$ openstack security group list


+---------------+-------------+------------------------+---------------+
| ID | Name | Description | Project |
+---------------+-------------+------------------------+---------------+
| 0cb6(...)5c7e | finance-web | finance-web | 3f73(...)d660 |
...output omitted...

1.4. Verify that the developer1-keypair1 key pair, and its associated key file located at /
home/student/developer1-keypair1.pem are available.

[student@workstation ~(developer1-finance)]$ openstack keypair list


+---------------------+-----------------+
| Name | Fingerprint |
+---------------------+-----------------+
| developer1-keypair1 | 04:9c(...)cb:1d |
+---------------------+-----------------+
[student@workstation ~(developer1-finance)]$ file ~/developer1-keypair1.pem
/home/student/developer1-keypair1.pem: PEM RSA private key

1.5. Verify that the m1.lite flavor is available.

[student@workstation ~(developer1-finance)]$ openstack flavor list


+---------------+---------+------+------+-----------+-------+-----------+
| ID | Name | RAM | Disk | Ephemeral | VCPUs | Is Public |
+---------------+---------+------+------+-----------+-------+-----------+
| 7998(...)bc36 | m1.lite | 1024 | 5 | 0 | 1 | True |
...output omitted...

1.6. Verify that the finance-network1 network is available.

[student@workstation ~(developer1-finance)]$ openstack network list


+---------------+---------------------+--------------------------------------+
| ID | Name | Subnets |
+---------------+---------------------+--------------------------------------+
| a4c9(...)70ff | finance-network1 | ec0d(...)480b |
...output omitted...
+---------------+---------------------+--------------------------------------+

1.7. Create an instance named finance-web1. Use the rhel7 image, the finance-web
security group, the developer1-keypair1 key pair, the m1.lite flavor, and the
finance-network1 network. The instance's deployment will fail because the flavor
does not meet the image's minimal requirements.

[student@workstation ~(developer1-finance)]$ openstack server create \


--image rhel7 \
--security-group finance-web \
--key-name developer1-keypair1 \
--flavor m1.lite \
--nic net-id=finance-network1 \
finance-web1
Flavor's memory is too small for requested image. (HTTP 400) (...)

2. Verify the rhel7 image requirements for memory and disk, and the m1.lite flavor
specifications.

2.1. Verify the rhel7 image requirements for both memory and disk. The minimum disk
required is 10 GB. The minimum memory required is 2048 MB.

[student@workstation ~(developer1-finance)]$ openstack image show rhel7
+------------------+----------+
| Field | Value |
+------------------+----------+
...output omitted...

| min_disk | 10 |
| min_ram | 2048 |
| name | rhel7 |
...output omitted...

2.2. Verify the m1.lite flavor specifications. The disk and memory specifications for the
m1.lite flavor do not meet the rhel7 image requirements.

[student@workstation ~(developer1-finance)]$ openstack flavor show m1.lite


+-------------+--------------------------------------+
| Field | Value |
+-------------+--------------------------------------+
...output omitted...
| disk | 5 |
| name | m1.lite |
| ram | 1024 |
...output omitted...

3. Verify that the m1.web flavor meets the rhel7 image requirements. Launch an instance
named finance-web1. Use the rhel7 image, the finance-web security group, the
developer1-keypair1 key pair, the m1.web flavor, and the finance-network1 network.
The instance's deployment will be successful.

3.1. Verify that the m1.web flavor meets the rhel7 image requirements.

[student@workstation ~(developer1-finance)]$ openstack flavor show m1.web


+-------------+--------------------------------------+
| Field | Value |
+-------------+--------------------------------------+
...output omitted...
| disk | 10 |
| name | m1.web |
| ram | 2048 |
...output omitted...

3.2. Launch an instance named finance-web1. Use the rhel7 image, the finance-web
security group, the developer1-keypair1 key pair, the m1.web flavor, and the
finance-network1 network.

[student@workstation ~(developer1-finance)]$ openstack server create \


--image rhel7 \
--security-group finance-web \
--key-name developer1-keypair1 \
--flavor m1.web \
--nic net-id=finance-network1 \
--wait \
finance-web1
...output omitted...


3.3. Verify that the finance-web1 instance is ACTIVE.

[student@workstation ~(developer1-finance)]$ openstack server list \


-c Name -c Status
+--------------+--------+
| Name | Status |
+--------------+--------+
| finance-web1 | ACTIVE |
+--------------+--------+

4. Attach an available floating IP to the finance-web1 instance. The floating IP will not be
attached because the external network is not reachable from the internal network.

4.1. Verify which floating IPs are available.

[student@workstation ~(developer1-finance)]$ openstack floating ip list


+----------------+---------------------+------------------+------+
| ID | Floating IP Address | Fixed IP Address | Port |
+----------------+---------------------+------------------+------+
| a49b(...)a7812 | 172.25.250.P | None | None |
+----------------+---------------------+------------------+------+

4.2. Attach the previous floating IP to the finance-web1 instance. The floating IP will not be
attached because the external network is not reachable from the internal network.

[student@workstation ~(developer1-finance)]$ openstack server add \


floating ip finance-web1 172.25.250.P
Unable to associate floating IP 172.25.250.P to fixed IP 192.168.0.N (...)
Error: External network cb3a(...)6a35 is not reachable from subnet
ec0d(...)480b.(...)

5. Fix the previous issue by adding the finance-subnet1 subnetwork to the finance-router1
router.

5.1. Verify that the finance-router1 router is ACTIVE.

[student@workstation ~(developer1-finance)]$ openstack router list \


-c Name -c Status -c State
+-----------------+--------+-------+
| Name | Status | State |
+-----------------+--------+-------+
| finance-router1 | ACTIVE | UP |
+-----------------+--------+-------+

5.2. Verify the current subnetworks added to the finance-router1 router. No output will
display because the subnetwork has not been added.

[student@workstation ~(developer1-finance)]$ neutron router-port-list \


finance-router1

5.3. Add the finance-subnet1 subnetwork to the finance-router1 router.

[student@workstation ~(developer1-finance)]$ openstack router add subnet \
finance-router1 finance-subnet1

5.4. Verify that the finance-subnet1 subnetwork has been correctly added to the
finance-router1 router.

[student@workstation ~(developer1-finance)]$ neutron router-port-list \


finance-router1 -c fixed_ips
+-------------------------------------------------------------+
| fixed_ips |
+-------------------------------------------------------------+
| {"subnet_id": "dbac(...)673d", "ip_address": "192.168.0.1"} |
+-------------------------------------------------------------+

6. Attach the available floating IP to the finance-web1 instance. When done, log in to
the finance-web1 instance as the cloud-user user, using the /home/student/
developer1-keypair1.pem key file. Even though the floating IP address is attached to
the finance-web1 instance, logging in to the instance will fail. This issue will be resolved in
an upcoming step in this exercise.

6.1. Attach the available floating IP to the finance-web1 instance.

[student@workstation ~(developer1-finance)]$ openstack server add floating ip \


finance-web1 172.25.250.P

6.2. Log in to the finance-web1 instance as the cloud-user user, using the /home/
student/developer1-keypair1.pem key file.

[student@workstation ~(developer1-finance)]$ ssh -i ~/developer1-keypair1.pem \


cloud-user@172.25.250.P
Warning: Permanently added '172.25.250.P' (ECDSA) to the list of known hosts.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

7. Verify that the instance is not able to contact the metadata service at boot time. The
metadata service is not reachable because the finance-subnet1 was not connected to
the finance-router1 router when the finance-web1 instance was created. This is the
root cause for the previous issue because the key is not added to the authorized_keys
for the cloud-user user.

7.1. Obtain the console URL for the finance-web1 instance.

[student@workstation ~(developer1-finance)]$ openstack console url show \


finance-web1
+-------+-------------------------------------------------------------+
| Field | Value |
+-------+-------------------------------------------------------------+
| type | novnc |
| url | http://172.25.250.50:6080/vnc_auto.html?token=c93c(...)d896 |
+-------+-------------------------------------------------------------+

7.2. Open Firefox, and navigate to the finance-web1 instance's console URL.


7.3. Log in to the finance-web1 instance's console as the root user, using redhat as a
password.

7.4. Verify that the authorized_keys file for the cloud-user is empty. No key has been
injected by cloud-init during the instance's boot process.

[root@host-192-168-0-N ~]# cat /home/cloud-user/.ssh/authorized_keys

7.5. Verify in the cloud-init log file, located at /var/log/cloud-init.log, that the
finance-web1 instance cannot reach the metadata service during its boot process.

[root@host-192-168-0-N ~]# less /var/log/cloud-init.log


...output omitted...
[ 134.170335] cloud-init[475]: 2014-07-01 07:33:22,857 -
url_helper.py[WARNING]: Calling 'http://192.168.0.1//latest/meta-data/instance-
id' failed [0/120s]:
request error [HTTPConnectionPool(host='192.168.0.1', port=80): Max retries
exceeded with url: //latest/meta-data/instance-id (...)
[Errno 113] No route to host)]
...output omitted...

8. On workstation, stop and then start the finance-web1 instance to allow cloud-init to
recover. The metadata service is now reachable because the finance-subnet1 subnetwork is
connected to the finance-router1 router.

8.1. Stop the finance-web1 instance.

[student@workstation ~(developer1-finance)]$ openstack server stop \


finance-web1

8.2. Verify that the finance-web1 instance is in the SHUTOFF state.

[student@workstation ~(developer1-finance)]$ openstack server show \


finance-web1 -c status -f value
SHUTOFF

8.3. Start the finance-web1 instance.

[student@workstation ~(developer1-finance)]$ openstack server start \


finance-web1

8.4. Log in to the finance-web1 instance as the cloud-user user, using the /home/
student/developer1-keypair1.pem key file.

[student@workstation ~(developer1-finance)]$ ssh -i ~/developer1-keypair1.pem \


cloud-user@172.25.250.P
Warning: Permanently added '172.25.250.P' (ECDSA) to the list of known hosts.

8.5. Verify that the authorized_keys file for the cloud-user user has had a key injected
into it. When done, log out from the instance.

[cloud-user@finance-web1 ~]$ cat .ssh/authorized_keys
ssh-rsa AAAA(...)JDGZ Generated-by-Nova
[cloud-user@finance-web1 ~]$ exit

9. On workstation, create a 1 GB volume named finance-volume1. The volume creation will
fail.

9.1. On workstation, create a 1 GB volume, named finance-volume1.

[student@workstation ~(developer1-finance)]$ openstack volume create \


--size 1 finance-volume1
...output omitted...

9.2. Verify the status of the volume finance-volume1. The volume's status will be error.

[student@workstation ~(developer1-finance)]$ openstack volume list


+---------------+-----------------+--------+------+-------------+
| ID | Display Name | Status | Size | Attached to |
+---------------+-----------------+--------+------+-------------+
| b375(...)0008 | finance-volume1 | error | 1 | |
+---------------+-----------------+--------+------+-------------+

10. Confirm the reason that the finance-volume1 volume was not correctly created. It is
because no valid host was found by the Block Storage scheduler service.

10.1. Log in to controller0 as heat-admin.

[student@workstation ~(developer1-finance)]$ ssh heat-admin@controller0

10.2. Verify that the Block Storage scheduler log file, located at
/var/log/cinder/scheduler.log, reports a no valid host issue.

[heat-admin@overcloud-controller-0 ~]$ sudo less /var/log/cinder/scheduler.log


...output omitted...
(...) in rbd.RBD.create (rbd.c:3227)\n', u'PermissionError: error creating image
\n']
(...) No valid host was found. (...)

11. Verify that the Block Storage volume service's status is up, to rule out any issue related to
RabbitMQ.

11.1. Load admin credentials.

[heat-admin@overcloud-controller-0 ~]$ source overcloudrc

11.2. Verify that the Block Storage volume service's status is up.

[heat-admin@overcloud-controller-0 ~]$ openstack volume service list \


-c Binary -c Status -c State
+------------------+---------+-------+
| Binary | Status | State |
+------------------+---------+-------+
| cinder-volume | enabled | up |
...output omitted...
+------------------+---------+-------+

12. Verify that the Block Storage service is configured to use the openstack user and
the volumes pool. When done, verify that the volume creation error is related to the
permissions of the openstack user in Ceph. This user needs read, write, and execute
capabilities on the volumes pool.

12.1. Verify that the block storage service is configured to use the openstack user, and the
volumes pool.

[heat-admin@overcloud-controller-0 ~]$ sudo grep "rbd_" \


/etc/cinder/cinder.conf
...output omitted...
rbd_pool=volumes
rbd_user=openstack
...output omitted...

12.2. Log in to ceph0 as heat-admin.

[heat-admin@overcloud-controller-0 ~]$ exit


[student@workstation ~(developer1-finance)]$ ssh heat-admin@ceph0

12.3. Verify that the volumes pool is available.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd lspools


0 rbd,1 metrics,2 images,3 backups,4 volumes,5 vms,

12.4. Verify that the openstack user has no capabilities on the volumes pool.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph auth get client.openstack


exported keyring for client.openstack
[client.openstack]
key = AQCg7T5ZAAAAABAAI6ZtsCQEuvVNqoyRKzeNcw==
caps mon = "allow r"
caps osd = "allow class-read object_prefix rbd_children, allow rwx
pool=backups,
allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics"

13. Fix the issue by adding read, write, and execute capabilities to the openstack user on the
volumes pool.

13.1. Add the read, write, and execute capabilities to the openstack user on the volumes
pool. The ceph auth caps command replaces the existing capabilities rather than adding
to them, so you must retype the entire list.

Important
Please note that the line starting with osd must be entered as a single line.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph auth caps \
client.openstack \
mon 'allow r' \
osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes,
allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx
pool=metrics'
updated caps for client.openstack

13.2. Verify that the openstack user's capabilities have been correctly updated. When done,
log out from the Ceph node.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph auth get client.openstack


exported keyring for client.openstack
[client.openstack]
key = AQCg7T5ZAAAAABAAI6ZtsCQEuvVNqoyRKzeNcw==
caps mon = "allow r"
caps osd = "allow class-read object_prefix rbd_children, allow rwx
pool=volumes,
allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx
pool=metrics"
[heat-admin@overcloud-cephstorage-0 ~]$ logout

14. On workstation, try again to create a 1 GB volume named finance-volume1. The volume
creation will be successful. You first need to delete the failed finance-volume1 volume.

14.1. Delete the failed finance-volume1 volume.

[student@workstation ~(developer1-finance)]$ openstack volume delete \


finance-volume1

14.2. Create a 1 GB volume named finance-volume1.

[student@workstation ~(developer1-finance)]$ openstack volume create \


--size 1 finance-volume1

14.3. Verify that the finance-volume1 volume has been correctly created. The volume
status should show available; if the status is error, ensure that the permissions were set
correctly in the previous step.

[student@workstation ~(developer1-finance)]$ openstack volume list


+---------------+-----------------+-----------+------+-------------+
| ID | Display Name | Status | Size | Attached to |
+---------------+-----------------+-----------+------+-------------+
| e454(...)ddc8 | finance-volume1 | available | 1 | |
+---------------+-----------------+-----------+------+-------------+

Cleanup
From workstation, run the lab troubleshooting-services cleanup script to clean up
this exercise.


[student@workstation ~]$ lab troubleshooting-services cleanup


Lab: Troubleshooting OpenStack

In this lab, you will find and fix issues in the OpenStack environment. You will solve problems in
the areas of authentication, networking, compute nodes, and security. Finally, you will launch an
instance and ensure that everything is working as it should.

Outcomes
You should be able to:

• Troubleshoot authentication issues within OpenStack

• Search log files to help describe the nature of the problem

• Troubleshoot messaging issues within OpenStack

• Troubleshoot networking issues within OpenStack

Before you begin


Log in to workstation as student using student as the password.

From workstation, run lab troubleshooting-review setup, which verifies that
OpenStack services are running and the resources required for the lab are available. This script
also breaks the nova configuration, authentication, and networking. This script downloads the
QCOW2 file that you need to create images, and creates the rc files (admin-rc and
operator1-production-rc) that you will need during this lab.

[student@workstation ~]$ lab troubleshooting-review setup

Steps
1. As the operator1 user, remove the existing image called production-rhel7. The
operator1-production-rc file can be found in student's home directory on
workstation. Troubleshoot any problems.

2. Source the admin-rc credential file, then run lab troubleshooting-review break to
set up the next part of the lab exercise.

[student@workstation ~]$ source ~/admin-rc


[student@workstation ~(admin-admin)]$ lab troubleshooting-review break

3. Re-source the /home/student/operator1-production-rc and attempt to list the
images. It should fail. Troubleshoot any issues and fix the problem.

4. Create a new server instance named production-web1. Use the m1.web flavor, the
operator1-keypair1 key pair, the production-network1 network, the production-web
security group, and the rhel7 image. This action will fail. Troubleshoot any issues and
fix the problem.

5. Create a floating IP address and assign it to the instance. Troubleshoot any issues and fix the
problem.


6. Access the instance using SSH. An error will occur. Troubleshoot any issues and fix the
problem.

7. Create a volume named production-volume1, size 1 GB. Verify the volume status.
Use the admin user's Identity service rc file on controller0 at /home/heat-admin/
overcloudrc. Troubleshoot any issues and fix the problem.

Evaluation
On workstation, run the lab troubleshooting-review grade command to confirm
success of this exercise.

[student@workstation ~]$ lab troubleshooting-review grade

Cleanup
From workstation, run the lab troubleshooting-review cleanup script to clean up this
exercise.

[student@workstation ~]$ lab troubleshooting-review cleanup


Solution
In this lab, you will find and fix issues in the OpenStack environment. You will solve problems in
the areas of authentication, networking, compute nodes, and security. Finally, you will launch an
instance and ensure that everything is working as it should.

Outcomes
You should be able to:

• Troubleshoot authentication issues within OpenStack

• Search log files to help describe the nature of the problem

• Troubleshoot messaging issues within OpenStack

• Troubleshoot networking issues within OpenStack

Before you begin


Log in to workstation as student using student as the password.

From workstation, run lab troubleshooting-review setup, which verifies that
OpenStack services are running and the resources required for the lab are available. This script
also breaks the nova configuration, authentication, and networking. This script downloads the
QCOW2 file that you need to create images, and creates the rc files (admin-rc and
operator1-production-rc) that you will need during this lab.

[student@workstation ~]$ lab troubleshooting-review setup

Steps
1. As the operator1 user, remove the existing image called production-rhel7. The
operator1-production-rc file can be found in student's home directory on
workstation. Troubleshoot any problems.

1.1. Source the /home/student/operator1-production-rc file.

[student@workstation ~]$ source ~/operator1-production-rc

1.2. Delete the existing image.

[student@workstation ~(operator1-production)]$ openstack image delete \


production-rhel7
Failed to delete image with name or ID '21b3b8ba-e28e-4b41-9150-ac5b44f9d8ef':
403 Forbidden
Image 21b3b8ba-e28e-4b41-9150-ac5b44f9d8ef is protected and cannot be deleted.
(HTTP 403)
Failed to delete 1 of 1 images.

1.3. The error you see is because the image is currently protected. You need to unprotect
the image before it can be deleted.

[student@workstation ~(operator1-production)]$ openstack image set \


--unprotected production-rhel7
[student@workstation ~(operator1-production)]$ openstack image delete \


production-rhel7

2. Source the admin-rc credential file, then run lab troubleshooting-review break to
set up the next part of the lab exercise.

[student@workstation ~]$ source ~/admin-rc


[student@workstation ~(admin-admin)]$ lab troubleshooting-review break

3. Re-source the /home/student/operator1-production-rc and attempt to list the
images. It should fail. Troubleshoot any issues and fix the problem.

3.1. [student@workstation ~(admin-admin)]$ source ~/operator1-production-rc


[student@workstation ~(operator1-production)]$ openstack image list
Discovering versions from the identity service failed when creating
the password plugin. Attempting to determine version from URL. Unable
to establish connection to http://172.25.251.50:5000/v2.0/tokens:
HTTPConnectionPool(host='172.25.251.50', port=5000): Max retries exceeded
with url: /v2.0/tokens.................: Failed to establish a new connection:
[Errno 110] Connection timed out',)

3.2. The error occurs because OpenStack cannot authenticate the operator1 user. This can
happen when the rc file for the user has a bad IP address. Check the rc file and note
the OS_AUTH_URL address. Compare this IP address to the one that can be found
in /etc/haproxy/haproxy.cfg on controller0. Search for the line: listen
keystone_public. The second IP address is the one that must be used in the user's rc
file. When done, log out from the controller node.

[student@workstation ~(operator1-production)]$ ssh heat-admin@controller0 \


cat /etc/haproxy/haproxy.cfg
...output omitted...
listen keystone_public
bind 172.24.1.50:5000 transparent
bind 172.25.250.50:5000 transparent
...output omitted...

3.3. Compare the IP address from HAproxy and the rc file. You need to change it to the
correct IP address to continue.

...output omitted...
export OS_AUTH_URL=http://172.25.251.50:5000/v2.0
...output omitted...

3.4. Edit the file and correct the IP address.

...output omitted...
export OS_AUTH_URL=http://172.25.250.50:5000/v2.0
...output omitted...

3.5. Source the operator1-production-rc again. Use the openstack image list
command to ensure that the OS_AUTH_URL option is correct.

[student@workstation ~(operator1-production)]$ source ~/operator1-production-rc


[student@workstation ~(operator1-production)]$ openstack image list


+--------------------------------------+-------+--------+
| ID | Name | Status |
+--------------------------------------+-------+--------+
| 21b3b8ba-e28e-4b41-9150-ac5b44f9d8ef | rhel7 | active |
+--------------------------------------+-------+--------+

4. Create a new server instance named production-web1. Use the m1.web flavor, the
operator1-keypair1 key pair, the production-network1 network, the production-web
security group, and the rhel7 image. This action will fail. Troubleshoot any issues and
fix the problem.

4.1. Create a new server instance.

[student@workstation ~(operator1-production)]$ openstack server create \


--flavor m1.web \
--key-name operator1-keypair1 \
--nic net-id=production-network1 \
--security-group production-web \
--image rhel7 --wait production-web1
Error creating server: production-web1
Error creating server

4.2. This error is due to a problem with the nova compute service. List the Nova services.
You need to source the /home/student/admin-rc first, as operator1 does not have
permission to interact directly with nova services.

[student@workstation ~(operator1-production)]$ source ~/admin-rc


[student@workstation ~(admin-admin)]$ nova service-list
+----+-----------------+-----------------------------------+----------+------+
| ID | Binary | Host | Status | State|
+----+-----------------+-----------------------------------+----------+------+
| 3 | nova-consoleauth| overcloud-controller-0.localdomain| enabled | up |
| 4 | nova-scheduler | overcloud-controller-0.localdomain| enabled | up |
| 5 | nova-conductor | overcloud-controller-0.localdomain| enabled | up |
| 7 | nova-compute | overcloud-compute-0.localdomain | disabled | down |
+----+-----------------+-----------------------------------+----------+------+

4.3. Enable the nova-compute service.

[student@workstation ~(admin-admin)]$ nova service-enable \


overcloud-compute-0.localdomain \
nova-compute
+---------------------------------+--------------+---------+
| Host | Binary | Status |
+---------------------------------+--------------+---------+
| overcloud-compute-0.localdomain | nova-compute | enabled |
+---------------------------------+--------------+---------+

4.4. Source the operator1 rc file and try to create the instance again. First, delete the
instance that is currently showing an error status. The instance deployment will finish
correctly.

[student@workstation ~(admin-admin)]$ source ~/operator1-production-rc


[student@workstation ~(operator1-production)]$ openstack server delete \


production-web1
[student@workstation ~(operator1-production)]$ openstack server list

[student@workstation ~(operator1-production)]$ openstack server create \


--flavor m1.web \
--nic net-id=production-network1 \
--key-name operator1-keypair1 \
--security-group production-web \
--image rhel7 --wait production-web1

5. Create a floating IP address and assign it to the instance. Troubleshoot any issues and fix the
problem.

5.1. Create the floating IP

[student@workstation ~(operator1-production)]$ openstack floating ip create \


provider-172.25.250
+--------------------+---------------------+------------------+------+
| ID | Floating IP Address | Fixed IP Address | Port |
+--------------------+---------------------+------------------+------+
| ce31(...)9ecb | 172.25.250.N | None | None |
+--------------------+---------------------+------------------+------+
[student@workstation ~(operator1-production)]$ openstack server add \
floating ip production-web1 172.25.250.N
Unable to associate floating IP 172.25.250.N to fixed IP 192.168.0.6
for instance a53e66d9-6413-4ae4-b95b-2012dd52f908. Error: External
network 7aaf57c1-3c34-45df-94d3-dbc12754b22e is not reachable from subnet
cfc7ddfa-4403-41a7-878f-e8679596eafd. Therefore, cannot associate Port
dcb6692d-0094-42ec-bc8e-a52fd97d7a4c with a Floating IP.
Neutron server returns request_ids: ['req-4f88fb24-7628-4155-a921-ff628cb4b371']
(HTTP 400) (Request-ID: req-d6862c58-66c4-44b6-a4d1-bf26514bf04b)

This error message occurs because the router for the internal network does not have the
external network set as its gateway.

5.2. Set the external network as the gateway for the production-router1 router.

[student@workstation ~(operator1-production)]$ neutron router-gateway-set \


production-router1 provider-172.25.250

5.3. Attach the floating IP address to the instance. Verify that the instance has been
assigned the floating IP address.

[student@workstation ~(operator1-production)]$ openstack server add \


floating ip production-web1 172.25.250.N
[student@workstation ~(operator1-production)]$ openstack server list \
-c Name -c Networks
+-----------------+-----------------------------------------------+
| Name | Networks |
+-----------------+-----------------------------------------------+
| production-web1 | production-network1=192.168.0.P, 172.25.250.N |
+-----------------+-----------------------------------------------+

6. Access the instance using SSH. An error will occur. Troubleshoot any issues and fix the
problem.


6.1. Attempt to access the instance using SSH.

[student@workstation ~(operator1-production)]$ ssh -i ~/operator1-keypair1.pem \


cloud-user@172.25.250.N
ssh: connect to host 172.25.250.N port 22: Connection timed out

6.2. Find out which security group the instance is using, then list the rules in that security
group.

[student@workstation ~(operator1-production)]$ openstack server show \


production-web1 -f json
...output omitted...
"security_groups": [
{
"name": "production-web"
}
],
...output omitted...
[student@workstation ~(operator1-production)]$ openstack security group rule \
list production-web
+---------------+-------------+----------+------------+-----------------------+
| ID | IP Protocol | IP Range | Port Range | Remote Security Group |
+---------------+-------------+----------+------------+-----------------------+
| cc92(...)95b1 | None | None | | None |
| eb84(...)c6e7 | None | None | | None |
+---------------+-------------+----------+------------+-----------------------+

6.3. There is no rule allowing SSH access to the instance. Create the security group
rule.

[student@workstation ~(operator1-production)]$ openstack security group rule \


create --protocol tcp --dst-port 22:22 production-web
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| created_at | 2017-06-12T07:24:34Z |
| description | |
| direction | ingress |
| ethertype | IPv4 |
| headers | |
| id | 06070264-1427-4679-bd8e-e3a8f2e189e9 |
| port_range_max | 22 |
| port_range_min | 22 |
| project_id | 9913a8abd192443c96587a8dc1c0a364 |
| project_id | 9913a8abd192443c96587a8dc1c0a364 |
| protocol | tcp |
| remote_group_id | None |
| remote_ip_prefix | 0.0.0.0/0 |
| revision_number | 1 |
| security_group_id | ac9ae6e6-0056-4501-afea-f83087b8297f |
| updated_at | 2017-06-12T07:24:34Z |
+-------------------+--------------------------------------+

6.4. Now try to access the instance again.

[student@workstation ~(operator1-production)]$ ssh -i ~/operator1-keypair1.pem \
cloud-user@172.25.250.N

Warning: Permanently added '172.25.250.N' (ECDSA) to the list of known hosts.


[cloud-user@production-web1 ~]$

6.5. Log out from production-web1.

[cloud-user@production-web1 ~]$ exit


Connection to 172.25.250.N closed.

7. Create a volume named production-volume1, size 1 GB. Verify the volume status.
Use the admin user's Identity service rc file on controller0 at /home/heat-admin/
overcloudrc. Troubleshoot any issues and fix the problem.

7.1. Create the volume.

[student@workstation ~(operator1-production)]$ openstack volume create \
--size 1 production-volume1
...output omitted...

7.2. Check the status of production-volume1.

[student@workstation ~(operator1-production)]$ openstack volume list
+---------------+--------------------+--------+------+-------------+
| ID | Display Name | Status | Size | Attached to |
+---------------+--------------------+--------+------+-------------+
| 0da8(...)be3f | production-volume1 | error | 1 | |
+---------------+--------------------+--------+------+-------------+

7.3. The volume displays an error status. The Block Storage scheduler service is unable to
find a valid host on which to create the volume. The Block Storage volume service is
currently down. Log into controller0 as heat-admin.

[student@workstation ~(operator1-production)]$ ssh heat-admin@controller0

7.4. In the Block Storage scheduler's log file, verify that no valid host was found on
which to create production-volume1.

[heat-admin@overcloud-controller-0 ~]$ sudo less /var/log/cinder/scheduler.log
...output omitted...
201 (...) Failed to run task
cinder.scheduler.flows.create_volume.ScheduleCreateVolumeTask;volume:create:
No valid host was found. No weighed hosts available

7.5. Load the admin credentials and verify that the Cinder volume service is down. The
admin credentials can be found in /home/heat-admin/overcloudrc.

[heat-admin@overcloud-controller-0 ~]$ source ~/overcloudrc
[heat-admin@overcloud-controller-0 ~]$ openstack volume service list \
-c Binary -c Host -c Status -c State
+------------------+------------------------+---------+-------+
| Binary | Host | Status | State |
+------------------+------------------------+---------+-------+
| cinder-scheduler | hostgroup | enabled | up |
| cinder-volume | hostgroup@tripleo_ceph | enabled | down |
+------------------+------------------------+---------+-------+

7.6. Confirm that the IP address and port for the RabbitMQ cluster and the rabbitmq-
clone Pacemaker resource are correct.

[heat-admin@overcloud-controller-0 ~]$ sudo rabbitmqctl status
Status of node 'rabbit@overcloud-controller-0' ...
...output omitted...
{listeners,[{clustering,25672,"::"},{amqp,5672,"172.24.1.1"}]},
...output omitted...
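This step also mentions the rabbitmq-clone Pacemaker resource. One way to check it directly is shown below; this is a supplementary sketch, not part of the graded solution, and the resource listing shown is only illustrative of a single-controller overcloud.

[heat-admin@overcloud-controller-0 ~]$ sudo pcs status | grep -A 1 rabbitmq-clone
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 ]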

7.7. Verify the Cinder configuration file.

The value of rabbit_userid is wrong. In the following output, you can see that the
default is guest, but it is currently set to change_me.

[heat-admin@overcloud-controller-0 ~]$ sudo cat /etc/cinder/cinder.conf \
| grep rabbit_userid
#rabbit_userid = guest
rabbit_userid = change_me

7.8. Using crudini, change the RabbitMQ user name in the Cinder configuration file. Then
restart the openstack-cinder-volume resource in the Pacemaker cluster to apply the
changes, and log out.

[heat-admin@overcloud-controller-0 ~]$ sudo crudini --set \
/etc/cinder/cinder.conf \
oslo_messaging_rabbit rabbit_userid guest
[heat-admin@overcloud-controller-0 ~]$ sudo pcs resource restart \
openstack-cinder-volume
[heat-admin@overcloud-controller-0 ~]$ exit

7.9. On workstation, delete the incorrect volume and recreate it. Verify it has been
properly created.

[student@workstation ~(operator1-production)]$ openstack volume delete \
production-volume1
[student@workstation ~(operator1-production)]$ openstack volume create \
--size 1 production-volume1
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2017-06-14T08:08:01.726844 |
| description | None |
| encrypted | False |
| id | 128a9514-f8bd-4162-9f7e-72036f684cba |
| multiattach | False |
| name | production-volume1 |
| properties | |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| type | None |
| updated_at | None |
| user_id | 0ac575bb96e24950a9551ac4cda082a4 |
+---------------------+--------------------------------------+
[student@workstation ~(operator1-production)]$ openstack volume list
+--------------------------------------+--------------------+-----------+------+
| ID | Display Name | Status | Size |
+--------------------------------------+--------------------+-----------+------+
| 128a9514-f8bd-4162-9f7e-72036f684cba | production-volume1 | available | 1 |
+--------------------------------------+--------------------+-----------+------+

Evaluation
On workstation, run the lab troubleshooting-review grade command to confirm
success of this exercise.

[student@workstation ~]$ lab troubleshooting-review grade

Cleanup
From workstation, run the lab troubleshooting-review cleanup script to clean up this
exercise.

[student@workstation ~]$ lab troubleshooting-review cleanup

Summary
In this chapter, you learned:

• The overcloud uses the HAProxy service to balance traffic to OpenStack services.

• The OpenStack compute service is composed of different components running on both the
controller and the compute nodes. These components include the Compute scheduler and the
Nova compute services.

• The Compute scheduler component selects a compute node to deploy an instance based on an
algorithm. By default, this algorithm is filter-based.

• The Compute component orchestrates the instance deployment and sends the compute node
status to the Compute scheduler component. The no valid host error means that the Compute
scheduler has not identified a compute node that can provide the resources required by the
instance.

• The keystone_admin and the keystone_public services in HAProxy support the three
endpoints for the Keystone identity service: public, admin, and internal.

• Issues in OpenStack services are usually related either to failing communication caused by a
nonfunctioning messaging service, or to a misconfiguration or issue in the storage back end,
such as Ceph.

• The RabbitMQ service is managed by a Pacemaker cluster running on the controller node.

• To access an instance using a floating IP, the external network associated with that floating IP
and the internal network to which the instance is connected must be connected by a router.

• If an image is set as protected, it cannot be removed.

• The OpenStack Block Storage service requires that the openstack user has read, write, and
execute capabilities on both the volumes and images pools in Ceph.

CHAPTER 8

MONITORING CLOUD METRICS FOR AUTOSCALING

Overview
Goal        Monitor and analyze cloud metrics for use in orchestration autoscaling.
Objectives  • Describe the architecture of Ceilometer, Aodh, Gnocchi, Panko, and agent plugins.
            • Analyze OpenStack metrics for use in autoscaling.
Sections    • Describing OpenStack Telemetry Architecture (and Quiz)
            • Analyzing Cloud Metrics for Autoscaling (and Guided Exercise)
Lab         • Monitoring Cloud Metrics for Autoscaling


Describing OpenStack Telemetry Architecture

Objective
After completing this section, students should be able to describe the architecture of Ceilometer,
Aodh, Gnocchi, Panko, and agent plugins.

Telemetry Architecture and Services


In Red Hat OpenStack Platform, the Telemetry service provides user-level usage data for
OpenStack components. This data is used for system monitoring, alerts, and generating
customer usage billing. The Telemetry service collects data using polling agents and notification
agents. The polling agents poll the OpenStack infrastructure resources, such as the hypervisor,
to publish the meters on the notification bus. The notification agent listens to the notifications
on the OpenStack notification bus and converts them into meter events and samples. Most
OpenStack resources are able to send such events using the notification system built into
oslo.messaging. The normalized data collected by the Telemetry service is then published to
various targets.

The sample data collected by various agents is stored in the database by the OpenStack
Telemetry collector service. The Telemetry collector service uses a pluggable storage system and
various databases, such as MongoDB. The Telemetry API service allows authenticated users to
run query requests against this data store. A query request returns a list of resources and
statistics based on the various metrics collected.

With this architecture, the Telemetry API encountered scalability issues with an increase in query
requests to read the metric data from the data store. Each query request requires the data
store to do a full scan of all sample data stored in the database. A new metering service named
Gnocchi was introduced to decouple the storing of metric data from the Telemetry service
to increase efficiency. Similarly, alerts that were once handled by the Telemetry service were
handed over to a new alarming service named Aodh. The Panko service now stores all the events
generated by the Telemetry service. By decoupling these services from Telemetry, the scalability
of the Telemetry service is greatly enhanced.


Figure 8.1: Telemetry Service Architecture

The Telemetry Service (Ceilometer)


The Telemetry service collects data by using two built-in plugins:

• Notification agents: This is the preferred method for collecting data. An agent monitors the
message bus for data sent by different OpenStack services such as Compute, Image, Block
Storage, Orchestration, Identity, etc. Messages are then processed by various plugins to
convert them into events and samples.

• Polling agents: These agents poll services to collect data. Polling agents are configured either
to get information from the hypervisor or to use a remote API, such as IPMI, to gather the
power state of a compute node. This method is less preferred because this approach
increases the load on the Telemetry service API endpoint.

Data gathered by the notification and polling agents is processed by various transformers to
generate data samples. For example, to get a CPU utilization percentage, multiple CPU utilization
samples collected over a period can be aggregated. The processed data samples are published
to Gnocchi for long-term storage, or to an external system, using a publisher.

Polling Agent Plugins


The Telemetry service uses a polling agent to gather information about the infrastructure that
is not published by events and notifications from OpenStack components. The polling agents
use the APIs exposed by the different OpenStack services and other hardware assets such as
compute nodes. The Telemetry service uses agent plugins to support this polling mechanism. The
three default agent plugins used for polling are:

• Compute agent: This agent gathers resource data about all instances running on different
compute nodes. The compute agent is installed on every compute node to facilitate interaction
with the local hypervisor. Sample data collected by a compute agent is sent to the message
bus. The sample data is processed by the notification agent and published to different
publishers.

• Central agent: These agents use the REST APIs of various OpenStack services to gather
additional information that was not sent as a notification. A central agent polls networking,
object storage, block storage, and hardware resources using SNMP. The sample data collected
is sent to the message bus to be processed by the notification agent.

• IPMI agent: This agent uses the ipmitool utility to gather IPMI sensor data. An IPMI-capable
host requires that an IPMI agent is installed. The sample data gathered is used for providing
metrics associated with the physical hardware.

Gnocchi
Gnocchi is based on a time series database used to store metrics and resources published by the
Telemetry service. A time series database is optimized for handling data that contains arrays
of numbers indexed by time stamp. The Gnocchi service provides a REST API to create or edit
metric data. The gnocchi-metricd service computes statistics, in real time, on received data.
This computed data is stored and indexed for fast retrieval.

Gnocchi supports various back ends for storing the metric data and indexed data. Currently
supported storage drivers for storing metric data include file, Ceph, Swift, S3, and Redis. The
default storage driver is file. An overcloud deployment uses the ceph storage driver as the
storage for the metric data. Gnocchi can use a PostgreSQL or a MySQL database to store indexed
data and any associated metadata. The default storage driver for indexed data is PostgreSQL.
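To check which drivers a particular deployment uses, you can inspect the Gnocchi configuration on the controller node. The following is a hedged sketch: the /etc/gnocchi/gnocchi.conf path and the [storage] driver option name are assumptions about a default overcloud deployment, and the value shown is only illustrative (an overcloud typically keeps metric data in Ceph, with indexed data in the overcloud database configured under the [indexer] section of the same file).

[heat-admin@overcloud-controller-0 ~]$ sudo grep '^driver' /etc/gnocchi/gnocchi.conf
driver = ceph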


Figure 8.2: Gnocchi Service Architecture

The Telemetry service uses the Gnocchi API service to publish data samples to Gnocchi for
processing and storage. Received data samples are stored in temporary measure storage. The
gnocchi-metricd service reads the measures from the measure storage. The gnocchi-
metricd service then computes the measures based on the archive policy and the aggregation
methods defined for the meter. The computed statistics are then stored for long term in the
metric storage.

To retrieve the metric data, a client, such as the Telemetry alarming service, uses the Gnocchi
API service to read the metric measures from the metric storage, and the metric metadata
stored in the index storage.

Aodh
Aodh provides the alarming services within the Telemetry architecture. For example, you might
want to trigger an alarm when CPU utilization of an instance reaches 70% for more than 10
minutes. To create an Aodh alarm, an alarm action and conditions need to be defined.

An alarm rule is used to define when the alarm is to be triggered. The alarm rule can be based
on an event or on a computed statistic. The definition of an action to be taken when the alarm is
triggered supports multiple forms:

• An HTTP callback URL, invoked when the alarm is triggered.

• A log file to log the event information.

• A notification sent using the messaging bus.

Panko
Panko provides the service to store events collected by the Telemetry service from various
OpenStack components. The Panko service allows storing event data in long term storage, to be
used for auditing and system debugging.

Telemetry Use Cases and Best Practices


The Telemetry service provides metric data to support billing systems for OpenStack cloud
resources. The Telemetry service gathers information about the system and stores it to provide
data required for billing purposes. This data can be fed into cloud management software, such as
Red Hat CloudForms, to provide itemized billing and a charge-back to the cloud users.

Best Practices for using Telemetry


The Telemetry service collects all data based on OpenStack components. A cloud administrator
may not require all of the data gathered by the Telemetry service. Reducing the amount of data
sent to the underlying storage increases performance, because fewer CPU cycles are spent on
transformation. To decrease the data being collected by the Telemetry service, an OpenStack
administrator can edit /etc/ceilometer/pipeline.yaml to include only the relevant meters.
The Telemetry service polls the service APIs every 10 minutes by default. Increasing the polling
interval reduces how often metric data is sent to storage, which may improve performance.

Editing the /etc/ceilometer/pipeline.yaml file is covered in further detail later in this
section.

Best Practices for using Gnocchi


Gnocchi aggregates the data dynamically when it receives the data from the Telemetry service.
Gnocchi does not store the data as is, but aggregates it over a given period. An archive policy
defines the time span and the level of precision that is kept when aggregating data. The time
span defines how long the time series archive will be retained in the metric storage. The level of
precision represents the granularity to be used when performing the aggregation. For example,
if an archive policy defines 20 points with a granularity of 1 second, then the archive keeps up
to 20 seconds of data, each point representing an aggregation over 1 second (see the quick
calculation after the following table). Three archive policies are defined by default: low, medium,
and high. The archive policy to use depends on your use case: you can either use one of the
default policies or define your own archive policy.

Gnocchi Default Archive Policies


Policy name Archive policy definition
low • Stores metric data with 5 minutes granularity over 30 days.
medium • Stores metric data with one minute granularity over 7 days.

• Stores metric data with one hour granularity over 365 days.
high • Stores metric data with one second granularity over one hour.

• Stores metric data with one minute granularity over 7 days.

• Stores metric data with one hour granularity over 365 days.
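The number of points kept by an archive policy definition is simply its time span divided by its granularity. The quick check mentioned above verifies this relationship against two tiers of the low policy as printed by the openstack metric archive-policy list command later in this chapter (12 points at 5-minute granularity over 1 hour, and 30 points at 1-day granularity over 30 days); the shell arithmetic below is only illustrative.

[user@demo ~]$ # points = timespan / granularity
[user@demo ~]$ echo $(( 3600 / 300 ))            # 1 hour of data at 5-minute granularity
12
[user@demo ~]$ echo $(( (30 * 86400) / 86400 ))  # 30 days of data at 1-day granularity
30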

The gnocchi-metricd daemon is used to compute the statistics of gathered data samples. If
the processing load increases, the gnocchi-metricd daemon can be scaled out to any number
of servers.

Configuration Files and Logs


The Telemetry service uses various configuration files located under the /etc/ceilometer
directory. These files include:

Telemetry Configuration Files


File Name Description
ceilometer.conf Configures Telemetry services and agents.
event_definitions.yaml Defines how events received from other OpenStack
components translate to Telemetry events.
pipeline.yaml Defines the pipeline for the Telemetry service to transform
and publish data. This file can be modified to adjust polling
intervals and number of samples generated by the Telemetry
module.
meters.yaml Defines meters. New meters can be added by updating this file.
gnocchi-resources.yaml Defines the mapping between Telemetry samples and Gnocchi
resources and metrics.
event_pipeline.yaml Defines which notification event types are captured and where
the events are published.
policy.json Defines access control policies for the Telemetry service.

The /etc/ceilometer/ceilometer.conf file defines the dispatcher for the processing of
metering data with the meter_dispatchers variable. Gnocchi is used as the default meter
dispatcher in the overcloud environment. The output below shows the dispatcher configured to
use Gnocchi for processing the metering data (which is the default).

# Dispatchers to process metering data. (multi valued)
# Deprecated group/name - [DEFAULT]/dispatcher
#meter_dispatchers = database
meter_dispatchers=gnocchi

Pipelines are defined in the /etc/ceilometer/pipeline.yaml file. The processing
of sample data is handled by notification agents. The source of data is events or samples
gathered by the notification agents from the notification bus. Pipelines describe a coupling
between sources of data and corresponding sinks for the transformation and publication of data.
The sinks section defined in the /etc/ceilometer/pipeline.yaml file provides the
logic for sample data transformation and how the processed data is published. In the pipeline
configuration below, the cpu meter, collected at a 600-second interval, is subjected to two
transformations named cpu_sink and cpu_delta_sink. The rate_of_change transformer
generates the cpu_util meter from the sample values of the cpu counter, which represents
cumulative CPU time in nanoseconds, scaled using the scale parameter.

---
sources:
    - name: cpu_source
      interval: 600
      meters:
          - "cpu"
      sinks:
          - cpu_sink
          - cpu_delta_sink
sinks:
    - name: cpu_sink
      transformers:
          - name: "rate_of_change"
            parameters:
                target:
                    name: "cpu_util"
                    unit: "%"
                    type: "gauge"
                    scale: "100.0 / (10**9 * (resource_metadata.cpu_number or 1))"
      publishers:
          - notifier://
    - name: cpu_delta_sink
      transformers:
          - name: "delta"
            parameters:
                target:
                    name: "cpu.delta"
                growth_only: True
      publishers:
          - notifier://

The processed data is published, over the messaging bus, to the persistent storage of several
consumers. The publishers section in pipeline.yaml defines the destination for published
data. The Telemetry service supports three types of publishers:

• gnocchi: stores the metric data in Gnocchi time series database.

• panko: stores the event data in the Panko data store.

• notifier: sends the data to the AMQP messaging bus.

Troubleshooting the Telemetry Service


To troubleshoot the Telemetry service, an administrator must analyze the following Telemetry
service log files, found in /var/log/ceilometer/:

Telemetry Log Files


File Name Description
agent-notification.log Logs the information generated by the notification agent.
central.log Logs the information generated by the central agent.
collector.log Logs the information generated by the collector service.

References
Gnocchi Project Architecture
http://gnocchi.xyz/architecture.html

Telemetry service
https://docs.openstack.org/newton/config-reference/telemetry.html
Telemetry service overview
https://docs.openstack.org/mitaka/install-guide-rdo/common/
get_started_telemetry.html

Ceilometer architecture
https://docs.openstack.org/ceilometer/latest/admin/telemetry-system-
architecture.html


Quiz: Describing OpenStack Telemetry Architecture

Choose the correct answer(s) to the following questions:

1. Which service is responsible for storing metering data gathered by the Telemetry service?

a. Panko
b. Oslo
c. Aodh
d. Ceilometer
e. Gnocchi

2. What two data collection mechanisms are leveraged by the Telemetry service? (Choose two.)

a. Polling agent
b. Publisher agent
c. Push agent
d. Notification agent

3. Which configuration file contains the meter definitions for the Telemetry service?

a. /etc/ceilometer/ceilometer.conf
b. /etc/ceilometer/meters.conf
c. /etc/ceilometer/definitions.yaml
d. /etc/ceilometer/meters.yaml
e. /etc/ceilometer/resources.yaml

4. What three publisher types are supported by the Telemetry service? (Choose three.)

a. Panko
b. Aodh
c. Notifier
d. Gnocchi

5. What two default archive policies are defined in the Gnocchi service? (Choose two.)

a. low
b. coarse
c. medium
d. sparse
e. moderate


Solution
Choose the correct answer(s) to the following questions:

1. Which service is responsible for storing metering data gathered by the Telemetry service?

a. Panko
b. Oslo
c. Aodh
d. Ceilometer
e. Gnocchi (correct)

2. What two data collection mechanisms are leveraged by the Telemetry service? (Choose two.)

a. Polling agent (correct)
b. Publisher agent
c. Push agent
d. Notification agent (correct)

3. Which configuration file contains the meter definitions for the Telemetry service?

a. /etc/ceilometer/ceilometer.conf
b. /etc/ceilometer/meters.conf
c. /etc/ceilometer/definitions.yaml
d. /etc/ceilometer/meters.yaml (correct)
e. /etc/ceilometer/resources.yaml

4. What three publisher types are supported by the Telemetry service? (Choose three.)

a. Panko (correct)
b. Aodh
c. Notifier (correct)
d. Gnocchi (correct)

5. What two default archive policies are defined in the Gnocchi service? (Choose two.)

a. low (correct)
b. coarse
c. medium (correct)
d. sparse
e. moderate


Analyzing Cloud Metrics for Autoscaling

Objective
After completing this section, students should be able to analyze OpenStack metrics for use in
autoscaling.

Retrieve and Analyze OpenStack Metrics


The Telemetry service stores the metrics associated with various OpenStack services persistently
using the Time Series Database (Gnocchi) service. An authenticated user is allowed to send a
request to a Time Series Database service API endpoint to read the measures stored in the data
store.

Time Series Database Resources


Resources are objects that represent cloud components, such as an instance, volume, image,
load balancer VIP, host, IPMI sensor, and so on. The measures stored in the Gnocchi Time Series
Database service are indexed based on the resource and its attributes.

Time Series Database Measure


A measure in the Time Series Database service is the data gathered for a resource at a given
time. The Time Series Database service stores the measure, which is a lightweight component;
each measure includes a number, a time stamp, and a value.

[user@demo ~]$ openstack metric measures show \
--resource-id a509ba1e-91df-405c-b966-c41b722dfd8d \
cpu_util
+---------------------------+-------------+----------------+
| timestamp | granularity | value |
+---------------------------+-------------+----------------+
| 2017-06-14T00:00:00+00:00 | 86400.0 | 0.542669194306 |
| 2017-06-14T15:00:00+00:00 | 3600.0 | 0.542669194306 |
| 2017-06-14T15:40:00+00:00 | 300.0 | 0.542669194306 |
+---------------------------+-------------+----------------+

Time Series Database Metrics


The Time Series Database service provides an entity called metric that stores the aspect of the
resource in the data store. For example, if the resource is an instance, the aspect is the CPU
utilization, which is stored as a metric. Each metric has several attributes: a metric ID, a name, an
archive policy that defines storage lifespan, and different aggregates of the measures.

[user@demo ~]$ openstack metric metric show \
--resource-id 6bd6e073-4e97-4a48-92e4-d37cb365cddb \
image.serve
+------------------------------------+------------------------------------------------+
| Field | Value |
+------------------------------------+------------------------------------------------+
| archive_policy/aggregation_methods | std, count, 95pct, min, max, sum, median, mean |
| archive_policy/back_window | 0 |
| archive_policy/definition | - points: 12, granularity: 0:05:00, timespan: |
| | 1:00:00 |
| | - points: 24, granularity: 1:00:00, timespan: |
| | 1 day, 0:00:00 |
| | - points: 30, granularity: 1 day, 0:00:00, |
| | timespan: 30 days, 0:00:00 |
| archive_policy/name | low |
...output omitted...
| id | 2a8329f3-8378-49f6-aa58-d1c5d37d9b62 |
| name | image.serve |
...output omitted...

Time Series Database Archive Policy


The archive policy defined by an OpenStack administrator specifies the data storage policy in
the Time Series Database service. For example, an administrator can define a policy that stores
one day of data with one-second granularity, one year of data with one-hour granularity, or both.
Time Series Database service are used to aggregate the measures based on granularity specified
in the policy. The aggregated data is stored in the database according to the archive policies.
Archive policies are defined on a per-metric basis and are used to determine the lifespan of
stored aggregated data.

Using the OpenStack CLI to Analyze Metrics


The command-line tool provided by the python-gnocchiclient package helps retrieve and analyze
the metrics stored in the Time Series Database service. The openstack metric command is
used to retrieve and analyze the Telemetry metrics.

To retrieve all the resources and the respective resource IDs, use the openstack metric
resource list command.

[user@demo ~]$ openstack metric resource list -c type -c id
+--------------------------------------+----------------------------+
| id | type |
+--------------------------------------+----------------------------+
| 4464b986-4bd8-48a2-a014-835506692317 | image |
| 05a6a936-4a4c-5d1b-b355-2fd6e2e47647 | instance_disk |
| cef757c0-6137-5905-9edc-ce9c4d2b9003 | instance_network_interface |
| 6776f92f-0706-54d8-94a1-2dd8d2397825 | instance_disk |
| dbf53681-540f-5ee1-9b00-c06bb53cbd62 | instance_disk |
| cebc8e2f-3c8f-45a1-8f71-6f03f017c623 | swift_account |
| a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 | instance |
+--------------------------------------+----------------------------+

The Time Series Database service allows you to create custom resource types to enable the
use of elements that are part of your architecture but are not tied to any OpenStack resources.
For example, when using a hardware load balancer in the architecture, a custom resource type
can be created. These custom resource types use all the features provided by the Time Series
Database service, such as searching through the resources, associating metrics, and so on. To
create a custom resource type, use the openstack metric resource-type create command. The
--attribute option is used to specify various attributes that are associated with the resource
type. These attributes are used to search for resources associated with a resource type.

[user@demo ~]$ openstack metric resource-type create \
--attribute display_name:string:true:max_length=255 \
mycustomresource
+-------------------------+----------------------------------------------------------+
| Field | Value |
+-------------------------+----------------------------------------------------------+
| attributes/display_name | max_length=255, min_length=0, required=True, type=string |
| name | mycustomresource |
| state | active |
+-------------------------+----------------------------------------------------------+

To list the metrics associated with a resource, use the openstack metric resource show
command. The resource ID is retrieved using the openstack metric resource list --
type command, which filters based on resource type.

[user@demo ~]$ openstack metric resource list --type image -c type -c id
+--------------------------------------+----------------------------+
| id | type |
+--------------------------------------+----------------------------+
| 4464b986-4bd8-48a2-a014-835506692317 | image |
+--------------------------------------+----------------------------+
[user@demo ~]$ openstack metric resource show 4464b986-4bd8-48a2-a014-835506692317
+-----------------------+------------------------------------------------------+
| Field | Value |
+-----------------------+------------------------------------------------------+
| created_by_project_id | d42393f674a9488abe11bd0ef6d18a18 |
| created_by_user_id | 7521059a98cc4d579eea897027027575 |
| ended_at | None |
| id | 4464b986-4bd8-48a2-a014-835506692317 |
| metrics | image.download: 7b52afb7-3b25-4722-8028-3d3cc3041316 |
| | image.serve: 2e0027b9-bc99-425f-931a-a3afad313cb3 |
| | image.size: ff4b7310-e6f9-4871-98b8-fff2006fb897 |
| | image: b0feed69-078b-4ab7-9f58-d18b293c110e |
| original_resource_id | 4464b986-4bd8-48a2-a014-835506692317 |
| project_id | fd0ce487ea074bc0ace047accb3163da |
| revision_end | None |
| revision_start | 2017-05-16T03:48:57.218470+00:00 |
| started_at | 2017-05-16T03:48:57.218458+00:00 |
| type | image |
| user_id | None |
+-----------------------+------------------------------------------------------+

New metrics can be added to a resource by an administrator using the openstack metric
resource update command. The --add-metric option can be used to add any existing
metric. The --create-metric option is used to create and then add a metric. The --create-
metric option requires the metric name and the archive policy to be attached to the metric.

To add a new metric named custommetric with the low archive policy to an image resource,
use the command as shown. The resource ID in this example is the ID that was shown previously.

[user@demo ~]$ openstack metric resource update \
--type image \
--create-metric custommetric:low \
4464b986-4bd8-48a2-a014-835506692317
+-----------------------+------------------------------------------------------+
| Field | Value |
+-----------------------+------------------------------------------------------+
| container_format | bare |
| created_by_project_id | df179bcea2e540e398f20400bc654cec |
| created_by_user_id | b74410917d314f22b0301c55c0edd39e |
| disk_format | qcow2 |
| ended_at | None |
| id | 6bd6e073-4e97-4a48-92e4-d37cb365cddb |
| metrics | custommetric: ff016814-9047-4ee7-9719-839c9b79e837 |
| | image.download: fc82d8eb-2f04-4a84-8bc7-fe35130d28eb |
| | image.serve: 2a8329f3-8378-49f6-aa58-d1c5d37d9b62 |
| | image.size: 9b065b52-acf0-4906-bcc6-b9604efdb5e5 |
| | image: 09883163-6783-4106-96ba-de15201e72f9 |
| name | finance-rhel7 |
| original_resource_id | 6bd6e073-4e97-4a48-92e4-d37cb365cddb |
| project_id | cebc8e2f3c8f45a18f716f03f017c623 |
| revision_end | None |
| revision_start | 2017-05-23T04:06:12.958634+00:00 |
| started_at | 2017-05-23T04:06:12.958618+00:00 |
| type | image |
| user_id | None |
+-----------------------+------------------------------------------------------+

All the metrics provided by the Telemetry service can be listed by an OpenStack administrator
using the openstack metric metric list command.

[user@demo]$ openstack metric metric list -c name -c unit -c archive_policy/name
+---------------------+---------------------------------+-----------+
| archive_policy/name | name | unit |
+---------------------+---------------------------------+-----------+
| low | disk.iops | None |
| low | disk.root.size | GB |
| low | subnet.create | None |
| low | storage.objects.outgoing.bytes | None |
| low | disk.allocation | B |
| low | network.update | None |
| low | disk.latency | None |
| low | disk.read.bytes | B |
...output omitted...

The openstack metric metric show command shows the metric details. The resource ID of
a resource is retrieved using the openstack metric resource list command.

To list the detailed information of the image.serve metric for an image with the
6bd6e073-4e97-4a48-92e4-d37cb365cddb resource ID, run the following command:

[user@demo ~]$ openstack metric metric show \
--resource-id 6bd6e073-4e97-4a48-92e4-d37cb365cddb \
image.serve
+------------------------------------+------------------------------------------------+
| Field | Value |
+------------------------------------+------------------------------------------------+
| archive_policy/aggregation_methods | std, count, 95pct, min, max, sum, median, mean |
| archive_policy/back_window | 0 |
| archive_policy/definition | - points: 12, granularity: 0:05:00, timespan: |
| | 1:00:00 |
| | - points: 24, granularity: 1:00:00, timespan: |
| | 1 day, 0:00:00 |
| | - points: 30, granularity: 1 day, 0:00:00, |
| | timespan: 30 days, 0:00:00 |
| archive_policy/name | low |
| created_by_project_id | df179bcea2e540e398f20400bc654cec |
| created_by_user_id | b74410917d314f22b0301c55c0edd39e |
| id | 2a8329f3-8378-49f6-aa58-d1c5d37d9b62 |
| name | image.serve |
| resource/created_by_project_id | df179bcea2e540e398f20400bc654cec |
| resource/created_by_user_id | b74410917d314f22b0301c55c0edd39e |
| resource/ended_at | None |
| resource/id | 6bd6e073-4e97-4a48-92e4-d37cb365cddb |
| resource/original_resource_id | 6bd6e073-4e97-4a48-92e4-d37cb365cddb |
| resource/project_id | cebc8e2f3c8f45a18f716f03f017c623 |
| resource/revision_end | None |
| resource/revision_start | 2017-05-23T04:06:12.958634+00:00 |
| resource/started_at | 2017-05-23T04:06:12.958618+00:00 |
| resource/type | image |
| resource/user_id | None |
| unit | None |
+------------------------------------+------------------------------------------------+

The openstack metric archive-policy list command lists the archive policies.

[user@demo ~]$ openstack metric archive-policy list -c name -c definition
+--------+---------------------------------------+
| name | definition |
+--------+---------------------------------------+
| high | - points: 3600, granularity: 0:00:01, |
| | timespan: 1:00:00 |
| | - points: 10080, granularity: |
| | 0:01:00, timespan: 7 days, 0:00:00 |
| | - points: 8760, granularity: 1:00:00, |
| | timespan: 365 days, 0:00:00 |
| low | - points: 12, granularity: 0:05:00, |
| | timespan: 1:00:00 |
| | - points: 24, granularity: 1:00:00, |
| | timespan: 1 day, 0:00:00 |
| | - points: 30, granularity: 1 day, |
| | 0:00:00, timespan: 30 days, 0:00:00 |
| medium | - points: 1440, granularity: 0:01:00, |
| | timespan: 1 day, 0:00:00 |
| | - points: 168, granularity: 1:00:00, |
| | timespan: 7 days, 0:00:00 |
| | - points: 365, granularity: 1 day, |
| | 0:00:00, timespan: 365 days, 0:00:00 |
+--------+---------------------------------------+
[user@demo ~]$ openstack metric archive-policy list -c name -c aggregation_methods
+--------+---------------------------------------+
| name | aggregation_methods |
+--------+---------------------------------------+
| high | std, count, 95pct, min, max, sum, |
| | median, mean |
| low | std, count, 95pct, min, max, sum, |
| | median, mean |
| medium | std, count, 95pct, min, max, sum, |
| | median, mean |
+--------+---------------------------------------+

A Telemetry service administrator can add measures to the data store using the openstack
metric measures add command. To view measures, use the openstack metric measures
show command. Both commands require the metric name and resource ID as parameters.

The Time Series Database service uses the ISO 8601 time stamp format for output. In ISO
8601 notation, the date, time, and time zone are represented in the following format:
yyyymmddThhmmss+|-hhmm. The date -u "+%FT%T.%6N" command converts the current
date and time into the ISO 8601 time stamp format.

Measures are added using the yyyymmddThhmmss+|-hhmm@value format. Multiple measures
can be added using the openstack metric measures add --measure command, as shown in
the sketch below.
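For example, two measures can be recorded for the cpu_util metric of the resource used in the examples below by repeating the --measure option. This is a sketch rather than captured output: it assumes the --measure option can be given multiple times, and the time stamps and values are made up.

[user@demo ~]$ openstack metric measures add \
--resource-id a509ba1e-91df-405c-b966-c41b722dfd8d \
--measure 20170614T154000+0000@23 \
--measure 20170614T155000+0000@42 \
cpu_util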

The resource ID of a resource is retrieved using the openstack metric resource list
command. To list the metrics associated with a resource, use the openstack metric
resource show command. The default aggregation method used by the openstack metric
measures show command is mean.


Important
For removing measures, administrator privileges are required.

The final entry in the following output shows the average CPU utilization for a resource.

[user@demo ~]$ openstack metric measures add \
--resource-id a509ba1e-91df-405c-b966-c41b722dfd8d \
--measure $(date -u "+%FT%T.%6N")@23 \
cpu_util
[user@demo ~]$ openstack metric measures show \
--resource-id a509ba1e-91df-405c-b966-c41b722dfd8d \
--refresh cpu_util
+---------------------------+-------------+----------------+
| timestamp | granularity | value |
+---------------------------+-------------+----------------+
| 2017-04-27T00:00:00+00:00 | 86400.0 | 11.0085787895 |
| 2017-04-27T11:00:00+00:00 | 3600.0 | 0.312042039086 |
| 2017-04-27T12:00:00+00:00 | 3600.0 | 16.3568471647 |
| 2017-04-27T11:45:00+00:00 | 300.0 | 0.374260637142 |
| 2017-04-27T11:55:00+00:00 | 300.0 | 0.24982344103 |
| 2017-04-27T12:05:00+00:00 | 300.0 | 0.263997402134 |
| 2017-04-27T12:15:00+00:00 | 300.0 | 0.163391256752 |
| 2017-04-27T12:20:00+00:00 | 300.0 | 32.5 |
+---------------------------+-------------+----------------+

Note
For querying and adding measures, a few other time stamp formats are supported. For
example: 50 minutes, indicating 50 minutes from now, and - 50 minutes, indicating
50 minutes ago. Time stamps based on the UNIX epoch are also supported.

Use aggregation methods such as min, max, mean, sum, etc., to display the measures based on
the granularity.

The following command shows how to list measures with a particular aggregation method.
The command uses the resource ID associated with an instance to display the minimum CPU
utilization at different granularities. The --refresh option is used to include all new measures.
The final entry of the following screen capture shows the minimum CPU utilization for the
resource.

[user@demo ~]$ openstack metric measures show \
--resource-id a509ba1e-91df-405c-b966-c41b722dfd8d \
--aggregation min \
--refresh cpu_util
+---------------------------+-------------+----------------+
| timestamp | granularity | value |
+---------------------------+-------------+----------------+
| 2017-04-27T00:00:00+00:00 | 86400.0 | 0.163391256752 |
| 2017-04-27T11:00:00+00:00 | 3600.0 | 0.24982344103 |
| 2017-04-27T12:00:00+00:00 | 3600.0 | 0.163391256752 |
| 2017-04-27T11:45:00+00:00 | 300.0 | 0.374260637142 |
| 2017-04-27T11:55:00+00:00 | 300.0 | 0.24982344103 |
| 2017-04-27T12:05:00+00:00 | 300.0 | 0.263997402134 |
| 2017-04-27T12:15:00+00:00 | 300.0 | 0.163391256752 |
| 2017-04-27T12:20:00+00:00 | 300.0 | 23.0 |
+---------------------------+-------------+----------------+

Querying the Telemetry Metrics


The telemetry metrics stored in the database can be queried based on several conditions. For
example, you can specify a time range to look for measures based on the aggregation method.
Operators like equal to (eq), less than or equal to (le), greater than or equal to (ge), less than
(lt), greater than (gt), and various not operators can be used in the query. The operators or,
and, and not are also supported.

The --query option uses attributes associated with a resource type. The following command
displays the mean CPU utilization for all provisioned instances that use the flavor with an ID of 1,
or that use the image with an ID of 6bd6e073-4e97-4a48-92e4-d37cb365cddb.

[user@demo ~]$ openstack metric resource-type show instance
+-------------------------+-----------------------------------------------------------+
| Field | Value |
+-------------------------+-----------------------------------------------------------+
| attributes/display_name | max_length=255, min_length=0, required=True, type=string |
| attributes/flavor_id | max_length=255, min_length=0, required=True, type=string |
| attributes/host | max_length=255, min_length=0, required=True, type=string |
| attributes/image_ref | max_length=255, min_length=0, required=False, type=string |
| attributes/server_group | max_length=255, min_length=0, required=False, type=string |
| name | instance |
| state | active |
+-------------------------+-----------------------------------------------------------+
[user@demo ~]$ openstack metric measures \
aggregation \
--metric cpu_util \
--aggregation mean \
--resource-type instance \
--query '(flavor_id='1')or(image_ref='6bd6e073-4e97-4a48-92e4-d37cb365cddb')' \
--refresh
+---------------------------+-------------+----------------+
| timestamp | granularity | value |
+---------------------------+-------------+----------------+
| 2017-06-14T00:00:00+00:00 | 86400.0 | 0.575362745414 |
...output omitted...

Use the --start and --stop options in the openstack metric measures aggregation
command to provide the time range for computing aggregation statistics. For example, the
server_group attribute of the instance resource type can be used with the --query option
to group a specific set of instances, which can then be monitored for autoscaling. It is also
possible to search for values in the metrics using one or more levels of granularity. Use the
--granularity option to make queries based on granularity.

Common Telemetry Metrics


The Telemetry service collects different meters by polling the infrastructure components or by
consuming notifications provided by various OpenStack services. There are three types of metric
data provided by the Telemetry service (a short sketch after the list shows how a cumulative
meter is converted into a gauge):

• Cumulative: A cumulative meter provides measures that are accumulated over time. For
example, total CPU time used.

• Gauge: A gauge meter records the current value at the time that a reading is recorded. For
example, number of images.

• Delta: A delta meter records the change between values recorded over a particular time
period. For example, network bandwidth.
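To make the distinction concrete, the cpu_util gauge used throughout this chapter is derived from the cumulative cpu meter by the rate_of_change transformer shown earlier in the pipeline configuration. The following back-of-the-envelope check reproduces that conversion by hand; the two cpu samples (15 and 45 billion nanoseconds, 600 seconds apart, on a single-vCPU instance) are made-up values, not real deployment data.

[user@demo ~]$ # cpu_util = (delta cpu / delta t) * 100 / (10^9 * cpu_number)
[user@demo ~]$ echo 'scale=2; ((45000000000 - 15000000000) / 600) * 100 / (1000000000 * 1)' | bc
5.00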

Some of the meters collected for instances are:

Compute Service Meters


Meter Name Meter Type Description
memory gauge The amount of memory in MB allocated to the
instance.
memory.usage gauge Amount of memory in MB consumed by the
instance.
cpu cumulative Total CPU time used in nanoseconds (ns).
cpu_util gauge Average CPU utilization in percentage.
disk.read.requests cumulative Total number of read requests.
disk.read.requests.rate gauge Average rate of read requests per second.

The meters collected for the images are:

Image Service Meters


Meter Name Meter Type Description
image gauge The size of the image uploaded.
image.download delta The number of bytes downloaded for an image.
image.serve delta The number of bytes served out for an image.

Backing Up and Restoring Telemetry Data


To recover from the loss of Telemetry data, the database associated with the metering data
needs to have been backed up. Both the indexed and metric storage databases associated with
the Time Series Database service can be backed up using the native database tools. The indexed
data stored in PostgreSQL or MySQL can be backed up using the database dump utilities.
Similarly, if the metering data is stored on Ceph, Swift, or the file system, then snapshots must
be regularly taken. The procedure for restoring both data stores is to restore the data backup
using the native database utilities. The Time Series Database services should be restarted
after restoring the databases. The procedure to back up and restore is beyond the scope of this
course.

Creating and Managing Telemetry Alarms


Aodh is the alarming service in the Telemetry service architecture. Aodh allows OpenStack users
to create alarms based on events and metrics provided by OpenStack services. When creating
an alarm based on metrics, the alarm can be set for a single meter or a combination of many
meters. For example, an alarm can be configured to be triggered when the memory consumption
of the instance breaches 70%, and the CPU utilization is more than 80%. In the case of an event
alarm, the change in state of an OpenStack resource triggers the alarm. For example, updating
an image property would trigger an event alarm for the image.

The alarm action defines the action that needs to be taken when an alarm is triggered. In Aodh,
the alarm notifier notifies the activation of an alarm by using one of three methods: triggering
the HTTP callback URL, writing to a log file, or sending notifications to the messaging bus.


You can create a threshold alarm that activates when the aggregated statistics of a metric
breaches the threshold value. In the following example, an alarm is created to trigger when the
average CPU utilization metric of the instance exceeds 80%. The alarm action specified adds an
entry to the log. A query is used so that the alarm monitors the CPU utilization of a particular
instance with an instance ID of 5757edba-6850-47fc-a8d4-c18026e686fb.

[usr@demo ~]$ openstack alarm create \
--type gnocchi_aggregation_by_resources_threshold \
--name high_cpu_util \
--description 'GnocchiAggregationByResourceThreshold' \
--metric cpu_util \
--aggregation-method mean \
--comparison-operator 'ge' \
--threshold 80 \
--evaluation-periods 2 \
--granularity 300 \
--alarm-action 'log://' \
--resource-type instance \
--query '{"=": {"id": "5757edba-6850-47fc-a8d4-c18026e686fb"}}'
+---------------------------+-------------------------------------------------------+
| Field | Value |
+---------------------------+-------------------------------------------------------+
| aggregation_method | mean |
| alarm_actions | [u'log://'] |
| alarm_id | 1292add6-ac57-4ae1-bd49-6147b68d8879 |
| comparison_operator | ge |
| description | GnocchiAggregationByResourceThreshold |
| enabled | True |
| evaluation_periods | 2 |
| granularity | 300 |
| insufficient_data_actions | [] |
| metric | cpu_util |
| name | high_cpu_util |
| ok_actions | [] |
| project_id | fd0ce487ea074bc0ace047accb3163da |
| query | {"=": {"id": "5757edba-6850-47fc-a8d4-c18026e686fb"}} |
| repeat_actions | False |
| resource_type | instance |
| severity | low |
| state | insufficient data |
| state_timestamp | 2017-05-19T06:46:19.235846 |
| threshold | 80.0 |
| time_constraints | [] |
| timestamp | 2017-05-19T06:46:19.235846 |
| type | gnocchi_aggregation_by_resources_threshold |
| user_id | 15ceac73d7bb4437a34ee26670571612 |
+---------------------------+-------------------------------------------------------+

To get the alarm state, use the openstack alarm state get command. The alarm history
can be viewed using the openstack alarm-history show command, which shows the alarm
state transitions and the related time stamps.

[usr@demo ~]$ openstack alarm state get 1292add6-ac57-4ae1-bd49-6147b68d8879
+-------+-------+
| Field | Value |
+-------+-------+
| state | alarm |
+-------+-------+
[usr@demo ~]$ openstack alarm-history show 1292add6-ac57-4ae1-bd49-6147b68d8879 \
-c timestamp -c type -c details -f json
[
{
"timestamp": "2017-06-08T16:28:54.002079",
"type": "state transition",
"detail": "{\"transition_reason\": \"Transition to ok due to 2 samples
inside threshold, most recent: 0.687750180591\", \"state\": \"ok\"}"
},
{
"timestamp": "2017-06-08T15:25:53.525213",
"type": "state transition",
"detail": "{\"transition_reason\": \"2 datapoints are unknown\", \"state\":
\"insufficient data\"}"
},
{
"timestamp": "2017-06-08T14:05:53.477088",
"type": "state transition",
"detail": "{\"transition_reason\": \"Transition to alarm due to 2 samples
outside threshold, most recent: 70.0\", \"state\": \"alarm\"}"
},
...output omitted...

Telemetry Metrics in Autoscaling


When using autoscaling to scale instances in and out, the alerts provided by Aodh alarms trigger
the execution of the autoscaling policy. An alarm watches a single metric published by the
metering service and sends a message to the autoscaling policy when the metric crosses the
threshold value. The alarm can monitor any metric provided by the Telemetry metering service.
The most common metrics for autoscaling an instance are cpu_util, memory.usage,
disk.read.requests.rate, and disk.write.requests.rate. However, custom metrics
can also be used to trigger autoscaling.
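As a concrete illustration of this wiring, the following alarm reuses the gnocchi_aggregation_by_resources_threshold options from the previous example, but points --alarm-action at the signal URL of an existing Heat scaling policy instead of log://. This is a sketch under assumptions: SCALEUP_URL is a hypothetical shell variable holding that webhook URL, and demo-asg is a made-up server group name.

[user@demo ~]$ openstack alarm create \
--type gnocchi_aggregation_by_resources_threshold \
--name scaleout_on_cpu \
--metric cpu_util \
--aggregation-method mean \
--comparison-operator gt \
--threshold 80 \
--evaluation-periods 2 \
--granularity 300 \
--resource-type instance \
--query '{"=": {"server_group": "demo-asg"}}' \
--alarm-action "$SCALEUP_URL"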

Monitoring Cloud Resources With the Telemetry Service
The following steps outline the process for monitoring cloud resources using the Telemetry
service; a condensed command sketch follows the list.

1. Use the openstack metric resource list command to find the resource ID and the
desired resource.

2. Use the openstack metric resource show command with the resource ID found in the
previous step to view the available meters for the resource. Make note of the metric ID.

3. Use the openstack metric metric show command with the metric ID found in the
previous step to view the details of the desired meter.

4. Create an alarm based on the desired meter using the openstack alarm create
command. Use the --alarm-action option to define the action to be taken after the
alarm is triggered.

5. Verify the alarm state using the openstack alarm state get command.

6. List the alarm history using the openstack alarm-history command to check the
alarm state transition time stamps.
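The following condensed sketch strings these steps together using commands already shown in this section. The angle-bracket values are placeholders for your own resource and alarm identifiers, and the alarm options are abbreviated; see the full openstack alarm create example earlier for the complete set of flags.

[user@demo ~]$ openstack metric resource list -c id -c type          # step 1: find the resource ID
[user@demo ~]$ openstack metric resource show <resource-id>          # step 2: list its metrics
[user@demo ~]$ openstack metric metric show \
--resource-id <resource-id> cpu_util                                 # step 3: inspect the meter
[user@demo ~]$ openstack alarm create \
--type gnocchi_aggregation_by_resources_threshold \
--name <alarm-name> --metric cpu_util --threshold 80 \
--aggregation-method mean --resource-type instance \
--query '{"=": {"id": "<resource-id>"}}' \
--alarm-action 'log://'                                              # step 4: create the alarm
[user@demo ~]$ openstack alarm state get <alarm-id>                  # step 5: check the alarm state
[user@demo ~]$ openstack alarm-history show <alarm-id>               # step 6: review state transitions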


References
Further information is available in the Monitoring Using the Telemetry Service
chapter of the Logging, Monitoring, and Troubleshooting Guide for Red Hat
OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/


Guided Exercise: Analyzing Cloud Metrics for Autoscaling

In this exercise, you will view and analyze common metrics required for autoscaling.

Outcomes
You should be able to:

• List the available metrics associated with a resource.

• Analyze metrics to view the aggregated values.

Before you begin


Log in to workstation as the student user, using student as the password.

On workstation, run the lab monitoring-analyzing-metrics setup command. This
script will ensure the OpenStack services are running and the environment is properly configured
for this exercise. The script also creates an instance named finance-web1.

[student@workstation ~]$ lab monitoring-analyzing-metrics setup

Steps
1. From workstation, connect to the controller0 node. Open the /etc/ceilometer/
ceilometer.conf file and check which meter dispatcher is configured for the Telemetry service.
On workstation, run the ceilometer command (which should produce an error) to verify
that the Gnocchi telemetry service is used instead of Ceilometer.

1.1. Use SSH to connect to controller0 as the user heat-admin.

[student@workstation ~]$ ssh heat-admin@controller0
[heat-admin@overcloud-controller-0 ~]$

1.2. Open the /etc/ceilometer/ceilometer.conf file and search for the
meter_dispatchers variable. The meter dispatcher is set to gnocchi, which stores
the metering data.

[heat-admin@overcloud-controller-0 ~]$ sudo grep meter_dispatchers \
/etc/ceilometer/ceilometer.conf
#meter_dispatchers = database
meter_dispatchers=gnocchi

1.3. Log out of the controller0 node.

[heat-admin@overcloud-controller-0 ~]$ exit
[student@workstation ~]$

1.4. From workstation, source the /home/student/developer1-finance-rc file.
Verify that the ceilometer command returns an error because Gnocchi is set as the
meter dispatcher.


[student@workstation ~]$ source ~/developer1-finance-rc
[student@workstation ~(developer1-finance)]$ ceilometer --debug meter-list
...output omitted...
DEBUG (client) RESP BODY: {"error_message": "410 Gone\n\nThis resource is no
longer available.
No forwarding address is given.
\n\n This telemetry installation is configured to use Gnocchi.
Please use the Gnocchi API available on the metric endpoint to retrieve data.
"}
...output omitted...

2. List the resource types available in the Telemetry metering service. Use the resource ID of
the instance resource type to list all the meters available.

2.1. List the resource types available.

[student@workstation ~(developer1-finance)]$ openstack metric resource-type \


list -c name
+----------------------------+
| name |
+----------------------------+
| ceph_account |
| generic |
| host |
| host_disk |
| host_network_interface |
| identity |
| image |
| instance |
| instance_disk |
| instance_network_interface |
| ipmi |
| network |
| stack |
| swift_account |
| volume |
+----------------------------+

2.2. List the resources accessible by the developer1 user. Note the resource ID of the
instance resource type.

[student@workstation ~(developer1-finance)]$ openstack metric resource list \


-c id -c type
+--------------------------------------+----------------------------+
| id | type |
+--------------------------------------+----------------------------+
| 6bd6e073-4e97-4a48-92e4-d37cb365cddb | image |
| 05a6a936-4a4c-5d1b-b355-2fd6e2e47647 | instance_disk |
| cef757c0-6137-5905-9edc-ce9c4d2b9003 | instance_network_interface |
| 6776f92f-0706-54d8-94a1-2dd8d2397825 | instance_disk |
| dbf53681-540f-5ee1-9b00-c06bb53cbd62 | instance_disk |
| cebc8e2f-3c8f-45a1-8f71-6f03f017c623 | swift_account |
| a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 | instance |
+--------------------------------------+----------------------------+

2.3. Verify that the instance ID of the finance-web1 instance is the same as the resource
ID.

[student@workstation ~(developer1-finance)]$ openstack server list \
-c ID -c Name
+--------------------------------------+--------------+
| ID | Name |
+--------------------------------------+--------------+
| a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 | finance-web1 |
+--------------------------------------+--------------+

2.4. Using the resource ID, list all the meters associated with the finance-web1 instance.

[student@workstation ~(developer1-finance)]$ openstack metric resource show \


a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0
+-----------------------+------------------------------------------------------+
| Field | Value |
+-----------------------+------------------------------------------------------+
| created_by_project_id | d42393f674a9488abe11bd0ef6d18a18 |
| created_by_user_id | 7521059a98cc4d579eea897027027575 |
| ended_at | None |
| id | a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 |
| metrics | cpu.delta: 75369002-85ca-47b9-8276-88f5314aa9ad |
| | cpu: 71d9c293-f9ba-4b76-aaf8-0b1806a3b280 |
| | cpu_util: 37274980-f825-4aef-b1b9-fe46d266d1d8 |
| | disk.allocation: 93597246-5a02-4f65-b51c- |
| | 2b4946f411cd |
| | disk.capacity: cff713de-cdcc-4162-8a5b-16f76f86cf10 |
| | disk.ephemeral.size: |
| | b7cccbb5-bc27-40fe-9296-d62dfc22dfce |
| | disk.iops: ccf52c4d-9f59-4f78-8b81-0cb10c02b8e3 |
| | disk.latency: c6ae52ee-458f-4b8f-800a-cb00e5b1c1a6 |
| | disk.read.bytes.rate: 6a1299f2-a467-4eab- |
| | 8d75-f8ea68cad213 |
| | disk.read.bytes: |
| | 311bb209-f466-4713-9ac9-aa5d8fcfbc4d |
| | disk.read.requests.rate: 0a49942b-bbc9-4b2b- |
| | aee1-b6acdeeaf3ff |
| | disk.read.requests: 2581c3bd-f894-4798-bd5b- |
| | 53410de25ca8 |
| | disk.root.size: b8fe97f1-4d5e-4e2c-ac11-cdd92672c3c9 |
| | disk.usage: 0e12b7e5-3d0b-4c0f-b20e-1da75b2bff01 |
| | disk.write.bytes.rate: a2d063ed- |
| | 28c0-4b82-b867-84c0c6831751 |
| | disk.write.bytes: |
| | 8fc5a997-7fc0-43b5-88a0-ca28914e47cd |
| | disk.write.requests.rate: |
| | db5428d6-c6d7-4d31-888e-d72815076229 |
| | disk.write.requests: a39417a5-1dca- |
| | 4a94-9934-1deaef04066b |
| | instance: 4ee71d49-38f4-4368-b86f-a72d73861c7b |
| | memory.resident: 8795fdc3-0e69-4990-bd4c- |
| | 61c6e1a12c1d |
| | memory.usage: 902c7a71-4768-4d28-9460-259bf968aac5 |
| | memory: 277778df-c551-4573-a82e-fa7d3349f06f |
| | vcpus: 11ac9f36-1d1f-4e72-a1e7-9fd5b7725a14 |
| original_resource_id | a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 |
| project_id | 861b7d43e59c4edc97d1083e411caea0 |
| revision_end | None |
| revision_start | 2017-05-26T04:10:16.250620+00:00 |
| started_at | 2017-05-26T03:32:08.440478+00:00 |
| type | instance |
| user_id | cbcc0ad8d6ab460ca0e36ba96528dc03 |
+-----------------------+------------------------------------------------------+

3. List the meters associated with the image resource type.

3.1. Retrieve the resource ID associated with the image resource type.

[student@workstation ~(developer1-finance)]$ openstack metric resource list \


--type image -c id -c type
+--------------------------------------+-------+
| id | type |
+--------------------------------------+-------+
| 6bd6e073-4e97-4a48-92e4-d37cb365cddb | image |
+--------------------------------------+-------+

3.2. List the meters associated with the image resource ID.

[student@workstation ~(developer1-finance)]$ openstack metric resource show \


6bd6e073-4e97-4a48-92e4-d37cb365cddb
+-----------------------+------------------------------------------------------+
| Field | Value |
+-----------------------+------------------------------------------------------+
| created_by_project_id | df179bcea2e540e398f20400bc654cec |
| created_by_user_id | b74410917d314f22b0301c55c0edd39e |
| ended_at | None |
| id | 6bd6e073-4e97-4a48-92e4-d37cb365cddb |
| metrics | image.download: fc82d8eb-2f04-4a84-8bc7-fe35130d28eb |
| | image.serve: 2a8329f3-8378-49f6-aa58-d1c5d37d9b62 |
| | image.size: 9b065b52-acf0-4906-bcc6-b9604efdb5e5 |
| | image: 09883163-6783-4106-96ba-de15201e72f9 |
| original_resource_id | 6bd6e073-4e97-4a48-92e4-d37cb365cddb |
| project_id | cebc8e2f3c8f45a18f716f03f017c623 |
| revision_end | None |
| revision_start | 2017-05-23T04:06:12.958634+00:00 |
| started_at | 2017-05-23T04:06:12.958618+00:00 |
| type | image |
| user_id | None |
+-----------------------+------------------------------------------------------+

4. Using the resource ID, list the details for the disk.read.requests.rate metric
associated with the finance-web1 instance.

[student@workstation ~(developer1-finance)]$ openstack metric metric show \


--resource-id a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 \
disk.read.requests.rate
+------------------------------------+-----------------------------------------+
| Field | Value |
+------------------------------------+-----------------------------------------+
| archive_policy/aggregation_methods | std, count, 95pct, min, max, sum, |
| | median, mean |
| archive_policy/back_window | 0 |
| archive_policy/definition | - points: 12, granularity: 0:05:00, |
| | timespan: 1:00:00 |
| | - points: 24, granularity: 1:00:00, |
| | timespan: 1 day, 0:00:00 |
| | - points: 30, granularity: 1 day, |
| | 0:00:00, timespan: 30 days, 0:00:00 |
| archive_policy/name | low |
| created_by_project_id | d42393f674a9488abe11bd0ef6d18a18 |
| created_by_user_id | 7521059a98cc4d579eea897027027575 |
| id | 0a49942b-bbc9-4b2b-aee1-b6acdeeaf3ff |
| name | disk.read.requests.rate |
| resource/created_by_project_id | d42393f674a9488abe11bd0ef6d18a18 |
| resource/created_by_user_id | 7521059a98cc4d579eea897027027575 |
| resource/ended_at | None |
| resource/id | a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 |
| resource/original_resource_id | a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 |
| resource/project_id | 861b7d43e59c4edc97d1083e411caea0 |
| resource/revision_end | None |
| resource/revision_start | 2017-05-26T04:10:16.250620+00:00 |
| resource/started_at | 2017-05-26T03:32:08.440478+00:00 |
| resource/type | instance |
| resource/user_id | cbcc0ad8d6ab460ca0e36ba96528dc03 |
| unit | None |
+------------------------------------+-----------------------------------------+

The disk.read.requests.rate metric uses the low archive policy. The low archive
policy aggregates data with a granularity as fine as 5 minutes, and the maximum life span of
the aggregated data is 30 days.

5. Display the measures gathered and aggregated by the disk.read.requests.rate metric
associated with the finance-web1 instance. The number of records returned in the output
may vary.

[student@workstation ~(developer1-finance)]$ openstack metric measures show \


--resource-id a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 \
disk.read.requests.rate
+---------------------------+-------------+----------------+
| timestamp | granularity | value |
+---------------------------+-------------+----------------+
| 2017-05-23T00:00:00+00:00 | 86400.0 | 0.277122710561 |
| 2017-05-23T04:00:00+00:00 | 3600.0 | 0.0 |
| 2017-05-23T05:00:00+00:00 | 3600.0 | 0.831368131683 |
| 2017-05-23T06:00:00+00:00 | 3600.0 | 0.0 |
| 2017-05-23T07:00:00+00:00 | 3600.0 | 0.0 |
| 2017-05-23T05:25:00+00:00 | 300.0 | 0.0 |
| 2017-05-23T05:35:00+00:00 | 300.0 | 4.92324971194 |
| 2017-05-23T05:45:00+00:00 | 300.0 | 0.0 |
| 2017-05-23T05:55:00+00:00 | 300.0 | 0.0 |
| 2017-05-23T06:05:00+00:00 | 300.0 | 0.0 |
| 2017-05-23T06:15:00+00:00 | 300.0 | 0.0 |
| 2017-05-23T06:25:00+00:00 | 300.0 | 0.0 |
| 2017-05-23T06:35:00+00:00 | 300.0 | 0.0 |
| 2017-05-23T06:45:00+00:00 | 300.0 | 0.0 |
| 2017-05-23T06:55:00+00:00 | 300.0 | 0.0 |
| 2017-05-23T07:05:00+00:00 | 300.0 | 0.0 |
| 2017-05-23T07:15:00+00:00 | 300.0 | 0.0 |
+---------------------------+-------------+----------------+

Observe the value column, which displays the aggregated values based on the archive policy
associated with the metric. The 86400, 3600, and 300 values in the granularity column
represent aggregation periods of 1 day, 1 hour, and 5 minutes, respectively, expressed in seconds.

6. Using the resource ID, list the maximum measures associated with the cpu_util metric
with 300 seconds granularity. The number of records returned in the output may vary.

[student@workstation ~(developer1-finance)]$ openstack metric measures show \


--resource-id a2b3bda7-1d9e-4ad0-99fe-b4f7774deda0 \
--aggregation max \
--granularity 300 \
cpu_util
+---------------------------+-------------+-----------------+
| timestamp | granularity | value |
+---------------------------+-------------+-----------------+
| 2017-05-23T05:45:00+00:00 | 300.0 | 0.0708371692841 |
| 2017-05-23T05:55:00+00:00 | 300.0 | 0.0891683788482 |
| 2017-05-23T06:05:00+00:00 | 300.0 | 0.0907790288644 |
| 2017-05-23T06:15:00+00:00 | 300.0 | 0.0850440360854 |
| 2017-05-23T06:25:00+00:00 | 300.0 | 0.0691660923575 |
| 2017-05-23T06:35:00+00:00 | 300.0 | 0.0858326136269 |
| 2017-05-23T06:45:00+00:00 | 300.0 | 0.0666668728895 |
| 2017-05-23T06:55:00+00:00 | 300.0 | 0.0658094259754 |
| 2017-05-23T07:05:00+00:00 | 300.0 | 0.108326315232 |
| 2017-05-23T07:15:00+00:00 | 300.0 | 0.066695508806 |
| 2017-05-23T07:25:00+00:00 | 300.0 | 0.0666670677802 |
| 2017-05-23T07:35:00+00:00 | 300.0 | 0.0666727313294 |
+---------------------------+-------------+-----------------+

7. List the average CPU utilization for all instances provisioned using the finance-rhel7 image.
Query for all instances containing the word finance in the instance name.

7.1. List the attributes supported by the instance resource type. The command returns the
attributes that may be used to query this resource type.

[student@workstation ~(developer1-finance)]$ openstack metric resource-type \


show instance
+-------------------------+----------------------------------------------------+
| Field | Value |
+-------------------------+----------------------------------------------------+
| attributes/display_name | max_length=255, min_length=0, required=True, |
| | type=string |
| attributes/flavor_id | max_length=255, min_length=0, required=True, |
| | type=string |
| attributes/host | max_length=255, min_length=0, required=True, |
| | type=string |
| attributes/image_ref | max_length=255, min_length=0, required=False, |
| | type=string |
| attributes/server_group | max_length=255, min_length=0, required=False, |
| | type=string |
| name | instance |
| state | active |
+-------------------------+----------------------------------------------------+

7.2. Only users with the admin role can query measures using resource attributes. Use the
architect1 user's Identity credentials to execute the command. The architect1
credentials are stored in the /home/student/architect1-finance-rc file.

[student@workstation ~(developer1-finance)]$ source ~/architect1-finance-rc


[student@workstation ~(architect1-finance)]$

7.3. Retrieve the image ID for the finance-rhel7 image.

[student@workstation ~(architect1-finance)]$ openstack image list


+--------------------------------------+---------------+--------+
| ID | Name | Status |
+--------------------------------------+---------------+--------+
| 6bd6e073-4e97-4a48-92e4-d37cb365cddb | finance-rhel7 | active |
+--------------------------------------+---------------+--------+

7.4. List the average CPU utilization for all the instances using the openstack metric
measures aggregation command. Use the --query option to filter the instances.

The instance resource type has the attributes image_ref and display_name. The
image_ref attribute specifies the image used for provisioning. The display_name
attribute specifies the instance name. The query uses the like operator to search for
the finance substring. Combine the query conditions using the and operator. The
--refresh option forces aggregation of all known measures. The number of
records returned in the output may vary.

[student@workstation ~(architect1-finance)]$ openstack metric measures \
aggregation \
--metric cpu_util \
--aggregation mean \
--resource-type instance \
--query '(display_name like "finance%")and(image_ref='6bd6e073-4e97-4a48-92e4-d37cb365cddb')' \
--refresh
+---------------------------+-------------+-----------------+
| timestamp | granularity | value |
+---------------------------+-------------+-----------------+
| 2017-05-23T00:00:00+00:00 | 86400.0 | 0.107856401515 |
| 2017-05-23T04:00:00+00:00 | 3600.0 | 0.0856332847432 |
| 2017-05-23T05:00:00+00:00 | 3600.0 | 0.214997947668 |
| 2017-05-23T06:00:00+00:00 | 3600.0 | 0.0772163449665 |
| 2017-05-23T07:00:00+00:00 | 3600.0 | 0.0761148056641 |
| 2017-05-23T08:00:00+00:00 | 3600.0 | 0.073333038879 |
| 2017-05-23T09:00:00+00:00 | 3600.0 | 0.111944170402 |
| 2017-05-23T10:00:00+00:00 | 3600.0 | 0.110803068583 |
| 2017-05-23T08:15:00+00:00 | 300.0 | 0.0675114757132 |
| 2017-05-23T08:25:00+00:00 | 300.0 | 0.0858683130787 |
| 2017-05-23T08:35:00+00:00 | 300.0 | 0.0658268878936 |
| 2017-05-23T08:45:00+00:00 | 300.0 | 0.065833179174 |
| 2017-05-23T08:55:00+00:00 | 300.0 | 0.0658398475278 |
| 2017-05-23T09:05:00+00:00 | 300.0 | 0.109115311727 |
| 2017-05-23T09:15:00+00:00 | 300.0 | 0.141717706062 |
| 2017-05-23T09:25:00+00:00 | 300.0 | 0.159984446046 |
| 2017-05-23T09:35:00+00:00 | 300.0 | 0.0858446020112 |
| 2017-05-23T09:45:00+00:00 | 300.0 | 0.0875042966068 |
| 2017-05-23T09:55:00+00:00 | 300.0 | 0.087498659958 |
| 2017-05-23T10:05:00+00:00 | 300.0 | 0.110803068583 |
+---------------------------+-------------+-----------------+

Cleanup
From workstation, run the lab monitoring-analyzing-metrics cleanup command to
clean up this exercise.

[student@workstation ~]$ lab monitoring-analyzing-metrics cleanup


Lab: Monitoring Cloud Metrics for Autoscaling

In this lab, you will analyze the Telemetry metric data and create an Aodh alarm. You will also set
the alarm to trigger when the maximum CPU utilization of an instance exceeds a threshold value.

Outcomes
You should be able to:

• Search and list the metrics available with the Telemetry service for a particular user.

• View the usage data collected for a metric.

• Check which archive policy is in use for a particular metric.

• Add new measures to a metric.

• Create an alarm based on aggregated usage data of a metric, and trigger it.

• View and analyze an alarm history.

Before you begin


Log in to workstation as student with a password of student.

On workstation, run the lab monitoring-review setup command. This will ensure that
the OpenStack services are running and the environment has been properly configured for this
lab. The script also creates an instance named production-rhel7.

[student@workstation ~]$ lab monitoring-review setup

Steps
1. List all of the instance type telemetry resources accessible by the user operator1. Ensure
the production-rhel7 instance is available. Observe the resource ID of the instance.
Credentials for user operator1 are in /home/student/operator1-production-rc on
workstation.

2. List all metrics associated with the production-rhel7 instance.

3. List the available archive policies. Verify that the cpu_util metric of the production-
rhel7 instance uses the archive policy named low.

4. Add new measures to the cpu_util metric. Observe that the newly added measures
are available using min and max aggregation methods. Use the values from the following
table. The measures must be added using the architect1 user's credentials, because
manipulating data points requires an account with the admin role. Credentials of user
architect1 are stored in /home/student/architect1-production-rc file.

Measure parameter    Value
Timestamp            Current time as an ISO 8601 formatted timestamp
Measure values       30, 42

The measure values 30 and 42 are manual data values added to the cpu_util metric.

5. Create a threshold alarm named cputhreshold-alarm based on aggregation by
resources. Set the alarm to trigger when maximum CPU utilization for the production-
rhel7 instance exceeds 50% for two consecutive 5 minute periods.

6. Simulate a high CPU utilization scenario by manually adding new measures to the cpu_util
metric of the instance. Observe that the alarm triggers when the aggregated CPU utilization
exceeds the 50% threshold through two evaluation periods of 5 minutes each. To simulate
high CPU utilization, manually add a measure with a value of 80 once every minute until the
alarm triggers. It is expected to take between 5 and 10 minutes to trigger.

Evaluation
On workstation, run the lab monitoring-review grade command to confirm success of
this exercise. Correct any reported failures and rerun the command until successful.

[student@workstation ~]$ lab monitoring-review grade

Cleanup
From workstation, run the lab monitoring-review cleanup command to clean up this
exercise.

[student@workstation ~]$ lab monitoring-review cleanup


Solution
In this lab, you will analyze the Telemetry metric data and create an Aodh alarm. You will also set
the alarm to trigger when the maximum CPU utilization of an instance exceeds a threshold value.

Outcomes
You should be able to:

• Search and list the metrics available with the Telemetry service for a particular user.

• View the usage data collected for a metric.

• Check which archive policy is in use for a particular metric.

• Add new measures to a metric.

• Create an alarm based on aggregated usage data of a metric, and trigger it.

• View and analyze an alarm history.

Before you begin


Log in to workstation as student with a password of student.

On workstation, run the lab monitoring-review setup command. This will ensure that
the OpenStack services are running and the environment has been properly configured for this
lab. The script also creates an instance named production-rhel7.

[student@workstation ~]$ lab monitoring-review setup

Steps
1. List all of the instance type telemetry resources accessible by the user operator1. Ensure
the production-rhel7 instance is available. Observe the resource ID of the instance.
Credentials for user operator1 are in /home/student/operator1-production-rc on
workstation.

1.1. From workstation, source the /home/student/operator1-production-rc file
to use the operator1 user credentials. Find the ID associated with the user.

[student@workstation ~]$ source ~/operator1-production-rc


[student@workstation ~(operator1-production)]$ openstack user show operator1
+------------+----------------------------------+
| Field | Value |
+------------+----------------------------------+
| enabled | True |
| id | 4301d0dfcbfb4c50a085d4e8ce7330f6 |
| name | operator1 |
| project_id | a8129485db844db898b8c8f45ddeb258 |
+------------+----------------------------------+

1.2. Use the retrieved user ID to search the resources accessible by the operator1 user.
Filter the output based on the instance resource type.

[student@workstation ~(operator1-production)]$ openstack metric resource \


search user_id=4301d0dfcbfb4c50a085d4e8ce7330f6 \
-c id -c type -c user_id --type instance -f json


[
  {
    "user_id": "4301d0dfcbfb4c50a085d4e8ce7330f6",
    "type": "instance",
    "id": "969b5215-61d0-47c4-aa3d-b9fc89fcd46c"
  }
]

1.3. Observe that the ID of the resource in the previous output matches the instance ID of
the production-rhel7 instance. The production-rhel7 instance is available.

[student@workstation ~(operator1-production)]$ openstack server show \


production-rhel7 -c id -c name -c status
+--------+--------------------------------------+
| Field | Value |
+--------+--------------------------------------+
| id | 969b5215-61d0-47c4-aa3d-b9fc89fcd46c |
| name | production-rhel7 |
| status | ACTIVE |
+--------+--------------------------------------+

The production-rhel7 instance resource ID matches the production-rhel7
instance ID. Note this resource ID, as it will be used in upcoming lab steps.

2. List all metrics associated with the production-rhel7 instance.

2.1. Use the production-rhel7 instance resource ID to list the available metrics. Verify
that the cpu_util metric is listed.

[student@workstation ~(operator1-production)]$ openstack metric resource \


show 969b5215-61d0-47c4-aa3d-b9fc89fcd46c --type instance
+--------------+---------------------------------------------------------------+
|Field | Value |
+--------------+---------------------------------------------------------------+
|id | 969b5215-61d0-47c4-aa3d-b9fc89fcd46c |
|image_ref | 280887fa-8ca4-43ab-b9b0-eea9bfc6174c |
|metrics | cpu.delta: a22f5165-0803-4578-9337-68c79e005c0f |
| | cpu: e410ce36-0dac-4503-8a94-323cf78e7b96 |
| | cpu_util: 6804b83c-aec0-46de-bed5-9cdfd72e9145 |
| | disk.allocation: 0610892e-9741-4ad5-ae97-ac153bb53aa8 |
| | disk.capacity: 0e0a5313-f603-4d75-a204-9c892806c404 |
| | disk.ephemeral.size: 2e3ed19b-fb02-44be-93b7-8f6c63041ac3 |
| | disk.iops: 83122c52-5687-4134-a831-93d80dba4b4f |
| | disk.latency: 11e2b022-b602-4c5a-b710-2acc1a82ea91 |
| | disk.read.bytes.rate: 3259c60d-0cb8-47d0-94f0-cded9f30beb2 |
| | disk.read.bytes: eefa65e9-0cbd-4194-bbcb-fdaf596a3337 |
| | disk.read.requests.rate: 36e0b15c-4f6c-4bda-bd03-64fcea8a4c70 |
| | disk.read.requests: 6f14131e-f15c-401c-9599-a5dbcc6d5f2e |
| | disk.root.size: 36f6f5c1-4900-48c1-a064-482d453a4ee7 |
| | disk.usage: 2e510e08-7820-4214-81ee-5647bdaf0db0 |
| | disk.write.bytes.rate: 059a529e-dcad-4439-afd1-7d199254bec9 |
| | disk.write.bytes: 68be5427-df81-4dac-8179-49ffbbad219e |
| | disk.write.requests.rate: 4f86c785-35ef-4d92-923f-b2a80e9dd14f|
| | disk.write.requests: 717ce076-c07b-4982-95ed-ba94a6993ce2 |
| | instance: a91c09e3-c9b9-4f9a-848b-785f9028b78a |
| | memory.resident: af7cd10e-6784-4970-98ff-49bf1e153992 |
| | memory.usage: 2b9c9c3f-05ce-4370-a101-736ca2683607 |
| | memory: dc4f5d14-1b55-4f44-a15c-48aac461e2bf |
| | vcpus: c1cc42a0-4674-44c2-ae6d-48df463a6586 |
|resource_id | 969b5215-61d0-47c4-aa3d-b9fc89fcd46c |


| ... output omitted... |


+--------------+---------------------------------------------------------------+

3. List the available archive policies. Verify that the cpu_util metric of the production-
rhel7 instance uses the archive policy named low.

3.1. List the available archive policies and their supported aggregation methods.

[student@workstation ~(operator1-production)]$ openstack metric archive-policy \


list -c name -c aggregation_methods
+--------+------------------------------------------------+
| name | aggregation_methods |
+--------+------------------------------------------------+
| high | std, count, 95pct, min, max, sum, median, mean |
| low | std, count, 95pct, min, max, sum, median, mean |
| medium | std, count, 95pct, min, max, sum, median, mean |
+--------+------------------------------------------------+

3.2. View the definition of the low archive policy.

[student@workstation ~(operator1-production)]$ openstack metric archive-policy \


show low -c name -c definition
+------------+---------------------------------------------------------------+
| Field | Value |
+------------+---------------------------------------------------------------+
| definition | - points: 12, granularity: 0:05:00, timespan: 1:00:00 |
| | - points: 24, granularity: 1:00:00, timespan: 1 day, 0:00:00 |
| | - points: 30, granularity: 1 day, 0:00:00, timespan: 30 days |
| name | low |
+------------+---------------------------------------------------------------+

3.3. Use the resource ID of the production-rhel7 instance to check which archive policy
is in use for the cpu_util metric.

[student@workstation ~(operator1-production)]$ openstack metric metric \


show --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \
-c archive_policy/name \
cpu_util
+---------------------+-------+
| Field | Value |
+---------------------+-------+
| archive_policy/name | low |
+---------------------+-------+

3.4. View the measures collected for the cpu_util metric associated with the
production-rhel7 instance to ensure that it uses granularities according to the
definition of the low archive policy.

[student@workstation ~(operator1-production)]$ openstack metric measures \


show --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \
cpu_util
+---------------------------+-------------+----------------+
| timestamp | granularity | value |
+---------------------------+-------------+----------------+
| 2017-05-28T00:00:00+00:00 | 86400.0 | 0.838532808055 |
| 2017-05-28T15:00:00+00:00 | 3600.0 | 0.838532808055 |
| 2017-05-28T18:45:00+00:00 | 300.0 | 0.838532808055 |
+---------------------------+-------------+----------------+

4. Add new measures to the cpu_util metric. Observe that the newly added measures
are available using min and max aggregation methods. Use the values from the following
table. The measures must be added using the architect1 user's credentials, because
manipulating data points requires an account with the admin role. Credentials of user
architect1 are stored in /home/student/architect1-production-rc file.

Measure parameter    Value
Timestamp            Current time as an ISO 8601 formatted timestamp
Measure values       30, 42

The measure values 30 and 42 are manual data values added to the cpu_util metric.

4.1. Source architect1 user's credential file. Add 30 and 42 as new measure values.

[student@workstation ~(operator1-production)]$ source ~/architect1-production-rc


[student@workstation ~(architect1-production)]$ openstack metric measures add \
--resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \
--measure $(date -u --iso=seconds)@30 cpu_util
[student@workstation ~(architect1-production)]$ openstack metric measures add \
--resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \
--measure $(date -u --iso=seconds)@42 cpu_util

4.2. Verify that the new measures have been successfully added for the cpu_util metric.
Force the aggregation of all known measures. The default aggregation method is mean,
so you will see a value of 36 (the mean of 30 and 42). The number of records and their
values returned in the output may vary.

[student@workstation ~(architect1-production)]$ openstack metric measures \


show --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \
cpu_util --refresh
+---------------------------+-------------+----------------+
| timestamp | granularity | value |
+---------------------------+-------------+----------------+
| 2017-05-28T00:00:00+00:00 | 86400.0 | 15.419266404 |
| 2017-05-28T15:00:00+00:00 | 3600.0 | 15.419266404 |
| 2017-05-28T19:55:00+00:00 | 300.0 | 0.838532808055 |
| 2017-05-28T20:30:00+00:00 | 300.0 | 36.0 |
+---------------------------+-------------+----------------+

4.3. Display the maximum and minimum values for the cpu_util metric measure.

[student@workstation ~(architect1-production)]$ openstack metric measures \


show --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \
cpu_util --refresh --aggregation max
+---------------------------+-------------+----------------+
| timestamp | granularity | value |
+---------------------------+-------------+----------------+
| 2017-05-28T00:00:00+00:00 | 86400.0 | 42.0 |
| 2017-05-28T15:00:00+00:00 | 3600.0 | 42.0 |
| 2017-05-28T19:55:00+00:00 | 300.0 | 0.838532808055 |
| 2017-05-28T20:30:00+00:00 | 300.0 | 42.0 |
+---------------------------+-------------+----------------+


[student@workstation ~(architect1-production)]$ openstack metric measures \


show --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \
cpu_util --refresh --aggregation min
+---------------------------+-------------+----------------+
| timestamp | granularity | value |
+---------------------------+-------------+----------------+
| 2017-05-28T00:00:00+00:00 | 86400.0 | 0.838532808055 |
| 2017-05-28T15:00:00+00:00 | 3600.0 | 0.838532808055 |
| 2017-05-28T20:30:00+00:00 | 300.0 | 30.0 |
+---------------------------+-------------+----------------+

5. Create a threshold alarm named cputhreshold-alarm based on aggregation by resources.
Set the alarm to trigger when maximum CPU utilization for the production-rhel7 instance
exceeds 50% for two consecutive 5 minute periods.

5.1. Create the alarm.

[student@workstation ~(architect1-production)]$ openstack alarm create \


--type gnocchi_aggregation_by_resources_threshold \
--name cputhreshold-alarm \
--description 'Alarm to monitor CPU utilization' \
--enabled True \
--alarm-action 'log://' \
--comparison-operator ge \
--evaluation-periods 2 \
--threshold 50.0 \
--granularity 300 \
--aggregation-method max \
--metric cpu_util \
--query '{"=": {"id": "969b5215-61d0-47c4-aa3d-b9fc89fcd46c"}}' \
--resource-type instance
+--------------------+-------------------------------------------------------+
| Field | Value |
+--------------------+-------------------------------------------------------+
| aggregation_method | max |
| alarm_actions | [u'log://'] |
| alarm_id | f93a2bdc-1ac6-4640-bea8-88195c74fb45 |
| comparison_operator| ge |
| description | Alarm to monitor CPU utilization |
| enabled | True |
| evaluation_periods | 2 |
| granularity | 300 |
| metric | cpu_util |
| name | cputhreshold-alarm |
| ok_actions | [] |
| project_id | ba5b8069596541f2966738ee0fee37de |
| query | {"=": {"id": "969b5215-61d0-47c4-aa3d-b9fc89fcd46c"} |
| repeat_actions | False |
| resource_type | instance |
| severity | low |
| state | insufficient data |
| state_timestamp | 2017-05-28T20:41:43.872594 |
| threshold | 50.0 |
| time_constraints | [] |
| timestamp | 2017-05-28T20:41:43.872594 |
| type | gnocchi_aggregation_by_resources_threshold |
| user_id | 1beb5c527a8e4b42b5858fc04257d1cd |
+--------------------+-------------------------------------------------------+


5.2. View the newly created alarm. Verify that the state of the alarm is either ok or
insufficient data. According to the alarm definition, data is insufficient until two
evaluation periods have been recorded. Continue with the next step if the state is ok or
insufficient data.

[student@workstation ~(architect1-production)]$ openstack alarm list -c name \


-c state -c enabled
+--------------------+-------+---------+
| name | state | enabled |
+--------------------+-------+---------+
| cputhreshold-alarm | ok | True |
+--------------------+-------+---------+

6. Simulate a high CPU utilization scenario by manually adding new measures to the cpu_util
metric of the instance. Observe that the alarm triggers when the aggregated CPU utilization
exceeds the 50% threshold through two evaluation periods of 5 minutes each. To simulate
high CPU utilization, manually add a measure with a value of 80 once every minute until the
alarm triggers. It is expected to take between 5 and 10 minutes to trigger.

6.1. Open two terminal windows, either stacked vertically or side by side. The second
terminal will be used in subsequent steps to add data points until the alarm triggers. In
the first window, use the watch command to repeatedly display the alarm state.

[student@workstation ~(architect1-production)]$ watch openstack alarm list \
-c alarm_id -c name -c state

Every 2.0s: openstack alarm list -c alarm_id -c name -c state

+--------------------------------------+--------------------+-------+
| alarm_id                             | name               | state |
+--------------------------------------+--------------------+-------+
| 82f0b4b6-5955-4acd-9d2e-2ae4811b8479 | cputhreshold-alarm | ok    |
+--------------------------------------+--------------------+-------+

6.2. In the second terminal window, add new measures to the cpu_util metric of the
production-rhel7 instance once every minute. A value of 80 will simulate high CPU
utilization, since the alarm is set to trigger at 50%.

[student@workstation ~(architect1-production)]$ openstack metric measures \


add --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \
--measure $(date -u --iso=seconds)@80 cpu_util

Repeat this command about once per minute. Be patient, as the alarm must detect a
maximum value greater than 50 in two consecutive 5 minute evaluation periods. This is
expected to take between 6 and 10 minutes. As long as you keep adding one measure
per minute, the alarm will eventually trigger.

[student@workstation ~(architect1-production)]$ openstack metric measures \
add --resource-id 969b5215-61d0-47c4-aa3d-b9fc89fcd46c \
--measure $(date -u --iso=seconds)@80 cpu_util


Note
In a real-world environment, measures are collected automatically using
various polling and notification agents. Manually adding data point measures
for a metric is only for alarm configuration testing purposes.

6.3. The alarm-evaluator service will detect the new manually added measures. Within
the expected 6 to 10 minutes, the alarm changes state to alarm in the first terminal
window. Stop manually adding new measures as soon as the new alarm state
occurs. Observe the new alarm state. The alarm state will transition back to ok after
one more evaluation period, because high CPU utilization values are no longer being
received. Press CTRL-C to stop the watch.

Every 2.0s: openstack alarm list -c alarm_id -c name -c state

+--------------------------------------+--------------------+-------+
| alarm_id | name | state |
+--------------------------------------+--------------------+-------+
| 82f0b4b6-5955-4acd-9d2e-2ae4811b8479 | cputhreshold-alarm | alarm |
+--------------------------------------+--------------------+-------+

6.4. After stopping the watch and closing the second terminal, view the alarm history to
analyze when the alarm transitioned from the ok state to the alarm state. The output
may look similar to the lines displayed below.

[student@workstation ~(architect1-production)]$ openstack alarm-history show \


82f0b4b6-5955-4acd-9d2e-2ae4811b8479 -c timestamp -c type -c detail -f json

[
{
"timestamp": "2017-06-08T14:05:53.477088",
"type": "state transition",
"detail": "{\"transition_reason\": \"Transition to alarm due to 2 samples
outside threshold, most recent: 70.0\", \"state\": \"alarm\"}"
},
{
"timestamp": "2017-06-08T13:18:53.356979",
"type": "state transition",
"detail": "{\"transition_reason\": \"Transition to ok due to 2 samples
inside threshold, most recent: 0.579456043152\", \"state\": \"ok\"}"
},
{
"timestamp": "2017-06-08T13:15:53.338924",
"type": "state transition",
"detail": "{\"transition_reason\": \"2 datapoints are unknown\", \"state\":
\"insufficient data\"}"
},
{
"timestamp": "2017-06-08T13:11:51.328482",
"type": "creation",
"detail": "{\"alarm_actions\": [\"log:/tmp/alarm.log\"], \"user_id\":
\"b5494d9c68eb4938b024c911d75f7fa7\", \"name\": \"cputhreshold-alarm\",
\"state\": \"insufficient data\", \"timestamp\": \"2017-06-08T13:11:51.328482\",
\"description\": \"Alarm to monitor CPU utilization\", \"enabled\":
true, \"state_timestamp\": \"2017-06-08T13:11:51.328482\", \"rule\":


{\"evaluation_periods\": 2, \"metric\": \"cpu_util\", \"aggregation_method\":


\"max\", \"granularity\": 300,
\"threshold\": 50.0, \"query\": \"{\\\"=\\\": {\\\"id\\\": \\
\"969b5215-61d0-47c4-aa3d-b9fc89fcd46c\\\"}}\", \"comparison_operator
\": \"ge\", \"resource_type\": \"instance\"},\"alarm_id\":
\"82f0b4b6-5955-4acd-9d2e-2ae4811b8479\", \"time_constraints\": [], \
"insufficient_data_actions\": [], \"repeat_actions\": false, \"ok_actions
\": [], \"project_id\": \"4edf4dd1e80c4e3b99c0ba797b3f3ed8\", \"type\":
\"gnocchi_aggregation_by_resources_threshold\", \"severity\": \"low\"}"

Evaluation
On workstation, run the lab monitoring-review grade command to confirm success of
this exercise. Correct any reported failures and rerun the command until successful.

[student@workstation ~]$ lab monitoring-review grade

Cleanup
From workstation, run the lab monitoring-review cleanup command to clean up this
exercise.

[student@workstation ~]$ lab monitoring-review cleanup


Summary
In this chapter, you learned:

• Telemetry data is used for system monitoring, alerts, and for generating customer usage
billing.

• The Telemetry service collects data using polling agents and notification agents.

• The Time Series Database (Gnocchi) service was introduced to decouple the storing of metric
data from the Telemetry service to increase the efficiency.

• The gnocchi-metricd service is used to compute, in real time, statistics on received data.

• The Alarm (Aodh) service provides alarming services within the Telemetry service architecture.

• The Event Storage (Panko) service stores events collected by the Telemetry service from
various OpenStack components.

• The measures stored in the Time Series Database are indexed based on the resource and its
attributes.

• The aggregated data is stored in the metering database according to the archive policies
defined on a per-metric basis.

• In the Alarm service, the alarm notifier notifies the activation of an alarm by using the HTTP
callback URL, writing to a log file, or sending notifications using the messaging bus.

CHAPTER 9

ORCHESTRATING
DEPLOYMENTS

Overview

Goal
  Deploy Orchestration stacks that automatically scale.

Objectives
  • Describe the Orchestration service architecture and use cases.
  • Write templates using the Heat Orchestration Template (HOT) language.
  • Configure automatic scaling for a stack.

Sections
  • Describing Orchestration Architecture (and Quiz)
  • Writing Heat Orchestration Templates (and Guided Exercise)
  • Configuring Stack Autoscaling (and Quiz)


Describing Orchestration Architecture

Objectives
After completing this section, students should be able to describe Heat orchestration
architecture and use cases.

Heat Orchestration and Services

When managing an OpenStack infrastructure with ad hoc scripts, it can be difficult to create
and manage all the infrastructure resources. Version control and tracking changes in
the infrastructure can also be challenging, and replicating production environments across
multiple development and testing environments becomes much harder.

The Orchestration service (Heat) provides developers and system administrators with an easy and
repeatable way to create and manage a collection of related OpenStack resources. The Heat
orchestration service deploys OpenStack resources in an orderly and predictable fashion. The
user writes a Heat Orchestration Template (HOT) to describe the OpenStack resources and
run-time parameters required to run an application. The Orchestration service orders the
deployment of these OpenStack resources and resolves any dependencies between them.
When provisioning your infrastructure with the Orchestration service, the Orchestration template
describes the resources to be provisioned and their settings. Because the templates are text files,
they can be kept under a version control system to track changes to the infrastructure.

Heat Orchestration Service Architecture


An orchestration stack is a collection of multiple infrastructure resources deployed and managed
through the same interface, either by using the Horizon dashboard or using the command-
line interface. Stacks standardize and speed up delivery by providing a unified human-readable
format. The Heat orchestration project started as an analog of AWS CloudFormation, making
it compatible with the template formats used by CloudFormation (CFN), but it also supports its
native template format, Heat Orchestration Templates (HOT). The orchestration service executes
Heat Orchestration Template (HOT) written in YAML. The YAML format is a human-readable data
serialization language.

The user submits the template, along with the input parameters, to the Orchestration REST API to
deploy the stack, using either the Horizon dashboard or the OpenStack CLI commands. The Heat
orchestration API service forwards requests to the Orchestration engine service using
remote procedure calls (RPCs) over AMQP. Optionally, the Orchestration CFN service sends the
AWS CloudFormation-compatible requests to the Orchestration engine service over RPC. The
Orchestration engine service interprets the orchestration template and launches the stack. The
events generated by the Orchestration engine service are consumed by the Orchestration API
service to provide the status of the Orchestration stack that was launched.
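
For example, a stack can be launched from the command line by passing a template and any
run-time parameters to the Orchestration API. The template file name, parameter, and stack
name below are hypothetical:

[user@demo ~]$ openstack stack create -t demo-template.yaml \
--parameter instance_name=demo-server demo-stack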


Figure 9.1: Heat Orchestration Service Architecture

Heat Orchestration Service


In Red Hat OpenStack Platform, the orchestration service is provided by Heat. An orchestration
template specifies the relationship between the resources to be deployed. The relationship
specified in the template enables the orchestration engine to call different OpenStack APIs to
deploy the resources in the correct order. The Orchestration template uses resource types to
create various resources such as instances, volumes, security groups, and other resources. Next,
more complex resources are created using a nested stack. The Orchestration templates primarily
deploy various infrastructural components. Different software configuration management tools,
such as Ansible, Puppet, and others, can be integrated with the Orchestration templates to
deploy software and to make configuration changes to this software.

Orchestration Use Cases and Recommended Practices


The Orchestration template can be used repeatedly to create identical copies of the same
Orchestration stack. A template, written in YAML formatted text, can be placed under a source
control system to maintain various versions of the infrastructure deployment. Orchestration
makes it easy to organize and deploy a collection of OpenStack resources, allowing you to
describe dependencies and pass specific parameters at run time. Orchestration template
parameters are used to customize aspects of the template at run time during the creation of the
stack. Here are recommended practices to help you plan and organize the deployment of your
Orchestration stack.


• Using multiple layers of stacks that build on top of one another is the best way to organize an
orchestration stack. Putting all the resources in one stack becomes cumbersome to manage
when the stack is scaled, and broadens the scope of resources to be provisioned.

• When using nested stacks, resource names or IDs can be hard coded into the calling stack.
However, hard coding resource names or IDs makes templates difficult to reuse and may add
overhead when deploying the stack.

• The changes in the infrastructure after updating a stack should be verified first by doing a dry
run of the stack.

• Before launching a stack, ensure all the resources to be deployed by the orchestration stack
are within the project quota limits.

• With the growth of infrastructure, declaring the same resources in each template becomes
repetitive. Such shared resources should be maintained as a separate stack and referenced from
a nested stack. A nested stack is a stack that is created by another stack.

• When declaring parameters in the orchestration template, use constraints to define the format
for the input parameters. Constraints allow you to describe legal input values so that the
Orchestration engine catches any invalid values before creating the stack.

• Before using a template to create or update a stack, you can use OpenStack CLI to validate
it. Validating a template helps catch syntax and some semantic errors, such as circular
dependencies before the Orchestration stack creates any resources.

Configuration Files and Logs


The Orchestration service uses the /etc/heat/heat.conf file for configuration. Some of the
most common configuration options can be found in the following table:

Parameter                            Description

encrypt_parameters_and_properties    Encrypts resource parameters and properties that are
                                     marked as hidden before storing them in the database.
                                     Accepts a Boolean value.

heat_stack_user_role                 The Identity role name associated with the user who is
                                     responsible for launching the stack. Accepts a string
                                     value. The default value is the heat_stack_user role.

num_engine_workers                   The number of heat-engine processes to fork and run on
                                     the host. Accepts an integer value. The default value is
                                     either the number of CPUs on the host running the
                                     heat-engine service or 4, whichever is greater.

stack_action_timeout                 The default timeout, in seconds, for stack creation and
                                     update operations. The default value is 3600 seconds
                                     (1 hour).
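
These options live in the [DEFAULT] section of /etc/heat/heat.conf on the node running the
Orchestration services. As a minimal sketch, assuming administrative access to that node and
the openstack-heat-engine service name used on Red Hat systems, a value can be changed with
crudini and the engine restarted to apply it:

[user@demo ~]$ sudo crudini --set /etc/heat/heat.conf DEFAULT num_engine_workers 4
[user@demo ~]$ sudo systemctl restart openstack-heat-engine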


The log files for the orchestration service are stored in the /var/log/heat directory of the
host on which the heat-api, heat-engine, and heat-manage services are running.

File name Description


heat-api.log Stores the log related to the orchestration API service.
heat-engine.log Stores the log related to the orchestration engine service.
heat-manage.log Stores the log related to the orchestration events service.

Troubleshooting Orchestration Service


Most of the errors occur while deploying the orchestration stack. The following are some of the
common errors and ways to troubleshoot the problem.

• Editing an existing template might introduce syntax errors. Tools such as
python -m json.tool help validate JSON-formatted (CFN) templates, and YAML-aware editors
or linters help catch YAML syntax errors in template files. Using the --dry-run option with
the openstack stack create command also validates much of the template syntax.

• If an instance goes into the ERROR state after launching a stack, troubleshoot the problem
by looking for the /var/log/nova/scheduler.log log file on the compute node. If the
error shows No valid host was found, the compute node does not have the required
resources to launch the instance. Check the resources consumed by the instances running on
the compute nodes and, if possible, change the allocation ratio in the /etc/nova/nova.conf
file.

To over-commit the CPU, RAM, and disk allocated on the compute nodes, use the following
commands to change the allocation ratios. The ratios shown in the commands are arbitrary.

[user@demo ~]$ crudini --set /etc/nova/nova.conf DEFAULT disk_allocation_ratio 2.0


[user@demo ~]$ crudini --set /etc/nova/nova.conf DEFAULT cpu_allocation_ratio 8.0
[user@demo ~]$ crudini --set /etc/nova/nova.conf DEFAULT ram_allocation_ratio 1.5

• When validating a template with the --dry-run option, the Orchestration service checks for
the existence of the resources required by the template and its run-time parameters. Using
custom constraints helps template parameters to be validated at an early stage rather than
failing during the launch of the stack, as shown in the example below.
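
For example, assuming a hypothetical template file named demo-template.yaml, a dry run can be
performed before creating a stack named demo-stack:

[user@demo ~]$ openstack stack create --dry-run -t demo-template.yaml demo-stack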

References
Further information is available in the Components chapter of the Architecture Guide
for Red Hat OpenStack Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/


Quiz: Describing Orchestration Architecture

Choose the correct answer(s) to the following questions:

1. Which OpenStack service provides orchestration functionality in Red Hat OpenStack


Platform?

a. Nova
b. Glance
c. Heat
d. Ceilometer

2. Which two template formats are supported by the Orchestration service? (Choose two.)

a. OpenStack Orchestration Template (OOT)


b. Heat Orchestration Template (HOT)
c. Rapid Deployment Template (RDP)
d. CloudFormation (CFN)

3. In what language are Orchestration templates written?

a. XML
b. JSON
c. YAML
d. HTML

4. What is the default timeout period for a stack creation?

a. 86400 Seconds
b. 3600 Seconds
c. 300 Seconds
d. 600 Seconds

5. In which log file does information related to the Orchestration engine service get logged?

a. /var/log/heat/heat-api.log
b. /var/log/heat/heat-manage.log
c. /var/log/heat/engine.log
d. /var/log/heat/heat-engine.log

6. Which command-line interface option helps to validate a template?

a. --validate
b. --run-dry
c. --dry-run
d. --yaml


Solution
Choose the correct answer(s) to the following questions:

1. Which OpenStack service provides orchestration functionality in Red Hat OpenStack
Platform?

a. Nova
b. Glance
c. Heat (correct)
d. Ceilometer

2. Which two template formats are supported by the Orchestration service? (Choose two.)

a. OpenStack Orchestration Template (OOT)
b. Heat Orchestration Template (HOT) (correct)
c. Rapid Deployment Template (RDP)
d. CloudFormation (CFN) (correct)

3. In what language are Orchestration templates written?

a. XML
b. JSON
c. YAML (correct)
d. HTML

4. What is the default timeout period for a stack creation?

a. 86400 Seconds
b. 3600 Seconds (correct)
c. 300 Seconds
d. 600 Seconds

5. In which log file does information related to the Orchestration engine service get logged?

a. /var/log/heat/heat-api.log
b. /var/log/heat/heat-manage.log
c. /var/log/heat/engine.log
d. /var/log/heat/heat-engine.log (correct)

6. Which command-line interface option helps to validate a template?

a. --validate
b. --run-dry
c. --dry-run (correct)
d. --yaml


Writing Heat Orchestration Templates

Objectives
After completing this section, students should be able to write templates using the Heat
Orchestration Template (HOT) language.

Introduction to YAML
Orchestration templates are written in YAML (YAML Ain't Markup Language).
Therefore, it is necessary to understand the basics of YAML syntax to write an orchestration
template.

YAML was designed primarily for the representation of data structures such as lists and
associative arrays, in an easily written, human-readable format. This design objective is
accomplished primarily by abandoning traditional enclosure syntax, such as brackets, braces, or
opening and closing tags, commonly used by other languages to denote the structure of a data
hierarchy. Instead, in YAML, data hierarchy structures are maintained using outline indentation.

Data structures are represented using an outline format with space characters for indentation.
There is no strict requirement regarding the number of space characters used for indentation
other than data elements must be further indented than their parents to indicate nested
relationships. Data elements at the same level in the data hierarchy must have the same
indentation. Blank lines can be optionally added for readability.
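
For example, the following short snippet, using hypothetical key names, shows how indentation
alone expresses the hierarchy of a mapping that contains a nested scalar and a list:

server:
  name: demo-server
  networks:
    - private-net
    - public-net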

Indentation can only be performed using the space character. Indentation is very critical to the
proper interpretation of YAML. Since tabs are treated differently by various editors and tools,
YAML forbids the use of tabs for indentation.

Adding the following line to the user's $HOME/.vimrc file configures two-space indentation when
the Tab key is pressed, and auto-indents subsequent lines.

autocmd FileType yaml setlocal ai ts=2 sw=2 et

Heat Orchestration Template (HOT) Language


Heat Orchestration Template (HOT) is a language supported by the Heat orchestration service.
The template uses the YAML syntax to describe various resources and properties.

Each orchestration template must include the heat_template_version key with a correct
orchestration template version. The orchestration template version defines both the supported
format of the template and the features that are valid and supported for the Orchestration
service. The orchestration template version is in a date format or uses the release name, such as
newton. The openstack orchestration template version list command lists all the
supported template versions.

[user@demo ~]$ openstack orchestration template version list


+--------------------------------------+------+
| version | type |
+--------------------------------------+------+
| AWSTemplateFormatVersion.2010-09-09 | cfn |
| HeatTemplateFormatVersion.2012-12-12 | cfn |
| heat_template_version.2013-05-23 | hot |
| heat_template_version.2014-10-16 | hot |
| heat_template_version.2015-04-30 | hot |
| heat_template_version.2015-10-15 | hot |
| heat_template_version.2016-04-08 | hot |
| heat_template_version.2016-10-14 | hot |
+--------------------------------------+------+

The description key in a template is optional, but can include some useful text that describes
the purpose of the template. You can add multi-line text to the description key by using
folded blocks (>) in YAML. Folded blocks replace each line break with a single space, ignoring
indentation.

heat_template_version: 2016-10-14
description: >
  This is multi-line description
  that describes the template usage.

Parameters
The orchestration templates allow users to customize the template during deployment of
the orchestration stack by use of input parameters. The input parameters are defined in the
parameters section of the orchestration template. Each parameter is defined as a separate
nested block with required attributes such as type or default. In the orchestration template,
the parameters section uses the following syntax and attributes to define an input parameter
for the template.

parameters:
  <param_name>:
    type: <string | number | json | comma_delimited_list | boolean>
    label: <human-readable name of the parameter>
    description: <description of the parameter>
    default: <default value for parameter>
    hidden: <true | false>
    constraints:
      <parameter constraints>
    immutable: <true | false>

type
Data type of the parameter. The supported data types are string, number, JSON, comma delimited list, and boolean.

label
Human-readable name for the parameter. This attribute is optional.

description
Short description of the parameter. This attribute is optional.

default
Default value to be used in case the user does not enter any value for the parameter. This attribute is optional.

hidden
Determines whether the value of the parameter is hidden when the user lists information about a stack created by the orchestration template. This attribute is optional and defaults to false.

constraints
Constraints to be applied to validate the input value provided by the user for a parameter. The constraints attribute can apply lists of different constraints. This attribute is optional.

immutable
Defines whether the parameter can be updated. The stack fails to be updated if the parameter value is changed and the attribute value is set to true.
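
As an illustrative sketch that is not part of the course template files, the following parameter definition combines several of these attributes; the db_password parameter name and its constraint values are arbitrary:

parameters:
  db_password:
    type: string
    label: Database password
    description: Password passed to the application at deployment time
    hidden: true
    constraints:
      - length: { min: 8, max: 64 }
        description: The password must be between 8 and 64 characters long.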

The custom_constraint constraint adds an extra step of validation to verify whether the
required resource exists in the environment. Custom constraints are implemented using
Orchestration plugins. The custom_constraint attribute uses the name associated with the
Orchestration plugin. For example, use the following syntax to ensure the existence of a Block
Storage (Cinder) volume:

parameters:
  volume_name:
    type: string
    description: volume name
    constraints:
      - custom_constraint: cinder.volume

Resources
The resources section in the orchestration template defines resources provisioned during
deployment of a stack. Each resource is defined as a separate nested block with its required
attributes, such as type and properties. The properties attribute defines the properties
required to provision the resource. The resources section in a template uses the following
syntax and attributes to define a resource for the stack.

resources:
  <resource ID>:
    type: <resource type>
    properties:
      <property name>: <property value>

resource ID
A resource name. This must be uniquely referenced within the resources section of the template.

type
The attribute uses the resource type name. The core OpenStack resources are included in the Orchestration engine service as built-in resources. The Orchestration service provides support for resource plugins using custom resources. This attribute is mandatory and must be specified when declaring a resource.

properties
This attribute is used to specify a list of properties associated with a resource type. The property value is either hard-coded or uses intrinsic functions to retrieve the value. This attribute is optional.

Resource Types
A resource requires a type attribute, such as an instance, and various properties that depend
on the resource type. To list the available resource types, use the openstack orchestration
resource type list command.

[user@demo ~]$ openstack orchestration resource type list


+----------------------------------------------+
| Resource Type |
+----------------------------------------------+
...output omitted...
| OS::Nova::FloatingIP |
| OS::Nova::FloatingIPAssociation |
| OS::Nova::KeyPair |
| OS::Nova::Server |
| OS::Nova::ServerGroup |
| OS::Swift::Container |
+----------------------------------------------+

The OS::Heat::ResourceGroup resource type creates one or more identical resources. The
resource definition is passed as a nested stack. The required property for the ResourceGroup
resource type is resource_def. The value of the resource_def property is the definition of
the resource to be provisioned. The count property sets the number of resources to provision.

resources:
  my_group:
    type: OS::Heat::ResourceGroup
    properties:
      count: 2
      resource_def:
        type: OS::Nova::Server
        properties:
          name: { get_param: instance_name }
          image: { get_param: instance_image }

Intrinsic Functions
HOT provides several built-in functions that are used to perform specific tasks in the template.
Intrinsic functions in the Orchestration template assign values to the properties that are
available during creation of the stack. Some of the widely used intrinsic functions are listed
below:

• get_attr: The get_attr function references an attribute of a resource. This function takes
the resource name and the attribute name as the parameters to retrieve the attribute value for
the resource.

resources:
  the_instance:
    type: OS::Nova::Server
    ...output omitted...

outputs:
  instance_ip:
    description: IP address of the instance
    value: { get_attr: [the_instance, first_address] }

• get_param: The get_param function references an input parameter of a template and
returns the value of the input parameter. This function takes the parameter name as the
parameter to retrieve the value of the input parameter declared in the template.

parameters:
  instance_flavor:
    type: string
    description: Flavor to be used by the instance.

resources:
  the_instance:
    type: OS::Nova::Server
    properties:
      flavor: { get_param: instance_flavor }

• get_resource: The get_resource function references a resource in the template.
The function takes the resource name as the parameter to retrieve the resource ID of the
referenced resource.

resources:
  the_port:
    type: OS::Neutron::Port
    ...output omitted...

  the_instance:
    type: OS::Nova::Server
    properties:
      networks:
        - port: { get_resource: the_port }

• str_replace: The str_replace function substitutes variables in an input string with values
that you specify. The input string along with variables are passed to the template property of
the function. The values of the variables are instantiated using the params property as a key
value pair.

outputs:
  website_url:
    description: The website URL of the application.
    value:
      str_replace:
        template: http://varname/MyApps
        params:
          varname: { get_attr: [ the_instance, first_address ] }

• list_join: The list_join function appends a set of strings into a single value, separated
by the specified delimiter. If the delimiter is an empty string, it concatenates all of the strings.

resources:
  random:
    type: OS::Heat::RandomString
    properties:
      length: 2

  the_instance:
    type: OS::Nova::Server
    properties:
      name: { list_join: [ '-', [ {get_param: instance_name}, {get_attr: [random, value]} ] ] }

Software Configuration using Heat Orchestration Template

Orchestration templates allow a variety of options to configure software on the instance
provisioned by the Orchestration stack. The frequency of the software configuration changes
to be applied to the software installed on the instance determines how to implement those
changes. There are, broadly, three options to implement software configuration changes using
the orchestration template:

• Using a custom image that includes installed and configured software. This method can be used
when there is no change in software configuration required during the life cycle of an instance.

• Using the user data script and cloud-init to configure the pre-installed software in the
image. This method can be used when there is a software configuration change required once
during the life cycle of an instance (at boot time). An instance must be replaced with a new
instance when software configuration changes are made using this option.

• Using the OS::Heat::SoftwareDeployment resource allows any number of software
configuration changes to be applied to an instance throughout its life cycle.

Using User Data Scripts in a Heat Orchestration Template


When provisioning an instance, you can specify a user-data script to configure the software
installed on the instance. Software can be baked into the image, or installed using a user-data
script. In the HOT language, user data is provided using the user_data property of the
OS::Nova::Server resource type. The data provided using the user_data property can
be a shell script or a cloud-init script. The str_replace intrinsic function is used to set
variable values based on the parameters or the resources in a stack. The user_data_format
property defines the way user data is processed by an instance. Using RAW as the value of the
user_data_format property, the user data is passed to the instance unmodified.

resources:
  the_instance:
    type: OS::Nova::Server
    properties:
      ...output omitted...
      user_data_format: RAW
      user_data:
        str_replace:
          template: |
            #!/bin/bash
            echo "Hello World" > /tmp/$demo
          params:
            $demo: demofile

When the user data is changed and the orchestration stack is updated using the openstack
stack update command, the instance is deleted and recreated using the updated user data
script.
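
For example, assuming a stack named demo-stack that was originally launched from the hypothetical demo-template.yaml and demo-env.yaml files, the update could be triggered as follows:

[user@demo ~]$ openstack stack update \
--environment demo-env.yaml \
--template demo-template.yaml \
--wait demo-stack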

To provide complex scripts using the user_data property, use the get_file intrinsic
function. The get_file function takes the name of a file as its argument.

resources:
  the_instance:
    type: OS::Nova::Server
    properties:
      ...output omitted...
      user_data_format: RAW
      user_data:
        str_replace:
          template: { get_file: demoscript.sh }
          params:
            $demo: demofile


Using the Software Deployment Resource Type


Use the OS::Heat::SoftwareDeployment resource type to initiate software configuration
changes without replacing the instance with a new instance. An example use case is any situation
where an instance cannot be replaced with a new instance, but software configuration changes
are needed during the life cycle of the instance. The OS::Heat::SoftwareDeployment
resource type allows you to add or remove software configuration multiple times from an
instance during its life cycle. There are three resource types required to perform the software
configuration changes using the orchestration stack.

• The OS::Heat::SoftwareConfig resource type enables integration with various
software configuration tools such as an Ansible Playbook, shell script, or Puppet manifest. The
resource type creates an immutable software configuration, so any change to the software
configuration replaces the old configuration with a new one. Properties of the
OS::Heat::SoftwareConfig resource are config, group, inputs, and outputs. The group
property defines the name of the software configuration tool to be used, such as script,
ansible, or puppet. The config property sets the configuration script or manifest that
specifies the actual software configuration performed on the instance. The inputs and
outputs properties represent the input parameters and the output parameters for the software
configuration.

resources:
  the_config:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      inputs:
        - name: filename
        - name: content
      outputs:
        - name: result
      config:
        get_file: demo-script.sh

• The OS::Heat::SoftwareDeployment resource type applies the software configuration
defined using the OS::Heat::SoftwareConfig resource. The SoftwareDeployment
resource type allows you to provide input values, based on the input variables defined using
the inputs property of the SoftwareConfig resource. When the state changes to the
IN_PROGRESS state, the software configuration with the variable values substituted is made
available to the instance. The state is changed to the CREATE_COMPLETE state when a success
or failure signal is received from the Orchestration API.

The required property for the OS::Heat::SoftwareDeployment resource type is the
server property. The server property is a reference to the ID of the resource to which
configuration changes are applied. Other optional properties include actions, config,
and input_values. The actions property defines when the software configuration is
initiated, based on the orchestration stack state. The actions property supports the CREATE,
UPDATE, SUSPEND, and RESUME actions. The config property references the resource ID
of the software configuration resource to execute when applying changes to the instance.
The input_values property maps values to the input variables defined in the software
configuration resource.

resources:
  the_deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      server:
        get_resource: the_server
      actions:
        - CREATE
        - UPDATE
      config:
        get_resource: the_config
      input_values:
        filename: demofile
        content: 'Hello World'

• The OS::Nova::Server resource type defines the instance on which the software
configuration changes are applied. The user_data_format property of the
OS::Nova::Server resource type must use the SOFTWARE_CONFIG value to support the
software configuration changes using the OS::Heat::SoftwareDeployment resource.

resources:
  the_server:
    type: OS::Nova::Server
    properties:
      ...output omitted...
      user_data_format: SOFTWARE_CONFIG

An instance that uses OS::Heat::SoftwareDeployment resources for software configuration
requires orchestration agents to collect and process the configuration changes by polling the
Orchestration API. You must embed these orchestration agents in the image. The python-heat-agent
package must be included, and provides support for software configuration via shell scripts.
Support for other software configuration tools is available from the python-heat-agent-ansible
package (for Ansible playbooks) or the python-heat-agent-puppet package (for Puppet manifests).
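
As a minimal sketch, assuming the image build environment has access to the required repositories, the agent packages named above could be installed and the collection agent enabled inside the image as follows; the exact image-building workflow is not prescribed here:

# Run inside the image being customized; the service name assumes the
# default systemd unit shipped with the os-collect-config package.
yum -y install python-heat-agent python-heat-agent-ansible
systemctl enable os-collect-config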

Figure 9.2: SoftwareDeployment Workflow

Other agents used to apply software configuration changes on an instance follow:

• The os-collect-config agents poll the Orchestration API for updated resource metadata
that is associated with the OS::Nova::Server resource.

• The os-refresh-config agent is executed when the os-collect-config agent detects a
change in the software configuration. It refreshes the configuration by deleting the older
configuration and replacing it with the newer configuration.


The os-refresh-config agent uses the group property defined for the deployment
to process configuration. It uses the heat-config-hook script to apply the software
configuration changes. The heat-config-hook scripts are provided by the python-heat-
agent-* packages. Upon completion, the hook notifies the Orchestration API of a successful or
failed configuration deployment using the heat-config-notify element.

• The os-apply-config agent transforms software configuration data provided by the
orchestration template into a service configuration file.

Using SoftwareDeployment Resource from an Orchestration Stack


The following steps outline the process to use the OS::Heat::SoftwareDeployment resource
for software configuration of an instance.

1. Create a Heat Orchestration Template file to define the orchestration stack.

2. Set the required input parameters in the orchestration stack.

3. Specify the OS::Nova::Server resource to apply the software configuration.

4. Define the OS::Heat::SoftwareConfig resource to create the configuration to be
applied to the OS::Nova::Server resource.

5. Define the OS::Heat::SoftwareDeployment resource. Reference the
OS::Heat::SoftwareConfig resource to set the configuration to be used. Set the
server property of the OS::Heat::SoftwareDeployment resource to use the
OS::Nova::Server resource.

Pass the required input parameters to the OS::Heat::SoftwareDeployment resource;
they are made available to the instance at runtime. Optionally, specify the actions
property to define the life cycle actions that trigger the deployment.

6. Optionally, specify the output of the stack using the attributes of the
OS::Heat::SoftwareDeployment resource.

7. Create the environment file with all input parameters required for launching the
orchestration stack.

8. Execute a dry run to test stack creation.

9. Initiate the orchestration stack to configure the software using the openstack stack
create command.

10. Optionally, change the software configuration either by editing the configuration script
or by changing the input parameters passed during runtime. Commit the configuration
changes to the instance by updating the stack using the openstack stack update
command (see the command sketch after this list).
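
A command sketch of steps 8 through 10, using the hypothetical file names demo-deploy.yaml and demo-env.yaml and the hypothetical stack name demo-stack:

[user@demo ~]$ openstack stack create --template demo-deploy.yaml \
--environment demo-env.yaml --dry-run demo-stack
[user@demo ~]$ openstack stack create --template demo-deploy.yaml \
--environment demo-env.yaml --wait demo-stack
[user@demo ~]$ openstack stack update --template demo-deploy.yaml \
--environment demo-env.yaml --wait demo-stack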


References
Template Guide
https://docs.openstack.org/heat/latest/template_guide/index.html

Software configuration
https://docs.openstack.org/heat/latest/template_guide/software_deployment.html


Guided Exercise: Writing Heat Orchestration Templates

In this exercise, you will edit an orchestration template to launch a customized instance. You will
use a preexisting template and troubleshoot orchestration issues.

Resources
Files: http://materials.example.com/heat/finance-app1.yaml

http://materials.example.com/heat/ts-stack.yaml

http://materials.example.com/heat/ts-environment.yaml

Outcomes
You should be able to:

• Edit an orchestration template to launch a customized instance.

• Launch a stack using the orchestration template.

• Provision identical resources using the OS::Heat::ResourceGroup resource type.

• Troubleshoot orchestration issues.

Before you begin


Log in to workstation as the student user with student as the password.

On workstation, run the lab orchestration-heat-templates setup command. This
script ensures that the OpenStack services are running and the environment is properly configured
for the exercise. The script also confirms that the resources needed for launching the stack are
available.

[student@workstation ~]$ lab orchestration-heat-templates setup

Steps
1. On workstation, create a directory named /home/student/heat-templates. The
/home/student/heat-templates directory will store downloaded template files and
environment files used for orchestration.

[student@workstation ~]$ mkdir ~/heat-templates

2. When you edit YAML files, you must use spaces, not the tab character, for indentation. If you
use vi for text editing, add a setting in the .vimrc file to set auto-indentation and set the
tab stop and shift width to two spaces for YAML files. Create the /home/student/.vimrc
file with the content, as shown:

autocmd FileType yaml setlocal ai ts=2 sw=2 et



3. Download the http://materials.example.com/heat/finance-app1.yaml file in the
/home/student/heat-templates directory. Edit the orchestration template to launch a
customized instance.

The orchestration template must orchestrate the following items:

• The finance-web1 instance must install the httpd package.

• The httpd service must be started and enabled.

• The web server must host a web page containing the following content:

<h1>You are connected to $public_ip</h1>
<h2>The private IP address is: $private_ip</h2>
Red Hat Training

The $public_ip variable is the floating IP address of the instance. The $private_ip
variable is the private IP address of the instance. You will define these variables in the
template.

• The orchestration stack must retry the user data script once. The user data script must
signal success when it runs successfully, and must signal failure if the script does not complete
within the 600 second timeout.

3.1. Change to the /home/student/heat-templates directory. Download the
orchestration template file from http://materials.example.com/heat/finance-app1.yaml
in the /home/student/heat-templates directory.

[student@workstation ~]$ cd ~/heat-templates


[student@workstation heat-templates]$ wget \
http://materials.example.com/heat/finance-app1.yaml

3.2. Use the user_data property to define the user data script to install the httpd
package. The httpd service must be started and enabled to start at boot time. The
user_data_format property for the OS::Nova::Server resource type must be set
to RAW.

Edit the /home/student/heat-templates/finance-app1.yaml file, as shown:

web_server:
  type: OS::Nova::Server
  properties:
    name: { get_param: instance_name }
    image: { get_param: image_name }
    flavor: { get_param: instance_flavor }
    key_name: { get_param: key_name }
    networks:
      - port: { get_resource: web_net_port }
    user_data_format: RAW
    user_data:
      str_replace:
        template: |
          #!/bin/bash
          yum -y install httpd
          systemctl restart httpd.service
          systemctl enable httpd.service

3.3. In the user_data property, create a web page with the following content:

<h1>You are connected to $public_ip</h1>
<h2>The private IP address is: $private_ip</h2>
Red Hat Training

The web page uses the $public_ip and the $private_ip variables passed
as parameters. These parameters are defined using the params property of the
str_replace intrinsic function.

The $private_ip variable uses the web_net_port resource attribute fixed_ips
to retrieve the first IP address associated with the network interface. The $public_ip
variable uses the web_floating_ip resource attribute floating_ip_address to
set the public IP address associated with the instance.

Edit the /home/student/heat-templates/finance-app1.yaml file, as shown:

web_server:
  type: OS::Nova::Server
  properties:
    name: { get_param: instance_name }
    image: { get_param: image_name }
    flavor: { get_param: instance_flavor }
    key_name: { get_param: key_name }
    networks:
      - port: { get_resource: web_net_port }
    user_data_format: RAW
    user_data:
      str_replace:
        template: |
          #!/bin/bash
          yum -y install httpd
          systemctl restart httpd.service
          systemctl enable httpd.service
          sudo touch /var/www/html/index.html
          sudo cat << EOF > /var/www/html/index.html
          <h1>You are connected to $public_ip</h1>
          <h2>The private IP address is:$private_ip</h2>
          Red Hat Training
          EOF
        params:
          $private_ip: {get_attr: [web_net_port,fixed_ips,0,ip_address]}
          $public_ip: {get_attr: [web_floating_ip,floating_ip_address]}

3.4. Use the WaitConditionHandle resource to send a signal about the status of the user
data script.

The $wc_notify variable is set to the wait handle URL using the curl_cli
attribute of the wait_handle resource. The script uses the $wc_notify variable to
return a SUCCESS status if the web page deployed by the script is accessible and
returns 200 as the HTTP status code. The web_server resource state is marked as
CREATE_COMPLETE when the WaitConditionHandle resource signals SUCCESS.

The WaitConditionHandle returns FAILURE if the web page is not accessible
or if it times out after 600 seconds. The web_server resource state is marked as
CREATE_FAILED.

Edit the /home/student/heat-templates/finance-app1.yaml file, as shown:

web_server:
  type: OS::Nova::Server
  properties:
    name: { get_param: instance_name }
    image: { get_param: image_name }
    flavor: { get_param: instance_flavor }
    key_name: { get_param: key_name }
    networks:
      - port: { get_resource: web_net_port }
    user_data_format: RAW
    user_data:
      str_replace:
        template: |
          #!/bin/bash
          yum -y install httpd
          systemctl restart httpd.service
          systemctl enable httpd.service
          sudo touch /var/www/html/index.html
          sudo cat << EOF > /var/www/html/index.html
          <h1>You are connected to $public_ip</h1>
          <h2>The private IP address is:$private_ip</h2>
          Red Hat Training
          EOF
          export response=$(curl -s -k \
          --output /dev/null \
          --write-out %{http_code} http://$public_ip/)
          [[ ${response} -eq 200 ]] && $wc_notify \
          --data-binary '{"status": "SUCCESS"}' \
          || $wc_notify --data-binary '{"status": "FAILURE"}'
        params:
          $private_ip: {get_attr: [web_net_port,fixed_ips,0,ip_address]}
          $public_ip: {get_attr: [web_floating_ip,floating_ip_address]}
          $wc_notify: {get_attr: [wait_handle,curl_cli]}

Save and exit the file.

4. Create the /home/student/heat-templates/environment.yaml environment
file. Enter the values for all input parameters defined in the /home/student/heat-templates/finance-app1.yaml
template file.

Edit the /home/student/heat-templates/environment.yaml file with the content, as shown:

parameters:
  image_name: finance-rhel7
  instance_name: finance-web1
  instance_flavor: m1.small
  key_name: developer1-keypair1
  public_net: provider-172.25.250
  private_net: finance-network1
  private_subnet: finance-subnet1


5. Launch the stack and verify it by accessing the web page deployed on the instance. Use the
developer1 user credentials to launch the stack.

5.1. Use the developer1 user credentials to perform a dry run of the stack, checking the
resources that will be created when launching the stack. Rectify all errors before
proceeding to the next step to launch the stack.

Use the finance-app1.yaml template file and the environment.yaml environment
file. Name the stack finance-app1.

Note
Before performing the dry run of the stack, download the http://materials.example.com/heat/finance-app1.yaml-final
template file to the /home/student/heat-templates directory. Use the diff
command to compare your edited finance-app1.yaml template file
with the known-good template file, finance-app1.yaml-final. Fix any
differences you find, then proceed to launch the stack.

[student@workstation heat-templates]$ wget \
http://materials.example.com/heat/finance-app1.yaml-final
[student@workstation heat-templates]$ diff finance-app1.yaml \
finance-app1.yaml-final

[student@workstation heat-templates]$ source ~/developer1-finance-rc


[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment environment.yaml \
--template finance-app1.yaml \
--dry-run -c description \
finance-app1
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
| description | spawning a custom web server |
+---------------------+--------------------------------------+

5.2. Launch the stack using the finance-app1.yaml template file and the
environment.yaml environment file. Name the stack finance-app1.

If the dry run is successful, run the openstack stack create command with the
--enable-rollback option. Do not use the --dry-run option while launching the stack.

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment environment.yaml \
--template finance-app1.yaml \
--enable-rollback \
--wait \
finance-app1
[finance-app1]: CREATE_IN_PROGRESS Stack CREATE started
[finance-app1.wait_handle]: CREATE_IN_PROGRESS state changed
[finance-app1.web_security_group]: CREATE_IN_PROGRESS state changed
[finance-app1.wait_handle]: CREATE_COMPLETE state changed
[finance-app1.web_security_group]: CREATE_COMPLETE state changed
[finance-app1.wait_condition]: CREATE_IN_PROGRESS state changed
[finance-app1.web_net_port]: CREATE_IN_PROGRESS state changed
[finance-app1.web_net_port]: CREATE_COMPLETE state changed
[finance-app1.web_floating_ip]: CREATE_IN_PROGRESS state changed
[finance-app1.web_floating_ip]: CREATE_COMPLETE state changed
[finance-app1.web_server]: CREATE_IN_PROGRESS state changed
[finance-app1.web_server]: CREATE_COMPLETE state changed
[finance-app1.wait_handle]: SIGNAL_COMPLETE Signal: status:SUCCESS
reason:Signal 1 received
[finance-app1.wait_condition]: CREATE_COMPLETE state changed
[finance-app1]: CREATE_COMPLETE Stack CREATE completed successfully
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
| id | 23883f81-19b0-4446-a2b8-7f261958a0f1 |
| stack_name | finance-app1 |
| description | spawning a custom web server |
| creation_time | 2017-06-01T08:04:29Z |
| updated_time | None |
| stack_status | CREATE_COMPLETE |
| stack_status_reason | Stack CREATE completed successfully |
+---------------------+--------------------------------------+

5.3. List the output returned by the finance-app1 stack. Check the website_url output
value.

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
output list finance-app1
+----------------+------------------------------------------------------+
| output_key | description |
+----------------+------------------------------------------------------+
| web_private_ip | IP address of first web server in private network |
| web_public_ip | Floating IP address of the web server |
| website_url | This URL is the "external" URL that |
| | can be used to access the web server. |
| | |
+----------------+------------------------------------------------------+
[student@workstation heat-templates(developer1-finance)]$ openstack stack \
output show finance-app1 website_url
+--------------+--------------------------------------------+
| Field | Value |
+--------------+--------------------------------------------+
| description | This URL is the "external" URL that can be |
| | used to access the web server. |
| | |
| output_key | website_url |
| output_value | http://172.25.250.N/ |
+--------------+--------------------------------------------+

5.4. Verify that the instance was provisioned and the user data was executed successfully on
the instance. Use the curl command to access the URL returned as the value for the
website_url output.

[student@workstation heat-templates(developer1-finance)]$ curl \
http://172.25.250.N/
<h1>You are connected to 172.25.250.N</h1>
<h2>The private IP address is:192.168.1.P</h2>
Red Hat Training


In the previous output, the N represents the last octet of the floating IP address
associated with the instance. The P represents the last octet of the private IP address
associated with the instance.

6. Delete the finance-app1 stack.

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
delete --yes --wait finance-app1
2017-06-01 08:19:01Z [finance-app1]: DELETE_IN_PROGRESS Stack DELETE started

7. Use the OS::Heat::ResourceGroup resource type to provision identical resources. The
stack must orchestrate a maximum of two such resources. The main stack must call the
/home/student/heat-templates/finance-app1.yaml template to provision the resource
defined in that file.

Edit the orchestration template after downloading it from
http://materials.example.com/heat/nested-stack.yaml to the /home/student/heat-templates
directory.

7.1. Download the orchestration template file from http://materials.example.com/heat/nested-stack.yaml
to the /home/student/heat-templates directory.

[student@workstation heat-templates(developer1-finance)]$ wget \
http://materials.example.com/heat/nested-stack.yaml

7.2. Edit the /home/student/heat-templates/nested-stack.yaml orchestration
template. Add a new input parameter named instance_count under the
parameters section. Use the range constraint to define the minimum number as 1
and the maximum number as 2.

parameters:
  ...output omitted...
  instance_count:
    type: number
    description: count of servers to be provisioned
    constraints:
      - range: { min: 1, max: 2 }

7.3. Edit the /home/student/heat-templates/nested-stack.yaml orchestration
template. Add a resource named my_resource under the resources section. Use the
OS::Heat::ResourceGroup resource type and set the count property to use the
instance_count input parameter.

...output omitted...
resources:
  my_resource:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: instance_count }
...output omitted...

Save and exit the file.



7.4. Edit the /home/student/heat-templates/environment.yaml environment file to
initialize the instance_count input parameter.

parameters:
  image_name: finance-rhel7
  instance_name: finance-web1
  instance_flavor: m1.small
  key_name: developer1-keypair1
  public_net: provider-172.25.250
  private_net: finance-network1
  private_subnet: finance-subnet1
  instance_count: 2

7.5. Edit the /home/student/heat-templates/environment.yaml environment
file to define a custom resource type named My::Server::Custom::WebServer.
The My::Server::Custom::WebServer custom resource type must point to the
finance-app1.yaml template.

resource_registry:
  My::Server::Custom::WebServer: finance-app1.yaml
parameters:
  image_name: finance-rhel7
  instance_name: finance-web1
  instance_flavor: m1.small
  key_name: developer1-keypair1
  public_net: provider-172.25.250
  private_net: finance-network1
  private_subnet: finance-subnet1
  instance_count: 2

7.6. Open and edit the /home/student/heat-templates/nested-stack.yaml
orchestration template. Set the resource_def property of the my_resource resource
to use the My::Server::Custom::WebServer custom resource type. The
My::Server::Custom::WebServer custom resource type uses the input parameters
required to provision the instance. Edit the file to add the content, as shown:

resources:
  my_resource:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: instance_count }
      resource_def:
        type: My::Server::Custom::WebServer
        properties:
          instance_name: { get_param: instance_name }
          instance_flavor: { get_param: instance_flavor }
          image_name: { get_param: image_name }
          key_name: { get_param: key_name }
          public_net: { get_param: public_net }
          private_net: { get_param: private_net }
          private_subnet: { get_param: private_subnet }

Save and exit the file.


7.7. Use the developer1 user credentials to perform a dry run of the stack and check
the resources that will be created. Name the stack finance-app2. Use the nested-stack.yaml
template and the environment.yaml environment file. Rectify any errors before
proceeding to the next step to launch the stack.

Note
Before performing the dry run of the stack, download the http://materials.example.com/heat/nested-stack.yaml-final
template file to the /home/student/heat-templates directory. Use the diff
command to compare your edited nested-stack.yaml template file
with the known-good template file, nested-stack.yaml-final. Fix any
differences you find, then proceed to launch the stack.

[student@workstation heat-templates]$ wget \
http://materials.example.com/heat/nested-stack.yaml-final
[student@workstation heat-templates]$ diff nested-stack.yaml \
nested-stack.yaml-final

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment environment.yaml \
--template nested-stack.yaml \
--dry-run \
finance-app2

7.8. Launch the stack using the nested-stack.yaml template file and the
environment.yaml environment file. Name the stack finance-app2.

If the dry run succeeds, run the openstack stack create command with the
--enable-rollback option. Do not use the --dry-run option while launching the stack.

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment environment.yaml \
--template nested-stack.yaml \
--enable-rollback \
--wait \
finance-app2
2017-06-01 08:48:03Z [finance-app2]: CREATE_IN_PROGRESS Stack CREATE started
2017-06-01 08:48:03Z [finance-app2.my_resource]: CREATE_IN_PROGRESS state
changed
2017-06-01 08:51:10Z [finance-app2.my_resource]: CREATE_COMPLETE state changed
2017-06-01 08:51:10Z [finance-app2]: CREATE_COMPLETE Stack CREATE completed
successfully
+---------------------+------------------------------------------------------+
| Field | Value |
+---------------------+------------------------------------------------------+
| id | dbb32889-c565-495c-971e-8f27b4e35588 |
| stack_name | finance-app2 |
| description | Using ResourceGroup to scale out the custom instance |
| creation_time | 2017-06-01T08:48:02Z |
| updated_time | None |
| stack_status | CREATE_COMPLETE |
| stack_status_reason | Stack CREATE completed successfully |
+---------------------+------------------------------------------------------+

7.9. Verify that the finance-app2 stack provisioned two instances.

[student@workstation heat-templates(developer1-finance)]$ openstack server \
list -c Name -c Status -c Networks
+--------------+--------+----------------------------------------------+
| Name | Status | Networks |
+--------------+--------+----------------------------------------------+
| finance-web1 | ACTIVE | finance-network1=192.168.1.N, 172.25.250.P |
| finance-web1 | ACTIVE | finance-network1=192.168.1.Q, 172.25.250.R |
+--------------+--------+----------------------------------------------+

7.10. Delete the finance-app2 stack.

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
delete --yes --wait finance-app2
2017-06-01 08:52:01Z [finance-app2]: DELETE_IN_PROGRESS Stack DELETE started

8. Download the template from http://materials.example.com/heat/ts-stack.yaml.
Download the environment file from http://materials.example.com/heat/ts-environment.yaml.
Troubleshoot the template and fix the issues to deploy the
orchestration stack successfully.

8.1. Download the template and the environment files to the /home/student/heat-templates
directory.

[student@workstation heat-templates(developer1-finance)]$ wget \
http://materials.example.com/heat/ts-stack.yaml
[student@workstation heat-templates(developer1-finance)]$ wget \
http://materials.example.com/heat/ts-environment.yaml

8.2. Verify that the Heat template does not contain any errors.

Use the developer1 user credentials to dry run the stack and check for any errors.
Name the stack finance-app3. Use the ts-stack.yaml template and the ts-
environment.yaml environment file.

The finance-app3 stack dry run returns the following error:

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment ts-environment.yaml \
--template ts-stack.yaml \
--dry-run \
finance-app3
Error parsing template file:///home/student/heat-templates/ts-stack.yaml while
parsing a block mapping
in "<unicode string>", line 58, column 5:
type: OS::Nova::Server
^
expected <block end>, but found '<block mapping start>'
in "<unicode string>", line 61, column 7:

CL210-RHOSP10.1-en-2-20171006 415

Rendered for Nokia. Please do not distribute.


Chapter 9. Orchestrating Deployments

image: { get_param: image_name }

8.3. Fix the indentation error for the name property of the OS::Nova::Server resource
type.

web_server:
  type: OS::Nova::Server
  properties:
    name: { get_param: instance_name }

8.4. Verify the indentation fix by running the dry run of the finance-app3 stack again.

The finance-app3 stack dry run returns the following error:

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment ts-environment.yaml \
--template ts-stack.yaml \
--dry-run \
finance-app3
ERROR: Parameter 'key_name' is invalid:
Error validating value 'finance-keypair1':
The Key (finance-keypair1) could not be found.

8.5. Resolve the error: the key pair passed in the ts-environment.yaml file does not
exist.

List the existing key pairs to find the correct name.

[student@workstation heat-templates(developer1-finance)]$ openstack keypair \
list
+---------------------+-------------------------------------------------+
| Name | Fingerprint |
+---------------------+-------------------------------------------------+
| developer1-keypair1 | e3:f0:de:43:36:7e:e9:a4:ee:04:59:80:8b:71:48:dc |
+---------------------+-------------------------------------------------+

Edit the /home/student/heat-templates/ts-environment.yaml file. Enter the
correct key pair name, developer1-keypair1.

8.6. Verify the key pair name fix in the /home/student/heat-templates/ts-environment.yaml
file.

The finance-app3 stack dry run must not return any error.

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment ts-environment.yaml \
--template ts-stack.yaml \
--dry-run \
finance-app3

8.7. Launch the stack using the ts-stack.yaml template file and the ts-
environment.yaml environment file. Name the stack finance-app3.

If the dry run succeeds, run the openstack stack create command with the
--enable-rollback option. Do not use the --dry-run option while launching the stack.

[student@workstation heat-templates(developer1-finance)]$ openstack stack \
create \
--environment ts-environment.yaml \
--template ts-stack.yaml \
--enable-rollback \
--wait \
finance-app3
[finance-app3]: CREATE_IN_PROGRESS Stack CREATE started
[finance-app3.wait_handle]: CREATE_IN_PROGRESS state changed
[finance-app3.web_security_group]: CREATE_IN_PROGRESS state changed
[finance-app3.web_security_group]: CREATE_COMPLETE state changed
[finance-app3.wait_handle]: CREATE_COMPLETE state changed
[finance-app3.wait_condition]: CREATE_IN_PROGRESS state changed
[finance-app3.web_net_port]: CREATE_IN_PROGRESS state changed
[finance-app3.web_net_port]: CREATE_COMPLETE state changed
[finance-app3.web_floating_ip]: CREATE_IN_PROGRESS state changed
[finance-app3.web_server]: CREATE_IN_PROGRESS state changed
[finance-app3.web_floating_ip]: CREATE_COMPLETE state changed
[finance-app3.web_server]: CREATE_COMPLETE state changed
[finance-app3.wait_handle]: SIGNAL_COMPLETE Signal: status:SUCCESS
reason:Signal 1 received
[finance-app3.wait_condition]: CREATE_COMPLETE state changed
[finance-app3]: CREATE_COMPLETE Stack CREATE completed successfully
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
| id | 839ab589-1ded-46b2-8987-3fe18e5e823b |
| stack_name | finance-app3 |
| description | spawning a custom server |
| creation_time | 2017-06-01T12:08:22Z |
| updated_time | None |
| stack_status | CREATE_COMPLETE |
| stack_status_reason | Stack CREATE completed successfully |
+---------------------+--------------------------------------+

Cleanup
From workstation, run the lab orchestration-heat-templates cleanup command to
clean up this exercise.

[student@workstation ~]$ lab orchestration-heat-templates cleanup


Configuring Stack Autoscaling

Objective
After completing this section, students should be able to implement Autoscaling.

Overview of Autoscaling and its Benefits


Autoscaling provides cloud applications with the ability to dynamically adjust resource capacity
to meet service requirements. Adding autoscaling to your architecture provides scalability,
availability, and fault tolerance. Automatic scaling of the cloud infrastructure allows the cloud
provider to gain the following benefits:

• Autoscaling detects an unhealthy instance, terminates it, and launches a new instance to
replace it.

• Autoscaling allows cloud resources to run with the capacity required to handle the demand.

There are two types of scaling architecture: scale-up and scale-out. In scale-up architecture,
scaling adds more capacity by increasing the resources such as memory, CPU, disk IOPS, and so
on. In scale-out architecture, scaling adds more capacity by increasing the number of servers to
handle the load.

The scale-up architecture is simple to implement but hits a saturation point sooner or later. If
you keep adding more memory to an existing cloud instance to adapt with the current load,
saturation is reached once the instance's host itself runs out of resources. The scaling scope in
this case depends entirely upon the hardware capacity of the node where the cloud instance
is hosted. In scale-out architecture, new identical resources are created to fulfill the load with
virtually unlimited scaling scope. Therefore, the scale-out architecture is preferred and the
recommended approach for cloud infrastructure.

Autoscaling requires a trigger generated from an alarming service to scale out or scale in.
In Red Hat OpenStack Platform, the Orchestration service implements autoscaling by using
utilization data gathered from the Telemetry service. An alarm acts as the trigger to autoscale
an orchestration stack based on the resource utilization threshold or the event pattern defined in
the alarm.

Autoscaling Architecture and Services


The Orchestration service implements the Autoscaling feature. An administrator creates a
stack that dynamically scales based on defined scaling policies. The Telemetry service monitors
cloud instances and other resources in OpenStack. The metrics collected by the Telemetry
service are stored and aggregated by the Time Series Database service (Gnocchi). Based on data
collected by the Time Series Database service, an alarm determines the condition upon which
scaling triggers. To trigger autoscaling based on the current workload, use the metric, alarm,
and scaling policy resources.


Figure 9.3: Autoscaling with Heat orchestration

An orchestration stack can also be scaled automatically using Aodh event alarms. For example,
when an instance stops abruptly, the stack marks the server as unhealthy and launches a new
server to replace it.

Autoscaling Use Cases and Best Practices


Autoscaling builds complex environments that automatically adjust capacity by dynamically
adding or removing resources. This aids performance, availability, and control over
infrastructure cost and usage. Autoscaling supports any use case where an application
architecture demands scalability to maintain performance and a reduction of capacity during
periods of low demand to save cost. Autoscaling is well suited for applications that have stable
usage patterns or that show variability in usage during a given period.

Autoscaling also supports creating self-healing application architectures. In a self-healing
architecture, unhealthy cloud instances in which the application is not responding are
replaced by terminating the instance and launching a new one.


Figure 9.4: Deploying Infrastructure using an Orchestration Stack

Consider a deployment, illustrated in Figure 9.4: Deploying Infrastructure using an Orchestration
Stack, in which the Orchestration stack uses a public IP address, a load balancer pool, a load
balancer, a set of cloud instances, and alarms to monitor events. The stack uses predefined event
patterns generated in the OpenStack messaging queue.


Figure 9.5: Self-Healing Infrastructure using Orchestration Stack

When the event alarm associated with the load balancer detects an event indicating that one of
the instances in the pool is stopped or deleted, a scaling event occurs. It first marks the server as
unhealthy and begins a stack update to replace it with a new identical stack automatically.

The following recommended practices help you to plan and organize autoscaling with an
Orchestration stack:

• Scale-out architecture is more suitable for cloud computing and autoscaling, whereas scale-up
is a better option for a traditional virtualization platform.

• Stateless application architecture is most appropriate for autoscaling. When a server goes
down or transitions into an error state, it is not repaired, but is removed from the stack and
replaced by a new server.

• It is better to scale up faster than scale down. For example, when scaling up, do not add one
server after five minutes then another one after ten minutes. Instead, add two servers at once.

• Avoid unnecessary scaling by defining a reasonable cool-down period in the Autoscaling group.

• Ensure that the Telemetry service is operating correctly and emitting the metrics required for
autoscaling to work.

• The granularity defined for the alarm must match the archive policy used by the metric.


• Test your scaling policies by simulating real-world data. For example, use the openstack
metric measures add command to push new measures directly to the metric and check
whether that triggers the scaling as expected (see the command sketch after this list).
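
As a sketch only, and assuming a metric named memory attached to an instance resource, a test measure could be pushed with a command similar to the following; the timestamp, value, and resource ID are placeholders, and the exact measure syntax may vary between client versions:

[user@demo ~]$ openstack metric measures add \
-m 2017-06-01T10:00:00@950 \
--resource-id <instance_id> memory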

Autoscaling Configuration
In a template, the Autoscaling resource group defines the resource to be provisioned. It launches
a number of instances defined by the desired capacity or minimum group size parameters.

Telemetry alarms are defined to trigger autoscaling to either scale out or scale in, based on the
alarm rules. Primarily there are two alarms: one for scaling out and the other for scaling in. The
action for these alarms invokes the URL associated with the scaling-out policy and scaling-in
policy.

The Autoscaling policy defines the number of resources that need to be added or removed in the
event of scale out or scale in. It uses the defined Autoscaling group. To adjust to various usage
patterns, multiple Autoscaling policies can be defined to automatically scale the infrastructure.

Almost all metrics monitored by the Telemetry service can be used to scale orchestration
stacks dynamically. The following Orchestration resource types are used to create resources for
autoscaling:

OS::Heat::AutoScalingGroup
This resource type is used to define an Autoscaling resource group. Required properties
include max_size, min_size, and resource. Optional properties include cooldown,
desired_capacity, and rolling_updates.

The resource property defines the resource and its properties that are created in the
Autoscaling group.

The max_size property defines the maximum number of identical resources in the
Autoscaling group. The min_size property defines the minimum number of identical
resources that must be running in the Autoscaling group.

The desired_capacity property defines the desired initial number of resources. If not
specified, the value of desired_capacity is equal to the value of min_size. The optional
cooldown property defines the time gap, in seconds, between two consecutive scaling
events.

The rolling_updates property defines the sequence for rolling out updates. It
streamlines updates rather than taking down the entire service at the same time. The
optional max_batch_size parameter defines the maximum number of resources to replace
at once, and the optional min_in_service parameter defines the minimum number of
resources that must remain in service during the update. The pause_time parameter
defines the time to wait between two consecutive updates.
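
A minimal sketch of the rolling_updates property within an Autoscaling group definition; the values shown are illustrative only:

web_scaler:
  type: OS::Heat::AutoScalingGroup
  properties:
    ...output omitted...
    rolling_updates:
      max_batch_size: 1
      min_in_service: 1
      pause_time: 30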

web_scaler:
  type: OS::Heat::AutoScalingGroup
  properties:
    desired_capacity: 2
    cooldown: 100
    max_size: 5
    min_size: 1
    resource:
      type: My::Server::Custom::WebServer
      properties:
        instance_name: { list_join: [ '-', [ {get_param: instance_name}, {get_attr: [random, value]} ] ] }
        instance_flavor: {get_param: instance_flavor}
        image_name: {get_param: image_name}
        key_name: {get_param: key_name}
        public_net: {get_param: public_net}
        private_net: {get_param: private_net}
        private_subnet: {get_param: private_subnet}
        instance_metadata: { "metering.server_group": {get_param: "OS::stack_id"} }

OS::Heat::ScalingPolicy
The OS::Heat::ScalingPolicy resource type defines the Autoscaling policy used to
manage scaling in the Autoscaling group. Required properties include adjustment_type,
auto_scaling_group_id, and scaling_adjustment. Optional properties include
cooldown and min_adjustment_step.

The Autoscaling policy uses the adjustment_type property to decide on the type of
adjustment needed. When a scaling policy is executed, it changes the current capacity
of the Autoscaling group using the scaling_adjustment specified in the policy. The
value for the property can be set to change_in_capacity, exact_capacity, or
percentage_change_in_capacity.

The Autoscaling policy uses the auto_scaling_group_id property to apply the policy to
the Autoscaling group. The scaling_adjustment property defines the size of adjustment.
A positive value indicates that resources should be added. A negative value terminates
the resource. The cooldown property defines the time gap, in seconds, between two
consecutive scaling events.

The min_adjustment_step property is used in conjunction with the
percentage_change_in_capacity adjustment type. The property defines the minimum
number of resources that are added or terminated when the Autoscaling group scales out or
scales in.

The resource returns two attributes: alarm_url and signal_url. The alarm_url attribute
returns a signed URL to handle the alarm associated with the scaling policy. This attribute
is used by an alarm to send a request to either scale in or scale out, depending on the
associated scaling policy. The signal_url attribute is a URL to handle the alarm using the
native API that is used for scaling. The attribute value must be invoked as a REST API call
with a valid authentication token.

scaleup_policy:
  type: OS::Heat::ScalingPolicy
  properties:
    adjustment_type: change_in_capacity
    auto_scaling_group_id: { get_resource: web_scaler }
    cooldown: 180
    scaling_adjustment: 1

scaledown_policy:
  type: OS::Heat::ScalingPolicy
  properties:
    adjustment_type: change_in_capacity
    auto_scaling_group_id: { get_resource: web_scaler }
    cooldown: 180
    scaling_adjustment: -1

CL210-RHOSP10.1-en-2-20171006 423

Rendered for Nokia. Please do not distribute.


Chapter 9. Orchestrating Deployments

OS::Aodh::GnocchiAggregationByResourcesAlarm
This resource type defines the Aodh telemetry alarm based on the aggregation of
resources. The alarm monitors the usage of all the sub-resources of a resource. Required
properties include metric, query, resource_type, and threshold. Optional
properties include aggregation_method, alarm_actions, comparison_operator,
evaluation_periods, and granularity.

The alarm_actions property defines the action to be taken when the alarm is triggered.
When the alarm associated with a scaling policy is triggered, the alarm_actions property
calls the signal_url attribute of the Autoscaling policy. The signal_url attribute is the
URL that handles an alarm.

The metric property defines the metric to be monitored. The evaluation_periods
property sets the number of periods to evaluate the metric measures before setting off the
alarm. The threshold property defines the value that triggers the alarm when it is exceeded
or dropped below, depending on the comparison operator.

memory_alarm_high:
  type: OS::Aodh::GnocchiAggregationByResourcesAlarm
  properties:
    description: Scale up if memory usage is 50% for 5 minutes
    metric: memory
    aggregation_method: mean
    granularity: 300
    evaluation_periods: 1
    threshold: 600
    resource_type: instance
    comparison_operator: gt
    alarm_actions:
      - str_replace:
          template: trust+url
          params:
            url: {get_attr: [scaleup_policy, signal_url]}
    query:
      str_replace:
        template: '{"=": {"server_group": "stack_id"}}'
        params:
          stack_id: {get_param: "OS::stack_id"}

Manually Scaling an Orchestration Stack


Manually scaling allows you to test the orchestration stack before deploying the stack
with the associated alarms. The following steps outline the process for manually scaling
an orchestration stack using the signal_url attribute.

1. Write an orchestration template to autoscale a stack using the AutoScalingGroup and
ScalingPolicy resources.

2. Define the outputs section to return the output values using the signal_url attribute of
the ScalingPolicy resources.

3. Launch the orchestration stack. List the output values returned by the signal_url
attribute for both scaling out and scaling in policies.

4. Use the openstack token issue command to retrieve an authentication token.


5. Manually scale out or scale in by invoking the REST API with the signal_url attribute
   value and the generated token, as illustrated in the sketch that follows.
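The following minimal sketch ties steps 2 through 5 together. The stack name webstack and the output names scale_up_url and scale_down_url are assumptions for illustration only; they are not part of the example template above. The outputs simply expose the signal_url attribute of each scaling policy, and the curl call posts to that URL with the token.

outputs:
  scale_up_url:
    description: Native API URL that triggers the scale-out policy
    value: {get_attr: [scaleup_policy, signal_url]}
  scale_down_url:
    description: Native API URL that triggers the scale-in policy
    value: {get_attr: [scaledown_policy, signal_url]}

# Retrieve the scale-out URL, request a token, and invoke the REST API manually.
$ SIGNAL_URL=$(openstack stack output show webstack scale_up_url -f value -c output_value)
$ TOKEN=$(openstack token issue -f value -c id)
$ curl -i -X POST -H "X-Auth-Token: ${TOKEN}" "${SIGNAL_URL}"

If the call succeeds, the resulting scaling event appears in the output of the openstack
stack event list command.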

Troubleshooting Autoscaling Issues


When a stack with automatic scaling is deployed, useful information is logged in the log files
of the Orchestration service. The default logging level for the Orchestration service is ERROR.
Enabling DEBUG logging gives more insight and helps to trace complex issues. To enable
DEBUG logging, edit the /etc/heat/heat.conf file on the host where the Orchestration
components are deployed.
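As a minimal sketch (assuming the plain file-based configuration used in this course), setting the debug option in the [DEFAULT] section enables DEBUG logging; the Orchestration services, such as openstack-heat-api and openstack-heat-engine, must then be restarted for the change to take effect.

# /etc/heat/heat.conf (excerpt)
[DEFAULT]
# Raise the logging level from the default ERROR to DEBUG.
debug = True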

The following log files for each orchestration service are stored in the /var/log/heat directory
on the host where the Orchestration components are deployed.

heat-api.log
The /var/log/heat/heat-api.log log file records API calls to the orchestration service.

heat-engine.log
The /var/log/heat/heat-engine.log log file stores the processing of orchestration
templates and the requests to the underlying API for the resources defined in the template.

heat-manage.log
The /var/log/heat/heat-manage.log log file stores the events that occur when
deploying a stack, or when a scaling event is triggered.

Alarms play an important role in the autoscaling of instances. The following log files for the Aodh
alarming service are stored in the /var/log/aodh directory of the controller node.

listener.log
Logs related to the Aodh alarming service querying the Gnocchi metering service are
recorded in this file. The /var/log/aodh/listener.log log file provides information to
troubleshoot situations when the Alarming service is unable to reach the Telemetry service
to evaluate the alarm condition.

notifier.log
Logs related to notifications provided by an Aodh alarm are recorded in this file. The
/var/log/aodh/notifier.log log file is helpful when troubleshooting situations where
the Alarming service is unable to reach the signal_url defined for the alarm to trigger
autoscaling.

evaluator.log
The Alarming service evaluates the usage data every minute, or as defined in the
alarm definition. Should the evaluation fail, errors are logged in the
/var/log/aodh/evaluator.log log file.

Troubleshooting a deployed orchestration stack that does not perform scale-up or scale-down
operations starts by looking at the orchestration events. After the orchestration
stack provisions the resources, the openstack stack event list command returns the
SIGNAL_COMPLETE status once a scaling event completes. More information about a scaling
event can be viewed using the openstack stack event show command.
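For example, the scaling events of a running stack can be inspected as follows. The stack name webstack, the resource name web_scaler, and the event ID are placeholders, not values from a deployed environment.

# List all events for the stack; a completed scaling event shows SIGNAL_COMPLETE.
$ openstack stack event list webstack
# Show the details of a single event, identified by stack, resource, and event ID.
$ openstack stack event show webstack web_scaler <event-id>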

If the autoscaling stack fails to deploy, use the openstack stack commands to identify the
failed component. Use the openstack stack list command with the --show-nested
option to view all nested stacks. The command returns the nested stack IDs, names, and stack
status.

Use the openstack stack resource list command to identify the failed resource. The
command returns the resource name, physical resource ID, resource type, and its status. The
physical resource ID can then be queried using the openstack stack resource show
command to check the output value returned while creating the resource.
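Putting these commands together, a failed autoscaling deployment might be investigated as in the following sketch; webstack and web_scaler are placeholder names.

# List all stacks, including the nested stacks created by the AutoScalingGroup resource.
$ openstack stack list --show-nested
# Identify the failed resource within the stack.
$ openstack stack resource list webstack
# Inspect the failed resource, including its physical resource ID and status.
$ openstack stack resource show webstack web_scaler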

References
Further information is available in the Configure Autoscaling for Compute section of
the Autoscaling for Compute guide for Red Hat OpenStack Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/


Quiz: Configuring Stack Autoscaling

Choose the correct answer(s) to the following questions:

1. Which OpenStack service provides the evaluation criteria for triggering auto-scaling?

a. Nova
b. Gnocchi
c. Aodh
d. Ceilometer

2. Which two statements are true about autoscaling using an orchestration stack? (Choose
two.)

a. Autoscaling allows you to scale the resources in but not out.
b. Autoscaling allows you to manually scale the resources both in and out.
c. Autoscaling allows you to scale the resources automatically out but not in.
d. Autoscaling allows you to scale the resources automatically both in and out.

3. What is the resource type required to define the Auto Scaling policy using an orchestration
stack?

a. OS::Heat::AutoScalingPolicy
b. OS::Nova::Server
c. OS::Heat::ScalingPolicy
d. OS::Heat::AutoScalingGroup

4. Which property of the AutoScalingGroup resource is used to define the time gap between
two consecutive scaling events?

a. cooldown
b. wait
c. pause
d. timeout

5. Which three are allowed values for the adjustment_type property of a scaling policy
resource? (Choose three.)

a. change_capacity
b. change_in_capacity
c. exact_capacity
d. exact_in_capacity
e. percentage_change_in_capacity
f. percentage_change_capacity

6. Which attribute of the scaling policy returns the signed URL to handle the alarm associated
with the scaling policy?


a. signed_URL
b. signal_URL
c. alarm_URL
d. scale_URL


Solution
Choose the correct answer(s) to the following questions:

1. Which OpenStack service provides the evaluation criteria for triggering auto-scaling?

a. Nova
b. Gnocchi
c. Aodh (correct)
d. Ceilometer

2. Which two statements are true about autoscaling using an orchestration stack? (Choose
two.)

a. Autoscaling allows you to scale the resources in but not out.
b. Autoscaling allows you to manually scale the resources both in and out. (correct)
c. Autoscaling allows you to scale the resources automatically out but not in.
d. Autoscaling allows you to scale the resources automatically both in and out. (correct)

3. What is the resource type required to define the Auto Scaling policy using an orchestration
stack?

a. OS::Heat::AutoScalingPolicy
b. OS::Nova::Server
c. OS::Heat::ScalingPolicy (correct)
d. OS::Heat::AutoScalingGroup

4. Which property of the AutoScalingGroup resource is used to define the time gap between
two consecutive scaling events?

a. cooldown (correct)
b. wait
c. pause
d. timeout

5. Which three are allowed values for the adjustment_type property of a scaling policy
resource? (Choose three.)

a. change_capacity
b. change_in_capacity (correct)
c. exact_capacity (correct)
d. exact_in_capacity
e. percentage_change_in_capacity (correct)
f. percentage_change_capacity

6. Which attribute of the scaling policy returns the signed URL to handle the alarm associated
with the scaling policy?

a. signed_URL
b. signal_URL

c. alarm_URL (correct)
d. scale_URL


Summary
In this chapter, you learned:

• The Orchestration service (Heat) provides developers and system administrators with an
  easy and repeatable way to create and manage a collection of related OpenStack resources.

• The Orchestration API service forwards requests to the Orchestration engine service using
remote procedure calls (RPCs) over AMQP.

• The Orchestration engine service interprets the orchestration template and launches the stack.

• Using multiple layers of stacks that build on top of one another is the best way to organize an
orchestration stack.

• Changes in infrastructure after updating a stack must be verified first by doing a dry run of the
stack.

• Intrinsic functions in the Heat orchestration template assign values to properties that are
available during the creation of a stack.

• Using the OS::Heat::SoftwareDeployment resource allows any number of software
  configuration changes to be applied to an instance throughout its life cycle.

• When the user data is changed and the orchestration stack is updated using the openstack
stack update command, the instance is deleted and recreated using the updated user data
script.

• The AutoScalingGroup and the ScalingPolicy resources of the Orchestration stack help
build self-healing infrastructure.

• Stateless servers are more suitable for autoscaling. If a server goes down or transitions into an
  error state, it is replaced with a new server instead of being repaired.

