Openstack Admin2 CL210 RHOSP10.1 en 2 20171006
TRAINING
The contents of this course and all its modules and related materials, including handouts to
audience members, are Copyright © 2017 Red Hat, Inc.
This instructional program, including all material provided herein, is supplied without any
guarantees from Red Hat, Inc. Red Hat, Inc. assumes no liability for damages or legal action
arising from the use or misuse of contents or details contained herein.
If you believe Red Hat training materials are being used, copied, or otherwise improperly
distributed, please e-mail training@redhat.com or phone toll-free (USA) +1 (866) 626-2994
or +1 (919) 754-3700.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, Hibernate, Fedora, the
Infinity Logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and
other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other
countries.
The OpenStack® Word Mark and OpenStack Logo are either registered trademarks/service
marks or trademarks/service marks of the OpenStack Foundation, in the United States
and other countries and are used with the OpenStack Foundation's permission. We are not
affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack
community.
Introduction ix
Red Hat OpenStack Administration II ......................................................................... ix
Orientation to the Classroom Environment ................................................................. x
Internationalization ................................................................................................ xix
Note
"Notes" are tips, shortcuts or alternative approaches to the task at hand. Ignoring a
note should have no negative consequences, but you might miss out on a trick that
makes your life easier.
Important
"Important" boxes detail things that are easily missed: configuration changes that
only apply to the current session, or services that need restarting before an update
will apply. Ignoring a box labeled "Important" will not cause data loss, but may cause
irritation and frustration.
Warning
"Warnings" should not be ignored. Ignoring warnings will most likely cause data loss.
References
"References" describe where to find external documentation relevant to a subject.
The focus of this course is managing OpenStack using the unified command-line interface,
managing instances, and maintaining an enterprise deployment of OpenStack. Exam
competencies covered in the course include: expand compute nodes on Red Hat OpenStack
Platform using the undercloud (Red Hat OpenStack Platform director); manage images,
networking, object storage, and block storage; provide orchestration and autoscaling (scale-out
and scale-in); and build a customized image.
Objectives
• Expand compute nodes on the overcloud.
• Customize instances.
Audience
• Cloud administrators, cloud operators, and system administrators interested in, or responsible
for, maintaining a private cloud.
Prerequisites
• Red Hat Certified System Administrator (RHCSA in Red Hat Enterprise Linux) certification or
equivalent experience.
The workstation virtual machine is the only one that provides a graphical user interface.
In most cases, students should log in to the workstation virtual machine and use ssh to
connect to the other virtual machines. A web browser can also be used to log in to the Red Hat
OpenStack Platform Dashboard web interface. The following table lists the virtual machines that
are available in the classroom environment:
Classroom Machines
Machine name                  IP addresses      Role
workstation.lab.example.com,  172.25.250.254,   Graphical workstation
workstationN.example.com      172.25.252.N
director.lab.example.com      172.25.250.200,   Undercloud node
                              172.25.249.200
Note
Access to the classroom utility server is restricted; shell access is unavailable.
• The setup verb is used at the beginning of an exercise or lab. It verifies that the systems are
ready for the activity, possibly making some configuration changes to them.
• The grade verb is executed at the end of a lab. It provides external confirmation that the
activity's requested steps were performed correctly.
• The cleanup verb can be used to selectively undo elements of the activity before moving on
to later activities.
rht-vmctl Commands
Action                                       Command
Start controller0 machine.                   rht-vmctl start controller0
View physical console to log in and          rht-vmctl view controller0
work with controller0 machine.
Reset controller0 machine to its previous    rht-vmctl reset controller0
state and restart the virtual machine.
Caution: Any work generated on the disk
will be lost.
At the start of a lab exercise, if instructed to reset a single virtual machine node, then you are
expected to run rht-vmctl reset nodename on the foundationX system as the kiosk
user.
At the start of a lab exercise, if instructed to reset all virtual machines, then run the rht-vmctl
reset all command on the foundationX system as the kiosk user. In this course, however,
"resetting all virtual machines" normally refers to resetting only the overcloud nodes and the
undercloud node, as described in the following section.
Wait sufficiently to ensure that all nodes have finished booting and initializing services. The rht-
vmctl output displays RUNNING as soon as the nodes are started, but this is not an indication
that the nodes have completed their startup procedures.
When ready, open a workstation console to continue. Log in as student, password student.
Confirm that the nova-compute service is running.
Verify that the nova-compute service is up, or comes up within 60 seconds. Uncommonly,
after environment resets, nova-compute can appear to remain in a down state. Restart nova-
compute to resolve this issue. Although the openstack-service restart nova-compute
command works correctly, using the systemctl command may be faster because it is a lower-
level operating system request. Use of sudo, for root privilege, is required.
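As a sketch of the lower-level approach described above (the unit name openstack-nova-compute.service is the usual one on a pre-container RHOSP 10 undercloud, but verify it on your system):

```shell
# Restart the compute service directly through systemd (needs root)
sudo systemctl restart openstack-nova-compute.service
# Confirm the unit is active again
sudo systemctl is-active openstack-nova-compute.service
```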
Determine whether the overcloud nodes are actually running from the viewpoint of the
hypervisor environment underneath the virtual machines, not from the viewpoint of the
openstack server list output. For a newly provisioned environment, the overcloud nodes
will still be off, but it is recommended practice to always check.
Use the rht-vmctl command. Are the overcloud nodes controller0, ceph0 and compute0
still DEFINED as expected?
Return to the director system and start each node using the openstack command. Under all
normal circumstances, do not use rht-vmctl to start overcloud nodes!
Include compute1 only when working in the chapter where the second compute node is built and
used. In all other chapters, compute1 is powered off and ignored.
Wait until OpenStack has stopped the overcloud nodes, then shut down the rest of the
environment.
As with a clean startup, verify that the nova-compute service is up. Use the sudo systemctl
command if necessary. Do not continue until the nova-compute service is up.
At this point, it is expected that the overcloud nodes are not running yet, because the course
lab environment only auto-starts workstation, power, and director. Check using the rht-
vmctl command.
Use the following table to determine the correct command to use to either start or synchronize
the state of the overcloud nodes. The hypervisor state is down the left column. The OpenStack
state is along the top row.
This is also the resolution for hung or unresponsive nodes. Resolve using rht-vmctl
poweroff. Once the node powers off, use rht-vmctl start to boot only the affected node.
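The recovery sequence described above might look like this on the foundation machine, run as the kiosk user (compute0 here is an assumed example of the hung node):

```shell
# Force power off the unresponsive node (its disk state is preserved)
rht-vmctl poweroff compute0
# Boot only the affected node again
rht-vmctl start compute0
```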
You should *always* say (y)es if any of the following conditions are true:
- You have just reset the overcloud nodes using "rht-vmctl reset" in ILT.
- You have just reset the overcloud nodes from the Online course dashboard.
- You have restarted or rebooted overcloud nodes or any critical services.
- You suspect your environment has a problem and would prefer to validate.
You should see three ceph-osd@# services. If these services do not exist at all, then the systemd
services that were to create the OSD services for each disk device did not complete successfully.
In this scenario, manually create the OSDs by starting these device services:
These ceph-disk services will complete and then exit when their corresponding OSD service is
created. If the ceph-disk services exist in a failed state, then an actual problem exists with the
physical or virtual storage devices used as the ceph storage: /dev/vdb, /dev/vdc, and /dev/vdd.
If the ceph-osd@# services exist in a failed state, they can usually be fixed by restarting them.
The above three commands are equivalent to the single command below. Target services are
designed to simplify starting sets of services or for declaring the services that represent a
functional state. After starting the OSDs, use the ceph -s command to verify that ceph has a
status of HEALTH_OK.
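One plausible reading of the equivalence described above, assuming the OSD instance IDs are 0, 1, and 2 (one per disk device, as is typical):

```shell
# Start the three OSD services individually...
sudo systemctl start ceph-osd@0 ceph-osd@1 ceph-osd@2
# ...or start them all at once through the target unit, which groups
# every enabled ceph-osd@ instance
sudo systemctl start ceph-osd.target
# Verify that the cluster reports HEALTH_OK
sudo ceph -s
```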
flags sortbitwise,require_jewel_osds
pgmap v1004: 224 pgs, 6 pools, 4751 MB data, 1125 objects
5182 MB used, 53152 MB / 58334 MB avail
224 active+clean
Classroom virtual machines: workstation, power, director, compute0, compute1, ceph0
Technically, the director system is the undercloud. However, in the context of "resetting the
overcloud", director must be included because director's services and databases are full of
control, management and monitoring information about the overcloud it is managing. Therefore,
to reset the overcloud without resetting director is to load a fresh overcloud with director
still retaining stale information about the previous overcloud just discarded.
In a physical classroom, use the rht-vmctl command to reset only the relevant nodes.
Although you can type one rht-vmctl command per node, which is tedious, there is an
interactive option to choose which nodes to reset and which nodes to skip. Don't forget the -i
option or else you will inadvertently reset all of your virtual machines. While not catastrophic, it
can be an annoying time-waster.
The director node is configured to start automatically, while the overcloud nodes are
configured to not start automatically. This is the same behavior as a newly provisioned lab
environment. Give director sufficient time to finish booting and initializing services, then ssh
to director to complete the normal overcloud nodes startup tasks.
Wait sufficiently to allow overcloud nodes to finish booting and initializing services. Then use
the health check script to validate the overcloud lab environment.
Resetting everything, if deemed necessary, takes time but results in a fresh environment.
Use rht-vmctl fullreset to pull down and start clean disk images from the classroom
system.
After the environment is re-provisioned, start again with the instructions for a new environment.
Internationalization
Language support
Red Hat Enterprise Linux 7 officially supports 22 languages: English, Assamese, Bengali, Chinese
(Simplified), Chinese (Traditional), French, German, Gujarati, Hindi, Italian, Japanese, Kannada,
Korean, Malayalam, Marathi, Odia, Portuguese (Brazilian), Punjabi, Russian, Spanish, Tamil, and
Telugu.
Language settings
In the GNOME desktop environment, the user may be prompted to set their preferred language
and input method on first login. If not, then the easiest way for an individual user to adjust their
preferred language and input method settings is to use the Region & Language application. Run
the command gnome-control-center region, or from the top bar, select (User) > Settings.
In the window that opens, select Region & Language. The user can click the Language box and
select their preferred language from the list that appears. This will also update the Formats
setting to the default for that language. The next time the user logs in, these changes will take
full effect.
These settings affect the GNOME desktop environment and any applications, including gnome-
terminal, started inside it. However, they do not apply to that account if accessed through an
ssh login from a remote system or a local text console (such as tty2).
Note
A user can make their shell environment use the same LANG setting as their graphical
environment, even when they log in through a text console or over ssh. One way to do
this is to place code similar to the following in the user's ~/.bashrc file. This example
code will set the language used on a text login to match the one currently set for the
user's GNOME desktop environment:
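The example code referenced here is not reproduced in this extract; a minimal sketch of the idea follows, assuming the GNOME language choice is recorded by AccountsService in /var/lib/AccountsService/users/$USER with a Language= key (both the path and the key name are assumptions to verify on your system):

```shell
# Candidate snippet for ~/.bashrc: copy the GNOME language setting
# into the shell's LANG, silently doing nothing if it is not recorded
i=$(grep 'Language=' /var/lib/AccountsService/users/${USER} 2>/dev/null | sed -e 's/Language=//')
if [ -n "$i" ]; then
    export LANG="$i"
fi
```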
Japanese, Korean, Chinese, or other languages with a non-Latin character set may not
display properly on local text consoles.
Individual commands can be made to use another language by setting the LANG variable on the
command line:
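For example (the French locale below is illustrative and affects only that single invocation):

```shell
# Run one command under a different locale; the shell's own LANG is
# not changed, so later commands revert to the session default
LANG=C date -u -d @0
LANG=fr_FR.utf8 date    # French output, if that locale is installed
```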
Subsequent commands will revert to using the system's default language for output. The locale
command can be used to check the current value of LANG and other related environment
variables.
The Region & Language application can also be used to enable alternative input methods. In the
Region & Language application's window, the Input Sources box shows what input methods are
currently available. By default, English (US) may be the only available method. Highlight English
(US) and click the keyboard icon to see the current keyboard layout.
To add another input method, click the + button at the bottom left of the Input Sources window.
An Add an Input Source window will open. Select your language, and then your preferred input
method or keyboard layout.
Once more than one input method is configured, the user can switch between them quickly by
typing Super+Space (sometimes called Windows+Space). A status indicator will also appear
in the GNOME top bar, which has two functions: It indicates which input method is active, and
acts as a menu that can be used to switch between input methods or select advanced features of
more complex input methods.
Some of the methods are marked with gears, which indicate that those methods have advanced
configuration options and capabilities. For example, the Japanese Japanese (Kana Kanji) input
method allows the user to pre-edit text in Latin and use Down Arrow and Up Arrow keys to
select the correct characters to use.
US English speakers may also find this useful. For example, under English (United States) is the
keyboard layout English (international AltGr dead keys), which treats AltGr (or the right Alt)
on a PC 104/105-key keyboard as a "secondary-shift" modifier key and dead key activation key
for typing additional characters. There are also Dvorak and other alternative layouts available.
Note
Any Unicode character can be entered in the GNOME desktop environment if the user
knows the character's Unicode code point, by typing Ctrl+Shift+U, followed by the
code point. After Ctrl+Shift+U has been typed, an underlined u will be displayed to
indicate that the system is waiting for Unicode code point entry.
For example, the lowercase Greek letter lambda has the code point U+03BB, and can be
entered by typing Ctrl+Shift+U, then 03bb, then Enter.
From the command line, root can change the system-wide locale settings with the localectl
command. If localectl is run with no arguments, it will display the current system-wide locale
settings.
To set the system-wide language, run the command localectl set-locale LANG=locale,
where locale is the appropriate $LANG from the "Language Codes Reference" table in this
chapter. The change will take effect for users on their next login, and is stored in /etc/
locale.conf.
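For instance (fr_FR.utf8 is an illustrative locale, not a course requirement):

```shell
# Set the system-wide default locale; takes effect at next login
sudo localectl set-locale LANG=fr_FR.utf8
# localectl records the setting in /etc/locale.conf:
cat /etc/locale.conf
```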
In GNOME, an administrative user can change this setting by opening Region & Language and
clicking the Login Screen button at the upper-right corner of the window. Changing the Language
of the login screen will also adjust the system-wide default language setting stored in the /etc/
locale.conf configuration file.
Important
Local text consoles such as tty2 are more limited in the fonts that they can display
than gnome-terminal and ssh sessions. For example, Japanese, Korean, and Chinese
characters may not display as expected on a local text console. For this reason, it may
make sense to use English or another language with a Latin character set for the
system's text console.
Likewise, local text consoles are more limited in the input methods they support, and
this is managed separately from the graphical desktop environment. The available
global input settings can be configured through localectl for both local text virtual
consoles and the X11 graphical environment. See the localectl(1), kbd(4), and
vconsole.conf(5) man pages for more information.
Language packs
When using non-English languages, you may want to install additional "language packs" to
provide additional translations, dictionaries, and so forth. To view the list of available langpacks,
run yum langavailable. To view the list of langpacks currently installed on the system,
run yum langlist. To add an additional langpack to the system, run yum langinstall
code, where code is the code in square brackets after the language name in the output of yum
langavailable.
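For example, installing the Japanese langpack might look like this (ja is the code typically shown for Japanese, but confirm it against the yum langavailable output on your system):

```shell
# List the available langpacks; codes appear in square brackets
yum langavailable
# Install the Japanese langpack (code assumed from the listing)
sudo yum langinstall ja
# Confirm the langpack is now installed
yum langlist
```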
References
locale(7), localectl(1), kbd(4), locale.conf(5), vconsole.conf(5),
unicode(7), utf-8(7), and yum-langpacks(8) man pages
Conversions between the names of the graphical desktop environment's X11 layouts and
their names in localectl can be found in the file /usr/share/X11/xkb/rules/
base.lst.
MANAGING AN ENTERPRISE
OPENSTACK DEPLOYMENT
Overview
Goal Manage the Undercloud, the Overcloud, and related services.
Objectives • Describe the Undercloud architecture and the Overcloud
architecture.
Objectives
After completing this section, students should be able to:
The following table reviews the OpenStack core services. Together, these components provide
the services necessary to deploy either tenant workload systems or OpenStack infrastructure
systems.
The OpenStack core components provide a comprehensive set of services to provision end user
cloud workloads consisting of deployed server instances organized by tenant projects. With
orchestration, arrangements of complex multi-server applications have become easy to define
and deploy with push-button simplicity. Still, the installation and management of OpenStack
cloud infrastructure itself has remained difficult to master and maintain, until the introduction of
Red Hat OpenStack Platform (RHOSP) director.
The RHOSP director is a standalone OpenStack all-in-one installation, providing a tool set
for installing and managing a complete OpenStack infrastructure environment. It is based
primarily on the OpenStack Deployment component developed in the TripleO project, which
is an abbreviation for "OpenStack-On-OpenStack". The Deployment service uses OpenStack
components running on the dedicated all-in-one installation (the undercloud) to install an
operational OpenStack cloud (the overcloud), utilizing extended core components, plus new
components, to locate, provision, deploy and configure bare metal systems as OpenStack
controller, compute, networking and storage nodes. The following table describes the OpenStack
deployment component services.
The undercloud is the Red Hat OpenStack Platform director machine itself, plus the provisioning
network and resources required to perform undercloud tasks. During the building process for the
overcloud, the machine nodes being provisioned to become controller, compute, network, and
storage systems are considered to be the workload of the undercloud. When deployment and all
configuration stages are complete, these nodes reboot to become the overcloud.
Stated again: the undercloud installs the overcloud. However, the undercloud is not only an
installation tool set. It is a comprehensive platform for managing, monitoring, upgrading, scaling
and deleting overclouds. Currently, the undercloud supports deploying and managing a single
overcloud. In the future, the undercloud will allow an administrator to deploy and manage many
tenant overclouds.
Packstack is no longer the preferred tool for common cloud
installations, but remains useful for limited use cases. Packstack was an internal tool developed
to create proof-of-concept (POC) deployments of one or possibly a few systems. First-adopter
RHOSP clients and analysts popularized it, and some have pushed the tool beyond recommended
use. Compared to RHOSP director, there are advantages and disadvantages:
Lifecycle management for cloud infrastructure has operational tasks similar to legacy enterprise
management, but also incorporates new interpretations of Continuous Integration and DevOps.
The cloud industry differentiates stages of lifecycle management by categorizing tasks as Day 0
(Planning), Day 1 (Deploying), and Day 2 (Operations).
As a Day 0 Planning tool, director provides default, customizable configuration files to define
cloud architecture, including networking and storage topologies, OpenStack service parameters,
and third party plugin integration. These default files and templates implement Red Hat's highly
available reference architecture and recommended practices.
Director is designed as central management for ongoing Day 2 Operations. It can perform
environment health checks, auto-scale an overcloud by adding or replacing nodes, apply minor
release updates and major version upgrades, and handle patching, monitoring, and regulatory
compliance.
To retain Day 2 management support for the overcloud, all management must be accomplished
using the undercloud CLI or APIs. Currently, there is no reasonable expectation that the
undercloud can detect, interpret, or reconcile manual changes not implemented through the
undercloud. Using outside tool sets loses the ability to perform safe and predictable updates,
upgrades, and scaling.
Integration with third party tools that exclusively call undercloud APIs is recommended, and does
not break Day 2 operation support. Recommended examples include integration between the
undercloud and Red Hat CloudForms, Red Hat Satellite, and Ansible Tower by Red Hat.
The undercloud uses a variety of popular and stable OpenStack components to provide required
services, including the Deployment Service for image deployment, creation, and environment
templating, Bare Metal for bare metal introspection, Orchestration for component definition,
ordering, and deployment, and Puppet for post-instantiation configuration. The undercloud
includes tools that help with hardware testing, and is architected to facilitate future functionality
for automated OpenStack upgrades and patch management, centralized log collection, and
problem identification.
Overcloud nodes are deployed from the undercloud machine using a dedicated, isolated
provisioning network. Overcloud nodes must be configured to PXE boot on this provisioning
network, with network booting on other NICs disabled. These nodes must also support the
Intelligent Platform Management Interface (IPMI). Each candidate system needs to have a single
NIC on the provisioning network. This NIC must not be used for remote connectivity, because the
deployment process will reconfigure NICs for Open vSwitch bridging.
Minimal information must be gathered about candidate nodes before beginning deployment
configuration, including the MAC address of the appropriate provisioning NIC, the IP address of
the IPMI NIC, and the IPMI user name and password.
Later in this course, you will view and learn the undercloud configuration used to build the
classroom overcloud on your student system. No previous undercloud knowledge is required, but
it is recommended to become proficient with the technologies mentioned in this section before
using the undercloud to deploy and manage a production environment.
References
Further information is available about RHOSP Director at
Red Hat OpenStack Platform Director Life Cycle
https://access.redhat.com/support/policy/updates/openstack/platform/director
TripleO Architecture
https://docs.openstack.org/tripleo-docs/latest/install/introduction/architecture.html
TripleO documentation
https://docs.openstack.org/tripleo-docs/latest/
1. Which tool is recommended for all production Red Hat OpenStack Platform installs?
a. The overcloud
b. Foreman
c. Packstack
d. RHOSP director (undercloud)
e. Manual package install
2. Which four of these components are services of the undercloud? (Choose four.)
a. Data Processing
b. Deployment
c. Bare Metal
d. Database
e. Orchestration
f. Workflow
3. Which four of these capabilities are part of the undercloud's duties? (Choose four.)
a. Application scaling
b. Automated upgrades
c. Patch management
d. Central log collection
e. Monitoring
Solution
Choose the correct answer(s) to the following questions:
1. Which tool is recommended for all production Red Hat OpenStack Platform installs?
a. The overcloud
b. Foreman
c. Packstack
d. RHOSP director (undercloud) (correct)
e. Manual package install
2. Which four of these components are services of the undercloud? (Choose four.)
a. Data Processing
b. Deployment (correct)
c. Bare Metal (correct)
d. Database
e. Orchestration (correct)
f. Workflow (correct)
3. Which four of these capabilities are part of the undercloud's duties? (Choose four.)
a. Application scaling
b. Automated upgrades (correct)
c. Patch management (correct)
d. Central log collection (correct)
e. Monitoring (correct)
Objectives
After completing this section, students should be able to:
Undercloud Services
Red Hat OpenStack Platform director is a deployment cloud for OpenStack infrastructure,
in which the cloud workload is the overcloud systems themselves: controllers, compute
nodes, and storage nodes. Since infrastructure nodes are commonly built directly on physical
hardware systems, the undercloud may be referred to as a bare metal cloud. However, as you
will experience in this course, an undercloud can deploy infrastructure to virtual systems for
learning, testing, and specific use cases. Similarly, overclouds almost exclusively deploy virtual
machines and containers but can be used to deploy tenant workloads directly to dedicated,
physical systems, such as blade servers or enterprise rack systems, by incorporating bare metal
drivers and methods. Therefore, the terms bare metal cloud and tenant workload cloud are only a
convenient frame of reference.
• manages task interaction and prerequisite ordering using the Workflow service.
The Deployment service generates the data required to instruct subordinate services to perform
deployment and installation tasks. It comes preconfigured with custom configurations and
sample templates for common deployment scenarios. The following table describes the primary
concepts and tasks being introduced in the Deployment service.
Orchestration Terminology
Term Definition
Resources A template section that defines infrastructure elements to deploy,
such as virtual machines, network ports, and storage disks.
Parameters A template section to define deployment-specific parameter
settings provided to satisfy template resource requirements. Most
templates define default parameters for all settings.
Outputs Output parameters dynamically generated during deployment
and specified as information required to be passed back to the
administrator. For example, public IP addresses, instance names,
and other deployment results.
Template directory A location for storing and invoking modified templates, allowing
the default templates to remain unmodified and reusable.
Environment directory A location for storing environment files. Environment files are
specific to a deployment event, containing parameter settings
defining this particular deployment. The design allows a specific
overcloud design to be reused with new resource names and
settings, without modifying the underlying templates. Environment files
affect the runtime behavior of a template, overriding resource
implementations and parameters.
An overcloud deployment is invoked by specifying a template directory and a location for the
environment files:
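A minimal sketch of such an invocation (the directory and file names are placeholders, not the course's actual values):

```shell
# Deploy (or update) the overcloud from a custom template directory,
# layering one or more environment files over the defaults
openstack overcloud deploy \
    --templates ~/templates \
    -e ~/templates/environments/custom-environment.yaml
```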
IPMI is designed as a server remote access and control interface specification. It remains
consistent across a variety of vendor hardware implementations, including CIMC, DRAC, iDRAC,
iLO, ILOM, and IMM hardware platform interfaces. The primary functions of the specification
include monitoring, power control, logging, and inventory management. IPMI is intended to be
used with systems management software, although it can be invoked directly through simple
command line utilities.
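Invoked directly, a power query and power-on might look like this with the common ipmitool utility (the address and credentials are placeholders, not values from this classroom):

```shell
# Query the node's power state over the IPMI LAN interface
ipmitool -I lanplus -H 172.25.249.101 -U admin -P password power status
# Power the node on
ipmitool -I lanplus -H 172.25.249.101 -U admin -P password power on
```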
Note
In this course, the overcloud is deployed on virtual machines possessing no hardware
or IPMI layer. Instead, a single virtual machine named power emulates a separate
IPMI interface for each overcloud virtual machine. IPMI commands are sent to a
node-specific IP address on power, where virtual BMC software performs power
management activities by communicating with the hypervisor to perform platform
management requests. A subset of the IPMI specification is implemented: to power up,
power down, and obtain configuration and state notifications.
Overcloud Management
Following deployment, the overcloud can be managed from the undercloud.
Use the OpenStack CLI to start, stop, and monitor the status of the overcloud nodes. Use
openstack server list to determine the servers' current status.
Use openstack server start to boot each node. The servers should be started in the order
shown. The servers may take many minutes to display an ACTIVE status, so be patient and
continue to recheck until all servers are running.
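As a sketch on the director system, using the node names from this classroom (the stackrc credentials file and the controller-first start order are assumptions to verify against your lab instructions):

```shell
# Load the undercloud credentials, then check and start each node
source ~/stackrc
openstack server list -c Name -c Status
openstack server start controller0
openstack server start ceph0
openstack server start compute0
```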
You may experience a scenario where the status of nodes is ACTIVE, but checking the virtual
machine power state from the online environment or the hypervisor shows the nodes are
actually powered off. In this scenario, the undercloud must instruct the nodes to be stopped first
(to synchronize the recognized node state, even though the nodes are already off) before the
nodes are started again. This can all be accomplished with one command; enter openstack
server reboot for each node.
The nodes will first display a status of REBOOT, but will quickly switch to ACTIVE while they
continue to start.
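The single-command resynchronization described above can be sketched as follows (node names as used in this classroom):

```shell
# 'reboot' first requests a stop, then a start, which realigns the
# recognized node state even when the node is actually powered off
for node in controller0 ceph0 compute0; do
    openstack server reboot "$node"
done
```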
References
The Director Installation & Usage guide for Red Hat OpenStack Platform 10
https://access.redhat.com/documentation/en-US/index.html
In this exercise, you will connect to the undercloud node, director, to launch the predefined
overcloud. You will use the OpenStack CLI on the undercloud to manage the overcloud nodes.
Outcomes
You should be able to:
Steps
1. Confirm that the infrastructure and undercloud virtual machines (workstation, power,
and director) are started and accessible.
1.3. Log in to director as the stack user, using SSH. The login is passwordless when
coming from workstation.
2. As the stack user on director, check the status of the undercloud. If the nova-compute
service displays as down, wait until the status changes to up before continuing. The wait
should be no more than a minute or two.
2.1. Use the OpenStack CLI to list the status of the undercloud compute services.
Wait until nova-compute displays as up before trying to start the overcloud nodes.
3. As the stack user on director, check the overcloud status. If necessary, start the
overcloud.
3.1. Use the OpenStack CLI to list the overcloud server names and current status.
In the above output, the overcloud nodes are SHUTOFF and need to be started.
3.2. Use the OpenStack CLI to start the overcloud nodes in the order shown.
3.3. Use the OpenStack CLI to confirm that the overcloud nodes have transitioned to ACTIVE.
When done, log out from director.
Objectives
After completing this section, students should be able to:
Verifying an Undercloud
The undercloud is architected to be more than an installation tool. This course discusses
orchestration for both initial install and for compute node scaling. RHOSP director also performs
numerous Day 2 activities. Therefore, the undercloud system is not intended to be uninstalled
or decommissioned after the overcloud is installed. The undercloud can be checked for proper
configuration:
Currently, the undercloud is capable of installing a single overcloud with the stack name
overcloud. The Workflow Service is capable of managing multiple plans and stacks. In a future
release, the undercloud will be able to install, access, and manage multiple overclouds. Currently
supported ongoing activities for the undercloud include:
• auto-scaling or replacing HA controllers and compute nodes; currently, scaling storage nodes is
handled by the storage platform, not the undercloud
Verifying an Overcloud
Once built, an overcloud is a production infrastructure with many interacting components.
To avoid damaging live data and applications, verify installation operation before deploying
production workloads. Verifying involves multiple levels of checking:
Downloading an object file created by a service account (or, more broadly, running any OpenStack
command as a service user) requires that service user's authentication. It is not
necessary to create a permanent authentication rc file for a service account, because running
commands as a service user is not a typical or regular task. Instead, override the current
authentication environment by prepending the service account's environment variables to
the one-off command. For example:
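A minimal sketch, with a placeholder password (the real value comes from the undercloud's configuration files):

```shell
# PASSWORD is a placeholder; read the real one from undercloud-passwords.conf.
OS_USERNAME=ironic OS_PASSWORD=PASSWORD OS_TENANT_NAME=service \
openstack object list ironic-inspector
```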
The container name in which the files are stored matches the name of the service that created
them; for the introspection process, the service is ironic-inspector. The password for the
ironic service user is found in the undercloud-passwords.conf file. Use the openstack
baremetal node show command to locate the file name used to store introspection results for a node.
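For instance, the lookup might be sketched as follows (node name assumed from this classroom):

```shell
# Retrieve the ironic service user's password from the undercloud:
grep ironic /home/stack/undercloud-passwords.conf

# The object holding a node's introspection results is named after its UUID:
openstack baremetal node show controller0 -c uuid
```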
The information required to download introspection results is summarized in the following table.
The displayed data are attributes of the introspected node. This data can be used to verify
that the introspection process correctly analyzed this node, or to customize the introspection
process. Such customization is an advanced RHOSP director installation topic and is beyond the
scope of this course.
• list services on each node to view which systemd-configured services are running on each
type of deployment role server.
• compare the static IP addresses set in the orchestration network template files to the network
addresses on each overcloud node
• compare the NIC configuration of the controller deployment role network orchestration
template to the network interfaces and OpenvSwitch bridges on controller0
• compare the NIC configuration of the compute deployment role network orchestration
template to the network interfaces on compute0
• compare the NIC configuration of the ceph-storage deployment role network orchestration
template to the network interfaces on ceph0
• compare the disk configuration in the orchestration storage template file to the output of
ceph osd and ceph status commands
IPMI IP addresses

Node name and KVM domain name    IP address on provisioning network    Virtual IP address on power IPMI emulator
controller0                      172.25.249.1                          172.25.249.101
compute0                         172.25.249.2                          172.25.249.102
compute1                         172.25.249.12                         172.25.249.112
ceph0                            172.25.249.3                          172.25.249.103
This classroom does not require the full IPMI set of capabilities, only the ability to power cycle or
start nodes programmatically on demand. The command-line utility to test the functionality of
the power IPMI emulation uses this syntax:
The -I interface options are compiled into the command and may be seen with ipmitool -h.
The lanplus choice indicates the use of the IPMI v2.0 RMCP+ LAN Interface.
For example, to view the power status of the controller0 node, run the following command.
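A sketch of that command, with credentials assumed to match the classroom's IPMI emulator:

```shell
# Query the power state of controller0 via its IPMI address (credentials assumed).
ipmitool -I lanplus -U admin -P password -H 172.25.249.101 power status
```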
• Testing is invoked as the admin user of the overcloud to be tested. The current environment
file must be loaded before starting.
• The system running the tests must have access to the internal API network. This can be a
temporary interface configured only for the duration of testing.
• An external network and subnet must exist before running the tests.
• Internet access is expected by default, to obtain a CirrOS image to use in testing. In our
classroom, we specify a local image from the command line to avoid this requirement.
• The heat_stack_user role must exist in the tested overcloud.
• Installing the openstack-tempest-all package installs all component tests, including tests for
components not installed on the overcloud. Manual editing of the tempest configuration file
can turn off unneeded components.
The Testing service API tests are designed to use only the OpenStack API, and not one of the
Python client interfaces. The intent is for this testing to validate the API, by performing both
valid and invalid API invocations against component APIs to ensure stability and proper error
handling. The Testing Service can also be used to test client tool implementations if they can
operate in a raw testing mode which allows passing JSON directly to the client. Scenario tests
are also included. These tests are related series of steps that create more complex objects and
project states, confirm their functionality, and then remove them.
The Testing service runs the full-length tests by default. However, the service also provides a
method for running only shorter smoke tests or to skip tests, by creating a text file to list tests
by name, then including the file as an option when testing is run. This is useful for including or
excluding tests as required, such as skipping tests that may be inoperable due to component
updates or customization, or where individual features have been disabled. Adding *.smoke to
the skip list limits tests to the smoke tests.
One method for running tests is the tools/run-test.sh script, which uses a skip list file with
both include and exclude regular expression syntax for selecting tests. This course uses this
method because the Testing service CLI in RHOSP10 is not yet feature complete. However, the
tempest run command is available as another simple test invocation method.
The newer Testing service CLI also includes the useful tempest cleanup command, which
can find and delete resources created by the Testing service, even if tests have aborted or
completed with a failed status and left orphaned resources. To use this tool, first run the
command with the --init-saved-state option before running any tests. This option creates
a saved_state.json file containing a list of existing resources from the current cloud
deployment that will be preserved from subsequent cleanup commands. The following example
demonstrates the correct order in which to use the tempest cleanup commands.
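A sketch of that sequence:

```shell
# Before any tests: record the resources that already exist.
tempest cleanup --init-saved-state

# ...run the tests...

# After testing: delete anything not recorded in saved_state.json.
tempest cleanup
```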
An internal-only server instance is not accessible from any system other than an authorized
controller node for that overcloud. To gain access to the server's console, a user may access the
controller through a VNC- or Spice-enabled browser, or a websockets-implemented VNC or Spice
client. Since Red Hat OpenStack Platform support for Spice is not yet released, this course uses
and describes VNC console components and configuration.
Each compute node runs a vncserver process, listening on the internal API network at one or
more ports starting at 5900 and going up, depending on the number of instances deployed on
that compute node. Each controller node runs a novncproxy process, listening at port 6080
on the same internal API network. The remaining services belong to the Compute Service
(codenamed Nova) with components on both the controller and compute nodes.
To access the console, a user clicks the server's instance name from the Dashboard Project/
Compute/Instances screen to reach the instance detail screen, which has 4 sub-tabs. Clicking
the Console sub-tab initiates a request for a VNC console connection. The following list describes
the resulting Compute service and VNC interactions to build the instance-specific URL. Each
component named is followed by its access location in parentheses (node name, network name,
port number):
• A client browser (workstation, external), configured with the NoVNC plug-in, connects to
the Dashboard haproxy (controller0, external, port 80) to request to open a console to a
specific running server instance.
• Haproxy passes an access URL request to nova-api (controller0, internal API, port 8774).
In Figure 1.4: The constructed nova-novncproxy instance-specific URL, notice the URL, which
includes inline parameters for the token and instance ID for the requested server instance demo,
at the bottom of the Dashboard screen as the mouse hovers over the clickable link in the blue
message area.
Note
The requirement that a user clicks the link titled Click here to show only console, plus
any messages about keyboard non-response, is not an error. It is the result of browser
settings forbidding cross-domain scripts from running automatically. A user could
select settings, such as show all content or load unsafe scripts, that disable protective
security policies, but this is not recommended. Instead, manually click the link.
The Compute service has obtained connection information that it has cached with the console
authorization service, to be requested and used by any user who provides the correct token.
The URL passed to the browser is not the direct address of the demo instance, but instead is the
novncproxy address, which constructs a reverse proxy connection, allowing the demo instance
to initiate console screen refreshes. The following list describes the remaining interactions to
complete the reverse proxy VNC connection when the URL is clicked:
• Using the token, nova-novncproxy retrieves the connect_info object from nova-
consoleauth (controller0, internal API, AMQP).
Deploying and connecting to the VNC console of an internal-only server instance validates core
Compute service, Messaging service and network access functionality.
Verifying an Overcloud
The following steps outline the process to verify an overcloud deployment.
1. On the undercloud, add a port to the control plane interface br-ctlplane and assign it an
IP address.
5. Run the config_tempest tool configuration script using the external network ID as an
argument.
6. Optionally, edit the /etc/tempest.conf file to select or clear the services to be tested.
References
Intelligent Platform Management Interface
https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface
Further information is available in the OpenStack Integration Test Suite Guide for
Red Hat OpenStack Platform 10; at
https://access.redhat.com/documentation/en-US/index.html
In this exercise, you will view the results of the deployment tasks that created the overcloud on
your system. You will verify the operation and configuration of the undercloud, then verify the
operation and configuration of the overcloud to compare and contrast the differences. Finally, to
validate that the overcloud is functional, you will install and run the Testing service.
Outcomes
You should be able to:
Steps
1. Log in to director as the stack user. Observe that the stackrc environment file is
automatically loaded. You will use the stack user's authentication environment to query
and manage the undercloud.
1.1. SSH to the stack user on the director system. No password is required. View the
stack user's environment, which is used to connect to the undercloud.
1.2. View the current overcloud server list to find the provisioning network address for each
node. The IP addresses shown here may differ from yours.
2. Log in to each overcloud system to view the unique services running on each node type,
using the heat-admin account that was provisioned during deployment. The heat-admin account
on each node is configured with the SSH keys for the stack user from director to allow
password-less access.
2.1. Using SSH, log in to the controller0 service API node. List relevant services and
network configuration, then log out.
2.2. Using SSH, log in to the compute0 hypervisor node. List relevant services and network
configuration, then log out.
2.3. Using SSH, log in to the ceph0 storage node. List relevant services and network
configuration, then log out.
3. Test the IPMI emulation software which is performing power management for the
overcloud's virtual machine nodes.
3.1. Use the IPMI command-line tool to power the compute1 node on and off. The
compute1 node will be provisioned as the second compute node in a later chapter,
but is not currently in use. All other nodes are currently functioning cloud nodes; do
not perform these commands on any other nodes. The IPMI address for compute1 is
172.25.249.112. Start by checking the node's current power status.
3.2. Toggle the compute1 power on and off. When you are finished practicing the IPMI
functionality, leave the compute1 node powered off.
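The practice sequence might look like the following (credentials assumed to match the classroom's IPMI emulator):

```shell
ipmitool -I lanplus -U admin -P password -H 172.25.249.112 power status
ipmitool -I lanplus -U admin -P password -H 172.25.249.112 power on
ipmitool -I lanplus -U admin -P password -H 172.25.249.112 power status
ipmitool -I lanplus -U admin -P password -H 172.25.249.112 power off
```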
6. Install the Tempest testing service and component tests. Create a test configuration
directory, and populate it with configuration files using the configure-tempest-
directory script. Run the config-tempest script to configure the tests
for the overcloud, using overcloudrc environment parameters, the external
provider-172.25.250 network ID, and the cirros-0.3.4-x86_64-disk.img image
from http://materials.example.com.
6.1. Install the tempest package and all available component test packages.
6.4. Run the config_tempest setup script using the external network ID. This populates
the tempest configuration files based on components currently installed.
7. Configure and run a smoke test. The dynamic configuration in the previous step included
mistral and designate component tests, which are not installed in this overcloud. Edit
the configuration to disable mistral and designate testing. Use the test skip file found
in student's Downloads directory on workstation to also exclude tests for API versions
not in use on this overcloud. Exit from director after the test run.
7.1. Edit the etc/tempest.conf testing configuration file to mark components as not
available. Locate and edit the service_available section to disable mistral and
designate testing. Leave existing entries; only add mistral and designate as
False. The section should appear as shown when done.
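A sketch of the edited section; the comment stands in for whatever entries your generated file already contains, which should be left unchanged:

```
[service_available]
# existing generated entries unchanged; only these two lines are added
mistral = False
designate = False
```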
7.3. Run the tempest cleanup command to save a list of pre-existing cloud resources.
7.4. Run the tests, specifying tempest-smoke-skip as the skip file. Although no test
failures are expected, view the output for any that occur to observe the troubleshooting
information provided by the Testing Service. This command may take 10 minutes or
longer to complete.
==============
Worker Balance
==============
- Worker 0 (13 tests) => 0:01:26.541830
7.5. Run the tempest cleanup command to remove resources not listed in the earlier
save list. There may be none to delete, if all tests completed successfully and performed
their own cleanups.
Cleanup
On workstation, run the lab deployment-overcloud-verify cleanup script to clean up
this exercise.
In this lab, you will validate that the overcloud is functional by deploying a server instance using
a new user and project, creating the resources required. The lab is designed to be accomplished
using the OpenStack CLI, but you can also perform tasks using the dashboard (http://
dashboard.overcloud.example.com). You can find the admin password in the /home/
stack/overcloudrc file on director.
Outcomes
You should be able to:
On workstation, run the lab deployment-review setup command. The script checks
that the m1.web flavor, the rhel7 image, and the provider-172.25.250 network exist to test
instance deployment. The script also checks that the default admin account is available.
Steps
1. On workstation, load the admin user environment file. To prepare for deploying a server
instance, create the production project in which to work, and an operator1 user with the
password redhat. Create an authentication environment file for this new user.
2. The lab setup script preconfigured an external provider network and subnet, an image, and
multiple flavors. Working as the operator1 user, create the security resources required to
deploy this server instance, including a key pair named operator1-keypair1.pem placed
in student's home directory, and a production-ssh security group with rules for SSH
and ICMP.
4. Deploy the production-web1 server instance using the rhel7 image and the m1.web
flavor.
5. When deployed, use ssh to log in to the instance console. From the instance, verify network
connectivity by using ping to reach the external gateway at 172.25.250.254. Exit the
production-web1 instance when finished.
Evaluation
On workstation, run the lab deployment-review grade command to confirm the success
of this exercise.
Cleanup
On workstation, run the lab deployment-review cleanup script to clean up this
exercise.
Solution
In this lab, you will validate that the overcloud is functional by deploying a server instance using
a new user and project, creating the resources required. The lab is designed to be accomplished
using the OpenStack CLI, but you can also perform tasks using the dashboard (http://
dashboard.overcloud.example.com). You can find the admin password in the /home/
stack/overcloudrc file on director.
Outcomes
You should be able to:
On workstation, run the lab deployment-review setup command. The script checks
that the m1.web flavor, the rhel7 image, and the provider-172.25.250 network exist to test
instance deployment. The script also checks that the default admin account is available.
Steps
1. On workstation, load the admin user environment file. To prepare for deploying a server
instance, create the production project in which to work, and an operator1 user with the
password redhat. Create an authentication environment file for this new user.
1.2. As admin, create the production project and the operator1 user.
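A sketch of those commands, assuming the default _member_ role:

```shell
openstack project create production
openstack user create --project production --password redhat operator1
openstack role add --project production --user operator1 _member_
```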
1.3. Create a new authentication environment file by copying the existing admin-rc file.
1.4. Edit the file with the new user's settings. Match the settings shown here.
unset OS_SERVICE_TOKEN
export OS_AUTH_URL=http://172.25.250.50:5000/v2.0
export OS_PASSWORD=redhat
export OS_REGION_NAME=regionOne
export OS_TENANT_NAME=production
export OS_USERNAME=operator1
export PS1='[\u@\h \W(operator1-production)]\$ '
2. The lab setup script preconfigured an external provider network and subnet, an image, and
multiple flavors. Working as the operator1 user, create the security resources required to
deploy this server instance, including a key pair named operator1-keypair1.pem placed
in student's home directory, and a production-ssh security group with rules for SSH
and ICMP.
2.1. Source the new environment file. Remaining lab tasks must be performed as this
production project member.
2.2. Create a keypair. Redirect the command output into the operator1-keypair1.pem
file. Set the required permissions on the key pair file.
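A sketch, assuming the key pair object is named operator1-keypair1:

```shell
openstack keypair create operator1-keypair1 > ~/operator1-keypair1.pem
chmod 600 ~/operator1-keypair1.pem
```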
2.3. Create a security group with rules for SSH and ICMP access.
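A sketch of the security group commands:

```shell
openstack security group create production-ssh
openstack security group rule create --protocol tcp --dst-port 22 production-ssh
openstack security group rule create --protocol icmp production-ssh
```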
production-subnet1
...output omitted...
3.2. Create a router. Set the gateway address. Add the internal network interface.
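A sketch of those commands; the router name is assumed, while the external network and subnet names come from the lab setup:

```shell
openstack router create production-router1          # router name assumed
openstack router set --external-gateway provider-172.25.250 production-router1
openstack router add subnet production-router1 production-subnet1
```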
3.3. Create a floating IP, taken from the external network. You will use this address to deploy
the server instance.
4. Deploy the production-web1 server instance using the rhel7 image and the m1.web
flavor.
4.1. Deploy the server instance, and verify the instance has an ACTIVE status.
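A sketch of the deployment; the project network name is an assumption and FLOATING_IP stands for the address created in the previous step:

```shell
openstack server create --image rhel7 --flavor m1.web \
  --key-name operator1-keypair1 --security-group production-ssh \
  --nic net-id=production-network1 production-web1   # network name assumed
openstack server add floating ip production-web1 FLOATING_IP
openstack server list                                # confirm ACTIVE status
```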
5. When deployed, use ssh to log in to the instance console. From the instance, verify network
connectivity by using ping to reach the external gateway at 172.25.250.254. Exit the
production-web1 instance when finished.
5.1. Use the ssh command with the key pair to log in to the instance as the cloud-user
user at the floating IP address.
5.2. Test for external network access. Ping the network gateway from production-web1.
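A sketch of the connectivity check, with the floating IP left as a placeholder:

```shell
ssh -i ~/operator1-keypair1.pem cloud-user@FLOATING_IP
# From inside the instance:
ping -c 3 172.25.250.254
exit
```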
Evaluation
On workstation, run the lab deployment-review grade command to confirm the success
of this exercise.
Cleanup
On workstation, run the lab deployment-review cleanup script to clean up this
exercise.
Summary
In this chapter, you learned:
• Enterprise clouds today are built using multiple, interconnected cloud structures. The
undercloud is a provisioning and management cloud for building and managing the production
clouds. Red Hat OpenStack Platform director is the undercloud in Red Hat OpenStack Platform.
• There are three major steps in overcloud provisioning. Introspection discovers and queries
available systems to gather node capabilities. Orchestration uses templates and environment
files to configure everything about the cloud deployment. Testing is designed to validate all the
standard functionality of the components that were installed.
• Common open technologies are used in physical and virtual clouds. Intelligent Platform
Management Interface (IPMI) is the power management technology used to control nodes.
Virtual Network Computing (VNC) is the remote access technology used to access deployed
instance consoles.
• The introspection process defines the basic technical characteristics of nodes to be deployed.
Using those characteristics, overcloud deployment can automatically assign deployment roles
to specific nodes.
• The orchestration process defines the specific configuration for each node's hardware and
software. The provided default templates cover a majority of common use cases and designs.
• OpenStack includes a testing component which has hundreds of tests to verify every
component in an overcloud. Tests and configuration are completely customizable, and include
short, validation smoke tests and longer running, more comprehensive full tests.
MANAGING INTERNAL
OPENSTACK COMMUNICATION
Overview
Goal: Administer the Keystone identity service and the AMQP
messaging service.
Objectives: • Describe the user and service authentication architecture.
Objectives
After completing this section, students should be able to:
Identity
Identity encompasses authentication and authorization functions. Users are a digital
representation of a person, system, or service using other OpenStack services. Users are
authenticated before requesting services from OpenStack components. Users must be assigned a
role to participate in a project. Users may be managed using groups, introduced in Identity Service
v3, which can be assigned roles and attached to projects the same as individual users.
Projects (also referred to by the deprecated term tenant) are collections of owned
resources such as networks, images, servers, and security groups. These are structured
according to the development needs of an organization. A project can represent a customer,
account, or any organizational unit. With Identity Service v3, projects can contain sub-projects,
which inherit project role assignments and quotas from higher projects.
Resource
Resource functions manage domains, which are an Identity Service v3 entity for creating
segregated collections of users, groups and projects. Domains allow multiple organizations to
share a single OpenStack installation. Users, projects, and resources created in one domain
cannot be transferred to another domain; by design, they must be recreated. OpenStack creates
a single domain named default for a new installation. In Identity Service v2, multiple domains
are not recognized and all activities use the default domain.
Token
Token functions create, manage and validate time-limited tokens which users pass to other
OpenStack components to request service. A token is a structured enumeration of user access
rights designed to simplify the requirement that each individual OpenStack service request be
verified for sufficient user privilege. Token protocols have evolved since the early OpenStack
days, and are discussed further in this chapter.
Policy
Policy functions provide a rule-based authorization engine and an associated rule management
interface. Policy rules define the capabilities of roles. Default roles include admin, _member_,
swiftoperator, and heat_stack_user. Custom roles may be created by building policies.
Role Assignment
Role assignment functions are used to assign users to projects. Users do not belong to projects;
instead, they have a role in a project. Users may be assigned multiple roles for the same project,
and may also be assigned different roles in multiple projects.
Roles define a set of user privileges to perform specific operations on OpenStack services,
defined by policy definitions. The most commonly recognized roles are _member_, which can
perform all normal activities within a project, and admin, which adds additional permissions to
create users, projects, and other restricted resource objects.
Catalog
Catalog functions store connection information about every other OpenStack service
component, in the form of endpoints. The catalog contains multiple endpoint entries for each
service, to allow service traffic to be segregated by public, internal, and administration tasks
for traffic management and security reasons. Since OpenStack services may be redundantly
installed on multiple controller and compute nodes, the catalog contains endpoints for each.
When users authenticate and obtain a token to use when accessing services, they are, at the
same time, being given the current URL of the requested service.
Note
Red Hat OpenStack Platform supports both Identity Service v2 and v3.
Identity v3 requires the use of the new authentication environment variables
OS_IDENTITY_API_VERSION and OS_DOMAIN_NAME, and a change to the
OS_AUTH_URL for the new version's endpoint. This OpenStack System Administration II
course only uses Identity Service v2.
Each listed Identity Service function supports multiple choices of back ends, defined through
plug-ins, which can be one of the following types (not all functions support all back-end types):
• Key Value Store: A file-based or in-memory dictionary using primary key lookups.
• Structured Query Language: OpenStack uses SQLAlchemy as the default persistent data
store for most components. SQLAlchemy is a Python-based SQL toolkit.
Authentication Tokens
The Identity Service confirms a user's identity through an authentication process specified
through plug-in configuration, then provides the user with a token that represents the user's
identity. A typical user token is scoped, meaning that it lists the resources and access for which
it may be used. Tokens have a limited time frame, allowing the user to perform service requests
without further authentication until the token expires or is revoked. A scoped token lists the user
rights and privileges, as defined in roles relevant to the current project. A requested OpenStack
service checks the provided roles and requested resource access, then either allows or denies the
request.
Any user may use the openstack token issue command to request a current scoped
token, with output showing the user ID, the scoped project, and the token's expiration. Tokens
have one of three authorization scopes: unscoped, project-scoped, and
domain-scoped. Because domains are a new feature supported in the Identity Service v3, earlier
documentation may refer only to scoped and unscoped tokens, in which scope is project-based.
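For example, any authenticated user can request a token:

```shell
# Request a new scoped token; the output table lists expires, id,
# project_id, and user_id (exact fields vary by token provider).
openstack token issue
```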
Token Providers
There are four types of token providers: UUID, PKI, PKIZ, and the newest provider, Fernet
(pronounced fehr'nεt). All tokens consist of a payload, in JSON or random-generated
UUID format, contained in a transport format, such as a URL-friendly hexadecimal or
cryptographic message syntax (CMS) packaging. The default OpenStack recommended token
provider has changed a few times, as the OpenStack developers have addressed token size,
security, and performance issues.
UUID Tokens
UUID tokens were the original and default token provider up until the Folsom release. They are
32 byte randomly generated UUIDs, which must be persistently stored in the Identity Service's
configured back end to permit the Identity Service to validate the UUID each time a user makes a
service request to any service endpoint. Although UUIDs are lightweight and easy to validate with
a simple lookup, they have two disadvantages.
First, because UUID tokens must be retained by the Identity Service back end for repetitive
lookups, the storage space used grows as new tokens are generated. Until recently, expired
tokens were not regularly purged from the back-end store, leading to service performance
degradation.
Second, every individual service API call must bundle the request and token together to send to
the service component, where the service unpacks the UUID and sends a validation request to
the Identity Service. The Identity Service looks up the token's identity to determine the roles and
authorizations of the user, sending the information back to the resource service to determine if
the service component will process the user request. This generates a tremendous amount of
network traffic and activity to and from the Identity Service, which creates a scaling limitation.
The advantage of PKI tokens, because of the public key methodology, is the ability of the
requested resource service component to verify and read the payload authorizations without
needing to send the token back to the Identity Service for every request. To process request
tokens, the requested service is only required to obtain the Identity Service's signing certificate,
the current revocation list, and the CA public certificate that validates the signing certificate.
Validated and unencoded tokens and payloads can be stored and shared using memcache,
eliminating some repetitive token processing overhead.
The disadvantage of the PKI token provider method is unacceptable performance due to
oversized shared caches, increased load on the identity service back end, and other problems
associated with handling tokens with large payloads. PKI tokens take longer to create and to
validate than UUID tokens. Subsequently, UUID tokens again became the recommended token
provider. PKI/PKIZ token support was deprecated in the Mitaka release and was removed in the
Ocata release.
Fernet Tokens
Fernet tokens are an implementation of a symmetric key cryptographic authentication method,
which uses the same key to both encrypt and decrypt, designed specifically to process service
API request tokens. Fernet supports using multiple keys, always using the first key (the current
key) in the list to perform encryption and attempting other keys in the list (former keys and
about-to-become-current staged keys) to perform decryption. This technique allows Fernet keys
to be rotated regularly for increased security, while still allowing tokens created with previous
keys to be decrypted.
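The try-each-key behavior can be illustrated with a toy XOR cipher (purely pedagogical; real Fernet uses AES-CBC with an HMAC-SHA256 signature). Encryption always uses the first key in the list, while decryption tries every key in turn:

```python
import itertools

MAGIC = b"OK:"   # toy integrity marker; real Fernet uses an HMAC signature

def xor_crypt(data: bytes, key: bytes) -> bytes:
    # Toy symmetric "cipher": XOR with a repeating key. NOT real cryptography.
    return bytes(b ^ k for b, k in zip(data, itertools.cycle(key)))

def encrypt(keys, payload: bytes) -> bytes:
    # Always encrypt with the current (first) key, as the Fernet provider does.
    return xor_crypt(MAGIC + payload, keys[0])

def decrypt(keys, token: bytes) -> bytes:
    # Try the current key first, then the former and staged keys.
    for key in keys:
        plain = xor_crypt(token, key)
        if plain.startswith(MAGIC):
            return plain[len(MAGIC):]
    raise ValueError("token was not created with any known key")
```

A token created before a rotation still decrypts afterward, because the old key remains in the list as a secondary key.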
Fernet tokens do not exceed 250 bytes and are not persisted in the Identity Service back end.
Fernet token payloads use the MessagePack binary serialization format to efficiently carry the
authentication and authorization metadata, which is then encrypted and signed. Fernet tokens
do not require persistence nor do they require maintenance, as they are created and validated
instantaneously on any Identity Service node that can access the Fernet symmetric keys. The
symmetric keys are stored and shared on all Identity Service nodes in a key repository located by
default at /etc/keystone/fernet-keys/. The Fernet token provider was introduced in the
Kilo release and is the default token provider in the Ocata release. In earlier OpenStack developer
documentation, these tokens were referred to as authenticated encryption (AE) tokens.
Warning
All of these token providers (UUID, PKI, PKIZ, and Fernet) are known as bearer tokens,
which means that anyone holding the token can impersonate the user represented in
that token without having to provide any authentication credentials. Bearer tokens
must be protected from unnecessary disclosure to prevent unauthorized access.
To purge expired tokens regularly, Red Hat recommends an hourly cron job for the keystone user that runs keystone-manage token_flush:
PATH=/bin:/usr/bin:/usr/sbin SHELL=/bin/sh
@hourly keystone-manage token_flush &> /var/log/keystone/keystone-tokenflush.log
If necessary, the tokens flushed in the last hour can be viewed in the log file /var/log/
keystone/keystone-tokenflush.log. The log file does not grow in size, since the cron
job overwrites the log file each hour. When the cron job is first added, the token database will be larger than it needs to be going forward, since it is now flushed hourly. However, the database does not automatically reclaim unused space, so the token table should be truncated to relinquish the disk space currently in use.
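This reclaim behavior can be demonstrated with a small SQLite analogy (stdlib only; the course environment uses MariaDB, where a TRUNCATE TABLE statement releases the space). Deleting rows does not shrink the database file; only an explicit rewrite does:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "keystone.db")
db = sqlite3.connect(path)
db.execute("CREATE TABLE token (id INTEGER PRIMARY KEY, payload BLOB)")
db.executemany("INSERT INTO token (payload) VALUES (?)",
               [(os.urandom(1024),) for _ in range(2000)])
db.commit()
full_size = os.path.getsize(path)

db.execute("DELETE FROM token")       # like flushing expired tokens
db.commit()
after_delete = os.path.getsize(path)  # space is NOT reclaimed automatically

db.execute("VACUUM")                  # analogous to truncating/rewriting
after_vacuum = os.path.getsize(path)
db.close()
```

The same principle applies to the MariaDB token table on director: flushing tokens marks the rows as free, but the on-disk footprint only shrinks after truncation.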
The key repository at /etc/keystone/fernet-keys/ contains three types of keys:
• Primary Key: the primary key is the current key. There can be only one primary key on an Identity Service node, recognized because its file name always has the highest index number. The primary key is used to both encrypt and decrypt Fernet tokens.
• Secondary Key: a secondary key is the key that was formerly a primary key and has been
replaced (rotated out). It is only used to decrypt Fernet tokens; specifically, to decrypt any
remaining Fernet tokens that it had originally encrypted. A secondary key's file is named with
an index that is lower than the highest, but never has the index of 0.
• Staged Key: a staged key is a newly added key that will be the next primary key when the keys
are next rotated. Similar to a secondary key, it is only used to decrypt tokens, which seems
unnecessary since it has not yet been a primary key and has never encrypted tokens on this
Identity Service node. However, in a multi-node Identity Service configuration, after the key
repository has been updated with a new staged key and distributed to all Identity Service
nodes, those nodes will perform key rotation one at a time. A staged key on one node may be
needed to decrypt tokens created by another node where that key has already become the
primary key. The staged key is always recognized by having a file name with the index of 0.
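The index bookkeeping described above can be sketched as follows, with the repository modeled as a dict of index to key bytes (in practice, rotation is performed with the keystone-manage fernet_rotate command):

```python
import os

def rotate(repo: dict) -> dict:
    """One rotation step over a Fernet key repository.

    The staged key (index 0) becomes the new primary key (highest
    index); former primaries stay on as secondary keys; and a fresh
    staged key is written at index 0.
    """
    rotated = dict(repo)
    rotated[max(repo) + 1] = rotated.pop(0)   # staged -> new primary
    rotated[0] = os.urandom(32)               # brand-new staged key
    return rotated
```

After one rotation of a two-key repository, the old staged key sits at the highest index (primary), the old primary remains as a secondary, and a new staged key occupies index 0.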
When the Identity Service v2 API becomes deprecated in some future release, the last remaining
adminURL distinction, that of the end user and admin CRUD PasteDeploy pipeline routines, will
no longer be necessary and the adminURL endpoint will also be deprecated and removed.
References
Keystone tokens
https://docs.openstack.org/keystone/latest/admin/identity-tokens.html
a. Policy
b. Resource
c. Catalog
d. Token
e. User
a. Policy
b. Resource
c. Catalog
d. Token
e. User
3. Which type of token authorization describes tokens that are not attached to a project?
a. Scoped Token
b. Domain Token
c. Unscoped Token
d. PKI Token
4. Which Keystone configuration file contains role-based access policy entries that determine
which user can access which objects and how they can be accessed?
a. policy.json
b. default_catalog.templates
c. keystone-paste.ini
d. keystone-env.conf
5. Which two token providers use cryptographic message syntax (CMS)? (Choose two.)
a. Fernet
b. PKI
c. PKIZ
d. Scoped token
e. UUID
Solution
Choose the correct answer(s) to the following questions:
a. Policy
b. Resource
c. Catalog
d. Token
e. User
a. Policy
b. Resource
c. Catalog
d. Token
e. User
3. Which type of token authorization describes tokens that are not attached to a project?
a. Scoped Token
b. Domain Token
c. Unscoped Token
d. PKI Token
4. Which Keystone configuration file contains role-based access policy entries that determine
which user can access which objects and how they can be accessed?
a. policy.json
b. default_catalog.templates
c. keystone-paste.ini
d. keystone-env.conf
5. Which two token providers use cryptographic message syntax (CMS)? (Choose two.)
a. Fernet
b. PKI
c. PKIZ
d. Scoped token
e. UUID
Objective
After completing this section, students should be able to administer the service catalog.
| endpoints | regionOne |
| | publicURL: http://172.25.250.50:8774/v2.1 |
| | internalURL: http://172.24.1.50:8774/v2.1 |
| | adminURL: http://172.24.1.50:8774/v2.1 |
| | |
| name | nova |
| type | compute |
+-----------+----------------------------------------------------+
Endpoints
An endpoint is a URL that an API client uses to access a service in OpenStack. Every service has one or more endpoints. There are three types of endpoint URLs: adminURL, publicURL, and internalURL. The publicURL is intended to be consumed by end users from a public network. The internalURL is used by services to communicate with each other, typically on a network that is unmetered or free of bandwidth charges. The adminURL should be consumed only by those who require administrative access to a service endpoint.
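Choosing among the three URL types can be sketched as a small lookup over a catalog entry; the dict below mirrors the nova endpoints shown earlier in this section:

```python
def get_endpoint(catalog_entry: dict, interface: str = "publicURL") -> str:
    """Pick one of adminURL, internalURL, or publicURL from a
    service's endpoint entry for a region."""
    endpoints = catalog_entry["endpoints"]["regionOne"]
    return endpoints[interface]

# Mirrors the `openstack catalog show nova` output above.
nova = {
    "name": "nova",
    "type": "compute",
    "endpoints": {
        "regionOne": {
            "publicURL": "http://172.25.250.50:8774/v2.1",
            "internalURL": "http://172.24.1.50:8774/v2.1",
            "adminURL": "http://172.24.1.50:8774/v2.1",
        }
    },
}
```

An end-user client would select the publicURL, while inter-service traffic would select the internalURL on the internal network.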
To list the services and their endpoints, use the openstack catalog list command as the
OpenStack admin user.
To list the ID, region, service name, and service type of all the endpoints, use the openstack
endpoint list command.
Troubleshooting
A proper catalog and endpoint configuration is essential for the OpenStack environment to function effectively. Misconfigured endpoints and user authentication are common sources of problems. There is a known issue, documented in BZ-1404324, where the scheduled token flushing job is not effective enough for large deployments; the fix is reviewed in the following guided exercise. When issues do arise, the following troubleshooting steps can help investigate and resolve them:
• Ensure the authentication credentials and token are appropriate using the curl command to
retrieve the service catalog.
"roles_links": [],
"username": "admin"
},
"serviceCatalog": [
{
"name": "nova",
"type": "compute",
"endpoints_links": [],
"endpoints": [
...output omitted...
• Every service has an API log that should be inspected when troubleshooting endpoints. For
example, if an operator cannot retrieve Glance image data, an inspection of /var/log/
glance/api.log may provide useful information. Query the file for DiscoveryFailure.
• Add the --debug option to the openstack catalog show command (or to any openstack command) to view the HTTP request from the client and the responses from the endpoints. For example, the following lists the HTTP request from nova compute and the response from the endpoint.
2. Verify the token by using the curl command with the token to list projects.
3. Display the service catalog using the openstack catalog list command.
4. Display endpoints and the ID for a particular service using the openstack catalog show
command, for instance, passing the service name nova as an argument.
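The log-inspection step can be sketched in Python; the sample lines are invented for illustration, while the real file to inspect is /var/log/glance/api.log:

```python
def find_failures(log_lines, marker="DiscoveryFailure"):
    """Return the log lines mentioning a failure marker, as a triage aid."""
    return [line for line in log_lines if marker in line]

# Invented sample lines; in practice, read them from /var/log/glance/api.log.
sample = [
    "2017-10-06 12:00:01 INFO glance.api started",
    "2017-10-06 12:00:05 ERROR DiscoveryFailure: unable to contact Keystone",
]
```

The same filter applied to any service's API log quickly narrows a failure down to endpoint-discovery problems.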
References
Identity Concepts
https://docs.openstack.org/keystone/latest/admin/identity-concepts.html
In this exercise, you will view the Keystone endpoints and catalog, issue a token, and manage
token expiration.
Outcomes
You should be able to:
Steps
1. On workstation, source the Keystone admin-rc file and list the Keystone endpoints
registry. Take note of the available service names and types.
2. View the Keystone service catalog and notice the endpoint URLs (especially the IP
addresses), the version number, and the port number.
3. Issue an admin token to manually (using curl) find information about OpenStack.
4. Verify the token retrieved in the previous command. Use the curl command with the token
ID to retrieve the projects (tenants) for the admin user.
5. Use SSH to connect to director as the user root. The database, MariaDB, resides on
director and provides storage for expired tokens. Accessing MariaDB enables you to
determine the amount of space used for expired tokens.
6. Log in to MariaDB.
7. Use an SQL statement to list the tables and pay special attention to the size of the token
table.
8. Use an SQL statement to view the amount of space used for expired Keystone tokens.
9. Truncate the token table, then verify that the amount of space used for expired tokens is zero.
11. Ensure that the Keystone user has a cron job to flush tokens from the database.
Cleanup
From workstation, run the lab communication-svc-catalog cleanup script to clean up
the resources created in this exercise.
Objective
After completing this section, students should be able to manage messages and the message
broker.
RabbitMQ Overview
OpenStack software provides a collection of services covering all the functionality associated
with a private cloud solution. Those services are composed internally of different components,
allowing a flexible and scalable configuration. OpenStack services rely on two back-end services: a database for persistence and a message broker for communication among the components of each service. Any message broker supporting AMQP can be used as the back end. Red Hat includes RabbitMQ as the message broker in its OpenStack architecture, since it provides enterprise-level features useful for setting up advanced configurations.
The following table provides some common RabbitMQ terms and definitions.
Term Description
Exchange Retrieves published messages from producers and distributes them to queues
Publisher/Producer An application that publishes messages
Consumer An application that processes messages
Queue Stores messages
Routing Key Used by the exchange to determine how to route a message
Binding The link between a queue and an exchange
A message broker allows messages to be sent and received between producer and consumer applications. Internally, RabbitMQ carries out this communication using exchanges, queues, and the bindings between the two. When an application produces a message for one or more consumer applications, it places that message on an exchange to which one or more queues are bound. Consumers subscribe to those queues in order to receive messages from the producer. Routing is based on the routing key included in the transmitted message.
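The publish/bind/consume flow can be modeled with a toy in-memory direct exchange (a teaching sketch, not the RabbitMQ client API):

```python
from collections import defaultdict

class DirectExchange:
    """Toy direct exchange: a published message is delivered to every
    queue whose binding key equals the message's routing key."""

    def __init__(self):
        self.bindings = defaultdict(list)   # binding key -> bound queues

    def bind(self, queue: list, binding_key: str) -> None:
        self.bindings[binding_key].append(queue)

    def publish(self, routing_key: str, message: str) -> None:
        for queue in self.bindings.get(routing_key, []):
            queue.append(message)           # consumers later read from here

exchange = DirectExchange()
inbox = []                                  # a queue a consumer subscribes to
exchange.bind(inbox, "demo.key")
exchange.publish("demo.key", "volume.create request")
exchange.publish("other.key", "not for this queue")
```

The producer never addresses a consumer directly: it only supplies a routing key, and the exchange and bindings decide which queues receive the message.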
Exchange Overview
The exchange's interaction with a queue is based on the match between the routing key included in the message and the binding key associated with the queue on the related exchange. Depending on how these two elements are used, there are several types of exchanges in RabbitMQ.
• Direct
Consumers are subscribed to a queue with an associated binding key, and the producer sets
the routing key of the message to be the same as that of the binding key of the queue to which
the desired consumer is subscribed.
• Topic
Consumers are subscribed to a queue that has a binding key including wildcards, so producers
can send messages with different but related routing keys to that queue.
• Fanout
The message is broadcast to all the subscribed queues without regard for whether the routing
and binding keys match.
• Headers
This makes use of the header properties of the message to perform the match against the
binding arguments of the queue.
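Topic matching can be sketched as follows; in AMQP topic exchanges, routing and binding keys are dot-separated words, where * matches exactly one word and # matches zero or more words:

```python
def topic_match(binding_key: str, routing_key: str) -> bool:
    """AMQP-style topic matching: '*' matches exactly one dot-separated
    word, '#' matches zero or more words."""
    def match(pattern, words):
        if not pattern:
            return not words
        head, rest = pattern[0], pattern[1:]
        if head == "#":
            # '#' may swallow zero or more words
            return any(match(rest, words[i:]) for i in range(len(words) + 1))
        if words and (head == "*" or head == words[0]):
            return match(rest, words[1:])
        return False
    return match(binding_key.split("."), routing_key.split("."))
```

For example, a queue bound with anonymous.* receives a message routed with the key anonymous.info (as in the guided exercise below), but not one routed with anonymous.info.extra.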
Troubleshooting
OpenStack services follow a component architecture. The functionalities of a service are split
into different components, and each component communicates with other components using the
message broker. In order to troubleshoot a problem with an OpenStack service, it is important
to understand the workflow a request follows as it moves through the different components
of the service. Generally, the OpenStack service architecture provides a unique component to
make each service’s API available. The Cinder block storage service, for example, is managed by
the cinder-api service. The API component is the entry point to the rest of the component
architecture of its service. When trying to isolate a problem with a service, check its API provider
first.
After the API component has been verified, and if no errors appear in the log files, confirm
that the remaining components can communicate without issue. Any error related to the
RabbitMQ message broker, or its configuration in the related service configuration file, should
appear in the log files of the service. For the Cinder block storage service, after cinder-api has processed a request through the Cinder API, the request is handled by the cinder-volume and cinder-scheduler processes. These components communicate among themselves using the RabbitMQ message broker to create the volume on the most suitable storage back end. Cinder block storage service components (cinder-scheduler, for example) do not function correctly with a RabbitMQ back end that is broken or crashes unexpectedly. Debug the issue by checking the component-related logs, such as /var/log/cinder/scheduler.log, then check for problems with the component as a client of the RabbitMQ message broker.
RabbitMQ Utilities
RabbitMQ provides a suite of utilities to check the RabbitMQ daemon status and to execute
administrative operations. These tools are used to check the different configurable elements
on a RabbitMQ instance, including the queues used by the producers and consumers to share
messages, the exchanges to which those queues are connected, and the bindings among the
components. The following table describes the RabbitMQ utility commands.
Utility Description
rabbitmqctl command line tool for managing a RabbitMQ broker
rabbitmqadmin provided by the management plugin, used to perform the same actions as
the web-based UI, and can be used with scripting
• Use the report command to show a summary of the current status of the RabbitMQ daemon,
including the number and types of exchanges and queues.
• Use the add_user command to create RabbitMQ users. For example, to create a RabbitMQ
user named demo with redhat as the password, use the following command:
• Use the set_permissions command to set the authorization for a RabbitMQ user. This
option sets the configure, write, and read permissions that correspond to the three wildcards
used in the command, respectively. For example, to set configure, write, and read permissions
for the RabbitMQ user demo, use the following command:
• Use the set_user_tags command to enable authorization for the management back end. For
example, to assign the RabbitMQ user demo administrator access, use the following command.
• Use the list_exchanges command with rabbitmqctl to show the default configured
exchanges on the RabbitMQ daemon.
• Use the list_queues command to list the available queues and their attributes.
• Use the list_consumers command to list all the consumers and the queues to which they
are subscribed.
• Use the declare queue command to create a queue. For example, to create a new queue named demo.queue, use the following command:
• Use the declare exchange command to create an exchange. For example, to create a topic
exchange named demo.topic, use the following command:
• Use the publish command to publish a message to a queue. For example, to publish the
message 'demo message!' to the demo.queue queue, execute the command, type the
message, then press Ctrl+D to publish the message.
• Use the get command to display a message for a queue. For example, to display the message
published to the queue demo.queue use the following command:
3. Set the user tag to administrator or guest, using the rabbitmqctl set_user_tags
command.
References
Management CLI
https://www.rabbitmq.com/management-cli.html
Management Plugins
https://www.rabbitmq.com/management.html
Troubleshooting
https://www.rabbitmq.com/troubleshooting.html
In this exercise, you will enable the RabbitMQ Management Plugin to create an exchange and
queue, publish a message, and retrieve it.
Resources
Files: http://material.example.com/cl210_producer, http://
material.example.com/cl210_consumer
Outcomes
You should be able to:
Steps
1. From workstation, use SSH to connect to director as the stack user. Use sudo to become
the root user.
3. Configure permissions for the rabbitmqauth user. Use wildcard syntax to assign all
resources to each of the three permissions for configure, write, and read.
10. On workstation, open a second terminal. Using SSH, log in as the stack user
to director. Switch to the root user. Launch the cl210_consumer script using
anonymous.info as the routing key.
11. In the first terminal, launch the cl210_producer script to send messages using the routing
key anonymous.info.
12. In the second terminal, the sent messages are received and displayed. Running the cl210_producer script multiple times sends multiple messages.
Exit this cl210_consumer terminal after observing the message(s) being received. You are
finished with the example publisher-consumer exchange scripts.
13. The next practice is to observe a message queue. Create a queue named redhat.queue.
14. Verify that the queue is created. The message count is zero.
15. Publish messages to the redhat.queue queue. These first two examples include the
message payload on the command line.
16. Publish a third message to the redhat.queue queue, but without using the payload parameter. When the command is executed without a payload, rabbitmqadmin waits for multi-line input. Press Ctrl+D at the start of a new line to end message entry and publish the message.
17. Verify that the redhat.queue queue has an increased message count.
18. Display the first message in the queue. The message_count field indicates how many more
messages exist after this one.
19. Display multiple messages using the count option. Each displayed message indicates how
many more messages follow. The redelivered field indicates whether you have previously
viewed this specific message.
20. When finished, delete the queue named redhat.queue. Return to workstation.
Cleanup
From workstation, run lab communication-msg-brokering cleanup to clean up
resources created for this exercise.
In this lab, you will troubleshoot and fix issues with the Keystone identity service and the
RabbitMQ message broker.
Outcomes
You should be able to:
Scenario
During a recent deployment of the overcloud, cloud administrators are reporting issues with the Compute and Image services: they are not able to access either the Image service or the Compute service API. You have been tasked with troubleshooting and fixing these issues.
On workstation, run the lab communication-review setup command. This ensures that
the OpenStack services are running and the environment has been properly configured for this
lab.
Steps
1. From workstation, verify the issue by attempting to list instances as the OpenStack
admin user. The command is expected to hang.
4. Investigate and fix the issue based on the error discovered in the log. Modify the incorrect RabbitMQ port value in /etc/rabbitmq/rabbitmq-env.conf and use the HUP signal to respawn the beam.smp process. Log out of the controller0 node when finished.
5. From workstation, again attempt to list instances to verify that the issue is fixed. This command is expected to display instances or return to a command prompt without hanging.
6. Next, attempt to list images as well. The command is expected to fail, returning an internal
server error.
9. The error in the Image service log indicates a communication issue with the Image service API and the Identity service. In a previous step, you verified that the Identity service could communicate with the Compute service API, so the next logical step is to focus on the Image service configuration. Investigate and fix the issue based on the traceback found in the Image service log.
10. From workstation, again attempt to list images to verify the fix. This command should succeed, returning a command prompt without error.
Cleanup
From workstation, run the lab communication-review cleanup script to clean up the
resources created in this exercise.
Solution
In this lab, you will troubleshoot and fix issues with the Keystone identity service and the
RabbitMQ message broker.
Outcomes
You should be able to:
Scenario
During a recent deployment of the overcloud, cloud administrators are reporting issues with the Compute and Image services: they are not able to access either the Image service or the Compute service API. You have been tasked with troubleshooting and fixing these issues.
On workstation, run the lab communication-review setup command. This ensures that
the OpenStack services are running and the environment has been properly configured for this
lab.
Steps
1. From workstation, verify the issue by attempting to list instances as the OpenStack
admin user. The command is expected to hang.
1.1. From workstation, source the admin-rc credential file. Attempt to list any running
instances. The command is expected to hang, and does not return to the command
prompt. Use Ctrl+C to escape the command.
2.1. From workstation, use SSH to connect to controller0 as the heat-admin user.
4. Investigate and fix the issue based on the error discovered in the log. Modify the incorrect RabbitMQ port value in /etc/rabbitmq/rabbitmq-env.conf and use the HUP signal to respawn the beam.smp process. Log out of the controller0 node when finished.
4.2. List the process ID for the beam.smp process. The beam.smp process is the application
virtual machine that interprets the Erlang language bytecode in which RabbitMQ works.
By locating and restarting this process, RabbitMQ reloads the fixed configuration.
4.3. Restart beam.smp by sending a hangup signal to the retrieved process ID.
4.4. List the beam.smp process ID to verify the tcp_listeners port is now 5672.
5. From workstation, again attempt to list instances to verify that the issue is fixed. This command is expected to display instances or return to a command prompt without hanging.
6. Next, attempt to list images as well. The command is expected to fail, returning an internal
server error.
7.1. From workstation, use SSH to connect to controller0 as the heat-admin user.
9. The error in the Image service log indicates a communication issue with the Image service
API and the Identity service. In a previous step, you verified that the Identity service could
communicate with the Compute service API, so the next logical step is to focus on the Image
service configuration. Investigate and fix the issue based on the traceback found in the
Image service log.
9.1. First, view the endpoint URL for the Identity service.
9.4. Restart the openstack-glance-api service. When finished, exit from controller0.
10. From workstation, again attempt to list images to verify the fix. This command should succeed, returning a command prompt without error.
10.1. From workstation, attempt to list images. This command should succeed, returning a command prompt without error.
Cleanup
From workstation, run the lab communication-review cleanup script to clean up the
resources created in this exercise.
Summary
In this chapter, you learned:
• RabbitMQ provides a suite of utilities to check the RabbitMQ daemon status and to execute
administrative operations on it.
• Red Hat OpenStack Platform recommends creating a cron job that runs hourly to purge
expired Keystone tokens.
• The Keystone endpoint adminURL should only be consumed by those who require
administrative access to a service endpoint.
• PKIZ tokens add compression using zlib, making them smaller than PKI tokens.
• Fernet tokens have a maximum limit of 250 bytes, which makes them small enough to be ideal
for API calls and minimize the data kept on disk.
Overview
Goal Build and customize images
Objectives • Describe common image formats for OpenStack.
Objective
After completing this section, students should be able to describe the common image formats
used within Red Hat OpenStack Platform.
Red Hat OpenStack Platform supports many virtual disk image formats, including RAW, QCOW2,
AMI, VHD, and VMDK. In this chapter we will discuss the RAW and QCOW2 formats, their features,
and their use in Red Hat OpenStack Platform.
The RAW format is a bootable, uncompressed virtual disk image, whereas the QCOW2 format
is more complex and supports many features. File systems that support sparse files allow RAW
images to be only the size of the used data. This means that a RAW image of a 20 GiB disk may
only be 3 GiB in size. The attributes of both are compared in the following table.
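Sparseness is easy to demonstrate directly (scaled down to 20 MiB here; the behavior assumes a sparse-capable Linux file system, where st_blocks counts 512-byte blocks actually allocated):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "disk.raw")
with open(path, "wb") as f:
    f.truncate(20 * 1024 * 1024)   # 20 MiB apparent size, no data written

st = os.stat(path)
apparent = st.st_size              # what `ls -l` reports
allocated = st.st_blocks * 512     # what the image actually occupies on disk
```

The apparent size is the full virtual-disk size, while the allocated size only grows as data is written, which is why a 20 GiB RAW image can occupy just a few GiB.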
When choosing between improved VM performance and reduced storage consumption, reduced
storage consumption is usually preferred. The performance difference between RAW and QCOW2
images is not great enough to outweigh the cost of allocated but underused storage.
References
Further information is available in the documentation for Red Hat OpenStack Platform
at
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform
1. What is the correct image format when using Ceph as the back end for the OpenStack Image
service?
a. QCOW2
b. VHD
c. VMDK
d. RAW
2. Which four image formats are supported by Red Hat OpenStack Platform? (Choose four.)
a. VMDK
b. VBOX
c. VHD
d. QCOW2
e. RAW
3. Which three features are part of the QCOW2 format? (Choose three.)
a. Encryption
b. DFRWS support
c. Snapshots
d. Multi-bit error correction
e. Copy-on-write
Solution
Choose the correct answers to the following questions:
1. What is the correct image format when using Ceph as the back end for the OpenStack Image
service?
a. QCOW2
b. VHD
c. VMDK
d. RAW
2. Which four image formats are supported by Red Hat OpenStack Platform? (Choose four.)
a. VMDK
b. VBOX
c. VHD
d. QCOW2
e. RAW
3. Which three features are part of the QCOW2 format? (Choose three.)
a. Encryption
b. DFRWS support
c. Snapshots
d. Multi-bit error correction
e. Copy-on-write
Building an Image
Objective
After completing this section, students should be able to build an image using diskimage-
builder.
diskimage-builder is a tool for building and customizing cloud images. It can output virtual disk
images in a variety of formats, such as QCOW2 and RAW. Elements are applied by diskimage-
builder during the build process to customize the image. An element is a code set that runs
within a chroot environment and alters how an image is built. For example, the docker element exports a tar file from a named container, allowing other elements to build on top of it, and the bootloader element installs grub2 on the boot partition of the system.
Diskimage-builder Architecture
diskimage-builder bind mounts /proc, /sys, and /dev in a chroot environment. The image-
building process produces minimal systems that possess all the required bits to fulfill their
purpose with OpenStack. Images can be as simple as a file system image or can be customized
to provide whole disk images. Upon completion of the file system tree, a loopback device with a file system (or partition table and file system) is built, and the file system tree is copied into it.
Diskimage-builder Elements
Elements are used to specify what goes into the image and any modifications that are desired.
Images are required to use at least one base distribution element, and there are multiple
elements for a given distribution. For example, the distribution element could be rhel7, and
then other elements are used to modify the rhel7 base image. Scripts are invoked and applied
to the image based on multiple elements.
Each element has scripts that are applied to the images as they are built. The following example
shows the scripts for the base element.
6 directories, 15 files
Phase Subdirectories
Phase Subdirectory Description
root.d Builds or modifies the initial root file system content. This
is where customizations are added, such as building on an
existing image. Only one element can use this at a time unless
particular care is taken not to overwrite, but instead to adapt
the context extracted by other elements.
extra-data.d Include extra data from the host environment that hooks may
need when building the image. This copies any data such as
SSH keys, or HTTP proxy settings, under $TMP_HOOKS_PATH.
pre-install.d Prior to any customization or package installation, this code
runs in a chroot environment.
install.d In this phase the operating system and packages are installed; this code runs in a chroot environment.
post-install.d This is the recommended phase to use for performing
tasks that must be handled after the operating system and
application installation, but before the first boot of the image.
For example, running systemctl enable to enable required
services.
block-device.d Customize the block device, for example, to make partitions.
Runs before the cleanup.d phase runs, but after the target
tree is fully populated.
finalize.d Runs in a chroot environment upon completion of the root
file system content being copied to the mounted file system.
Tuning of the root file system is performed in this phase, so
it is important to limit the operations to only those necessary
to affect the file system metadata and image itself. post-
install.d is preferred for most operations.
cleanup.d The root file system content is cleaned of temporary files.
Important
Yum repository configuration files specified by DIB_YUM_REPO_CONF are copied into
/etc/yum.repos.d during the image build and removed when the build is done.
The intention is to provide access to the specified yum repository only during the
build, not to leave that repository access in the final image. However, this removal
behavior can have an unintended result: if a yum repository configuration file
specified in DIB_YUM_REPO_CONF matches a configuration file that already exists in
the starting base image, that configuration file is removed from the final image at
the end of the build. Be sure to check for existing repository configuration and
exclude it from DIB_YUM_REPO_CONF if it should remain in the final built image.
Diskimage-builder Options
We will examine some of the options available in the context of the following example:
The vm element provides sane defaults for virtual machine disk images. The next option is the
distribution; the rhel7 option is provided to specify that the image will be Red Hat Enterprise
Linux 7. The -n option skips the default inclusion of the base element, which might be desirable
if you prefer not to have cloud-init and package updates installed. The -p option specifies which
packages to install; here we are installing the python-django-compressor package. The -a option
specifies the architecture of the image. The -o option specifies the output image name.
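The example command itself is not reproduced in this excerpt. A reconstruction based on the options described above is sketched below; the architecture value (amd64) and the output name (web-image) are assumptions:

```shell
# Reconstructed sketch of the disk-image-create invocation described above.
# The -a value (amd64) and -o name (web-image) are assumptions.
build_cmd='disk-image-create vm rhel7 -n -p python-django-compressor -a amd64 -o web-image'

# Run the build only where diskimage-builder is actually installed.
if command -v disk-image-create >/dev/null 2>&1; then
    $build_cmd
fi
```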
Diskimage-builder Execution
Each element contains a set of scripts to execute. In the following excerpt from the diskimage-
build.log file, we see the scripts that were executed as part of the root phase.
Target: root.d
Script Seconds
--------------------------------------- ----------
01-ccache 0.017
10-rhel7-cloud-image 93.202
50-yum-cache 0.045
90-base-dib-run-parts 0.037
The run time for each script is shown on the right. Scripts that reside in the extra-data.d
phase subdirectory were then executed:
Target: extra-data.d
Script Seconds
--------------------------------------- ----------
01-inject-ramdisk-build-files 0.031
10-create-pkg-map-dir 0.114
20-manifest-dir 0.021
50-add-targetcli-module 0.038
50-store-build-settings 0.006
75-inject-element-manifest 0.040
98-source-repositories 0.041
99-enable-install-types 0.023
99-squash-package-install 0.221
99-yum-repo-conf 0.039
From these examples, you can confirm the order that the phases were executed in and the order
of script execution in each phase.
To avoid a proliferation of images, you can add customization that is common across
the organization to images, and then perform more granular customization with
cloud-init. If only a small variety of system types is required, it might be simpler
to perform all customization using diskimage-builder.
Building an Image
The following steps outline the process for building an image with diskimage-builder.
3. Add a script to perform the desired customization under the working copy of the relevant
element phase directory.
5. Build the image using the disk-image-create command and appropriate options.
9. Connect to the instance using SSH and verify the customization was executed.
References
Diskimage-builder Documentation
https://docs.openstack.org/diskimage-builder/latest/
In this exercise you will build and customize a disk image using diskimage-builder.
Resources
Base Image: http://materials.example.com/osp-small.qcow2
Working Copy of diskimage-builder Elements: /home/student/elements
Outcomes
You should be able to:
Steps
1. From workstation, retrieve the osp-small.qcow2 image from http://
materials.example.com/osp-small.qcow2 and save it under /home/student/.
2. Create a copy of the diskimage-builder elements directory to work with under /home/
student/.
3. Create a post-install.d directory under the working copy of the rhel7 element.
4. Add three scripts under the rhel7 element post-install.d directory to enable the
vsftpd service, add vsftpd:ALL to /etc/hosts.allow, and disable anonymous FTP in
/etc/vsftpd/vsftpd.conf.
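One way to write these three scripts is sketched below; the script names are assumptions (diskimage-builder runs hook scripts in lexical order), and a temporary directory stands in for /home/student/elements:

```shell
set -e
ELEMENTS=$(mktemp -d)                 # stand-in for /home/student/elements
HOOKS="$ELEMENTS/rhel7/post-install.d"
mkdir -p "$HOOKS"

# Enable the vsftpd service at boot.
cat > "$HOOKS/01-enable-vsftpd" <<'EOF'
#!/bin/bash
systemctl enable vsftpd
EOF

# Allow vsftpd connections through TCP wrappers.
cat > "$HOOKS/02-hosts-allow" <<'EOF'
#!/bin/bash
echo 'vsftpd:ALL' >> /etc/hosts.allow
EOF

# Disable anonymous FTP logins.
cat > "$HOOKS/03-no-anon-ftp" <<'EOF'
#!/bin/bash
sed -i 's/^anonymous_enable=YES/anonymous_enable=NO/' /etc/vsftpd/vsftpd.conf
EOF

chmod +x "$HOOKS"/*
```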
5. Return to the student home directory. Set the executable permission on the scripts.
[student@workstation post-install.d]$ cd
[student@workstation ~]$ chmod +x /home/student/elements/rhel7/post-install.d/*
Environment Variables
Variable Content
NODE_DIST rhel7
DIB_LOCAL_IMAGE /home/student/osp-small.qcow2
DIB_YUM_REPO_CONF /etc/yum.repos.d/openstack.repo
ELEMENTS_PATH /home/student/elements
7. Build the finance-rhel-ftp.qcow2 image and include the vsftpd package. The scripts
created earlier are automatically integrated.
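With the environment variables from the table above exported, the build step might look like the following sketch; the exact disk-image-create option set is an assumption based on the step description:

```shell
# Variables from the Environment Variables table.
export NODE_DIST=rhel7
export DIB_LOCAL_IMAGE=/home/student/osp-small.qcow2
export DIB_YUM_REPO_CONF=/etc/yum.repos.d/openstack.repo
export ELEMENTS_PATH=/home/student/elements

# Build the image and include the vsftpd package; guarded so it only
# runs where diskimage-builder is installed (options are an assumption).
if command -v disk-image-create >/dev/null 2>&1; then
    disk-image-create vm rhel7 -p vsftpd -o finance-rhel-ftp.qcow2
fi
```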
Instance Attributes
Attribute Value
flavor m1.web
key pair developer1-keypair1
network finance-network1
image finance-rhel-ftp
security group finance-ftp
name finance-ftp1
10. List the available floating IP addresses, then allocate one to finance-ftp1.
11. If the image build was successful, the resulting FTP server displays a banner and
requests login credentials. If the following ftp command does not prompt for login
credentials, troubleshoot the image build or deployment.
Attempt to log in to the finance-ftp1 instance as student using the ftp command. Look
for the 220 (vsFTPd 3.0.2) message indicating a server response. After logging in,
exit at the ftp prompt.
Cleanup
From workstation, run the lab customization-img-building cleanup command to
clean up this exercise.
Customizing an Image
Objectives
After completing this section, students should be able to customize an image using guestfish
and virt-customize.
The --selinux-relabel customization option relabels files in the guest so that they
have the correct SELinux label. This option tries to relabel files immediately. If unsuccessful,
/.autorelabel is created on the image. This schedules the relabel operation for the next time
the image boots.
Use Cases
For most common image customization tasks, virt-customize is the best choice. However,
as listed in the table above, the less frequent low-level tasks should be performed with the
guestfish command.
Important
When working with images that have SELinux enabled, ensure that the correct SELinux
relabeling syntax is used to reset proper labels on modified files. Incorrectly
labeled files will cause SELinux access denials. If the mislabeled files are critical
system files, the image may not boot until the labeling is fixed.
The following steps outline the process for customizing an image with guestfish.
2. Execute the guestfish command. Use -i to automatically mount the partitions and use -
a to add the image.
3. Perform the changes you require, using commands such as add, rm, and command.
Important
If your image will have SELinux enabled, ensure you relabel any affected files
using the selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts /
command.
8. Connect to the instance using SSH and verify the customization was executed.
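The steps above can be sketched as a single non-interactive guestfish run; the image name and the file written are assumptions, and the selinux-relabel command follows the syntax shown in the Important note:

```shell
# Guestfish commands for the session (the image name and the /etc/motd
# edit are assumptions; the relabel syntax is from the note above).
fish_script='
write /etc/motd "Customized with guestfish\n"
selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts /
'

# Add the image (-a), mount its partitions automatically (-i), and run
# the edits; guarded so this only executes where guestfish is installed.
if command -v guestfish >/dev/null 2>&1; then
    guestfish -a finance-rhel-db.qcow2 -i <<EOF
$fish_script
EOF
fi
```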
The following steps outline the process for customizing an image with virt-customize.
2. Execute the virt-customize command. Use -a to add the image, and then use other
options such as --run-command, --install, --write, and --root-password.
Important
If your image will have SELinux enabled, ensure you use the --selinux-relabel
option last. Running the restorecon command inside the image will not work
through virt-customize.
6. Connect to the instance using SSH and verify the customization was executed.
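An equivalent virt-customize invocation might look like the following sketch; the image name, package, and password are assumptions:

```shell
# Options for virt-customize; --selinux-relabel must come last so that
# files written by the earlier options get correct labels. The package,
# command, and password values are assumptions for illustration.
vc_opts=(--install mariadb-server
         --run-command 'systemctl enable mariadb'
         --root-password password:redhat
         --selinux-relabel)

# Guarded so this only executes where libguestfs tools are installed.
if command -v virt-customize >/dev/null 2>&1; then
    virt-customize -a finance-rhel-db.qcow2 "${vc_opts[@]}"
fi
```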
References
guestfish - the guest file system shell
http://libguestfs.org/guestfish.1.html
In this exercise you will customize disk images using guestfish and virt-customize.
Resources
Base Image: http://materials.example.com/osp-small.qcow2
Outcomes
You should be able to:
Steps
1. From workstation, retrieve the osp-small.qcow2 image from http://
materials.example.com/osp-small.qcow2 and save it as /home/student/
finance-rhel-db.qcow2.
2. Using the guestfish command, open the image for editing and include network access.
><fs>
Dependency Installed:
libaio.x86_64 0:0.3.109-13.el7
perl-Compress-Raw-Bzip2.x86_64 0:2.061-3.el7
perl-Compress-Raw-Zlib.x86_64 1:2.061-4.el7
perl-DBD-MySQL.x86_64 0:4.023-5.el7
perl-DBI.x86_64 0:1.627-4.el7
perl-Data-Dumper.x86_64 0:2.145-3.el7
perl-IO-Compress.noarch 0:2.061-2.el7
perl-Net-Daemon.noarch 0:0.48-5.el7
perl-PlRPC.noarch 0:0.2020-14.el7
Complete!
5. Because the previous command produced no output, verify that the mariadb service was enabled.
6. Ensure the SELinux contexts for all affected files are correct.
Important
Files modified from inside the guestfish tool are written without a valid SELinux
context. Failure to relabel critical modified files, such as /etc/passwd, results
in an unusable image, because SELinux correctly denies access to files with
improper contexts during the boot process.
><fs> exit
[student@workstation ~]$
Instance Attributes
Attribute Value
flavor m1.database
key pair developer1-keypair1
network finance-network1
image finance-rhel-db
security group finance-db
name finance-db1
10. List the available floating IP addresses, and then allocate one to finance-db1.
10.1. List the floating IPs; unallocated IPs have None listed as their Port value.
11. Use ssh to connect to the finance-db1 instance. Ensure the mariadb-server package is
installed, and that the mariadb service is enabled and running.
11.3. Confirm that the mariadb service is enabled and running, and then log out.
Instance Attributes
Attribute Value
flavor m1.web
key pair developer1-keypair1
network finance-network1
image finance-rhel-mail
security group finance-mail
name finance-mail1
16. List the available floating IP addresses, and allocate one to finance-mail1.
17. Use ssh to connect to the finance-mail1 instance. Ensure the postfix service is
running, that postfix is listening on all interfaces, and that the relay_host directive is
correct.
17.6. Return to workstation. Use the mail command to confirm that the test email arrived.
Cleanup
From workstation, run the lab customization-img-customizing cleanup command
to clean up this exercise.
In this lab, you will build a disk image using diskimage-builder, and then modify it using
guestfish.
Resources
Base Image URL: http://materials.example.com/osp-small.qcow2
Diskimage-builder Elements Directory: /usr/share/diskimage-builder/elements
Outcomes
You will be able to:
On workstation, run the lab customization-review setup command. This ensures that
the required packages are installed on workstation, and provisions the environment with a public
network, a private network, a key pair, and security rules to access the instance.
Steps
1. From workstation, retrieve the osp-small.qcow2 image from http://
materials.example.com/osp-small.qcow2 and save it in the /home/student/
directory.
2. Create a copy of the diskimage-builder elements directory to work with in the /home/
student/ directory.
3. Create a post-install.d directory under the working copy of the rhel7 element.
4. Add a script under the rhel7/post-install.d directory to enable the httpd service.
Environment Variables
Variable Content
NODE_DIST rhel7
DIB_LOCAL_IMAGE /home/student/osp-small.qcow2
DIB_YUM_REPO_CONF "/etc/yum.repos.d/openstack.repo"
Instance Attributes
Attribute Value
flavor m1.web
key pair operator1-keypair1
network production-network1
image production-rhel-web
security group production-web
name production-web1
10. List the available floating IP addresses, and then allocate one to production-web1.
12. From workstation, confirm that the custom web page, displayed from
production-web1, contains the text production-rhel-web.
Evaluation
From workstation, run the lab customization-review grade command to confirm the
success of this exercise. Correct any reported failures and rerun the command until successful.
Cleanup
From workstation, run the lab customization-review cleanup command to clean up
this exercise.
Solution
In this lab, you will build a disk image using diskimage-builder, and then modify it using
guestfish.
Resources
Base Image URL: http://materials.example.com/osp-small.qcow2
Diskimage-builder Elements Directory: /usr/share/diskimage-builder/elements
Outcomes
You will be able to:
On workstation, run the lab customization-review setup command. This ensures that
the required packages are installed on workstation, and provisions the environment with a public
network, a private network, a key pair, and security rules to access the instance.
Steps
1. From workstation, retrieve the osp-small.qcow2 image from http://
materials.example.com/osp-small.qcow2 and save it in the /home/student/
directory.
2. Create a copy of the diskimage-builder elements directory to work with in the /home/
student/ directory.
3. Create a post-install.d directory under the working copy of the rhel7 element.
4. Add a script under the rhel7/post-install.d directory to enable the httpd service.
[student@workstation post-install.d]$ cd
[student@workstation ~]$
Environment Variables
Variable Content
NODE_DIST rhel7
DIB_LOCAL_IMAGE /home/student/osp-small.qcow2
DIB_YUM_REPO_CONF "/etc/yum.repos.d/openstack.repo"
ELEMENTS_PATH /home/student/elements
7.3. Edit the /var/www/html/index.html file and include the required key words.
7.4. To ensure the new index page works with SELinux in enforcing mode, restore the /var/
www/ directory context (including the index.html file).
><fs> exit
[student@workstation ~]$
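The relabel in step 7.4 can be sketched as a standalone guestfish run; the image file name is an assumption, and the specfile path follows the selinux-relabel syntax shown earlier in the chapter:

```shell
# Relabel /var/www (including index.html) so the new index page works
# with SELinux in enforcing mode; the image name is an assumption.
relabel_cmd='selinux-relabel /etc/selinux/targeted/contexts/files/file_contexts /var/www'

# Guarded so the sketch only executes where guestfish is installed.
if command -v guestfish >/dev/null 2>&1; then
    guestfish -a production-rhel-web.qcow2 -i <<EOF
$relabel_cmd
EOF
fi
```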
Instance Attributes
Attribute Value
flavor m1.web
key pair operator1-keypair1
network production-network1
image production-rhel-web
security group production-web
name production-web1
10. List the available floating IP addresses, and then allocate one to production-web1.
10.1. List the floating IPs. Available IP addresses have the Port attribute set to None.
man:apachectl(8)
Main PID: 833 (httpd)
...output omitted...
12. From workstation, confirm that the custom web page, displayed from
production-web1, contains the text production-rhel-web.
Evaluation
From workstation, run the lab customization-review grade command to confirm the
success of this exercise. Correct any reported failures and rerun the command until successful.
Cleanup
From workstation, run the lab customization-review cleanup command to clean up
this exercise.
Summary
In this chapter, you learned:
• The pros and cons of building an image versus customizing an existing one, such as meeting
organization security standards, including third-party agents, and adding operator accounts.
• When to use the guestfish or virt-customize tools. Use guestfish when you need
to perform low-level tasks such as partitioning disks. Use virt-customize for all common
customization tasks such as setting passwords and installing packages.
• Making changes to an image using these tools affects SELinux file contexts, because SELinux
is not supported directly in the chroot environment.
MANAGING STORAGE
Overview
Goal Manage Ceph and Swift storage for OpenStack.
Objectives • Describe back-end storage options for OpenStack
services.
Objectives
After completing this section, students should be able to describe back-end storage options for
OpenStack services.
In a physical enterprise environment, servers are often installed with local storage
drives attached, and use external storage to scale that local storage. The same is
true of a cloud-based instance: the instance has some associated local storage, and
can use external storage to scale it. In cloud environments, storage is a key
resource that must be managed appropriately so that the maximum number of users can
take advantage of it. Local storage for instances resides on the compute nodes where
those instances run, and Red Hat OpenStack Platform reclaims this local storage when
an instance terminates. This type of storage is known as ephemeral storage, and it
includes both the effective storage space a user can use inside an instance and the
storage the instance uses for swap memory. All ephemeral storage resources are
removed when the instance terminates.
The disk drive space of the physical servers on which instances run limits the available local
storage. To scale the storage of an instance, Red Hat OpenStack Platform provisions additional
space with the OpenStack block storage service, object storage service, or file share service. The
storage resources provided by those services are persistent, so they remain after the instance
terminates.
Although ephemeral storage usually provides better performance, sometimes users need to store
data persistently. Red Hat OpenStack Platform services provide persistent storage in the form
of block storage and object storage. The block storage service allows storing data on a device
available in the instance file system. The object storage service provides an external storage
infrastructure available to instances.
Red Hat OpenStack Platform supports several storage systems as back ends for its
services. These storage systems include:
LVM
The block storage service supports LVM as a storage back end. LVM is available but not officially
supported by Red Hat. An LVM-based back end requires a volume group. Each block storage
volume uses a logical volume as its back end.
NFS
Red Hat OpenStack Platform services such as the block storage service support NFS as a storage
back end. Each volume back end resides in the NFS shares specified in the driver options in the
block storage service configuration file.
Vendor-specific Storage
Supported storage hardware vendors provide drivers that allow Red Hat OpenStack
Platform services to use their storage infrastructure as a back end.
Note
Red Hat provides support for Red Hat Ceph Storage and NFS.
LVM is suitable for use in test environments. The storage volumes are created on the local
storage of the machine where the block storage service is running. This back end uses that
machine as an iSCSI target to export those storage volumes. This configuration is a bottleneck
when scaling up the environment.
Red Hat Ceph Storage is a separate infrastructure from Red Hat OpenStack Platform. This
storage system provides fault tolerance and scalability. Red Hat Ceph Storage is not the best
choice for some proof-of-concept environments, because of its hardware requirements. The
undercloud can collocate some Red Hat Ceph Storage services in the controller node. This
configuration reduces the number of resources needed.
Because of the growing demand for computing and storage resources, the undercloud now
supports hyper-converged infrastructures (HCI). These infrastructures use compute
nodes where both Red Hat OpenStack Platform and Red Hat Ceph Storage services run.
Hyper-converged nodes improve the utilization of the underlying hardware resources.
Note
The Red Hat OpenStack Platform block storage and image services support Red Hat
Ceph Storage as their storage back end.
Swift Architecture
The Red Hat OpenStack Platform Swift service architecture has a front-end service, the proxy
server (swift-proxy), and three back-end services: account server (swift-account); object
server (swift-object); and container server (swift-container). The proxy server maintains
the Swift API. Red Hat OpenStack Platform configures the Keystone endpoint for Swift with the
URI for this API.
References
Further information is available in the Storage Guide for Red Hat OpenStack Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform
1. Red Hat provides support for which two storage back ends? (Choose two.)
a. In-memory
b. NFS
c. Red Hat Ceph Storage
d. Raw devices
e. LVM
2. Which two benefits are provided by a Red Hat Ceph Storage-based back end over NFS?
(Choose two.)
a. Snapshots
b. No single point of failure
c. Petabyte-scale storage
d. Thin provisioning
e. Integration with Red Hat OpenStack Platform
a. Production-ready environments
b. Cluster environments
c. Proof of concept environments
d. High performance environments (local storage based)
4. Which method uses the Red Hat OpenStack Platform block storage service to access Ceph?
a. CephFS
b. Ceph Gateway (RADOSGW)
c. RBD
d. Ceph native API (librados)
5. Which two Red Hat OpenStack Platform services are supported to use Red Hat Ceph Storage
as its back end? (Choose two.)
Solution
Choose the correct answers to the following questions:
1. Red Hat provides support for which two storage back ends? (Choose two.)
a. In-memory
b. NFS
c. Red Hat Ceph Storage
d. Raw devices
e. LVM
2. Which two benefits are provided by a Red Hat Ceph Storage-based back end over NFS?
(Choose two.)
a. Snapshots
b. No single point of failure
c. Petabyte-scale storage
d. Thin provisioning
e. Integration with Red Hat OpenStack Platform
a. Production-ready environments
b. Cluster environments
c. Proof of concept environments
d. High performance environments (local storage based)
4. Which method uses the Red Hat OpenStack Platform block storage service to access Ceph?
a. CephFS
b. Ceph Gateway (RADOSGW)
c. RBD
d. Ceph native API (librados)
5. Which two Red Hat OpenStack Platform services are supported to use Red Hat Ceph Storage
as its back end? (Choose two.)
Objectives
After completing this section, students should be able to configure Ceph as the back-end storage
for OpenStack services.
The Ceph architecture is based on the daemons listed in Figure 4.2: Red Hat Ceph
storage architecture. Multiple OSDs can run on a single server, and OSDs can also be
distributed across many servers. These daemons can be scaled out to meet the
requirements of the architecture being deployed.
Ceph Monitors
Ceph monitors (MONs) are daemons that maintain a master copy of the cluster map. The cluster
map is a collection of five maps that contain information about the Ceph cluster state and
configuration. Ceph daemons and clients can check in periodically with the monitors to be sure
they have the most recent copy of the map. In this way they provide consensus for distributed
decision making. The monitors must establish a consensus regarding the state of the cluster.
This means that an odd number of monitors is required to avoid a stalled vote, and a minimum
of three monitors must be configured. For the Ceph Storage cluster to be operational and
accessible, more than 50% of monitors must be running and operational. If the number of active
monitors falls below this threshold, the complete Ceph Storage cluster will become inaccessible
to any client. This is done to protect the integrity of the data.
The goal for the OSD daemon is to bring the computing power as close as possible to the physical
data to improve performance.
Each OSD has its own journal, which is not related to the file system journal.
Journals use raw volumes on the OSD nodes and should be configured on a separate,
and if possible fast, device such as an SSD for performance-oriented or write-heavy
environments. Depending on the Ceph deployment tool used, the journal is configured
so that if a Ceph OSD, or the node where a Ceph OSD is located, fails, the journal
is replayed when the OSD restarts. The replay sequence starts after the last sync
operation, because earlier journal records were trimmed.
Metadata Server
The Ceph Metadata Server (MDS) is a service that provides POSIX-compliant, shared
file system metadata management, supporting both the directory hierarchy and file
metadata, including ownership, time stamps, and mode. The MDS stores its metadata in
RADOS rather than on local storage, and has no access to file content, because it is
required only for metadata operations. RADOS is an object storage service and is
part of Red Hat Ceph Storage.
The MDS also enables CephFS to interact with the Ceph object store, mapping an inode
to an object and recording where data is stored within a tree. Clients accessing a
CephFS file system first make a request to an MDS, which provides the information
needed to get files from the correct OSDs.
Note
The metadata server is not deployed by the undercloud in the default Ceph
configuration.
• The Ceph native API (librados): native interface to the Ceph cluster. Service interfaces built
on this native interface include the Ceph Block Device, the Ceph Gateway, and the Ceph File
System.
• The Ceph Gateway (RADOSGW): RESTful APIs for Amazon S3 and Swift compatibility. The Ceph
Gateway is referred to as radosgw.
• The Ceph Block Device (RBD, librbd): This is a Python module that provides file-like access to
Ceph Block Device images.
• The Ceph File System (CephFS, libcephfs): provides access to a Ceph cluster via a POSIX-like
interface.
When a cluster is deployed without creating a pool, Ceph uses the default pools for storing data.
By default, only the rbd pool is created when Red Hat Ceph Storage is installed.
The ceph osd lspools command displays the current pools in the cluster. This includes the
pools created by the undercloud to integrate Red Hat Ceph Storage with Red Hat OpenStack
Platform services.
Users
A Ceph client, which can be either a user or a service, requires a Ceph user to access the Ceph
cluster. By default, Red Hat Ceph Storage creates the admin user. The admin user can create
other users and their associated key-ring files. Each user has an associated key-ring file. The
usual location of this file is the /etc/ceph directory on the client machine.
Permissions are granted at the pool level for each Ceph user, either for all pools or to one or
more specific pools. These permissions can be read, write, or execute. The users available in a
Ceph cluster can be listed using the ceph auth list command. These users include the admin
user created by default, and the openstack user created by the undercloud for integration with
Red Hat OpenStack Platform services.
Hyper-converged Infrastructures
The demand for computing and storage resources in cloud computing environments is growing.
This growing demand is pushing for better utilization of the underlying hardware resources. The
undercloud supports this initiative by supporting the deployment of hyper-converged nodes.
These hyper-converged nodes include both compute and Red Hat Ceph Storage services.
The undercloud supports the deployment and management of Red Hat OpenStack Platform
environments that only use hyper-converged nodes, as well as Red Hat OpenStack Platform
environments with a mix of hyper-converged and compute nodes without any Ceph service.
Hyper-converged node configuration needs to be adjusted manually after deployment to avoid
degradation of either computing or storage services, because of shared hardware resources.
Troubleshooting Ceph
Red Hat Ceph Storage uses a configuration file, ceph.conf, in the /etc/ceph
directory. All machines running Ceph daemons, as well as Ceph clients, use this
configuration file. Each Ceph daemon creates a log file on the machine where it
runs. These log files are located in the /var/log/ceph directory.
The Red Hat Ceph Storage CLI tools provide several commands that you can use to
determine the status of the Ceph cluster. For example, the ceph health command
reports the current health status of the cluster. This status is HEALTH_OK when no
errors are present, or HEALTH_WARN or HEALTH_ERR when the cluster has issues.
The ceph -s command provides more details about the Ceph cluster's status, such as the
number of MONs and OSDs and the status of the current placement groups (PGs).
[root@demo]# ceph -s
cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8
health HEALTH_OK
monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0}
election epoch 4, quorum 0 overcloud-controller-0
osdmap e53: 3 osds: 3 up, 3 in
flags sortbitwise
pgmap v1108: 224 pgs, 6 pools, 595 MB data, 404 objects
1897 MB used, 56437 MB / 58334 MB avail
224 active+clean
In addition to the Ceph cluster's status, the ceph -w command returns Ceph cluster
events as they occur. Press Ctrl+C to exit this command.
[root@demo]# ceph -w
cluster 2ff74e60-3cb9-11e7-96f3-52540001fac8
health HEALTH_OK
monmap e1: 1 mons at {overcloud-controller-0=172.24.3.1:6789/0}
election epoch 4, quorum 0 overcloud-controller-0
osdmap e53: 3 osds: 3 up, 3 in
flags sortbitwise
pgmap v1108: 224 pgs, 6 pools, 595 MB data, 404 objects
1897 MB used, 56437 MB / 58334 MB avail
224 active+clean
There are other commands available, such as the ceph osd tree command, which shows the
status of the OSD daemons, either up or down. This command also displays the machine where
those OSD daemons are running.
OSD daemons can be managed using systemd unit files. The systemctl stop
ceph-osd@osdid command stops a single OSD daemon with the ID osdid. This command
must be executed on the Ceph node where the OSD with the corresponding ID is
located. If the OSD with an ID of 0 is located on the demo server, the following
command would be used to stop that OSD daemon:
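Following the ceph-osd@osdid unit naming described above, stopping OSD 0 on that node looks like this sketch (guarded so it only runs where the unit actually exists):

```shell
# Stop the OSD daemon with ID 0 via its systemd template unit instance.
# Run on the Ceph node hosting OSD 0, as a privileged user.
osd_unit='ceph-osd@0'

if systemctl cat "$osd_unit" >/dev/null 2>&1; then
    sudo systemctl stop "$osd_unit"
fi
```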
• Current status of the OSDs (up, down, out, in). An OSD's status is up if the OSD is running,
and down if the OSD is not running. An OSD's status is in if the OSD allows data read and
write, or out if the OSD does not.
Although Ceph is built for seamless scalability, this does not mean that the OSDs cannot run out
of space. Space-related warning or error conditions are reported both by the ceph -s and ceph
health commands, and OSD usage details are reported by the ceph osd df command. When
an OSD reaches the full threshold, it stops accepting write requests, although read requests
are still served.
If the MON with an ID of 1 is located on the demo server, the following command would be used
to get additional information about the quorum status for the MON:
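The original command is not reproduced in this excerpt. One plausible form queries the monitor's admin socket on that node; the mon.1 daemon name and the quorum_status subcommand are assumptions:

```shell
# Query the local monitor's admin socket for quorum details; run on the
# node that hosts mon.1 (the daemon name is an assumption).
mon_cmd='ceph daemon mon.1 quorum_status'

# Guarded so the sketch only executes where the ceph CLI is installed.
if command -v ceph >/dev/null 2>&1; then
    sudo $mon_cmd
fi
```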
3. Verify the monitor daemon and authentication settings in the Ceph cluster's configuration
file.
5. Verify the number of MON and OSD daemons configured in the Ceph cluster.
10. Locate the log files for the three OSD daemons.
References
Further information is available in the Red Hat Ceph Storage for the Overcloud Guide for
Red Hat OpenStack Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
In this exercise, you will verify the status of a Ceph cluster. You will also verify
the Ceph cluster configuration as the back end for OpenStack services. Finally, you
will troubleshoot and fix an issue with a Ceph OSD.
Outcomes
You should be able to:
• Verify Ceph pools and user for Red Hat OpenStack Platform services.
Steps
1. Verify that the Ceph cluster status is HEALTH_OK.
1.2. Verify Ceph cluster status using the sudo ceph health command.
2. Verify the status of the Ceph daemons and the cluster's latest events.
2.1. Using the sudo ceph -s command, you will see a MON daemon and three OSD
daemons. The three OSD daemons' states will be up and in.
2017-05-22 10:48:03.427574 mon.0 [INF] pgmap v574: 224 pgs: 224 active+clean;
1359 kB data, 122 MB used, 58212 MB / 58334 MB avail
...output omitted...
Ctrl+C
3. Verify that the pools and the openstack user, required for configuring Ceph as the back
end for Red Hat OpenStack Platform services, are available.
3.1. Verify that the images and volumes pools are available using the sudo ceph osd
lspools command.
3.2. Verify that the openstack user is available using the sudo ceph auth list
command. This user will have rwx permissions for both the images and volumes
pools.
4. Stop the OSD daemon with ID 0. Verify the Ceph cluster's status.
4.1. Verify that the Ceph cluster's status is HEALTH_OK, and the three OSD daemons are up
and in.
4.3. Use the systemd unit file for ceph-osd to stop the OSD daemon with ID 0.
4.5. Verify the Ceph cluster's status is HEALTH_WARN. Only two of the three OSD daemons are up and in.
5. Start the OSD daemon with ID 0 to fix the issue. Verify that the Ceph cluster's status is
HEALTH_OK.
5.1. Use the systemd unit file for ceph-osd to start the OSD daemon with ID 0.
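The stop and start operations in steps 4.3 and 5.1 can be sketched with the templated ceph-osd systemd unit, run on the node that hosts OSD 0 (ceph0 in this exercise):

```shell
# ceph-osd@<id> is the templated systemd unit for a single OSD daemon.
sudo systemctl stop ceph-osd@0    # step 4.3: take OSD 0 down
sudo ceph health                  # expect HEALTH_WARN while 2 of 3 OSDs are up
sudo systemctl start ceph-osd@0   # step 5.1: bring OSD 0 back
sudo ceph health                  # expect HEALTH_OK once recovery completes
```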
Cleanup
From workstation, run the lab storage-config-ceph cleanup script to clean up this
exercise.
Objectives
After completing this section, students should be able to manage Swift as object storage.
Swift Architecture
Swift is a fully distributed storage solution, where both static data and binary objects are stored.
It is neither a file system nor a real-time data storage system. It can easily scale to multiple
petabytes or billions of objects.
The Swift components listed in the following table are all required for the architecture to work
properly.
Component Description
Proxy Server Processes all API calls and locates the requested object.
Encodes and decodes data if Erasure Code is being used.
Ring Maps the names of entities to their stored location on disk.
Accounts, containers, and object servers each have their own
ring.
Account Server Holds a list of all containers.
Container Server Holds a list of all objects.
Object Server Stores, retrieves, and deletes objects.
The proxy server interacts with the appropriate ring to route requests and locate objects. The
ring stores a mapping between stored entities and their physical location.
By default, each partition of the ring is replicated three times to ensure a fully distributed
solution. Data is evenly distributed across the capacity of the cluster. Zones ensure that data is
isolated. Because data is replicated across zones, failure in one zone does not impact the rest of
the cluster.
Zones are created to contain failures. Each data replica should reside within a different zone. Zone configuration ensures that, should one zone fail, there are still two zones up and running that can either accept new objects or serve stored objects.
The recommended number of zones is five, on five separate nodes. As mentioned previously,
Swift, by default, writes three replicas. If there are only three zones and one becomes
unavailable, Swift cannot hand off the replica to another node. With five nodes, Swift has options
and can automatically write the replica to another node ensuring that eventually there will be
three replicas.
After Swift is set up and configured, it is possible to rectify or alter the storage policy. Extra
devices can be added at any time.
Storage rings can be built on any hardware that has the appropriate version of Swift installed.
Upon building or rebalancing (changing) the ring structure, the rings must be redistributed to
include all of the servers in the cluster. The swift-ring-builder utility is used to build and
manage rings.
To build the three rings for account, object, and container, the following syntax is used to add a
new device to a ring:
The zone is a number identifying the failure domain, such as the rack, to which the server belongs. The ipaddress is the IP address of the server. The device is the device partition to add. The weight reflects the size of the device's partition.
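The add syntax itself is not shown in this extract. As a hedged sketch (the builder file locations and the example zone, address, device, and weight values are assumptions), adding one device to each of the three rings could look like this:

```shell
# Sketch only: zone 1, server 192.168.1.10, device vdb, and weight 100 are
# illustrative values. The ports match the pre-Newton defaults noted below.
# Syntax: swift-ring-builder <builder> add z<zone>-<ip>:<port>/<device> <weight>
cd /etc/swift
sudo swift-ring-builder account.builder   add z1-192.168.1.10:6002/vdb 100
sudo swift-ring-builder container.builder add z1-192.168.1.10:6001/vdb 100
sudo swift-ring-builder object.builder    add z1-192.168.1.10:6000/vdb 100
```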
Note
Prior to the Newton release of OpenStack, the Object service used ports 6002, 6001
and 6000 for the account, container, and object services. These earlier default Swift
ports overlapped with ports already registered with IANA for X-Server, causing SELinux
policy conflicts and security risks. Red Hat OpenStack Platform switched to the new
ports in the Juno release, and the upstream Swift project completed the switch in
Newton.
Swift Commands
There are two sets of commands for Swift, an older version and a newer version. The older
commands, for example, swift post, swift list, and swift stat, are still supported.
However, OpenStack is moving to the OpenStack Unified CLI described below.
Note
By default, the following commands require the OpenStack user to have either the
admin or swiftoperator roles.
The openstack container list command displays all containers available to the user:
The openstack object create command uploads an existing object to the specified
container:
The openstack container save command saves the contents of an existing container
locally:
The openstack object list command lists all of the objects stored in the specified
container:
The openstack object delete command deletes an object from the specified container:
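Put together, a typical session with these commands might look like the following sketch; the container and file names are examples only:

```shell
# Example names (container1, dataset.dat) are illustrative.
openstack container create container1          # create a container
openstack object create container1 dataset.dat # upload an existing file
openstack object list container1               # list objects in the container
openstack container save container1            # save the container's contents locally
openstack object delete container1 dataset.dat # delete one object
```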
Ceph and Swift are both scalable, distributed storage solutions, but this is perhaps where the similarities end. Ceph lends itself to block access and transactional storage, and is recommended for single sites. Swift uses object API access to storage, and is recommended for unstructured data and geographical distribution. Applications that mostly use block access storage are built in a different way from those that use object access storage. The decision might come down to which applications need object storage and how they access it.
Swift protects written data first and can therefore take additional time to update the entire
cluster. Ceph does not do this, which makes it a better candidate for databases and real-time
data. Swift would be a better choice for large-scale, geographically dispersed, unstructured data.
This means that you might need or want both Ceph and Swift. This decision will depend on the
types of applications, the geographical structure of your data centers, the type of objects that
need to be stored, consistency of the data replicated, transactional performance requirements,
and the number of objects to be stored.
Reduced cost can also be an advantage. With object storage, you only pay for the amount of storage that you use: if you upload 5 GB, you pay for 5 GB. With volume storage, you pay for the size of the disk you create; if you create a 50 GB volume, you pay for all 50 GB whether or not it is all used. Be aware, however, that using Swift across multiple data centers can become expensive, because you are moving a lot of data over the internet.
Swift is best used for large pools of small objects. It is easily scalable, whereas volumes are not.
Use Cases
A major university uses Swift to store videos of every men's and women's sporting event. All events for an entire year are stored in a highly available, easily accessible storage solution. Students, alumni, and fans can use any internet-enabled web browser to access the university's web site and click a link to view, in its entirety, their desired sporting event.
Note
If you were to change the size of the physical drive, then you would have to rebalance
the ring.
Troubleshooting Swift
Swift logs all troubleshooting events in /var/log/swift/swift.log. You should start your
troubleshooting process here. Swift logging is very verbose and the generated logs can be used
for monitoring, audit records, and performance. Logs are organized by log level and syslog
facility. Log lines for the same request have the same transaction ID.
Make sure that all processes are running; the basic ones required are Proxy Server, Account
Server, Container Server, Object Server, and Auth Server.
Drive Failure
It is imperative to unmount a failed drive; this should be the first step taken, because it makes object retrieval by Swift much easier. Replace the drive, format and mount it, and let the replication feature take over; the new drive will quickly populate with replicas. If a drive cannot be replaced immediately, ensure that it is unmounted, that the mount point is owned by root, and that the device weight is set to 0. Setting the weight to 0 is preferable to removing the device from the ring, because it gives Swift the chance to replicate from the failing disk (some data may still be retrievable), and after the disk has been replaced you can simply increase its weight again, removing the need to rebuild the ring.
The following commands show how to change the weight of a device using the swift-ring-builder command. In the following command, service is either account, object, or container, device is the device's partition name, and weight is the new weight.
For example, to set the weight of a device named vdd to 0, the previous command must be
executed using the three rings, as follows:
The device can be added back to Swift using the swift-ring-builder set_weight
command, with the new weight for the device. The device's weight has to be updated in the three
rings. For example, if a device's weight has to be changed to 100, the following commands must
be executed using the three rings, as follows:
The three rings must then be rebalanced. The weight associated with each device on each ring
can then be obtained using the swift-ring-builder command. The following command
returns information for each device, including the weight associated with the device in that ring:
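Putting this together for a device named vdd, a hedged sketch (assuming the builder files live in /etc/swift and the search value /vdd matches the device name) might be:

```shell
cd /etc/swift
# Failing disk: drop its weight to 0 in all three rings.
for ring in account container object; do
  sudo swift-ring-builder ${ring}.builder set_weight /vdd 0
done

# After the disk is replaced: restore the weight and rebalance each ring.
for ring in account container object; do
  sudo swift-ring-builder ${ring}.builder set_weight /vdd 100
  sudo swift-ring-builder ${ring}.builder rebalance
done

# Running the builder with no subcommand prints each device and its weight.
sudo swift-ring-builder account.builder
```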
Server Failure
Should a server be experiencing hardware issues, ensure that the Swift services are not running.
This guarantees that Swift will work around the failure and start replicating to another server. If
the problem can be fixed within a relatively short time, for example, a couple of hours, then let
Swift work around the failure automatically and get the server back online. When online again,
Swift will ensure that anything missing during the downtime is updated.
If the problem is more severe, or no quick fix is possible, it is best to remove the devices from
the ring. After repairs have been carried out, add the devices to the ring again. Remember
to reformat the devices before adding them to the ring, because they will almost certainly be
responsible for a different set of partitions than before.
References
Further information is available in the Object Storage section of the Storage Guide for
Red Hat OpenStack Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
In this exercise, you will upload an object to the OpenStack object storage service, retrieve that
object from an instance, and then verify that the object has been correctly downloaded to the
instance.
Resources
Files /home/student/developer1-finance-rc
Outcomes
You should be able to:
Steps
1. Create a 10MB file named dataset.dat. As the developer1 user, create a container called
container1 in the OpenStack object storage service. Upload the dataset.dat file to this
container.
1.2. Load the credentials for the developer1 user. This user has been configured by the lab
script with the role swiftoperator.
2. Download the dataset.dat object to the finance-web1 instance created by the lab
script.
2.1. Verify that the finance-web1 instance's status is ACTIVE. Verify the floating IP
address associated with the instance.
2.2. Copy the credentials file for the developer1 user to the finance-web1 instance. Use
the cloud-user user and the /home/student/developer1-keypair1.pem key
file.
2.3. Log in to the finance-web1 instance using cloud-user as the user and the /home/
student/developer1-keypair1.pem key file.
2.5. Download the dataset.dat object from the object storage service.
Cleanup
From workstation, run the lab storage-obj-storage cleanup script to clean up this
exercise.
In this lab, you will fix an issue in the Ceph environment. You will also upload a MOTD file to the
OpenStack object storage service. Finally, you will retrieve that MOTD file inside an instance.
Resources
Files: http://materials.example.com/motd.custom
Outcomes
You should be able to:
• Download and implement an object in the Object storage service inside an instance.
From workstation, run lab storage-review setup, which verifies OpenStack services and previously created resources. This script also misconfigures Ceph and launches a production-web1 instance with OpenStack CLI tools.
Steps
1. The Ceph cluster has a status issue. Fix the issue to return the status to HEALTH_OK.
2. As the operator1 user, create a new container called container4 in the Object storage
service. Upload the custom MOTD file available at http://materials.example.com/
motd.custom to this container.
3. Log in to the production-web1 instance, and download the motd.custom object from
Swift to /etc/motd. Use the operator1 user credentials.
4. Verify that the MOTD file includes the message Updated MOTD message.
Evaluation
On workstation, run the lab storage-review grade command to confirm success of this
exercise.
Cleanup
From workstation, run the lab storage-review cleanup script to clean up this exercise.
Solution
In this lab, you will fix an issue in the Ceph environment. You will also upload a MOTD file to the
OpenStack object storage service. Finally, you will retrieve that MOTD file inside an instance.
Resources
Files: http://materials.example.com/motd.custom
Outcomes
You should be able to:
• Download and implement an object in the Object storage service inside an instance.
From workstation, run lab storage-review setup, which verifies OpenStack services and previously created resources. This script also misconfigures Ceph and launches a production-web1 instance with OpenStack CLI tools.
Steps
1. The Ceph cluster has a status issue. Fix the issue to return the status to HEALTH_OK.
1.2. Determine the Ceph cluster status. This status will be HEALTH_WARN.
1.3. Determine what the issue is by verifying the status of the Ceph daemons. Only two OSD
daemons will be reported as up and in, instead of the expected three up and three in.
1.4. Determine which OSD daemon is down. The status of the OSD daemon with ID 0 on
ceph0 is down.
1.5. Start the OSD daemon with ID 0 using the systemd unit file.
1.6. Verify that the Ceph cluster status is HEALTH_OK. Initial displays may show the Ceph cluster in recovery mode, with the percentage still degraded shown in parentheses.
2. As the operator1 user, create a new container called container4 in the Object storage
service. Upload the custom MOTD file available at http://materials.example.com/
motd.custom to this container.
2.2. View the contents of the motd.custom file. This file contains a new MOTD message.
2.5. Create a new object in the container4 container using the motd.custom file.
3. Log in to the production-web1 instance, and download the motd.custom object from
Swift to /etc/motd. Use the operator1 user credentials.
3.2. Copy the operator1 user credentials to the production-web1 instance. Use cloud-
user as the user and the /home/student/operator1-keypair1.pem key file.
3.3. Log in to the production-web1 instance as the cloud-user user. Use the /home/
student/operator1-keypair1.pem key file.
3.5. Download the motd.custom object from the Object service using the operator1-production-rc user credentials. Use the --file option to save the object as /etc/motd.
Because writing /etc files requires root privileges, use sudo. Use the -E option to
carry the operator1 shell environment credentials into the new sudo root child shell,
because this command requires operator1's access to the Object storage container
while also requiring root privilege to write the /etc/motd file.
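A sketch of this step, run inside the instance with the names used in this lab:

```shell
# Load the operator1 credentials copied to the instance earlier.
source ~/operator1-production-rc
# -E preserves the sourced OS_* environment variables in the root shell;
# root privilege is required to write /etc/motd.
sudo -E openstack object save container4 motd.custom --file /etc/motd
```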
4. Verify that the MOTD file includes the message Updated MOTD message.
Evaluation
On workstation, run the lab storage-review grade command to confirm success of this
exercise.
Cleanup
From workstation, run the lab storage-review cleanup script to clean up this exercise.
Summary
In this chapter, you learned:
• Red Hat OpenStack Platform supports both Red Hat Ceph Storage and NFS as storage back
ends.
• The Red Hat Ceph Storage architecture is based on monitor (MON) daemons and object
storage device (OSD) daemons.
• Red Hat Ceph Storage features include seamless scalability and no single point of failure.
• The Red Hat OpenStack Platform block storage and image services use RBDs to access Ceph,
and require both a user and pool to access the cluster.
• The Red Hat OpenStack Platform object storage service (Swift) provides object storage for
instances.
• The Swift architecture includes a front-end service, the proxy server, and three back-end
services: the account server, the object server, and the container server.
• Users can create containers in Swift, and upload objects to those containers.
MANAGING AND
TROUBLESHOOTING VIRTUAL
NETWORK INFRASTRUCTURE
Overview
Goal Manage and troubleshoot virtual network infrastructure
Objectives • Manage software-defined networking (SDN) segments and
subnets.
Objectives
After completing this section, students should be able to:
Software-defined Networking
Software-defined networking (SDN) is a networking model that allows network administrators to
manage network services through the abstraction of several networking layers. SDN decouples
the software that handles the traffic, called the control plane, and the underlying mechanisms
that route the traffic, called the data plane. SDN enables communication between the control
plane and the data plane. For example, the OpenFlow project, combined with the OpenDaylight project, provides one such implementation.
SDN does not change the underlying protocols used in networking; rather, it enables the use of application knowledge to provision networks. Networking protocols, such as TCP/IP and the Ethernet standards, rely on manual configuration by administrators and carry no knowledge of the applications that use them: their network usage, their endpoint requirements, or how much data needs to be transferred and how fast. The goal of SDN is to extract knowledge of how an application is being used, either from the application administrator or from the application's configuration data itself.
History
The origins of SDN development can be traced to around the mid 1990s. Research and
development continued through the early 2000s by several universities and organizations. In
2011, the Open Networking Foundation (ONF) was founded to promote SDN and other related
technologies such as OpenFlow.
Benefits of SDN
Consumers continue to demand fast, reliable, secure, and omnipresent network connections
to satisfy their need for personal mobile devices such as smartphones and tablets. Service
providers are utilizing virtualization and SDN technologies to better meet those needs.
• The decoupling of the control plane and data plane enables both planes to evolve
independently, which results in several advantages such as high flexibility, being vendor-
agnostic, open programmability, and a centralized network view.
• Security features that allow administrators to route traffic through a single, centrally located,
firewall. One advantage of this is the ability to utilize intrusion detection methods on real-time
captures of network traffic.
• Automated load balancing in SDNs enhances server performance and reduces the complexity of implementation.
• Network scalability allows data centers to use features of software-defined networking along
with virtualized servers and storage to implement dynamic environments where computing
resources are added and removed as needed.
• Reduced operational costs by minimizing the need to deploy, maintain, and replace expensive
hardware such as many of the servers and network switches within a data center.
The SDN architecture delivers an open technology that eliminates costly vendor lock-in and
proprietary networking devices.
Arguments for using SDN over hardware for networking are growing as the technology continues
to develop as a smart and inexpensive approach to deploy network solutions. Many companies
and organizations currently use SDN technology within their data centers, taking advantage of
cost savings, performance factors, and scalability.
Architectural Components
The following list defines and explains the architectural components:
• Application Plane: The plane where applications and services that define network behavior
reside.
• Control Plane: Responsible for making decisions on how packets should be forwarded by one
or more network devices, and for pushing such decisions down to the network devices for
execution.
• Operational Plane: Responsible for managing the operational state of the network device, such
as whether the device is active or inactive, the number of ports available, the status of each
port, and so on.
• Forwarding Plane: Responsible for handling packets in the data path based on the instructions
received from the control plane. Actions of the forwarding plane include actions like
forwarding, dropping, and changing packets.
SDN Terminology
Term Definition
Application SDN applications are programs that communicate their
network requirements and desired network behavior to the
SDN controller over a northbound interface (NBI).
Datapath The SDN datapath is a logical network device that exposes
visibility and control over its advertised forwarding and data
processing capabilities. An SDN datapath comprises a Control
to Data-Plane Interface (CDPI) agent and a set of one or more
traffic forwarding engines.
Controller The SDN controller is a logically centralized entity in charge of
translating the requirements from the SDN application layer
down to the SDN datapaths. SDN controllers provide a view of
the network to the SDN applications.
Control to Data-Plane The CDPI is the interface defined between an SDN controller
Interface (CDPI) and an SDN datapath that provides control of all forwarding
operations, capabilities advertisement, statistics reporting,
and event notification.
Northbound Interfaces (NBI) NBIs are interfaces between SDN applications and SDN
controllers. They typically provide network views and enable
expression of network behavior and requirements.
Introduction to Networking
Administrators should be familiar with networking concepts when working with Red Hat
OpenStack Platform. The Neutron networking service is the SDN networking project that
provides Networking-as-a-service (NaaS) in virtual environments. It implements traditional
networking features such as subnetting, bridging, VLANs, and more recent technologies, such as
VXLANs and GRE tunnels.
Network Bridges
A network bridge is a network device that connects multiple network segments. Bridges can
connect multiple devices, and each device can send Ethernet frames to other devices without
having the frame removed and replaced by a router. Bridges keep the traffic isolated, and in most
cases, the switch is aware of which MAC addresses are accessible at each port. Switches monitor
network activity and maintain a MAC learning table.
A VLAN is defined by the IEEE 802.1Q standard for carrying traffic on an Ethernet network. 802.1Q VLANs are distinguished by a 4-byte VLAN tag inserted in the Ethernet header. Within this 4-byte tag, 12 bits represent the VLAN ID, which limits the number of VLAN IDs on a network to 4096.
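The VLAN ID arithmetic can be checked directly in the shell:

```shell
# 12 bits of VLAN ID in the 802.1Q tag give 2^12 possible IDs.
echo $((1 << 12))   # prints 4096
```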
VXLAN Tunnels
Virtual eXtensible LAN (VXLAN) is a network virtualization technology that solves the scalability
problems associated with large cloud computing deployments. It increases scalability up to 16
million logical networks and allows the adjacency of layer 2 links across IP networks. The VXLAN
protocol encapsulates L2 networks and tunnels them over L3 networks.
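The 16 million figure comes from the 24-bit VXLAN Network Identifier (VNI) carried in the VXLAN header:

```shell
# 2^24 VNIs versus 2^12 VLAN IDs:
echo $((1 << 24))                   # prints 16777216, about 16 million
echo $(( (1 << 24) / (1 << 12) ))   # prints 4096: the scale-up factor
```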
Figure 5.5: The OpenStack Networking service shows how OpenStack Networking services can be deployed: the two compute nodes run the Open vSwitch agent, which communicates with the network node, itself running a set of dedicated OpenStack Networking services. These services include the metadata server, the Neutron networking server, and a set of extra components, such as Firewall-as-a-Service (FWaaS) or Load Balancing-as-a-Service (LBaaS).
• Tenant networks
OpenStack users create tenant networks for connectivity within projects. By default, these
networks are completely isolated and are not shared among projects. OpenStack Networking
supports the following types of network isolation and overlay technologies:
◦ Flat: All instances reside on the same network, which can also be shared with the underlying hosts. Flat networks have no concept of VLAN tagging or network segregation. Use cases for flat networks are limited to testing or proof-of-concept deployments, because no overlap is allowed and only one network is supported, which limits the number of available IP addresses.
◦ VLAN: This type of networking allows users to create multiple tenant networks using
VLAN IDs, allowing network segregation. One use case is a web layer instance with traffic
segregated from database layer instances.
◦ GRE and VXLAN: These networks provide encapsulation for overlay networks to activate and
control communication between compute instances.
• Provider networks
These networks map to the existing physical network in a data center and are usually flat or
VLAN networks.
• Subnets
A subnet is a block of IP addresses from which tenant and provider networks allocate addresses whenever new ports are created.
• Ports
A port is a connection for attaching a single device, such as the virtual NIC of an instance, to
the virtual network. Ports also provide the associated configuration, such as a MAC address
and IP address, to be used on that port.
• Routers
Routers forward data packets between networks. They provide L3 and NAT forwarding for
instances on tenant networks to external networks. A router is required to send traffic outside
of the tenant networks. Routers can also be used to connect the tenant network to an external
network using a floating IP address.
Routers are created by authenticated users within a project and are owned by that project.
When tenant instances require external access, users can assign networks that have been
declared external by an OpenStack administrator to their project-owned router.
Routers implement Source Network Address Translation (SNAT) to provide outbound external
connectivity and Destination Network Address Translation (DNAT) for inbound external
connectivity.
• Security groups
A security group is a virtual firewall allowing instances to control outbound and inbound traffic.
It contains a set of security group rules, which are parsed when data packets are sent out of or
into an instance.
Managing Networks
Before launching instances, the virtual network infrastructure to which instances will connect
must be created. Prior to creating a network, it is important to consider what subnets will be
used. A router is used to direct traffic from one subnet to another.
• To create a provider network, run the openstack network create command. Specify the
network type by using the --provider-network-type option.
• Similar to a physical network, the virtual network requires a subnet. The provider network
shares the same subnet and gateway associated with the physical network connected to the
provider network. To create a subnet for a provider network, run the openstack subnet
create command:
• Create the corresponding subnet for the tenant network, specifying the tenant network
CIDR. By default, this subnet uses DHCP so the instances can obtain IP addresses. The first IP
address of the subnet is reserved as the gateway IP address.
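The commands referred to above are not included in this extract. A hedged sketch of the sequence, with illustrative network names and the address ranges used later in this chapter (the physical network name datacentre and VLAN segment 500 are assumptions), might be:

```shell
# Provider network mapped to a physical VLAN (run as an administrator):
openstack network create provider-net \
    --external --provider-network-type vlan \
    --provider-physical-network datacentre --provider-segment 500

# Provider subnet reusing the physical network's addressing; DHCP disabled:
openstack subnet create provider-subnet --network provider-net \
    --subnet-range 172.25.250.0/24 --gateway 172.25.250.254 \
    --allocation-pool start=172.25.250.101,end=172.25.250.189 --no-dhcp

# Tenant network and subnet; DHCP is enabled by default, and the first
# address of the range (192.168.1.1) is reserved as the gateway:
openstack network create tenant-net
openstack subnet create tenant-subnet --network tenant-net \
    --subnet-range 192.168.1.0/24
```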
1. Packets leaving the eth0 interface of the instance are routed to a Linux bridge.
2. The Linux bridge is connected to an Open vSwitch bridge by a vEth pair. The Linux bridge
is used for inbound and outbound firewall rules, as defined by the security groups. Packets
traverse the vEth pair to reach the integration bridge, usually named br-int.
3. Packets are then moved to the external bridge, usually br-ex, over patch ports. OVS flows
manage packet headers according to the network configuration. For example, flows are used
to strip VLAN tags from network packets before forwarding them to the physical interfaces.
2. Create a subnet for a provider network, and specify the floating IP address slice using the
--allocation-pool option.
4. Create the corresponding subnet for the tenant network, specifying the tenant network
CIDR. The first IP address of the subnet is reserved as the gateway IP address.
References
Further information is available in the Networking Guide for Red Hat OpenStack
Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
In this exercise, you will manage networks and routers. You will also review the implementation of
the network environment.
Outcomes
You should be able to:
• Create networks
• Create routers
Run the lab network-managing-sdn setup command. This script ensures that the
OpenStack services are running and the environment is properly configured for this exercise.
The script creates the OpenStack user developer1 and the OpenStack administrative user
architect1 in the research project. The script also creates the rhel7 image and the
m1.small flavor.
Steps
1. From workstation, source the developer1-research-rc credentials file. As the
developer1 user, create a network for the project. Name the network research-
network1.
2. Create the subnet research-subnet1 for the network in the 192.168.1.0/24 range. Use
172.25.250.254 as the DNS server.
3. Open another terminal and log in to the controller node, controller0, to review the ML2
configuration. Ensure that there are driver entries for VLAN networks.
3.1. Log in to the controller node as the heat-admin user and become root.
3.2. Go to the /etc/neutron/ directory. Use the crudini command to retrieve the values
for the type_drivers key in the ml2 group. Ensure that the vlan driver is included.
3.3. Retrieve the name of the physical network used by VLAN networks. ML2 groups are
named after the driver, for example, ml2_type_vlan.
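Steps 3.2 and 3.3 can be sketched with crudini; the plug-in configuration path assumed here is the usual ML2 location under /etc/neutron on an overcloud controller:

```shell
# On controller0, as root:
cd /etc/neutron
# List the enabled type drivers; 'vlan' should appear in the list.
crudini --get plugins/ml2/ml2_conf.ini ml2 type_drivers
# Physical network name(s) available to VLAN networks.
crudini --get plugins/ml2/ml2_conf.ini ml2_type_vlan network_vlan_ranges
```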
5. Create the subnet for the provider network provider-172.25.250 with an allocation
pool of 172.25.250.101 - 172.25.250.189. Name the subnet provider-
subnet-172.25.250. Use 172.25.250.254 for both the DNS server and the gateway.
Disable DHCP for this network.
6.1. Source the developer1-research-rc credentials file and create the research-
router1 router.
| routes | |
| status | ACTIVE |
| updated_at | 2017-06-07T20:56:46Z |
+-------------------------+--------------------------------------+
6.3. Use the neutron command to define the router as a gateway for the
provider-172.25.250 network.
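Steps 6.1 and 6.3 can be sketched as follows, using the names from this exercise:

```shell
# Create the router as developer1:
openstack router create research-router1
# Define the provider network as the router's external gateway
# (step 6.3 uses the legacy neutron client for this operation):
neutron router-gateway-set research-router1 provider-172.25.250
```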
8. Launch the research-web1 instance in the environment. Use the m1.small flavor and the
rhel7 image. Connect the instance to the research-network1 network.
9.3. List the network ports. Locate the UUID of the port corresponding to the instance in the
research-network1 network.
10. Open another terminal. Use the ssh command to log in to the compute0 virtual machine as
the heat-admin user.
11. List the Linux bridges in the environment. Ensure that there is a qbr bridge that uses the
first ten characters of the Neutron port in its name. The bridge has two ports in it: the TAP
device that the instance uses and the qvb vEth pair, which connects the Linux bridge to the
integration bridge.
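The tap, qbr, qvb, and qvo device names all embed the same truncated prefix of the Neutron port UUID, because Linux interface names are limited to 15 characters. A small sketch with a hypothetical port UUID:

```shell
# Derive the per-port device names from a hypothetical Neutron port UUID.
port_id="cb2aafd8-b1c3-4e5f-9a7d-123456789abc"
prefix=${port_id:0:11}             # truncated UUID prefix used in device names
echo "tap${prefix}"                # TAP device attached to the instance
echo "qbr${prefix}"                # the Linux bridge itself
echo "qvb${prefix} qvo${prefix}"   # vEth pair towards the integration bridge
```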
12. Exit from the compute node and connect to the controller node.
13. To determine the OpenFlow port number of the phy-br-ex port, use the ovs-ofctl command. The output lists the ports in the br-ex bridge.
14. Dump the flows for the external bridge, br-ex. Review the entries to locate the flow for the
packets passing through the tenant network. Locate the rule that handles packets in the
phy-br-ex port. The following output shows how the internal VLAN ID, 2, is replaced with
the VLAN ID 500 as defined by the --provider-segment 500 option.
Cleanup
From workstation, run the lab network-managing-sdn cleanup script to clean up the
resources created in this exercise.
Objectives
After completing this section, students should be able to:
To this day, there are more than 20 drivers available from various manufacturers, including
Cisco, Microsoft, Nicira, Ryu, and Lenovo. Drivers implement a set of extensible mechanisms for
various network back-ends to be able to communicate with OpenStack Networking services. The
implementations can either utilize layer 2 agents with a Remote Procedure Call (RPC) or use the
OpenStack Networking mechanism drivers to interact with external devices or controllers. In
OpenStack, each network type is managed by an ML2 driver. Such drivers maintain any needed
network state, and can perform network validation or the creation of networks for OpenStack
projects.
The ML2 plug-in currently includes drivers for the following network types:
• Local: a network that can only be implemented on a single host. Local networks must only be
used in proof-of-concept or development environments.
• Flat: a network that does not support segmentation. A traditional layer 2 Ethernet network
can be considered a flat network. Servers that are connected to flat networks can listen to the
broadcast traffic and can contact each other. In OpenStack terminology, flat networks are used
to connect instances to existing layer 2 networks, or provider networks.
• VLAN: a network that uses VLANs for segmentation. When users create VLAN networks,
a VLAN identifier (ID) is assigned from the range defined in the OpenStack Networking
configuration. Administrators must configure the network switches to trunk the corresponding
VLANs.
• GRE and VXLAN: networks that are similar to VLAN networks. GRE and VXLAN are overlay networks that encapsulate network traffic. Each network receives a unique tunnel identifier. However, unlike VLANs, overlay networks do not require any synchronization between the OpenStack environment and layer 2 switches.
The following lists some of the available OpenStack Networking ML2 plug-ins:
• Open vSwitch
• Cisco UCS and Nexus
• Linux Bridge
• Nicira Network Virtualization Platform (NVP)
• Ryu and OpenFlow Controller
• NEC OpenFlow
• Big Switch Controller
• Cloudbase Hyper-V
• MidoNet
• PLUMgrid
• Embrane
• IBM SDN-VE
• Nuage Networks
• OpenContrail
• Lenovo Networking
Note
Red Hat OpenStack Platform 10 adds support for composable roles. Composable roles
allow administrators to separate the network services into a custom role.
Layer 2 Population
The layer 2 (L2) population driver enables broadcast, multicast, and unicast traffic to scale
out on large overlay networks. By default, Open vSwitch GRE and VXLAN networks replicate
broadcasts to every agent, including those that do not host the destination network. This leads
to a significant network and processing overhead. L2 population is a mechanism driver for
OpenStack Networking ML2 plug-ins that leverages the implementation of overlay networks. The
service works by gaining full knowledge of the topology, which includes the MAC address and the
IP address of each port. As a result, forwarding tables can be programmed beforehand and the
processing of ARP requests is optimized. By populating the forwarding tables of virtual switches,
such as Linux bridges or Open vSwitch bridges, the driver decreases the broadcast traffic inside
the physical networks.
Figure 5.7: Network routing on separate VLANs shows the network traffic flowing between
instances on separate VLANs:
Switching occurs at a lower level of the network, that is, on layer 2, which functions faster than
routing that occurs at layer 3. Administrators should consider having as few network hops as
possible between instances. Figure 5.8: Network switching shows a switched network that spans
two physical systems, which allows two instances to communicate directly without using a
router. The instances share the same subnet, which indicates that they are on the same logical
network:
Introduction to Subnets
A subnet is a logical subdivision of an IP network. On TCP/IP networks, the logical subdivision
is defined as all devices whose IP addresses have the same prefix. For example, using a /24
subnet mask, all devices with IP addresses on 172.16.0.0/24 would be part of the same
subnet with 256 possible addresses. Addresses on the /24 subnet include a network address of
172.16.0.0 and a broadcast address of 172.16.0.255, leaving 254 available host addresses
on the same subnet. A /24 subnet can be split by using a /25 subnet mask: 172.16.0.0/25
and 172.16.0.128/25, with 126 hosts per subnet. The first subnet would have a range from
172.16.0.0 (network) to 172.16.0.127 (broadcast) leaving 126 available host addresses.
The second subnet would have a range from 172.16.0.128 (network) to 172.16.0.255
(broadcast) leaving 126 available host addresses. This demonstrates that networks can be divided
into one or more subnets depending on their subnet mask.
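The arithmetic above can be checked with Python's standard ipaddress module:

```python
import ipaddress

net = ipaddress.ip_network("172.16.0.0/24")
print(net.num_addresses)                       # 256 total addresses

# Split the /24 into two /25 subnets, as described above.
lower, upper = net.subnets(prefixlen_diff=1)
print(lower, upper)                            # 172.16.0.0/25 172.16.0.128/25
print(lower.network_address, lower.broadcast_address)   # 172.16.0.0 172.16.0.127
print(upper.network_address, upper.broadcast_address)   # 172.16.0.128 172.16.0.255
print(lower.num_addresses - 2)                 # 126 usable host addresses per /25
```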
A subnet may be used to represent all servers present in the same geographic location, or on
the same Local Area Network (LAN). By using subnets to divide the network, administrators can
connect many devices spread across multiple segments to the Internet. Subnets are a useful way
to share a network and create subdivisions on segments. The practice of creating subnets is called
subnetting. Figure 5.9: Network subnets shows three subnets connected to the same router.
• Variable Length Subnet Mask (VLSM): subnet addresses are traditionally displayed using the network address accompanied by the subnet mask. For example: 192.168.100.0 255.255.255.0.
• Classless Inter-Domain Routing (CIDR): this format shortens the subnet mask into its total number of active bits. For example, in 192.168.100.0/24, the /24 is a shortened representation of 255.255.255.0, a mask with 24 bits set when converted to binary.
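The /24 to 255.255.255.0 equivalence can likewise be verified with the ipaddress module:

```python
import ipaddress

net = ipaddress.ip_network("192.168.100.0/24")
print(net.netmask)     # 255.255.255.0
print(net.prefixlen)   # 24
# The CIDR prefix length is the number of set bits in the mask:
print(bin(int(net.netmask)).count("1"))   # 24
```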
Note
Since all layer 2 plug-ins provide total isolation between layer 2 networks, administrators can use overlapping subnets. This is made possible by the use of network namespaces, each of which has its own routing table. Because each namespace routes traffic independently, OpenStack Networking is able to provide overlapping addresses in different virtual networks.
Administrators can use both the Horizon dashboard and the command-line interface to manage
subnets. The following output shows two subnets, each belonging to a network.
The subinternal1 subnet is an internal subnet, which provides internal networking for
instances. The openstack subnet show command allows administrators to review the details
for a given subnet.
The Network Topology view in the Horizon dashboard allows administrators to review their
network infrastructure. Figure 5.10: Network topology shows a basic topology comprised of an
external network and a private network, connected by a router:
• Namespaces for routers, named qrouter-UUID, where UUID is the router ID. The router
namespace contains TAP devices like qr-YYY, qr-ZZZ, and qg-VVV as well as the
corresponding routes.
• Namespaces for projects that use DHCP services, named qdhcp-UUID, where UUID is the
network ID. The project namespace contains the tapXXX interfaces and the dnsmasq process
that listens on that interface in order to provide DHCP services for project networks. This
namespace allows overlapping IPs between various subnets on the same network host.
The following output shows the implementation of network namespaces after the creation
of a project. In this setup, the namespaces are created on the controller, which also runs the
networking services.
qrouter-89bae387-396c-4b24-a064-241103bcdb14
qdhcp-0062e02b-7e40-407f-ac43-49e84de096ed
2. A set of Netfilter rules is created in the router namespace. These rules translate packets between the instance's fixed IP and the floating IP. OpenStack Networking implements a rule for incoming traffic (DNAT) as well as for outgoing traffic (SNAT). The following output shows the two Netfilter rules in the router namespace.
[user@demo ~]$ ip netns exec qrouter-UUID iptables -L -nv -t nat | grep 250.28
   24  1632 DNAT all -- * * 0.0.0.0/0     172.25.250.28 to:192.168.0.11
    8   672 SNAT all -- * * 192.168.0.11  0.0.0.0/0     to:172.25.250.28
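A quick sketch that parses lines modeled on the output above confirms the direction of each translation: DNAT rewrites the destination of incoming traffic to the fixed IP, and SNAT rewrites the source of outgoing traffic to the floating IP. The sample lines reuse the addresses from the output; they are not captured live.

```python
import re

# Sample lines modeled on the iptables -t nat output shown above.
sample = """\
   24  1632 DNAT all -- * * 0.0.0.0/0 172.25.250.28 to:192.168.0.11
    8   672 SNAT all -- * * 192.168.0.11 0.0.0.0/0 to:172.25.250.28
"""

rows = []
for line in sample.splitlines():
    m = re.search(r"(DNAT|SNAT)\s+all -- \* \* (\S+)\s+(\S+)\s+to:(\S+)", line)
    if m:
        rows.append(m.groups())

for target, src, dst, rewritten in rows:
    # DNAT: incoming, destination (floating IP) rewritten to the fixed IP.
    # SNAT: outgoing, source (fixed IP) rewritten to the floating IP.
    print(f"{target}: src={src} dst={dst} rewritten-to={rewritten}")
```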
Note
The same network can be used to allocate floating IP addresses to instances even if
they have been added to private networks at the same time. The addresses allocated
as floating IPs from this network are bound to the qrouter namespace on the network
node, and perform both the Source Network Address Translation (SNAT) and Destination
Network Address Translation (DNAT) to the associated private IP address.
In contrast, the IP address allocated to the instance for direct external network access
is bound directly inside the instance, and allows the instance to communicate directly
with external networks.
[Figure: Netfilter inspection points in the Linux kernel, showing the input, output, and local process paths]
• Set basic rules for various network services, such as NTP, VXLAN, or SNMP traffic.
• Allow source NAT on outgoing traffic, which is the traffic originating from instances.
• Create a rule that allows direct traffic from the instance's network devices to the security
group chain.
• Set rules that allow traffic from a defined set of IP and MAC address pairs.
• Drop any packet that is not associated with a state. States include NEW, ESTABLISHED,
RELATED, INVALID, and UNTRACKED.
• Direct packets that are associated with a known session to the RETURN chain.
The following output shows some of the rules implemented on a compute node. The neutron-openvswi-FORWARD chain contains the two rules that direct the instance's traffic to the security group chain. In the following output, the instance's security group chain is named neutron-openvswi-scb2aafd8-b.
...output omitted...
Chain neutron-openvswi-FORWARD (1 references)
pkts bytes target prot opt in out source destination
4593 387K neutron-openvswi-sg-chain all -- * * 0.0.0.0/0
0.0.0.0/0 PHYSDEV match --physdev-out tapcb2aafd8-b1 --physdev-is-bridged /*
Direct traffic from the VM interface to the security group chain. */
A TAP device, such as vnet0, is how hypervisors such as KVM implement a virtual network interface card. Virtual network cards are typically called VIFs or vNICs. An Ethernet frame sent to a TAP device is received by the guest operating system.
A vEth pair is a pair of virtual network interfaces connected together: an Ethernet frame sent to one end of the pair is received by the other end. OpenStack Networking uses vEth pairs as virtual patch cables to make connections between virtual bridges.
A Linux bridge behaves like a network switch: administrators can connect multiple network interface devices, whether physical or virtual, to a Linux bridge. Ethernet frames that come in from one attached interface are forwarded to the other devices, and bridges learn the MAC addresses of the devices attached to them.
An Open vSwitch bridge behaves like a virtual switch: network interface devices connect to an Open vSwitch bridge's ports, and the ports can be configured like a physical switch's ports, including VLAN configurations.
For an Ethernet frame to travel from eth0, the local network interface of an instance, to the physical network, it must pass through six devices inside the host:
OpenStack instances. Both the ports created for floating IP addresses and the instances themselves are associated with a security group. If none is specified, the port is associated with the default security group. Additional security rules can be added to the default security group to modify its behavior, or new security groups can be created as necessary.
Note
By default, the group drops all inbound traffic and allows all outbound traffic.
Netfilter rules are created on the compute node. Each time a new rule is created, a Netfilter rule is inserted in the neutron-openvswi-XXX chain. The following output shows the Netfilter rule that allows remote connections to TCP port 565 after the creation of a security group rule.
The following output shows the flow rules on the bridge before the creation of any instance. This
is a single rule that causes the bridge to drop all traffic.
After an instance is running on a compute node, the rules are modified to look something like the
following output.
The Open vSwitch agent is responsible for configuring flow rules on both the integration bridge
and the external bridge for VLAN translation. For example, when br-ex receives a frame marked
with VLAN ID of 1 on the port associated with phy-br-eth1, it modifies the VLAN ID in the
frame to 101. Similarly, when the integration bridge, br-int receives a frame marked with
VLAN ID of 101 on the port associated with int-br-eth1, it modifies the VLAN ID in the frame
to 1.
Note
If the OpenStack Networking DHCP agent is enabled and running when a subnet is
created, then by default, the subnet has DHCP enabled.
The DHCP agent runs inside a network namespace, named qdhcp-UUID, where UUID is the UUID
of a project network.
Inside the namespace, the dnsmasq process binds to a TAP device, such as tapae83329c-91.
The following output shows the TAP device on a network node, inside a namespace.
Administrators can locate the dnsmasq process associated with the namespace by searching the
output of the ps command for the UUID of the network.
--pid-file=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed/pid
--dhcp-hostsfile=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed/host
--addn-hosts=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed/addn_hosts
--dhcp-optsfile=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed/opts
--dhcp-leasefile=/var/lib/neutron/dhcp/0062e02b-7e40-407f-ac43-49e84de096ed/leases
--dhcp-match=set:ipxe,175
--bind-interfaces
--interface=tapae83329c-91
--dhcp-range=set:tag0,192.168.0.0,static,86400s
--dhcp-option-force=option:mtu,1446
--dhcp-lease-max=256
--conf-file=/etc/neutron/dnsmasq-neutron.conf
--domain=openstacklocal
[ml2]
type_drivers = vlan
• A range of VLAN IDs that reflects the physical network is set in /etc/neutron/plugins/ml2/ml2_conf.ini. For example, 171-172.
[ml2_type_vlan]
network_vlan_ranges=physnet1:171:172
• The br-ex bridge is set on the compute node, with eth1 enslaved to it.
bridge_mappings = physnet1:br-ex
external_network_bridge =
Figure 5.12: Network flow between two VLANs shows the implementation of the various network
bridges, ports, and virtual interfaces.
Such a scenario can be used by administrators for connecting multiple VLAN-tagged interfaces
on a single network device to multiple provider networks. This scenario uses the physical network
called physnet1 mapped to the br-ex bridge. The VLANs use the IDs 171 and 172; the network
nodes and compute nodes are connected to the physical network using eth1 as the physical
interface.
Note
The ports of the physical switch on which these interfaces are connected must be
configured to trunk the VLAN ranges. If the trunk is not configured, the traffic will be
blocked.
The following procedure shows the creation of the two networks and their associated subnets.
1. The following commands create the two networks. Optionally, administrators can mark the
networks as shared.
2. The following commands create the subnets for each external network.
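The original commands are not reproduced in this extract. A hedged reconstruction following the scenario's names (physnet1, VLAN IDs 171 and 172) is printed below rather than executed; the subnet ranges are hypothetical examples, not values from the scenario.

```shell
# Print a reconstruction of the network and subnet creation commands.
cmds=$(cat <<'EOF'
openstack network create --external --provider-network-type vlan \
  --provider-physical-network physnet1 --provider-segment 171 provider-171
openstack network create --external --provider-network-type vlan \
  --provider-physical-network physnet1 --provider-segment 172 provider-172

# Subnet ranges below are hypothetical:
openstack subnet create --network provider-171 --no-dhcp \
  --subnet-range 10.65.171.0/24 provider-subnet-171
openstack subnet create --network provider-172 --no-dhcp \
  --subnet-range 10.65.172.0/24 provider-subnet-172
EOF
)
echo "$cmds"
```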
Run the brctl command to review the Linux bridges and their ports.
tap84878b78-63
Bridge br-int
fail_mode: secure
Port int-br-ex
Interface int-br-ex
type: patch
options: {peer=phy-br-ex}
Port br-int
Interface br-int
type: internal
Port patch-tun
Interface patch-tun
type: patch
options: {peer=patch-int}
Port "qvo86257b61-5d"
tag: 3
Interface "qvo86257b61-5d"
Port "qvo84878b78-63"
tag: 2
Interface "qvo84878b78-63"
1. The packets that leave the instances from the eth0 interface arrive at the Linux bridge, qbr. The instances use the virtual device, tap, as the network device. The device is set as a port in the qbr bridge.
2. Each qvo end point residing in the Open vSwitch bridge is tagged with the internal VLAN
tag associated with the VLAN provider network. In this example, the internal VLAN tag 2 is
associated with the VLAN provider network provider-171, and VLAN tag 3 is associated
with VLAN provider network provider-172.
When a packet reaches the qvo end point, the VLAN tag is added to the packet header.
3. The packet is then moved to the Open vSwitch bridge br-ex using the patch between int-br-ex and phy-br-ex.
Run the ovs-vsctl show command to view the ports in the br-ex and br-int bridges.
options: {peer=int-br-ex}
...output omitted...
Bridge br-int
Port int-br-ex
Interface int-br-ex
type: patch
options: {peer=phy-br-ex}
4. When the packet reaches the endpoint phy-br-ex on the br-ex bridge, an Open vSwitch
flow inside the br-ex bridge replaces the internal VLAN tag with the actual VLAN tag
associated with the VLAN provider network.
Run the ovs-ofctl show br-ex command to retrieve the port number of the phy-br-ex port. In the following example, the port phy-br-ex has a value of 4.
5. The following output shows how OpenFlow handles packets arriving on the phy-br-ex port (in_port=4) with a VLAN ID of 2 (dl_vlan=2). Open vSwitch replaces the VLAN tag with 171 (actions=mod_vlan_vid:171,NORMAL), then forwards the packet. The output also shows that packets arriving on phy-br-ex (in_port=4) with the VLAN tag 3 (dl_vlan=3) have their VLAN tag replaced with 172 (actions=mod_vlan_vid:172,NORMAL) before being forwarded.
Note
These rules are automatically added by the OpenStack Networking Open vSwitch
Agent.
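A toy model of those two flow entries can make the VLAN rewrite concrete; the port number and tag mapping come from the example above, and the function is only a sketch of what the OpenFlow rules do, not of how Open vSwitch implements them.

```python
# Internal VLAN tag -> provider VLAN ID, as programmed by the OVS agent above.
VLAN_MAP = {2: 171, 3: 172}
PHY_BR_EX_PORT = 4

def egress_rewrite(in_port: int, dl_vlan: int) -> int:
    """Mimic actions=mod_vlan_vid:<id>,NORMAL for frames entering phy-br-ex."""
    if in_port == PHY_BR_EX_PORT and dl_vlan in VLAN_MAP:
        return VLAN_MAP[dl_vlan]
    return dl_vlan  # traffic from other ports is forwarded with its tag untouched

print(egress_rewrite(4, 2))  # 171
print(egress_rewrite(4, 3))  # 172
print(egress_rewrite(7, 2))  # 2
```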
1. Incoming packets from the external network that are destined for instances first reach the eth1 network device. They are then forwarded to the br-ex bridge.
From the br-ex bridge, packets are moved to the integration bridge, br-int over the peer
patch that connects the two bridges (phy-br-ex and int-br-ex).
2. When the packet passes through the int-br-ex port, an Open vSwitch flow rule inside the bridge adds the internal VLAN tag 2 if the packet belongs to the provider-171 network, or the VLAN tag 3 if the packet belongs to the provider-172 network.
Run the ovs-ofctl dump-flows br-int command to view the flow in the integration
bridge:
The third rule instructs that packets passing through the int-br-ex port (in_port=18), with a VLAN tag of 171 (dl_vlan=171), have the VLAN tag replaced with 2 (actions=mod_vlan_vid:2,NORMAL) and are then forwarded. These rules are automatically added by the OpenStack Networking Open vSwitch agent.
With the internal VLAN tag added to the packet, the qvo interface accepts it and forwards
it to the qvb interface after the VLAN tag has been stripped. The packet then reaches the
instance.
2. Create a router and connect it to all the projects' subnets. This allows for connectivity
between two instances in separate projects.
4. Connect to the network node and use the tcpdump command against all network interfaces.
5. Connect to the compute node and use the tcpdump command against the qvb devices.
References
Further information is available in the Networking Guide for Red Hat OpenStack
Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
In this exercise, you will manage network flow for two projects. You will review the network
implementation for multitenancy and trace packets between projects.
Outcomes
You should be able to:
Run the lab network-tracing-net-flows setup command. The script ensures that
OpenStack services are running and the environment is properly configured for the general
exercise. This script creates two projects: research and finance. The developer1 user is a member of the research project, and the developer2 user is a member of the finance project.
The architect1 user is the administrative user for the two projects. The script also spawns one
instance in each project.
Steps
1. As the architect1 administrative user, review the instances for each of the two projects.
1.1. From workstation, source the credential file for the architect1 user in the
finance project, available at /home/student/architect1-finance-rc. List the
instances in the finance project.
1.2. Source the credential file of the architect1 user for the research project, available
at /home/student/architect1-research-rc. List the instances in the project.
2. As the architect1 administrative user in the research project, create a shared external
network to provide external connectivity for the two projects. Use provider-172.25.250
as the name of the network. The environment uses flat networks with datacentre as the
physical network name.
3. Create the subnet for the provider network in the 172.25.250.0/24 range. Name the
subnet provider-subnet-172.25.250. Disable the DHCP service for the network and
use an allocation pool of 172.25.250.101 - 172.25.250.189. Use 172.25.250.254
as the DNS server and the gateway for the network.
--dns-nameserver 172.25.250.254 \
--allocation-pool start=172.25.250.101,end=172.25.250.189 \
provider-subnet-172.25.250
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| allocation_pools | 172.25.250.101-172.25.250.189 |
| cidr | 172.25.250.0/24 |
| created_at | 2017-06-09T22:28:03Z |
| description | |
| dns_nameservers | 172.25.250.254 |
| enable_dhcp | False |
| gateway_ip | 172.25.250.254 |
| headers | |
| host_routes | |
| id | e5d37f20-c976-4719-aadf-1b075b17c861 |
| ip_version | 4 |
| ipv6_address_mode | None |
| ipv6_ra_mode | None |
| name | provider-subnet-172.25.250 |
| network_id | 56b18acd-4f5a-4da3-a83a-fdf7fefb59dc |
| project_id | c4606deb457f447b952c9c936dd65dcb |
| revision_number | 2 |
| service_types | [] |
| subnetpool_id | None |
| updated_at | 2017-06-09T22:28:03Z |
+-------------------+--------------------------------------+
4. List the subnets present in the environment. Ensure that there are three subnets: one
subnet for each project and one subnet for the external network.
5. Create the research-router1 router and connect it to the two subnets, finance and
research.
7. Ensure that the router is connected to the three networks by listing the router ports.
"name": ""
},
{
"mac_address": "fa:16:3e:a1:77:5f",
"fixed_ips": "{\"subnet_id\": \"d1dd16ee-a489-4884-a93b-95028b953d16\",
\"ip_address\": \"192.168.1.1\"}",
"id": "fa7dab05-e5fa-4c2d-a611-d78670006ddf",
"name": ""
}
]
8. As the developer1 user, create a floating IP and attach it to the research-app1 virtual
machine.
8.1. Source the credentials for the developer1 user and create a floating IP.
9. As the developer2 user, create a floating IP and attach it to the finance-app1 virtual
machine.
9.1. Source the credentials for the developer2 user and create a floating IP.
10. Source the credentials for the developer1 user and retrieve the floating IP attached to the
research-app1 virtual machine.
11. Test the connectivity to the instance research-app1, running in the research project by
using the ping command.
12. As the developer2 user, retrieve the floating IP attached to the finance-app1 virtual
machine so you can test connectivity.
}
]
13. Use the ping command to reach the 172.25.250.P IP. Leave the command running, as
you will connect to the overcloud nodes to review how the packets are routed.
14. Open another terminal. Use the ssh command to log in to controller0 as the heat-
admin user.
15. Run the tcpdump command against all interfaces. Notice the two IP addresses to which the ICMP packets are routed: 192.168.2.F, which is the private IP of the finance-app1 virtual machine, and 172.25.250.254, which is the gateway for the provider network.
16. Cancel the tcpdump command by pressing Ctrl+C and list the network namespaces. Retrieve the routes in the qrouter namespace to determine the network device that handles routing for the 192.168.2.0/24 network.
17. Within the qrouter namespace, run the ping command to confirm that the private IP of
the finance-app1 virtual machine, 192.168.2.F, is reachable.
18. From the first terminal, cancel the ping command by pressing Ctrl+C. Rerun the ping
command against the floating IP of the research-app1 virtual machine, 172.25.250.N.
Leave the command running, as you will be inspecting the packets from the controller0.
19. From the terminal connected to controller0, run the tcpdump command. Notice the two IP addresses to which the ICMP packets are routed: 192.168.1.R, which is the private IP of the research-app1 virtual machine, and 172.25.250.254, which is the IP address of the gateway for the provider network.
16:58:40.340690 IP (tos 0x0, ttl 63, id 65405, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 192.168.1.R: ICMP echo request, id 24665, seq 47, length 64
16:58:40.341130 IP (tos 0x0, ttl 64, id 41896, offset 0, flags [none], proto ICMP
(1), length 84)
192.168.1.R > 172.25.250.254: ICMP echo reply, id 24665, seq 47, length 64
16:58:40.341141 IP (tos 0x0, ttl 63, id 41896, offset 0, flags [none], proto ICMP
(1), length 84)
172.25.250.N > 172.25.250.254: ICMP echo reply, id 24665, seq 47, length 64
16:58:41.341051 IP (tos 0x0, ttl 64, id 747, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 172.25.250.N: ICMP echo request, id 24665, seq 48, length 64
16:58:41.341102 IP (tos 0x0, ttl 63, id 747, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.254 > 192.168.1.R: ICMP echo request, id 24665, seq 48, length 64
16:58:41.341562 IP (tos 0x0, ttl 64, id 42598, offset 0, flags [none], proto ICMP
(1), length 84)
192.168.1.R > 172.25.250.254: ICMP echo reply, id 24665, seq 48, length 64
16:58:41.341585 IP (tos 0x0, ttl 63, id 42598, offset 0, flags [none], proto ICMP
(1), length 84)
172.25.250.N > 172.25.250.254: ICMP echo reply, id 24665, seq 48, length 64
...output omitted...
20. Cancel the tcpdump command by pressing Ctrl+C and list the network namespaces.
Retrieve the routes in the qrouter namespace to determine the network device that
handles routing for the 192.168.1.0/24 network. The following output indicates that
packets destined to the 192.168.1.0/24 network are routed through the qr-fa7dab05-
e5 device (the IDs and names will be different in your output).
21. Within the qrouter namespace, run the ping command to confirm that the private IP of the research-app1 virtual machine, 192.168.1.R, is reachable.
23. List the Linux bridges. The following output indicates two bridges with two ports each. Each bridge corresponds to an instance. The TAP device in each bridge corresponds to the instance's virtual NIC; the qvb devices correspond to the vEth pairs that connect each Linux bridge to the integration bridge, br-int.
24. Run the tcpdump command against either of the two qvb interfaces while the ping command is still running against the 172.25.250.N floating IP. If the output does not show any packets being captured, press Ctrl+C and rerun the command against the other qvb interface.
25. From the first terminal, cancel the ping command. Rerun the command against the
172.25.250.P IP, which is the IP of the finance-app1 instance.
...output omitted...
26. From the terminal connected to the compute0 node, press Ctrl+C to cancel the tcpdump command. Rerun the command against the second qvb interface, qvb03565cda-b1. Confirm that the output indicates some activity.
27. From the first terminal, cancel the ping and confirm that the IP address 192.168.2.F is
the private IP of the finance-app1 instance.
28. Log in to the finance-app1 instance as the cloud-user user. Run the ping command
against the floating IP assigned to the research-app1 virtual machine, 172.25.250.N.
28.1. Use the ssh command as the cloud-user user to log in to finance-app1, with an IP address of 172.25.250.P. Use the developer2-keypair1 located in the home directory of the student user.
28.2. Run the ping command against the floating IP of the research-app1 instance, 172.25.250.N.
29. From the terminal connected to the compute0 node, press Ctrl+C to cancel the tcpdump command. Rerun the command without specifying any interface. Confirm that the output indicates some activity.
192.168.2.F > 172.25.250.N: ICMP echo request, id 12256, seq 309, length 64
18:06:05.030489 IP (tos 0x0, ttl 63, id 39160, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.P > 192.168.1.R: ICMP echo request, id 12256, seq 309, length 64
18:06:05.030774 IP (tos 0x0, ttl 64, id 32646, offset 0, flags [none], proto ICMP
(1), length 84)
192.168.1.R > 172.25.250.P: ICMP echo reply, id 12256, seq 309, length 64
18:06:05.030786 IP (tos 0x0, ttl 63, id 32646, offset 0, flags [none], proto ICMP
(1), length 84)
172.25.250.N > 192.168.2.F: ICMP echo reply, id 12256, seq 309, length 64
18:06:06.030527 IP (tos 0x0, ttl 64, id 40089, offset 0, flags [DF], proto ICMP (1),
length 84)
192.168.2.F > 172.25.250.N: ICMP echo request, id 12256, seq 310, length 64
18:06:06.030550 IP (tos 0x0, ttl 63, id 40089, offset 0, flags [DF], proto ICMP (1),
length 84)
172.25.250.P > 192.168.1.R: ICMP echo request, id 12256, seq 310, length 64
18:06:06.030880 IP (tos 0x0, ttl 64, id 33260, offset 0, flags [none], proto ICMP
(1), length 84)
192.168.1.R > 172.25.250.P: ICMP echo reply, id 12256, seq 310, length 64
18:06:06.030892 IP (tos 0x0, ttl 63, id 33260, offset 0, flags [none], proto ICMP
(1), length 84)
172.25.250.N > 192.168.2.F: ICMP echo reply, id 12256, seq 310, length 64
...output omitted...
The output indicates the following flow for the sequence ICMP 309 (seq 309):
30. Close the terminal connected to compute0. Cancel the ping command, and log out of finance-app1.
Cleanup
From workstation, run the lab network-tracing-net-flows cleanup script to clean up
the resources created in this exercise.
Objectives
After completing this section, students should be able to:
The following table lists some of the basic tools that administrators can use to troubleshoot their
environment.
Troubleshooting Utilities
Command Purpose
ping Sends packets to network hosts. The ping command is a useful tool for
analyzing network connectivity problems. The results serve as a basic
indicator of network connectivity. The ping command works by sending
traffic to specified destinations, and then reports back whether the attempts
were successful.
ip Manipulates routing tables, network devices, and tunnels. The command allows
you to review IP addresses, network devices, namespaces, and tunnels.
traceroute Tracks the route that packets take from an IP network on their way to a given
host.
tcpdump A packet analyzer that allows users to display TCP/IP and other packets being
transmitted or received over a network to which a computer is attached.
ovs-vsctl High-level interface for managing the Open vSwitch database. The command
allows the management of Open vSwitch bridges, ports, tunnels, and patch
ports.
ovs-ofctl Administers OpenFlow switches. It can also show the current state of an
OpenFlow switch, including features, configuration, and table entries.
brctl Manages Linux bridges. Administrators can retrieve MAC addresses, device
names, and bridge configurations.
openstack The OpenStack unified CLI. The command can be used to review networks
and network ports.
neutron The Neutron networking service CLI. The command can be used to review
router ports and network agents.
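On a node, these utilities are typically combined. The following sketch shows typical invocations; the destination addresses, interface names, and bridge names (such as br-int) are illustrative and depend on your deployment:

```shell
# Illustrative invocations of the utilities above; run on a compute or
# network node. Addresses and bridge names are placeholders.
ping -c 3 172.25.250.254         # basic reachability test
ip netns                         # list the network namespaces on this node
traceroute 172.25.250.254        # trace the route packets take to a host
tcpdump -n -i eth0 icmp          # capture ICMP packets on an interface
ovs-vsctl show                   # Open vSwitch bridges, ports, and tunnels
ovs-ofctl dump-flows br-int      # OpenFlow table of the integration bridge
brctl show                       # Linux bridges and attached interfaces
```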
Troubleshooting Scenarios
Troubleshooting procedures help isolate issues and mitigate them. There are some basic
recurring scenarios in OpenStack environments that administrators are likely to face. The
following potential scenarios include basic troubleshooting steps.
Note
Some of the resolution steps outlined in the following scenarios can overlap.
2. Review the bridges on the compute node to ensure that a vEth pair connects the project
bridge to the integration bridge.
3. Review the network namespaces on the network node. Ensure that the router namespace
exists and that routes are properly set.
4. Review the security group that the instance uses to make sure that there is a rule that
allows outgoing traffic.
5. Review the OpenStack Networking configuration to ensure that the mapping between the
physical interfaces and the provider network is properly set.
2. Review the namespaces to ensure that the qdhcp namespace exists and has the TAP device
that the dnsmasq service uses.
3. If the environment uses VLANs, ensure that the switch ports are set in trunk mode or that
the right VLAN ID is set for the port.
4. If a firewall manages the compute node, ensure that there are not any conflicting rules that
prevent the DHCP traffic from passing.
5. Use the neutron command to review the state of the DHCP agent.
2. Review the namespace to make sure that there is a route for the 169.254.169.254/32
address, and that it uses the right network interface. This IP address is used in Amazon
EC2 and other cloud computing platforms to distribute metadata to cloud instances. In
OpenStack, a Netfilter rule redirects packets destined to this IP address to the IP address of
the node that runs the metadata service.
3. Ensure that there is a Netfilter rule that redirects the calls from the 169.254.169.254 IP
address to the Nova metadata service.
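The route and Netfilter checks above can be sketched as follows. This is a minimal sketch for a network node; it assumes a single qrouter namespace is present and picks it automatically:

```shell
# Run on the network node. Select the first qrouter namespace listed.
NS=$(ip netns | awk '/^qrouter-/ {print $1; exit}')

# Confirm that a route exists for the metadata address in the namespace.
ip netns exec "$NS" ip route | grep 169.254.169.254

# Confirm the Netfilter rule that redirects metadata traffic to the
# metadata service.
ip netns exec "$NS" iptables -t nat -S | grep 169.254.169.254
```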
Note
As a general practice, it is not recommended to configure firewalls to block ICMP
packets. Doing so makes troubleshooting more difficult.
The ping command can be run from the instance, the network node, and the compute node. The
-I interface option allows administrators to send packets from the specified interface. The
command allows the validation of multiple layers of the network infrastructure, such as:
• Network switching, which implies proper connectivity between the various network devices.
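For example, the -I option can be sketched as follows; the interface name eth1 and the destination address are placeholders for your environment:

```shell
# Send three ICMP requests from a specific interface, which helps
# validate one network layer at a time. eth1 is a placeholder.
ping -c 3 -I eth1 192.168.1.1
```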
Results from a test using the ping command can reveal valuable information, depending on which
destination is tested. For example, in the following diagram, the instance VM1 is experiencing
connectivity issues. The possible destinations are numbered and the conclusions drawn from a
successful or failed result are presented below.
• If the packet reaches the Internet, it indicates that all the various network points are
working as expected. This includes both the physical and virtual infrastructures.
• If the packet does not reach the Internet, while other servers are able to reach it, it
indicates that an intermediary network point is at fault.
2. Physical router: this is the IP address of the physical router, as configured by the network
administrator to direct the OpenStack internal traffic to the external network.
• If the packet reaches the IP address of the router, it indicates that the underlying switches
are properly set. Note that the packets at this stage do not traverse the router, therefore,
this step cannot be used to determine if there is a routing issue present on the default
gateway.
• If the packet does not reach the router, it indicates a failure in the path between the
instance and the router. The router or the switches could be down, or the gateway could
be improperly set.
3. Physical switch: the physical switch connects the different nodes on the same physical
network.
• If the instance is able to reach an instance on the same subnet, this indicates that the
physical switch allows the packets to pass.
• If the instance is not able to reach an instance on the same subnet, this could indicate
that switch ports do not trunk the required VLANs.
4. OpenStack Networking router: the virtual OpenStack Networking router that directs the
traffic of the instances.
• If the instance is able to reach the virtual router, this indicates that there are rules that
allow the ICMP traffic. This also indicates that the OpenStack Networking network node is
available and properly synchronized with the OpenStack Networking server.
• If the instance is not able to reach the virtual router, this could indicate that the security
group that the instance uses does not allow ICMP packets to pass. This could also indicate
that the L3 agent is down or not properly registered to the OpenStack Networking server.
• If the instance VM1 is able to reach VM2, this indicates that the network interfaces are
properly configured.
• If the instance VM1 is not able to reach VM2, this could indicate that VM2 blocks the
ICMP traffic. This could also indicate that the virtual bridges are not set correctly.
Troubleshooting VLANs
OpenStack Networking trunks VLAN networks through SDN switches. The support of VLAN-
tagged provider networks means that the instances are able to communicate with servers
located in the physical network. To troubleshoot connectivity to a VLAN provider network,
administrators can use the ping command to reach the IP address of the gateway defined during
the creation of the network.
There are many ways to review the mapping of VLAN networks. For example, to discover which
internal VLAN tag is in use for a given external VLAN, administrators can use the ovs-ofctl
command.
2. Connect to the compute node and run the ovs-ofctl dump-flows command against the
integration bridge. Review the flow to make sure that there is a matching rule for the VLAN
tag 6. The following output shows that packets received on port ID 1 with the VLAN tag 6
are modified to have the internal VLAN tag 15.
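A sketch of the command for this step; the VLAN tag value matches the example above:

```shell
# On the compute node: find the flow that matches external VLAN tag 6
# and rewrites it to the internal VLAN tag (15 in the example above).
ovs-ofctl dump-flows br-int | grep 'dl_vlan=6'
```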
3. Run the ovs-ofctl show br-int command to access the flow table and the ports of the
integration bridge. The following output shows that the port with the ID of 1 is assigned to
the int-br-ex port.
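A minimal sketch of this step:

```shell
# Map OpenFlow port IDs to port names on the integration bridge;
# in the example above, port 1 is int-br-ex.
ovs-ofctl show br-int
```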
4. Use tools such as the ping command throughout the various network layers to detect
potential connectivity issues. For example, if packets are lost between the compute node
and the controller node, this may indicate network congestion on the equipment that
connects the two nodes.
5. Review the configuration of physical switches to ensure that ports through which the
project traffic passes allow the traffic for network packets tagged with the provider ID.
Usually, ports need to be set in trunk mode.
The neutron agent-list command can be used to review the state of the agent. If an agent
is out of synchronization or not properly registered, this can lead to unexpected results. For
example, if the DHCP agent is not marked as alive, instances will not retrieve any IP address
from the agent.
The following output shows how the neutron agent-list and neutron agent-show
commands can be used to retrieve more information about OpenStack Networking agents.
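A sketch of the two commands; the agent UUID is the one shown for the L3 agent in the output that follows:

```shell
# List all OpenStack Networking agents and their liveness state.
neutron agent-list

# Show the details of a single agent, here the L3 agent of this example.
neutron agent-show cabd8fe5-82e1-467a-b59c-a37f9aa68111
```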
| configurations      | {                                             |
|                     |      "agent_mode": "legacy",                  |
|                     |      "gateway_external_network_id": "",       |
|                     |      "handle_internal_only_routers": true,    |
|                     |      "routers": 0,                            |
|                     |      "interfaces": 0,                         |
|                     |      "floating_ips": 0,                       |
|                     |      "interface_driver":                      |
|                     | "neutron.agent.linux.interface.OVSInterfaceDriver", |
|                     |      "log_agent_heartbeats": false,           |
|                     |      "external_network_bridge": "",           |
|                     |      "ex_gw_ports": 0                         |
|                     | }                                             |
| created_at          | 2017-04-29 01:47:50                           |
| description         |                                               |
| heartbeat_timestamp | 2017-05-10 00:56:15                           |
| host                | overcloud-controller-0.localdomain            |
| id                  | cabd8fe5-82e1-467a-b59c-a37f9aa68111          |
| started_at          | 2017-05-09 19:22:14                           |
| topic               | l3_agent                                      |
+---------------------+-----------------------------------------------+
The following output shows the ovs_integration_bridge key with a value of br-int in
the DEFAULT group. The entry is commented out, as this is the default value that OpenStack
Networking defines.
[DEFAULT]
#
# From neutron.base.agent
#
OpenStack Networking configuration files are automatically configured by the installers when
deploying both the undercloud and the overcloud. The installers parse values defined in
undercloud.conf or in the Heat template files. However, the tools do not check for
environment-related errors, such as missing connectivity to external networks or
misconfigured interfaces.
The following table lists the configuration files for OpenStack Networking services, located
in /etc/neutron:
File Purpose
metering_agent.ini Used by the OpenStack Networking metering agent.
neutron.conf Used by the OpenStack Networking server.
conf.d/agent The conf.d directory contains extra directories for each
OpenStack Networking agent. This directory can be used
to configure OpenStack Networking services with custom
user-defined configuration files.
plugins/ml2 The ml2 directory contains a configuration file for each
plugin. For example, the openvswitch_agent.ini
contains the configuration for the Open vSwitch plugin.
plugins/ml2/ml2_conf.ini Defines the configuration for the ML2 framework. In this
file, administrators can set the VLAN ranges or the drivers
to enable.
Most of the options in the configuration files are documented with a short comment explaining
how the option is used by the service. Therefore, administrators can understand what the option
does before setting the value. Consider the ovs_use_veth option in the dhcp_agent.ini,
which provides instructions for using vEth interfaces:
# Uses veth for an OVS interface or not. Support kernels with limited namespace
# support (e.g. RHEL 6.5) so long as ovs_use_veth is set to True. (boolean
# value)
#ovs_use_veth = false
Important
While some options use boolean values, such as true or false, other options require
a value. Even if the text above each value specifies the type (string value or
boolean value), administrators need to understand the option before changing it.
Note
Modified configuration files in the overcloud are reset to their default state when
the overcloud is updated. If custom options are set, administrators must update the
configuration files after each overcloud update.
Administrators are likely to need to troubleshoot the configuration files when some action
related to a service fails. For example, upon creation of a VXLAN network, if OpenStack
Networking complains about a missing provider, administrators need to review the configuration
of ML2. They would then make sure that the type_drivers key in the ml2 section of the /etc/
neutron/plugins/ml2/ml2_conf.ini configuration file has the proper value set.
[ml2]
type_drivers = vxlan
They also have to make sure that the VLAN range in the section dedicated to VLAN is set
correctly. For example:
[ml2_type_vlan]
network_vlan_ranges=physnet1:171:172
• Traffic does not reach the external network: administrators should review the bridge
mapping. Traffic that leaves the provider network from the router arrives in the integration
bridge. A patch port between the integration bridge and the external bridge allows the
traffic to pass through the bridge of the provider network and out to the physical network.
Administrators must ensure that there is an interface connected to the Internet which belongs
to the external bridge. The bridge mapping is defined in /etc/neutron/plugins/ml2/
openvswitch_agent.ini:
bridge_mappings = datacentre:br-ex
The bridge mapping configuration must correlate with that of the VLAN range. For the
example given above, the network_vlan_ranges should be set as follows:
network_vlan_ranges = datacentre:1:1000
• Packets in a VLAN network are not passing through the switch ports: administrators
should review the network_vlan_ranges in the /etc/neutron/plugin.ini
configuration file to make sure it matches the VLAN IDs allowed to pass through the switch
ports.
enable_isolated_metadata = True
• Support of overlapping IPs is disabled: overlapping IPs require the use of Linux
network namespaces. To enable the support of overlapping IPs, administrators must set the
allow_overlapping_ips key to True in the /etc/neutron/neutron.conf configuration
file:
# MUST be set to False if OpenStack Networking is being used in conjunction with Nova
# security groups. (boolean value)
# allow_overlapping_ips = True
allow_overlapping_ips=True
The following output shows the available project networks. In this example, there is only one
network, internal1.
This mapping allows for further troubleshooting. For example, administrators can review the
routing table for this project network.
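For example, the routing table can be inspected inside the router's namespace. This sketch assumes a single qrouter namespace on the network node and picks it automatically:

```shell
# Select the qrouter namespace and display its routing table.
NS=$(ip netns | awk '/^qrouter-/ {print $1; exit}')
ip netns exec "$NS" ip route
```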
The tcpdump command can be used within the namespace. Administrators can, for example, open
another terminal window while trying to reach an external server.
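A sketch of such a capture; it listens on all interfaces in the namespace (`-i any`) so that no interface name has to be assumed:

```shell
# In one terminal, capture ICMP traffic inside the router namespace
# while a ping runs in another terminal.
NS=$(ip netns | awk '/^qrouter-/ {print $1; exit}')
ip netns exec "$NS" tcpdump -n -i any icmp
```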
The log files use the standard logging levels defined by RFC 5424. The following table lists all log
levels and provides some examples that administrators are likely to encounter:
DEBUG Logs all statements when debug is set to true in the service's configuration file.
INFO Logs informational messages. For example, an API call to the service:
CRITICAL Logs critical errors that prevent a service from properly functioning:
Most of the errors contain explicit statements about the nature of the problem, helping
administrators troubleshoot their environment. However, there are cases where the error that
is logged does not indicate the root cause of the problem. For example, if there is a critical error
being logged, this does not say anything about what caused that error. Such an error can be
caused by a firewall rule, or by a congested network. OpenStack services communicate through a
message broker, which provides a resilient communication mechanism between the services. This
allows for most of the services to receive the messages even if there are network glitches.
Log files contain many entries, which makes it difficult to locate errors. Administrators can use
the grep command to filter on a specific log level. The following output indicates a network
timeout while a message was being exchanged between the DHCP agent and the OpenStack
Networking server.
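The filtering can be sketched as follows. The log line content is an illustrative sample; against a real file you would grep the agent's log directly:

```shell
# Against a real log file you would run, for example:
#   grep ERROR /var/log/neutron/dhcp-agent.log
# Demonstrated here on a single sample line (illustrative content):
sample='2017-05-10 00:56:15 ERROR neutron.agent.dhcp.agent MessagingTimeout: Timed out waiting for a reply'
echo "$sample" | grep -o 'ERROR'
```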
Troubleshooting Tips
When troubleshooting, administrators can start by drawing a diagram that details the network
topology. This helps to review the network interfaces being used, and how the servers are
connected to each other. They should also get familiar with most of the troubleshooting tools
presented in the table titled “Troubleshooting Utilities” of this section. When troubleshooting,
administrators can ask questions like:
• Can the instance be reached with the ping command in the project's network namespace?
Introduction to easyOVS
easyOVS, available on GitHub, is an open source tool for OpenStack that lists the rules or
validates the configuration of Open vSwitch bridges, Netfilter rules, and DVR configuration.
It can be used to map the IP address of an instance to the virtual port, or the VLAN tags and
namespaces in use. The tool is fully compatible with network namespaces.
The following output lists the Netfilter rules associated to a particular IP address:
The following output shows information related to a port. In the following example, c4493802 is
the first portion of the port UUID that uses the IP address 10.0.0.2.
tenant_id: 3a55e7b5f5504649a2dfde7147383d02
extra_dhcp_opts: []
binding:vnic_type: normal
device_owner: compute:az_compute
mac_address: fa:16:3e:94:84:90
fixed_ips: [{u'subnet_id': u'94bf94c0-6568-4520-aee3-d12b5e472128', u'ip_address':
u'10.0.0.4'}]
id: c4493802-4344-42bd-87a6-1b783f88609a
security_groups: [u'7c2b801b-4590-4a1f-9837-1cceb7f6d1d0']
device_id: 9365c842-9228-44a6-88ad-33d7389cda5f
1. Review the security group rules to ensure that, for example, ICMP traffic is allowed.
2. Connect to the network nodes to review the implementation of routers and networks
namespaces.
3. Use the ping command within network namespaces to reach the various network devices,
such as the interface for the router in the internal network.
4. Review the list of OpenStack Networking agents and their associated processes by using the
ps command to make sure that they are running.
References
Further information is available in the Networking Guide for Red Hat OpenStack
Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
easyOVS GitHub
https://github.com/yeasy/easyOVS
easyOVS Launchpad
https://launchpad.net/easyovs
In this exercise, you will troubleshoot network connectivity issues in a project network.
Outcomes
You should be able to:
Scenario
Users are complaining that they cannot reach their instances using the floating IPs. A user has
provided a test instance named research-app1 that can be used to troubleshoot the issue.
Steps
1. From workstation, source the credentials for the developer1 user and review the
environment.
1.1. Source the credentials for the developer1 user located at /home/student/
developer1-research-rc. List the instances in the environment.
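These two actions can be sketched as:

```shell
# From workstation: load the user's credentials, then list instances.
source /home/student/developer1-research-rc
openstack server list
```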
1.2. Retrieve the name of the security group that the instance uses.
"security_groups": [
{
"name": "default"
}
],
...output omitted...
}
1.3. List the rules for the default security group. Ensure that there is one rule that allows
traffic for SSH connections and one rule for ICMP traffic.
1.6. Ensure that the router research-router1 has an IP address defined as a gateway
for the 172.25.250.0/24 network and an interface in the research-network1
network.
2. Retrieve the floating IP assigned to the research-app1 instance and run the ping
command against the floating IP assigned to the instance, 172.25.250.P. The command
should fail.
3. Attempt to connect to the instance as the root user at its floating IP. The command should fail.
4. Reach the IP address assigned to the router in the provider network, 172.25.250.R.
5. Review the namespaces implementation on controller0. Use the ping command within
the qrouter namespace to reach the router's private IP.
5.1. Retrieve the UUID of the router research-router1. You will compare this UUID with
the one of the qrouter namespace.
5.2. Open another terminal and use the ssh command to log in to controller0 as the
heat-admin user. Review the namespace implementation. Ensure that the qrouter
namespace uses the ID returned by the previous command.
The output indicates three devices: the loopback interface lo, and the TAP devices with
the IP addresses 172.25.250.R and 172.25.250.P.
5.4. Within the qrouter namespace, run the ping command against the private IP of the
router, 192.168.1.S.
6. From the first terminal, retrieve the private IP of the research-app1 instance. From the
second terminal, run the ping command against the private IP of the instance IP within the
qrouter namespace.
6.1. From the first terminal, retrieve the private IP of the research-app1 instance.
6.2. From the second terminal, run the ping command in the qrouter namespace against
192.168.1.N. The output indicates that the command fails.
7. The previous output that listed the namespaces indicated that the qdhcp namespace is
missing. Review the namespaces in controller0 to confirm that the namespace is missing.
8. The qdhcp namespace is created for the DHCP agents. List the running processes on
controller0. Use the grep command to filter dnsmasq processes. The output indicates
that no dnsmasq process is running on the server.
9. From the first terminal, source the credentials of the administrative user, architect1,
located at /home/student/architect1-research-rc. List the Neutron agents to
ensure that there is one DHCP agent.
10. List the Neutron ports to ensure that there is one IP assigned to the DHCP agent in the
192.168.1.0/24 network.
The output indicates that there are only two ports in the subnet, and none for a DHCP
agent. This indicates that research-subnet1 does not run a DHCP server.
11. Update the subnet to run a DHCP server and confirm the updates in the environment.
11.1. Review the subnet properties. Locate the enable_dhcp property and confirm that it
reads False.
11.2. Run the openstack subnet set command to update the subnet. The command does
not produce any output.
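A sketch of the command:

```shell
# Enable DHCP on the subnet; the command prints nothing on success.
openstack subnet set --dhcp research-subnet1
```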
11.3. Review the updated subnet properties. Locate the enable_dhcp property and confirm
that it reads True.
11.4. From the terminal connected to controller0, rerun the ps command. Ensure that a
dnsmasq process is now running.
11.5. From the first terminal, rerun the openstack port list command. Ensure that there
is a third IP in the research-subnet1 network.
11.6. From the terminal connected to controller0, list the network namespaces. Ensure
that there is a new namespace called qdhcp.
11.7. List the interfaces in the qdhcp namespace. Confirm that there is an interface with an
IP address of 192.168.1.2.
12. From the first terminal, stop then start the research-app1 instance to reinitialize IP
assignment and cloud-init configuration.
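The step can be sketched as:

```shell
# Stop, then start the instance to reinitialize IP assignment and
# cloud-init; neither command prints output on success.
openstack server stop research-app1
openstack server start research-app1
```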
13.2. Run the ping command against the floating IP 172.25.250.P until it responds.
13.3. Use ssh to connect to the instance. When finished, exit from the instance.
Cleanup
From workstation, run the lab network-troubleshooting cleanup script to clean up the
resources created in this exercise.
In this lab, you will troubleshoot the network connectivity of OpenStack instances.
Outcomes
You should be able to:
Scenario
Cloud users reported issues reaching their instances via their floating IPs. Both ping and ssh
connections time out. You have been tasked with troubleshooting and fixing these issues.
On workstation, run the lab network-review setup command. This script creates the
production project for the operator1 user and creates the /home/student/operator1-
production-rc credentials file. The SSH public key is available at /home/student/
operator1-keypair1.pem. The script deploys the instance production-app1 in the
production project with a floating IP in the provider-172.25.250 network.
Steps
1. As the operator1 user, list the instances present in the environment. The credentials file
for the user is available at /home/student/operator1-production-rc. Ensure that the
instance production-app1 is running and has an IP in the 192.168.1.0/24 network.
2. Attempt to reach the instance via its floating IP by using the ping and ssh commands.
Confirm that the commands time out. The private key for the SSH connection is available at
/home/student/operator1-keypair1.pem.
3. Review the security rules for the security group assigned to the instance. Ensure that there
is a rule that authorizes packets sent by the ping command to pass.
5. As the operator1 user, list the routers in the environment. Ensure that production-
router1 is present, has a private network port, and is the gateway for the external network.
7. From workstation, use the ssh command to log in to controller0 as the heat-admin
user. List the network namespaces to ensure that there is a namespace for the router and
for the internal network production-network1. Review the UUID of the router and the
UUID of the internal network to make sure they match the UUIDs of the namespaces.
List the interfaces in the network namespace for the internal network. Within the private
network namespace, use the ping command to reach the private IP address of the router.
Run the ping command within the qrouter namespace against the IP assigned as a
gateway to the router. From the tenant network namespace, use the ping command to reach
the private IP of the instance.
8. From controller0, review the bridge mappings configuration. Ensure that the provider
network named datacentre is mapped to the br-ex bridge. Review the configuration
of the Open vSwitch bridge br-int. Ensure that there is a patch port for the connection
between the integration bridge and the external bridge. Retrieve the name of the peer
port for the patch from the integration bridge to the external bridge. Make any necessary
changes.
9. From workstation use the ping command to reach the IP defined as a gateway for the
router and the floating IP associated to the instance. Use the ssh command to log in to the
instance production-app1 as the cloud-user user. The private key is available at /
home/student/operator1-keypair1.pem.
Evaluation
From workstation, run the lab network-review grade command to confirm the success
of this exercise. Correct any reported failures and rerun the command until successful.
Cleanup
From workstation, run the lab network-review cleanup command to clean up this
exercise.
Solution
In this lab, you will troubleshoot the network connectivity of OpenStack instances.
Outcomes
You should be able to:
Scenario
Cloud users reported issues reaching their instances via their floating IPs. Both ping and ssh
connections time out. You have been tasked with troubleshooting and fixing these issues.
On workstation, run the lab network-review setup command. This script creates the
production project for the operator1 user and creates the /home/student/operator1-
production-rc credentials file. The SSH public key is available at /home/student/
operator1-keypair1.pem. The script deploys the instance production-app1 in the
production project with a floating IP in the provider-172.25.250 network.
Steps
1. As the operator1 user, list the instances present in the environment. The credentials file
for the user is available at /home/student/operator1-production-rc. Ensure that the
instance production-app1 is running and has an IP in the 192.168.1.0/24 network.
1.1. From workstation, source the operator1-production-rc file and list the running
instances.
2. Attempt to reach the instance via its floating IP by using the ping and ssh commands.
Confirm that the commands time out. The private key for the SSH connection is available at
/home/student/operator1-keypair1.pem.
2.1. Run the ping command against the floating IP 172.25.250.P. The command should
fail.
2.2. Attempt to connect to the instance using the ssh command. The command should fail.
3. Review the security rules for the security group assigned to the instance. Ensure that there
is a rule that authorizes packets sent by the ping command to pass.
3.1. Retrieve the name of the security group that the instance production-app1 uses.
3.2. List the rules in the default security group. Ensure that there is a rule for ICMP traffic.
The output indicates that there is a rule for the ICMP traffic. This indicates that the
environment requires further troubleshooting.
4.1. Source the architect1 credentials. List the networks. Confirm that the
provider-172.25.250 network is present.
4.2. Review the provider-172.25.250 network details, including the network type and
the physical network defined.
5. As the operator1 user, list the routers in the environment. Ensure that production-
router1 is present, has a private network port, and is the gateway for the external network.
5.1. Source the operator1-production-rc credentials file and list the routers in the
environment.
5.2. Display the router details. Confirm that the router is the gateway for the external
network provider-172.25.250.
5.3. Use ping to test the IP defined as the router gateway interface. Observe the command
timing out.
The ping test was unable to reach the external gateway interface of the router from an
external host, but the root cause is still unknown, so continue troubleshooting.
6. From the compute node, review the network implementation by listing the Linux bridges and
ensure that the ports are properly defined. Ensure that there is one bridge with two ports in
it. The bridge and the port names should be named after the first 10 characters of the port
UUID in the private network for the instance production-app1.
From workstation, use ssh to connect to compute0 as the heat-admin user. Review
the configuration of the Open vSwitch integration bridge. Ensure that the vEth pair, which
has a port associated to the bridge, has another port in the integration bridge. Exit from the
virtual machine.
6.1. From the first terminal, list the network ports. Identify the UUID
of the port with the private IP of the instance. In this example, the UUID is
04b3f285-7183-4673-836b-317d80c27904, which matches the characters displayed
above.
6.2. Use the ssh command to log in to compute0 as the heat-admin user. Use the brctl
command to list the Linux bridges. Ensure that there is a qbr bridge with two ports in it.
The bridge and the ports should be named after the first 10 characters of the port of the
instance in the private network.
6.3. As the root user from the compute0 virtual machine, list the network ports in the
integration bridge, br-int. Ensure that the port of the vEth pair qvo is present in the
integration bridge.
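This check can be sketched as:

```shell
# On compute0: confirm that the qvo side of the vEth pair is attached
# to the integration bridge.
sudo ovs-vsctl list-ports br-int | grep '^qvo'
```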
7. From workstation, use the ssh command to log in to controller0 as the heat-admin
user. List the network namespaces to ensure that there is a namespace for the router and
for the internal network production-network1. Review the UUID of the router and the
UUID of the internal network to make sure they match the UUIDs of the namespaces.
List the interfaces in the network namespace for the internal network. Within the private
network namespace, use the ping command to reach the private IP address of the router.
Run the ping command within the qrouter namespace against the IP assigned as a
gateway to the router. From the tenant network namespace, use the ping command to reach
the private IP of the instance.
7.1. Use the ssh command to log in to controller0 as the heat-admin user. List the
network namespaces.
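A sketch of the commands for this step:

```shell
# On controller0: list the namespaces; expect a qrouter-<uuid> entry
# for the router and a qdhcp-<uuid> entry for the internal network.
sudo ip netns
```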
7.2. From the previous terminal, retrieve the UUID of the router production-router1.
Ensure that the output matches the qrouter namespace.
7.3. Retrieve the UUID of the private network, production-network1. Ensure that the
output matches the qdhcp namespace.
7.4. Use the neutron command to retrieve the interfaces of the router production-
router1.
| id                                   | name | mac_address       | fixed_ips                                                                           |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------+
| 30fc535c-85a9-4be4-b219-e810deec88d1 |      | fa:16:3e:d4:68:d3 | {"subnet_id": "a4c40acb-f532-4b99-b8e5-d1df14aa50cf", "ip_address": "192.168.1.R"}  |
| bda4e07f-64f4-481d-a0bd-01791c39df92 |      | fa:16:3e:90:4f:45 | {"subnet_id": "2b5110fd-213f-45e6-8761-2e4a2bcb1457", "ip_address": "172.25.250.S"} |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------+
7.5. From the terminal connected to the controller, use the ping command within the qdhcp
namespace to reach the private IP of the router.
7.6. Within the router namespace, qrouter, run the ping command against the IP defined
as a gateway in the 172.25.250.0/24 network.
7.8. Use the ping command in the same namespace to reach the private IP of the instance
production-app1.
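The substeps above can be sketched as shell commands. The namespace names are the literal prefixes qrouter- and qdhcp- joined to the router or network UUID (the UUIDs and the R/S address placeholders below are illustrative, not the real classroom values):

```shell
#!/bin/bash
# Namespace names combine a fixed prefix with the router or network UUID.
router_id="11111111-2222-3333-4444-555555555555"   # placeholder; use the real router UUID
net_id="66666666-7777-8888-9999-000000000000"      # placeholder; use the real network UUID
echo "qrouter-${router_id}"    # router namespace name
echo "qdhcp-${net_id}"         # namespace name for the internal network
# On controller0, the checks would then look like (not run here):
#   sudo ip netns
#   sudo ip netns exec "qdhcp-${net_id}" ip a
#   sudo ip netns exec "qdhcp-${net_id}" ping -c 3 192.168.1.R
#   sudo ip netns exec "qrouter-${router_id}" ping -c 3 172.25.250.S
```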
8. From controller0, review the bridge mappings configuration. Ensure that the provider
network named datacentre is mapped to the br-ex bridge. Review the configuration
of the Open vSwitch bridge br-int. Ensure that there is a patch port for the connection
between the integration bridge and the external bridge. Retrieve the name of the peer
port for the patch from the integration bridge to the external bridge. Make any necessary
changes.
8.1. From controller0, as the root user, review the bridge mappings configuration.
Bridge mappings for Open vSwitch are defined in the /etc/neutron/plugins/ml2/
openvswitch_agent.ini configuration file. Ensure that the provider network name,
datacentre, is mapped to the br-ex bridge.
8.2. Review the ports in the integration bridge br-int. Ensure that there is a patch port in
the integration bridge. The output lists phy-br-ex as the peer for the patch port.
        type: internal
    Port br-int
        Interface br-int
            type: internal
8.3. List the ports in the external bridge, br-ex. The output indicates that the port
phy-br-ex is absent from the bridge.
8.4. Patch ports are managed by the neutron-openvswitch-agent, which uses the
bridge mappings for Open vSwitch bridges. Restart the neutron-openvswitch-
agent.
8.5. Wait a minute then list the ports in the external bridge, br-ex. Ensure that the patch
port phy-br-ex is present in the external bridge.
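The checks and the fix in this step can be sketched as follows. This is only a sketch, run on controller0 as root; the configuration file path is the one given in substep 8.1, and the patch port names int-br-ex and phy-br-ex follow the agent's naming convention shown in the output above:

```shell
# Confirm the provider network mapping (expect datacentre:br-ex)
grep bridge_mappings /etc/neutron/plugins/ml2/openvswitch_agent.ini
# Inspect both bridges for the patch port pair
ovs-vsctl list-ports br-int | grep int-br-ex
ovs-vsctl list-ports br-ex | grep phy-br-ex
# If phy-br-ex is missing, restart the agent that manages the patch ports
systemctl restart neutron-openvswitch-agent
```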
9. From workstation, use the ping command to reach the IP defined as a gateway for the
router and the floating IP associated to the instance. Use the ssh command to log in to the
instance production-app1 as the cloud-user user. The private key is available at
/home/student/operator1-keypair1.pem.
9.1. Use the ping command to reach the IP of the router defined as a gateway.
9.3. Use the ping command to reach the floating IP allocated to the instance.
9.4. Use the ssh command to log in to the instance as the cloud-user user. The private
key is available at /home/student/operator1-keypair1.pem. Exit from the
instance.
Evaluation
From workstation, run the lab network-review grade command to confirm the success
of this exercise. Correct any reported failures and rerun the command until successful.
Cleanup
From workstation, run the lab network-review cleanup command to clean up this
exercise.
Summary
In this chapter, you learned:
• OpenStack Networking (Neutron) is the SDN networking project that provides
Networking-as-a-Service (NaaS) in virtual environments. It implements traditional networking
features, such as subnetting, bridging, and VLANs, as well as more recent technologies, such as
VXLAN and GRE tunnels.
• The Modular Layer 2 (ML2) plug-in is a framework that enables the use of various networking
technologies. Administrators can interact with Open vSwitch or any vendor technology, such as
Cisco equipment, thanks to the various plug-ins available for OpenStack Networking.
• When troubleshooting, administrators can use a variety of tools, such as ping, ip,
traceroute, and tcpdump.
MANAGING RESILIENT
COMPUTE RESOURCES
Overview
Goal
Add compute nodes, manage shared storage, and perform instance live migration.
Objectives
• View introspection data, orchestration templates, and configuration manifests used to build the overcloud.
Objectives
After completing this section, students should be able to view introspection data, orchestration
templates, and configuration manifests used to build the overcloud.
Red Hat OpenStack Platform director is the undercloud, with components for provisioning and
managing the infrastructure nodes that will become the overcloud. An undercloud is responsible
for planning overcloud roles, creating the provisioning network configuration and services,
locating and inventorying nodes prior to deployment, and running the workflow service that
facilitates the deployment process. The Red Hat OpenStack Platform director installation comes
complete with sample deployment templates and both command-line and web-based user
interface tools for configuring and monitoring overcloud deployments.
Note
Underclouds and tools for provisioning overclouds are relatively new technologies and
are still evolving. The choices for overcloud design and configuration are as limitless
as the use cases for which they are built. The following demonstration and lecture is
an introduction to undercloud tasks and overcloud preparation, and is not intended
to portray recommended practice for any specific use case. The cloud architecture
presented here is designed to satisfy the technical requirements of this classroom.
Introspecting Nodes
To provision overcloud nodes, the undercloud is configured with a provisioning network and IPMI
access information about the nodes it will manage. The provisioning network is a large-capacity,
dedicated, and isolated network, separate from the normal public network. During deployment,
orchestration will reconfigure nodes' network interfaces with Open vSwitch bridges, which would
cause the deployment process to disconnect if the provisioning and deployed networks shared
the same interface. After deployment, Red Hat OpenStack Platform director will continue to
manage and update the overcloud across this isolated, secure provisioning network, completely
segregated from both external and internal OpenStack traffic.
[DEFAULT]
local_ip = 172.25.249.200/24
undercloud_public_vip = 172.25.249.201
undercloud_admin_vip = 172.25.249.202
local_interface = eth0
masquerade_network = 172.25.249.0/24
dhcp_start = 172.25.249.51
dhcp_end = 172.25.249.59
network_cidr = 172.25.249.0/24
network_gateway = 172.25.249.200
inspection_iprange = 172.25.249.150,172.25.249.180
generate_service_certificate = true
View the undercloud's configured network interfaces. The br-ctlplane bridge is the
172.25.249.0 provisioning network; the eth1 interface is the 172.25.250.0 public network.
The provisioning subnet is configured for DHCP. The Networking service has configured
a dnsmasq instance to manage the scope. Verify that the subnet lists the DNS nameserver and
default gateway that are handed out to DHCP clients as scope options.
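These checks can be sketched with standard commands. This is a sketch only, run as the stack user on the undercloud; the device and network names are the ones described above:

```shell
ip addr show br-ctlplane     # provisioning bridge on the 172.25.249.0 network
ip addr show eth1            # public interface on the 172.25.250.0 network
openstack network list       # look for the ctlplane provisioning network
openstack subnet list        # note the ctlplane subnet ID for closer inspection
```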
Power management in a cloud environment normally uses the IPMI management NIC built into
a server chassis. However, virtual machines do not normally have a lights-out-management
platform interface. Instead, they are controlled by the appropriate virtualization management
software, which connects to the running virtual machine's hypervisor to request power
management actions and events. In this classroom, a Baseboard Management Controller (BMC)
emulator is running on the power virtual machine, configured with one unique IP address
per virtual machine node. Upon receiving a valid IPMI request at the correct listener, the BMC
emulator sends the request to the hypervisor, which performs the request on the corresponding
virtual machine.
Define and verify the MAC address, IPMI address, and power management user name and password
for each node to be registered, in the instack configuration file instackenv-initial.json.
This node registration file can be either JSON or YAML format. The following example shows the
instack configuration file in JSON format.
The next step is to register the nodes with the Bare Metal service. The Workflow service manages
this task set, which includes the ability to schedule and monitor multiple tasks and actions.
Single or multiple hosts may be introspected simultaneously. When building new clouds,
performing bulk introspection is common. After an overcloud is operational, it is best to
set a manageable provisioning state on selected nodes, then invoke introspection only on those
selected nodes. Introspection times vary depending on the number of nodes and the throughput
capacity of the provisioning network, because the introspection image must be pushed to each
node during the PXE boot. If introspection appears not to finish, check the Bare Metal service's
logs for troubleshooting.
Introspecting Nodes
The following steps outline the process to introspect managed nodes from the undercloud.
5. Upload baremetal and overcloud network boot images to the Image Service.
6. Check baremetal nodes for correct NIC and disk physical configuration.
7. Gather node MAC addresses, IPMI addresses, access user names and passwords.
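The registration and introspection steps outlined above map to commands covered later in this chapter; a sketch, using the node registration file named earlier and the classic ironic client for verification:

```shell
openstack baremetal import instackenv-initial.json   # register nodes with the Bare Metal service
openstack baremetal introspection bulk start         # introspect all registered nodes
ironic node-list                                     # verify provision and power states
```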
Orchestrating an Overcloud
The undercloud has obtained sizing and configuration information about each node through
introspection. Nodes can be dynamically assigned to overcloud roles (controller, compute,
ceph-storage, block-storage, or object-storage) by comparing each node to capability
conditions set by the cloud administrator. Different roles usually have recognizable sizing
distinctions. In this classroom, the nodes are small-scale virtual machines that could be assigned
automatically, but assigning deployment roles manually is useful in many cases.
Add the correct profile tag to each flavor as a property using the capabilities index. Use the same
tag names when setting a profile on each node.
Add the correct matching profile tag to each node as a property using the capabilities index.
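The tagging can be sketched as follows. The flavor and node names are assumed from the classroom environment, and the capabilities string follows the format shown later in this chapter:

```shell
# Tag the flavor with a profile using the capabilities index
openstack flavor set --property "capabilities:profile"="compute" \
    --property "capabilities:boot_option"="local" compute
# Tag the node with the matching profile
ironic node-update compute1 add \
    properties/capabilities="profile:compute,boot_option:local"
```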
[user@undercloud]$ ls -l /usr/share/openstack-tripleo-heat-templates/
-rw-r--r--. 1 root root 808 Jan 2 19:14 all-nodes-validation.yaml
-rw-r--r--. 1 root root 583 Jan 2 19:14 bootstrap-config.yaml
-rw-r--r--. 1 root root 20903 Jan 2 19:14 capabilities-map.yaml
drwxr-xr-x. 5 root root 75 May 19 13:22 ci
-rw-r--r--. 1 root root 681 Jan 2 19:14 default_passwords.yaml
drwxr-xr-x. 3 root root 128 May 19 13:22 deployed-server
drwxr-xr-x. 4 root root 168 May 19 13:22 docker
drwxr-xr-x. 4 root root 4096 May 19 13:22 environments
drwxr-xr-x. 6 root root 73 May 19 13:22 extraconfig
drwxr-xr-x. 2 root root 162 May 19 13:22 firstboot
-rw-r--r--. 1 root root 735 Jan 2 19:14 hosts-config.yaml
-rw-r--r--. 1 root root 325 Jan 2 19:14 j2_excludes.yaml
-rw-r--r--. 1 root root 2594 Jan 2 19:14 net-config-bond.yaml
-rw-r--r--. 1 root root 1895 Jan 2 19:14 net-config-bridge.yaml
-rw-r--r--. 1 root root 2298 Jan 2 19:14 net-config-linux-bridge.yaml
-rw-r--r--. 1 root root 1244 Jan 2 19:14 net-config-noop.yaml
-rw-r--r--. 1 root root 3246 Jan 2 19:14 net-config-static-bridge-with-external-dhcp.yaml
-rw-r--r--. 1 root root 2838 Jan 2 19:14 net-config-static-bridge.yaml
-rw-r--r--. 1 root root 2545 Jan 2 19:14 net-config-static.yaml
drwxr-xr-x. 5 root root 4096 May 19 13:22 network
-rw-r--r--. 1 root root 25915 Jan 2 19:14 overcloud.j2.yaml
-rw-r--r--. 1 root root 13866 Jan 17 12:44 overcloud-resource-registry-puppet.j2.yaml
drwxr-xr-x. 5 root root 4096 May 19 13:22 puppet
-rw-r--r--. 1 root root 6555 Jan 17 12:44 roles_data.yaml
drwxr-xr-x. 2 root root 26 May 19 13:22 validation-scripts
Recommended practice is to copy this whole directory structure to a new working directory, to
ensure that local customizations are not overwritten by package updates. In this classroom, the
working directory is /home/stack/templates/. The environments subdirectory contains the
sample configuration files to choose features and configurations for this overcloud deployment.
Create a new environment working subdirectory and copy only the needed environment files into
it. Similarly, create a configuration working subdirectory and save any modified template files
into it. The subdirectories are cl210-environment and cl210-configuration.
The classroom configuration includes environment files to build trunked VLANs, statically
configured node IP addresses, an explicit Ceph server layout, and more. The need for three NICs
per virtual machine required customizing existing templates, which were copied to the configuration
subdirectory before modification. Browse these files of interest to correlate template settings to
the live configuration:
• templates/cl210-environment/30-network-isolation.yaml
• templates/cl210-environment/32-network-environment.yaml
• templates/cl210-configuration/single-nic-vlans/controller.yaml
• templates/cl210-configuration/single-nic-vlans/compute.yaml
• templates/cl210-configuration/single-nic-vlans/ceph-storage.yaml
The final step is to start the deployment, specifying the main working directories for templates
and environment files. Deployment time varies greatly, depending on the number of nodes
being deployed and the features selected. Orchestration processes tasks in dependency order.
Although many tasks may be running on different nodes simultaneously, some tasks must
finish before others can begin. This required structure is organized into a workflow plan, which
manages the whole provisioning orchestration process.
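The deployment command can be sketched using the classroom working directories named above (a sketch; exact options depend on the features selected):

```shell
openstack overcloud deploy \
    --templates /home/stack/templates \
    --environment-directory /home/stack/templates/cl210-environment
```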
Monitor the orchestration process on the console where the deployment command was invoked.
Orchestration plans that do not complete can be corrected, edited and restarted. The following
text displays when the overcloud stack deployment is complete.
Figure 6.1: Bare Metal boot disk provisioning visually describes the
procedure for delivering a new boot disk to a node being provisioned. The overcloud-full
image is a working Red Hat Enterprise Linux system with all of the Red Hat OpenStack Platform
and Red Hat Ceph Storage packages already installed but not configured. By pushing the
same overcloud-full image to all nodes, any node could be sent instructions to build any of
the supported deployment roles: Controller, Compute, Ceph-Storage, Image-Storage, or Block-
Storage. When the node boots this image for the first time, the image is configured to send
a callback message to the Orchestration service to say that it is ready to be configured.
Orchestration then coordinates the sending and processing of resource instructions and Puppet
invocations that accomplish the remainder of the build and configuration of the node. When
orchestration is complete, the result is a complete server running as one of the deployment roles.
Orchestrating an Overcloud
The following steps outline the process to orchestrate an overcloud from the undercloud.
2. Assign matching profile tags to specify which nodes will be selected for which flavors.
5. Run the openstack overcloud deploy command. Use the --templates parameter to
specify the template directory. Use the --environment-directory parameter to specify
the environment file directory.
6. Use ssh to connect to each deployed node as the heat-admin user, to verify deployment.
7. Review the network interfaces, bridges, and disks to verify that each is correctly configured.
References
The Ironic developer documentation page
https://docs.openstack.org/developer/ironic/
In this exercise, you will view the results of the deployment tasks that created the overcloud on
your virtual machines. You will verify the configuration and status of the undercloud, then verify
the configuration and status of the overcloud.
Outcomes
You should be able to:
Steps
1. Log in to director and review the environment.
1.1. Use the ssh command to connect to director. Review the environment file for the
stack user. OpenStack environment variables all begin with OS_.
1.2. View the environment file for the stack user. This file is automatically sourced when
the stack user logs in. The OS_AUTH_URL variable in this file defines the Identity
Service endpoint of the undercloud.
[DEFAULT]
local_ip = 172.25.249.200/24
undercloud_public_vip = 172.25.249.201
undercloud_admin_vip = 172.25.249.202
local_interface = eth0
masquerade_network = 172.25.249.0/24
dhcp_start = 172.25.249.51
dhcp_end = 172.25.249.59
network_cidr = 172.25.249.0/24
network_gateway = 172.25.249.200
inspection_iprange = 172.25.249.150,172.25.249.180
...output omitted...
2.2. Compare the IP addresses in the configuration file with the IP address assigned to the
virtual machine. Use the ip command to list the devices.
2.3. List the networks configured in the undercloud. If an overcloud is currently deployed,
then approximately six networks are displayed. If the overcloud has been deleted or
has not been deployed, only one network will display. Look for the provisioning network
named ctlplane. This display includes the subnets configured within the networks
listed. You will use the ID for the provisioning network's subnet in the next step.
2.4. Display the subnet for the ctlplane provisioning network using the subnet ID obtained
in the previous step. The allocation_pools field is the DHCP scope, and the
dns_nameservers and gateway_ip fields are DHCP options, for an overcloud node's
provisioning network interface.
3. List the services installed for the undercloud, and their passwords.
3.2. Review the admin and other component service passwords located in the
/home/stack/undercloud-passwords.conf configuration file. You will use various service
passwords in later exercises.
[auth]
undercloud_db_password=eb35dd789280eb196dcbdd1e8e75c1d9f40390f0
undercloud_admin_token=529d7b664276f35d6c51a680e44fd59dfa310327
undercloud_admin_password=96c087815748c87090a92472c61e93f3b0dcd737
undercloud_glance_password=6abcec10454bfeec6948518dd3de6885977f6b65
undercloud_heat_encryption_key=45152043171b30610cb490bb40bff303
undercloud_heat_password=a0b7070cd8d83e59633092f76a6e0507f85916ed
undercloud_neutron_password=3a19afd3302615263c43ca22704625db3aa71e3f
undercloud_nova_password=d59c86b9f2359d6e4e19d59bd5c60a0cdf429834
undercloud_ironic_password=260f5ab5bd24adc54597ea2b6ea94fa6c5aae326
...output omitted...
4. View the configuration used to prepare for deploying the overcloud and the resulting
overcloud nodes.
4.2. List the provisioned nodes in the current overcloud environment. This command lists
the nodes that were created using the configuration file shown in the previous step.
4.3. List the servers in the environment. Review the status and the IP address of the nodes.
This command lists the overcloud servers built on the bare-metal nodes defined in the
previous step. The IP addresses assigned to the nodes are reachable from the director
virtual machine.
5. Using the controller0 node and the control role as an example, review the settings that
define how a node is selected to be built for a server role.
5.1. List the flavors created for each server role in the environment. These flavors were
created to define the sizing for each deployment server role. It is recommended practice
that flavors are named for the roles for which they are used. However, properties set on
a flavor, not the flavor's name, determine its use.
+---------------+------+------+-----------+-------+
| Name          | RAM  | Disk | Ephemeral | VCPUs |
+---------------+------+------+-----------+-------+
| ceph-storage  | 2048 | 10   | 0         | 1     |
| compute       | 4096 | 20   | 0         | 1     |
| swift-storage | 2048 | 10   | 0         | 1     |
| control       | 4096 | 30   | 0         | 1     |
| baremetal     | 4096 | 20   | 0         | 1     |
| block-storage | 2048 | 10   | 0         | 1     |
+---------------+------+------+-----------+-------+
5.2. Review the control flavor's properties by running the openstack flavor show
command. The capabilities settings include the profile='control' tag.
When this flavor is specified, it will only work with nodes that match these requested
capabilities, including the profile='control' tag.
5.3. Review the controller0 node's properties field. The capabilities settings
include the same profile:control tag as defined on the control flavor. When a
flavor is specified during deployment, only a node that matches a flavor's requested
capabilities is eligible to be selected for deployment.
6. Review the template and environment files that were used to define the deployment
configuration.
6.1. Locate the environment files used for the overcloud deployment.
6.2. Locate the configuration files used for the overcloud deployment.
...output omitted...
# Port assignments for the controller role
OS::TripleO::Controller::Ports::ExternalPort: ../network/ports/external.yaml
OS::TripleO::Controller::Ports::InternalApiPort: ../network/ports/internal...
OS::TripleO::Controller::Ports::StoragePort: ../network/ports/storage.yaml
OS::TripleO::Controller::Ports::StorageMgmtPort: ../network/ports/storage_...
OS::TripleO::Controller::Ports::TenantPort: ../network/ports/tenant.yaml
...output omitted...
...output omitted...
# Internal API - used for private OpenStack services traffic
InternalApiNetCidr: '172.24.1.0/24'
InternalApiAllocationPools: [{'start': '172.24.1.60','end': '172.24.1.99'}]
InternalApiNetworkVlanID: 10
InternalApiVirtualFixedIPs: [{'ip_address':'172.24.1.50'}]
RedisVirtualFixedIPs: [{'ip_address':'172.24.1.51'}]
...output omitted...
...output omitted...
type: vlan
# mtu: 9000
vlan_id: {get_param: InternalApiNetworkVlanID}
addresses:
-
ip_netmask: {get_param: InternalApiIpSubnet}
...output omitted...
...output omitted...
type: vlan
# mtu: 9000
vlan_id: {get_param: InternalApiNetworkVlanID}
addresses:
-
ip_netmask: {get_param: InternalApiIpSubnet}
...output omitted...
...output omitted...
type: vlan
# mtu: 9000
vlan_id: {get_param: StorageNetworkVlanID}
addresses:
-
ip_netmask: {get_param: StorageIpSubnet}
...output omitted...
7.2. Source the overcloudrc authentication environment file. The OS_AUTH_URL variable
in this file defines the Identity Service endpoint of the overcloud.
7.4. Review the general overcloud configuration. This listing contains default settings, formats,
and core component version numbers. The network field lists created networks; it is currently
empty because none yet exist in this new overcloud.
Objective
After completing this section, students should be able to add a compute node to the overcloud
using the undercloud.
Scaling
An important feature of cloud computing is the ability to rapidly scale up or down an
infrastructure. Administrators can provision their infrastructure with nodes that can fulfill
multiple roles (for example, computing, storage, or controller) and can be pre-installed with a
base operating system. Administrators can then integrate these nodes into their environment
as needed. Cloud computing provides services that automatically account for increases or
decreases in load and warn administrators when the environment needs to be scaled. In
traditional computing models, administrators must often manually install, configure, and
integrate new servers into existing environments, requiring extra time and effort to provision
each node. Autoscaling is one of the main benefits of the cloud-computing model, because it
permits, for example, a quick response to load spikes.
Red Hat OpenStack Platform director, with the Heat orchestration service, implements scaling
features. Administrators can rerun the command used to deploy an overcloud, increasing
or decreasing the node count for each role based on infrastructure requirements. For example,
the overcloud environment can scale by adding two additional compute nodes, bringing the total
to three.
Red Hat OpenStack Platform director then automatically reviews the current configuration
and reconfigures the available services to provision the OpenStack environment with the three
compute nodes.
Templates are used to create stacks, which are collections of resources (for example, instances,
floating IPs, volumes, security groups, or users). The Orchestration service offers access to all
the undercloud core services through a single modular template, with additional orchestration
capabilities such as autoscaling and basic high availability.
Note
Administrators must give a special role to OpenStack users that allows them to manage
stacks. The role name is defined by the heat_stack_user_role variable in /etc/
heat/heat.conf. The default role name is heat_stack_user.
A Heat template is written using YAML syntax, and has three major sections: parameters,
resources, and outputs.
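A minimal sketch of a HOT file illustrating these sections (the resource name, image, and network values are illustrative, not from the classroom environment):

```yaml
heat_template_version: 2016-10-14

parameters:            # settings passed in to customize the stack
  flavor:
    type: string
    default: m1.small

resources:             # objects created as part of the stack
  my_server:
    type: OS::Nova::Server
    properties:
      flavor: { get_param: flavor }
      image: rhel7
      networks:
        - network: private

outputs:               # values returned after the stack is created
  server_ip:
    value: { get_attr: [my_server, first_address] }
```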
The openstack command supports stack management, including commands shown in the
following table.
Troubleshooting the Heat orchestration service requires administrators to understand how the
underlying infrastructure has been configured, since Heat makes use of these resources in order
to create the stack. For example, when creating an instance, the Orchestration service invokes
the Compute service API the same way a user would, authenticating through the Identity service. When a
network port is requested by the OpenStack Networking service, the requests are also made to
the API through the Identity service. This means the infrastructure needs to be configured and
working. Administrators must ensure that the resources requested through Heat can also be
requested manually. Orchestration troubleshooting includes:
• Ensuring all undercloud services that the templates refer to are configured and running.
After the troubleshooting has completed, administrators can review the configuration of
Orchestration services:
Orchestration Terminology
The following table lists the terms that administrators should be familiar with to properly
administer their cloud with the Orchestration service.
Heat Orchestration Template (HOT)
A Heat Orchestration Template (HOT) is a YAML-based configuration file that administrators
write and pass to the Orchestration service API to deploy their cloud infrastructure. HOT is a
template format designed to replace the legacy Orchestration CloudFormation-compatible
format (CFN).

CloudFormation template (CFN)
CFN is a legacy template format used by Amazon AWS services. The heat-api-cfn service
manages this legacy format.

Orchestration template parameters
Orchestration template parameters are settings passed to the Orchestration service that
provide a way to customize a stack. They are defined in a Heat template file, with optional
default values used when values are not passed. These are defined in the parameters section
of a template.

Orchestration template resources
Orchestration template resources are the specific objects that are created and configured as
part of a stack. OpenStack contains a set of core resources that span all components. These
are defined in the resources section of a Heat template.

Orchestration template outputs
Orchestration template outputs are values, defined in a Heat template file, that are returned
by the Orchestration service after a stack is created. Users can access these values either
through the Orchestration service API or client tools. These are defined in the outputs
section of a template.
All the information about a node is retrieved through a process called introspection. After
introspection has completed, the node is ready to be used to deploy overcloud services. The Bare Metal
service makes use of the different services included in the undercloud, to deploy the overcloud
services. The Bare Metal service supports different drivers to run the introspection process,
based on what the environment hardware supports (for example, IPMI, DRAC).
The following table includes the most common openstack baremetal commands for
provisioning a new node in Red Hat OpenStack Platform director.
{
"nodes": [
{
"pm_user": "admin",
"arch": "x86_64",
"name": "compute1",
"pm_addr": "172.25.249.112",
"pm_password": "password",
"pm_type": "pxe_ipmitool",
"mac": [
"52:54:00:00:f9:0c"
],
"cpu": "2",
"memory": "6144",
"disk": "40"
}
]
}
• pm_user, pm_password: The user name and password used to access the power management server
The following are optional fields used when the introspection has completed:
"capabilities": "profile:compute,boot_option:local"
There are various Ironic drivers provided for power management, which include:
• pxe_ssh: A driver that can be used in a virtualized environment. It uses virtualization
commands over SSH to power the VMs on and off.
• fake_pxe: All power management for this driver requires manual intervention. It can be used as
a fallback for unusual or older hardware.
To import the instackenv.json file, use the openstack baremetal import command.
When overcloud nodes are booted into the introspection stage, they are provided with the
discovery images, located under /httpboot, by the ironic-inspector service. The import
process assigns each node the bm_deploy_kernel and bm_deploy_ramdisk images automatically.
Manual use of openstack baremetal configure boot is no longer needed. In the following
output, verify that deploy_kernel and deploy_ramdisk are assigned to the new nodes.
To introspect the hardware attributes of all registered nodes, run the command openstack
baremetal introspection bulk start.
To limit the introspection to nodes that are in the manageable provision state, use the
--all-manageable --provide options with the openstack baremetal introspection
command.
Monitor and troubleshoot the introspection process with the following command.
[user@undercloud]$ openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 \
baremetal
Capabilities such as boot_option must be set on the flavor to match the node's boot mode,
and the profile capability defines the node profile to use with the flavor.
Deploy Overcloud
Red Hat OpenStack Platform undercloud uses the Orchestration service to orchestrate the
deployment of the overcloud with a stack definition. These Orchestration templates can be
customized to suit various deployment patterns. The stack templates define all resources
required for the deployment, and maintain the dependencies for these resource deployments.
• ceph-storage: A node that runs the Ceph OSDs. Monitors run on the controller node.
• --templates: Specifies the template location. If no location is specified, the default template
location of /usr/share/openstack-tripleo-heat-templates is used.
Note
The --compute-scale deployment option is deprecated in Red Hat OpenStack
Platform 10 (Newton) in favor of using an environment file. Administrators can define the
number of nodes to scale out in an environment file and supply that environment file
to the overcloud deployment stack. All the --*-scale deployment parameters, including
--compute-scale, --swift-storage-scale, --block-storage-scale, and
--ceph-storage-scale, will be discontinued in a future Red Hat OpenStack
Platform release.
• The information is saved in the Ironic database and used during the introspection phase.
Introspection:
• The Bare Metal service uses PXE (Preboot eXecution Environment) to boot nodes over a
network.
• The Bare Metal service connects to the registered nodes to gather more details about the
hardware resources.
• The discovery kernel and ramdisk images are used during this process.
Deployment:
• The stack user deploys overcloud nodes, allocating resources and nodes that were discovered
during the introspection phase.
• Hardware profiles and Orchestration templates are used during this phase.
• The type of power management being used, such as IPMI or PXE over SSH. The various power management drivers supported by the Bare Metal service can be listed using the ironic driver-list command.
All of this information can be passed using a JSON (JavaScript Object Notation) file or a CSV file. The openstack baremetal import command imports this file into the Bare Metal
service database.
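The structure of such a JSON node-definition file can be sketched as follows. The field names match the instackenv.json example shown later in this chapter; the pm_user field and the validation helper are illustrative assumptions, not part of the Bare Metal toolchain.

```python
import json

# Minimal instackenv-style node definition (values are classroom examples).
instackenv = {
    "nodes": [
        {
            "arch": "x86_64",
            "name": "compute1",
            "pm_addr": "172.25.249.112",
            "pm_user": "admin",          # assumption: IPMI user field
            "pm_password": "password",
            "pm_type": "pxe_ipmitool",
            "mac": ["52:54:00:00:f9:0c"],
            "cpu": "2",
            "memory": "6144",
            "disk": "40",
        }
    ]
}

# Fields the import would reasonably require for power management and PXE.
REQUIRED = {"arch", "pm_addr", "pm_password", "pm_type", "mac"}

def missing_fields(env):
    """Return {node-name: missing-keys} for every incomplete node entry."""
    problems = {}
    for node in env.get("nodes", []):
        missing = REQUIRED - node.keys()
        if missing:
            problems[node.get("name", "<unnamed>")] = sorted(missing)
    return problems

# Round-trip through JSON, as if written to and read from instackenv.json.
text = json.dumps(instackenv, indent=2)
assert missing_fields(json.loads(text)) == {}
```

Running a check like this before the import catches typos in the node definitions early, instead of at introspection time.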
For the introspection and discovery of overcloud nodes, the Bare Metal service uses PXE (Preboot
eXecution Environment), provided by the undercloud. The dnsmasq service is used to provide
DHCP and PXE capabilities to the Bare Metal service. The PXE discovery images are delivered
over HTTP. Prior to introspection, the registered nodes must have a valid kernel and ramdisk
assigned to them, and every node for introspection has the following settings:
The openstack baremetal introspection command is used to start the introspection, and
--all-manageable --provide informs the Bare Metal service to perform introspection on
nodes that are in the manageable provision state.
The undercloud uses the hard-coded baremetal flavor, which must be set as the default flavor for any unused roles; otherwise, the role-specific flavors are used.
[user@undercloud]$ openstack flavor create --id auto --ram 6144 --disk 38 --vcpus 2 \
baremetal
[user@undercloud]$ openstack flavor create --id auto --ram 6144 --disk 38 --vcpus 2 \
compute
The undercloud performs automated role matching to apply appropriate hardware for each flavor
of node. When nodes are on identical hardware and no flavors are created, the deployment roles
are randomly chosen for each node. Manual tagging can also be used to tie the deployment role
to a node.
To use these deployment profiles, they need to be associated with the respective flavors using the capabilities:profile property. The capabilities:boot_option property is required to set the boot mode for flavors.
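The property syntax can be illustrated by assembling the arguments for an openstack flavor set call. The helper function and the exact command-line layout are assumptions for illustration; only the capabilities:profile and capabilities:boot_option property names come from the text above.

```python
def flavor_set_args(flavor, profile, boot_option="local"):
    """Build the argument list for an 'openstack flavor set' call that
    ties a deployment profile and boot option to a flavor via the
    capabilities properties described above."""
    return [
        "openstack", "flavor", "set",
        "--property", f"capabilities:profile={profile}",
        "--property", f"capabilities:boot_option={boot_option}",
        flavor,
    ]

args = flavor_set_args("compute", "compute")
assert args[-1] == "compute"
assert "capabilities:profile=compute" in args
```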
2. On the undercloud node, create an instackenv.json file containing definitions for the
additional compute node.
3. Import the instackenv.json file using the command openstack baremetal import.
4. Assign boot images to the additional compute node using the command openstack
baremetal configure boot.
5. Set the provisioning state to manageable using the command openstack baremetal
node manage.
7. After introspection has completed successfully, update the node profile to use the compute
role.
8. Deploy the overcloud with the command openstack overcloud deploy --templates
~/templates --environment-directory ~/templates/cl210-environment.
References
Further information is available for Adding Additional Nodes in the Director Installation
and Usage guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Resources
Files: http://materials.example.com/instackenv-onenode.json
Outcomes
You should be able to add a compute node to the overcloud.
Steps
1. Use SSH to connect to director as the user stack and source the stackrc credentials
file.
1.1. From workstation, use SSH to connect to director as the user stack and source
the stackrc credentials file.
2.3. Import instackenv-onenode.json into the Bare Metal service using the command
openstack baremetal import, and ensure that the node has been properly
imported.
2.4. Prior to starting introspection, set the provisioning state for compute1 to manageable.
4.1. Update the node profile for compute1 to assign it the compute profile.
...output omitted...
ComputeCount: 2
...output omitted...
6.1. Deploy the overcloud to scale out the compute nodes by adding one more node.
Cleanup
From workstation, run the lab resilience-scaling-nodes cleanup command to clean
up this exercise.
Objectives
After completing this section, students should be able to:
Introduction to Migration
Migration is the process of moving a server instance from one compute node to another. In this
and the following section of this chapter, the major lecture topic is live migration. Live migration
relocates a server instance (virtual machine) from one compute node hypervisor to another
while the server application is running, offering uninterrupted service. This section discusses
the method known as block-based live migration and the next section discusses an alternative
method known as shared storage live migration. First, however, it is important to define what
is meant by migration, because one of the primary design goals of cloud architecture is to
eliminate the need for legacy server management techniques, including many former use cases
for migration.
A major feature of cloud-designed applications is that they are resilient, scalable, distributed, and stateless, and they are commonly implemented in what is known as a microservices architecture. A
microservice application scales, relocates, and self-repairs by deploying itself as replicated
components instantiated as virtual machines or containers across many compute nodes, cells,
zones, and regions. Applications designed this way share live state information such that the loss
of any single component instance has little or no effect on the application or the service being
offered. By definition, microservice cloud applications do not need to perform live migration. If a
microservices component is to be relocated for any reason, a new component is instantiated in
the desired location from the appropriate component image. The component joins the existing
application and begins work while the unwanted component instance is simply terminated.
Legacy applications, also referred to as enterprise applications, may also include resilient,
scalable, and distributed features, but are distinguished by their need to remain stateful. Enterprise
application server instances cannot be terminated and discarded without losing application state
or data, or corrupting data storage structures. Such applications must be migrated to relocate
from one compute node to another.
The simplest form of migration is cold migration. In legacy computing, a virtual machine is
shut down, preserving configuration and state on its assigned disks, then rebooted on another
hypervisor or in another data center after relocating the physical or virtual disks. This same
concept remains available in OpenStack today. Cold migration is accomplished by taking an
instance snapshot on a running, quiesced instance, then saving the snapshot as an image. As
with legacy computing, the image is relocated and used to boot a new instance. The original
instance remains in service, but the state transferred to the new instance only matches that
which existed when the snapshot was taken.
What about the disks on the source virtual machine, such as the root disk, extra ephemeral disks,
swap disk, and persistent volumes? These disks also must be transferred and attached to the
destination virtual machine. The method used, block based or shared storage, is directly related
to the overcloud storage architecture that is implemented. With the shared storage method, if
both the source and destination compute nodes connect to and have sufficient access privileges
for the same shared storage locations containing the migrating instance's disks, then no physical
disk movement occurs. The source compute node stops using the disks while the destination
compute node takes over disk activity.
Block-based live migration is the alternate method used when shared storage is not
implemented. When the source and destination compute nodes do not share common-access
storage, the root, ephemeral, swap and persistent volumes must be transferred to the storage
location used by the destination compute node. When performance is a primary focus, block-based live migration should be avoided. Instead, implement shared storage structures across common networks where live migration occurs regularly.
• Original, proof of concept installations, such as default Packstack installations, used the
Compute service (Nova) to manage non-persistent root disks, ephemeral disks, and swap disks.
Instance virtual disks managed by the Compute service are found in subdirectories of /var/lib/nova/instances on each compute node's own disk.
• Different compute nodes, even when operating in the same networks, can be connected
to different storage arrays, Red Hat Virtualization data stores, or other back end storage
subsystems.
• Instances can be deployed using the Block Storage service volume-based transient or
persistent disks instead of using the Compute service ephemeral storage, but compute nodes
configured with different back ends require block-based migration.
• Both source and destination compute nodes must be located in the same subnet.
• All controller and compute nodes must have consistent name resolution for all other nodes.
• The UID and GID of the nova and libvirt users must be identical on all compute nodes.
• Compute nodes must be using KVM with libvirt, which is expected when using Red Hat
OpenStack Platform. The KVM with libvirt platform has the best coverage of features and
stability for live migration.
• The permissions and system access of local directories must be consistent across all nodes.
• Consistent multipath device naming must be used on both the source and destination compute
nodes. Instances expect to resolve multipath device names similarly in both locations.
• Update the access URI string in /etc/nova/nova.conf to match the strategy. Use
"live_migration_uri=qemu+ACCESSTYPE://USER@%s/system", where ACCESSTYPE is
tcp or tls, and USER is nova; if no user is specified, the URI defaults to the root user.
• Ensure that OpenStack utilities and the VNC proxy are installed, using:
• Add the nova group to the /etc/group file with a line like the following:
nova:x:162:nova
• Add the nova user to the /etc/passwd file with a line like the following
• Allow the nova user access to the compute node's ephemeral directory:
• Add rules for TCP, TLS, and the ephemeral ports to the firewall:
If using TCP:
If using TLS:
user="root"
group="root"
vnc_listen="0.0.0.0"
Troubleshooting
When migrations fail or appear to take too long, check the activity in the compute service log
files on both the source and the destination compute node:
• /var/log/nova/nova-api.log
• /var/log/nova/nova-compute.log
• /var/log/nova/nova-conductor.log
• /var/log/nova/nova-scheduler.log
1. Ensure that the overcloud has more than one compute node added.
2. Configure block storage and live migration on all compute nodes. Ensure that SELinux is set
to permissive mode, and appropriate iptables rules are configured.
3. On the controller node, update the vncserver_listen variable to listen for all
connections in the /etc/nova/nova.conf file.
5. Using the administrator credentials, live migrate the instance to the destination compute
node using the openstack server migrate command.
6. Verify that the instance was migrated successfully to the destination compute node.
References
Further information is available for Configuring Block Migration in the Migrating
Instances guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Further information is available for Migrating Live Instances in the Migrating Instances
guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
In this exercise, you will migrate a live instance using block storage.
Outcomes
You should be able to:
This guided exercise requires two compute nodes, as configured in a previous guided exercise
which added compute1 to the overcloud. If you did not successfully complete that guided
exercise, have reset your overcloud systems, or for any reason have an overcloud with only a
single installed compute node, you must first run the command lab resilience-block-storage add-compute on workstation. The command's add-compute task adds the compute1 node to the overcloud, taking between 40 and 90 minutes to complete.
Important
As described above, only run this command if you still need to install a second compute
node. If you already have two functioning compute nodes, skip this task and continue
with the setup task.
After the add-compute task has completed successfully, continue with the setup task
in the following paragraph.
Start with the setup task if you have two functioning compute nodes, either from having
completed the previous overcloud scaling guided exercise, or by completing the extra add-compute task described above. On workstation, run the lab resilience-block-storage setup command. This command verifies the OpenStack environment and creates the project resources used in this exercise.
Steps
1. Configure compute0 to use block-based migration. Later in this exercise, you will repeat
these steps on compute1.
1.1. Log into compute0 as heat-admin and switch to the root user.
[root@overcloud-compute-0 ~]#
user="root"
group="root"
vnc_listen="0.0.0.0"
1.4. The classroom overcloud deployment uses Ceph as shared storage by default.
Demonstrating block-based migration requires disabling shared storage for the
Compute service. Enable the compute0 node to store virtual disk images, associated
with running instances, locally under /var/lib/nova/instances. Edit the /etc/nova/nova.conf file to set the images_type variable to default.
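The relevant nova.conf fragment might look like the following sketch. The placement in the [libvirt] section is an assumption based on where the Compute service keeps its image-backend settings in this release; "default" here means local qcow2 files under /var/lib/nova/instances rather than the Ceph back end.

```ini
[libvirt]
# Store instance virtual disks locally under /var/lib/nova/instances
# instead of on the Ceph shared back end.
images_type = default
```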
2.1. Log into compute1 as heat-admin and switch to the root user.
user="root"
group="root"
vnc_listen="0.0.0.0"
2.4. The classroom overcloud deployment uses Ceph as shared storage by default.
Demonstrating block-based migration requires disabling shared storage for the
Compute service. Enable the compute1 node to store virtual disk images, associated with running instances, locally under /var/lib/nova/instances. Edit the /etc/nova/nova.conf file to set the images_type variable to default.
3.1. Log into controller0 as heat-admin and switch to the root user.
Instance Attributes
Attribute Value
flavor m1.web
key pair developer1-keypair1
network finance-network1
image rhel7
security group finance-web
name finance-web1
5. List the available floating IP addresses, then allocate one to the finance-web1 instance.
5.1. List the floating IPs. An available one has the Port attribute set to None.
6.1. To perform live migration, the user developer1 must have the admin role assigned for
the project finance. Assign the admin role to developer1 for the project finance.
The developer1 user may already have been assigned the admin role.
6.3. Prior to migration, ensure the destination compute node has sufficient resources to host
the instance. In this example, the current server instance location node is overcloud-
compute-1.localdomain, and the destination to check is overcloud-compute-0.
Modify the command to reflect your actual source and destination compute nodes.
Estimate whether the total minus the amount used now is sufficient.
"Project": "(used_now)",
"Disk GB": 0,
"Host": "overcloud-compute-0.localdomain",
"CPU": 0,
"Memory MB": 2048
},
{
"Project": "(used_max)",
"Disk GB": 0,
"Host": "overcloud-compute-0.localdomain",
"CPU": 0,
"Memory MB": 0
}
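The "total minus used" estimate described in step 6.3 can be sketched over output in the style shown above (the command producing it is elided in the text). The "(total)" row and the flavor values below are classroom-style assumptions; the helpers are illustrative, not part of any OpenStack tool.

```python
# Rows in the style of the JSON output above; the "(total)" row is
# assumed from the full command output (values are classroom examples).
rows = [
    {"Project": "(total)", "Host": "overcloud-compute-0.localdomain",
     "CPU": 2, "Memory MB": 6144, "Disk GB": 56},
    {"Project": "(used_now)", "Host": "overcloud-compute-0.localdomain",
     "CPU": 0, "Memory MB": 2048, "Disk GB": 0},
]

def free_resources(rows):
    """Return the resources still available on the host: total - used_now."""
    total = next(r for r in rows if r["Project"] == "(total)")
    used = next(r for r in rows if r["Project"] == "(used_now)")
    return {k: total[k] - used[k] for k in ("CPU", "Memory MB", "Disk GB")}

def fits(flavor, free):
    """Check whether a flavor's requirements fit in the free resources."""
    return (flavor["vcpus"] <= free["CPU"]
            and flavor["ram"] <= free["Memory MB"]
            and flavor["disk"] <= free["Disk GB"])

free = free_resources(rows)
assert free == {"CPU": 2, "Memory MB": 4096, "Disk GB": 56}
assert fits({"vcpus": 1, "ram": 2048, "disk": 10}, free)
```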
6.4. Migrate the instance finance-web1 to a new compute node. In this example, the
instance is migrated from overcloud-compute-1 to overcloud-compute-0. Your
scenario may require migrating in the reverse direction.
7. Use the command openstack server show to verify that the migration of finance-
web1 using block storage migration was successful. The compute node displayed should be
the destination node.
Cleanup
From workstation, run the lab resilience-block-storage cleanup command to clean
up this exercise.
If you intend to repeat either of the two Live Migration Guided Exercises in this
chapter that require two compute nodes, do not reset your virtual machines. Because
your overcloud currently has two functioning compute nodes, you may repeat the
Live Migration Guided Exercises without running the add-compute task that was
required to build the second compute node.
Objectives
After completing this section, students should be able to:
Live migration using block storage follows a process similar to shared storage live migration. However, with block storage the disk contents must also be copied before the memory contents are transferred, which makes shared storage live migration quicker and more efficient.
The following parameters in /etc/nova/nova.conf affect live migration:
• live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_LIVE, VIR_MIGRATE_TUNNELLED: the migration flags to be set for live migration.
• live_migration_progress_timeout = 150: the time to wait, in seconds, for migration to make progress in transferring data before aborting the operation.
• live_migration_uri = qemu+tcp://%s/system: the migration target URI.
• Update the access URI string in /etc/nova/nova.conf to match the strategy. Use
"live_migration_uri=qemu+ACCESSTYPE://USER@%s/system", where ACCESSTYPE is
tcp or tls, and USER is nova; if no user is specified, the URI defaults to the root user.
2. Add rules for TCP, TLS, and the ephemeral ports to the firewall.
3. Update qemu with three settings in /etc/libvirt/qemu.conf for user, group, and
vnc_listen.
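The live_migration_uri substitution described above can be sketched with a small helper. This is illustrative only; in the real configuration the %s stays literal in /etc/nova/nova.conf, and the Compute service substitutes the destination hostname at migration time.

```python
def live_migration_uri(accesstype="tcp", user="nova"):
    """Build the live_migration_uri value for /etc/nova/nova.conf.
    The '%s' placeholder is left literal: the destination hostname
    is substituted at migration time."""
    if accesstype not in ("tcp", "tls"):
        raise ValueError("accesstype must be 'tcp' or 'tls'")
    userpart = f"{user}@" if user else ""   # no user: connection runs as root
    return f"qemu+{accesstype}://{userpart}%s/system"

assert live_migration_uri() == "qemu+tcp://nova@%s/system"
assert live_migration_uri("tls", user=None) == "qemu+tls://%s/system"
```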
2. Ensure the destination compute node has sufficient resources to host the instance.
3. Migrate the instance from one compute node to another by using the openstack server migrate command.
Troubleshooting
When migration fails or takes too long, check the activity in the Compute service log files on both
the source and the destination compute nodes:
• /var/log/nova/nova-api.log
• /var/log/nova/nova-compute.log
• /var/log/nova/nova-conductor.log
• /var/log/nova/nova-scheduler.log
References
Further information is available for Configuring NFS Shared Storage in the Migrating
Instances guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
Further information is available for Migrating Live Instances in the Migrating Instances
guide for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
In this exercise, you will configure shared storage and migrate a live instance.
Outcomes
You should be able to:
This guided exercise requires two compute nodes, as configured in a previous guided exercise
which added compute1 to the overcloud. If you did not successfully complete that guided
exercise, have reset your overcloud systems, or for any reason have an overcloud with only a
single installed compute node, you must first run the command lab resilience-shared-storage add-compute on workstation. The command's add-compute task adds the compute1 node to the overcloud, taking between 40 and 90 minutes to complete.
Important
As described above, only run this command if you still need to install a second compute
node. If you already have two functioning compute nodes, skip this task and continue
with the setup task.
After the add-compute task has completed successfully, continue with the setup task
in the following paragraph.
Start with the setup task if you have two functioning compute nodes, either from having
completed the previous overcloud scaling guided exercise, or by completing the extra add-compute task described above. On workstation, run the lab resilience-shared-storage setup command. This command verifies the OpenStack environment and creates the project resources used in this exercise.
Steps
1. Configure controller0 for shared storage.
1.1. Log into controller0 as heat-admin and switch to the root user.
/var/lib/nova/instances 172.25.250.2(rw,sync,fsid=0,no_root_squash)
/var/lib/nova/instances 172.25.250.12(rw,sync,fsid=0,no_root_squash)
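Each compute node then mounts that export over its own /var/lib/nova/instances directory. A possible /etc/fstab line is sketched below; the exercise steps elide the exact mount command, so the controller's export address is shown as a placeholder rather than a real value.

```
# /etc/fstab sketch on each compute node; replace <controller0-ip>
# with the address controller0 exports from.
<controller0-ip>:/var/lib/nova/instances  /var/lib/nova/instances  nfs  defaults  0  0
```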
2. Configure compute0 to use shared storage. Later in this exercise, you will repeat these
steps on compute1.
2.1. Log into compute0 as heat-admin and switch to the root user.
user="root"
group="root"
vnc_listen="0.0.0.0"
2.6. Configure /etc/nova/nova.conf virtual disk storage and other properties for live
migration. Use the nfs mounted /var/lib/nova/instances directory to store
instance virtual disks.
3.1. Log into compute1 as heat-admin and switch to the root user.
user="root"
group="root"
vnc_listen="0.0.0.0"
3.6. Configure /etc/nova/nova.conf virtual disk storage and other properties for live
migration. Use the nfs mounted /var/lib/nova/instances directory to store
instance virtual disks.
Instance Attributes
Attribute Value
flavor m1.web
key pair developer1-keypair1
network finance-network1
image rhel7
security group finance-web
name finance-web2
5. List the available floating IP addresses, then allocate one to the finance-web2 instance.
5.1. List the floating IPs. An available one has the Port attribute set to None.
6.1. To perform live migration, the developer1 user must have the admin role assigned for
the project finance. Assign the admin role to developer1 for the project finance.
The developer1 user may already have been assigned the admin role.
6.3. Prior to migration, ensure the destination compute node has sufficient resources to host
the instance. In this example, the current server instance location node is overcloud-
compute-1.localdomain, and the destination to check is overcloud-compute-0.
Modify the command to reflect your actual source and destination compute nodes.
Estimate whether the total, minus the amount used now, is sufficient.
6.4. Migrate the instance finance-web2 to a new compute node. In this example, the
instance is migrated from overcloud-compute-1 to overcloud-compute-0. Your
scenario may require migrating in the opposite direction.
7. Use the command openstack server show to verify that finance-web2 is now running
on the other compute node.
Cleanup
From workstation, run the lab resilience-shared-storage cleanup command to
clean up this exercise.
If you intend to repeat either of the two Live Migration Guided Exercises in this
chapter that require two compute nodes, do not reset your virtual machines. Because
your overcloud currently has two functioning compute nodes, you may repeat the
Live Migration Guided Exercises without running the add-compute task that was
required to build the second compute node.
In this lab, you will add compute nodes, manage shared storage, and perform instance live
migration.
Resources
Files: http://materials.example.com/instackenv-onenode.json
Outcomes
You should be able to:
On workstation, run the lab resilience-review setup command. The script ensures
that OpenStack services are running and the environment has been properly configured for the
lab.
Steps
1. Use SSH to connect to director as the user stack and source the stackrc credentials
file.
4. Update the node profile for compute1 to use the compute profile.
11. Launch an instance named production1 as the user operator1 using the following
attributes:
Instance Attributes
Attribute Value
flavor m1.web
key pair operator1-keypair1
network production-network1
image rhel7
security group production
name production1
12. List the available floating IP addresses, then allocate one to the production1 instance.
13. Ensure that the production1 instance is accessible by logging in to the instance as the
user cloud-user, then log out of the instance.
15. Verify that the migration of production1 using shared storage was successful.
Evaluation
From workstation, run the lab resilience-review grade command to confirm the
success of this exercise. Correct any reported failures and rerun the command until successful.
Cleanup
Save any data that you would like to keep from the virtual machines. After the data is saved,
reset all of the overcloud virtual machines and the director virtual machine. In the physical
classroom environment, reset all of the overcloud virtual machines and the director virtual
machine using the rht-vmctl command. In the online environment, reset and start the director
and overcloud nodes.
Solution
In this lab, you will add compute nodes, manage shared storage, and perform instance live
migration.
Resources
Files: http://materials.example.com/instackenv-onenode.json
Outcomes
You should be able to:
On workstation, run the lab resilience-review setup command. The script ensures
that OpenStack services are running and the environment has been properly configured for the
lab.
Steps
1. Use SSH to connect to director as the user stack and source the stackrc credentials
file.
"arch": "x86_64",
"name": "compute1",
"pm_addr": "172.25.249.112",
"pm_password": "password",
"pm_type": "pxe_ipmitool",
"mac": [
"52:54:00:00:f9:0c"
],
"cpu": "2",
"memory": "6144",
"disk": "40"
}
]
}
2.4. Prior to starting introspection, set the provisioning state for compute1 to manageable.
4. Update the node profile for compute1 to use the compute profile.
ComputeCount: 2
8.1. Log into controller0 as heat-admin and switch to the root user.
/var/lib/nova/instances 172.25.250.2(rw,sync,fsid=0,no_root_squash)
/var/lib/nova/instances 172.25.250.12(rw,sync,fsid=0,no_root_squash)
9.1. Log into compute0 as heat-admin and switch to the root user.
user="root"
group="root"
vnc_listen="0.0.0.0"
9.6. Configure /etc/nova/nova.conf virtual disk storage and other properties for live
migration. Use the nfs mounted /var/lib/nova/instances directory to store
instance virtual disks.
10.1. Log into compute1 as heat-admin and switch to the root user.
user="root"
group="root"
vnc_listen="0.0.0.0"
10.6. Configure /etc/nova/nova.conf virtual disk storage and other properties for live
migration. Use the nfs mounted /var/lib/nova/instances directory to store
instance virtual disks.
11. Launch an instance named production1 as the user operator1 using the following
attributes:
Instance Attributes
Attribute Value
flavor m1.web
key pair operator1-keypair1
network production-network1
image rhel7
security group production
name production1
12. List the available floating IP addresses, then allocate one to the production1 instance.
12.1. List the floating IPs. An available one has the Port attribute set to None.
13. Ensure that the production1 instance is accessible by logging in to the instance as the
user cloud-user, then log out of the instance.
14.1. To perform live migration, the user operator1 must have the admin role assigned
for the project production. Assign the admin role to operator1 for the project
production.
14.3. Prior to migration, ensure compute1 has sufficient resources to host the instance. The example below uses compute1; however, you may need to use compute0. The compute node should contain 2 VCPUs, a 56 GB disk, and 2048 MB of available RAM.
14.4. Migrate the instance production1 using shared storage. In the example below, the
instance is migrated from compute0 to compute1, but you may need to migrate the
instance from compute1 to compute0.
15. Verify that the migration of production1 using shared storage was successful.
15.1. Verify that the migration of production1 using shared storage was successful. The
example below displays compute1, but your output may display compute0.
"OS-EXT-SRV-ATTR:hypervisor_hostname": "overcloud-compute-1.localdomain",
Evaluation
From workstation, run the lab resilience-review grade command to confirm the
success of this exercise. Correct any reported failures and rerun the command until successful.
Cleanup
Save any data that you would like to keep from the virtual machines. After the data is saved,
reset all of the overcloud virtual machines and the director virtual machine. In the physical
classroom environment, reset all of the overcloud virtual machines and the director virtual
machine using the rht-vmctl command. In the online environment, reset and start the director
and overcloud nodes.
Summary
In this chapter, you learned:
• The Red Hat OpenStack Platform Bare Metal provisioning service, Ironic, supports the
provisioning of both virtual and physical machines to be used for the overcloud deployment.
• Red Hat OpenStack Platform director (undercloud) uses the Orchestration service (Heat) to
orchestrate the deployment of the overcloud with a stack definition.
• Low level system information, such as CPU count, memory, disk space, and network interfaces
of a node is retrieved through a process called introspection.
• Block-based live migration is the alternate method used when shared storage is not
implemented.
• When migrating using shared storage, the instance's memory content must be transferred
faster than memory pages are written to the source instance.
• When using block-based live migration, disk content is copied before memory content is
transferred, which makes shared storage live migration quicker and more efficient.
TROUBLESHOOTING
OPENSTACK ISSUES
Overview
Goal Holistically diagnose and troubleshoot OpenStack issues.
Objectives • Diagnose and troubleshoot instance launch issues on a
compute node.
Objectives
After completing this section, students should be able to diagnose and troubleshoot instance
launch issues on a compute node.
Scheduling is based on data retrieved from the compute nodes and is performed by the Compute scheduler component. This data includes the hardware resources currently available on the compute node, such as the available memory or the number of CPUs. The Nova compute component, which runs on each compute node, captures this data. This component uses the RabbitMQ messaging service to connect to the Compute service core components deployed on the controller node. The Nova compute component also gathers together all the required resources to launch an instance. This task also includes scheduling the instance on the hypervisor running on the compute node.
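The filtering idea behind this scheduling can be illustrated with a toy sketch. This is not the actual Nova scheduler code; the host dictionaries and field names are assumptions chosen to mirror the resource data described above.

```python
def filter_hosts(hosts, flavor):
    """Toy version of compute scheduling: keep the hosts whose free
    resources can hold the requested flavor."""
    return [h["name"] for h in hosts
            if h["free_ram_mb"] >= flavor["ram"]
            and h["free_vcpus"] >= flavor["vcpus"]]

hosts = [
    {"name": "overcloud-compute-0", "free_ram_mb": 4096, "free_vcpus": 2},
    {"name": "overcloud-compute-1", "free_ram_mb": 1024, "free_vcpus": 2},
]
assert filter_hosts(hosts, {"ram": 2048, "vcpus": 1}) == ["overcloud-compute-0"]
```

The real scheduler applies a configurable chain of such filters, then weighs the surviving hosts; this sketch shows only the filtering step.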
In addition to the RabbitMQ messaging service, Compute also uses the MariaDB service to store
its configuration settings. The communication with both RabbitMQ and MariaDB is handled by the
Compute conductor component, running on the controller node.
The log files for Compute components are in the /var/log/nova directory on both the controller node and the compute node. Each Compute component logs its events to a different log file. The Nova compute component logs to the /var/log/nova/compute.log file on the compute node. The Compute components running on the controller node log to the /var/log/nova directory on that node.
Compute service commands provide additional visibility into the status of the different Compute components on each node. This status can help troubleshoot issues created by other auxiliary services used by Compute components, such as RabbitMQ or MariaDB. The openstack compute service list command displays the hosts where the Compute components are running on the controller and compute nodes as follows:
[user@demo]$ openstack compute service list -c Binary -c Host
+------------------+------------------------------------+
| Binary           | Host                               |
+------------------+------------------------------------+
| nova-scheduler   | overcloud-controller-0.localdomain |
| nova-conductor   | overcloud-controller-0.localdomain |
| nova-compute     | overcloud-compute-0.localdomain    |
+------------------+------------------------------------+
This command also shows the status, state, and last update of the Compute components, as
follows:
[user@demo]$ openstack compute service list -c Binary -c Status -c State -c "Updated At"
+------------------+---------+-------+----------------------------+
| Binary | Status | State | Updated At |
+------------------+---------+-------+----------------------------+
| nova-consoleauth | enabled | up | 2017-06-16T19:38:38.000000 |
| nova-scheduler | enabled | up | 2017-06-16T19:38:39.000000 |
| nova-conductor | enabled | up | 2017-06-16T19:38:35.000000 |
| nova-compute | enabled | up | 2017-06-16T19:38:39.000000 |
+------------------+---------+-------+----------------------------+
The previous output shows the node where each Compute component is deployed in the Host
field, the status of the component in the Status field, and the state of the component in
the State field. The Status field shows whether the Compute component is enabled or
disabled. The previous command can also be used to detect issues related to RabbitMQ:
a RabbitMQ availability problem is indicated when all the Compute components are reported as down.
Note
The openstack compute service list command requires admin credentials.
A Compute component can be enabled or disabled using the openstack compute service
set command. This command is useful, for example, when a compute node has to be put under
maintenance. When the compute node maintenance finishes, the compute node can be enabled again.
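A minimal sketch of this workflow, assuming a compute node named overcloud-compute-0.localdomain, admin credentials already loaded, and an illustrative maintenance message:

```shell
# Disable the nova-compute service before maintenance (the reason
# text is optional and shown here only as an example).
[user@demo]$ openstack compute service set --disable \
  --disable-reason "Scheduled maintenance" \
  overcloud-compute-0.localdomain nova-compute

# Re-enable the service once maintenance is complete.
[user@demo]$ openstack compute service set --enable \
  overcloud-compute-0.localdomain nova-compute
```

While a service is disabled, the Compute scheduler no longer considers that node when placing new instances; existing instances on the node keep running.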
All Compute components use the /etc/nova/nova.conf file as their configuration file. This
applies to the Compute components running on both the controller node and the compute nodes. The
configuration file contains settings for the different Compute components, as well as settings
for connecting them to the back-end services. For example, the messaging-related settings are
identified by the rabbit prefix (for RabbitMQ).
On the compute node, other Nova compute settings can be configured, such as the settings for
the ratio between the physical and virtual resources provided by the compute node. For
example, specifying a CPU allocation ratio of 1.5 allows cloud users to use 1.5 times as many
virtual CPUs as there are physical CPUs available.
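These ratios are set in /etc/nova/nova.conf. A minimal fragment with illustrative values (the exact defaults may differ in your deployment):

```ini
[DEFAULT]
# Virtual-to-physical CPU allocation ratio
cpu_allocation_ratio=16.0
# Virtual-to-physical RAM allocation ratio
ram_allocation_ratio=1.5
# Virtual-to-physical disk allocation ratio
disk_allocation_ratio=1.0
```

Raising a ratio lets the scheduler overcommit that resource; the corresponding scheduler filters use these values when deciding whether a node has enough capacity.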
Some of the filters applied by Compute scheduler when using the filter-based algorithm are:
• The RamFilter filter identifies the hosts with enough RAM to deploy the instance.
• The ComputeFilter filter identifies the compute nodes available to deploy the instance.
Note
The Compute scheduler component supports the usage of custom scheduling
algorithms.
The auxiliary services connecting the different components, such as RabbitMQ or MariaDB,
can cause issues affecting the OpenStack Compute service availability. The OpenStack Compute
service supports an alternative hierarchy based on cells. This hierarchy groups compute nodes into
cells. Each cell runs all the Compute components except for the Compute API component,
which runs on a top-level node. This configuration uses the nova-cells service to select the
cell in which to deploy a new instance. The default OpenStack Compute service configuration does not
use cells.
Instance launch failures are commonly caused by:
• A failure in the messaging service connecting the Nova compute service with the Compute
scheduler service.
• Lack of resources, for example CPU or RAM, on the available compute nodes.
These issues usually raise a no valid host error in the Compute conductor logs, because
the Compute conductor and scheduler services cannot find a suitable Nova compute service to
deploy the instance.
This error can also be related to a lack of resources on the available compute nodes. The
resources currently available on the compute nodes in a Red Hat OpenStack Platform
environment can be retrieved using the openstack host list and openstack host show
commands, as follows.
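For example, assuming admin credentials are loaded and a compute node named overcloud-compute-0.localdomain:

```shell
# List all hosts known to the Compute service, with their service
# and availability zone.
[user@demo]$ openstack host list

# Show the total, used, and free CPU, memory, and disk resources
# for a specific compute node.
[user@demo]$ openstack host show overcloud-compute-0.localdomain
```

Comparing the free resources reported here against the flavor requested for the failing instance quickly confirms or rules out a capacity problem.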
Note
If there is an instance deployed on a compute node, the openstack host show
command also shows the usage of CPU, memory, and disk for that instance.
The Compute conductor log file also includes messages related to issues caused by these
auxiliary services, for example messages indicating that the RabbitMQ service or the
MariaDB service is not available.
References
Further information is available in the Logging, Monitoring, and Troubleshooting Guide
for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
In this exercise, you will fix an issue with the Nova compute service that prevents it from
launching instances. Finally, you will verify that the fix was correctly applied by launching an
instance.
Outcomes
You should be able to troubleshoot and fix an issue in the Nova compute service.
Steps
1. Launch an instance named finance-web1 using the rhel7 image, the m1.web flavor, the
finance-network1 network, the finance-web security group, and the
developer1-keypair1 key pair. These resources were all created by the setup script. The instance
deployment will return an error.
+---------------+---------------------+---------------+
| ID | Name | Subnets |
+---------------+---------------------+---------------+
| b0b7(...)0db4 | finance-network1 | a29f(...)855e |
... output omitted ...
1.6. Verify that the developer1-keypair1 key pair, and its associated file located at
/home/student/developer1-keypair1.pem, are available.
1.7. Launch an instance named finance-web1 using the rhel7 image, the m1.web
flavor, the finance-network1 network, the finance-web security group, and the
developer1-keypair1 key pair. The instance deployment will return an error.
1.8. Verify the status of the finance-web1 instance. The instance status will be ERROR.
2. Verify on which host the Nova scheduler and Nova conductor services are running. You will
need to load the admin credentials from the /home/student/admin-rc file.
2.2. Verify on which host the Nova scheduler and Nova conductor services are running. Both
services are running on controller0.
3. Review the logs for the Compute scheduler and conductor services on controller0. Find
the issue related to no valid host found for the finance-web1 instance in the Compute
conductor log file, located at /var/log/nova/nova-conductor.log. Also find the issue
related to no hosts found by the compute filter in the Compute scheduler log file, located at
/var/log/nova/nova-scheduler.log.
3.3. Locate the log message in the Compute conductor log file which sets the
finance-web1 instance's status to error, because no valid host is available to deploy the instance.
The log file shows the instance ID.
3.4. Locate the log message in the Nova scheduler log file which reports zero hosts for the
compute filter. When done, log out of the root account.
4.2. List the Compute services. The nova-compute service running on compute0 is
disabled.
5.2. Verify that the Nova compute service has been correctly enabled on compute0. When
done, log out from the controller node.
6. Launch the finance-web1 instance again from workstation using the developer1 user
credentials. Use the rhel7 image, the m1.web flavor, the finance-network1 network, the
finance-web security group, and the developer1-keypair1 key pair. The instance will
be deployed without errors. You will need to delete the previous instance deployment with
an error status before deploying the new instance.
6.2. Delete the previous finance-web1 instance, whose deployment resulted in an error.
6.4. Launch the finance-web1 instance again, using the rhel7 image, the m1.web
flavor, the finance-network1 network, the finance-web security group, and the
developer1-keypair1 key pair.
6.5. Verify the status of the finance-web1 instance. The instance status will be ACTIVE. It
may take some time for the instance's status to become ACTIVE.
Cleanup
From workstation, run the lab troubleshooting-compute-nodes cleanup script to
clean up this exercise.
Objectives
After completing this section, students should be able to diagnose and troubleshoot the Identity
and Messaging services.
The Keystone identity service, like other OpenStack services, has three endpoints associated
with it: the public endpoint, the admin endpoint, and the internal endpoint.
The public endpoint, bound by default to port TCP/5000, provides the API functionality required
for an external user to use Keystone authentication. This endpoint is usually the one used as
the authentication URL provided to cloud users. A user's machine needs to have access to the
TCP/5000 port on the machine where the Keystone identity service is running to authenticate in
the Red Hat OpenStack Platform environment. The Keystone identity service usually runs on the
controller node.
The admin endpoint provides additional functionality to the public endpoint. The other Red Hat
OpenStack Platform services use the internal endpoint to run authentication and authorization
queries on the Keystone identity service. The openstack catalog show identity
command displays the list of endpoints available for the user credentials.
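For example (the endpoint URLs in the output depend on your deployment's network layout):

```shell
# Display the public, admin, and internal endpoints registered
# for the identity service, as seen with the current credentials.
[user@demo]$ openstack catalog show identity
```

Comparing the URLs in this output against the authentication URL in a user's credentials file is a quick first check when authentication fails.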
In the previous output, each endpoint uses a different IP address based on the availability
required for each of those endpoints. The HAProxy service manages all of these IP addresses.
This service runs on the controller node. The HAProxy configuration file includes two services
to manage the three endpoints' IP addresses: keystone_admin and keystone_public.
Both services include two IP addresses, one internal and one external. For example, the
keystone_public service serves the public endpoint using both an internal IP address and an
external IP address:
In the previous definition of the keystone_public service, the first IP address, 172.24.1.50,
is the internal IP address. It is used by other OpenStack services for the user
authentication and authorization provided by the Keystone identity service.
The second IP address configured for the keystone_public service, 172.25.250.50, is
the external IP address. Cloud users use this IP address in their authentication
URL.
The Keystone identity service runs on top of the httpd service. Issues in Keystone are usually
related to the configuration or availability of either the HAProxy or httpd service. If the httpd
service is not available, the following error message is displayed:
When a component wants to send a message to another component, it places the
message in a queue. A user name and password are required to send the message to that queue.
All Red Hat OpenStack Platform services use the guest user to log in to RabbitMQ.
Pacemaker manages the RabbitMQ service as a resource. The name for the Pacemaker resource
is rabbitmq. An issue with RabbitMQ availability usually means a blocked request for the cloud
user.
The status of the RabbitMQ service can be obtained using the rabbitmqctl cluster_status
command. This command displays basic information about the RabbitMQ cluster status.
Additional information, like the IP address where RabbitMQ is listening, is available using the
rabbitmqctl status command.
[{nodes,[{disc,['rabbit@overcloud-controller-0']}]},
{running_nodes,['rabbit@overcloud-controller-0']},
{cluster_name,<<"rabbit@overcloud-controller-0.localdomain">>},
...output omitted...
{memory,[{total,257256704},
{connection_readers,824456},
{connection_writers,232456},
{connection_channels,1002976},
{connection_other,2633224},
{queue_procs,3842568},
{queue_slave_procs,0},
...output omitted...
{listeners,[{clustering,25672,"::"},{amqp,5672,"172.24.1.1"}]},
...output omitted...
The status of the Pacemaker resource for RabbitMQ can be viewed using the pcs status
command. This command shows the status and any error reports of all the resources configured
in the Pacemaker cluster.
If the rabbitmq resource fails, it can be restarted using the pcs resource
cleanup and pcs resource debug-start commands.
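A minimal sketch, assuming the Pacemaker resource is named rabbitmq as described above:

```shell
# Clear the resource's failure history so Pacemaker re-evaluates
# its state.
[root@controller0 ~]# pcs resource cleanup rabbitmq

# Force-start the resource outside normal cluster control,
# printing debugging information about the start attempt.
[root@controller0 ~]# pcs resource debug-start rabbitmq
```

After the restart, rerun pcs status and rabbitmqctl cluster_status to confirm that the resource and the RabbitMQ cluster are healthy.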
References
Further information is available in the Logging, Monitoring, and Troubleshooting Guide
for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
In this exercise, you will fix an issue with the authentication and messaging services.
Outcomes
You should be able to:
Steps
1. Create a 1 GB volume named finance-volume1 using the developer1 user credentials. The
command will fail.
1.2. Create a 1 GB volume named finance-volume1. This command raises a service
unavailable issue.
2. Verify that the IP address used in the authentication URL of the developer1 user
credentials file is the same one configured as a virtual IP in the HAProxy service for the
keystone_public service. The HAProxy service runs on controller0.
2.1. Find the authentication URL in the developer1 user credentials file.
2.3. Find the virtual IP address configured in the HAProxy service for the
keystone_public service.
2.5. Verify the status for the httpd service. The httpd service is inactive.
3. Start the httpd service. It may take some time for the httpd service to be started.
3.2. Verify that the httpd service is active. When done, log out from the controller node.
5.2. Verify that the log file for the Keystone identity service reports that the RabbitMQ
service is unreachable.
6. Verify that the root cause for the RabbitMQ cluster unavailability is that the rabbitmq
Pacemaker resource is disabled. When done, enable the rabbitmq Pacemaker resource.
6.1. Verify that the root cause for the RabbitMQ cluster unavailability is that the rabbitmq
Pacemaker resource is disabled.
Cleanup
From workstation, run the lab troubleshooting-authentication cleanup script to
clean up this exercise.
Objectives
After completing this section, students should be able to diagnose and troubleshoot the
OpenStack networking, image, and volume services.
Networking
This section discusses the different methods, commands, procedures, and log files you can use to
troubleshoot OpenStack networking issues.
Unreachable Instances
Problem: You have created an instance but are unable to assign it a floating IP address.
This problem can occur when the network is not set up correctly. If a router is not set as the
gateway for the external network, users cannot assign a floating IP address to
an instance. Use the neutron router-gateway-set command to set the router as the gateway
for the external network, then use the openstack server add floating ip command to
assign a floating IP address to the instance.
Note
Floating IPs can be created even if the router is not connected to the external gateway
but when the user attempts to associate a floating IP address with an instance, an
error will display.
Use the openstack server list command to verify that a floating IP address has been
associated with the instance.
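The steps above can be sketched as follows; the router name finance-router1, external network name provider-datacentre, instance name finance-web1, and floating IP 172.25.250.25 are all illustrative:

```shell
# Set the router's gateway to the external network.
[user@demo]$ neutron router-gateway-set finance-router1 provider-datacentre

# Associate an existing floating IP address with the instance.
[user@demo]$ openstack server add floating ip finance-web1 172.25.250.25

# Confirm that the floating IP address is now listed for the instance.
[user@demo]$ openstack server list
```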
Problem: You cannot connect to an instance using SSH.
Check that a security group has been assigned to the instance and that a rule has
been added to allow SSH traffic; SSH rules are not included by default.
The rule with the port range 22:22 should be associated with the
same security group as the instance. Verify this by comparing the IDs.
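For example, a rule allowing SSH can be added to a security group (the group name finance-web is illustrative):

```shell
# Allow inbound TCP traffic on port 22 (SSH) for the security group.
[user@demo]$ openstack security group rule create \
  --protocol tcp --dst-port 22:22 finance-web
```

The rule takes effect immediately for all instances using that security group; no instance restart is required.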
This problem can also occur if the internal network was attached to the router after the
instance was created. In this situation, the instance was not able to contact the metadata service at
boot, so the key was not added to the authorized_keys file for the cloud-user user.
This can be verified by checking the /var/log/cloud-init.log log file on the instance itself.
Alternatively, check the contents of /home/cloud-user/.ssh/authorized_keys. You can
gain access to the instance through the Horizon dashboard console.
In this situation, there is no option but to delete the instance, attach the subnet to the router, and
re-create the instance.
Problem: A key pair was not assigned to the instance at creation, so SSH access is not possible.
In this scenario, the instance must be deleted and re-created with a key pair assigned at creation.
Images
The Glance image service stores images and metadata. Images can be created by users and
uploaded to the Image service. The Glance image service provides a RESTful API that allows users to
query the metadata of an image, as well as to obtain the actual image.
Logging
The Image service has two log files. Logging can be configured by altering the [DEFAULT]
section of the /etc/glance/glance-api.conf configuration file. In this file, you can dictate
where and how logs should be stored, which storage method should be used, and its specific
configuration.
You can also configure the Glance image service size limit by setting image_size_cap=SIZE in
the [DEFAULT] section of the same file. You can also specify a storage capacity per user by setting
the user_storage_quota=SIZE parameter in the [DEFAULT] section.
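A minimal fragment of /etc/glance/glance-api.conf showing both settings (the values are illustrative):

```ini
[DEFAULT]
# Maximum size, in bytes, that a single uploaded image may have.
image_size_cap=10737418240
# Total image storage, in bytes, that a single user may consume.
user_storage_quota=21474836480
```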
Managing Images
When creating a new image, a user can choose to protect that image from deletion with the
--protected option. This prevents the image from being deleted, even by the administrator. It must
be unprotected first, then deleted.
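For example, a protected image can be unprotected and then deleted (the image name production-rhel7 is illustrative):

```shell
# Remove the deletion protection flag from the image.
[user@demo]$ openstack image set --unprotected production-rhel7

# The image can now be deleted.
[user@demo]$ openstack image delete production-rhel7
```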
Volumes
Ceph
The OpenStack block storage service can use Ceph as a storage back end. Each volume created
in the block storage service has an associated RBD image in Ceph. The name of the RBD image is
the ID of the block storage volume.
The OpenStack block storage service requires a user and a pool in Ceph. The
user is openstack, the same user configured for the other services using Ceph as their back
end, such as the OpenStack image service. The undercloud also creates a dedicated Ceph pool for
the block storage service, named volumes. The volumes pool contains all the RBD images
associated with volumes. These settings are included in the /etc/cinder/cinder.conf
configuration file.
rbd_user=openstack
...output omitted...
Permissions within Ceph are known as capabilities, and are granted by daemon type, such as
MON or OSD. Three capabilities are available within Ceph: read (r) to view, write (w) to modify,
and execute (x) to execute extended object classes. All daemon types support these three
capabilities. For the OSD daemon type, permissions can be restricted to one or more pools, for
example osd 'allow rwx pool=rbd, allow rx pool=mydata'. If no pool is specified, the
permission is granted on all existing pools. The openstack user has capabilities on all the pools
used by OpenStack services.
The openstack user requires read, write, and execute capabilities in both the volumes and
the images pools to be used by the OpenStack block storage service. The images pool is the
dedicated pool for the OpenStack image service.
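The capabilities granted to the openstack user can be inspected, and adjusted, from a Ceph node; the pool list shown here is illustrative:

```shell
# Display the key and current capabilities for the client.openstack user.
[root@ceph0 ~]# ceph auth get client.openstack

# Grant read, write, and execute capabilities on the volumes and
# images pools. The full capability string must be retyped; it
# cannot be appended to.
[root@ceph0 ~]# ceph auth caps client.openstack mon 'allow r' osd 'allow rwx pool=volumes, allow rwx pool=images'
```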
To attach a volume, it must be in the available state; any other state results in an
error message. A volume can become stuck in the detaching state. The state can be
altered by an admin user.
If you try to delete a volume and the deletion fails, you can forcefully delete the volume using
the --force option.
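Both operations can be sketched as follows, assuming admin credentials and a volume named finance-volume1 (the name is illustrative):

```shell
# Reset a volume stuck in "detaching" back to the available state
# (admin credentials required).
[user@demo]$ openstack volume set --state available finance-volume1

# Forcefully delete a volume that fails to delete normally.
[user@demo]$ openstack volume delete --force finance-volume1
```

Resetting the state only updates the database record; confirm that no back-end attachment actually remains before reusing the volume.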
Incorrect volume configurations cause the most common block storage errors. Consult the
Cinder block storage service log files in case of error.
Log Files
The Block Storage service api.log is useful for determining whether an error is due to an
endpoint or connectivity problem. That is, if you try to create a volume and it fails,
api.log is the first file you should review. If the create request was received by the Block Storage
service, the request is recorded in the api.log log file. If the request is logged
in api.log but there are no errors, check volume.log for errors that may have
occurred during the create request.
For the Cinder Block Storage services to function properly, they must be configured to use the
RabbitMQ messaging service. All Block Storage configuration can be found in the
/etc/cinder/cinder.conf configuration file, stored on the controller node. The default
rabbit_userid is guest. If that user is wrongly configured and the Block Storage services
are restarted, RabbitMQ will not respond to Block Storage service requests. Any volume created
during that period results in a status of ERROR. Any volume with a status of ERROR must be
deleted and re-created once the Cinder Block Storage service has been restarted and is running
properly.
Verify that both the RabbitMQ cluster and the rabbitmq-clone Pacemaker resource
are available. If both are available, the problem may lie in the cinder.conf
configuration file. Check that all user names, passwords, IP addresses, and URLs in the
/etc/cinder/cinder.conf configuration file are correct.
References
Further information is available in the Logging, Monitoring, and Troubleshooting Guide
for Red Hat OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
In this exercise, you will fix an issue related to image requirements. You will also fix an issue with
the accessibility of the metadata service. Finally, you will fix an issue with the Ceph back end for
the OpenStack Block Storage service.
Outcomes
You should be able to:
Steps
1. Launch an instance named finance-web1. Use the rhel7 image, the finance-web
security group, the developer1-keypair1 key pair, the m1.lite flavor, and the
finance-network1 network. The instance's deployment will fail because the flavor does
not meet the image's minimal requirements.
+---------------+-------------+------------------------+---------------+
| ID            | Name        | Description            | Project       |
+---------------+-------------+------------------------+---------------+
| 0cb6(...)5c7e | finance-web | finance-web            | 3f73(...)d660 |
...output omitted...
1.4. Verify that the developer1-keypair1 key pair, and its associated key file located at
/home/student/developer1-keypair1.pem, are available.
1.7. Create an instance named finance-web1. Use the rhel7 image, the finance-web
security group, the developer1-keypair1 key pair, the m1.lite flavor, and the
finance-network1 network. The instance's deployment will fail because the flavor
does not meet the image's minimal requirements.
2. Verify the rhel7 image requirements for memory and disk, and the m1.lite flavor
specifications.
2.1. Verify the rhel7 image requirements for both memory and disk. The minimum disk
required is 10 GB. The minimum memory required is 2048 MB.
| min_disk | 10 |
| min_ram | 2048 |
| name | rhel7 |
...output omitted...
2.2. Verify the m1.lite flavor specifications. The disk and memory specifications for the
m1.lite flavor do not meet the rhel7 image requirements.
3. Verify that the m1.web flavor meets the rhel7 image requirements. Launch an instance
named finance-web1. Use the rhel7 image, the finance-web security group, the
developer1-keypair1 key pair, the m1.web flavor, and the finance-network1 network.
The instance's deployment will be successful.
3.1. Verify that the m1.web flavor meets the rhel7 image requirements.
3.2. Launch an instance named finance-web1. Use the rhel7 image, the finance-
web security group, the developer1-keypair1 key pair, the m1.web flavor, and the
finance-network1 network.
4. Attach an available floating IP to the finance-web1 instance. The floating IP will not be attached
because the external network is not reachable from the internal network.
4.2. Attach the previous floating IP to the finance-web1 instance. The floating IP will not be
attached because the external network is not reachable from the internal network.
5. Fix the previous issue by adding the finance-subnet1 subnetwork to the
finance-router1 router.
5.2. Verify the current subnetworks added to the finance-router1 router. No output will
display because the subnetwork has not been added.
5.4. Verify that the finance-subnet1 subnetwork has been correctly added to the
finance-router1 router.
6. Attach the available floating IP to the finance-web1 instance. When done, log in to
the finance-web1 instance as the cloud-user user, using the
/home/student/developer1-keypair1.pem key file. Even though the floating IP address is attached to
the finance-web1 instance, logging in to the instance will fail. This issue will be resolved in
an upcoming step in this exercise.
6.2. Log in to the finance-web1 instance as the cloud-user user, using the
/home/student/developer1-keypair1.pem key file.
7. Verify that the instance is not able to contact the metadata service at boot time. The
metadata service is not reachable because the finance-subnet1 was not connected to
the finance-router1 router when the finance-web1 instance was created. This is the
root cause for the previous issue because the key is not added to the authorized_keys
for the cloud-user user.
7.2. Open Firefox, and navigate to the finance-web1 instance's console URL.
7.3. Log in to the finance-web1 instance's console as the root user, using redhat as a
password.
7.4. Verify that the authorized_keys file for the cloud-user is empty. No key has been
injected by cloud-init during the instance's boot process.
7.5. Verify in the cloud-init log file, located at /var/log/cloud-init.log, that the
finance-web1 instance cannot reach the metadata service during its boot process.
8.4. Log in to the finance-web1 instance as the cloud-user user, using the
/home/student/developer1-keypair1.pem key file.
8.5. Verify that the authorized_keys file for the cloud-user user has had a key injected
into it. When done, log out from the instance.
9.2. Verify the status of the volume finance-volume1. The volume's status will be error.
10. Confirm the reason that the finance-volume1 volume was not correctly created. It is
because no valid host was found by the Block Storage scheduler service.
10.2. Verify that the Block Storage scheduler log file, located at
/var/log/cinder/scheduler.log, reports a no valid host issue.
11. Verify that the Block Storage volume service's status is up, to rule out any issue related to
RabbitMQ.
11.2. Verify that the Block Storage volume service's status is up.
+------------------+---------+-------+
| cinder-volume | enabled | up |
...output omitted...
+------------------+---------+-------+
12. Verify that the Block Storage service is configured to use the openstack user and
the volumes pool. When done, verify that the volume creation error is related to the
permissions of the openstack user in Ceph. This user needs read, write, and execute
capabilities on the volumes pool.
12.1. Verify that the block storage service is configured to use the openstack user, and the
volumes pool.
12.4. Verify that the openstack user has no capabilities on the volumes pool.
13. Fix the issue by adding read, write, and execute capabilities to the openstack user on the
volumes pool.
13.1. Add the read, write, and execute capabilities to the openstack user on the volumes
pool. Note that you cannot simply append to the capability list; you must retype it entirely.
Important
Please note that the line starting with osd must be entered as a single line.
13.2. Verify that the openstack user's capabilities have been correctly updated. When done,
log out from the Ceph node.
14. On workstation, try again to create a 1 GB volume named finance-volume1. The volume
creation will be successful. You will need to delete the failed finance-volume1 volume first.
14.3. Verify that the finance-volume1 volume has been correctly created. The volume
status should show available; if the status is error, ensure the permissions were set
correctly in the previous step.
Cleanup
From workstation, run the lab troubleshooting-services cleanup script to clean up
this exercise.
In this lab, you will find and fix issues in the OpenStack environment. You will solve problems in
the areas of authentication, networking, compute nodes, and security. Finally, you will launch an
instance and ensure that everything is working as it should.
Outcomes
You should be able to:
Steps
1. As the operator1 user, remove the existing image called production-rhel7. The
operator1-production-rc file can be found in the student user's home directory on
workstation. Troubleshoot any problems.
2. Source the admin-rc credential file, then run lab troubleshooting-review break to
set up the next part of the lab exercise.
4. Create a new server instance named production-web1. Use the m1.web flavor, the
operator1-keypair1 key pair, the production-network1 network, the
production-web security group, and the rhel7 image. This action will fail. Troubleshoot any issues and
fix the problem.
5. Create a floating IP address and assign it to the instance. Troubleshoot any issues and fix the
problem.
6. Access the instance using SSH. An error will occur. Troubleshoot any issues and fix the
problem.
7. Create a volume named production-volume1, size 1 GB. Verify the volume status.
Use the admin user's Identity service rc file on controller0, located at
/home/heat-admin/overcloudrc. Troubleshoot any issues and fix the problem.
Evaluation
On workstation, run the lab troubleshooting-review grade command to confirm
success of this exercise.
Cleanup
From workstation, run the lab troubleshooting-review cleanup script to clean up this
exercise.
Solution
In this lab, you will find and fix issues in the OpenStack environment. You will solve problems in
the areas of authentication, networking, compute nodes, and security. Finally, you will launch an
instance and ensure that everything is working as it should.
Outcomes
You should be able to:
Steps
1. As the operator1 user, remove the existing image called production-rhel7. The
operator1-production-rc file can be found in the student user's home directory on
workstation. Troubleshoot any problems.
1.3. The error you see is because the image is currently protected. You need to unprotect
the image before it can be deleted.
production-rhel7
2. Source the admin-rc credential file, then run lab troubleshooting-review break to
set up the next part of the lab exercise.
3.2. The error occurs because OpenStack cannot authenticate the operator1 user. This can
happen when the rc file for the user has an incorrect IP address. Check the rc file and note
the OS_AUTH_URL address. Compare this IP address to the one found
in /etc/haproxy/haproxy.cfg on controller0. Search for the line listen
keystone_public; the second IP address is the one that must be used in the user's rc
file. When done, log out from the controller node.
3.3. Compare the IP address from HAProxy with the one in the rc file. Change the rc file to
use the correct IP address to continue.
...output omitted...
export OS_AUTH_URL=http://172.25.251.50:5000/v2.0
...output omitted...
...output omitted...
export OS_AUTH_URL=http://172.25.250.50:5000/v2.0
...output omitted...
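The fix itself can be sketched with sed; the rc file name and IP addresses below follow the lab output above, but the sketch operates on a scratch copy so it is self-contained:

```shell
# Hedged sketch: replace the bad OS_AUTH_URL address with the one published
# by HAProxy, shown on a scratch copy of the rc file
rc=operator1-production-rc.copy
printf 'export OS_AUTH_URL=http://172.25.251.50:5000/v2.0\n' > "$rc"
sed -i 's/172\.25\.251\.50/172.25.250.50/' "$rc"
grep OS_AUTH_URL "$rc"
# → export OS_AUTH_URL=http://172.25.250.50:5000/v2.0
rm -f "$rc"
```

On the lab system the same substitution would be applied to the real rc file before sourcing it again.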
3.5. Source the operator1-production-rc file again. Use the openstack image list
command to ensure that the OS_AUTH_URL setting is correct.
4. Create a new server instance named production-web1. Use the m1.web flavor, the
operator1-keypair1 key pair, the production-network1 network, the production-
web security group, and the rhel7 image. This action will fail. Troubleshoot any issues and
fix the problem.
4.2. This error is due to a problem with the Nova compute service. List the Nova services.
You need to source the /home/student/admin-rc file first, because operator1 does not
have permission to interact directly with Nova services.
4.4. Source the operator1 rc file and try to create the instance again. First, delete the
instance that is currently showing an error status. The instance deployment will finish
correctly.
[student@workstation ~(operator1-production)]$ openstack server list
5. Create a floating IP address and assign it to the instance. Troubleshoot any issues and fix the
problem.
This error message occurs because the external network is not attached to the router of
the internal network.
5.3. Attach the floating IP address to the instance. Verify that the instance has been
assigned the floating IP address.
6. Access the instance using SSH. An error will occur. Troubleshoot any issues and fix the
problem.
6.2. Find out which security group the instance is using, then list the rules in that security
group.
6.3. The rule list shows that there is no rule allowing SSH to the instance. Create the
required security group rule.
7. Create a volume named production-volume1, size 1 GB. Verify the volume status.
Use the admin user's Identity service rc file on controller0 at /home/heat-admin/
overcloudrc. Troubleshoot any issues and fix the problem.
7.3. The volume displays an error status. The Block Storage scheduler service is unable to
find a valid host on which to create the volume. The Block Storage volume service is
currently down. Log into controller0 as heat-admin.
7.4. Verify that no valid host was found to create the production-volume1 in the Block
Storage scheduler's log file.
7.5. Load the admin credentials and verify that the Cinder volume service is down. The
admin credential can be found in /home/heat-admin/overcloudrc.
7.6. Confirm that the IP address and port for the RabbitMQ cluster and the rabbitmq-
clone Pacemaker resource are correct.
The rabbit_userid user name is wrong. In the following output, you can see that the
default is guest, but it is currently set to change_me.
7.8. Using crudini, change the RabbitMQ user name in the Cinder configuration file. Then
reload the Cinder configuration in the Pacemaker cluster to apply the changes, and log
out.
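The effect of the crudini edit can be sketched on a scratch copy of the configuration. The [oslo_messaging_rabbit] section name is an assumption here; rabbit_userid and the guest default come from the output above:

```shell
# Hedged sketch: the edit crudini performs, shown with sed on a scratch copy;
# on the controller the real target is the Cinder configuration file
conf=$(mktemp)
printf '[oslo_messaging_rabbit]\nrabbit_userid=change_me\n' > "$conf"
sed -i 's/^rabbit_userid=.*/rabbit_userid=guest/' "$conf"
grep rabbit_userid "$conf"
# → rabbit_userid=guest
rm -f "$conf"
```

crudini performs the same section-aware substitution without you having to locate the option by hand.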
7.9. On workstation, delete the incorrect volume and recreate it. Verify it has been
properly created.
...output omitted...
| source_volid | None |
| status | creating |
| type | None |
| updated_at | None |
| user_id | 0ac575bb96e24950a9551ac4cda082a4 |
+---------------------+--------------------------------------+
[student@workstation ~(operator1-production)]$ openstack volume list
+--------------------------------------+--------------------+-----------+------+
| ID | Display Name | Status | Size |
+--------------------------------------+--------------------+-----------+------+
| 128a9514-f8bd-4162-9f7e-72036f684cba | production-volume1 | available | 1 |
+--------------------------------------+--------------------+-----------+------+
Evaluation
On workstation, run the lab troubleshooting-review grade command to confirm
success of this exercise.
Cleanup
From workstation, run the lab troubleshooting-review cleanup script to clean up this
exercise.
Summary
In this chapter, you learned:
• The overcloud uses the HAProxy service to balance traffic to OpenStack services.
• The OpenStack compute service is composed of different components running on both the
controller and the compute nodes. These components include the Compute scheduler and the
Nova compute services.
• The Compute scheduler component selects a compute node to deploy an instance based on an
algorithm. By default, this algorithm is filter-based.
• The Compute component orchestrates the instance deployment and sends the compute node
status to the Compute scheduler component. The no valid host error means that the Compute
scheduler has not identified a compute node that can provide the resources required by the
instance.
• The keystone_admin and the keystone_public services in HAProxy support the three
endpoints for the Keystone identity service: public, admin, and internal.
• Issues in OpenStack services are usually related either to failing communication caused by
a nonfunctioning messaging service, or to a misconfiguration or issue in the storage back end,
such as Ceph.
• The RabbitMQ service is managed by a Pacemaker cluster running on the controller node.
• To access an instance using a floating IP, the external network associated with that
floating IP and the internal network to which the instance is connected must be connected
by a router.
• The OpenStack block storage service requires that the openstack user has read, write, and
execute capabilities in both the volumes and the images pool in Ceph.
Overview
Goal Monitor and analyze cloud metrics for use in orchestration
autoscaling.
Objectives • Describe the architecture of Ceilometer, Aodh, Gnocchi,
Panko, and agent plugins.
Objective
After completing this section, students should be able to describe the architecture of Ceilometer,
Aodh, Gnocchi, Panko, and agent plugins.
The sample data collected by the various agents is stored in the database by the OpenStack
Telemetry collector service. The Telemetry collector service uses a pluggable storage system and
supports various databases, such as MongoDB. The Telemetry API service allows authenticated
users to execute query requests on this data store. A query request on the data store returns a
list of resources and statistics based on the various metrics collected.
With this architecture, the Telemetry API encountered scalability issues as query requests
to read metric data from the data store increased. Each query request requires the data
store to perform a full scan of all sample data stored in the database. A new metering service
named Gnocchi was introduced to decouple the storage of metric data from the Telemetry
service and increase efficiency. Similarly, alarms that were once handled by the Telemetry
service were handed over to a new alarming service named Aodh. The Panko service now stores
all the events generated by the Telemetry service. By decoupling these services from Telemetry,
the scalability of the Telemetry service is greatly enhanced.
• Notification agents: This is the preferred method for collecting data. An agent monitors the
message bus for data sent by various OpenStack services, such as Compute, Image, Block
Storage, Orchestration, and Identity. Messages are then processed by various plugins to
convert them into events and samples.
• Polling agents: These agents poll services to collect data. Polling agents are configured
either to get information about the hypervisor or to use a remote API, such as IPMI, to
gather the power state of a compute node. This method is less preferred because it
increases the load on the Telemetry service API endpoint.
Data gathered by the notification and polling agents is processed by various transformers to
generate data samples. For example, to get a CPU utilization percentage, multiple CPU utilization
samples collected over a period can be aggregated. The processed data samples are then
published to Gnocchi for long-term storage, or to an external system, using a publisher.
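As a sketch of what a transformer such as rate_of_change computes, a cpu_util percentage can be derived from two cumulative CPU time samples. All numbers below are invented, and a single vCPU is assumed:

```shell
# Hedged sketch: derive a cpu_util percentage from two cumulative CPU samples,
# as a rate_of_change transformer does (illustrative numbers, one vCPU)
awk 'BEGIN {
    t1 = 0;   cpu1 = 0             # first sample: wall clock (s), CPU time (ns)
    t2 = 600; cpu2 = 60000000000   # 600 s later, 60 s of CPU time consumed
    cpu_util = (cpu2 - cpu1) / ((t2 - t1) * 1e9) * 100
    printf "cpu_util = %.1f%%\n", cpu_util
}'
# → cpu_util = 10.0%
```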
• Compute agent: This agent gathers resource data about all instances running on different
compute nodes. The compute agent is installed on every compute node to facilitate interaction
with the local hypervisor. Sample data collected by a compute agent is sent to the message
bus. The sample data is processed by the notification agent and published to different
publishers.
• Central agent: These agents use the REST APIs of various OpenStack services to gather
additional information that was not sent as a notification. A central agent polls networking,
object storage, block storage, and hardware resources using SNMP. The sample data collected
is sent to the message bus to be processed by the notification agent.
• IPMI agent: This agent uses the ipmitool utility to gather IPMI sensor data. An IPMI-capable
host requires that an IPMI agent is installed. The sample data gathered is used for providing
metrics associated with the physical hardware.
Gnocchi
Gnocchi is based on a time series database used to store metrics and resources published by the
Telemetry service. A time series database is optimized for handling data that contains arrays
of numbers indexed by time stamp. The Gnocchi service provides a REST API to create or edit
metric data. The gnocchi-metricd service computes statistics, in real time, on received data.
This computed data is stored and indexed for fast retrieval.
Gnocchi supports various back ends for storing the metric data and indexed data. Currently
supported storage drivers for storing metric data include file, Ceph, Swift, S3, and Redis. The
default storage driver is file. An overcloud deployment uses the ceph storage driver as the
storage for the metric data. Gnocchi can use a PostgreSQL or a MySQL database to store indexed
data and any associated metadata. The default storage driver for indexed data is PostgreSQL.
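As a sketch, the driver selection lives in gnocchi.conf; the section and option names below follow the defaults described above, and the indexer URL is a placeholder:

```ini
[storage]
# metric data storage driver: file (default), ceph, swift, s3, or redis
driver = ceph

[indexer]
# indexed data and metadata: PostgreSQL (default) or MySQL
url = postgresql://gnocchi:PASSWORD@localhost/gnocchi
```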
The Telemetry service uses the Gnocchi API service to publish data samples to Gnocchi for
processing and storage. Received data samples are stored in temporary measure storage. The
gnocchi-metricd service reads the measures from the measure storage. The gnocchi-
metricd service then computes the measures based on the archive policy and the aggregation
methods defined for the meter. The computed statistics are then stored long term in the
metric storage.
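Conceptually, that computation folds raw (timestamp, value) measures into fixed-granularity buckets. A minimal sketch with a 300-second granularity and a mean aggregation method (measures are invented):

```shell
# Hedged sketch: aggregate (timestamp_s, value) measures into 300 s buckets,
# conceptually what gnocchi-metricd does for a 5-minute granularity
printf '%s\n' '0 10' '60 20' '310 40' '350 60' |
awk '{ b = int($1 / 300) * 300; sum[b] += $2; n[b]++ }
     END { for (b in sum) printf "%d %.1f\n", b, sum[b] / n[b] }' |
sort -n
# → 0 15.0
#   300 50.0
```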
To retrieve the metric data, a client, such as the Telemetry alarming service, uses the Gnocchi
API service to read the metric measures from the metric storage, and the metric metadata
stored in the index storage.
Aodh
Aodh provides the alarming services within the Telemetry architecture. For example, you might
want to trigger an alarm when CPU utilization of an instance reaches 70% for more than 10
minutes. To create an Aodh alarm, an alarm action and conditions need to be defined.
An alarm rule defines when the alarm is to be triggered. The alarm rule can be based
on an event or on a computed statistic. The action to be taken when the alarm is triggered
supports multiple forms, such as triggering an HTTP callback URL, writing to a log file, or
sending a notification to the messaging bus.
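A threshold-based rule effectively compares the aggregated statistic of each evaluation period against the configured threshold. A minimal sketch with an 80% threshold and invented per-period means:

```shell
# Hedged sketch: evaluate a threshold alarm rule against the mean of each
# evaluation period (threshold and sample means are illustrative)
threshold=80
for mean in 42.0 85.5 91.2; do
    state=$(awk -v v="$mean" -v t="$threshold" \
        'BEGIN { print ((v > t) ? "alarm" : "ok") }')
    echo "period mean=$mean -> $state"
done
# → period mean=42.0 -> ok
#   period mean=85.5 -> alarm
#   period mean=91.2 -> alarm
```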
Panko
Panko provides the service to store events collected by the Telemetry service from various
OpenStack components. The Panko service allows storing event data in long term storage, to be
used for auditing and system debugging.
data required for billing purposes. This data can be fed into cloud management software, such as
Red Hat CloudForms, to provide itemized billing and a charge-back to the cloud users.
• Stores metric data with one hour granularity over 365 days.
high
• Stores metric data with one second granularity over one hour.
• Stores metric data with one hour granularity over 365 days.
The gnocchi-metricd daemon computes the statistics of the gathered data samples. If the
volume of measures to process increases, the gnocchi-metricd daemon can be scaled out to
any number of servers.
---
sources:
- name: cpu_source
interval: 600
meters:
- "cpu"
sinks:
- cpu_sink
- cpu_delta_sink
sinks:
- name: cpu_sink
transformers:
- name: "rate_of_change"
parameters:
target:
name: "cpu_util"
unit: "%"
type: "gauge"
The processed data is published, over the messaging bus, to the persistent storage of several
consumers. The publishers section in pipeline.yaml defines the destination for published
data. The Telemetry service supports three types of publishers:
References
Gnocchi Project Architecture
http://gnocchi.xyz/architecture.html
Telemetry service
https://docs.openstack.org/newton/config-reference/telemetry.html
Telemetry service overview
https://docs.openstack.org/mitaka/install-guide-rdo/common/
get_started_telemetry.html
Ceilometer architecture
https://docs.openstack.org/ceilometer/latest/admin/telemetry-system-
architecture.html
1. Which service is responsible for storing metering data gathered by the Telemetry service?
a. Panko
b. Oslo
c. Aodh
d. Ceilometer
e. Gnocchi
2. What two data collection mechanisms are leveraged by the Telemetry service? (Choose two.)
a. Polling agent
b. Publisher agent
c. Push agent
d. Notification agent
3. Which configuration file contains the meter definitions for the Telemetry service?
a. /etc/ceilometer/ceilometer.conf
b. /etc/ceilometer/meters.conf
c. /etc/ceilometer/definitions.yaml
d. /etc/ceilometer/meters.yaml
e. /etc/ceilometer/resources.yaml
4. What three publisher types are supported by the Telemetry service? (Choose three.)
a. Panko
b. Aodh
c. Notifier
d. Gnocchi
5. What two default archive policies are defined in the Gnocchi service? (Choose two.)
a. low
b. coarse
c. medium
d. sparse
e. moderate
Solution
Choose the correct answer(s) to the following questions:
1. Which service is responsible for storing metering data gathered by the Telemetry service?
a. Panko
b. Oslo
c. Aodh
d. Ceilometer
e. Gnocchi
2. What two data collection mechanisms are leveraged by the Telemetry service? (Choose two.)
a. Polling agent
b. Publisher agent
c. Push agent
d. Notification agent
3. Which configuration file contains the meter definitions for the Telemetry service?
a. /etc/ceilometer/ceilometer.conf
b. /etc/ceilometer/meters.conf
c. /etc/ceilometer/definitions.yaml
d. /etc/ceilometer/meters.yaml
e. /etc/ceilometer/resources.yaml
4. What three publisher types are supported by the Telemetry service? (Choose three.)
a. Panko
b. Aodh
c. Notifier
d. Gnocchi
5. What two default archive policies are defined in the Gnocchi service? (Choose two.)
a. low
b. coarse
c. medium
d. sparse
e. moderate
Objective
After completing this section, students should be able to analyze OpenStack metrics for use in
autoscaling.
To retrieve all the resources and the respective resource IDs, use the openstack metric
resource list command.
The Time Series Database service allows you to create custom resource types to enable the
use of elements that are part of your architecture but are not tied to any OpenStack resources.
For example, when using a hardware load balancer in the architecture, a custom resource type
can be created. These custom resource types use all the features provided by the Time Series
Database service, such as searching through the resources, associating metrics, and so on. To
create a custom resource type, use the openstack metric resource-type create command.
The --attribute option is used to specify various attributes that are associated with the
resource type. These attributes are used to search for resources associated with a resource type.
...output omitted...
| state | active |
+-------------------------+----------------------------------------------------------+
To list the metrics associated with a resource, use the openstack metric resource show
command. The resource ID is retrieved using the openstack metric resource list --
type command, which filters based on resource type.
New metrics can be added to a resource by an administrator using the openstack metric
resource update command. The --add-metric option can be used to add any existing
metric. The --create-metric option is used to create and then add a metric. The --create-
metric option requires the metric name and the archive policy to be attached to the metric.
To add a new metric named custommetric with the low archive policy to an image resource,
use the command as shown. The resource ID in this example is the ID that was shown previously.
...output omitted...
| name | finance-rhel7 |
| original_resource_id | 6bd6e073-4e97-4a48-92e4-d37cb365cddb |
| project_id | cebc8e2f3c8f45a18f716f03f017c623 |
| revision_end | None |
| revision_start | 2017-05-23T04:06:12.958634+00:00 |
| started_at | 2017-05-23T04:06:12.958618+00:00 |
| type | image |
| user_id | None |
+-----------------------+------------------------------------------------------+
All the metrics provided by the Telemetry service can be listed by an OpenStack administrator
using the openstack metric metric list command.
The openstack metric metric show command shows the metric details. The resource ID of
a resource is retrieved using the openstack metric resource list command.
To list the detailed information of the image.serve metric for an image with the
6bd6e073-4e97-4a48-92e4-d37cb365cddb resource ID, run the following command:
...output omitted...
| resource/started_at | 2017-05-23T04:06:12.958618+00:00 |
| resource/type | image |
| resource/user_id | None |
| unit | None |
+------------------------------------+------------------------------------------------+
The openstack metric archive-policy list command lists the archive policies.
A Telemetry service administrator can add measures to the data store using the openstack
metric measures add command. To view measures, use the openstack metric measures
show command. Both commands require the metric name and resource ID as parameters.
The Time Series Database service uses the ISO 8601 time stamp format for output. In ISO
8601 notation, the date, time, and time zone are represented in the following format:
yyyymmddThhmmss+|-hhmm. The date -u "+%FT%T.%6N" command converts the current
date and time into the ISO 8601 time stamp format.
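For example, a measure time stamp can be generated in place when adding measures from the shell (GNU date, as on the classroom hosts, is assumed for the %N field):

```shell
# Hedged sketch: produce an ISO 8601 time stamp with microsecond precision,
# suitable for use with openstack metric measures add
ts=$(date -u "+%FT%T.%6N")
echo "$ts"
# e.g. 2017-05-23T04:06:12.958618
```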
The resource ID of a resource is retrieved using the openstack metric resource list
command. To list the metrics associated with a resource, use the openstack metric
resource show command. The default aggregation method used by the openstack metric
resource show command is mean.
Important
For removing measures, administrator privileges are required.
The final entry in the following output shows the average CPU utilization for a resource.
Note
For querying and adding measures, a few other time stamp formats are supported. For
example: 50 minutes, indicating 50 minutes from now, and - 50 minutes,
indicating 50 minutes ago. Time stamps based on the UNIX epoch are also supported.
Use aggregation methods such as min, max, mean, and sum to display the measures based on
the granularity.
The following command shows how to list measures with a particular aggregation method.
The command uses the resource ID associated with an instance to display the minimum CPU
utilization for different granularity. The --refresh option is used to include all new measures.
The final entry of the following screen capture shows the minimum CPU utilization for the
resource.
The --query option uses attributes associated with a resource type. The following command
displays the mean CPU utilization for all provisioned instances that use the flavor with an ID of 1,
or that use the image with an ID of 6bd6e073-4e97-4a48-92e4-d37cb365cddb.
Use the --start and --stop options of the openstack metric measures
aggregation command to provide the time range for computing aggregation statistics. For
example, the server_group attribute of the instance resource type can be used in the --query
option to group a specific set of instances, which can then be monitored for autoscaling. It is also
possible to search for values in the metrics by using one or more levels of granularity. Use the --
granularity option to make queries based on the granularity.
• Cumulative: A cumulative meter provides measures that are accumulated over time. For
example, total CPU time used.
• Gauge: A gauge meter records the current value at the time that a reading is recorded. For
example, number of images.
• Delta: A delta meter records the change between values recorded over a particular time
period. For example, network bandwidth.
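The relationship between cumulative and delta meters can be sketched directly: a delta series is the difference between consecutive readings of a cumulative meter (readings below are invented):

```shell
# Hedged sketch: derive delta values from a cumulative meter's readings
printf '%s\n' 100 250 310 |
awk 'NR > 1 { print $1 - prev } { prev = $1 }'
# → 150
#   60
```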
The alarm action defines the action that needs to be taken when an alarm is triggered. In Aodh,
the alarm notifier notifies the activation of an alarm by using one of three methods: triggering
the HTTP callback URL, writing to a log file, or sending notifications to the messaging bus.
You can create a threshold alarm that activates when the aggregated statistics of a metric
breaches the threshold value. In the following example, an alarm is created to trigger when the
average CPU utilization metric of the instance exceeds 80%. The alarm action specified adds an
entry to the log. A query is used so that the alarm monitors the CPU utilization of a particular
instance with an instance ID of 5757edba-6850-47fc-a8d4-c18026e686fb.
To get the alarm state, use the openstack alarm state get command. The alarm history
can be viewed using the openstack alarm-history show command. This checks the alarm
state transition and shows the related time stamps.
[
{
"timestamp": "2017-06-08T16:28:54.002079",
"type": "state transition",
"detail": "{\"transition_reason\": \"Transition to ok due to 2 samples
inside threshold, most recent: 0.687750180591\", \"state\": \"ok\"}"
},
{
"timestamp": "2017-06-08T15:25:53.525213",
"type": "state transition",
"detail": "{\"transition_reason\": \"2 datapoints are unknown\", \"state\":
\"insufficient data\"}"
},
{
"timestamp": "2017-06-08T14:05:53.477088",
"type": "state transition",
"detail": "{\"transition_reason\": \"Transition to alarm due to 2 samples
outside threshold, most recent: 70.0\", \"state\": \"alarm\"}"
},
...output omitted...
1. Use the openstack metric resource list command to find the resource ID and the
desired resource.
2. Use the openstack metric resource show command with the resource ID found in the
previous step to view the available meters for the resource. Make note of the metric ID.
3. Use the openstack metric metric show command with the metric ID found in the
previous step to view the details of the desired meter.
4. Create an alarm based on the desired meter using the openstack alarm create
command. Use the --alarm-action option to define the action to be taken after the
alarm is triggered.
5. Verify the alarm state using the openstack alarm state get command.
6. List the alarm history using the openstack alarm-history command to check the
alarm state transition time stamps.
References
Further information is available in the Monitoring Using the Telemetry Service
chapter of the Logging, Monitoring, and Troubleshooting Guide for Red Hat
OpenStack Platform 10 at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
In this exercise, you will view and analyze common metrics required for autoscaling.
Outcomes
You should be able to:
Steps
1. From workstation, connect to the controller0 node. Open the /etc/ceilometer/
ceilometer.conf file and determine which meter dispatcher is configured for the Telemetry
service. On workstation, run the ceilometer command (which should produce an error) to
verify that the Gnocchi telemetry service is running instead of Ceilometer.
2. List the resource types available in the Telemetry metering service. Use the resource ID of
the instance resource type to list all the meters available.
2.2. List the resources accessible by the developer1 user. Note the resource ID of the
instance resource type.
2.3. Verify that the instance ID of the finance-web1 instance is the same as the resource
ID.
2.4. Using the resource ID, list all the meters associated with the finance-web1 instance.
3.1. Retrieve the resource ID associated with the image resource type.
3.2. List the meters associated with the image resource ID.
4. Using the resource ID, list the details for the disk.read.requests.rate metric
associated with the finance-web1 instance.
The disk.read.requests.rate metric uses the low archive policy. The low archive
policy uses granularity as low as 5 minutes for aggregation, and the maximum life span of
the aggregated data is 30 days.
Observe the value column, which displays the aggregated values based on archive policy
associated with the metric. The 86400, 3600, and 300 granularity column values
represent the aggregation period as 1 day, 1 hour, and 5 minutes, respectively, in seconds.
6. Using the resource ID, list the maximum measures associated with the cpu_util metric
with 300 seconds granularity. The number of records returned in the output may vary.
--granularity 300 \
cpu_util
+---------------------------+-------------+-----------------+
| timestamp | granularity | value |
+---------------------------+-------------+-----------------+
| 2017-05-23T05:45:00+00:00 | 300.0 | 0.0708371692841 |
| 2017-05-23T05:55:00+00:00 | 300.0 | 0.0891683788482 |
| 2017-05-23T06:05:00+00:00 | 300.0 | 0.0907790288644 |
| 2017-05-23T06:15:00+00:00 | 300.0 | 0.0850440360854 |
| 2017-05-23T06:25:00+00:00 | 300.0 | 0.0691660923575 |
| 2017-05-23T06:35:00+00:00 | 300.0 | 0.0858326136269 |
| 2017-05-23T06:45:00+00:00 | 300.0 | 0.0666668728895 |
| 2017-05-23T06:55:00+00:00 | 300.0 | 0.0658094259754 |
| 2017-05-23T07:05:00+00:00 | 300.0 | 0.108326315232 |
| 2017-05-23T07:15:00+00:00 | 300.0 | 0.066695508806 |
| 2017-05-23T07:25:00+00:00 | 300.0 | 0.0666670677802 |
| 2017-05-23T07:35:00+00:00 | 300.0 | 0.0666727313294 |
+---------------------------+-------------+-----------------+
7. List the average CPU utilization for all instances provisioned using the rhel7 image. Query
for all instances containing the word finance in the instance name.
7.1. List the attributes supported by the instance resource type. The command returns the
attributes that may be used to query this resource type.
7.2. Only users with the admin role can query measures using resource attributes. Use the
architect1 user's Identity credentials to execute the command. The architect1
credentials are stored in the /home/student/architect1-finance-rc file.
7.4. List the average CPU utilization for all the instances using the openstack metric
measures aggregation command. Use the --query option to filter the instances.
Source the admin-rc credential file.
The instance resource type has the attributes image_ref and display_name. The
image_ref attribute specifies the image used for provisioning. The display_name
attribute specifies the instance name. The query uses the like operator to search for
the finance substring. Combine the query conditions using the and operator. The --
refresh option is used to force aggregation of all known measures. The number of
records returned in the output may vary.
Cleanup
From workstation, run the lab monitoring-analyzing-metrics cleanup command to
clean up this exercise.
In this lab, you will analyze the Telemetry metric data and create an Aodh alarm. You will also set
the alarm to trigger when the maximum CPU utilization of an instance exceeds a threshold value.
Outcomes
You should be able to:
• Search and list the metrics available with the Telemetry service for a particular user.
• Create an alarm based on aggregated usage data of a metric, and trigger it.
On workstation, run the lab monitoring-review setup command. This will ensure that
the OpenStack services are running and the environment has been properly configured for this
lab. The script also creates an instance named production-rhel7.
Steps
1. List all of the instance type telemetry resources accessible by the user operator1. Ensure
the production-rhel7 instance is available. Observe the resource ID of the instance.
Credentials for user operator1 are in /home/student/operator1-production-rc on
workstation.
3. List the available archive policies. Verify that the cpu_util metric of the production-
rhel7 instance uses the archive policy named low.
4. Add new measures to the cpu_util metric. Observe that the newly added measures
are available using min and max aggregation methods. Use the values from the following
table. The measures must be added using the architect1 user's credentials, because
manipulating data points requires an account with the admin role. Credentials of user
architect1 are stored in /home/student/architect1-production-rc file.
Measures Parameter
Timestamp Current time in ISO 8601 formatted timestamp
Measure values 30, 42
The measure values 30 and 42 are manual data values added to the cpu_util metric.
378 CL210-RHOSP10.1-en-2-20171006
6. Simulate a high CPU utilization scenario by manually adding new measures to the cpu_util
metric of the instance. Observe that the alarm triggers when the aggregated CPU utilization
exceeds the 50% threshold through two evaluation periods of 5 minutes each. To simulate
high CPU utilization, manually add a measure with a value of 80 once every minute until the
alarm triggers. It is expected to take between 5 and 10 minutes to trigger.
Evaluation
On workstation, run the lab monitoring-review grade command to confirm success of
this exercise. Correct any reported failures and rerun the command until successful.
Cleanup
From workstation, run the lab monitoring-review cleanup command to clean up this
exercise.
Solution
In this lab, you will analyze the Telemetry metric data and create an Aodh alarm. You will also set
the alarm to trigger when the maximum CPU utilization of an instance exceeds a threshold value.
Outcomes
You should be able to:
• Search and list the metrics available with the Telemetry service for a particular user.
• Create an alarm based on aggregated usage data of a metric, and trigger it.
On workstation, run the lab monitoring-review setup command. This will ensure that
the OpenStack services are running and the environment has been properly configured for this
lab. The script also creates an instance named production-rhel7.
Steps
1. List all of the instance type telemetry resources accessible by the user operator1. Ensure
the production-rhel7 instance is available. Observe the resource ID of the instance.
Credentials for user operator1 are in /home/student/operator1-production-rc on
workstation.
1.2. Use the retrieved user ID to search the resources accessible by the operator1 user.
Filter the output based on the instance resource type.
[
{
"user_id": "4301d0dfcbfb4c50a085d4e8ce7330f6",
"type": "instance",
"id": "969b5215-61d0-47c4-aa3d-b9fc89fcd46c"
}
]
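The filtering this step performs with the CLI can be sketched locally with the standard-library json module. This is an illustrative example only, not part of the lab; it parses output like the one shown above and keeps only resources of type instance.

```python
import json

# JSON output of the kind shown above (values copied from the sample output)
output = '''[
  {"user_id": "4301d0dfcbfb4c50a085d4e8ce7330f6",
   "type": "instance",
   "id": "969b5215-61d0-47c4-aa3d-b9fc89fcd46c"}
]'''

resources = json.loads(output)
# Keep only resources whose type is "instance", as the lab step filters them
instances = [r["id"] for r in resources if r["type"] == "instance"]
print(instances)  # ['969b5215-61d0-47c4-aa3d-b9fc89fcd46c']
```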
1.3. Observe that the ID of the resource in the previous output matches the instance ID of
the production-rhel7 instance. The production-rhel7 instance is available.
2.1. Use the production-rhel7 instance resource ID to list the available metrics. Verify
that the cpu_util metric is listed.
3. List the available archive policies. Verify that the cpu_util metric of the production-
rhel7 instance uses the archive policy named low.
3.1. List the available archive policies and their supported aggregation methods.
3.3. Use the resource ID of the production-rhel7 instance to check which archive policy
is in use for the cpu_util metric.
3.4. View the measures collected for the cpu_util metric associated with the
production-rhel7 instance to ensure that it uses granularities according to the
definition of the low archive policy.
4. Add new measures to the cpu_util metric. Observe that the newly added measures
are available using min and max aggregation methods. Use the values from the following
table. The measures must be added using the architect1 user's credentials, because
manipulating data points requires an account with the admin role. Credentials of user
architect1 are stored in /home/student/architect1-production-rc file.
• Timestamp: the current time, as an ISO 8601 formatted timestamp
• Measure values: 30, 42
The measure values 30 and 42 are manual data values added to the cpu_util metric.
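An ISO 8601 timestamp of the kind expected for the measure can be produced with the standard-library datetime module. This is an illustrative sketch, not part of the lab procedure.

```python
from datetime import datetime, timezone

# Build an ISO 8601 formatted timestamp for the current time in UTC,
# the format expected when adding a measure
now = datetime.now(timezone.utc)
timestamp = now.isoformat()
print(timestamp)  # e.g. 2017-10-06T12:34:56.789012+00:00
```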
4.1. Source the architect1 user's credential file. Add 30 and 42 as new measure values.
4.2. Verify that the new measures have been successfully added for the cpu_util metric.
Force the aggregation of all known measures. The default aggregation method is mean,
so you will see a value of 36 (the mean of 30 and 42). The number of records and their
values returned in the output may vary.
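As a quick sanity check of the expected aggregates, the mean, minimum, and maximum of the two manually added measures can be computed locally (illustrative only; in the lab the Time Series Database performs this aggregation):

```python
# The two manually added cpu_util measures from the table above
measures = [30, 42]

# mean is the default aggregation method; min and max are also supported
mean_value = sum(measures) / len(measures)
print(mean_value, min(measures), max(measures))  # 36.0 30 42
```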
4.3. Display the maximum and minimum values for the cpu_util metric measure.
5.2. View the newly created alarm. Verify that the state of the alarm is either ok or
insufficient data. According to the alarm definition, data is insufficient until two
evaluation periods have been recorded. Continue with the next step if the state is ok or
insufficient data.
6. Simulate a high CPU utilization scenario by manually adding new measures to the cpu_util
metric of the instance. Observe that the alarm triggers when the aggregated CPU utilization
exceeds the 50% threshold through two evaluation periods of 5 minutes each. To simulate
high CPU utilization, manually add a measure with a value of 80 once every minute until the
alarm triggers. It is expected to take between 5 and 10 minutes to trigger.
6.1. Open two terminal windows, either stacked vertically or side by side. The second
terminal will be used in subsequent steps to add data points until the alarm triggers. In
the first window, use the watch command to repeatedly display the alarm state.
+--------------------------------------+--------------------+-------+
| alarm_id | name | state |
+--------------------------------------+--------------------+-------+
| 82f0b4b6-5955-4acd-9d2e-2ae4811b8479 | cputhreshold-alarm | ok |
+--------------------------------------+--------------------+-------+
6.2. In the second terminal window, use the watch command to add new measures to the
cpu_util metric of the production-rhel7 instance every minute. A value of 80 will
simulate high CPU utilization, since the alarm is set to trigger at 50%.
Continue adding manual data points at a rate of about one measure per minute. Be
patient: the alarm triggers only after the evaluator detects a maximum value greater
than 50 in two consecutive 5-minute evaluation periods. This is expected to take
between 6 and 10 minutes. As long as you keep adding roughly one measure per
minute, the alarm will eventually trigger.
Note
In a real-world environment, measures are collected automatically using
various polling and notification agents. Manually adding data point measures
for a metric is only for alarm configuration testing purposes.
6.3. The alarm-evaluator service detects the newly added measures. Within the expected
6 to 10 minutes, the alarm changes state to alarm in the first terminal window. Stop
adding new measures as soon as the new alarm state appears. Observe the new alarm
state. The alarm state transitions back to ok after one more evaluation period, because
high CPU utilization values are no longer being received. Press Ctrl+C to stop the
watch.
+--------------------------------------+--------------------+-------+
| alarm_id | name | state |
+--------------------------------------+--------------------+-------+
| 82f0b4b6-5955-4acd-9d2e-2ae4811b8479 | cputhreshold-alarm | alarm |
+--------------------------------------+--------------------+-------+
6.4. After stopping the watch and closing the second terminal, view the alarm history to
analyze when the alarm transitioned from the ok state to the alarm state. The output
may look similar to the lines displayed below.
[
{
"timestamp": "2017-06-08T14:05:53.477088",
"type": "state transition",
"detail": "{\"transition_reason\": \"Transition to alarm due to 2 samples
outside threshold, most recent: 70.0\", \"state\": \"alarm\"}"
},
{
"timestamp": "2017-06-08T13:18:53.356979",
"type": "state transition",
"detail": "{\"transition_reason\": \"Transition to ok due to 2 samples
inside threshold, most recent: 0.579456043152\", \"state\": \"ok\"}"
},
{
"timestamp": "2017-06-08T13:15:53.338924",
"type": "state transition",
"detail": "{\"transition_reason\": \"2 datapoints are unknown\", \"state\":
\"insufficient data\"}"
},
{
"timestamp": "2017-06-08T13:11:51.328482",
"type": "creation",
"detail": "{\"alarm_actions\": [\"log:/tmp/alarm.log\"], \"user_id\":
\"b5494d9c68eb4938b024c911d75f7fa7\", \"name\": \"cputhreshold-alarm\",
\"state\": \"insufficient data\", \"timestamp\": \"2017-06-08T13:11:51.328482\",
\"description\": \"Alarm to monitor CPU utilization\", \"enabled\":
true, \"state_timestamp\": \"2017-06-08T13:11:51.328482\", \"rule\":
Evaluation
On workstation, run the lab monitoring-review grade command to confirm success of
this exercise. Correct any reported failures and rerun the command until successful.
Cleanup
From workstation, run the lab monitoring-review cleanup command to clean up this
exercise.
Summary
In this chapter, you learned:
• Telemetry data is used for system monitoring, alerts, and for generating customer usage
billing.
• The Telemetry service collects data using polling agents and notification agents.
• The Time Series Database (Gnocchi) service was introduced to decouple the storage of metric
data from the Telemetry service and to increase efficiency.
• The gnocchi-metricd service is used to compute, in real time, statistics on received data.
• The Alarm (Aodh) service provides alarming services within the Telemetry service architecture.
• The Event Storage (Panko) service stores events collected by the Telemetry service from
various OpenStack components.
• The measures stored in the Time Series Database are indexed based on the resource and its
attributes.
• The aggregated data is stored in the metering database according to the archive policies
defined on a per-metric basis.
• In the Alarm service, the alarm notifier notifies the activation of an alarm by using the HTTP
callback URL, writing to a log file, or sending notifications using the messaging bus.
ORCHESTRATING
DEPLOYMENTS
Overview
Goal Deploy Orchestration stacks that automatically scale.
Objectives • Describe the Orchestration service architecture and use
cases.
Objectives
After completing this section, students should be able to describe Heat orchestration
architecture and use cases.
The Orchestration service (Heat) provides developers and system administrators an easy and
repeatable way to create and manage collections of related OpenStack resources. The
Orchestration service deploys OpenStack resources in an orderly and predictable fashion. The
user writes a Heat Orchestration Template (HOT) that describes the OpenStack resources
and runtime parameters required to run an application. The Orchestration service orders
the deployment of these OpenStack resources and resolves any dependencies.
When provisioning your infrastructure with the Orchestration service, the template
describes the resources to be provisioned and their settings. Because templates are text files,
they can be kept under a version control system to track changes to the infrastructure.
The template, along with its input parameters, is submitted to the Orchestration REST API
using either the Horizon dashboard or the OpenStack CLI. The Orchestration API service
forwards requests to the Orchestration engine service using remote procedure calls (RPC)
over AMQP. Optionally, the Orchestration CFN service sends AWS CloudFormation-compatible
requests to the Orchestration engine service over RPC. The Orchestration engine service
interprets the orchestration template and launches the stack. The events generated by the
Orchestration engine service are consumed by the Orchestration API service to provide the
status of the launched stack.
• Using multiple layers of stacks that build on top of one another is the best way to organize an
orchestration stack. Putting all the resources in one stack becomes cumbersome to manage
when the stack is scaled, and broadens the scope of resources to be provisioned.
• When using nested stacks, resource names or IDs can be hard-coded into the calling stack.
However, hard-coding resource names or IDs makes templates difficult to reuse, and may
add overhead to getting the stack deployed.
• The changes that a stack update makes to the infrastructure should first be verified by
performing a dry run of the stack update.
• Before launching a stack, ensure all the resources to be deployed by the orchestration stack
are within the project quota limits.
• As the infrastructure grows, declaring the same resources in each template becomes
repetitive. Such shared resources should be maintained as a separate stack and used inside a
nested stack. A nested stack is a stack created by another stack.
• When declaring parameters in the orchestration template, use constraints to define the format
for the input parameters. Constraints allow you to describe legal input values so that the
Orchestration engine catches any invalid values before creating the stack.
• Before using a template to create or update a stack, you can use OpenStack CLI to validate
it. Validating a template helps catch syntax and some semantic errors, such as circular
dependencies before the Orchestration stack creates any resources.
The following parameters in /etc/heat/heat.conf control common Orchestration service
behavior:
• encrypt_parameters_and_properties: Encrypts the parameters and properties of a resource
that are marked as hidden before storing them in the database. The parameter accepts a
Boolean value.
• heat_stack_user_role: The Identity user role name associated with the user responsible for
launching the stack. The parameter accepts a string value. The default value is the
heat_stack_user role.
• num_engine_workers: The number of heat-engine processes to fork and run on the host. The
parameter accepts an integer value. The default value is the number of CPUs on the host
running the heat-engine service or 4, whichever is greater.
• stack_action_timeout: The default timeout period, in seconds, for the creation and update of
a stack. The default value is 3600 seconds (1 hour).
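These options live in the [DEFAULT] section of /etc/heat/heat.conf. An illustrative excerpt follows; the values shown are examples, not recommendations:

```ini
# /etc/heat/heat.conf (excerpt, illustrative values)
[DEFAULT]
encrypt_parameters_and_properties = True
heat_stack_user_role = heat_stack_user
num_engine_workers = 4
stack_action_timeout = 3600
```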
The log files for the orchestration service are stored in the /var/log/heat directory of the
host on which the heat-api, heat-engine, and heat-manage services are running.
• Editing an existing template might introduce YAML syntax errors. Various tools, such as
python -m json.tool for JSON-format templates, help catch syntax errors in template files.
Using the --dry-run option with the openstack stack create command also validates some of
the YAML syntax.
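For JSON-format templates, the same check that python -m json.tool performs can be sketched with the standard-library json module. This is an illustrative helper, not part of the course tooling:

```python
import json

def validate_json_template(text):
    """Return (ok, error) for a JSON-format template body."""
    try:
        json.loads(text)
        return True, None
    except json.JSONDecodeError as exc:
        # The exception message points at the offending line and column
        return False, str(exc)

print(validate_json_template('{"heat_template_version": "2016-10-14"}')[0])  # True
print(validate_json_template('{not valid json}')[0])  # False
```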
• If an instance goes into the ERROR state after launching a stack, troubleshoot the problem
by examining the /var/log/nova/scheduler.log log file on the compute node. If the
error shows No valid host was found, the compute node does not have the required
resources to launch the instance. Check the resources consumed by the instances running on
the compute nodes and, if possible, change the allocation ratio in the /etc/nova/nova.conf
file.
To overcommit the CPU, RAM, and disk allocated on the compute nodes, use the following
commands to change the allocation ratios. The ratios shown in the commands are arbitrary.
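The commands themselves are not reproduced here; the settings they modify correspond to the following /etc/nova/nova.conf options. The ratios shown are arbitrary examples:

```ini
# /etc/nova/nova.conf (excerpt); ratios are arbitrary examples
[DEFAULT]
cpu_allocation_ratio = 16.0
ram_allocation_ratio = 1.5
disk_allocation_ratio = 1.0
```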
• Validating a template with the --dry-run option checks for the existence of the resources
required by the template and its runtime parameters. Using custom constraints lets template
parameters be validated at an early stage, rather than failing during the launch of the stack.
References
Further information is available in the Components chapter of the Architecture Guide
for Red Hat OpenStack Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
a. Nova
b. Glance
c. Heat
d. Ceilometer
2. Which two template formats are supported by the Orchestration service? (Choose two.)
a. XML
b. JSON
c. YAML
d. HTML
a. 86400 Seconds
b. 3600 Seconds
c. 300 Seconds
d. 600 Seconds
5. In which log file does information related to the Orchestration engine service get logged?
a. /var/log/heat/heat-api.log
b. /var/log/heat/heat-manage.log
c. /var/log/heat/engine.log
d. /var/log/heat/heat-engine.log
a. --validate
b. --run-dry
c. --dry-run
d. --yaml
Solution
Choose the correct answer(s) to the following questions:
a. Nova
b. Glance
c. Heat
d. Ceilometer
2. Which two template formats are supported by the Orchestration service? (Choose two.)
a. XML
b. JSON
c. YAML
d. HTML
a. 86400 Seconds
b. 3600 Seconds
c. 300 Seconds
d. 600 Seconds
5. In which log file does information related to the Orchestration engine service get logged?
a. /var/log/heat/heat-api.log
b. /var/log/heat/heat-manage.log
c. /var/log/heat/engine.log
d. /var/log/heat/heat-engine.log
a. --validate
b. --run-dry
c. --dry-run
d. --yaml
Objectives
After completing this section, students should be able to write templates using the Heat
Orchestration Template (HOT) language.
Introduction to YAML
Orchestration templates are written in YAML (YAML Ain't Markup Language). Therefore, it is
necessary to understand the basics of YAML syntax in order to write an orchestration
template.
YAML was designed primarily for the representation of data structures such as lists and
associative arrays, in an easily written, human-readable format. This design objective is
accomplished primarily by abandoning traditional enclosure syntax, such as brackets, braces, or
opening and closing tags, commonly used by other languages to denote the structure of a data
hierarchy. Instead, in YAML, data hierarchy structures are maintained using outline indentation.
Data structures are represented using an outline format with space characters for indentation.
There is no strict requirement regarding the number of space characters used for indentation
other than data elements must be further indented than their parents to indicate nested
relationships. Data elements at the same level in the data hierarchy must have the same
indentation. Blank lines can be optionally added for readability.
Indentation must be performed using only the space character, and is critical to the proper
interpretation of YAML. Because tabs are treated differently by various editors and tools,
YAML forbids the use of tabs for indentation.
With the following line added to the user's $HOME/.vimrc, pressing the Tab key inserts two
spaces, and subsequent lines are automatically indented.
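The exact line distributed with the course is not reproduced here; one common setting that achieves the described behavior (an illustrative example) is:

```vim
" Two-space, tab-free indentation with auto-indent for YAML files
autocmd FileType yaml setlocal ai ts=2 sts=2 sw=2 expandtab
```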
Each orchestration template must include the heat_template_version key with a correct
orchestration template version. The orchestration template version defines both the supported
format of the template and the features that are valid and supported for the Orchestration
service. The orchestration template version is in a date format or uses the release name, such as
newton. The openstack orchestration template version list command lists all the
supported template versions.
| heat_template_version.2014-10-16 | hot |
| heat_template_version.2015-04-30 | hot |
| heat_template_version.2015-10-15 | hot |
| heat_template_version.2016-04-08 | hot |
| heat_template_version.2016-10-14 | hot |
+--------------------------------------+------+
The description key in a template is optional, but can include some useful text that describes
the purpose of the template. You can add multi-line text to the description key by using
folded blocks (>) in YAML. Folded blocks replace each line break with a single space, ignoring
indentation.
heat_template_version: 2016-10-14
description: >
This is multi-line description
that describes the template usage.
Parameters
The orchestration templates allow users to customize the template during deployment of
the orchestration stack by use of input parameters. The input parameters are defined in the
parameters section of the orchestration template. Each parameter is defined as a separate
nested block with required attributes such as type or default. In the orchestration template,
the parameters section uses the following syntax and attributes to define an input parameter
for the template.
parameters:
<param_name>:
type: <string | number | json | comma_delimited_list | boolean>
label: <human-readable name of the parameter>
description: <description of the parameter>
default: <default value for parameter>
hidden: <true | false>
constraints:
<parameter constraints>
immutable: <true | false>
• type: Data type of the parameter. The supported data types are string, number, json,
comma_delimited_list, and boolean.
• label: Human-readable name for the parameter. This attribute is optional.
• description: Short description of the parameter. This attribute is optional.
• default: Default value used when the user does not supply a value for the parameter. This
attribute is optional.
• hidden: Determines whether the value of the parameter is hidden when the user lists
information about a stack created from the orchestration template. This attribute is optional
and defaults to false.
• constraints: Constraints applied to validate the input value provided by the user for the
parameter. The constraints attribute can apply a list of different constraints. This attribute is
optional.
• immutable: Defines whether the parameter can be updated. A stack update fails if the
parameter value is changed while this attribute is set to true.
parameters:
volume_name:
type: string
description: volume name
constraints:
- custom_constraints: cinder.volume
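Besides custom constraints, HOT supports built-in constraint types such as allowed_values, length, range, and allowed_pattern. A sketch follows; the parameter name and flavor values are assumptions for illustration:

```yaml
parameters:
  instance_flavor:
    type: string
    description: Flavor to be used by the instance
    constraints:
      - allowed_values: [m1.small, m1.medium]
        description: Flavor must be m1.small or m1.medium
```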
Resources
The resources section in the orchestration template defines resources provisioned during
deployment of a stack. Each resource is defined as a separate nested block with its required
attributes, such as type and properties. The properties attribute defines the properties
required to provision the resource. The resources section in a template uses the following
syntax and attributes to define a resource for the stack.
resources:
<resource ID>:
type: <resource type>
properties:
<property name>: <property value>
• resource ID: A resource name, which must be unique within the resources section of the
template.
• type: The resource type name. The core OpenStack resources are included in the
Orchestration engine service as built-in resources, and the Orchestration service supports
resource plug-ins as custom resources. This attribute is mandatory and must be specified
when declaring a resource.
• properties: A list of properties associated with the resource type. Each property value is
either hard-coded or retrieved using an intrinsic function. This attribute is optional.
Resource Types
A resource requires a type attribute, such as OS::Nova::Server, and various properties that
depend on the resource type. To list the available resource types, use the openstack
orchestration resource type list command.
| Resource Type |
+----------------------------------------------+
...output omitted...
| OS::Nova::FloatingIP |
| OS::Nova::FloatingIPAssociation |
| OS::Nova::KeyPair |
| OS::Nova::Server |
| OS::Nova::ServerGroup |
| OS::Swift::Container |
+----------------------------------------------+
The OS::Heat::ResourceGroup resource type creates one or more identical resources. The
resource definition is passed as a nested stack. The required property for the ResourceGroup
resource type is resource_def. The value of the resource_def property is the definition of
the resource to be provisioned. The count property sets the number of resources to provision.
resources:
my_group:
type: OS::Heat::ResourceGroup
properties:
count: 2
resource_def:
type: OS::Nova::Server
properties:
name: { get_param: instance_name }
image: { get_param: instance_image }
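Within resource_def, the ResourceGroup substitutes the string %index% with each member's index, which is useful for giving otherwise identical resources distinct names. A sketch, with an assumed parameter name:

```yaml
resources:
  my_group:
    type: OS::Heat::ResourceGroup
    properties:
      count: 2
      resource_def:
        type: OS::Nova::Server
        properties:
          name: web-%index%     # becomes web-0 and web-1
          image: { get_param: instance_image }
```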
Intrinsic Functions
HOT provides several built-in functions that are used to perform specific tasks in the template.
Intrinsic functions in the Orchestration template assign values to the properties that are
available during creation of the stack. Some of the widely used intrinsic functions are listed
below:
• get_attr: The get_attr function references an attribute of a resource. This function takes
the resource name and the attribute name as the parameters to retrieve the attribute value for
the resource.
resources:
the_instance:
type: OS::Nova::Server
...output omitted...
outputs:
instance_ip:
description: IP address of the instance
value: { get_attr: [the_instance, first_address] }
• get_param: The get_param function references an input parameter of the template. It takes
the parameter name and returns the value supplied for that parameter at deployment time.
parameters:
instance_flavor:
type: string
description: Flavor to be used by the instance.
resources:
the_instance:
type: OS::Nova::Server
properties:
flavor: { get_param: instance_flavor }
• get_resource: The get_resource function references another resource within the same
template. It takes the resource ID and returns the ID of the referenced resource at runtime.
resources:
the_port:
type: OS::Neutron::Port
...output omitted...
the_instance:
type: OS::Nova::Server
properties:
networks:
port: { get_resource: the_port }
• str_replace: The str_replace function substitutes variables in an input string with values
that you specify. The input string along with variables are passed to the template property of
the function. The values of the variables are instantiated using the params property as a key
value pair.
outputs:
website_url:
description: The website URL of the application.
value:
str_replace:
template: http://varname/MyApps
params:
varname: { get_attr: [ the_instance, first_address ] }
• list_join: The list_join function joins a list of strings into a single value, separated by the
specified delimiter. If the delimiter is an empty string, the strings are simply concatenated.
resources:
random:
type: OS::Heat::RandomString
properties:
length: 2
the_instance:
type: OS::Nova::Server
properties:
name: { list_join: [ '-', [ {get_param: instance_name}, {get_attr:
[random, value]} ] ] }
implement software configurations. There are, broadly, three options to implement the software
configuration changes using the orchestration template:
• Using a custom image that includes installed and configured software. This method can be
used when no software configuration change is required during the life cycle of an instance.
• Using the user data script and cloud-init to configure the pre-installed software in the
image. This method can be used when there is a software configuration change required once
during the life cycle of an instance (at boot time). An instance must be replaced with a new
instance when software configuration changes are made using this option.
resources:
the_instance:
type: OS::Nova::Server
properties:
...output omitted...
user_data_format: RAW
user_data:
str_replace:
template: |
#!/bin/bash
echo "Hello World" > /tmp/$demo
params:
$demo: demofile
When the user data is changed and the orchestration stack is updated using the openstack
stack update command, the instance is deleted and recreated using the updated user data
script.
To provide complex scripts through the user_data property, use the get_file intrinsic
function, which takes the name of a file as its argument.
resources:
the_instance:
type: OS::Nova::Server
properties:
...output omitted...
user_data_format: RAW
user_data:
str_replace:
template: { get_file: demoscript.sh }
params:
$demo: demofile
• The OS::Heat::SoftwareConfig resource type describes a software configuration, including
the script to run, its input values, and its output names.
resources:
the_config:
type: OS::Heat::SoftwareConfig
properties:
group: script
inputs:
- name: filename
- name: content
outputs:
- name: result
config:
get_file: demo-script.sh
• The OS::Heat::SoftwareDeployment resource type applies a software configuration to a
server and defines the actions, such as CREATE or UPDATE, that trigger the deployment.
resources:
the_deployment:
type: OS::Heat::SoftwareDeployment
properties:
server:
get_resource: the_server
actions:
- CREATE
- UPDATE
config:
get_resource: the_config
input_values:
filename: demofile
content: 'Hello World'
• The OS::Nova::Server resource type defines the instance on which the software
configuration changes are applied. The user_data_format property of the
OS::Nova::Server resource type must use the SOFTWARE_CONFIG value to support the
software configuration changes using the OS::Heat::SoftwareDeployment resource.
resources:
the_server:
type: OS::Nova::Server
properties:
...output omitted...
user_data_format: SOFTWARE_CONFIG
• The os-collect-config agents poll the Orchestration API for updated resource metadata
that is associated with the OS::Nova::Server resource.
The os-refresh-config agent uses the group property defined for the deployment
to process configuration. It uses the heat-config-hook script to apply the software
configuration changes. The heat-config-hook scripts are provided by the python-heat-
agent-* packages. Upon completion, the hook notifies the Orchestration API of a successful or
failed configuration deployment using the heat-config-notify element.
6. Optionally, specify the output of the stack using the attributes of the
OS::Heat::SoftwareDeployment resource.
7. Create the environment file with all input parameters required for launching the
orchestration stack.
9. Initiate the orchestration stack to configure the software using the openstack stack
create command.
10. Optionally, change the software configuration either by editing the configuration script
or by changing the input parameters passed during runtime. Commit the configuration
changes to the instance by updating the stack using the openstack stack update
command.
References
Template Guide
https://docs.openstack.org/heat/latest/template_guide/index.html
Software configuration
https://docs.openstack.org/heat/latest/template_guide/software_deployment.html
In this exercise, you will edit an orchestration template to launch a customized instance. You
will use a preexisting template and troubleshoot orchestration issues.
Resources
Files: http://materials.example.com/heat/finance-app1.yaml
http://materials.example.com/heat/ts-stack.yaml
http://materials.example.com/heat/ts-environment.yaml
Outcomes
You should be able to:
Steps
1. On workstation, create a directory named /home/student/heat-templates. The
/home/student/heat-templates directory will store downloaded template files and
environment files used for orchestration.
2. When you edit YAML files, you must use spaces, not tab characters, for indentation. If you
use vi for text editing, add settings in the .vimrc file to enable auto-indentation and to set
the tab stop and shift width to two spaces for YAML files. Create the /home/student/.vimrc
file with the content as shown:
• The web server must host a web page containing the following content:
The $public_ip variable is the floating IP address of the instance. The $private_ip
variable is the private IP address of the instance. You will define these variables in the
template.
• The orchestration stack must retry once to execute the user data script. The user data
script must report success when it completes successfully, and must report failure if it
cannot complete within the 600-second timeout.
3.2. Use the user_data property to define the user data script to install the httpd
package. The httpd service must be started and enabled to start at boot time. The
user_data_format property for the OS::Nova::Server resource type must be set
to RAW.
web_server:
type: OS::Nova::Server
properties:
name: { get_param: instance_name }
image: { get_param: image_name }
flavor: { get_param: instance_flavor }
key_name: { get_param: key_name }
networks:
- port: { get_resource: web_net_port }
user_data_format: RAW
user_data:
str_replace:
template: |
#!/bin/bash
yum -y install httpd
3.3. In the user_data property, create a web page with the following content:
The web page uses the $public_ip and the $private_ip variables passed
as parameters. These parameters are defined using the params property of the
str_replace intrinsic function.
web_server:
type: OS::Nova::Server
properties:
name: { get_param: instance_name }
image: { get_param: image_name }
flavor: { get_param: instance_flavor }
key_name: { get_param: key_name }
networks:
- port: { get_resource: web_net_port }
user_data_format: RAW
user_data:
str_replace:
template: |
#!/bin/bash
yum -y install httpd
systemctl restart httpd.service
systemctl enable httpd.service
sudo touch /var/www/html/index.html
sudo cat << EOF > /var/www/html/index.html
<h1>You are connected to $public_ip</h1>
<h2>The private IP address is:$private_ip</h2>
Red Hat Training
EOF
params:
$private_ip: {get_attr: [web_net_port,fixed_ips,0,ip_address]}
$public_ip: {get_attr: [web_floating_ip,floating_ip_address]}
3.4. Use the WaitConditionHandle resource to send a signal about the status of the user
data script.
The $wc_notify variable is set to the wait handle URL using the curl_cli
attribute of the wait_handle resource. The script reports SUCCESS through
$wc_notify if the web page it deployed is accessible and returns 200 as the HTTP
status code. The web_server resource state is marked as CREATE_COMPLETE when
the WaitConditionHandle resource signals SUCCESS.
  web_server:
    type: OS::Nova::Server
    properties:
      name: { get_param: instance_name }
      image: { get_param: image_name }
      flavor: { get_param: instance_flavor }
      key_name: { get_param: key_name }
      networks:
        - port: { get_resource: web_net_port }
      user_data_format: RAW
      user_data:
        str_replace:
          template: |
            #!/bin/bash
            yum -y install httpd
            systemctl restart httpd.service
            systemctl enable httpd.service
            sudo touch /var/www/html/index.html
            sudo cat << EOF > /var/www/html/index.html
            <h1>You are connected to $public_ip</h1>
            <h2>The private IP address is: $private_ip</h2>
            Red Hat Training
            EOF
            export response=$(curl -s -k \
              --output /dev/null \
              --write-out %{http_code} http://$public_ip/)
            [[ ${response} -eq 200 ]] && $wc_notify \
              --data-binary '{"status": "SUCCESS"}' \
              || $wc_notify --data-binary '{"status": "FAILURE"}'
          params:
            $private_ip: { get_attr: [web_net_port, fixed_ips, 0, ip_address] }
            $public_ip: { get_attr: [web_floating_ip, floating_ip_address] }
            $wc_notify: { get_attr: [wait_handle, curl_cli] }
parameters:
  image_name: finance-rhel7
  instance_name: finance-web1
  instance_flavor: m1.small
  key_name: developer1-keypair1
  public_net: provider-172.25.250
  private_net: finance-network1
  private_subnet: finance-subnet1
5. Launch the stack and verify it by accessing the web page deployed on the instance. Use the
developer1 user credentials to launch the stack.
5.1. Using the developer1 user credentials to dry run the stack, check the resources that
will be created when launching the stack. Rectify all errors before proceeding to the
next step to launch the stack.
Note
Before doing the dry run of the stack, download the http://
materials.example.com/heat/finance-app1.yaml-final template
file to the /home/student/heat-templates directory. Use the diff
command to compare your edited finance-app1.yaml template file
with the known good template file, finance-app1.yaml-final. Fix any
differences you find, then proceed to launch the stack.
5.2. Launch the stack using the finance-app1.yaml template file and the
environment.yaml environment file. Name the stack finance-app1.
If the dry run is successful, run the openstack stack create command with the
--enable-rollback option. Do not use the --dry-run option while launching the stack.
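As a sketch, the two commands for this step might look like the following; the template, environment, and stack names come from this exercise, and the option order is flexible:

```console
[student@workstation heat-templates]$ openstack stack create --dry-run \
  -t finance-app1.yaml -e environment.yaml finance-app1
[student@workstation heat-templates]$ openstack stack create --enable-rollback \
  -t finance-app1.yaml -e environment.yaml finance-app1
```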
5.3. List the output returned by the finance-app1 stack. Check the website_url output
value.
5.4. Verify that the instance was provisioned and the user data was executed successfully on
the instance. Use the curl command to access the URL returned as the value for the
website_url output.
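The verification might look like the following transcript; N and P stand for the last octets of the floating and private IP addresses, and the 192.168.1.0/24 private range is an assumption, not a value taken from this exercise:

```console
[student@workstation heat-templates]$ curl http://172.25.250.N/
<h1>You are connected to 172.25.250.N</h1>
<h2>The private IP address is: 192.168.1.P</h2>
Red Hat Training
```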
In the previous output, the N represents the last octet of the floating IP address
associated with the instance. The P represents the last octet of the private IP address
associated with the instance.
parameters:
  ...output omitted...
  instance_count:
    type: number
    description: count of servers to be provisioned
    constraints:
      - range: { min: 1, max: 2 }
...output omitted...
resources:
  my_resource:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: instance_count }
      ...output omitted...
parameters:
  image_name: finance-rhel7
  instance_name: finance-web1
  instance_flavor: m1.small
  key_name: developer1-keypair1
  public_net: provider-172.25.250
  private_net: finance-network1
  private_subnet: finance-subnet1
  instance_count: 2
resource_registry:
  My::Server::Custom::WebServer: finance-app1.yaml

parameters:
  image_name: finance-rhel7
  instance_name: finance-web1
  instance_flavor: m1.small
  key_name: developer1-keypair1
  public_net: provider-172.25.250
  private_net: finance-network1
  private_subnet: finance-subnet1
  instance_count: 2

resources:
  my_resource:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: instance_count }
      resource_def:
        type: My::Server::Custom::WebServer
        properties:
          instance_name: { get_param: instance_name }
          instance_flavor: { get_param: instance_flavor }
          image_name: { get_param: image_name }
          key_name: { get_param: key_name }
          public_net: { get_param: public_net }
          private_net: { get_param: private_net }
          private_subnet: { get_param: private_subnet }
7.7. Use the developer1 user credentials to dry run the stack and check for resources
that will be created. Name the stack finance-app2. Use the nested-stack.yaml
template and the environment.yaml environment file. Rectify any errors before
proceeding to the next step to launch the stack.
Note
Before doing the dry run of the stack, download the http://
materials.example.com/heat/nested-stack.yaml-final template
file to the /home/student/heat-templates directory. Use the diff
command to compare your edited nested-stack.yaml template file
with the known good template file, nested-stack.yaml-final. Fix any
differences you find, then proceed to launch the stack.
7.8. Launch the stack using the nested-stack.yaml template file and the
environment.yaml environment file. Name the stack finance-app2.
If the dry run succeeds, run the openstack stack create command with the
--enable-rollback option. Do not use the --dry-run option while launching the stack.
8.1. Download the template and the environment files to the /home/student/templates
directory.
8.2. Verify that the Heat template does not contain any errors.
Use the developer1 user credentials to dry run the stack and check for any errors.
Name the stack finance-app3. Use the ts-stack.yaml template and the ts-
environment.yaml environment file.
8.3. Fix the indentation error for the name property of the OS::Nova::Server resource
type.
  web_server:
    type: OS::Nova::Server
    properties:
      name: { get_param: instance_name }
8.4. Verify the indentation fix by running the dry run of the finance-app3 stack again.
8.5. Resolve the error caused by the key pair passed in the ts-environment.yaml file,
which does not exist.
The finance-app3 stack dry run must not return any error.
8.7. Launch the stack using the ts-stack.yaml template file and the ts-
environment.yaml environment file. Name the stack finance-app3.
Cleanup
From workstation, run the lab orchestration-heat-templates cleanup command to
clean up this exercise.
Objective
After completing this section, students should be able to implement autoscaling.
• Autoscaling detects an unhealthy instance, terminates it, and launches a new instance to
replace it.
• Autoscaling allows cloud resources to run with the capacity required to handle the demand.
There are two types of scaling architecture: scale-up and scale-out. In scale-up architecture,
scaling adds more capacity by increasing the resources such as memory, CPU, disk IOPS, and so
on. In scale-out architecture, scaling adds more capacity by increasing the number of servers to
handle the load.
The scale-up architecture is simple to implement but hits a saturation point sooner or later. If
you keep adding more memory to an existing cloud instance to adapt to the current load,
saturation is reached once the instance's host itself runs out of resources. The scaling scope in
this case depends entirely upon the hardware capacity of the node where the cloud instance
is hosted. In scale-out architecture, new identical resources are created to fulfill the load with
virtually unlimited scaling scope. Therefore, the scale-out architecture is preferred and the
recommended approach for cloud infrastructure.
Autoscaling requires a trigger generated from an alarming service to scale out or scale in.
In Red Hat OpenStack Platform, the Orchestration service implements autoscaling by using
utilization data gathered from the Telemetry service. An alarm acts as the trigger to autoscale
an orchestration stack based on the resource utilization threshold or the event pattern defined in
the alarm.
An orchestration stack is also automatically scaled using the Aodh event alarms. For example,
when an instance abruptly stops, the stack marks the server unhealthy and launches a new
server to replace it.
When the event alarm associated with the load balancer detects an event indicating that one of
the instances in the pool has stopped or been deleted, a scaling event occurs. The stack first marks
the server as unhealthy, then automatically begins a stack update to replace it with a new, identical one.
The following recommended practices help you to plan and organize autoscaling with an
Orchestration stack:
• Scale-out architecture is more suitable for cloud computing and autoscaling, whereas scale-up
is a better option for traditional virtualization platforms.
• Stateless application architecture is most appropriate for autoscaling. When a server goes
down or transitions into an error state, it is not repaired, but is removed from the stack and
replaced by a new server.
• It is better to scale out quickly and scale in slowly. For example, when scaling out, do not add one
server after five minutes and then another after ten minutes. Instead, add two servers at once.
• Avoid unnecessary scaling by defining a reasonable cool-down period in the Autoscaling group.
• Ensure that the Telemetry service is operating correctly and emitting the metrics required for
autoscaling to work.
• The granularity defined for the alarm must match the archive policy used by the metric.
• Test your scaling policies by simulating real-world data. For example, use the openstack
metric measures add command to push new measures directly to the metric and check if
that triggers the scaling as expected.
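For example, a fake measure can be pushed to an instance's memory metric to exercise the alarm; the resource ID and the measure value below are illustrative, not taken from this course's environment:

```console
[student@workstation ~]$ openstack metric measures add \
  --resource-id 6dd52969-3f9f-4b2c-a468-a72d16b54274 \
  -m $(date -u +%Y-%m-%dT%H:%M:%S)@900 memory
```

A measure is written as timestamp@value; pushing a value above the alarm threshold should cause the next alarm evaluation to fire the scaling action.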
Autoscaling Configuration
In a template, the Autoscaling resource group defines the resource to be provisioned. It launches
a number of instances defined by the desired capacity or minimum group size parameters.
Telemetry alarms are defined to trigger autoscaling to either scale out or scale in, based on the
alarm rules. Primarily there are two alarms: one for scaling out and the other for scaling in. The
action for these alarms invokes the URL associated with the scaling-out policy and scaling-in
policy.
The Autoscaling policy defines the number of resources that need to be added or removed in the
event of scale out or scale in. It uses the defined Autoscaling group. To adjust to various usage
patterns, multiple Autoscaling policies can be defined to automatically scale the infrastructure.
Almost all metrics monitored by the Telemetry service can be used to scale orchestration
stacks dynamically. The following Orchestration resource types are used to create resources for
autoscaling:
OS::Heat::AutoScalingGroup
This resource type is used to define an Autoscaling resource group. Required properties
include max_size, min_size, and resource. Optional properties include cooldown,
desired_capacity, and rolling_updates.
The resource property defines the resource and its properties that are created in the
Autoscaling group.
The max_size property defines the maximum number of identical resources in the
Autoscaling group. The min_size property defines the minimum number of identical
resources that must be running in the Autoscaling group.
The desired_capacity property defines the desired initial number of resources. If not
specified, the value of desired_capacity is equal to the value of min_size. The optional
cooldown property defines the time gap, in seconds, between two consecutive scaling
events.
The rolling_updates property defines the sequence for rolling out the updates. It
streamlines the update rather than taking down the entire service at the same time. The
optional max_batch_size and min_in_service parameters of the property define
maximum and minimum numbers of resources to be replaced at once. The pause_time
property defines a time to wait between two consecutive updates.
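As a sketch, a rolling_updates definition placed under the Autoscaling group properties might look like the following; the batch sizes and pause time are illustrative values, not requirements:

```yaml
      rolling_updates:
        max_batch_size: 2   # replace at most two resources per update batch
        min_in_service: 1   # keep at least one resource in service during the update
        pause_time: 30      # wait 30 seconds between consecutive batches
```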
  web_scaler:
    type: OS::Heat::AutoScalingGroup
    properties:
      desired_capacity: 2
      cooldown: 100
      max_size: 5
      min_size: 1
      resource:
        type: My::Server::Custom::WebServer
        properties:
          ...output omitted...
OS::Heat::ScalingPolicy
The OS::Heat::ScalingPolicy resource type defines the Autoscaling policy used to
manage scaling in the Autoscaling group. Required properties include adjustment_type,
auto_scaling_group_id, and scaling_adjustment. Optional properties include
cooldown and min_adjustment_step.
The Autoscaling policy uses the adjustment_type property to decide on the type of
adjustment needed. When a scaling policy is executed, it changes the current capacity
of the Autoscaling group using the scaling_adjustment specified in the policy. The
value for the property can be set to change_in_capacity, exact_capacity, or
percentage_change_in_capacity.
The Autoscaling policy uses the auto_scaling_group_id property to apply the policy to
the Autoscaling group. The scaling_adjustment property defines the size of adjustment.
A positive value indicates that resources should be added. A negative value terminates
the resource. The cooldown property defines the time gap, in seconds, between two
consecutive scaling events.
The resource returns two attributes: alarm_url and signal_url. The alarm_url attribute
returns a signed URL to handle the alarm associated with the scaling policy. This attribute
is used by an alarm to send a request to either scale in or scale out, depending on the
associated scaling policy. The signal_url attribute is a URL that handles the alarm using the
native API used for scaling. The attribute value must be invoked as a REST API call
with a valid authentication token.
  scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: web_scaler }
      cooldown: 180
      scaling_adjustment: 1

  scaledown_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: web_scaler }
      cooldown: 180
      scaling_adjustment: -1
OS::Aodh::GnocchiAggregationByResourcesAlarm
This resource type defines the Aodh telemetry alarm based on the aggregation of
resources. The alarm monitors the usage of all the sub-resources of a resource. Required
properties include metric, query, resource_type, and threshold. Optional
properties include aggregation_method, alarm_actions, comparison_operator,
evaluation_periods, and granularity.
The alarm_actions property defines the action to be taken when the alarm is triggered.
When the alarm associated with a scaling policy is triggered, the alarm_actions property
calls the signal_url attribute of the Autoscaling policy. The signal_url attribute is the
URL that handles an alarm.
  memory_alarm_high:
    type: OS::Aodh::GnocchiAggregationByResourcesAlarm
    properties:
      description: Scale up if memory usage is 50% for 5 minutes
      metric: memory
      aggregation_method: mean
      granularity: 300
      evaluation_periods: 1
      threshold: 600
      resource_type: instance
      comparison_operator: gt
      alarm_actions:
        - str_replace:
            template: trust+url
            params:
              url: { get_attr: [scaleup_policy, signal_url] }
      query:
        str_replace:
          template: '{"=": {"server_group": "stack_id"}}'
          params:
            stack_id: { get_param: "OS::stack_id" }
2. Define the outputs section to return the output values using the signal_url attribute of
the ScalingPolicy resources.
3. Launch the orchestration stack. List the output values returned by the signal_url
attribute for both scaling out and scaling in policies.
5. Manually scale out or scale in by invoking the REST API using the signal_url attribute
value along with the token ID generated.
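A manual scaling signal might look like the following sketch; the stack name finance-scaling and the output name scaleup_url are hypothetical, and the signed URL itself must come from the stack outputs:

```console
[student@workstation ~]$ openstack stack output show finance-scaling scaleup_url
[student@workstation ~]$ TOKEN=$(openstack token issue -f value -c id)
[student@workstation ~]$ curl -X POST -H "X-Auth-Token: $TOKEN" \
  -H "Content-Type: application/json" -d '{}' "$SCALEUP_SIGNAL_URL"
```

Here $SCALEUP_SIGNAL_URL stands for the signal_url value returned by the stack; POSTing to it with a valid token triggers the associated scaling policy.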
The following log files for each orchestration service are stored in the /var/log/heat directory
on the host where the Orchestration components are deployed.
heat-api.log
The /var/log/heat/heat-api.log log file records API calls to the orchestration service.
heat-engine.log
The /var/log/heat/heat-engine.log log file stores the processing of orchestration
templates and the requests to the underlying API for the resources defined in the template.
heat-manage.log
The /var/log/heat/heat-manage.log log file stores the events that occur when
deploying a stack, or when a scaling event is triggered.
Alarms play an important role in the autoscaling of instances. The following log files for the Aodh
alarming service are stored in the /var/log/aodh directory of the controller node.
listener.log
Logs related to the Aodh alarming service querying the Gnocchi metering service are
recorded in this file. The /var/log/aodh/listener.log log file provides information to
troubleshoot situations when the Alarming service is unable to reach the Telemetry service
to evaluate the alarm condition.
notifier.log
Logs related to notifications provided by an Aodh alarm are recorded in this file. The /
var/log/aodh/notifier.log log file is helpful when troubleshooting situations where
the Alarming service is unable to reach the signal_url defined for the alarm to trigger
autoscaling.
evaluator.log
The Alarming service evaluates the usage data every minute, or as defined in the
alarm definition. Should the evaluation fail, errors are logged in the /var/log/aodh/
evaluator.log log file.
If the autoscaling stack fails to deploy, use the openstack stack command to identify the
failed component. Use the openstack stack list command with the --show-nested
option to view all nested stacks. The command returns the nested stack IDs, names, and stack
status.
Use the openstack stack resource list command to identify the failed resource. The
command returns the resource name, physical resource ID, resource type, and its status. The
physical resource ID can then be queried using the openstack stack resource show
command to check the output value returned while creating the resource.
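A minimal troubleshooting sequence, using the finance-app2 stack from the earlier exercise and the my_resource resource name as illustrative values, might look like this:

```console
[student@workstation ~]$ openstack stack list --show-nested
[student@workstation ~]$ openstack stack resource list finance-app2
[student@workstation ~]$ openstack stack resource show finance-app2 my_resource
```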
References
Further information is available in the Configure Autoscaling for Compute section of
the Autoscaling for Compute guide for Red Hat OpenStack Platform at
https://access.redhat.com/documentation/en/red-hat-openstack-platform/
1. Which OpenStack service provides the evaluation criteria for triggering auto-scaling?
a. Nova
b. Gnocchi
c. Aodh
d. Ceilometer
2. Which two statements are true about autoscaling using an orchestration stack? (Choose
two.)
3. What is the resource type required to define the Auto Scaling policy using an orchestration
stack?
a. OS::Heat::AutoScalingPolicy
b. OS::Nova::Server
c. OS::Heat::ScalingPolicy
d. OS::Heat::AutoScalingGroup
4. Which property of the AutoScalingGroup resource is used to define the time gap between
two consecutive scaling events?
a. cooldown
b. wait
c. pause
d. timeout
5. Which three are allowed values for the adjustment_type property of a scaling policy
resource? (Choose three)
a. change_capacity
b. change_in_capacity
c. exact_capacity
d. exact_in_capacity
e. percentage_change_in_capacity
f. percentage_change_capacity
6. Which attribute of the scaling policy returns the signed URL to handle the alarm associated
with the scaling policy?
a. signed_URL
b. signal_URL
c. alarm_URL
d. scale_URL
Solution
Choose the correct answer(s) to the following questions:
1. Which OpenStack service provides the evaluation criteria for triggering auto-scaling?
a. Nova
b. Gnocchi
c. Aodh (correct)
d. Ceilometer
2. Which two statements are true about autoscaling using an orchestration stack? (Choose
two.)
3. What is the resource type required to define the Auto Scaling policy using an orchestration
stack?
a. OS::Heat::AutoScalingPolicy
b. OS::Nova::Server
c. OS::Heat::ScalingPolicy (correct)
d. OS::Heat::AutoScalingGroup
4. Which property of the AutoScalingGroup resource is used to define the time gap between
two consecutive scaling events?
a. cooldown (correct)
b. wait
c. pause
d. timeout
5. Which three are allowed values for the adjustment_type property of a scaling policy
resource? (Choose three.)
a. change_capacity
b. change_in_capacity (correct)
c. exact_capacity (correct)
d. exact_in_capacity
e. percentage_change_in_capacity (correct)
f. percentage_change_capacity
6. Which attribute of the scaling policy returns the signed URL to handle the alarm associated
with the scaling policy?
a. signed_URL
b. signal_URL
c. alarm_URL (correct)
d. scale_URL
Summary
In this chapter, you learned:
• The Orchestration service (Heat) provides developers and system administrators with an
easy, repeatable way to create and manage collections of related OpenStack resources.
• The Orchestration API service forwards requests to the Orchestration engine service using
remote procedure calls (RPCs) over AMQP.
• The Orchestration engine service interprets the orchestration template and launches the stack.
• Using multiple layers of stacks that build on top of one another is the best way to organize an
orchestration stack.
• Changes in infrastructure after updating a stack must first be verified by doing a dry run of the
stack.
• Intrinsic functions in the Heat orchestration template assign values to properties that are
available during the creation of a stack.
• When the user data is changed and the orchestration stack is updated using the openstack
stack update command, the instance is deleted and recreated using the updated user data
script.
• The AutoScalingGroup and the ScalingPolicy resources of the Orchestration stack help
build self-healing infrastructure.
• Stateless servers are more suitable for autoscaling. If a server goes down or transitions into an
error state, it is not repaired, but is replaced with a new server.