
WZD-STFSS-305 Oracle Flash Storage Path Manager FSPM 4

Slide 1

Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal Page 1
Slide 2

Oracle FS Path Manager FSPM 4

Jeremy Farrell, Senior Principal Engineer


FSPM Development

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted

Hello and welcome to the Oracle Flash Storage Path Manager FSPM 4 course. This course was
originally developed and delivered as a TOI by Jeremy Farrell, Senior Principal Engineer in
FSPM development and was updated in March 2015.

Slide 3

FSPM 4 Agenda

1 FSPM – What and Why


2 Information common to all the implementations - Common
architecture, features, and support
3 Details for each implementation - FSPM products for
Windows, Linux, Solaris, AIX, and HP-UX


In this course, we will talk about what FSPM is, and we will review information common to all the
implementations of FSPM, such as architecture, features, and support. Then, we will review
details that are specific to each implementation, such as FSPM products for Windows, Linux,
Solaris, AIX, and HP-UX.

Slide 4

FSPM – What is it?

• Runs on SAN host systems


– Enables SAN LUN multipathing, one way or another
– Integrates into management system on Pilot – GUI and CLI
– Common functionality for all multipath SAN hosts
– Enforces OS updates and patches where needed
– Enables host log collection for SAN problems
• Strongly recommended for all SAN hosts


So, what is FSPM? It is software that runs on SAN hosts. FSPM’s purpose is to enable
multipathing, and to make sure that multipathing works with Oracle FS and legacy Pillar Axiom
systems, as best it can. FSPM integrates into the management system on the Oracle FS, so it
automates things in the GUI, and it allows configuration of FSPM from the GUI.

FSPM provides common functionality for all the different operating systems. When using Oracle
FS, you get the same feel for different host operating systems as you’re controlling things. One
of the important features is that when installing FSPM, FSPM will enforce any patch levels that
are necessary, which is vital on some operating systems. Without some of the patches,
multipathing does not work very well.

Another important feature from the internal point of view is that having FSPM running on a host
allows collecting a standard set of logs that provide information about what is happening with
multipathing on a host. It is strongly recommended that FSPM should be installed on all SAN
hosts connected to Oracle Flash Storage systems.

Slide 5

FSPM – What can I install it on?

• There are FSPM products for these Server Operating Systems


– Windows Server 2008 and later
– Linux – many recent Linux Server releases
– Solaris 10 and later
– AIX 5.3 and later
– HP-UX 11i v3 u3 and later
• Details of supported OS releases will be discussed later


There are five FSPM products, one for each supported operating system family: FSPM for
Windows Server, where the new release supports Windows Server 2008 and later; FSPM for a range
of recent Enterprise Linux Server releases; FSPM for Solaris 10 and later versions; FSPM for
AIX 5.3 and later versions; and FSPM for the current releases of HP-UX.

Details about the supported operating systems will be discussed later in the course.

Slide 6

FSPM – how does it relate to APM?

• APM renamed, with added goodness


– FSPM is next release of APM, adds support for Oracle FS Systems
– Can connect to Pillar Axiom and Oracle FS Systems at same time
• Pillar Axiom customers should install latest FSPM for their OS
– APM only for old OSes which are not supported by FSPM
– Existing installations OK while they work, fixes will be to FSPM only
• Some APM releases are no longer available for download
– Particularly those whose range of OS versions is supported by FSPM


You might be wondering how FSPM and APM are related. Well, FSPM is APM renamed. FSPM
adds support for the Oracle Flash Storage system. It can connect to Pillar Axiom and Oracle FS
systems at the same time, so there’s no need to install both APM and FSPM. Customers who
have Axiom systems but do not have Oracle FS systems should regard FSPM as the latest
release of APM, and install only FSPM. Customers should install the newest release of FSPM
that supports the host operating system they are using.

Some of the existing APM releases may no longer be available for download, particularly the APM
releases whose range of operating system versions is also supported by FSPM. The reasons for
these decisions, for Linux in particular, will be covered later in this course.

Slide 7

FSPM – Why Good Multipathing Matters

• Asymmetric Logical Unit Access (ALUA)


– Each LUN has a particular “home” Controller (or Control Unit)
– Can be accessed through the “buddy” Controller, but at a cost
– Paths to the home are “Optimized”, to the buddy are “Non-Optimized”
– LUNs can move between Controllers because of failures
– Hosts should only ever use Optimized paths if there are any
• A multipath host must correctly handle ALUA


Why does multipathing matter? Multipathing provides defense against a SAN failure and can
improve performance. The Oracle FS systems use Asymmetric Logical Unit Access, or ALUA. This
means that, for any given LUN, paths connected to its home Controller work substantially better
than paths connected to the other (buddy) Controller.

The LUNs are accessible through both Controllers, but if you go through the wrong Controller,
what we term as non-optimized paths, then access to that LUN will be slower and that puts a
higher load onto the Oracle FS system. This can impact access to all other LUNs by everybody
else. It is important for hosts to always use optimized paths. Therefore, good multipathing
software is needed on those hosts, to correctly enable and enforce the ALUA access. FSPM
does that, or aims to.
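
The ALUA rule described above can be sketched in a few lines of Python. This is only an
illustration of the selection rule, not FSPM code; the Path class, its fields, and the path
names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str               # hypothetical path label
    optimized: bool         # True if the path leads to the LUN's home Controller
    available: bool = True  # False if the host cannot currently use the path


def usable_paths(paths):
    """Return the paths that I/O should use under the ALUA rule."""
    live = [p for p in paths if p.available]
    optimized = [p for p in live if p.optimized]
    # Fall back to Non-Optimized paths only when no Optimized path is left.
    return optimized if optimized else live


paths = [
    Path("hba0-home", optimized=True),
    Path("hba1-home", optimized=True, available=False),
    Path("hba0-buddy", optimized=False),
]
print([p.name for p in usable_paths(paths)])  # ['hba0-home']
```

As long as one Optimized path is available, the Non-Optimized path through the buddy
Controller is not used at all.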

Slide 8

FSPM – How FSPM Helps

• Implements multipathing in some implementations


– Windows, AIX, Linux
• Automates configuration tasks, ensures correct configuration
– At the Oracle FS System, and at the host
• Installation checks OS update and patch levels
– All versions of FSPM rely on OS functionality to some degree
– Example: Solaris 10 multipathing requires a number of updates and patches to work
reliably with ALUA systems like ours
• FSPM makes multipathing work as well as it can


On some implementations, FSPM implements multipathing to a greater or lesser degree. This is
true for Windows and AIX, and FSPM is heavily involved in Linux implementations. Multipathing
is provided by OS functionality on Solaris and HP-UX implementations.

FSPM also automates configuration tasks of the Oracle FS system, and ensures the
configuration is correct. If there are any configuration options that need setting at the host,
FSPM will set the options when it can, and it will automatically configure the host into the Oracle
FS system.

As previously mentioned, the installation checks the OS update and patch levels. Solaris 10 is a
classic example where there were some distinct problems for which patches are available. The
patches must be installed if Solaris is going to work reliably with ALUA. The overall aim is that
installing FSPM makes multipathing work as well as it can on the host OS.

Slide 9

FSPM – When do I install it?

• Part of connecting the host to the Oracle FS System


– Make a SAN connection from the host
– Install FSPM on the host
• FSPM automatically configures the host in the Oracle FS System
– No need to configure FC WWNs in the Oracle FS GUI
– Ready for mapping LUNs
• Can be installed later if the host was hand-configured originally
– Need to delete hand-created host entry from GUI first


Ideally, you should install FSPM when initially connecting the host to the Oracle FS system.
Connect the SAN to the host and the Oracle FS, ensure that the ports are open to enable a TCP
connection between the host and the Pilot on the Oracle FS, then install FSPM on the host.
FSPM will then automatically configure the host into the Oracle FS. There is no need to hand
configure the port WWNs or iSCSI initiator names in the FS. It gets the FS into the state where
it’s ready for mapping LUNs to the host.

If, for whatever reason, someone has configured the host by hand, there is no substantial
problem with installing FSPM later. In that case, the previously-created host entry should be
deleted before installing FSPM. There is an option when deleting an entry to remove or not
remove mappings. Do not remove mappings at that point. Delete the hand-created host entry,
and FSPM will create a new host entry, and will carry over the mappings from the previous
entry.

Slide 10

FSPM – Where do I get it?

• Download from Oracle Technology Network


– http://www.oracle.com/technetwork/server-storage/san-storage/downloads/
• One ZIP file for each release of each FSPM product
– Installation packages for each supported hardware platform
– Installation Guide PDF, Release Notes PDF, readme text file
• Installation Guides also available under the Help Center
– http://docs.oracle.com/en/storage/#fla

• Please read the Guides – cover much more than FSPM Installation


FSPM can be downloaded from the Oracle Technology Network. The release style is the same
as the APM release style. There will be one zip file for each FSPM product. The zip file will
contain the installation packages for all supported hardware. It will also contain copies of the
installation guides, the current release notes, and a brief readme that describes what’s in the
file.

The installation guides can also be obtained from the Help Center. We strongly encourage all
customers and support employees to read the guides. They are called installation guides, but
they cover much more, such as various aspects of having FSPM installed on the host, things
that can be done with FSPM, and how FSPM fits into the host OS.

Slide 11

FSPM – Common Information for all Implementations


Now, we will cover information that is common to all implementations of FSPM.

Slide 12

FSPM – Common to All Implementations

• FSPM Architecture Overview
• The FSPM Daemon
• Management Integration with the Oracle FS System
• Collecting logs
• Review Oracle FS System ALUA implementation
• Supported features

First we will give a high-level overview of the FSPM architecture. Then we will get into more
detail on the FSPM daemon, how it does management integration with the FS, how it enables
collecting logs, and how to collect logs.

Then we will talk about some features of the Oracle FS and the Pillar Axiom, in particular, the
ALUA implementation because the way that implementation works greatly affects what you will
see in the logs and the behavior of the host. And finally, we will review a few other supported
features that are relevant to some hosts.

Slide 13

FSPM – Architecture Overview

• All implementations have two major components


• A “Driver” to implement I/O multipathing on the SAN
– Looks after the “data path”
– May be supplied entirely by the OS or partly by FSPM
– Different in each implementation, details later
• A “Daemon” or “Service” to integrate with the Pilot software
– Looks after the “control path”
– Management services, mainly common code


All implementations have two major components. First, there is a driver which does the
multipathing. The driver looks after what we generally term the data path. The driver may be
provided entirely by the operating system, or it could be provided partly or even wholly by
FSPM. In current implementations, there is no FSPM which provides the entire multipathing
driver. The details of how and what implements the driver are different in each FSPM
implementation. We will discuss these later.

Second, there is the daemon or service, which runs on the host and integrates with the
software on the Pilot. This looks after the control path, and provides management services and
integration into the Oracle FS. It is mainly common code; essentially the same code runs on
all the implementations.

Slide 14

FSPM Daemon – Role

• Create and maintain host entry in Oracle FS System Manager


– Host name, and list of FC and iSCSI initiators
• Monitor driver and host, and update Pilot
– Counts of available paths, add or remove of initiators
• Receive commands from Pilot
– Collect logs
– Update load balancing settings
– Bring newly mapped LUNs and paths on-line


Now let’s take a look at the role of the daemon. At the start, the daemon creates the host entry
in the Oracle FS system, and it will maintain that host entry if anything subsequently changes on
the host. That consists of the host name, which is lifted from the host itself and a list of all the
SAN initiators on the host, the Fibre Channel port WWNs, and the iSCSI initiator names. The
daemon monitors the FSPM driver and other aspects of the host, and it updates the Pilot if
things change. It also provides a count of available paths as seen at the host.

The number of paths available to the software on the host may not be quite the same as the
number of physically connected paths at the time. Having the count as seen by the software,
gives a general idea of the risk of losing connection altogether at that time. It also monitors the
host for initiators that are being added or removed, which is increasingly common on the high-
end machines, with either virtualized host adaptors or even physical ones which can be moved
between partitions on large machines.

The daemon also receives commands coming back from the Pilot. The daemon can be told to
collect the logs of the host and deliver them to the Pilot, and it can be told to update load
balancing settings for individual LUNs. The daemon is also told when LUNs are mapped and
paths are enabled on the Oracle FS. The plug and play implementation on some host
operating systems is not that strong, so manual intervention is needed in order to indicate that
something has changed, for example, to discover a new LUN. To the extent that it can, FSPM
will handle the changes automatically. FSPM will run the commands necessary to bring new
LUNs online when it is told about them by the Pilot.

Slide 15

FSPM Daemon – Implementation


Discover Oracle FS Systems

• Scans the SAN to discover Oracle FS and Pillar Axiom


– INQUIRY command to LUN 0 on each discovered SAN target port
• Need not be a storage LUN visible to the host at LUN 0
– Identifies Oracle FS Systems and Pillar Axioms
– Gets Pilot management VIF IP Address from INQUIRY data
• NOTE: an FSPM host must be able to access the system on the SAN before
it can connect to the Pilot


The daemon scans the SAN to discover the systems which are available and which might be
presenting LUNs to the host. To do that, it sends an INQUIRY command addressed to LUN zero
on each discovered SAN target. There does not actually need to be a storage LUN at LUN zero.

The SCSI protocol requires that targets must respond to commands at LUN zero whether or not
there is actually a LUN present there. The INQUIRY response returns information that enables
us to identify this target as an Oracle FS system or as a Pillar Axiom, and the INQUIRY data
also includes the IP address of the Pilot associated with that system.

By scanning the SAN, we discover what Axioms and Oracle FS systems are present, and we
get the addresses of the control path. In order to discover a system and find the control path
connection, the FSPM host must be able to see the system on the SAN. So, connecting over
the SAN is the first thing to do.

Slide 16

FSPM Daemon – Implementation


Collect Host Information

• Scans the host to collect useful information


– SAN Initiators – FC Port WWNs, iSCSI Initiator Names
– Host name
– General information – OS version, FSPM version, …
– OS’s name for any of our LUNs which it can see
– Numbers of Optimized and Non-Optimized paths to our LUNs


Now let’s look at the implementation of the daemon and the loop that it sits in. It starts out
scanning the host, collecting all the useful information together, both the vital information such
as the host name and the SAN initiator identifiers, and more general information such as the
operating system versions and the FSPM versions. It also derives the name of the LUNs as
seen on the operating system.

The operating system creates names by which the LUNs are accessed. The operator needs to
know those names in order to make use of those LUNs. By picking that up and passing it back
to the Pilot, we can provide a mapping between the LUN names as they are known on the Pilot
and the LUN names as they are presented on the host.

As mentioned earlier, we also keep track of the number of optimized and non-optimized paths
currently available on the hosts for each of these LUNs.

Slide 17

FSPM Daemon – Implementation


Log In to Pilot

• Opens TCP connections to all discovered Pilot IP addresses


– Port 26004 on Pillar Axiom, 26012 on Oracle FS System
• Host must be able to make TCP connection to these ports

• Logs in, and secures the connection using SSL/TLS


– Obfuscated username/password on Pillar Axiom
– Authentication handshake involving the SAN on Oracle FS System


Having gathered all this information, the TCP connection is opened to talk to the Pilots. On Pillar
Axiom systems, we talk to Port 26004, and on Oracle FS systems we talk to Port 26012. The
reason there are different ports is that totally different protocols are used on these two systems.
This was one of the major changes in FSPM in order to support the Oracle FS system, but it is
essentially transparent to everyone. Those ports must be available, and the host must be able
to make a TCP connection to those ports to have the control path connection to the Pilot.

It logs in and it secures that connection using secure sockets (SSL/TLS). On Pillar Axiom, the
login uses an obfuscated username and password, which is only a pretense of security. On the
Oracle FS, security is done properly, using an authentication handshake which incorporates the SAN.

The authentication principle is that if the host is able to talk to an Oracle FS system over the
SAN, then it is allowed to talk to it and log into it through the Pilot.
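
When a host cannot get its control path connection, a quick first check is whether the TCP
ports mentioned above are reachable from the host. The following is a minimal sketch using
only the Python standard library. The port numbers come from this course; the function names
and the dictionary keys are our own, and the check tests TCP reachability only. It does not
perform the SSL/TLS login or the SAN-based authentication handshake.

```python
import socket

# Control-path ports as given in this course.
PILOT_PORTS = {"Pillar Axiom": 26004, "Oracle FS": 26012}


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_pilot(pilot_ip, system_type):
    """Check the control-path port for the given system type."""
    return can_reach(pilot_ip, PILOT_PORTS[system_type])
```

For example, check_pilot("192.0.2.10", "Oracle FS") returns False if a firewall blocks port
26012 between the host and the Pilot. The address shown is a placeholder, not a real Pilot.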

Slide 18

FSPM Daemon – Implementation


Loop

• Sends message providing host information to each system


• Keeps connections open “forever”
– Indicates “still here” to Pilot
• Actions messages from Pilots
– Load balancing, log collection, mapping and masking changes
• Monitors for changes in host information
– Rescans the host and updates all Pilots if there are changes


We then go into a loop. We sit forever. We send a message providing all that host information to
each Pilot found, then we keep the connection open which indicates that the daemon is still
there. Messages may be received from the Pilot asking for log collection or the other features
mentioned earlier.

We monitor for any changes in the host information, or changes in the number of paths or
changes in the initiators, or any other changes of interest. Whenever anything relevant changes,
we rescan the host and update the Pilots.
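
The shape of this loop can be sketched as follows. This is a simplified illustration, not the
daemon's actual code: the three callables and the snapshot dictionaries are hypothetical, the
handling of messages coming from the Pilot is left out, and max_events exists only so that the
example terminates.

```python
def run_daemon(scan_host, send_update, wait_for_event, max_events=None):
    """Report host information once, then report again whenever it changes.

    scan_host()      returns a snapshot of host information (hypothetical)
    send_update(s)   sends a snapshot to every connected Pilot (hypothetical)
    wait_for_event() blocks until something on the host may have changed
    """
    last = scan_host()
    send_update(last)                 # initial report after login
    handled = 0
    while max_events is None or handled < max_events:
        wait_for_event()
        handled += 1
        current = scan_host()
        if current != last:           # only real changes are reported
            send_update(current)
            last = current


# Demonstration with canned snapshots instead of a real host scan.
snapshots = iter([
    {"paths": 4, "initiators": 2},
    {"paths": 4, "initiators": 2},    # nothing changed, so no update is sent
    {"paths": 2, "initiators": 2},    # two paths lost, so an update is sent
])
sent = []
run_daemon(lambda: next(snapshots), sent.append, lambda: None, max_events=2)
print(len(sent))  # 2
```

Comparing whole snapshots keeps the sketch short; it says nothing about how the real daemon
detects changes.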

Slide 19

FSPM Integration – cadps102 before FSPM


Unassociated Initiator Ports


Now we will look at how to initially connect to the Pilot. We have a host on the system which
does not have FSPM installed, but the host is connected on the SAN. Under the host name,
there’s a special pseudo host name called Unassociated, which is where all SAN initiators which
have been seen by the Oracle FS system get listed if the Oracle FS doesn’t know what those
initiators belong to.

The screenshot shows that there is a stack of Fibre Channel initiator ports listed. They will
typically be listed as connected if they are connected to the Controller. If the Oracle FS system
has seen this initiator but it is no longer connected, it will remember that it has seen it but it will
display it here as Not Connected. We then install FSPM on the host, and automatically the host
entry arrives with the host name as the name is configured on the host itself, and all the
initiators associated with that host have been taken out of the Unassociated list, and brought in
under this host name.

Slide 20

FSPM Integration – cadps102 with FSPM


Host entry created by FSPM


This saves the admin from having to collect all the information themselves and it avoids the risk
of incorrectly typing the information. The list from the host includes all SAN initiators in that host,
whether or not they are currently connected to the Oracle FS system. This may mean that
initiators are included that the host is not going to use with the Oracle FS system. If they are not
connected to the Oracle FS system, it does not matter that they are included in this definition.

Slide 21

FSPM Integration

Status
• Communicating – host is currently
logged in to the Pilot
• Not Communicating – logged in
previously, but not at this moment
• Not Registered – FSPM has never
logged in with this host name


Now let’s take a closer look at some of the information presented on the screen we just looked
at. One of the fields in the table is the Status for Oracle FSPM. Status can have three different
values. If the status is “Communicating,” it means there is currently an open connection
between the host and the Pilot; the host is logged into the Pilot.

A status of “Not Communicating” means that the host has logged in at some time in the past.
FSPM has been installed or the discovery has worked, and an FSPM entry has been created in
the Oracle FS Management System, but that connection is not currently in place. This could be
the case if there’s a problem on the TCP network, or the host is simply down, or the daemon is
not running.

A status of “Not Registered” means that FSPM has never logged in with that host name.
This status is displayed for hosts that have been created by hand, which are not FSPM hosts.
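
The three status values can be summarized as a small decision function. This is a sketch of
the logic as described here, not of how the Pilot implements it; the two input flags are
hypothetical.

```python
from enum import Enum


class FspmStatus(Enum):
    COMMUNICATING = "Communicating"
    NOT_COMMUNICATING = "Not Communicating"
    NOT_REGISTERED = "Not Registered"


def fspm_status(has_ever_logged_in, is_logged_in_now):
    """Derive the displayed status from two hypothetical flags."""
    if is_logged_in_now:
        return FspmStatus.COMMUNICATING
    if has_ever_logged_in:
        # Logged in at some time in the past, but not at this moment.
        return FspmStatus.NOT_COMMUNICATING
    # FSPM has never logged in with this host name.
    return FspmStatus.NOT_REGISTERED


print(fspm_status(True, False).value)  # Not Communicating
```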

Slide 22

FSPM Integration

Version String
• a.b.c / d.e.f – first part is version of
daemon and software package,
second is the “driver” or other OS-
specific details
• a.b.c – both components have the
same version
• ?.?.? – version not available
• More details with each OS


Another piece of information displayed here is the version string for FSPM. This slide shows the
generic definition of the version string: two parts separated by a slash. The first part of the
string is the version number built into the daemon and is usually also the version number of the
software package from which FSPM was installed. The second part of the string is normally a
version number of the driver. Driver can mean different things on different implementations, so
this part contains OS-specific information.

If both of the strings are identical, then only one string is displayed. For example, 4.0.2 indicates
that the daemon, the driver, and the software package itself are all version 4.0.2. If the
version string includes three question marks, it means that FSPM has not been able to find the
version number for the component in question. This should never happen; it means that
something is wrong and needs to be investigated.

More detail will be provided later in the sections on the individual operating systems on what the
operating system-specific version strings can mean.
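
The generic version string format can be read with a few lines of Python. This is a sketch
that assumes the OS-specific part never contains a slash of its own, which this course does
not state; the function names are hypothetical and the second example string is made up.

```python
def parse_fspm_version(text):
    """Split a version string into (daemon_version, driver_version)."""
    parts = [p.strip() for p in text.split("/")]
    if len(parts) == 1:
        # A single version means both components have the same version.
        daemon = driver = parts[0]
    elif len(parts) == 2:
        daemon, driver = parts
    else:
        raise ValueError(f"unexpected version string: {text!r}")
    return daemon, driver


def needs_investigation(text):
    """True if either component reports the '?.?.?' placeholder."""
    return "?.?.?" in parse_fspm_version(text)


print(parse_fspm_version("4.0.2"))           # ('4.0.2', '4.0.2')
print(needs_investigation("4.0.2 / ?.?.?"))  # True
```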

Slide 23

FSPM Integration – GUI Host Information


View cadps102 information, modify load balancing settings


Once FSPM is installed and up and running, what can we do with it? The short answer is not very
much. The idea of FSPM is to sit there silently, without getting in the way, and ensure that multipathing
works as well as it can. User interaction with FSPM should be minimal. FSPM does enable looking at
information about the host’s interaction with the Oracle FS. The details from the SAN host screen in the
GUI show the host name, the IP address from which the host is connected to the Pilot, details of the
operating system running on the host, and the version number. Underneath are the name of each LUN
mapped to that host and the name by which that LUN is known on the host. This is a very useful mapping for the Admin who
created the hosts and the LUNs, and now wants to use each one for different purposes. From here, the
Admin can easily tell which name on the operating system corresponds to which LUN created on the
Axiom or on the Oracle FS. The number of optimized and non-optimized paths to each of the LUNs is
displayed, as known by the software on the host at that time.

This is important information. Obviously, if the number drops to zero there are big problems; there is no
connection at that time. For whatever reasons there can be issues where the software is unable to use a
path which is physically connected, so it is useful to look at this information to see the host view of what is
available at that moment. The one thing that you can actually control and use FSPM for directly is to set a
load balancing setting for each individual LUN.

There are two load balancing methods. One is a round robin method, where I/O is sent in turn to each
optimized path. The other is a static mode, where we choose an optimized path and use that path exclusively
until such time as the path becomes unavailable, or something else changes. People often ask which method
should be used. There really is no good answer to that question. The default is round robin, because that
will generally ensure that all facilities available on the host and the Oracle FS are being used relatively
evenly. Under some circumstances it is possible that static mode balancing may give better performance
on an individual LUN. However, performance may be reduced on other LUNs by using the static mode
method. It’s not generally possible to predict which LUNs or which usage patterns will produce better
performance with either load balancing method.

The only advice to give is that if the default round robin load balancing method is not adequate and you
have the time to experiment, then try the static mode and see how it behaves. It may or may not be better.
Unless there is a good reason to change to static mode, it is recommended that you stay with the round
robin method.
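
The difference between the two methods can be illustrated with a short Python sketch. The
classes below are hypothetical and only show how the next path is chosen; they are not the
FSPM driver's implementation, and the path names are made up.

```python
import itertools


class RoundRobin:
    """Send each I/O down the next Optimized path in turn."""

    def __init__(self, optimized_paths):
        self._cycle = itertools.cycle(list(optimized_paths))

    def next_path(self):
        return next(self._cycle)


class Static:
    """Stay on one Optimized path until it becomes unavailable."""

    def __init__(self, optimized_paths):
        self._paths = list(optimized_paths)
        self._current = self._paths[0]

    def next_path(self):
        return self._current

    def path_failed(self, path):
        self._paths.remove(path)
        if not self._paths:
            raise RuntimeError("no Optimized paths left")
        if path == self._current:
            self._current = self._paths[0]   # move to a remaining path


rr = RoundRobin(["p0", "p1"])
print([rr.next_path() for _ in range(4)])    # ['p0', 'p1', 'p0', 'p1']

st = Static(["p0", "p1"])
print([st.next_path() for _ in range(4)])    # ['p0', 'p0', 'p0', 'p0']
```

Round robin spreads I/O over every Optimized path, which is why it tends to use the host and
the storage system evenly; static mode concentrates all I/O for the LUN on one path.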

Slide 24

FSPM Host Log Collection

• Daemon and some Driver logs are kept in memory


– collect logs before restarting the Daemon or the host
• FSPM logs should be collected from the Oracle FS System
– Include Host, Pilot and Controller logs in the same bundle
– Enables times to be synchronized and full story seen
– Can only be done when host is in Communicating status
• Mechanism available to collect logs locally on the host
– Use only if host Not Communicating, or normal method fails


Log collection is an important feature of FSPM. It gathers the host logs that are of particular
interest for FSPM, for multipathing in general, and for SAN access in general. If anything goes
wrong with a SAN connection from the host, the FSPM logs should provide the information
needed to resolve the issue.

The traces created by the daemon and by drivers provided by FSPM itself are kept in memory.
These traces are not dumped out regularly to disk, so it is important that logs be collected
before restarting the daemon or rebooting the host. Unfortunately, it is not unusual to see
situations where something goes wrong, everyone tries restarting and rebooting everything, and
only then do they ask what went wrong.

That does not happen often, and the situation can usually be resolved anyway. But as a general
principle, as soon as something goes wrong, the first thing to do is to pick up the host logs,
because they provide the most useful information and can go back a long way. Ideally, the FSPM logs
should be collected using the normal log collection mechanism on the Oracle FS or the Pillar
Axiom.

You should pick up the host logs at the same time as the Pilot and Controller logs. This allows
times to be properly synchronized between the hosts and the Oracle FS, and shows
the full story of the three-way communication between the
Pilot, the Controllers, and the host. With these three sets of logs, we should be able to see
everything that has been happening.

There is also a mechanism available to collect logs locally on the host. This is an emergency
mechanism only, and should only be used if the host is not currently communicating with the
Axiom, or if the normal method of collecting the logs through the Oracle FS or the Axiom fails,
for whatever reason.

Slide 25

FSPM Host Log Collection from the GUI


Must explicitly select the hosts from which logs are to be collected


This is the typical set of panes you would see when collecting logs from the GUI. This is the
minimum set you should collect if you have a SAN host problem. Collect logs from the Controller
or Controllers, the Event Log, the Pilot logs, and the system configuration. Click the Select
Hosts button, go through the list of hosts which are presented, and check the ones you want to
collect logs from.

In the early days on the Axiom, a request to collect logs would collect them from all
available hosts. This changed after release 5: you must now explicitly indicate
which hosts to collect from; otherwise, it will collect from none.

Log bundles often arrive without host logs because people forgot to select the hosts, so remember
this step if you are trying to sort out any SAN problems. Even if there are no known problems
with FSPM, picking up the host logs can give us a view of what is happening on the SAN. If a
host is going astray, it is sometimes useful to collect logs from one or two other hosts as well,
which might help to narrow down where a problem is in the SAN. But most importantly, be sure
to collect logs from the host which is having problems.

Slide 26

FSPM Host Log Collection


What gets collected

• Trace file of recent activity in the Daemon
• Data from the Daemon’s latest scan of the host (as sent to Pilot)
• Trace file of the driver (when driver code is provided by FSPM)
• Other data specific to each implementation, covering
– OS log files - Windows Event Log, Linux ‘messages’ files, …
– Output of useful OS commands
• FSPM logs go into the bundle in directory ‘apm’


We will now take a look at what data gets collected. We collect the trace file of all the recent
activity in the daemon. The trace has a limited size, so how far back it goes depends on how
much has been happening. Some traces go back several years; others,
unfortunately, go back only a few minutes. We pick up a copy of the latest scan information
that has been sent to the Oracle FS. When the driver is provided by FSPM, we also collect a
trace file out of the driver which can be very useful.

Other data depends on each OS, but it will typically include the OS log files. For example, on
Windows we collect a copy of the System Event log, and on Linux we collect the messages files,
which are roughly equivalent.

We also run a set of standard OS configuration and investigative commands to pull out any
other information that may be useful such as patch levels, state of the system as seen by the
OS’s native tools, etc. These logs all go together in the log bundle that is collected from the
Oracle FS, into a directory called apm. FSPM is the follow-on to APM. Some internal things that
are not customer visible have not been renamed, so the directory is still called apm.

Slide 27

FSPM Host Log Collection


Finding FSPM logs in a “scanned” bundle


This slide shows a set of apm logs from a bundle which has been pulled in and then had scanlog run
on it. You can find the apm logs by searching for the string apm. This search shows the logs from a single
Windows host. As you can see, there are a couple of winstat and winper files that provide
information specific to this implementation.

They are not very useful. Driver trace provides precise information on what is happening on the
data path. DaemonTrace provides information about the interactions with the Pilot.
DaemonScan is the file that provides the last discovered information from the host as last sent
to the Pilot. iSCSI info is a dump of everything to do with the iSCSI configuration on the host.
Winsys is the system event log from the host.

Slide 28

FSPM Host Log Collection


Checking for FSPM logs in an unexpanded bundle


Here is a command for Linux or other Unix systems that checks an unexpanded bundle for
apm logs. It searches through the bundle for any apm-related logs.
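
The same check can be sketched in a few lines of Python. This is an illustration under a stated assumption, not the command from the slide: it assumes the bundle is a tar archive (optionally compressed), which the text does not confirm. The function name find_apm_members is hypothetical.

```python
import tarfile


def find_apm_members(bundle_path):
    """List archive members that sit under a directory named 'apm'.

    Assumes the log bundle is a tar archive; "r:*" lets tarfile detect
    any compression it supports without expanding the bundle to disk.
    """
    with tarfile.open(bundle_path, "r:*") as bundle:
        return [name for name in bundle.getnames()
                if "apm" in name.split("/")]
```

An empty result means the bundle contains no host logs, so they have to be collected again or collected locally on the host.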

Slide 29

FSPM Host Log Collection on the Host


… but only if they can’t be collected with the Oracle FS logs

• As Administrator/root, run the daemon executable with the ‘-d’ option
– Sends a message to the background daemon to dump its logs
– Does not stop or restart the background daemon itself
• Logs placed in a date/time directory in a well-known location
– %windir%\Debug\fspmd
– /var/log/fspmd
• Wait for the collection to finish
– Creates a COLLECTION_FINISHED file after all others


When the Oracle FS fails to collect the logs from a host, it will not tell you; the host logs will
simply not be present in the bundle. Even if you asked for host logs, use the command
previously shown to check whether the bundle actually contains the apm logs. If there are no apm
logs, or if they cannot be collected through the Oracle FS or the Axiom for whatever
reason, they can be collected locally on the host.

To collect locally on the host, log in as the Administrator on Windows or as root on anything
else, then run the daemon executable with the -d parameter. This sends a message to the
daemon which is currently running and asks it to dump its logs. It does not restart the daemon and
does not interfere at all with FSPM’s operation on the host, so there is no disruption.

The daemon will pull the same set of logs as it would put into a collection requested from the
Oracle FS, but it will dump them into a directory named for the current date and time in a
well-known location: %windir%\Debug\fspmd on Windows, or /var/log/fspmd on
everything else.

Wait for the collection to finish. Completion is indicated by the daemon creating a
file with COLLECTION_FINISHED in its name. It does this last, after everything else is
complete. Usually log collection takes a few seconds, but in some circumstances it can take
quite a while to complete, because some of the commands that collect host information
can be blocked for a while if the host is unhealthy. You will know when the collection is finished
because of the COLLECTION_FINISHED file.
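
Waiting for the marker file can be scripted. The following Python sketch is an illustration only. The helper names, the timeout, and the polling interval are assumptions. Because the exact format of the date/time directory name is not given here, the sketch simply picks the most recently modified subdirectory of the log location.

```python
import os
import time

MARKER = "COLLECTION_FINISHED"  # marker name taken from the slide


def newest_collection_dir(log_root):
    """Return the most recently modified subdirectory of log_root, or None."""
    subdirs = [entry.path for entry in os.scandir(log_root) if entry.is_dir()]
    return max(subdirs, key=os.path.getmtime) if subdirs else None


def collection_finished(collection_dir):
    """True once a file with the marker in its name exists in the directory."""
    return any(MARKER in name for name in os.listdir(collection_dir))


def wait_for_collection(collection_dir, timeout_s=600.0, poll_s=1.0):
    """Poll for the marker file; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if collection_finished(collection_dir):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A generous timeout is a deliberate choice, since collection can take much longer than a few seconds on an unhealthy host.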

Slide 30

FSPM Log Collection on Windows Host

Path to daemon program


• Find from the properties associated with the Service entry


To find the daemon executable on Windows, open the properties window of
the Oracle FS Path Manager Service and find the Path to executable field. The path often
appears truncated, but the full text is present in the field. If you select it,
you can copy the actual path to the daemon executable from wherever the user
has chosen to install it.

Slide 31

FSPM Log Collection on Windows Host


Run daemon command as Administrator


Then paste the command into an Administrator Command Prompt, add -d to the end of it,
and run it. Here we see that the fspmd directory did not exist before. Then we run
the daemon executable with -d, and afterwards we find a directory under fspmd that is named
for the current date and time, containing the set of logs.

Slide 32

FSPM Log Collection on Windows Host


Wait for collection to finish – usually a few seconds


Here is what the set of logs looks like: the same set that would have been collected from the
Axiom, plus an empty file with COLLECTION_FINISHED in its name once they are all done.

Slide 33

FSPM Log Collection on Non-Windows Host


Can use ps command to find path to daemon program


On other systems, an easy way to find the path to the daemon executable is to use the ps
command. Look for the term fspmd, copy that path, add -d to the end, and run the command. It
creates the directory, and there is a set of logs in the new directory. The logs then have to
be copied off the host for support and development to look at.
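
The ps step can be sketched as a small text-parsing helper in Python. This is an illustration only: the function names are hypothetical, the install path in the usage note is made up, and the only things taken from the text are the search term fspmd and the -d option. The sketch builds the command but deliberately does not run it.

```python
def find_daemon_path(ps_output, name="fspmd"):
    """Return the first absolute path in ps output whose file name contains `name`."""
    for line in ps_output.splitlines():
        for field in line.split():
            # Only consider absolute paths, so 'grep fspmd' lines are skipped.
            if field.startswith("/") and name in field.rsplit("/", 1)[-1]:
                return field
    return None


def dump_command(ps_output):
    """Build the argument list that asks the running daemon to dump its logs."""
    path = find_daemon_path(ps_output)
    return None if path is None else [path, "-d"]
```

For example, given a ps line ending in /opt/example/bin/fspmd (a made-up location), dump_command returns that path followed by -d, ready to be run as root.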

Slide 34

FSPM Daemon Connection to Pilot Not Required


… but strongly recommended

• Datapath multipathing functionality not dependent on Pilot connection
– Assistance bringing new LUNs online automatically would be lost
– Core multipathing entirely independent of Pilot connection
• All the integration functionality not available without Pilot connection


All this functionality depends on the daemon having a connection to the Pilot, except collecting
logs locally on the host. A connection to the Pilot enables integration into the management
system on the Oracle FS, but it is not required. Some customers’ policies do not allow typical
SAN hosts to make a TCP connection to the Pilot. Without a connection to the Pilot,
FSPM still provides all the multipathing functionality, but the integration
functionality does not work.

However, it is an important point that a connection to the Pilot is not necessary for multipathing
itself to work. If something has gone astray, such that the FSPM daemon is no longer
communicating with the Pilot, that does not necessarily mean that there is anything wrong at all
on the SAN. The multipathing itself is quite independent of the Pilot connection. About the only
thing with the SAN that you would lose by not having a Pilot connection is assistance with
bringing new LUNs online, which some of the FSPMs provide.

In that circumstance, you would have to run whatever commands you would normally run to bring
new LUNs online. We strongly recommend having a Pilot connection if possible, if for no other
reason than to collect the logs and to have information in the FSPM logs about what has been going on
with the LUNs and paths available online.

Slide 35

Oracle FS System ALUA Implementation


… same as the Pillar Axiom

• Dynamic Implicit Asymmetric Logical Unit Access
– Optimization states controlled entirely by the system, change at any time
• LUNs start out on their configured home Controller
– Paths to configured home Controller are optimized
– Paths to the other Controller are non-optimized
• FSPM hosts send I/O through optimized paths only, if there are any
– Only sends I/O through non-optimized paths if all Optimized have failed


Now we will review the ALUA implementation on the Oracle FS system, which is the same as on
the Pillar Axiom system. It is important that this is well understood, because it clearly impacts
things you may see happening on the SAN. The Oracle FS implements a mechanism known as
Dynamic Implicit Asymmetric Logical Unit Access. Implicit means that the Oracle FS itself
controls where the LUNs are resident.

The Oracle FS Administrator can move a LUN between Controllers, but from the point of view of
the SAN, the changes will only happen from the Oracle FS. They will never be initiated from the
host. The Oracle FS is in charge of which paths are optimized and which are non-optimized.
The optimization state cannot be controlled from the host. When a LUN is created, it is
deliberately or by default given a configured home Controller. You determine which Controller it
is going to be created on.

The paths to that Controller are the optimized paths, and the paths connected to the other
Controller are non-optimized paths. FSPM will send I/O only through optimized paths. The most
important functionality of FSPM is that it uses only optimized paths if there are optimized paths
available. If no optimized paths are available, then FSPM will switch over to using non-optimized
paths.
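
The rule described above can be written down in a few lines. This Python sketch is an illustration of the rule only, not FSPM's driver code. The Path class, the function name, and the path names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    optimized: bool         # state reported by the system; the host cannot set it
    available: bool = True  # False once the path has failed


def paths_for_io(paths):
    """Use optimized paths if any are available, otherwise fall back."""
    live = [p for p in paths if p.available]
    optimized = [p for p in live if p.optimized]
    return optimized if optimized else live
```

With two live paths to the home Controller and one to the other Controller, only the two optimized paths are returned. Once both optimized paths have failed, the non-optimized path is the only one left to use.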

Slide 36

Oracle FS System ALUA Implementation


… when all optimized paths to a LUN have failed

• FSPM starts sending I/O down non-optimized paths
– System monitors amount of optimized and non-optimized I/O to each LUN
– Significant amount of non-optimized causes an Event Log entry
• When I/O is predominantly non-optimized, system moves the LUN
– The LUN is “rehomed” from its configured home to the buddy Controller
– Path optimization states swap, so the current paths are now optimized
• LUN moved back to its configured home after 10 minutes
– If the failed paths have not recovered, this sequence quickly repeats


Let’s say that all the optimized paths have failed. At that point, FSPM will switch over to using
the non-optimized paths. The Oracle FS system monitors the amount of optimized and
non-optimized I/O that is going to each LUN. If the system sees a significant amount of
non-optimized I/O, it generates an event log entry, and when the I/O is predominantly
non-optimized, it moves the LUN to the other Controller.

This is called “re-homing.” The LUN is moved from its configured home to its buddy
Controller, also known as the alternate Controller. The path optimization states then swap:
the previously non-optimized paths become optimized, because the LUN has been
moved to the alternate Controller. After about 10 minutes, the Oracle FS will attempt to move
the LUN back to its configured home. LUNs have configured homes in order to balance
different LUNs evenly across the Controllers.

The system is interested in keeping LUNs on their configured home if it can. If the original paths
have recovered on the host, FSPM will see this optimization state change, and it will change
back to using the original paths since they are now optimized. If those failed paths have not
recovered, FSPM carries on using the only paths it has, which have now suddenly become non-
optimized again, and that leads us back into the situation we started with.

So when optimized paths fail, there will be a pattern of non-optimized access, followed by 10
minutes of everything appearing fine because access is optimized again. Then, every 10
minutes or so, you will see a brief burst of non-optimized access because the Oracle FS has
attempted to move the LUN back to where it wants it to be. This pattern is to be expected, and
the different bursts are exactly how it should work. The overall indication is that paths to the
configured home Controller are not available.
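
The pattern can be reproduced with a toy simulation. This Python sketch is a simplification for illustration only. It works in whole minutes, assumes the re-home takes effect within one minute, and uses the approximate 10 minute figure from the text. The function and constant names are hypothetical.

```python
REHOME_BACK_MINUTES = 10  # approximate figure from the text


def simulate(total_minutes, recovered_at=None):
    """Per-minute access state seen by a host whose home paths fail at minute 0.

    recovered_at is the minute at which the paths to the configured home
    Controller come back, or None if they never recover.
    """
    timeline = []
    lun_on = "home"
    minutes_on_buddy = 0
    for minute in range(total_minutes):
        home_paths_up = recovered_at is not None and minute >= recovered_at
        if lun_on == "home":
            if home_paths_up:
                timeline.append("optimized")
            else:
                # Only buddy paths work, so the I/O is non-optimized and
                # the system re-homes the LUN to the buddy Controller.
                timeline.append("non-optimized")
                lun_on = "buddy"
                minutes_on_buddy = 0
        else:
            timeline.append("optimized")
            minutes_on_buddy += 1
            if minutes_on_buddy >= REHOME_BACK_MINUTES:
                lun_on = "home"  # system moves the LUN back home
    return timeline
```

With paths that never recover, the simulation shows a short non-optimized burst at the start and again after each move back home, which is the signature to look for in the logs.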

Slide 37

Oracle FS System – Changes from Pillar Axiom


… which directly affect SAN hosts

• Oracle FS System can provision 4096 LUNs per host
– Logical Unit Numbers 0 to 4095
– Pillar Axiom provides 256 LUNs per host
– Windows can only access LUNs 0 to 255
– HP-UX can access LUNs 1 to 4095
– Other OSes can access all 4096


This slide shows some changes from the Pillar Axiom to the Oracle FS system which directly
affect SAN hosts. The most visible change is that the number of LUNs that can be provisioned
to each SAN host has increased to 4,096, from 256 LUNs per host on the Pillar Axiom. These are
Logical Unit Numbers 0 to 4095.

The Windows operating system can only access 256 LUNs, so this change makes no difference
for Windows; Windows is still limited to LUN numbers 0 to 255. HP-UX can access LUNs 1
to 4095, but cannot access LUN 0, and other operating systems can access all 4,096 LUNs.
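
These limits are easy to capture in a lookup table, which can help when checking a planned LUN mapping. The Python sketch below is an illustration only. The ranges come from the slide, while the names and the OS keys are hypothetical.

```python
# Logical Unit Number ranges each host OS can access, per the slide.
LUN_RANGES = {
    "windows": range(0, 256),   # LUNs 0 to 255
    "hp-ux": range(1, 4096),    # LUNs 1 to 4095, not LUN 0
}
DEFAULT_RANGE = range(0, 4096)  # other OSes: all 4096 LUNs


def host_can_access(os_name, lun_number):
    """True if a host running os_name can access the given LUN number."""
    return lun_number in LUN_RANGES.get(os_name.lower(), DEFAULT_RANGE)
```

For example, host_can_access("Windows", 256) is False, and host_can_access("HP-UX", 0) is False.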

Slide 38

FSPM – Supported SAN Protocols

• All FSPM implementations support these protocols
– Fibre Channel switched fabric
• Direct connect and arbitrated loop not always supported
– Fibre Channel over Ethernet
• Through a switched fabric to FC ports on the Oracle FS System
– iSCSI – software initiators and adapters
• To iSCSI ports on Oracle FS and Pillar Axiom Systems
• Through Cisco MDS 9000 iSCSI-to-FC router to FC ports
– Oracle Virtual Networking (on OSes which support it)


FSPM supports Fibre Channel and iSCSI SAN protocols. We always support Fibre Channel
switched fabric, and any serious data center is going to use switched fabric. Direct connect and
arbitrated loop are not necessarily supported and they are not tested as thoroughly. Arbitrated
loop should not be in use at all these days for connecting serious systems. Direct connect is not
something you would expect to see in anything other than an experimental setup. Fibre Channel
over Ethernet is fully supported at the host. At the moment, that traffic must go through a switched
fabric that converts to traditional Fibre Channel for connection into the Oracle FS.

iSCSI is supported using both software initiators and hardware adapters. The host can connect
to iSCSI ports on the Oracle FS or the Axiom, or through a Cisco MDS 9000 iSCSI-to-FC router
to FC ports on the Oracle FS or the Axiom. Sometimes customers have outlying systems that need
SAN access, and it is not worth having a full iSCSI infrastructure within the data center itself.
In that case, they may just put a router on the edge to switch over to Fibre Channel within the
data center. We also fully support Oracle Virtual Networking on all the operating systems which
support it. Oracle Virtual Networking was previously known as Xsigo.

Slide 39

FSPM – Configuration at the Host

• Oracle FS System GUI shows current Load Balancing setting at host
– Can be changed at host on Windows and AIX, change reflected in GUI
– Use normal host configuration methods to change it
• Pillar Axiom GUI will not reflect changes made at the host
– Host may be updated to Axiom’s setting at any time


As mentioned earlier, we have some customers who either
cannot, or choose not to, have a connection to the Pilot. These customers like to be able to
configure load balancing at the host. A change with the Oracle FS is that if the load balancing is
changed at the host, the Oracle FS GUI will now show the current setting. Load balancing can be
configured at the host on Windows and AIX, using the normal OS-specific host configuration
methods, and with Oracle FS that change will be reflected in the GUI.

The Pillar Axiom GUI will not show any changes made at the host. It will show whatever settings
were last set in the Pillar Axiom GUI. If the host has a control path connection, then the
host’s load balancing may be set back to whatever the Axiom says at any time. So on the
Pillar Axiom, setting the load balancing at the host is only relevant if you simply do
not have a control path connection to the Pilot. In the Oracle FS, it’s simply an option. You can
control it at either end and both ends will be updated appropriately.

Slide 40


Now that we have covered the generic information, we will take a look at each of the five
different implementations of FSPM, discussing the driver side of things and the more operating
system-specific parts.

Slide 41

FSPM 4.0 for Windows Server


Editions of Windows Server from 2008 to 2012 R2

• Supports most Editions of 2008, 2008 R2, 2012, 2012 R2
– All platforms – 32-bit, x64, Itanium
• No major changes since APM 3.4 (Server 2012 R2 added)
– Apart from support for the Oracle FS System …
• FSPM Driver is an MPIO Device Specific Module (DSM)
– Integrates into the Microsoft Multipath I/O framework
– Well-defined published interface for third parties
– Direct control of pathing and error handling for each I/O


First we will discuss Windows. We support Windows Server from 2008 upwards. Support for
2012 R2 was added, so we support all four releases of Windows Server, and we support them
on all platforms: 32-bit, which is rapidly fading in this market, 64-bit x86, and Itanium, which is
also fading away.

There have been no major changes since the last APM release for Windows, other than adding
Server 2012 R2 and adding support for the Oracle FS. On Windows, the FSPM Driver is an
MPIO Device Specific Module (DSM). This is a large chunk of code which FSPM provides to do the
multipathing, and it integrates into Microsoft’s Multipath I/O framework,
MPIO. MPIO is a well-defined published interface for third parties to use, so this mechanism is
fully supported and officially approved by Microsoft. The FSPM code has direct
control of the pathing and error handling for each I/O. We decide which path each individual I/O
goes down, and we deal with the success or failure of each of those I/Os.

Slide 42

FSPM 4.0 for Windows Server


Changes since APM 3.4

• Windows Server 2012 R2 support added
• Windows Server 2003 and 2003 R2 support removed


The changes since the last APM release for Windows Server are that 2012 R2 is now supported and
the old 2003 and 2003 R2 releases are no longer supported. Oracle FS is supported on Windows
Server 2008 and later only.

Slide 43

FSPM 4.0 for Windows Server

Architecture
• 4 – FSPM daemon
• 9 – Windows disk driver
• 10 – Microsoft MPIO Framework
with FSPM Device-Specific Module


Here is a diagram of the architecture of FSPM on Windows with the more important parts
labeled. Number 4 is the daemon running as a Windows service, up in user land. Number 9 is
the Windows disk driver sitting over the top of number 10, which is the multi-path I/O framework.
Plugged into that multi-path I/O framework is the device-specific module which makes all the
decisions about what to do with each I/O and which path to send it over.

Underneath are the lower level things looking after each path. Number 12 is the iSCSI software
initiator, number 13 is the Fibre Channel and iSCSI drivers, and number 14 is the
actual HBA ports and NICs.

Slide 44

FSPM 4.0 for Windows

Version string for Windows


• 4.0.x / 4.0.y – daemon and package version 4.0.x, driver version 4.0.y
• In customer releases, driver will always be from an earlier build than the daemon and
package because of the Microsoft driver signing process


On customer releases, the version string displayed for FSPM on Windows will always be in a
two-part form, as shown here. The first one is always the version number built into the daemon,
which is the same as the version number of the package that it was installed from.

The second part is the version number built into the driver, that is, into the device-specific
module. On customer releases for Windows, the driver number will always be lower than the daemon
number because of the way drivers are signed for Windows. Microsoft has a sophisticated
mechanism for signing software.

Many drivers, and in the most recent releases all drivers, must be fully signed by
Microsoft’s own signing methods and protocols. Anything in the data path has to be tested using
a sophisticated set of Microsoft tests. Microsoft then checks the results and signs
the drivers if it is happy with them.

We go through testing and complete our own tests; and when we are happy with the driver, we
then submit it for the Microsoft tests. Microsoft then comes back and provides a signature for
the driver, and we then have to wrap that signature and that binary driver into a new release in
order to get into a package.

That means that the package number and the daemon version number will always be greater
than the version number of the driver that the package contains.
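
The two-part version string is easy to check in a script. The Python sketch below is an illustration only. It assumes the string has the form shown on the slide, with purely numeric components, and the function names are hypothetical.

```python
def parse_fspm_version(version_string):
    """Split 'daemon / driver' into two tuples of integers."""
    daemon_text, driver_text = (part.strip() for part in version_string.split("/"))
    daemon = tuple(int(n) for n in daemon_text.split("."))
    driver = tuple(int(n) for n in driver_text.split("."))
    return daemon, driver


def driver_older_than_daemon(version_string):
    """True when the driver build is earlier than the daemon and package build."""
    daemon, driver = parse_fspm_version(version_string)
    # Compare as integer tuples; comparing the raw strings would wrongly
    # rank 4.0.9 above 4.0.12.
    return driver < daemon
```

For a made-up string such as 4.0.12 / 4.0.9, the function returns True, which is the relationship expected on a customer release.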

Slide 45

FSPM 4.0 for Windows Server


Hyper-V Virtualization

• FSPM can run in hypervisor itself like normal server
– SCSI terminated at hypervisor, LUNs shared as virtual disks
• FSPM can also run in Virtual Machines
– Virtual Fibre Channel (NP-IV) not supported
• Microsoft have disabled functionality required by FSPM
– Direct access to LUNs from VMs over iSCSI only
– Indirect access through virtual disks exported by hypervisor


One of the features of Windows Server which is becoming more important is virtualization using
Hyper-V, as Microsoft calls its product. Hyper-V runs in a Windows Server image, often a
very cut down image, which itself is essentially the hypervisor. Since it is a Windows Server
image, FSPM can run in it.

FSPM can be installed on the parent, or hypervisor, so the SCSI can be terminated at that level.
The LUNs are mapped from the Oracle FS system to the hypervisor. Once
the hypervisor has those LUNs through FSPM, they are just like any other disk resource, and
the hypervisor can share them out as virtual disks for the client partitions to use.

FSPM can also run directly in Virtual Machines that are running on a hypervisor. Unfortunately,
at the current time, Virtual Fibre Channel in Virtual Machines is not supported. For its own reasons,
Microsoft has chosen to disable an API which FSPM depends on when using NP-IV.
We hope to resolve that in a future release, but at the current time, NP-IV based Virtual Fibre
Channel is not supported in Virtual Machines on Windows Server. Direct access to LUNs from
Virtual Machines on Windows is available over iSCSI only. Virtual Machines can still
indirectly access Oracle FS LUNs by picking them up as virtual disks exported by the
hypervisor, while the hypervisor itself accesses them as LUNs on an Oracle FS system.

Slide 46

FSPM 4.0 for Windows Server


Core Mode

• Microsoft now recommend that Server should run in Core Mode
– No GUI desktop
– Reduced attack surface, and fewer things to go wrong
• Increasingly popular
• Need to learn command line interfaces and PowerShell
– PowerShell is very powerful, but unlike any other OS environment

Another thing worth mentioning, although it has no direct impact on FSPM, is Windows Server
Core Mode. It is mentioned because almost everyone is aware of it and it is
becoming more important. Core Mode is a way of operating Windows Server
without a standard Windows desktop GUI, which actually turns it into a server operating system.
Microsoft now recommends that servers should be running in Core Mode, and it is becoming
increasingly popular.

The reason is that it simply reduces the amount of software that is running. It simplifies
things, it reduces the attack surface for all those people who are trying to hack their way in, and
there are fewer things that can go wrong. People are beginning to heed this advice.

It means weaning ourselves off the GUI interfaces. We need to learn the command line
interfaces, and we also need to learn PowerShell, a relatively new shell mechanism
from Microsoft which is very powerful, very useful, and unlike any other shell on any other
operating system. There is nothing FSPM specific about that; it is just something to be aware
of that is becoming more important to Windows Server and something that we will have to get
used to.

Slide 47

FSPM 4.0 for Windows Server


Over-reaction to Data Phase Errors

• Most common issue seen recently is over-reaction to Data Phase Errors


• DPEs are usually caused by physical SAN faults – cables, SFPs, …
– Can also be caused by over-driving the Oracle FS System
• Driver logs show one or two reports of Data Phase Error
• Some seconds later Windows takes all paths (and the LUN) offline
• Windows timers can be set to avoid this
– https://support.oracle.com/epmos/faces/DocumentDisplay?id=1541772.1
• BUT – fix the SAN, that’s the real issue, it will cause other problems


Windows systems seem to be getting fairly stable. We do not see a huge set of problems
coming in from Windows servers. One thing we still see is a feature of Windows multipath I/O
where it can badly overreact to data phase errors in the SAN. Data phase errors are most often
caused by physical faults in the SAN, such as a faulty cable, a failed SFP, or a failed port
on a switch. These cause low level errors at the
physical level on the transport, and those get reported up to the SCSI layer as SCSI faults which
are described as Data Phase Errors.

These can also come from the Pillar Axiom or the Oracle FS if the system is being very badly
overdriven, that is, if the I/O load is simply far too high. What we typically see is that the driver
logs show one or two I/Os failing because of data phase errors, and those individual I/Os are retried
and succeed. Then all of a sudden Windows removes all the paths to that LUN and
takes the LUN offline. This is an unfortunate feature of multipath I/O.

There is a set of timers which can be configured on the host in order to reduce the likelihood of
this happening, and there is an Oracle knowledge base article available, which describes how
to set the timers to avoid this problem. But it is important to stress that this is a physical problem in
the SAN. Setting the timers will stop Windows taking the LUNs away, which stops the immediate
problem that is seen on the server, but there is still a problem in the SAN, which will cause
performance issues and all other types of issues.

Setting timers might almost be thought of as a workaround, something to do to get things
moving again, but it is vital to follow through and sort out the underlying problems and sort out
the actual physical problem on the SAN.

Slide 48

FSPM – Linux


This concludes the section on Windows. Next we will take a look at Linux. We will spend quite a
bit of time on Linux for two reasons. One reason is that it is probably the source of most
problems at this time. We hear more about Linux systems than we hear about any others.

The second reason is that it has been dramatically re-implemented in FSPM. All the
other versions of FSPM have relatively small updates, the main difference being support for the
Oracle FS itself. The same updates have been made in Linux, but we have also done a
substantial re-implementation to change the way things work together. We will go through
these changes in more depth.

Slide 49

FSPM – Linux

• What’s new since APM, and what OSes are supported

• Linux Multipath architecture and issues

• FSPM plug-ins to the Linux Multipath Daemon

• Summary of APM for Linux and the problems

• One release to rule them all


Next we are going to discuss Linux. We will look at what is new since APM, and at the
different Linux distributions that are now supported. We will give the background on the issues
we have on Linux, the issues that the
multipath architecture creates, and we will talk about FSPM’s integration into Linux’s
multipathing system.

We will review a brief summary of how APM used to be on Linux and the problems it had. And
finally, we will talk about what we have done which we hope will resolve, or at least greatly help,
with all these problems.

Slide 50

FSPM 4.0 for Linux


All recent releases of major “Enterprise” Linux x86 distributions

• Single FSPM release covers all supported distributions and releases


– Was a different APM release for each minor release of each distribution
• Substantial reimplementation of APM
– Significant enhancements over both APM and native multipathing
– Bug fixes
– Enhanced error reporting and logging
– Dynamically adjusts to OS updates
– Adds support for iSCSI HBAs


The new release for Linux support is a single FSPM release that covers all the supported
distributions and releases. Previously, we used to release a different version of APM for each
minor release of each Linux distribution, causing all types of problems. Now we have moved to
having a single release which covers all supported versions of Linux. It is a substantial
reimplementation of APM, and it has significant enhancements over APM, and even more so
over the native multipathing. It includes some bug fixes. One of the main differences is greatly
enhanced error reporting and logging.

One of the problems with Linux multipathing was that it tended to tell you that
a path had failed, but it would not tell you why the path failed or what it was doing about the
failed path. We now have improved error reporting, enabling us to better see what is going on
when things go wrong.

The new FSPM dynamically adjusts to operating system updates. We used to have a problem
where automatic updates coming in through the Yellowdog Updater, Modified (YUM), or
something similar, would destroy a multipathing configuration and cause havoc. FSPM now
dynamically adjusts to new updates. A minor change is that we added support for hardware iSCSI.

Slide 51

FSPM 4.0 for Linux – Supported Distributions


on 32-bit and 64-bit x86 platforms

Operating System
• Oracle Linux, Red Hat Enterprise Linux, CentOS
– Release 5: 5.8 and later, all kernels
– Release 6: 6.2 and later, all kernels
– Release 7: 7.0 and later, all kernels
• Oracle VM Server for x86: 3.1 and later
• SUSE Linux Enterprise Server
– Release 11: 11.1 and later
– Release 12: 12.0 and later


FSPM supports the Linux versions listed here, which include anything derived from Red Hat 5
through 7, and also SUSE Linux Enterprise Server 11 and 12. Oracle Linux
also ships a variety of kernels, and we support all the different kernels. The supported releases
are 5.8 and later on the 5 series, 6.2 and later on the 6 series, and 7.0 and later on the 7 series.

FSPM also supports Oracle VM Server, which is based on Oracle Linux. FSPM will probably
also work with any other clones that are based on Red Hat, such as Scientific Linux, but those
are not formally supported. We would not be interested in any problem report for them, but they
would probably work.

Slide 52

FSPM 4.0 for Linux


Driver Implementation

• FSPM “Driver” is Linux Device Mapper with FSPM plug-ins


– Device Mapper Multipath (kernel driver) and Multipath Tools (user daemon)
– Poor architecture, buggy implementation, unsuited to Implicit ALUA
– FSPM integrates plug-in libraries to the Multipath Tools daemon
• Sets priority of a path to a LUN when requested
• Checks the state of a path when requested

• FSPM has no control of individual I/Os


– Just gives advice when Multipath Tools asks for it


The FSPM driver on Linux is standard Linux multipathing with some plug-in modules provided
by FSPM. It consists of the Device Mapper driver, which runs in the kernel, with a multipath
extension, which is also standard Linux and runs in the kernel, and also a set of multipath tools,
principally a multipath daemon, which runs at the user level.

It is a poor architecture that causes all sorts of problems, and the implementation of this
architecture is very buggy, which causes many additional problems. Overall, it is very unsuited
for an implicit ALUA system such as ours, which is very unfortunate. The good news is that it is
getting better as time goes by. On current releases of Linux, it works adequately, especially with
FSPM.

FSPM works by integrating plug-ins into the multipath tools daemon. This is the area where it is
easiest to get a tool to influence how the multipathing works, and to make it do the right
thing. We have two different plug-ins. The first sets the priority of each path, which
effectively determines which path is going to be used for accessing LUNs.

The second plug-in checks the state of a path when it is requested. It says whether or not a path
is usable, and whether or not multipathing should make use of the path. FSPM on Linux has no
control over individual I/Os. All it can do is give some advice to the multipathing system when
the multipath daemon chooses to ask for it. Traditionally, the problem has been that it did not
ask at the right time, but this is now getting better. It now tends to ask for more information on a
more timely basis, so things are improving on Linux.

Slide 53

FSPM 4.0 for Linux

Architecture
• 3 – Linux Multipath Daemon with
FSPM plug-ins
• 4 – FSPM daemon
• 10 – Linux Device Mapper driver
with Multipath module
• 11 – Linux sd disk driver


This diagram shows FSPM on the Linux architecture. Number 4 is the FSPM daemon which
monitors things. Number 3 is the Linux multipath daemon with the FSPM plug-ins into it. The
multipath daemon talks to the Device Mapper driver, which is down in the Linux kernel.
Something to notice here is the Device Mapper driver sits over the top of the disk driver, which
is number 11 in this diagram. 12, 13, and 14 are the low level initiator and port drivers.

We will go through the kernel level architecture in more detail.

Slide 54

FSPM 4.0 for Linux


Architecture Comparison – most other multipath implementations

• FC driver discovers LUN, reports it to OS


• OS identifies it as a LUN which supports multipathing
– Attaches path under multipath driver
• Multipath driver acts as virtual HBA, reports new LUN
• OS attaches multipath LUN underneath disk driver

• Result: single disk device representing a multipath LUN


The typical multipathing architecture is what almost everything apart from Linux uses. A Fibre
Channel driver, for example, would discover a LUN and report it to the operating system. The
operating system then sees from its configuration that this is a LUN that supports multipathing,
and it attaches that path underneath the multipath driver.

The multipath drivers usually behave as though they are virtual host bus adapters. When a path
for a new LUN gets attached to the multipath driver, it announces that it has a new LUN. Since it
is coming from the multipath driver, the operating system does not attempt to attach it to the
multipath driver again; it attaches it to the disk driver. We end up with a single disk device which
represents a multipath LUN.

Slide 55

FSPM 4.0 for Linux

Non-Linux Architecture
• 9 – Disk driver
• 10 – Multipathing driver
• Note single disk device, individual
paths dealt with lower down the
stack


Here is an illustration of that typical configuration. We have a disk driver, number 9, which
presents a single view of a LUN upwards. Underneath that disk driver there is a connection to
the multipath driver, which is presenting the single LUN upwards. Underneath that, the multipath
driver has multiple paths to the LUN, possibly through multiple protocols as well. The disk
driver deals with a single disk. Multipathing is dealt with at the transport level underneath.
That is how it should be. This is not what Linux does.

Slide 56

FSPM 4.0 for Linux


Architecture Comparison – Linux

• FC driver discovers LUN, reports it to OS


• OS attaches path underneath disk driver, creating sd device
• Later, OS recognizes some sd devices refer to same LUN
– Creates a ‘multipath’ device for whole LUN, references to sd devices

• Result: individual disk devices for each path to the LUN,


as well as a multipath device for the whole LUN
– Recipe for chaos


On Linux, the Fibre Channel driver discovers a LUN and reports it to the operating system, and the
operating system at this level does not know anything about multipathing. It sees that this is a
disk, it attaches it to the disk driver, and it creates a user level device for accessing that path as
a disk.

Later on, the operating system recognizes that some of these disk devices actually refer to the
same LUN. So, it creates an additional multipath device which refers to these individual disk
devices that represent each path.

What we end up with in this architecture are individual disk devices for each path to the LUN.
These are accessible as disks in their own right, as well as a multipath device which is
accessible as a disk device representing the whole LUN. This is a recipe for chaos.
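
The grouping step just described can be pictured with a small sketch. This is only an illustration of the idea, not how the Linux multipath tools are implemented: the device names, the LUN identifiers, and the function name are all made up for the example.

```python
from collections import defaultdict


def build_multipath_devices(sd_devices):
    """Group per-path sd devices by the LUN identifier each one reports.

    sd_devices is a list of (sd_name, lun_id) pairs. The result maps each
    LUN identifier to the sd devices (paths) that lead to that LUN.
    """
    by_lun = defaultdict(list)
    for sd_name, lun_id in sd_devices:
        by_lun[lun_id].append(sd_name)
    return {lun_id: sorted(names) for lun_id, names in by_lun.items()}


# Hypothetical discovery result: four paths leading to two LUNs.
discovered = [("sdb", "lun-A"), ("sdc", "lun-A"), ("sdd", "lun-B"), ("sde", "lun-A")]
multipath_devices = build_multipath_devices(discovered)
for lun_id, paths in multipath_devices.items():
    print(lun_id, "->", paths)
```

Note that the sd devices do not go away once they have been grouped. Each one remains usable as a disk in its own right, alongside the multipath device, which is exactly the problem.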

Slide 57

FSPM 4.0 for Linux

Linux Kernel Architecture


• 10 – Linux Device Mapper driver
with Multipath module
• 11 – Linux sd disk driver
• Note direct access to disk driver as
well as access through Device
Mapper Multipath


This diagram provides more detail of the architecture diagram we looked at earlier. Number 10,
the Device Mapper driver, is sitting over the top of the disk driver. It can see these different
devices, and it knows which ones represent a single LUN. At the same time, that disk driver is
making all those disk devices visible upwards in parallel to the multipath device visible from the
multipath driver.

Slide 58

FSPM 4.0 for Linux


Architecture - consequences

• Applications can access the multipath device and individual paths


– Easy for admins to misconfigure things to use individual paths
• Will cause non-optimized access and confusion sooner or later
– One of first things to check if seeing persistent NOAs
– BUT a few things can only be done using an individual path sd device
• Disk partitioning doesn’t work on the multipath device sometimes

• OS does I/O on each path as it attaches it as a disk


– Not unusual to see NOA reported as LUNs are brought on-line


The biggest consequence of this architecture is that applications can access the individual devices as
disks, and they look like perfectly ordinary disks.
Each individual path appears as a disk device, which works as it would if it were the only path to that
disk, in addition to the multipath device. This makes it extremely easy for administrators to
misconfigure things and to set up the software to use individual paths instead of using the multipath
device.

Some automation packages which set up disk access do not necessarily know about multipath devices
and they will spot the individual disk devices and configure those instead. It is very easy for software to be
set up to talk to the individual paths and that will cause non-optimized access. If by some miracle they
happen to pick on just the optimized paths, it might not immediately be obvious but sooner or later it will
cause chaos.

It is one of the first things to check if you are seeing persistent non-optimized access from a Linux box.
Check to make sure that it is all configured properly, and that software is only accessing multipath LUNs
through the multipath device, not directly through an sd device. Having said that, Linux being Linux, there are
a few things that can only be done by talking to an individual sd device and cannot actually be done
through the multipath device. One of these is partitioning the disk in the first place. This may have
changed in the most recent versions of Linux. It may now be possible to do partitioning through the
multipath device; but until quite recently, partitioning had to be done through an individual sd device.
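
As one example of that kind of check, here is a rough sketch that scans fstab-style text for entries that name an sd device directly. It is an illustration only: the sample content and the `/dev/mapper/mpatha1` name are made up, and an sd device is not necessarily a path to a multipath LUN (it could be a local disk), so anything flagged is just a candidate to look at.

```python
import re

# Matches whole-disk and partition names such as /dev/sdc or /dev/sdc1.
SD_DEVICE = re.compile(r"^/dev/sd[a-z]+\d*$")


def find_direct_sd_mounts(fstab_text):
    """Return the device fields of fstab entries that name an sd device directly."""
    hits = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        device = line.split()[0]
        if SD_DEVICE.match(device):
            hits.append(device)
    return hits


# Made-up sample content, for illustration only.
sample_fstab = """
# device             mountpoint  type  options   dump pass
/dev/mapper/mpatha1  /data       ext4  defaults  0    2
/dev/sdc1            /scratch    ext4  defaults  0    2
UUID=1234-abcd       /boot       ext4  defaults  0    2
"""

print(find_direct_sd_mounts(sample_fstab))
```

The same idea applies to any other place where software is told which disk to use, such as application or volume manager configuration.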

Another consequence is that since the operating system brings each individual path online as a disk
device, it can be doing a lot of I/O. There is often a lot of I/O involved in identifying a disk and bringing it
fully online; sometimes enough I/O for that to be reported as non-optimized access in the Oracle FS
system. So it is not unusual to see non-optimized access reports in the Oracle FS as a Linux box is being
booted or as a new LUN is being brought online and attached to that box.

As long as that non-optimized access stops very quickly after the boot or after the initial adding of the
LUN, it is not necessarily a problem. It is, however, something that is worrying and confusing.
Unfortunately, it’s a consequence of the architecture.

Slide 59

FSPM 4.0 for Linux


Architecture – other features

• Decisions about path health and which paths to use made by daemon
– Partly polling loop, partly in response to events in driver
– Message to daemon, loads plug-in, determines priority, message to driver
• Can be very slow due to scheduling and polling delays
– Done separately for each path
• Can take long time and lot of I/O to respond to optimization changes
– Doesn’t work well for dynamic ALUA systems
– NOAs around LUN movement
• Can trigger the rehoming algorithm, causing LUNs to bounce


Let us now look at some other features of the Linux implementation of multipathing. The decisions about
the health of a path, and the decisions about which paths to actually use for I/O, are made by the user-
level multipath daemon in association with the provided plug-ins. It does this partly by simply polling
regularly, running the plug-ins regularly and asking them about each path and what priority to use for
each path, and it does this partly in response to events being raised by the driver.

Again things are improving. It is doing it more and more in response to events in the driver, and it tends to
be more timely now. But even when it is doing it in response to events from the driver, that involves the
driver raising an event, that event getting through to the daemon, the daemon being scheduled and loading a
plug-in, the plug-in determining path priority, and a message being sent back to the driver. This can be very
slow due to the scheduling and polling issues.

A saner multipath implementation makes decisions about which paths to use within one or two
I/Os. On Linux, it can take a long time. This mechanism is also run separately for each individual path.
With other systems, you can generally make the decision for all the paths at the same time when things change.

Overall, it can take a long time and a lot of I/O can pass before a proper response is made to an
optimization change. That does not work at all well with dynamic ALUA systems such as the Oracle FS
and Pillar Axiom. You will typically see a lot of non-optimized access whenever LUNs are being moved, whether
the admin is deliberately choosing to move a LUN or paths have failed.

If any disruption happens that causes a LUN to move, then Linux servers will typically react slowly to it.
You will see non-optimized access being reported on the Oracle FS, and in the worst case, it can
retrigger the rehoming algorithm. You can end up with a situation where a LUN is moved in the Oracle FS
and the host is so slow responding to that movement that it effectively carries on doing non-optimized
access for a long time. That can go on long enough that the Oracle FS system decides to move the LUN back again,
about the same time as the host decides to move to the other paths. We can then end up with LUNs
bouncing back and forth between the Controllers. This usually settles down quickly, but it is not unusual
to see a LUN move back and forth a few times before it finally settles down to where it should be.

Slide 60

FSPM 4.0 for Linux


Multipath Daemon plug-ins

• FSPM provides two plug-ins for the Multipath Daemon


– Checker – checks the health of a path
• We check path connectivity, not health of LUN
– Prioritizer – decides the priority to be assigned to a path
• I/O will go only to paths of highest priority
• Round-robin load balancing if more than one path of highest priority

• Shared Libraries in newer Linux releases


• Prioritizer only, as a free-standing program, in older Linux releases


The modern versions of Linux, that is, the 6.x releases and the Oracle Linux 5 releases, support two
different multipath daemon plug-ins. One is a Checker, whose job is simply to check the health
of a path, and the other is a Prioritizer, which decides which paths should be used. The FSPM
Checker that we supply checks the health of the individual path.

Most Checkers check the health of the whole LUN which can sometimes have some unfortunate
consequences. We check that a path has connectivity to the Oracle FS system. We do not
necessarily check that the LUN at the end of that path is fully healthy and working properly
because that is not the multipathing driver’s job.

The multipathing driver’s job is to maintain physical connectivity to the LUN. The Prioritizer
decides the priority that should be assigned to a path. The Linux multipathing system groups
paths of the same priority together, and it will only send I/O to paths of the highest priority. If
there is more than one path at the highest priority, then it will do round-robin load balancing
between those paths. The Prioritizer gives highest priority to Fibre Channel optimized paths and
lowest priority to iSCSI non-optimized paths. We will go into this in more detail later.

In the newer Linux releases, plug-ins are provided in the form of Shared Libraries which the
multipath daemon loads at runtime. In the older Linux releases, there is only a Prioritizer plug-in,
and the daemon does the checking itself by a built-in mechanism. The Prioritizer is a free-standing
program, which the daemon has to load and execute every time it runs it, which is a
much higher load and slower than using Shared Libraries.

Slide 61

FSPM 4.0 for Linux


Checker Plug-in – differences from APM and Native versions

• Checks health of path and presence of LUN, not health of LUN


– Don’t fail a path because the LUN is busy
• Retry transient failures
– Don’t fail a path because the target raises an information indication
• Log Errors
– All retried and fatal errors are logged
– Saves having to make wild guesses about why paths were marked failed


Let us look at the Checker plug-in in more detail, specifically the differences from the APM and
native versions. The plug-ins we provided with APM were based on the native plug-ins with
some enhancements and some bug corrections. But the native plug-ins are of quite poor quality,
with many problems and many bugs, and we did not fix all the bugs at that time.

The main changes were to provide the FSPM functionality. In this FSPM release, we have re-
written these plug-ins from scratch and fixed all the problems that we were aware of. They
should now do things right, with a bit of luck.

The Checker plug-in looks at the health of the path and the presence of the LUN, but it does not
check the health of a LUN. Previously, if a LUN was temporarily reporting itself as busy, the
path to the LUN might fail. If you are unlucky, that could cause all the paths to fail at the same
time.

Now we retry for transient failures. The SCSI protocol has a mechanism for the target to indicate
that something interesting has happened, and it does that by returning a particular type of failure
to an I/O command. The old plug-ins regarded anything other than a successful response to the
commands they sent as a failure. Now we properly identify all the transient, innocent
failures and we retry the commands in most cases, so we do not fail a path simply because the
target is trying to tell us something.
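
The retry behaviour can be sketched roughly as follows. This is not the FSPM Checker code; it is a minimal illustration under stated assumptions. The status names are illustrative strings, the retry limit is made up, and what should happen when the retries run out is a guess (here the path is reported as down).

```python
from enum import Enum


class PathState(Enum):
    UP = "up"
    DOWN = "down"


# Assumption for illustration: statuses where the target is only telling us
# something (busy, or an informational indication), not reporting a broken path.
TRANSIENT_STATUSES = {"BUSY", "UNIT_ATTENTION"}


def check_path(send_probe, max_retries=3, log=print):
    """Probe a path, retrying transient failures and logging every non-success."""
    for attempt in range(1, max_retries + 1):
        status = send_probe()
        if status == "GOOD":
            return PathState.UP
        if status in TRANSIENT_STATUSES:
            log(f"attempt {attempt}: transient status {status}, retrying")
            continue
        log(f"attempt {attempt}: fatal status {status}, marking path down")
        return PathState.DOWN
    log(f"still transient after {max_retries} attempts, marking path down")
    return PathState.DOWN


# Hypothetical probe that reports an indication, then busy, then succeeds.
responses = iter(["UNIT_ATTENTION", "BUSY", "GOOD"])
print(check_path(lambda: next(responses)))
```

The point of the sketch is the shape of the loop: only a genuinely fatal status fails the path on the first attempt, and every retried or fatal status leaves a log entry behind.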

We also log the errors: we log the status codes that we see and we log why commands failed.
In the past we have often had difficult support situations where we could see things
failing, but we did not know why because the native multipathers, and the APM plug-ins
based on them, just did not log anything useful. Now we should get the information that we
need.

Slide 62

FSPM 4.0 for Linux


Prioritizer Plug-in – differences from APM and Native versions

• Gives FC paths higher priority than iSCSI


• Implements FSPM load balancing
– For static, all paths get a different priority within their band
• Retry transient failures
– Don’t demote a path because the target raises an information indication
• Log Errors
– All retried and fatal errors are logged
– Saves having to make wild guesses about problems


The Prioritizer plug-in gives Fibre Channel paths higher priority than iSCSI paths, so you can
have multi-protocol paths to a LUN and use Fibre Channel in preference to iSCSI when
possible. Optimized paths are preferred to non-optimized paths because that is the core
functionality that we provide. We also implement the FSPM idea of load balancing, round-robin
or static.

At the low level, Linux multipathing treats all paths which have the same priority as one group, and it
always does round-robin between them. Static mode is implemented simply by manipulating
the priorities that are assigned to each path. In static FSPM mode, every path is given a
different priority so only one path will end up as the highest priority and only one path will be
used.

As with the Checker plug-in, transient failures are retried and errors are logged. Previously, a
transient failure would end up with a path being taken out of use because the plug-in was unable
to tell what priority to use, so the path would effectively be failed, and again nothing was logged
so we did not know why.

Slide 63

FSPM 4.0 for Linux

Prioritizer Load Balancing


• Multipath Daemon puts paths with
same priority in a path group
• Device Mapper does I/O only to paths
in highest priority path group, using
round-robin
• FSPM prioritizer implements FSPM
Static mode by giving each path a
different priority


The multipath command is the native command for looking at what is going on with
multipathing. This slide shows the output of the multipath command on a host which has a
single LUN, and this shows how FSPM load balancing works. The multipath daemon puts paths
which have the same priority into a path group, and then does round-robin load balancing
across the path group which has the highest priority.

With FSPM load balancing, all Fibre Channel optimized paths are given the same priority so the
multipath daemon puts them into a single path group and then does round-robin across them.
This is shown in the first example on the slide.

The second example shows FSPM static mode. The Prioritizer has given every path a different
priority so there is only one path in the path group that has the highest priority. The priorities are
still split into bands, so optimized Fibre Channel is still higher priority than anything else. Each
path group has a single path which means that the multipath daemon picks the highest priority
path group and does a round-robin across the one path within the path group, which is actually
the same as what FSPM describes as static mode.

Sometimes there is confusion about this because, as you can see in the FSPM static mode
example, round robin is still displayed. In this example, round robin is a description of what is
happening at the lower level in the Device Mapper across that individual path group. So, we do
a round robin across a path group containing one path.
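
The grouping behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the multipath daemon's actual code: the device names, the priority values, and the function names build_path_groups and active_paths are all hypothetical.

```python
from collections import defaultdict


def build_path_groups(path_priorities):
    """Group path names by priority, highest priority group first."""
    groups = defaultdict(list)
    for path, priority in path_priorities.items():
        groups[priority].append(path)
    return [(prio, sorted(groups[prio])) for prio in sorted(groups, reverse=True)]


def active_paths(path_priorities):
    """Return the paths that receive I/O: those in the highest priority group."""
    groups = build_path_groups(path_priorities)
    return groups[0][1] if groups else []


# Load balancing mode (illustrative values): both optimized paths share a priority,
# so they land in one path group and round robin runs across both.
load_balanced = {"sdb": 4000000, "sdc": 4000000, "sdd": 2000000}

# Static mode (illustrative values): every path has a different priority,
# so the top path group holds exactly one path.
static = {"sdb": 4000002, "sdc": 4000001, "sdd": 2000000}

print(active_paths(load_balanced))
print(active_paths(static))
```

In the static case the round robin still runs, but only across the single path in the top group, which is why the multipath output continues to show round robin.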

Slide 64

FSPM 4.0 for Linux – Path Group Priorities

Group Priority Range    Path Type
4000000 – 4999999       FSPM Optimized Fibre Channel
3000000 – 3999999       FSPM Optimized iSCSI
2000000 – 2999999       FSPM Non-Optimized Fibre Channel
1000000 – 1999999       FSPM Non-Optimized iSCSI
0 – 999999              Not generated by FSPM
50 (typical)            Optimized path reported by native ALUA prioritizer (varies with code version)
10 (typical)            Non-Optimized path reported by native ALUA prioritizer (varies)
Less than 0             Path type (and hence priority) could not be determined


These are the path group priorities which you will see. For FSPM round robin mode, the priority
will always be set to the value on the million boundary (for example, 4000000). In the static
mode, it can be anywhere in these ranges. Optimized paths are always higher priority than
non-optimized paths, and Fibre Channel paths are always higher priority than iSCSI paths.

Priorities less than a million are not generated by FSPM. If the system ends up using the native
prioritizer instead of the FSPM prioritizer, which we’ll discuss in more detail later, the values
you see will differ between releases. At the moment, it seems typical that you will see 130 for
an optimized path and you will see 50 for a non-optimized path.

Any value less than zero means that the prioritizer was not able to determine the path priority,
or in the case of native prioritizers, it may have had an error when it was trying to determine
the priority. A priority less than zero means that the path does not get used.
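
When reading multipath output, it can help to map a priority value back to the band it falls in. The following is a minimal sketch based only on the ranges in the table on this slide; the function name describe_priority is hypothetical and this is not part of FSPM.

```python
# Band number (priority // 1,000,000) mapped to the path type from the slide's table.
FSPM_BANDS = {
    4: "FSPM Optimized Fibre Channel",
    3: "FSPM Optimized iSCSI",
    2: "FSPM Non-Optimized Fibre Channel",
    1: "FSPM Non-Optimized iSCSI",
}


def describe_priority(priority):
    """Return a description of the band that a path group priority falls in."""
    if priority < 0:
        return "path type could not be determined"
    if priority < 1_000_000:
        return "not generated by FSPM"
    # Anything at or above 5,000,000 is not covered by the table.
    return FSPM_BANDS.get(priority // 1_000_000, "outside the documented ranges")


for value in (4000000, 2999999, 50, -1):
    print(value, "->", describe_priority(value))
```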

Slide 65

FSPM 4.0 for Linux


Problems with APM Release Mechanisms

• APM releases were specific to minor releases of each OS
– One release for RHEL 5.6, different one for OL 5.6, and for RHEL 5.7
– Often did not work with interim updates to Multipath Tools
• Some APM releases included our own distribution of Multipath Tools
– But others didn’t …
– Many customers unhappy using our distribution of Multipath Tools
• Increasing use of automatic updates (yum and friends)
– Multipath Tools updates often broke APM and its configuration


Now let us review some of the problems we have had with APM for Linux. Each release of APM
was specific to a minor release of each OS. There would be one release for Red Hat 5.6,
another release for Oracle Linux 5.6, and another for Red Hat 5.7. Even that level of precision
was often not enough, and things were actually tighter than that.

The distributors of Linux now release relatively frequent updates, and we were getting into
situations where an APM release would work with the version of multipath tools that was in,
for example, Red Hat 5.6 when it first came out, but would not work with an update that was
subsequently released for Red Hat 5.6.

We were getting into situations where updates would break a working installation, and there was
confusion between the different versions of APM for different versions of Linux. Some APM
releases included our own distribution of the multipath tools package and others did not.

With releases that did include our own distribution of the multipath tools package, we had many
customers who were very unhappy about removing the version that came with their distribution
and installing ours, because their distributors would not be responsible for any problems.
Furthermore, the increased use of automatic update systems often broke APM, even if it worked
when first installed.

Slide 66

FSPM 4.0 for Linux


Problems with APM Release Mechanisms - effects

• High development load keeping up with OS releases
– Slow releasing support for new releases
– Too busy keeping up to do enhancements
• High support load sorting out the many problems
– Installed APM for a different release/distribution (“it’s all Linux”)
– Working setups destroyed by automatic updates
– APM installation overwrote existing configuration


The effect of all these problems was a very high development load. We were constantly trying to
turn out APM releases to try to keep up with the operating system releases. We were often slow
getting new APM releases out, and we were far too busy keeping up to make any significant
enhancements.

We also had a high support load sorting out the many problems, both in development and in the
support teams. Customers tended to install any old version of APM for Linux because they were
convinced it was all Linux so it was all the same. Setups that were originally working were
destroyed by automatic operating system updates, and installing APM would overwrite existing
configurations of multipath tools for the more sophisticated customers who were already using
multipath tools for other purposes.

Slide 67

FSPM 4.0 for Linux


Problems with APM Release Mechanisms - opportunity

• Vendors now shipping much better versions of Multipath Tools
– No need to ship our own bug-fixed version
• Increasing use of automatic updates made current situation untenable
– Customers not happy when asked to disable updates
• Rate of change of Multipath Tools versions diminishing
– More commonality across distributions and between releases


In general, the versions of multipath tools that are shipped by the vendors with the operating
systems are now getting much better. There are fewer bugs and they are much more reliable,
so there is less need for us to ship our own bug-fixed version. Automatic updates are becoming
more and more popular, which makes the current situation untenable.

Something had to be done. The immediate reaction had been to ask our customers to turn off
automatic updates, which they were not happy to do. But it was the only way to keep an APM
installation working. The rate of change of the multipath tools versions has diminished.

New versions are still shipped, but the actual change between those versions has greatly
reduced. Now there is rarely any need to make any core changes in order to match those
differences in tools. This gave us an opportunity to try and improve the situation.

Slide 68

FSPM 4.0 for Linux


Problems with APM – remaining issues which needed solving

• Newer plug-ins are shared objects, dynamically linked at load time
– Structure interface to Multipath Tools, structure layout changes frequently
– Often changes with patch updates, need for matching release of APM
• Installation created new Multipath Daemon configuration file
– Gives settings required by APM, but destroys any customer settings
– Similarly, Multipath upgrades destroyed the APM customizations
– Configuration file layout changes between Multipath Tools releases


Addressing the problem has been difficult because plug-ins, in general, are now shared objects
which are dynamically linked at load time. The interface to the shared objects is a structure, an
arrangement of bits of data. Although there are no major functional changes between the
versions of multipath tools, the layout of these structures is frequently changed.

A library that was built for one version of multipath tools would not work with a different version
because the structure layout was different and it picked up the wrong information out of the
structure. There does not seem to be any concept of binary compatibility at all in the structure
layouts; they often change from one patch update to the next.

The other issue is that the installation mechanism created a new version of the multipath
configuration file. If the customer already uses multipath tools, this will destroy their existing
configuration. A copy is always saved, but it requires an administrator to go through by hand
and merge the new version with whatever configuration the customer had.

An APM installation would destroy any customer settings, a multipath upgrade would destroy
the APM settings, and the configuration file layout tends to change between multipath tool
releases. These are issues that we have had to deal with, and next we will talk about what was
done to address these issues.

Slide 69

FSPM 4.0 for Linux


One FSPM Release – handling plug-in interfaces

• As plug-in library loads, it fingerprints the text segment of native plug-in
– Fingerprint will change if interface structure layout changes
• Looks up fingerprint in parameter file distributed with FSPM
• Parameter file supplies all details of the required interface
– Our plugin automatically adjusts to different structure layouts
• If lookup fails, pass calls through to most suitable native plug-in
– Don’t fail if FSPM release doesn’t match Multipath Tools
– Do same as for a hand-configured non-FSPM setup


Now, when one of our multipath tools plug-in libraries loads, it takes a fingerprint of the text
segment, the actual binary code, of one of the native plug-ins. This fingerprint will change if
the code changes at all. The fingerprint will also change even if the code does not change but
the layout of the structure interface to it changes. These fingerprints can be regarded as
mapping onto the layout of a structure. Our plug-in looks up that fingerprint in a file that we
distribute with FSPM, and that file supplies all the details of the layout of the structure which
maps onto that fingerprint.

The plug-in then automatically adjusts itself to use the parameters which it takes from its file,
and can talk to a variety of different versions of the multipath daemon. If the lookup fails, we
pass the call through to a native plug-in. If somebody installs a new version of the multipath
tools that we have never heard of and our plug-ins do not know how to deal with the tools, we
load the native plug-in which best works with our systems.

There is a native plug-in that does ALUA path management and there is a simple path checker.
We pass control to those, so in a case where we do not recognize the version of multipath
tools, the customer ends up as if they were using the native multipathing correctly configured for
use with our systems.
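
The lookup-with-fallback idea can be sketched as follows. This is a simplified illustration, not the FSPM implementation: the source does not say how the fingerprint is computed, so SHA-256 is an assumption, and the parameter table, the priority_offset field, and the function names are hypothetical.

```python
import hashlib


def fingerprint(text_segment: bytes) -> str:
    """Fingerprint the bytes of a plug-in's text segment.

    SHA-256 is an assumption; the source does not name the algorithm used.
    """
    return hashlib.sha256(text_segment).hexdigest()


def select_mode(text_segment: bytes, parameter_table: dict):
    """Return ("fspm", params) for a known fingerprint, else ("native", None)."""
    params = parameter_table.get(fingerprint(text_segment))
    if params is None:
        # Unknown version of multipath tools: pass calls through to the native plug-in.
        return ("native", None)
    return ("fspm", params)


# Stand-in bytes, not real plug-in code.
known_plugin = b"\x55\x48\x89\xe5"

# Hypothetical parameter table, as it might be loaded from the parameter file.
parameter_table = {fingerprint(known_plugin): {"priority_offset": 24}}

print(select_mode(known_plugin, parameter_table))
print(select_mode(b"some newer build", parameter_table))
```

The point of the design is that an unrecognized fingerprint is not an error: the result is a working, if less capable, native configuration.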

Slide 70

FSPM 4.0 for Linux


One FSPM Release – handling configuration file and updates

• FSPM installer updates multipath.conf instead of replacing it
– Always saves dated copy of original before changing it
• Installer registers ‘trigger’ to be called on all Multipath Tools updates
– Trigger script checks the FSPM integration, does it again if needed
• FSPM parameter file in separate RPM package from software
– Can release new parameter file much more quickly than software
– Parameter RPM package update also triggers reintegration


We solve the multipath.conf configuration file problem by having our installer update the file in
place instead of replacing it. We save a dated copy of the original before we change it, and we
then modify it as little as possible in order to be correctly configured for the Oracle FS and
Pillar Axiom systems.
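
The "save a dated copy before changing anything" step is simple to picture in code. The sketch below is an assumption about how such a step could look, not the installer's actual behaviour: the backup naming scheme (file name plus a YYYYMMDD suffix) and the function name save_dated_copy are hypothetical.

```python
import shutil
from datetime import date
from pathlib import Path


def save_dated_copy(config_path, today=None):
    """Copy config_path to '<name>.<YYYYMMDD>' alongside it and return the copy's path."""
    config_path = Path(config_path)
    stamp = (today or date.today()).strftime("%Y%m%d")
    backup = config_path.with_name(f"{config_path.name}.{stamp}")
    # copy2 also preserves timestamps and permissions where the platform allows it.
    shutil.copy2(config_path, backup)
    return backup
```

A caller would take the copy first and only then edit the original file, so the customer's settings can always be recovered.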

Our FSPM installer also registers a trigger into the RPM system which causes a script to be
called whenever anybody installs or upgrades either any part of FSPM, or multipath tools, or
multipath daemon, or anything that we are associated with. That trigger causes the FSPM
integration tasks to be repeated. FSPM again checks the multipath tools configuration file to
make sure it is set up properly, and it makes sure that the plug-ins and other files are in the
places they need to be for the multipath daemon to load them.

The parameter file described earlier is shipped in a separate RPM package from the software
itself. This means we can release a new parameter file without needing to change the software,
enabling us to release parameter files much more quickly than if we had to do a new software
release and go through full tests of that software. Installing the new parameter RPM package
also triggers the reintegration. Simply installing a new parameter package which adds support
for a version of multipath tools will bring that version of multipath tools under support.

Slide 71

FSPM 4.0 for Linux


One FSPM Release - three packages in the OTN ZIP file

• The three packages in FSPM 4.0 will have names such as
– oracle-fspm-4.0.1-1.i486.rpm – 32-bit software
– oracle-fspm-4.0.1-1.x86_64.rpm – 64-bit software
– oracle-fspm-params-4.0.3-1.noarch.rpm – parameter file
• Third part of each version number is a ‘build’ or ‘patch’ number
• Parameter package will be updated frequently, and ZIP file updated
• Do not intend to change software packages once 4.0 released
• ZIP file version number will match the parameter file version


The ZIP file on OTN contains three different RPM packages. There is one package each for the
32-bit and 64-bit software, and a third package which is the parameter file. When installing
FSPM for Linux, install one of the software packages and the parameter file package. The
version number of the software will be the version number of the software that actually went
through formal test. The version number of the parameter file will change as the parameter
file gets revised. The plan is that the ZIP file on OTN will get regularly updated as entries are
added to the parameter file.

The parameter file RPM version number will be updated at the same time. So the ZIP file on
OTN for FSPM 4.0 will be updated relatively frequently, each time new entries are added to the
parameter file. The ZIP file version number will be the same as the parameter file version
number, which will change in its third part, the patch level, so you can see when the ZIP file
contains an updated parameter file.

Once FSPM 4.0 is released, the software in that ZIP file is not expected to change. Only the
parameter file will be updated, by adding new entries to support new versions of multipath
tools.
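
To check which parameter file version a downloaded package carries, the package file name can be split using the usual RPM naming convention of name-version-release.arch.rpm. This is a small sketch under that assumption; the function name split_rpm_filename is hypothetical and the file names are the examples from the slide.

```python
def split_rpm_filename(filename):
    """Split 'name-version-release.arch.rpm' into its four parts."""
    stem = filename[: -len(".rpm")] if filename.endswith(".rpm") else filename
    stem, arch = stem.rsplit(".", 1)                 # architecture follows the last dot
    name, version, release = stem.rsplit("-", 2)     # version and release are the last two fields
    return {"name": name, "version": version, "release": release, "arch": arch}


params = split_rpm_filename("oracle-fspm-params-4.0.3-1.noarch.rpm")
software = split_rpm_filename("oracle-fspm-4.0.1-1.x86_64.rpm")

# The third part of the version number is the build or patch number.
print(params["version"], params["version"].split(".")[2])
print(software["version"], software["arch"])
```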

Slide 72

FSPM 4.0 for Linux


One FSPM Release – typical OS update scenario

• FSPM is installed and working normally
• ‘yum’ update installs a new unknown version of Multipath Tools
– FSPM’s trigger reintegrates FSPM as this is installed
– FSPM plug-ins go to ‘native’ mode
• Customer notices functionality change and ‘native’ mode
– Raises a Service Request


Now we are going to talk about what happens with FSPM in a situation where a customer does
an operating system update, which in the past would have broken APM and created a mess.
FSPM is installed and everything works properly. The customer does a yum update, which
brings in a new version of multipath tools that we’ve never seen before and one which FSPM
does not support. That update will trigger the reintegration, so FSPM will be properly
integrated with the new version of multipath tools as it is installed. When the FSPM plug-ins see
that they do not recognize this version of the multipath tools, they will go into what we term the
native mode in which they actually load and call the native plug-ins, which came with the
multipath tools packages.

The customer will notice that they are in native mode, and they may notice a functionality
change such as not having static load balancing anymore. There is also an indication in the
version string that they are in native mode. When the customer notices this, they raise a service
request. How can the customer determine if they are in native mode?

Slide 73

FSPM 4.0 for Linux – spotting ‘native’ mode

Version string for Linux


• 4.0.x / 4.0.y native – software 4.0.x,
parameter file 4.0.y, operating in
‘native’ mode
• 4.0.x / 4.0.y – software 4.0.x,
parameter file 4.0.y, operating in full
FSPM mode
• 4.0.x – both packages are version
4.0.x, operating in full FSPM mode


These are the forms of the version string for FSPM on Linux. The first part of the version string
is always the software package version number, which is also the daemon version number. The
second part is the version number of the parameter file which is currently installed; if both
packages are at the same version, only one number is shown. If the customer is currently
operating using the native multipathing plug-ins, FSPM runs in native mode and the word
native will be at the end of the version string. This is the easiest way to spot that the system is
operating in native mode.
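
For anyone scripting a check across many hosts, the three forms of the version string can be parsed as sketched below. The exact spacing around the slash is an assumption taken from the slide, and the function name parse_version_string is hypothetical.

```python
def parse_version_string(version):
    """Parse an FSPM for Linux version string into its parts.

    Handles the three forms shown on the slide:
      '4.0.x / 4.0.y native', '4.0.x / 4.0.y', and '4.0.x'.
    """
    text = version.strip()
    native = text.endswith("native")
    if native:
        text = text[: -len("native")].strip()
    parts = [part.strip() for part in text.split("/")]
    software = parts[0]
    # A single number means both packages are at the same version.
    parameters = parts[1] if len(parts) > 1 else software
    return {"software": software, "parameters": parameters, "native": native}


print(parse_version_string("4.0.1 / 4.0.3 native"))
print(parse_version_string("4.0.1"))
```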

Keep in mind that not only does the customer lose the ability to have static load balancing,
which is not significant, but the customer is now exposed to the many bugs in the native plug-
ins. This is the reason why it’s important to notice if the customer is in native mode, and get
things updated so the customer can get back to the FSPM plug-ins.

Slide 74

FSPM 4.0 for Linux


One FSPM Release – response to SR for ‘native’ mode

• Collect FSPM logs
– Logs include info to determine required new plug-in parameters
• Development release new parameter file RPM, update OTN ZIP
• Customer installs new parameter file RPM (not the software)
• Installation triggers reintegration with Multipath Tools
– Full FSPM functionality restored

• Result: Multipathing functionality maintained throughout, no disruption


What happens when we get a call because FSPM is in native mode? The first step is to collect
the FSPM logs. The Linux logs include copies of information from multipath tools, which
provides enough to be able to work out the entries that need to go into the parameter file.

The logs will be pulled in and passed to development. Development will determine the new
values that need to be added to the parameter file, they will release a new parameter file RPM,
and they will update the ZIP file on OTN. The customer picks up the new ZIP file and installs the
new parameter file RPM.
This installation triggers the reintegration of FSPM into the multipath tools again and pulls in
the new parameter information. The multipath daemon restarts and reloads the plug-ins, and our
plug-ins now operate in full FSPM mode.

We use the FSPM supplied plug-ins instead of the plug-ins provided by multipath tools, and we
are back to operating normally. The thing to note throughout this whole sequence is that
multipathing functionality worked. We dropped to the native mode where we were more at risk
of hitting bugs, but the basic multipathing continued to work. We upgraded to bring back full
FSPM support again without disruption, everything continued to work, and we are now back to
full FSPM functionality.

Slide 75

FSPM 4.0 for Linux


One FSPM Release – significant improvement over APM

• Expect these enhancements to significantly improve the situation
– Should see fewer issues in the field
• Unless we’ve introduced new and interesting problems …

• Will withdraw the APM releases for OSes supported by FSPM
• Encourage customers to upgrade to FSPM if possible
• Encourage them to upgrade their OS if FSPM doesn’t support it


The other thing that we get out of this automatic recognizing of multipath tools versions, is that
we can do a single release of FSPM to support a wide range of Linux releases. We expect
these enhancements to significantly improve things. We anticipate far fewer issues from the
field, although there is always the risk that new code will introduce new issues instead of just
fixing the old ones. We hope the risk is low because it has been through a full test cycle.

Because these changes are so significant and we believe they will help so much, we plan to
withdraw, by the time of the new FSPM release, all the APM releases for the distributions of
Linux which FSPM supports. Normally, we say there is no need to upgrade APM if it is working,
just because a newer release is available. In this case, when using an APM release on a Linux
distribution which is now supported by the new FSPM release, we encourage people to upgrade
to FSPM because we believe it will fix many of the problems which are still outstanding with
APM.

Basically, we encourage customers that are running versions of APM which are now handled by
FSPM to upgrade to FSPM. Customers who are running a version of Linux which is not
supported by FSPM, are encouraged to get an upgrade to a Linux version that is supported and
then install FSPM.

Slide 76

FSPM – Solaris


This completes the section on FSPM for Linux. Now we will take a look at FSPM for Solaris.

Slide 77

FSPM 4.0 for Solaris


Solaris 10 and Later

• Supports Solaris 10 and later (but not Express editions)
– All platforms – SPARC and x86
• No major changes since APM 3.0
– Apart from support for the Oracle FS System …
• FSPM “Driver” is the native Solaris I/O Multipathing Features
– Previously known as STMS, MPxIO, many other names
– No extensions or plug-ins from FSPM


We support Solaris 10 and later on SPARC and x86. There are no major changes since APM 3.0.
The driver on Solaris is Solaris I/O multipathing which comes with Solaris, previously known as
STMS, commonly known as MPxIO, and there are a lot of other names used for it. There are no
extensions or plug-ins from FSPM.

Slide 78

FSPM 4.0 for Solaris


Solaris 10 and Later

• NOTE that Solaris I/O Multipathing had several bugs with ALUA
– Working with Solaris development for many years to get them fixed
– Vital to have various Solaris updates and patches installed
• FSPM installation checks Solaris update and patch level, and config
• Host must have the patches installed to work with Oracle FS System
– Even if they choose not to install FSPM


There were a lot of bugs in Solaris I/O multipathing with ALUA. We have been working with the
Solaris development team for many years to get them fixed. Fixes are present in patches and
the patches must be installed. Even if the customer does not want to install FSPM for whatever
reason, they should still be at the patch level that FSPM requires if they are going to talk to a
Pillar Axiom or Oracle FS system. The FSPM installation will check the patch levels and make
sure that all patches are at the level they should be.

Slide 79

FSPM 4.0 for Solaris


Solaris 10 Minimum Patch Levels

• Minimum installed release: Solaris 10 8/07 release (Update 4)
• Patches for Solaris 10 8/07 (Update 4) through 8/10 (Update 9)
– On SPARC systems
• Solaris 10 8/11 (Update 10) Patch Bundle (patch 144401-10)
• kernel patch 147440-24
– On x86 systems
• Solaris 10 8/11 (Update 10) Patch Bundle (patch 144402-10)
• kernel patch 147441-24


This slide shows the Solaris 10 minimum patch levels.

Slide 80

FSPM 4.0 for Solaris


Solaris 10 Minimum Patch Levels

• Patches for Solaris 10 8/11 (Update 10)
– On SPARC systems
• kernel patch 147440-24
– On x86 systems
• kernel patch 147441-24

• Patches for Solaris 10 1/13 (Update 11) and later
– No additional patches required


This slide shows the minimum patch levels that will be enforced by the installer. If the customer
does not have these patch levels, they will be told about them at the time of installation.
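
As a rough illustration of a minimum patch level check, the sketch below compares installed patch IDs against a required one. It assumes patch IDs have the form base number, hyphen, revision (as in 147440-24), and that a higher revision of the same base number satisfies the requirement. The function names are hypothetical, and this is not the FSPM installer's actual logic; in particular, it ignores patches that have been superseded by a different patch number.

```python
def parse_patch(patch_id):
    """Split a patch ID such as '147440-24' into (base, revision)."""
    base, revision = patch_id.split("-")
    return base, int(revision)


def meets_minimum(installed, required):
    """Return True if any installed patch has the required base at the same or a later revision."""
    required_base, required_revision = parse_patch(required)
    for patch_id in installed:
        base, revision = parse_patch(patch_id)
        if base == required_base and revision >= required_revision:
            return True
    return False


# Example: a SPARC host checked against the kernel patch level from the slide.
installed_patches = ["144401-10", "147440-25"]
print(meets_minimum(installed_patches, "147440-24"))
```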

Slide 81

FSPM 4.0 for Solaris


Solaris 11 Minimum Patch Levels

• Minimum installed release: Solaris 11 11/11
• Patches for Solaris 11 11/11
– Solaris 11 11/11 SRU10.5 (Support Repository Update 10.5)
• Patches for Solaris 11.1 (Update 1)
– No additional SRU is required


And this shows the minimum patch levels required for Solaris 11.

Slide 82

FSPM 4.0 for Solaris – booting from a LUN

Set the Boot LUN option


• When creating or modifying a LUN
on an Oracle FS System, there is an
option to ‘Use as a Boot LUN’
• This option must be selected if
Solaris is to be booted from an
Oracle FS System LUN
• Boot will fail without this option


A change with the Oracle FS is a new option that enables you to boot from an Oracle FS LUN
on Solaris. When creating or modifying a LUN, check the “Use as a Boot LUN” box to enable
this option. Solaris will not boot from an Oracle FS system unless this option is selected.

Slide 83

FSPM 4.0 for Solaris


FSPM Load Balancing

• Solaris I/O Multipathing does not allow load balancing config per LUN
– FSPM Load Balancing setting does not work
• On Pillar Axiom
– Load Balancing can be set in the GUI, but the setting is ignored
– Setting displayed in the GUI does not reflect what’s in use on the host
• On Oracle FS System
– GUI does not allow Load Balancing to be set for Solaris hosts
– Setting displayed in GUI correctly reports what’s in use on the host


Solaris multipathing does not support load balancing per LUN. On an Oracle FS system, there is
not an option to change load balancing. The Pillar Axiom system offers this option, but the
option is ineffective. The Oracle FS will show whatever configuration has been set on the
Solaris box whereas the Pillar Axiom load balancing setting information is incorrect.

Slide 84

FSPM 4.0 for Solaris


Main Takeaways

• FSPM on Solaris is not involved in multipathing on the data path
• If there is a problem with multipath SAN access from a Solaris host
– If it’s not a problem in the Oracle FS System, then it’s a Solaris problem
– Should be investigated and handled by Solaris support
• Installing FSPM checks that the host has the right patches installed
– Patches required for multipathing to work properly, with or without FSPM


The main take-away is that FSPM is not involved in multipathing on Solaris. If there is a problem
with multipathing on Solaris, and you know that it is not a problem in the Oracle FS or Pillar
Axiom system, then it is a problem in Solaris and it should be dealt with by the Solaris team.
Installing FSPM on Solaris checks that Solaris is at the right patch level that it needs to be to
talk to a Pillar Axiom or Oracle FS system.

Slide 85

FSPM – AIX


Now we will review AIX.

Slide 86

FSPM 4.0 for AIX


AIX 5.3 and Later

• Supports 5.3 TL 12 and later, 6.1 TL 5 and later, all levels of 7.1
• No major changes since APM 3.1
– Apart from support for the Oracle FS System …
• FSPM Driver is an MPIO Path Control Module (PCM)
– Integrates into the AIX Multipath I/O framework
– Well-defined interface for third parties
– Direct control of pathing and error handling for each I/O
– Architecture very similar to Windows


We support AIX 5.3 at technology level 12 and later, AIX 6.1 at technology level 5 and later,
and all levels of AIX 7.1. There have been no major changes since APM 3.1, apart from support
for the Oracle FS system. On AIX, the driver is an MPIO Path Control Module which fits into the
AIX multipath I/O framework, almost identically to Windows. The framework provides a
well-defined interface for third parties, and gives the driver direct control of pathing and error
handling for each I/O.

Slide 87

FSPM 4.0 for AIX


Changes since APM 3.1

• Various advanced AIX features are now supported
– Fibre Channel over Ethernet
– Dynamic Reconfiguration (move adapters in live systems)
– Virtual Fibre Channel (NP-IV)
– Boot from iSCSI and FCoE
– Live Partition Mobility (with some restrictions)
– LUNs larger than 2 Terabytes


The changes since APM 3.1 add support for more advanced features that have been released
on AIX. These include Fibre Channel over Ethernet; dynamic reconfiguration, which moves
adapters between partitions on live systems; Virtual Fibre Channel (NP-IV); booting from iSCSI
and FCoE; Live Partition Mobility, where a live virtual partition can move between two physical
machines (with some restrictions); and LUNs larger than 2 terabytes.

Slide 88

FSPM 4.0 for AIX

Architecture
• 4 – FSPM daemon
• 9 – AIX disk driver
• 10 – AIX MPIO Framework with FSPM Path Control Module


The architecture on AIX is very similar to the architecture on Windows: the FSPM Path Control
Module plugs into the AIX MPIO framework.

Slide 89

FSPM 4.0 for AIX – Partitioning


Overview

• AIX is often run on big machines which are partitioned


– WPAR – Workload Partition – software concept, doesn’t impact FSPM
– LPAR – Logical Partition – physical sharing out of resources
– VIOS – Virtual I/O Server – a dedicated LPAR providing I/O to others
• VIOS is based on AIX with different command set; it can run FSPM
– FSPM supports Virtual I/O Server 2.2.1 and later
– Can be thought of as FSPM running in the hypervisor
– Terminates SCSI, makes LUN available as ‘virtual SCSI disk’


AIX tends to run on very big machines, which can be partitioned. A WPAR (workload partition) is
a software concept that does not impact FSPM. An LPAR (logical partition) is a physical sharing
out of resources such as Fibre Channel HBAs. A Virtual I/O Server (VIOS) is a dedicated LPAR
that provides I/O to other partitions.

FSPM supports Virtual I/O Server 2.2.1 and later. VIOS is based on AIX but has a different
command set, and it can run FSPM. This is the equivalent of FSPM running in the hypervisor,
which we discussed for Windows. VIOS terminates SCSI and makes LUNs available to other
partitions as virtual SCSI disks.

Slide 90

FSPM 4.0 for AIX – Accessing LUNs


Available Mechanisms for FSPM Multipathing

• FC/FCoE from adapters assigned to the LPAR


• NP-IV Virtual Fibre Channel
– Live Partition Mobility not currently supported with NP-IV
• iSCSI adapters assigned to the LPAR
• iSCSI software initiator
• Virtual SCSI Disks presented from one or more VIOSes
– Doesn’t need FSPM in the consumer, must be running in the VIOSes
– Can use Native multipathing across multiple VIOSes


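For FSPM multipathing, you can use Fibre Channel or Fibre Channel over Ethernet from
adapters assigned to the LPAR, NP-IV Virtual Fibre Channel (Live Partition Mobility is not
currently supported with NP-IV), iSCSI adapters assigned to the LPAR, the iSCSI software
initiator, or virtual SCSI disks presented from one or more Virtual I/O Servers. With virtual SCSI
disks, FSPM must be running in the Virtual I/O Servers but is not needed in the partition that
consumes the disks.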

Slide 91

FSPM 4.0 for AIX – Accessing LUNs


Protocol Combinations

• Each LUN can be accessed through only one type of protocol


– no protocol combinations supported (AIX restriction)
• Can use any one of
– Physical FC and FCoE
– virtual FC and FCoE, NP-IV
– Software and hardware iSCSI
– Virtual SCSI disk from one or more VIOSes


Each LUN can be accessed through only one type of protocol, because AIX does not support a
combination of protocols. A LUN can be accessed through physical Fibre Channel and FCoE,
through Virtual Fibre Channel, through iSCSI, or as a virtual SCSI disk.
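
The one-protocol-per-LUN rule can be sketched as a grouping check. This is a minimal illustration under stated assumptions: the short path-type labels (`fc`, `npiv`, and so on) and the function names are hypothetical, and the grouping simply follows the four "any one of" entries on the slide.

```python
# Protocol groups from the slide; a LUN may use paths from only one group.
# The keys are hypothetical labels invented for this sketch.
PROTOCOL_GROUPS = {
    "fc": "physical FC/FCoE",
    "fcoe": "physical FC/FCoE",
    "npiv": "virtual FC (NP-IV)",
    "iscsi-sw": "iSCSI",
    "iscsi-hw": "iSCSI",
    "vscsi": "virtual SCSI disk",
}


def protocol_groups_for(path_types):
    """Return the set of protocol groups used by one LUN's paths."""
    return {PROTOCOL_GROUPS[path_type] for path_type in path_types}


def is_valid_combination(path_types):
    """Return True if every path to the LUN falls in the same protocol group."""
    return len(protocol_groups_for(path_types)) == 1
```

Under these assumptions, a LUN reached through both software and hardware iSCSI passes the check, while a LUN reached through Fibre Channel and iSCSI does not.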

Slide 92

FSPM 4.0 for AIX


AIX Configuration and Management Commands

• FSPM integrates fully into normal AIX configuration and management


• GUI
– smitty, SMIT, Web-based System Manager (WebSM)
• Command Line
– lsdev -C disk, lspath, lsattr -E -l hdisk#, chdev
• Monitor paths, check and set load balancing, …


FSPM integrates fully into normal AIX configuration and management, so all the standard AIX
configuration commands work. Using the various GUIs, pseudo-GUIs, and command-line tools,
you can monitor paths and check and set load balancing locally on the host.
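
As an illustration of path monitoring, the sketch below summarises text in the default `lspath` layout of status, disk name, and parent adapter. The sample text is invented for this example, the function names are hypothetical, and none of this is part of FSPM.

```python
from collections import defaultdict

# Invented sample in the default `lspath` layout: status, disk, parent adapter.
SAMPLE_LSPATH = """\
Enabled hdisk2 fscsi0
Enabled hdisk2 fscsi1
Failed  hdisk3 fscsi0
Enabled hdisk3 fscsi1
"""


def summarise_paths(lspath_output):
    """Count paths per disk and status, skipping lines without three fields."""
    summary = defaultdict(lambda: defaultdict(int))
    for line in lspath_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        status, disk, _parent = fields
        summary[disk][status] += 1
    return {disk: dict(counts) for disk, counts in summary.items()}


def disks_with_failed_paths(lspath_output):
    """Return the sorted names of disks that have at least one Failed path."""
    summary = summarise_paths(lspath_output)
    return sorted(disk for disk, counts in summary.items() if counts.get("Failed", 0) > 0)
```

On a real host, the text would come from running `lspath`; here `disks_with_failed_paths(SAMPLE_LSPATH)` picks out `hdisk3` from the invented sample.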

Slide 93

FSPM 4.0 for AIX – booting from a LUN

Set the Boot LUN option


• When creating or modifying a LUN on an Oracle FS System, there is an option to ‘Use as a Boot LUN’
• This option must be selected if AIX is to be booted from an Oracle FS System LUN
• Boot will fail without this option


And again, as with Solaris, you must set the “Use as a Boot LUN” option if you want to boot
from an Oracle FS LUN with AIX.

Slide 94

FSPM – HP-UX


Now we will take a look at HP-UX.

Slide 95

FSPM 4.0 for HP-UX


HP-UX 11i v3 (11.31) Update 3 and Later

• Supports HP-UX 11i v3 (11.31) Update 3 and later


– All platforms – PA-RISC and Itanium
• Few major changes since APM 2.1
– Apart from support for the Oracle FS System …
– iSCSI and FCoE support added
• FSPM “Driver” is the native HP-UX Multipathing
– Part of the new SCSI stack in HP-UX 11i v3
– No extensions or plug-ins from FSPM


We support HP-UX 11i v3 Update 3 and later, on all platforms (PA-RISC and Itanium).
Multipathing is native: FSPM uses HP-UX's own multipathing system, with no extensions or
plug-ins from FSPM.

Slide 96

FSPM 4.0 for HP-UX


HP-UX SCSI Features

• HP-UX SCSI implementation is … unlike all others


– Several unfortunate consequences
• HP-UX FC port WWNs may not show up in the Oracle FS GUI
• FSPM needs a LUN mapped to a host WWN to discover system
– Set up SAN and zoning
– Configure one of the host’s Port WWNs by hand if necessary
– Map a LUN to it
– Install FSPM, then the rest of the ports are configured automatically


The HP-UX SCSI implementation is different from all other implementations. HP-UX FC port
WWNs may not show up in the Oracle FS GUI by themselves, so you may need to configure
one by hand.

FSPM needs a LUN mapped to a host WWN in order to discover the system. Set up the SAN
and zoning, configure one of the host's port WWNs by hand if necessary, and map a LUN to it.
Then install FSPM, and the rest of the ports are configured automatically.

Slide 97

FSPM 4.0 for HP-UX – configure HP-UX mode

Set the HP-UX mode option


• Enables the host to access LUNs 1 to 4095
• There must not be a LUN visible to the host at LUN 0
• Very limited range of LUN numbers visible if this option is not selected
• Recommended for all HBAs


The HP-UX Compatibility Mode option must be set for the host. Otherwise, the host may be able
to access only LUN numbers 0 to 8, or an even narrower range. With HP-UX Compatibility
Mode enabled, LUNs 1 to 4095 can be used, but LUN 0 cannot, so no LUN may be visible to
the host at LUN 0.
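
The LUN-number rule can be sketched as a simple range check. This is a hedged illustration, not FSPM behaviour: the function name is hypothetical, and the 0 to 8 range used when the option is off is only the upper bound mentioned in these notes, since the real limit may be tighter.

```python
# LUN numbers usable with HP-UX Compatibility Mode enabled: 1 to 4095.
HPUX_MODE_LUNS = range(1, 4096)

# Assumed range without the option; the notes say it may be even tighter.
DEFAULT_MODE_LUNS = range(0, 9)


def lun_number_usable(lun_number: int, hpux_mode: bool) -> bool:
    """Return True if the LUN number falls in the range for the chosen mode."""
    if hpux_mode:
        return lun_number in HPUX_MODE_LUNS
    return lun_number in DEFAULT_MODE_LUNS
```

A check like this could flag a mapping at LUN 0 for a host that has the option enabled, which the slide says must be avoided.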

Slide 98

FSPM 4.0 for HP-UX


FSPM Load Balancing

• HP-UX Multipathing does not allow load balancing config per LUN
– FSPM Load Balancing setting does not work
• On Pillar Axiom
– Load Balancing can be set in the GUI, but the setting is ignored
– Setting displayed in the GUI does not reflect what’s in use on the host
• On Oracle FS System
– GUI does not allow Load Balancing to be set for HP-UX hosts
– Setting displayed in GUI correctly reports what’s in use on the host


Load balancing for HP-UX is the same as with Solaris. Load balancing is per host, not per LUN,
so it cannot be configured from the Oracle FS system.
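
The GUI behaviour listed on the slide for HP-UX hosts can be written down as a small lookup table. This is only a restatement of the slide in code form; the type, field, and function names are hypothetical.

```python
from typing import NamedTuple


class GuiLoadBalancing(NamedTuple):
    """How the GUI load balancing setting behaves for an HP-UX host."""
    can_be_set: bool            # does the GUI let you change the setting?
    setting_applied: bool       # does the host act on the GUI setting?
    display_matches_host: bool  # does the GUI show what the host really uses?


# Values transcribed from the slide; keys name the storage system type.
HPUX_GUI_BEHAVIOUR = {
    "Pillar Axiom": GuiLoadBalancing(True, False, False),
    "Oracle FS": GuiLoadBalancing(False, False, True),
}


def hpux_gui_load_balancing(system: str) -> GuiLoadBalancing:
    """Look up the GUI behaviour for an HP-UX host on the given system type."""
    return HPUX_GUI_BEHAVIOUR[system]
```

In both cases the setting is not applied on the host; the difference is whether the GUI lets you set it and whether the displayed value can be trusted.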

Slide 99


This concludes the Oracle Flash Storage Path Manager FSPM 4 course. Thank you for your
time.
