
CONNECTING THE SCATTERED PIECES FOR STORAGE REPORTING

Hok Pui Chan


Practice Consultant
Consulting, EMC Hong Kong
hokpui.chan@emc.com
Table of Contents
Introduction
Analyzing the Components and Inter-relationships of a Storage Infrastructure
Extracting and Transforming the Storage Configuration Data
  VMAX
    EMC ControlCenter
    EMC Solutions Enabler CLI
  SAN Switches/Fabrics
  Host Information
  Summary
Building the Storage Configuration Repository
  A Simple Solution
  A Scalable Solution
    Automation of Data Source Collections
    Extract Transform & Load Process
    Database
    OLAP Analysis
    Web Page
    Static Report
Real Life Use Cases of the Storage Configuration Repository
  Use Case #1 – Capacity Trending for Storage Array
  Use Case #2 – Capacity Used by Servers
  Use Case #3 – Storage/Fabric Migration and Data Center Relocation
  Use Case #4 – Highlighting the Exceptions
Practical Considerations
  Skill Set

2014 EMC Proven Professional Knowledge Sharing 2


  Difficulty Acquiring Server Resources
  Difficulty Acquiring Software Licenses
  On-going Maintenance
  Setting Up Data Import for New Storage Equipment
Future Extensions
  Include Performance Data into the Repository
  Correlate Other Non-Storage Related Data
Conclusion

Disclaimer: The views, processes or methodologies published in this article are those of the
author. They do not necessarily reflect EMC Corporation’s views, processes or methodologies.

Introduction
Every organization has its own unique needs and there are a lot of different types (and even
different brands) of storage equipment to be managed. In managing a sizable storage
infrastructure, an accurate, timely, and comprehensive storage reporting mechanism is
important to measure the performance and utilization of the storage infrastructure. Storage
reporting is also crucial for measuring the business value of storage equipment investments by
associating storage costs with different applications and business units.

At present, many companies rely on native tools (e.g. EMC Unisphere®, EMC ControlCenter®,
Cisco DCNM, etc.) to manage the storage infrastructure and extract information for reporting.
These tools do a very good job of helping storage administrators manage, and report on, one
particular type of equipment. However, these tools do not talk to
each other: the tools for managing fabrics only provide fabric/SAN switch-related information,
while the tools for storage arrays only provide array-specific information. Sometimes, answers to
common, simple questions cannot be easily produced by these tools, e.g. “What does the
end-to-end connectivity for this server look like?” or “When this server is retired, how much storage
can be freed up and re-purposed?” A holistic, interconnected dimension of storage
reporting should be made available to show the interrelationships from server to SAN switches,
SAN switches to storage array, and storage array to storage volumes provisioned.

Moreover, producing a meaningful report for senior IT management requires correlating
company-specific information with storage reports. For example, to justify storage infrastructure
investment, an IT manager might want to know how much storage resource is being used by a
particular project, application, or line of business. Since most storage reporting tools do not
provide functionality to include customized data inputs, storage administrators struggle to
produce these reports manually at regular intervals while trying to ensure the accuracy and
timeliness of such reports.

So, as the title suggests, all the essential elements for a comprehensive storage report are
available but scattered across different places or tools. What is needed is a way to
connect them and provide different dimensions for storage reporting, e.g. from the
project/business line perspective, from the server/application perspective and, of course, from the
traditional storage array perspective. To produce these correlated reports, a scalable and robust
Storage Configuration Repository is required as the foundation. This article
focuses on how a Storage Configuration Repository can be built.

An EMC Symmetrix® VMAX® SAN infrastructure is used to illustrate the technical details of
extracting the source data, processing it, storing it in the Storage Configuration
Repository, and producing the reports. These details include “What VMAX information can be
extracted, and how?” and “How can this information be transformed and imported into
the configuration repository?” Other practical considerations and future extensions are also
discussed.

Analyzing the Components and Inter-relationships of a Storage Infrastructure
Servers, SAN switches, and storage arrays are the essential components of a typical storage
infrastructure. These components are physically connected together.

Taking a deeper look, we find that all the control and configuration of the storage infrastructure
is actually built around one element: the World Wide Name (WWN). In simple words, the WWN
is a globally unique identifier of Fibre Channel ports (storage array front-end ports and server
Host Bus Adapter (HBA) ports) and of storage devices. WWNs are used to control which ports can be
interconnected with each other and which storage devices can be accessed by these
ports.

If we can gather the WWN relationships in a storage infrastructure, we can correlate them and present
a logical view of the whole storage infrastructure. For example, if we can identify the WWN of a
server HBA, the zoning configuration on the SAN switches, and the LUN masking information
on the array, we can easily determine how much storage a server is using as well as the
corresponding logical connection. When building the Storage Configuration Repository to
show the end-to-end relationships, we can focus on gathering information around the WWN
and correlating it. The following diagram illustrates how the WWN connects each
storage infrastructure component.

[Figure: how the WWN connects the storage infrastructure components. Labels: Production Server, HBA WWN, Fabric Login, Switch Port, SAN Switch, Storage Ports, LUN Masking, Storage Devices, Replication Relationships, Remote Storage Devices, DR Server.]
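
The WWN-centred correlation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries, the host name, the WWNs, and the device names are all hypothetical sample data, and a real repository would load them from the sources discussed in the following sections.

```python
# Hypothetical sample data; every name and value below is illustrative only.
HOST_HBAS = {"prodsrv01": ["10:00:00:00:c9:aa:bb:01"]}             # host -> HBA WWNs
FABRIC_LOGINS = {"10:00:00:00:c9:aa:bb:01": ("switchA", "fc1/1")}   # WWN -> (switch, port)
ZONES = {                                                           # zone -> member WWNs
    "z_prodsrv01_fa": {"10:00:00:00:c9:aa:bb:01", "50:00:09:72:08:12:34:58"},
}
MASKING = [  # (initiator WWN, storage port WWN, device, size in GB)
    ("10:00:00:00:c9:aa:bb:01", "50:00:09:72:08:12:34:58", "0A1B", 50),
    ("10:00:00:00:c9:aa:bb:01", "50:00:09:72:08:12:34:58", "0A1C", 100),
]


def end_to_end(host):
    """Follow the host's HBA WWNs through fabric login, zoning, and masking."""
    rows = []
    for hba in HOST_HBAS.get(host, []):
        switch, port = FABRIC_LOGINS.get(hba, (None, None))
        for initiator, storage_port, device, size_gb in MASKING:
            if initiator != hba:
                continue
            # The path is usable only if some zone contains both WWNs.
            zoned = any(
                hba in members and storage_port in members
                for members in ZONES.values()
            )
            rows.append({
                "host": host, "hba": hba, "switch": switch, "switch_port": port,
                "storage_port": storage_port, "device": device,
                "size_gb": size_gb, "zoned": zoned,
            })
    return rows


if __name__ == "__main__":
    for row in end_to_end("prodsrv01"):
        print(row)
```

Because every lookup is keyed on a WWN, the same join answers both questions from the Introduction: the connectivity path of a server, and the storage that would be freed if it were retired.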

In the following sections, we will discuss what information we should extract from the storage
infrastructure components to build an end-to-end relationship.

Extracting and Transforming the Storage Configuration Data
In this section, we discuss the information required to be collected from the storage
infrastructure components to:

1. Connect the interrelationships of the components
2. Store necessary configurations for reporting

A VMAX SAN infrastructure is used for illustration.

VMAX
For VMAX, there are two commonly used data sources:

• EMC ControlCenter (ECC)
• EMC Solutions Enabler CLI (SE)

Information to be collected:

• Masking information
• Device information
• SRDF® relationship
• FA port information
• Pool information
• FAST™ VP information

EMC ControlCenter
A component in EMC ControlCenter (ECC)—StorageScope—has its own Oracle database as
the data repository. It provides database views for accessing the data. We can use the views as
the data sources for our purposes. A database connection, e.g. ODBC, JDBC, etc., is needed in
order to access the database using SQL queries to those views. After that, the data can be
stored and manipulated further. For details about accessing these database views and the data
available, please refer to StorageScope documentation.

This method became popular a few years ago as most Symmetrix environments had ECC
installed and the information provided by the views is quite comprehensive. However, other
products like Unisphere for VMAX and Symmetrix Management Console (SMC) are starting to
replace ECC as the most popular management tool for Symmetrix environments. Hence, we do
not drill into the details of getting the required information through ECC StorageScope in this
article.

EMC Solutions Enabler CLI


Solutions Enabler CLI is installed in almost all Symmetrix environments. It is a good tool
for managing the Symmetrix, especially when repeated commands and scripting are required.
To collect storage configurations, we need to execute a set of symcli commands
and parse their text output.

Solutions Enabler CLI can produce command output in XML format when the
“-output xml_element” option is added to a symcli command. XML output makes
automated data importing much easier.

Here are some sample commands that are useful for extracting VMAX configuration data in XML
format:

• Device information
  o symdev -sid <sid> list -v -output xml_element > symdev.xml
• SRDF relationship
  o symrdf -sid <sid> list -v -output xml_element > symrdf.xml
• Pool information
  o symcfg -sid <sid> list -pool -thin -mb -output xml_element > pool.xml
• Masking information
  o symaccess -sid <sid> list view -v -detail -output xml_element > masking.xml
• Device and Pool relationships
  o symcfg -sid <sid> list -tdev -detail -output xml_element > tdev.xml

We will use this approach for extracting configuration data from a Symmetrix VMAX.
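
As a rough illustration of why XML output simplifies importing, the following Python sketch flattens each record in an XML document into one CSV row using only the standard library. The element names in SAMPLE_XML are assumptions made for the example, not a verified Solutions Enabler schema; inspect a real output file and adjust the record tag before relying on it.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative XML only: the element names are assumptions, not the
# verified Solutions Enabler schema.
SAMPLE_XML = """
<SymCLI_ML>
  <Symmetrix>
    <Device>
      <Dev_Info>
        <dev_name>0A1B</dev_name>
        <configuration>TDEV</configuration>
      </Dev_Info>
      <Capacity>
        <megabytes>51200</megabytes>
      </Capacity>
    </Device>
  </Symmetrix>
</SymCLI_ML>
"""


def flatten(element, prefix=""):
    """Return {dotted.path: text} for every leaf element below `element`.

    If a leaf tag repeats inside one record, the last value wins.
    """
    row = {}
    for child in element:
        path = prefix + child.tag
        if len(child):  # the child has its own children, so recurse
            row.update(flatten(child, path + "."))
        else:
            row[path] = (child.text or "").strip()
    return row


def xml_to_csv(xml_text, record_tag):
    """Turn every <record_tag> element into one CSV row."""
    root = ET.fromstring(xml_text.strip())
    rows = [flatten(record) for record in root.iter(record_tag)]
    columns = sorted({key for row in rows for key in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(xml_to_csv(SAMPLE_XML, "Device"))
```

Flattening by path avoids hard-coding column names, so the same function could be pointed at different command outputs by changing only the record tag.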

SAN Switches/Fabrics
SAN switches and fabrics are the core connectivity component within a storage infrastructure,
providing physical connectivity (through physical ports) and logical connectivity (through zone
and zoneset configurations).

Configurations to be collected to build the end-to-end relationship:

• Fabric logins – the relationship between WWN and SAN switch port
• Zones and zonesets – the relationship between HBA and storage ports

SAN switches usually provide a command line interface, and command outputs can be collected
for further processing. Here are some examples (Cisco MDS syntax):

• Fabric logins
  o show flogi database
• Zone information
  o show zoneset active
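
Collected command output is plain text, so it has to be parsed before it can be loaded. The sketch below shows one way to do this for fabric logins in Python. The sample text is made up and only modelled on the usual column layout of the command (interface, VSAN, FCID, port name, node name); the regular expression is an assumption that should be checked against real output from your switches.

```python
import re

# Made-up sample modelled on the usual `show flogi database` column layout.
SAMPLE_FLOGI = """\
--------------------------------------------------------------------------------
INTERFACE        VSAN    FCID           PORT NAME               NODE NAME
--------------------------------------------------------------------------------
fc1/1            100     0x640000  10:00:00:00:c9:aa:bb:01 20:00:00:00:c9:aa:bb:01
fc1/2            100     0x640100  50:00:09:72:08:12:34:58 50:00:09:72:08:12:34:00

Total number of flogi = 2.
"""

WWN = r"(?:[0-9a-f]{2}:){7}[0-9a-f]{2}"
FLOGI_ROW = re.compile(
    rf"^(?P<interface>\S+)\s+(?P<vsan>\d+)\s+(?P<fcid>0x[0-9a-f]+)\s+"
    rf"(?P<pwwn>{WWN})\s+(?P<nwwn>{WWN})\s*$",
    re.IGNORECASE,
)


def parse_flogi(text, switch):
    """Return one dict per login row; header and summary lines are skipped."""
    logins = []
    for line in text.splitlines():
        match = FLOGI_ROW.match(line.strip())
        if match:
            row = match.groupdict()
            row["vsan"] = int(row["vsan"])
            row["switch"] = switch  # the output itself does not name the switch
            logins.append(row)
    return logins


if __name__ == "__main__":
    for login in parse_flogi(SAMPLE_FLOGI, "switchA"):
        print(login["switch"], login["interface"], login["pwwn"])
```

The port WWN (`pwwn`) is the value to match against HBA and storage port WWNs elsewhere in the repository.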

Some storage administrators might think that it is less important to collect SAN and
fabric information. The configuration information they usually need for reporting is just port
utilization and zone information, which does not change frequently and can easily be
tracked manually.

Host Information
Host information includes host-related attributes like OS and Host model. The most important
information in this discussion is the HBA WWN(s) of a host. We need to have “Host Name to
HBA WWN” mappings in order to link up all other storage infrastructure components.

One method is to rely on tools. Most EMC storage administrators should be familiar with
the emcgrab/emcreport tools, which collect host information for a configuration snapshot or
troubleshooting. EMC also provides a web portal called E-Lab Advisor (ELA) that accepts
emcgrab/emcreport output file uploads. In this web portal, a user can see an inventory of hosts
whose emcgrab/emcreport output files have been uploaded. A user can then select the hosts and
generate a SAN Summary report in Excel format. The SAN Summary report presents a list of
hosts in tabular form with WWN(s) as one of the columns.

Some organizations require an internal approval process before
emcgrab/emcreport can be run. Consequently, storage administrators often try to avoid using
emcgrab/emcreport unless it is really necessary, e.g. for troubleshooting issues. Some
storage administrators prefer to keep track of this mapping manually; they need to collect this
information anyway for storage provisioning and zoning.

Another possibility is to use configuration management tools like Microsoft System Center
Configuration Manager (SCCM).

Summary
Below is a summary of information to be collected and the suggested frequency of data
collection. (We will discuss “Simple Solution” and “Scalable Solution” in the next section.)

Information to be Collected   Data Source                                   Data Collection Frequency   Data Collection Frequency
                                                                            (Simple Solution)           (Scalable Solution)
VMAX configs                  Solutions Enabler CLI output in XML format    Weekly                      Daily
Switch/SAN fabric             CLI command outputs                           Weekly                      Daily
Host Information              Manual mapping input/SAN Summary              Ad-hoc                      Ad-hoc

Building the Storage Configuration Repository
Having discussed the data to be collected to correlate storage infrastructure
components, this section discusses how to make use of this data.

We will discuss two alternatives. One is simpler but with less functionality while the other is
more comprehensive and can provide more opportunities for future extensions.

A Simple Solution
This solution focuses only on storage array configurations (not much correlation with other
storage infrastructure components) and is only semi-automated. It requires some manual tasks
and provides limited functionality. However, storage administrators might find it very useful for
keeping track of storage configuration changes and producing regular storage utilization reports. This
is especially true for a storage environment with fewer than five arrays.

In the last section, we discussed how to collect VMAX configuration data in XML files. Now, we
can use XSL transformation to extract and transform these XML files into simpler formats such
as CSV and HTML which can be imported to Excel or directly used for reporting.

The diagram below illustrates such a transformation using the XML command output of:

“symtier list -v -output xml_element > symtier.xml”

[Figure: the symcli XML output and an XSL file are fed into a transform step, which produces CSV output.]

After collecting output for several weeks, a report of the tier utilization trend can be easily made.

Some open source XML editors can provide XML/XSL editing and transformation tools.
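
The tier utilization trend mentioned above can be derived from the weekly CSV files with very little code. The Python sketch below is a minimal example under stated assumptions: the snapshot dates, tier names, and the column names `tier`, `used_gb`, and `total_gb` are hypothetical and depend entirely on how the XSL transformation was written.

```python
import csv
import io
from collections import defaultdict

# Hypothetical weekly CSV snapshots keyed by collection date; the column
# names are assumptions about what the XSL transformation emits.
SNAPSHOTS = {
    "2014-01-06": "tier,used_gb,total_gb\nFC_TIER,4000,10000\nSATA_TIER,12000,40000\n",
    "2014-01-13": "tier,used_gb,total_gb\nFC_TIER,4500,10000\nSATA_TIER,13000,40000\n",
}


def tier_trend(snapshots):
    """Return {tier: [(date, percent_used), ...]} ordered by date."""
    trend = defaultdict(list)
    for date in sorted(snapshots):  # ISO dates sort chronologically
        for row in csv.DictReader(io.StringIO(snapshots[date])):
            percent = 100.0 * float(row["used_gb"]) / float(row["total_gb"])
            trend[row["tier"]].append((date, round(percent, 1)))
    return dict(trend)


if __name__ == "__main__":
    for tier, points in tier_trend(SNAPSHOTS).items():
        print(tier, points)
```

In practice the snapshots would be read from dated files on disk rather than from an in-memory dictionary.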

A Scalable Solution
A more scalable option is recommended for storage infrastructures with more than five arrays.
We can build a web application with a database to store historical storage configurations.
Compared with the simple solution discussed above, this option is more robust and more automated for
data collection and reporting. The diagram below illustrates the high-level architecture of this
option for the Storage Configuration Repository.

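[Figure: high-level architecture of the scalable solution. Solutions Enabler XML files, SAN switch command output, and the storage administrator's Host-WWN mapping feed an Extract, Transform & Load (ETL) process that loads the database; the database serves OLAP analysis, static reports, and web pages.]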

Automation of Data Source Collections


The data collection process, except for the manual input of the Host and WWN mapping, can be
automated using scheduled jobs that execute the commands daily. The output files are then
transferred to a shared location, e.g. a shared drive, for further processing.

Extract Transform & Load Process


Simple programs can be developed to parse and import the file contents into the database.
Some tools like MS SQL Server Integration Services can be used to design and implement the
Extract Transform & Load (ETL) process easily.

Database
The database is the core component of the storage configuration repository. It stores the most
recent, as well as historical data, for reporting. The database structure reflects the relationship
between storage infrastructure components and configurations. A simplified version of the
Entity-Relationship diagram of the database is shown below.
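
To make the idea concrete, here is a minimal sketch of what such a structure might look like, using Python's built-in sqlite3 module. The table and column names are hypothetical and are not taken from the author's diagram; a production repository would hold many more entities (switch ports, zones, pools, replication pairs) and would likely run on a server database.

```python
import sqlite3

# Hypothetical, heavily simplified schema; not the author's actual design.
SCHEMA = """
CREATE TABLE host    (host_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE hba     (wwn TEXT PRIMARY KEY,
                      host_id INTEGER REFERENCES host(host_id));
CREATE TABLE device  (array_id TEXT, dev_name TEXT, size_mb INTEGER,
                      PRIMARY KEY (array_id, dev_name));
CREATE TABLE masking (snapshot_date TEXT, hba_wwn TEXT REFERENCES hba(wwn),
                      array_id TEXT, dev_name TEXT);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO host VALUES (1, 'prodsrv01')")
    conn.executemany("INSERT INTO hba VALUES (?, ?)", [
        ("10:00:00:00:c9:aa:bb:01", 1),
        ("10:00:00:00:c9:aa:bb:02", 1),
    ])
    conn.executemany("INSERT INTO device VALUES (?, ?, ?)", [
        ("000195700123", "0A1B", 51200),
        ("000195700123", "0A1C", 102400),
    ])
    # Both HBAs see both devices (multipathing) in this snapshot.
    conn.executemany("INSERT INTO masking VALUES (?, ?, ?, ?)", [
        ("2014-01-06", wwn, "000195700123", dev)
        for wwn in ("10:00:00:00:c9:aa:bb:01", "10:00:00:00:c9:aa:bb:02")
        for dev in ("0A1B", "0A1C")
    ])
    return conn


def capacity_by_host(conn, snapshot_date):
    """Return {host name: MB masked to it}, counting each device once."""
    sql = """
        SELECT name, SUM(size_mb) FROM (
            SELECT DISTINCT h.name AS name, d.array_id, d.dev_name,
                            d.size_mb AS size_mb
            FROM masking m
            JOIN hba b    ON b.wwn = m.hba_wwn
            JOIN host h   ON h.host_id = b.host_id
            JOIN device d ON d.array_id = m.array_id
                         AND d.dev_name = m.dev_name
            WHERE m.snapshot_date = ?
        ) GROUP BY name
    """
    return dict(conn.execute(sql, (snapshot_date,)).fetchall())


if __name__ == "__main__":
    print(capacity_by_host(build_demo_db(), "2014-01-06"))
```

Keeping a `snapshot_date` on the configuration rows is one simple way to retain history; the query is parameterized, so the same pattern can take a hostname or date supplied by a user.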

OLAP Analysis
Once the database is constructed, online analytical processing (OLAP) cubes can be designed
to provide interactive, responsive, and user-friendly access to the data inside the database. For
example, MS SQL Server Analysis Services is a popular tool for building OLAP cubes, and MS Excel
pivot tables can provide a familiar interface for accessing information in the OLAP cubes.

However, building OLAP cubes requires specialized knowledge and may not be easy.

Web Page
It is very common to use a web interface to access a database. Everyone is familiar with web
browsers, and web pages are not very difficult to build. With a web interface, information in the
database can be shared easily. A web page can accept a user’s input parameters, e.g.
hostname, as query or filtering criteria.

Static Report
Sometimes, a storage administrator may need regular reports on the storage infrastructure.
The formats of these reports are not expected to change frequently. Hence, report templates
can be constructed and reports can be generated regularly to a shared location or even
published to a web portal. Here are some examples of static reports:

• Storage pool utilization report with history
• SAN switch port utilization report
• Hosts with the highest storage capacity subscriptions
• Fan-out ratio of storage front-end ports

Real Life Use Cases of the Storage Configuration Repository
This section will discuss how the Storage Configuration Repository helps in real life situations.

Use Case #1 – Capacity Trending for Storage Array


Today, storage pools with thin provisioning are usually configured on newly acquired arrays. It is
important to keep track of pool utilization and initiate the disk upgrade procurement process in
time. Storage administrators are usually required to look at capacity statistics for an individual array,
for the arrays in a site, and for all the arrays in all sites. The Storage Configuration Repository can
provide capacity usage history, and a capacity report with multiple dimensions can be generated
regularly.

Use Case #2 – Capacity Used by Servers


In real life, storage administrators need statistics about the capacity used by an individual
server or a group of servers. They want to know the capacity used not only in the
production site but also in the DR site. From the interrelationships that we built in the repository,
we can easily identify the storage devices being used by the production servers and the DR
servers through the device replication relationships.

However, there are two major considerations here. The first is to cater for clustered servers. If we
only aggregate the storage capacity used by a list of servers, the shared devices used by
clusters could be double counted. Consider the following scenario:

• A 2-node cluster
• Using a shared disk 50GB in size
• Node#1 and Node#2 both “see” the 50GB disk

If we just aggregate the storage capacity used by Node#1 and Node#2, the total capacity will
become 100GB instead of the real usage of 50GB. Ideally, the reporting/query facility should be
able to “de-dupe” these capacities.
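
The 2-node scenario above can be reproduced in a few lines of Python. This is a sketch only: the node names, array serial number, and device name are hypothetical, and the key assumption is that a device is identified by the pair (array, device name), so the same shared disk seen by two nodes collapses into one entry.

```python
# Hypothetical data: host -> {(array serial, device name): size in GB}.
HOST_DEVICES = {
    "node1": {("000195700123", "0A1B"): 50},
    "node2": {("000195700123", "0A1B"): 50},  # the same shared cluster disk
}


def naive_total_gb(hosts):
    """Sum per-host capacity; shared devices are counted once per host."""
    return sum(size for host in hosts for size in HOST_DEVICES[host].values())


def deduped_total_gb(hosts):
    """Count each (array, device) pair once across all the given hosts."""
    unique = {}
    for host in hosts:
        unique.update(HOST_DEVICES[host])
    return sum(unique.values())


if __name__ == "__main__":
    cluster = ["node1", "node2"]
    print("naive:", naive_total_gb(cluster), "GB")
    print("de-duped:", deduped_total_gb(cluster), "GB")
```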

Another consideration is applications that do not use array-based replication for DR, e.g. some
databases that use native data replication. In such cases, we cannot get the hostname and the
corresponding capacity used in the DR site from the repository. We need to either store these
relationships in the repository or include DR hostnames in the query/reporting and make use of
the “de-dupe” mechanism.

Use Case #3 – Storage/Fabric Migration and Data Center Relocation


In a typical storage and/or fabric migration or Data Center Relocation project, the storage
infrastructure will be very dynamic. It is extremely difficult and error-prone to manually track
inventory and migration status, especially if it is a large scale migration involving multiple arrays,
fabrics, and sites. Also, the migration project team may need to make a lot of migration
priority/scheduling decisions based on some “what-if” analysis results.

During the planning phase of a migration, we can rely on the query/reporting functionality to
provide immediate results for decisions. As data is refreshed regularly, the repository can
provide the most up-to-date configuration snapshot for reporting.

When executing the migration, storage administrators can rely on the repository to provide
detailed storage configurations. For example, the repository contains information that can be
used to list the devices, the local and remote replication relationships, related cluster nodes, and
DR hostnames in a migration event. It is particularly useful for some complex storage
environments where a host is using multiple arrays with some replicated and some non-
replicated storage devices.

The migration progress can also be tracked. Based on the most up-to-date configurations,
statistics of the host inventory in both the new and old environments can be easily extracted and
presented in migration status dashboards.

Use Case #4 – Highlighting the Exceptions


Misconfigurations or non-standard configurations can be highlighted by extracting configuration
data using pre-defined criteria. For example, it may be a misconfiguration if a storage device
does not have a remote replication relationship while the other devices belonging to the same
host do. Another example is to check whether the LUN masking configurations align with the
zoning configurations: a missing zone that makes a LUN masking entry ineffective can be
highlighted.
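
Both checks can be expressed as simple set operations once the data is in one place. The Python sketch below is illustrative only; the WWN placeholders, host names, and device names are hypothetical, and real criteria would be defined by each organization's standards.

```python
# Hypothetical sample data; short strings stand in for real WWNs.
MASKING = [  # (initiator WWN, storage port WWN, device)
    ("hbaA", "fa1", "0A1B"),
    ("hbaA", "fa2", "0A1C"),
]
ZONES = {"z_hbaA_fa1": {"hbaA", "fa1"}}  # zone -> member WWNs
HOST_DEVICES = {"prodsrv01": {"0A1B", "0A1C"}, "testsrv01": {"0B00"}}
REPLICATED = {"0A1B"}  # devices that have a remote replication relationship


def masking_without_zone(masking, zones):
    """Return masking entries whose initiator and storage port share no zone."""
    return [
        entry for entry in masking
        if not any(entry[0] in members and entry[1] in members
                   for members in zones.values())
    ]


def partially_replicated_hosts(host_devices, replicated):
    """Return {host: unreplicated devices} where only some devices replicate."""
    exceptions = {}
    for host, devices in host_devices.items():
        missing = devices - replicated
        if missing and missing != devices:  # some, but not all, are missing
            exceptions[host] = sorted(missing)
    return exceptions


if __name__ == "__main__":
    print(masking_without_zone(MASKING, ZONES))
    print(partially_replicated_hosts(HOST_DEVICES, REPLICATED))
```

A host with no replicated devices at all is deliberately not flagged here, on the assumption that it is intentionally unprotected; that rule is a design choice and may differ between environments.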

Practical Considerations
If you decide to build your own Storage Configuration Repository, look at the practical
considerations discussed in this section.

Skill Set
Storage administrators are usually not experts in application development. Building the Storage
Configuration Repository requires programming, web framework, and database design skills.
Some organizations have chosen to outsource this initiative to external professional services.

Difficulty Acquiring Server Resources


Demand for a Storage Configuration Repository usually originates from the storage team.
However, the storage team does not have control over server provisioning. In some
organizations, the storage team may need to provide extensive justification and go through a
lengthy approval process to get a server (either a physical server or a VM). Some storage teams
have given up on getting a server-grade environment; instead, they just use old PCs to run
their repository.

Difficulty Acquiring Software Licenses


Usually, the software used by the storage team is acquired together with the storage hardware.
To build the Storage Configuration Repository, a database license, e.g. an MS SQL Server license,
may need to be acquired. However, there is usually no precedent for the storage team to
acquire such licenses. There may be no mechanism for charging the licensing cost back to the
storage team, or doing so may be very difficult.

To avoid these “troubles”, some storage teams have decided to use open source software and
platforms. For example, some choose PostgreSQL, Java, and Tomcat.

On-going Maintenance
Even when construction of the Storage Configuration Repository is outsourced to professional
services, the storage team remains responsible for the on-going maintenance and
troubleshooting. Maintenance tasks could include backup, performance monitoring, and
ensuring the success of data import jobs. In addition, issues that require troubleshooting
could occur occasionally. For example, a scheduled job for extracting and importing storage
configurations could suddenly stop working for no apparent reason.

Setting Up Data Import for New Storage Equipment
During initial setup of the Storage Configuration Repository, data import and processing are
configured for every piece of existing storage equipment. However, after a few years, new
storage equipment will be acquired to replace the old. Even if the repository was built in-house,
there is no guarantee that anyone will remember how to set up the data import for storage
equipment after a few years’ time. So, it is very important to make sure every procedure is well
documented.

Future Extensions
After building the Storage Configuration Repository as a foundation, we can continue to
enhance the repository. In this section, we discuss some possible enhancements that can be
made to provide extra value.

Include Performance Data into the Repository


In the above discussion, we focused on importing configuration data. However, it is also
possible to import storage performance data.

At present, a question like “What is the 95th percentile of IOPS that the server produced?” is
difficult to answer because we are not talking about the 95th percentile of each individual device.
Rather, we need to look at the performance data for the whole server. We cannot just calculate
the 95th percentile for each storage device and sum the results, because the devices do not
necessarily peak at the same time. Instead, the following steps are needed:

• Look at which storage devices the server is using and extract the corresponding
  performance data (IOPS)
• Aggregate the IOPS values to get the total IOPS generated by the server at each time
  interval (e.g. every 5 minutes)
• Use statistical functions to calculate the 95th percentile of the aggregated IOPS values

It is impractical to do this manually. However, after importing the performance data into the
repository, reports can easily be made to answer questions like this.
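
The three steps above can be sketched in Python as follows. The sample data is made up so that two devices peak at different times, and the percentile uses the nearest-rank method, which is one of several common definitions and is chosen here only for simplicity.

```python
from collections import defaultdict


def percentile(values, pct=95):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * N)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no samples")
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def server_percentile(samples, server_devices, pct=95):
    """Sum IOPS over the server's devices per interval, then take the percentile."""
    totals = defaultdict(int)
    for interval, device, iops in samples:
        if device in server_devices:
            totals[interval] += iops
    return percentile(totals.values(), pct)


# Hypothetical 5-minute samples: (interval, device, IOPS). Device 0A1B is
# busy in the first two intervals, device 0A1C in the last two.
SAMPLES = []
for t in range(20):
    SAMPLES.append((t, "0A1B", 1000 if t < 2 else 100))
    SAMPLES.append((t, "0A1C", 1000 if t >= 18 else 100))

if __name__ == "__main__":
    devices = {"0A1B", "0A1C"}
    per_device_sum = sum(
        percentile([iops for _, dev, iops in SAMPLES if dev == name])
        for name in devices
    )
    print("sum of per-device 95th percentiles:", per_device_sum)
    print("95th percentile of server total:", server_percentile(SAMPLES, devices))
```

With this data the sum of the per-device percentiles is 2000 IOPS, while the server never exceeds 1100 IOPS in any interval, which is why the aggregation has to happen before the percentile is taken.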

Storage administration tools like Unisphere for VMAX can schedule performance data exports.

Correlate Other Non-Storage Related Data
In storage reporting, we may need to summarize storage use for an individual business line,
project, or application. This information is useful for chargeback and management reporting as
well. Non-storage related information can also be included in the repository. For example, the
servers that a particular application is using can be specified so that a report can be generated
from the application’s perspective.

During a storage migration project, migration-related information (e.g. the migration date) can
be included in the repository. For example, the total capacity to be migrated can be easily
obtained when there is a change to the migration schedule, e.g. when some servers are
re-assigned to another migration event. This information is extremely useful for providing quick
answers during migration planning.

For manual input, we may need to set up additional data sources for data import, e.g. manual
construction of CSV files containing the mapping between projects and servers. Another way is
to provide a web interface to facilitate editing of these manual inputs and save them to the
repository.

Conclusion
This article discussed the value of having a Storage Configuration Repository and how to
establish it in practice. After reading the article, the audience should have an idea,
along with some technical details, of how to build their own storage configuration
repository and reporting facilities.

EMC believes the information in this publication is accurate as of its publication date. The
information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION
MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO
THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an
applicable software license.
