Chan-Connecting The Scattered Pieces For Storage Reporting
VMAX
EMC ControlCenter
Summary
Database
Conclusion
Disclaimer: The views, processes or methodologies published in this article are those of the
author. They do not necessarily reflect EMC Corporation’s views, processes or methodologies.
At present, many companies rely on native tools (e.g. EMC Unisphere®, EMC ControlCenter®,
Cisco DCNM) to manage the storage infrastructure and extract information for reporting.
These tools do a very good job of helping storage administrators manage, and report on, one
particular type of equipment. However, they do not talk to each other: the tools for managing
the fabric provide only fabric/SAN switch-related information, while the tools for a storage
array provide only array-specific information. As a result, answers to common, simple questions
cannot be easily produced, e.g. “What does the end-to-end connectivity for this server look
like?” or “When this server is retired, how much storage can be freed up and re-purposed?”
Storage reporting needs a holistic, interconnected view that shows the relationships from
server to SAN switches, from SAN switches to storage array, and from storage array to the
storage volumes provisioned.
So, as the title suggests, all the essential elements for a comprehensive storage report are
available but scattered around different places or tools. What is needed is to find a way to
connect them and provide different dimensions for storage reporting, e.g. from the project/
business line perspective, from the server/application perspective and, of course, from the
traditional storage array perspective. To show how these correlated reports can be produced, a
scalable and robust EMC Symmetrix® VMAX® SAN infrastructure will be used to illustrate the
technical details of extracting the source data, processing it, storing it in the Storage
Configuration Repository, and producing the reports. Technical details include “What VMAX
information can be extracted, and how?” and “How can this information be transformed and
imported into the configuration repository?” Other practical considerations and future
extensions will also be discussed.
Taking a deeper look, we find that all the control and configuration of the storage infrastructure
is actually built around one element: the World Wide Name (WWN). In simple terms, the WWN
is a globally unique identifier for Fibre Channel ports (storage array front-end ports and server
Host Bus Adapter (HBA) ports) and for storage devices. WWNs are used to control which ports can
be interconnected with each other and which storage devices can be accessed by these ports.
If we can gather the WWN relationships in a storage infrastructure, we can correlate them and
present a logical view of the whole storage infrastructure. For example, if we can identify the
WWN of a server HBA, the zoning configuration on the SAN switches, and the LUN masking
information on the array, we can easily determine how much storage a server is using as well as
the corresponding logical connections. When building the Storage Configuration Repository to
show the end-to-end relationships, we can focus on gathering information around the WWN and
correlate everything through it. The following diagram illustrates how the WWN connects each
storage infrastructure component.
[Figure: how the WWN connects each storage infrastructure component. Labels: HBA WWN, Fabric
Login, Switch Port, SAN Switch, LUN Masking, Storage Ports, Storage Devices, Replication
Relationships, Remote Storage Devices, LUN Masking, HBA WWN (DR Server).]
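The correlation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the host name, WWNs, zone names, device names and sizes are all made up, and the sketch assumes single-initiator zones (each zone contains one host's HBA plus array ports).

```python
# All names and values below are hypothetical sample data.
host_hbas = {"appsrv01": ["10:00:00:00:c9:aa:bb:01", "10:00:00:00:c9:aa:bb:02"]}

# Zoning: each zone is a set of member WWNs (HBA ports and array front-end ports).
zones = {
    "z_appsrv01_a": {"10:00:00:00:c9:aa:bb:01", "50:00:09:72:00:00:01:18"},
    "z_appsrv01_b": {"10:00:00:00:c9:aa:bb:02", "50:00:09:72:00:00:01:1c"},
}

# LUN masking: initiator WWN -> {device name: size in GB}.
masking = {
    "10:00:00:00:c9:aa:bb:01": {"0A10": 50, "0A11": 100},
    "10:00:00:00:c9:aa:bb:02": {"0A10": 50, "0A11": 100},
}


def host_view(host):
    """Return the storage ports zoned to a host and the devices masked to it."""
    hbas = set(host_hbas[host])
    storage_ports = set()
    for members in zones.values():
        if members & hbas:
            # Assumes single-initiator zones: the other members are array ports.
            storage_ports |= members - hbas
    devices = {}
    for wwn in hbas:
        # A device seen through several HBAs is stored once, keyed by name.
        devices.update(masking.get(wwn, {}))
    return sorted(storage_ports), devices


ports, devices = host_view("appsrv01")
total_gb = sum(devices.values())
```

Because the devices are keyed by name, a volume reached through two HBAs is counted once, so `total_gb` reflects what the server uses and not the number of paths.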
In the following sections, we will discuss what information we should extract from the storage
infrastructure components to build an end-to-end relationship.
VMAX
For VMAX, there are two commonly used data sources: EMC ControlCenter and the Solutions
Enabler CLI (symcli).
Information to be collected:
Masking information
Device information
SRDF® relationship
FA port information
Pool information
FAST™ VP information
EMC ControlCenter
A component in EMC ControlCenter (ECC)—StorageScope—has its own Oracle database as
the data repository. It provides database views for accessing the data. We can use the views as
the data sources for our purposes. A database connection, e.g. ODBC, JDBC, etc., is needed in
order to access the database using SQL queries to those views. After that, the data can be
stored and manipulated further. For details about accessing these database views and the data
available, please refer to StorageScope documentation.
This method became popular a few years ago because most Symmetrix environments had ECC
installed and the information provided by the views is quite comprehensive. However, other
products like Unisphere for VMAX and Symmetrix Management Console (SMC) are starting to
replace ECC as the most popular management tool for Symmetrix environments. Hence, we are
The Solutions Enabler CLI has a very useful option: adding “-output xml_element” to a symcli
command produces the command output in XML format. XML output makes automated data importing
much easier.
Here are some sample commands that are useful to extract VMAX configuration data in XML
format:
Device information
o symdev -sid <sid> list -v -output xml_element > symdev.xml
SRDF relationship
o symrdf -sid <sid> list -v -output xml_element > symrdf.xml
Pool information
o symcfg -sid <sid> list -pool -thin -mb -output xml_element > pool.xml
Masking information
o symaccess -sid <sid> list view -v -detail -output xml_element > masking.xml
Device and Pool relationships
o symcfg -sid <sid> list -tdev -detail -output xml_element > tdev.xml
We will use this approach for extracting configuration data from a Symmetrix VMAX.
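For scheduled collection, the sample commands above could be wrapped in a small script. The following is a minimal sketch and not part of the original procedure: the function names `build_commands` and `collect` are hypothetical, the argument lists simply mirror the commands listed above, and running `collect` assumes Solutions Enabler is installed on the collecting host.

```python
import subprocess
from pathlib import Path


def build_commands(sid):
    """Map each output file name to the symcli argument list shown in the text."""
    xml = ["-output", "xml_element"]
    return {
        "symdev.xml": ["symdev", "-sid", sid, "list", "-v"] + xml,
        "symrdf.xml": ["symrdf", "-sid", sid, "list", "-v"] + xml,
        "pool.xml": ["symcfg", "-sid", sid, "list", "-pool", "-thin", "-mb"] + xml,
        "masking.xml": ["symaccess", "-sid", sid, "list", "view", "-v", "-detail"] + xml,
        "tdev.xml": ["symcfg", "-sid", sid, "list", "-tdev", "-detail"] + xml,
    }


def collect(sid, out_dir):
    """Run each command and save its XML output under out_dir.

    Raises CalledProcessError if a command exits with a non-zero status, so a
    failed collection is noticed instead of leaving an empty file behind.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for filename, argv in build_commands(sid).items():
        with open(out_dir / filename, "wb") as fh:
            subprocess.run(argv, stdout=fh, check=True)
```

Writing each run into a dated directory (one per collection) would keep the snapshots apart for later comparison.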
Fabric logins – the relationship between WWN and SAN Switch port
Zones and Zonesets – the relationship between HBA and storage ports
SAN switches usually provide a command line interface, and the command outputs can be collected
for further processing. Here are some examples (Cisco MDS syntax):
Fabric logins
o show flogi database
Zone information
o show zoneset active
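Collected command output then has to be parsed into WWN-to-port pairs. The sketch below assumes the fabric login output is a whitespace-separated table with interface, VSAN, FCID, port WWN and node WWN columns; the sample text is illustrative and not captured from a switch, so the pattern should be checked against real output before use.

```python
import re

# Illustrative text only; the column layout is an assumption, not captured output.
SAMPLE = """\
INTERFACE  VSAN  FCID      PORT NAME                NODE NAME
fc1/1      10    0x0a0001  10:00:00:00:c9:aa:bb:01  20:00:00:00:c9:aa:bb:01
fc1/2      10    0x0a0002  50:00:09:72:00:00:01:18  50:00:09:72:00:00:01:00
"""

WWN = r"(?:[0-9a-fA-F]{2}:){7}[0-9a-fA-F]{2}"
ROW = re.compile(rf"^(\S+)\s+(\d+)\s+(0x[0-9a-fA-F]+)\s+({WWN})\s+({WWN})\s*$")


def parse_flogi(text):
    """Return {port WWN: (interface, VSAN)} for every line that matches ROW."""
    logins = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            interface, vsan, _fcid, port_wwn, _node_wwn = match.groups()
            logins[port_wwn.lower()] = (interface, int(vsan))
    return logins


logins = parse_flogi(SAMPLE)
```

Header and separator lines are skipped because they do not match the row pattern.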
Some storage administrators might think that collecting SAN and fabric information is less
important. The configuration information they usually need for reporting is just port
utilization and zone information, which does not change frequently and can easily be tracked
manually.
Host Information
Host information includes host-related attributes like OS and Host model. The most important
information in this discussion is the HBA WWN(s) of a host. We need to have “Host Name to
HBA WWN” mappings in order to link up all other storage infrastructure components.
One method is to rely on tools. Most EMC storage administrators should be familiar with the
emcgrab/emcreport tools, which collect host information for a configuration snapshot or for
troubleshooting. EMC also provides a web portal called E-Lab Advisor (ELA) that accepts uploads
of emcgrab/emcreport output files. In this web portal, a user can see an inventory of the hosts
whose emcgrab/emcreport output files have been uploaded. The user can then select hosts and
generate a SAN Summary report in Excel format. The SAN Summary report presents a list of
hosts in tabular form with WWN(s) as one of the columns.
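Once such a report is saved as CSV, the “Host Name to HBA WWN” mapping can be loaded with a short script. This is a sketch under assumptions: the column names `Host` and `WWN` are hypothetical (the real report columns may differ), and WWNs are normalized because different tools print them with or without colons and in different case.

```python
import csv
import io


def normalize_wwn(raw):
    """Return a WWN as 16 lowercase hex digits in colon-separated pairs."""
    digits = "".join(ch for ch in raw if ch.isalnum()).lower()
    if len(digits) != 16 or any(ch not in "0123456789abcdef" for ch in digits):
        raise ValueError(f"not a 64-bit WWN: {raw!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


def load_host_wwns(csv_file):
    """Build {normalized WWN: host name} from rows with 'Host' and 'WWN' columns."""
    mapping = {}
    for row in csv.DictReader(csv_file):
        mapping[normalize_wwn(row["WWN"])] = row["Host"].strip().lower()
    return mapping


# Hypothetical export; the real column names depend on the report.
sample = io.StringIO(
    "Host,WWN\n"
    "AppSrv01,10000000C9AABB01\n"
    "appsrv01,10:00:00:00:c9:aa:bb:02\n"
)
wwn_to_host = load_host_wwns(sample)
```

Keying the mapping by WWN makes it straightforward to join against masking and fabric login data, which are also keyed by WWN.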
Summary
Below is a summary of information to be collected and the suggested frequency of data
collection. (We will discuss “Simple Solution” and “Scalable Solution” in the next section.)
We will discuss two alternatives. One is simpler but with less functionality while the other is
more comprehensive and can provide more opportunities for future extensions.
A Simple Solution
This solution focuses only on storage array configurations (not much correlation with other
storage infrastructure components) and is only semi-automated. It requires some manual tasks
and provides limited functionalities. However, storage administrators might find it very useful to
keep track of storage configuration changes and produce regular storage utilization reports. This
is especially true for a storage environment with fewer than five arrays.
In the last section, we discussed how to collect VMAX configuration data in XML files. Now, we
can use XSL transformation to extract and transform these XML files into simpler formats such
as CSV and HTML which can be imported to Excel or directly used for reporting.
[Figure: XSL transform of the XML files into CSV output.]
After collecting output for several weeks, a report of the tier utilization trend can be easily made.
Some open source XML editors can provide XML/XSL editing and transformation tools.
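Where an XSL processor is not at hand, the same XML-to-CSV flattening can be done with a short script. The sketch below uses Python's standard library in place of XSL, so it is a substitute technique and not the method described above. The element names in the sample are hypothetical; the real symcli XML structure should be inspected and the tag and field names adjusted accordingly.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML; real symcli element names differ and should be checked.
SAMPLE_XML = """\
<Report>
  <Pool><name>FC_Pool</name><total_mb>1000</total_mb><used_mb>400</used_mb></Pool>
  <Pool><name>SATA_Pool</name><total_mb>2000</total_mb><used_mb>500</used_mb></Pool>
</Report>
"""


def xml_to_csv(xml_text, record_tag, fields):
    """Write one CSV row per <record_tag> element, one column per child field."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for record in root.iter(record_tag):
        # Missing child elements become empty cells instead of raising.
        writer.writerow([record.findtext(field, default="") for field in fields])
    return out.getvalue()


csv_text = xml_to_csv(SAMPLE_XML, "Pool", ["name", "total_mb", "used_mb"])
```

Appending a collection date column to each row would let the weekly CSV files be stacked into a single sheet for trend reporting.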
A Scalable Solution
A more scalable option is recommended for storage infrastructures with more than five arrays.
We can build a web application with a database to store historical storage configurations.
Compared with the simple solution discussed above, this option is more robust and more
automated in both data collection and reporting. The diagram below illustrates the high-level
architecture of this option of the Storage Configuration Repository.
[Figure: high-level architecture of the Storage Configuration Repository. Labels: XML Files,
Command Output (SAN Switch), Host-WWN Mapping (Storage Administrator), Extract, Transform &
Load (ETL), Database, Web Page, Static Report.]
Database
The database is the core component of the storage configuration repository. It stores the most
recent data as well as historical data for reporting. The database structure reflects the
relationships between storage infrastructure components and configurations. A simplified
version of the Entity-Relationship diagram of the database is shown below.
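As a rough illustration of such a structure, the sketch below builds a tiny schema in SQLite. It is not the article's Entity-Relationship design: every table name, column name and data value here is an assumption chosen for the example. The point it shows is that keying configuration tables by a snapshot date keeps history, and joining through the HBA WWN links hosts to devices.

```python
import sqlite3

# Hypothetical, simplified schema; table and column names are assumptions.
SCHEMA = """
CREATE TABLE host (host_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE hba (wwn TEXT PRIMARY KEY, host_id INTEGER REFERENCES host(host_id));
CREATE TABLE device (
    snapshot_date TEXT, array_sid TEXT, dev_name TEXT, size_gb REAL,
    PRIMARY KEY (snapshot_date, array_sid, dev_name));
CREATE TABLE masking (
    snapshot_date TEXT, array_sid TEXT, dev_name TEXT, initiator_wwn TEXT);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO host VALUES (1, 'appsrv01')")
hbas = ["10:00:00:00:c9:aa:bb:01", "10:00:00:00:c9:aa:bb:02"]
conn.executemany("INSERT INTO hba VALUES (?, 1)", [(w,) for w in hbas])

# Two weekly snapshots (made-up data): a second device is added in week two.
devices = [("2024-01-01", "sid1", "0A10", 50),
           ("2024-01-08", "sid1", "0A10", 50),
           ("2024-01-08", "sid1", "0A11", 100)]
conn.executemany("INSERT INTO device VALUES (?, ?, ?, ?)", devices)
conn.executemany(
    "INSERT INTO masking VALUES (?, ?, ?, ?)",
    [(d, sid, dev, wwn) for d, sid, dev, _ in devices for wwn in hbas])

# Capacity per snapshot for one host. EXISTS counts each device once even
# though it is masked to both HBAs.
QUERY = """
SELECT d.snapshot_date, SUM(d.size_gb)
FROM device d
WHERE EXISTS (
    SELECT 1 FROM masking m
    JOIN hba b ON b.wwn = m.initiator_wwn
    JOIN host h ON h.host_id = b.host_id
    WHERE h.name = ? AND m.snapshot_date = d.snapshot_date
      AND m.array_sid = d.array_sid AND m.dev_name = d.dev_name)
GROUP BY d.snapshot_date
ORDER BY d.snapshot_date
"""
trend = conn.execute(QUERY, ("appsrv01",)).fetchall()
```

The same pattern would carry over to a server database such as PostgreSQL; SQLite is used here only so that the sketch runs without any setup.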
However, building OLAP cubes requires special knowledge and may not be easy.
Web Page
It is very common to use a web interface to access a database. Everyone is familiar with web
browsers and web pages are not very difficult to build. With a web interface, information in the
Static Report
Sometimes, a storage administrator may need regular reports on the storage infrastructure.
The formats of these reports are not expected to change frequently. Hence, report templates
can be constructed and reports can be generated regularly to a shared location or even
published to a web portal. Here are some examples of static reports:
However, there are two major considerations here. The first is to cater for cluster servers. If we
only aggregate the storage capacity used by a list of servers, the shared devices used by
clusters could be double counted. Consider the following scenario:
A 2-node cluster
Using a shared disk 50GB in size
Node#1 and Node#2 both “see” the 50GB disk
If we just aggregate the storage capacity used by Node#1 and Node#2, the total capacity will
become 100GB instead of the real usage of 50GB. Ideally, the reporting/query facility should be
able to “de-dupe” these capacities.
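The cluster scenario above can be reproduced with a few lines of Python. The node names, array name and device name are made up; the sketch only shows that summing over the set of unique devices, and not per host, avoids the double count.

```python
# Hypothetical data: (array id, device name) -> size in GB.
device_size_gb = {("array1", "0B20"): 50}

# Both cluster nodes see the same shared device.
host_devices = {
    "node1": {("array1", "0B20")},
    "node2": {("array1", "0B20")},
}


def naive_total(hosts):
    """Sum capacity host by host; shared devices are counted once per host."""
    return sum(device_size_gb[dev] for host in hosts for dev in host_devices[host])


def deduped_total(hosts):
    """Sum capacity over the union of devices, so each device counts once."""
    unique = set().union(*(host_devices[host] for host in hosts))
    return sum(device_size_gb[dev] for dev in unique)


cluster = ["node1", "node2"]
naive = naive_total(cluster)      # 100
deduped = deduped_total(cluster)  # 50
```

Identifying a device by array and device name together matters here, since device names alone can repeat across arrays.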
Another consideration is applications that do not use array-based replication for DR, e.g. some
databases that use native data replication. In such cases, we cannot get the hostname and the
corresponding capacity used in the DR site from the repository. We need to either store these
During the planning phase of a migration, we can rely on the query/reporting functionality to
provide immediate results for decisions. As data is refreshed regularly, the repository can
provide an up-to-date configuration snapshot for reporting.
When executing the migration, storage administrators can rely on the repository to provide
detailed storage configurations. For example, the repository contains information that can be
used to list the devices, the local and remote replication relationships, related cluster nodes, and
DR hostnames in a migration event. It is particularly useful for some complex storage
environments where a host is using multiple arrays with some replicated and some non-
replicated storage devices.
The migration progress can also be tracked. Based on the latest configurations, statistics of
the host inventory in both the new and old environments can be easily extracted and presented
in migration status dashboards.
Skill Set
Storage administrators are usually not experts in application development, yet building the
Storage Configuration Repository requires programming, web framework, and database design
skills. Some organizations have chosen to outsource this initiative to external professional
services. To avoid these “troubles”, some storage teams have decided to use open source software
and platforms, for example the PostgreSQL database, Java, and Tomcat.
On-going Maintenance
Even when construction of the Storage Configuration Repository is outsourced to professional
services, the storage team remains responsible for the on-going maintenance and
troubleshooting. Maintenance tasks could include backup, performance monitoring, and
ensuring the success of data importing jobs. In addition, issues can occasionally occur that
require troubleshooting. For example, a scheduled job for extracting and importing storage
configurations could suddenly stop working for no apparent reason.
Future Extensions
After building the Storage Configuration Repository as a foundation, we can continue to
enhance the repository. In this section, we discuss some possible enhancements that can be
made to provide extra value.
At present, a question like, “What is the 95th-percentile of IOPS that the server produced?” is
difficult to answer as we are not talking about the 95th-percentile of each individual device.
Rather, we need to look at the performance data for the whole server. We cannot just calculate
the 95th-percentile for each storage device and sum them to get the result. Instead, the following
steps are needed:
Look at which storage devices the server is using and extract the corresponding
performance data (IOPS)
Aggregate the IOPS values and get the total IOPS generated by the server at each time
interval (i.e. every 5 minutes)
Use statistical functions to calculate the 95th-percentile of the aggregated IOPS values
It is impossible to do this manually. However, after importing the performance data into the
repository, reports can easily be made to answer questions like this.
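The difference between the two calculations can be seen in a small example. The IOPS samples below are invented, and the percentile uses the simple nearest-rank definition; a reporting tool may use an interpolating definition and give slightly different numbers.

```python
def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n) when sorted."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling division
    return ordered[rank - 1]


# Hypothetical IOPS samples, one per time interval, for two devices of one server.
dev_a = [100] * 19 + [1000]   # 20 intervals, busy only in the last one
dev_b = [1000] + [100] * 19   # busy only in the first one

# Step 2: total IOPS generated by the server at each interval.
server_iops = [a + b for a, b in zip(dev_a, dev_b)]

# Step 3: percentile of the aggregated series.
p95_server = percentile(server_iops, 95)                      # 1100

# The shortcut the text warns against: sum of per-device percentiles.
sum_of_p95 = percentile(dev_a, 95) + percentile(dev_b, 95)    # 200
```

Here the shortcut reports 200 IOPS while the server actually reached 1100 IOPS at its 95th percentile, because each device's burst falls outside its own 95th percentile but not outside the server's.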
Storage administration tools like Unisphere for VMAX can schedule performance data exports.
For manual input, we may need to set up additional data sources for data import, e.g. manual
construction of CSV files containing the mapping between projects and servers. Another way is
to provide a web interface to facilitate editing of these manual inputs and save them to the
repository.
Conclusion
This article discussed the value of having a Storage Configuration Repository and how to
establish one in practice. After reading it, readers should have an idea, along with some
technical details, of how they can build their own storage configuration repository and
reporting facilities.
Use, copying, and distribution of any EMC software described in this publication requires an
applicable software license.