Tivoli Netcool OMNIbus 7.3.1 Large Scale and Geographically Distributed Architectures - Best Practice - v1.0

Tivoli Netcool/OMNIbus 7.3.
1
Large scale and geographically
distributed architectures
Best Practices
Licensed Materials – Property of IBM

Note: Before using this information and the products it supports, read the information in
“Notices” on page 20.
© Copyright IBM Corporation 2011, 2012.

US Government Users Restricted Rights–Use, duplication or disclosure restricted by GSA
ADP Schedule Contract with IBM Corp.
Contents
About this publication .......................................................................................................................iv
Intended audience ......................................................................................................................................... iv
What this publication contains..................................................................................................................... iv
Conventions used in this publication.......................................................................................................... iv
Typeface conventions ........................................................................................................................... v
Operating system-dependent variables and paths ........................................................................... v
Chapter 1 Executive summary ........................................................................................................... 6
Chapter 2 Introduction ........................................................................................................................ 7
Chapter 3 Details of the new architecture model ......................................................................... 10
Architecture model building block............................................................................................................. 10
Event aggregation at the dashboard layer ................................................................................................. 10
Network bandwidth considerations .......................................................................................................... 12
Chapter 4 Lab test environment ...................................................................................................... 14
Test architecture ............................................................................................................................................ 14
Test hardware................................................................................................................................................ 15
Composition and distribution of the test events....................................................................................... 15
Users and filters ............................................................................................................................................ 16
Metrics gathered ........................................................................................................................................... 17
Metric 1: How long does the AEL initially take to load the whole data set? .............................. 17
Metric 2: How long do AEL auto-refreshes take to execute? ........................................................ 17
Test results ..................................................................................................................................................... 17
Results analysis ............................................................................................................................................. 19
A note about data caching ................................................................................................................. 19
Notices.................................................................................................................................................. 20
Trademarks .................................................................................................................................................... 22
Tivoli Netcool/OMNIbus 7.3.1

Large scale and geographically distributed architectures Best Practices Contents
© Copyright IBM Corporation 2011, 2012. IBM
iii
About this publication

This document provides a best practice model for deploying a Tivoli Netcool solution in
which more than one Aggregation pair of ObjectServers is required. This is typically in a
scenario where the solution is handling extremely large amounts of events, large amounts of
data or both ― or where the business requirements necessitate that the solution components
are geographically distributed.
This document provides a detailed description of the best practice large scale or
geographically distributed architecture model ― and provides some metrics taken from a test
system set up across the IBM WAN using actual customer event and user parameters.
The model described in this document has been deployed into production by a large (200K+
employees) North American Tivoli Netcool customer in the financial industry. With the new
model in place, they enjoy enhanced scalability; a robust solution, free of race conditions; and
the ability for any one geographical region to maintain operations in isolation ― despite loss
of connectivity to the other geographical regions.
This best practice model requires Tivoli Netcool/OMNIbus 7.3.1 and two of its sub-
components: the standard multitier architecture configuration and the WebGUI.
Intended audience
This publication is intended for anyone preparing to deploy a large scale or geographically
distributed Tivoli Netcool/OMNIbus solution.
What this publication contains

This publication contains the following sections:
Executive summary on page 6;
Provides a summary on the architecture described in this document, some key features
and advantages, and references its use in a customer production environment.
Introduction on page 7;
Provides background as to the types of scenario where the new architecture model
described in this document can be leveraged and introduces the fundamentals of how it
works and important factors to consider when using the new model;
Details of the new architecture on page 10;
Provides a detailed description of the architecture, how it works and includes factors that
should be considered when planning a deployment;
Lab test environment on page 14;
Provides a description of the WAN-based IBM test environment set up; includes some
performance metrics of the new architecture model; and provides an analysis and
commentary of the results.
Conventions used in this publication

This publication uses several conventions for special terms and actions and operating system-
dependent commands and paths.

Large scale and geographically distributed architectures Best Practices About this publication
iv
Typeface conventions
This publication uses the following typeface conventions:
Bold
Lowercase commands and mixed case commands that are otherwise difficult to
distinguish from surrounding text
Interface controls (check boxes, push buttons, radio buttons, spin buttons, fields,
folders, icons, list boxes, items inside list boxes, multicolumn lists, containers, menu
choices, menu names, tabs, property sheets), labels (such as Tip: and Operating
system considerations:)
Keywords and parameters in text
Italic
Citations (examples: titles of publications, diskettes, and CDs)
Words defined in text (example: a nonswitched line is called a point-to-point line)
Emphasis of words and letters (words as words example: "Use the word that to
introduce a restrictive clause."; letters as letters example: "The LUN address must
start with the letter L.")
New terms in text (except in a definition list): a view is a frame in a workspace that
contains data
Variables and values you must provide: ... where myname represents....
Monospace
Examples and code examples
File names, programming keywords, and other elements that are difficult to
distinguish from surrounding text
Message text and prompts addressed to the user
Text that the user must type
Values for arguments or command options
Operating system-dependent variables and paths

This publication uses the UNIX convention for specifying environment variables and for
directory notation.
When using the Windows command line, replace $variable with %variable% for
environment variables, and replace each forward slash (/) with a backslash (\) in directory
paths. For example, on UNIX systems, the $NCHOME environment variable specifies the
directory where the Network Manager core components are installed. On Windows systems,
the same environment variable is %NCHOME%. The names of environment variables are not
always the same in the Windows and UNIX environments. For example, %TEMP% in
Windows environments is equivalent to $TMPDIR in UNIX environments.
If you are using the bash shell on a Windows system, you can use the UNIX conventions.

v
Chapter 1 Executive summary

This document presents a new architectural model for the deployment of large scale Tivoli
Netcool/OMNIbus environments, geographically distributed Tivoli Netcool/OMNIbus
environments ― or environments that are both large scale and geographically distributed.
MAKING USE OF EXISTING BEST PRACTICE
The new model makes use of standard configuration ― the standard multitier architecture
configuration ― that ships with Tivoli Netcool/OMNIbus 7.3.0 (or later) and is therefore
robust and configured for high performance.
The following link provides details on the standard multitier architecture configuration and
how to deploy it:
http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_OMNIbus
.doc_7.3.1/omnibus/wip/install/concept/omn_esf_configuringdeploymultitieredarch.html
The new model extends the standard multitier architecture and allows operators to have
consolidated views of events from multiple Tivoli Netcool/OMNIbus instances ― either
collocated on the same site or geographically distributed ― even potentially on different
continents.
Since there is no programmatic limit to the number of datasources that WebGUI can connect
to, this new model provides a method of deploying Tivoli Netcool/OMNIbus in a manner
that is truly "ultra scalable".
ALREADY IN PRODUCTION
IBM can confirm that this new architecture model is currently in use in production by a large
North American Netcool customer in the financial sector. "Customer A" has 4 Network
Operation Centres (NOCs) geographically distributed equidistantly around the world. Each
NOC draws events from three different globally distributed Tivoli Netcool/OMNIbus
partitions.
Customer A enjoys seamless, race-condition-free performance from their globally distributed
Tivoli Netcool/OMNIbus deployment. All of the regional NOCs can continue to operate in
the event of a disconnect with one or more of the other regional partitions ― without manual
intervention. Similarly, recovery of the outage is automatic and just as seamless.
LAB TEST RESULTS
In addition to outlining how to deploy the new architecture, this document provides test
results from a test environment that was set up over the IBM WAN spanning 3 continents to
prove the concept.
The test system was loaded up with 11,300 events ― a typical number of events found in the
production system of Customer A. 30 unique users were logged in and an AEL opened for
each user with the specified filter applied. Even with relatively low specification hardware
and relatively slim network pipes, the tests returned favourable results.
The new architecture model described in this document is recommended for use by anyone
contemplating deploying Tivoli Netcool/OMNIbus on an ultra large scale ― or where the
requirements are such that a geographically distributed model is necessary.

6
Chapter 2 Introduction
Tivoli Netcool/OMNIbus has always been the watchword for high availability and
scalability in the event management space. To provide high availability and scalability out of
the box, Tivoli Netcool/OMNIbus ships with a pre-canned configuration to support
multitiered environments ― called the standard multitier architecture configuration ― for the
purpose of supporting higher numbers of events, users or both. Using the standard multitier
configuration alone, however, is not always an ideal fit in an environment that is either
required to be globally distributed or is of such a large scale, that it exceeds the standard
multitier architecture model's capabilities.
ADDITIONAL REQUIREMENTS
First, although the traditional multitiered architecture model is highly scalable, there is of
course a limit as to the load a single instance can handle. There are occasions where the
amount of events to be handled exceeds even the capabilities of even a 3-tiered system ―
particularly one with high-load intensive custom functionality; a very high numbers of
events or custom table data; or a combination of these or other factors.
Second, as the geographical boundaries of businesses operating in a global marketplace
continue to expand, so too do the business requirements of the network management
monitoring systems that support them. The need to have 24 hour or "follow-the-sun"
operations within a globally shared system is becoming increasingly important and
commonplace to customers that support a globally distributed infrastructure. A globally
distributed architecture model is needed to support such an infrastructure. In such cases, an
architecture model based purely on the primary/backup concept is not a natural fit.
In both cases, an augmented solution is needed.
This document provides a best practice architecture model for the two scenarios described
above using both the Tivoli Netcool/OMNIbus standard multitier architecture configuration
in conjunction with new functionality available in the WebGUI.
COMBINED DATASET VIEWS
Within the context of Tivoli Netcool/Webtop or the WebGUI component of Tivoli
Netcool/OMNIbus, a datasource is defined as an ObjectServer or failover pair with zero or
more Display ObjectServers configured to provide user load sharing capability. New
functionality introduced in WebGUI provides the ability of the WebGUI server to pull event
data from multiple datasources and seamlessly combine the data into common views ― such as
Active Event Lists (AELs), Monitor Boxes and map page elements. This means there is no
longer any technical need to combine the events within the underlying Tivoli
Netcool/OMNIbus infrastructure in order to obtain such views ― as was the case with Tivoli
Netcool/Webtop.
Given this new freedom, the underlying events can now be partitioned off into multiple,
disparate Tivoli Netcool/OMNIbus systems and the events from any number of these
systems combined by WebGUI into common views, as required.
Note: Depending on the context, the term "partition" is used in this document to describe
either the grouping of events by functional need ― or to refer to an individual Tivoli
Netcool/OMNIbus system ― which could itself be 1, 2 or 3-tiered.
PARTITION SCENARIOS
The decision whether or not to split incoming events over multiple Tivoli Netcool/OMNIbus
partitions and also how best to divide the events up, depends on the scale of the overall
solution, the physical distribution of the overall solution or a combination of the two.

7
Consider the following example scenarios:

If the end solution is intended to be globally distributed ― for example: distributed across
the UK, the US and Japan ― then it makes sense to partition the events by country and
deploy a separate Tivoli Netcool/OMNIbus partition within each country.
If the need to partition is on the basis of load ― for example: high event numbers and/or
heavy custom functionality causing more load than a single system can practically
handle ― then events should be divided up and grouped based on the need to be
correlated with each other. Events that need to be correlated with each other should be
grouped together into the same Tivoli Netcool/OMNIbus partition. ObjectServer
triggers at the Aggregation layer of each partition ― or Tivoli Netcool/Impact ― are then
able to correlate the events with each other separately within each partition.
Partitioning both by geography and within geography is also possible ― in scenarios
where both the above scenarios apply. For example, there may be two or more
geographically separated regions ― each with one or more local Tivoli
Netcool/OMNIbus partitions.
In all cases the following notes apply:
The standard multitier architecture configuration should always be used when deploying
each partition ― even if it is only a single failover pair ― since a "partition" is simply a 1, 2
or 3-tiered Tivoli Netcool/OMNIbus system.
The decision whether to deploy a 1, 2 or 3-tiered system in each partition should be
decided separately for each partition and will depend on the loading characteristics within
each partition. Note that this determination is the same as when deploying just a single
1, 2 or 3-tiered Tivoli Netcool/OMNIbus environment on its own.
Cross partition correlation ― that is, correlation of events or data between two or more
Tivoli Netcool/OMNIbus partitions ― can be provided by Tivoli Netcool/Impact. There
is no programmatic limit within Tivoli Netcool/Impact as to the number of datasources it
can access in order to carry out cross-partition correlation tasks. Note that such a system
would have to be rigorously tested prior to deployment into a production environment ―
especially if Tivoli Netcool/Impact were accessing partitions across a WAN.
In the case of a geographically distributed system, at least one WebGUI server should be
deployed locally for each region to ensure the region can continue to operate in isolation
if there is a network problem between the local region and any of the remote regions. It
is recommended to deploy WebGUI servers in groups of two or more within each
geographical region for resiliency.
PULLING THE DATA TOGETHER
Once the disparate Tivoli Netcool/OMNIbus partitions have been deployed and configured,
each partition from which WebGUI will pull events will be defined within the WebGUI
datasource definition file as a datasource. WebGUI can then be configured to pull filtered
event data sets from one or more of the datasources as required.
Each AEL, Monitor Box or map page element can be individually configured to access as
many of the datasources as is required. The WebGUI server will then seamlessly combine
the events from the selected datasources into the given AEL, Monitor Box or map page
element for display.
This new architecture model provides enormous scalability options, provides a solution to
"follow-the-sun" operations and forms a best practice for geographically distributed or ultra
large Tivoli Netcool/OMNIbus deployments. The purpose of this document is to describe
the design and layout of this new architecture model.

8
Also included are some performance metrics on WebGUI's ability to pull data across a WAN
from globally distributed datasources.
Note: Note that the performance metrics provided in this document were carried out on
IBM networks and do not constitute any sort of guarantee of performance ― they merely
provide an indication of what can be achieved given the stated parameters.

9
Chapter 3 Details of the new architecture model

This chapter outlines the basic design for ultra large scale or geographically distributed Tivoli
Netcool/OMNIbus architectures.
Architecture model building block

As mentioned in the previous chapter, the standard building block for the ultra large scale or
geographically distributed Tivoli Netcool/OMNIbus architecture is the standard multitier
architecture model that ships with the Tivoli Netcool/OMNIbus product.
The resulting architecture will be made up of two or more instances of the standard multitier
architecture model ― each of which may be a 1, 2 or 3-tiered system.
The Tivoli Netcool/OMNIbus 7.3.1 Installation and Deployment Guide contains an introduction
on the standard multitier architecture model and instructions on how to deploy it:
http://publib.boulder.ibm.com/infocenter/tivihelp/v8r1/topic/com.ibm.netcool_OMNIbus
.doc_7.3.1/omnibus/wip/install/concept/omn_esf_overviewstandardarch.html
Whether to use 1, 2 or 3 tiers in each partition should be decided on a case by case basis ―
and will depend on the individual loading characteristics within each partition.
Each partition will exist and operate independently of the other partitions. The ability to
operate autonomously is an important feature ― particularly within a geographically
distributed environment ― because it ensures that individual partitions can continue to
operate even when isolated from the other partitions. The ability for each partition to operate
autonomously is also important in that it means that an outage in one partition will not
directly affect the performance or availability of the other partitions.
Event aggregation at the dashboard layer

A new feature in WebGUI (version 7.3.0 and later) enables an entity, monitor box or Active
Event List to contain a seamless aggregate of event data from more than one datasource.
This feature is key to the architecture design as it provides the mechanism that pulls together
event data from an indefinite number of sources.
When a datasource goes "offline" or is unavailable to the WebGUI server, the users are
notified that the datasource is temporarily unavailable and the events from that datasource
simply drop out of their views. When the datasource is restored, the "datasource down"
notification disappears from the users' views and the events are refreshed and restored to the
users' views.
A diagrammatic representation of how WebGUI connects to each of the datasources
separately is depicted below.
Note: The "collection" of ObjectServers depicted in each of the datasources in the following
diagrams are meant to generically represent a 1, 2 or 3-tiered standard multitier architecture
system and are not intended to literally represent any particular number of ObjectServers or
tiers in particular.

10
WebGUI server
Combined event view layer
Read/Write
Connection
Netcool datasource Netcool datasource Netcool datasource
EXAMPLE:
Widgetcom have three data centres: one in London, one in Bangalore and one in Wellington;
each one with a 3-tier Tivoli Netcool/OMNIbus installation. Widgetcom wish to set up a
"follow the sun" support model ― where managed systems around the world are monitored
on a 24 hour basis by the three globally distributed NOCs during each respective data
centre's business hours. Each of the three data centres must have visibility of events from the
other two ― and each one must also be able to function in isolation if cut off from the other
two.
The Netcool Administrator elects to include a datasource definition for each region in the
WebGUI datasource definitions files on all servers. Active Event Lists and other dashboards
can then be constructed using the combined event sets from all three datasources. The
resulting architecture is shown below:
WebGUI servers WebGUI servers WebGUI servers
London datasource Bangalore datasource Wellington datasource
Probes Probes Probes

11
With the new datasources and event views in place, the operators in each of the three data
centres are then able to see and even deal with events from any of the three regions. All of
the data centres can operate independently of each other ― including in isolation.
NOTES:
It is recommended to enable caching for remote datasources to minimise the amount of
data moving over the WAN.
When operators use common filters or views, this reduces the data transfer further when
caching is enabled. The decision whether or not to use data caching for the local
datasource would have to be decided based on the variance of filters and views in use by
operators.
EXAMPLE:
Within a year, Widgetcom's London data centre expands its operations to the point where the
business requirement is to hold more events at any one time than a single Tivoli
Netcool/OMNIbus Aggregation pair can accommodate. This is primarily due to the large
increase in the number of standing rows at the Aggregation layer combined with the complex
custom correlations and event processing operations being carried out on the Aggregation
layer ObjectServers on an on-going basis.
After analysis, the Netcool Administrator identifies that the events are made up of roughly
50% application X events and 50% application Y events. Further, the business requirements
have no custom correlation needs that involve events of both types.
The Netcool Administrator elects to install a second Tivoli Netcool/OMNIbus multitier
partition and relocates application Y events to the new system. This has the effect of halving
the event number from the incumbent system and evenly spreads the load across the two
Tivoli Netcool/OMNIbus partitions. The WebGUI datasource definitions file is updated to
include the new datasource that holds the application Y events ― and the WebGUI operator
views are updated to include events from the new datasource.
With the additional partition in-place, the London data centre is now able to handle a much
higher volume of events than before. Additionally, the maximum event handling headroom
is elevated significantly ― affording Widgetcom more leeway in an event storm scenario.
NOTES:
There is no programmatic limit as to the maximum number of datasources (ie. Tivoli
Netcool/OMNIbus multitier partitions) a WebGUI server can connect to ― hence this
partitioning technique allows for extensive lateral scalability.
Inter-partition event correlation and processing can be achieved by using Tivoli
Netcool/Impact, if required.
Network bandwidth considerations

A key success factor for this distributed architecture model is the ability of WebGUI to pull
event data over the network from the datasources. In some cases however, the datasources
may be physically located in different geographical regions. Hence the architecture design is
constrained by the bandwidth capabilities of the link between the WebGUI server and each of
its datasources.
In planning a system based on this distributed model therefore, sufficient bandwidth
provisioning must be done prior to deployment. Bandwidth requirement calculations must
include the following considerations:

12
The average size of an event including an allowance for the occasional journal and/or
detail per event should be calculated. A typical value of 3 KB can generally be used as a
multiplier if a more specific value can not be calculated at the time;
The average number of events the WebGUI server will retrieve per minute from the
target Display layer ObjectServer(s) should be calculated. If data caching is enabled for
the remote datasource, it will be the count of all matching events for all filters that are in
use. If data caching is not enabled for the datasource, this figure would then need to be
multiplied by the maximum number of logged in users at any one time;
The number of WebGUI servers to be deployed.
Multiplying these three values together will give you the number of kilobytes that will need
to be transferred per minute from the remote datasource to the local site ― which can then be
converted to bits per second ― and therefore used as a bandwidth provisioning requirement.
Note: Such calculations would typically give an indication only of the required bandwidth
required. Any bandwidth provisioning would have to include significant contingency for
peak event loads and would likely have to be revisited if any of the above parameters
significantly changed.

13
Chapter 4 Lab test environment

As mentioned in Chapter 3, a key success factor in the deployment of the architecture model
described in this document is WebGUI's ability to pull event data from remote datasources.
To prove the concept, experiments were carried out over IBM's WAN to demonstrate how
well the distributed architecture model can work when globally distributed across a real
WAN environment ― as opposed to a simulated WAN environment.
This chapter provides detail on the setup of the test environment including: the architecture,
the hardware used, the number of events, the number of users, the user filter in use, a
description of the metrics gathered ― and the results.
The test environment was configured to match key characteristics of the production
environment of one of IBM's large North American Tivoli Netcool customers ― hereafter
referred to as: "Customer A".
The test partitioned model architecture was set up across three sites in London (United
Kingdom), Austin, Texas (United States) and Perth (Australia). This is similar in
geographical distribution to the production environment of Customer A.
Test architecture
Three servers were set up in three roughly equidistant parts of the world: one in London, UK
(rheles41); one in Austin, Texas, USA (emonster); one in Perth, Australia (snapper).
Tivoli Netcool/OMNIbus 7.3.1 was then installed on all three servers and the standard
multitier architecture configuration used to construct a simple 2-tier Aggregation/Display
system on each one. Each server installation was comprised of a single Aggregation layer
ObjectServer, a single Display layer ObjectServer and a unidirectional Display ObjectServer
Gateway connecting the two.
Note: Since failover scenarios were not going to be included in these tests, failover
components were not included in the environment. It is recommended however ― and,
indeed, best practice ― to include failover components in a real, production system.
A diagram of the test architecture configuration is shown below:
Display layer
ObjectServer
Unidirectional Display
ObjectServer Gateway
Aggregation layer
ObjectServer
WebGUI was then installed on the London test server (rheles41) and was configured to
connect to the datasource in each of the three regions.

14
The WebGUI server was configured to connect to each datasource in standard Dual Server
Desktop (DSD) mode ― that is, a read/write connection to the Display layer ObjectServer
and a write connection to the Aggregation layer ObjectServer.
The measured bandwidth of the London/Perth link was ~230 Kbps. The measured
bandwidth of the London/Austin link was ~400 Kbps.
A diagram of the test environment is shown below:
WebGUI server (UK)
rheles41
Bandwidth: ~230 Kbps Bandwidth: ~400 Kbps
snapper rheles41 emonster
Datasource: PERTH Datasource: LONDON Datasource: AUSTIN

Netcool partition (Australia) Netcool partition (UK) Netcool partition (US)
Test hardware
The hardware specifications of the machines used in the tests are as follows:
Hostname Platform Installed components CPU Memory
rheles41 (London) Red Hat Enterprise Tivoli Netcool/OMNIbus 1 × 3.2 GHz 2 GB

Linux ES release 4 7.3.1 datasource (x86)
WebGUI
emonster (Austin) SunOS 5.9 Tivoli Netcool/OMNIbus 2 × 400 MHz 2 GB

7.3.1 datasource (UltraSPARC-II)
snapper (Perth) SunOS 5.10 Tivoli Netcool/OMNIbus 2 ×1.2 GHz 4 GB

7.3.1 datasource (UltraSPARC-IIIi)
Composition and distribution of the test events

The synthetic events injected into the test environment were engineered to be representative
of the events of Customer A both in terms of content and quantity. The type and number of
events used in the test environment is shown in the table below.

15
Event type Austin, Texas, USA London, UK Perth, Australia
Type A 2,000 600 320
Type B 1,000 300 160
Type C 1,000 300 160
Type D 500 150 80
Type E 500 150 80
Type F 4,000
Totals 9,000 1,500 800
The overall number of events across all systems in the test environment therefore was 11,300.
The synthetic events were created via an ObjectServer trigger located within the Aggregation
ObjectServer within each region. The trigger runs once every 60 seconds and carries out the
following tasks:
Inserts 10% of the total number of events for each event type;
Deletes any events older than 10 minutes.
This ensures that:
After an initial 10 minute period, the total number of events remains constant to the
numbers specified in the table above;
There is 10% event turnover per minute.
The purpose of the 10% event turnover is to simulate event churn ― which would be present
in a real environment. This is important because it simulates the need for the event views to
refresh with new data on a constant basis. It was deemed that 10% event churn per minute is
a conservatively high estimation of event churn and would likely be a lot less in reality.
Users and filters

A typical number of users logged in to a single WebGUI server at any one time at Customer A
is around 30. The testing therefore involved logging in 30 unique users simultaneously while
the performance metrics were gathered to simulate loading onto the WebGUI server.
Each of the logged in users had an AEL open with the following filter selected:
Acknowledged = 0 and
Flash = 0 and
FirstOccurrence < (getdate() - 300) and
Node <> 'server01'
Since none of the synthetic events were either acknowledged or flashing nor did any have the
Node field set to server01, the filtered event set all users were viewing consisted of all
events whose first occurrence is more than 5 minutes ago. The user filter used in the tests
included all of these field comparisons so that it would be comparable in terms of complexity
and hence induced load on the ObjectServer during execution as a "real world" filter would.
16
Since the events are replaced at a rate of 10% per minute, this meant that each user AEL was
displaying approximately 5,650 events. This can be calculated by:
Total events (11,300) ― number of new events (ie. less than 5 minutes old)
→ 11,300 total events ― (11,300 × 10% × 5 minutes) = 5,650 events
This number of AEL events was deemed as a typical number of events that an operator
would have in their event list.
The AEL timed refresh was set to the default of 60 seconds for all users. Since the event
churn rate was 10% per minute, approximately 565 new events were inserted and
approximately 565 events were deleted with each AEL timed refresh.
Metrics gathered
It was deemed that the success of the tests would ultimately be judged on how good the end-
user experience is for the operators. That being the case, the following items were identified
as key metrics to be collected for this exercise. 100 measurements were taken for each of the
following two metrics:
Metric 1: How long does the AEL initially take to load the whole data set?
This metric measures the amount of time it takes the AEL to do a full load of the events when
the filter is first selected.
In order to exclude the length of time it takes for the AEL applet to load (which is heavily
client dependent), this measurement was taken when the filter selection was changed from
one filter to the target filter. 100 measurements were taken for this exercise.
Metric 2: How long do AEL auto-refreshes take to execute?

After an AEL is loaded, there is a count-down clock in the lower-right-hand corner of the
AEL which shows how many seconds until the next auto-refresh happens. This clock was
also used to measure the number of seconds it took for the auto-refresh to complete.
This differs from the first metric because when the AEL auto-refreshes, it only retrieves any
updates since the last refresh ― rather than retrieving the whole data set. 100 measurements
were taken for this exercise.
Test results
The following table shows a summary of the measurements taken for the two metrics. The
values have been averaged and the standard deviation calculated. All values are shown in
seconds:
Metric 1: filter select change (full data reload) Metric 2: AEL auto-refresh (partial data reload)
1 1 1 1 1 1
1 1 1 1 1 1

17
4 1 1 1 1 1
1 5 1 1 1 1
1 1 6 1 1 1
1 1 1 1 1 1
1 4 1 1 2 1
7 1 3 1 1 1
1 1 1 3 3 1
1 1 1 1 1 1
1 1 1 1 7 1
1 1 1 2 1 1
2 1 1 1 1 1
1 1 2 2 1 1
3 1 1 1 1 1
5 4 1 1 1 1
1 3 1 1 1 1
1 1 1 4 1 1
2 3 1 1 1 1
1 1 3 1 1 1
1 1 1 1 1 1
1 1 2 1 1 1
1 1 1 1 1 1
3 4 1 1 1 1
5 1 1 1 1 4
1 6 1 1 9 1
1 1 1 1 1 1
1 1 3 1 1 1
1 1 2 1 1 1
1 1 1 1 2 1
3 7 1 1 2 1
1 1 2 1 1 1
5 1 1 1
1 3 1 1
Average 1.74 Average 1.29
Std. Dev. 1.4538 Std. Dev. 1.112782

18
Results analysis
The average load time for an AEL was 1.74 (±1.45) seconds whereas the average load time of
an AEL auto-refresh was 1.29 (±1.11) seconds. The time taken for an AEL to do a full load of
the data set therefore was typically around 50% more than that of an auto-refresh. Both
average metric values returned relatively low standard deviations indicating the average
values were fairly typical.
There were a small number of high values within the test results. Generally speaking, the
lower values will likely occur when the AEL is accessing the result set from the WebGUI
server's cache and the higher values when the AEL invokes the WebGUI server to access the
event data from the datasources directly (ie. when cache results have expired in each case).
In a WAN scenario, it would be expected to see occasional high return values due to network
latency. The latency would factor into AEL responsiveness during WAN-based queries ― for
example: queries the WebGUI server makes to the remote datasources. The response time
would ultimately depend on the reliability of the WAN link.
Caching was enabled for the remote datasources during these tests and the WebGUI trace file
reported that the cache was being accessed by the AEL clients about 63% of the time. This
means that AEL refreshes were only creating WAN traffic around a third of the time. This
highlights the value of intelligent filter construction and cache use.
Each datasource only had only one Display layer ObjectServer serving up event data to
WebGUI. It is expected that results would be more favourable on a "real" system where more
powerful hardware and more Display ObjectServers supporting the user load were
provisioned.
A note about data caching

It is recommended that caching be enabled for non-local datasources and, optionally,
disabled for local datasources. This will benefit in three ways.
First, AEL refresh times will generally be improved by caching non-local datasource data.
Second, by caching non-local datasource data, the amount of data being pulled across the
WAN will be minimised thus minimising WAN utilisation. Third, by not caching local
datasource data, "time to glass" values will be improved for local event data. This is because
local data will then be refreshed on-demand and sourced directly by the WebGUI server
from the local Display layer ObjectServers with each AEL refresh ― and not from the
WebGUI server's cache.
By not caching local datasource data, the load will likely be increased on local Display layer
ObjectServers ― compared to when caching is used ― however the load can be managed by
adding further Display layer ObjectServers, if required.

19
Notices
This information was developed for products and services offered in the U.S.A.
IBM® may not offer the products, services, or features discussed in this document in other
countries. Consult your local IBM representative for information on the products and
services currently available in your area. Any reference to an IBM product, program, or
service is not intended to state or imply that only that IBM product, program, or service may
be used. Any functionally equivalent product, program, or service that does not infringe any
IBM intellectual property right may be used instead. However, it is the user's responsibility
to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in
this document. The furnishing of this document does not grant you any license to these
patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual
Property Department in your country or send inquiries, in writing, to:
IBM World Trade Asia Corporation
Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106-0032, Japan
The following paragraph does not apply to the United Kingdom or any other country where
such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES
CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF
ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied
warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are
periodically made to the information herein; these changes will be incorporated in new
editions of the publication. IBM may make improvements and/or changes in the product(s)
and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only
and do not in any manner serve as an endorsement of those Web sites. The materials at those
Web sites are not part of the materials for this IBM product and use of those Web sites is at
your own risk.
IBM may use or distribute any of the information you supply in any way it believes
appropriate without incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of enabling:
(i) the exchange of information between independently created programs and other
programs (including this one) and (ii) the mutual use of the information which has been
exchanged, should contact:

Large scale and geographically distributed architectures Best Practices Notices
20
IBM Corporation
958/NH04
IBM Centre, St Leonards
601 Pacific Hwy
St Leonards, NSW, 2069
Australia
IBM Corporation
896471/H128B
76 Upper Ground
London
SE1 9PZ
United Kingdom
IBM Corporation
JBF1/SOM1 294
Route 100
Somers, NY, 10589-0100
United States of America
Such information may be available, subject to appropriate terms and conditions, including in
some cases, payment of a fee.
The licensed program described in this document and all licensed material available for it are
provided by IBM under terms of the IBM Customer Agreement, IBM International Program
License Agreement or any equivalent agreement between us.
Any performance data contained herein was determined in a controlled environment.
Therefore, the results obtained in other operating environments may vary significantly. Some
measurements may have been made on development-level systems and there is no guarantee
that these measurements will be the same on generally available systems. Furthermore, some
measurements may have been estimated through extrapolation. Actual results may vary.
Users of this document should verify the applicable data for their specific environment.
Information concerning non-IBM products was obtained from the suppliers of those
products, their published announcements or other publicly available sources. IBM has not
tested those products and cannot confirm the accuracy of performance, compatibility or any
other claims related to non-IBM products. Questions on the capabilities of non-IBM products
should be addressed to the suppliers of those products.
All statements regarding IBM's future direction or intent are subject to change or withdrawal
without notice, and represent goals and objectives only.
All IBM prices shown are IBM's suggested retail prices, are current and are subject to change
without notice. Dealer prices may vary.
This information is for planning purposes only. The information herein is subject to change
before the products described become available.

21
This information contains examples of data and reports used in daily business operations. To
illustrate them as completely as possible, the examples include the names of individuals,
companies, brands, and products. All of these names are fictitious and any similarity to the
names and addresses used by an actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate
programming techniques on various operating platforms. You may copy, modify, and
distribute these sample programs in any form without payment to IBM, for the purposes of
developing, using, marketing or distributing application programs conforming to the
application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions.
IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs.
If you are viewing this information softcopy, the photographs and color illustrations may not
appear.
Trademarks
These terms are trademarks of International Business Machines Corporation in the United
States, other countries, or both:
IBM
Tivoli
Netcool
SunOS, Sun, Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in
the United States, other countries, or both.
Red Hat, RHEL are trademarks or registered trademarks of Red Hat in the United States,
other countries, or both.
Adobe, Acrobat, Portable Document Format (PDF), PostScript, and all Adobe-based
trademarks are either registered trademarks or trademarks of Adobe Systems
Incorporated in the United States, other countries, or both.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of
Oracle, Inc. in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft
Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other
countries.

22

Tivoli Netcool OMNIbus 7.3.1 Large Scale and Geographically Distributed Architectures - Best Practice - v1.0

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Tivoli Netcool OMNIbus 7.3.1 Large Scale and Geographically Distributed Architectures - Best Practice - v1.0

Uploaded by

Copyright:

Available Formats

Tivoli Netcool/OMNIbus 7.3.

Licensed Materials – Property of IBM

© Copyright IBM Corporation 2011, 2012.

Tivoli Netcool/OMNIbus 7.3.1

About this publication

What this publication contains

Conventions used in this publication

Tivoli Netcool/OMNIbus 7.3.1

Operating system-dependent variables and paths

Tivoli Netcool/OMNIbus 7.3.1

Chapter 1 Executive summary

Tivoli Netcool/OMNIbus 7.3.1

Tivoli Netcool/OMNIbus 7.3.1

Consider the following example scenarios:

Tivoli Netcool/OMNIbus 7.3.1

Tivoli Netcool/OMNIbus 7.3.1

Chapter 3 Details of the new architecture model

Architecture model building block

Event aggregation at the dashboard layer

Tivoli Netcool/OMNIbus 7.3.1

Netcool datasource Netcool datasource Netcool datasource

WebGUI servers WebGUI servers WebGUI servers

London datasource Bangalore datasource Wellington datasource

Probes Probes Probes

Tivoli Netcool/OMNIbus 7.3.1

Network bandwidth considerations

Tivoli Netcool/OMNIbus 7.3.1

Tivoli Netcool/OMNIbus 7.3.1

Chapter 4 Lab test environment

A diagram of the test architecture configuration is shown below:

Tivoli Netcool/OMNIbus 7.3.1

WebGUI server (UK)

snapper rheles41 emonster

Datasource: PERTH Datasource: LONDON Datasource: AUSTIN

Hostname Platform Installed components CPU Memory

rheles41 (London) Red Hat Enterprise Tivoli Netcool/OMNIbus 1 × 3.2 GHz 2 GB

emonster (Austin) SunOS 5.9 Tivoli Netcool/OMNIbus 2 × 400 MHz 2 GB

snapper (Perth) SunOS 5.10 Tivoli Netcool/OMNIbus 2 ×1.2 GHz 4 GB

Composition and distribution of the test events

Tivoli Netcool/OMNIbus 7.3.1

Event type Austin, Texas, USA London, UK Perth, Australia

Type A 2,000 600 320

Type B 1,000 300 160

Type C 1,000 300 160

Type D 500 150 80

Type E 500 150 80

Totals 9,000 1,500 800

Users and filters

→ 11,300 total events ― (11,300 × 10% × 5 minutes) = 5,650 events

Metric 2: How long do AEL auto-refreshes take to execute?

Tivoli Netcool/OMNIbus 7.3.1

Average 1.74 Average 1.29

Std. Dev. 1.4538 Std. Dev. 1.112782

Tivoli Netcool/OMNIbus 7.3.1

A note about data caching

Tivoli Netcool/OMNIbus 7.3.1

Tivoli Netcool/OMNIbus 7.3.1

Tivoli Netcool/OMNIbus 7.3.1

Tivoli Netcool/OMNIbus 7.3.1

You might also like