
An introduction to event monitoring using the AIX Event Infrastructure
Cheryl L. Jennings and Trishali Nayar
May 17, 2011

The AIX® Event Infrastructure is an extensible framework for monitoring multiple types of
system events. This article gives an overview of the monitoring interface, as well as pointers for
writing event monitoring applications.

Introduction
The AIX Event Infrastructure is an event monitoring framework for monitoring predefined and
user-defined system events. Within the context of the AIX Event Infrastructure, an event is defined as
a change in state or in value which can be detected within the kernel or a kernel extension at the
time the change occurs.

Each type of event that can be monitored is associated with an event producer. Event
producers are simply sections of code which can detect an event as it happens and notify the AIX
Event Infrastructure of event occurrences. Some examples of available event producers are:

• modFile: The modFile event producer monitors for modifications to the content of files.
• utilFs: The utilFs event producer monitors the utilization of a file system.
• waitTmPgInOut: The waitTmPgInOut event producer monitors for the average wait time, in
milliseconds, of threads waiting for page in or page out operations to complete over a one
second period.

For a full listing of available event producers, see the Related topics section.

Events are represented as files within a pseudo file system. Existing file system interfaces (read(),
write(), select(), and so on) are used to specify how and when monitoring applications (also called
consumers) should be notified and to wait on and read data about event occurrences. The path
name to a monitor file within the AIX Event Infrastructure file system is used to determine which
event a consumer wishes to monitor.

© Copyright IBM Corporation 2011 Trademarks


developerWorks® ibm.com/developerWorks/

Figure 1. Example instance of a mounted AIX Event Infrastructure file system

Inside the AIX Event Infrastructure file system, there are four basic file types:

1. Monitor factories: Monitor factories are directories with the ".monFactory" extension. They
are directory representations of event producers. These directories are automatically created
when the AIX Event Infrastructure file system is mounted.
2. Monitor files: Monitor files are designated by a ".mon" extension and represent events to be
monitored. They only exist under a monitor factory which represents their associated event
producer.
3. List files: List files are special data files that end with the ".list" extension. Currently only one
list file exists, evProds.list. Reading this file will return the list of all available event producers.
4. Subdirectories: Subdirectories are used for ease of management and to represent the full
pathname of the event.

Monitoring events with the AIX Event Infrastructure


The AIX Event Infrastructure is contained in the bos.ahafs fileset on AIX 6.1 TL 6 and AIX 7.1.
To monitor events, first install the bos.ahafs fileset and mount an instance of the AIX Event
Infrastructure file system:
mkdir /aha
mount -v ahafs /aha /aha

You may mount the file system on any desired mount point. Examples in this article assume it has
been mounted on /aha.

Determining which monitor file to use


Once the file system is mounted, you must determine the pathname of the monitor file that
corresponds to the event you wish to monitor. Each event producer has a different set of
instructions for determining the pathname you should use. For additional information, see:

• Pre-defined event producers
• Pre-defined event producers for a Cluster Aware AIX instance

In the monitoring example that we will follow in the next section, we will be monitoring for
modifications to the /etc/passwd file. To determine the pathname for the monitor file, we need to
determine which event producer corresponds to this event. Since /etc/passwd is a regular file,
the modFile event producer should be used to monitor for modifications to the contents of the file.
According to the modFile documentation, "a monitor file with the same path as the file you wish to
monitor should be created under the modFile.monFactory directory." So, assuming the AIX Event
Infrastructure file system is mounted at /aha, the full pathname for the monitor file would be:
/aha/fs/modFile.monFactory/etc/passwd.mon

Note that monitor files can have the same pathname relative to their monitor factory and still
represent different events. For example, the file /aha/fs/modDir.monFactory/home/cherylfs.mon
monitors for file creation and deletion within the /home/cherylfs directory, while the file
/aha/fs/utilFs.monFactory/home/cherylfs.mon monitors the utilization of the /home/cherylfs
file system.
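
The mapping from a monitored object to its monitor file can be sketched as a small C helper. This is only an illustration of the naming rule described above; the function name build_mon_path and its parameters are assumptions made for this example and are not part of the AIX Event Infrastructure or the sample program.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the monitor file path for an object named by an absolute path.
 * (Hypothetical helper, not part of any AIX API.)
 *   mount_point: where the file system is mounted, e.g. "/aha"
 *   factory:     factory path below the mount point, e.g. "fs/modFile.monFactory"
 *   target:      absolute path of the object to monitor, e.g. "/etc/passwd"
 * Returns 0 on success, -1 if target is not absolute or out is too small.
 */
int build_mon_path(char *out, size_t out_len, const char *mount_point,
                   const char *factory, const char *target)
{
    size_t mlen = strlen(mount_point);
    int n;

    /* The monitored object must be named by an absolute path. */
    if (target[0] != '/')
        return -1;

    /* Ignore trailing slashes on the mount point ("/aha/" -> "/aha"). */
    while (mlen > 0 && mount_point[mlen - 1] == '/')
        mlen--;

    n = snprintf(out, out_len, "%.*s/%s%s.mon",
                 (int)mlen, mount_point, factory, target);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return 0;
}
```

Called with "/aha", "fs/modFile.monFactory", and "/etc/passwd", the helper produces the pathname shown earlier; swapping the factory for "fs/utilFs.monFactory" or "fs/modDir.monFactory" gives the two /home/cherylfs examples.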

Example event monitoring flow


The following is a high-level view of a typical event monitoring flow. In this example, the
/etc/passwd file is monitored by an event consumer (monitoring application). This flow illustrates
the actions taken by both the event consumer and the AIX Event Infrastructure when an event is
monitored.

Figure 2. Example event monitoring flow

At this point, the monitoring application is asleep until an event occurrence is detected. The AIX
Event Infrastructure watches file operations to see if a monitored file is modified.

Figure 3. Example event monitoring flow continued

As seen in the previous example, the typical flow for a monitoring application is:

1. Set up event monitoring using the write() system call.
2. Wait on event occurrences with the select() (or a blocking read()) system call.
3. Read event occurrence data through a read() system call.
4. Parse event occurrence data and take appropriate action.
5. If desired, wait for further event occurrences.

Each step is examined in this article. A sample program, mon_modFile_event.c, has been
provided as an example you can download.
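
As a rough orientation before the individual steps are examined, one pass through this flow might look like the following C sketch. It is not the mon_modFile_event.c sample: the function name monitor_once, the EVENT_BUF_SIZE constant, and the 0600 creation mode are assumptions made for this example, and error handling is simplified (a failed select() is simply treated as a failure here).

```c
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

#define EVENT_BUF_SIZE 4096   /* assumed read size for this sketch */

/*
 * One pass of the monitoring flow (hypothetical helper):
 * create/open the monitor file, write the monitoring specification,
 * block in select(), then read the data for one event occurrence.
 * Returns the number of bytes stored in buf (NUL-terminated), or -1.
 */
ssize_t monitor_once(const char *mon_path, const char *spec,
                     char *buf, size_t buf_len)
{
    fd_set readfds;
    size_t spec_len = strlen(spec);
    ssize_t n = -1;
    int fd;

    if (buf_len < 2)
        return -1;

    /* Create and open the monitor file. */
    fd = open(mon_path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* Step 1: specify how and when to be notified. */
    if (write(fd, spec, spec_len) != (ssize_t)spec_len)
        goto out;

    /* Step 2: sleep until an event occurrence is reported. */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    if (select(fd + 1, &readfds, NULL, NULL, NULL) <= 0)
        goto out;

    /* Step 3: read the event occurrence data. */
    n = read(fd, buf, buf_len - 1);
    if (n >= 0)
        buf[n] = '\0';

out:
    close(fd);
    return n;
}
```

A continuously running monitor would keep the descriptor open and loop over steps 2 to 4 instead of closing it after one event; the single pass here only keeps the sketch short.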

Setting up event monitoring


Once the AIX Event Infrastructure file system has been mounted and the appropriate monitor
file identified, the monitor file must be created. This may be done with an open() call with the
O_CREAT flag specified. See Related topics for information on creating monitor files.

Once the monitor file has been created and opened, specifications on how and when to be notified
must be written. A complete listing of the available options can be found in the Related
topics section (see "Writing to the monitor file").

While most of the monitoring specifications are straightforward, the AIX Event Infrastructure has
two behaviors for the notify count (NOTIFY_CNT) specification:

Figure 4. NOTIFY_CNT differences

In the case of NOTIFY_CNT=-1, if another event occurrence is detected while the monitoring
application is not currently blocked in a select() or read() call, that event occurrence data is logged
in the buffer allocated for this consumer. Once the monitoring application attempts another select()
or blocking read(), those calls will return immediately since there is unread event occurrence data
waiting in the buffer.

In the sample program, mon_modFile_event.c, the file to be monitored through the modFile event
producer is passed in as an argument from the user. If necessary, subdirectories are created in the
AIX Event Infrastructure file system to create the necessary monitor file.
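
Creating the intermediate subdirectories can be done in several ways. The following is a minimal sketch of one approach, assuming ordinary mkdir() semantics; the function name make_parent_dirs, the 1024-byte path limit, and the 0755 mode are choices made for this example and are not taken from the sample program.

```c
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Create every missing directory leading up to, but not including,
 * the final component of path (hypothetical helper).
 * Returns 0 on success, -1 on failure with errno set.
 */
int make_parent_dirs(const char *path)
{
    char tmp[1024];
    char *p;

    if (path[0] == '\0') {
        errno = EINVAL;
        return -1;
    }
    if (strlen(path) >= sizeof(tmp)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    strcpy(tmp, path);

    /* At each '/', cut the string there and create that prefix. */
    for (p = tmp + 1; *p != '\0'; p++) {
        if (*p != '/')
            continue;
        *p = '\0';
        if (mkdir(tmp, 0755) != 0 && errno != EEXIST)
            return -1;
        *p = '/';
    }
    return 0;
}
```

For a monitor file such as /aha/fs/modFile.monFactory/etc/passwd.mon, this would attempt to create /aha/fs/modFile.monFactory/etc if it does not already exist, leaving the monitor file itself to be created by open() with O_CREAT.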

Once the monitor file is opened, the string
"CHANGED=YES;WAIT_TYPE=WAIT_IN_SELECT;INFO_LVL=1" is written to the monitor file
because:

1. The modFile event producer has specified the AHAFS_THRESHOLD_STATE capability
(CHANGED=YES).
2. The mon_modFile_event program will wait in a select() call
(WAIT_TYPE=WAIT_IN_SELECT).
3. The modFile event producer does not pass a message, and there is no need for the stack
trace in this program (INFO_LVL=1).

The mon_modFile_event program monitors continuously for events. Since the default value for
NOTIFY_CNT is -1, this does not need to be specified.
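
To make the structure of the specification string concrete, the following sketch assembles it from its parts. The function name build_modfile_spec is hypothetical, and appending NOTIFY_CNT with the same ";" separator when a non-default value is requested is an assumption based on the keyword=value form used above; consult "Writing to the monitor file" for the authoritative syntax.

```c
#include <stdio.h>
#include <string.h>

/*
 * Assemble a monitoring specification for a state-change event that is
 * waited on in select() (hypothetical helper).
 *   info_lvl:   1, 2, or 3
 *   notify_cnt: -1 is the default and is therefore left out of the string
 * Returns 0 on success, -1 on a bad argument or if out is too small.
 */
int build_modfile_spec(char *out, size_t out_len, int info_lvl, int notify_cnt)
{
    size_t used;
    int n;

    if (info_lvl < 1 || info_lvl > 3)
        return -1;

    n = snprintf(out, out_len,
                 "CHANGED=YES;WAIT_TYPE=WAIT_IN_SELECT;INFO_LVL=%d", info_lvl);
    if (n < 0 || (size_t)n >= out_len)
        return -1;

    /* Only spell out NOTIFY_CNT when it differs from the default. */
    if (notify_cnt != -1) {
        used = strlen(out);
        n = snprintf(out + used, out_len - used, ";NOTIFY_CNT=%d", notify_cnt);
        if (n < 0 || (size_t)n >= out_len - used)
            return -1;
    }
    return 0;
}
```

With info_lvl set to 1 and notify_cnt set to -1, the result is exactly the string that the sample program writes.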

Waiting on event occurrences


Monitoring applications may monitor for more than one event, and multiple applications may
monitor the same event with different monitoring specifications.

It is important to note that monitoring of events does not begin until the monitoring application
issues a select() or a blocking read() call. There are several conditions which cause select() or a
blocking read() to return. These conditions are listed in the AIX 6.1 information center (see Waiting
on events).

The sample program blocks in a select() call once the monitoring information is written to the
monitor file. If the select() call returns an error, a special flag is passed to the parsing function to
indicate that the error format will be used in the output.

Unavailable event occurrences


For some event producers, there may be some types of event occurrences that cause monitored
events to become invalid. Some examples are:

• The unmounting of a monitored file system through the utilFs event producer.
• The removing or renaming of a monitored file through the modFile event producer.
• The death of a process monitored through the processMon or pidProcessMon event producer.

Once an unavailable event occurrence has occurred, users may not monitor that event until
it becomes available again. Ideally, the monitoring application identifies unavailable event
occurrences and takes corrective action, which causes the event to become valid again. The
documentation for each event producer lists which return codes indicate that an unavailable
event occurrence has been detected. The return codes for event producers are defined in
sys/ahafs_evProds.h.

For local unavailable event occurrences, the monitor file associated with the event is deleted.
Monitoring applications may read event data from deleted monitor files while the file descriptor for
the monitor file is still open but may not block for further event occurrences. Once the monitoring
application has taken corrective action and the event is valid again, the following actions must be
taken to resume monitoring:

1. The file descriptor for the deleted monitor file must be closed.
2. The monitor file must be re-opened with the O_CREAT flag.
3. Monitoring specifications must be written to the file.

At this point, the monitoring application may wait for event occurrences again in select() or read().
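
The three steps above can be sketched as a single C function. The name resume_monitoring and the 0600 creation mode are assumptions for this example; the corrective action itself (for instance, re-creating a removed file) is outside the sketch and must already have been taken.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Resume monitoring after an unavailable event occurrence
 * (hypothetical helper).
 *   old_fd:   descriptor of the deleted monitor file, or -1 if none
 *   mon_path: pathname of the monitor file to create again
 *   spec:     monitoring specification to write again
 * Returns the new descriptor, or -1 on failure.
 */
int resume_monitoring(int old_fd, const char *mon_path, const char *spec)
{
    size_t len = strlen(spec);
    int fd;

    /* 1. Close the descriptor for the deleted monitor file. */
    if (old_fd >= 0)
        close(old_fd);

    /* 2. Re-open the monitor file with O_CREAT. */
    fd = open(mon_path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* 3. Write the monitoring specification to the new file. */
    if (write(fd, spec, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The returned descriptor replaces the old one in whatever fd_set the application passes to select().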

The sample program inspects the return code from the event producer (RC_FROM_EVPROD)
to determine if the event occurrence was an unavailable event occurrence. It attempts corrective
action for some of the possible unavailable event occurrence types and ceases monitoring for
others.

Reading event occurrence data


Event data consists of <keyword>=<value> pairs and the data collected depends on the
capabilities of the event producer and the INFO_LVL specified by the monitoring application.
Capabilities for each event producer are listed in the event producer's documentation.

Event data may only be read once, and no more than one event's worth of data is returned in a
single read() call. For example, suppose that the data for two event occurrences has been copied
into the buffer before the consumer reads from the monitor file, and that each event occurrence
has 256 bytes of data. If the consumer calls read() for 4096 bytes, only the 256 bytes of the
first event are returned. A second read() call is needed to obtain the data from the second
event.

In the sample program, event data is always read with a 4K buffer. This is the recommended
read size, since most event occurrence data will be less than 4K. Reading with a buffer this large
means that no partial event occurrences will be returned due to insufficient space in the buffer
passed to the read() call.

Examples of event occurrence data


For an event producer which has specified AHAFS_THRESHOLD_STATE and
AHAFS_STKTRACE_AVAILABLE and passes a message to the event consumers, the three levels
of output look like this:

INFO_LVL=1               INFO_LVL=2                     INFO_LVL=3

BEGIN_EVENT_INFO         BEGIN_EVENT_INFO               BEGIN_EVENT_INFO
TIME_tvsec=1269863383    TIME_tvsec=1269863383          TIME_tvsec=1269863383
TIME_tvnsec=455993143    TIME_tvnsec=455993143          TIME_tvnsec=455993143
SEQUENCE_NUM=0           SEQUENCE_NUM=0                 SEQUENCE_NUM=0
PID=6947038              PID=6947038                    PID=6947038
UID=0                    UID=0                          UID=0
UID_LOGIN=0              UID_LOGIN=0                    UID_LOGIN=0
GID=0                    GID=0                          GID=0
PROG_NAME=cat            PROG_NAME=cat                  PROG_NAME=cat
RC_FROM_EVPROD=1000      RC_FROM_EVPROD=1000            RC_FROM_EVPROD=1000
END_EVENT_INFO           BEGIN_EVPROD_INFO              BEGIN_EVPROD_INFO
                         event producer message here    event producer message here
                         END_EVPROD_INFO                END_EVPROD_INFO
                         END_EVENT_INFO                 STACK_TRACE:
                                                        ahafs_prod_callback+3C4
                                                        ahafs_cbfn_wrapper+30
                                                        ahafs_vn_write+204
                                                        vnop_rdwr+7E4
                                                        vno_rw+B4
                                                        rwuio+12C
                                                        rdwr+184
                                                        kewrite+16C
                                                        .svc_instr
                                                        write+1A4
                                                        _xwrite+6C
                                                        _xflsbuf+B0
                                                        __flsbuf+9C
                                                        copyopt_ascii+2C0
                                                        scat+388
                                                        main+11C
                                                        __start+68
                                                        END_EVENT_INFO

For an event producer which has specified AHAFS_THRESHOLD_VALUE_HI and has not
specified AHAFS_STKTRACE_AVAILABLE and passes a message to event consumers, the three
levels of output look like this:

INFO_LVL=1               INFO_LVL=2                     INFO_LVL=3

BEGIN_EVENT_INFO         BEGIN_EVENT_INFO               BEGIN_EVENT_INFO
TIME_tvsec=1269866715    TIME_tvsec=1269866715          TIME_tvsec=1269866715
TIME_tvnsec=16678418     TIME_tvnsec=16678418           TIME_tvnsec=16678418
SEQUENCE_NUM=0           SEQUENCE_NUM=0                 SEQUENCE_NUM=0
CURRENT_VALUE=3          CURRENT_VALUE=3                CURRENT_VALUE=3
RC_FROM_EVPROD=1000      RC_FROM_EVPROD=1000            RC_FROM_EVPROD=1000
END_EVENT_INFO           BEGIN_EVPROD_INFO              BEGIN_EVPROD_INFO
                         event producer message here    event producer message here
                         END_EVPROD_INFO                END_EVPROD_INFO
                         END_EVENT_INFO                 END_EVENT_INFO

If there is an error from the event producer, the event data has the following format for every
event producer at every INFO_LVL:
BEGIN_EVENT_INFO
TIME_tvsec=1269868036
TIME_tvnsec=966708948
SEQUENCE_NUM=0
RC_FROM_EVPROD=19
END_EVENT_INFO

If a consumer is monitoring a value event and the current value already exceeds the requested
threshold, the following format is used to record this EALREADY event:
BEGIN_EVENT_INFO
TIME_tvsec=1281837726
TIME_tvnsec=446010404
SEQUENCE_NUM=0
CURRENT_VALUE=70
RC_FROM_EVPROD=56
END_EVENT_INFO

Each event producer provides information on what is included in the output for event occurrences
they produce. The return codes for event producers are defined in sys/ahafs_evProds.h.

The sample program checks for the error format in the event occurrence data if the select()
call returned an error. Otherwise, it uses the format included in the modFile event producer's
documentation. It uses sscanf() to read in the values for the corresponding keywords.
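
A keyword lookup in this style might be sketched as follows. The function name get_event_long is hypothetical, and the sketch assumes that each <keyword>=<value> pair sits on its own newline-terminated line, as in the listings above; it handles integer values only.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look for "<keyword>=<integer>" at the start of a line in event data
 * (hypothetical helper). Stores the integer in *value and returns 0,
 * or returns -1 if the keyword is absent or its value is not numeric.
 */
int get_event_long(const char *data, const char *keyword, long *value)
{
    size_t klen = strlen(keyword);
    const char *line = data;

    while (line != NULL && *line != '\0') {
        /* Require '=' right after the keyword so "UID" skips "UID_LOGIN". */
        if (strncmp(line, keyword, klen) == 0 && line[klen] == '=' &&
            sscanf(line + klen + 1, "%ld", value) == 1)
            return 0;

        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}
```

Looking up RC_FROM_EVPROD first is a natural starting point, since that value tells the application whether the remaining keywords follow the normal format or the error format.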

The SEQUENCE_NUM keyword


Sequence numbers are maintained per event, per consumer. The same event occurrence may
have different sequence numbers for different consumers, depending on when the consumers
began monitoring the event.

The sequence number is reset to 0 following any cessation in monitoring. The following image
illustrates the behavior of the SEQUENCE_NUM keyword.

Figure 4. Two consumers monitoring the same event

BUF_WRAP and EVENT_OVERFLOW keywords


Event data is kept in a circular buffer per consumer, per event monitored. If event data is being
written faster than the consumer can read it, there is a possibility of a buffer wrap. If a buffer wrap
occurs, event data will be overwritten such that there is no partial event data returned through a
read() call. Figure 5 illustrates what the buffer looks like before and after a buffer wrap:

Figure 5. Buffer wrap condition

In a buffer wrap condition, the first read returns only the keyword BUF_WRAP. The second read
returns the event data for the next, whole event occurrence. The SEQUENCE_NUM field should
be consulted to see how many event occurrences may have been overwritten from a buffer wrap.
If monitoring applications experience buffer wrap conditions, they may increase the size of their
monitoring buffer specified in BUF_SIZE. This cannot be done dynamically.

If the consumer is using a very small buffer, there is a possibility that the event data from one event
occurrence may be larger than the buffer. In this case, the keyword EVENT_OVERFLOW will be
written to the buffer, with as much event occurrence data as can fit inside the buffer. The first read
in an overflow case returns only the keyword EVENT_OVERFLOW. The next read returns the
event data that was able to fit in the buffer.

If a wrap condition occurred with this overflow condition, the first read returns the keyword
BUF_WRAP, the second read returns the keyword EVENT_OVERFLOW and the third read returns
the event data that was able to fit inside the buffer.
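
Because these keywords arrive in reads of their own, a consumer can classify each read before trying to parse it. The sketch below does this with a substring search; the enum and function names are hypothetical, and it assumes the keywords do not also appear inside ordinary event data such as an event producer message.

```c
#include <string.h>

/* What a single read() from a monitor file returned (hypothetical names). */
enum read_kind {
    READ_EVENT_DATA,      /* ordinary event occurrence data */
    READ_BUF_WRAP,        /* unread event data was overwritten */
    READ_EVENT_OVERFLOW   /* the next read returns truncated event data */
};

/* Classify one NUL-terminated buffer returned by read(). */
enum read_kind classify_read(const char *data)
{
    if (strstr(data, "BUF_WRAP") != NULL)
        return READ_BUF_WRAP;
    if (strstr(data, "EVENT_OVERFLOW") != NULL)
        return READ_EVENT_OVERFLOW;
    return READ_EVENT_DATA;
}
```

In the combined case described above, three successive reads would classify as READ_BUF_WRAP, READ_EVENT_OVERFLOW, and READ_EVENT_DATA, in that order.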

The sample program checks to see if the event data passed in contains the BUF_WRAP keyword.
A warning is printed to the user if a buffer wrap condition has occurred. The sample program does
not check for EVENT_OVERFLOW since it specifies an INFO_LVL of 1 and uses the default buffer
size of 4K. Currently, the maximum size of a modFile event with an INFO_LVL of 1 will be less than
4K.

Duplicate event data consolidation


To reduce the occurrence of buffer wraps, the AIX Event Infrastructure consolidates duplicate,
unread events. When new event data is collected, it is compared to the most recent unread event
data. If the data from each event occurrence is exactly the same (except the timestamp and
sequence number), the timestamp and sequence number of the previously written event data are
updated to reflect this new, duplicate event occurrence.

Duplicate event data consolidation can be distinguished from buffer wrap conditions in that the
keyword BUF_WRAP will not be read in the consolidation case.

The sample program maintains its own internal sequence number. It compares the expected value
to the actual value read from the event occurrence data to determine if duplicate events were
consolidated. This internal sequence number is reset to 0 if an unavailable event was detected.
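
The bookkeeping for such an internal sequence number is small. The following sketch shows one way to do it; the names seq_tracker, seq_reset, and seq_observe are hypothetical, and treating a sequence number lower than expected as a restart of numbering is an assumption of this example rather than documented behavior.

```c
/* Consumer-side copy of the expected SEQUENCE_NUM (hypothetical type). */
struct seq_tracker {
    long expected;   /* sequence number expected in the next event read */
};

/* Start over, for example after an unavailable event occurrence. */
void seq_reset(struct seq_tracker *t)
{
    t->expected = 0;
}

/*
 * Record the SEQUENCE_NUM of an event that was just read.
 * Returns how many event occurrences were skipped before it:
 * 0 when the number is the one expected, more when events were
 * consolidated or overwritten.
 */
long seq_observe(struct seq_tracker *t, long seen)
{
    long skipped;

    /* Assumption: a lower number than expected means numbering restarted. */
    if (seen < t->expected)
        t->expected = 0;

    skipped = seen - t->expected;
    t->expected = seen + 1;
    return skipped;
}
```

A nonzero result that was not preceded by a BUF_WRAP read points to consolidated duplicates; a nonzero result after BUF_WRAP indicates how many event occurrences may have been overwritten.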

Downloadable resources
Description                           Name                   Size
Program monitors for modifications    mon_modFile_event.c    10KB

Related topics
• Read the "Waiting on events" list of conditions that cause select() or a blocking read() to
return: http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.baseadmn/doc/baseadmndita/aix_ev_in_mon_wait.htm
• http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.baseadmn/doc/baseadmndita/aix_ev.htm
• See how to create monitor files.
• See "Writing to the monitor file" for a complete listing of monitor file options.
