
McAfee SIEM Event Aggregation

Event aggregation is a simple concept, but it is important to understand its nuances to operate the SIEM effectively. In short, aggregation allows a SIEM to reduce event volume by combining like events. There are multiple ways to adjust aggregation, and this post will cover each of the use cases. Feel free to add your own use cases or ask questions in the comments. This information is current as of release 9.5.2 MR2.

Data Flow Primer

Due to the repetitive nature of log data, a great deal of efficiency can be gained by
consolidating like events based on common fields. The process starts at the device creating
logs. The logs are transferred to a Receiver, using any one of a variety of protocols, where
they are processed for storage on the ESM and ELM.

For the ELM, the Receiver bundles up the raw logs and provides them to the ELM where
they are compressed and digitally signed for integrity. These files are then added to a time-
based repository where they are available for full text search and integrity verification to
prove they have not been tampered with for the duration of the retention period.

For the ESM, the Receiver will parse the logs into fields, normalize the events into
categories and aggregate the data based on the repetition of like[1] events. This process is
basically creating metadata for the logs. The Receiver then inserts the metadata into its
local database and stores it there on a first in, first out (FIFO) basis. Meanwhile, the ESM is
querying each Receiver every few minutes to get the latest events and populate them into
the GUI.

The solution essentially stores two copies of the data. The metadata on the ESM provides
the operational representation of the raw data stored on the ELM.

[1] By default, “like” is defined as a common source IP address, destination IP address and signature ID.
SIEM Architecture

Note: For a combo appliance/VM, the data flow is the same between the virtual components that exist within the
appliance.
How Aggregation Works

By default, events are aggregated based on the same source IP, destination IP and event ID
within a single ESM->Receiver query window. This is called Dynamic Aggregation, and it is
enabled by default on a per-Receiver basis.
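
To make that keying concrete, here is a minimal Python sketch of collapsing like events within one query window. This is an illustration only, not McAfee's implementation; the event field names (src_ip, dst_ip, sig_id, time) are invented for the example.

```python
def aggregate_window(events):
    """Collapse events sharing (source IP, destination IP, event ID)
    into one record per key, with a count and first/last seen times.
    Hypothetical sketch; field names are invented for illustration."""
    records = {}
    for ev in events:
        key = (ev["src_ip"], ev["dst_ip"], ev["sig_id"])  # the default "like" fields
        rec = records.get(key)
        if rec is None:
            records[key] = {"count": 1,
                            "first_time": ev["time"],
                            "last_time": ev["time"]}
        else:
            rec["count"] += 1
            rec["last_time"] = ev["time"]
    return records
```

Each record, rather than each raw event, is what the ESM pulls on its next query.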

If Dynamic Aggregation were disabled, event aggregation records would extend through multiple ESM->Receiver query windows. For example, when some number of events are aggregated under a single record, instead of closing out that record immediately after the ESM queries the Receiver, the record is held open to see if the event occurs again during the next window. The same happens on the Receiver.

If the event occurs again during the next query window, the counters are updated and the record continues to be extended, up until the max record time (12 hours by default). This means that for each event that arrives there is an extra lookup on both the Receiver and the ESM, which will impact performance.

This is called Level 1 aggregation, and it was the default setting before Dynamic Aggregation was added to improve performance and aggregation granularity. It's highly recommended to always leave Dynamic Aggregation enabled.
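
As a rough sketch of the Level 1 behavior described above, assuming invented record shapes (the real implementation is not public), the per-event lookup and the 12-hour cap might look like this:

```python
from datetime import timedelta

MAX_RECORD_AGE = timedelta(hours=12)  # default max record time

def extend_or_open(open_records, key, ev_time):
    """Keep an aggregation record open across query windows, extending it
    until it ages past the max record time. Hypothetical sketch."""
    rec = open_records.get(key)  # the extra lookup performed for every event
    if rec is not None and ev_time - rec["first_time"] < MAX_RECORD_AGE:
        rec["count"] += 1
        rec["last_time"] = ev_time
    else:
        open_records[key] = {"first_time": ev_time,
                             "last_time": ev_time,
                             "count": 1}
```

That lookup on every arriving event is the performance cost noted above.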

This works well in most scenarios; however, there are always exceptions.

Aggregation Use Case: Firewall Command Auditing

Every environment has its own unique use cases, requirements, business drivers, metrics,
alarms, reports, etc. One organization I recently had the opportunity to work with had an
excellent use case: audit firewall changes by logging every command entered, generating a
weekly report of the firewall sessions and reconciling it with the change requests.

They had properly enabled the logging level so that each command entered was sent as an
individual log, but the logs, with the common event ID, source and destination IP addresses,
were being aggregated under a single "CLI Command Entered" event. The result was that
the fidelity of the text actually entered, which was parsed into the Command field, was not
visible in the aggregate event.

In this instance, the use case requires that every single event (CLI command) generated by
the log source be available for query, analysis and reporting in the ESM. For this to work,
aggregation must be disabled completely for the event ID so that every command typed is
captured. Fortunately, manually typed firewall commands don't generate a high volume of
events, so disabling aggregation for them will not negatively impact performance. There are
instructions for disabling aggregation for a parsing rule ahead.
Aggregation vs. Performance

Aggregation is an effective method to summarize events in such a way that the details
required for reporting, alarms and advanced correlation are retained without requiring
enormous compute resources. The level of aggregation achieved varies between
implementations, but in general you can expect to see approximately a 10:1 average ratio
in most environments, and it's not uncommon to see ratios of 30-50:1 for events with high
frequency and repetition, like firewall flow setup logs. This means that the ESM only has to
analyze roughly 10% of the events that the Receiver parses. This also enables the
architecture to scale horizontally by adding Receivers to feed a single ESM.
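
A quick back-of-the-envelope illustration of what that ratio means for sizing; the event rates here are made up purely for the example:

```python
# Hypothetical numbers chosen only to illustrate the 10:1 ratio above.
receiver_eps = 30_000                       # events/sec parsed across Receivers
aggregation_ratio = 10                      # typical average ratio
esm_records_per_sec = receiver_eps / aggregation_ratio
print(esm_records_per_sec)                  # 3000.0 -> the ESM sees ~10% of events
```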

It also means that if aggregation were disabled completely, the ESM would not be able to
process events at scale. There is room for tuning aggregation; however, it's important to
consider the impact of any change that might drastically reduce the event aggregation
ratio. It would not be good practice to disable aggregation for a high-volume or low-quality
event.

The aggregation ratio can also be impacted by the cardinality of the events. There is a
corner case that will reduce performance if the data fields show too much diversity. In most
cases, there is a semi-contiguous subnet where most of the data sources live, which allows
for a normal level of aggregation. If every event included a different source or destination
IP, as could be the case in a distributed denial of service (DDoS) attack, then aggregation
would be reduced, impacting performance.

To handle this sort of situation, the Receiver has two additional tiers of aggregation beyond
what Dynamic Aggregation provides.
Per the default settings, if a single event occurs more than 300,000 times in a single minute,
Level 2 Aggregation kicks in. This causes the event to be aggregated more aggressively: the
destination IP for the event that crossed the threshold is set to 0.0.0.0 and ignored for
future events in that minute. This immediately reduces all of the records down to one and
eases the performance pressure the burst created.

If the event count surpasses 350,000 in a minute, Level 3 Aggregation kicks in. This resets
both the source and destination IP addresses to 0.0.0.0, leaving a single aggregation record
that matches every occurrence of that event in that minute. In almost all cases the default
settings work well, so it's not recommended to modify the Receiver-wide aggregation
settings.
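
The tiering can be pictured as a progressive coarsening of the aggregation key as per-minute volume grows. A minimal sketch, assuming invented field names and the default thresholds quoted above:

```python
LEVEL2_THRESHOLD = 300_000  # default events/minute before Level 2 engages
LEVEL3_THRESHOLD = 350_000  # default events/minute before Level 3 engages

def aggregation_key(ev, count_this_minute):
    """Coarsen the aggregation key as the per-minute count for an event ID
    crosses the Level 2 and Level 3 thresholds. Illustrative only."""
    if count_this_minute > LEVEL3_THRESHOLD:
        # Level 3: both IPs zeroed, so one record matches every occurrence
        return (ev["sig_id"], "0.0.0.0", "0.0.0.0")
    if count_this_minute > LEVEL2_THRESHOLD:
        # Level 2: destination IP zeroed and ignored for the rest of the minute
        return (ev["sig_id"], ev["src_ip"], "0.0.0.0")
    # Below both thresholds: the normal Dynamic Aggregation fields
    return (ev["sig_id"], ev["src_ip"], ev["dst_ip"])
```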

Disable Aggregation for a Parsing Rule

Best practice dictates that no changes are made at the Default Policy level. The most direct
route to disable aggregation for a single rule is to find and highlight the event in a View,
then select Show Rule from the top-left context menu.

This opens the Policy Manager with the rule and data source selected. From there you can
select the rule and disable aggregation, then roll out the policy for the change to take effect.

Aggregation Use Case: DNS Query Logs

Collecting DNS queries opens the door to numerous valuable use cases, and the value is
reinforced even further when outgoing DNS is blocked. It's possible to compare domains
against threat feeds and indicators of compromise (IOCs), detect attacks like Fast Flux and
monitor DNS sinkhole activity. The default aggregation settings are not ideal for DNS
queries because the domain data may be lost when records are aggregated on source IP,
destination IP (DNS server) and event ID (DNS query event). DNS queries can represent a
high volume of events, so disabling aggregation completely is not a recommended course
of action. In this case, it is best to leverage Custom Aggregation to adjust the fields on
which the DNS events are aggregated while still maintaining some level of aggregation.
In the case of a DNS query event, Custom Aggregation allows the common fields to be
changed to event ID, source IP and domain. This means that every DNS query a host makes
to the same domain within an ESM query window is aggregated under one event. A new
aggregation record is created each time the domain changes, so every domain is included
in the metadata while still maintaining some level of aggregation due to the repetitive
nature of DNS.
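
To see the effect of that key change, here is a small sketch, assuming invented event field names rather than the product's actual schema:

```python
from collections import Counter

def dns_aggregation_key(ev):
    """Custom Aggregation for DNS query events: keep the queried domain in
    the key instead of the destination IP, so no domain is ever dropped."""
    return (ev["sig_id"], ev["src_ip"], ev["domain"])

queries = [
    {"sig_id": "dns-query", "src_ip": "10.0.0.5", "domain": "example.com"},
    {"sig_id": "dns-query", "src_ip": "10.0.0.5", "domain": "example.com"},
    {"sig_id": "dns-query", "src_ip": "10.0.0.5", "domain": "evil.test"},
]

# Two records result: example.com aggregates to a count of 2,
# while evil.test gets its own record, so neither domain is lost.
print(Counter(dns_aggregation_key(q) for q in queries))
```

The process to adjust Custom Aggregation follows.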

Configure Custom Aggregation for a Rule

The most direct method to adjust the aggregation for an event is to locate the event in a
View and then select Modify aggregation settings from the top-left context menu.

In the Custom Aggregation settings you're able to adjust the fields as needed.
You're also able to see all of the devices using Custom Aggregation by selecting a Receiver
and going to Properties and then Event Aggregation.
From there, click the View button to show the list of devices with Custom Aggregation
settings applied.

Aggregation Use Case: Authentication Events

We've covered use cases both for disabling aggregation completely and for using Custom
Aggregation to adjust the common fields used for aggregation. Some discretion is required
to determine which events need additional granularity and how best to provide it. One use
case that could go either way is authentication events.

There are numerous types of authentication events from different devices. Some occur
automatically in the background when remote files are accessed or scheduled tasks run;
these can represent a very large volume of events. There are also authentication events
that represent interactive logins by humans, which are a lower number of more critical
events. The type and volume of event will dictate whether aggregation should be disabled,
adjusted or left at the default.

In the picture below you can see that an event like "An account was successfully logged
on" (4624) happens at a very different rate than an event like "A logon was attempted
using explicit credentials" (4648).

For event 4624, it would be best to use Custom Aggregation to match on event ID, source
IP and source username. In this case, if there were multiple logons using different
usernames from a single source IP address, there would be a unique aggregation record for
each username, tracking the number of times it was used. For an event like 4648, it is
better to disable aggregation completely due to the low volume and high criticality of an
explicitly performed logon.
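
Putting both choices side by side, a hedged sketch (event shapes invented; "disable aggregation" is modeled here as giving every event a unique key):

```python
import itertools

_unique = itertools.count()  # stand-in for "no aggregation": every event distinct

def auth_aggregation_key(ev):
    """Key choice per the use case above. Illustrative only."""
    if ev["event_id"] == 4624:       # high-volume logons: aggregate per username
        return (ev["event_id"], ev["src_ip"], ev["src_user"])
    if ev["event_id"] == 4648:       # low-volume, high-criticality: never aggregate
        return ("no-agg", next(_unique))
    return (ev["event_id"], ev["src_ip"], ev.get("dst_ip"))  # default fields
```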

Additional Aggregation Considerations

There are additional types of rules that could be considered for exemption from the default
aggregation settings. Custom correlation rules or events that drive direct alarms ("if this
happens, alert me") might be good candidates as well. The key is to consider the volume of
the events and which fields are required for your use cases to find the best balance
between functionality and performance.
