
SIT Internal

ICT 3204 Security Analytics

Lecture 3: Filtering, Normalization and Correlation

Review of Lecture 2 Contents

Apache web access log
○ 58.214.19.53 - - [21/Aug/2005:04:31:13 -0400] "GET / HTTP/1.1" 403 3931 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"

Column Description
1 Remote IP address
2 Identifier
3 User ID
4 Date and time of entry
5 Time zone
6 Type of request
7 URL and parameters
8 Protocol
9 HTTP status code
10 Size of data returned
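The sketch below makes the column table concrete. It splits the sample line into named columns with a regular expression. It assumes the Apache "combined" format: the ten columns above, followed by the referer and user-agent fields. The pattern and the helper name `parse_apache` are illustrative only and do not come from any particular tool.

```python
import re
from typing import Optional

# Hypothetical pattern for the Apache "combined" format: the ten columns in the
# table above, followed by the referer and user-agent strings.
APACHE_COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<datetime>[^\]]+?) (?P<tz>[+-]\d{4})\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_apache(line: str) -> Optional[dict]:
    """Return the named fields of one access-log line, or None if it does not match."""
    m = APACHE_COMBINED.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])
    # Apache writes "-" when no body was returned; treat that as 0 bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

sample = ('58.214.19.53 - - [21/Aug/2005:04:31:13 -0400] "GET / HTTP/1.1" 403 3931 "-" '
          '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"')
record = parse_apache(sample)
print(record["ip"], record["datetime"], record["tz"], record["status"], record["size"])
```

Returning `None` for lines that do not match makes it easy to count and inspect unparsed lines, rather than silently dropping them.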

Review of Lecture 2 Contents

NetFlow log
○ 2017-02-24 04:54:54.917 42.682 UDP 84.77.114.176:57024 -> 10.16.54.6:19522 2 58 1
○ NetFlow logs provide statistical information useful for examining anomalies in TCP/IP traffic

Column Description
1 Date
2 Time
3 Duration
4 Protocol
5 Source IP address:port
6 “->”
7 Destination IP address:port
8 Number of packets transmitted
9 Number of bytes transmitted
10 Number of aggregated flows

Review of Lab 2 Contents

Cyber threat intelligence



Lecture 3 Contents
○ Events of security interest
○ Log filtering
○ Data normalization
○ Event correlation

Event Logs to Monitor



Security Related Host Logs

○ Host logs produced by OS components
○ Logs of various network services
○ Logs of applications running on the system
○ Host intrusion detection and prevention
  ○ Detect and block various attacks on the network, operating system, and applications
  ○ Events recorded are related to
    ○ Reconnaissance or probe detected
    ○ Changes to executable files

Windows Events to Monitor

○ Events about the Windows Event Log service
  ○ These events are recorded to the Security event log, regardless of the audit policy
  ○ 1100(S): The event logging service has shut down.
  ○ 1102(S): The audit log was cleared.
  ○ 1104(S): The security log is now full.
  ○ 1105(S): Event log automatic backup.
  ○ 1108(S): The event logging service has encountered an error while processing an incoming event published from %1.

Events 1100 and 1102 may indicate malicious behavior: shutting down the Event Log service or clearing the Security event log to cover one's activity.
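Once events are normalized, this check becomes a simple lookup. The Python sketch below assumes events have already been parsed into dictionaries with an `event_id` key. That schema is hypothetical and is not the native Windows XML format. The sketch flags the two IDs called out above.

```python
# IDs from the slide that may indicate an attempt to cover one's tracks.
LOG_TAMPERING_IDS = {
    1100: "The event logging service has shut down",
    1102: "The audit log was cleared",
}

def flag_log_tampering(events):
    """Yield (event, description) for every event whose ID is in LOG_TAMPERING_IDS.

    `events` is an iterable of dicts with a hypothetical "event_id" key.
    """
    for event in events:
        description = LOG_TAMPERING_IDS.get(event.get("event_id"))
        if description is not None:
            yield event, description

# Hypothetical, already-normalized events.
events = [
    {"event_id": 1105, "host": "dc01"},
    {"event_id": 1102, "host": "ws01"},
    {"event_id": 1104, "host": "dc01"},
]
for event, description in flag_log_tampering(events):
    print(event["host"], event["event_id"], description)
```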

Windows Events to Monitor

The list is not exhaustive



Linux/Unix Events to Monitor

• Common items to search for in daily log reviews and forensics



Web Server Events to Monitor



OS logs
• Authentication

• Linux syslog, remote user authenticating with Secure Shell (SSH) daemon

• System startup, shutdown and reboot

• Linux syslog, system shutdown



OS logs
• Service startup, shutdown and status change

• Solaris syslog, sendmail daemon starts up

• Service crash

• Linux syslog, FTP server shutting down involuntarily (due to a crash or a kill command)

OS logs
• Miscellaneous status message

• Linux syslog of a time synchronization daemon (NTPD)

• OS logs are security relevant
• Useful for intrusion detection and incident response

Network daemon logs


• Connection established to the service

• Linux syslog, successful connection to a POP3 mail daemon by a remote user “anton”

• Connection to server failed

• Linux syslog shows a connection failure (due to access controls) to a telnet service

Network daemon logs


• Connection was established, but access was not allowed
• Linux syslog message shows an unsuccessful connection to the Secure Shell server
• Various failure messages
• Linux syslog message shows a failure of a sendmail daemon to continue talking to a client (likely a spam program)

Network daemon logs


• Various status messages

• Linux syslog message indicates a successful email transfer

• Network daemons present the most common entryways into the system remotely, and many attacks are targeted against them

HIDS & HIPS


• Dragon HIDS examples
• A Nessus vulnerability scanner probe is detected by watching the FTP log

• Insecure system reconfiguration or corruption


• Dragon HIDS host sensor shows a critical system file deletion alert

HIDS & HIPS


• Authentication or authorization failed

Security related network logs


○ Network logs generated by network infrastructure
○ By routers and switches
○ NIDS, firewalls

○ Network infrastructure logs


○ Logins and logouts
○ Connection established to the service
○ Bytes transferred in and out
○ Reboots
○ Configuration changes

Network Device Logs to Monitor


Accuracy, Integrity and Confidence
○ Accuracy: log data free from defects or misleading information
○ Integrity: free from unauthorized information alteration
○ Confidence: concerns the priority or severity of events

Sanitization, Normalization and Time Synchronization
○ Sanitization: remove or replace the attributes to be cleaned
○ Normalization: translate to a well-known event format
○ Time synchronization: challenges with log time synchronization

Accuracy of data
• Data discrepancy caused by
• Data decay (outdated data)
• Human errors and system errors
• Errors in recording data (e.g., sensor data)
• Missing data

Integrity of log data


○ Authenticate client and server, encrypt data
○ Send data in clear, but use dedicated network
○ Digital signature
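A keyed hash illustrates the integrity controls above. Python's standard library has no public-key signatures, so this sketch substitutes HMAC-SHA256. An HMAC detects alteration of a record. Unlike a digital signature, however, it relies on a secret shared by sender and receiver, so it cannot prove which party wrote the record. The key, the `|tag` record layout and the helper names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared secret between the log source and the collector.
# A real deployment would load it from a key store, never hard-code it.
KEY = b"example-shared-secret"

def seal(entry: str) -> str:
    """Append an HMAC-SHA256 tag so the collector can detect alteration."""
    tag = hmac.new(KEY, entry.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{entry}|{tag}"

def verify(sealed: str) -> bool:
    """Recompute the tag over the entry and compare in constant time."""
    entry, _, tag = sealed.rpartition("|")
    expected = hmac.new(KEY, entry.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

entry = "14:10 7/4/2011 User Roberts Successful Authenticate to 10.100.52.105 from 10.10.8.22"
sealed = seal(entry)
print(verify(sealed))                              # untouched record
print(verify(sealed.replace("Roberts", "Admin")))  # tampered record
```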

Confidence of log data


○ Take input from many disjoint areas, and derive a more mature and accurate fact from the set of all inputs
○ Reduce false positives, e.g.
○ IDS to consult a vulnerability database
○ IDS to implement a policy scheme whereby user and group profiles are used to define acceptable network usage for individuals

Sanitization of log data


○ Data removed during sanitization are extracted and placed into a secure file, so that the original can be reconstructed at some later point
○ Remove redundant variables
○ Dependency between attributes
○ Handle missing values
○ Handle outliers

Normalization of log data

• Inconsistent data representations or use of codes by different vendors
• E.g., timestamps
• Most logs are written to be readable by humans, not computers
• Break down log messages into a normalized format
• Make types consistent
• IP address: xxx.xxx.xxx.xxx
• Timestamps: ISO 8601 standard, YYYY-MM-DDTHH:MM:SS.SSS±hh:mm
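As an example of timestamp normalization, the sketch below converts the Apache timestamp from the Lecture 2 sample into ISO 8601. The choice of UTC as the common time zone is an assumption. So is the English month abbreviation parsed by `%b`, which holds in the default C locale.

```python
from datetime import datetime, timezone

def apache_time_to_iso(stamp: str) -> str:
    """Convert an Apache timestamp such as '21/Aug/2005:04:31:13 -0400'
    into an ISO 8601 string in UTC with millisecond precision."""
    # %z parses the numeric offset, so the result is timezone-aware.
    local = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    return local.astimezone(timezone.utc).isoformat(timespec="milliseconds")

print(apache_time_to_iso("21/Aug/2005:04:31:13 -0400"))
```

Storing every timestamp in one zone avoids the "which time zone?" problem when logs from different sources are merged.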

Time synchronization
○ Challenges with time synchronization
○ dead battery or other hardware failure
○ which time zone?
○ NTP clock drift can cause time deviations on the order of seconds
○ syslog forwarder mystery: the forwarder's time vs. the receiver's time
○ log lag
○ 5:17, AM or PM?

Data Quality
• Data have quality if they satisfy the following requirements
• Accuracy – errors
• Completeness – missing values
• Consistency – large deviations
• Timeliness – up to date
• Believability – trust in the data
• Interpretability – ease of understanding

DATA CLEANSING

Data Cleaning - Why


• Real-world data tend to be
• Dirty, incomplete, and inconsistent

• Data pre-processing techniques can
• improve data quality
• improve the accuracy of analysis, and
• improve the efficiency of subsequent processing

Data Cleaning
• Data filtering
• Irrelevant data fields
• Duplicated data entries, could be from different sources
• Redundant data that is heavily dependent and can be derived from other data,
e.g., collinearity between data, DoB and Age

• Data normalization and reformatting


• Break down known log message into a normalized format, e.g., inconsistent
representation between data sources
• Reformatting e.g., .pcap (for Wireshark) to csv (for Splunk)

• Handling data discrepancy


• Noise, outliers
• Missing values

Basic flow for filtering and correlation

Raw logs → Filtering & Normalization → Correlation

Raw logs → Filtering & Normalization → Correlation

○ Filtering
○ Take in raw log data, determine whether to keep it
○ Normalization
○ Take the raw log, map its various elements to a common
format
○ Event – a normalized log message

○ Correlation
○ Normalized log data is input to correlation
○ Match a single normalized piece of data, or a series of
data, for the purpose of taking an action

Filtering – Artificial Ignorance

• Take the things you know about and place them in an ignore file, so that they are excluded and only the unknown remains.

• It is important to err on the side of keeping more data rather than filtering it out.

Filtering – Artificial Ignorance

Unix shell commands


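cd /var/log
# Strip everything up to and including the host name "demo" (the timestamp),
# then strip process IDs such as "[1234]", so repeated messages become
# identical; finally count each distinct message and sort by frequency.
cat * | \
  sed -e 's/^.*demo//' -e 's/\[[0-9]*\]//' | \
  sort | uniq -c | \
  sort -r -n > /tmp/xx
The “demo” string is the name of the system on which the commands are running.
The idea is to strip this, the preceding timestamp, and the bracketed process IDs from the log messages, to reduce the variability in the log data.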

Filtering – Artificial Ignorance

Output
297 cron: (root) CMD (/usr/bin/at)
167 sendmail: alias database /etc/aliases.db out of date
120 ftpd: PORT
61 lpd: restarted
48 kernel: wdpi0: transfer size=2048 intr cmd DRQ

The number preceding the log message shows how many times the log message was seen in log
files.
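After counting, the known and uninteresting messages go into an ignore file, and anything that matches none of them is reviewed. The Python sketch below holds the ignore file in memory as a list of regular expressions, built from three of the output lines above. The patterns and the function name are hypothetical.

```python
import re

# Hypothetical ignore-file contents: one regex per known, uninteresting message.
IGNORE_PATTERNS = [
    r"cron: \(root\) CMD \(/usr/bin/at\)",
    r"sendmail: alias database /etc/aliases\.db out of date",
    r"lpd: restarted",
]
IGNORE = [re.compile(p) for p in IGNORE_PATTERNS]

def artificial_ignorance(lines):
    """Yield only the lines that match none of the known patterns."""
    for line in lines:
        if not any(p.search(line) for p in IGNORE):
            yield line

messages = [
    "cron: (root) CMD (/usr/bin/at)",
    "lpd: restarted",
    "ftpd: PORT",
    "sshd: Failed password for root from 203.0.113.7",
]
for line in artificial_ignorance(messages):
    print(line)
```

Unknown lines are kept by default, which errs on the side of keeping data rather than filtering it out.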

Filtering – Common and Useful Fields

○ Source IP, Source port
○ Destination IP, Destination port
○ User information, e.g., username, directory
○ Category, e.g., login.success
○ Priority: low, medium, high
○ Timestamps: time generated, time received
○ Raw log

Filtering – Removing Redundancy

• Can be derived from another attribute


• Use domain expertise

• Find the correlation between two attributes


• How does changing one attribute affect the other?

• Regression
• Best line to fit two attributes
• Equation
• Use one attribute to predict the other

Filtering – Removing Redundancy

• Example: Is there a relation between risk and reliability for a process?
• Use domain expertise
• To understand whether the question is sensible

• Find the correlation between risk and reliability
• Use statistical association

• Regression
• Take half of the dataset to
• Model an equation
• Validate the equation using the other half of the dataset
• Predict risk from reliability and/or vice versa

Normalization
○ Parse the log messages that you would like to keep, picking apart their components to turn them into a common format.

○ Timestamps, IP addresses, etc.

○ Translate (error) codes into categories
○ Vendor provides “ID = 6856” in the log message,
○ Transform it to Login Failure, as defined by the vendor.
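Code translation can be sketched as a table lookup with fallbacks. The lookup table below is hypothetical. A real mapping must come from the vendor's documentation, and only 6856 → Login Failure is taken from the slide. Unknown codes and unparsable messages get their own categories, so they are not silently lost.

```python
import re

# Hypothetical vendor code table; only 6856 -> "Login Failure" comes from the slide.
VENDOR_CODES = {6856: "Login Failure"}

ID_FIELD = re.compile(r"ID\s*=\s*(\d+)")

def categorize(message: str) -> str:
    """Translate the vendor's 'ID = nnnn' field into a common category name."""
    m = ID_FIELD.search(message)
    if m is None:
        return "Unparsed"
    return VENDOR_CODES.get(int(m.group(1)), "Unknown")

print(categorize("ID = 6856 user=roberts"))
```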

Normalization - Steps
1. Get documentation for products you are using.
2. Read the documentation for descriptions of what the raw log data looks
like and what each field is.
3. Come up with the proper parsing expression to normalize the data.
○ Most log analysis systems utilize a regular expression
implementation to parse the data.
4. Test the parsing logic on sample raw log data.
5. Deploy the parsing logic.

Identify values of the event fields of the above Dragon NIDS message

Identify event fields (cont’d)

Some data is added when the message is collected by the log analysis solution

Detecting Discrepancy
• Detect discrepancy
• Knowledge of metadata – Domain expertise
• Understanding data types and attributes

• Statistical data description
• Descriptive statistics
• Symmetry and skewness
• Outliers
• Standard deviation from the mean
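The "standard deviation from the mean" rule can be written as a z-score filter. The counts below are hypothetical. The threshold is a tunable parameter. In small samples the outlier itself inflates the mean and standard deviation, which can hide it, so the example uses a threshold of 2.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` sample standard
    deviations away from the mean (a simple z-score rule)."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]

# Hypothetical failed-login counts per hour; the last hour stands out.
counts = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3, 5, 4, 90]
print(zscore_outliers(counts, threshold=2.0))
```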

Handling Missing Data


• Handle missing data VERY CAREFULLY
• Large amounts of data are missing
• Analysis is not possible
• Corrupt data

• Use a measure of central tendency for the attribute
• Normal and symmetric distribution
• Mean can be used
• Other distributions
• Use probability

• Use the most probable value to fill in the missing value
• Regression equation
• Formal methods
• Decision tree
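The central-tendency approach can be sketched as follows. The code uses the mean when the observed data look roughly symmetric. Otherwise it substitutes the median, a common robust choice. This sketch does not implement the slide's probability-based and regression-based methods. The symmetry check (mean close to median) and its tolerance are simplifying assumptions.

```python
from statistics import mean, median

def impute(values, skew_tolerance=0.1):
    """Fill None entries with a measure of central tendency of the observed values.

    If mean and median are close (a crude symmetry check), use the mean;
    otherwise use the median, which is less affected by a skewed tail.
    """
    observed = [v for v in values if v is not None]
    mu, med = mean(observed), median(observed)
    fill = mu if abs(mu - med) <= skew_tolerance * abs(med) else med
    return [fill if v is None else v for v in values]

print(impute([10, 12, None, 11, 9, 13]))  # roughly symmetric: mean is used
print(impute([1, 2, None, 2, 3, 100]))    # skewed by 100: median is used
```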

Outlier Detection
• Quantitative data
• Descriptive statistics
• Standard deviation, Box plots
• Regression

• Example: Data consisting of the probability that each email in a collection is spam
• Calculate descriptive statistics
• Draw a boxplot
• Do a regression
• Find the points which deviate strongly from the mean
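A box plot flags points outside its whiskers, conventionally placed at 1.5 × IQR beyond the quartiles. The sketch below applies that rule to hypothetical spam probabilities, using `statistics.quantiles` (Python 3.8+) with its default method.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside the box-plot whiskers [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical spam probabilities for a batch of emails.
spam_prob = [0.02, 0.05, 0.03, 0.04, 0.06, 0.05, 0.03, 0.97, 0.04, 0.02]
print(iqr_outliers(spam_prob))
```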

Outlier Detection
• Qualitative data
• Calculate similarity and correlation
• Use clustering methods
• Find the ones which cannot be
assigned to a cluster

• Example – Examine the requests sent from a specific region
• Find all the geo-codes for the location
• Cluster based on regions
• Eliminate the requests that do not belong to a cluster

Handling outliers

Attempt #1: Take the log of every value

(Figure: distribution with a very long tail)

Handling outliers

Attempt #2: Clipping feature values

(Figure: distribution after taking the log of every value, still showing a very long tail)
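Both attempts can be sketched with the standard library. The code uses `math.log1p` (log(1 + x)), a common variant of "take the log" that keeps zero values valid. The byte counts and the clipping bound of 8.0 are hypothetical.

```python
import math

def log_transform(values):
    """Compress a long tail with log(1 + x), which is defined for x = 0."""
    return [math.log1p(v) for v in values]

def clip(values, low, high):
    """Cap every value to the range [low, high]."""
    return [min(max(v, low), high) for v in values]

# Hypothetical bytes-per-request values with a very long tail.
sizes = [200, 350, 500, 420, 380, 1_000_000]
logged = log_transform(sizes)
print([round(v, 2) for v in logged])
print(clip(logged, low=0.0, high=8.0))   # clip after the log transform
```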

Scaling feature values


○ Convert floating-point feature values from their
natural range (e.g., 100 to 900) into a standard range
(e.g., 0 to 1 or -1 to +1)

○ Benefits of feature scaling


○ Helps gradient descent converge more quickly
○ Helps the model learn appropriate weights for
each feature
○ Avoid paying too much attention to the
features having a wider range
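Min-max scaling maps a feature linearly from its natural range onto a target range. The sketch below shows it for both ranges mentioned above. The guard for a constant feature is a design choice of this sketch.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values from their natural range onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]

print(min_max_scale([100, 500, 900]))          # [0.0, 0.5, 1.0]
print(min_max_scale([100, 500, 900], -1, 1))   # [-1.0, 0.0, 1.0]
```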

• On Lecture 2 contents
• 3 MCQs
• Open www.classpoint.app in your web browser
• Key in the class code that appears in the top right-hand corner of the presentation
• Type in your student ID and join

Event Correlation

Correlation makes the difference between:
○ “14:10 7/4/2011 User Roberts Successful Authenticate to 10.100.52.105 from 10.10.8.22”

and...

○ “An account belonging to Marketing connected to an Engineering system from an office desktop, on a day when nobody should be in the office”

Correlation

○ Correlation is the process of matching similar or dissimilar events from systems (hosts, network devices, security controls, etc.)
○ Events from different sources can be combined and compared against each other to identify patterns of behaviour invisible to individual devices
○ They can also be matched against information specific to your business

Correlation
○ Reduce false positives
○ E.g. Intrusion Detection System (IDS) to consult a vulnerability database

○ Increase confidence in Priority or Severity
○ Take input from many disjoint areas, and derive a more mature and accurate fact from the set of all inputs

Correlation
Basic forms of correlation
○ Rule based
○ Statistical

○ Correlate among logged events only (Micro correlation)


○ Correlate with other data sources (Macro correlation)

Correlation
○ Normalization of raw event data is crucial to effectively perform atomic
correlation
○ Keeping normalized logs in a database table supports database style
searches
○ Show [All Logs] From [All Devices] from the [last two weeks], where
the [username] is [Roberts]
○ Just as with any database, event normalization allows the creation of
summarization reports
○ Which User Accounts have accessed the highest number of distinct
hosts in the last month?
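The two example queries can be expressed in SQL once normalized events sit in a table. The sketch below uses Python's built-in `sqlite3` with an in-memory database. The table layout, rows and cutoff date are hypothetical. ISO 8601 timestamps are used because they compare correctly as text.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (ts TEXT, device TEXT, username TEXT, dst_host TEXT)")
db.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [  # hypothetical rows
    ("2011-07-04T14:10:00", "fw01", "Roberts", "10.100.52.105"),
    ("2011-07-04T14:12:00", "fw01", "Roberts", "10.100.52.106"),
    ("2011-07-04T14:15:00", "ids01", "Chan", "10.100.52.105"),
    ("2011-07-05T09:00:00", "fw02", "Roberts", "10.100.52.105"),
])

# Show all logs from all devices since a cutoff, where the username is Roberts.
rows = db.execute(
    "SELECT ts, device, dst_host FROM events "
    "WHERE username = ? AND ts >= ? ORDER BY ts",
    ("Roberts", "2011-06-21T00:00:00"),
).fetchall()

# Which user accounts have accessed the highest number of distinct hosts?
top = db.execute(
    "SELECT username, COUNT(DISTINCT dst_host) AS hosts "
    "FROM events GROUP BY username ORDER BY hosts DESC"
).fetchall()
print(rows)
print(top)
```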

Rule Correlation
○ Correlate events by behavioural rules
○ Requires stateful rule engine
○ Pseudocode for reconnaissance attempts followed by a
firewall policy violation
○ If the system sees an event E1 where E1.eventType = portscan
○ followed by an event E2 where E2.srcip = E1.srcip and E2.dstip =
E1.dstip and E2.eventType = firewall.reject
○ then do something (Email, alert, etc.)
○ E1 is detected by IDS, E2 by a firewall that implicitly rejects
○ “Followed by” does not mean immediately after; other events may occur in between
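The pseudocode above can be turned into a small stateful rule. The sketch below remembers each portscan by (source, destination) pair. It alerts when a matching `firewall.reject` arrives, and ages the state out after five minutes, the default mentioned on the next slide. The `Event` class, the event-type strings and the alert text are hypothetical. Events are assumed to arrive in time order.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Event:
    ts: float        # seconds since some epoch
    event_type: str  # normalized category, e.g. "portscan", "firewall.reject"
    src_ip: str
    dst_ip: str

class ScanThenRejectRule:
    """Stateful rule: a portscan E1 followed (not necessarily immediately) by a
    firewall.reject E2 with the same source and destination IP, within `timeout` s."""

    def __init__(self, timeout: float = 300.0):  # five-minute age-out
        self.timeout = timeout
        self.pending = {}  # (src_ip, dst_ip) -> timestamp of the latest portscan

    def feed(self, e: Event) -> Optional[str]:
        """Process events in time order; return an alert message or None."""
        # Age out portscans older than the timeout.
        self.pending = {k: t for k, t in self.pending.items() if e.ts - t <= self.timeout}
        key = (e.src_ip, e.dst_ip)
        if e.event_type == "portscan":
            self.pending[key] = e.ts
        elif e.event_type == "firewall.reject" and key in self.pending:
            del self.pending[key]
            return f"ALERT: portscan then firewall reject, {e.src_ip} -> {e.dst_ip}"
        return None

rule = ScanThenRejectRule()
stream = [
    Event(0, "portscan", "198.51.100.9", "10.0.0.5"),          # E1, from the IDS
    Event(40, "login.success", "10.0.0.7", "10.0.0.5"),        # unrelated event
    Event(95, "firewall.reject", "198.51.100.9", "10.0.0.5"),  # E2, from the firewall
]
for event in stream:
    alert = rule.feed(event)
    if alert:
        print(alert)  # the "do something" step: e-mail, alert, ticket, etc.
```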

Rule Correlation
○ Functionalities required for rule correlation
○ Stateful behaviour
○ Counting
○ Timeout: e.g. default age-out period of five mins
○ Rule reuse: e.g. reuse components of conditional
statements
○ Priorities: dictate the order in which rules are evaluated
○ Language for specifying rules, e.g. XML, Lisp
○ Action: e.g. write to text files, create help desk tickets

Micro-Level Correlation
○ Correlate fields within a single event or set of events
○ Source IP
○ Destination IP
○ Time
○ …
○ Match fields between events, across time periods, across devices
○ E.g. If a single host fails to log in to three separate servers using the
same credentials, within a 6-second time window, raise an alert.
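The micro-level example above can be sketched with a per-(host, username) sliding window. The username stands in for "the same credentials". The input tuples and names are hypothetical, and failures are assumed to arrive sorted by time.

```python
from collections import defaultdict, deque

WINDOW = 6.0     # seconds, from the slide's example
MIN_SERVERS = 3  # distinct servers that must see the failure

def multi_server_failures(failures):
    """failures: iterable of (ts, src_host, username, dst_server) login-failure
    events sorted by ts. Yield (src_host, username, servers) whenever one host fails
    against MIN_SERVERS distinct servers with the same username within WINDOW s."""
    recent = defaultdict(deque)  # (src_host, username) -> deque of (ts, dst_server)
    for ts, src, user, dst in failures:
        q = recent[(src, user)]
        q.append((ts, dst))
        while ts - q[0][0] > WINDOW:  # drop failures that left the window
            q.popleft()
        servers = {d for _, d in q}
        if len(servers) >= MIN_SERVERS:
            yield src, user, sorted(servers)
            q.clear()  # one alert per burst

failures = [
    (0.0, "10.10.8.22", "roberts", "srv-a"),
    (2.5, "10.10.8.22", "roberts", "srv-b"),
    (4.0, "10.10.8.22", "roberts", "srv-c"),
    (20.0, "10.10.8.23", "chan", "srv-a"),
]
for alert in multi_server_failures(failures):
    print("ALERT:", alert)
```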

Macro-level Correlation
○ Pull in other sources of information (fusion correlation)
○ E.g. compare vulnerability scan data with event data
○ Refer to contextual data
○ E.g. user role on a particular system
○ Pull user information from an LDAP server or Active Directory server.
○ Contextual data can be input to rule correlation

Environment contextual data


○ E.g. based on your company’s holiday schedule, raise an alert when
internal resources are accessed when everyone is at home.
○ Common environment triggers
○ Vacation schedules, Business hours, Holiday schedule
○ Access rights to internal resources
○ Repeating network events, e.g. vulnerability scans
○ Scheduled backups of systems, data stores
○ Maintenance schedule, e.g. router configuration changes and
reboots, OS patching
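Environment triggers such as business hours and holidays reduce to a predicate on the event time. The calendar values in the sketch below are hypothetical, and real ones would come from the company's schedules. The sketch treats Saturday and Sunday as non-working days.

```python
from datetime import date, datetime, time

# Hypothetical company calendar.
BUSINESS_HOURS = (time(8, 0), time(19, 0))
HOLIDAYS = {date(2024, 12, 25), date(2025, 1, 1)}

def outside_business_context(ts: datetime) -> bool:
    """True if an internal resource is accessed on a weekend, a holiday,
    or outside business hours, i.e. when everyone should be at home."""
    start, end = BUSINESS_HOURS
    return (ts.weekday() >= 5  # Saturday = 5, Sunday = 6
            or ts.date() in HOLIDAYS
            or not (start <= ts.time() < end))

print(outside_business_context(datetime(2024, 12, 25, 10, 0)))  # holiday
print(outside_business_context(datetime(2024, 12, 24, 10, 0)))  # normal workday
```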

Correlation Patterns

Micro-Level
○ Source IP correlation
○ Destination IP correlation
○ Time correlation
○ Interleaving address correlation
○ Port correlation

Macro-Level
○ Anti-port correlation
○ Geographic location correlation
○ Vulnerability correlation
○ Watch list correlation

Source IP correlation
○ Sort a chunk of network connection data by the source IP
○ Lets the analyst visualize what each system is up to

○ Multiple related IPs working together
○ E.g. raise an alarm if one IP address connects to more than five of your systems in an instant
○ What if the attacker uses three or four systems on a subnet?
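To answer the subnet question, the sketch below counts distinct destinations both per source IP and per source /24 network. Three cooperating hosts that each stay under the limit are then still caught at the subnet level. The limit of five comes from the slide. The /24 prefix and the sample connections (documentation addresses) are assumptions, and the time window is left out for brevity.

```python
import ipaddress
from collections import defaultdict

def fan_out(connections, limit=5, prefix=24):
    """connections: iterable of (src_ip, dst_ip). Return two dicts: sources and
    source /prefix networks that reached more than `limit` distinct destinations."""
    by_ip = defaultdict(set)
    by_net = defaultdict(set)
    for src, dst in connections:
        by_ip[src].add(dst)
        net = ipaddress.ip_network(f"{src}/{prefix}", strict=False)
        by_net[str(net)].add(dst)
    noisy_ips = {ip: len(d) for ip, d in by_ip.items() if len(d) > limit}
    noisy_nets = {net: len(d) for net, d in by_net.items() if len(d) > limit}
    return noisy_ips, noisy_nets

# Hypothetical: three hosts on one subnet each probe three internal systems.
conns = [(f"203.0.113.{a}", f"10.0.0.{a * 10 + h}") for a in (5, 6, 7) for h in range(3)]
ips, nets = fan_out(conns)
print(ips, nets)
```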

Source IP correlation
- case study
○ BlackIce log (host IDS)
○ Arrival Time: Apr 4, 2000 20:49:31.0479
○ Version: 4
○ Header length: 20 bytes
○ Total length: 60
○ Identification: 0x5434
○ Source: 131.183.39.83 (131.183.39.83)
○ Destination: MY.NET.70.234 (MY.NET.70.234)
○ Transmission Control Protocol, Src port: 3611 (3611), Dst port: 53 (53)
○ BlackIce detected source host 131.183.39.83 probing the system on destination port TCP 53 (DNS)

Source IP correlation
- case study (cont’d)
○ Snort records on 131.183.39.83
○ 04/04-20:42:57.484472 131.183.39.83:1641 -> MY.NET.1.0:53
○ 04/04-20:42:57.485577 131.183.39.83:1647 -> MY.NET.1.6:53
○ 04/04-20:42:57.485655 MY.NET.1.3:53 -> 131.183.39.83:1644
○ … (lots of records deleted)
○ 04/04-21:02:43.801043 131.183.39.83:2890 -> MY.NET.254.169:53
○ 04/04-21:02:44.795187 131.183.39.83:2924 -> MY.NET.254.203:53
○ 04/04-21:02:44.796316 131.183.39.83:2926 -> MY.NET.254.205:53

○ A network-wide probe attempt
○ Because some systems have replied, the severity is fairly high

Destination IP correlation
○ Target based analysis
○ Sort by destination IP to locate systems that have become servers

Time correlation
○ Sorting by the time field and using the source IP as the second sort key
can merge files from more than one source to examine the network
activity that spans multiple sites

○ Sorting event fields in various ways helps us find the relations that might
otherwise remain hidden
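Merging sorted files on (time, source IP) is a standard k-way merge. The sketch below uses `heapq.merge` with a `key` function, which streams its inputs instead of loading them into memory. The records are loosely based on the Snort and BlackIce case study above. The assumptions are that both sources use ISO 8601 times in the same time zone and that each file is already sorted.

```python
import heapq

# Hypothetical normalized records: (ISO 8601 time, source IP, destination).
snort_site = [
    ("2000-04-04T20:42:57", "131.183.39.83", "MY.NET.1.0"),
    ("2000-04-04T21:02:43", "131.183.39.83", "MY.NET.254.169"),
]
blackice_site = [
    ("2000-04-04T20:49:31", "131.183.39.83", "MY.NET.70.234"),
]

# Primary sort key: time; secondary sort key: source IP.
merged = list(heapq.merge(snort_site, blackice_site, key=lambda rec: (rec[0], rec[1])))
for ts, src, dst in merged:
    print(ts, src, dst)
```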

Interleaving addresses correlation


○ A technique commonly used by attackers to appear to be going low and slow without actually slowing down
○ Probe or recon software often runs against multiple networks simultaneously
○ Round-robin scan of hosts from multiple address blocks

○ We need multiple sensors to detect it
○ Report any detections to your CIRT, and share them with other organizations

Port correlation
○ When interested in IDS detections of activity targeting your Web server
○ Look for events having 80 or 443 as the destination port

○ When concerned about unwanted access attempts to a particular server
○ Event types, e.g. accept, drop from firewall events
○ Correlate events which have an event type of drop and a destination IP address of the server

Anti-port correlation
○ Use open port information along with firewall data to detect attacks in the low and slow category.
○ Nmap can be used to track open ports on your systems
○ Pseudocode
○ if (E1.dstport not in known_open_ports(E1.dstip))
○ then doSomething
○ It helps to detect worms
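The pseudocode can be made runnable with an inventory of open ports per host. The inventory below is hypothetical and would in practice be compiled from periodic Nmap scans. Hosts absent from the inventory are treated as having no open ports, a conservative choice that flags every connection to them.

```python
# Hypothetical open-port inventory per host.
KNOWN_OPEN_PORTS = {
    "10.0.0.5": {22, 80, 443},
    "10.0.0.6": {25},
}

def anti_port_alerts(events):
    """Yield events whose destination port is not known to be open on the
    destination host (the "doSomething" cases of the pseudocode)."""
    for e in events:
        if e["dst_port"] not in KNOWN_OPEN_PORTS.get(e["dst_ip"], set()):
            yield e

firewall_events = [
    {"src_ip": "198.51.100.9", "dst_ip": "10.0.0.5", "dst_port": 443},
    {"src_ip": "198.51.100.9", "dst_ip": "10.0.0.5", "dst_port": 445},
    {"src_ip": "198.51.100.9", "dst_ip": "10.0.0.9", "dst_port": 80},
]
for e in anti_port_alerts(firewall_events):
    print("doSomething:", e)
```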

Geographic location correlation

○ Some websites offer tools that you can use to query IP addresses and networks, especially for contact information.
○ Plotting attacks on a map based on this information helps to track down evildoers.

Geographic location correlation – case study
Enrich Web Server Log with Company-Specific Information

○ Scenario: Public-facing company site intended for staff use.

○ Data sources
○ Web server log
○ Data enrichment using company-specific information
○ Country
○ Baseline
○ Blacklisted IPs
○ Advantages of the macro correlation
○ Analyze the geographic distribution of authorized and unauthorized Web access, and of access from blacklisted source IP addresses
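The enrichment step can be sketched as a function that adds company-specific fields to each parsed log record. The blacklist and country codes are taken from the sample data in this case study. The IP-to-country table is hypothetical and uses documentation addresses; in practice it would come from a GeoIP database. The baseline comparison is not shown.

```python
from collections import Counter

IP_COUNTRY = {"192.0.2.10": "US", "198.51.100.20": "BR", "203.0.113.30": "FR"}  # hypothetical
COMPANY_COUNTRIES = {"CN", "RU", "US", "BR"}       # from the case-study sample data
BLACKLIST = {"58.214.19.53", "210.87.31.250"}      # from the case-study sample data

def enrich(record: dict) -> dict:
    """Return a copy of a parsed web-log record with company-specific context added."""
    ip = record["ip"]
    country = IP_COUNTRY.get(ip, "unknown")
    return {
        **record,
        "country": country,
        "expected_country": country in COMPANY_COUNTRIES,
        "blacklisted": ip in BLACKLIST,
    }

log = [{"ip": "192.0.2.10", "status": 200},
       {"ip": "203.0.113.30", "status": 403},
       {"ip": "58.214.19.53", "status": 403}]
enriched = [enrich(r) for r in log]
summary = Counter((e["country"], e["blacklisted"]) for e in enriched)
print(summary)
```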

Geographic location correlation – case study
The Apache access log contains records of access, which can be used to pinpoint and identify
○ Location
○ Time
○ Type of request of client
○ Type of response of server

E.g.,
58.214.19.53 - - [21/Aug/2005:04:31:13 -0400] "GET / HTTP/1.1" 403 3931 "-"
"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"

Geographic location correlation – case study
Records and information that are relatable to the company
• Country – List of countries the company is located in
• Baseline – Usual day-to-day operations give a usual pattern and trend, which is used as a baseline to detect anything abnormal
• Blacklisted IPs – From IP reputation sites such as AlienVault, and from results of previous analysis

Sample data
• List of blacklisted IP addresses: 58.214.19.53, 210.87.31.250
• List of countries the company operates in: CN, RU, US, BR
• Company access frequency baseline

Geographical view of access after correlation



Vulnerability correlation
○ Vulnerability scanners provide information on vulnerable hosts
○ Hostname or IP address, vulnerable service or port: e.g. Sendmail port (25)
○ Remediation steps, e.g. patch version of Sendmail
○ Combine vulnerability scan data with real-time event data
○ IDS reports that a range of ports was scanned across several hosts
○ Verify whether the ports are active and vulnerable

Vulnerability correlation
• Reduce noise by reporting based upon high value systems or asset
weights
• Add context of target operating system
• Add knowledge of vulnerabilities
• Rules
• Target Vulnerable to Detected Exploit
• Vulnerable to Detected Exploit on Different Port
• Vulnerable to Different Exploit than Detected on Attacked Port
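The three rules can be sketched as a lookup of each IDS alert against scan results. The scan table, the vulnerability labels such as `IIS-RCE`, and the alert fields are all hypothetical. When several rules match, the first matching rule in the list above wins, which is a choice made for this sketch.

```python
# Hypothetical scan results: host -> {port: set of vulnerability IDs found there}.
SCAN_RESULTS = {
    "10.0.0.5": {80: {"IIS-RCE"}, 25: {"SENDMAIL-OLD"}},
}

def classify(alert: dict) -> str:
    """Match an IDS alert (dst_ip, dst_port, exploit) against scan results
    using the three rules listed above, checked in order."""
    by_port = SCAN_RESULTS.get(alert["dst_ip"], {})
    on_port = by_port.get(alert["dst_port"], set())
    if alert["exploit"] in on_port:
        return "Target vulnerable to detected exploit"
    if any(alert["exploit"] in vulns
           for port, vulns in by_port.items() if port != alert["dst_port"]):
        return "Vulnerable to detected exploit on different port"
    if on_port:
        return "Vulnerable to different exploit than detected on attacked port"
    return "No known vulnerability (lower priority)"

print(classify({"dst_ip": "10.0.0.5", "dst_port": 80, "exploit": "IIS-RCE"}))
```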

Vulnerability Correlation - example

• AlienVault Vulnerability Scanner detected the “IIS remote command execution” vulnerability on the server
• and AlienVault IDS detected an attack exploiting that vulnerability on the same server

(Figure: IIS remote command execution)

Watch list correlation


○ Place the source of an attack on a watch list
○ External intelligence, e.g. DShield has a list of top attackers that you can place on the watch list

Lecture 3 Summary

○ Events of security interest

Lecture 3 Summary

Raw logs → Filtering & Normalization → Correlation

○ Data filtering
○ Irrelevant data fields
○ Duplicated data entries, could be from different sources
○ Redundant data that is heavily dependent and can be derived from other data,
e.g., collinearity between data, DoB and Age

○ Data normalization and reformatting


○ Break down known log message into a normalized format, e.g., inconsistent
representation between data sources
○ Reformatting e.g., .pcap (for Wireshark) to csv (for Splunk)

○ Handling data discrepancy


○ Noise, outliers
○ Missing values

Lecture 3 Summary
Correlation Patterns

Micro-Level
○ Source IP correlation
○ Destination IP correlation
○ Time correlation
○ Interleaving address correlation
○ Port correlation

Macro-Level
○ Anti-port correlation
○ Geographic location correlation
○ Vulnerability correlation
○ Watch list correlation
