Mining Anomalies Using Traffic Feature Distributions
Presented by
Idrees Fazili
( I Semester M.Sc IT)
Enrollment No. : 100217
Research Paper Focus
Analysis of traffic feature distributions, using entropy as a summarization tool
This enables highly sensitive detection of a wide range of anomalies,
augmenting the detections made by volume-based methods
It also enables automatic classification of anomalies via unsupervised learning
The claims are validated on data from two backbone networks
RESEARCH PAPER FOCUS
KEY TERMS
INTRODUCTION
>>RELATED WORK
FEATURE DISTRIBUTIONS
DIAGNOSIS METHODOLOGY
DATA
DETECTION
CONCLUSION
QUESTIONS
REFERENCES
Related Work
Anomalies have mostly been treated as deviations in overall traffic volume
Much of the work in anomaly detection and identification has been
restricted to point solutions for specific types of anomalies
Much of the work in anomaly detection has focused on single-link traffic
data
This paper uses entropy as a summarization tool for feature distributions, with a
much broader objective: detecting and classifying general
anomalies, not just individual types of anomalies
Feature Distributions
Analysis of traffic feature distributions is a powerful tool for the detection and
classification of network anomalies, because many important kinds of
traffic anomalies cause changes in the distribution of addresses or ports
observed in traffic.
The table lists a set of anomalies commonly encountered in backbone
network traffic.
Feature Distributions Cont…
A traffic feature is a field in the header of a packet.
Four fields are used:
Source address (sometimes called source IP, denoted srcIP)
Destination address (or destination IP, denoted dstIP)
Source port (srcPort)
Destination port (dstPort)
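As a rough illustration of how entropy can summarize the distribution of each of these four fields, the sketch below computes the empirical (Shannon) entropy of the observed values. The exact estimator is an assumption, since the slides do not state it, and the packet tuples, the `sample_entropy` and `feature_entropies` names, and the toy "scan" traffic are hypothetical.

```python
import math
from collections import Counter

FEATURES = ("srcIP", "dstIP", "srcPort", "dstPort")

def sample_entropy(values):
    """Empirical entropy, in bits, of the histogram of observed values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over distinct values of (n/S) * log2(S/n)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def feature_entropies(packets):
    """Entropy of each header field over a batch of (srcIP, dstIP, srcPort, dstPort)."""
    return {name: sample_entropy([p[i] for p in packets])
            for i, name in enumerate(FEATURES)}

# Hypothetical toy traffic for one time bin.
normal = [
    ("10.0.0.1", "10.0.1.1", 40001, 80),
    ("10.0.0.2", "10.0.1.2", 40002, 443),
    ("10.0.0.3", "10.0.1.3", 40003, 80),
    ("10.0.0.4", "10.0.1.4", 40004, 25),
]
# Hypothetical scan: one source probing several ports on one destination.
scan = [("10.0.0.9", "10.0.1.1", 50000, port) for port in range(1, 5)]

normal_h = feature_entropies(normal)
scan_h = feature_entropies(scan)
```

In this toy case the scan concentrates the address distributions (entropy 0 for srcIP and dstIP) while dispersing the destination port distribution (entropy 2 bits), even though both batches contain the same number of packets, so a pure volume count would not distinguish them.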
Diagnosis Methodology
The anomaly diagnosis methodology leverages observations about entropy
to detect and classify anomalies.
To detect anomalies, the paper introduces
the multiway subspace method
It shows how the method can be used to detect anomalies across multiple traffic
features and across multiple Origin-Destination (point-to-point) flows
To classify anomalies, the paper adopts
an unsupervised classification strategy
It shows how to cluster structurally similar anomalies together
Together, the multiway subspace method and the clustering algorithms form the
foundation of the anomaly diagnosis methodology
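The idea of clustering structurally similar anomalies can be sketched as follows. The slides do not name a clustering algorithm, so k-means is used here purely as one common choice; representing each anomaly as a four-value vector (one value per traffic feature), the `kmeans` helper, its naive initialization, and the toy vectors are all assumptions for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centers (naive choice)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical anomalies: (srcIP, dstIP, srcPort, dstPort) entropy changes.
anomalies = [
    (-0.1, -0.9, 0.0, 1.0),
    (1.0, -0.9, 1.0, -1.0),
    (0.0, -1.0, 0.1, 0.9),
    (0.9, -1.0, 1.1, -0.9),
    (0.1, -0.8, 0.0, 1.1),
    (1.1, -1.1, 0.9, -1.0),
]

labels, centers = kmeans(anomalies, k=2)
```

Anomalies that shift the four feature distributions in a similar way end up with the same label, without any labeled training data.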
Diagnosis Methodology: Multiway Subspace Method
Subspace method
Its goal is to identify typical variation in a set of correlated metrics, and to
detect unusual conditions based on deviation from that typical variation
Normal variation
The projection of the data onto the normal subspace
Abnormal variation
Any significant deviation of the data from this subspace
The multiway subspace method is introduced because anomalies typically induce
changes in multiple traffic features
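The split into normal and abnormal variation can be sketched with a small principal component computation. This is a minimal sketch under stated assumptions, not the paper's implementation: the normal subspace is taken to be the single dominant principal direction, found by power iteration, and the helper names and the two-column toy series are hypothetical. A multiway variant could place the entropy time series of several features and flows side by side as columns of one matrix; that layout is likewise an assumption here.

```python
import math

def center(rows):
    """Subtract each column's mean so variation is measured around zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def top_component(rows, iterations=200):
    """Dominant principal direction, found by power iteration on X^T X."""
    dim = len(rows[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        scores = [sum(x * vi for x, vi in zip(row, v)) for row in rows]
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def residual_energy(row, v):
    """Squared length of the part of `row` lying outside the normal subspace."""
    score = sum(x * vi for x, vi in zip(row, v))
    return sum((x - score * vi) ** 2 for x, vi in zip(row, v))

# Hypothetical data: one row per time bin, two correlated metrics.
# Bin 5 breaks the usual relationship between the two columns.
series = [[float(t), float(t)] for t in range(10)]
series[5][1] = 9.0

centered = center(series)
normal_direction = top_component(centered)
energies = [residual_energy(row, normal_direction) for row in centered]
most_unusual_bin = max(range(len(energies)), key=energies.__getitem__)
```

The time bin with the largest residual energy is the one that deviates most from the typical variation. Choosing a threshold for what counts as "significant" is left out of this sketch.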
Diagnosis Methodology: Unsupervised Classification
Data
The proposed anomaly detection and classification framework is evaluated using
sampled flow data collected from all access links of two backbone
networks, Abilene and Géant
Abilene is the Internet2 backbone network
It connects over 200 US universities and peers with research networks in
Europe and Asia
It consists of 11 Points of Presence (PoPs), spanning the continental US
Three weeks of sampled IP-level traffic flow data were collected from every PoP
in Abilene
Sampling is periodic, at a rate of 1 out of 100 packets
Abilene anonymizes destination and source IP addresses by masking out
their last 11 bits
Géant is the European research network
It is twice as large as Abilene, with 22 PoPs located in major European capitals
Three weeks of sampled flow data were collected from Géant as well
Data from Géant is sampled periodically, at a rate of 1 out of every 1000 packets
Géant flow records are not anonymized
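The two data-preparation steps described above, masking the last 11 bits of an address and periodic packet sampling, can be sketched as below. The function names and example addresses are hypothetical, and which packet of each group the real collectors keep is an assumption (here, the first).

```python
import ipaddress

def anonymize(ip, masked_bits=11):
    """Zero the lowest `masked_bits` bits of an IPv4 address."""
    value = int(ipaddress.IPv4Address(ip))
    mask = (0xFFFFFFFF >> masked_bits) << masked_bits
    return str(ipaddress.IPv4Address(value & mask))

def periodic_sample(packets, period):
    """Keep one packet out of every `period` (assumed: the first of each group)."""
    return packets[::period]

# Two hypothetical hosts that differ only in their last 11 bits.
a = anonymize("192.168.13.77")
b = anonymize("192.168.15.1")

# 1-in-100 sampling, as described for Abilene, over 1000 toy packet ids.
sampled = periodic_sample(list(range(1000)), 100)
```

After masking, both example hosts map to the same address, so address entropies computed on Abilene data describe groups of up to 2048 addresses rather than individual hosts.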
Detection
The first step in anomaly diagnosis is detection
Considerations in using feature distributions for anomaly detection:
Does entropy allow detection of a larger set of anomalies than can be
detected via volume-based methods alone?
Are the additional anomalies detected by entropy fundamentally different
from those detected by volume-based methods?
How precise is entropy-based detection?
Compare the sets of anomalies detected by
Volume-based methods
Entropy-based methods
Manually inspect the detected anomalies to determine their type and to
determine the false alarm rate
Inject known anomalies into existing traffic traces to determine the
detection rate
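The comparison and the two rates above reduce to simple set arithmetic once each detection is identified by its time bin. The sketch below assumes that representation; the definitions of the rates (fraction of injected anomalies found, fraction of detections not confirmed by inspection) are one plausible reading of the slide, and all bin numbers are made-up toy values, not results from the paper.

```python
def compare_detections(volume, entropy):
    """Split detections into those found by both methods or by only one."""
    return {
        "both": volume & entropy,
        "volume_only": volume - entropy,
        "entropy_only": entropy - volume,
    }

def detection_rate(injected, detected):
    """Fraction of deliberately injected anomalies that were detected."""
    return len(injected & detected) / len(injected)

def false_alarm_rate(detected, confirmed):
    """Fraction of detections that manual inspection did not confirm."""
    return len(detected - confirmed) / len(detected)

# Toy time-bin identifiers, for illustration only.
volume_hits = {3, 7, 12}
entropy_hits = {7, 12, 20, 31}
overlap = compare_detections(volume_hits, entropy_hits)

rate = detection_rate(injected={40, 41, 42, 43}, detected={40, 42, 43, 50})
alarms = false_alarm_rate(detected=entropy_hits, confirmed={7, 12, 20})
```

The `entropy_only` set corresponds to the first two questions on the slide: how many additional anomalies entropy finds, and which ones they are.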
Concluding Words
Network anomaly diagnosis is an ambitious goal, but the advent of
network-wide flow data brings that goal closer to feasibility. Treating
anomalies as events that alter traffic feature distributions yields
considerable diagnostic power: in detecting
new anomalies, in understanding the structure of anomalies,
and in classifying anomalies
Ongoing work is extending the feature-based diagnosis methodology:
Online extensions to the clustering methods, methods to expose the raw
flow records involved in an anomaly, and additional information
that can aid in better classifying anomalies by their root cause
Questions
Thank You!
Whenever you learn something new, the whole world becomes that much richer