A Survey of Techniques for
Internet Traffic Classification
using Machine Learning
Presenter: Klara Nahrstedt
CS598KN, September 19, 2017
Paper Authors: T. Nguyen, G. Armitage
IEEE Communications Survey & Tutorials, Vol. 10, No.4, 2008
Outline
• Motivation and Problem Description
• ML Techniques
• Application of ML in IP Traffic Classification
• Review of ML-based IP traffic Classification Techniques
• Summary
Motivation
• Real-time traffic classification has potential to solve difficult network
management problems
• Network managers need to know traffic characteristics
• Traffic classification can be useful in
• QoS provisioning
• Real-time traffic classification is core to QoS-enabled services and automated QoS
architectures
• RT traffic classification allows operators to respond to network congestion problems
• Automated intrusion detection systems
• Detect patterns indicative of denial of service attacks
• Trigger automated re-allocation of network resources
• Identify customer use of network resources which contradicts operator’s terms of service
• Clarification of ISP obligations with respect to “lawful interception” of IP
traffic
Problem Description
• IP traffic is often defined as a set of flows, each identified by a 5-tuple
• Protocol type
• Source IP address and port
• Destination IP address and port
• One can use simple classification to infer the applications which use ‘well-known’ TCP
or UDP port numbers (e.g., web traffic on port 80) and use this classification to regulate
traffic
• Problem:
• Many apps use unpredictable port numbers
• We need more sophisticated classification techniques to infer app type
• Problem:
• Deep packet inspection techniques are not effective because these techniques assume
• Assumption 1: 3rd parties unaffiliated with either source or recipient are able to inspect each IP packet payload
• Assumption 2: Classifier knows syntax of each app packet payload
• Why didn’t IntServ or DiffServ work for QoS provisioning?
• Problems with QoS signaling
• Problems with service pricing mechanisms
Challenges
• Violation of 1st assumption
• Customers use encryption to obfuscate packet contents (including TCP and
UDP port numbers)
• Governments impose privacy regulations constraining the ability of 3rd parties
to lawfully inspect payloads at all
• Violation of 2nd assumption
• Commercial devices will need repeated updates to stay ahead of regular
changes in every app’s packet payload format
• This causes heavy operational load
Goal and Approach
• Goal:
• We need new approaches to recognize application-level usage patterns
without deep packet inspection
• Approach:
• Recognize statistical patterns in externally observable attributes of the traffic
• Example: packet length, inter-packet arrival time
• Cluster IP traffic into groups that have similar traffic patterns,
• Classify one or more apps of interest
• Apply Machine Learning techniques to IP traffic classification
Approach – Applying ML
• Step 1: features are defined by which future unknown IP traffic may
be identified and differentiated
• Features are attributes of flows calculated over multiple packets
• Feature examples:
• max and min of packet length in each direction;
• flow duration;
• inter-packet arrival time

• Step 2: ML classifier is trained to associate sets of features with
known traffic classes (creating rules)
• Step 3: ML algorithm is applied to classify unknown Traffic using
previously learned rules
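The three steps above can be sketched in code. This is a minimal illustration, not the paper's method: the flow features and the labeled values are made up, and a decision tree stands in for whatever classifier is chosen.

```python
# Illustrative sketch of the three steps; all feature values are invented.
from sklearn.tree import DecisionTreeClassifier

# Step 1: define features per flow, e.g.
# [min_pkt_len, max_pkt_len, mean_inter_arrival_ms, duration_s]
train_X = [
    [40, 1500, 12.0, 30.0],   # labeled "web"
    [60, 1400, 15.0, 45.0],   # labeled "web"
    [50, 200, 2.0, 600.0],    # labeled "game"
    [48, 180, 1.5, 540.0],    # labeled "game"
]
train_y = ["web", "web", "game", "game"]

# Step 2: train a classifier to associate feature sets with known classes
clf = DecisionTreeClassifier().fit(train_X, train_y)

# Step 3: apply the learned rules to previously unseen traffic
unknown_flow = [[45, 190, 1.8, 500.0]]
print(clf.predict(unknown_flow))
```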
Traffic Classification Metrics
• Classification techniques are differentiated by how accurately the technique or
model makes decisions when presented with previously unseen data.
• Assumption: we have traffic class X
• Goal: traffic classifier is being used to classify packets (or flows) belonging to class X
when presented with a mixture of previously unseen traffic
• Input: mixed traffic of packets or flows,
• Output: does a flow (packet) belong to class X or not.
• Metrics characterize classifier’s accuracy
• False Negative
• False Positive
• True Negative
• True Positive
• Other classifier evaluation metrics in ML literature are
• Recall - % of members of class X correctly classified as belonging to class X
• Precision - % of those instances that truly are members of class X among all those
classified as class X
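These four counts and the two derived metrics can be computed directly. A hedged sketch with made-up (true label, predicted label) pairs for a hypothetical class X:

```python
# Hypothetical classifier results: (true label, predicted label) pairs
results = [
    ("X", "X"), ("X", "X"), ("X", "other"),   # two TP, one FN
    ("other", "X"), ("other", "other"),       # one FP, one TN
]

tp = sum(1 for t, p in results if t == "X" and p == "X")
fp = sum(1 for t, p in results if t != "X" and p == "X")
fn = sum(1 for t, p in results if t == "X" and p != "X")
tn = sum(1 for t, p in results if t != "X" and p != "X")

recall = tp / (tp + fn)       # % of real class-X members found
precision = tp / (tp + fp)    # % of "X" verdicts that are correct
print(recall, precision)      # 2/3 and 2/3 here
```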
Traffic Classification Metrics
“In pattern recognition, information retrieval and
binary classification,
• precision (also called positive predictive value) is the
• fraction of relevant instances among the
retrieved instances, while
• recall (also known as sensitivity) is the
• fraction of relevant instances that have been
retrieved over the total amount of relevant instances.
Both precision and recall are therefore based on an
understanding and measure of relevance.” (Wikipedia)
Limitations of Packet Inspection for Traffic
Classification
• Traditional IP traffic classification uses
• Packet’s TCP or UDP port numbers (port-based classification)
• Reconstruction of protocol signatures in its payload (payload-based
classification)
• Port-based classification limitations
• Some apps may not have ports registered with IANA (e.g. Napster, Kazaa P2P
apps)
• Apps may use ports other than its well-known ports to avoid OS access
control restrictions (e.g., non-privileged users may be forced to run HTTP
servers on ports other than port 80)
• Server ports may be dynamically allocated (e.g., RealVideo streamer does the
dynamic negotiation of server port used for data transfer)
• IP layer encryption may obfuscate the TCP/UDP headers
Limitation of Packet Inspection
• Payload-based IP traffic classification limitations
• Payload-based inspection avoids reliance on fixed port numbers, but
• Imposes significant complexity and processing load on traffic identification
devices
• Must be kept up-to-date with extensive knowledge of application protocol
semantics
• Must be powerful enough to perform concurrent analysis of potentially
large number of flows
Background of Machine Learning
• Input of ML process
• Data instances /datasets
• Each instance is characterized by values of its features (attributes or discriminators)
• Output of ML process
• Description of knowledge (depends on particular ML approach)
• Types of Learning
• Classification (supervised learning)
• Clustering (unsupervised learning)
• Associations – learning associations between features
• Numeric predictions - outcome predicted is not discrete class, but numeric quantity
• Classification and Clustering are used for network traffic classification
Background of Machine Learning (2)
• Supervised Learning
• Modeling input/output relations
• Identifying mapping from input features to output class
• Learned knowledge represented as flowchart, decision tree, or classification rules and
used to classify a new unseen instance
• Two phases
• Training phase – construct classification model
• Testing phase – use classification model to classify new unseen instances
• Classification algorithms
• Differ mainly in how the classification model is constructed and what optimization
algorithm is used to search for a good model
• Examples of classification algorithms: Decision tree, Naïve Bayes techniques
Background of Machine Learning (3)
• Clustering
• Receives no labeled guidance (no pre-labeled classes)
• Discovers natural clusters (groups) in data
• Finds patterns in input data
• Clusters instances with similar properties (e.g., distance measuring approach)
• Basic Clustering Methods
• Classic k-means algorithm
• Forms clusters in numeric domains, partitioning instances into disjoint clusters
• Incremental clustering
• Generates hierarchical grouping of instances
• Probability-based clustering method
• Assigns instances to classes probabilistically, not deterministically
K-means clustering
“k-means clustering aims to partition n observations into k clusters
in which each observation belongs to the cluster with the nearest
mean, serving as a prototype of the cluster. This results in a
partitioning of the data space into Voronoi cells.” (wikipedia)
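A minimal k-means sketch on toy flow features (the values and k=2 are illustrative assumptions, not from the paper):

```python
# Toy flow features: (mean packet length, mean inter-arrival time in ms)
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1400.0, 20.0], [1350.0, 25.0], [90.0, 2.0], [110.0, 3.0]])

# Partition the 4 observations into k=2 clusters around the nearest mean
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each flow
print(km.cluster_centers_)  # the two cluster means (prototypes)
```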
Background of Machine Learning (4)
• Evaluation of supervised learning algorithms
• Optimize recall and precision
• Problem: there is often a tradeoff between them, and the app context decides which is more
important
• Consider tradeoff tools
• Receiver operating characteristics curve (ROC) provides a way to visualize tradeoffs between TP
and FP
• Consider important issue
• Cost of trading between Recall and Precision
• Challenge: datasets for training and testing (should be different, but it is
difficult)
Background of ML (5)
• Possible solution: consider “holdout” method
• Set aside some part (2/3) of pre-labeled dataset for training and 1/3 for testing
• Possible solution: if only small dataset available, consider N-fold cross-
validation method
• The set is first split into N approx. equal partitions (folds)
• Each partition (1/N) is used for testing while ((N-1)/N) are used for training
• Procedure repeats N times so that every instance has been used exactly once for
testing
• Recall and Precision are calculated from average of recalls, precisions measured
during all N tests
• Possible solution: if partitioning into N subsets does not guarantee equal
representation of any given class, consider stratification method
• Randomly sample dataset in such a way that each class is equally represented in
both training and testing
Background of Machine Learning (6)
• Evaluation of unsupervised learning
• Answer questions
• How many clusters are hidden in data
• What is optimal number of clusters
• Whether resulted clusters are meaningful or just an artifact of algorithms
• How easy they are to use
• How fast it is to be employed
• What is intra-cluster quality
• How good is inter-cluster separation
• What is cost of labeling clusters
• What are requirements in terms of computer computation and storage
• Three approaches to investigate cluster validity
• External criteria approach – based on prior information of data
• Internal criteria approach – based on examining internal structure inherited from
dataset
• Relative criteria approach – based on finding best clustering scheme that a clustering
algorithm can define under certain assumptions and parameters
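As an illustration of the relative-criteria idea, one can compare clusterings under different parameter choices using an internal index such as the silhouette score (my choice of index here, not the paper's):

```python
# Compare candidate cluster counts with an internal validity index
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import numpy as np

# Toy data with two obvious groups (values are illustrative)
X = np.array([[1400.0, 20.0], [1350.0, 25.0],
              [90.0, 2.0], [110.0, 3.0], [100.0, 2.5]])

for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # silhouette in [-1, 1]: higher = tighter, better-separated clusters
    print(k, silhouette_score(X, labels))
```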
Background of Machine Learning (7)
• Feature selection algorithms
• Feature selection process = Identification of smallest necessary set of features required to
achieve one’s accuracy goal
• Selection of features crucial – irrelevant or redundant features often lead to negative impacts
on accuracy of ML algorithms
• Classification of feature selection algorithms
• Filter methods
• Make independent assessment based on general characteristics of data
• Rely on certain metric to rate and select best subset before learning commences
• Are not biased towards any ML algorithms
• Wrapper methods
• Evaluate performance of different subsets using ML algorithm that will ultimately be employed for
learning
• Are biased towards ML algorithm used
• Example
• Correlation-based Feature Selection (CFS) filter techniques with Greedy, Best-First or Genetic
search
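A filter-method sketch: CFS itself is not in scikit-learn, so a simple univariate ANOVA filter (`SelectKBest`) stands in here to show the idea of rating features before learning begins. The data is synthetic, with one informative feature and three noise features:

```python
# Filter-style feature selection: rate features independently of any
# ML algorithm, before learning commences. Data is synthetic.
from sklearn.feature_selection import SelectKBest, f_classif
import numpy as np

rng = np.random.default_rng(1)
y = np.array([0] * 30 + [1] * 30)
informative = y + rng.normal(scale=0.1, size=60)  # tracks the class
noise = rng.normal(size=(60, 3))                  # irrelevant features
X = np.column_stack([informative, noise])

selector = SelectKBest(f_classif, k=1).fit(X, y)
print(selector.get_support())  # mask: which feature(s) survived the filter
```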
Application of ML to Traffic Classification (1)
• Definitions:
• Uni-directional flow (packets going in one direction, defined by the five-tuple: source
and destination IP addresses, ports, and protocol number)
• bi-directional flow (pair of uni-directional flows going in opposite directions between
the same source and destination IP addresses and ports)
• full flow (bidirectional flow captured over its entire life time)
• Class: IP traffic caused by an application or group of apps
• Instances: multiple packets belonging to the same flow
• Features: numerical attributes calculated over multiple packets belonging
to individual flows
• Mean packet lengths
• Standard deviation of inter-packet arrival times,
• Total flow length
• Fourier transform of packet inter-arrival time
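Computing such per-flow features is straightforward. A sketch where each packet is an illustrative (timestamp_s, length) pair:

```python
# Per-flow feature extraction sketch; the sample packets are invented
from statistics import mean, stdev

packets = [(0.00, 60), (0.02, 1500), (0.05, 1500), (0.09, 40)]

lengths = [length for _, length in packets]
# inter-packet arrival times between consecutive packets
iats = [b[0] - a[0] for a, b in zip(packets, packets[1:])]

features = {
    "mean_pkt_len": mean(lengths),
    "std_iat": stdev(iats),
    "flow_duration": packets[-1][0] - packets[0][0],
}
print(features)
```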
Application of ML (2)
Training supervised ML traffic classifier
[Figure: Training and testing for two-class supervised ML traffic classifier]
• Traces are collected from online game traffic and
other interfering apps (HTTP, DNS, SSH)
• Flow processing – calculate statistical properties of
these flows
• Data sampling – narrow search space
• Feature filtering – limit number of features actually
used in training of ML classifier (cross-validation, …)
Application of ML (3)
[Figure: Data flow within operational supervised ML traffic classifier]
Clustering Approaches
• Flow Clustering using Expectation Maximization
• EM clusters traffic with similar observable properties into different app types
• HTTP, FTP, SMTP, IMAP, DNS, NTP traffic studied
• Group traffic flows into small number of clusters
• Create classification rules from clusters
• From these rules remove features that have no large impact
• Repeat this process
• EM algorithm groups traffic into number of classes based on traffic type (bulk
transfer, small transactions, multiple transactions)
• Results are limited in identifying individual apps of interest
• Other approaches in paper (read – interesting)
EM Algorithm
• “Expectation–maximization (EM) algorithm is an iterative method to
find maximum likelihood or maximum a posteriori (MAP) estimates
of parameters in statistical models, where the model depends on
unobserved latent variables.” (Wikipedia)
• Expectation (E) step
• creates a function for the expectation of the log-likelihood evaluated using the
current estimate for the parameters, and
• Maximization (M) step
• computes parameters maximizing the expected log-likelihood found on the E step.
• These parameter-estimates are then used to determine the distribution of
the latent variables in the next E step.
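A hedged EM-clustering sketch: scikit-learn's `GaussianMixture` runs the E and M steps internally until the log-likelihood converges. The two synthetic traffic groups below (loosely, "bulk transfer" vs. "small transactions") are illustrative assumptions:

```python
# EM-based flow clustering sketch with a Gaussian mixture model
from sklearn.mixture import GaussianMixture
import numpy as np

rng = np.random.default_rng(2)
bulk = rng.normal(loc=[1400, 50], scale=20, size=(50, 2))  # e.g. bulk transfer
small = rng.normal(loc=[100, 2], scale=5, size=(50, 2))    # e.g. small transactions
X = np.vstack([bulk, small])

# fit() alternates E steps (responsibilities) and M steps (parameters)
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gm.predict(X)
print(gm.means_.round())  # estimated component means after EM converges
```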
Supervised Learning Approaches
• Real-time traffic classification using multiple sub-flows (Nguyen &
Armitage, 2006)
• Timely and continuous classification is important
• RT classification uses most recent N packets of a flow – classification sliding window
• Use of small number of packets for classification
• Ensures timeliness of classification and reduces buffer space required to store packets for
classification
• Offers the potential to monitor a traffic flow during its lifetime in a timely manner within the
constraints of physical resources
• Training ML classifiers on multiple sub-flow features
• Extract two or more sub-flows (of N packets) from every flow that represents a class of traffic
one wishes to identify in the future
• Each sub-flow should be taken from places of original flow having noticeably different statistical
properties (start, middle of flow)
• Train ML classifier with combination of these sub-flows rather than original full flows
• This optimization is demonstrated using Naïve Bayes algorithm
• Other approaches – in paper (interesting)
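The sub-flow idea can be sketched roughly as follows. This is a simplified stand-in for the Nguyen & Armitage method, with invented packet-length distributions: take an N-packet window from the start and the middle of each training flow, compute features over each window, train on both, and classify live traffic from its most recent N packets:

```python
# Sub-flow training sketch (simplified; all traffic data is synthetic)
from sklearn.naive_bayes import GaussianNB
import numpy as np

N = 10  # sliding-window size in packets

def window_features(pkt_lens):
    # features over one N-packet window
    return [np.mean(pkt_lens), np.std(pkt_lens)]

def subflows(flow_pkt_lens, n=N):
    # sub-flows from statistically different places: start and middle
    start = flow_pkt_lens[:n]
    mid = flow_pkt_lens[len(flow_pkt_lens) // 2:len(flow_pkt_lens) // 2 + n]
    return [window_features(start), window_features(mid)]

rng = np.random.default_rng(3)
game_flows = [rng.normal(100, 10, 100) for _ in range(5)]   # small packets
web_flows = [rng.normal(900, 300, 100) for _ in range(5)]   # larger, varied

X, y = [], []
for flow in game_flows:
    X += subflows(flow); y += ["game", "game"]
for flow in web_flows:
    X += subflows(flow); y += ["web", "web"]

clf = GaussianNB().fit(X, y)  # Naïve Bayes, as in the demonstrated optimization

# At run time: classify from the most recent N packets of a live flow
recent = rng.normal(100, 10, N)
print(clf.predict([window_features(recent)]))
```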
Challenges
• Timely and continuous classification
• Most work has evaluated efficacy of different ML algorithms when applied to entire datasets of IP
traffic, trained and tested over full flows
• Some work has explored performance of ML classifiers that utilize only the first few packets of a flow,
but these cannot cope with missing the flow’s initial packets
• Directional neutrality
• Many works assume bi-directional flows and knowledge of first packets
• Getting direction wrong will degrade classification accuracy
• Efficient use of memory and processors
• There are tradeoffs between classification performance of classifier and resource consumption
• Using a large number of features improves accuracy but requires a lot of resources
• Portability and robustness
• Portability is not considered carefully in classification models
• Few works evaluate robustness of classification performance when packet loss, packet
fragmentation, delay and jitter occur.
Summary
• Good introduction to the use of ML for traffic classification
• Traffic classification is important for many purposes and definitely in
multimedia networking and cyber-physical systems
• Besides QoS services, consider anomaly detection, proactive network real-
time monitoring, management, routing traffic
• Important concepts for multimedia traffic
• real-time traffic classification
• Continuous traffic classification
• Feature selection that includes delay, jitter, bandwidth
• Machine Learning algorithm with high accuracy for traffic classification