
Machine Learning based Flow Entry Eviction for OpenFlow Switches

Hemin Yang and George F. Riley


School of Electrical and Computer Engineering, Georgia Institute of Technology
{hyang350, riley}@gatech.edu

Abstract— Software Defined Networking (SDN) is fundamentally changing the way networks work: it enables programmable and flexible network management and configuration. As the de facto southbound interface of SDN, OpenFlow defines how the control plane can directly interact with the forwarding plane. In OpenFlow, flow tables play a significant role in packet forwarding. However, the capacity of the flow table is limited due to power, cost, and silicon area constraints. The capacity-limited flow table cannot hold the explosive number of flows generated by the fine-grained control mechanism used in SDN, so the flow table frequently overflows. In the case of overflow, the eviction strategy, which replaces existing flow entries with new ones, is critical to guarantee efficient usage of the flow table. In this paper, we present a machine learning based eviction approach which can identify whether a flow entry is active or inactive and thus evict inactive flow entries in a timely manner when flow table overflow occurs. Our simulations based on real network packet traces show that the proposed method can increase the usage of the flow table by more than 55% and reduce the number of capacity misses by up to 80%, compared with the Least Recently Used eviction policy.

978-1-5386-5156-8/18/$31.00 ©2018 IEEE

I. INTRODUCTION

Software Defined Networking (SDN) is widely regarded as the next generation networking technique which can create programmable, flexible and agile networks whilst reducing costs. Google, Amazon, Facebook, and other industry giants have invested heavily in SDN research both in their data centers and wide area networks. For example, Google leveraged SDN principles to build its Jupiter network, which achieves a capacity increase of 100x [1]. The core of SDN is to separate the control plane from the forwarding/data plane in switches. This separation makes network applications programmable and accelerates network innovation with the help of the abstractions provided by SDN. To achieve this separation, most SDN implementations use the de facto southbound protocol OpenFlow [2] as the communication interface between the control and data planes. The kernel of OpenFlow is a packet processing pipeline consisting of several flow tables. A flow table contains a set of flow entries, which are used to match and process incoming packets (e.g., forward the packet to a port, modify the packet). However, the flow table is extremely stressed because of the big gap between the capacity of the flow table and the explosive number of flows. On one hand, in most commodity OpenFlow switches the flow table is placed in Ternary Content Addressable Memory (TCAM), which can achieve single clock cycle lookup. Due to power, cost, and silicon area constraints, the size of TCAM is very limited. As reported by G. Lu et al., the Broadcom chipset which is widely used in commercial switches can only accommodate 2,000 flow entries [3]. On the other hand, the arrival rate of flows can reach 10,000 flows per second per server rack in data centers [4]. In order to forward the packets in these flows, massive numbers of flow entries should be installed in OpenFlow switches, far more than the capacity of the flow table. In this case, it is extremely important to manage flow tables efficiently.

In previous versions of the OpenFlow specification, new flow entries would not be inserted in the flow table and an error would be returned to the controller if a flow table was full. However, this approach is problematic because service may be disrupted. Since OpenFlow 1.4.0, an eviction mechanism has been introduced to enable smoother degradation of behavior in the case of flow table overflow. Once the flow table is full, an eviction-enabled OpenFlow switch can kick out existing flow entries of lower importance to install new flow entries instead of simply rejecting them. The key issue for the eviction mechanism is to decide which flow entry should be evicted. Intuitively, we can come up with three naive strategies: Least Recently Used (LRU), First In First Out (FIFO), and random, which evict the least recently used, first installed, and a random flow entry respectively. A. Zarek compared the performance of these three strategies [5] based on several real network traces and concluded that LRU outperforms the other two. Besides these three naive strategies, R. Challa et al. [6] employed multiple Bloom filters to encode the importance value of flows, which captures both the locality and recency of references. With these values, the switch can evict the "least important" flow entry in the case of flow table overflow. In addition, B. Lee et al. [7] proposed a fair eviction strategy based on LRU, where a new mice flow can only evict a mice flow and an elephant flow can only evict an elephant flow. Furthermore, T. Pan et al. [8] proposed Adaptive Least Frequently Eviction to prevent elephant flows from being evicted by massive numbers of mice flows by assigning elephant flows higher priorities.

Basically, the key question for flow entry eviction is to determine whether a flow entry is inactive or not; we would like to evict an inactive flow entry to install a newly arriving flow. All the existing solutions apply heuristics to infer which flow entry is most likely to be inactive. For example, the least recently used flow entry is more likely to be inactive than the most recently used one. However, inferences based on these heuristics cannot be very accurate, which seriously affects the usage of precious flow table space. In this paper, we propose to use machine learning techniques to learn from historical flow data and thus make a more accurate decision on whether a flow entry is inactive or not. The rest of the paper is organized as follows. Section II discusses the challenges of flow entry eviction for OpenFlow switches and why machine learning can help. In Section III, our proposal is elaborated. Section IV presents case studies on three real network packet traces to demonstrate the performance of our proposal. Section V concludes this paper and outlines our future work.

II. FLOW ENTRY EVICTION

A. Challenges of Flow Entry Eviction

[Figure 1: two packet-arrival timelines up to time t0, one for a UDP flow f1 with feature vector v = (0, 0.8, 2.17, 2, 3, 9, 4) and one for a TCP flow f1 with v = (1, 6, 3.2, 0, 0, 1, 6).]
Fig. 1: Examples of flow entry feature vector with Npkt = 4.
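Anticipating the feature definition given in Section III-A, the two vectors shown in Fig. 1 can be reproduced with a short sketch (a minimal illustration, not the authors' implementation; the packet arrival times below are reconstructed from the inter-arrival gaps in the figure):

```python
def feature_vector(is_tcp, arrivals, lengths, n_pkt, t_now):
    """Build v = (1_tcp, t_idle, t_ia, l_1, ..., l_Npkt) for one flow entry."""
    arrivals = arrivals[-n_pkt:]          # keep only the last n_pkt packets
    lengths = lengths[-n_pkt:]
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    t_ia = sum(gaps) / len(gaps) if gaps else 0.0   # mean inter-arrival time
    t_idle = t_now - arrivals[-1]         # time since the last reference
    pad = [0.0] * (n_pkt - len(lengths))  # "missing" packets get length 0
    return [1.0 if is_tcp else 0.0, t_idle, t_ia] + pad + [float(l) for l in lengths]

# Upper (UDP) entry of Fig. 1: gaps 1.5, 2, 3; last packet 0.8 before t0
print(feature_vector(False, [0.0, 1.5, 3.5, 6.5], [2, 3, 9, 4], 4, 7.3))
# Lower (TCP) entry: two packets only, gap 3.2; last packet 6 before t0
print(feature_vector(True, [0.0, 3.2], [1, 6], 4, 9.2))
```

Up to rounding, the two printed vectors match v = (0, 0.8, 2.17, 2, 3, 9, 4) and v = (1, 6, 3.2, 0, 0, 1, 6) from the figure.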
When a flow table is configured to perform eviction, the OpenFlow switch will evict an existing flow entry to make space for a new one if the flow table is full. But which flow entry should be evicted? To answer this question, we should consider two scenarios. First, if there are some inactive flow entries in the flow table, we should certainly evict these inactive entries. In this case, the challenge is to determine whether a flow entry is inactive or active. Second, if all the installed flow entries are active, we should evict the flow entry with the smallest packet arrival rate. The problem is then to predict the future packet arrival rate of every flow entry. In this study, we focus on the first scenario, and our key challenge is to identify inactive flow entries in the flow table.

B. Why Machine Learning Can Help

Machine learning refers to technology which can acquire knowledge (e.g., patterns) from raw data and apply that knowledge to practical inference. In particular deep learning, which can automatically learn data representations and build complex concepts out of simpler concepts using multilayer neural networks, is widely used in commercial products (e.g., image classification, speech recognition, natural language processing) [9]. To apply machine learning, two conditions must be satisfied: an existing pattern and related raw data. In the case of flow entry eviction, a pattern certainly exists. For example, FIN packets are sent when a TCP connection is terminated. In addition, we can easily collect statistics on billions of packets in practice. For example, Wireshark can be used to capture live network data [10]. With the collected data, the problem of identifying inactive flow entries is actually a binary classification problem, where each flow entry in the flow table is classified as either active or inactive.

III. OUR PROPOSAL

As discussed in Section II, each flow entry can be classified as either inactive (positive) or active (negative). To apply this classification, we first need to train a binary classification model. Therefore, our proposal mainly consists of two parts. One is offline training to generate the classification model, and the other is online flow table eviction utilizing the trained model.

A. Offline Training: Data Collection

First and foremost, we need to collect data for training the classification model. Every data point contains two parts: features and a label. Features are used to characterize the state of one specific flow entry, and the label indicates whether this flow entry is active or inactive. In this study, a flow is defined by the five-tuple, i.e., source IP address, source port number, destination IP address, destination port number, and protocol. With this definition, we use the following features to characterize a flow entry: the protocol of the flow (1tcp ; we only consider TCP and UDP flows in this study), the period of time since the last reference to the flow entry (i.e., tidle = current time − the time when the flow entry was last referred to), the average inter-arrival time of the last Npkt packets referring to the flow entry (tia ), and the lengths (li ) of the last Npkt packets referring to the flow entry. tidle captures the recency of the flow entry: a larger tidle means the flow entry is more likely to be inactive, since no packet will refer to an inactive flow entry. tia can reflect the locality of references, and references tend to be more local with a smaller tia . The li of the last Npkt packets can reflect the communication state of a flow. For example, FIN packets are sent when a TCP connection is terminated. A larger Npkt can provide more information but consumes more OpenFlow switch memory. Therefore, we should set Npkt as small as possible without a significant classification accuracy loss. These features constitute the feature vector v = (1tcp , tidle , tia , l1 , l2 , · · · , lNpkt ) ∈ R^(Npkt+3) for each flow entry. For example, Fig. 1 shows two flow entries with Npkt = 4. The upper flow entry is UDP, and it is referred to by four packets up to t0 , with lengths 2, 3, 9, and 4. The inter-arrival times of these packets are 1.5, 2, and 3. At time t0 , the feature vector of this flow entry is v = (0, 0.8, 2.17, 2, 3, 9, 4). As for the lower TCP flow entry, only two packets have referred to it. In the case when fewer than Npkt packets refer to a flow entry, we set the li of the "missing" packets to 0. Therefore, the feature vector of the lower flow entry at t0 is v = (1, 6, 3.2, 0, 0, 1, 6).

With these identified features, we can then generate the dataset for training from real network packet traces. The
generation process is described in Listing 1, which simulates the arrival of packets and updates the feature vector of the corresponding flow entry when a packet arrives. The features and label of every flow entry are output as a data sample in the case of flow table overflow, where a flow entry is labeled as inactive/positive when packet p refers to this flow entry and no further packet will refer to it in the future. Note that all identified features are time-varying except for 1tcp . In particular, tidle of a flow entry changes as time elapses even if no more packets refer to it. In this case, there would be thousands of data samples which are exactly the same except that their tidle values are slightly different. For example, for the upper flow entry in Fig. 1, the feature
vector v will be (0, 0.801, 2.17, 2, 3, 9, 4) at time t0 + 0.01 and (0, 0.802, 2.17, 2, 3, 9, 4) at t0 + 0.02. To prevent this, we employ trecord to record the time when the features and label of a flow entry are output as a data sample. A flow entry cannot generate data samples within tinterval after its trecord if no packet refers to it (see line 4 in Listing 1). Another important issue is which policy (e.g., random, LRU) should be used for flow entry eviction in the case of flow table overflow during data collection. In machine learning, we would like the training and test data sets to come from the same underlying data distribution, such that the trained model can achieve low generalization error. Therefore, we evict a random flow entry when the flow table overflows (see line 6). In this way, the generated dataset will have a distribution similar to that of the feature vectors which are fed into the trained model during online flow table eviction.

Listing 1 Dataset generation
Require: Packet trace, tmax , Nmax , tinterval , Npkt
1: for each TCP/UDP packet p whose arrival time tp ≤ tmax do
2:   if p cannot match any flow entry then
3:     if the size of the flow table is Nmax then
4:       Output the features and label of each flow entry whose trecord is ≥ tinterval ago or which has been updated;
5:       Set each flow entry as non-updated and the trecord of each flow entry to tp ;
6:       Evict a random flow entry;
7:     end if
8:     Insert the flow entry subject to p in the flow table;
9:     Set the flow entry as updated;
10:  else
11:    Update the features of the flow entry referred to by p;
12:  end if
13:  Update the label of the flow entry referred to by p;
14: end for

Fig. 2: K-fold cross validation (K = 4) for tuning the hyperparameters of random forest.

B. Offline Model Training: Model Tuning

With the collected dataset, we need to select an appropriate machine learning algorithm and tune its hyperparameters¹ to achieve the best performance. Many algorithms can be used for classification problems, such as nearest neighbors, support vector machines, decision trees, random forests, and multilayer perceptrons [11]. In this paper, we adopt random forest as the learning model. Random forest is an ensemble learning method which combines multiple uncorrelated decision trees. This learning method has many advantages, such as little input preparation, quick training, and robustness to noise. Furthermore, many empirical studies have shown the excellent performance of random forest [12].

Besides selecting an appropriate machine learning algorithm, we also need to select an appropriate performance metric. There are many performance metrics for classification, such as classification accuracy, recall, and F-score [11]. For flow entry eviction, the cost of misclassifying an active flow entry as inactive is much higher than the reverse. If an active flow entry is kicked out of the flow table, the switch has to query the controller again to reinstall the flow entry when packets of the flow arrive in the future. This re-installation not only incurs unexpected delays [13] but also increases controllers' workloads. Furthermore, evicting an active TCP flow entry can seriously degrade the performance of TCP connections, because it may result in packet loss and congestion window shrinkage for all TCP flows which share the same switch buffer [14]. Therefore, we care more about the classification accuracy of active flow entries than inactive ones and try to minimize the number of false positives (i.e., active flows misclassified as inactive). In this respect, we use precision as the classification performance metric. It is defined by

Precision = #TP/(#TP + #FP), (1)

where #TP is the number of correctly classified inactive flow entries, and #FP is the number of misclassified active flow entries. According to the definition, a high precision indicates a small number of false positives, which is exactly our objective.
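As a concrete sketch of Eq. (1) (the counts below are made up for illustration, not results from the paper):

```python
def precision(tp, fp):
    """Precision = #TP / (#TP + #FP): the fraction of entries classified
    as inactive that really are inactive."""
    return tp / (tp + fp)

# Hypothetical counts: 90 correctly flagged inactive entries, 10 active
# entries wrongly flagged (false positives, the costly mistake here).
print(precision(90, 10))  # → 0.9
```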
¹ For a machine learning algorithm, hyperparameters are parameters whose values are set manually before the learning process begins. For example, we need to determine the number of trees before we train a random forest model.

The last issue for offline model tuning is how to tune the hyperparameters of random forest. We use k-fold cross-validation, as shown in Fig. 2, to evaluate the performance of
random forest with different hyperparameter configurations. K-fold cross-validation is a common approach to fine-tuning model hyperparameters in machine learning [11], where the whole dataset is first split into a training set and a testing set. The training set is further divided into K roughly equal parts (K = 4 in Fig. 2). For fold k = 1, 2, · · · , K, we fit the random forest with the given hyperparameters to the other K − 1 parts and compute its precision in classifying the kth part (a.k.a. the validation part). Then we take the average precision over these K folds. We do this for many values of the hyperparameters and choose the values which make the average precision largest. With the chosen values, we fit the random forest to the training set and get its precision in classifying the testing set. This precision value evaluates how accurately the best-tuned random forest model can classify a flow entry unseen in training as inactive. Finally, we train the random forest with these best-tuned hyperparameters on the whole dataset, including the training and testing sets, and generate a random forest classification model for online flow table eviction.
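The tuning procedure above maps naturally onto scikit-learn's grid search, which the paper uses in Section IV; the snippet below is only a sketch on a random stand-in dataset (the small grid shown is a subset of the search space reported later):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Random stand-in for the real feature vectors and labels (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 13))    # e.g. Npkt = 10 gives 13 features per entry
y = rng.integers(0, 2, size=400)  # 1 = inactive (positive), 0 = active

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# K-fold cross-validation over the grid, selecting for precision as in the paper.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [15, 20], "max_depth": [10, 15]},
    scoring="precision", cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
```

After the search, `search.best_estimator_` is refit on the whole training set with the winning hyperparameters and can then be evaluated on the held-out testing set.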
C. Online Flow Table Eviction

Once offline training is finished, we can apply the trained binary classification model for online flow table eviction. A straightforward idea for applying the trained model is to evict all flow entries which are classified as inactive by the model. However, this naive approach may suffer serious performance degradation. On one hand, the trained model can make wrong classifications, so many misclassified active flow entries may be eliminated. On the other hand, one and only one flow entry needs to be evicted when a new flow entry arrives. Therefore, it is better to evict the single flow entry which is most likely to be inactive when the flow table is full. Fortunately, most binary classification algorithms can not only predict whether a flow entry is inactive but also give the confidence of the prediction (i.e., the probability of the flow entry being inactive). We rely on these probabilities for online flow entry eviction, as shown in Listing 2. Suppose we train a binary classification model h from the collected dataset, where h(ve ) gives the probability of being inactive for flow entry e with feature vector ve (i.e., P^e_inactive ). A feature vector v is associated with each flow entry to extract the features. The feature vector contains all identified features except for tidle . In principle, tidle of a flow entry should be updated as time elapses, while the other features should be updated only when a new packet refers to the flow entry. To avoid frequent updates of tidle , the feature vector of a flow entry instead contains the arrival time of the last packet referring to this entry (tlast ). When the feature vector is fed to the trained model for classification, tidle is calculated as tnow − tlast , where tnow is the current time. In this respect, when a new packet arrives, only the feature vector associated with the flow entry the packet refers to needs to be updated.

Listing 2 Online Flow Entry Eviction
Require: Trained model h, Pmin , tinterval
1: while a packet p is arriving at the switch do
2:   if p is matched with a flow entry ep then
3:     update the feature vector vep associated with ep
4:   else
5:     if the flow table is full then
6:       isEvicted ← false
7:       for every flow entry e in the flow table do
8:         if ve has been updated or P^e_inactive was updated ≥ tinterval ago then
9:           P^e_inactive ← h(ve )
10:          if P^e_inactive > 0.9 then
11:            1) evict the entry e
12:            2) isEvicted ← true
13:            break
14:          end if
15:        end if
16:      end for
17:      if not isEvicted then
18:        e∗ = argmax_e {P^e_inactive }
19:        if P^e∗_inactive > Pmin then
20:          evict the flow entry e∗
21:        else
22:          evict the least recently used flow entry
23:        end if
24:      end if
25:    end if
26:    send a flow setup request to the controller to install a flow entry for packet p
27:  end if
28: end while

Another issue is to determine when the trained model should be applied for classification. When flow table overflow happens, we need to apply the trained model to find the flow entry which is most likely to be inactive. Intuitively, we can use the trained model to compute P^e_inactive (the probability of being inactive) for all flow entries and evict the flow entry with the maximum P^e_inactive . This straightforward approach suffers from two disadvantages. First of all, it is very computation intensive, because flow table overflow happens very frequently. The intensive computation not only incurs a heavy workload on the weak switch management CPU but also introduces unacceptable latency. Secondly, it is meaningless to classify the same flow entry again if its feature vector has little or no change. For example, we do not want to classify a flow entry again if its feature vector stays the same except that tidle differs by 1 millisecond. To address these problems, the time when P^e_inactive was last updated is recorded in our proposal. If no packet refers to the flow entry, P^e_inactive cannot be updated within tinterval seconds, where tinterval is a given constant (see line 8 in Listing 2). In general, a large tinterval makes classification less intensive but reduces the sensitivity for finding an inactive flow entry. Furthermore, when a flow entry is predicted to be inactive with P^e_inactive > 0.9, it is evicted immediately without updating P^e_inactive of the other flow entries (see lines 10-13). Otherwise, we have to find the flow entry with the maximum P^e_inactive . In this way, the frequency of doing classification is greatly reduced.

The last issue is how to kick out misclassified inactive flow entries from the flow table. For inactive flow entries, only tidle changes as time elapses. In this case, it is possible that some inactive flow entries will always be classified as active and thus reside in the flow table forever. To make matters worse, these inactive flow entries will accumulate as time passes and occupy most of the flow table space, which seriously affects the usage of the flow table. To address this issue, we evict the least recently used flow entry if max{P^e_inactive } ≤ Pmin (see line 22). Otherwise, the flow entry with the maximum P^e_inactive is evicted (see line 20). In this way, misclassified inactive flow entries, whose tidle tends to be large, can be removed if no flow entry can be classified as inactive with probability higher than Pmin .
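The victim-selection core of Listing 2 can be sketched as follows (a simplified stand-in: `model` is any function returning an inactive-probability, the helper name `choose_victim` is ours, and t_idle is assumed to sit at index 1 of the feature vector for the LRU fallback):

```python
def choose_victim(entries, model, p_min=0.65, p_cut=0.9):
    """Return the flow entry id to evict when the table overflows.

    entries: dict mapping flow entry id -> feature vector
    model(v): probability that the entry with features v is inactive
    """
    best, best_p = None, -1.0
    for e, v in entries.items():
        p = model(v)
        if p > p_cut:          # confidently inactive: evict immediately,
            return e           # as in lines 10-13 of Listing 2 (early break)
        if p > best_p:
            best, best_p = e, p
    if best_p > p_min:         # line 20: best guess is good enough
        return best
    # line 22: LRU fallback, assuming index 1 of v holds t_idle
    return max(entries, key=lambda e: entries[e][1])

table = {"f1": (0.0, 9.0, 2.0), "f2": (1.0, 0.1, 1.5)}
print(choose_victim(table, model=lambda v: 0.2))  # low confidence → LRU picks "f1"
```

The early return on p > 0.9 mirrors the break in Listing 2: the remaining entries are not scored at all, which is what keeps the per-overflow classification cost low.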
values are evaluated by 5-fold cross-validation in terms
IV. CASE STUDY

A. Data Collection

In this section, we present case studies on three real network packet traces to show the performance gain of our proposal. One packet trace, UNIV², is collected from a university data center by the authors of [4]. The other two packet traces (UNIBS0930 and UNIBS1001) were collected by the telecommunication networks group at the University of Brescia from their workstations on 09/30/2009 and 10/01/2009 [15], [16]. We summarize these three packet traces in Table I. As we can see, the traffic in the UNIV trace is much more intense (i.e., smaller duration but more packets) than in the other two traces.

To collect data, we build an OpenFlow simulator which can replay packet traces according to the OpenFlow specification [2] and collect data according to Listing 1. Our simulator contains two objects: an OpenFlow switch and a controller. All the packets in a packet trace are replayed and arrive at the OpenFlow switch. When a packet p in the trace arrives, the switch checks whether any flow entry in the flow table can match p. If none of the entries can match, a PacketIn message is sent to the controller. Once it receives a PacketIn, the controller instructs the switch to install a new flow entry with respect to p. Furthermore, the switch updates the feature vectors of flow entries and outputs data samples according to Listing 1. For all three packet traces, we set the size of the flow table (Nmax ) to 1K, 2K, and 4K, which is compatible with the configurations in many studies [6], [17]. As for Npkt , we set it to 10 for all traces and table sizes. In addition, for the UNIV packet trace, we generate datasets for Npkt = 5, 6, 7, 8, 9 with 1K flow table size in order to study the effects of Npkt on our proposal's performance. All these generated datasets are summarized in Table I.
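As a small illustration of the five-tuple flow definition used when replaying a trace (field names and the toy trace are ours, not the authors' simulator):

```python
from collections import namedtuple

Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port proto length time")

def flow_key(pkt):
    """Five-tuple identifying the flow (and thus the flow entry) of a packet."""
    return (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.proto)

trace = [
    Packet("10.0.0.1", 5000, "10.0.0.2", 80, "TCP", 60, 0.0),
    Packet("10.0.0.1", 5000, "10.0.0.2", 80, "TCP", 1500, 0.4),
    Packet("10.0.0.3", 1234, "10.0.0.2", 53, "UDP", 80, 0.5),
]
flows = {}
for pkt in trace:
    flows.setdefault(flow_key(pkt), []).append(pkt)
print(len(flows))  # two distinct flows in this toy trace → 2
```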
which occurs when the first packet of a flow arrives at the
B. Offline Training Results

We use scikit-learn [18], an open source machine learning library in Python, for tuning the hyperparameters of random forest. This library provides many classification algorithms, as well as APIs for model selection based on cross-validation. We split each dataset into a training set and a test set with an 80/20 splitting ratio. We then apply 5-fold cross-validation on the training set to tune the hyperparameters of random forest. Random forest has many hyperparameters, and we only consider the two most important ones: n_estimators (the number of trees in the forest) and max_depth (the maximum depth of each tree); all the other hyperparameters use the default values provided by the library (see the module sklearn.ensemble.RandomForestClassifier defined in [19] for more details). The search space for n_estimators is {15, 20, 25, 30, 100}, and the space for max_depth is {10, 15, 20, 25, 30}. All possible combinations (25 in total) of these two hyperparameter values are evaluated by 5-fold cross-validation in terms of precision. In this way, we can pick the best values for n_estimators and max_depth, which are shown in Table II. As we can see, the best-tuned hyperparameters are the same for a network regardless of flow table size. This makes sense because the time at which a flow entry becomes inactive depends on the flow itself and does not correlate with flow table size. Following Fig. 2, we fit the random forest with its best-tuned hyperparameters to the training set and test the trained model on the testing set to evaluate the performance of the best-tuned random forest. The test results are also reported in Table II. As we can see, the trained models achieve relatively high precision, especially for the UNIBS0930 and UNIBS1001 traces. Furthermore, we train the best-tuned random forest on the whole dataset and save the trained model with the help of joblib, which is loaded for online flow entry eviction.

² This packet trace can be downloaded from http://pages.cs.wisc.edu/~tbenson/IMC10_Data.html, and it is collected from the UNIV1 datacenter. For simplicity, we call it UNIV in the rest of the letter.

TABLE I: Summary of packet traces and generated datasets

Trace      | Duration (s) | Packets    | Flows   | tinterval | Samples (1K) | Samples (2K) | Samples (4K) | tmax   | Packets within training duration
UNIV       | 3,914        | 19,855,388 | 577,675 | 1         | 1,030,222    | 1,595,952    | 2,716,847    | 600    | 1,743,725
UNIBS0930  | 81,203       | 4,190,465  | 43,539  | 20        | 567,277      | 833,531      | 1,032,696    | 10,000 | 1,283,407
UNIBS1001  | 32,407       | 3,322,174  | 39,743  | 10        | 396,605      | 665,100      | 777,190      | 5,000  | 364,452
Note: The number of samples in the datasets is the same for different Npkt given a flow table size and a packet trace.

TABLE II: Offline training results, Npkt = 10

                            | UNIV                    | UNIBS0930               | UNIBS1001
                            | 1K      2K      4K      | 1K      2K      4K      | 1K      2K      4K
(n_estimators, max_depth)   | (20,20) (20,20) (20,20) | (20,25) (20,25) (20,25) | (20,25) (20,25) (20,25)
Precision                   | 0.874   0.895   0.922   | 0.991   0.996   0.997   | 0.993   0.997   0.999

C. Online Simulation Results

We carried out simulations on the three packet traces with different flow table sizes (1K, 2K, and 4K). The simulator is similar to the one used in dataset generation, except that the flow entry eviction policy can be configured. We implemented two policies, the machine learning policy and the LRU policy, for performance comparison. For the machine learning policy, the switch determines which flow entry should be evicted in the case of flow table overflow according to Listing 2, where Pmin = 0.65 and tinterval is set according to Table I. For the LRU policy, the switch kicks out the least recently used flow entry when the flow table is full.

First, we investigate the number of capacity misses. A flow table miss occurs if an incoming packet cannot match any flow entry in the flow table. In general, there are two kinds of misses. One is the compulsory miss, which occurs when the first packet of a flow arrives at the flow table. The other is the capacity miss, which occurs when flow entries are discarded from the flow table because of the limitation on its size. Compulsory misses are inevitable, so we here only consider capacity misses. Fig. 3 shows the performance of the machine learning policy in terms of the number of capacity misses, where the number
of capacity misses of one packet trace for each policy is normalized with respect to the total number of capacity misses across all policies for that trace. From Fig. 3, we can see that the machine learning policy achieves far fewer capacity misses for all three packet traces with all considered flow table sizes. In the case of 1K and 2K flow tables, the machine learning policy achieves over 45% fewer capacity misses on the UNIV packet trace and over 60% fewer on the UNIBS0930 and UNIBS1001 traces. In particular, for UNIBS1001, our proposal achieves 80% fewer capacity misses in the case of a 1K flow table. We should add that reducing capacity misses is extremely important for OpenFlow network performance. On one hand, fewer capacity misses reduce the number of PacketIn events and thus relieve the load on control channels and controllers. On the other hand, fewer capacity misses mean that fewer TCP transmissions are interrupted. In this respect, the over 45% performance gain in terms of capacity misses achieved by our proposal is very significant. With a 4K flow table, the performance gain decreases. For example, the machine learning policy reduces the number of capacity misses on the UNIV trace by only 13% compared with the LRU policy. This is because the LRU policy is more likely to remove an inactive flow entry with a larger flow table. Furthermore, we observe that the performance gains on the UNIBS traces are higher than on the UNIV trace. This is because the trained model achieves higher precision on UNIBS than on UNIV, as shown in Table II.

Fig. 3: The performance of machine learning flow entry eviction policy and LRU eviction policy in terms of number of capacity misses.

Second, we investigate the number of active flow entries in the flow table. Fig. 4 shows the active flow entries in the flow table with the machine learning and LRU policies on the UNIV packet trace. As we can see, the number of active flow entries in the flow table with the machine learning policy is much larger than with LRU. On average, the machine learning policy increases the usage of the flow table with 1K, 2K, and 4K capacity by 57%, 65%, and 63% respectively, compared with LRU. This significant improvement is achieved because the machine learning policy can correctly identify and evict inactive flow entries when flow table overflow occurs. In contrast, LRU may frequently remove active flow entries and leave inactive flow entries in the flow table. Combining Fig. 3 and Fig. 4, we can conclude that our machine learning based flow entry eviction policy achieves significant performance gain compared with the LRU policy.

Finally, we investigate the effects of Npkt and Pmin on our machine learning policy. The effects are evaluated by the number of capacity misses, because it is directly related to networking Quality of Service metrics. We generate the datasets with Npkt = 5, 6, 7, 8, 9 for the UNIV packet trace, where the size of the flow table is 1K. For these datasets, we use the random forest model and tune n_estimators and max_depth to achieve the best performance. These trained models are used in the online flow table eviction simulation, and the results are shown in Fig. 5. As we can see, the number of capacity misses increases as Npkt decreases. For example, the number of capacity misses increases by 10% when Npkt is reduced from 10 to 5. This is because a larger Npkt can in general provide more information for the model and thus help increase the classification accuracy. In this way, the machine learning policy is more likely to evict inactive flow entries. On the other hand, a larger Npkt means the OpenFlow switch needs more memory to store the feature vectors. For example, when Npkt is increased
2,000
ML (1K) ML (2K) ML (4K) LRU (1k) LRU (2K) LRU (4K)
70000
1,800
68000

Number of capacity misses


1,600
66000
Number of active flow entries in the flow table

1,400
64000
1,200
62000
1,000
60000
800
58000
600

56000
400 10 9 8 7 6 5
200
Npkt

0
0 500 1,000 1,500 2,000 2,500 3,000 3,500 4,000 Fig. 5: The effects of Npkt on our proposal.
Time (second)

75000
Fig. 4: Number of active flow entries in the flow table for
the UNIV packet trace.

Number of capacity misses


70000

from 5 to 10, the memory cost of storing feature vectors


will almost double. Therefore, we need to make a trade-off 65000

between memory consumption and classification accuracy in


practical implementation.
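The per-Npkt tuning step described above can be sketched as follows. This is a minimal illustration, not the paper's exact setup: the synthetic make_dataset features stand in for the real per-flow feature vectors built from packet traces, and the candidate parameter-grid values are assumptions; only the tuned hyperparameters (n_estimators, max_depth) and the random forest model come from the text.

```python
# Hypothetical sketch: tune a random forest per Npkt dataset, as in the
# evaluation above. Feature layout and data here are illustrative stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

def make_dataset(n_pkt, n_samples=600):
    """Stand-in for per-flow feature vectors built from the first n_pkt
    packets (e.g., packet sizes and inter-arrival gaps)."""
    X = rng.random((n_samples, 2 * n_pkt))             # n_pkt sizes + n_pkt gaps
    y = (X[:, :n_pkt].mean(axis=1) > 0.5).astype(int)  # 1 = inactive (toy rule)
    return X, y

for n_pkt in (5, 10):
    X, y = make_dataset(n_pkt)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    # Tune n_estimators and max_depth, the two hyperparameters varied above.
    grid = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [25, 50], "max_depth": [5, None]},
        cv=3,
    )
    grid.fit(X_tr, y_tr)
    print(n_pkt, grid.best_params_, round(grid.best_score_, 3))
```

In the paper's pipeline, the best model found for each dataset would then drive the online flow table eviction simulation.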
Fig. 6: The effects of Pmin on our proposal. [line plot omitted; x-axis: Pmin, 0.5–0.85; y-axis: number of capacity misses, roughly 55,000–75,000]

In addition, we also investigate the effect of Pmin on our proposal. We conducted simulations on the UNIV trace with a 1K flow table and different values of Pmin. As we can see from Fig. 6, the number of capacity misses decreases up to a certain Pmin and then increases as Pmin grows. For example, the number of capacity misses is reduced by 21% when
Pmin is changed from 0.5 to 0.7, and increased by 15% from 0.7 to 0.85. This is because a small Pmin allows the switch to evict flow entries which are classified as inactive with low confidence. In this case, it is highly possible that a misclassified active flow entry will be evicted. In contrast, a large Pmin prevents the switch from evicting inactive flow entries which the trained model does not identify with very high confidence. In fact, with a large Pmin, the switch relies heavily on the LRU policy (line 22 in Listing 2) for eviction instead of the machine learning one.

Then, how can we set Pmin properly? According to Listing 2, a flow entry with Pinactive > Pmin will be evicted in the case of flow table overflow (see line 20). Suppose the total number of such evictions is Ne. Then the objective of our proposal is to maximize the number of right evictions Nright = Ne * (1 - P[y = 0 | Pinactive > Pmin]), where y = 0 indicates that the flow entry is active. Note that minimizing the number of wrong evictions is different from maximizing the number of right evictions. If we wanted to minimize the number of wrong evictions, we could simply set Pmin = 1, so that Ne would approximate 0. In this case, our proposal would be meaningless because eviction decisions would seldom depend on the predictions of the trained random forest model.

Given Pmin, Nright can be approximated from the dataset generated by Listing 1. In the dataset, every data sample has a label (i.e., y). Furthermore, with the trained model, we can calculate Pinactive for every data sample. Therefore, Nright can be estimated through:

    Ne ≈ N(Pinactive > Pmin),                                                        (2)

    P[y = 0 | Pinactive > Pmin] ≈ N(Pinactive > Pmin ∧ y = 0) / N(Pinactive > Pmin), (3)

where N(·) is a function that returns the number of elements satisfying a predicate. Combining (2) and (3), we get

    Nright ≈ N(Pinactive > Pmin ∧ y = 1).                                            (4)

Therefore, we can set Pmin by

    Pmin* = argmax over Pmin of N(Pinactive > Pmin ∧ y = 1).                         (5)

In the case of the UNIV trace with a 1K flow table, the Pmin generated by (5) is 0.65, which is close to the optimal value (0.7) in Fig. 6.

V. CONCLUSION AND LIMITATIONS

In this paper, we focused on improving flow entry eviction for OpenFlow switches by exploiting machine learning techniques. Our proposal includes collecting datasets from packet traces, training a random forest binary classification model on the collected data, and applying the trained model for online flow entry eviction. Our case studies show that our proposal achieves far fewer capacity misses and higher flow table usage compared with the LRU policy.

However, we do not discuss some implementation issues of our proposal in this paper. For example, how should feature vectors be associated with flow entries in OpenFlow switches? How can feature vectors be updated with minimum cost? Another open problem is the memory overhead of our proposal. These issues are related to the memory architecture of physical OpenFlow switches and will be discussed in our follow-up journal paper. In addition, we will conduct emulations to evaluate our proposal's performance in terms of networking metrics (e.g., latency, throughput) in future work.
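For completeness, the estimates in (2)–(4) can be evaluated directly on a labeled dataset. The sketch below computes Ne, the wrong-eviction probability P[y = 0 | Pinactive > Pmin], and Nright from per-sample model scores and labels; the toy arrays are illustrative stand-ins for the dataset produced by Listing 1.

```python
# Hypothetical sketch of Eqs. (2)-(4): given model scores P_inactive and
# ground-truth labels y (1 = inactive, 0 = active), count evictions,
# estimate the wrong-eviction probability, and count right evictions.
import numpy as np

def eviction_stats(p_inactive, y, p_min):
    p_inactive = np.asarray(p_inactive)
    y = np.asarray(y)
    evicted = p_inactive > p_min                          # eviction rule
    n_e = int(evicted.sum())                              # Eq. (2)
    p_wrong = (evicted & (y == 0)).sum() / max(n_e, 1)    # Eq. (3)
    n_right = int((evicted & (y == 1)).sum())             # Eq. (4)
    return n_e, p_wrong, n_right

# Toy scores/labels; sweep a few candidate thresholds.
p = np.array([0.9, 0.8, 0.75, 0.6, 0.55, 0.3])
y = np.array([1, 1, 0, 0, 1, 0])
for t in (0.5, 0.65, 0.7):
    print(t, eviction_stats(p, y, t))
```

Sweeping candidate thresholds this way and keeping the one with the largest Nright is how rule (5) would be applied in practice.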
REFERENCES

[1] A. Singh et al., "Jupiter rising: A decade of Clos topologies and centralized control in Google's datacenter network," SIGCOMM Comput. Commun. Rev., vol. 45, no. 4, pp. 183–197, Aug. 2015.
[2] OpenFlow Switch Specification (Version 1.5.1), Open Networking Foundation Std., Rev. 1.5.1, Mar. 2015.
[3] G. Lu et al., "ServerSwitch: A programmable and high performance platform for data center networks," in Proceedings of NSDI, vol. 11, 2011, pp. 2–2.
[4] T. Benson, A. Akella, and D. A. Maltz, "Network traffic characteristics of data centers in the wild," in Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. New York, NY, USA: ACM, 2010, pp. 267–280.
[5] A. Zarek, "OpenFlow timeouts demystified," Master's thesis, University of Toronto, 2012.
[6] R. Challa, Y. Lee, and H. Choo, "Intelligent eviction strategy for efficient flow table management in OpenFlow switches," in 2016 IEEE NetSoft Conference and Workshops (NetSoft), pp. 312–318.
[7] B.-S. Lee, R. Kanagavelu, and K. M. M. Aung, "An efficient flow cache algorithm with improved fairness in software-defined data center networks," in 2013 IEEE 2nd International Conference on Cloud Networking (CloudNet), pp. 18–24.
[8] T. Pan, X. Guo, C. Zhang, W. Meng, and B. Liu, "ALFE: A replacement policy to cache elephant flows in the presence of mice flooding," in 2012 IEEE International Conference on Communications (ICC), pp. 2961–2965.
[9] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[10] C. Sanders, Practical Packet Analysis: Using Wireshark to Solve Real-World Network Problems. No Starch Press, 2017.
[11] E. Alpaydin, Introduction to Machine Learning. MIT Press, 2014.
[12] R. Caruana, N. Karampatziakis, and A. Yessenalina, "An empirical evaluation of supervised learning in high dimensions," in Proceedings of the 25th International Conference on Machine Learning. ACM, 2008, pp. 96–103.
[13] M. Kuźniar, P. Perešíni, and D. Kostić, "What you need to know about SDN flow tables," in Proceedings of the 2015 Springer International Conference on Passive and Active Network Measurement, pp. 347–359.
[14] Z. Guo et al., "STAR: Preventing flow-table overflow in software-defined networks," Computer Networks, vol. 125, pp. 15–25, 2017.
[15] M. Dusi, F. Gringoli, and L. Salgarelli, "Quantifying the accuracy of the ground truth associated with Internet traffic traces," Computer Networks, vol. 55, no. 5, pp. 1158–1167, 2011.
[16] F. Gringoli et al., "GT: Picking up the truth from the ground for Internet traffic," ACM SIGCOMM Computer Communication Review, vol. 39, no. 5, pp. 12–18, 2009.
[17] A. Vishnoi, R. Poddar, V. Mann, and S. Bhattacharya, "Effective switch memory management in OpenFlow networks," in Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems, New York, NY, USA, pp. 177–188.
[18] F. Pedregosa et al., "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[19] Scikit-learn: Machine learning in Python. [Online]. Available: http://scikit-learn.org/stable/index.html