DeviceMien: Network Device Behavior Modeling For Identifying Unknown IoT Devices
IoTDI’19, April 15–18, 2019, Montreal, QC, Canada Jorge Ortiz, Catherine Crawford, and Franck Le
Figure 1: This figure shows all the steps taken during the training phase. The first step is to train a stacked LSTM-Autoencoder, Figure 1a. The second step is to use Bayesian optimization to find the optimal clustering parameters, using DBSCAN, Figure 1b. Finally, we train a classifier based on the assigned labels from the optimal clustering algorithm, Figure 1c. (a) Train an LSTM-Autoencoder. (b) Choose optimal clustering parameters. (c) Train classifier using cluster labels.
work very well under today's assumptions, they may become obsolete over time. In [30], for example, they extract domain names, port numbers, and cipher suites from network traffic per device. There are a number of undesirable properties to this approach. New devices may not use any of these protocols to communicate. IoT devices that are deployed by new companies use different protocols, and non-commercial entities may deploy devices that never issue DNS requests. Moreover, the port numbers may be entirely unique to the application and the traffic may not be encrypted. It is not only possible, it is likely in non-commercial deployments that none of these pre-requisites will be met. Also, since a bag-of-words model is used and it relies on a dictionary of terms, new terms may be introduced that expand the dictionary, which may either require re-training or require that multiple dictionaries are managed for old and new devices. This is not only true for this particular instance of related work but for most of the existing approaches. While they certainly provide value towards a general solution, those solution proposals will require complex maintenance in production and will become unwieldy at scale.

We consider a fundamentally different approach. Our approach does not rely on hand-crafted features. Instead, it relies only on the data. We use an automatic feature learning technique to cast the traffic onto a distribution that serves as a pseudo-signature for devices and device type (IoT/Non-IoT). Our solution casts the problem as one of comparing observed distributions of device traffic, in a streaming fashion. By using probabilistic matching, we are able to match known device behaviors and give meaningful feedback about the confidence we have in that inference. In summary, we make the following contributions:

(1) We automatically learn features from the data and construct a pipeline that successfully combines deep learning elements with probabilistic inference.
(2) We provide a framework that identifies known devices with over 99% accuracy after only a few TCP-flow observations.
(3) We introduce an inexpensive modeling approach that models the distribution of TCP-flows and uses the distribution as a pseudo-signature; a general approach that works with any packet data.
(4) We can identify new devices 100% of the time and can infer the right class (IoT/Non-IoT) with over 80% F1 score on average.
(5) Our model exploits a Bayesian analytical convenience to allow for a simple, flexible representation that is easy to instantiate, update, and run in only a few lines of code.

Our approach is general and is not limited to only IoT device identification. However, we observe that it can be used in this fashion and that it is especially useful, in comparison to existing approaches, when either a new device or new device behavior is observed. By not relying on hand-crafted features, it also provides the flexibility to run on other kinds of IP traffic. In this paper, we focus on identifying new devices and providing class-level feedback about the kind of device that is being observed (IoT vs Non-IoT). In the rest of the paper we provide a brief overview of related work, especially work focused on IoT device identification. We then give a high-level description of the framework and the three phases of its construction – the training phase, the modeling phase, and the execution phase. We then describe the components used in each phase and the specific selection of technique to enable it. We describe the data sets used in our experiments and give a detailed experimental methodology. Finally, we present the results and discuss the implications and conclusions.

2 RELATED WORK
Many studies [2, 21, 31, 32, 37] have stressed the vulnerability of IoT devices to security attacks, and emphasized the need for means to detect, recognize, identify and discover IoT devices. Apthorpe et al. [2] showed that by passively monitoring IoT network traffic, one could infer user behaviors (e.g., a user's sleeping patterns) even when the traffic is encrypted. Yu et al. [37] discussed the root causes behind several reported IoT vulnerabilities (e.g., unprotected RSA key pairs, open DNS resolvers), and present potential multi-stage cross-device attacks. The authors emphasize the need to be able to identify and understand IoT device classes so that cross-device interactions can be learned, and acceptable behaviors distinguished from potential attacks.
vector of aggregate statistics and header field values. Each sample vector is constructed from a summary of a complete, semantic TCP flow¹; similar to our approach, it is not limited by this experimental design choice. Also, their technique requires labeled data and does
to match and/or rank newly observed distributions with labeled
ones. This is illustrated in Figure 2 and Figure 3, respectively.
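As a concrete illustration of this matching step, one way to score and rank candidates is to evaluate an unlabeled device's TCP-flow class counts under each labeled device's Dirichlet-multinomial posterior and sort by log-likelihood. The following is a hedged sketch in our own notation: the function and variable names, and the use of the Dirichlet-multinomial marginal likelihood as the score, are illustrative assumptions, not the paper's exact scoring function.

```python
from math import lgamma

def dm_loglik(counts, alpha):
    # Dirichlet-multinomial marginal log-likelihood of flow-class
    # counts under posterior parameters alpha. The multinomial
    # coefficient is omitted: it is identical for every candidate
    # model given the same counts, so it does not affect ranking.
    n, a0 = sum(counts), sum(alpha)
    ll = lgamma(a0) - lgamma(a0 + n)
    for c, a in zip(counts, alpha):
        ll += lgamma(a + c) - lgamma(a)
    return ll

def rank_matches(counts, posteriors):
    # Rank labeled device models (name -> alpha vector) by how well
    # each explains the unlabeled device's TCP-flow class histogram.
    return sorted(posteriors,
                  key=lambda d: dm_loglik(counts, posteriors[d]),
                  reverse=True)
```

In this sketch the top-ranked name is the labeled model that best explains the observed counts, which mirrors the ranking experiments described later in the paper.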
Our framework aims to model the behavior of a device as a
distribution over classes of sequences of packets transmitted by
a (pair of) device(s). TCP provides a natural sequence of packets
to model. We train a neural network to encode a packet sequence
into a fixed-size vector, then we cluster the vectors to find their
“natural” separation and train a classifier on the cluster labels. Once
trained, we pass all the TCP flows for each device through the encoder and classify each encoding, maintaining a count over each of the classes for that device. Finally, we model each histogram as a multinomial distribution. All traffic from previously unseen devices is similarly modeled, and we calculate the similarity between distributions to identify them.

Figure 4: A TCP-flow set consists of a series of packets in sequential order. We remove the headers from each packet and treat all payload sequences as the set that we feed into the autoencoder.
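The encode, cluster, and classify sequence above can be sketched end-to-end. This is a minimal illustration, not the paper's implementation: flow encodings are assumed to already be fixed-size vectors, the parameter grids are placeholders, a grid search stands in for the Bayesian optimization of clustering parameters, and a random forest stands in for the classifier, whose type is not specified here.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import silhouette_score

def train_pipeline(encodings, eps_grid, min_samples_grid):
    # Steps (b) and (c) of Figure 1: pick the DBSCAN parameters that
    # maximize the silhouette score, then fit a classifier on the
    # resulting cluster labels.
    best = (-1.0, None)
    for eps in eps_grid:
        for m in min_samples_grid:
            labels = DBSCAN(eps=eps, min_samples=m).fit_predict(encodings)
            if len(set(labels)) < 2:
                continue  # silhouette needs at least two clusters
            score = silhouette_score(encodings, labels)
            if score > best[0]:
                best = (score, labels)
    if best[1] is None:
        raise ValueError("no parameter setting produced >= 2 clusters")
    labels = best[1]
    keep = labels != -1  # drop DBSCAN noise points before training
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(encodings[keep], labels[keep])
    return clf
```

Once trained, the classifier assigns every new flow encoding to one of the discovered flow classes, which is what the per-device histograms are built from.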
The choice of technique applied in each stage is made judiciously. The output of each stage is critical for building an effective discriminative model. For example, some sequence encoders yield "uninteresting" distributions over types that make sequence classification ineffective – distributions for several devices are indiscernible. In the rest of this section, we explain each design choice and discuss some of the alternatives we experimented with.

3.1 Processing TCP Connection Data
The transmission control protocol used by most internet devices provides a natural sequence of packets transmitted between a pair of devices. The protocol consists of a connection-establishment phase, a communication phase, and a teardown phase. TCP controls the transmission rate between parties, so the number of packets in flight varies with the available bandwidth. The maximum packet size is 1500 bytes. Each packet consists of several headers, concatenated to encapsulate the MAC, network, transmission, and application-layer information. Although TCP manages the transmission rate, re-transmissions, and packet ordering, a sniffer does not. Sniffed packets may be captured and recorded out of order, duplicated, or missing altogether. We choose to model TCP connection data between devices because it provides a natural, temporal ordering of packets for any pair of devices. However, because we capture packets on the network with a sniffer, we must pre-process them to ensure the sequences are in the right order and duplicated packets are removed. We ignore missing packets.

3.2 Unsupervised Feature Learning
In this section we describe the neural network architecture we use to learn features from TCP-flow packet data. We first describe the foundational elements of the architecture and give a detailed description of how these networks are connected and trained. Our feature-learning architecture consists of two types of networks: the Long Short-Term Memory (LSTM) neural network architecture and a stacked autoencoder architecture. In the following section, we give a brief overview of a recurrent neural network (RNN) and how an LSTM is constructed from an RNN. We then give a brief overview of autoencoders and describe how this network architecture is used to learn features in an unsupervised fashion. We also describe a version of autoencoders, stacked autoencoders, that allows us to learn higher-level, semantic feature representations. We describe how we combine these architectures to learn features from samples of TCP-flow packet data. Finally, we describe the rationale for our design in Section 3.2.4 and describe how the input is trained and encoded in Section 3.2.3.

3.2.1 Recurrent Neural Network. Recurrent neural networks contain cyclic connections that make them more useful for capturing dependencies between sequences of values than feed-forward neural networks. RNNs have been used to successfully model sequences, such as handwriting recognition [10], language modeling [22] and acoustic modeling [12]. RNNs contain cyclical connections that feed the activation nodes with output from the previous step. The Long Short-Term Memory (LSTM) architecture is a type of RNN [14] which is a modification of a standard RNN.

A vanilla RNN cannot capture long-term dependencies due to the vanishing and exploding gradient problem [5, 26]. To address this issue, LSTMs were designed with special units called memory blocks in the recurrent hidden layer. The memory blocks contain memory cells with self-connections storing the temporal state of the network, in addition to special multiplicative units called gates to control the flow of information. Each memory block in the original architecture contains an input gate and an output gate. The input gate controls the flow of input activations into the memory cell. The output gate controls the output flow of cell activations into the rest of the network. Later, the forget gate was added to the memory block [8]. This addresses a weakness of LSTM models preventing them from processing continuous input streams that are not segmented into sub-sequences. The forget gate scales the internal state of the cell before adding it as input to the cell through the self-recurrent connection of the cell, therefore adaptively forgetting or resetting the cell's memory. In addition, the modern LSTM architecture contains peephole connections from its internal cells to the gates in the same cell to learn precise timing of the outputs [9].

We connect an LSTM layer as input to a stacked autoencoder network – discussed in the next section (Section 3.2.2) – in order to learn a function that maps to a low-dimensional representation of a TCP-flow sample.

3.2.2 Autoencoders. Autoencoders [13, 16, 35] are a class of neural networks used to learn a compact representation of the input. The network consists of two components: an encoding component and
Figure 6: For each device, a distribution over TCP-flow types, enumerated on the horizontal axis. (a) Android phone. (b) Belkin Switch. (c) Insteon web camera.
$$P(\theta \mid D, \alpha) = \mathrm{Dir}(\alpha'), \qquad \alpha'_j = \alpha_j + \sum_{d_i \in D} d_i^{(j)} \tag{2}$$
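Equation (2) is the standard conjugate update for a multinomial with a Dirichlet prior: the posterior parameters are simply the prior pseudo-counts plus the observed TCP-flow class counts. This is the Bayesian "analytical convenience" that lets a device model be instantiated and updated in a few lines; a minimal sketch (names are ours, not the paper's):

```python
import numpy as np

def update_posterior(alpha_prior, flow_class_counts):
    # Equation (2): posterior Dirichlet parameters are the prior
    # pseudo-counts plus the observed TCP-flow class counts.
    return alpha_prior + flow_class_counts

def expected_distribution(alpha):
    # Posterior mean of the multinomial parameters theta.
    return alpha / alpha.sum()
```

Because the update is purely additive, a device model can be maintained incrementally as new flows stream in, with no re-training.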
4.1 Data Sets
We evaluate our proposed approach on the network traffic from two independent sources. The first one is the publicly available trace from the University of New South Wales [3]. A lab was set up at the campus facility, comprising both IoT and non-IoT devices, and their traffic was captured over a period of 21 days starting from September 23, 2016. More specifically, the IoT devices consist of twenty-one commercial IoT devices representing different device classes of IoT devices, e.g., cameras (Nest Dropcam, Samsung SmartCam, Netatmo Welcome, Insteon Camera, TP-Link Day Night Cloud Camera, Withings Smart Baby Monitor), switches and triggers (iHome, TP-Link Smart Plug, Belkin Wemo Motion Sensor, Belkin Wemo Switch), hubs (Smart Things, Amazon Echo), air quality sensors (NEST Protect smoke alarm, Netatmo Weather station), electronics (Triby speaker, PIXSTAR Photoframe, HP Printer), healthcare devices (Withings Smart scale, Withings Aura smart sleep sensor, Blipcare blood pressure meter) and light bulbs (LiFX Smart Bulb). The non-IoT devices included laptops, mobile phones, and tablets. The devices are identified and labeled with their MAC address.

The second source is a private lab which was set up in 2016 in North America, and where commercial IoT devices had been continuously added and removed. The devices are also identified and labeled with their MAC address, and their traffic captured at the border router. For this study, we focus on the traffic captured during the month of April 2017, as it is the most recent month in the trace. During that time, we observe a total of 72 IoT devices. Figure 8 presents a sample of the IoT devices from the second dataset. Each device is represented with the format (Vendor, Type, [DeviceId]), where Vendor is the name of the vendor, Type is the name of the device type, and DeviceId is optional and only present when there are multiple instances of the same device type in the dataset. For some device types (e.g., Google Chromecast, SmartThings hub, Samsung SmartTV), many instances of the same device were present. Also, we observe that many device types (e.g., Amazon Echo, Google Dropcam) were present in both datasets.

4.2 Training
We collected over 170,000 flows from the UNSW data and over 4.3 million samples from the private lab in the month of April 2017. We only trained our encoder with 60% of the UNSW data. That is, we learned an encoder based only on 100,000 flows from the UNSW data. We chose this approach because we have more information about the public data set than we did about the private one. In future versions of our work, especially before a live deployment, we must consider training on a larger set of samples. In spite of the relatively small sub-sample, the performance is still very good, even across the private-lab data. This suggests that the encoder generalized well and that the probabilistic framework does not require much data to effectively capture and characterize device behavior across deployments in different settings.

4.2.1 Optimal Cluster Parameters. To find the optimal clustering parameters we use Bayesian optimization to automatically explore the parameter space and find a configuration that maximizes the Silhouette score [28]. Specifically, we use Bayesian optimization with Gaussian Process priors [33]. Bayesian optimization with Gaussian Process (GP) priors is a technique that allows us to do black-box exploration of a model through its parameters and a scoring function.

5 RESULTS
In this section, we describe the results of our experiments. Our experiments are partitioned into two sections. Section 5.1 looks at how well our approach is able to match unlabeled devices with their corresponding labeled ones. This is similar to supervised learning but with some important differences. The main difference is that our match process is probabilistic and we simply do each comparison in pairs. Then, we rank the comparison results. We do not set a threshold on the comparison score; we are interested in how informative the scoring is with respect to match difference across similar (and different) distributions. We look at the comparisons across all devices and show that the more observations we have, the better we can differentiate between one device and another. In addition, we show that there are class similarities amongst devices, which suggests that we can reliably infer whether or not the device is likely to be an IoT device. In Section 5.3, we show how to compare completely new devices – devices for which we have no known counterpart in our data – to show that the scores provide valuable insight into how the device is behaving relative to known devices, whether the behavior distribution is new, and whether or not it is likely to be an IoT or non-IoT device.
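The Bayesian-optimization step of Section 4.2.1 can be sketched as follows. This is a toy, one-dimensional version: it tunes only DBSCAN's eps with an upper-confidence-bound acquisition rule over a scikit-learn Gaussian process. The paper's actual parameter space, acquisition function, and GP configuration are not specified in this excerpt, so those choices here are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.metrics import silhouette_score

def silhouette_of_eps(eps, X):
    # Black-box objective: silhouette score of DBSCAN at this eps
    # (-1.0 when DBSCAN finds fewer than two clusters).
    labels = DBSCAN(eps=eps, min_samples=3).fit_predict(X)
    if len(set(labels)) < 2 or len(set(labels)) >= len(X):
        return -1.0
    return silhouette_score(X, labels)

def bayes_opt_eps(X, bounds=(0.1, 5.0), n_init=4, n_iter=8, seed=0):
    # GP-based Bayesian optimization of DBSCAN's eps, maximizing the
    # silhouette score via an upper-confidence-bound (UCB) rule.
    rng = np.random.default_rng(seed)
    eps_tried = list(rng.uniform(*bounds, n_init))
    scores = [silhouette_of_eps(e, X) for e in eps_tried]
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(normalize_y=True).fit(
            np.array(eps_tried).reshape(-1, 1), scores)
        cand = np.linspace(*bounds, 200).reshape(-1, 1)
        mu, sigma = gp.predict(cand, return_std=True)
        nxt = float(cand[np.argmax(mu + 1.5 * sigma)])  # UCB acquisition
        eps_tried.append(nxt)
        scores.append(silhouette_of_eps(nxt, X))
    return eps_tried[int(np.argmax(scores))]
```

The GP surrogate makes each new evaluation cheap to choose, which matters here because every objective evaluation requires re-running the clustering.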
class' distribution shows the distribution of scores when the pair being compared are both IoT or both non-IoT devices. The 'diff class' distribution shows the distribution of scores for device pairs belonging to different device classes (i.e. one is IoT and the other is non-IoT). We include all devices in both distributions, regardless of the number of observations for each device in the pair. We can see that the distribution shifts to the right for devices in the same class. This suggests that flow behavior for devices from the same class shows more similarity than flow behavior for devices from different classes. However, both display a similar range, which indicates that even device pairs across similar (or different) classes can show similar types of traffic and may be indistinguishable from one another.

We also examine the relationship between IoT devices and their similarity within and without class membership. More specifically, we look at the (IoT, IoT) similarity score distribution and the (Non-IoT, Non-IoT) similarity score distribution. Figure 10a shows the two CDFs for each distribution. Not surprisingly, we observe a similar relationship to the one seen in Figure 10c. IoT devices resemble each other more than they resemble Non-IoT devices. However, the distribution is more skewed within class than the 'same-class' distribution observed in the more general class comparison. This suggests that many IoT devices – between 30-40% of (IoT, IoT) device pairs – have a normalized similarity score greater than or equal to 0.9. IoT devices share much of the same behavior characteristics. However, we still observe that scores span the entire range, suggesting that IoT devices can behave indistinguishably from non-IoT devices.

Figure 10d and Figure 10b show the distribution comparison within/without class with an IoT device as the point of reference, respectively. This means that the 'same class' is (IoT, IoT) and the 'diff class' is (Non-IoT, IoT). Note, for these two graphs we only look at pairwise comparisons for devices with at least 50 TCP-flow sample observations. That is, there is a marked difference in the effectiveness of the approach as more observations are available. Note that while the distributions do not change very much between Figure 10c and 10d, we do see a marked and important change in the in-class score distribution from Figure 10a to 10b. First, a smaller fraction of (IoT, IoT) pairs have high scores. Figure 10a has roughly 40% of the distribution with a score of 0.5 or higher while Figure 10b shows roughly half as many (IoT, IoT) pairs with scores higher than 0.5. The similarity score becomes more informative with each sample. It is interesting to see the shift in the distribution, yet this result is not surprising. The second observation is that the out-of-class distribution maxes out at around 0.75. There is no (IoT, Non-IoT) device pair with a score higher than 0.75 when at least 50 observations have been made for each device. The two classes are clearly distinguishable in those cases and this could be used as a filter threshold.

The fact that we get less precise results with fewer observations is a fundamental limitation of supervised methods with weak (or no) priors. We use a weak prior in our model and we use that prior for all of our devices. Different techniques could be used to infer a better prior that may require fewer observations to converge to a representative model. However, that is beyond the scope of this paper and we leave that as an exercise for future work.

5.2 Ranking
In the previous section we examined the distribution of normalized scores. However, the normalizing constant depends on the true label for the unlabeled device. This is only useful if we know what it is. For example, suppose we maintain separate models for the same device, for different times of the day or week, and we want to compare their distributions to detect any changes. In this case, we know the normalizing constant is the one associated with that particular device. However, if the address of the device changes or we are observing a new, unlabeled device, we will not know how to normalize the score. In order to make use of the scores, we must rank the results instead. In principle, ranking should allow us to create an ordered list of the devices, where the highest ranked device is the most similar to the unlabeled one.

To examine this, we create such a list without normalizing the scores and declare a match when the model with the correct label is the top-ranked device in the list associated with the corresponding unlabeled device model. Figure 11a shows the rank of the true label device relative to the number of observations made for the unlabeled device. Notice, the number of observations that need to be made varies by device. That is, some devices require fewer observations to match their true label than others. Figure 11b is a close-up of the upper left-hand corner of Figure 11a. We can see that there is some instability when few observations are made, but that for most devices that instability is bounded, with the true label appearing in the top-three ranked results. Figure 11a also shows that the rank is stable as we continue to make new observations. We also observe that relatively few observations are needed to get the labeled device to match with its corresponding unlabeled counterpart. The average number of TCP-flow samples that need to be observed before being the top-ranked match is 18.9.

Table 1 shows a subset of the devices in our data set and the number of match rounds that need to take place before it becomes a match. Note, for most IoT devices the number of rounds is small, only requiring a single round of observations (10 TCP-flow samples) before a match is made. Meanwhile, non-IoT devices – highlighted in blue in the table – typically require more observations, in some cases needing as many as 12 rounds (120 TCP-flow samples) before becoming a match. Column three of the table suggests that this is related to the complexity of the model. The more complex the model, the more observations we need to make to get a match. Non-IoT devices have more intricate distributions over TCP-flow classes, therefore they require more observations to identify correctly and with high certainty.

The expected number of observations is a function of the number of TCP-flow types and their distribution. Skewed distributions are the most challenging, since rare classes take longer to observe. It is also challenging to model devices that rarely transmit (or receive) data. The average number of TCP-flow types observed by IoT devices is 6.9 while the number of TCP-flow classes observed from Non-IoT devices is 22. Non-IoT devices are much chattier on average than IoT devices, so it is fortuitous that IoT devices are simpler to model and that we can typically do so with high certainty.

Table 2 and Table 3 show examples of IoT and non-IoT top-3 ranked matches. Observe the 2nd and 3rd ranked devices in both tables. We see that the rank is usually dominated by devices in the
same class – with one exception where a 'PIX-STAR Photo-frame' ranks an 'Android-Phone' as 3rd most similar. IoT devices show statistical similarity to other IoT devices and non-IoT devices show statistical similarity to non-IoT devices. We see this trend by using this ranking scheme, independent of the normalizing factor! This is important. It suggests that we can infer whether or not a new device 'looks' more like an IoT device or more like a non-IoT device, even without any prior knowledge about that device. In practice, this is the common case. It is most often the case that a large fraction of IoT devices plugged into networks are unregistered. This approach takes advantage of the few labeled ones and becomes more reliable as more labels are acquired.

Reference          | Rank 1             | Rank 2             | Rank 3
Android Phone      | Laptop             | Android Phone      | Laptop
Samsung Galaxy Tab | Samsung Galaxy Tab | Samsung Galaxy Tab | MacBook
Android Phone      | Android Phone      | Android Phone      | Samsung Galaxy Tab
MacBook            | MacBook            | MacBook            | Samsung Galaxy Tab
Laptop             | Laptop             | Laptop             | Android Phone

Table 3: Non-IoT devices rank high as potential matches for other non-IoT devices. This table shows a subset of non-IoT devices for which we observed at least 50 TCP-flow samples.

Figure 11: These show the rank of the true label versus the number of observations made for the unlabeled device. (a) Rank versus observations. (b) Zoomed-in slice of 11a.

5.3 Identifying Previously Unseen Devices
The common case in real deployment will not have a reference distribution to compare to. We attempt to emulate this condition by running an experiment where we compare each device to all the others. In this case, there is no true 'match' so we attempt instead to infer whether or not the device is IoT or non-IoT. In this experiment, we use the device model for every device except one, and train a one-class SVM [29] with these samples. The model produces a sequence of TCP-flow classes of size 100 and we treat this as a feature for our one-class SVM. For the device left out, we create a model using its data, draw 1000 samples from the model, and run each sample through the one-class SVM for each known device. The final score is the fraction of samples classified as IoT vs non-IoT for all known IoT and non-IoT classes. For example, if there are 10 IoT SVM models and 10 non-IoT SVM models and 800 of the 1000 samples were classified as IoT by at least 1 of the 10 IoT SVMs, then we give a score of 0.8 to IoT. The non-IoT SVM models are evaluated separately, so we could also get a score of 0.8 for the non-IoT set evaluation.

Dataset | F1   | Acc
private | 0.91 | 0.83
UNSW    | 0.79 | 0.66
all     | 0.76 | 0.61

Table 4: Performance on different data sets for determining the right class of device when the device has not been previously observed.

Table 4 summarizes the aggregate performance figures for the leave-one-out experiment on both datasets independently and on
both, together. That is, for the single-dataset case, we only consider the devices in that dataset in our comparison. For the combined one, we consider the type-level similarity for all devices and the one left out. Observe that the F1 score and accuracy are highest for the data set obtained from the private IoT deployment. The F1 score is 0.91 and accuracy is 0.83. We believe this is because this data set has many devices with lots of samples. This allows us to build better models of behavior that improve matching accuracy.

The performance of the combined dataset is affected by the addition of the UNSW data. We observe that the probabilistic comparisons become less reliable with fewer samples. One way to address this shortcoming is to use a better prior distribution for the initial device model. Text-based approaches may be used to infer the type, and the associated model parameters for that device could be used as a prior until enough samples are obtained. Table 5 shows a number of devices picked from the mixture. We see that our model is able to identify devices which do not have a similar distribution to the ones that have been seen. We also see that some devices do in fact show a similar distribution to others that have been observed. Also, note that even devices that do not share many behavior characteristics with the others tend to lean (very slightly) towards the true class type. This suggests that IoT and non-IoT devices do indeed have different distributions on average (with some exceptions, like the Amazon Echo).

The first 10 devices in the table show either no leaning or a slight leaning towards the right class. They can essentially be classified as being 'new' distributions. As 'new' distributions, the administrator could decide to go further in her investigation and either quarantine the device or kick it off the network entirely. For the others, there is a stronger lean in one direction or the other. We would like to draw your attention to the last three entries in the table. These were unlabeled in our data set. We have not been able to verify what kind of devices they are; however, our analysis suggests that at least one of them is an IoT device, although confidence in this assessment is small. The other two labeled 'None' do not resemble distributions for any of the labeled devices in the dataset.

Device                           | IoT   | Non-IoT
NEST Protect smoke alarm         | 0     | 0
Laptop                           | 0     | 0
Belkin Wemo switch               | 0     | 0
Dropcam                          | 0.009 | 0
Withings Smart scale             | 0.001 | 0
Triby Speaker                    | 0.003 | 0
MacBook/Iphone                   | 0.001 | 0
Light Bulbs LiFX Smart Bulb      | 0.006 | 0.001
Nest Dropcam                     | 0.006 | 0
IPhone                           | 0.002 | 0
Samsung SmartCam                 | 0.732 | 0
Netatmo weather station          | 0.912 | 0
Belkin wemo motion sensor        | 0.033 | 0
Amazon Echo                      | 0.453 | 0
PIX-STAR Photo-frame             | 0.727 | 0
Withings Smart Baby Monitor      | 0.188 | 0
Netatmo Welcome                  | 0.363 | 0
Withings Aura smart sleep sensor | 0.415 | 0
iHome                            | 0.747 | 0
TP-Link Day Night Cloud camera   | 0.852 | 0
HP Printer                       | 0.438 | 0
Insteon Camera                   | 0.075 | 0.003
MacBook                          | 0     | 0.019
Android Phone                    | 0     | 0.033
Samsung Galaxy Tab               | 0     | 0.101
Android Phone                    | 0.005 | 0.035
None                             | 0     | 0
None                             | 0.067 | 0
None                             | 0     | 0

Table 5: Most devices are declared 'novel' or previously unobserved. For these devices, note that the confidence score for the classes is leaning in the right direction. All the 'None' devices were positively identified as generating completely new traffic.

6 CONCLUSION
In this paper we presented a technique for modeling device behavior
using its network traffic. Our technique consists of three phases.
The first phase is the training phase, whereby we pre-process TCP-
flow data for each device and train a deep LSTM-Autoencoder
network to learn a set of representative features from the data itself. Then, we use a Bayesian hyper-parameter tuning framework to tune a clustering algorithm that separates the features into the most discernible classes, according to the cluster silhouette score. That is, we tune the clustering algorithm to maximize the separation between the different clusters. Then, we train a classifier on the labels assigned to the clusters.
In the next phase of the technique we use the classifier to produce a distribution over these classes, by classifying all the TCP-flow samples for every device. We then model this distribution as a multinomial distribution with a Dirichlet prior on the parameters. Finally, for each device we want to 'match', we generate its distribution, build a probabilistic model, and compare its distribution to the others. We show that we can match devices to their labeled counterpart with high accuracy, especially as more samples are observed for both the labeled and unlabeled device models. We also show that it is simple to create and compare models in code, only requiring a few lines to capture the distributions effectively.
We show that IoT devices and non-IoT devices exhibit distinguishable behavior through their respective distributions. However, some IoT and non-IoT devices are statistically similar and cannot be effectively classified as one or the other. We show that we can infer the class of a device never observed as well, leaning towards the correct class on average. We also present a technique that uses a mixture of one-class SVMs to infer the final class label of unseen devices and even uncover when a device's distribution is completely new. All of these capabilities happen without supervision and on unlabeled data.
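The Dirichlet-multinomial device model really does fit in a few lines. The sketch below is a minimal illustration of the idea, not the paper's exact implementation: the number of flow classes, the uniform prior strength, and the symmetrized-KL comparison between posterior means are all our own illustrative choices.

```python
import numpy as np

class DeviceModel:
    """Multinomial over K flow classes with a Dirichlet prior.
    K and the uniform prior strength are illustrative choices."""
    def __init__(self, n_classes, alpha=1.0):
        self.alpha = np.full(n_classes, alpha)  # Dirichlet prior parameters

    def update(self, class_labels):
        # Conjugacy: posterior is Dirichlet(alpha + per-class counts)
        counts = np.bincount(class_labels, minlength=len(self.alpha))
        self.alpha = self.alpha + counts
        return self

    def mean(self):
        # Posterior mean of the multinomial parameters
        return self.alpha / self.alpha.sum()

def divergence(m1, m2):
    # Symmetrized KL between posterior means: one reasonable way to
    # compare two device distributions (an illustrative choice, not
    # necessarily the metric used in the paper)
    p, q = m1.mean(), m2.mean()
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * (kl(p, q) + kl(q, p))

# Toy usage: two devices whose TCP-flows fall into K=5 classes
known = DeviceModel(5).update(np.array([0, 0, 1, 2, 0, 1]))
unknown = DeviceModel(5).update(np.array([0, 1, 0, 0, 2, 1]))
print(divergence(known, unknown))
```

Because the prior keeps every class probability strictly positive, the divergence stays finite even when a device has never produced flows of some class; updating is a single vector addition, which is why comparing a new device against a library of known device models is cheap.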
As such, in practice they will be very useful for providing meaningful information to network administrators as the number and diversity of IoT devices continue to increase. This can help them make better decisions in this setting and can be used in combination with existing approaches to provide a complete suite of tools that addresses the challenge of securing networks from the onslaught of unsafe IoT and non-IoT devices alike.

REFERENCES
[1] Apple. 2010. Bonjour Service Discovery Suite. https://developer.apple.com/bonjour/.
[2] N. Apthorpe, D. Reissman, and N. Feamster. 2016. A Smart Home is No Castle: Privacy Vulnerabilities of Encrypted IoT Traffic. In Workshop on Data and Algorithmic Transparency (DAT).
[3] UNSW Australia. 2017. Testbed Setup for IoT Data Collection. http://149.171.189.1.
[4] Avahi. 2010. Avahi Service Discovery Suite. http://www.avahi.org/.
[5] Y. Bengio, P. Simard, and P. Frasconi. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5, 2 (March 1994), 157–166. https://doi.org/10.1109/72.279181
[6] Léon Bottou. 1991. Stochastic gradient learning in neural networks. Proceedings of Neuro-Nîmes 91, 8 (1991).
[7] CNBC. 2016. Suddenly hot smart home devices are ripe for hacking, experts warn. https://www.cnbc.com/2016/12/25/suddenly-hot-smart-home-devices-are-ripe-for-hacking-experts-warn.html.
[8] Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins. 1999. Learning to Forget: Continual Prediction with LSTM. Neural Computation 12 (1999), 2451–2471.
[9] Felix A. Gers, Nicol N. Schraudolph, and Jürgen Schmidhuber. 2003. Learning Precise Timing with LSTM Recurrent Networks. J. Mach. Learn. Res. 3 (March 2003), 115–143. https://doi.org/10.1162/153244303768966139
[10] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber. 2009. A Novel Connectionist System for Unconstrained Handwriting Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 5 (May 2009), 855–868. https://doi.org/10.1109/TPAMI.2008.137
[11] The Guardian. 2013. Will giving the internet eyes and ears mean the end of privacy? https://www.theguardian.com/technology/2013/may/16/internet-of-things-privacy-google.
[12] Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, et al. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29, 6 (2012), 82–97.
[13] Geoffrey E. Hinton and Ruslan R. Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. Science (2006).
[14] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[15] Diederik P. Kingma and Max Welling. 2013. Auto-Encoding Variational Bayes. CoRR abs/1312.6114 (2013).
[16] Honglak Lee, Chaitanya Ekanadham, and Andrew Y. Ng. 2008. Sparse deep belief net model for visual area V2. In Advances in Neural Information Processing Systems. 873–880.
[17] M. Lopez-Martin, B. Carro, A. Sanchez-Esguevillas, and J. Lloret. 2017. Network Traffic Classifier With Convolutional and Recurrent Neural Networks for Internet of Things. IEEE Access 5 (2017), 18042–18050. https://doi.org/10.1109/ACCESS.2017.2747560
[18] Wired Magazine. 2014. The Internet of Things is Wildly Insecure – and Often Unpatchable. https://goo.gl/cuKnLN.
[19] Wired Magazine. 2015. Hackers Remotely Kill a Jeep on the Highway – With Me In It. https://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/.
[20] F. J. Massey. 1951. The Kolmogorov-Smirnov test for goodness of fit. J. Amer. Statist. Assoc. 46, 253 (1951), 68–78.
[21] Markus Miettinen, Samuel Marchal, Ibbad Hafeez, Tommaso Frassetto, N. Asokan, Ahmad-Reza Sadeghi, and Sasu Tarkoma. 2017. IoT Sentinel Demo: Automated Device-Type Identification for Security Enforcement in IoT. In Proc. 37th IEEE International Conference on Distributed Computing Systems (ICDCS 2017). IEEE.
[22] Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In INTERSPEECH, Takao Kobayashi, Keikichi Hirose, and Satoshi Nakamura (Eds.). ISCA, 1045–1048. http://dblp.uni-trier.de/db/conf/interspeech/interspeech2010.html#MikolovKBCK10
[23] Andrew W. Moore and Denis Zuev. 2005. Internet Traffic Classification Using Bayesian Analysis Techniques. In Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '05). ACM, New York, NY, USA, 50–60. https://doi.org/10.1145/1064212.1064220
[24] BBC News. 2014. Smart meters can be hacked to cut power bills. http://www.bbc.com/news/technology-29643276.
[25] Jorge Ortiz, Catherine Crawford, Franck Le, and Ali Hasan. 2017. Strange (Internet of) Things: Towards Automatic Identification of IoT Devices in the Wild. https://goo.gl/ExWQ6A.
[26] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the Difficulty of Training Recurrent Neural Networks. In Proceedings of the 30th International Conference on Machine Learning (ICML'13), Volume 28. JMLR.org, III-1310–III-1318. http://dl.acm.org/citation.cfm?id=3042817.3043083
[27] Gartner Research. 2017. Gartner Says 8.4 Billion Connected "Things" Will Be in Use in 2017, Up 31 Percent From 2016. http://www.gartner.com/newsroom/id/3598917.
[28] Peter Rousseeuw. 1987. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. J. Comput. Appl. Math. 20, 1 (Nov. 1987), 53–65. https://doi.org/10.1016/0377-0427(87)90125-7
[29] Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, and Robert C. Williamson. 2001. Estimating the Support of a High-Dimensional Distribution. Neural Comput. 13, 7 (July 2001), 1443–1471. https://doi.org/10.1162/089976601750264965
[30] A. Sivanathan, H. Habibi Gharakheili, F. Loi, A. Radford, C. Wijenayake, A. Vishwanath, and V. Sivaraman. 2018. Classifying IoT Devices in Smart Environments Using Network Traffic Characteristics. IEEE Transactions on Mobile Computing (2018), 1–1. https://doi.org/10.1109/TMC.2018.2866249
[31] Arunan Sivanathan, Daniel Sherratt, Hassan Habibi Gharakheili, Vijay Sivaraman, and Arun Vishwanath. 2016. Low-cost flow-based security solutions for smart-home IoT devices. In Advanced Networks and Telecommunications Systems (ANTS).
[32] Arunan Sivanathan, Daniel Sherratt, Hassan Habibi Gharakheili, Adam Radford, Chamith Wijenayake, Arun Vishwanath, and Vijay Sivaraman. 2017. Characterizing and Classifying IoT Traffic in Smart Cities and Campuses. In IEEE INFOCOM Workshop on SmartCity: Smart Cities and Urban Computing. Atlanta, GA.
[33] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. 2012. Practical Bayesian Optimization of Machine Learning Algorithms. In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS'12). Curran Associates Inc., USA, 2951–2959. http://dl.acm.org/citation.cfm?id=2999325.2999464
[34] The Verge. 2016. How an army of vulnerable gadgets took down the web today. https://www.theverge.com/2016/10/21/13362354/dyn-dns-ddos-attack-cause-outage-status-explained.
[35] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. 2008. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning. ACM, 1096–1103.
[36] Security Week. 2014. Hackers Attack Shipping and Logistics Firms Using Malware-Laden Handheld Scanners. https://goo.gl/BTppBy.
[37] Tianlong Yu, Vyas Sekar, Srinivasan Seshan, Yuvraj Agarwal, and Chenren Xu. 2015. Handling a Trillion (Unfixable) Flaws on a Billion Devices: Rethinking Network Security for the Internet-of-Things. In Proceedings of the 14th ACM Workshop on Hot Topics in Networks (HotNets-XIV).