Karimibiuki 2019

Drones' Face off: Authentication by Machine Learning in Autonomous IoT Systems
Mehdi Karimibiuki, Michal Aibin, Yuyu Lai, Raziq Khan, Ryan Norfield, and Aaron Hunter
Department of Computing
British Columbia Institute of Technology (BCIT)
Vancouver, Canada
Abstract: Autonomous Internet-of-Things (IoT) systems are comprised of moving objects such as drones and rovers that use self-control techniques to accomplish a mission while following a path. However, losing control of such systems, usually through spoofing of their sensors or hijacking with misleading commands, can lead to catastrophic safety consequences. In this paper, we close the gap by authenticating the behavior of autonomous IoT systems during operation. In particular, we check the behavior of a moving IoT object, e.g., a drone, by evaluating its time-series telemetry traces during the flight. We examine three different machine-learning algorithms for this purpose, namely, K-Nearest Neighbour (KNN), Support Vector Machine (SVM), and Logistic Regression (LR). Our results show that KNN is the best method of the three selected techniques for authentication in dynamic IoT systems, e.g., drones. We achieved a 93.4% precision rate and a 100% recall rate with KNN.

Index Terms: Drones; Unmanned aircraft systems; Unmanned aircraft vehicles; Autonomous; Authentication; Machine learning

978-1-7281-3885-5/19/$31.00 ©2019 IEEE

I. INTRODUCTION
The prevalence of the Internet-of-Things (IoT) has given rise to ubiquitous many-to-many communication between autonomous objects. There now exist drones and rovers that receive origin and destination instructions from a control station to follow a path in autonomous mode. The Federal Aviation Administration (FAA) is forecasting that the number of UAVs will grow to 7 million by 2020 [1]. Similarly, it is estimated that 10 million self-driving cars will hit the roads by 2020 [2]. These growing trends show that there is an immediate need to secure the safety of such moving objects, particularly by checking their behaviour during operation.
In this paper, we focus on device authentication by machine learning algorithms. We pick an emerging IoT device, a drone, to train 3 classifiers, namely, KNN, SVM, and LR. After training, we use each classifier to tell whether the new data being collected from a drone are representative of correct behaviour of the drone or not. We evaluate the behavior of the drone during its flight operation.
A drone could easily get off track and be hijacked via man-in-the-middle attacks [3], [4]. An attacker can potentially infiltrate the network with computational power as low as a Raspberry Pi [5] to fool and derail the drone by commanding falsified information [6]. The attacker's goal is to either crash the drone or manipulate its behavior such that the drone lands in a hostile location [7].

A. Key Insight
In this paper, we investigate the following 3 machine-learning classifiers to find abnormal behaviors of a drone:
• K-Nearest Neighbour (KNN) [8], which stores the training flights and classifies new traces by a majority vote of its neighbours.
• Support Vector Machine (SVM) [9], which creates a hyper-plane during training, then uses it to separate correct from incorrect flight paths during new flights. To improve accuracy, the size of the hyper-plane can be increased or decreased during the testing phase.
• Logistic Regression (LR) [10], which forms a linear decision surface based on the training data.
If the time-series data captured from a drone during flight operation are evaluated as false by the classifiers above, they are off-track data, which consequently means the drone is not flying along an expected path. The interpretation of the failure is that the drone is behaving abnormally: if it has deflected from the planned mission, it could be hijacked or malfunctioning. In this paper, we do not distinguish between a drone that is malfunctioning and a drone that is hijacked. Instead, we consider any authentication failure as an attack that is trying to hijack the drone. To the best of our knowledge, we are the first to evaluate 3 different machine learning algorithms using flight data for authenticating drones.

B. Attack Model
We consider an autonomous IoT environment in which drones follow command and control (C2 [11]) instructions from a trusted hub [12] that manages the traffic of the friendly drones in the network. Upon taking instructions from the C2 system, the drones use their feedback control loop to manage themselves, accomplish a mission, and arrive at the given destination.
Figure 1 illustrates the attack model. In this scenario, we assume an attacker can infiltrate the network by a man-in-the-middle (MitM) attack [5], record and learn communication commands between drones and a trusted hub, then manipulate
and send falsified C2 commands to the drones in the network. The attacker's goal is to derail the drones from their mission by sending them wrong commands. The ultimate goal of the attacker is to crash the drones or guide them toward a hostile area. Several such attacks have been attempted in the real world [13], [14]. We have also demonstrated this attack by sending 'arm' and 'disarm' commands to a hobbyist drone [6].

Fig. 1: Attack model. Only one communication between a drone and the UTM is shown. Other drones in the network would also initially communicate with the UTM, but eventually the MitM dominates their resource availability, forcing drones to follow her commands.

C. Contributions
The following are the main contributions of this paper.
• Approach: We study authentication in autonomous IoT systems by evaluating three of the most commonly used supervised machine learning algorithms. We do this by computing and reporting the precision and recall rates for each algorithm (see Table I).
• Experimentation: We collect flight data and experiment with an open-source drone simulator called ArduPilot [15].
• Evaluation: We evaluate the three machine-learning algorithms by recording simulation data taken from the ArduPilot Software In The Loop (SITL) [16]. We simulated 3 flight paths and recorded 600 different instances for every path. We analyzed and compared the results of the flight data. We find that KNN yields the best overall results, with an average precision of 93.4% across all flights.

II. RELATED WORK
The security of drones with respect to authorization and authentication has been studied widely in previous work [17]-[23]. In device authentication, Shoufan [24] is close to our work in that he also used machine learning to distinguish between authentic and malicious commands. However, our work establishes authentication based on the behavior of the drone itself and not on the commands that are communicated between a drone and a controller. Additionally, in our approach, a drone could be flying in autonomous mode without continuous communication with a control station; thus our approach can be used to tell whether a drone is travelling along a correct path or in a wrong direction.
In a closer work, Bartak et al. [25] used machine learning as a mechanism to identify activities from a sequence of sensor readings and corresponding control signals of a drone. Bartak et al. treat drone behaviour as data that are the basis for learning. However, Bartak et al. did not articulate whether their approach could generalize to all behaviors of a drone. Our goal is to evaluate several machine learning techniques and suggest one that could generally be used for moving IoT systems, e.g., drones, and their behaviour in the environment, as long as the classifier is well trained for such behaviour.

III. METHODOLOGY
The machine-learning algorithms that we have selected (KNN, SVM, LR) are suitable specifically for classification in large-scale datasets. In our approach, we first train the models of each technique and then use the trained model to predict the validity of newly generated flight paths. We use both genuine and off-track data to measure precision and recall rates in our experiments. The precision rate measures how many of the correct paths are validated correctly by our machine learning algorithms, and the recall rate measures how many of the incorrect data are correctly distinguished as wrong flight paths.

A. K-Nearest Neighbour (KNN)
KNN stores all available flight traces to classify new ones by a majority vote of its neighbours. All flights are categorized through calculation of distance. In our research, we implemented a KNN model that uses the Euclidean distance function and takes in a pre-processed dataset in which the false data are labeled. When a prediction is required for an unseen data instance, the algorithm searches through the dataset to find the most similar instances. Then, the prediction attributes of the collected instances are summarized as the prediction for the unseen instance. The pseudocode of KNN is presented in Algorithm 1.

B. Support Vector Machine (SVM)
Support Vector Machines (SVMs) are learning classifiers used to analyze data and recognize patterns in supervised mode. SVMs are typically used for classification and regression. SVM models are based on being given training data and separating the data into categories. Any data that do not fit into those categories will be considered false. The SVM will also make predictions for the next points in each of the
categories. This is useful for drone authentication because any irregularities typically mean that an attack has occurred [26].
In this paper, we use One-Class SVM for the pattern recognition on the flight data. If the drone is misguided or led off track by an attack, the SVM will detect an anomaly in the pattern. A pattern will be formulated based on the training data, and any points that do not follow the pattern created by the SVM will be considered fake. One-Class SVM will then compare training data with every row in the dataset.
In the SVM algorithm, we are looking to find the hyperplane that maximizes the margin between the two classes (fake and valid data). We are using the hinge loss function, which helps maximize the margin, defined as:

\[ \frac{1}{2}\lVert w \rVert^2 + C \sum_i \max\bigl(0,\; 1 - y_i (w^T x_i + b)\bigr) \tag{1} \]

where w is the weight vector whose squared norm acts as the regularizer, C weighs the misclassification penalty, x_i and y_i are the i-th data point and its class label (which are used to check if the flight is inside the margin of the hyperplane), and finally b is a bias parameter computed as the average error using the sum over the (target value - predicted value) on the training data. The pseudocode of SVM is presented in Algorithm 2.
We chose to use One-Class SVM as one of our classifiers because every column in the dataset (yaw, roll, pitch, altitude, latitude and longitude) is independent of other columns. If one of the rows in the dataset has fake data, but the other rows are correct, it could potentially be considered as an attack. One-Class SVM makes sure all columns in the path are following a correct pattern [27]. For authentication in drones, we implemented a One-Class SVM that uses data retrieved from the drone flight. The SVM receives training data and will predict whether the next given data is either false or true based on the pattern given by the training data. The training data is based on sets of valid and fake flights, so the SVM can start to differentiate between them.

Algorithm 1: KNN
Data: New flight datasets, source and destination of flight
Result: Prediction of flight datapoint (valid = 0, fake = 1)
Calculate the shortest path between source and destination of flight
Set number of neighbours K = 10
foreach datapoint in Flight Data do
    Calculate the Euclidean distance from that datapoint to K neighbours
    if majority of neighbours are fake flights then
        return 1
    else
        return 0
    end
end

Algorithm 2: SVM - Training
Data: New flight datasets, source and destination of flight
Result: Trained model
Calculate the shortest path between source and destination of flight
Define an optimal hyperplane: maximize margin
foreach datapoint in Flight Data do
    if incorrectly classified then
        Include penalty for misclassification and decrease the margin
    else
        Increase the margin
    end
end

C. Logistic Regression
Logistic Regression is a widely used machine learning classification algorithm that predicts a binary outcome using a set of independent variables. Instead of finding a pattern, Logistic Regression finds a relationship between features and creates a probability of the next state. This machine learning model is suitable for our data since our flights are labelled in a dichotomous manner, i.e., taking binary values of either 0 (valid flight) or 1 (fake data).
In this approach, we establish a set X of n = 6 independent features (X_1, X_2, ..., X_n), namely yaw, roll, pitch, altitude, latitude and longitude. The probability of attack detection is calculated as:

\[ p = \frac{e^{y}}{1 + e^{y}} \tag{2} \]

where \( y = \gamma_0 + \gamma_1 X_1 + \gamma_2 X_2 + \dots + \gamma_n X_n \) and \( \gamma_z \) (z = 0, ..., n) are the coefficients. We calculate the coefficients and the threshold \( \zeta \) during the training process of this model. Since logistic regression follows a binary outcome and finds a probability, we can use this classifier to predict the next outcome of the drone's path. The threshold \( \zeta \) is later used to determine whether the drone fails to follow the predicted path, in which case the point is considered a fake one. The processing steps for a testing dataset of flights generated on-the-fly are presented in Algorithm 3.

Algorithm 3: Logistic Regression
Data: Coefficients \( \gamma_z \), threshold \( \zeta \), feature set X, new flight datasets, source and destination of flight
Result: Prediction of flight datapoint (valid = 0, fake = 1)
Calculate the shortest path between source and destination of flight
foreach datapoint in Flight Data do
    Find the probability p using Equation 2
    if p > \( \zeta \) then
        return 1
    else
        return 0
    end
end
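As an illustration of Algorithm 3, the following Python sketch evaluates Equation 2 for each datapoint and compares the resulting probability with the threshold. It is a minimal sketch rather than the implementation used in the experiments: the function names, the coefficient values, the threshold of 0.5, and the two pre-scaled datapoints are hypothetical and chosen only to make the example runnable.

```python
import math

def attack_probability(features, coefficients):
    """Equation 2: p = e^y / (1 + e^y), with y = c0 + c1*x1 + ... + cn*xn."""
    intercept, weights = coefficients[0], coefficients[1:]
    y = intercept + sum(w * x for w, x in zip(weights, features))
    # Two algebraically equal forms, picked to avoid overflow in math.exp
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)

def classify_flight(datapoints, coefficients, threshold):
    """Algorithm 3: label a datapoint 1 (fake) if p > threshold, else 0 (valid)."""
    return [1 if attack_probability(x, coefficients) > threshold else 0
            for x in datapoints]

# Hypothetical coefficients (intercept first) for the six features
# yaw, roll, pitch, altitude, latitude, longitude (assumed already scaled).
coefficients = [-1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
on_track = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
off_track = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]

labels = classify_flight([on_track, off_track], coefficients, threshold=0.5)
print(labels)
```

In this toy setting the on-track point yields y = -1 and the off-track point yields y = 5, so only the second one exceeds the threshold. In practice the coefficients and the threshold would come from the training phase described above.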
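The objective in Equation 1 can also be made concrete with a small worked example. The sketch below computes the regularization term plus the weighted hinge loss for a given hyperplane. It assumes class labels encoded as +1 and -1 (the usual convention for hinge loss, which differs from the 0/1 flight labels used elsewhere in this paper); the function name and the toy two-dimensional points are hypothetical.

```python
def svm_objective(w, b, points, labels, C):
    """Equation 1: 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    hinge = 0.0
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Points outside the margin on the correct side contribute zero loss
        hinge += max(0.0, 1.0 - y * score)
    return regularizer + C * hinge

# Toy data: labels use +1 / -1 (an assumption of this sketch)
points = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]]
labels = [1, 1, -1]

objective = svm_objective(w=[1.0, 0.0], b=0.0, points=points, labels=labels, C=1.0)
print(objective)
```

Here the first and third points lie outside the margin and add no loss, while the second point lies inside the margin and adds 0.5, so the objective is 0.5 (regularizer) + 0.5 (hinge) = 1.0.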
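The majority vote of Algorithm 1 can be sketched in the same way. The code below is a simplified illustration with K = 10 and Euclidean distance, as stated in the methodology; the function name, the tie-breaking rule (ties count as valid), and the synthetic six-feature datapoints are assumptions made for this example only.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=10):
    """Return 1 (fake) if most of the k nearest training points are fake, else 0 (valid)."""
    # Rank training points by Euclidean distance to the query datapoint
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return 1 if votes[1] > votes[0] else 0  # a tie falls back to valid

# Synthetic, already-scaled datapoints:
# (yaw, roll, pitch, altitude, latitude, longitude)
valid = [(0.1 * i, 0.0, 0.0, 0.0, 0.0, 0.0) for i in range(12)]
fake = [(5.0 + 0.1 * i, 0.0, 0.0, 0.0, 0.0, 0.0) for i in range(12)]
points = valid + fake
labels = [0] * len(valid) + [1] * len(fake)

on_track = knn_predict(points, labels, (0.3, 0.0, 0.0, 0.0, 0.0, 0.0))
off_track = knn_predict(points, labels, (5.2, 0.0, 0.0, 0.0, 0.0, 0.0))
print(on_track, off_track)
```

Because every prediction scans the whole stored dataset, the cost of a query grows with the number of recorded traces.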
IV. EXPERIMENTAL SETUP
For simulating drone flights, we used the drone flight simulator program ArduPilot [15]. ArduPilot is open-source autopilot software that simulates the behavior of unmanned vehicles such as drones, rovers, etc. This program allows us to run a drone in a real geographical coordinate system (GCS) and log traces in real time. The flight data include time-series information on the drone's attitude, such as yaw, roll, and pitch, and on GCS values such as altitude, latitude and longitude. We have collected such data to train our machine learning models. Figure 2 shows a trajectory with origin 1 and destination 2 that has been created to fly a drone in the simulator and log time-series data.
We run our simulations on a Windows 10 64-bit machine with an Intel(R) i5-8400 CPU @ 2.80 GHz, 6 cores and 16 GB RAM. We have a script in Python that generates trajectories based on given origin and destination coordinates (latitude, longitude, and altitude). We generated 600 samples of the same trajectory and used 80% of such data for training and the rest for testing. We selected 3 paths on earth ranging between 100 and 500 meters to fly a drone in simulation and log time-series data. It took an average of about 15 minutes to train the training functions for such paths, with an average of 120 snapshots in each log file for every path. However, once the training is finished, validation takes 3 seconds to run through a new path and annotate whether the path is fake or real. For creating fake paths, we have the drone fly a detour trajectory as shown in Figure 3.
The memory overhead in our work is mainly due to recordings of the state-space traces for the training sessions. We have saved up to 1000 traces for each path to train the machine learning functions. The average size of every trace is about 54 kB and the total to save all traces is 54 MB, which is reasonable for the UTM machine. The size of the implementation code itself is only 11 kB. The training can happen offline, and only the checker function needs to be run online.

Fig. 2: Illustration of a sample drone trajectory from origin point 1 to destination point 2.

Fig. 3: Illustration of a fake route. A drone is flying from origin 1 to destination 3, but taking a detour at mid-point 2.

V. RESULTS
A. Research Questions
In our study, we evaluate the approaches presented above using two research questions:
1) RQ1. What are the precision and recall rate results in our simulations of three different machine-learning algorithms?
2) RQ2. What is the overhead time to detect if a path is fake or real?

B. RQ1. Precision and Recall
In RQ1, we check what proportion of correct paths are identified as correct paths (precision) as well as what proportion of off-track paths are identified correctly as wrong paths (recall). These two measures allow us to check the relevance of our models by calculating the F1 score, the harmonic average of precision and recall: F1 = 2 · precision · recall / (precision + recall). Finally, in RQ2, we check the training and testing time of the models to verify their applicability to existing systems.
First, we answer RQ1 using Table I:

TABLE I: Accuracy of machine learning models.

Technique   Precision   Recall   F1
KNN         93.4%       100%     96.5%
SVM         95.6%       96%      95.8%
LR          87.34%      72%      78.9%

KNN is the best approach overall. It is slightly worse than SVM in the Precision metric, but it allows much higher Recall rates. As a result, it achieves the best F1 score. It means that we are very effective in detecting fake traffic. SVM classifies valid traffic correctly more often, but it is weaker at classifying fake data points. In our application for authentication of autonomous IoT systems, we should prefer our classifier to be more sensitive; thus, high recall is essential. The worst classifier is Logistic Regression, with its major problem in the Recall metric.

C. RQ2. Detection Time
Next, we measure the detection time to answer RQ2, in Table II. We can observe that Logistic Regression is the fastest approach to train. On the other hand, even once trained, it takes a similar time to detect anomalies. The fastest approach for detection is SVM. It allows moderately fast training and a short time to detect anomalies. If there is a specific time constraint in our systems, we might prefer it over KNN. KNN shows a significantly higher computational cost due to its lazy learning: it does not learn a discriminative function from the training data but memorizes the training dataset instead.
TABLE II: Average computing time of used models.

Model   Training time   Detection time
KNN     3 hours         3 seconds
SVM     90 seconds      2 seconds
LR      10 seconds      8 seconds

VI. CONCLUSION
Device authentication can be studied by means of machine learning algorithms. We have used three supervised learning methods and find that KNN yields the best results for a drone flight dataset. Such investigations can be used in the future to shed some light on the authentication of other autonomous IoT objects such as rovers and self-driving cars. We also find that although KNN takes longer to train, it outperforms SVM and LR in recall and F1 score. Our results were based on flying a drone simulator along three unique paths, with 600 flight samples for each path.

REFERENCES
[1] F. A. A. (FAA), "FAA releases 2016 to 2036 aerospace forecast," March 2016.
[2] forbes.com, "10 million self-driving cars will hit the road by 2020 - here's how to profit," Mar. 2017.
[3] N. Rodday, "Hacking a professional drone," Slides at www.blackhat.com/docs/asia-16/materials/asia-16-Rodday-Hacking-A-Professional-Drone.pdf, 2016.
[4] N. M. Rodday, R. d. O. Schmidt, and A. Pras, "Exploring security vulnerabilities of unmanned aerial vehicles," in NOMS 2016-2016 IEEE/IFIP Network Operations and Management Symposium, pp. 993-994, IEEE, 2016.
[5] jeffq, "Setting up a man-in-the-middle device with raspberry pi," Feb. 2014.
[6] M. Karimibiuki, "Drone hack by man-in-the-middle attack," 2019. YouTube link: https://youtu.be/SrJv04RwMUQ.
[7] J. McNeely, M. Hatfield, A. Hasan, and N. Jahan, "Detection of UAV hijacking and malfunctions via variations in flight data statistics," in Security Technology (ICCST), 2016 IEEE International Carnahan Conference on, pp. 1-8, IEEE, 2016.
[8] T. M. Cover, P. E. Hart, et al., "Nearest neighbor pattern classification," IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21-27, 1967.
[9] L. Wang, Support vector machines: theory and applications, vol. 177. Springer Science & Business Media, 2005.
[10] C. M. Bishop, Pattern recognition and machine learning. Springer, 2006.
[11] H. C. Nguyen, R. Amorim, J. Wigard, I. Z. Kovacs, and P. Mogensen, "Using LTE networks for UAV command and control link: A rural-area coverage analysis," in 2017 IEEE 86th Vehicular Technology Conference (VTC-Fall), pp. 1-6, IEEE, 2017.
[12] M. Karimibiuki and A. Ivanov, "Minicloud: A mini storage and query service for local heterogeneous IoT devices," in Proceedings of the 8th International Conference on the Internet of Things, p. 21, ACM, 2018.
[13] S. Shane and D. E. Sanger, "Drone crash in Iran reveals secret US surveillance effort," The New York Times, vol. 7, 2011.
[14] K. Jansen, M. Schäfer, D. Moser, V. Lenders, C. Pöpper, and J. Schmitt, "Crowd-GPS-Sec: Leveraging crowdsourcing to detect and localize GPS spoofing attacks," in 2018 IEEE Symposium on Security and Privacy (SP), pp. 1018-1031, IEEE, 2018.
[15] A. D. Team, "ArduPilot," URL: www.ardupilot.org, accessed, vol. 2, p. 12, 2016.
[16] A. D. Team, "SITL simulator (software in the loop)," 2016.
[17] A. Davanian, F. Massacci, and L. Allodi, "Diversity: A poor man's solution to drone takeover," in Proceedings of the 7th International Joint Conference on Pervasive and Embedded Computing and Communication Systems (PECCS 2017), Madrid, Spain, July 24-26, 2017, pp. 25-34, 2017.
[18] C. A. T. Bonilla, O. J. S. Parra, and J. H. D. Forero, "Common security attacks on drones," International Journal of Applied Engineering Research, vol. 13, no. 7, pp. 4982-4988, 2018.
[19] D. Mendes, N. Ivaki, and H. Madeira, "Effects of GPS spoofing on unmanned aerial vehicles," in 23rd IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2018, Taipei, Taiwan, December 4-7, 2018, pp. 155-160, 2018.
[20] S. M. Giray, "Anatomy of unmanned aerial vehicle hijacking with signal spoofing," in 2013 6th International Conference on Recent Advances in Space Technologies (RAST), pp. 795-800, 2013.
[21] I. Guvenc, F. Koohifar, S. Singh, M. L. Sichitiu, and D. Matolak, "Detection, tracking, and interdiction for amateur drones," IEEE Communications Magazine, vol. 56, no. 4, pp. 75-81, 2018.
[22] G. Vasconcelos, R. S. Miani, V. Guizilini, and J. R. Souza, "Evaluation of DoS attacks on commercial Wi-Fi-based UAVs," IJCNIS, vol. 11, no. 1, 2019.
[23] M. Karimibiuki, E. Aggarwal, K. Pattabiraman, and A. Ivanov, "DynPolAC: Dynamic policy-based access control for IoT systems," in 2018 IEEE 23rd Pacific Rim International Symposium on Dependable Computing (PRDC), pp. 161-170, IEEE, 2018.
[24] A. Shoufan, "Continuous authentication of UAV flight command data using behaviometrics," in 2017 IFIP/IEEE International Conference on Very Large Scale Integration, VLSI-SoC 2017, Abu Dhabi, United Arab Emirates, October 23-25, 2017, pp. 1-6, 2017.
[25] P. Abbeel, A. Coates, M. Quigley, and A. Y. Ng, "An application of reinforcement learning to aerobatic helicopter flight," in Advances in Neural Information Processing Systems 19. Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4-7, 2006, pp. 1-8, 2006.
[26] A. Bernardini, F. Mangiatordi, E. Pallotti, and L. Capodiferro, "Drone detection by acoustic signature identification," Electronic Imaging, vol. 2017, no. 10, pp. 60-64, 2017.
[27] S. Mukkamala and A. H. Sung, "Detecting denial of service attacks using support vector machines," in The 12th IEEE International Conference on Fuzzy Systems, 2003. FUZZ'03., vol. 2, pp. 1231-1236, IEEE, 2003.