Primary User Emulation Attacks
Abstract—The defense against the Primary User Emulation Attack (PUE) is studied in the scenario of unknown channel statistics (coined blind dogfight in spectrum). The algorithm of the adversarial bandit problem is adapted to the context of blind dogfight. Both cases of complete and partial information about the rewards of different channels are analyzed. Performance bounds are obtained subject to arbitrary channel statistics and attack policy. Several attack strategies, namely the uniformly random, selectively random and maximal interception attacks, are discussed. The validity of the defense strategy is then demonstrated by numerical simulation results.

Index Terms—Cognitive radio, primary user emulation attack, adversarial bandit algorithm.

Fig. 1. An illustration of the blind dogfight in spectrum. (The figure shows a secondary user running adversarial bandit learning over unknown channels 1–3 against a PUE attacker employing the uniformly random, selectively random and maximal interception attacks.)
I. INTRODUCTION

channel statistics, e.g., the channel idle (or busy) probabilities. It does not assume a time-invariant structure of the channel reward, thus being suitable for the scenario of arbitrary channel statistics and attack policy. Compared with the original studies on adversarial bandit algorithms [2] [3], this paper also considers the multiple player case, in which collisions could occur. Compared with other studies on PUE [4] [6] [8], the algorithm in this paper adopts a passive approach and does not require collaboration among secondary users.

The remainder of this paper is organized as follows. The system model is introduced in Section II. The cases of full and partial information on the reward are discussed in Sections III and IV, respectively. Numerical results are provided in Section V. Conclusions are drawn in Section VI.

II. SYSTEM MODEL

We consider a cognitive radio system with N secondary users and M licensed channels. At the beginning of each time slot, each secondary user can sense only one channel due to its limited sampling capability¹. For simplicity, we do not consider spectrum sensing errors. When a channel is found not to be occupied by primary users, it can be used by the secondary user for data transmission in the remainder of the time slot. For the availability of spectrum information, we consider the following two cases:

∙ Full information: Although a secondary user can sense only one channel during its spectrum sensing, we consider the full information case, in which the secondary user knows the states of all channels at the end of each time slot, since the secondary user can continue to sense during the data communication period².

∙ Partial information: In this case, a secondary user knows only the state of one channel at the end of each time slot. The states of all other channels need to be predicted.

Each licensed channel is modeled as a random process, which equals 1 when the channel is not occupied by primary users and equals 0 when primary users are present. We do not specify the detailed distribution of the spectrum occupancies over different time slots and different channels. The spectrum occupancies could be correlated in time or in spectrum; the performance analysis does not depend on the distribution. Spectrum occupancy data obtained from real measurements will be used in the simulation. We denote by μ_m the probability that channel m is not occupied by primary users. These probabilities are unknown to the secondary users. For each spectrum sensing, a secondary user receives reward 1 if the sensed channel is idle and there are no other secondary users competing for this channel, and 0 otherwise. We denote by r_ij(t) the channel availability (1: available; 0: unavailable)³ for secondary user i over channel j at time slot t. In the full information case, for any channel j, r_ij(t) is known to secondary user i at the beginning of time slot t+1, due to the assumption that a secondary user knows all channel occupancies by the end of each time slot. In the partial information case, secondary user i knows r_ij(t) only when it sensed channel j at time slot t.

We assume that there exists an attacker. In each time slot, it chooses one channel and sends out the PUE attack signal during the spectrum sensing period. It does not attack during the data transmission period, since that would require much higher power to suppress the secondary user's signal. A secondary user is unable to distinguish the signals of the attacker and a primary user. Therefore, the secondary user identifies the attacker's signal as a normal primary user signal and cannot access the channel if it happens to choose the channel that the attacker is jamming, even if the channel is actually not occupied by primary users. Note that the secondary user does not cease transmitting during the data transmission period even if a primary user emerges or the attacker begins to jam during this period. We do not specify the attacker's strategy, which is also unknown to the secondary users.

¹It is easy to extend to the case in which a secondary user can sense multiple channels. We only need to change the action space to the set of channels that the secondary user senses.
²We assume that secondary users can well distinguish the signals of primary users and secondary users.
³Note that r_ij(t) represents the potential reward at channel j for secondary user i at time slot t. It is independent of the spectrum sensing and is not the actual reward of the secondary user.

III. BLIND DOGFIGHT WITH FULL INFORMATION

In this section, we consider the case of full information, i.e., a secondary user knows the rewards of all channels at the end of each time slot. This is reasonable if the data transmission period is much longer than the spectrum sensing period. During the data transmission period, the secondary user can switch to different channels for sensing, provided that it can distinguish the signals of primary users and secondary users. We will consider both single defender and multiple defender cases.

A. Single Defender Case

When there is only one defender, we adopt a channel accessing scheme motivated by the Hedge algorithm for the adversarial bandit proposed in [7]. Since there is only one secondary user defender, we omit the indices for secondary users. In contrast to the Hedge algorithm, we adopt a forgetting factor for the estimation of the merits of different channels, while the Hedge algorithm uses the sum of rewards, which could incur an overflow problem in practical systems. As we will see, the proposed algorithm is also similar to the Q-learning algorithm in reinforcement learning [15]. After proposing the algorithm, we derive lower bounds for the performance of spectrum sensing subject to several typical PUE attack strategies.

1) Algorithm: For the defender, we set a value for each channel, denoted by Q_i for channel i, which represents the goodness of the channel and can be initialized to 0. In the t-th time slot, the values of {Q_i}_{i=1,...,M} are updated by

Q_i(t) = (1 − α(t)) Q_i(t−1) + α(t) r_i(t), ∀i,   (1)

where r_i(t) is the availability of channel i (since there is only one defender, we ignore the index of the defender) and 0 < α(t) < 1 is a forgetting factor, which could be time-varying or constant.
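As an illustrative sketch (our own, with hypothetical variable names and rewards, not the authors' code), the update in (1) is an exponential moving average whose unrolled form is the discounted reward sum that reappears in the performance bound:

```python
# Illustrative sketch (ours) of the Q-value update (1) with a constant
# forgetting factor alpha.

def update_q(q, reward, alpha):
    """One step of (1): Q(t) = (1 - alpha) * Q(t-1) + alpha * r(t)."""
    return (1.0 - alpha) * q + alpha * reward

alpha = 0.1
rewards = [1, 0, 1, 1, 0, 1]   # hypothetical availabilities r(t) of one channel

q = 0.0                        # zero initialization, as in Procedure 1
for r in rewards:
    q = update_q(q, r, alpha)

# Unrolling the recursion gives Q(t) = sum_tau alpha * (1-alpha)^(t-tau) * r(tau),
# the discounted reward sum appearing in the bound of Proposition 1.
t = len(rewards)
unrolled = sum(alpha * (1 - alpha) ** (t - 1 - i) * r
               for i, r in enumerate(rewards))
assert abs(q - unrolled) < 1e-12

# Since each r(t) is 0 or 1, Q(t) stays in [0, 1]: no overflow, unlike the
# unbounded reward sum used by the original Hedge algorithm.
assert 0.0 <= q <= 1.0
```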
276 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 10, NO. 1, JANUARY 2011
Fig. 2. An illustration of the three PUE attacks for the single user case. It is assumed that there are two channels and the Q values are labeled in the corresponding squares. The probability of jamming the corresponding channel is represented by the thickness of the arrow. (Panels: uniformly random attack, selectively random attack, maximal interception attack; Q_1 = 3, Q_2 = 1.)

Then, the probability of accessing channel i is given by the Boltzmann distribution⁴ [15]:

p_i(t) = e^{Q_i(t−1)/T} / ∑_{j=1}^{M} e^{Q_j(t−1)/T},   (2)

where T is a constant called the temperature, which is used to control the balance between exploration and exploitation. The intuition of the algorithm is to choose channels having good reward histories with higher probabilities. The procedure of spectrum access is summarized in Procedure 1. Note that, in the Hedge algorithm, the sum of rewards, ∑_{τ=1}^{t} r_i(τ), is used in lieu of the Q_i(t) used in this paper. Obviously, the sum of rewards diverges to ∞ as t → ∞, while Q_i(t) is upper bounded by 1, which is more numerically stable. It is easy to verify that the computational complexity is low. The main computational cost of the algorithm is the computation of the Q-values in Line 7 and the update of the spectrum access probabilities in Line 8, which are both linear with respect to the number of channels, as well as the random number generation in Line 4, which is also linear in the number of channels and can be efficiently implemented with many random number generation algorithms.

Procedure 1 Procedure of Channel Selection with Full Information
1: Initialize all Q-values to 0.
2: Randomly choose the spectrum access probabilities.
3: for each time slot t do
4: Randomly choose one channel to carry out spectrum sensing according to the spectrum access probabilities, which can be realized by a software or hardware random number generator.
5: Data communication if the channel is idle.
6: Collect the states of all channels, {r_i(t)}_{i=1,...,M}.
7: Update the Q-values using (1).
8: Update the spectrum access probabilities using (2).
9: end for

2) Performance Analysis: We first provide a performance lower bound for arbitrary strategies of PUE attacks. Then, we analyze several typical attack strategies.

∙ Arbitrary Attack Strategy: The following proposition provides a lower bound for the sum of rewards in 𝔱 time slots when a sufficiently small constant forgetting factor is used, by comparing with the performance of the strategy insisting on only one channel in all time slots. The corresponding proof is given in Appendix A. Note that the lower bound holds for any arbitrary strategy of the PUE attack.

Proposition 1: When α_i(t) = α < T, ∀i, t, for the spectrum access algorithm in Procedure 1, we have

∑_{t=1}^{𝔱} ∑_{i=1}^{M} p_i(t) r_i(t) ≥ [ ∑_{t=1}^{𝔱} α(1 − α)^{𝔱−t} r_j(t) − ln M ] / [ α(e − 1) ],   (3)

for all j = 1, 2, ..., M.

∙ Uniformly Random Attack: When the attacker jams different channels uniformly, the availability probability of channel i is equal to (M−1)μ_i/M. When α_i(t) decreases with time and satisfies

∑_{t=1}^{∞} α_i(t) = ∞,   (4)

and

∑_{t=1}^{∞} α_i²(t) < ∞,   (5)

the procedure of updating Q_i(t) converges to the expectation of the reward of channel i in each time slot, which is equal to (M−1)μ_i/M [15]. Then, the expected reward in each time slot is given by

r̄ = ∑_{i=1}^{M} [(M−1)μ_i/M] e^{(M−1)μ_i/(MT)} / ∑_{k=1}^{M} e^{(M−1)μ_k/(MT)}.   (6)

∙ Selectively Random Attack: Since a larger Q_i(t) implies a higher probability of sensing channel i, it also means that the attacker may have more opportunity to intercept the defender over channel i. Therefore, the attacker can adopt a selectively random attack strategy by considering Q_i(t) as the metric of channel i and using the Boltzmann distribution for the channel selection probability. Then, the jamming probability over channel i is given by (the same as (2))

q_i(t) = e^{Q_i(t−1)/T} / ∑_{j=1}^{M} e^{Q_j(t−1)/T}.   (7)

Then, the expected reward of the defender in channel i is equal to (1 − q_i(t))μ_i. The stationary point of the dynamics of {Q_i(t)}_{i=1,...,M} is given by

Q_i = μ_i ( 1 − e^{Q_i/T} / ∑_{j=1}^{M} e^{Q_j/T} ).   (8)

For simplicity, we consider the special case of equal channel availabilities, i.e., μ_i = μ, ∀i. Then, Q_i = μ(M−1)/M, ∀i, is a solution to (8). Moreover, we have the following simple lemma.

Lemma 1: Q_i = μ(M−1)/M, ∀i, is the only stationary point of (8).

⁴There are usually two types of distributions for selecting the actions in machine learning, i.e., ε-greedy and the Boltzmann distribution [15]. The reason for choosing the Boltzmann distribution is its continuity, which facilitates the mathematical analysis.
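Procedure 1 and the access distribution (2) can be sketched as follows; this is an illustrative simulation of our own (the variable names, parameter values and the uniformly random opponent are assumptions, not the authors' code):

```python
import math
import random

def boltzmann(qs, temperature):
    """Access probabilities of (2): p_i proportional to exp(Q_i / T)."""
    weights = [math.exp(q / temperature) for q in qs]
    total = sum(weights)
    return [w / total for w in weights]

def run_defender(mu, steps=20000, alpha=0.02, temperature=0.05, seed=1):
    """Single defender vs. a uniformly random PUE attacker (illustrative)."""
    rng = random.Random(seed)
    m = len(mu)
    qs = [0.0] * m                       # line 1: zero-initialized Q-values
    total_reward = 0
    for _ in range(steps):
        probs = boltzmann(qs, temperature)                 # line 8, eq. (2)
        choice = rng.choices(range(m), weights=probs)[0]   # line 4
        jammed = rng.randrange(m)        # attacker jams one channel uniformly
        # full-information rewards: channel i is idle w.p. mu[i], 0 if jammed
        rewards = [0 if i == jammed else (1 if rng.random() < mu[i] else 0)
                   for i in range(m)]
        total_reward += rewards[choice]  # line 5: transmit if idle
        qs = [(1 - alpha) * q + alpha * r
              for q, r in zip(qs, rewards)]                # line 7, eq. (1)
    return total_reward / steps

avg = run_defender(mu=[0.9, 0.5, 0.2])
# The learner should concentrate on the best channel (mu = 0.9), whose
# effective availability under uniform jamming is (M-1)*mu/M = 0.6.
assert 0.45 < avg < 0.7
```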
LI and HAN: DOGFIGHT IN SPECTRUM: COMBATING PRIMARY USER EMULATION ATTACKS IN COGNITIVE RADIO SYSTEMS–PART II . . . 277
Proof: Suppose that there is another stationary point, in which there exist i ≠ j such that Q_i > Q_j. Then, we have

e^{Q_i/T} / ∑_{k=1}^{M} e^{Q_k/T} > e^{Q_j/T} / ∑_{k=1}^{M} e^{Q_k/T},   (9)

(Figure: the three PUE attacks — uniformly random, selectively random and maximal interception — for the multiple user case, with the Q values and jamming probabilities labeled in the squares.)

The analysis of the different attacking strategies is similar to that of the single defender case.
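For illustration, the three attack strategies can be sketched as follows (the function names and interface are our own assumptions, not from the original text):

```python
import math
import random

def uniform_attack(qs, temperature, rng):
    """Uniformly random attack: jam any of the M channels equally often."""
    return rng.randrange(len(qs))

def selective_attack(qs, temperature, rng):
    """Selectively random attack: jam channel i with the Boltzmann
    probability of (2), mirroring the defender's access distribution."""
    weights = [math.exp(q / temperature) for q in qs]
    return rng.choices(range(len(qs)), weights=weights)[0]

def max_interception_attack(qs, temperature, rng):
    """Maximal interception attack: always jam the channel with the largest
    Q-value, i.e. the channel the defender is most likely to sense."""
    return max(range(len(qs)), key=lambda i: qs[i])

rng = random.Random(0)
qs = [0.8, 0.3, 0.1]   # hypothetical Q-values
assert max_interception_attack(qs, 0.05, rng) == 0
# With a low temperature the selective attacker also concentrates on channel 0.
hits = sum(selective_attack(qs, 0.05, rng) == 0 for _ in range(1000))
assert hits > 900
```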
Fig. 4. Average rewards versus different temperatures with complete observation for the indoor measurements and a single secondary user. (Curves: uniformly random attack, selectively random attack, max interception attack, lower bound.)

Fig. 6. Average rewards versus different temperatures with complete observation for the indoor measurements, a single secondary user and multiple attackers. (Curves: 1, 2 and 3 attackers, uniform and max interception.)

Fig. 7. Average rewards versus different temperatures with complete observation for the indoor measurements under different primary user traffic levels. (Curves: PU traffic levels 1–3.)
attackers make decisions independently. We tested only the uniform attack case and the maximal interception attack case for the indoor situation. We observe that, as the number of attackers increases, the effect of the uniform attack is improved. An interesting observation is that the effect of the attack is almost unchanged for the maximal interception attack when the number of attackers increases from 1 to 3. The reason is that all the attackers will jam the same channel with a large probability, due to the lack of collaboration. Therefore, much of the attack effort is wasted. Hence, from the viewpoint of the attackers, collaboration is of key importance, which is beyond the scope of this paper.

In Fig. 7, we tested the impact of the primary user traffic for the indoor and complete observation case. We set three traffic levels for the primary users. In level 1, we use the original measurement data. In levels 2 and 3, we add random data traffic such that the primary user data traffic is doubled and tripled, respectively, compared with traffic level 1. We observe that the average reward is significantly decreased when the primary user data traffic is increased. This also explains why the outdoor data incurs worse performance than the indoor data, since the data traffic in the outdoor measurement set is higher.

2) Multiple User Case: Figures 8 and 9 show the average rewards in each time slot versus different temperatures for the indoor and outdoor environments, respectively, when there are five secondary users. A significant difference between the multiple user and single user cases is that the performance decreases almost monotonically with respect to the temperature, even for the maximal interception attack case. A reasonable explanation is that, when the actions of the secondary users are more distributed due to the higher temperature, the probability of collisions is also increased, thus decreasing the average reward of the secondary users. Another interesting observation is that the performance under the maximal interception attack is significantly improved in the low temperature regime, compared with the single secondary user case. A reasonable explanation is that the multiple secondary users diversify the focus of the attacker, thus averaging the damage over different secondary users. Therefore, in a multiple user cognitive radio system, we need to consider using low temperatures in the adversarial bandit algorithm. Note that this does not mean that the temperature should be as low as possible, since the simulation does not consider the speed of learning and the possible change of the environment. If the temperature is too low, the learning procedure will be very slow and cannot track the environment
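The earlier observation that non-collaborating maximal interception attackers waste much of their effort can be illustrated with a minimal sketch (the setup is ours, not the paper's):

```python
# Illustrative check (ours): attackers that independently run the maximal
# interception strategy all pick the arg-max channel, so adding attackers
# adds no newly jammed channels.

def max_interception(qs):
    """Jam the channel with the largest Q-value."""
    return max(range(len(qs)), key=lambda i: qs[i])

qs = [0.7, 0.4, 0.2]   # hypothetical Q-values observed by every attacker
jammed = {max_interception(qs) for _ in range(3)}   # three independent attackers
assert jammed == {0}   # all three jam channel 0: the extra effort is wasted
```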
Fig. 8. Average rewards versus different temperatures with complete observation for the indoor measurements and multiple secondary users. (Curves: uniformly random, selectively random and max interception attacks.)

Fig. 9. Average rewards versus different temperatures with complete observation for the outdoor measurements and multiple secondary users. (Curves: uniformly random, selectively random and max interception attacks.)

Fig. 10. Average rewards versus different temperatures with complete observation for the indoor measurements and various numbers of secondary users.

Fig. 11. Average rewards versus different temperatures with partial observation for the indoor measurements and a single secondary user. (Curves: uniformly random, selectively random and max interception attacks.)
Fig. 12. Average rewards versus different temperatures with partial observation for the outdoor measurements and a single secondary user. (Curves: uniformly random, selectively random and max interception attacks.)

Fig. 14. Average rewards versus different temperatures with partial observation for the outdoor measurements and multiple secondary users.
Fig. 13. Average rewards versus different temperatures with partial observation for the indoor measurements and multiple secondary users. (Curves: uniformly random, selectively random and max interception attacks.)

Then, we have

S_{t+1}/S_t = ∑_{m=1}^{M} e^{(1−α)Q_m(t)/T} e^{α r_m(t)/T} / S_t
 ≤ ∑_{m=1}^{M} e^{(1−α)Q_m(t)/T} ( 1 + (e−1)α r_m(t)/T ) / S_t
 = ∑_{m=1}^{M} e^{(1−α)Q_m(t)/T} / ∑_{m=1}^{M} e^{Q_m(t)/T} + (α(e−1)/T) ∑_{m=1}^{M} r_m(t) e^{Q_m(t)/T} / S_t
 ≤ 1 + (α(e−1)/T) ∑_{m=1}^{M} r_m(t) e^{Q_m(t)/T} / S_t,   (16)
where the first equality is due to the definition of S_t, the first inequality is due to the assumption α/T < 1 and the inequality (1 + x)^a ≤ 1 + ax when a < 1 (applied with x = e − 1 and a = α r_m(t)/T), the second equality is obtained by decomposing the result into two terms, and the last inequality is because

e^{(1−α)Q_m(t)/T} / e^{Q_m(t)/T} = e^{−α Q_m(t)/T} ≤ 1,   (17)

since 0 ≤ Q_m(t) ≤ 1. Then, we have

ln(S_{𝔱+1}/S_1) = ∑_{t=1}^{𝔱} ln(S_{t+1}/S_t)
 ≤ ∑_{t=1}^{𝔱} ln( 1 + (α(e−1)/T) ∑_{m=1}^{M} r_m(t) e^{Q_m(t)/T} / S_t )
 ≤ (α(e−1)/T) ∑_{t=1}^{𝔱} ∑_{m=1}^{M} r_m(t) e^{Q_m(t)/T} / S_t,

where the last inequality uses ln(1 + x) ≤ x.

VI. CONCLUSIONS

To combat the arbitrary strategies of the attacker and the unknown channel statistics, the defender(s) considers each channel as a bandit arm and uses the technique of the adversarial multi-armed bandit as its strategy. We have considered several typical types of attacks. For the single defender case, we have derived the corresponding lower bound of performance for arbitrary attack strategies. The discussion has been extended to the multiple user case. We have also discussed both cases of full and partial information. Numerical simulations have been used to demonstrate the performance of the proposed algorithm for both single defender and multiple defender cases. We have observed that the proposed maximal interception attack incurs the worst performance degradation. The numerical results also show that the performance degradation decreases with increasing temperature, except for the maximal interception attack.
APPENDIX B
PROOF OF PROP. 2

Proof: It is easy to verify that the updating rule in (1) is equivalent to solving equation (8) using the Robbins–Monro algorithm [10], i.e.,

q(t+1) = (1 − α(t))q(t) + α(t)r(t) = q(t) + α(t)Y(t),   (19)

where q = (Q_1, ..., Q_M), α(t) is the vector of all step factors, r(t) is the vector of rewards obtained at spectrum access period t and Y(t) is a random observation contaminated by noise, i.e.,

Y(t) = r(t) − q(t) = r̄(t) − q(t) + r(t) − r̄(t) = g(q(t)) + δm(t),   (20)

where g(q(t)) = r̄(t) − q(t), δm(t) = r(t) − r̄(t) is the noise and r̄(t) is the expected reward given by

(r̄(t))_i = μ_i ( 1 − e^{Q_i/T} / ∑_{j=1}^{M} e^{Q_j/T} ).   (21)

APPENDIX C
PROOF OF PROP. 3

Proof: For the lower bound (10), we have the following key observation: for m_1(t) = arg max_m Q_m(t), i.e., the channel having the highest sensing probability at time t, r_{m_1}(t) = 0, since the attacker uses the maximal interception attack strategy and always jams this channel. However, Q_{m_1}(t) ≤ Q_{m_2}(t) + α, where m_2 is the index of the channel having the second largest Q value. We can prove this by carrying out induction over t. When t = 1, Q_i(t) = 0 for all i, due to the zero initialization. Now, we assume that Q_{m_1}(t) ≤ Q_{m_2}(t) + α at time slot t. At time slot t + 1, we have

Q_{m_1}(t+1) = (1 − α)Q_{m_1}(t)
 ≤ (1 − α)(Q_{m_2}(t) + α)
 = (1 − α)Q_{m_2}(t) + α(1 − α)
 ≤ Q_{m_2}(t+1) + α,   (25)

where the first equality is due to the fact that channel m_1 will be jammed by the attacker at time slot t + 1 and the first inequality is because of the assumption Q_{m_1}(t) ≤ Q_{m_2}(t) + α. Then, we have

u_{m_1}(t) ≤ u_{m_2}(t) e^{α/T}.   (26)
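The induction invariant Q_{m_1}(t) ≤ Q_{m_2}(t) + α can also be checked by a small simulation (our own construction, assuming the 0/1 reward model of Section II):

```python
import random

# Simulation check (ours) of the invariant used in the Appendix C proof:
# under the maximal interception attack, the channel with the highest Q-value
# is always jammed (reward 0), so its Q-value never exceeds the runner-up's
# by more than alpha.

alpha = 0.1
rng = random.Random(42)
qs = [0.0, 0.0, 0.0]           # zero initialization, as in the base case
for _ in range(5000):
    jammed = max(range(3), key=lambda i: qs[i])   # m1(t): highest Q-value
    rewards = [0 if i == jammed else rng.randint(0, 1) for i in range(3)]
    qs = [(1 - alpha) * q + alpha * r for q, r in zip(qs, rewards)]  # eq. (1)
    top, runner_up = sorted(qs, reverse=True)[:2]
    assert top <= runner_up + alpha + 1e-12
```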