Adopting a balanced mix of theory, algorithms, and practical design issues, this compre-
hensive volume explores cutting-edge applications in adaptive wireless communications,
and the implications these techniques have for future wireless network performance.
Presenting practical concerns in the context of different strands from information
theory, parameter estimation theory, array processing, and wireless communications, the
authors present a complete picture of the field. Topics covered include advanced multiple-
antenna adaptive processing, ad hoc networking, MIMO, MAC protocols, space-time
coding, cellular networks, and cognitive radio, with the significance and effects of both
internal and external interference a recurrent theme throughout.
A broad, self-contained technical introduction to all the necessary mathematics, statis-
tics, estimation theory and information theory is included, and topics are accompanied
by a range of engaging end-of-chapter problems. With solutions available online, this is
the perfect self-study resource for students of advanced wireless systems, and wireless
industry professionals.
“Great book! Fills a gap in the wireless communication textbook arena with its com-
prehensive signal-processing focus. It does a nice job of handling the breadth-vs-depth
trade-off in a topic-oriented textbook, and is perfect for beginning graduate students or
practicing engineers who want the best of both worlds: broad coverage of both old and
new topics, combined with mathematical fundamentals and detailed derivations. It pro-
vides a great single-reference launching point for readers who want to dive into wireless
communications research and development, particularly those involving multi-antenna
applications. It will become a standard prerequisite for all my graduate students.”
A. Lee Swindlehurst, University of California, Irvine
Adaptive Wireless
Communications
MIMO Channels and Networks
DANIEL W. BLISS
Arizona State University
SIDDHARTAN GOVINDASAMY
Franklin W. Olin College of Engineering, Massachusetts
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town,
Singapore, São Paulo, Delhi, Mexico City
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9781107033207
© Dan Bliss and Siddhartan Govindasamy 2013
Dan Bliss’s contributions are a work of the United States Government and
are not protected by copyright in the United States.
Printed and bound in the United Kingdom by the MPG Books Group
A catalogue record for this publication is available from the British Library
1 History 1
1.1 Development of electromagnetics 1
1.2 Early wireless communications 2
1.3 Developing communication theory 5
1.4 Television broadcast 6
1.5 Modern communications advances 7
1.5.1 Early packet-radio networks 9
1.5.2 Wireless local-area networks 10
12 2 × 2 Network 392
12.1 Introduction 392
12.2 Achievable rates of the 2 × 2 MIMO network 393
12.2.1 Single-antenna Gaussian interference channel 393
12.2.2 Achievable rates of the MIMO interference channel 397
12.3 Outer bounds of the capacity region of the Gaussian MIMO
interference channel 399
12.3.1 Outer bounds to the capacity region of the single-antenna
Gaussian interference channel 399
12.3.2 Outer bounds to the capacity region of the Gaussian
interference channel with multiple antennas 405
12.4 The 2 × 2 cognitive MIMO network 408
12.4.1 Non-cooperative primary link 409
12.4.2 Cooperative primary link 412
Problems 412
References 569
Index 589
Preface
In writing this text, we hope to achieve multiple goals. Firstly, we hope to develop
a textbook that is useful as a reference for graduate classes, or as a supplement
to advanced undergraduate classes, investigating advanced wireless communications.
These topics include adaptive antenna processing, multiple-input multiple-output
(MIMO) communications, and wireless networks. Throughout the text,
there is a recurring theme of understanding and mitigating both internal and ex-
ternal interference. In addressing these areas of investigation, we explore concepts
in information theory, estimation theory, signal processing, and implementation
issues as are applicable. We attempt to provide a development covering these
topics in a reasonably organized fashion. While not always possible, we attempt
to be consistent in notation across the text. In addition, we provide problem sets
that allow students to investigate these topics more deeply. Secondly, we attempt
to organize the topics addressed so that this text will be useful as a reference.
To the extent possible, each chapter will be reasonably self-contained, although
some familiarity with the topic area is assumed. To aid the reader, reviews of
many of the mathematical tools needed within the text are collected in Chap-
ters 2 and 3. In addition, an overview of the basics of communications theory
is provided in Chapters 4 and 5. Finally, in discussing these topics, we attempt
to address a wide range of perspectives appropriate for the serious student of
the area. Topics range from information theoretic bounds, to signal processing
approaches, to practical implementation constraints.
While there are many wonderful texts (and here we only list a subset) that
address many of the topics of wireless communications [355, 280, 287, 314, 115,
324, 251, 255, 203], networks [100, 62], signal processing [275, 297, 238, 220,
204], array processing [294, 223, 205, 312, 248, 189], MIMO communications
[247, 331, 160, 45, 22, 183, 84], information theory [68, 202, 212], estimation
theory [312, 172, 297], and the serious researcher may wish to collect many of
these texts, we hope that the particular collection and presentation of topics is
uniquely useful to the researcher in advanced communications.
Acknowledgments
Dan Bliss
Cambridge, MA
I would like to thank and remember Professor David H. Staelin, formerly of the
Massachusetts Institute of Technology, for his inspiration, guidance, and mentorship,
and in particular for introducing me to my coauthor.
I would like to thank my coauthor for his insight, mentorship and for being
the driving force behind this book.
I would also like to thank my former colleague at MIT, Danielle Hinton, in
particular for insightful discussions on multiantenna protocols. I am grateful
to my colleagues at Olin College including Brad Minch, Mark Somerville, and
Vin Manno, for their encouragement and general discussions, both technical and
non-technical. I would also like to thank my students and former students at
Olin College, in particular Yifan Sun, Annie Martin, Rachel Nancollas, Katarina
Miller, Jacob Miller, Jeff Hwang, Sean Shi, Elena Koukina, Yifei Feng, Rui Wang,
Raghu Rangan, Tom Lamar, Avinash Uttamchandani, Ashley Lloyd, Junjie Zhu,
and Chloe Egthebas for their direct and indirect contributions to this work, and
in particular for helping me refine my presentation of some of the material that
has made its way into the book.
Finally, I would like to thank Alo, Antariksh, my parents, parents-in-law, sib-
lings, and the rest of my family for their patience and tireless support.
Siddhartan Govindasamy
Natick, MA
1 History
For better or worse, wireless communications have become integrated into many
aspects of our daily lives. When communication systems work well, they almost
magically enable us to access information from distant, even remote, sources. If
one were to take a modern “smart” phone a couple of hundred years into the past,
one would notice a couple of things very quickly. First, most of the capability of
the phone would be lost because a significant portion of the phone’s capabilities
are based upon access to a communications network. Second, being burned at
the stake as a witch can make for a very bad day.
There are many texts that present the history of wireless communications in
great detail, for example in References [186, 48, 146, 304, 61]. Many of the papers
of historical interest are reprinted in Reference [348]. Because of the rich history
of wireless communications, a comprehensive discussion would require multiple
texts on each area. Here we will present an abridged introduction to the history
of wireless communications, focusing on those topics more closely aligned with
the technical topics addressed later in the text, and we will admittedly miss
numerous important contributors and events.
The early history of wireless communications covers development in basic
physics, device physics and component engineering, information theory, and sys-
tem development. Each of these aspects is important, and modern communica-
tion systems depend upon all of them. Modern research continues to develop and
refine components and information theory. Economics and politics are an impor-
tant part of the history of communications, but they are largely ignored here.
While he was probably not the first to make the observation that there is a
relationship between magnetism and electric current, the Danish physicist Hans
Christian Ørsted observed this relationship in 1820 [239] and ignited investiga-
tion across Europe. Most famously, he demonstrated that current flowing in a
wire would cause a compass to change directions. Partly motivated by Ørsted’s
results, the English physicist and chemist Michael Faraday made significant ad-
vancements in the experimental understanding of electromagnetics [304] in the
early 1800s. Importantly for our purposes, he showed that changing current in
one coil could induce current in another remote coil. While this inductive cou-
pling is not the same as the electromagnetic waves used in most modern wire-
less communications, it is the first step down that path. The Scottish physicist
James Clerk Maxwell made amazing and rich contributions to a number of areas
of physics. Because of his contributions in the area of electromagnetics [211],
the fundamental description of electromagnetics bears his name. While Maxwell
might not immediately recognize them in this form, Maxwell’s equations in the
International System of Units (SI) [290, 178] are the fundamental representation
of electromagnetics and are given by
∇ · d = ρ
∇ · b = 0
∇ × e = −∂b/∂t
∇ × h = j + ∂d/∂t ,   (1.1)
where ∇ indicates a vector of spatial derivatives, · is the inner product, × is the
cross product, t is time, ρ is the charge density, j is the current density vector, d
is the electric displacement vector, e is the electric field vector, b is the magnetic
flux density vector, and h is the magnetic field vector. The electric displacement
and electric field are related by
d = ε e
b = μ h ,   (1.2)
where ε is the permittivity and μ is the permeability of the medium. These are the
underpinnings of all electromagnetic waves and thus modern communications.
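These relations already fix the speed of electromagnetic waves: combining Maxwell's equations with (1.2) predicts propagation at speed 1/√(μ ε). The short Python sketch below (an illustration we add here, not part of the text) checks this against the SI free-space constants.

```python
import math

# Free-space permittivity and permeability (SI values)
eps0 = 8.8541878128e-12   # F/m
mu0 = 1.25663706212e-6    # H/m

# Maxwell's equations predict wave propagation at speed 1/sqrt(mu * eps)
c = 1.0 / math.sqrt(mu0 * eps0)
assert abs(c - 2.99792458e8) < 1e3  # matches the speed of light in vacuum
```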
In 1888, the German physicist Heinrich Rudolf Hertz convincingly demonstrated
the existence of the electromagnetic waves predicted by Maxwell [144, 178]. To
demonstrate the electromagnetic waves, he employed a spark-gap transmitter.
At the receiver, the electromagnetic waves coupled into a loop with a very small
gap across which a spark would appear. The spark-gap transmitter with vari-
ous modifications was a standard tool for wireless communications research and
systems for a number of following decades.
In the late 1800s, significant and rapid advances were made. Given the prolifera-
tion of wireless technologies and the penetration of these technologies into every
area of our lives, it is remarkable that before the late 1800s little was known
about even the physics of electromagnetics. Over the years, there have been var-
ious debates over the primacy of the invention of wireless communications. Who
invented wireless communications often comes down to a question of semantics.
How many of the components do you need before you call it a radio? As is often
true in science and engineering, it is clear that a large number of individuals per-
formed research in the area of wireless communications or, as it was often called,
wireless telegraphy. The following is an incomplete list of important contributors.
In 1872, before Hertz’s demonstration, a patent was issued to the American
inventor and dentist Mahlon Loomis for wireless telegraphy [193]. While his
system reportedly worked with some apparent reliability issues, his contributions
were not widely accepted during his life. This lack of acceptance was likely partly
due to his inability to place his results in the scientific context of the time.
In 1886, American physicist Amos Emerson Dolbear, received a patent for a
wireless communication system [82]. This patent later became a barrier to the
enforcement of Guglielmo Marconi’s patents on wireless communications in the
United States, until the Marconi Company purchased Dolbear’s patent. It is
worth noting this demonstration was also before Hertz’s demonstration.
In 1890, the French physicist Edouard Eugene Desire Branly developed an
important device used to detect electromagnetic waves. The so-called “coherer”
employed a tube containing metal filings filling a gap between two electrodes
and exploited a peculiar phenomenon of these filings [279]. When exposed to
radio-frequency signals, the filings would fuse or cling together, thus reducing the
resistance across the electrodes. British physicist Sir Oliver Joseph Lodge refined
the coherer by adding a “trembler” or “decoherer” that mechanically disrupted
the fused connections. Many of the early experiments in wireless communications
employed variants of the coherer.
The Serbian-born American engineer Nikola Tesla was one of those larger-than-life
characters. He made significant contributions to a number of areas of
engineering, but with regard to our interests, he received a patent for wireless
transmission of power in 1890 [309] and demonstrated electromagnetic trans-
fer of energy in 1893 [310]. Tesla is rightfully considered one of the significant
contributors to the invention of wireless communications.
Bengal-born Indian scientist Jagdish Chandra Bose contributed significantly
to a number of areas of science and engineering. He was one of the early re-
searchers in wireless communication and developed an improved coherer. In 1895,
he demonstrated radio communication with a rather dramatic flair [107]. By us-
ing a wireless communication link, he remotely set off a small explosive that
rang a bell. His improved coherer was a significant contribution to wireless com-
munications. His version of the coherer replaced the metal filings with a metal
electrode in contact with a thin layer of oil that was floating on a small pool of
mercury. When exposed to radio-frequency signals, the conductivity across the
oil film would change. Marconi used a similar coherer for his radio system.
The German physicist Karl Ferdinand Braun developed a number of important
technologies that contributed to the usefulness of wireless communication. He
developed tuned circuits for radio systems, the cat’s whisker detector (really an
early diode), and directional antenna arrays. In 1909, he shared the Nobel Prize
in physics with Guglielmo Marconi for his contributions.
The Russian physicist Alexander Stepanovich Popov presented results on his
version of a coherer to the Russian Physical and Chemical Society on May 7th,
1895 [304]. He demonstrated links that transmitted radio waves between build-
ings. As an indication of the importance of this technology to society, in the
Russian Federation, May 7th is celebrated as Radio Day.
The Italian engineer Guglielmo Marconi, began research in wireless commu-
nications in 1895 [304] and pursued a sustained, intense, and eventually well-
funded research and development program for many years to follow. He received
the Nobel Prize in physics (with Karl Ferdinand Braun) in 1909 for his contri-
butions to the development of wireless radios [304]. While he is not the inventor
of radio, as is sometimes suggested, his position as principal developer cannot
be dismissed. His research, development, and resulting company provided the
impetus to the commercialization of wireless communications. In 1896 Marconi
moved to England, and during that and the following year he provided a number
of demonstrations of the technology. In 1901, he demonstrated a transatlantic
wireless link, and in 1907 he established a regular transatlantic radio service.
A somewhat amusing (or annoying if you were Marconi) public demonstration
of the effects of potential interference in wireless communications was provided
in 1903 by British magician and inventor Nevil Maskelyne [146]. Maskelyne was
annoyed with Marconi’s broad patents and his claims of security in his wireless
system. During a public demonstration of Marconi’s system for the Royal In-
stitution, Maskelyne repeatedly transmitted the Morse code signal “rats” and
other insulting comments which were received at the demonstration of the sys-
tem, which was supposedly immune to such interference. Previously, in 1902,
Maskelyne had developed a signal interception system that was used to receive
signals from Marconi’s ship-to-shore wireless system. Marconi had claimed his
system was immune to such interception because of the precise frequency tuning
required for reception.
In the first few decades of the twentieth century, wireless communication
quickly evolved from a technical curiosity to useful technology. An important
technology that enabled widespread use of wireless communication was amplifi-
cation. The triode vacuum-tube amplifier was developed by American engineer
Lee de Forest. He filed a patent for the triode (originally called the de For-
est valve) in 1907 [95]. The triode enabled increased power at transmitters and
increased sensitivity at receivers. It was the fundamental technology until the
development of the transistor decades later.
In the late 1910s, a number of experimental radio broadcast stations were
constructed [304]. In the early 1920s, the number of radio broadcast stations
exploded, and wireless communications began its integration into everyday life.
During the Second World War, the concept of tactical communications under-
went dramatic development. The radios became small enough and sufficiently
robust that a single soldier could carry them. It became common for various
techniques that dominate signal processing to this day. Addressing a similar set of
technical issues, prolific Russian mathematician Andrey Nikolaevich Kolmogorov
published his results in 1941 [176].
A frequency-hopping modulation enables a narrowband system to operate over
a wider bandwidth by changing the carrier frequency as a function of time. A
variety of versions of frequency hopping were suggested over time, and the identity
of the original developer of frequency hopping is probably lost because of the
secrecy surrounding this modulation approach. However, in what must be considered
a relatively unexpected source of contribution to communication modulation
technology, a frequency-hopping patent was given to Austrian-born American
actress Hedy Lamarr (filed as Hedy Kiesler Markey) and American composer
George Antheil [208]. The technology exploited a piano roll as a key to select
carrier frequencies of a frequency-hopping system.
As opposed to frequency hopping, direct-sequence spread spectrum (DSSS)
modulates a relatively narrowband signal with a wideband sequence. The re-
ceiver, knowing this sequence, is able to recover the original narrowband sig-
nal. This technology is exploited by code-division multiple-access (CDMA) ap-
proaches to enable the receiver to disentangle the signals sent from multiple users
at the same time and frequency. The origins of direct-sequence spread spectrum
are partly a question of semantics. An early German patent was given to Ger-
man engineers Paul Kotowski and Kurt Dannehl for a communications approach
that modulated voice with a rotating generator [278]. This approach has a loose
similarity to the digital spreading techniques used by modern communication
systems. In the early 1950s, for direct-sequence spread-spectrum communica-
tions, the noise modulation and correlation (NOMAC) system was developed
and demonstrated at Massachusetts Institute of Technology Lincoln Laboratory
[338]. In 1952, the first tests of the communication system were performed. The
system drew heavily from the doctoral dissertation of American engineer Paul
Eliot Green, Jr. [338], who was one of the significant contributors to the NO-
MAC system at Lincoln Laboratory. Because direct-sequence spread-spectrum
systems are spread over a relatively wide bandwidth, they can temporally re-
solve multipath more easily. Consequently, the received signal can suffer from
intersymbol interference. To compensate for this effect, the rake receiver, developed
in 1958 by American engineers Robert Price and Paul Eliot Green, Jr. [254, 338],
implemented channel equalization. During the late 1950s, the
ARC-50 radio was designed and tested [278]. Magnavox’s ARC-50 was an oper-
ational radio that is recognizable as a modern direct-sequence spread-spectrum
system.
While television technology is not a focus of this text, its importance in the
development of wireless technology cannot be ignored. Given the initial success
of wireless data and then voice radio communications, it didn’t take long for
researchers to investigate the transmission of images. Because of the significant
increase in the amount of information in a video image compared to voice, it
took decades for a viable system to be developed. Early systems often involved
mechanically scanning devices.
German engineers Max Dieckmann and Rudolf Hell patented [81] an electrically
scanning tube receiver whose concept is recognizable in the televisions used
for the following seventy years. Apparently, they had difficulty developing their
concept to the point of demonstration.
Largely self-taught, American engineer Philo Taylor Farnsworth developed
concepts for the first electronically scanning television receiver (“image
dissector”) that he conceived as a teenager and for which he filed a patent several
years later in 1927 [91]. In 1927, he also demonstrated the effectiveness of his
approach.
During a similar period of time, while working for Westinghouse Laboratories,
Russian-born American engineer Vladimir K. Zworykin filed a patent in 1923
for his version of a tube-based receiver [365]. However, the U.S. Patent Office
awarded primacy of the technology to Farnsworth. In 1939, RCA, the company
for which Zworykin worked, demonstrated a television at the New York World’s
Fair. Regular broadcasts soon began; these are often cited as the beginning of
the modern television broadcast era.
transmitter with the weakest propagation. The most basic concept for an effec-
tive space-time code is the Alamouti block code. This code, patented by Iranian-
born American engineers Siavash M. Alamouti and Vahid Tarokh [7], is described
in Reference [8]. Tarokh and his colleagues extended these concepts to include
larger space-time block codes [305] and space-time trellis codes [307].
Ethernet at low cost perhaps reduced the interest in developing wireless net-
working technologies for commercial use.
Interest in wireless networks for commercial use increased after the U.S. Federal
Communications Commission (FCC) established the industrial, scientific,
and medical (ISM) frequency bands for unlicensed use in the United States in
1985. The ISM bands are defined in Section 15.247 of the Federal Communications
Commission rules. The freeing of a portion of the electromagnetic spectrum
for unlicensed use sparked a renewed interest in developing wireless networking
protocols [103].
Other major developments in the late 1980s and 1990s that increased interest
in wireless networks were the increased use of portable computers, the inter-
net, and significant reduction in hardware costs. Since portable computer users
wanted to access the internet and remain portable, wireless networking became
essential.
At some point between the years 2000 and 2010, a rather significant change
occurred in the use of wireless communications. The dominant use of wireless
communication links transitioned from broadcast systems such as radio or televi-
sion to two-way, personal-use links such as mobile phones or WiFi. At that point,
it came to be considered strange not to be in continuous wireless contact with the
web. Not only did this change our relationship with information, possibly funda-
mentally altering the nature of the human condition, but it also changed forever
the nature of trivia arguments held in bars and pubs around the world.
2 Notational and mathematical
preliminaries
2.1 Notation
2.1.2 Scalars
A scalar is indicated by a non-bold letter such as a or A. Scalars can be integer
Z, real R, or complex numbers C:
a ∈ Z,
a ∈ R , or
a ∈ C, (2.2)
respectively.
The square root of −1 is indicated by i,
√−1 = i .   (2.3)
The Euler formula for an exponential for some real angle α ∈ R in terms of
radians is given by
a = ρ e^{iα}
  = ρ cos(α) + i ρ sin(α) ,   (2.5)
ℜ{a} = ρ cos(α)
ℑ{a} = ρ sin(α) ,   (2.6)
a* = (ρ e^{iα})* = ρ e^{−iα} .   (2.7)
i = e^{iπ/2 + i2πm} ∀ m ∈ Z .   (2.8)
such that
under the assumption that a and y are real. When the base is not explicitly
indicated, it is assumed that a natural logarithm (base e) is indicated such that
Note that the conjugation of the order is switched between the vector notation
and the bracket notation, such that
a† b = ⟨a, b⟩* .   (2.39)
This switch is performed to be consistent with standard conventions. When using
the phrase “inner product,” we will use both forms interchangeably. Hopefully,
the appropriate conjugation will be clear from context.
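The conjugation convention of (2.39) can be checked numerically. The NumPy sketch below is our illustration (the text assumes no software); note that NumPy's `vdot` conjugates its first argument, so `np.vdot(a, b)` computes a† b.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(4) + 1j * rng.standard_normal(4)
b = rng.standard_normal(4) + 1j * rng.standard_normal(4)

adag_b = np.vdot(a, b)    # a† b: vdot conjugates its first argument
inner_ab = np.vdot(b, a)  # bracket convention: <a, b> = b† a

# Equation (2.39): a† b = <a, b>*
assert np.isclose(adag_b, np.conj(inner_ab))
```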
While we will not be particularly concerned about the technical details, the
higher-dimensional space in which the inner products are operating is sometimes
referred to as a Hilbert space. This space can be extended to an infinite dimen-
sional space. For example, a vector a can be indexed by the variable x,
a → fa (x) , (2.40)
where the function is defined along the axis x. Inner products in this complex
infinite-dimensional space are given by integrating over the indexing parameter.
In this case it is x. The complex infinite-dimensional inner product between
functions f (x) and g(x) that represent two infinite-dimensional vectors is denoted
⟨f (x), g(x)⟩ = ∫ dx f (x) g*(x) .   (2.41)
With this form, a useful inequality can be expressed. The Cauchy–Schwarz
inequality is given by
|⟨f (x), g(x)⟩|² ≤ ⟨f (x), f (x)⟩ ⟨g(x), g(x)⟩ .   (2.42)
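A discretized check of the functional inner product and the Cauchy–Schwarz inequality is sketched below; this is our illustration, with an arbitrary grid, arbitrary test functions, and a Riemann-sum approximation of the integral in (2.41).

```python
import numpy as np

# Discretize two complex functions and approximate the inner
# product of (2.41) by a Riemann sum
x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]
f = np.exp(1j * 2 * np.pi * x)
g = (1.0 + x) * np.exp(-x)

def inner(u, v):
    return np.sum(u * np.conj(v)) * dx

# Cauchy-Schwarz inequality (2.42)
lhs = abs(inner(f, g)) ** 2
rhs = (inner(f, f) * inner(g, g)).real
assert lhs <= rhs + 1e-12
```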
a b† . (2.44)
C = A B
{C}m,n = Σk {A}m,k {B}k,n .   (2.45)
C = A ⊙ B
{C}m,n = {A ⊙ B}m,n
       = {A}m,n {B}m,n .   (2.46)
C = A ⊗ B
  = ⎛ {A}1,1 B   {A}1,2 B   {A}1,3 B   · · · ⎞
    ⎜ {A}2,1 B   {A}2,2 B   {A}2,3 B         ⎟
    ⎜ {A}3,1 B   {A}3,2 B                    ⎟
    ⎝     ⋮                             ⋱   ⎠ .   (2.47)
(A ⊗ B)T = AT ⊗ BT   (2.48)
(A ⊗ B)* = A* ⊗ B*   (2.49)
(A ⊗ B)† = A† ⊗ B†   (2.50)
(A ⊗ B)−1 = A−1 ⊗ B−1 ,   (2.51)
where it is assumed that A and B are not singular for the last relationship. The
Kronecker product obeys distributive and associative properties,
(A + B) ⊗ C = A ⊗ C + B ⊗ C (2.52)
(A ⊗ B) ⊗ C = A ⊗ (B ⊗ C) . (2.53)
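These Kronecker-product identities are easy to verify numerically; the NumPy sketch below (our illustration, with arbitrary random matrices) checks (2.50) and (2.51).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

K = np.kron(A, B)

# (A ⊗ B)† = A† ⊗ B†   (2.50)
assert np.allclose(K.conj().T, np.kron(A.conj().T, B.conj().T))
# (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}   (2.51), for nonsingular A and B
assert np.allclose(np.linalg.inv(K),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))
```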
where the trace and determinant are defined in Section 2.2. Note that the ex-
ponents M and N are for the size of the opposing matrix. The vector operation
and Kronecker product are related by
vec(a bT) = b ⊗ a   (2.57)
vec(A B C) = (CT ⊗ A) vec(B) .   (2.58)
If the dimensions of A and B are the same and the dimensions of C and D are
the same, then the Hadamard and Kronecker products are related by
(A ⊙ B) ⊗ (C ⊙ D) = (A ⊗ C) ⊙ (B ⊗ D) .   (2.59)
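The vec identities (2.57) and (2.58) can be checked the same way; note that vec stacks columns, which in NumPy corresponds to Fortran-order reshaping (the sketch below is our illustration, not from the text).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

def vec(M):
    return M.reshape(-1, order="F")  # stack the columns of M

# vec(a b^T) = b ⊗ a   (2.57)
a, b = rng.standard_normal(3), rng.standard_normal(4)
assert np.allclose(vec(np.outer(a, b)), np.kron(b, a))
# vec(A B C) = (C^T ⊗ A) vec(B)   (2.58)
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))
```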
2.2.1 Norm
The absolute value of a scalar and the L2-norm of a vector are indicated by
either ∥·∥ or ∥·∥2. We reserve the notation |·| exclusively for the determinant of
a matrix. The absolute value of a scalar a is thus ∥a∥, and the norm of a vector
a is denoted as follows:
∥a∥ = √( Σm ∥(a)m∥² ) = √(a† a) .   (2.60)
2.2.2 Trace
The trace of a square matrix M ∈ Cm×m of size m is the sum of its diagonal
elements and is indicated by
tr{M} = Σm (M)m,m .   (2.63)
The trace of a matrix is invariant under a change of basis. The product of two
matrices commutes under the trace operation,
tr{A B} = tr{B A} .
This property can be extended to the product of three (or more) matrices such
that
tr{A B C} = tr{C A B} = tr{B C A} . (2.64)
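Note that the trace allows cyclic permutation, not arbitrary reordering. A quick NumPy check (our illustration) uses rectangular factors, so only the cyclic orderings are even defined.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# Cyclic invariance of the trace (2.64): the three products have
# different shapes (2x2, 4x4, 3x3) but identical traces
t1 = np.trace(A @ B @ C)
t2 = np.trace(C @ A @ B)
t3 = np.trace(B @ C @ A)
assert np.isclose(t1, t2) and np.isclose(t1, t3)
```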
2.2.3 Determinants
The determinant of a square matrix A is indicated by
|A| = Σn (A)m,n (−1)^(m+n) |Mm,n| ,   (2.65)
|c M| = c^m |M| .   (2.68)
|I| = 1 , (2.69)
|U| = 1 . (2.70)
|U A U† | = |U A| |U† |
= |U† U A|
= |A| . (2.71)
The product of matrices plus the identity matrix commute under the
determinant,
|I + A B| = |I + B A| , (2.72)
A = B† B   (2.75)
|A| = |B|* |B| = ∥ |B| ∥²
    ≤ Πm ∥bm∥²
    = Πm {A}m,m ,   (2.76)
M vm = λm vm .   (2.77)
While, in general, for some square matrices A and B the eigenvalues of the sum
do not equal the sum of the eigenvalues, for a shift by the identity matrix the
eigenvalues do shift simply:
λm{I + A} = 1 + λm{A} .   (2.81)
tr{M} = λ1 + λ2
      = a + b .   (2.83)
|M| = λ1 λ2
    = a b − ∥c∥² .   (2.84)
By combining these two results, the eigenvalues can be explicitly found. The
eigenvalues are given by
λ1 + λ2 = a + b
λ1² + λ1 λ2 = (a + b) λ1
0 = λ² − (a + b) λ + a b − ∥c∥²
λ = ( a + b ± √( (a + b)² − 4 (a b − ∥c∥²) ) ) / 2
  = ( a + b ± √( (a − b)² + 4 ∥c∥² ) ) / 2 .   (2.85)
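The closed form (2.85) can be checked against a numerical eigensolver; the sketch below (our illustration) builds the Hermitian matrix with diagonal entries a, b and off-diagonal entry c.

```python
import numpy as np

a, b, c = 2.0, 5.0, 1.0 + 0.5j
M = np.array([[a, c], [np.conj(c), b]])

# Closed-form eigenvalues from (2.85), listed in ascending order
disc = np.sqrt((a - b) ** 2 + 4 * abs(c) ** 2)
lam = np.array([(a + b - disc) / 2, (a + b + disc) / 2])

assert np.allclose(lam, np.linalg.eigvalsh(M))  # eigvalsh returns ascending order
```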
As will be discussed in Section 2.3.3, Hermitian matrices constructed from quadratic
forms are positive-semidefinite.
Q = U S V† , (2.86)
Q Q† = U S V† V S† U† = U S S† U† , (2.88)
λm {QQ† } = (S S† )m ,m ≥ 0 . (2.89)
where #{·} is used to indicate the number of entries that satisfy the condition.
2.3.4 QR decomposition
Another common matrix decomposition is the QR factorization. In this decomposition,
some matrix M is factored into a unitary matrix Q and an upper
right-hand triangular matrix R, where an upper right-hand triangular matrix
has nonzero entries only on or above the main diagonal,
M = Q R ,   (2.94)
where, for tall M ∈ Cm×n, the upper triangular matrix has dimensions R ∈ Cn×n,
and the zero matrix 0 stacked below it has dimensions (m − n) × n.
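NumPy's QR routine illustrates this stacked structure directly (our illustration; `mode="complete"` returns the square unitary Q together with the upper triangular block stacked on a zero block).

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 3))

Q, R = np.linalg.qr(M, mode="complete")  # Q is 5x5, R is 5x3

assert np.allclose(Q @ R, M)                    # M = Q R   (2.94)
assert np.allclose(Q.T @ Q, np.eye(5))          # Q is unitary (orthogonal here)
assert np.allclose(np.tril(R[:3, :], -1), 0.0)  # top block is upper triangular
assert np.allclose(R[3:, :], 0.0)               # zero block below
```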
M = MA + MA ⊥ . (2.96)
The matrix MA is the projection of M onto the column space of A, span(A),
MA = PA M .   (2.98)
rank{MA } ≤ m . (2.99)
P⊥A = I − PA
    = I − A (A† A)−1 A† .   (2.100)
We define the matrix MA⊥ to be the matrix projected onto the basis orthogonal
to A, MA⊥ = P⊥A M. Consequently, the matrix M can be decomposed into the
matrices
M = I M
  = (PA + P⊥A) M
  = MA + MA⊥ .   (2.101)
To illustrate, consider Figure 2.1. The projection matrix PA projects the vector
v onto a subspace that is spanned by the columns of the matrix A, which
is illustrated by the shaded region. The projected vector is illustrated by the
dashed arrow. The associated orthogonal projection P⊥A projects the vector v
onto the subspace orthogonal to that spanned by the columns of A, resulting in
the vector illustrated by the dotted arrow.
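A short NumPy sketch of this decomposition (our illustration): build P_A and its complement from (2.100), project, and confirm idempotence and orthogonality.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 2))   # basis whose span we project onto
M = rng.standard_normal((6, 4))

PA = A @ np.linalg.inv(A.T @ A) @ A.T   # projector onto span(A)
PAperp = np.eye(6) - PA                 # orthogonal projector (2.100)

MA = PA @ M
MAperp = PAperp @ M

assert np.allclose(PA @ PA, PA)        # projectors are idempotent
assert np.allclose(M, MA + MAperp)     # decomposition (2.101)
assert np.allclose(A.T @ MAperp, 0.0)  # MA⊥ is orthogonal to span(A)
```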
Rank-1 matrix
For example, a rank-1 square matrix M is constructed by using complex n-vectors
v ∈ Cn ×1 and w ∈ Cn ×1 ,
M = v w† . (2.104)
Rank-2 matrix
A Hermitian rank-2 matrix M can be constructed by using two n-vectors x ∈
Cn ×1 and y ∈ Cn ×1 ,
M = xx† + yy† . (2.105)
The eigenvalues can be found by using the hypothesis that the eigenvector is
proportional to x + ay where a is some undetermined constant. The nonzero
eigenvalues of M are given by λ+ and λ−,
λ±{M} = ( ∥x∥² + ∥y∥² ± √( (∥x∥² − ∥y∥²)² + 4 ∥x† y∥² ) ) / 2 .   (2.106)
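The formula (2.106) can be checked numerically (our illustration): the rank-2 matrix has n − 2 eigenvalues at zero and two nonzero eigenvalues given by λ±.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
y = rng.standard_normal(5) + 1j * rng.standard_normal(5)

M = np.outer(x, x.conj()) + np.outer(y, y.conj())  # x x† + y y†  (2.105)

nx2 = np.vdot(x, x).real                  # ||x||^2
ny2 = np.vdot(y, y).real                  # ||y||^2
disc = np.sqrt((nx2 - ny2) ** 2 + 4 * abs(np.vdot(x, y)) ** 2)
lam_pm = np.array([(nx2 + ny2 + disc) / 2, (nx2 + ny2 - disc) / 2])

# Compare with the two largest numerical eigenvalues
ev = np.sort(np.linalg.eigvalsh(M))[-2:]
assert np.allclose(np.sort(lam_pm), np.sort(ev))
```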
For a square nonsingular matrix, that is, a matrix with all nonzero eigenvalues
so that |M| ≠ 0, the matrix inverse of M satisfies
M−1 M = M M−1 = I .   (2.107)
The inverse of the product of nonsingular square matrices is given by
(A B)−1 = B−1 A−1 . (2.108)
The inverse and the Hermitian operations as well as the transpose operations
commute,
(M† )−1 = (M−1 )† and (MT )−1 = (M−1 )T . (2.109)
The SVD, discussed in Section 2.3.3, of the inverse of a matrix is given by

M^{−1} = (U D V†)^{−1}
       = V D^{−1} U† . (2.110)
It is often convenient to consider 2 × 2 matrices. Their inverse is given by

[a b; c d]^{−1} = (1/(ad − bc)) [d −b; −c a] . (2.111)

The general inverse of a partitioned matrix is given by

[A B; C D]^{−1} = [ (A − B D^{−1} C)^{−1}, −A^{−1} B (D − C A^{−1} B)^{−1} ;
                    −D^{−1} C (A − B D^{−1} C)^{−1}, (D − C A^{−1} B)^{−1} ] . (2.112)
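Both the 2 × 2 formula and the partitioned (block) inverse can be verified numerically; the block sizes in this sketch (not from the text) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# 2x2 inverse via the ad - bc formula.
a, b, c, d = rng.standard_normal(4)
inv2 = np.array([[d, -b], [-c, a]]) / (a * d - b * c)
assert np.allclose(inv2, np.linalg.inv(np.array([[a, b], [c, d]])))

# Block (partitioned) inverse built from Schur complements.
A = rng.standard_normal((2, 2)); B = rng.standard_normal((2, 2))
C = rng.standard_normal((2, 2)); D = rng.standard_normal((2, 2))
M = np.block([[A, B], [C, D]])

iA, iD = np.linalg.inv(A), np.linalg.inv(D)
SA = np.linalg.inv(A - B @ iD @ C)
SD = np.linalg.inv(D - C @ iA @ B)
Minv = np.block([[SA, -iA @ B @ SD], [-iD @ C @ SA, SD]])
assert np.allclose(Minv, np.linalg.inv(M))
```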
λm{I + M} = 1 + λm{M} , (2.121)

where the mth eigenvalue of a matrix is indicated by λm{·}, the capacity expression in Equation (2.120) is equal to

c = log2 ∏m (1 + λm{M})
  = Σm log2(1 + λm{M})
  = log2(e) Σm log(1 + λm{M})
  ≈ log2(e) Σm λm{M} = log2(e) tr{M} , (2.122)
and
{∂M/∂α}m,n = ∂{M}m,n/∂α , (2.125)

respectively.
A few useful expressions follow [217]. Under the assumption that the complex
vectors z and A are functions of α, the derivative of the quadratic form z† A z
with respect to the real parameter α is given by
∂/∂α (z† A z) = (∂z†/∂α) A z + z† (∂A/∂α) z + z† A (∂z/∂α) . (2.126)
The derivative for the complex invertible matrix M with respect to real param-
eter α can be found by considering the derivative of ∂/∂α(MM−1 ) = 0, and it
is given by
∂M^{−1}/∂α = −M^{−1} (∂M/∂α) M^{−1} . (2.127)
The derivatives of the determinant and the log determinant of a nonsingular
matrix M, with respect to real parameter α are given by
∂|M|/∂α = |M| tr{ M^{−1} ∂M/∂α } (2.128)
and
∂ log |M| / ∂α = tr{ M^{−1} ∂M/∂α } . (2.129)
The derivative of the trace of a matrix is equal to the trace of the derivative of
the matrix,
∂/∂α tr{M} = tr{ ∂M/∂α } . (2.130)
For a real column vector x of size N , the derivative of a scalar function f (x)
with respect to a real column vector x of length N is defined to be a row vector
given by
∂f(x)/∂x = [ ∂f(x)/∂{x}1   ∂f(x)/∂{x}2   · · ·   ∂f(x)/∂{x}N ] . (2.131)
This is the typical, but not the only, convention possible.
Under certain circumstances, it is convenient to use the gradient operator that
produces a vector or matrix of the same dimension as the object with which the
derivative is taken,
∇x f(x) = [ ∂f(x)/∂{x}1 ; ∂f(x)/∂{x}2 ; . . . ; ∂f(x)/∂{x}N ] , (2.132)

where the scalar function is indicated by f(·), and the gradient is with respect to the vector x ∈ R^{N×1}, and
∇A f(A) = [ ∂f(A)/∂{A}1,1  ∂f(A)/∂{A}1,2  · · ·  ∂f(A)/∂{A}1,N ;
            ∂f(A)/∂{A}2,1  ∂f(A)/∂{A}2,2  · · ·  ∂f(A)/∂{A}2,N ;
            . . . ;
            ∂f(A)/∂{A}M,1  ∂f(A)/∂{A}M,2  · · ·  ∂f(A)/∂{A}M,N ] , (2.133)

where the scalar function is indicated by f(·), and the gradient is with respect to the matrix A ∈ R^{M×N}.
The Laplacian operator [11] is given by
∇x2 f (x) = ∇x · ∇x f (x) . (2.134)
Note that the term “Laplacian” can be used to describe several different quan-
tities or operators. In the context of this book, in particular in Chapters 13 and
14, we also make reference to the Laplacian of a random variable, which is the
Laplace transform of the probability density function of the random variable.
In a Euclidean coordinate system, the Laplacian operator is given by

∇x² f(x) = Σ_{m=1}^{N} ∂²f(x)/∂{x}m² . (2.135)
and
∂/∂x (x^T A) = [ e1^T A ; e2^T A ; · · · ] = [ b1 ; b2 ; · · · ]
             = A . (2.146)
Another common expression is the quadratic form xT Ax. The derivative of the
quadratic form with respect to x is given by
∂/∂x (x^T A x) = [ e1^T A x   e2^T A x   e3^T A x   · · · ]
               + [ x^T A e1   x^T A e2   x^T A e3   · · · ]
               = x^T A^T + x^T A = x^T (A + A^T) . (2.147)
where x and y are real variables, the derivative of f with respect to z is given by

df/dz = lim_{z→z0} [ f(z) − f(z0) ] / (z − z0)
      = lim_{x→x0, y→y0} { [u(x, y) − u(x0, y0)] + i [v(x, y) − v(x0, y0)] } / { [x − x0] + i [y − y0] } . (2.150)
Because the path to z0 cannot matter for holomorphic functions, there is freedom
to approach the point at which the derivative is evaluated by moving along x or
along y. Consequently, the derivative can be expressed by
df/dz |_{z=z0} = lim_{x→x0} { [u(x, y0) − u(x0, y0)] + i [v(x, y0) − v(x0, y0)] } / (x − x0)
             = ∂u/∂x + i ∂v/∂x . (2.151)
With equal validity, the derivative can be taken along y, so that
df/dz |_{z=z0} = lim_{y→y0} { [u(x0, y) − u(x0, y0)] + i [v(x0, y) − v(x0, y0)] } / ( i [y − y0] )
             = (1/i) ( ∂u/∂y + i ∂v/∂y ) . (2.152)
In order for the derivative to be independent of direction, the real and imaginary
components of the derivative must be consistent, so the holomorphic function
must satisfy
∂u/∂x = ∂v/∂y
∂u/∂y = −∂v/∂x . (2.153)
These relationships are referred to as the Cauchy–Riemann equations.
g(z) = |z|²
     = z z*
     = (x + iy)(x − iy)
     = x² + y²
u(x, y) = x² + y² ,   v(x, y) = 0 . (2.154)
With this notation, a new set of real variables ζ and ζ̄ can be constructed with a transformation that is proportional to a rotation in the complex plane,

[ ζ ; ζ̄ ] = [ 1  i ; 1  −i ] [ x ; y ]
           = [ x + iy ; x − iy ] . (2.157)
Consequently, the real variables {ζ, ζ̄} can be directly related to the complex variable z and its complex conjugate z*. The real components of z can be found
[ x ; y ] = [ 1  i ; 1  −i ]^{−1} [ ζ ; ζ̄ ]
          = (1/2) [ 1  1 ; −i  i ] [ ζ ; ζ̄ ] , (2.158)
and
df/dζ̄ = (1/2) ( ∂f/∂x + i ∂f/∂y ) , (2.160)
where the terms z and z* in the expression f are replaced with the complex doppelgangers ζ and ζ̄, respectively. It is worth stressing that the complex doppelgangers are not complex variables. If great care is taken, one can use the
notation in which z and z ∗ are used as the complex doppelgangers directly. It
is probably clear that this approach is ripe for potential confusion because ·∗ is
both an operator and an indicator of an alternate variable. Furthermore, in us-
ing the Wirtinger calculus, we take advantage of underlying symmetries. While
taking a derivative with respect to a single doppelganger variable may be useful
for finding a stationary point (as evaluated in Equation (2.170)), it is not the
complete derivative. As an example, when the value of the gradient is of interest,
typically the full gradient with both derivatives is necessary.
This derivative form [5, 262, 172] is sometimes referred to as Wirtinger cal-
culus, or complex-real (CR) calculus. Given that this is just a derivative under
a change of variables, it is probably unnecessary to give the approach a name.
However, for notational convenience within this text, this approach is referenced
as the Wirtinger calculus. It is worth noting that this definition is not unique.
As an aside, the Cauchy–Riemann equations can be expressed by taking the
derivative with respect to the “conjugate” doppelganger variable,
∂f/∂ζ̄ = 0 . (2.161)
3 The term “doppelgangers” has not been in common use previously. It is used here to stress
the difference between the complex variable and its conjugate, and two real variables used
in their place.
∂g(x, y)/∂x = 0   and   ∂g(x, y)/∂y = 0 . (2.162)
When one is searching for a stationary point of a real function with complex parameter z, it is useful to “rotate” the independent real variables {x, y} into the space of the doppelganger complex variables {ζ, ζ̄}. The derivatives of the doppelganger variables satisfy

∂z/∂z = ∂z*/∂z* = 1
∂z/∂z* = ∂z*/∂z = 0 . (2.164)
For example, under Wirtinger calculus, the derivatives with respect to the dop-
pelganger variables of the expression z 3 z ∗ 2 are given by
∂/∂z ( z³ z*² ) = 3 z² z*² (2.165)

and

∂/∂z* ( z³ z*² ) = 2 z³ z* . (2.166)
This result is somewhat nonintuitive if you consider the meaning of z and z ∗ .
However, by remembering that here z and z ∗ represent real doppelganger vari-
ables, it is slightly less disconcerting. In particular, the stationary point of a
real function of complex variables expressed in terms of the conjugate variables
g̃(z, z ∗ ) satisfies
∂g̃(z, z*)/∂z = 0 (2.167)
and
∂g̃(z, z*)/∂z* = 0 . (2.168)
A case of particular interest is if f(z, z*) is real valued. In this case, the derivatives with respect to z and z* will produce the same stationary point,

∂f/∂z* = (1/2) ( ∂f/∂x + i ∂f/∂y )
       = [ (1/2) ( ∂f/∂x − i ∂f/∂y ) ]*
       = ( ∂f/∂z )* . (2.169)
In other words, the relationships
∂f/∂z* = 0   and   ∂f/∂z = 0 (2.170)
produce the same solution for z.
The derivative of the vector function f(z, z*) with respect to z is given by

∂f(z, z*)/∂z = [ ∂{f(z, z*)}n / ∂{z}m ] , (2.171)

the matrix whose (m, n) entry is the derivative of the nth element of f(z, z*) with respect to the mth element of z.
Similarly, the derivative with respect to z* is given by

∂f(z, z*)/∂z* = [ ∂{f(z, z*)}n / ∂{z*}m ] , (2.172)

the matrix whose (m, n) entry is ∂{f(z, z*)}n/∂{z*}m.
By using Wirtinger calculus, the differential of f is given by

df(z, z*) = (∂f(z, z*)/∂z) dz + (∂f(z, z*)/∂z*) dz* . (2.173)
constructed from real vectors x and y is most clearly defined by its derivation
in the real space, where the real gradient was discussed in Section 2.7.1. For
some real function f (z), the gradient of f (z) = f (x, y) is probably clearest when
expressed by building a vector from stacking x and y,
v = [ x ; y ] , (2.174)
so that the gradient is given by
∇v f (x, y) . (2.175)
This gradient can be remapped into a complex gradient by expressing the com-
ponents associated with y as being imaginary,
∇x f (x, y) + i ∇y f (x, y) . (2.176)
Some care needs to be taken in using this form because it can be misleading.
To evaluate a complex gradient, it is sometimes useful to evaluate it by using Wirtinger calculus. The problem with using the Wirtinger calculus to describe
the gradient is that it is not a complete description of the direction of maximum
change. There is some confusion in the literature in how to deal with this issue
[47]. Here we will first employ an explicit real gradient as the reference. Second,
we will abuse the notation of gradient slightly by defining a complete gradient
of a real function as being different from the Wirtinger gradient.
As an example, consider the function
f (z) = z† z
= xT x + y T y . (2.177)
The gradient is given by
∇x (xT x + yT y) + i ∇y (xT x + yT y) = 2x + i 2y . (2.178)
By interpreting Equations (2.159) and (2.160) as gradients, the complete gradient
of a real function in terms of the Wirtinger calculus is given by
∇x f (x, y) + i ∇y f (x, y) = 2 ∇z ∗ f (z) . (2.179)
For the above example, the complete gradient is then given by
∇x (z† z) + i ∇y (z† z) = 2∇z ∗ (z† z)
= 2z. (2.180)
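The complete gradient of a real function of a complex vector can be formed by stacking the real-variable gradients, as in Equation (2.176); the sketch below (not part of the text) does this by finite differences for f(z) = z†z and recovers 2z.

```python
import numpy as np

def f(z):
    return np.vdot(z, z).real     # f(z) = z^H z = ||z||^2

def complex_gradient(f, z, h=1e-6):
    # full gradient: grad_x f + i grad_y f, built from central differences
    g = np.zeros_like(z)
    for k in range(z.size):
        e = np.zeros_like(z)
        e[k] = 1.0
        dfdx = (f(z + h * e) - f(z - h * e)) / (2 * h)
        dfdy = (f(z + 1j * h * e) - f(z - 1j * h * e)) / (2 * h)
        g[k] = dfdx + 1j * dfdy
    return g

z = np.array([1.0 + 2.0j, -0.5 + 0.3j])
assert np.allclose(complex_gradient(f, z), 2 * z)   # matches 2 grad_{z*} (z^H z) = 2z
```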
For signal processing applications, two types of integral are used commonly:
contour integrals and volume integrals. A volume may be a simple area as in the
case of a single complex variable, or a hypervolume in the case of a vector space
of complex variables.
If the path is closed (forming a loop), then the term contour integral is often used
[54, 180, 40]. In contour integration, a particularly useful set of tools is available
if the function is differentiable, which is identified as a holomorphic or analytic
function with some countable number of poles. A pole is a point in the space of z
at which the function’s value goes to ± infinity. The integrals are often the result
of evaluating transformations of functions. If the path S forms a loop, then the
path is said to be closed and the notation is given by

∮_S dz f(z) . (2.182)
In order to deform an integral past a pole, a residue is left. The integral is then given by the sum of these residues created by deforming past the poles located at am enclosed within the original path,

∮_S dz f(z) = 2πi Σm Res_{am}{f(z)} , (2.186)
where Res_{am}{f(z)} indicates the residue of the function f(z) at the mth enclosed pole located at am. In general, f(z) can be expressed in terms of a Laurent series [53] about the mth pole am,

f(z) = Σ_{n=−∞}^{∞} bn (z − am)^n , (2.187)
Figure 2.2 Contour of integration using the upper half plane with poles at ±i.
when ω > 0. Similarly, if ω < 0, the lower path encloses the pole at z = −i, so

φ = π e^{−|ω|} . (2.195)
We have addressed the cases of poles enclosed by a path and poles outside
a path. In the case in which a path is constrained such that a pole is on the path, the residue contribution evaluates to 1/2 of what it would be if the pole were enclosed. Because of the potential subtleties involved in evaluating contour integrals, consulting a
complex analysis reference is recommended [53].
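The residue theorem can be illustrated numerically by integrating around a discretized circular contour; the integrand e^z/(z − 1), with a simple pole of residue e inside the contour, is an arbitrary example chosen for this sketch (not from the text).

```python
import cmath
import math

def contour_integral(f, radius=2.0, n=20000):
    # numerically integrate f around the circle |z| = radius (counterclockwise)
    total = 0.0 + 0.0j
    for k in range(n):
        t0 = 2 * math.pi * k / n
        t1 = 2 * math.pi * (k + 1) / n
        z0 = radius * cmath.exp(1j * t0)
        z1 = radius * cmath.exp(1j * t1)
        zm = radius * cmath.exp(1j * (t0 + t1) / 2)
        total += f(zm) * (z1 - z0)        # midpoint rule on the polygonal path
    return total

# f(z) = e^z / (z - 1): simple pole at z = 1 with residue e
f = lambda z: cmath.exp(z) / (z - 1)
integral = contour_integral(f)
# residue theorem: integral = 2 pi i * (sum of enclosed residues) = 2 pi i e
assert abs(integral - 2j * math.pi * math.e) < 1e-4
```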
Here the notation is borrowed from the Wirtinger calculus for dz and dz ∗ formally
acting as independent variables. These integrals are often the result of evaluating
probabilities. Often the slightly lazy notation of f (z) is used rather than f (z, z ∗ ).
Notationally, this form can be extended to the complex n-vector space z ∈ C^{n×1} by using the notation

∫ d^n z d^n z* f(z) , (2.197)

where d^n z and d^n z* are shorthand notation for d{z}1 d{z}2 · · · d{z}n and d{z}*1 d{z}*2 · · · d{z}*n.
In general, the integrals need to be converted to the real space of x and y for
evaluation. When convenient, the notation
d2 z = dx dy (2.198)
is employed. Also used is the notation dΩz to indicate the differential hypervolume over the real and imaginary components of z.

As an example, the integral

φ = ∫ dz dz* |z|² e^{−|z|²/σ²} / (2πσ²) , (2.207)

is given by

φ = ∫ dx dy (x² + y²) e^{−(x²+y²)/σ²} / (πσ²)
  = ∫ dx dy x² e^{−(x²+y²)/σ²} / (πσ²) + ∫ dx dy y² e^{−(x²+y²)/σ²} / (πσ²)
  = ∫ dx x² e^{−x²/σ²} / √(πσ²) + ∫ dy y² e^{−y²/σ²} / √(πσ²)
  = √π (σ²)^{3/2} / (2 √(πσ²)) + √π (σ²)^{3/2} / (2 √(πσ²)) = σ² . (2.208)
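The result φ = σ², the second moment of a circularly symmetric complex Gaussian, can be checked by Monte Carlo sampling over the real and imaginary parts; the sample size and seed below are arbitrary choices for this sketch (not from the text).

```python
import random
import math

random.seed(0)
sigma2 = 2.0          # complex variance: E|z|^2 = sigma^2
n = 200000

# Draw z = x + iy with x, y ~ N(0, sigma^2 / 2) and average |z|^2.
acc = 0.0
for _ in range(n):
    x = random.gauss(0.0, math.sqrt(sigma2 / 2))
    y = random.gauss(0.0, math.sqrt(sigma2 / 2))
    acc += x * x + y * y
estimate = acc / n

# Monte Carlo estimate of E|z|^2 should approach sigma^2
assert abs(estimate - sigma2) < 0.05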
Because x and y are often used to indicate variables other than the real and
imaginary parts of z, the notations zr and zi may sometimes be invoked, respec-
tively. Consequently, the differential real variable area dx dy would be indicated
by dΩz = dzr dzi .
4 In some of the literature, the normalization is defined for symmetry for the angular
frequency variable
∞
1
G̃(ω) = √ dt e−i ω t g(t) . (2.211)
2π −∞
The inverse transform is given by
∞
1
g(t) = √ dω ei ω t G̃(ω) . (2.212)
2π −∞
which also implies that the integral over the magnitude squared in either domain is the same,

∫ dt |g(t)|² = ∫ df |G(f)|² . (2.219)
Ts ≤ 1/B . (2.220)
It is worth noting that if the spectrum is known to be sparse, then compressive
sampling techniques [88] can be used that reduce the total number of samples,
but that discussion is beyond the scope of this text.
If the spectral content extends just a little beyond that supported by the Nyquist criterion, then the spectral estimate at the spectral edges will be contaminated by aliasing, in which spectral components at one edge extend beyond the estimated spectral limits and contaminate the estimates at the other end. A set
of regularly spaced samples is assumed here. This set may be of finite length.
The samples in the time domain (organized as a vector) are represented here by
(2.225)
This DFT matrix satisfies the unitary characteristics,
F−1 = F†
F† F = FF† = I . (2.226)
y = Fx
x = F† y . (2.227)
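The unitary characteristics of the DFT matrix in (2.226)-(2.227) can be verified directly; this sketch (not from the text) assumes the common unitary convention F_{jk} = e^{−2πijk/n}/√n, which may differ in sign or scaling from the book's definition.

```python
import numpy as np

n = 8
# Unitary DFT matrix: F[j, k] = exp(-2 pi i j k / n) / sqrt(n)  (assumed convention)
j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
F = np.exp(-2j * np.pi * j * k / n) / np.sqrt(n)

assert np.allclose(F.conj().T @ F, np.eye(n))   # F^H F = I
assert np.allclose(F @ F.conj().T, np.eye(n))   # F F^H = I

x = np.arange(n, dtype=float)
y = F @ x
assert np.allclose(F.conj().T @ y, x)           # x = F^H y inverts the transform
```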
Another transform that is often encountered is the Laplace transform, which can
be viewed as a generalization of the Fourier transform. The Laplace transform
of a function f(·) is defined as

L{f(·)}(s) = ∫ dx f(x) e^{−sx} , (2.228)
Figure 2.3 Surface plot of the objective function f(x1, x2) over the x1–x2 plane, with the path of the constraint indicated.
are shown in surface and contour plots, respectively. The minimum occurs at the
point marked by the dot in Figure 2.3.
As we trace out the path of the constraint function g(x) = 0 on the surface of
the objective function f (x), observe that the constraint function intersects the
contours of the objective function, except at stationary points (the point from
which the dashed arrow originates in Figure 2.4), where the constraint func-
tion just touches the contour of the objective function, but never crosses it. In
other words, the constraint function is tangent to the surface of the objective
function at all stationary points. Since the gradient is always perpendicular to
tangents, the gradient vector of the constraint function must be parallel to the gradient vector of the objective function at stationary points. Hence, if a point x̄ is a stationary point that satisfies the constraint
equation
g(x) = 0, (2.233)
the gradient vectors of the objective function f (x) and the constraint function
g(x) are parallel at x̄. In other words, the gradient vectors must be linearly
Figure 2.4 Surface plot of the objective function f(x1, x2) showing the constraint path and the tangent contour at the stationary point.
The term λ is known as the Lagrange multiplier. We can combine the two equa-
tions above by defining a function Λ(x̄, λ) as follows:
The gradient operator in the equation above is with respect to the elements of
x and λ. Taking the gradient with respect to the elements of x ensures that
the gradient vectors of the objective function f (x) and the constraint function
g(x) are parallel, that is to say, Equation (2.234) is satisfied. Taking the gradient
with respect to the Lagrange multiplier λ ensures that the constraint equation is
satisfied since taking the derivative of Equation (2.236) with respect to λ results
g1 (x) = 0 (2.237)
g2 (x) = 0
..
.
gK (x) = 0.
g1 (x) ≤ 0
g2 (x) ≤ 0
..
.
gM (x) ≤ 0 . (2.241)
If a point x̄ is a local minimum that satisfies the constraints, x̄ must satisfy the Karush–Kuhn–Tucker conditions, which are as follows.
There exist μ̄ = (μ̄1 , μ̄2 , . . . , μ̄M ) and λ̄ = (λ̄1 , λ̄2 , . . . λ̄K ) such that
∇f(x̄) + Σ_{m=1}^{M} μ̄m ∇gm(x̄) + Σ_{k=1}^{K} λ̄k ∇hk(x̄) = 0 ,
Note that x̄ must satisfy the constraints of the problem, which are known as the
primal feasibility constraints, since without them x̄ cannot be a solution to the
optimization problem:
gm (x̄) ≤ 0 for m = 1, 2, . . . M
hk (x̄) = 0 for k = 1, 2, . . . K . (2.243)
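For an equality-constrained problem, the stationarity and primal feasibility conditions often reduce to a solvable system. The example below is a sketch (not from the text): minimizing f(x) = x1² + x2² subject to x1 + x2 = 1 by solving the linear system obtained from the Lagrangian.

```python
import numpy as np

# minimize f(x) = x1^2 + x2^2  subject to  h(x) = x1 + x2 - 1 = 0.
# Stationarity of Lambda(x, lam) = f(x) + lam * h(x) gives
#   2*x1 + lam = 0
#   2*x2 + lam = 0
# and primal feasibility gives  x1 + x2 = 1.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(A, b)

assert np.allclose([x1, x2], [0.5, 0.5])   # the constrained minimum
assert np.isclose(x1 + x2, 1.0)            # primal feasibility
```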
x1 − x2 ≤ 0.
The arrows in the plot indicate the region of the x1 − x2 plane that satisfies the
boundary conditions. The global minimum is marked by the dot.
In this case, the Karush–Kuhn–Tucker conditions are given by
and
The feasibility conditions are satisfied by any point in the region indicated by
the arrows.
Observe that Equation (2.245) can be satisfied either if x̄1 − x̄2 = 0, i.e.,
the optimal point is at the boundary of the constraint function, or if μ̄ = 0, in
which case the optimal point is not on the boundary. If the point is not on the
boundary, then μ̄ = 0, and Equation (2.244) becomes
x2 − x1 ≤ 0 ,
and the objective function f (x1 , x2 ) is the same as before. Figures 2.6 and 2.7
illustrate this case, where the dot indicates the optimal point.
The Karush–Kuhn–Tucker conditions for this optimization problem are
and
Figure 2.6 Surface plot of the objective function f(x1, x2) over the x1–x2 plane.
with the feasibility constraints satisfied by points on the region of the x1 –x2
plane indicated by the arrows.
Note that at the minimum, Equation (2.248) is satisfied for any μ̄ because x̄2 − x̄1 = 0. Hence Equation (2.247) remains unchanged and identical to the Lagrange
multiplier technique for optimization with equality constraints for which the
optimal point is apparent from Figure 2.7. The global optimality of this point
follows from the convexity of f (x1 , x2 ).
6 Note that the general definition of a functional is a mapping from a vector space (the
space of functions is a vector space) to the real numbers.
Figure 2.7 Surface plot of the objective function f(x1, x2) over the x1–x2 plane with the optimal point indicated.
where the functional is g(·, ·, ·). We use the notation y′(x) to represent the derivative of y with respect to x, i.e.,

y′(x) = d y(x) / dx . (2.250)
For all continuous functions that maximize or minimize the quantity I in
Equation (2.249), the function y(x) must satisfy
dg/dy − (d/dx)(dg/dy′) = 0 . (2.251)
∫_a^b dx √( 1 + (dy(x)/dx)² ) . (2.252)
Figure: contour view of the x1–x2 plane with the line x1 − x2 = 0.
Figure 2.8 Shortest path between two points (a, y(a)) and (b, y(b)), determined by using calculus of variations.
I(y(·)) = ∫_a^b dx √( 1 + (dy(x)/dx)² ) . (2.253)
If the function y(·) minimizes I(y(·)), then it must satisfy Equation (2.251). Writing the Euler–Lagrange differential equation we have
0 = dg/dy − (d/dx) [ y′(x) / √( 1 + (y′(x))² ) ]
  = − y″(x) / √( 1 + (y′(x))² ) + (y′(x))² y″(x) / ( 1 + (y′(x))² )^{3/2}
  = − y″(x) / ( 1 + (y′(x))² )^{3/2} . (2.254)
Hence, the curve that minimizes the distance between the two points (a, y(a))
and (b, y(b)) must have a second derivative that equals zero for all x in [a, b]. If
the second derivative is zero in the interval, then for all x in [a, b], it must be the
case that
y′(x) = y′(a) . (2.255)
Integrating both sides with respect to x yields

y(x) = x y′(a) + A , (2.256)
where A is a constant. Hence, we have proved that if y(x) is the curve that
minimizes the distance between the points (a, y(a)) and (b, y(b)), it must be a
straight line. Note that technically, we haven’t proved that there is a curve that
minimizes the distance between those points, but assuming that there is such a
curve, we have shown that it is a straight line.
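The conclusion can be illustrated numerically by discretizing the arc-length functional (2.253): any perturbed curve with the same endpoints comes out longer than the straight line. This is a sketch, not from the text; the endpoints and perturbation are arbitrary choices.

```python
import math

def arc_length(ys, a=0.0, b=1.0):
    # discretized length of the curve through the points (x_i, y_i)
    n = len(ys) - 1
    dx = (b - a) / n
    return sum(math.hypot(dx, ys[i + 1] - ys[i]) for i in range(n))

n = 200
xs = [i / n for i in range(n + 1)]
line = [2.0 * x for x in xs]                                # straight line y = 2x
bent = [2.0 * x + 0.3 * math.sin(math.pi * x) for x in xs]  # same endpoints, perturbed

assert arc_length(line) < arc_length(bent)          # the straight line is shorter
assert abs(arc_length(line) - math.sqrt(5.0)) < 1e-3  # length of the segment (0,0)-(1,2)
```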
In order to prove that it is necessary for any function y(x) that minimizes Equation (2.249) to satisfy the Euler–Lagrange equation, we first observe that any function y(x) that minimizes Equation (2.249) must satisfy

I(y(·)) ≤ I(yε(·)) ,

where yε(x) is a perturbed version of y(x). The perturbed function yε(x) is given by

yε(x) = y(x) + ε h(x) .

Here h(x) is any other function that has the properties

h(a) = h(b) = 0 .

The last property implies that y(a) = yε(a) and y(b) = yε(b). Additionally, assume that h(x) is continuous and has a continuous derivative.
If y(x) minimizes I(y(·)), it must be the case that yε(x) at ε = 0 minimizes I(yε(·)). Hence, the derivative of I(yε(x)) with respect to ε must be zero. This requirement leads to

(d/dε) I(yε(x)) = (d/dε) ∫_a^b dx g(x, yε, y′ε(x)) . (2.257)
Moving the derivative into the integral, one finds that

(d/dε) I(yε(x)) = ∫_a^b dx (d/dε) g(x, yε, y′ε(x)) (2.258)
               = ∫_a^b dx [ (dg/dx)(dx/dε) + (dg/dyε)(dyε/dε) + (dg/dy′ε)(dy′ε/dε) ] . (2.259)

Substituting yε(x) = y(x) + ε h(x) yields

(d/dε) I(yε(x)) = ∫_a^b dx [ (dg/dyε) h(x) + (dg/dy′ε) h′(x) ] . (2.260)
Because y(x) = y0(x) minimizes I(y(·)), dI/dε = 0 at ε = 0. Therefore, at ε = 0, we have

∫_a^b dx [ (dg/dy) h(x) + (dg/dy′) h′(x) ] = 0 . (2.261)
We can then use integration by parts to rewrite the above equation as follows:

0 = ∫_a^b dx [ dg/dy − (d/dx)(dg/dy′) ] h(x) .
Since the function h(x) is an arbitrary continuous function with a continuous derivative, the above equation can hold only if

dg/dy − (d/dx)(dg/dy′) = 0 ,

which is the Euler–Lagrange differential equation.
for all x > X. In other words, the function g(x) grows at a faster rate with x than the function f(x). We say that f(x) is “little-O” of g(x), or f(x) = o(g(x)), when

f(x)/g(x) → 0 (2.263)
as x → ∞. We say that f (x) is “theta of” g(x), or f (x) = Θ(g(x)), if there exist
an X, and constants A1 and A2 such that for all x > X,
A1 g(x) ≤ f (x) ≤ A2 g(x). (2.264)
In other words f (x) and g(x) grow at the same rate for sufficiently large x.
Confusingly, it is common practice to use this notation to describe the order of
growth of functions when x is close to zero as well as when x is large as described
above. The little-O notation is most commonly used in this context, whereby we
say that f(x) is little-O of g(x), or f(x) = o(g(x)), when

f(x)/g(x) → 0 (2.265)
f (x)
as x → 0. A common application of the little-O notation is in writing Taylor
series expansions of functions for small arguments. For instance, one may write
the Taylor series expansion of log(1 + x) for small x as
log(1 + x) = x + o(x) . (2.266)
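The little-o statement can be checked numerically: the remainder log(1 + x) − x, divided by x, shrinks as x → 0. This sketch (not from the text) uses a few arbitrary sample points.

```python
import math

# log(1 + x) = x + o(x): the remainder vanishes faster than x as x -> 0
for x in [1e-1, 1e-2, 1e-3, 1e-4]:
    remainder = math.log(1 + x) - x
    assert abs(remainder) / x < x        # |o(x)| / x -> 0 (here it is about x/2)
```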
In this section, we briefly summarize some special functions that are often en-
countered in wireless communications in general and in this text in particular.
The integral requires analytic continuation to evaluate in the left half plane,
and the gamma function is not defined for non-positive integer real values of z.
For the special case of integer arguments the gamma function can be expressed
in terms of the factorial,
Γ(n) = (n − 1)! (2.268)
     = ∏_{m=1}^{n−1} m . (2.269)
Two related functions are the upper and lower incomplete gamma functions
defined respectively as
Γ(s, x) = ∫_x^∞ dτ τ^{s−1} e^{−τ} , (2.270)

and

γ(s, x) = ∫_0^x dτ τ^{s−1} e^{−τ} . (2.271)
For small x, the lower incomplete gamma function is approximated by

γ(s, x) = (1/s) x^s + o(x^s) . (2.272)
pFq(a1, . . . , ap; b1, . . . , bq; x) = Σ_{k=0}^{∞} [ ∏_{m=1}^{p} (am)k / ∏_{n=1}^{q} (bn)k ] x^k / k! , (2.273)

where the Pochhammer symbol is given by

(a)k = Γ(a + k)/Γ(a) = a(a + 1)(a + 2) · · · (a + k − 1) . (2.274)
From its definition, it can be observed that the hypergeometric series does not
exist for non-positive integer values of bn because it would result in terms with
zero denominators.
The hypergeometric series arises in a variety of contexts; in particular, the
Gauss hypergeometric function (p = 2, q = 1) has some special properties. For convenience of notation, let a = a1, b = a2, and c = b1. When the argument is unity, we have
2F1(a, b; c; 1) = Γ(c) Γ(c − a − b) / [ Γ(c − a) Γ(c − b) ] , (2.275)
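Gauss's value at unity can be checked by summing the series directly when it converges (c − a − b > 0); the parameter values below are arbitrary choices for this sketch (not from the text).

```python
import math

def hyp2f1_at_one(a, b, c, terms=100000):
    # partial sum of the Gauss series at x = 1 (converges for c - a - b > 0)
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((c + k) * (k + 1.0))
    return total

a, b, c = 0.5, 0.25, 3.0
lhs = hyp2f1_at_one(a, b, c)
rhs = (math.gamma(c) * math.gamma(c - a - b)
       / (math.gamma(c - a) * math.gamma(c - b)))
assert abs(lhs - rhs) < 1e-6
```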
and numerous other values which can be found in the literature. One particular
identity that is not widely available in the literature is the following, which
applies when p and q are integers with 0 < p < q:
2F1(1, p/q; 1 + p/q; x) = (p/q) Lerch(x, 1, p/q) (2.281)
                        = −(p/q) x^{−p/q} Σ_{k=0}^{q−1} ζq^{−kp} log( 1 − ζq^k x^{1/q} ) , (2.282)
where ζq = e^{2πi/q} is the qth root of unity and the Lerch transcendent is defined as
Lerch(x, s, a) = Σ_{k=0}^{∞} x^k / (a + k)^s , (2.283)

which satisfies the recurrence

Lerch(x, s, a) = x Lerch(x, s, a + 1) + 1/(a²)^{s/2} . (2.284)
The regularized incomplete beta function is given by

I(z; x, y) = B(z; x, y) / B(x, y) .
z = W(z) e^{W(z)} . (2.293)
For real arguments, the Lambert W function W (z) has only two real branches:
the principal branch W0 (z) and another branch that is simply denoted by W−1 (z).
For integer order, this form can be expressed in terms of the contour integral

Im(x) = i^{−m} (1/2πi) ∮_C dz z^{−m−1} e^{(ix/2)(z − 1/z)}
      = (1/2πi) ∮_C dz z^{−m−1} e^{(x/2)(z + 1/z)} , (2.299)

where the contour C encloses the origin. The modified Bessel function of the second kind is given by

Kα(x) = (π/2) i^{α+1} [ Jα(ix) + i Yα(ix) ] . (2.300)
Problems

2.1 Evaluate the following expressions.
(a) √(e^{−iπ})
(b) log4(1 + i)
(c) ∫_{−∞}^{∞} dx δ(x − 1) cosh²[π(x − 1)] / √(2 − x²)
(d) I4 a
(e) Γ(2)
2.3 For unit-norm complex vectors a and b, evaluate the following expressions.
(a) λm{I + a a† + b b†} if |a† b|² = 1/2
(b) tr{I + a a† + b b†}
(c) |I + a a† + b b†|
|I + A A† + B B† | ≥ |I + A A† | .
2.6 Evaluate the following integrals under the assumption that the closed contour is a circle of radius 10 centered at the origin.
(a) ∮ dz 1/[(z − 1)² z]
(b) ∮ dz 1/[(z − 20)² z]
(c) ∮ dz (z − 2)(z − 3)/[(z − 1)² z]
(d) ∮ dz z/(z² − 1)
(e) ∮ dz e^z/(z − 1)
2.7 Evaluate the following integrals where V indicates the entire volume spanned by the variables of integration.
(a) For real variables x and y
∫_{−∞}^{∞} dx ∫_{−∞}^{∞} dy (x² + y²) e^{−x²} e^{−y²}
(c) For the complex n-vector z
∫_V dΩz ‖z‖² e^{−‖z‖²}
2.9 By using the calculus of variations, find the shortest distance between a point on the zenith of a sphere (the north pole) and a point on the equator.
3 Probability and statistics
3.1 Probability
Depending upon the situation, the explicit dependency upon parameters may be suppressed. The cumulative distribution function (CDF) is the probability PX(x0) that some random variable X is less than or equal to some threshold x0,

PX(x0) = Pr{X ≤ x0}
       = ∫_{−∞}^{x0} dx pX(x; a) . (3.2)
For single variables, and pY(y) > 0, this relationship (Bayes’ theorem) can be written in the important form

pX(x|y) = pY(y|x) pX(x) / pY(y) . (3.4)
A useful interpretation of this form is to consider the random variable X as the
input to a random process that produces the random variable Y that can be
observed. Thus, the likelihood of a given value x for the random input variable
X is found given the observation y of the output distribution Y . Throughout
statistical signal processing research, a common source of debate is the use of
implicit and sometimes unstated priors in analysis. These priors can dramatically
affect the performance of various algorithms when exercised by using measured
data that often have contributions that do not match simple models.
where x = f^{−1}(y) indicates the inverse function of f(x), and the notation ·|_{x=x0} indicates evaluating the expression to the left with the value x0. However, it is not uncommon for the inverse to have multiple solutions. If the jth solution to the inverse at some value y is given by x = fj^{−1}(y), then the transformation of densities is given by

pY(y) = Σj pX(fj^{−1}(y)) / | ∂f(x)/∂x |_{x=fj^{−1}(y)} . (3.6)
where |∂f (x)/∂x| is the Jacobian associated with the two random vectors, and
the notation |.|x = x0 indicates that the absolute value of the quantity within
the bars is evaluated with the parameter x = x0 .
The mth central³ moment about the mean, indicated here by μm, is given by

μm = ∫ dx (x − ⟨X⟩)^m pX(x) . (3.10)

The variance is the second central moment,

σ² = ⟨(X − ⟨X⟩)²⟩
   = μ2 . (3.11)
Note that in situations where the random variable of concern is clear, we shall omit the subscript, denoting the variance simply as σ².
The skewness of random variable X is an indication of the asymmetry of a
distribution about its mean. It is given by the third central moment normalized
by the variance to the 3/2 power; thus, it is unitless,
skew{X} = ∫ dx (x − ⟨X⟩)³ pX(x) / (σ²)^{3/2}
        = μ3 / (σ²)^{3/2} . (3.12)
Finally, the kurtosis of random variable X is a measure of a distribution’s “peakiness.” It is given by the fourth central moment normalized by the variance squared; thus, it is unitless. The excess kurtosis is the ratio of the fourth cumulant
3 Central indicates that it is the fluctuation about the mean that is being evaluated.
Jensen’s inequality
Jensen’s inequality can be used to relate the mean of a function of a random
variable to the function of the mean of the random variable. Specifically, Jensen’s
inequality states that for every convex function f (·) of a random variable X,
μm = ⟨X^m⟩
   = ∫ dx x^m pX(x) . (3.16)

The mth moment is found by noting that the mth derivative with respect to t evaluated at t = 0 leaves only the mth term in a Taylor expansion of the exponential,

(∂^m/∂t^m) M(t; X) |_{t=0} = ⟨X^m⟩ . (3.18)
for which s is the transformed variable for x that corresponds to the angular
frequency. Note that the characteristic function of a random variable is essen-
tially the Fourier transform (see Section 2.10) of its probability density function.
The moment-generating function, on the other hand, is essentially the Laplace
transform (see Section 2.11) of the PDF evaluated at real values.
In terms of the central moments μm , the first few cumulants are given by
k1 = μ
k2 = μ2
k3 = μ3
k4 = μ4 − 3μ22
k5 = μ5 − 10μ2 μ3
k6 = μ6 − 15μ2 μ4 − 10μ23 + 30μ32 . (3.21)
If the random variables are independent, then the joint probability density function is equal to the product of the individual probability densities,

pX1,X2,...(x1, x2, . . .) = ∏m pXm(xm) . (3.23)
where dΩX , discussed in Section 2.9.2, is the notation for the measure and is
given by
Note that the measure is expressed in terms of the real and imaginary compo-
nents of the complex random variable. This convention is not employed univer-
sally, but will be assumed typically within this text. In the case of a real random
variable, the imaginary differentials are dropped.
The probability density function of a given set of random variables xm given
or conditioned on particular values for another set of variables yn is denoted by
pZ(z; μ, σ) dz dz* = (1/(πσ²)) e^{−|z−μ|²/σ²} dz dz* . (3.29)
pZ(Z; X, R) dΩZ = (1/(|R|^k π^{mk})) e^{−tr{(Z−X)† R^{−1} (Z−X)}} dΩZ , (3.30)
q = |z| (3.32)
then the probability density4 for the real Rayleigh variable Q is given by
$$p_{\mathrm{Ray}}(q)\, dq = \begin{cases} \dfrac{2q}{\sigma^2}\, e^{-q^2/\sigma^2}\, dq\,; & q \ge 0 \\[0.5ex] 0\, dq\,; & \text{otherwise,} \end{cases} \tag{3.33}$$
$$p_{\mathrm{Exp}}(x)\, dx = \begin{cases} \lambda\, e^{-\lambda x}\, dx\,; & x \ge 0 \\ 0\, dx\,; & x < 0\,. \end{cases} \tag{3.35}$$
$$P_{\mathrm{Exp}}(x) = \begin{cases} 1 - e^{-\lambda x}\,; & x \ge 0 \\ 0\,; & x < 0\,. \end{cases} \tag{3.36}$$
Its mean and variance are 1/λ and 1/λ², respectively.
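These two moments can be checked directly by numerical integration of the density (3.35); this is a small sketch using SciPy's quadrature, with an arbitrary illustrative rate.

```python
import math
from scipy.integrate import quad

lam = 2.5  # an arbitrary illustrative rate parameter

pdf = lambda x: lam * math.exp(-lam * x)                 # Eq. (3.35), x >= 0
mean, _ = quad(lambda x: x * pdf(x), 0, math.inf)
var, _ = quad(lambda x: (x - 1 / lam) ** 2 * pdf(x), 0, math.inf)
```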
$$q = \sum_{m=1}^{k} x_m^2\,. \tag{3.37}$$
4 This density assumes that the complex variance is given by σ 2 , which is different from a
common assumption that the variance is given for a real variable. Consequently, there are
some subtle scaling differences.
Since the random variables Xm have unit variance and zero mean,
$$\left\langle X_m^2 \right\rangle = 1\,. \tag{3.38}$$
The distribution pχ 2 (q) for the sum of the magnitude square q of k independent,
zero-mean, unit-variance real Gaussian random variables is given by
$$p_{\chi^2}(q; k)\, dq = \begin{cases} \dfrac{1}{2^{k/2}\,\Gamma(k/2)}\, q^{k/2-1}\, e^{-q/2}\, dq\,; & q \ge 0 \\[0.5ex] 0\, dq\,; & \text{otherwise,} \end{cases} \tag{3.39}$$
where Γ(·) is the standard gamma function, and γ(·, ·) is the lower incomplete
gamma function given by Equation (2.271).
Complex χ2 distribution
With a slight abuse in terminology, we define the complex χ2 distribution as
the distribution of the sum q of n independent complex, circularly symmetric
Gaussian random variables Zm with values zm . The sum q is given by [173]
$$q = \sum_{m=1}^{n} |z_m|^2 \tag{3.41}$$
$$\left\langle |Z_m|^2 \right\rangle = \sigma^2\,. \tag{3.42}$$
To be clear, the variance detailed here is in terms of the complex Gaussian vari-
able and we include the variance explicitly as a parameter of the distribution. By
employing Equation (3.5) and noting that the number of real degrees of freedom
is twice the number of complex degrees of freedom (k = 2n), the distribution
p_{Cχ²}(q; n, σ²) for the sum of the magnitudes squared q ≥ 0 is given by
$$\begin{aligned}
p_{C\chi^2}(q; n, \sigma^2)\, dq &= \frac{1}{\sigma^2/2}\; p_{\chi^2}\!\left(\frac{q}{\sigma^2/2};\, 2n\right) dq \\
&= \frac{1}{2^{n-1}\,\sigma^2\,\Gamma(n)} \left(\frac{2q}{\sigma^2}\right)^{n-1} e^{-q/\sigma^2}\, dq \\
&= \frac{q^{n-1}}{(\sigma^2)^n\, \Gamma(n)}\; e^{-q/\sigma^2}\, dq\,, \tag{3.43}
\end{aligned}$$
where it is assumed that the variance σ 2 of zm is the same for all m, and the
density is zero for q < 0. The cumulative distribution for q is given by
$$P_{C\chi^2}(q; n, \sigma^2) = \int_0^q dr\; p_{C\chi^2}(r; n, \sigma^2) = \frac{1}{\Gamma(n)}\, \gamma\!\left(n,\, \frac{q}{\sigma^2}\right). \tag{3.44}$$
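The closed form (3.44) can be sanity-checked by integrating the density (3.43) numerically and comparing against SciPy's regularized lower incomplete gamma function, which is exactly γ(n, x)/Γ(n). The parameter values below are illustrative choices.

```python
import math
from scipy.integrate import quad
from scipy.special import gammainc   # regularized lower incomplete gamma

n, sigma2 = 3, 2.0   # n complex degrees of freedom, complex variance sigma^2

def pdf(q):
    # Eq. (3.43): q^{n-1} e^{-q/sigma^2} / ((sigma^2)^n Gamma(n))
    return q ** (n - 1) * math.exp(-q / sigma2) / (sigma2 ** n * math.gamma(n))

q0 = 5.0
cdf_numeric, _ = quad(pdf, 0, q0)
cdf_closed = gammainc(n, q0 / sigma2)   # Eq. (3.44)
```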
$$X_m \sim N(\mu_m, 1)\,. \tag{3.45}$$
The random variable q is given by the sum of the magnitude squared of the real
independent Gaussian variables
$$q = \sum_{m=1}^{k} x_m^2 \tag{3.46}$$
$$\mu_m = \langle X_m \rangle \tag{3.47}$$
$$\sigma^2 = \left\langle (X_m - \mu_m)^2 \right\rangle = 1\,, \tag{3.48}$$
where here μm indicates the mth mean (and not the moment). The probability
density for q ≥ 0 is given by [174]
$$p_{\chi^2}(q; k, \nu) = \frac{1}{2} \left(\frac{q}{\nu}\right)^{k/4 - 1/2} e^{-(q+\nu)/2}\; I_{k/2-1}\!\left(\sqrt{\nu q}\right), \tag{3.49}$$
where Im (·) indicates the mth order modified Bessel function of the first kind
(discussed in Section 2.14.5), and the density is zero for q < 0. The noncentrality
parameter ν is given by the sum of the standard deviation normalized means,
$$\nu = \sum_{m=1}^{k} \mu_m^2\,. \tag{3.50}$$
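The density (3.49) matches SciPy's noncentral chi-squared distribution (`scipy.stats.ncx2`, parameterized by degrees of freedom `df` and noncentrality `nc`), which gives a quick cross-check; the numerical values below are illustrative.

```python
import math
from scipy.special import iv        # modified Bessel function of the first kind
from scipy.stats import ncx2

k, nu = 4, 2.5   # degrees of freedom and noncentrality (illustrative)
q = 3.0

# Eq. (3.49) evaluated directly
book_pdf = 0.5 * (q / nu) ** (k / 4 - 0.5) * math.exp(-(q + nu) / 2) \
           * iv(k / 2 - 1, math.sqrt(nu * q))
ref_pdf = ncx2.pdf(q, df=k, nc=nu)
```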
where QM (·, ·) is the Marcum Q-function discussed in Section 2.14.8, and γ(·, ·)
is the lower incomplete gamma function.
and is zero for q < 0. The cumulative distribution function for q ≥ 0 is given by
$$P_{C\chi^2}(q; n, \sigma^2, \nu^C) = \int_0^q dr\; p_{C\chi^2}(r; n, \sigma^2, \nu^C) = 1 - Q_n\!\left(\sqrt{\frac{2\nu^C}{\sigma^2}},\; \sqrt{\frac{2q}{\sigma^2}}\right). \tag{3.54}$$
3.1.13 F distribution
The F distribution is a probability distribution of the ratio of two independent
central χ2 distributed random variables. It is parameterized by two parameters
n1 and n2 and has a density function
$$p_F(x; n_1, n_2) = \begin{cases} \dfrac{n_1^{n_1/2}\, n_2^{n_2/2}\, x^{n_1/2 - 1}}{B\!\left(\frac{n_1}{2}, \frac{n_2}{2}\right) \left(n_1 x + n_2\right)^{\frac{n_1}{2} + \frac{n_2}{2}}}\,; & x \ge 0 \\[1.5ex] 0\,; & \text{otherwise.} \end{cases} \tag{3.55}$$
B(·, ·) here refers to the beta function defined in Section 2.14.3.
The cumulative distribution function of the F -distributed random variable is
given by
$$P_F(x; n_1, n_2) = I\!\left(\frac{n_1 x}{n_2 + n_1 x};\; \frac{n_1}{2},\; \frac{n_2}{2}\right), \tag{3.56}$$
where I(·; ·, ·) is the regularized beta function defined in Section 2.14.3.
$$y = |a + z| = \sqrt{(a + \Re\{z\})^2 + \Im\{z\}^2}\,. \tag{3.58}$$
The random variable y follows the Rice or Rician distribution whose probability
density function pR ice (y) is,
$$p_{\mathrm{Rice}}(y)\, dy = \begin{cases} \dfrac{2y}{\sigma^2}\, I_0\!\left(\dfrac{2ay}{\sigma^2}\right) e^{-(y^2 + a^2)/\sigma^2}\, dy\,; & y \ge 0 \\[0.5ex] 0\, dy\,; & \text{otherwise,} \end{cases} \tag{3.59}$$
where I0 (·) is the zeroth order modified Bessel function of the first kind (discussed
in Section 2.14.5). In channel phenomenology, it is common to describe this
distribution in terms of the Rician K-factor, which is the ratio of the coherent
to the fluctuation power,
$$K = \frac{a^2}{\sigma^2}\,. \tag{3.60}$$
It may be worth noting that the Rician distribution is often described in terms
of two real Gaussian variables. Consequently, the distribution given here differs
from the common form by replacing σ 2 with σ 2 /2.
The cumulative distribution function for a Rician variable for value greater
than zero is given by
$$\begin{aligned}
P_{\mathrm{Rice}}(y_0) &= \int_0^{y_0} dy\; p_Y(y) \\
&= \int_0^{y_0} dy\; \frac{2y}{\sigma^2}\, I_0\!\left(\frac{2ay}{\sigma^2}\right) e^{-(y^2 + a^2)/\sigma^2} \\
&= 1 - Q_{M=1}\!\left(\frac{a\sqrt{2}}{\sigma},\; \frac{y_0\sqrt{2}}{\sigma}\right), \tag{3.61}
\end{aligned}$$
where QM (ν, μ) is the Marcum Q-function discussed in Section 2.14.8. The dis-
tribution for the square of a Rician random variable q = y 2 is the complex
noncentral χ2 distribution with one complex degree of freedom.
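The identity (3.61) can be verified numerically. A Marcum Q-function is not exposed directly in older SciPy releases, but 1 − Q₁(α, β) equals the CDF of a noncentral chi-squared variable with 2 degrees of freedom and noncentrality α² evaluated at β², which is the relationship exploited below; the amplitude and variance values are illustrative.

```python
import math
from scipy.integrate import quad
from scipy.special import i0e     # exponentially scaled I_0, for stability
from scipy.stats import ncx2

a, sigma2 = 1.2, 0.8   # coherent amplitude and complex variance (illustrative)
y0 = 1.5

def rice_pdf(y):
    # Eq. (3.59); I_0(2ay/s2) e^{-(y^2+a^2)/s2} = i0e(2ay/s2) e^{-(y-a)^2/s2}
    return (2 * y / sigma2) * i0e(2 * a * y / sigma2) \
           * math.exp(-(y - a) ** 2 / sigma2)

cdf_numeric, _ = quad(rice_pdf, 0, y0)
# Eq. (3.61): 1 - Q_1(a*sqrt(2)/sigma, y0*sqrt(2)/sigma)
cdf_marcum = ncx2.cdf(2 * y0 ** 2 / sigma2, df=2, nc=2 * a ** 2 / sigma2)
```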
where G′_m is a real Gaussian random variable with the same statistics as G_m (with values g′_m and g_m, respectively). The probability density of the beta distribution p_β(x; j, k) is given by
$$p_\beta(x; j, k)\, dx = \frac{\Gamma(j+k)}{\Gamma(j)\,\Gamma(k)}\; x^{j-1}\, (1-x)^{k-1}\, dx\,, \tag{3.70}$$
and the corresponding CDF Pβ (x0 ; j, k) is given by
$$P_\beta(x_0; j, k) = \int_0^{x_0} dx\; p_\beta(x; j, k) = \frac{\Gamma(j+k)}{\Gamma(j)\,\Gamma(k)}\; B(x_0; j, k)\,, \tag{3.71}$$
where B(x; j, k) is the incomplete beta function that is discussed in Section
2.14.3.
Note that while the beta distribution can be used to describe the retained
fraction of the norm square of a (k + j)-dimensional Gaussian random vector
projected onto a k-dimensional space, the beta distribution is more general than
that. As such, the parameters j and k could be non-integers as well.
the log-normal random variable X with value x, the probability density function
is given by
$$p_{\log\mathrm{Norm}}(x; \mu, \sigma^2)\, dx = \frac{1}{x\sqrt{2\pi\sigma^2}}\; e^{-\frac{(\log x - \mu)^2}{2\sigma^2}}\, dx\,. \tag{3.72}$$
The cumulative distribution function is given by
$$P_{\log\mathrm{Norm}}(x_0; \mu, \sigma^2) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{\log x_0 - \mu}{\sigma\sqrt{2}}\right)\right], \tag{3.73}$$
where erf(·) is the error function discussed in Section 2.14.6.
$$p_Z(z) = \int dx\, dy\; p_{X,Y}(x, y)\, p_Z(z|x, y) = \int dx\, dy\; p_X(x)\, p_Y(y)\, p_Z(z|x, y)\,, \tag{3.74}$$
where pX ,Y (x, y) is the joint probability density for x and y, pZ (z|x, y) is the
probability density of z conditioned upon the values of x and y, and x and y are
assumed to be independent. Because z = x + y, the conditional probability is
simple, given by
where δ(·) is the Dirac delta function. Consequently, the distribution for Z is
given by
$$p_Z(z) = \int dx\, dy\; p_X(x)\, p_Y(y)\, \delta(x + y - z) = \int dx\; p_X(x)\, p_Y(z - x)\,, \tag{3.76}$$
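As a concrete instance of the convolution formula (3.76), the sum of two independent exponential variables with the same rate has the Gamma(2) density λ²z e^(−λz); the sketch below checks this by numerical integration (the rate is an illustrative choice).

```python
import math
from scipy.integrate import quad

lam = 1.5   # illustrative rate for two i.i.d. exponential variables

def p_exp(x):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

z = 2.0
pz_numeric, _ = quad(lambda x: p_exp(x) * p_exp(z - x), 0, z)  # Eq. (3.76)
pz_closed = lam ** 2 * z * math.exp(-lam * z)                  # Gamma(2) density
```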
z = xy. (3.80)
$$p_Z(z) = \int dx\, dy\; \frac{e^{-\frac{x^2}{2\sigma_x^2}}}{\sqrt{2\pi\sigma_x^2}}\; \frac{e^{-\frac{y^2}{2\sigma_y^2}}}{\sqrt{2\pi\sigma_y^2}}\; \delta(xy - z) = \frac{1}{\pi\, \sigma_x \sigma_y}\; K_0\!\left(\frac{|z|}{\sigma_x \sigma_y}\right), \tag{3.82}$$
where K0 (·) is the modified Bessel function of the second kind of order zero
discussed in Section 2.14.5.
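A quick numerical check that (3.82) is a proper density: the K₀ form integrates to one over the real line. The standard deviations below are illustrative; the integral is split at z = 1 to keep the quadrature away from the integrable logarithmic singularity at the origin.

```python
import math
from scipy.integrate import quad
from scipy.special import k0

sx, sy = 1.3, 0.7   # standard deviations of the two Gaussian factors

def p_prod(z):
    return k0(abs(z) / (sx * sy)) / (math.pi * sx * sy)   # Eq. (3.82)

# The density is even in z, with an integrable log singularity at z = 0.
half1, _ = quad(p_prod, 0, 1)
half2, _ = quad(p_prod, 1, math.inf)
total = 2 * (half1 + half2)
```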
What does it mean for a random variable to converge? While the convergence
of a sequence of deterministic variables to some limit is straightforward, conver-
gence of random variables to limits is more complicated due to their probabilistic
nature. In the following, we define several different modes of convergence of ran-
dom variables, starting with modes of convergence that are typically viewed as
stronger modes of convergence followed by weaker ones. The proofs of these prop-
erties can be found in standard probability texts such as References [131] and
[171].
Almost-sure convergence
We say that Xn converges with probability 1, or almost surely to a random
variable X, if
$$\Pr\left\{\lim_{n\to\infty} X_n = X\right\} = 1\,. \tag{3.83}$$
Almost-sure convergence simply means that the event that Xn fails to converge
to X has zero probability.
This mode of convergence simply means that the mean of the squared deviation
of Xn from its limiting value X goes to zero. Note that almost-sure convergence
does not in general imply convergence in mean square or vice versa. However,
suppose that the sum of the mean-square deviations of Xn and its limiting value
X is finite, i.e.,
$$\sum_{n=1}^{\infty} \left\langle |X_n - X|^2 \right\rangle < \infty\,, \tag{3.89}$$
In other words, convergence in the kth mean of a random variable implies con-
vergence in all lower-order means as well.
Convergence in probability
We say that Xn converges in probability to X if for every ε > 0,
Convergence in probability simply means that the probability that the random
variable Xn deviates from X by any positive amount goes to zero as n → ∞.
Convergence in distribution
We say that Xn converges in distribution to a random variable X if the cumu-
lative density functions of Xn converge to the cumulative density function of X
as n → ∞. In other words,
General relationships
Convergence of a random variable Xn to a random variable X with probability
1 or almost surely implies that the random variable Xn converges in probability
to X as well, since convergence in probability is a weaker notion of convergence
than convergence with probability 1. Similarly, the convergence of Xn to X in
quadratic mean implies convergence in probability of Xn to X. Since convergence
in distribution is weaker than convergence in probability, convergence of random
variables Xn to X in probability implies convergence in distribution of Xn to X.
In other words, the cumulative distribution function of Xn converges to that of
X if Xn converges to X in probability. Mathematically, these relationships can
be written as follows
$$X_n \xrightarrow{\text{a.s.}} X \implies X_n \xrightarrow{P} X \tag{3.99}$$
$$X_n \xrightarrow{\text{q.m.}} X \implies X_n \xrightarrow{P} X \tag{3.100}$$
$$X_n \xrightarrow{P} X \implies X_n \xrightarrow{D} X\,. \tag{3.101}$$
Note that if f(X) is bounded in addition to being continuous, i.e., |f(X)| < A for some finite constant A, in addition to the property above we also have
$$f(X_n) \xrightarrow{D} f(X) \implies X_n \xrightarrow{D} X\,. \tag{3.105}$$
this case, it can be shown that convergence in probability holds as well. More
formally, suppose that A is a constant, then
$$X_n \xrightarrow{D} A \implies X_n \xrightarrow{P} A\,. \tag{3.106}$$
Note that the above property does not hold in general for convergence in distri-
bution. However, we have the following property, known as Slutsky’s theorem,
which applies when one of the sequences of variables converges to a constant A,
$$\text{If } X_n \xrightarrow{D} X \text{ and } Y_n \xrightarrow{D} A, \text{ then } X_n + Y_n \xrightarrow{D} X + A\,. \tag{3.110}$$
Note that even almost-sure convergence does not imply convergence of means
in general. In other words, even if a sequence of random variables Xn converges
with probability 1 to a random variable X, it is not necessarily the case that
the expected values of the Xn, i.e., ⟨Xn⟩, converge to ⟨X⟩. The reason for this
apparent paradox is that convergence of the mean of a random variable depends
on the rate at which the probabilities associated with that random variable
converge, whereas convergence in probability or almost-sure convergence do not
depend on the rate of convergence of the probabilities associated with the random
variables. A simple example that is often given in textbooks on probability is
the following.
Let the random variable Xn take on the following values,
$$X_n = \begin{cases} n & \text{with probability } \frac{1}{n} \\[0.5ex] 0 & \text{with probability } 1 - \frac{1}{n}\,. \end{cases} \tag{3.111}$$
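The point of the example (3.111) can be made explicit with a couple of lines of arithmetic: the mean of Xn is pinned at 1 for every n, while the probability that Xn is nonzero shrinks to zero, so Xn converges to 0 in probability without its mean converging to 0.

```python
# Eq. (3.111): X_n = n with probability 1/n, else 0.
means, probs_nonzero = [], []
for n in (10, 1000, 100000):
    means.append(n * (1 / n))       # <X_n> = n * (1/n) = 1 for every n
    probs_nonzero.append(1 / n)     # Pr{X_n != 0} = 1/n -> 0
```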
The contents of the expectation operator in (3.112) are nonzero only for suitably
large values of Xn , i.e. |Xn | > ν. In other words, the expectation is averaging
only “large” values of Xn . The supremum over n outside the expectation looks for
the value of n for which the average value of |Xn | when small values of |Xn | are
forced to zero is largest. Finally, ν is taken to infinity which means that the values
of Xn that are not zeroed in the averaging operation get successively larger.
This property ensures that the mean value of Xn converges at the correct rate
such that convergence in probability implies convergence of the means. We then
have the following property, which states that the absolute value of the deviation
of Xn from X converges to zero, if and only if Xn converges in probability to a
random variable X, and Xn is uniformly integrable. Mathematically, this can be
expressed as
$$X_n \xrightarrow{P} X \text{ and } X_n \text{ is uniformly integrable} \iff \left\langle |X_n - X| \right\rangle \to 0\,. \tag{3.113}$$
$$|X_n| \le W \quad \forall\, n \tag{3.114}$$
$$\langle W \rangle < \infty \tag{3.115}$$
and
$$X_n \xrightarrow{P} X\,, \tag{3.116}$$
then
$$\left\langle |X_n - X| \right\rangle \to 0\,. \tag{3.117}$$
While random variables are mappings from an underlying space of events to real numbers, random processes can be viewed as mappings from an underlying space of events onto functions. Random processes are essentially random functions and
are useful for describing transmitted signals when the underlying signal sources
are nondeterministic.
Figure 3.1 illustrates a random process X(t) in which elements in an underlying
space of events (which need not be discrete) map onto functions. The set of all possible functions that X(t) can take is called the ensemble of realizations of the process X(t).

[Figure 3.1: x1(t) and x2(t) are possible realizations of the process X(t).]
Note that X(t) for any particular t is simply a random variable. A complete
statistical characterization of a random process requires the description of the
joint probability densities (or distributions), of the random variables X(t) for
all possible values of t. Note that since t is in general uncountable and hence
cannot be enumerated, the joint distribution in general needs to be specified for
a continuum of values of t.
In general, the joint density for all possible t is very difficult to obtain for real-
world signals. If we restrict ourselves to ergodic processes, that loosely speaking,
are random processes for which single realizations of the process contain the
statistical properties of the entire ensemble, it is possible to estimate certain
statistical properties of the ensemble from a single realization of the process. Of
particular interest are the second-order statistics of ergodic random processes,
which are the mean function
It is also possible to define the cross correlation between two random processes
X(t) and Y (t) as follows:
Note that the expectations above are taken with respect to the ensemble of
possible realizations of the processes X(t) and Y (t), jointly.
$$\langle X(t) \rangle = \mu \tag{3.121}$$
$$R_X(\tau_1, \tau_2) = \left\langle X\big((\tau_1 - \tau_2) + t\big)\, X^*(t) \right\rangle \quad \forall\, t\,. \tag{3.122}$$
Two processes are jointly wide-sense stationary if they are each wide-sense sta-
tionary and their cross correlation is just a function of the time lag. The cross
correlation of wide-sense stationary processes is usually written with a single
index as follows:
Observe here that the power-spectral density is the Fourier transform of the
autocorrelation function.
Wide-sense stationary processes are good approximations for many nondeter-
ministic signals encountered in the real world, including white noise. Addition-
ally, the effect of linear-time-invariant (LTI) systems on wide-sense stationary
random processes can be characterized readily.
[Figure: a linear time-invariant system h(t) with input x(t) and output y(t).]
in many scenarios observing the output signal y(t) in response to one realization
of x(t) may not be very useful to characterize the behavior of the LTI system.
Much more meaningful results can be obtained by characterizing the second-
order statistics of X(t) and Y (t).
Suppose that h(t) is absolutely integrable, i.e.,
$$\int_{-\infty}^{\infty} dt\; |h(t)| < \infty\,.$$
Then it can be shown that X(t) and Y (t) are jointly wide-sense stationary,
provided that X(t) is wide-sense stationary. The cross-correlation function can
be found as follows:
$$\begin{aligned}
R_{YX}(\tau) = \left\langle Y(t+\tau)\, X^*(t) \right\rangle &= \left\langle \int_{-\infty}^{\infty} d\alpha\; h(\alpha)\, X(t+\tau-\alpha)\, X^*(t) \right\rangle \\
&= \int_{-\infty}^{\infty} d\alpha\; h(\alpha) \left\langle X(t+\tau-\alpha)\, X^*(t) \right\rangle \\
&= \int_{-\infty}^{\infty} d\alpha\; h(\alpha)\, R_X(\tau - \alpha) \\
&= \left(h * R_X\right)(\tau)\,. \tag{3.126}
\end{aligned}$$
Note that the expectation can be taken into the integral because h(t) is absolutely
integrable. Using a similar set of steps, it can be shown that
$$R_{XY}(\tau) = \left(\overleftarrow{h} * R_X\right)(\tau)\,, \tag{3.127}$$
where $\overleftarrow{h}(t) = h(-t)$ is a time-reversed version of h(t). Similarly, it can be shown that
$$R_Y(\tau) = \left(h * \overleftarrow{h} * R_X\right)(\tau)\,. \tag{3.128}$$
RN (τ ) = N0 δ(τ ) . (3.129)
Note that N0 here is the value of the power spectral density of the white-noise
process since the power-spectral density is simply the Fourier transform of the
autocorrelation. Hence, white-noise processes have a flat PSD since the Fourier
transform of an impulse is a constant. This fact implies that white-noise processes
have infinite bandwidth and so infinite power. In practice, however, all systems
have limited bandwidth and the observed noise is not white.
Additionally, zero-mean, white-noise processes are uncorrelated at different
time samples since
The last step follows from the fact that S(f ) is a constant for all f .
Note that by taking the Fourier transform of Equation (3.128), we find
$$S_Y(f) = |H(f)|^2\, S_X(f)\,, \tag{3.132}$$
where H(f ) is the Fourier transform of h(t). Hence, if a white-noise process N (t)
is filtered through a band-pass filter, the resulting output is no longer white and
may have finite variance.
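The relationships (3.128) and (3.132) have exact discrete-time analogues that can be checked mechanically: for a unit-variance white input, the output autocorrelation is the convolution of h with its time reverse, and its DFT equals |H(f)|². The FIR taps and DFT length below are arbitrary illustrative choices.

```python
import numpy as np

h = np.array([1.0, 0.5, -0.25])   # an arbitrary FIR impulse response
N = 16                            # DFT length
L = len(h)

# Discrete analogue of Eq. (3.128) with R_X = delta (unit-variance white input):
# R_Y[tau] = (h * h_reversed)[tau], lags -(L-1) .. (L-1)
r_y = np.convolve(h, h[::-1])

# Place lag 0 at index 0 (negative lags wrap around), then take the DFT.
r_pad = np.zeros(N)
r_pad[:2 * L - 1] = r_y
S_y = np.fft.fft(np.roll(r_pad, -(L - 1)))

# Discrete analogue of Eq. (3.132): S_Y = |H|^2 since S_X = 1
H = np.fft.fft(h, N)
```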
Perhaps the most commonly used wide-sense stationary random process in the
analysis of wireless communication systems is the white-Gaussian-noise (WGN)
process. The white-Gaussian-noise process is a white-noise process with zero
mean and amplitude distributed as a Gaussian random variable. Since white-
noise processes are uncorrelated at different time instances and uncorrelated
Gaussian random variables are also independent, samples of a white-Gaussian-
noise process at different time instances are independent random variables.
As an example, consider a zero-mean white-Gaussian-noise process N (t) with
power-spectral-density N0 that is filtered through an ideal low-pass filter with
cut-off frequency of ±W and unit height in the pass band. The variance of the
output of the low-pass filter can then be found as follows. Let the output of the
filter be Y (t). Then the variance of Y (t) is
$$\left\langle |Y(t)|^2 \right\rangle = R_Y(0) = \int_{-\infty}^{\infty} df\; S_Y(f) = \int_{\text{pass band}} df\; N_0 = 2\, W N_0\,. \tag{3.133}$$
The homogeneous Poisson process is a Poisson process for which λ(t) = λ, i.e.,
the mean number of arrivals in any interval is simply proportional to the length
of the interval. The following are the main characteristics of a homogeneous
Poisson process with intensity λ.
(1) The numbers of arrivals in disjoint intervals are independent random vari-
ables.
(2) The number of arrivals in a duration τ is a Poisson random variable with
parameter λτ , i.e., the probability mass function is
$$\Pr\{k \text{ arrivals in any interval of length } \tau\} = \frac{(\lambda\tau)^k}{k!}\; e^{-\lambda\tau}\,. \tag{3.135}$$
(3) The time between two consecutive arrivals is an exponential random variable with mean 1/λ.
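Property (2) above is just the Poisson probability mass function with parameter λτ, which can be checked against SciPy's implementation; the intensity, interval length, and count below are illustrative.

```python
import math
from scipy.stats import poisson

lam, tau = 2.0, 1.5   # intensity and interval length (illustrative)
k = 3

# Eq. (3.135): (lam*tau)^k e^{-lam*tau} / k!
p_formula = (lam * tau) ** k * math.exp(-lam * tau) / math.factorial(k)
p_ref = poisson.pmf(k, lam * tau)
```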
The Poisson process can also be defined in Rd , where it is referred to as a
Poisson point process (PPP). The defining characteristic of the Poisson point process is that the numbers of points in disjoint subsets of R^d are independent random variables. The number of points in any subset B ⊂ R^d is a Poisson random variable with mean
$$\int_B dx\; \lambda(x)\,. \tag{3.136}$$
to the nearest, second-nearest user, and so forth, are useful in the analysis of
wireless networks. The probability density function of the distance rk between
an arbitrary point to the kth nearest point of a two-dimensional Poisson point
process can be found as follows [210]. Suppose that the point is the origin, then
While the eigenvalues of M are random for finite values of m and n, the dis-
tribution of eigenvalues converges to a fixed distribution (the Marcenko–Pastur
distribution) as m and n approach ∞ [347, 206, 168, 286, 323, 259, 315, 33]. Be-
cause M grows to be infinite in size, there are correspondingly an infinite number
of eigenvalues for M. The technical tools to develop the resulting eigenvalue dis-
tribution are discussed in the following section (Section 3.6.1). The distribution
for the eigenvalues is given by the sum of a continuous probability distribution
fr (λ) and a discrete point at zero weighted by cr :
where
$$a_r = \left(\sqrt{r} - 1\right)^2\,, \qquad b_r = \left(\sqrt{r} + 1\right)^2\,. \tag{3.146}$$
$$z\, m(z) + 1 = m(z)\; c \int_0^{\infty} \frac{\tau}{1 + \tau\, m(z)}\; dH(\tau)\,. \tag{3.151}$$
Note that Equation (3.145) can be found by setting dH(τ ) equal to the Dirac
measure at 1 and solving Equation (3.151). Also, the Stieltjes transform of dφ(t)
z = Hs + n. (3.153)
The previous equation can be derived by starting with the PDF of the conditional
probability in Equation (3.154),
$$p(z|s) = \frac{1}{\pi^n\, |R|}\; \exp\left\{ -(z - Hs)^\dagger R^{-1} (z - Hs) \right\}. \tag{3.156}$$
By exploiting the Wirtinger calculus discussed in Section 2.8.2, setting the deriva-
tive to zero yields,
$$\begin{aligned}
\frac{d}{d s^\dagger}\, (z - Hs)^\dagger R^{-1} (z - Hs) &= 0 \\
H^\dagger R^{-1} H\, s - H^\dagger R^{-1} z &= 0 \\
s &= \left(H^\dagger R^{-1} H\right)^{-1} H^\dagger R^{-1} z\,. \tag{3.158}
\end{aligned}$$
We have assumed here that H† R⁻¹ H is positive-definite (and hence invertible), even though it is only guaranteed to be non-negative-definite by virtue of the fact that R is a covariance matrix.
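The estimator (3.158) is a few lines of linear algebra in practice; the sketch below applies it to a noiseless observation (so the estimate is exact), with an arbitrary random channel and white noise covariance as illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rx, n_tx = 6, 2
H = rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))
R = np.eye(n_rx)                         # white noise covariance for simplicity
s_true = np.array([1.0 + 1.0j, -0.5j])

z = H @ s_true                           # noiseless observation for the check
Ri = np.linalg.inv(R)
# Eq. (3.158): s_hat = (H^† R^{-1} H)^{-1} H^† R^{-1} z
s_hat = np.linalg.solve(H.conj().T @ Ri @ H, H.conj().T @ Ri @ z)
```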
Vector detection
Consider a system described by Equation (3.154), but where the vector s can only take on one of the values s1, s2, . . . , sK. Given the noisy observation z, we wish
to detect which of the possible s vectors was actually present such that the
probability of making an error Pe is minimized, where
$$P_e = \Pr\{\hat{s} \ne s\}\,. \tag{3.159}$$
For a given observation z, the probability of error is minimized if the estimated
value ŝ is such that the conditional probability of s given z is maximized. That
is to say, the minimum probability of error estimator of s is
$$\hat{s} = \arg\max_{s' \in \{s_1, s_2, \ldots, s_K\}} p\left(s = s' \,\middle|\, z\right). \tag{3.160}$$
To illustrate this problem, it is instructive to consider the case when the random vector of interest can take one of two possible values, i.e., K = 2. We can write
the conditional probability above using Bayes rule discussed in Section 3.1.7 as
$$p(s|z) = \frac{p(z|s)\, p(s)}{p(z)}\,. \tag{3.161}$$
Thus, ŝ = s1 if
$$\begin{aligned}
\frac{p(z|s = s_1)\, p(s = s_1)}{p(z)} &\ge \frac{p(z|s = s_2)\, p(s = s_2)}{p(z)} \\[0.5ex]
\frac{p(z|s = s_1)}{p(z|s = s_2)} &\ge \frac{p(s = s_2)}{p(s = s_1)}\,. \tag{3.162}
\end{aligned}$$
The quantity on the left-hand side is known as the likelihood ratio.
Assuming that s1 and s2 are equally likely and that n ∼ CN (0, R), we can
write Equation (3.162) as
$$\frac{1}{\pi^n |R|}\, \exp\left\{-(z - Hs_1)^\dagger R^{-1} (z - Hs_1)\right\} \ge \frac{1}{\pi^n |R|}\, \exp\left\{-(z - Hs_2)^\dagger R^{-1} (z - Hs_2)\right\}. \tag{3.163}$$
If the noise samples in the vector n are uncorrelated and have equal variance,
R = σ 2 I and the expression above becomes
$$\|z - Hs_1\|^2 \le \|z - Hs_2\|^2\,. \tag{3.165}$$
$$\begin{aligned}
&= \Pr\left\{ 2\,\Re\!\left\{ (s_1 - s_2)^\dagger H^\dagger n \right\} \ge \|H(s_1 - s_2)\|^2 \right\} \\
&= \Pr\left\{ v \ge \|H(s_1 - s_2)\|^2 \right\}, \tag{3.166}
\end{aligned}$$
where $v \sim N\!\left(0,\; 2\sigma^2 \|H(s_1 - s_2)\|^2\right)$. Hence Equation (3.166) evaluates to
$$\Pr\{\hat{s} = s_2 \,|\, s = s_1\} = Q\left(\frac{\|H(s_1 - s_2)\|}{\sqrt{2\sigma^2}}\right), \tag{3.167}$$
which by symmetry (because s1 and s2 are equally likely) equals the probability
of error.
Extending this analysis to systems with a larger number of possible values of
the vector s, i.e., K ≥ 2 (for a general R), yields the following expression for
the minimum probability of error estimator for s when s is uniformly distributed
among s1 , s2 , . . . sK ,
$$\hat{s} = \arg\min_{s' \in \{s_1, s_2, \ldots, s_K\}} (z - Hs')^\dagger R^{-1} (z - Hs')\,. \tag{3.168}$$
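A minimum-distance detector in the spirit of (3.168) is straightforward to implement; the sketch below uses a hypothetical 2 × 2 channel, a four-point candidate set, and a small fixed noise realization, all illustrative choices rather than values from the text.

```python
import numpy as np

def ml_detect(z, H, R, candidates):
    # Eq. (3.168): pick the candidate minimizing (z - Hs)^† R^{-1} (z - Hs)
    Ri = np.linalg.inv(R)
    metrics = [np.real((z - H @ s).conj() @ Ri @ (z - H @ s))
               for s in candidates]
    return int(np.argmin(metrics))

H = np.array([[1.0, 0.2], [0.1, 0.9]])
R = 0.01 * np.eye(2)
candidates = [np.array([1.0, 1.0]), np.array([1.0, -1.0]),
              np.array([-1.0, 1.0]), np.array([-1.0, -1.0])]

z = H @ candidates[2] + np.array([0.01, -0.02])   # a small noise realization
idx = ml_detect(z, H, R, candidates)
```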
The probability of error can be bounded from above by finding the worst-case
difference between H s and H sm . Let
Then, the probability of error when s is one of K equally likely vectors, is bounded
from above by
$$P_e \le Q\left(\frac{|d_{\min}|}{\sqrt{2\sigma^2}}\right). \tag{3.171}$$
Recall that ||A||F is the Frobenius norm of A, which is the square root of the
sum of the squares of all entries of the matrix A. The probability of error is thus
bounded from above by
$$P_e \le Q\left(\frac{\bar{d}_{\min}}{\sqrt{2\sigma^2}}\right). \tag{3.177}$$
where the function θ{x} for some real variable x is defined here to be
$$\theta\{x\} = \begin{cases} 1\,; & x \ge 0 \\ 0\,; & x < 0\,. \end{cases} \tag{3.180}$$
A false alarm occurs when the test statistic exceeds the detection threshold when
no signal of interest is present. Given an ensemble of observations defined by the density p(Z|H0), in which the signal is absent (hypothesis H0), the probability of false alarm Pfa(η) is defined by
$$P_{fa}(\eta) = \Pr\{\phi(Z) \ge \eta\} = \int d\Omega_Z\; p(Z|H_0)\; \theta\{\phi(Z) - \eta\}\,. \tag{3.181}$$
The variance of an estimator θ̂ of a parameter θ is given by
$$\operatorname{var}(\hat\theta) = \left\langle (\hat\theta - \theta)^2 \right\rangle = \int dx\; (\hat\theta - \theta)^2\, p(x; \theta)\,. \tag{3.182}$$
If the estimator is unbiased, then the mean of the estimator is the actual
parameter,
$$\left\langle \hat\theta \right\rangle = \theta\,. \tag{3.183}$$
$$\left\langle (\hat\theta - \theta)^2 \right\rangle \ge J^{-1}\,, \qquad J = \left\langle \left( \frac{\partial}{\partial\theta} \log p(x; \theta) \right)^2 \right\rangle. \tag{3.184}$$
The basis of this bound is given by the statistical relationship for random vari-
ables a and b that is constructed by the Cauchy–Schwarz inequality (defined in
Equation (2.42)),
$$\left\langle a\, b \right\rangle^2 \le \left\langle a^2 \right\rangle \left\langle b^2 \right\rangle. \tag{3.185}$$
$$\left\langle \frac{\partial \log p(x;\theta)}{\partial\theta}\, (\hat\theta - \theta) \right\rangle^2 \le \left\langle (\hat\theta - \theta)^2 \right\rangle \left\langle \left( \frac{\partial \log p(x;\theta)}{\partial\theta} \right)^2 \right\rangle = \operatorname{var}\{\hat\theta\}\; J\,. \tag{3.186}$$
$$\left\langle \frac{\partial \log p(x;\theta)}{\partial\theta}\, (\hat\theta - \theta) \right\rangle = \left\langle \frac{\partial \log p(x;\theta)}{\partial\theta}\, \hat\theta \right\rangle - \left\langle \frac{\partial \log p(x;\theta)}{\partial\theta}\, \theta \right\rangle. \tag{3.187}$$
In the following discussion, it is shown that the first term on the right-hand side
of Equation (3.187) is one and the second is zero. Focusing on the first term, the
expression simplifies to
$$\begin{aligned}
\left\langle \frac{\partial \log p(x;\theta)}{\partial\theta}\, \hat\theta \right\rangle &= \left\langle \frac{1}{p(x;\theta)}\, \frac{\partial p(x;\theta)}{\partial\theta}\, \hat\theta \right\rangle \\
&= \int dx\; p(x;\theta)\, \frac{1}{p(x;\theta)}\, \frac{\partial p(x;\theta)}{\partial\theta}\, \hat\theta \\
&= \frac{\partial}{\partial\theta} \int dx\; p(x;\theta)\, \hat\theta \\
&= \frac{\partial}{\partial\theta} \left\langle \hat\theta \right\rangle \\
&= \frac{\partial}{\partial\theta}\, \theta = 1\,, \tag{3.188}
\end{aligned}$$
$$\begin{aligned}
\left\langle \frac{\partial \log p(x;\theta)}{\partial\theta}\, \theta \right\rangle &= \left\langle \frac{1}{p(x;\theta)}\, \frac{\partial p(x;\theta)}{\partial\theta}\, \theta \right\rangle \\
&= \int dx\; p(x;\theta)\, \frac{1}{p(x;\theta)}\, \frac{\partial p(x;\theta)}{\partial\theta}\, \theta \\
&= \int dx\; \frac{\partial p(x;\theta)}{\partial\theta}\, \theta \\
&= \frac{\partial}{\partial\theta} \int dx\; p(x;\theta)\, \theta - \int dx\; p(x;\theta)\, \frac{\partial\theta}{\partial\theta} \\
&= \frac{\partial\theta}{\partial\theta} - \frac{\partial\theta}{\partial\theta} = 0\,, \tag{3.189}
\end{aligned}$$
$$\operatorname{var}\{\hat\theta\} \ge \frac{1}{J}\,. \tag{3.190}$$
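The scalar bound (3.190) is easy to evaluate concretely. For a single observation x ∼ N(θ, σ²), the score is (x − θ)/σ² and the Fisher information works out to 1/σ², so the bound equals σ² (and is met with equality by the sample mean). This sketch computes J by numerical integration of the squared score; the parameter values are illustrative.

```python
import math
from scipy.integrate import quad

theta, sigma2 = 0.7, 2.0   # true mean and known variance (illustrative)

def p(x):   # single Gaussian observation x ~ N(theta, sigma2)
    return math.exp(-(x - theta) ** 2 / (2 * sigma2)) \
           / math.sqrt(2 * math.pi * sigma2)

# Eq. (3.184): J = <(d/d theta log p)^2>, score = (x - theta)/sigma2
J, _ = quad(lambda x: ((x - theta) / sigma2) ** 2 * p(x),
            -math.inf, math.inf)
crb = 1 / J   # Eq. (3.190); should equal sigma2 here
```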
The Fisher information can also be represented by a form involving the second
derivative of the log of the probability,
$$J = \left\langle \left( \frac{\partial}{\partial\theta} \log p(x;\theta) \right)^2 \right\rangle = -\left\langle \frac{\partial^2}{\partial\theta^2} \log p(x;\theta) \right\rangle. \tag{3.191}$$
This relationship can be found by first noting the expectation of the derivative
of the log of the probability is zero,
$$\left\langle \frac{\partial}{\partial\theta} \log p(x;\theta) \right\rangle = \int dx\; p(x;\theta)\, \frac{\partial}{\partial\theta} \log p(x;\theta) = \int dx\; \frac{\partial p(x;\theta)}{\partial\theta} = \frac{\partial}{\partial\theta}\, 1 = 0\,. \tag{3.192}$$
By using this observation and evaluating the derivative of zero, the second form
of the Fisher information is found,
$$\begin{aligned}
0 = \frac{\partial}{\partial\theta}\, 0 &= \frac{\partial}{\partial\theta} \left\langle \frac{\partial}{\partial\theta} \log p(x;\theta) \right\rangle \\
&= \frac{\partial}{\partial\theta} \int dx\; p(x;\theta)\, \frac{\partial}{\partial\theta} \log p(x;\theta) \\
&= \int dx \left[ \frac{\partial p(x;\theta)}{\partial\theta}\, \frac{\partial}{\partial\theta} \log p(x;\theta) + p(x;\theta)\, \frac{\partial^2}{\partial\theta^2} \log p(x;\theta) \right] \\
&= \int dx\; p(x;\theta) \left[ \frac{\partial}{\partial\theta} \log p(x;\theta)\; \frac{\partial}{\partial\theta} \log p(x;\theta) + \frac{\partial^2}{\partial\theta^2} \log p(x;\theta) \right] \\
\Longrightarrow \left\langle \left( \frac{\partial}{\partial\theta} \log p(x;\theta) \right)^2 \right\rangle &= -\left\langle \frac{\partial^2}{\partial\theta^2} \log p(x;\theta) \right\rangle. \tag{3.193}
\end{aligned}$$
$$\left\langle \left| \{\hat\theta - \theta\}_m \right|^2 \right\rangle \ge \left\{J^{-1}\right\}_{m,m}\,, \tag{3.194}$$
where the inequality between matrices is used to indicate that the difference between the matrices (cov{θ̂} − J⁻¹) is positive-semidefinite or, in other words, has
non-negative eigenvalues. The Fisher information matrix for observation vector
z with probability density function p(z; θ) is given by
$$\{J\}_{m,n} = \left\langle \frac{\partial \log p(z;\theta)}{\partial \{\theta\}_m}\; \frac{\partial \log p(z;\theta)}{\partial \{\theta\}_n} \right\rangle = -\left\langle \frac{\partial^2 \log p(z;\theta)}{\partial \{\theta\}_m\, \partial \{\theta\}_n} \right\rangle. \tag{3.196}$$
$$p(z; \theta) = \frac{1}{\pi^n\, |R(\theta)|}\; e^{-[z - \mu(\theta)]^\dagger R^{-1}(\theta)\, [z - \mu(\theta)]} \tag{3.197}$$
$$\begin{aligned}
\{J\}_{\alpha,\beta} &= -\left\langle \frac{\partial^2}{\partial\alpha\, \partial\beta} \log p(z;\theta) \right\rangle \\
&= \left\langle \frac{\partial^2}{\partial\alpha\, \partial\beta} \left( [z-\mu]^\dagger R^{-1} [z-\mu] + \log |R| \right) \right\rangle \\
&= \left\langle \frac{\partial}{\partial\alpha} \left( -\frac{\partial \mu^\dagger}{\partial\beta}\, R^{-1} [z-\mu] - [z-\mu]^\dagger R^{-1}\, \frac{\partial\mu}{\partial\beta} \right.\right. \\
&\qquad\quad \left.\left. -\; [z-\mu]^\dagger R^{-1}\, \frac{\partial R}{\partial\beta}\, R^{-1} [z-\mu] + \operatorname{tr}\left\{ R^{-1}\, \frac{\partial R}{\partial\beta} \right\} \right) \right\rangle, \tag{3.198}
\end{aligned}$$
where the observations that the mean of the data vector with the mean removed is zero, ⟨z − μ⟩ = 0; that the quadratic form can be reordered by using the trace, v† M v = tr{v v† M}; and that the expectation of the outer product of the difference with itself is the interference-plus-noise covariance matrix, ⟨[z − μ][z − μ]†⟩ = R, are all employed. By reverting to the earlier notation and observing that the sum of the variable and its conjugate is equal to twice the real part of the variable, the Fisher information matrix for this Gaussian distribution is given by
$$\{J(\theta)\}_{m,n} = \operatorname{tr}\left\{ R^{-1}(\theta)\, \frac{\partial R(\theta)}{\partial \{\theta\}_m}\, R^{-1}(\theta)\, \frac{\partial R(\theta)}{\partial \{\theta\}_n} \right\} + 2\,\Re\left\{ \frac{\partial \mu^\dagger(\theta)}{\partial \{\theta\}_m}\, R^{-1}(\theta)\, \frac{\partial \mu(\theta)}{\partial \{\theta\}_n} \right\}, \tag{3.200}$$
Change of variables
It is sometimes convenient to calculate the Fisher information matrix in one
basis and then change to another set of variables. By using the matrix form
found in Equation (3.196), the covariance for the estimation error on a vector of
parameters θ is given by
$$\left\langle (\hat\theta - \theta)(\hat\theta - \theta)^T \right\rangle \ge J^{-1} = \left\langle \left( \nabla_\theta \log p(z;\theta) \right) \left( \nabla_\theta \log p(z;\theta) \right)^T \right\rangle^{-1}. \tag{3.201}$$
Because only the {a, a}th component of the inverse is desired, the inverse of the
entire information matrix does not need to be evaluated. By using the Sherman
relation discussed in Section 2.5, the variance bound for the set of parameters of
interest is given using the reduced Fisher information matrix
$$\left( J^{(r)}_{a,a} \right)^{-1}, \qquad J^{(r)}_{a,a} = J_{a,a} - J_{a,b}\, J_{b,b}^{-1}\, J_{b,a}\,. \tag{3.204}$$
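The reduced Fisher information in (3.204) is a Schur complement, and its inverse necessarily matches the corresponding block of the full inverse. The sketch below checks this on a toy 3 × 3 information matrix (an arbitrary positive-definite example, not from the text), with the first parameter of interest and the other two treated as nuisance parameters.

```python
import numpy as np

# A toy full Fisher information matrix; parameter "a" is of interest,
# the remaining two parameters are nuisance parameters "b".
J = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
Jaa, Jab = J[:1, :1], J[:1, 1:]
Jba, Jbb = J[1:, :1], J[1:, 1:]

# Eq. (3.204): reduced Fisher information (Schur complement of the b block)
J_r = Jaa - Jab @ np.linalg.inv(Jbb) @ Jba

bound_schur = 1.0 / J_r[0, 0]            # variance bound via reduced FIM
bound_direct = np.linalg.inv(J)[0, 0]    # {J^{-1}}_{a,a} computed directly
```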
At this point, there is an issue of convention for the variance of ν. Because the doppelganger variables are real, it would be natural to define the variance as the expectation of ν ν^T. However, we will use the form
$$\operatorname{cov}\{\nu\} = \left\langle (\hat\nu - \nu)(\hat\nu - \nu)^\dagger \right\rangle. \tag{3.210}$$
This approach will lead to a swapping of the position of terms in the Fisher
information matrix. In this particular case, it is desirable for the diagonal terms
to be associated with ξ ξ ∗ , which is the term of interest for complex variables.
Everything could be done by using the traditional real covariance definition and
then focusing on the off-diagonal elements.
By using the form for change of variables given in Equation (3.202), the co-
variance matrix for the conjugate pair is given by
$$\begin{aligned}
\operatorname{cov}\{\nu\} &= \left\langle (\hat\nu - \nu)(\hat\nu - \nu)^\dagger \right\rangle \\
&= \frac{\partial g(a)}{\partial a} \left\langle (\hat{a} - a)(\hat{a} - a)^T \right\rangle \left( \frac{\partial g(a)}{\partial a} \right)^\dagger \\
&\ge \frac{\partial g(a)}{\partial a}\; J^{-1}(a) \left( \frac{\partial g(a)}{\partial a} \right)^\dagger \\
&= \frac{\partial g(a)}{\partial a} \left\langle \left( \frac{\partial}{\partial a} \log p(z;a) \right)^{\!T} \left( \frac{\partial}{\partial a} \log p(z;a) \right) \right\rangle^{-1} \left( \frac{\partial g(a)}{\partial a} \right)^\dagger \\
&= \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix} \left\langle \left( \frac{\partial}{\partial a} \log p(z;a) \right)^{\!T} \left( \frac{\partial}{\partial a} \log p(z;a) \right) \right\rangle^{-1} \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix}^\dagger \\
&= \left( \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix}^{-\dagger} \left\langle \left( \frac{\partial}{\partial a} \log p(z;a) \right)^{\!T} \left( \frac{\partial}{\partial a} \log p(z;a) \right) \right\rangle \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix}^{-1} \right)^{-1} \\
&= \left( \frac{1}{2} \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix} \left\langle \left( \frac{\partial}{\partial a} \log p(z;a) \right)^{\!T} \left( \frac{\partial}{\partial a} \log p(z;a) \right) \right\rangle\; \frac{1}{2} \begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix} \right)^{-1}, \tag{3.211}
\end{aligned}$$
where the superscript −† indicates the Hermitian conjugate and inverse. By
making the observation that the Wirtinger partial derivatives are defined by
$$\frac{\partial}{\partial \nu^T} = \frac{1}{2} \begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix} \frac{\partial}{\partial a^T} = \frac{1}{2} \begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix} \begin{pmatrix} \frac{\partial}{\partial\alpha} \\[0.5ex] \frac{\partial}{\partial\beta} \end{pmatrix} = \begin{pmatrix} \frac{\partial}{\partial\xi} \\[0.5ex] \frac{\partial}{\partial\xi^*} \end{pmatrix}, \tag{3.212}$$
as can be seen in Equations (2.159) and (2.160), similarly the complex version
of the above is given by
$$\frac{\partial}{\partial \nu^\dagger} = \frac{1}{2} \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix} \frac{\partial}{\partial a^T} = \frac{1}{2} \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix} \begin{pmatrix} \frac{\partial}{\partial\alpha} \\[0.5ex] \frac{\partial}{\partial\beta} \end{pmatrix} = \begin{pmatrix} \frac{\partial}{\partial\xi^*} \\[0.5ex] \frac{\partial}{\partial\xi} \end{pmatrix}. \tag{3.213}$$
where the subscripts in the notation J_{ξ*,ξ} and K_{ξ*,ξ*} are used to indicate the parameter with which the derivative is being taken. From Equation (2.112), the upper-right-hand block of the inverse of the complete Fisher information matrix {F⁻¹}_{u.r.} is given by
where the superscript (·)⁻* indicates the inverse conjugate of the expression. If it is known that the expectation of the term (ξ̂ − ξ)(ξ̂ − ξ)^T is 0, because of knowledge of the distribution, then the upper-right-hand block of the inverse of the complete Fisher information matrix must also be zero. For bounded (not infinite) J and K and nonzero J, the relationship ⟨(ξ̂ − ξ)(ξ̂ − ξ)^T⟩ = 0 can only be satisfied by a pseudo-Fisher information matrix that is zero, K = 0,
$$\left\langle (\hat\xi - \xi)(\hat\xi - \xi)^T \right\rangle = 0 \;\text{ and }\; J \ne 0 \;\Longrightarrow\; K = 0\,. \tag{3.225}$$
It is worth stressing that the general form of the definition of the complete
Fisher information matrix allows for nonzero pseudo-Fisher information matri-
ces [237]; however, for many problems of interest the pseudo-Fisher information
matrices evaluate to the zero matrix and the bound is given by
$$\left\langle (\hat\xi - \xi)(\hat\xi - \xi)^\dagger \right\rangle \ge J_{\xi^*,\xi}^{-1} = J^{-1} \tag{3.226}$$
if K = 0. One additional note: because of the matrix inversion, the ordering of the conjugation between the estimation error covariance matrix and the derivatives is reversed.
$$p(z; \rho) = \frac{1}{\pi^n\, |R|}\; e^{-[z - \mu]^\dagger R^{-1} [z - \mu]}\,. \tag{3.227}$$
Recalling the gradient operation under the assumption that vectors a and b are
functions of x while M is not, then the gradient of the quadratic form is given
by
$$\begin{aligned}
\nabla_{\xi^*} \log p(z;\xi) &= \nabla_{\xi^*} \left( -[z^* - \mu^*]^T R^{-1} [z - \mu] - \log |R| \right) \\
&= \left( \frac{\partial \mu^*}{\partial \xi^*} \right)^{\!T} R^{-1} [z - \mu] + [z^* - \mu^*]^T R^{-1}\, \frac{\partial \mu}{\partial \xi^*} \\
&\quad + [z^* - \mu^*]^T R^{-1}\, \frac{\partial R}{\partial \xi^*}\, R^{-1} [z - \mu] - \operatorname{tr}\left\{ R^{-1}\, \frac{\partial R}{\partial \xi^*} \right\}. \tag{3.230}
\end{aligned}$$
Here two different regimes are considered: first, the mean μ is a function of the
parameter, and second, the covariance R is a function of the parameter. Under
the assumption that the mean is a function of the complex parameter, but the
covariance is not, the derivative is given by
For many calculations, the element of the matrix given by ∇ξ† μ or ∇ξ† μ∗ is zero
because the mean is a function of either ξ or ξ ∗ . Consequently, in this case the
pseudo-Fisher information matrix K is zero. If the pseudo-Fisher information
matrix K is zero, then the Fisher information matrix is given by
$$J = \left\langle \left[ \nabla_{\xi^*} \log p(z;\xi) \right] \left[ \nabla_\xi \log p(z;\xi) \right]^T \right\rangle = \nabla_{\xi^*} \mu^\dagger\; R^{-1}\; \nabla_\xi^T \mu + \nabla_{\xi^*} \mu^T\; R^{-T}\; \nabla_\xi^T \mu^*\,, \tag{3.233}$$
and the observation that ⟨[z − μ][z − μ]^T⟩ = 0 is employed.
If the mean is not a function of the parameter, but the covariance matrix is a function of the parameters, then the derivative with respect to the doppelganger variables is given by
∂ ∂R ∂R
log p(z; ξ) = [z − μ]† R−1 R−1 [z − μ] − tr R−1
∂{ξ ∗ }n ∂{ξ ∗ }n ∂{ξ ∗ }n
∂²/∂{ξ*}_m ∂{ξ*}_n log p(z; ξ) = −[z − μ]† R^{-1} (∂R/∂{ξ*}_m) R^{-1} (∂R/∂{ξ*}_n) R^{-1} [z − μ]
+ [z − μ]† R^{-1} (∂²R/∂{ξ*}_m ∂{ξ*}_n) R^{-1} [z − μ]
− [z − μ]† R^{-1} (∂R/∂{ξ*}_n) R^{-1} (∂R/∂{ξ*}_m) R^{-1} [z − μ]
+ tr( R^{-1} (∂R/∂{ξ*}_m) R^{-1} (∂R/∂{ξ*}_n) )
− tr( R^{-1} ∂²R/∂{ξ*}_m ∂{ξ*}_n ) . (3.235)
The expectation of the additive inverse of the second derivative generates the (m, n)th element of the pseudo-Fisher information matrix K. This expectation is given by

{K}_{m,n} = −⟨ ∂²/∂{ξ*}_m ∂{ξ*}_n log p(z; ξ) ⟩
= tr( R^{-1} (∂R/∂{ξ*}_m) R^{-1} (∂R/∂{ξ*}_n) ) . (3.236)
Many useful models for covariance matrices do not satisfy the requirement that the pseudo-Fisher information matrix vanish. Consequently, an evaluation in terms of real parameters is required.
Reduced Fisher information for real parameters with complex nuisance parameters
It is assumed here that a set of real parameters contained in the vector u are the
parameters of interest and that the complex parameters are nuisance parameters.
Once again defining the stacked vector ν of the doppelganger variables,

ν = [ ξ ; ξ* ] , (3.237)
for probability density p(z; u, ν) over observation variable z, the complete Fisher information matrix is given by

J = [ J_{u,u}  J_{u,ν} ;
      J_{ν*,u}  J_{ν*,ν} ] . (3.239)
By considering the upper-left term of the inverse of the 2 × 2 block matrix from Equation (2.112), the reduced Fisher information matrix is given by

J^{(r)}_{u,u} = J_{u,u} − J_{u,ν} J^{-1}_{ν*,ν} J_{ν*,u}
= J_{u,u} − ( J_{u,ξ}  J_{u,ξ*} ) J^{-1}_{ν*,ν} [ J_{ξ*,u} ; J_{ξ,u} ]
= J_{u,u} − ( J_{u,ξ}  J_{u,ξ*} ) [ J_{ξ*,ξ}  J_{ξ*,ξ*} ; J_{ξ,ξ}  J_{ξ,ξ*} ]^{-1} [ J_{ξ*,u} ; J_{ξ,u} ] . (3.240)
In general, the expression must be evaluated completely. However, as mentioned in the previous section, often some of these terms quickly evaluate to zero. In particular, if the pseudo-Fisher information matrix evaluates to zero, the form of the reduced Fisher information matrix simplifies to

J^{(r)}_{u,u} → J_{u,u} − ( J_{u,ξ}  J_{u,ξ*} ) [ J_{ξ*,ξ}  0 ; 0  J_{ξ,ξ*} ]^{-1} [ J_{ξ*,u} ; J_{ξ,u} ]
= J_{u,u} − J_{u,ξ} J^{-1}_{ξ*,ξ} J_{ξ*,u} − J_{u,ξ*} J^{-1}_{ξ,ξ*} J_{ξ,u} (3.241)

if J_{ξ*,ξ*} = J_{ξ,ξ} = 0. For many applications, either the second or third term in Equation (3.241) is zero.
Reduced Fisher information for Gaussian distribution with real parameter in the mean
with complex nuisance parameters
The Fisher information matrix for parameter pair x, y is given by
J_{x,y} = ⟨ [∇_x log p(z; x, y)] [∇_y log p(z; x, y)]^T ⟩ , (3.242)
where z is the vector of observations. A common form for the mean that is
encountered in array processing is given by
μ(θ) = a ν(u) , (3.243)
where a is a complex attenuation scalar and ν(u) is a vector function of a vector
of real direction parameters u. Typically, the attenuation scalar a is considered
a nuisance parameter. The vector of parameters can be represented by
θ = [ u ; a ; a* ] . (3.244)
For a complex Gaussian model, the log of the probability density is given by
log p(z; a, u) = −[z − a ν(u)]† R−1 [z − a ν(u)] + const. , (3.245)
where R is the spatial covariance matrix of the noise. Given this model for the
probability density, the pseudo-information terms for the complex parameters
are zero. Consequently, the reduced information matrix is given by Equation
(3.241),
J^{(r)}_{u,u} = J_{u,u} − J_{u,a} J^{-1}_{a*,a} J_{a*,u} − J_{u,a*} J^{-1}_{a,a*} J_{a,u} . (3.246)
The first term is the Fisher information matrix associated with the real direc-
tion parameters u. This term is given by the parameter-in-the-mean term found
in Equation (3.200),
{J_{u,u}}_{m,n} = 2 Re{ (∂[a ν(u)]†/∂{u}_m) R^{-1} (∂[a ν(u)]/∂{u}_n) }
J_{u,u} = 2 |a|² Re{ (∇_u [ν(u)]†) R^{-1} (∇_u [ν(u)])^T }
= 2 |a|² Re{ V̇† R^{-1} V̇ } , (3.247)
where the two attenuation Fisher information terms are equal because the scalar
derivative terms commute, Ja ∗ ,a = Ja,a ∗ .
The cross-parameter information terms are given by
J_{u,a*} = ⟨ [∇_u log p(z; a, u)] [∇_{a*} log p(z; a, u)]^T ⟩
∇_u log p(z; a, u) = −∇_u ( [z − a ν(u)]† R^{-1} [z − a ν(u)] )
= a* [∇_u ν†(u)] R^{-1} [z − a ν(u)] + a [∇_u ν^T(u)] R^{-T} [z − a ν(u)]*
∇_{a*} log p(z; a, u) = ν†(u) R^{-1} [z − a ν(u)]
J_{u,a*} = a ⟨ [∇_u ν^T(u)] R^{-T} [z − a ν(u)]* [z − a ν(u)]^T R^{-T} ν*(u) ⟩
= a [∇_u ν^T(u)] R^{-T} ν*(u) . (3.250)
From the definition of the information matrix, reversing the parameters gives the transpose of the information matrix, so that

J_{a*,u} = J^T_{u,a*} = a ν†(u) R^{-1} [∇_u^T ν(u)]
J_{a,u} = J^T_{u,a} = a* ν^T(u) R^{-T} [∇_u^T ν*(u)] . (3.252)
The second and third terms for the reduced Fisher information matrix are given
by
J_{u,a} J^{-1}_{a*,a} J_{a*,u} = |a|² ( [∇_u ν†(u)] R^{-1} ν(u) ν†(u) R^{-1} [∇_u^T ν(u)] ) / ( ν†(u) R^{-1} ν(u) ) (3.253)
and
J_{u,a*} J^{-1}_{a,a*} J_{a,u} = |a|² ( [∇_u ν^T(u)] R^{-T} ν*(u) ν^T(u) R^{-T} [∇_u^T ν*(u)] ) / ( ν†(u) R^{-1} ν(u) )
= ( J_{u,a} J^{-1}_{a*,a} J_{a*,u} )* . (3.254)
Consequently, the reduced Fisher information matrix is given by
J^{(r)}_{u,u} = J_{u,u} − 2 Re{ J_{u,a} J^{-1}_{a*,a} J_{a*,u} }
= 2 |a|² Re{ V̇† R^{-1} V̇ } − 2 |a|² Re{ ( V̇† R^{-1} ν(u) ν†(u) R^{-1} V̇ ) / ( ν†(u) R^{-1} ν(u) ) } , (3.255)
where the spatially whitened vector and derivative matrix are defined by
x(u) = R^{-1/2} ν(u)
Ẋ = R^{-1/2} V̇ . (3.257)
This form can be simplified further by defining the projection operator orthogonal to the space spanned by x(u),

P⊥_{x(u)} = I − x(u) [ x†(u) x(u) ]^{-1} x†(u) . (3.258)
By using this operator, the final form for the reduced Fisher information matrix
is given by
J^{(r)}_{u,u} = 2 |a|² Re{ Ẋ† P⊥_{x(u)} Ẋ } . (3.259)
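The equality between the explicit form of Equation (3.255) and the projection form of Equation (3.259) can be verified numerically, up to the common factor 2|a|². The random test matrices below are illustrative, not from the text; any factor W with W†W = R^{-1} (here the inverse Cholesky factor) serves as the whitener R^{-1/2}.

```python
import numpy as np

# Numeric check (illustrative values): the explicit reduced-FIM expression
# and the whitened projection form agree, up to the common factor 2|a|^2.
rng = np.random.default_rng(1)
n, p = 6, 2                                 # array size, dim of real parameter u

A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R = A @ A.conj().T + n * np.eye(n)          # Hermitian positive definite covariance
Ri = np.linalg.inv(R)
nu = rng.standard_normal(n) + 1j * rng.standard_normal(n)              # steering vector
Vdot = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p)) # its derivatives

# Explicit form: Vdot^H R^{-1} Vdot minus the rank-one correction, cf. Eq. (3.255).
expl = (Vdot.conj().T @ Ri @ Vdot
        - np.outer(Vdot.conj().T @ Ri @ nu, nu.conj() @ Ri @ Vdot)
        / (nu.conj() @ Ri @ nu)).real

# Projection form, cf. Eq. (3.259): whiten with W (W^H W = R^{-1}), then
# project orthogonally to the whitened steering vector x.
W = np.linalg.inv(np.linalg.cholesky(R))    # R = L L^H  =>  W = L^{-1}
x = W @ nu
Xdot = W @ Vdot
Pperp = np.eye(n) - np.outer(x, x.conj()) / (x.conj() @ x)
proj = (Xdot.conj().T @ Pperp @ Xdot).real

gap = np.max(np.abs(expl - proj))
print(gap)                                  # ~ 0: the two forms coincide
```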
Problems
3.1 Evaluate the variance of a random variable from the log-normal distribu-
tion.
3.2 If the real variables X and Y with values x and y are given by Y = X 2 ,
evaluate the probability density for Y given the density for X for the cases
(a) in general,
(b) if X is given by a Rayleigh distribution.
3.4 Evaluate the first four central moments of a random variable that is char-
acterized by unit variance and is uniform over phase.
3.6 For large Wishart matrices, constructed by the outer product G G†, where
the matrix G ∈ C^{m×n} contains entries drawn independently and randomly from
a complex circular Gaussian distribution, evaluate the approximate peak-to-
average eigenvalue ratio under assumptions of
(a) n/m = 1
(b) n/m = 2
(c) n/m = 4
(d) n/m = 16.
3.8 Let h(t) be the impulse response of a linear-time-invariant system such that
h(t) is square integrable. Let the input to this system be a stationary random
process X(t). Show that the autocorrelation function of the output process Y (t)
is given by

R_{YY}(τ) = ( h ∗ h̃ ∗ R_{XX} )(τ) , (3.260)

with h̃(t) = h*(−t) the time-reversed conjugate of the impulse response.
where α > 2.
(a) Show that the mean signal power at the center of the disk I is infinite.
(b) Show that the signal power at the center of the disk I is finite with proba-
bility 1.
3.11 Use Equation (3.5) to derive Equation (3.59) from Equation (3.53).
4 Wireless communications fundamentals
For convenience in design, the operations of radios are often broken into a number
of functional layers. The standard version of this stack is referred to as the open
systems interconnection (OSI) model [291], as seen in Figure 4.1. The model
has two groups of layers: host and media. The host layers are the application,
presentation, session, and transport layers. The media layers are the network,
data-link, and physical layers. In many radio systems, some of these layers are
trivial or the division between the layers may be blurred. The OSI stack is
commonly interpreted in terms of wired networks such as the internet. Depending
upon the details of an implementation, various tasks may occupy different layers.
Nonetheless, the OSI layered architecture is useful as a common reference for
discussing radios. In this text, the media layers are of principal importance.
The network layer indicates how data are routed from an information source
to a sink node, as seen in Figure 4.2. In the case of a network with two nodes,
this routing is trivial. In the case of an ad hoc wireless network, the routing
may be both complicated and time varying. The network layer may break a
data sequence at the source node into smaller blocks and then reassemble the
data sequence at the sink node. It also may provide notification of errors to the
transport layer.
The data-link layer controls the flow of data between adjacent nodes in a
network. This layer may provide acknowledgments of received data, and may or
may not contain error checking or correction. Sometimes this layer is broken into
the logical-link-control and media-access-control (MAC) sublayers. The MAC is
used to control the network’s reaction to interference. The interference might be
internal, that is, caused by the network’s own links, or external, that is, caused by
a source not under the network’s control. The logical-link-control sublayer is used
by the protocol to control data flow. The logical-link-control sublayer interprets
frame headers for the data-link layer. The MAC specifies a local hardware address
and control of a channel.
The physical layer defines the mapping of information bits to the radiated sig-
nal. The physical layer includes error-correction coding, modulation, and spectral
occupancy. It also includes all the signal processing, at both the transmitter and
receiver.
Figure 4.2 Network of nodes with a connection between a source node and a sink node.

4.2 Reference digital radio link
The basic physical layer of a digital radio link has nine components: data source,
encoding, modulation, upconversion, propagation, downconversion, demodula-
tion, decoding, and data sink. While not all digital radios conform to this struc-
ture, this structure is flexible enough to capture the essential characteristics for
discussion in this text. Here we have distinguished between up/downconversion
and modulation. This distinction is a convenient convention for digital radios.
In practice there is a large variety of data sources. The classic modern example
is the cellular or mobile phone [260]. The modern mobile phone is used for
internet access, data, video, and occasionally voice. For this discussion, we will
focus on voice communications. In the uplink, voice data are sent from the phone
to the base station. The analog voice signal is digitized and compressed by using
a vocoder. There are a variety of approaches to vocoders that in general provide
significant source compression. The raw digitized signal might require a data
rate of as much as 200 kbits/s or more. The signals compressed by vocoders
typically require around 10 kbits/s. These data, along with a number of control
parameters, are the data source.
The encoding of the data typically includes some approach to compensate for
noisy data, denoted forward-error-correction (FEC) encoding. Error-correction
codes introduce extra parity data to compensate for noise in the channel. With a
strong code, multiple errors caused by noise in the channel can be corrected. The
theoretical limit for the amount of data (that is, information not parity) that
can be transmitted over a link in a noisy channel with essentially no errors is
given by the Shannon limit and is discussed in Section 5.3. Modern codes allow
communication links that can closely approach this theoretical limit. Coding
performance and computation complexity can vary significantly.
Following the data encoding is the modulation. Depending upon the details of
the forward-error-correction coding scheme, the error-correction algorithms used
may or may not be strongly coupled with the modulation. As an example, trellis
coding strongly couples modulation and coding [316, 317].
The modulation translates the digital data to a baseband signal for transmis-
sion. The baseband signal is centered at zero frequency. Associated with each
transmit antenna are an in-phase and a quadrature signal. It is often convenient
to view these signals as being complex, with the real component correspond-
ing to the in-phase signal and the imaginary component corresponding to the
quadrature signal. The variety of modulation schemes vary in complexity. Some
examples shown in Figure 4.3 are binary phase-shift keying (BPSK), which uses
symbols
in Figure 4.4. The mixer creates images at the sum and difference of the intermediate frequency (IF) and the analog upconversion frequency. Because of filter design constraints, it is helpful to keep the IF reasonably high so that the filter
can easily select one of these images. For logistical reasons, even more stages are
sometimes used.
with delay spread are said to be frequency selective. If there is motion in the
environment or if the transmitter or receiver is moving, the channel will change
over time. Because the directions to various scatterers are not typically identical,
motion introduces a range of Doppler frequency shifts. In this regime, it is said
that the channel has Doppler spread. The effects of this complicated environment
can be mitigated or even exploited by using adaptive techniques as discussed in
Chapter 10.
For the sake of convenience, the channel attenuation that is caused by mul-
tipath scattering is often factored into a term associated with fading that in-
corporates the variation in the channel due to relative delays and motion, and
a term associated with overall average attenuation. The average attenuation is
typically parameterized by the link length, r. Typically, average signal power in
ad hoc wireless networks is assumed to decay with distance r as r−α e−γ r , where
α is known as the path-loss exponent and γ is an absorption coefficient. In most
works in the literature, γ is set to zero and α > 2.
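A minimal sketch of this average path-loss model follows; the parameter values are arbitrary illustrative choices, not from the text.

```python
import numpy as np

# Illustrative sketch of the average attenuation model above: received
# power decays as r^{-alpha} * exp(-gamma * r).  The defaults (alpha = 3,
# gamma = 0) are example values only.
def avg_rx_power(p_tx, r, alpha=3.0, gamma=0.0):
    """Average received power over a link of length r."""
    return p_tx * r ** (-alpha) * np.exp(-gamma * r)

# With gamma = 0 and alpha = 3, doubling the link length costs a factor
# of 2^alpha = 8, i.e. about 9 dB.
p_ratio = avg_rx_power(1.0, 100.0) / avg_rx_power(1.0, 200.0)
print(10 * np.log10(p_ratio))
```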
f ≪ k_B T_K / h ≈ 6 THz at room temperature, (4.6)

P_n ≥ k_B T_K B . (4.7)
right-hand side of Equation (4.7). The noise figure of a good receiver might be
two to three decibels. However, it is not uncommon to have noise figures a few
decibels higher.
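Equation (4.7) gives the familiar thermal noise floor; the sketch below evaluates it at room temperature for an illustrative 1 MHz bandwidth (the bandwidth and noise-figure values are examples, not from the text).

```python
import math

# Thermal noise floor from Eq. (4.7), P_n >= k_B * T_K * B, at room
# temperature for an illustrative 1 MHz bandwidth.
k_B = 1.380649e-23            # Boltzmann constant, J/K
T_K = 290.0                   # room temperature, K
B = 1.0e6                     # bandwidth, Hz

p_n_watts = k_B * T_K * B
p_n_dbm = 10 * math.log10(p_n_watts / 1e-3)
print(round(p_n_dbm, 1))      # about -114 dBm

# A receiver with a 3 dB noise figure (a "good receiver" per the text)
# raises the effective noise floor by that amount.
print(round(p_n_dbm + 3.0, 1))
```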
The channel also includes external interference that may come from unin-
tended spectral sidelobes of nearby spectral occupants or competing users at
the same frequency. External interference is a common issue in the industrial-
scientific-medical (ISM) band in which WiFi operates [150]. In the case of ad hoc
wireless networks, this interference may be caused by other users in one’s own
network.
Similar to upconversion, downconversion is used to transform the signal at car-
rier frequency to a complex baseband signal s(t). The downconversion may be
performed in a single step or in multiple steps using an intermediate frequency.
These conversions may be performed digitally or by using analog circuitry. As
an example, a single-stage downconversion of the real passband signal ℜ{s(t) e^{−iωt}} can be notionally achieved by multiplying it by e^{iωt}, the complex conjugate of the upconversion factor:

e^{iωt} ℜ{s(t) e^{−iωt}} = e^{iωt} ( ℜ{s(t)} cos(ωt) + ℑ{s(t)} sin(ωt) )
= ℜ{s(t)} ( cos²(ωt) + i sin(ωt) cos(ωt) ) + ℑ{s(t)} ( i sin²(ωt) + sin(ωt) cos(ωt) )
= ℜ{s(t)} ( 1/2 + (1/2) cos(2ωt) + (i/2) sin(2ωt) )
+ ℑ{s(t)} ( i/2 − (i/2) cos(2ωt) + (1/2) sin(2ωt) ) . (4.8)
It is clear from the above form that the signal is broken into a high-frequency
component centered at 2ω and a baseband component near zero frequency. Un-
der the assumption that ω is large compared with 2πB, where B is the signal
bandwidth, the baseband signal can be recovered by applying a lowpass filter.
This will remove the 2ωt terms, giving the downconverted signal,
(1/2) ( ℜ{s(t)} + i ℑ{s(t)} ) = (1/2) s(t) . (4.9)
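The cancellation in Equations (4.8)–(4.9) can be demonstrated numerically: averaging the mixed signal over an integer number of carrier cycles acts as a crude lowpass filter and recovers s(t)/2. The sample rate, carrier frequency, and constant symbol below are illustrative choices, not from the text.

```python
import numpy as np

# Sketch of single-stage downconversion, cf. Eqs. (4.8)-(4.9): a real
# passband signal Re{s(t) e^{-i w t}} is multiplied by e^{+i w t}, then
# lowpass filtered with a moving average over one carrier cycle, which
# removes the 2w terms exactly and leaves s(t)/2.
fs = 1.0e6                                   # sample rate, Hz
fc = 100.0e3                                 # carrier frequency, Hz
t = np.arange(0, 1e-3, 1 / fs)

s0 = 0.7 - 0.3j                              # constant complex baseband symbol
passband = np.real(s0 * np.exp(-1j * 2 * np.pi * fc * t))
mixed = passband * np.exp(1j * 2 * np.pi * fc * t)

cycle = int(fs / fc)                         # samples per carrier cycle (10)
lp = np.convolve(mixed, np.ones(cycle) / cycle, mode="valid")

err = np.max(np.abs(lp - s0 / 2))
print(err)                                   # ~ 0: recovered s/2
```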
Demodulation covers a range of approaches to working with the received base-
band signal. For multiple-antenna receivers, some form of signal combining can
be used to improve demodulation performance by increasing signal-to-noise ratio
(SNR) or mitigating cochannel interference. Two basic classes of decoders are
the hard and the soft decoder. As an example, consider a complex baseband
QPSK signal in noise, displayed in Figure 4.6. Because of the noise, the received
points lie in a region near but not at the transmitted QPSK symbol. A hard
decision on the two modulated bits can be made based on the quadrant in which
the signal is observed. These hard decisions can be sent to the decoder. In the
example depicted in Figure 4.6, some hard decisions (for example, the dark gray
dots in the bottom left quadrant) will be incorrect. In modern receivers, it is
more common to estimate the likelihood of the possible bit states and pass these
“soft decisions” to the decoder. In general, hard decisions reduce computation
complexity, while soft decisions improve performance. Soft decisions also blur
the line between demodulation and decoding.
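The quadrant rule described above can be sketched in a few lines. The constellation scaling, noise level, and symbol-index-to-quadrant mapping below are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Hard-decision QPSK demodulation by quadrant, as described for Figure 4.6.
rng = np.random.default_rng(7)
symbols = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)

tx = rng.integers(0, 4, size=10_000)         # transmitted symbol indices
noise = 0.35 * (rng.standard_normal(tx.size) + 1j * rng.standard_normal(tx.size))
rx = symbols[tx] + noise

# Quadrant test: sign of the real part and sign of the imaginary part.
hard = np.where(rx.real > 0,
                np.where(rx.imag > 0, 0, 3),
                np.where(rx.imag > 0, 1, 2))

ser = np.mean(hard != tx)
print(ser)      # a few percent of the hard decisions land in the wrong quadrant
```

A soft-decision receiver would instead pass per-bit likelihoods to the decoder rather than these quadrant labels.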
The decoder is intimately tied to the encoder. By using the hard or soft de-
cisions provided by the demodulator, the decoder extracts an estimate of the
original data sequence. Strong decoders can compensate for large symbol error
rates.
Finally, the data sink uses these data. In the case of a mobile phone, the vocoded signal is reproduced. In the case of a wireless network, the data may be repackaged and retransmitted along the next link in the network.
Figure 4.7 Cellular model with Poisson cells, with base stations denoted by dots.
Figure 4.8 Cellular model with hexagonal cells, with base stations denoted by dots.
station, the hexagonal-cell model results in the most efficient coverage of a given two-dimensional area; that is, the fewest base stations are required to cover a given area.
Note that the Poisson-cell model and the hexagonal-cell model represent opposite extremes. In reality, base-station locations are carefully planned, but are subject to geographical and usage variations.
Figure 4.9 Channel assignment for hexagonal cells with reuse factor three.
for Mobiles (GSM). The base station assigns mobile users to noninterfering time
slots and provides an accurate common timing reference for all in-cell users.
y_k(t) = Σ_{j=1}^{4} x_j c_j(t) + n_k(t) , (4.10)
where nk (t) is the noise process at the kth receiver. The kth receiver may recover
a noise-corrupted version of xk by filtering the received signal through a filter
matched to its code ck (t) and sampled at time t = 0,
r_k(0) = (1/4) ∫_{−∞}^{∞} dτ c_k*(τ) ( Σ_{j=1}^{4} x_j c_j(τ) + n_k(τ) )
= x_k + n_k , (4.11)
where n_k = (1/4) ∫_{−∞}^{∞} c_k*(τ) n_k(τ) dτ . Thus, the interference is eliminated, as r_k(0) does not contain any contribution intended for the other mobile units.
Note that length-M Walsh functions can also be represented as length-M
vectors, where ck (t) for M = 4 can be represented by the following vectors:
c_1 = (1, 1, 1, 1)^T , c_2 = (1, 1, −1, −1)^T , c_3 = (1, −1, −1, 1)^T , c_4 = (1, −1, 1, −1)^T . (4.12)
With this representation, the operation of matched filtering followed by sampling
can be interpreted as a simple inner product. A matched filter, discussed for
spatial processing in Section 9.2.1, has a form that has the same structure as
that expected of the signal. Here the matched filter is given by the structure of
the Walsh function spreading sequence.
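As a sketch of this inner-product view, the length-4 Walsh codes of Equation (4.12) can be stacked as the columns of a matrix; the symbol values below are arbitrary examples.

```python
import numpy as np

# The four length-4 Walsh codes of Eq. (4.12) as matrix columns; matched
# filtering followed by sampling reduces to the inner product
# r_k(0) = (1/4) c_k^T y, which recovers each user's symbol exactly.
C = np.array([[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1, -1,  1],
              [1, -1,  1, -1]]).T            # column k is the code c_{k+1}

x = np.array([1.0, -1.0, 1.0, 1.0])          # the four users' symbols
y = C @ x                                    # superposed downlink signal, cf. Eq. (4.10)
recovered = (C.T @ y) / 4                    # matched filter + sample, cf. Eq. (4.11)

print(np.all(C.T @ C == 4 * np.eye(4)))      # True: the codes are orthogonal
print(recovered)                             # equals x: interference removed
```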
On the uplink, mobile users do not encode their signals using orthogonal codes
because the transmitted signals from the mobile units pass through different
channels, thus destroying orthogonality. Additionally, the Doppler spread (dis-
cussed in detail in Chapter 10) resulting from relative motions of the different
mobile units causes signals received by the base station from different mobiles
to no longer be orthogonal. For these reasons, practical CDMA systems utilize
random spreading codes on the uplink from the mobiles to base stations.
Consider the following simple example in which two mobile users (nodes 1 and
2) are in the same cell. Let hk (t) denote a linear-time-invariant (LTI) channel
impulse response between the base station and node k. Furthermore, we shall
assume reciprocity.
On the downlink, let the base station encode the signal intended for the kth
user by using the function dk (t). The received signal at the kth mobile user is
given by
yk (t) = x1 hk ∗ d1 (t) + x2 hk ∗ d2 (t) + nk (t) . (4.13)
Suppose that the kth mobile user is able to invert the channel perfectly by using
an equalizer whose impulse response is fk (t). The equalized signal for user k is
given by
ỹk (t) = x1 d1 (t) + x2 d2 (t) + fk ∗ nk (t) . (4.14)
Mobile node 1 can then filter ỹ_k(t) with a filter matched to d_1(t) and sample at time 0 to remove the interference; node 2 can do the same with d_2(t).
The situation is different on the uplink from the mobile units to the base
stations. Let the kth user encode its transmit symbol xk by using the function
ck (t). The received signal at the base station is given by
y(t) = x1 h1 ∗ c1 (t) + x2 h2 ∗ c2 (t) + n(t) . (4.15)
In this case, unless h1 (t) and h2 (t) take very specific forms, it is not possible to
simultaneously invert h1 (t) and h2 (t). Any orthogonality associated with c1 (t)
and c2 (t) will be lost, making it not very useful to employ orthogonal codes on
the uplink. Instead, pseudorandom (but known at the base station) codes are
used. If the codes are sufficiently long, they will be nearly orthogonal as the
following analysis illustrates.
Consider a collection of M codes of length M, where each code vector has M zero-mean i.i.d. entries of variance 1/M. Let the jth entry of c_k be denoted by c_{kj}. We then have

c_k† c_k = Σ_{j=1}^{M} c_{kj}* c_{kj} = (1/M) Σ_{j=1}^{M} | √M c_{kj} |² . (4.16)
[Figure: nominal beam pattern for one directional antenna; three directional antennas combined to cover the cell in three sectors, with the cell boundary indicated.]
Hence, we conclude that long random codes are nearly orthogonal. Note that, typically, the c_{kj} are assigned values ±1/√M with equal probability, although the above analysis holds for other distributions as well.
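A small Monte Carlo experiment (illustrative, with the assumed ±1/√M entries) makes the 1/√M decay of the cross-correlation concrete:

```python
import numpy as np

# Random codes with i.i.d. +-1/sqrt(M) entries have unit norm exactly,
# while the cross-correlation of two independent codes has zero mean and
# variance 1/M, so its RMS value shrinks as 1/sqrt(M).
rng = np.random.default_rng(3)

def rms_cross_correlation(M, trials=4000):
    a = rng.choice([-1.0, 1.0], size=(trials, M)) / np.sqrt(M)
    b = rng.choice([-1.0, 1.0], size=(trials, M)) / np.sqrt(M)
    assert np.allclose(np.sum(a * a, axis=1), 1.0)   # c_k^H c_k = 1 exactly
    return np.sqrt(np.mean(np.sum(a * b, axis=1) ** 2))

r16, r1024 = rms_cross_correlation(16), rms_cross_correlation(1024)
print(r16, r1024)       # about 1/4 and 1/32
```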
Figure 4.13 Ad hoc network with transmitters denoted by dots and receivers by circles.
[Figure 4.14 content: per-link capacity versus number of nodes n. TDMA scales as 1/n; multi-hop (Gupta & Kumar, 2000, and others) scales as 1/√n; the Ozgur, Leveque, Tse (2007) hierarchical scheme approaches a constant rate.]
Figure 4.14 Illustration of key results on the capacity-scaling laws of ad hoc wireless
networks. Note that the per-link rates illustrated in this figure are qualitative.
between a given node and one of its nearby nodes. Since the distances between nodes and their nearest neighbors decay as 1/√n, the signal power received by a node from its nearest neighbors increases as n^{α/2}. However, so does the interference power. Thus, the physical-layer links can maintain approximately constant signal-to-interference ratios and approximately constant data rates with increasing n. Because the number of hops required to traverse a fixed distance increases as √n, the end-to-end data rate decays approximately as 1/√n. Gupta and
Kumar showed this principle for a specific traffic pattern, and it was extended
to random traffic patterns by Franceschetti et al. [101]. The Ozgur et al. result
uses a hierarchical cooperation scheme with distributed multiple-input multiple-
output (MIMO) links, where collections of nearby nodes act as virtual antenna
arrays. More details on these results are given in Chapter 14.
Single-antenna systems
A different approach used in analyzing ad hoc wireless networks is to find the
data rates achievable for a given outage probability in networks in which nodes
utilize specific communication schemes. The primary tool for such analyses has
been stochastic geometry, where nodes are typically modeled as distributed on a plane according to a homogeneous Poisson point process that places nodes randomly with uniform spatial probability density.
For narrowband systems with nodes distributed according to a Poisson point process that transmit with i.i.d. power levels, the aggregate interference power seen at a typical node in the network can be characterized by its characteristic function, although the CDF and PDF of the interference power are not known in closed form. For the case of an ad hoc wireless network with α > 2 and a density of transmitting nodes ρ, the characteristic function of the interference
is given by

⟨ e^{−sI} ⟩ = exp( −ρπ ⟨ (hP)^{2/α} ⟩ Γ(1 − 2/α) s^{2/α} ) , (4.21)

where ⟨ (hP)^{2/α} ⟩ is the expected value of the product of the channel fading coefficients h and the transmit powers P of the transmitting nodes.
The characteristic function is particularly useful in computing the probability
that the signal-to-interference ratio (SIR) of a representative link exceeds some
threshold in Rayleigh fading channels. This property is the result of the fact
that the received signal power conditioned on the transmit power and link length
in Rayleigh fading channels is distributed as a unit-mean exponential random
variable (see Section 3.1.10) whose CDF P_Exp(x) is

P_Exp(x) = 1 − e^{−x} for x ≥ 0 , and P_Exp(x) = 0 for x < 0 . (4.22)
For x ≥ 0, the complementary cumulative distribution function (CCDF) is 1 − P_Exp(x) = e^{−x}. With S and I representing the received signal and interference powers respectively, the probability that the SIR is greater than some threshold τ follows by averaging this exponential CCDF over the interference distribution, which yields the characteristic function of the interference evaluated at the (suitably normalized) threshold.
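The role of the characteristic function can be illustrated numerically. In the sketch below, the gamma-distributed interference power is an arbitrary stand-in (not the Poisson-field distribution of Equation (4.21)), chosen only because its Laplace transform has a simple closed form.

```python
import numpy as np

# Monte Carlo sketch: with unit-mean exponential signal power S (Rayleigh
# fading) independent of the interference power I,
# P(SIR > tau) = P(S > tau I) = <exp(-tau I)>, the Laplace transform /
# characteristic function of I evaluated at tau.
rng = np.random.default_rng(11)
n, tau = 500_000, 1.5

S = rng.exponential(1.0, n)                  # CCDF e^{-x}, cf. Eq. (4.22)
I = rng.gamma(shape=2.0, scale=0.3, size=n)  # illustrative interference law

outage_free = np.mean(S > tau * I)           # direct Monte Carlo estimate
laplace = np.mean(np.exp(-tau * I))          # <e^{-tau I}>
exact = (1 + 0.3 * tau) ** -2.0              # closed form for gamma(2, 0.3)

print(outage_free, laplace, exact)           # all three agree
```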
Using tools from stochastic geometry, Weber et al. [340] introduced the idea of
transmission capacity, which is the product of the data rate and the maximum
density of nodes in an ad hoc network that achieves a particular SINR for a
given outage probability weighted by the probability. This quantity enables a
more direct analysis of the achievable data rates in such networks. The authors
in Reference [340] use transmission capacity to compare direct-sequence CDMA
that uses matched-filter receivers, with frequency-hopping CDMA in networks
with spatially distributed nodes. They find that the order-of-growth of the trans-
mission capacity with spreading length of frequency-hopping CDMA systems is
larger than that of direct-sequence CDMA systems.
Multiple-antenna systems
Antenna arrays in ad hoc wireless networks are useful both for spatial multiplexing (that is, enabling a single user to transmit multiple data streams) and for SDMA. By nulling out the interference contribution from nearby
nodes, it is possible to get significant increases in SINR and hence data rates.
With N antennas per receiver, it is possible to null out the interference from
N − 1 sources in ideal situations; however, this interference mitigation may come
at the expense of some loss of signal SNR at the output of the mitigating receiver.
Alternatively, it is also possible to increase signal power by a factor of N relative
to noise by coherently adding signals from N antennas coming from a target
signal source.
The following simple heuristic argument shows that it is possible to achieve
signal-to-interference-plus-noise ratio scaling on the order of (N/ρ)α /2 in ad hoc
wireless networks with a power-law path-loss model.
Consider a receiver with N antennas at the center of a circular network of
interferer density ρ as illustrated by the square in Figure 4.15. Suppose this
receiver uses a fraction ζ of its degrees of freedom to null the ζN − 1 interferers
closest to it. When N is large, these nulled interferers occupy a circle of radius
approximately equal to
r_a = √( (ζN − 1)/(πρ) ) ≈ √( ζN/(πρ) ) . (4.31)
Assuming that the interferers are distributed continuously from r_a to infinity and integrating their interference contribution, we find that the residual interference scales as r_a^{2−α} ∼ (N/ρ)^{1−α/2}. Suppose that the remaining (1 − ζ)N degrees of freedom are used by the receiver to increase the SINR relative to thermal noise and residual interference by a factor of order N. Then the SINR grows as (N/ρ)^{α/2}.
In networks with α > 2, which are common, the SINR growth with number of
antennas is greater than linear, which would be the case for simple coherent com-
bination. Additionally, this heuristic analysis indicates that it may be possible to
Figure 4.15 Illustration of interference contribution from planar network with nulling
of nearby interference.
increase the density of users and maintain the same SINR by linearly increas-
ing the number of antennas per receiver with the node density. This has been
shown independently for different sets of assumptions in References [56], [123],
and [164].
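The scaling claimed above can be checked with a short numeric sketch; the unit-power interferers and the parameter values (ρ, α, ζ) are illustrative assumptions, not from the text.

```python
import math

# Heuristic SINR scaling with nulling: the zeta*N nearest interferers
# occupy a disk of radius r_a ~ sqrt(zeta N / (pi rho)), cf. Eq. (4.31);
# the residual interference from density-rho interferers beyond r_a with
# r^{-alpha} path loss is 2 pi rho r_a^{2-alpha} / (alpha - 2); boosting
# the signal by a factor of order N then gives SINR ~ (N/rho)^{alpha/2}.
def sinr_scale(N, rho=1.0, alpha=4.0, zeta=0.5):
    r_a = math.sqrt(zeta * N / (math.pi * rho))
    residual = 2 * math.pi * rho * r_a ** (2 - alpha) / (alpha - 2)
    return N / residual

# With alpha = 4, quadrupling N should multiply the SINR by 4^{alpha/2} = 16.
ratio = sinr_scale(64) / sinr_scale(16)
print(ratio)
```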
There are some subtleties in considering sampled signals and channels. Some of
these effects with regard to channels are explored in more detail in Section 10.1.
From basic physics, in some sense all electromagnetic signals are quantized because the signal is mediated by photons that have quantized energies. However, at frequencies of interest for most wireless communications, the energy of a single photon is so small that this quantization is not a useful description, although it is an important consideration for optical communications. The number of photons received at an antenna is so large that the received signal is statistically well modeled as continuous.
Somewhat amusingly, because essentially all modern communications are digital,
the physical signals are quantized in energy and time, although this quantization
has little to do with the physical photon quantization. In general in this text,
it is assumed that any subtleties associated with sampling have been addressed
and we work with the complex baseband signal. However, we will address some
of the sampling issues here.
The fundamental issue is that signals must be sampled at a rate that satisfies the Nyquist criterion,2 that is, for a band-limited signal of width
2 While “Nyquist” is widely used to indicate this criterion, work by John Whittaker,
Vladimir Kotelnikov, and Claude Shannon could justify use of their names, and the
criterion is sometimes denoted the WKS criterion [25].
where δ(·) is the delta function, and the scaling by T is for convenience. The
evaluated form in Equation (4.33) is the discrete-time Fourier transform of the
sampled form of the signal. Two issues can be observed from Equation (4.33).
First, for a received signal, if the spectral width of the signal S(f ) is greater
than 1/T the various images of the continuous signal would overlap, resulting
in an inaccurate sampled image. This phenomenon is referred to as aliasing.
Second, if one were to transmit this signal, it would occupy infinite bandwidth.
This is undesirable and physically unrealizable. A pulse-shaping filter is therefore typically applied to reduce the spectral width of the signal. To get perfect reconstruction of a band-limited signal (of bandwidth B ≤ 1/T), one can theoretically employ a perfect spectral “brick-wall” filter θ(fT) that is spectrally flat within −1/(2T) to 1/(2T) and zero everywhere else. The impulse response of this filter is given
by the sinc function, so that the reconstructed signal is given by
s(t) = Σ_m s(mT) sinc( (t − mT)/T ) . (4.34)
Unfortunately, this approach is not achievable because the sinc function is infinite
in extent; however, reconstruction filters that require a small number of samples
can be designed that still work well.
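Equation (4.34) can be exercised directly; the tone frequency and window length below are illustrative, and the finite window means the reconstruction is only approximate.

```python
import numpy as np

# Sketch of ideal sinc reconstruction, Eq. (4.34): a band-limited tone
# sampled at rate 1/T is rebuilt as sum_m s(mT) sinc((t - mT)/T).  The
# window is truncated, so a small residual error remains.
T = 1.0                                   # sample period
m = np.arange(-2000, 2001)                # finite window of samples
f0 = 0.11                                 # tone frequency, below 1/(2T) = 0.5

s_samples = np.cos(2 * np.pi * f0 * m * T)

def reconstruct(t):
    # np.sinc(x) = sin(pi x) / (pi x), matching the sinc in Eq. (4.34)
    return np.sum(s_samples * np.sinc((t - m * T) / T))

err = max(abs(reconstruct(t) - np.cos(2 * np.pi * f0 * t))
          for t in (0.25, 3.6, -7.1))
print(err)                                # small truncation error
```

At the sample instants themselves the reconstruction is exact, since sinc vanishes at all nonzero integers.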
Problems
[Figure 4.16: QPSK constellation with bit assignments (00) in the upper-right quadrant, (01) upper-left, (10) lower-left, and (11) lower-right.]
the constellation is driven by the distance between the closest points. Compared to a BPSK constellation, find the relative average power required to hold this minimum distance between points fixed for the following constellations:
(a) QPSK,
(b) 8-PSK,
(c) 16-QAM,
(d) 64-QAM,
(e) 256-QAM.
4.5 Consider the constellation diagram for a QPSK system shown in Figure 4.16
with the constellation points assigned to 2-bit sequences. If possible, find an
alternative assignment of bits that leads to a lower average probability of bit
error.
z = Σ_{j=1}^{M} s_j c_j + n , (4.37)

where the s_j terms take on values of ±1 with equal probability and n contains independent, identically distributed Gaussian random variables with zero mean and variance 0.01. Let an estimate of s_1 be given by ŝ_1 = sign( c_1† z ).
(a) Suppose that c_j† c_1 = 0 for j ≠ 1. Find the probability that ŝ_1 = s_1.
(b) Suppose that the entries of the vectors c_j are i.i.d. random variables taking values of ±1/√M with equal probability. Find the probability that ŝ_1 = s_1.
4.8 Consider a network comprising interferers distributed according to a Pois-
son point process with density of interferers ρ and subject to the standard inverse-
power-law path-loss model with path-loss exponent α > 2. Consider a link of
length r in this network between a receiver that is not part of the process and
an additional transmitter at a distance r away. Assuming that the signals are
subject to Nakagami fading with shape parameter μ equaling a positive integer,
find the cumulative distribution function (CDF) of the signal-to-interference ra-
tio of the link in question. Hint: the upper incomplete gamma function Γ(s, x)
for positive integers s can be expressed as follows:
Γ(s, x) = (s − 1)! e^{−x} Σ_{k=0}^{s−1} x^k / k! . (4.38)
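The hint in Equation (4.38) can be verified numerically. The sketch below is illustrative, with arbitrary test values s = 4 and x = 2.5; it compares the closed form against direct integration of the defining integral Γ(s, x) = ∫_x^∞ t^{s−1} e^{−t} dt:

```python
import math

def upper_gamma_series(s, x):
    """Closed form of Equation (4.38): (s-1)! e^{-x} sum_{k=0}^{s-1} x^k/k!."""
    return math.factorial(s - 1) * math.exp(-x) * sum(
        x**k / math.factorial(k) for k in range(s))

def upper_gamma_quad(s, x, n=100000):
    """Midpoint-rule integration of Gamma(s, x) on a truncated interval."""
    upper = x + 60.0   # the integrand is negligible beyond this point
    dt = (upper - x) / n
    return sum((x + (i + 0.5) * dt)**(s - 1) * math.exp(-(x + (i + 0.5) * dt))
               for i in range(n)) * dt

series = upper_gamma_series(4, 2.5)
quad = upper_gamma_quad(4, 2.5)
print(series, quad)   # the two agree to several digits
```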
5 Simple channels

5.1 Antennas
The study and design of antennas is a rich field [15]. Here, we focus on a small
set of essential features. The first important concept is that antennas do not
radiate power uniformly in direction or in polarization. The radiated power as
a function of direction is denoted the radiation pattern. If the antenna is small
compared with the wavelength (for example, if the antenna fits easily within a radius of 1/8 of a wavelength), then the shape of the radiation pattern is relatively
smooth. However, if the antenna is large compared with the wavelength, then
the radiation pattern can be complicated. Antenna patterns are often displayed
in terms of decibels relative to a notional isotropic antenna (denoted dBi). The
notional isotropic antenna has the same gain over all 4π of solid angle.1 Gain is an
indication of directional preference in the transmission and reception of power.
The axisymmetric radiation pattern for an electrically small (small compared
with a wavelength) dipole antenna is displayed in Figure 5.1. In the standard
spherical coordinates of r, θ, φ, which correspond to the radial distance, the polar
angle, and the azimuthal angle, respectively, the far-field electric field is limited
to components along the direction of θ, denoted eθ . For electrically small dipoles,
the radiation pattern is proportional to [290, 154]
‖eθ‖² ∝ sin²(θ) / r² . (5.1)
1 Consider an object in three-dimensional space projected from a point at the origin onto a unit sphere; the solid angle is the area that the projection occupies on the unit sphere. Solid angle is typically normalized such that 4π covers the entire viewable angular area.
142 Simple channels
One can find this relationship by noting that the radiation pattern must satisfy
the differential equation
( ∇² − (1/c²) ∂²/∂t² ) e(r, t) = 0 , (5.2)
to satisfy Maxwell’s equations, where e(r, t) is the electric field vector as a func-
tion of time t and position r, and is determined by radial distance, polar an-
gle, and azimuthal angle indicated by r, θ, φ. The speed of light is indicated
by c, and ∇2 is the Laplacian operator discussed in Section 2.7.1. Solutions to
this equation are proportional to gradients of spherical harmonics [11] denoted
Yl,m (θ, φ), where l indicates the degree of the harmonic and m is the order of the
harmonic.
By observing various symmetries of the antenna and of the spherical harmonic function, one can determine the contributing degree and order. Here these observations are made without proof; the interested reader can find more thorough discussions of spherical harmonics and their symmetries in various texts [11]. Based on the axial symmetry of the antenna, the radiated power must be axisymmetric. Solutions with m ≠ 0 have azimuthal structure (that is, they are a function of φ); consequently, order zero, m = 0, is required. Furthermore, the value of the θ-direction component eθ is unchanged under the parity inversion r → −r. A result of this symmetry is that eθ is symmetric above and below the θ = π/2 plane. Spherical
harmonics that observe this symmetry require that only odd values of degree l
are allowed. Here it is assumed that the antenna is small compared with a wave-
length. Given the short length of the antenna compared to a wavelength, it is
difficult to induce complicated radial functions of current flow [15]. The coupling
of current to induce a particular spherical harmonic is proportional to the spher-
ical Bessel function jl (k d) [11], where k is the wavenumber and d is the distance
along the antenna. The l = 1 spherical Bessel function moves the quickest from
zero as a function of k d, and thus corresponds to the solution with the largest
current and thus radiated power. The lowest-order spherical harmonic satisfying
all the symmetries and radiating the most power for an electrically small dipole
is Yl=1,m =0 (θ, φ). Consequently, the electric field that is proportional to the gra-
dient of this function is proportional to sin(θ). The gain is therefore given by
Equation (5.1).
The notional isotropic antenna would radiate equal power in all directions.
Consequently, if the isotropic antenna and the small dipole antenna radiated
the same total power, such that the integrals over a surface at some distance r
are the same, then the peak gain Gdipole , which is at the horizon (θ = π/2), is
[Figure 5.1: radiation pattern ‖e‖² of an electrically small dipole; the peak response is ∼1.76 dBi.]

Gdipole = sin²(π/2) / ( ∫ dφ dθ sin(θ) sin²(θ) / ∫ dφ dθ sin(θ) )
= 3/2 ≈ 1.76 dBi . (5.3)
Typically, the peak gain increases as the size of the antenna increases in units of
wavelength.
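The spherical average in Equation (5.3) can be checked numerically. The following sketch (illustrative only) averages the sin²θ pattern over the sphere and recovers the peak gain of 3/2, about 1.76 dBi:

```python
import numpy as np

n = 200000
dtheta = np.pi / n
theta = (np.arange(n) + 0.5) * dtheta        # midpoint grid on [0, pi]
num = np.sum(np.sin(theta)**3) * dtheta      # integral of sin(theta) sin^2(theta)
den = np.sum(np.sin(theta)) * dtheta         # integral of sin(theta) (= 2)
gain = np.sin(np.pi / 2)**2 / (num / den)    # peak over the average pattern
print(gain, 10 * np.log10(gain))   # 1.5 and ~1.76 dBi
```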
A second important concept is that the magnitude of the radiation pattern is
the same for transmitting and for receiving. This property is because Maxwell’s
equations are symmetric under time reversal [290, 154]. Consequently, a transmit
and receive pair of antennas observe reciprocity; that is, they will observe the
same channel and antenna gain when the link direction is reversed. If an antenna
tends to radiate power in a particular direction, then it will prefer receiving power
from that same direction. Signals being received from other directions will have
some relative attenuation.
[Figure: a wavefront of surface area ∝ r²; the effective areas and transmit gains of two antennas.]

Aeff,2 / Aeff,1 = Gt,2 / Gt,1 . (5.5)

[Figure: an antenna connected to a resistive load, used in the blackbody-radiation argument.]
where the first term on the right-hand side of Equation (5.8) includes the radia-
tion that matches the polarization of the antenna and the second term includes
the radiation that is orthogonal to the polarization of the antenna. Consequently,
the received power spectral density from the blackbody radiation is given by
P = (2 f² kB T / (2 c²)) ∫₀^{2π} dφ ∫₀^{π} dθ sin(θ) Aeff(θ, φ)
= 4π (f² kB T / c²) ⟨Aeff(θ, φ)⟩ , (5.9)

where ⟨·⟩ indicates the average over solid angle.
As discussed in Section 4.2.2, at lower frequencies, the thermal power spectral
density due to the resistor and radiated by the antenna is given by
P = kB T . (5.10)
By equating the incoming and outgoing power, the average effective area is found
to be
⟨Aeff(θ, φ)⟩ = λ² / (4π) , (5.11)
by noting that the wavelength is given by λ = c/f. By construction, the average gain is one: ⟨G(θ, φ)⟩ = 1 . (5.12)
Because the effective area and gain are proportional and their averages are de-
termined here, they are related by
Aeff(θ, φ) / ( λ²/(4π) ) = G(θ, φ) , (5.13)
under the assumption that their polarizations are matched.
When the direction parameters are dropped, maximum gain and effective area
are typically assumed, so that the gain and effective area are related by
G = 4π Aeff / λ² . (5.14)
As mentioned previously, the effective area of an antenna is somewhat different
from the physical area. For moderately sized antennas (a few wavelengths by
a few wavelengths), effective area is typically smaller than the physical area.
Effective area is difficult to interpret for wire antennas and electrically small
antennas, although if one imposes the constraint that no physical dimension can
be smaller than some reasonable fraction of a wavelength (something on the
order of 1/3), then it is at least somewhat consistent with the effective area.
While it is generally assumed that the gain indicates the peak gain from the
antenna exclusively, sometimes the inefficiencies due to impedance mismatches
between the amplifier or receiver and the antenna are included with the gain.
In this case, directionality is the gain of the ideal antenna, and the gain is the
product of the directionality and the adverse effects of inefficiencies [15].
5.2 Line-of-sight attenuation
Figure 5.4 Notional radiation beamwidth for antenna with lh × lw effective area.
5.2.2 Beamwidth
The beam shape is dependent upon the details of the antenna shape, but the
shape of the main lobe can be approximated by considering a rectangular an-
tenna with area Aeff ≈ A = lw lh , where lw and lh are the width and height
of the antenna, as seen in Figure 5.4. The beamwidths in each direction are approximately λ/lw and λ/lh. For a square antenna l = lw = lh of gain G, the beamwidth, Δθ, in each direction is approximately
Δθ ≈ λ/l = √(4π/G) . (5.15)
There are advantages and disadvantages to having higher-gain antennas. In a
line-of-sight propagation environment, the increased gain provides more signal
power at the receiver. However, this comes at the cost of requiring greater ac-
curacy in pointing the antennas. Furthermore, in non-line-of-sight environments
with significant multipath, there is no one good direction for collecting energy
from the impinging wavefronts. Collecting energy in complicated scattering en-
vironments is one of the advantages of using adaptive antenna arrays versus
high-gain antennas.
As an example, if we imagine a square 20 dBi antenna, the effective area is
given by
A = ( G/(4π) ) λ² ≈ 8 λ² (5.16)
and the approximate beamwidth is given by
Δθ ≈ √(4π/G) ≈ 0.35 rad . (5.17)
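The numbers in this example follow directly from Equations (5.14) and (5.15); a quick illustrative check:

```python
import math

G = 10**(20 / 10)                        # 20 dBi -> linear gain of 100
area = G / (4 * math.pi)                 # effective area in units of lambda^2
beamwidth = math.sqrt(4 * math.pi / G)   # beamwidth in radians, from (5.15)
print(area)        # ~7.96, i.e. "approximately 8 lambda^2"
print(beamwidth)   # ~0.354 rad
```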
In terrestrial communications, it is not uncommon for communication links to
be blocked by walls, buildings, or foliage. Even when the transmitter and receiver
have a direct path, the propagation is often complicated by the scatterers in
the environment. Nonetheless, line-of-sight propagation is useful as a reference
propagation loss.
[Figure: geosynchronous satellite link with range r ≈ 36 000 km and antenna gains Gt and Gr.]
5.3 Channel capacity

Next, two approaches for motivating channel capacity are discussed. The first
is geometric. The second is based on the concept of mutual information. For a
more thorough discussion see [68, 314], and the original discussion [284].
added to s(t0) and s(t1) with equal probability. The complete list of 64 possible
channel output states is displayed in Figure 5.6. The four values of the s(t0 ),
s(t1 ) pair are represented by the dots, and all of the potential output states of
z(t0 ), z(t1 ) are represented by the intersections of the grid lines.
Because of the careful construction of the noise, all four of these states can
be recovered. Consequently, the system reliably can communicate 2 bits. The
potential number of useful information bits can also be seen by considering the
entropy of the received signal and the noise. By analogy to statistical mechanics,
the entropy with equally likely states is given by the logarithm of the number of
potential states. Thus, the entropy associated with z(t0 ), z(t1 ), denoted here as
Hz, is given by Hz = log2(64) = 6 bits; similarly, the entropy associated with the noise is Hn = log2(16) = 4 bits.
The total number of information bits or the capacity C of this channel is given
by
C = Hz − Hn = 6 − 4 = 2 . (5.23)
Figure 5.6 Simple entropy example. There are 16 possible noise states and 64 possible
received states.
hardens. The notion of hardening indicates that the fluctuation about a central
value decreases as the number of symbols increases. Consequently, essentially
all draws from the Gaussian distribution will be arbitrarily close to the surface
of the sphere. As the number of complex symbols ns increases, the probability
of large fluctuations in noise distance decreases. A second implication of con-
sidering the large dimensional limit is that the ns -dimensional hypervolume is
dominated by the contributions at the surface, that is the vectors associated with
codewords are nearly always close to the surface. Thus, by assuming that the
distribution of codewords is statistically uniform across the surface, the number
of states is proportional to the volume. The capacity is given by the ratio of the volume of the hypersphere that the signal-plus-noise vectors occupy to the volume of the hypersphere that the noise vectors occupy.
that this packing is achievable. A set of code words generated by drawing from
a complex Gaussian distribution satisfies the requirements of sphere hardening
and uniformity. In Section 5.3.2, the optimality of the Gaussian distribution is
discussed.
Sphere hardening
The magnitude-squared norm (here denoted x) of an n-dimensional real noise vector n, with elements sampled from a real zero-mean unit-variance Gaussian distribution, follows a chi-squared distribution with n degrees of freedom.
For a fuzzy Gaussian ball, the square of the radius is given by x and the standard
deviation of the fluctuation about the mean of x is given by the square root of
the variance of x. The fuzziness ratio indicated by the standard deviation to the
mean square of the radius is given by
√(2n) σ² / (n σ²) = √(2/n) → 0 ; as n becomes large. (5.27)
Because the ratio goes to zero, it is said that the noise sphere has hardened. This
effect can be observed in Figure 5.7. As n becomes larger, the density function
becomes sharply peaked about the mean. For larger values of n, the noise about
the codeword vector is modeled well by a hard sphere.
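Sphere hardening is easy to see in simulation. The sketch below (illustrative; the sample counts are arbitrary choices) draws real Gaussian noise vectors and shows that the fluctuation of ‖n‖² about its mean shrinks like √(2/n), as in Equation (5.27):

```python
import numpy as np

rng = np.random.default_rng(0)
ratios = {}
for n in (10, 100, 1000, 10000):
    # x = ||n||^2 for 2000 independent n-dimensional unit-variance noise vectors
    x = np.sum(rng.standard_normal((2000, n))**2, axis=1)
    ratios[n] = x.std() / x.mean()       # fuzziness ratio of Equation (5.27)
    print(n, ratios[n], np.sqrt(2 / n))  # measured ratio vs. sqrt(2/n)
```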
Volume of hypersphere
For some point in a k-dimensional real space, the volume of the hypersphere of
some radius r is given by [334]
V(k, r) = ( π^{k/2} / Γ[k/2 + 1] ) r^k . (5.28)
Figure 5.7 Probability density function for fχC2 (x; n, 1/n), with n = 10, 20, 40, . . . , 100.
The volume Vn of the fuzzy complex noise ball for large m is approximated by
Vn(m, σ²) ≈ (π^m / m!) x^m
= (π^m / m!) (σ² m)^m , (5.30)
where x denotes the magnitude-squared norm of the noise used in the previous
section. By using essentially the same argument and by noting that the signal and
noise are independent, the variances of the noise and the signal power add. The
volume of the hypersphere associated with the received signal Vz is approximated
by
Vz(m, σ² + Pr) ≈ (π^m / m!) ( [σ² + Pr] m )^m , (5.31)
where Pr is the average receive signal power observed at the receiver in the
absence of noise.
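The sphere-packing count follows by taking the ratio of the two volumes: the π^m/m! prefactors in Equations (5.30) and (5.31) cancel, and the capacity per complex symbol reduces to log2(1 + Pr/σ²). A minimal illustrative check with arbitrary values σ² = 1 and Pr = 9:

```python
import math

sigma2, Pr, m = 1.0, 9.0, 50
# log2(Vz / Vn): the pi^m / m! prefactors cancel, leaving m log2 of the ratio
log2_volume_ratio = m * (math.log2((sigma2 + Pr) * m) - math.log2(sigma2 * m))
capacity = log2_volume_ratio / m     # bits per complex symbol
print(capacity)   # log2(1 + 9/1) ~ 3.32 bits per complex symbol
```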
Figure 5.8 Channel capacity in terms of bits per symbol as a function of SNR.
Achievability
To demonstrate that the capacity bound is asymptotically achievable, a partic-
ular implementation is proposed. For a given data rate, the probability of error
must go to zero as the number of symbols goes to infinity. The effect of sphere
hardening is exploited here. Consider a random code over m complex symbols
with ncode codewords constructed using vectors drawn randomly from a complex
Gaussian distribution. These codewords would randomly fill the space (lying
near the surface with high probability). Given that a particular codeword was
Figure 5.9 Notional representation of two symbols. The first symbol is displayed
including some distribution of noise. The second symbol is potentially confused with
the first symbol. In this case, the second symbol does not cause confusion because it
is outside the noise volume of the first symbol.
perr^(1) = Vn(m, σ²) / Vz(m, σ² + Pr)
= ( σ² / (Pr + σ²) )^m , (5.34)
By using the definition that the coding rate r is the number of bits encoded by
the codebook normalized by the m complex symbols used
r = log2(ncode) / m , (5.36)
Some instance of that variable x is drawn from the probability distribution asso-
ciated with X. Throughout this section, it is assumed that the random variables
are complex.
The maximum information per symbol (or data rate) is given by the mutual
information I(S; Z) between the random variables S and Z when the distribution
for S is optimized. The mutual information is given by
I(S; Z) = ∫ d²s d²z p(s, z) log2[ p(s, z) / ( p(s) p(z) ) ]
= h(Z) − h(Z|S)
= h(S) − h(S|Z) , (5.41)
where the temporal parameter t of the random variables has been suppressed
for s and z, and the differential area in the complex space d2 s is described in
Section 2.9.2. Differential entropy for some random variable is indicated by h(·).
The conditional differential entropy is indicated by h(·|·), where the second term
is the constraining condition. The joint probability distribution of S and Z is
indicated by p(s, z). While formally the notion pS,Z (s, z) might be clearer, it is
assumed that dropping the subscripts will not cause confusion. Similarly, the
probability distributions of S and Z are indicated by p(s) and p(z), respectively.
The differential entropy h(·) and conditional differential entropy h(·|·) are named by analogy with entropy in statistical mechanics [264]. The use of the modifier “differential” indicates that this is the entropy used for continuous random variables. While the derivation will not be presented explicitly, the motivation for the mutual information being given by the difference in the entropy
terms is directly related to the geometric discussion in the previous section. In
statistical thermodynamics, entropy is proportional to the log of the number of
possible states Ω. Each state is assumed to be equally likely with probability
p = 1/Ω. Consequently, statistical mechanical entropy is given by
kB log Ω = kB log(1/p)
= −kB log p . (5.42)

By analogy, the entropy of a random variable X is given by the expectation

h(X) = ⟨ log2( 1/p(x) ) ⟩ . (5.43)
For continuous variables, h(X) is called the differential entropy and is given by
h(X) = ∫ d²x p(x) log2( 1/p(x) )
= − ∫ d²x p(x) log2[ p(x) ] . (5.44)
Similarly, the conditional differential entropy is given by

h(X|Y) = ⟨ log2( 1/p(x|y) ) ⟩
= − ∫ d²x d²y p(x, y) log2[ p(x|y) ] , (5.45)
where p(x|y) is the probability density of x assuming a given value for y. The
difference between the entropy and the conditional entropy is given by
h(X) − h(X|Y) = − ∫ d²x p(x) log2[ p(x) ] + ∫ d²x d²y p(x, y) log2[ p(x|y) ]
= − ∫ d²x d²y p(x, y) log2[ p(x) ] + ∫ d²x d²y p(x, y) log2[ p(x, y)/p(y) ]
= ∫ d²x d²y p(x, y) log2[ p(x, y) / ( p(x) p(y) ) ]
= I(X; Y) , (5.46)
where the relationship p(x, y) = p(x|y) p(y) is employed. By observing the symmetry between x and y in Equation (5.46), it can be seen that the mutual information is also given by I(X; Y) = h(Y) − h(Y|X).
The differential entropy h(Z|S) is simply the differential entropy associated with the noise, h(N).
This differential entropy evaluation can be seen directly by noting that under
the change of variables z = s + n, the probability p(s + n|s) = p(n),
h(Z|S) = − ∫ d²s d²z p(z, s) log2[ p(z|s) ]
= − ∫ d²s d²n p(s + n|s) p(s) log2[ p(s + n|s) ]
= − ∫ d²s d²n p(n) p(s) log2[ p(n) ]
= − ∫ d²n p(n) log2[ p(n) ]
= h(N) . (5.51)
Here we use the observation that the integrals over p(s + n|s) and p(n) are the
same even though the distributions themselves are not the same. The differential entropy of the complex Gaussian noise is then

h(N) = ∫ d²n ( e^{−‖n‖²/σn²} / (π σn²) ) ( log2[π σn²] + log2(e) ‖n‖²/σn² )
= log2[π σn²] + log2(e) ∫ d²n ( e^{−‖n‖²/σn²} / (π σn²) ) ‖n‖²/σn²
= log2[π σn²] + log2(e)
= log2[π σn² e] . (5.52)
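The closed form in Equation (5.52) can be checked by Monte Carlo, since the differential entropy is the expectation of −log2 p(n). A sketch (illustrative; σn² = 2 and the sample count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 2.0
# 200000 draws of circular complex Gaussian noise with variance sigma2
n = (rng.standard_normal(200000) + 1j * rng.standard_normal(200000)) \
    * np.sqrt(sigma2 / 2)
p = np.exp(-np.abs(n)**2 / sigma2) / (np.pi * sigma2)   # density at each draw
h_est = np.mean(-np.log2(p))                            # Monte Carlo entropy
h_exact = np.log2(np.pi * np.e * sigma2)                # Equation (5.52)
print(h_est, h_exact)   # both ~4.09 bits
```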
For some random complex variable X, the distribution p(x) that maximizes
the entropy can be found using the Lagrangian constrained optimization. There
are three constraints that p(x) must satisfy. The first two are basic properties of a probability density, and the third fixes the variance:
• p(x) ≥ 0 ,
• ∫ d²x p(x) = 1 ,
• ∫ d²x ‖x‖² p(x) = σx² .
The constrained optimization employs the Lagrangian functional

φ = − ∫ d²x p(x) log2[ p(x) ] − λ1 ( ∫ d²x p(x) − 1 ) − λσ ( ∫ d²x ‖x‖² p(x) − σx² ) . (5.55)
The optimal probability density is found by setting the total variation of the
function δφ to zero,
δφ = ∫ d²x ( −log2[ p(x) ] − 1/log(2) − λ1 − λσ ‖x‖² ) δp(x) = 0 , (5.56)
where the relationship log2 (x) = log(x)/ log(2) is used. By solving for p(x) such
that the total variation is zero for some arbitrary variation of δp(x), the following
form is found,
0 = −log2[ p(x) ] − 1/log(2) − λ1 − λσ ‖x‖²
p(x) = e^{−1} e^{−log(2) (λ1 + λσ ‖x‖²)}
= a e^{λσ ‖x‖²} , (5.57)

where the constants are collected into a and the factor −log(2) is absorbed into a redefined λσ.
Imposing the normalization constraint gives

1 = ∫ d²x a e^{λσ ‖x‖²} = −a π/λσ
⇒ a = −λσ/π , (5.58)
and, by using the notation x = xr + i xi,

σx² = −(λσ/π) ∫ d²x ‖x‖² e^{λσ ‖x‖²}
= −(λσ/π) ∫ dxr dxi (xr² + xi²) e^{λσ (xr² + xi²)}
= −2 (λσ/π) ∫ dxi e^{λσ xi²} ∫ dxr xr² e^{λσ xr²}
= −2 (λσ/π) √(π/(−λσ)) √π / ( 2 (−λσ)^{3/2} ) = −1/λσ
⇒ λσ = −1/σx² . (5.59)
[Figure 5.10: block diagram of the encoder.]
While it may seem unrealistic that the transmitter knows the interfering sig-
nal perfectly, there are many situations in which this model is applicable. For
instance, in broadcast channels where one transmitter has two different informa-
tion symbols to send to two different receivers, the signal intended for a particular
receiver is interference to an unintended receiver. Moreover, since the transmit-
ter is the source of both symbols, it must know the interfering signal perfectly.
Another possible application is in intersymbol-interference (ISI) channels where
successive symbols interfere with one another because of dispersion by the chan-
nel.
The capacity of this channel when the interfering signal t is not necessarily
Gaussian has been found by Gel’fand and Pinsker [108]. To achieve capacity, an
auxiliary random variable U is used to aid in the encoding of the message via a
binning strategy.
Suppose that the transmitter wishes to send a message m (see Figure 5.10) to
the receiver. In the canonical additive-white-Gaussian-noise (AWGN) channel,
the transmitter will transmit an ns -symbol-long codeword, which is used to rep-
resent the message m, whereby each message maps to a single codeword. In the
Gel’fand–Pinsker scheme, each message m maps to several possible codewords,
each of length ns symbols. All the codewords corresponding to a particular mes-
sage m are said to come from the same bin. The codeword that is ultimately
selected to be transmitted is based on the value of the ns state symbols that
will occur during the transmission of the ns symbols that represent the message
m. Hence, since the transmitted codeword is dependent on the state symbols
that will occur during the transmission of that codeword, the transmitter can
precompensate for the effect of the interfering state symbols. By using random,
jointly typical coding and decoding (see, for example, Reference [68]), Gel’fand
and Pinsker show that the capacity of this channel is

C = max_{p(u|t)} ( I(U; Z) − I(U; T) ) .

s = s̃ − t . (5.69)
Since the received signal is a superposition of s and t, the receiver only sees the
codeword associated with the message m, i.e. s̃. The average transmit power for
this scheme is
⟨‖s‖²⟩ = P + σt² . (5.70)
This power is, of course, higher than the transmit power budget. Dirty-paper coding, in contrast, precompensates for the interference by cleverly encoding the transmitted signal without this power penalty.
There is a nice geometric interpretation of the dirty-paper coding technique
that is depicted in the following figures, which is based on the presentation in
[314]. Consider a two-dimensional codeword space, and suppose that there are
5.4 Energy per bit
[Figures: geometric interpretation of dirty-paper coding in the codeword space, involving the vectors μt and u − μt.]
σn² = N0 B = kB T B , (5.74)
where kB is the Boltzmann constant (∼ 1.38 · 10−23 J/K) and T is the absolute
temperature as discussed in Section 4.2.2. The bound c ≥ R/B on the spectral efficiency, in bits/second/hertz, is given by

c = log2( 1 + Pr/(N0 B) ) , (5.75)
where Pr is the receive power of the transmitted signal, N0 is the noise spectral
density (assuming complex noise), and B is the bandwidth.
For this simple channel, the SNR is given by
SNR = Pr / (N0 B) . (5.76)
A related useful measure of signal energy to noise is Eb /N0 , which is sometimes
unfortunately pronounced “ebb-no.” This is the energy per information bit at
the receiver divided by the noise spectral density. The energy is given by the
power divided by the symbol rate. The energy per information bit normalized
by the noise spectral density Eb /N0 is given by
Eb/N0 = Pr / (R N0) = SNR (B/R) . (5.77)
Consequently, capacity in terms of Eb /N0 is given by
c ≥ log2( 1 + (Eb/N0)(R/B) ) . (5.78)
The equality is satisfied if the communication rate density R/B is equal to the
capacity c. Thus, the implicit relationship between bounding spectral efficiency
and Eb /N0 is defined as
c = log2( 1 + c (Eb/N0) ) . (5.79)
We can solve for Eb /N0 for a capacity-achieving link:
2^c = 1 + c (Eb/N0)
Eb/N0 = (2^c − 1) / c . (5.80)
In the limit of low spectral efficiency, we can use the fact that the log of 1 plus a
small number is approximately the small number presented in Equation (2.14),
c = log2(e) log( 1 + c (Eb/N0) )
≈ log2(e) c (Eb/N0) . (5.81)
Consequently, for small spectral efficiencies c ≪ 1, there is a limiting Eb/N0 that is independent of the exact value of the spectral efficiency,
Eb/N0 ≈ 1/log2(e) ≈ −1.59 dB . (5.82)
Furthermore, this is the smallest Eb/N0 required for any nonzero spectral efficiency. Links with large spectral efficiencies c ≫ 1 require larger Eb/N0. The required Eb/N0 as a function of the channel capacity is displayed in Figure 5.13.
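Equation (5.80) is easy to tabulate; the illustrative sketch below shows the −1.59 dB floor at low spectral efficiency and the rapid growth at high spectral efficiency:

```python
import math

def ebno_db(c):
    """Required Eb/N0 in dB for a capacity-achieving link, Equation (5.80)."""
    return 10 * math.log10((2**c - 1) / c)

for c in (0.01, 0.1, 1.0, 5.0, 10.0):
    print(c, round(ebno_db(c), 2))
# as c -> 0 the requirement approaches 10 log10(ln 2) = -1.59 dB;
# at c = 10 b/s/Hz roughly 20 dB is required
```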
Figure 5.13 The required noise-normalized energy per information bit as a function of
the channel capacity in terms of bits per second per hertz.
As the capacity falls below about 0.1 b/s/Hz, the required Eb /N0 does not change
appreciably. The implication is that in this constant Eb /N0 regime the data rate
is proportional to power. The capacity region where this is true is sometimes
denoted the noise-limited regime. Above about 1 b/s/Hz, increasing the data
rate requires an exponential increase in power. The capacity region in which this
is true is sometimes denoted the power-limited regime.
Problems
5.1 For a satellite in geosynchronous orbit about the earth centered over the
continental United States,
(a) find the antenna gain required to cover the continental United States well (approximately 3 dB down at the northeast corner of Maine and the southwest corner of California, separated by about 5000 km); and
(b) evaluate the approximate effective area assuming a carrier frequency of
10 GHz.
5.2 In a line-of-sight environment without scatterers, find the largest achievable
data rate between two short dipoles with the same orientation, separated by
1 km, transmitting 1 W, operating at a carrier frequency of 1 GHz, with a
receiver at temperature of 300 K.
5.3 Consider Figure 5.14, which is a block diagram of a Tomlinson–Harashima
precoder developed in the 1970s to mitigate intersymbol interference. The principles of its operation are very similar to the Costa precoding described in this
chapter.
(a) Suppose that the box marked f (·, V ) is removed, i.e., x[k] equals m[k] with
the output of the filter g[k] subtracted out. Find g[k] such that y[k] = x[k].
[Figure 5.14: Tomlinson–Harashima precoder transmitter block diagram.]
(b) Suppose that ||m[k]|| ≤ M . What is the largest possible value that x[k] can
take, assuming that the box marked f (·, V ) is still not present?
(c) Please specify f (·, V ) such that ||x[k]|| ≤ V ∀k and show that y[k] = x[k].
5.4 By noting that at low spectral efficiency the best case Eb /N0 ≈ −1.59 dB,
evaluate the minimum received energy required to decode 1000 bits at a temper-
ature of 300 K.
5.5 Evaluate the differential entropy for a real Rayleigh random variable.
6 Antenna arrays
Arrays of antennas can be used to improve the signal-to-noise ratio (SNR) and
to mitigate interference. For many communication links, the propagation envi-
ronment is complicated by scattering that can distort an incoming signal both
in direction and delay. In this chapter’s introductory discussion, a simplifying
assumption is employed. Within this chapter, it is assumed that there is no scat-
tering. An example would be a line-of-sight link in a large anechoic chamber.
In addition, it is assumed that the antenna array is small compared with the
ratio c/B of the propagation speed c to the bandwidth B. As a consequence, it
is assumed that the signal is not dispersive across the antenna array. A disper-
sive channel would have resolvable delay spread across the antenna array. These
restrictions will be removed in subsequent chapters.
6.1 Wavefront
Consider a single transmitter that is a long distance away from an array of receive
antennas. The wavefront that expands in a sphere about the transmitter can be
approximated by a plane near the antenna array as seen in Figure 6.1. In other
words, the transmitter is far enough from the receive antenna array such that
the phase error associated with the plane wave approximation is small. This is
a valid approximation1 when

R ≫ L²/λ , (6.1)

where R is the distance from the source, L is the size of the array (the largest distance between any two antennas), and λ is the wavelength.
that each receive antenna is identical. Because the receive antennas are at slightly
different distances from the source in general, the plane wave impinges upon each
antenna with some relative delay. Under the assumption of a narrowband signal,
that is, a signal that does not have sufficient signal bandwidth B to resolve the
relative antenna locations
B ≪ c/L , (6.2)
1 The notation ≫ indicates “much greater than,” which is not a precise notion, but is dependent upon the allowed error in the approximation.
2 If the sign of the terms in the exponent in Equation (6.5) are inverted, then Equation (6.3)
is still satisfied. Unfortunately, both conventions are employed.
The delay relative to the origin for the mth receive antenna is given by the
relative distance divided by the propagation speed,
Δtm = k̂ · xm / c , (6.10)

where k̂ = k/‖k‖ is the unit vector along the direction of propagation.
c
The relative phase is given in the argument of e−i ω (t−Δ t m ) . By focusing on the
relative phase of the baseband signal, the following antenna-dependent phase
term is found,
k ·x
k m
eiω Δ t m = eiω c
= eik k ·x m
k
= eik·x m , (6.11)
where the relationship k = 2π/λ = ω/c is used.
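The phase term of Equation (6.11) can be evaluated for a concrete geometry. The sketch below is illustrative; the four-element, half-wavelength-spaced linear array and the 30° arrival angle are arbitrary choices, not from the text:

```python
import numpy as np

lam = 1.0                        # wavelength (arbitrary units)
k_mag = 2 * np.pi / lam          # wavenumber k = 2 pi / lambda
d, n_r = 0.5 * lam, 4            # element spacing and number of antennas
phi = np.deg2rad(30.0)           # arrival angle measured from broadside
x = np.arange(n_r) * d           # antenna positions along a line
v = np.exp(1j * k_mag * np.sin(phi) * x)   # steering-vector entries e^{i k.x_m}
print(np.round(np.angle(v), 3))  # element phases step by pi/2 at this angle
```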
A beamformer applies a complex weight to the signal from each antenna and sums the results, producing a single stream of data. The beamformer can be considered a form of spatial filtering, which is analogous to spectral filtering. A beamformer can be constructed to selectively receive energy from some given direction.
direction. The resulting array beam pattern is a measure of power at the output
of the beamformer (or array factor in amplitude) for signals coming from other
physical directions. In general, the term beamformer can be applied to either
transmission or reception. The mathematical formulation is the same up to a
reversal of time.
There are a variety of approaches for constructing the beamformer. Various
approaches are considered in Chapter 9. The implementation of the beamformer
can be at the carrier frequency; however, in modern digital receivers the
beamforming is typically applied to the complex baseband signal with the equiv-
alent effect. A transmit beamformer is the same up to a time reversal (causing
a conjugation), so that, for transmitting, w∗ would be employed rather than w.
The average amount of power Pw at the output of a beamformer w ∈ C^{nr×1} applied to the received data stream as a function of time z(t) ∈ C^{nr×1}, built from the vector of received signals in Equation (6.7), is given by

Pw = ⟨ ‖w† z(t)‖² ⟩ / ‖w‖²
= ‖w† v(k)‖² |a|² ⟨ ‖s(t)‖² ⟩ / ‖w‖² , (6.16)
where it is assumed that the received signal consists of a single wavefront propagating along the wavevector k. The normalizing term ‖w‖² in the denominator keeps the noise power constant for different scales of the beamformer. Once again,
noise is assumed to be negligibly small for this discussion. The steering vector,
defined in Equation (6.13), indicates the relative phases and amplitudes for the incoming signal. The mean square of the transmitted signal, ⟨‖s(t)‖²⟩ = Pt, is associated with the transmit power and does not affect the shape of the beam pattern.
It is sometimes useful to consider the normalized beam pattern ρw (k). It is
constructed so that the matched response (when w ∝ v(k)) would be unity and
is given by
ρw(k) = ‖w† v(k)‖² / ( ‖w‖² ‖v(k)‖² ) . (6.17)
The relative power at the output of the beamformer is given by the square of
the normalized inner product between the beamformer and the steering vector
for wavefronts propagating along various directions.
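As a concrete numerical sketch of Equation (6.17), the normalized beam pattern can be evaluated for a matched beamformer; the array size, spacing, and helper names below are illustrative choices, not taken from the text.

```python
import numpy as np

n_r = 8                       # number of antennas (illustrative)
y = 0.5 * np.arange(n_r)      # element positions in wavelengths (lambda/2 spacing)

def steering(u):
    """Unit-norm steering vector v(u) for direction u = sin(phi)."""
    return np.exp(-1j * 2 * np.pi * y * u) / np.sqrt(n_r)

def rho(w, u):
    """Normalized beam pattern of Eq. (6.17): |w^H v|^2 / (||w||^2 ||v||^2)."""
    v = steering(u)
    return abs(np.vdot(w, v))**2 / (np.vdot(w, w).real * np.vdot(v, v).real)

w = steering(0.0)             # matched beamformer pointed at boresight
print(rho(w, 0.0))            # matched response, equal to 1
print(rho(w, 0.3))            # much smaller away from boresight
```

By the Cauchy–Schwarz inequality the normalized pattern never exceeds unity, and it equals unity only in the matched direction.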
176 Antenna arrays
compared with what one might expect when using the direction of wavefront
propagation convention.
A sidelobe is a local peak of received power in a direction different from the
intended beamformer direction. For many applications, such as geolocation, the
levels of these sidelobes can be important because they can cause confusion with
regard to direction to the signal source. In environments with interfering users,
the beam pattern can be used to reduce power from interfering users at differ-
ent directions. High sidelobes indicate the potential for higher levels of interfer-
ence power at the output of a receive beamformer. A reasonable question is, do
sidelobes matter? For many wireless communication applications, they are not
important. If there is significant multipath such that the notion of line-of-sight
beam patterns is not valid, then the idea of sidelobes for a line-of-sight beam
pattern has little applicability. Also, if there is a single line-of-sight source, then
accepting energy from other directions in addition to receiving energy from the
intended direction will cause no adverse effects. Similarly, given a small number
of interferers, if adaptive processing is used, then the sidelobes can be distorted
to avoid accepting energy from the potential interferers. Once again, there are no
adverse effects for most applications. Conversely, if line-of-sight propagation is a
reasonable model, and either there is a very large number of interferers so that
adaptivity is not effective, or adaptivity is not possible, then sidelobe levels can
be important. If the array is being used for direction of arrival estimation in the
presence of significant noise, then the sidelobes can be important because of the
potential of confusing the correct angle of arrival with an angle corresponding
to a sidelobe direction.
Figure 6.2 Antenna array geometry for an eight-element array of radius one wavelength.
Figure 6.3 The beam pattern (relative beamformer output power in dB versus angle in degrees) for an eight-element circular array with radius of one wavelength.
a little less than 30 degrees. The aperture is the length of the array. For a
beamformer optimized for φ = 0, the amount of power accepted from other
directions can be relatively high. In this example, at the angle of about ±85
degrees, the attenuation is only down by 5 dB. This region of relatively low
attenuation is denoted a sidelobe. It is often desirable to minimize the height
of these sidelobes to minimize interference from undesired sources. The various
approaches to do this include both adaptive and nonadaptive techniques.
For a linear receive antenna array in the {x}₁–{x}₂ plane with the antennas along the {x}₂ axis starting at the origin (as seen in Figure 6.4), with regular antenna spacing d, the positions are given by

{x_m}₂ = (m − 1) d .   (6.25)

The inner product between the wavevector and the position of the antenna (determined by angle φ) is given by

k · x_m = −2π (m − 1) d sin(φ) / λ .   (6.26)
In the special case of a line-of-sight transmitter, the array response is given by

{v(φ)}_m = e^{−i 2π (m−1) d sin(φ)/λ} / √n_r ,   (6.27)

where the arbitrary normalization is chosen by using the term √n_r so that the magnitude of the array response is 1,

‖v(φ)‖ = 1 .   (6.28)
Given this formulation, the beam pattern is given by

ρ_{v(φ₀)}(φ) = |v†(φ) v(φ₀)|²
= |(1/n_r) Σ_{m=0}^{n_r−1} e^{i 2π m d sin(φ)/λ} e^{−i 2π m d sin(φ₀)/λ}|²
= (1/n_r²) |Σ_{m=0}^{n_r−1} e^{i 2π m d [sin(φ) − sin(φ₀)]/λ}|² .   (6.29)
Figure 6.4 Antenna array geometry for an eight-element linear array with spacing of 1/2 wavelength.
Figure 6.4. For a matched-filter beamformer optimized for φ = 0 (along the {x}₁ axis), with steering vector

v(φ₀) = (1/√n_r) 1 = (1/√n_r) (1, 1, …, 1)ᵀ ,   (6.30)
the relative power at the output of the beamformer as a function of transmitter
angle is given in Figure 6.5. As with the circular array, the region around φ = 0
is the mainlobe of the pattern. The width of the mainlobe is approximately
given in radians by the wavelength divided by the aperture. In this case, the
beamwidth is about 1/4 radians or a little less than 15 degrees. In this example,
the peak sidelobes are lower than the peak by about 13 dB. In fact, because of
the rotational symmetry about the {x}2 axis, energy received from various angles
will be equal in response for the line along a cone at the given angle as displayed
in Figure 6.6. In this particular example in a plane, the rotational symmetry
creates an exact forward–backward ambiguity in the beamformer, causing the
acceptance of power to be equal at φ = 0 and φ = 180 degrees.
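The quoted mainlobe width and sidelobe level are easy to verify numerically. The sketch below (parameter names are illustrative) evaluates Equation (6.29) for this eight-element half-wavelength array with φ₀ = 0 and extracts the peak sidelobe.

```python
import numpy as np

n_r, d = 8, 0.5                       # elements and spacing in wavelengths
m = np.arange(n_r)
u = np.linspace(-1, 1, 20001)         # u = sin(phi)
s = np.exp(1j * 2 * np.pi * np.outer(m, u) * d).sum(axis=0)
pattern = np.abs(s)**2 / n_r**2       # Eq. (6.29) with phi0 = 0
pattern_db = 10 * np.log10(pattern + 1e-300)

# First nulls sit at u = +/- 1/(n_r d) = +/- 0.25, so the peak sidelobe
# is the largest value outside that mainlobe region.
peak_sidelobe_db = pattern_db[np.abs(u) > 1 / (n_r * d)].max()
print(round(peak_sidelobe_db, 1))     # about -13 dB, consistent with the text
```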
For isotropic antenna elements, the Nyquist spacing for an antenna array is
d = λ/2. At this spacing, there will be no ambiguities over the range of angles
φ ∈ (−π/2, π/2) for signals in the {x}1 –{x}2 plane. Out of this plane, there can
be some confusion. The direction from which energy is preferentially received
Figure 6.5 The beam pattern (relative beamformer output power in dB versus angle in degrees) for an eight-element linear array with spacing of 1/2 wavelength.
Figure 6.6 For a linear array, the direction to the source and the cone of ambiguity.
ρ_{v(φ₀)}(φ) = (1/n_r²) |Σ_{m=0}^{n_r−1} e^{i 2π m d [sin(φ) − sin(φ₀)]/λ}|²
= (1/n_r²) |Σ_{m=0}^{n_r−1} cos(2π m d [sin(φ) − sin(φ₀)]/λ) + i sin(2π m d [sin(φ) − sin(φ₀)]/λ)|²
= (1/n_r²) [Σ_{m=0}^{n_r−1} cos(2π m d [sin(φ) − sin(φ₀)]/λ)]²
+ (1/n_r²) [Σ_{m=0}^{n_r−1} sin(2π m d [sin(φ) − sin(φ₀)]/λ)]² .   (6.31)
For this discussion, it will be convenient to define the antenna weighting vector a ∈ C^{M×1}, indexed from 0 to M − 1.
The vector contains information about both the existence of an antenna at some
lattice position and the phasing of that element.
Given these definitions and the assumption of λ/2 antenna spacing, the beam
pattern is given by
so that the beam pattern is now evaluated at a discrete set of points determined by q Δu. Here we employ the observation that extending the range of summation from n_r − 1 to M − 1 indices has no effect because of the zero entries in the antenna weighting vector a. The argument of the norm operator, which we will denote the complex beam pattern b(q) as a function of q, is given by

b(q) = (1/√n_r) Σ_{m=0}^{M−1} e^{i 2π m q/M} {a}_m .   (6.37)
However, because the exponential has the same value for arguments with imaginary components separated by integral multiples of 2π, the first and the (M+1)th indices are redundant,

e^{i 2π m q/M} = e^{i 2π m (q + M)/M} .   (6.39)
are necessary. For convenience, consider the admittedly strange ordering of possible values of q,

q ∈ { 0, 1, …, M/2 − 2, M/2 − 1, −M/2, −M/2 + 1, −M/2 + 2, …, −1 } .   (6.41)
Notice that the negative values have been moved to the right-hand side of the
list. Furthermore, because of the same modularity characteristic used above, a
new index variable q is constructed spanning the same space of angles,
q ∈ {0, 1, . . . , M − 2, M − 1} . (6.42)
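As the construction above suggests, evaluating b(q) in Equation (6.37) is a (scaled) inverse discrete Fourier transform of the antenna weighting vector, and the reordered index of Equations (6.41)–(6.42) is exactly the standard FFT output ordering. A sketch with a hypothetical 16-site lattice and randomly chosen occupied positions:

```python
import numpy as np

rng = np.random.default_rng(0)
M, n_r = 16, 6
occupied = rng.choice(M, size=n_r, replace=False)
a = np.zeros(M, dtype=complex)
a[occupied] = np.exp(1j * rng.uniform(0, 2 * np.pi, n_r))   # unit-modulus phasing

# Direct evaluation of b(q) = (1/sqrt(n_r)) sum_m exp(i 2 pi m q / M) {a}_m ...
m = np.arange(M)
b_direct = np.array([(np.exp(1j * 2 * np.pi * m * qq / M) * a).sum()
                     for qq in range(M)]) / np.sqrt(n_r)

# ... equals a scaled inverse FFT, since ifft uses the e^{+i 2 pi m q/M} kernel.
b_fft = M * np.fft.ifft(a) / np.sqrt(n_r)
print(np.allclose(b_direct, b_fft))          # True

# The "strange ordering" of Eq. (6.41), with negative q moved to the end, is
# the native FFT ordering; np.fft.fftshift re-centers it at q = -M/2 ... M/2-1.
b_centered = np.fft.fftshift(b_fft)
```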
In the continuous version of the vector inner product, one can think of an
infinite-dimensional vector indexed by some continuous parameter, as introduced
in Section 2.1.4. While we will not be particularly involved in the technical de-
tails, this vector space is sometimes referred to as an infinite-dimensional Hilbert
space. For example, the antenna weighting vector a can be indexed by the po-
sition along the linear array x, under the assumption of some pointing direction
u. The continuous antenna weighting function is denoted
a → fa (x; u) , (6.47)
where the function is indexed by the distance x along the antenna array in units of wavelength. Inner products in this complex infinite-dimensional space are given by integrating over the indexing parameter, in this case x. The complex infinite-dimensional inner product between functions f(x) and g(x) is denoted

⟨f(x), g(x)⟩ = ∫ dx f(x) g*(x) .   (6.48)
The beam pattern is related to the magnitude squared of the Fourier transform
of a continuous version of the antenna weighting vector. Similarly, the continuous
version of complex beam pattern is denoted
b → fb (u) , (6.49)
where L is the length of the antenna array in units of wavelength. The inner prod-
uct between the continuous steering vectors at some direction u and boresight
u = 0 is given by
⟨f_a(x; u), f_a(x; 0)⟩ = ∫ dx f_a(x; u) f_a*(x; 0)
= (1/L) ∫₀^L dx e^{−i 2π u x}
= −(1/(i 2π u L)) (e^{−i 2π u L} − 1)
= (e^{−iπuL}/(i 2π u L)) (e^{iπuL} − e^{−iπuL})
= e^{−iπuL} sin(πuL)/(πuL)
= e^{−iπuL} sinc(uL) ,   (6.51)
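The closed form of Equation (6.51) can be checked by direct numerical integration. In the sketch below, the aperture L and direction u are arbitrary illustrative values; note that NumPy's np.sinc(t) = sin(πt)/(πt) matches the sinc convention used here.

```python
import numpy as np

L = 4.0                      # aperture in wavelengths (illustrative)
u = 0.37                     # direction u = sin(phi) (illustrative)
x = np.linspace(0.0, L, 200001)
integrand = np.exp(-1j * 2 * np.pi * u * x) / L

dx = x[1] - x[0]
numeric = ((integrand[:-1] + integrand[1:]) * dx / 2).sum()   # trapezoidal rule
closed_form = np.exp(-1j * np.pi * u * L) * np.sinc(u * L)    # Eq. (6.51)
print(abs(numeric - closed_form))   # essentially zero: the two forms agree
```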
Figure 6.7 (a) The continuous antenna weighting function in terms of position along the antenna array, assuming the array is phased to point perpendicularly to the array (along boresight). (b) The beam pattern of the continuous antenna array approximation as a function of the product of the direction u = sin(φ) and the array length L.
For applications such as direction finding, with sparse arrays the sidelobes can become sufficiently high to cause angle-estimation confusion. For communications, these sidelobes are typically not an issue because, in complicated multipath environments, knowledge of the direction is not particularly meaningful. However, sparse array elements can improve spatial diversity, typically improving performance.
The probability of being below the threshold at a given point Pbelow (η) is found
by observing that the sum of independent random phasors tends toward a Gaus-
sian distribution because of the central limit theorem. The power received at the
output of a matched-spatial-filter beamformer constructed for direction u0 =
sin φ0 given an array response from some other direction u = sin φ, for angles
φ0 and φ from boresight, is given by the inner product between the two steering
vectors v† (u0 ) v(u), where the steering vector v(u) ∈ Cn r ×1 . For some direction
u, the value of the complex ratio of the sidelobe to mainlobe amplitude z is given by

z(u) = v†(u₀) v(u) / [v†(u₀) v(u₀)] = v†(u₀) v(u) ,
r(u) = |z(u)| ,   (6.59)

where, in this section, it is assumed that the norm of the steering vector is 1,

‖v(u)‖ = 1 ∀ u .   (6.60)
To make contact with previous sections, the square of the sidelobe to mainlobe
amplitude ratio is the normalized beam pattern, which is given by
It is assumed that the steering vector can be constructed by using a simple plane
wave model for a linear array. The element of the steering vector associated with
the mth randomly placed antenna is given by
{v(u)}_m = (1/√n_r) e^{i k u x_m} ,   (6.62)
By integrating the probability density from zero to the threshold η, the value for
the probability of being below the threshold at some sidelobe level at a specific
Figure 6.8 Probability that the sidelobe-to-mainlobe ratio for sparse arrays with 25 (light gray), 20, 15, and 10 (black) randomly placed antennas is less than some threshold value (in dB).
= 1 − e^{−n_r η²} .   (6.65)
Simple approximation
At this point, one could observe that the Nyquist sampling density is 1/(2L), and
that the scanned space for u is from −1 to 1, so that there are up to approximately
2 · 2L/λ − 1 distinct sidelobes. Under the generous approximation that they are
independent, the probability of exceeding the threshold is approximately given
by

Pr(r < η) ≈ [1 − e^{−n_r η²}]^{4L/λ−1} .   (6.66)
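The single-point result underlying Equations (6.65) and (6.66) can be checked by Monte Carlo: for a fixed direction deep in the sidelobes, the sidelobe-to-mainlobe amplitude ratio of a randomly placed array is approximately Rayleigh distributed. All parameter values below (n_r, L, the test directions, the threshold) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_r, L, trials = 20, 25.0, 20000              # illustrative values
u0, u = 0.0, 0.5                              # mainlobe and sidelobe test directions
xm = rng.uniform(0, L, size=(trials, n_r))    # random positions in wavelengths

v = np.exp(1j * 2 * np.pi * u * xm) / np.sqrt(n_r)
v0 = np.exp(1j * 2 * np.pi * u0 * xm) / np.sqrt(n_r)
r = np.abs((v0.conj() * v).sum(axis=1))       # r(u) = |v^H(u0) v(u)| per trial

eta = 0.3
empirical = (r < eta).mean()
predicted = 1 - np.exp(-n_r * eta**2)         # Eq. (6.65)
print(empirical, predicted)                   # the two agree closely
```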
where σ_r² and σ_{r′}² are the variances of the real part of the complex amplitude ratio and of the derivative of the real part, respectively. To develop this density, a few intermediate results are required. It is assumed that the probability of the value of the real z_r and imaginary z_i parts of the sidelobe-to-mainlobe complex amplitude ratio z and their derivatives (z_r′ and z_i′) can be represented by real independent Gaussian distributions. The probability density for these variables is given by

p(z_r, z_i, z_r′, z_i′) = (1/((2π)² σ_r σ_i σ_{r′} σ_{i′})) e^{−(1/2)( z_r²/σ_r² + z_i²/σ_i² + z_r′²/σ_{r′}² + z_i′²/σ_{i′}² )} ,   (6.69)

where σ_i² and σ_{i′}² are the variances of the imaginary portion of the amplitude ratio and of its derivative with respect to the direction parameter u. In the sidelobes,
it is expected that there is symmetry between the real and imaginary portions of the amplitude ratio. Consequently, the variances for the real and imaginary parts are equal, σ_i² = σ_r² and σ_{i′}² = σ_{r′}². The real and imaginary parts of the amplitude ratio can be expressed in terms of polar coordinates,

z_r = r cos(θ)
z_i = r sin(θ) ,   (6.70)

where the parameter θ = arctan(z_i/z_r) is implicitly a function of u. The derivatives with respect to the direction parameter u of the real and imaginary parts of the amplitude ratio are given by

z_r′ = r′ cos(θ) − r θ′ sin(θ)
z_i′ = r′ sin(θ) + r θ′ cos(θ) ,   (6.71)

where θ′ is the derivative of θ with respect to u. The sums of the squares of the real and imaginary components of the amplitude ratio and of their derivatives are given by

z_r² + z_i² = r²
z_r′² + z_i′² = r′² + r² θ′² .   (6.72)
The probability density in terms of the polar coordinates is given by

p(r, r′, θ, θ′) = (r²/((2π)² σ_r² σ_{r′}²)) e^{−(1/2)[ r²/σ_r² + (r′² + r² θ′²)/σ_{r′}² ]} .   (6.73)

To find the probability density for the magnitude of the amplitude ratio and its derivative, an integration over the angular components is performed,

p(r, r′) = ∫ dθ dθ′ p(r, r′, θ, θ′)
= ∫ dθ dθ′ (r²/((2π)² σ_r² σ_{r′}²)) e^{−(1/2)[ r²/σ_r² + (r′² + r² θ′²)/σ_{r′}² ]} .
σ_{r′}² = − lim_{δu→0} (∂²/∂δu²) R_{z_r,z_r}(δu) ,   (6.76)

where R_{z_r,z_r}(δu) is the autocovariance of the real component of the amplitude ratio z_r. This relationship can be understood by noting that, for a zero-mean stationary process a(u) (which means the statistics are independent of angle far from the mainlobe), the autocovariance R_{a,a}(u₁, u₂) is given by
σ_{r′}² = − lim_{δu→0} (∂²/∂δu²) R_{z_r,z_r}(δu)
= − (1/n_r) lim_{δu→0} [ −cos(δu L π)/δu² + sin(δu L π)/(L π δu³) − L π sin(δu L π)/(2 δu) ]
= L² π² / (6 n_r) .   (6.81)
By substituting the values for the variances, the joint probability density for the sidelobe-to-mainlobe ratio and the derivative of the ratio is given by

p(r, r′) = (r/(√(2π) σ_r² σ_{r′})) e^{−(1/2)( r²/σ_r² + r′²/σ_{r′}² )}
= (r/(√(2π) (1/(2 n_r)) √(L² π²/(6 n_r)))) e^{−(1/2)( 2 n_r r² + 6 n_r r′²/(L² π²) )}
= r √(12 n_r³/(π³ L²)) e^{−(1/2)( 2 n_r r² + 6 n_r r′²/(L² π²) )} ,   (6.82)
where Equations (6.75) and (6.81) are employed to determine the values for σ_r² and σ_{r′}². The probability p_cr(u; η) of crossing the threshold at some direction u [formulated in Equation (6.67)] is given by integrating the joint probability density over all derivative values near the threshold value of interest, dr ≈ r′ du. The resulting probability density is a function of the direction and threshold
From the above discussion, the total probability of the peak sidelobe being below the threshold η is approximated by

P_below(η) = lim_{M→∞} ∏_{m=1}^{M} [ 1 − ((u_max − u_min)/M) p_cr(u; η) ]
= lim_{M→∞} ∏_{m=1}^{M} [ 1 − ((u_max − u_min)/M) p_cr( u_min + ((m−1)/M)(u_max − u_min); η ) ]
= lim_{M→∞} ∏_{m=1}^{M} [ 1 − (2/M) ∫₀^∞ dr′ r′ p(η, r′) ]
= lim_{M→∞} ∏_{m=1}^{M} [ 1 − (2/M) √(π n_r/3) η L e^{−n_r η²} ]
= e^{ −√(4π n_r/3) η L e^{−n_r η²} } ,   (6.84)

where u_max and u_min are the limits of direction and are given by 1 and −1 respectively, and p_cr(u; η) is evaluated by using Equation (6.83). Here we employ the observations that the argument of the product above is independent of u and that, from Equation (2.15), the product can be expressed as an exponential by using the following relationship,

lim_{n→∞} (1 + x/n)ⁿ = eˣ .   (6.85)
Consequently, the product of the probability of exceeding the threshold at a given initial angle times the probability of crossing the threshold η over the visible region is given by

Pr(r < η) ≈ [1 − e^{−n_r η²}] e^{ −√(4π n_r/3) η L e^{−n_r η²} } .   (6.86)
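Equation (6.86) is straightforward to evaluate numerically. The sketch below wraps it in a small helper; the function name and the values of n_r and L are illustrative choices, not from the text.

```python
import numpy as np

def peak_sidelobe_prob(eta, n_r, L):
    """Approximate Pr(peak sidelobe-to-mainlobe ratio < eta), Eq. (6.86)."""
    point = 1 - np.exp(-n_r * eta**2)                     # single-point term, Eq. (6.65)
    crossings = np.sqrt(4 * np.pi * n_r / 3) * eta * L * np.exp(-n_r * eta**2)
    return point * np.exp(-crossings)

# Example: 32 antennas over a 50-wavelength aperture (hypothetical values).
etas = np.array([0.2, 0.3, 0.4, 0.5])
probs = peak_sidelobe_prob(etas, n_r=32, L=50.0)
print(probs)    # rises from essentially 0 toward 1 as the threshold eta grows
```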
Figure 6.10 Propagation of wavefront along the {x}1 -axis with horizontal and vertical
polarization.
polarization axes are defined by the directions of the electric field oriented hori-
zontally and vertically in the plane perpendicular to the direction of propagation.
While line-of-sight propagation is considered within this chapter, more generally
it is worth noting that because many channels have complicated multipath scat-
tering, energy propagating along any direction has some probability of getting
from the transmitter to the receiver. Consequently, all directions of propagation
and polarization are of interest. In particular, one can imagine employing an
array of three crossed dipole antennas, all centered at some point.
The simple plane wave description in Equation (6.5) is extended here to include polarization. The vector of electric field components e(x, t) for each direction as a function of location and time, under the assumption that a plane wave is propagating along the {x}₃-axis such that {k}₁ = {k}₂ = 0, is given by

e(x, t) = (ψ₁(k, t), ψ₂(k, t), 0)ᵀ
= (a₁, a₂, 0)ᵀ e^{i({k}₃{x}₃ − ωt)} ,   (6.87)
where ψ1 (k, t) and ψ2 (k, t) are the plane wave solutions as a function of time
t to the wave equation propagating along k associated with polarization along
the {x}1 -axis and polarization along direction {x}2 -axis, respectively. The pa-
rameters a1 and a2 are the complex amplitudes along the 1 and 2 axes. Given
the defined geometry (as seen in Figure 6.10), the horizontal and vertical po-
larizations of the wavefront are associated with the {x}1 -axis and {x}2 -axis,
respectively. The third element is 0 because there is no electric field along the
direction of propagation in free space.
A horizontally polarized wavefront (as seen in Figure 6.10) is characterized by

a₁ ≠ 0 , and a₂ = 0 ,   (6.88)

and a vertically polarized wavefront by

a₂ ≠ 0 , and a₁ = 0 .   (6.89)
where r₁ and r₂ are real parameters, and a₁ and a₂ have a common complex phase, φ. This basis corresponds to starting with a horizontally or vertically polarized wave and rotating the axes (or physically rotating the antenna) about the direction of propagation. An arbitrary elliptical polarization allows for values

a₁ ≠ 0 , and a₂ ≠ 0 .   (6.91)

Circular polarization is characterized by

a₁ = b , and a₂ = ±i b ,   (6.92)
where b is some complex valued parameter. The positive sign for the ±i term
indicates a right-handed polarization, and the negative sign indicates a left-
handed polarization.
The circular basis for electric polarization, {right, left, propagation}, for e_cir is related to the linear basis for a wavefront propagating along {x}₃ by

e_cir(x, t) = [ 1/√2, −i/√2, 0 ; 1/√2, i/√2, 0 ; 0, 0, 1 ] (ψ₁(k, t), ψ₂(k, t), 0)ᵀ .   (6.93)
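A quick numerical check of this change of basis (variable names illustrative): applying the matrix of Equation (6.93) to a right-hand circularly polarized wave of Equation (6.92) should place all of the energy in the 'right' component.

```python
import numpy as np

# Rows of T map the linear basis (psi1, psi2, propagation) to
# the circular basis (right, left, propagation), per Eq. (6.93).
T = np.array([[1/np.sqrt(2), -1j/np.sqrt(2), 0],
              [1/np.sqrt(2),  1j/np.sqrt(2), 0],
              [0,             0,             1]])

b = 1.0 + 0.5j                      # arbitrary complex amplitude
e_lin = np.array([b, 1j * b, 0.0])  # right-hand circular wave: a1 = b, a2 = +i b
e_cir = T @ e_lin
print(np.abs(e_cir))                # all energy lands in the 'right' component
```

Because T is unitary, the total field magnitude is preserved by the change of basis.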
Problems
6.1 Considering the plane wave approximation for the reception of a narrow-
band signal on a continuous linear antenna array in a plane with a source located
along boresight of the array, evaluate the root-mean-square error as a function
of range R, length L, and signal wavelength.
6.3 For a four-element linear regular array with 1 wavelength spacing that
incorporates the array element amplitude pattern a(θ)
a(θ) = 2 cos[sin(θ)π/2] ,
where the angle θ is measured from boresight of the array:
(a) formulate an unnormalized steering vector in a plane;
(b) assuming the array is pointed at boresight, evaluate the ratio of the power beam pattern of this array to that of an eight-element array with isotropic elements and half-wavelength spacing;
(c) assuming the array is pointed at θ = π/4, evaluate the ratio of power beam
pattern of this array to an eight-element array with isotropic elements and
half-wavelength spacing.
6.4 For the continuous array construction discussed in Section 6.3.3 that exists over the spatial domain 0 ≤ x ≤ L, find the normalized power beam pattern under the assumption that the receive array uses the following tapering or windowing functions.
(a) Triangular:

w(x) = { 2x/L ,       0 ≤ x ≤ L/2
       { 2 − 2x/L ,   L/2 < x ≤ L .
(b) Hamming:
2πx
w(x) = 0.54 − 0.46 cos .
L
6.5 Consider the linear sparse array problem with randomly moving elements that have uniform probability density, assuming 32 isotropic antennas; find the aperture in terms of wavelengths such that the peak sidelobe is no worse than 5 dB 90% of the time.
6.6 Consider the linear sparse array design problem assuming 32 isotropic antennas; find the aperture in terms of wavelengths such that a designer would likely find an array whose peak sidelobe is no worse than 5 dB after ten random array evaluations.
6.7 By assuming that a source is in the plane spanned by {x}1 and {x}2 ,
construct the unnormalized steering vector for an array of three phase centers
with half-wavelength spacing along {x}2 axis, assuming that the elements are
constructed with small electric dipoles and that:
(a) the array elements and single source are vertically (along {x}3 ) polarized;
(b) the array elements are horizontally polarized along the {x}2 axis and the
single source is horizontally polarized (in the {x}1 –{x}2 plane) and per-
pendicular to the direction of propagation;
(c) assuming the array is phased to point at the source, find the ratio of received power for the horizontally polarized to vertically polarized systems as a function of angle.
6.8 By assuming that a source is in the plane spanned by {x}1 and {x}2 ,
construct the unnormalized steering vector for an array of three phase centers
with half wavelength spacing along {x}2 axis, assuming that the elements are
constructed with small electric dipoles, that at each phase center there is an
electric dipole along each axis ({x}1 , {x}2 , and {x}3 ) and that:
(a) source is vertically (along {x}3 ) polarized;
(b) source has arbitrary polarization;
(c) assuming the array is phased to point at the source, find the ratio of received power for the arbitrarily polarized to vertically polarized sources as a function of angle.
7 Angle-of-arrival estimation
where a_m is the common complex attenuation from the transmitter to the receiver for the mth (of the n_t) sources, which has array response v(φ_m) ∈ C^{n_r×1} (or steering vector) containing the phase differences in propagation because of small relative delays from the transmitter to each receive antenna, as discussed in Section 6.1.2. These phases are a function of the propagation wavevector k_m ∈ C^{3×1}. Array responses for a single incoming signal are expected to exist somewhere along the array manifold defined by the continuous set of vectors v(φ) for all φ (or more generally v(k) for all wavevectors k). The additive noise for the receiver is given by N ∈ C^{n_r×n_s}. Here it is assumed that the entries in N are such that the columns are independently drawn from a unit-variance complex Gaussian distribution with potentially correlated rows. The transmitted complex baseband signal for the mth single-antenna transmitter is given by s_m ∈ C^{1×n_s}.
For many of the examples discussed here, it is assumed that the entries in sm
are unknown and are independently drawn from a complex Gaussian distribution
with unit variance, although estimation bounds are considered for both known
signal and Gaussian signal models. On the basis of the assumptions described
here, the signal-plus-noise spatial covariance matrix Q ∈ Cn r ×n r is given by
Q = (1/n_s) ⟨Z Z†⟩
= Σ_{m=1}^{n_t} |a_m|² v(φ_m) (1/n_s)⟨s_m s_m†⟩ v†(φ_m) + (1/n_s)⟨N N†⟩
= Σ_{m=1}^{n_t} |a_m|² v(φ_m) v†(φ_m) + R ,   (7.2)
Q̃ = R^{−1/2} Q R^{−1/2}
= (1/n_s) R^{−1/2} ⟨Z Z†⟩ R^{−1/2}
= Σ_{m=1}^{n_t} |a_m|² R^{−1/2} v(φ_m) v†(φ_m) R^{−1/2} + I ,

where the square root of the matrix R^{−1/2} satisfies the relationship R^{−1/2} R^{−1/2} = R^{−1}. Thus, environments with more complicated correlated noise can be considered. One complication of operating in the whitened space is that the norm of the whitened steering vector R^{−1/2} v(φ) may be dependent upon direction.
For some applications, much may be known about the waveform that is being
transmitted. The details of what is known about the signal being transmitted
may vary from something about the statistics of the signal to knowing the exact
transmitted signal [31] such as a known training sequence. This knowledge of
the waveform can be exploited to improve the angle-estimation performance.
In this discussion, it is assumed that there is a single source antenna, n_t = 1. If the signal of the transmitter of interest s ∈ C^{1×n_s} is known, in the presence of Gaussian spatially correlated noise with known spatial covariance R ∈ C^{n_r×n_r}, then the probability density function of an observed data matrix Z conditioned upon the known transmitted signal s, the unknown overall complex attenuation a, and the unknown azimuthal angle φ is given by

p(Z|s, a, φ) = (1/π^{n_r n_s}) e^{−tr{[Z − a v(φ) s]† R^{−1} [Z − a v(φ) s]}} .   (7.4)

Because s is a known reference, we can define ‖s‖² = n_s, which is stronger than just knowing that its expectation equals n_s.
To find an estimate of the signal direction, the likelihood is maximized. Be-
cause the logarithm monotonically increases with its argument, maximizing the
likelihood is equivalent to maximizing the logarithm of the likelihood. If the
log-likelihood is denoted f (Z|s, a, φ), then it is given by
f(Z|s, a, φ) = −tr{[Z − a v(φ) s]† R^{−1} [Z − a v(φ) s]} + b
= −tr{R^{−1/2} [Z − a v(φ) s] [Z − a v(φ) s]† R^{−1/2}} + b
= −tr{R^{−1/2} [Z Z† − a v(φ) s Z† − a* Z s† v†(φ) + n_s |a|² v(φ) v†(φ)] R^{−1/2}} + b ,   (7.5)

where b = −log(π^{n_r n_s}) is a constant containing parameters not dependent upon direction or attenuation. The matrix identity tr{A B} = tr{B A} has been employed.
To remove the nuisance parameter a containing the overall complex attenuation, the log-likelihood is maximized with respect to a,

(∂/∂a*) f(Z|s, a, φ) = tr{s† v†(φ) R^{−1} [Z − a v(φ) s]} ,   (7.6)

where Wirtinger calculus, discussed in Section 2.8.2, is invoked. Because the log-likelihood is negative and quadratic in the attenuation, the stationary point must be a maximum. The likelihood is maximized when

0 = tr{s† v†(φ) R^{−1} [Z − a v(φ) s]}
= tr{v†(φ) R^{−1} [Z − a v(φ) s] s†}
= tr{v†(φ) R^{−1} Z s† − a v†(φ) R^{−1} v(φ) n_s} ,

so that

a_max = v†(φ) R^{−1} Z s† / [n_s v†(φ) R^{−1} v(φ)] .   (7.7)
f(Z|s, a_max, φ) = |v†(φ) R^{−1} Z s†|² / [n_s v†(φ) R^{−1} v(φ)] + b₂ ,   (7.9)

z = Z s† / n_s .   (7.10)
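A minimal sketch of this known-reference estimator under the simplifying assumption of white noise (R = I), using a hypothetical eight-element half-wavelength array and a unit-modulus reference sequence; it scans the concentrated likelihood of Equation (7.9) over u = sin(φ). All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_r, n_s = 8, 64
y = 0.5 * np.arange(n_r)                       # element positions in wavelengths
v = lambda u: np.exp(-1j * 2 * np.pi * y * u)  # unnormalized steering vector

u_true, a_true = 0.3, 0.8 * np.exp(1j * 0.7)
s = np.exp(1j * rng.uniform(0, 2 * np.pi, n_s))    # known reference, ||s||^2 = n_s
noise = (rng.standard_normal((n_r, n_s))
         + 1j * rng.standard_normal((n_r, n_s))) / np.sqrt(2)
Z = a_true * np.outer(v(u_true), s) + 0.3 * noise

z = Z @ s.conj() / n_s                         # matched-filtered data, Eq. (7.10)
u_grid = np.linspace(-1, 1, 4001)
metric = np.array([abs(np.vdot(v(u), z))**2 / np.vdot(v(u), v(u)).real
                   for u in u_grid])           # concentrated likelihood, Eq. (7.9)
u_hat = u_grid[np.argmax(metric)]
print(u_hat)                                   # close to u_true = 0.3
```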
7.3 Beamscan
The most direct and possibly the most intuitive approach to estimate the angle of
arrival is to scan a matched filter for all possible expected array responses. This
is similar to the analysis discussed in Section 6.2, in which the beam pattern
is discussed and is sometimes denoted beamscan. The beamscan approach is
developed by considering the maximum-likelihood solution under the condition
of a single transmitter and of spatially white noise. Under the assumption that
there is a single random complex Gaussian source (nt = 1), the maximum-
likelihood solution simplifies to
φ̂ = argmax_{a,φ} p(Z|Q)
= argmax_{a,φ} (1/(π^{n_r n_s} | |a|² v(φ) v†(φ) + R |^{n_s})) e^{−tr{Z† [ |a|² v(φ) v†(φ) + R ]^{−1} Z}} .   (7.16)
|Q| = | |a|² v(φ) v†(φ) + R |
= | |a|² R^{−1/2} v(φ) v†(φ) R^{−1/2} + I | |R|
= ( |a|² v†(φ) R^{−1} v(φ) + 1 ) |R|
= ( |a|² κ + 1 ) |R| ,   (7.17)
where the whitened inner product κ = v† (φ) R−1 v(φ) is defined for convenience.
Because the whitened signal-plus-noise spatial covariance matrix R−1/2 Q R−1/2
is represented by an identity matrix plus a rank-1 matrix (as presented in Equa-
tion (2.114)), its inverse is given by
Q^{−1} = ( |a|² v(φ) v†(φ) + R )^{−1}
= [ R^{1/2} ( |a|² R^{−1/2} v(φ) v†(φ) R^{−1/2} + I ) R^{1/2} ]^{−1}
= R^{−1/2} ( |a|² R^{−1/2} v(φ) v†(φ) R^{−1/2} + I )^{−1} R^{−1/2}
= R^{−1/2} ( I − |a|² R^{−1/2} v(φ) v†(φ) R^{−1/2} / (1 + |a|² κ) ) R^{−1/2} .   (7.18)
This likelihood is maximized when the log of the likelihood is maximized, which is the equivalent of maximizing

φ̂ = argmax_{a,φ} [ tr{ Z† R^{−1/2} ( |a|² R^{−1/2} v(φ) v†(φ) R^{−1/2} / (1 + |a|² κ) ) R^{−1/2} Z } − n_s log(1 + |a|² κ) ] .   (7.20)
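A minimal beamscan sketch (all parameter values illustrative, and white noise assumed): the matched filter v(u) is scanned against the sample spatial covariance, and the peak of v†(u) Q̂ v(u)/‖v(u)‖² is taken as the angle estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
n_r, n_s = 8, 200
y = 0.5 * np.arange(n_r)                       # half-wavelength linear array
v = lambda u: np.exp(-1j * 2 * np.pi * y * u)

u_true = -0.2
S = (rng.standard_normal(n_s) + 1j * rng.standard_normal(n_s)) / np.sqrt(2)
N = (rng.standard_normal((n_r, n_s))
     + 1j * rng.standard_normal((n_r, n_s))) / np.sqrt(2)
Z = 2.0 * np.outer(v(u_true), S) + N           # one Gaussian source plus noise
Q_hat = Z @ Z.conj().T / n_s                   # sample spatial covariance

u_grid = np.linspace(-1, 1, 2001)
spectrum = np.array([np.vdot(v(u), Q_hat @ v(u)).real / n_r for u in u_grid])
u_hat = u_grid[np.argmax(spectrum)]
print(u_hat)                                   # near u_true = -0.2
```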
7.5 MuSiC
and the eigenvector for the mth eigenvalue is denoted e_m. Under the assumption of unit-norm eigenvectors, the projection operator (as discussed in Section 2.3.5) for the noise subspace P_noise ∈ C^{n_r×n_r} is given by

P_noise = Σ_m e_m e_m† ,   (7.32)

where the sum runs over the eigenvectors associated with the noise subspace.
If there is energy coming from some direction φ, then it is expected that the quadratic form

v†(φ) R^{−1/2} P_noise R^{−1/2} v(φ)

would be small because array responses would be contained in the signal space, which is orthogonal to the noise projection matrix P_noise. Conversely, in other directions with "noise-like" spatial responses, this quadratic form would be approximately equal to v†(φ) R^{−1} v(φ). Thus, the ratio of these two values is a reasonable indicator of energy, and the MuSiC spatial pseudospectral estimator η_music(φ) is given by

η_music(φ) = v†(φ) R^{−1} v(φ) / ( v†(φ) R^{−1/2} P_noise R^{−1/2} v(φ) ) .   (7.34)
Because the spatial signal-plus-noise covariance matrix (or a whitened version of
it) can typically only be estimated, the MuSiC spatial pseudospectral estimator
is generally implemented using the estimated spatial covariance matrix Q̂ (and
if whitening is used, R̂).
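A sketch of the MuSiC pseudospectrum built from an estimated covariance, assuming white noise so that R = I and no whitening is required; the two source directions and all parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n_r, n_s, n_t = 8, 500, 2
y = 0.5 * np.arange(n_r)
v = lambda u: np.exp(-1j * 2 * np.pi * y * u)

u_src = [-0.4, 0.25]                            # hypothetical source directions
Z = sum(np.outer(v(u), rng.standard_normal(n_s)
                 + 1j * rng.standard_normal(n_s)) for u in u_src)
Z = Z + 0.5 * (rng.standard_normal((n_r, n_s))
               + 1j * rng.standard_normal((n_r, n_s)))
Q_hat = Z @ Z.conj().T / n_s

w_eig, E = np.linalg.eigh(Q_hat)                # eigenvalues in ascending order
E_noise = E[:, :n_r - n_t]                      # noise subspace eigenvectors
P_noise = E_noise @ E_noise.conj().T            # noise projection, Eq. (7.32)

u_grid = np.linspace(-1, 1, 4001)
eta = np.array([n_r / np.vdot(v(u), P_noise @ v(u)).real for u in u_grid])

i1 = np.argmax(eta)                             # strongest pseudospectrum peak
mask = np.abs(u_grid - u_grid[i1]) > 0.1        # exclude it, then find the next
i2 = np.argmax(np.where(mask, eta, -np.inf))
u_hats = sorted([u_grid[i1], u_grid[i2]])
print(u_hats)                                   # near the true [-0.4, 0.25]
```

Since np.linalg.eigh returns eigenvalues in ascending order, the first n_r − n_t eigenvectors span the estimated noise subspace.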
To be clear, MuSiC is a relatively poor estimator for the energy of received signals. With this normalization, the pseudospectrum is approximately unity when directed far from any source. When the pseudospectrum is pointed toward a source, the output is approximately equal to the energy per sample times the number of
antennas squared. However, any given example is strongly dependent upon the instantiation of noise, so it fluctuates significantly.

Figure: The beamscan (BS), MVDR, and MuSiC pseudospectra (dB) as a function of sin(φ).
For a given array geometry and SNR of the signal of interest, there are two performance metrics of interest, as seen in Figure 7.2. The first is the angle-estimation error bound, asymptotic in the number of samples, given by the Cramer–Rao formulation discussed in Section 3.8 and in References [77, 312, 172]. The Cramer–Rao parameter performance bound is a local bound: it assumes that the probability for an estimator to confuse the value of a parameter with a value far from the actual value is zero. The second metric is the threshold point. This is the point at which an estimator diverges dramatically from the asymptotic estimation bound.
Because of similarities in the array response (array manifold) at different phys-
ical angles, there is some probability of making large angle-estimation errors by
confusing an observed array response in noise with the wrong region of the array
manifold. These regions of potential confusion can be seen by those regions of
relative high sidelobes in the array response as a function of angle. The high side-
lobes are an indication that, while the angles are significantly different, the array
responses are similar. Consequently, when the angle is estimated in the presence
of significant noise, the array response associated with the erroneous angle at a
large sidelobe can sometimes be a closer match to the observed array response
than the array response associated with the correct angle. The threshold point is not well defined. However, because the average estimation error typically diverges quickly as a function of SNR, the exact definition is typically not a significant concern. There are a variety of bounds that attempt to incorporate nonlocal effects, such as the Bhattacharyya and Bobrovsky–Zakai bounds. Many of these bounds are special cases of the more general Weiss–Weinstein bound [342].
Figure 7.2 Notional performance of parameter estimation (log estimation variance versus SNR in dB). The high-SNR performance is characterized by the Cramer–Rao bound. Below some threshold point, the estimation diverges from the Cramer–Rao bound.
where a is the complex attenuation, and v(u) is the steering vector as a function
of direction parameter u = sin(φ) with φ indicating the angle from boresight. In
this case, the mean is given by
\mu = a \, v(u) \, s , (7.36)

where the reference sequence s is normalized so that \|s\|^2 = n_s. The covariance
matrix R is given by the covariance matrix of the external interference plus noise.
The Fisher information matrix for all n_s samples is \|s\|^2 = n_s times the Fisher
information matrix for a single sample. From Section 3.8, in Equation (3.259),
the reduced Fisher information matrix is given by
J^{(r)}_{u,u}(\{s\}_m) = 2 \, |a|^2 \, |\{s\}_m|^2 \, \dot{x}^\dagger P^\perp_{x(u)} \dot{x}

J^{(r)}_{u,u} = \sum_m J^{(r)}_{u,u}(\{s\}_m) = 2 \, |a|^2 \, n_s \, \dot{x}^\dagger P^\perp_{x(u)} \dot{x} , (7.37)
where the spatially whitened vector and derivative matrix are defined by
x(u) = R^{-1/2} \, v(u)

\dot{x} = R^{-1/2} \, \frac{\partial}{\partial u} v(u) , (7.38)

and the projection operator (discussed in Section 2.3.5) for the subspace orthogonal
to the column space spanned by the whitened array response x(u) is given by

P^\perp_{x(u)} = I - x(u) \left[ x^\dagger(u) \, x(u) \right]^{-1} x^\dagger(u) . (7.39)
For the sake of discussion, it is assumed that there is no external interference
and the units of power are scaled so that
R = I. (7.40)
The reduced Fisher information simplifies to
J^{(r)}_{u,u} = 2 \, n_s \, |a|^2 \, \frac{\partial v^\dagger(u)}{\partial u} P^\perp_{v(u)} \frac{\partial v(u)}{\partial u} . (7.41)
As discussed in Section 6.1, the components of the array response or steering
vector v(u) ∈ Cn r ×1 are given by
\{v(u)\}_m = e^{i k y_m u} , (7.42)

under the assumption of the normalization \|v(u)\|^2 = n_r, where y_m is the position
of the mth antenna along the linear array in units of wavelength and k is
the wavenumber or, equivalently, the magnitude of the wavevector.
The derivative with respect to the direction variable u is given by
\left\{ \frac{\partial v(u)}{\partial u} \right\}_m = i k y_m \{v(u)\}_m . (7.43)
The reduced Fisher information is then given by
J^{(r)}_{u,u} = 2 \, n_s \, |a|^2 \, \frac{\partial v^\dagger(u)}{\partial u} P^\perp_{v(u)} \frac{\partial v(u)}{\partial u}

= 2 \, n_s \, |a|^2 \left\{ k^2 \sum_m y_m^2 - \frac{\partial v^\dagger(u)}{\partial u} \, \frac{v(u) \, v^\dagger(u)}{n_r} \, \frac{\partial v(u)}{\partial u} \right\}

= 2 \, n_s \, |a|^2 \, k^2 \left\{ \sum_m y_m^2 - \frac{1}{n_r} \Big| \sum_n y_n \Big|^2 \right\} . (7.44)
By setting the origin of the y-axis so that the average element position is zero,
\sum_n y_n = 0, the second term in the braces of Equation (7.44) goes to zero and
the Fisher information is given by

J^{(r)}_{u,u} = 2 \, n_s \, |a|^2 \, k^2 \sum_m y_m^2 = 2 \, n_s \, |a|^2 \, k^2 \, n_r \, \sigma_y^2 , (7.45)
and the notation \sigma_y^2 = \sum_m y_m^2 / n_r indicates the mean-squared antenna
position, under the assumption that the mean position is zero. Consequently, the
reduced Fisher information is given by the direction term exclusively. The variance
in the estimate of direction u is limited by

\left\langle |\hat{u} - u|^2 \right\rangle \ge \frac{1}{2 \, k^2 \, |a|^2 \, n_s \sum_m y_m^2} = \frac{1}{2 \, k^2 \, |a|^2 \, n_s \, n_r \, \sigma_y^2} . (7.46)
p(Z|P, u) = \frac{1}{|Q|^{n_s} \, \pi^{n_r n_s}} \, e^{-\mathrm{tr}\{Z^\dagger Q^{-1} Z\}} = \left[ p(z|P, u) \right]^{n_s} , (7.48)

p(z|P, u) = \frac{1}{|Q| \, \pi^{n_r}} \, e^{-z^\dagger Q^{-1} z} . (7.49)
As defined in Equation (3.200), the mean portion of the signal implicitly in-
corporates the multiple samples; however, the covariance portion of the Fisher
information matrix does not, and thus includes the coefficient ns . Given some
vector of parameters \theta, the \{m, n\}th component of the Fisher information
matrix is given by

\{J\}_{m,n} = n_s \, \mathrm{tr}\left\{ Q^{-1}(\theta) \, \frac{\partial Q(\theta)}{\partial \{\theta\}_m} \, Q^{-1}(\theta) \, \frac{\partial Q(\theta)}{\partial \{\theta\}_n} \right\} . (7.51)
The reduced Fisher information for the real direction u and power P parameters
is given by
J^{(r)}_{u,u} = J_{u,u} - J_{u,P} \, J_{P,P}^{-1} \, J_{P,u} . (7.52)

The derivative of the covariance matrix with respect to the direction parameter is given by

\frac{\partial Q(u)}{\partial u} = \frac{\partial}{\partial u} \left[ I + P \, v(u) \, v^\dagger(u) \right] = P \left[ \dot{v}(u) \, v^\dagger(u) + v(u) \, \dot{v}^\dagger(u) \right] , (7.53)
where the notation v̇(u) indicates the derivative of the steering vector with re-
spect to the direction parameter
\dot{v}(u) = \frac{\partial}{\partial u} v(u) . (7.54)
By using Equation (2.114), the inverse of the rank-1 plus identity receive spatial
covariance is given by
Q^{-1} = I - \frac{P \, v(u) \, v^\dagger(u)}{1 + n_r P} . (7.55)
where the notation c.c. indicates the complex conjugate of the previous term.
The mth element of the array response or steering vector associated with the
element at position ym along the antenna array is given by
\{v\}_m = e^{i k y_m u} . (7.57)
The derivative of the steering vector with respect to the direction parameter is
given by
\{\dot{v}\}_m = \frac{\partial}{\partial u} \{v\}_m = i k y_m \, e^{i k y_m u}

v^\dagger \dot{v} = \sum_m i k \, e^{-i k y_m u} \, y_m \, e^{i k y_m u} = i k \sum_m y_m . (7.58)
The inner product between the derivative of the steering vectors is given by
\dot{v}^\dagger \dot{v} = \sum_m k^2 \, e^{-i k y_m u} \, y_m^2 \, e^{i k y_m u} = k^2 \, n_r \, \sigma_y^2 . (7.60)
By using these results, the component of the Fisher information matrix associ-
ated with the direction parameter u is given by
J_{u,u} = 2 \, n_s \, P^2 \, \frac{n_r + n_r^2 P - n_r^2 P}{1 + n_r P} \, \dot{v}^\dagger \dot{v} = 2 \, n_s \, P^2 \, \frac{n_r}{1 + n_r P} \, k^2 \, n_r \, \sigma_y^2 . (7.61)
The component of the Fisher information matrix associated with the received
signal power is given by
J_{P,P} = n_s \, \mathrm{tr}\left\{ \left[ I - \frac{P \, v v^\dagger}{1 + n_r P} \right] v v^\dagger \left[ I - \frac{P \, v v^\dagger}{1 + n_r P} \right] v v^\dagger \right\}

= n_s \left( v^\dagger \left[ I - \frac{P \, v v^\dagger}{1 + n_r P} \right] v \right)^2

= n_s \left( \frac{n_r}{1 + n_r P} \right)^2 . (7.62)
The cross-parameter component of the Fisher information matrix is given by
J_{u,P} = n_s \, \mathrm{tr}\left\{ \left[ I - \frac{P \, v v^\dagger}{1 + n_r P} \right] P \left[ \dot{v} v^\dagger + v \dot{v}^\dagger \right] \left[ I - \frac{P \, v v^\dagger}{1 + n_r P} \right] v v^\dagger \right\}

= n_s \, v^\dagger \left[ I - \frac{P \, v v^\dagger}{1 + n_r P} \right] P \left[ \dot{v} v^\dagger + v \dot{v}^\dagger \right] \left[ I - \frac{P \, v v^\dagger}{1 + n_r P} \right] v

= 0 . (7.63)
Because the cross-parameter term is zero, from Equation (7.59) (and the power
term is nonzero), the reduced Fisher information in Equation (7.52) is the same
as the Fisher information matrix without the nuisance parameters, J^{(r)}_{u,u} = J_{u,u}.
\sigma_u^2 \ge J_{u,u}^{-1} = \frac{1 + n_r P}{2 \, n_r^2 \, n_s \, P^2 \, k^2 \, \sigma_y^2} . (7.64)
As the SNR P becomes large, the variance on the estimation bound converges
to that of the deterministic signal in Equation (7.46) from above.
The threshold point occurs at the SNR at which the probability of confusing a
mainlobe with sidelobe starts contributing significantly to the angle-estimation
error. This notion is not a precise definition. Depending upon the details, var-
ious systems may have varying sensitivities to the probability of confusion. A
variety of techniques are available to extend parameter-estimation bounds to in-
clude nonlocal effects. One example is the Weiss–Weinstein bound [342]. Here
an approximation is considered. By using the method of intervals [263], nonlo-
cal contributions to the variance are introduced in an approximation. The total
parameter-estimation variance is approximated by considering the local contri-
butions associated with the mainlobe, which are characterized by the Cramer–
Rao bound, and the nonlocal contributions, which are approximated by adding
the variance contributed by a small number of large sidelobes. These sidelobes
correspond to array responses that are similar to that of the mainlobe. This
estimation assumes that the variance is the sum of the variance contributed by
the Cramer–Rao bound times the probability that there is no sidelobe confusion
plus the error squared of introducing an error near some sidelobe peak. For some
parameter φ, its estimate φ̂ is given by maximizing some test statistic t(φ) (or
equivalently some spatial spectral estimator),
\hat{\phi} = \mathrm{argmax}_\phi \{ t(\phi) \} . (7.65)
The test statistic is that which maximizes the likelihood, given a model for a
signal. As an example, consider the single Gaussian signal model in the absence
of interference. In this case, finding the peak of the beamscan test statistic is the
maximum-likelihood solution. Consequently, t(φ) is given by
t(\phi) = \frac{1}{n_s} \, v^\dagger(\phi) \, Z Z^\dagger \, v(\phi) . (7.66)
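The beamscan estimator of Equations (7.65) and (7.66) can be sketched directly; the array geometry, SNR, and grid resolution here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_s = 8, 32                              # assumed array and block sizes
k = 2 * np.pi
y = 0.5 * (np.arange(n_r) - (n_r - 1) / 2)    # element positions in wavelengths

def v(u):
    # Steering vector {v(u)}_m = exp(i k y_m u), with u = sin(phi)
    return np.exp(1j * k * y * u)

u0, a = 0.3, 1.0                              # true direction and amplitude (assumed)
noise = (rng.normal(size=(n_r, n_s)) + 1j * rng.normal(size=(n_r, n_s))) * (0.3 / np.sqrt(2))
Z = a * np.outer(v(u0), np.ones(n_s)) + noise

# Equation (7.66): t(u) = v^H(u) Z Z^H v(u) / n_s; Equation (7.65): maximize over a grid
grid = np.linspace(-1, 1, 2001)
t = np.array([np.linalg.norm(v(u).conj() @ Z) ** 2 / n_s for u in grid])
u_hat = grid[np.argmax(t)]
```

At this SNR the estimate lands in the mainlobe; at much lower SNR the same search occasionally locks onto a sidelobe, which is exactly the threshold behavior discussed above.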
The method of intervals parameter-estimation variance estimate is given by
\sigma_\phi^2 \approx P_{m.l.}(SNR) \, \sigma^2_{CR,\phi}(SNR) + \sum_m P_{s.l.(m)}(SNR) \, \phi_{s.l.,m}^2 , (7.67)
where P_{m.l.}(SNR) is the probability of correctly associating the observation with
the mainlobe at some SNR, P_{s.l.(m)}(SNR) is the probability of being confused by the mth sidelobe
at some SNR, and \phi_{s.l.,m} is the location of the peak of the mth sidelobe. This form
can be simplified further by noting that the nonlocal contributions to the error
are typically dominated by the largest sidelobe. The probability of confusing
the observed array response with the largest sidelobe is denoted Ps.l. (SNR).
Consequently, for mainlobe direction \phi_0, the variance is approximated by retaining only the dominant sidelobe term in Equation (7.67).
Throughout this section, we will not attempt to be precise about Pr{t(φs.l. ) >
t(φ0 )} versus Pr{t(φs.l. ) ≥ t(φ0 )} because it will not introduce a meaningful
difference.
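The method-of-intervals approximation of Equation (7.67) combines the local (Cramer–Rao) and nonlocal (sidelobe) contributions; a minimal sketch, with hypothetical probabilities and sidelobe locations:

```python
import numpy as np

def interval_variance(crb_var, p_sidelobe, phi_sidelobe):
    """Equation (7.67): total variance = P_ml * CRB + sum_m P_sl(m) * phi_sl(m)^2.

    crb_var      -- Cramer-Rao variance at the operating SNR
    p_sidelobe   -- probabilities of confusing each large sidelobe with the mainlobe
    phi_sidelobe -- angular offsets of those sidelobe peaks from the mainlobe
    """
    p_sidelobe = np.asarray(p_sidelobe, dtype=float)
    phi_sidelobe = np.asarray(phi_sidelobe, dtype=float)
    p_mainlobe = 1.0 - p_sidelobe.sum()   # probability of no sidelobe confusion
    return p_mainlobe * crb_var + np.sum(p_sidelobe * phi_sidelobe ** 2)
```

Below threshold even a small confusion probability dominates, because the sidelobe offset enters squared while the local variance is tiny.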
Items (1) and (2) have the same test statistic up to a simple scaling. The prob-
ability of confusion for these types of signals will be considered in Sections 7.8.3
and 7.8.4. The third type of signal with a sequence of random complex Gaussian
signals is considered in Section 7.8.5. If the length of the sequence is long, then
an unknown sequence of constant amplitude can be approximated by the Gaus-
sian signal assumption. However, we will not explicitly evaluate the probability
of confusion for the deterministic signal here.
the vector of Gaussian variables and the reference is a Gaussian variable. The
beamscan angle-of-arrival test statistic for a single observation simplifies to
z = \frac{Z \, s^\dagger}{\sqrt{n_s}}

= \frac{\left( \tilde{a} \, v(\phi_{m.l.}) \, s + N \right) s^\dagger}{\sqrt{n_s}}

= \tilde{a} \, v(\phi_{m.l.}) \, \sqrt{n_s} + n

= a \, v(\phi_{m.l.}) + n . (7.71)
Here \tilde{a} indicates the received signal amplitude per receive antenna (implying the
steering vector normalization \|v(\phi)\|^2 = n_r), and N \in C^{n_r \times n_s} is the additive
noise. In the case for which the multiple observations under the assumption of
a known reference collapse to a single observation, the amplitudes for the two
cases are related by a = \sqrt{n_s} \, \tilde{a}. The single-observation amplitude a indicates
the received signal amplitude per receive antenna, and n \in C^{n_r \times 1} is the additive
noise. The probability of selecting a sidelobe over the mainlobe is given by the
probability that the inner product of the theoretical array response and the
observed response is larger for the sidelobe than the mainlobe.
The probability of selecting the wrong lobe is developed in Sections 7.8.3 and
7.8.4 and is given by
P_{s.l.} = \frac{1}{2} \left[ 1 - Q_M\!\left( \sqrt{\frac{a^2 n_r}{2}\left(1 + \sqrt{1-\rho^2}\right)}, \sqrt{\frac{a^2 n_r}{2}\left(1 - \sqrt{1-\rho^2}\right)} \right) \right.

\left. + \; Q_M\!\left( \sqrt{\frac{a^2 n_r}{2}\left(1 - \sqrt{1-\rho^2}\right)}, \sqrt{\frac{a^2 n_r}{2}\left(1 + \sqrt{1-\rho^2}\right)} \right) \right] ,
(7.74)
r_m = |a_m + z_m| , (7.75)

where the random central complex Gaussian variable z_m has variance \sigma_m^2. Without
loss of generality, the mean parameter a_m is assumed to be real. The probability
density for r_m is given by the Rician distribution,

f_m(r_m) \, dr_m = \frac{2 \, r_m}{\sigma_m^2} \, I_0\!\left( \frac{2 \, a_m r_m}{\sigma_m^2} \right) e^{-(r_m^2 + a_m^2)/\sigma_m^2} \, dr_m , (7.76)
where I0 (·) indicates the modified Bessel function of the first kind, discussed in
Section 2.14.5. The probability of the wrong Rician fluctuating to a level higher
than the other is given by Reference [293]. The probability of Rician r2 exceeding
r1 is given by
\Pr\{r_2 > r_1\} = Q_M\!\left( \sqrt{\frac{2 \, a_2^2}{\sigma_1^2 + \sigma_2^2}}, \sqrt{\frac{2 \, a_1^2}{\sigma_1^2 + \sigma_2^2}} \right) - \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} \, e^{-\frac{a_1^2 + a_2^2}{\sigma_1^2 + \sigma_2^2}} \, I_0\!\left( \frac{2 \, a_1 a_2}{\sigma_1^2 + \sigma_2^2} \right) , (7.77)
where the integral over the second Rician variable r_2 has been evaluated in closed
form. In the following discussion, we evaluate this probability by noting that the
complementary CDF of the noncentral \chi^2 distribution is given by the Marcum
Q-function that is discussed in Section 2.14.8. By using the relationship developed
in Problem 7.6,

Q_M(\sqrt{2a}, \sqrt{2b}) + Q_M(\sqrt{2b}, \sqrt{2a}) = 1 + e^{-(a+b)} \, I_0(2\sqrt{a b}) , (7.78)
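The identity in Equation (7.78) is easy to verify numerically. The first-order Marcum Q-function is the survival function of a noncentral χ² variable with two degrees of freedom, which is available in scipy; the specific arguments below are arbitrary test values.

```python
import numpy as np
from scipy.stats import ncx2
from scipy.special import i0

def marcum_q1(a, b):
    # Q_1(a, b) = Pr{X > b^2} for X ~ noncentral chi-square, 2 dof, noncentrality a^2
    return ncx2.sf(b ** 2, 2, a ** 2)

a, b = 0.7, 1.3   # arbitrary positive test values
lhs = marcum_q1(np.sqrt(2 * a), np.sqrt(2 * b)) + marcum_q1(np.sqrt(2 * b), np.sqrt(2 * a))
rhs = 1 + np.exp(-(a + b)) * i0(2 * np.sqrt(a * b))
```

The two sides agree to machine precision, which is the symmetry that lets the difference of Marcum Q-functions in (7.77) be rewritten in the balanced form used in (7.74).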
To evaluate the probability that one Rician variable fluctuates higher than
another Rician variable under the assumption of independence, the probability
is expressed as the integral

\Pr\{r_2 > r_1\} = \int_0^\infty dr_1 \int_{r_1}^\infty dr_2 \; f_1(r_1) \, f_2(r_2)

= \int_0^\infty dr_1 \; Q_M\!\left( \frac{\sqrt{2} \, a_2}{\sigma_2}, \frac{\sqrt{2} \, r_1}{\sigma_2} \right) \frac{2 \, r_1}{\sigma_1^2} \, e^{-\frac{a_1^2 + r_1^2}{\sigma_1^2}} \, I_0\!\left( \frac{2 \, a_1 r_1}{\sigma_1^2} \right) , (7.80)
where the probability density is defined in Equation (7.76), and the discussion in
Section 3.1.14 provides the form for the definite integral. As discussed in Section
2.14.5, the modified Bessel function of the first kind can be expressed as a contour
integral [343, 53]. This integral is given by
I_m(z) = \frac{1}{2\pi i} \oint_C dx \; x^{-m-1} \, e^{\frac{z}{2}(x + 1/x)} , (7.81)
where C is a contour that encircles the origin. The zeroth order modified Bessel
function is given by
I_0(a x) = \frac{1}{2\pi i} \oint_C dp \; \frac{e^{a x (p + 1/p)/2}}{p}

= \frac{1}{2\pi i} \oint_C dp \; \frac{e^{(a^2 p + x^2/p)/2}}{p} , (7.82)

where the second form follows from rescaling the integration variable p \to (a/x) \, p.
Q_M(a, b) = \int_b^\infty dx \; x \, e^{-\frac{a^2 + x^2}{2}} \, I_0(a x)

= \int_b^\infty dx \; x \, e^{-\frac{a^2 + x^2}{2}} \; \frac{1}{2\pi i} \oint_C dp \; \frac{e^{(a^2 p + x^2/p)/2}}{p} . (7.83)
Because the path of a contour integral of a holomorphic function can be deformed
without changing its value as long as no poles are crossed, and because the
contribution of the integrand vanishes in the left-half plane and along the connecting
path at infinite radius in the right-half plane, the contour integral can
be expressed as a line integral at some finite constant positive offset \gamma from the
imaginary axis.
\Pr\{r_2 > r_1\} = e^{-(\alpha_1^2 + \alpha_2^2)/2} \int_0^\infty dr \; r \, e^{-\frac{r^2}{2}\left(1 + \frac{1}{\nu^2}\right)} \; \frac{1}{2\pi i} \oint dp \; \frac{e^{(\alpha_2^2 p + r^2/(\nu^2 p))/2}}{p - 1} \; \frac{1}{2\pi i} \oint dq \; \frac{e^{(\alpha_1^2 q + r^2/q)/2}}{q} , (7.86)
\Pr\{r_2 > r_1\} = \frac{e^{-(\alpha_1^2 + \alpha_2^2)/2}}{2\pi i} \oint dp \; \frac{1}{2\pi i} \oint dq \int_0^\infty \frac{du}{2} \; \frac{e^{\left[ \alpha_2^2 p + \alpha_1^2 q + u \left( \frac{1}{\nu^2 p} + \frac{1}{q} - 1 - \frac{1}{\nu^2} \right) \right]/2}}{q \, (p - 1)}

= \frac{e^{-(\alpha_1^2 + \alpha_2^2)/2}}{2\pi i} \oint dp \; \frac{1}{2\pi i} \oint dq \; \frac{e^{(\alpha_2^2 p + \alpha_1^2 q)/2}}{\left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} - \frac{1}{q} \right] q \, (p - 1)}

= \frac{e^{-(\alpha_1^2 + \alpha_2^2)/2}}{2\pi i} \oint dp \; \frac{e^{\alpha_2^2 p/2}}{\left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right] (p - 1)} \; \frac{1}{2\pi i} \oint dq \; \frac{e^{\alpha_1^2 q/2}}{q - \left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right]^{-1}} . (7.88)
The contour integral over q can be evaluated directly by using residues and is
discussed in Section 2.9.1 and in Reference [53], and is given by
\frac{1}{2\pi i} \oint dq \; \frac{e^{\alpha_1^2 q/2}}{q - \left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right]^{-1}} = e^{\frac{\alpha_1^2}{2} \left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right]^{-1}} . (7.89)
Substituting this result into the remaining integral over p, the probability becomes

\Pr\{r_2 > r_1\} = \frac{e^{-(\alpha_1^2 + \alpha_2^2)/2}}{2\pi i} \int_{\gamma_1 - i\infty}^{\gamma_1 + i\infty} dp \; \frac{e^{\alpha_2^2 p/2} \; e^{\frac{\alpha_1^2}{2} \left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right]^{-1}}}{\left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right] (p - 1)}

= \frac{e^{-(\alpha_1^2 + \alpha_2^2)/2}}{2\pi i} \int_{\gamma_1 - i\infty}^{\gamma_1 + i\infty} dp \; \frac{e^{\alpha_2^2 p/2} \; e^{\frac{\alpha_1^2}{2} \, \frac{p}{p + p/\nu^2 - 1/\nu^2}}}{\left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right] (p - 1)} . (7.90)
The exponent can be rewritten by using the identity

\frac{1}{2} \left[ \alpha_2^2 \, p + \frac{\alpha_1^2 \, p}{p + p/\nu^2 - 1/\nu^2} \right] = \frac{1}{2} \; \frac{\alpha_2^2 \left( 1 + \frac{1}{\nu^2} \right) p^2 - \frac{\alpha_2^2}{\nu^2} \, p + \alpha_1^2 \, p}{\left( 1 + \frac{1}{\nu^2} \right) p - \frac{1}{\nu^2}} , (7.91)

together with the substitution p \to (p + 1/\nu^2)/(1 + 1/\nu^2).
This substitution will not affect the contour integral if the contour encloses the
poles. The probability of one Rician variable fluctuating above another then
becomes
\Pr\{r_2 > r_1\} = \frac{e^{-(\alpha_1^2 + \alpha_2^2)/2}}{2\pi i} \oint dp \; e^{\frac{p + 1/\nu^2}{2 (1 + 1/\nu^2)} \left( \alpha_2^2 + \frac{\alpha_1^2}{p} \right)} \; \frac{p + 1/\nu^2}{(1 + 1/\nu^2) \, p \, (p - 1)}

= \frac{e^{-(\alpha_1^2 + \alpha_2^2)/2}}{2\pi i} \oint dp \; e^{\frac{\alpha_2^2 + \alpha_1^2 \nu^2 + \alpha_2^2 \nu^2 p + \alpha_1^2/p}{2 (1 + \nu^2)}} \; \frac{p + 1/\nu^2}{(1 + 1/\nu^2) \, p \, (p - 1)}

= \frac{\kappa}{2\pi i} \oint dp \; e^{\frac{\alpha_2^2 \nu^2 p + \alpha_1^2/p}{2 (1 + \nu^2)}} \; \frac{p + 1/\nu^2}{(1 + 1/\nu^2) \, p \, (p - 1)} , (7.93)
where the final contour integral encloses poles at p = 0 and p = 1, and the
constant κ is given by
\kappa = e^{-(\alpha_1^2 + \alpha_2^2)/2} \; e^{\frac{\alpha_2^2 + \alpha_1^2 \nu^2}{2 (1 + \nu^2)}} = e^{-\frac{\alpha_1^2 + \nu^2 \alpha_2^2}{2 (1 + \nu^2)}} . (7.94)
By employing the partial fraction expansion for the denominator of the integrand
\frac{p + \gamma}{p \, (p - 1)} = \frac{1 + \gamma}{p - 1} - \frac{\gamma}{p} , (7.95)
where the integrals A1 and A2 are defined implicitly by expanding the parenthet-
ical term. By substituting the value of the parameter for κ in Equation (7.94),
and
A_2 = \frac{\kappa}{2\pi i} \; \frac{1}{(1 + 1/\nu^2) \, \nu^2} \oint \frac{dp}{p} \; e^{\frac{\alpha_2^2 \nu^2 p + \alpha_1^2/p}{2 (1 + \nu^2)}}

= \frac{\kappa}{1 + \nu^2} \; \frac{1}{2\pi i} \oint \frac{dp}{p} \; e^{\frac{\alpha_2^2 \nu^2 p + \alpha_1^2/p}{2 (1 + \nu^2)}}

= \frac{\kappa}{1 + \nu^2} \; I_0\!\left( \frac{\sqrt{\alpha_2^2 \, \nu^2 \, \alpha_1^2}}{1 + \nu^2} \right)

= \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} \; e^{-\frac{a_1^2 + a_2^2}{\sigma_1^2 + \sigma_2^2}} \; I_0\!\left( \frac{2 \, a_1 a_2}{\sigma_1^2 + \sigma_2^2} \right) , (7.98)

where the final form uses \alpha_m = \sqrt{2} \, a_m / \sigma_m and \nu^2 = \sigma_2^2 / \sigma_1^2.
Consequently, the probability that one independent Rician variable fluctuates
above another is given by
\Pr\{r_2 > r_1\} = Q_M\!\left( \sqrt{\frac{2 \, a_2^2}{\sigma_1^2 + \sigma_2^2}}, \sqrt{\frac{2 \, a_1^2}{\sigma_1^2 + \sigma_2^2}} \right) - \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} \; e^{-\frac{a_1^2 + a_2^2}{\sigma_1^2 + \sigma_2^2}} \; I_0\!\left( \frac{2 \, a_1 a_2}{\sigma_1^2 + \sigma_2^2} \right) .
(7.99)
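Equation (7.99) can be checked against a Monte Carlo simulation of two independent Rician variables; the means and variances below are arbitrary illustrative choices, and the Marcum Q-function is again evaluated through the noncentral χ² survival function.

```python
import numpy as np
from scipy.stats import ncx2
from scipy.special import i0

rng = np.random.default_rng(1)
a1, a2 = 1.0, 0.5          # Rician means (assumed values)
s1, s2 = 1.0, 0.8          # complex-Gaussian variances sigma_1^2, sigma_2^2 (assumed)

n = 400_000
z1 = (rng.normal(size=n) + 1j * rng.normal(size=n)) * np.sqrt(s1 / 2)
z2 = (rng.normal(size=n) + 1j * rng.normal(size=n)) * np.sqrt(s2 / 2)
mc = np.mean(np.abs(a2 + z2) > np.abs(a1 + z1))   # empirical Pr{r2 > r1}

# Equation (7.99), with Q_M(a, b) = ncx2.sf(b^2, 2, a^2)
s = s1 + s2
analytic = (ncx2.sf(2 * a1**2 / s, 2, 2 * a2**2 / s)
            - (s1 / s) * np.exp(-(a1**2 + a2**2) / s) * i0(2 * a1 * a2 / s))
```

With the weaker-mean variable as r₂, the probability sits below one half, and the simulation agrees with the closed form to within Monte Carlo error.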
variables are associated with inner products between the observed array response
and the theoretical array response for the mainlobe and the sidelobe. These
random variables are correlated.
Here an approach to translate the results from the previous section to the
problem of correlated Rician variables is discussed. A thorough discussion of
correlated Rician random variables can be found in Reference [293], for example.
The Rician variables r1 and r2 are given by the magnitudes of the complex
Gaussian variables x1 and x2 .
In this section, uncorrelated variables are constructed by applying a transfor-
mation to the correlated variables. The newly constructed uncorrelated complex
Gaussian variables will be indicated by the vector and scalars,
x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} , (7.100)
or

y^\dagger \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} y < 0

|y_1|^2 - |y_2|^2 < 0 . (7.103)
The correlated variables are given by the inner products of the mainlobe array
response v(u_{m.l.}), the sidelobe array response v(u_{s.l.}), and the received array
response z = a \, v(u_{m.l.}) + n \in C^{n_r \times 1},

y = \frac{1}{\sqrt{n_r}} \begin{pmatrix} v^\dagger(u_{m.l.}) \, z \\ v^\dagger(u_{s.l.}) \, z \end{pmatrix}
= \frac{1}{\sqrt{n_r}} \begin{pmatrix} v^\dagger(u_{m.l.}) \, [a \, v(u_{m.l.}) + n] \\ v^\dagger(u_{s.l.}) \, [a \, v(u_{m.l.}) + n] \end{pmatrix}
= \frac{1}{\sqrt{n_r}} \begin{pmatrix} a \, n_r + v^\dagger(u_{m.l.}) \, n \\ a \, \rho \, n_r + v^\dagger(u_{s.l.}) \, n \end{pmatrix} , (7.104)
where a is the product of the signal amplitude and the channel attenuation, and
ρ is the normalized inner product between the theoretical sidelobe and mainlobe
array responses. The mean of the correlated random variables is then given by
\langle y \rangle = \left\langle \frac{1}{\sqrt{n_r}} \begin{pmatrix} a \, n_r + v^\dagger(u_{m.l.}) \, n \\ a \, \rho \, n_r + v^\dagger(u_{s.l.}) \, n \end{pmatrix} \right\rangle = \frac{1}{\sqrt{n_r}} \begin{pmatrix} a \, n_r \\ a \, \rho \, n_r \end{pmatrix} . (7.105)
The covariance matrix C ∈ C2×2 for the correlated variable is given by
C = \left\langle [y - \langle y \rangle][y - \langle y \rangle]^\dagger \right\rangle

= \frac{1}{n_r} \begin{pmatrix} v^\dagger(u_{m.l.}) \langle n n^\dagger \rangle v(u_{m.l.}) & v^\dagger(u_{m.l.}) \langle n n^\dagger \rangle v(u_{s.l.}) \\ v^\dagger(u_{s.l.}) \langle n n^\dagger \rangle v(u_{m.l.}) & v^\dagger(u_{s.l.}) \langle n n^\dagger \rangle v(u_{s.l.}) \end{pmatrix}

= \frac{1}{n_r} \begin{pmatrix} v^\dagger(u_{m.l.}) \, v(u_{m.l.}) & v^\dagger(u_{m.l.}) \, v(u_{s.l.}) \\ v^\dagger(u_{s.l.}) \, v(u_{m.l.}) & v^\dagger(u_{s.l.}) \, v(u_{s.l.}) \end{pmatrix}

= \begin{pmatrix} 1 & \rho^* \\ \rho & 1 \end{pmatrix} . (7.106)
While it is not obvious, there exists a linear transformation that relates the
correlated and uncorrelated variables. This transformation simultaneously main-
tains the difference relationships in Equations (7.102) and (7.103), and decor-
relates the variables in y. The invertible linear transformation between the two
vectors is given by the matrix A ∈ C2×2 , such that the two vectors are related by
x = Ay (7.107)
and
y = A^{-1} x . (7.108)
The transformation matrix must satisfy the following relations. Firstly, for the
transformed random variables to be uncorrelated, the off-diagonal entries in the
transformed covariance matrix must be zero,
A C A^\dagger = \begin{pmatrix} \sigma_{x_1}^2 & 0 \\ 0 & \sigma_{x_2}^2 \end{pmatrix} , (7.109)
and secondly, the difference between the magnitudes squared of the random variables
must be conserved, |x_1|^2 - |x_2|^2 = |y_1|^2 - |y_2|^2, which is satisfied by
requiring

A^\dagger \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} . (7.110)
As suggested in Reference [293], a linear transform that satisfies these require-
ments is given by
A = b \begin{pmatrix} 1 + \beta & (1 - \beta) \, e^{-i\alpha} \\ 1 - \beta & (1 + \beta) \, e^{-i\alpha} \end{pmatrix} , (7.111)
x = Ay. (7.119)
where the means are given by a_m = \langle x_m \rangle for m = 1, 2. One should note
here that a_m is not the signal amplitude-attenuation product indicated by a. By
using Equation (7.105), the value for the means of the newly created uncorrelated
variables is given by
\langle x \rangle = A \, \langle y \rangle

= b \begin{pmatrix} 1 + \sqrt{\frac{1+\rho}{1-\rho}} & \left( 1 - \sqrt{\frac{1+\rho}{1-\rho}} \right) e^{-i\alpha} \\ 1 - \sqrt{\frac{1+\rho}{1-\rho}} & \left( 1 + \sqrt{\frac{1+\rho}{1-\rho}} \right) e^{-i\alpha} \end{pmatrix} \frac{1}{\sqrt{n_r}} \begin{pmatrix} a \, n_r \\ a \, \rho \, n_r \end{pmatrix}

= b \, a \sqrt{n_r} \begin{pmatrix} 1 + \rho + \sqrt{1 - \rho^2} \\ 1 + \rho - \sqrt{1 - \rho^2} \end{pmatrix} . (7.121)
By noting that \sigma_1^2 = \sigma_2^2 = \sigma_{x_1}^2 = \sigma_{x_2}^2 from Equation (7.116), the first and second
arguments of the Marcum Q-function in Equation (7.120) are given by

\frac{2 \, a_2^2}{\sigma_1^2 + \sigma_2^2} = \frac{a_2^2}{\sigma_1^2} = \frac{b^2 a^2 n_r \left( 1 + \rho - \sqrt{1 - \rho^2} \right)^2}{4 \, b^2 (1 + \rho)} = \frac{a^2 n_r}{2} \left( 1 - \sqrt{1 - \rho^2} \right) (7.122)

and

\frac{2 \, a_1^2}{\sigma_1^2 + \sigma_2^2} = \frac{a_1^2}{\sigma_1^2} = \frac{a^2 n_r}{2} \left( 1 + \sqrt{1 - \rho^2} \right) . (7.123)
where the SNR per receive antenna under the assumption of a single observation
is a2 .
a indicates the received signal amplitude per receive antenna (implying the steering
vector normalization \|v(\phi)\|^2 = n_r), the complex Gaussian signal is indicated
by s \in C^{1 \times n_s}, and N \in C^{n_r \times n_s} is the additive noise.
The probability of the test statistic at some sidelobe fluctuating above the
mainlobe Ps.l. (SNR) is given by
P_{s.l.}(SNR) = \Pr\left\{ \frac{\| v^\dagger(\phi_{s.l.}) \, Z \|^2}{\| v^\dagger(\phi_{m.l.}) \, Z \|^2} > 1 \right\} = \Pr\left\{ \| v^\dagger(\phi_{s.l.}) \, Z \|^2 - \| v^\dagger(\phi_{m.l.}) \, Z \|^2 > 0 \right\} . (7.127)
where the SNR per sample per receive antenna is given by P = |a|^2. In the
end, the overall scale will not be significant, so it is convenient to consider the
normalized covariance

\tilde{C}_Y = \frac{1}{n_s \, n_r^2} \, C_Y . (7.137)
\alpha = \arg(\rho) (7.139)

and

\beta = \frac{(P + 1) + (P \rho^2 + 1) + 2 \rho (P + 1)}{(P + 1) + (P \rho^2 + 1) - 2 \rho (P + 1)} . (7.140)
By evaluating Equation (7.138) with these forms for \alpha and \beta, the variances for the
uncorrelated variables are given by

\sigma_1^2 = \frac{2 (P + 1)(\rho + 1)}{\beta \left[ P (1 - \rho) + 2 \right] - P (\rho + 1)}

\sigma_2^2 = \frac{2 (P + 1)(\rho + 1)}{\beta \left[ P (1 - \rho) + 2 \right] + P (\rho + 1)} . (7.141)
The ratio of the variances is then

\frac{\sigma_1^2}{\sigma_2^2} = \frac{\beta \left[ P (1 - \rho) + 2 \right] + P (\rho + 1)}{\beta \left[ P (1 - \rho) + 2 \right] - P (\rho + 1)} . (7.142)
Because the noise and the signal are assumed to be Gaussian, the difference
expressed in Equation (7.133) is the difference between random χ2 variables.
However, also from Equation (7.133), the test expressed as the difference between
the magnitudes squared of the vectors can also be expressed as a test in terms of
the ratio of these magnitudes squared. The ratio of two degree-normalized central
χ2 variables is given by the F distribution, as discussed in Section 3.1.13. The
probability density of a given value of the ratio q is denoted q ∼ pF (q; d1 , d2 ),
where d1 and d2 indicate the degrees of the χ2 variables. In our case, the two
χ2 variables have the same degree. If the ratio of two equal-degree complex χ2
where \tilde{q} is the unnormalized ratio associated with the test statistic. Confusion
between the sidelobe and the mainlobe occurs when the random variable \tilde{q} > 1,
so that the probability of selecting the sidelobe over the mainlobe P_{s.l.}(SNR) is
given by
P_{s.l.}(SNR) = 1 - \frac{B\!\left( \dfrac{2 n_s \, \sigma_1^2/\sigma_2^2}{2 n_s \, \sigma_1^2/\sigma_2^2 + 2 n_s} ; \, n_s, n_s \right)}{B(n_s, n_s)}

= 1 - \frac{B\!\left( \dfrac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} ; \, n_s, n_s \right)}{B(n_s, n_s)} , (7.146)
where the ratio of the decorrelated variable variances is given by Equation (7.142)
using Equation (7.140) in which the SNR per sample per receive antenna is given
by P .
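Equation (7.146) maps directly onto scipy's regularized incomplete beta function, since betainc(a, b, x) already includes the 1/B(a, b) normalization; the variance values in the check below are arbitrary.

```python
from scipy.special import betainc

def p_sidelobe(sigma1_sq, sigma2_sq, n_s):
    # Equation (7.146): P_sl = 1 - B(sigma1^2/(sigma1^2 + sigma2^2); n_s, n_s) / B(n_s, n_s).
    # scipy's betainc is the regularized incomplete beta, so no separate B(n_s, n_s) is needed.
    x = sigma1_sq / (sigma1_sq + sigma2_sq)
    return 1.0 - betainc(n_s, n_s, x)
```

When σ₁² = σ₂² the argument is 1/2 and the confusion probability is exactly 1/2, as expected when the two lobes are statistically indistinguishable; a larger σ₁² (mainlobe-associated variance) drives the probability down.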
Figure 7.3 Notional representation of a vector sensor. All three electric (E_1, E_2, E_3)
and magnetic (H_1, H_2, H_3) fields are measured simultaneously at a single
phase center. The electric fields are measured by using the dipole antennas, and the
magnetic fields are measured by using the loop antennas.
The antenna array discussed previously exploits the relative phase delay induced
by the time delay of signals impinging upon each antenna to determine direc-
tion. A vector sensor employs an array of antennas at a single phase center.
Consequently, there is no relative phase delay. Instead, the vector sensor finds
the direction to a source by comparing the relative amplitudes [229]. The vector
sensor employs elements that are sensitive to electric and magnetic fields along
each axis, as seen in Figure 7.3. Depending upon the direction of the impinging
signal, different elements will couple to the wavefront with different efficiencies.
Because the polarization of the incoming signal is unknown, and different po-
larizations will couple to each antenna with different efficiencies, the incoming
signal polarization must be determined as a nuisance parameter.
The electric and magnetic fields are indicated by
e = \begin{pmatrix} E_1 \\ E_2 \\ E_3 \end{pmatrix} (7.147)

and

h = \begin{pmatrix} H_1 \\ H_2 \\ H_3 \end{pmatrix} , (7.148)
respectively. Under the assumption of free space propagation, the Poynting vector
[154] with power flux P is given by the cross product of the electric field and
the magnetic field. Here, the unit-norm direction vector u indicates the direction
from the receive array to the source (the opposite direction of the Poynting
vector):

u \, P = -e \times h

u \times e = -\frac{1}{P} \, (e \times h) \times e

u \times e = -\frac{\|e\|^2}{P} \, h (7.149)
by using the relationship
a \times (b \times c) = b \, (a \cdot c) - c \, (a \cdot b) . (7.150)
The six receive signals are given by the three electric field measurements
z_E(t) = e + n_E(t) (7.151)

and

z_H(t) = -\frac{\|e\|^2}{P} \, h + n_H(t) , (7.152)
where the noise for the electric and magnetic field measurements are indicated
by nE (t) ∈ C3×1 and nH (t) ∈ C3×1 , respectively.
The six measured receive signals are a function of direction and polarization,
and are given by
\begin{pmatrix} z_E(t) \\ z_H(t) \end{pmatrix} = \begin{pmatrix} I \\ [u\times] \end{pmatrix} V \, \xi(t) + n(t) , (7.153)
where the noise vector n(t) is given by
n(t) = \begin{pmatrix} n_E(t) \\ n_H(t) \end{pmatrix} , (7.154)
the direction cross-product operator [u×] is given by
[u\times] = \begin{pmatrix} 0 & -\{u\}_3 & \{u\}_2 \\ \{u\}_3 & 0 & -\{u\}_1 \\ -\{u\}_2 & \{u\}_1 & 0 \end{pmatrix} , (7.155)
and for a given polarization vector ξ(t) ∈ C2×1 , the matrix V ∈ R3×2 that maps
the two polarization components orthogonal to the direction of propagation to
the three spatial dimensions is given by
V = \begin{pmatrix} -\sin\phi & -\cos\phi \, \sin\theta \\ \cos\phi & -\sin\phi \, \sin\theta \\ 0 & -\cos\theta \end{pmatrix} . (7.156)
Here the angle φ is defined as the angle from the 1 axis in the 1–2 plane, and θ
is the angle from the 3-axis.
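The measurement model of Equations (7.153)–(7.156) can be assembled directly; the angles and polarization vector below are arbitrary illustrative choices, and the cross-product operator of Equation (7.155) is checked against numpy's cross product.

```python
import numpy as np

def cross_op(u):
    # Equation (7.155): matrix form of the cross product, cross_op(u) @ e == u x e
    return np.array([[0.0, -u[2], u[1]],
                     [u[2], 0.0, -u[0]],
                     [-u[1], u[0], 0.0]])

theta, phi = 0.4, 1.1                        # arbitrary angles; theta from the 3-axis
u = np.array([np.sin(theta) * np.cos(phi),
              np.sin(theta) * np.sin(phi),
              np.cos(theta)])                # unit direction vector

# Equation (7.156), as given in the text
V = np.array([[-np.sin(phi), -np.cos(phi) * np.sin(theta)],
              [ np.cos(phi), -np.sin(phi) * np.sin(theta)],
              [ 0.0,         -np.cos(theta)]])

xi = np.array([1.0, 0.5j])                   # a hypothetical polarization vector
# Equation (7.153), noiseless: stack the electric and magnetic measurement operators
z = np.vstack([np.eye(3), cross_op(u)]) @ (V @ xi)
```

The six-element output stacks the three dipole (electric) and three loop (magnetic) responses from a single phase center, so direction information enters only through the relative amplitudes, not through phase delays.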
Because the vector sensor has no aperture, the intrinsic resolution is relatively
poor, on the order of one radian. This intrinsic resolution can be determined from
the multiplicative constant term in the Cramer–Rao bound [229]. Achieving
good angle-estimation performance therefore requires beamsplitting under the
assumption of a high-SNR signal.
Figure 7.4 Beam pattern as a function of angle φ from boresight of vector sensor
assuming eθ polarization for elevations of 0 (black, in the plane of the transmitter),
45 degrees (dark gray), and 67.5 degrees (light gray).
In Figure 7.4, the beam pattern for a vector sensor is displayed. Only the
response of the electric field along the polar direction (e_\theta, using the notation
from Section 5.1) is considered in the beam pattern. Because the vector sensor has
no intrinsic aperture, the beamwidth is very wide. In addition to the beam pattern
in the plane (0 degrees), patterns for elevations of 45 and 67.5 degrees are
shown; both have a smaller response to the in-plane (0 degrees) excitation.
Problems
complex number. This is a valid characterization of the channel when the signal
bandwidth B is small compared to the inverse of the characteristic delay spread
Δt,
B \ll \frac{1}{\Delta t} . (8.1)
This regime is also described as a flat-fading channel because the same complex
attenuation can be used across frequencies employed by the transmission and is
consequently flat (as opposed to frequency-selective fading). The elements in the
flat-fading channel matrix

H \in C^{n_r \times n_t}

contain the complex attenuation from each transmitter to each receiver. For
example, the path between the mth transmitter and nth receiver has a complex
attenuation \{H\}_{n,m}. A received signal z(t) \in C^{n_r \times 1} as a function of time t is
given by

z(t) = H \, s(t) + n(t) , (8.2)
where the transmitted signal vector and additive noise (including external in-
terference) as a function of time are denoted s(t) ∈ Cn t ×1 and n(t) ∈ Cn r ×1 ,
respectively.
It is often convenient to consider a block of data of ns samples. The received
signal for a block of data with ns samples is given by
Z = HS + N, (8.3)
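A minimal numerical sketch of the flat-fading block model in Equation (8.3), with assumed dimensions and an assumed bandwidth/delay-spread pair satisfying Equation (8.1):

```python
import numpy as np

rng = np.random.default_rng(2)
n_t, n_r, n_s = 2, 4, 100                    # assumed antenna and block sizes

# Flat-fading regime check, Equation (8.1): B << 1/Delta_t
B, delta_t = 1e6, 50e-9                      # 1 MHz bandwidth, 50 ns delay spread (assumed)
flat = B * delta_t                           # dimensionless product; << 1 when flat fading

H = (rng.normal(size=(n_r, n_t)) + 1j * rng.normal(size=(n_r, n_t))) / np.sqrt(2)
S = (rng.normal(size=(n_t, n_s)) + 1j * rng.normal(size=(n_t, n_s))) / np.sqrt(2)
N = (rng.normal(size=(n_r, n_s)) + 1j * rng.normal(size=(n_r, n_s))) * 0.1

Z = H @ S + N                                # Equation (8.3): received block of n_s samples
```

Because flat ≪ 1 here, a single complex attenuation per transmit–receive pair (one entry of H) is an adequate channel description across the signal bandwidth.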
8.2 Interference
where the interference contained within N is expressed as the sum of the terms
J_m T_m. The term \tilde{N} is the remaining thermal noise. The interference channels
J_m are typically statistically equivalent to the channel H for the signal of
interest.
The nature (really the statistics) of the external interference signal can have
a significant effect on a communication system’s performance; thus, priors on
the probability distributions for the interference can have a dramatic effect on
receiver design. As an example, if the interference signal and its channel J T
are known exactly, then the interference has no effect because a receiver that
is aware of these parameters can subtract the contributions of the interference
from the received signal perfectly, assuming a receiver with ideal characteristics.
However, a receiver that cannot take advantage of this knowledge will be forced
\tilde{Z} = Z \, P^\perp_T ,

P^\perp_T = I - T^\dagger (T T^\dagger)^{-1} T . (8.5)
h(x) = -\int d\Omega_x \; p(x) \log_2 [p(x)] , (8.6)
where p(x) is the probability density function for the random vector x ∈ Cn r ×1
and dΩx indicates the differential hypervolume associated with the integration
variable x, as discussed in Section 2.9.2. As discussed in Section 5.3.3, the Gaus-
sian distribution maximizes entropy for a given signal variance. This property is
also true in the case of multivariate distributions. The differential entropy of an
nr -dimensional multivariate mean-zero complex Gaussian distribution denoted
\mathcal{CN}(0, R) is given by

h(x) = \log_2 \left[ |R| \, \pi^{n_r} \right] + n_r \log_2 [e] = \log_2 \left( [\pi e]^{n_r} |R| \right) , (8.7)
The maximum spectral efficiency at which the effective error rate can be driven to
zero (or in other words capacity) for a flat-fading link is found by maximizing the
mutual information [68], as introduced in Section 5.3.2. The spectral efficiency
is defined as the data rate divided by the bandwidth and has units of bits
per second per hertz (b/s/Hz), or equivalently b/(s Hz). The units of bits per
second per hertz reduce to just bits, although it is sometimes useful to keep the slightly
clumsy longer form because it is suggestive of the underlying meaning. To find
the channel capacity, both an outer bound and an achievability (inner) bound
must be evaluated, and it must be shown that these two bounds are equal. In the
following discussion, it is assumed without proof that Gaussian distributions are
capacity achieving for MIMO links. More thorough discussions are presented in
[308, 68]. There are various levels of channel-state information available to the
transmitter. The spectral efficiency bound increases along with the amount of
information available to the transmitter. As we use it here, the term capacity is
a spectral efficiency bound. However, not all useful spectral efficiency bounds are
capacity; because of some other constraints or lack of channel knowledge, a given
spectral efficiency bound may be less than the channel capacity given complete
244 MIMO channel
channel knowledge. One might reasonably argue that only when the entire system
has knowledge of the channel (with the exception of the noise) is the maximum
achievable spectral efficiency bound the channel capacity. However, in
practice it is common to refer to multiple spectral efficiency bounds, with different
assumptions on system constraints, as capacity. Given this practice, some care
must be taken when a given spectral efficiency bound is identified as channel
capacity.
In maximizing the mutual information, a variety of constraints can be imposed.
The most common constraint is the total transmit power. For the MIMO link,
an additional requirement can be placed upon the optimization: knowledge of
channel-state information (CSI) at the transmitter. If the transmitter knows
the channel matrix, then it can alter its transmission strategy (which in theory
can be expressed in terms of the transmit signal covariance matrix) to improve
performance. Conversely, if the channel is not known at the transmitter, and
this is more common in communication systems, then the transmitter is forced
to employ an approach with lower average performance.
Because the channel is represented by a matrix in MIMO communications
compared to a scalar in SISO communications, the notion of channel knowledge
is more complicated. In both cases, the channel state can be completely known
exactly or statistically. However, in the case of MIMO, the notion of statistical
knowledge is even more involved. As an explicit example, all flat-fading SISO
channels of the same attenuation have the same capacity as a function of transmit
power, but all MIMO channel matrices with the same Frobenius norm (which
implies the same average attenuation) do not have the same capacity.
An issue in considering performance of communication systems is in relating
theoretical and experimental analyses of performance. In general, this is true
for both SISO and MIMO systems, although it is slightly more complicated for
MIMO systems. Theoretical discussions of MIMO communications are typically
discussed in terms of average SNR per receive antenna. However, the SNR es-
timate produced from a channel measurement is not the same. Explicitly, this
is understood by noting that the estimate of the SNR for a particular esti-
mated channel instance is not the same as the average SNR for an ensemble of
channels,

\widehat{SNR} \propto \frac{\| \hat{H} \|_F^2}{n_r} \ne SNR \propto \frac{\left\langle \| H \|_F^2 \right\rangle}{n_r} , (8.8)
where the notation ˆ· indicates an estimated parameter. This difference is dis-
cussed in greater detail in Section 8.11. Implicit in this formulation of SNR is
the notion that each transmit antenna excites the channel with independent sig-
nals with equal power (the optimal solution for the uninformed transmitter). If
the transmit antennas incorporate correlations to take advantage of the channel-
state information (an informed transmitter solution), then this discussion is even
more complicated. A more thorough discussion of channel-state information is
presented in Section 8.3.1.
8.3 Flat-fading MIMO capacity 245
and
c = log2 | I + (Po/nt) R^{−1} H H† | ,   (8.11)
respectively. In the informed transmitter case, the transmit spatial covariance
matrix P ∈ C^{nt×nt} contains the optimized statistical cross correlations between
transmit antennas. The total transmit power is indicated by Po. The interference-
plus-noise spatial covariance matrix is indicated by R ∈ C^{nr×nr}. For conve-
nience, it is often assumed that the transmit spatial covariance matrix P and
the interference-plus-noise spatial covariance matrix R are expressed
in units of thermal noise. Under this normalization, a thermal noise covariance
matrix is given by I. As a reminder, we are considering signals in a complex
baseband representation. Consequently, for each symbol there are two degrees
of freedom (real and imaginary), so the “1/2” in the standard form of capacity
“1/2 log2 (1 + SNR)” is not present.
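The capacity expression of Equation (8.11) can be evaluated directly. The sketch below uses illustrative values, with R expressed in units of thermal noise as described above; the rank-one model for the interferer is an assumption made for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
nr, nt, Po = 4, 4, 10.0   # illustrative sizes and noise-normalized transmit power

# Rayleigh channel draw (unit-variance complex circular Gaussian entries).
H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)

# Interference-plus-noise covariance in units of thermal noise: identity plus
# one strong rank-one interferer on an assumed random spatial signature.
v = rng.standard_normal((nr, 1)) + 1j * rng.standard_normal((nr, 1))
R = np.eye(nr) + 100.0 * (v @ v.conj().T) / np.linalg.norm(v) ** 2

# Equation (8.11): uninformed-transmitter spectral efficiency bound.
c = np.log2(np.linalg.det(np.eye(nr) + (Po / nt) * np.linalg.solve(R, H @ H.conj().T)).real)
c_no_interference = np.log2(np.linalg.det(np.eye(nr) + (Po / nt) * H @ H.conj().T).real)
assert 0 < c < c_no_interference   # interference can only reduce the bound
```

Because R ⪰ I under this normalization, the whitened channel is never stronger than the interference-free channel, so the bound with interference is always the smaller of the two.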
any receive antenna as a function of frequency and time. It would also include
any noise or interference introduced in the channel. In discussions of dirty-paper
coding, introduced in Section 5.3.4, it is the noise or the interference that is
referenced when the concept of channel-state information is considered. In this
chapter, and in most practical wireless communications, the focus of channel
knowledge is the complex attenuation between transmit and receive antennas
that is represented by the channel matrix for MIMO links. Channel-state infor-
mation may also include knowledge of the statistical properties of the interfer-
ence and noise, typically represented by the interference-plus-noise covariance
matrix. It is typically assumed that the receiver can estimate the channel. This
estimation can be done by employing joint data and channel estimation, or, more
typically, by including a known training or pilot sequence with which the channel
can be estimated as part of the transmission.
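A minimal sketch of the pilot-based approach, assuming a known ±1 training matrix S and a least-squares estimator (a common choice; the specific pilot design and estimator here are illustrative assumptions, not the book's prescription):

```python
import numpy as np

rng = np.random.default_rng(0)
nr, nt, ns = 4, 2, 64           # receive antennas, transmit antennas, pilot length

H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
S = (rng.integers(0, 2, (nt, ns)) * 2 - 1).astype(complex)   # known +/-1 pilots
N = 0.05 * (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns)))
Z = H @ S + N                   # received training block

# Least-squares channel estimate: H_hat = Z S^H (S S^H)^{-1}
H_hat = Z @ S.conj().T @ np.linalg.inv(S @ S.conj().T)
assert np.linalg.norm(H_hat - H) / np.linalg.norm(H) < 0.1
```

Longer training sequences average down the noise, which is why the estimate above is accurate to well under 10% with 64 pilot symbols.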
At the transmitter, access to knowledge of the channel state is problematic. In
rare circumstances, the transmitter can exploit knowledge of the geometry and
an exact model for the environment (such as line-of-sight channels), but this
approach is rarely valid for terrestrial communications. If there is not a means
for a transmitter to obtain channel-state information, then the transmitter is
said to be uninformed. If there is a communication link from the receiver, then
channel estimates can be sent back from the receiver to the transmitter, and
the link has an informed transmitter. Approaches to efficiently encode channel
estimates have been considered [195] and are discussed in Section 8.12.2. If the
link is bidirectional, on the same frequency and using the same antennas, then
reciprocity can be invoked so that the channel can be estimated while in the
receive mode and then exploited during the transmit mode. As discussed in
Section 8.12.1, there are some technical issues in using the reciprocity approach.
When using either channel-estimation feedback or reciprocity, the time-varying
channels can limit the applicability of these techniques [30]. If the channel is
very stable, which may be true for some static environments, then providing
the transmitter with channel-state information may be viable. If the channel is
dynamic, as in the case of channels with moving transmitters and receivers, the
channel may change significantly before the transmitter can use the channel-state
information. In this case, it is said that the channel-state information is stale.
In reaction to potentially stale channel-state information, one approach is to
provide the transmitter access to statistical characteristics of the channel. As an
example, if the typical distribution of the singular values of the channel matrix
can be estimated, then space-time codes can be modified to take advantage
of these distributions. Explicitly, if channels can be characterized typically by
high-rank channel matrices, then codes with higher rates may be suggested.
Conversely, if the channels can be characterized typically by low-rank channel
matrices, then codes with high spatial redundancy may be suggested. Trading rate
for diversity is discussed in Chapter 11.
In addition, there are different levels of knowledge of interference for the trans-
mitter. If the interference signals are known exactly at the transmitters, then
received signal conditioned on the transmitted signal are given by h(z) and
h(z|s), respectively. For the sake of notational convenience, the explicit parame-
terization of time z(t) ⇒ z is suppressed. Here the maximum mutual information
provides an outer bound on the spectral efficiency. As discussed in Section 5.3
and discussed for MIMO systems in [308], the mutual information is maximized
by employing a Gaussian distribution for s. The worst-case noise plus interfer-
ence is given by a Gaussian distribution for n. The probability distribution for
the received signal given the transmitted signal p(z|s) is given by
p(z|s) = ( 1 / ( |R| π^{nr} ) ) e^{ −(z − H s)† R^{−1} (z − H s) } .   (8.13)
The probability distribution for the received signal without knowledge of what
is being transmitted p(z) is typically modeled by
p(z) = ( 1 / ( |Q| π^{nr} ) ) e^{ −z† Q^{−1} z } ,   (8.14)
where the combined spatial covariance matrix Q ∈ C^{nr×nr} is given by
Q = R + H P H† . (8.15)
The differential entropy for the received signal given knowledge of what is
transmitted h(z|s) is just the entropy of the Gaussian noise plus the interference
h(n) because n = z − H s and is given by
h(z|s) = h(n)
       = log2( π^{nr} e^{nr} |R| ) .   (8.16)
Consequently, the mutual information in units of bits per second per hertz is
given by
The notation (a)⁺ = max(0, a) indicates that the value of the argument is
limited to non-negative values, and the parameter ν is varied so that the following
condition is satisfied,

Σ_m ( ν − 1/λm{ R^{−1/2} H H† R^{−1/2} } )⁺ = Po .   (8.21)
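The water-filling condition of Equation (8.21) can be solved numerically, for example by bisection on ν. A sketch, with the whitened-channel eigenvalues chosen arbitrarily for illustration:

```python
import numpy as np

def waterfill(lam, Po, tol=1e-12):
    """Solve sum_m (nu - 1/lam_m)^+ = Po for nu by bisection and return
    the per-mode powers p_m = (nu - 1/lam_m)^+ (Equation (8.21))."""
    lam = np.asarray(lam, dtype=float)
    lo, hi = 0.0, Po + np.sum(1.0 / lam)     # the water level nu is bracketed here
    while hi - lo > tol:
        nu = 0.5 * (lo + hi)
        if np.sum(np.maximum(nu - 1.0 / lam, 0.0)) < Po:
            lo = nu
        else:
            hi = nu
    return np.maximum(0.5 * (lo + hi) - 1.0 / lam, 0.0)

# Assumed eigenvalues of the whitened channel R^{-1/2} H H^H R^{-1/2}.
lam = np.array([2.0, 0.5, 0.05])
p = waterfill(lam, Po=1.0)
assert abs(p.sum() - 1.0) < 1e-9         # total power constraint satisfied
assert p[2] == 0.0                       # the weakest mode is switched off
```

Modes whose inverse eigenvalue exceeds the water level receive no power, which is exactly the effect of the (·)⁺ operation.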
We redevelop the same capacity, explicitly providing a useful form. The whitened
channel R^{−1/2} H can be represented by the singular-value decomposition

R^{−1/2} H = (U Ū) [ D 0 ; 0 D̄ ] (W W̄)† ,   (8.23)

where the nonzero singular values and corresponding singular vectors are par-
titioned into two sets. A subset of n+ singular values of the whitened channel
matrix is contained in the diagonal matrix D ∈ R^{n+ × n+}, and the remaining
min(nr, nt) − n+ are contained in the diagonal matrix D̄.
In the following discussion, we develop the criteria for finding the subset of
whitened channel matrix singular values. The corresponding left and right sin-
gular vectors are contained in U, Ū, W, and W̄. The columns of U ∈ C^{nr×n+}
are orthonormal, and the columns of W ∈ C^{nt×n+} are orthonormal. For some
subset (contained in D) of whitened channel singular values, the subspace of the
nonzero eigenvectors of P is constrained to be orthogonal to the columns of W̄,
250 MIMO channel
Po ≥ tr{P}
   = tr{ W† P W }
   = tr{ D^{−1} C (D†)^{−1} }
   = tr{ (D† D)^{−1} C }   (8.26)
because all of the power in P is contained in the subspace defined by the orthonor-
mal matrix W, replacing the transmit covariance matrix with the quadratic form
W† P W does not change the total power, tr{P} = tr{W† P W}. Because D is
a real, symmetric, diagonal matrix, the transpose contribution of the Hermitian
conjugate has no effect; thus, D† D = D∗ D. The capacity (optimized spectral
efficiency) is given by
To simplify the expression, the notation η′ = η / log2(e) is used. This relationship
is satisfied if C is given by the diagonal matrix

C = (1/η′) D† D − I_{n+} .   (8.29)

The value for the Lagrangian multiplier η′ is found by imposing the total power
constraint,

Po = tr{P}
   = tr{ C (D† D)^{−1} }
   = tr{ (1/η′) I − (D† D)^{−1} }
1/η′ = ( Po + tr{(D† D)^{−1}} ) / n+ .   (8.30)

Consequently, the noise-free receive covariance matrix C is given by

C = ( ( Po + tr{(D† D)^{−1}} ) / n+ ) D† D − I_{n+} .   (8.31)
The non-negative power constraint is satisfied if the eigenvalues of the transmit
covariance matrix are non-negative,

P ≥ 0 ,
P = C (D† D)^{−1} ≥ 0 .   (8.32)
density p(R−1/2 H)
for any unitary matrix U. The goal is to optimize the transmit covariance matrix
P. Because any Hermitian matrix can be constructed by U P U† , starting with
a diagonal matrix P, there is no reason to consider any transmit covariance
matrices with off-diagonal elements. Another way to view this is that under the
random channel matrix assumption, the transmitter cannot have knowledge of
any preferred direction. If the whitened channel matrix can be represented by
the singular-value decomposition R−1/2 H = Ũ S̃ Ṽ† , then the ergodic (average
over time) capacity is given by
cUT = ⟨ log2 | I + S̃ S̃† Ṽ† P Ṽ | ⟩
    = Σ_m ⟨ log2 ( 1 + λm{ S̃ S̃† Ṽ† P Ṽ } ) ⟩ ,   (8.39)
given by
cUT / cSISO = log2 | I_{nr} + (Po/nt) H H† | / log2(1 + a² Po)
 = Σ_{m=1}^{nr} log2 λm{ I_{nr} + (Po/nt) H H† } / log2(1 + a² Po)
 = Σ_{m=1}^{min(nr,nt)} log2 λm{ I_{nr} + (Po/nt) H H† } / log2(1 + a² Po)
 → Σ_{m=1}^{min(nr,nt)} [ log2(Po) + log2 λm{ (1/nt) H H† } ] / [ log2(Po) + log2(a²) ]
 → min(nr, nt) log2(Po) / log2(Po)
 = min(nr, nt)   (8.42)
in the limit of large transmit power. The convergence to this asymptotic result
is very slow. Consequently, this often-quoted result is mildly misleading because
practical systems typically work in SNR regimes for which this limit is not valid.
Furthermore, the advantages of MIMO are often in the statistical diversity it
provides, which improves the robustness of the link. Nonetheless, the above result
and the following sections can be used to provide some insight into potential
performance improvements or limits when used properly.
where we employ the observations that the summation only needs to occur over
arguments of the logarithm that are not unity, and that the eigenvalues of the
finite channel components are small compared to the large power term. The con-
vergence to the final result is relatively slow. In general, the theoretical capacity
is not significantly affected as long as the number of antennas is much larger
than the number of interferers.
A practical issue with this analysis is that at very high INR, the model that
J can be completely contained within a subspace fails because of more subtle
physical effects. As examples, the effects of dispersion (that is, resolvable delay
spread) across the array or receiver linearity can cause the rank of the interference
covariance to increase. However, for many practical INRs, the analysis is a useful
approximation.
cIT / cUT = log2 | ( (Po + tr{(H† R^{−1} H)^{−1}}) / n+ ) H† R^{−1} H | / log2 | I_{nt} + (Po/nt) H† R^{−1} H |
 = [ nt log2( (Po + tr{(H† R^{−1} H)^{−1}}) / nt ) + log2 |H† R^{−1} H| ] / log2 | I_{nt} + (Po/nt) H† R^{−1} H | .   (8.47)
In the limit of large SNR (Po ≫ tr{(R^{−1} H† H)^{−1}}), the difference between the
various channel eigenvalues becomes unimportant, and the capacity ratio is given
by
cIT / cUT → [ nt log2(Po/nt) + log2 |H† R^{−1} H| ] / log2 | (Po/nt) H† R^{−1} H |
 → [ nt log2(Po/nt) + log2 |H† R^{−1} H| ] / [ nt log2(Po/nt) + log2 |H† R^{−1} H| ]
 = 1 .   (8.48)
d = λmax{ R^{−1/2} H H† R^{−1/2} }
  = D† D ,   (8.49)
where in this limit the matrix D collapses to a scalar of the dominant singular
value because n+ = 1. In this limit of low SNR, the ratio of the informed to the
uninformed capacity cI T /cU T is given by
cIT / cUT = log2( ( (Po + d^{−1}) / n+ ) d ) / log2 | I_{nr} + (Po/nt) R^{−1/2} H H† R^{−1/2} |
 = log( ( (Po + d^{−1}) / n+ ) d ) / log | I_{nr} + (Po/nt) R^{−1/2} H H† R^{−1/2} |
 = log(1 + Po d) / Σ_m log λm{ I_{nr} + (Po/nt) R^{−1/2} H H† R^{−1/2} }
 = log(1 + Po d) / Σ_m log( 1 + λm{ (Po/nt) R^{−1/2} H H† R^{−1/2} } ) .   (8.50)
In the low SNR limit, the eigenvalues are small, so the lowest-order term in the
logarithmic expansion about one is a good approximation; thus, the capacity
ratio is given by
cIT / cUT → Po d / Σ_m λm{ (Po/nt) R^{−1/2} H H† R^{−1/2} }
 = λmax{ R^{−1/2} H H† R^{−1/2} } / ( (1/nt) Σ_m λm{ R^{−1/2} H H† R^{−1/2} } )
 = λmax{ H† R^{−1} H } / ( (1/nt) Σ_m λm{ H† R^{−1} H } ) ,   (8.51)
by using Equation (8.35) with n+ = 1 and Equation (8.37). Given this low SNR
asymptotic result, a few observations can be made. The spectral-efficiency ra-
tio is given by the maximum to the average eigenvalue ratio of the whitened
channel matrix H† R−1 H. If the channel is rank one, such as in the case of a
multiple-input single-output (MISO) system, the ratio is approximately equal to
nt . Finally, in the special, if physically unlikely, case in which R−1/2 H H† R−1/2
has a flat (that is, all equal) eigenvalue distribution, the optimal transmit co-
variance matrix is not unique. Nonetheless, the ratio cI T /cU T approaches one.
It is worth repeating here that, when embedded within a wireless network, the
optimization and potential performance benefits are not the same as an isolated
link discussed above.
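The low-SNR asymptotic ratio of Equation (8.51) can be checked numerically; in this sketch a noise-only covariance R = I is assumed for simplicity:

```python
import numpy as np

rng = np.random.default_rng(7)
nr, nt = 4, 4
H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
R = np.eye(nr)                           # noise-only, in units of thermal noise

W = H.conj().T @ np.linalg.solve(R, H)   # whitened channel H^H R^{-1} H
lam = np.linalg.eigvalsh(W)

Po = 1e-4                                # deep low-SNR regime
c_it = np.log2(1 + lam.max() * Po)       # all power on the dominant mode
c_ut = np.sum(np.log2(1 + (Po / nt) * lam))
predicted = lam.max() / lam.mean()       # Equation (8.51): max over mean eigenvalue
assert abs(c_it / c_ut - predicted) / predicted < 0.01
```

Here lam.mean() equals (1/nt) Σ_m λm because W is nt × nt, so the code matches the final line of (8.51) directly.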
where the distance between frequency samples is given by Δf , and the nf -bin
frequency-partitioned channel matrix Ȟ is given by
Ȟ ≡ diag{ H(f1), H(f2), . . . , H(f_{nf}) } ,   (8.53)
and the frequency-partitioned interference-plus-noise spatial covariance matrix
is given by
Ř ≡ diag{ R(f1), R(f2), . . . , R(f_{nf}) } .   (8.54)
In order to construct the discrete approximation, it is assumed that any variation
in channel or interference-plus-noise covariance matrix within a frequency bin is
insignificant.
For the informed transmitter channel capacity, power is optimally distributed
among both spatial modes and frequency channels. The capacity can be ex-
pressed as
cIT,FS ≈ max_{P̌} (1/nf) log2 | I + Ř^{−1} Ȟ P̌ Ȟ† | ,   (8.55)
which is maximized by Equation (8.35) with the appropriate substitutions for
the frequency-selective channel, and diagonal entries in D in Equation (8.33) are
selected from the eigenvalues of ȞȞ† . Because of the block diagonal structure of
Ȟ, the (nt · nf ) × (nt · nf ) space-frequency noise-normalized transmit covariance
matrix P̌ is a block diagonal matrix, normalized so that in each frequency bin
the average noise-normalized transmit power is Po , which can be expressed as
tr{P̌}/nf = Po . There are a number of potential issues related to the use of
discretely sampled channels. Some of these effects are discussed in greater detail
in Section 10.1.
H = avw† , (8.56)
where a is the overall complex attenuation, and v and w are the receive and
transmit array steering vectors, respectively.
To further study the line-of-sight model, we consider an example 2 × 2 channel
in the absence of external interference, and in which the transmit and receive
arrays grow. To visualize the example, one can imagine a receive array and a
transmit array each with two antennas so that the antennas are located at the
corners of a rectangle, as seen in Figure 8.2. The ratio of the larger to the smaller
channel matrix eigenvalues can be changed by varying the shape of the rectangle.
When the rectangle is very asymmetric (wide but short) with the arrays being
far from each other, the rank-1 channel matrix is recovered. The columns of the
channel matrix H can be viewed as the receive-array response vectors, one vector
for each transmit antenna,
H = √2 ( a1 v1   a2 v2 ) ,   (8.57)

H† H = 2a² [ 1   (v1† v2)* ; v1† v2   1 ] ,   (8.58)
8.5 2 × 2 Line-of-sight channel 261
are given by
μ± = 2a² ( 1 + 1 ± √( (1 − 1)² + 4 |v1† v2|² ) ) / 2 ,
μ1 = 2a² ( 1 + |v1† v2| ) ,
μ2 = 2a² ( 1 − |v1† v2| ) ,   (8.59)

b = (2/π) arccos( |v1† v2| ) .   (8.60)
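A quick numerical check of Equation (8.59) for an illustrative pair of unit-norm steering vectors (the particular vectors below are assumptions made for the example):

```python
import numpy as np

# Two unit-norm receive-array response vectors with inner product v1^H v2.
theta = 0.3
v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([np.exp(1j * theta), np.exp(-1j * theta)]) / np.sqrt(2)
a = 1.0

H = np.sqrt(2) * np.column_stack((a * v1, a * v2))        # Equation (8.57)
mu = np.sort(np.linalg.eigvalsh(H.conj().T @ H))[::-1]    # eigenvalues of H^H H

g = abs(v1.conj() @ v2)
mu1 = 2 * a**2 * (1 + g)            # Equation (8.59)
mu2 = 2 * a**2 * (1 - g)
assert np.allclose(mu, [mu1, mu2])
```

As |v1† v2| → 1 (the arrays cannot resolve the two antennas) the second eigenvalue collapses and the rank-one channel is recovered, consistent with the discussion above.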
(Figure: the channel eigenvalues μ1 and μ2, normalized by a² and plotted in dB.)
μ2 > 1 1 ,
Po + μ1 + μ2
1 1
Po > −
μ2 μ1
v1† v2
> , (8.61)
a2 1 − v1† v2 2
cI T = log2 (1 + μ1 Po )
= log2 1 + 2a2 [1 + v1† v2 ] Po ; (8.62)
(Figure: spectral efficiency (b/s/Hz) as a function of a²Po (dB).)
resolve antenna elements is related to the number of large singular values of the
channel matrix and thus the capacity.
H = U D V† . (8.65)
8.6 Stochastic channel models 265
This model is based upon the notion that the environment is full of scatterers.
The signal seen at each receive antenna is the sum of a random set of wavefronts
bouncing off the scatterers. For a SIMO system under the assumptions of a
nondispersive array response (bandwidths small compared with the ratio of the
speed of light divided by the array size) and scatterers in the far field of the
array, the channel vector hm ∈ C^{nr×1} (mth column of H) is given by
hm = Σ_n a_{m,n} v(k_{m,n})
   ∼ g ,   (8.72)
where the vector g is drawn from the limiting (that is, large number of scatter-
ers) distribution, v(k_{m,n}) ∈ C^{nr×1} is the array response for a single wavefront
arriving from the direction associated with the wavevector k_{m,n}, and a_{m,n} is a random complex
scalar. The values of am ,n are determined by the propagation from the trans-
mitter impinging on the array from direction km ,n . For physically reasonable
distributions for am ,n and km ,n , in complicated multipath environments, the
central limit theorem [241] drives the probability distributions for the entries in
hm to independent complex circular Gaussian distributions. Consequently, by
employing the assumption that all transmit–receive pairs are uncorrelated, the
entries in H are drawn independently from a complex circular Gaussian distri-
bution.
The random matrix with elements drawn from a complex circular Gaussian
distribution with unit variance is often indicated by G, such that

H = a G ,   (8.73)

⟨ ‖G‖²_F ⟩ = nr nt .   (8.74)
scattering field as seen by either the transmit or the receive array from the other
array subtends a limited field of view. As a consequence, the random channel
has spatial correlation. For the described situation, the model for the channel
[110, 30],
H ∝ Mr G M†t , (8.76)
can be employed, so that
vec{H} ∝ (M∗t ⊗ Mr ) vec{G } . (8.77)
Consequently, this model is sometimes denoted the Kronecker channel. The ma-
trices Mr and Mt introduce spatial correlation associated with the receiver and
transmitter respectively.
The spatially coloring matrices Mr and Mt can be decomposed by using a
singular-value decomposition such that
Mr = Ur Dr Vr† (8.78)
and
Mt = Ut Dt Vt† . (8.79)
The spatially correlated channel can then be represented by
H = a Ur Dr Vr† G Vt Dt U†t
  = a Ur Dr G′ Dt U†t ,   (8.80)

where G and G′ are matrices with elements drawn independently from a com-
plex circular unit-variance Gaussian distribution. The matrices G and G′ are
related by a unitary transformation. The two matrices are statistically equiva-
lent because unitary transformation of a complex circular unit-variance Gaussian
matrix with independent elements produces another complex circular unit-variance
Gaussian matrix with independent elements.
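A sketch of generating Kronecker-model channels per Equation (8.76); the exponential-correlation construction of Mr and Mt is an assumption made for illustration, not the book's model:

```python
import numpy as np

rng = np.random.default_rng(11)
nr = nt = 4
a = 1.0

def corr_sqrt(n, rho):
    """Hermitian square root of an exponential-correlation matrix (assumed example)."""
    R = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    w, U = np.linalg.eigh(R)
    return U @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ U.conj().T

Mr, Mt = corr_sqrt(nr, 0.9), corr_sqrt(nt, 0.7)

def draw_H():
    G = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    return a * Mr @ G @ Mt.conj().T          # Kronecker model, Equation (8.76)

# The receive-side correlation of the ensemble approaches tr{Mt^H Mt} Mr Mr^H.
S = sum(H @ H.conj().T for H in (draw_H() for _ in range(2000))) / 2000
expected = np.trace(Mt.conj().T @ Mt).real * (Mr @ Mr.conj().T)
assert np.linalg.norm(S - expected) / np.linalg.norm(expected) < 0.15
```

The check uses the identity E[G A G†] = tr{A} I for unit-variance independent entries, which separates the receive- and transmit-side correlations exactly as the Kronecker structure requires.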
Reduced-rank channels
When one is simulating channels, random unitary and Gaussian matrices can be
generated for a given average attenuation a and diagonal matrices Dr and Dt .
There is a significant literature on selecting values for the average SISO attenua-
tion, a [140, 260, 188]. However, it is less clear how to determine values for
Dr and Dt. One model is to assume that the diagonal values are given by some
specified number of equal-valued elements and zero otherwise, of the form
Dr = [ I_{mr} 0 ; 0 0 ] ,   (8.81)
where mr sets the rank of Dr . The form of Dt is given by replacing mr with
mt . In the channel model given in Equation (8.80), the unitary matrices Ur
and Ut are full rank by construction. The Gaussian matrix G can, in principle,
have any rank; however, the size of the set of reduced-rank Gaussian matrices
is vanishingly small compared to the size of the set of full-rank matrices (that
is, the matrix G is full rank with probability one). Thus, the set of reduced-
rank Gaussian matrices forms a set of zero measure and can be ignored for any
practical discussion.
Because the rank of a matrix produced by the product of matrices can be
no more than the smallest rank of the constituent matrices, this form would
produce a channel matrix with a rank limited by the smaller of mr and mt . For
the rank to be reduced, the unitary matrices UG and VG in the singular value
decomposition of the Gaussian matrix G = UG DG VG† would have to transform
(which is a rotation in some sense) the subspace of Dr and Dt such that there is
no overlap on one dimension. Given the random nature of the matrix G, this is
extremely unlikely. Consequently, from any practical point of view, the rank of
the channel is given by min(mr , mt ). Under this model, the expected Frobenius
norm squared of the channel matrix is given by
⟨ ‖H‖²_F ⟩ = a² ⟨ tr{ Ur Dr G Dt U†t Ut D†t G† D†r U†r } ⟩
 = a² ⟨ tr{ Dr G Dt (Dr G Dt)† } ⟩
 = a² Σ_{jr=1}^{mr} Σ_{jt=1}^{mt} ⟨ |{G}_{jr,jt}|² ⟩
 = a² mr mt ,   (8.82)
H = a Ur Δ_{αr} G Δ_{αt} U†t ,   (8.83)

Δ_α = √n diag{ α⁰, α¹, . . . , α^{n−1} } / √( tr{ diag{α⁰, α¹, . . . , α^{n−1}}² } ) ,   (8.84)
where the shaping parameter α and the number of antennas n can be either
αr or αt and nr or nt, respectively. For many environments of interest, the
scattering environments at the transmitter and at the receiver are similar. Assuming
that the numbers of transmit and receive antennas are equal and that the two ends
have similar spatial correlation characteristics, the diagonal shaping matrices
can be set equal, Δ_α = Δ_{αL} = Δ_{αR}, producing the new random channel
matrix H.
The form of shaping matrix Δα given here is arbitrary, but has the satisfying
characteristics that in the limit of α → 0, only one singular value remains large,
and in the limit of α → 1, a spatially uncorrelated Gaussian matrix is produced.
Furthermore, empirically this model provides good fits to experimental distri-
butions [30]. The normalization for Δ_α is chosen so that the expected value of
‖H‖²_F is a² nt nr.
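The shaping matrix of Equation (8.84) and its limiting behavior can be checked directly:

```python
import numpy as np

def delta_alpha(alpha, n):
    """Diagonal shaping matrix of Equation (8.84), normalized so tr{Δ²} = n."""
    d = alpha ** np.arange(n, dtype=float)
    return np.sqrt(n) * np.diag(d) / np.sqrt(np.sum(d ** 2))

n = 4
for alpha in (0.05, 0.5, 1.0):
    D = delta_alpha(alpha, n)
    assert abs(np.trace(D @ D) - n) < 1e-9     # normalization of Equation (8.84)

# alpha -> 1 recovers the identity (spatially uncorrelated channel), while
# small alpha leaves a single dominant diagonal entry (a nearly rank-one channel).
assert np.allclose(delta_alpha(1.0, n), np.eye(n))
D0 = delta_alpha(0.05, n)
assert D0[0, 0] > 10 * D0[1, 1]
```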
H = a ( √( K/(K+1) ) √(nt nr) ( v w† / (‖v‖ ‖w‖) ) + √( 1/(K+1) ) G ) ,   (8.85)
where the observation that G has zero mean is used to remove the cross terms.
H = aG. (8.87)
where the constant associated with the “delta function” or atom at 0 is given by
cκ = max( 0, 1 − 1/κ ) .   (8.89)
8.7 Large channel matrix capacity 271
Figure 8.5 Eigenvalue probability density function for the complex Gaussian channel
((1/nt)GG†), assuming an equal number of transmitters and receivers (κ = 1) in the
infinite dimension limit. © 2002 IEEE. Reprinted, with permission, from Reference [33].
Figure 8.6 Peak-normalized eigenvalue spectrum for the complex Gaussian channel
((1/nt)GG†), assuming an equal number of transmitters and receivers (κ = 1) in the
infinite dimension limit. © 2002 IEEE. Reprinted, with permission, from Reference [33].
where λm {·} indicates the mth eigenvalue, and the continuous form is asymp-
totically exact. This integral is discussed in Reference [259]. The normalized
asymptotic capacity as a function of a²Po and κ, cUT/nr ≈ Φ(a²Po; κ), is given
by
Φ(x; κ) = ν log2( w₊ x / ν ) + ν ( (1 − ρ)/ρ ) log2( 1 / (1 − w₋) ) − ν w₋ / ( ρ log(2) ) ,

w± = 1/2 + ρ/2 + ν/(2x) ± (1/2) √( (1 + ρ + ν/x)² − 4ρ ) ,

ρ = min( κ, 1/κ ) ,   ν = 1/max(1, κ) .   (8.93)
In the special case of M = nt = nr, the capacity is given by

cUT / M ≈ ( a²Po / log(2) ) 3F2( [1, 1, 3/2], [2, 3], −4 a²Po )   (8.94)
 = ( √(4a²Po + 1) + 4a²Po log( √(4a²Po + 1) + 1 ) − 2a²Po (1 + log(4)) − 1 ) / ( a²Po log(4) ) ,   (8.95)
where p Fq is the generalized hypergeometric function discussed in Section 2.14.2.
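The closed form of Equation (8.95) can be checked against direct numerical integration over the κ = 1 eigenvalue density; the Marchenko–Pastur form p1(μ) = (1/2π)√((4 − μ)/μ) on [0, 4] is assumed here:

```python
import numpy as np

def mp_pdf(lam):
    """Marchenko-Pastur eigenvalue density for kappa = 1 on [0, 4] (assumed form)."""
    return np.sqrt((4 - lam) / lam) / (2 * np.pi)

def cap_closed(p):
    """Equation (8.95): c_UT/M for kappa = 1 (log is the natural logarithm)."""
    s = np.sqrt(4 * p + 1)
    return (s + 4 * p * np.log(s + 1) - 2 * p * (1 + np.log(4)) - 1) / (p * np.log(4))

def cap_numeric(p, npts=200001):
    # Trapezoidal integration of log2(1 + p*lam) against the density.
    lam = np.linspace(1e-9, 4 - 1e-9, npts)
    f = mp_pdf(lam) * np.log2(1 + p * lam)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(lam))

for p in (0.5, 1.0, 10.0):
    assert abs(cap_closed(p) - cap_numeric(p)) < 1e-3
```

At small a²Po both expressions reduce to a²Po log2(e), and at large a²Po both approach log2(a²Po) − log2(e), the asymptotes discussed in the surrounding text.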
and μcut is the minimum eigenvalue used by the transmitter; it is the solution
to the integral in Equation (8.98), given by the continuous version of Equation
(8.33). Power is allocated only to the eigenvalues μ > μcut, and the cutoff satisfies

μcut = κ ∫_{μcut}^∞ dμ pκ(μ) / ( a²Po + κ ∫_{μcut}^∞ dμ pκ(μ) (1/μ) ) .   (8.98)
The approximations are asymptotically exact in the limit of large nr .
For a finite transmit power, the capacity continues to increase as the number
of antennas increases. Each additional antenna increases the effective area of the
receive system. Eventually, this model breaks down as the number of antennas
becomes so large that any additional antenna is electromagnetically shielded by
existing antennas. However, finite random channel matrices quickly approach the
shape of the infinite model. Consequently, it is useful to consider the antenna-
number normalized capacity cI T /nr . The normalized capacity is given by
cIT / nr ≈ g log2( ( a²Po + κ ∫_{μcut}^∞ dμ pκ(μ) (1/μ) ) / ( κ g ) )
 + ∫_{μcut}^∞ dμ pκ(μ) log2(μ) .   (8.99)
(Figure 8.7: spectral efficiency (b/s/Hz/M) as a function of a²Po.)
and
∫_{μcut}^∞ dμ p_{κ=1}(μ) (1/μ)
 = −1/2 + (1/π) √( (4 − μcut)/μcut ) + (1/π) arcsin( √μcut / 2 ) .   (8.101)
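Equation (8.101) can be verified numerically against the same κ = 1 density:

```python
import numpy as np

def inv_moment_closed(mu_cut):
    """Equation (8.101): integral of p_1(mu)/mu from mu_cut to the band edge."""
    return (-0.5 + np.sqrt((4 - mu_cut) / mu_cut) / np.pi
            + np.arcsin(np.sqrt(mu_cut) / 2) / np.pi)

def inv_moment_numeric(mu_cut, npts=400001):
    # Trapezoidal integration of the kappa = 1 density divided by mu.
    mu = np.linspace(mu_cut, 4 - 1e-12, npts)
    f = np.sqrt((4 - mu) / mu) / (2 * np.pi * mu)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(mu))

for mc in (0.5, 1.0, 2.0):
    assert abs(inv_moment_closed(mc) - inv_moment_numeric(mc)) < 1e-4
```

The closed form vanishes at μcut = 4 (no eigenvalues remain) and diverges as μcut → 0, reflecting the divergent inverse moment of the κ = 1 density.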
To calculate the capacity, the following integral must also be evaluated,

∫_{μcut}^∞ dμ p_{κ=1}(μ) log2(μ)
 = ( √μcut / ( π log[4] ) ) { 4 3F2( [1/2, 1/2, 1/2], [3/2, 3/2], μcut/4 )
 + [ √(4 − μcut) − (4/√μcut) arcsec( 2/√μcut ) ] ( 1 − log[μcut] )
 − ( 2π/√μcut ) log(μcut) } .   (8.102)
By implicitly solving for the cut-off eigenvalue μcut, the capacity as a function of
a²Po is evaluated and is displayed in Figure 8.7. The uninformed transmitter
spectral efficiency bound is plotted for comparison. For small a²Po, μcut ap-
proaches the maximum eigenvalue supported by pκ(μ). In this regime, the ratio
of the informed transmitter to the uninformed transmitter capacity cIT/cUT ap-
proaches 4. To be clear, this limiting value for the ratio occurs in the case of a
symmetric number of transmitters and receivers. Conversely, at large a²Po, the
normalized informed transmitter and uninformed transmitter spectral efficiency
bounds converge, as predicted by Equation (8.48).
8.9 SNR distributions 275
The term outage capacity is poorly named because it is not really a capacity.
However, it is a useful concept for comparing various practical systems. In par-
ticular, it is useful for comparing various space-time codes and receivers. Given
the assumption of a stochastic model of the channel drawn from a stationary dis-
tribution defined by a few parameters, the outage capacity is defined to be the
rate achieved at a given SNR with some probability [253]. Because the capacity
is dependent upon the given channel matrix, under the assumption of a stochastic
channel, the capacity becomes a stochastic variable. If the capacity for some at-
tenuated total transmit power a²Po (the average SISO SNR), channel matrix H,
and interference-plus-noise spatial covariance matrix R is given by c(a²Po, H, R)
for some random distribution of channels H, then P(c ≥ η) is the probability
that the capacity for a given channel draw is greater than or equal to a given
spectral efficiency η. When the capacity for a given channel draw is greater than
or equal to the desired rate, then the link can close, theoretically. Explicitly, the
probability that the link can close, Pclose, is given by

Pr[ c(a²Po, H, R) ≥ η ] = Pclose .   (8.103)
If the link does not close, then it is said to be in outage. Implicit in this assump-
tion is the assumption that the capacity is being evaluated for a single carrier
link in a flat-fading environment. When the link has access to alternative types
of diversity, the discussion becomes more complicated.
Typically, the probability of closing is fixed and the SNR is varied to achieve
this probability. As an example, the outage capacities under the assumptions of
90% and 99% probabilities of closure for an uncorrelated Gaussian 4 × 4 MIMO
channel link in the absence of interference are displayed as a function of a²Po
(average SNR per receive antenna) in Figure 8.8. The curves in this figure are
constructed by empirically evaluating the cumulative distribution function of
spectral efficiency for each SNR value. The values for 90% and 99% probability
of closing the link are extracted from the distributions.
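A sketch of the Monte Carlo procedure for empirical outage capacity (the uncorrelated Gaussian channel and the particular SNR are assumptions made for the example):

```python
import numpy as np

rng = np.random.default_rng(5)
nr = nt = 4
Po = 10.0            # assumed noise-normalized transmit power (10 dB)
trials = 2000

caps = np.empty(trials)
for k in range(trials):
    H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    caps[k] = np.log2(np.linalg.det(np.eye(nr) + (Po / nt) * H @ H.conj().T).real)

# Outage capacity: the rate exceeded with the stated probability of closing.
c90 = np.quantile(caps, 0.10)   # link closes with 90% probability at this rate
c99 = np.quantile(caps, 0.01)   # link closes with 99% probability at this rate
assert c99 < c90 < caps.mean()
```

Sweeping Po and repeating this quantile extraction produces curves of the kind displayed in Figure 8.8.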
In Section 8.3.4, the channel capacity in the limit of high and low SNR was con-
sidered. Here the discussion of approximations to the uninformed transmitter
capacity in the limit of low SNR is extended. While capacity is the most funda-
mental metric of performance for a wireless communication system, it is often
useful to consider the distribution of SNR, particularly at low SNR. At lower
SNR, capacity is proportional to SNR. In addition, for practical systems, SNR
is often much easier to measure directly.
Figure 8.8 Outage capacity for a 4 × 4 MIMO link under the assumption of an
uncorrelated Gaussian channel as a function of SNR per receive antenna (a²Po). Link
closure probabilities of 90% and 99% are displayed.
cUT ≈ (1/log 2) (Po/nt) tr{ P⊥_M H H† }
 = (1/log 2) (Po/nt) Σ_n ‖ P⊥_M hn ‖² ,   (8.108)
where hn is the nth column of the channel matrix H, and we have made use
of the idempotent property of projection matrices. In this limit, the spectral
efficiency bound can be expressed as the sum of beamformer outputs each having
an array-signal-to-noise ratio (ASNR) (which is the SNR at the output of a
receive beamformer),
cUT ≈ (1/log 2) (Po/nt) Σ_n h†n P⊥_M hn
 ≡ (1/log 2) (Po/nt) Σ_{n=1}^{nt} | w†n hn |² ,
 wn = P⊥_M hn / ‖ P⊥_M hn ‖ ,
 ≡ (1/log 2) Σ_{m=1}^{nt} ASNR_m
 ≡ (1/log 2) ζ ,   (8.109)
where ζ is the sum of ASNRs optimized for each transmit antenna.
By using the notation that gn is the nth column of G, the low SNR spectral effi-
ciency bound (that is, when a2 Po is small) in the presence of strong interference
is given by
cUT ≈ (1/log 2) (a²Po/nt) Σ_n ‖ P⊥_M gn ‖²
 = (1/log 2) (a²Po/nt) Σ_n g†n U U† P⊥_M U U† gn
 = (1/log 2) (a²Po/nt) Σ_n (g′n)† J_K g′n
 = (1/log 2) (a²Po/nt) Σ_{m=1}^{K·nt} | g′′m |² ,   (8.111)
where U is a unitary matrix that diagonalizes the projection matrix such that
the first K diagonal elements are one and all the other matrix elements are zero,
represented by JK , assuming K is the rank of the projection matrix,
K = nr − ni , (8.112)
where ni is the number of interferers for nr > ni. While the particular values
change, the statistics of the Gaussian vector g are not changed by an arbitrary
unitary transformation U g, so the statistics of each element of gn and g′n are
the same. Here g′′m is used to indicate a set of random scalar variables sampled
from a unit-variance complex circular Gaussian distribution. As a consequence, the
statistical distribution of the approximate spectral efficiency bound is represented
by a complex χ2 -distribution. The array-signal-to-noise ratio, denoted ASNRm ,
is the SNR at the output of the beamformer associated with the mth transmitter,
assuming that the strong interferer is spatially mitigated. In the low SNR regime,
the presence of the other transmitters does not affect the optimal beamformer
output SNR.
The probability density function of the complex χ2 -distribution from Section
3.1.11 is given by
p^C_{χ²}(x; N) = x^{N−1} e^{−x} / Γ(N) .   (8.113)
Thus, the following probability density function for the low SNR sum of ASNRs
ζ can be expressed as

p(ζ) ≈ ( nt / (a²Po) ) p^C_{χ²}( nt ζ / (a²Po) ; (nr − ni) · nt ) .   (8.114)
Similarly, the cumulative distribution function (CDF) for the complex
χ2 -distribution is given by
P^C_{χ²}(x0; N) = ∫_0^{x0} dx p^C_{χ²}(x; N)
 = 1 − γ(N, x0) / Γ(N) ,   (8.115)
where γ(N, x0 ) is the incomplete gamma function. Consequently, the CDF for
the sum of the ASNRs ζ is given by
P(ζ) = P^C_{χ²}( nt ζ / (a²Po) ; K · nt )
 = 1 − γ( [nr − ni] · nt , nt ζ / (a²Po) ) / Γ( [nr − ni] · nt ) .   (8.116)
As an example, the CDFs for 2 × 2, 3 × 3, and 4 × 4 MIMO systems, with and
without a single strong interferer, are compared in Figure 8.9. The horizontal
axis is normalized such that the average SISO total receive power, normalized by
a²Po, is 0 dB. In this flat, block-fading environment, the SISO system (not shown)
Figure 8.9 CDFs for total receive power at the output of beamformers, which is the
sum of ASNRs (SASNR or ζ), for 2 × 2, 3 × 3, and 4 × 4 MIMO systems, with (solid)
and without (dashed) a single strong interferer.
would perform badly. In the low SNR regime, the information-theoretic spectral
efficiency bound is proportional to the sum of ASNRs ζ. Thus, the probability
of outage is given by the complementary CDF (that is, 1 − CDF). For example,
the 99% reliability (or outage capacity) is associated with the sum of ASNRs ζ
at the probability of 0.01. Because of the spatial diversity and because of the
receive-array gain, performance improves as the number of antennas increases.
In the presence of a strong interferer, the 4 × 4 MIMO system receives more than
13 dB more power than the 2 × 2 system if 99% reliability is required. At this
reliability, the 3 × 3 MIMO system has only lost 3 dB compared to the average
SISO channel. At this reliability, a SISO system would suffer significant losses
compared to the MIMO systems and would have infinite loss in the presence of
a strong interferer.
Figure 8.10 CDFs for fractional power loss caused by spatial mitigation of a single
strong interferer for 2 × 2, 3 × 3, and 4 × 4 MIMO systems.
where \tilde{g}_m is a Gaussian random variable with the same statistics as g_m, is given by
p_\beta(x; j, k) = \frac{\Gamma(j + k)}{\Gamma(j)\, \Gamma(k)} \, x^{j-1} (1 - x)^{k-1} (8.119)
and the corresponding CDF is given by
P_\beta(x_0; j, k) = \int_0^{x_0} dx \, p_\beta(x; j, k)
= \frac{\Gamma(j + k)}{\Gamma(j)\, \Gamma(k)} \, B(x_0; j, k) , (8.120)
where B(x; j, k) is the incomplete beta function. Consequently, the CDF P (η) of
fractional loss η due to mitigating an interferer is given by
P(\eta) \approx P_\beta(\eta;\; K \cdot n_t,\; [n_r - K]\, n_t)
\approx P_\beta(\eta;\; [n_r - n_i] \cdot n_t,\; n_i \cdot n_t) . (8.121)
As an example, the comparison of the total ASNR loss CDFs for 2 × 2, 3 × 3,
and 4 × 4 MIMO systems with a single strong interferer is shown in Figure 8.10.
At a 99% reliability (or outage capacity) the sum of ASNRs ζ loss is no worse
than −3.3 dB for a 4 × 4 MIMO system, but is worse than −12 dB for a 2 × 2
system.
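The quoted outage losses can be checked with a short Monte Carlo sketch of the beta model in Equation (8.121) (assumed single interferer, n_i = 1; Beta samples built from gamma variates):

```python
import math
import random

def fractional_loss_quantile(nr, nt, ni=1, p=0.01, trials=200_000, seed=1):
    """Monte Carlo p-quantile of the fractional power loss eta, which
    Equation (8.121) models as Beta([nr - ni] * nt, ni * nt). Beta samples
    are built from gamma variates: B = G1 / (G1 + G2)."""
    rng = random.Random(seed)
    j, k = (nr - ni) * nt, ni * nt
    draws = []
    for _ in range(trials):
        g1 = rng.gammavariate(j, 1.0)
        g2 = rng.gammavariate(k, 1.0)
        draws.append(g1 / (g1 + g2))
    draws.sort()
    return draws[int(p * trials)]

q22 = fractional_loss_quantile(2, 2)
q44 = fractional_loss_quantile(4, 4)
print(10 * math.log10(q22))  # near -12 dB for the 2x2 system
print(10 * math.log10(q44))  # near -3.3 dB for the 4x4 system
```

The sampled 1% quantiles reproduce the values read from Figure 8.10.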
The advantage of multiple transmitters is illustrated in Figure 8.11. Given
four receive antennas and the same total transmit power, there is a significant
difference in the performance of a 1 × 4 system versus a 4 × 4 system. At the
99% reliability level, the sum-of-ASNRs ζ losses are −6.7 dB versus −3.3 dB, respectively.
8.10 Channel estimation
Figure 8.11 CDFs for fractional power loss caused by spatial mitigation of a single
strong interferer for 1, 2, or 4 transmit antennas assuming 4 receiver antennas.
Although joint channel estimation and decoding is possible, for most decoding
approaches and for any informed transmitter approach, an estimate of the chan-
nel is required. While, given some model for the environment, training or channel
probing sequences [139] can be designed to improve performance [23], here it is
assumed that the sequences associated with each transmitter are independent.
The flat-fading MIMO model, assuming n_t transmit antennas, n_r receive antennas, and n_s complex baseband samples, is given by

Z = H\, S + N
= H \sqrt{\frac{P_o}{n_t}}\, X + N
= A\, X + N , (8.122)
and the normalized reference signal (also known as a training or pilot sequence) X ∈ C^{n_t × n_s} is given by

X = \frac{S}{\sqrt{P_o / n_t}} (8.124)

and is normalized so that

\langle X\, X^\dagger \rangle = n_s\, I . (8.125)
The “thermal” noise at each receive antenna can be characterized by the variance.
Here it is assumed that the units of power are defined so that the “thermal” noise
variance is one. It is also assumed that the channel is temporally static.
Here it is worth noting that, in the literature, the normalization of the chan-
nel is not always clear. Because the transmit power can be absorbed within the
amplitude-channel product matrix estimate A or within the transmit signal S, its
definition is ambiguous. Within the context of the discussion of theoretical capac-
ity it is often convenient to explicitly express the power, so that the transmitted
signal contains the square root of the power; however, in channel estimation, it
is often assumed that the reference signal has some arbitrary normalization, and
the transmit power is subsumed into the channel estimate. While this ambiguity
can cause confusion, it is typically reasonably clear from context. Nonetheless,
within the context of this section, we will endeavor to be slightly more precise
to avoid any confusion.
Under the assumption of Gaussian noise and external interference, the probability density of the n_s received samples, given the reference signal, is

p(Z | X; A, R) = \frac{ e^{-\mathrm{tr}\{ (Z - A X)^\dagger R^{-1} (Z - A X) \}} }{ \pi^{n_r n_s} |R|^{n_s} } , (8.126)

where the interference-plus-noise spatial covariance matrix is given by

R = \frac{\langle N\, N^\dagger \rangle}{n_s} . (8.127)
Maximizing with respect to an arbitrary parameter α of A gives the following estimator:

\frac{\partial p(Z | X; A, R)}{\partial \alpha} = 0
\;\Rightarrow\; (Z - \hat{A}\, X)\, X^\dagger = 0
\;\Rightarrow\; \hat{A} = Z\, X^\dagger (X\, X^\dagger)^{-1} . (8.128)
Here for notational convenience, the couplet {m, n} indicates an index into a
vector of size m · n. Similarly, {m, n}, {j, k} is used to specify an element of a
matrix at row {m, n} and column {j, k}. This is done to avoid using the vector
operation defined in Equation (2.35). For some known reference sequence X, the
received signal mean Y is given by
Y = \langle Z \rangle = A\, X + \langle N \rangle
= A\, X , (8.130)
\frac{\partial R}{\partial (A)_{m,n}} = 0 , (8.131)

and the derivative of one conjugation with respect to the other is zero,

\frac{\partial A}{\partial (A)^*_{m,n}} = 0 , (8.132)
the only contributing factor to the Fisher information from Section 3.8 for a Gaussian model is given by

\{J\}_{\{m,n\},\{j,k\}} = -\frac{\partial^2}{\partial (A)^*_{m,n}\, \partial (A)_{j,k}} \log p(Z | A, X)
= \frac{\partial^2}{\partial (A)^*_{m,n}\, \partial (A)_{j,k}} \mathrm{tr}\{ (Z - A X)^\dagger R^{-1} (Z - A X) \}
= \mathrm{tr}\left\{ \frac{\partial (A X)^\dagger}{\partial (A)^*_{m,n}} \, R^{-1} \, \frac{\partial (A X)}{\partial (A)_{j,k}} \right\}
= \mathrm{tr}\left\{ X^\dagger \frac{\partial (A^*)^T}{\partial (A)^*_{m,n}} \, R^{-1} \, \frac{\partial A}{\partial (A)_{j,k}} \, X \right\} , (8.133)
where Wirtinger derivatives are being used. The derivatives of the channel are given by

\frac{\partial A}{\partial (A)_{j,k}} = e_j\, e_k^T
\qquad
\frac{\partial A^\dagger}{\partial (A)^*_{m,n}} = e_n\, e_m^T , (8.134)
where the vector e_m indicates a vector of zeros with a one in the mth row,

e_m = (0, \ldots, 0, 1, 0, \ldots, 0)^T . (8.135)
Interference free
For the sake of discussion, first consider the interference-plus-noise covariance in
the absence of interference, given by
R = I_{n_r} , (8.136)
where power is normalized so that the noise per channel is unity. The information
matrix is then given by
\{J\}_{\{m,n\},\{j,k\}} = \mathrm{tr}\left\{ X^\dagger \frac{\partial (A^*)^T}{\partial (A)^*_{m,n}} \, R^{-1} \, \frac{\partial A}{\partial (A)_{j,k}} \, X \right\}
= \mathrm{tr}\{ (e_m\, e_n^\dagger\, X)^\dagger\, I^{-1}\, e_j\, e_k^\dagger\, X \}
= \mathrm{tr}\{ (e_m\, e_n^\dagger\, X)^\dagger\, e_j\, e_k^\dagger\, X \}
= x_k\, x_n^\dagger\, \delta_{m,j} , (8.137)
where x_m indicates a row vector containing the mth row of the reference sequence X, and δ_{m,j} is the Kronecker delta function. For sufficiently long sequences x_m,

x_k\, x_n^\dagger \approx n_s\, \delta_{k,n} . (8.138)
The information matrix is thus diagonal with equal values along the diagonal.
For the interference-free channel estimate, the variance of the estimate is given
by
\mathrm{var}\{ (\hat{A})_{m,n} \} = \left[ J^{-1} \right]_{\{m,n\},\{m,n\}}
\approx \frac{1}{n_s} . (8.140)
It is sometimes useful to consider the estimation variance normalized by the
mean variance of the channel because it is the fractional error that typically
\frac{\mathrm{var}\{ (\hat{A})_{m,n} \}}{\langle \|A\|_F^2 \rangle / (n_r n_t)} = \frac{\mathrm{var}\{ (\hat{A})_{m,n} \}}{a^2 \frac{P_o}{n_t}}
= \frac{\mathrm{var}\{ (\hat{H})_{m,n} \}}{a^2}
\approx \frac{n_t}{n_s\, a^2 P_o}
= \frac{n_t}{n_s} \frac{1}{\mathrm{SNR}} , (8.142)
where SNR indicates the mean SISO signal-to-noise ratio.
In interference
From Equation (8.133) above, the Fisher information matrix is given by

\{J\}_{\{m,n\},\{j,k\}} = \mathrm{tr}\left\{ X^\dagger \frac{\partial (A^*)^T}{\partial (A)^*_{m,n}} \, R^{-1} \, \frac{\partial A}{\partial (A)_{j,k}} \, X \right\}
= \mathrm{tr}\left\{ x_n^\dagger\, e_m^\dagger\, R^{-1}\, e_j\, x_k \right\} . (8.143)
X\, X^\dagger \approx n_s\, I_{n_t} , (8.146)
It would seem that the SNR of a signal would be an easy parameter to define.
However, its definition is somewhat problematic for block-fading MIMO chan-
nels. As was suggested in Section 8.10, the channel estimate, average attenuation,
and transmit power are coupled.
In attempting to analyze space-time codes, which are discussed in Chapter 11, in an absolute context, there are a number of technical issues. The first is that
theoretical analyses of space-time codes often do not lend themselves to experi-
mental interpretation. A primary concern is the definition of SNR. In addition,
the performance of space-time codes is typically dependent upon channel delay
properties. Delay spread in the channel translates to spectral diversity that can
be exploited by a communication system.
8.11 Estimated versus average SNR
H = a\, F , (8.152)

\mathrm{SNR}_{\mathrm{ave}} = \langle \mathrm{SNR} \rangle
= \frac{ \langle \| H\, S \|^2 \rangle }{ \langle \| N \|^2 \rangle }
= \frac{ \langle \mathrm{tr}\{ H\, S\, S^\dagger H^\dagger \} \rangle }{ \langle \mathrm{tr}\{ N^\dagger N \} \rangle }
= \frac{ \langle \mathrm{tr}\{ H^\dagger H\, \langle S\, S^\dagger \rangle \} \rangle }{ n_s\, n_r }
= \frac{ \langle \mathrm{tr}\{ H^\dagger H\, \frac{P_o}{n_t} I \} \rangle }{ n_r }
= \frac{ \langle \| F \|_F^2 \rangle\, a^2 \frac{P_o}{n_t} }{ n_r }
= \frac{ a^2 \frac{P_o}{n_t}\, n_r\, n_t }{ n_r }
= a^2 P_o , (8.153)
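The chain of equalities in Equation (8.153) can be spot-checked by simulation (assumed values a² = 1 and P_o = 4, with unit thermal-noise variance):

```python
import numpy as np

rng = np.random.default_rng(1)
nt, nr, ns, trials = 2, 2, 500, 400
a2, Po = 1.0, 4.0  # assumed attenuation a^2 and total transmit power

num = den = 0.0
for _ in range(trials):
    H = np.sqrt(a2 / 2) * (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt)))
    S = np.sqrt(Po / (2 * nt)) * (rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns)))
    N = np.sqrt(0.5) * (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns)))
    num += np.linalg.norm(H @ S) ** 2
    den += np.linalg.norm(N) ** 2

snr_ave_mc = num / den
print(snr_ave_mc)  # approaches a^2 * Po = 4, per Equation (8.153)
```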
variation become coupled. Imagine the extreme case for which a single carrier
system is employed with a static channel. The code is not exercised over an
ensemble of channel matrices, and the estimated SNR is biased by the single
particular channel draw.
As described in Section 8.10, for some n_s samples, the single-carrier received signal Z ∈ C^{n_r × n_s} can be expressed as

Z = A\, X + N , (8.154)

where A = \sqrt{P_o / n_t}\; H denotes the amplitude-channel product, X ∈ C^{n_t × n_s} is the normalized transmitted signal, and N ∈ C^{n_r × n_s} is the additive noise. Under the assumption of a known transmit training sequence X, the channel can be estimated. In estimating the channel, an amplitude-normalized version X of the reference is typically employed such that

\| X \|_F^2 = n_t\, n_s . (8.155)
 = Z X† (X X† )−1
= A X X† (X X† )−1 + N X† (X X† )−1
= A + N X† (X X† )−1 , (8.156)
The estimated SNR per receive antenna for a given channel realization and estimation is given by

\widehat{\mathrm{SNR}} = \frac{ \| \hat{A} \|_F^2 }{ n_r }
= \frac{ \| Z\, X^\dagger (X\, X^\dagger)^{-1} \|_F^2 }{ n_r } . (8.158)
In the limit of large integrated SNR for the channel estimate, the estimation
error approaches zero and
 → A . (8.159)
In this limit, a simple relationship between the estimated and average SNR can be found,

\widehat{\mathrm{SNR}} = \frac{ \| A \|_F^2 }{ n_r }
= \frac{ \| H \|_F^2\, \frac{P_o}{n_t} }{ n_r }
= \frac{ \| F \|_F^2 }{ n_t\, n_r }\, \mathrm{SNR}_{\mathrm{ave}} . (8.160)
nt nr
To be clear, even though the channel is estimated perfectly, there is a bias in the
SNR estimate.
This discussion becomes somewhat complicated in the context of frequency-selective fading. Specifically, if we assume that orthogonal-frequency-division multiplexing (OFDM) modulation is employed with a large number of carriers, then an average SNR may be estimated. Under the assumptions of a
constant average attenuation across frequency, and of significant resolvable delay
spread, which indicates significant frequency-selective fading, the average SNR
can be found by averaging across the SNR estimates (remembering to perform
this on a linear scale). In this regime, the average estimated SNR converges to
the average SNR,

\langle \widehat{\mathrm{SNR}} \rangle \to \mathrm{SNR}_{\mathrm{ave}} . (8.161)
However, if there is not significant delay spread, then there will not be a large
set of significantly different channel matrices to average over. Consequently, the
average SNR and the average estimated SNR will not be the same.
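A short sketch of this coupling, using the well-estimated limit of Equation (8.160) with an assumed unit-variance Rayleigh F and SNR_ave = 1: single-draw estimates scatter widely, while their average over many channel draws recovers the average SNR:

```python
import numpy as np

rng = np.random.default_rng(2)
nt, nr = 2, 2
snr_ave = 1.0  # assumed average SNR, a^2 * Po

# Per-draw estimated SNR in the well-estimated limit, Equation (8.160):
# snr_hat = snr_ave * ||F||_F^2 / (nt * nr)
F = (rng.standard_normal((5000, nr, nt)) + 1j * rng.standard_normal((5000, nr, nt))) / np.sqrt(2)
draws = snr_ave * np.sum(np.abs(F) ** 2, axis=(1, 2)) / (nt * nr)

print(float(np.std(draws)))   # single-draw estimates scatter widely ...
print(float(np.mean(draws)))  # ... but their average recovers snr_ave = 1
```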
c_e = c_{UT}
= \left\langle \log_2 \left| I_{n_r} + \frac{P_o}{n_t}\, H\, H^\dagger \right| \right\rangle
= \left\langle \log_2 \left| I_{n_r} + \frac{a^2 P_o}{n_t}\, F\, F^\dagger \right| \right\rangle , (8.163)
where the only remaining random variable is the channel distribution.
From this expression, it can be observed that the standard formulation of the
ergodic capacity is given in the context of the average SNR, which is a parameter
that may or may not be able to be estimated in an experimental context. We
can reformulate the relationship between the estimated and the average SNR for
the asymptotic estimated channel (that is, the well-estimated channel) found in
Equation (8.160) to replace the argument in the capacity,
\frac{a^2 P_o}{n_t} = \widehat{\mathrm{SNR}}\, \frac{n_r}{\| F \|_F^2} . (8.164)
Consequently, the ergodic capacity as a function of estimated SNR is given by

c_{e, \widehat{\mathrm{SNR}}} = \left\langle \log_2 \left| I + \widehat{\mathrm{SNR}}\, \frac{n_r}{\| F \|_F^2}\, F\, F^\dagger \right| \right\rangle , (8.165)
Figure 8.12 Performance bounds for a 2 × 2 MIMO system under the assumptions:
“99Out Est SNR” – the outage capacity with a 99% probability of closing the link
under the assumption that the SNR per receive channel is estimated from the
received signal; “<Cap> Est SNR” – the average or ergodic capacity under the
assumption that the SNR is estimated from the received signal; “99Out Ave SNR” –
the outage capacity with a 99% probability of closing the link under the assumption
that the SNR is the average SNR given by a distribution of channels; “<Cap> Ave
SNR” – the average or ergodic capacity under the assumption that the SNR is the
average SNR given by a distribution of channels.
the coding should approach the ergodic capacity and the average estimated SNR
should approach the average SNR.
8.12.1 Reciprocity
For two radios, the reciprocity approach takes advantage of the physical property
that, after time reversal is taken into account, the channel is in principle the same
There are, however, a few technical caveats to this expectation. Radios often use different frequencies for different link directions. In general, the channels in multipath environments decorrelate quickly as a function of frequency.
If frequency-division multiple access (FDMA) is employed, then reciprocity may
not provide an accurate estimate of the reverse link.
Another potential practical issue is that the signal path within the radio is not exactly the same for transmit and receive. From a complex baseband processing point of view, these hardware chains are part of the channel. In principle, these effects can be mitigated by calibration. However, it is a
new requirement placed upon the hardware that is not typically considered.
The most significant potential concern is that the reciprocity approach only
captures the interference structure at the local radio and does not capture the
interference structure at the “other” radio. The optimal informed transmitter
strategy is to use the interference-plus-noise whitened channel estimate. As an
example, in a flat-fading environment the channel from radio 1 to 2 is given
by H1→2 , the interference-plus-noise covariance matrix at radio 2 is denoted
R_2, and the whitened channel matrix used by the optimal strategy is given by R_2^{-1/2} H_{1→2}. While a reasonable estimate for the channel can be made by
reciprocity, the interference structure R2 cannot be estimated by a reciprocity
approach. Consequently, for many informed transmitter applications, reciprocity
is not a valid option.
Problems
8.1 Develop the result in Equation (8.48) for the case in which nt > nr .
8.3 Evaluate the capacity in Equation (8.52) if the interference, noise, and
channel are all not frequency selective.
8.5 Reevaluate the relations in Equation (8.48) under the assumption that the interference of rank n_i is much larger than the signal SNR (INR ≫ SNR).
8.6 Evaluate the informed to uninformed capacity ratio cI T /cU T in the limit
of an infinite number of transmit and receive antennas (as discussed in Section
8.7) and in the limit of low SNR as a function of the ratio of receive to transmit
antennas κ.
8.8 Consider the outage capacities (at 90% probability of closing) presented in
Equation (8.103) for a 4×4 flat fading channel matrix characterized by the expo-
nential shaping model introduced in Equation (8.83). Assume that the transmit
and receive shaping parameters have the same value α. As a function of per re-
ceive antenna SNR (−10 dB to 20 dB), numerically evaluate and plot the ratio of
294 MIMO channel
the outage capacity (at 90% probability of closing) for the values of exponential
shaping parameter α:
(a) α = 0.25
(b) α = 0.5
(c) α = 0.75
to the outage capacity assuming that α = 1.
9 Spatially adaptive receivers
Figure 9.1 Basic communication chain. Depending upon the implementation, the
coding and modulation may be a single block. Similarly, the receiver and decoding
may be a single block.
In some sense, the “right” answer is to not separate coding and modulation
or even channel estimation. The optimal receiver would instead make a direct
estimate of the transmitted information based upon the observed signal and a
model of the channel. In reality, the definition of optimal is dependent upon the
problem definition (for example, incorporating the cost of computations). Ignor-
ing external constraints, the optimal receiver in terms of receiver performance is
the maximum a posteriori (MAP) solution; that is, it maximizes the posterior probability.
To begin, we consider the maximum a posteriori solution for a known channel.
Consider an information symbol represented by a single element selected from a set α ∈ {α_1, α_2, ..., α_M} (which may take many bits or a sequence of channel uses to represent). Given this notation, the maximum a posteriori solution is
formally given by

\hat{\alpha} = \mathrm{argmax}_\alpha\; p_\alpha(\alpha | Z)
= \mathrm{argmax}_\alpha\; p(Z | \alpha)\, \frac{p_\alpha(\alpha)}{p(Z)}
= \mathrm{argmax}_\alpha\; p(Z | \alpha)\, p_\alpha(\alpha) , (9.3)
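As a minimal illustration of Equation (9.3) (an assumed scalar real channel, a BPSK alphabet with unequal priors, and unit-variance Gaussian noise):

```python
import math

# Assumed toy setup: scalar real channel, BPSK alphabet with unequal priors,
# unit-variance Gaussian noise
symbols = (+1.0, -1.0)
priors = {+1.0: 0.7, -1.0: 0.3}

def map_detect(z):
    """argmax_alpha p(z | alpha) p(alpha), Equation (9.3), with p(z | alpha)
    a unit-variance Gaussian likelihood centered on the candidate symbol."""
    return max(symbols, key=lambda a: math.exp(-0.5 * (z - a) ** 2) * priors[a])

print(map_detect(0.0))   # 1.0: the prior breaks the tie toward +1
print(map_detect(-1.5))  # -1.0: the likelihood dominates far from zero
```

Dropping the prior factor reduces this detector to the maximum-likelihood rule.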
There is a strong connection between approaches used for spectral filtering and
those used for adaptive spatial processing. As an introduction, the Wiener spec-
tral filter [346, 142, 256, 273] is considered here for a single-input single-output
(SISO) link. Specifically, we will consider a sampled Wiener filter applied to
spectral compensation, which is the minimum-mean-squared error (MMSE) rake
receiver [254]. The name rake receiver comes from the appearance of the original physical implementation, in which the various mechanical taps off a transmission line, used to introduce contributions at various delays, looked like the teeth of a garden rake. This filter attempts to compensate for the effects of a frequency-selective channel.
The effect of channel delay is to introduce intersymbol interference, which
is used to describe delay spread in the channel introducing copies of previous
symbols at the current sample. Given some finite bandwidth signal, the channel
can be accurately represented with the sampled channel if the bandwidth B
satisfies Ts < 1/B. Note that the standard Nyquist factor of two for real signals
is absent because these are complex samples and, consequently, we are taking B
to span both the positive and negative frequencies at baseband.
To be clear, there are a number of issues related to the use of discretely sampled
channels. In particular, scatterers placed off the sample points in time can require
large numbers of taps to accurately represent the channel effect. These effects
are discussed in greater detail in Sections 4.5 and 10.1.
\hat{s}(t) = \sum_m w_m^*\; z(t - m\, T_s) , (9.7)
where ŝ(t) is the estimate of the transmitted signal, and T_s is the sample period, for which it is assumed that the sampling is sufficient to satisfy the Nyquist criterion. Using the definitions
\{w\}_m = w_m
\{Q\}_{m,n} = \langle z(t - m\, T_s)\, z^*(t - n\, T_s) \rangle
\{v\}_m = \langle z(t - m\, T_s)\, s^*(t) \rangle , (9.10)
the derivative of the mean-squared error or average error power¹ can be written as

\frac{\partial}{\partial \alpha} \langle |\epsilon(t)|^2 \rangle = \frac{\partial}{\partial \alpha} \left[ w^\dagger Q\, w - w^\dagger v - v^\dagger w + \langle s(t)\, s^*(t) \rangle \right]
= \frac{\partial w^\dagger}{\partial \alpha} \left[ Q\, w - v \right] + \mathrm{h.c.} , (9.11)
where h.c. indicates the Hermitian conjugate of the first term. This equation is
solved by setting the non-varying term to zero, so that the filter vector w is given
1 Strictly speaking, the output of the beamformer should be parameterized in terms of
energy per symbol, but it is common to refer to this parameterization in terms of power.
Because the duration of a symbol is known, the translation between energy per symbol
and power is a known constant.
by

Q\, w = v
\;\Rightarrow\; w = Q^{-1}\, v ; \quad Q > 0 . (9.12)
This result is known as the Wiener–Hopf equation [346, 256]. The result can be
formulated in more general terms, but this approach is relatively intuitive. Thus,
the MMSE estimate of the transmitted signal ŝ(t) is given by

\hat{s}(t) = \sum_m w_m^*\; z(t - m\, T_s) .
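A sketch of the sampled Wiener filter for an assumed two-tap channel, building Q and v from sample averages per Equation (9.10) and solving the Wiener–Hopf system of Equation (9.12):

```python
import numpy as np

rng = np.random.default_rng(3)
ns, taps = 20000, 3

# Assumed two-tap frequency-selective SISO channel with additive noise
s = rng.standard_normal(ns)               # transmitted symbols (real for simplicity)
h = np.array([1.0, 0.5])                  # channel impulse response
z = np.convolve(s, h)[:ns] + 0.3 * rng.standard_normal(ns)

# Sample estimates of Q = <z(t - m Ts) z*(t - n Ts)> and v = <z(t - m Ts) s*(t)>,
# Equation (9.10); row m of Z holds z(t - m Ts)
Z = np.stack([np.roll(z, m) for m in range(taps)])
Z[:, :taps] = 0.0                         # zero out wrap-around samples
Q = Z @ Z.T / ns
v = Z @ s / ns

# Wiener-Hopf solution, Equation (9.12): w = Q^-1 v
w = np.linalg.solve(Q, v)
s_hat = Z.T @ w
mse = float(np.mean((s_hat[taps:] - s[taps:]) ** 2))
print(mse)  # well below the uncompensated error power
```

With complex signals, the conjugation in Equation (9.7) would appear as a Hermitian transpose in the Q and v sample averages.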
By using the same notation as that found in Equation (8.122), the n_t-transmitter by n_r-receiver sampled flat-fading MIMO channel model can be given in either of two forms, depending upon whether the power parameter is absorbed into the transmitted signal or into the channel matrix. The received data matrix Z is given by either

Z = H\, S + N
= H \sqrt{\frac{P_o}{n_t}}\, X + N ; \qquad S = \sqrt{\frac{P_o}{n_t}}\, X ,

or

Z = A\, X + N ; \qquad A = \sqrt{\frac{P_o}{n_t}}\, H , (9.14)

where the received signal is indicated by Z ∈ C^{n_r × n_s}, the channel matrix is indicated by H ∈ C^{n_r × n_t}, the transmitted signal is indicated by S ∈ C^{n_t × n_s}, the noise plus interference is indicated by N ∈ C^{n_r × n_s}, the amplitude-channel product is indicated by A ∈ C^{n_r × n_t}, the normalized transmitted signal is indicated by X ∈ C^{n_t × n_s}, and the total thermal-noise-normalized power is indicated by P_o. It may be overly pedantic to differentiate between these two forms (X versus S) because it is typically clear from context. Nonetheless, for clarity in this chapter, we will maintain this notation.
By employing a linear operator, denoted the beamforming matrix W ∈ C^{n_r × n_t}, an estimate of the normalized transmitted signals \hat{X} is given by

\hat{X} = W^\dagger Z , (9.15)
where the columns of the beamforming matrix W contain a beamformer associ-
ated with a particular transmitter. The complex coefficients within each column
are conjugated and multiplied by data sequences from each receive antenna data
stream. These modified data streams are then summed and are used to attempt
to reconstruct the signal associated with a given transmitter.
Z = A\, X + N
= \sum_m a_m\, x_m + N . (9.16)
By ignoring the signal from other transmitters (the internal interference) and the external interference plus noise, the power at the output of the beamformer associated with a particular transmit antenna, Q_m, is given by

Q_m = \frac{1}{n_s} \left\langle \| w_m^\dagger\, a_m\, x_m \|^2 \right\rangle
= \frac{1}{n_s}\, \mathrm{tr}\{ w_m^\dagger\, a_m\, \langle x_m\, x_m^\dagger \rangle\, a_m^\dagger\, w_m \}
= \mathrm{tr}\{ w_m^\dagger\, a_m\, a_m^\dagger\, w_m \} , (9.17)
where the expectation is over the transmitted signal, and the average power of
the signal transmitted by the mth antenna is normalized to be one. Because the
unconstrained beamformer that maximizes the power at the output has infinite
coefficients, some constraint on the beamformer is required. Here, it is required that the norm of the beamformer be fixed, \| w_m \|^2 = 1.
The beamformer that maximizes the average power under this constraint is found by using the method of Lagrangian multipliers discussed in Section 2.12.1:

0 = \frac{\partial}{\partial \alpha} \left[ Q_m - \lambda_m\, \| w_m \|^2 \right]
= \frac{\partial}{\partial \alpha}\, \mathrm{tr}\{ w_m^\dagger\, a_m\, a_m^\dagger\, w_m - \lambda_m\, w_m^\dagger\, w_m \}
= \mathrm{tr}\left\{ \frac{\partial w_m^\dagger}{\partial \alpha} \left( a_m\, a_m^\dagger\, w_m - \lambda_m\, w_m \right) \right\} + \mathrm{h.c.} , (9.19)

so that

a_m\, a_m^\dagger\, w_m = \lambda_m\, w_m
\;\Rightarrow\; w_m = \frac{a_m}{\| a_m \|} . (9.20)
where the channel matrix without the channel vector from the transmitter of interest, A_m ∈ C^{n_r × (n_t - 1)}, is given by

A_m = (a_1\; \cdots\; a_{m-1}\; a_{m+1}\; \cdots\; a_{n_t}) . (9.23)
W_{ZF} = A\, (A^\dagger A)^{-1}

\hat{X} = W_{ZF}^\dagger\, Z = (A^\dagger A)^{-1} A^\dagger\, Z
= (A^\dagger A)^{-1} A^\dagger A\, X + (A^\dagger A)^{-1} A^\dagger\, N
= X + (A^\dagger A)^{-1} A^\dagger\, N . (9.28)
The outputs of this beamformer, under the assumption of perfect channel knowl-
edge and at least as many receive antennas as transmit antennas (nr ≥ nt ), have
no contributions from other transmitters. The beamformer adapted for the mth
transmitter wm is given by
wm = WZ F em
= A(A† A)−1 em , (9.29)
where the selection vector {em }n = δm ,n is given by the Kronecker delta, that
is one if m and n are equal and zero otherwise.
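A small numerical sketch (assumed random channel, noiseless for clarity) showing that the zero-forcing beamformer bank of Equation (9.28) separates the transmit streams exactly:

```python
import numpy as np

rng = np.random.default_rng(4)
nt, nr, ns = 3, 4, 6

# Assumed random amplitude-channel product and normalized transmit streams
A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
X = (rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))) / np.sqrt(2)

# Zero-forcing beamformer bank, Equation (9.28): W_ZF = A (A^H A)^-1,
# so X_hat = W_ZF^H Z has no internal interference when Z is noiseless
Z = A @ X
W_zf = A @ np.linalg.inv(A.conj().T @ A)
X_hat = W_zf.conj().T @ Z

print(bool(np.allclose(X_hat, X)))  # True: perfect separation with known channel
```

With noise present, the same beamformer leaves the amplified-noise term (A^H A)^{-1} A^H N of Equation (9.28).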
Because the channel is not known, this beamformer wm must be approximated
by an estimate of the channel. By substituting the maximum-likelihood channel
estimate, under the Gaussian interference and noise model that is found in Equa-
tion (8.128), into Equation (9.29), the estimated channel inversion beamformer
is found,
w_m \approx \hat{A}\, (\hat{A}^\dagger \hat{A})^{-1}\, e_m
= Z\, X^\dagger (X\, X^\dagger)^{-1} \left[ (X\, X^\dagger)^{-1} X\, Z^\dagger Z\, X^\dagger (X\, X^\dagger)^{-1} \right]^{-1} e_m
= Z\, X^\dagger \left[ X\, Z^\dagger Z\, X^\dagger \right]^{-1} X\, X^\dagger\, e_m
= Z\, X^\dagger \left[ X\, Z^\dagger Z\, X^\dagger \right]^{-1} X\, x_m^\dagger , (9.30)
Orthogonal beamformer
Imagine a scenario in which there is a MIMO link with no external interference.
The interference from other transmitters within the MIMO link is minimized by
constructing a beamformer wm for each transmit antenna that is orthogonal to
the spatial subspace spanned by the spatial responses of the other transmitters,
w_m \propto P^\perp_{A_m}\, a_m , (9.31)

where P^\perp_{A_m} ∈ C^{n_r × n_r} is the operator that projects onto a column space orthogonal to the spatial response of the interfering transmitters. This construction is heuristically satisfying because the beamformer begins with the matched-filter
array response and then projects orthogonal to the subspace occupied by the internal interference. If there were no interference, then the beamformer would become the matched filter. We will show that the beamformer constructed by using this
model is proportional to the zero-forcing beamformer and is therefore equivalent.
By using Equation (9.23), the projection operator that projects onto a basis orthogonal to the receive array spatial responses of all the other transmitters, P^\perp_{A_m}, is given by

P^\perp_{A_m} = I - P_{A_m} .
This form can be found by considering the beamformer that minimizes the interference. The average interference power at the output of a beamformer from the other transmitters of the MIMO link, Q_m^{(int)}, is given by

Q_m^{(int)} \propto \frac{1}{n_s} \left\langle \| w_m^\dagger\, A_m\, X_m \|^2 \right\rangle
= \frac{1}{n_s}\, w_m^\dagger\, A_m\, \langle X_m\, X_m^\dagger \rangle\, A_m^\dagger\, w_m
= w_m^\dagger\, A_m\, A_m^\dagger\, w_m , (9.33)
By minimizing the expected interference power under this constraint, the beamformer is found. The constraint on the norm of w_m is enforced by using the method of Lagrangian multipliers discussed in Section 2.12.1,

0 = \frac{\partial}{\partial \alpha} \left[ Q_m^{(int)} - \lambda_m\, \| w_m \|^2 \right]
= \frac{\partial}{\partial \alpha}\, \mathrm{tr}\{ w_m^\dagger\, A_m\, A_m^\dagger\, w_m - \lambda_m\, w_m^\dagger\, w_m \}
= \mathrm{tr}\left\{ \frac{\partial w_m^\dagger}{\partial \alpha} \left( A_m\, A_m^\dagger\, w_m - \lambda_m\, w_m \right) \right\} + \mathrm{h.c.} , (9.35)
where λm is the Lagrangian multiplier, α is some arbitrary parameter of wm ,
and h.c. indicates the Hermitian conjugate of the first term. The beamformer
lives in the subspace spanned by the eigenvectors associated with the eigenvalues
with zero value,
0 \cdot w_m = A_m\, A_m^\dagger\, w_m . (9.36)
2 Implicit in this formulation is the assumption that the MIMO system is operating in an
uninformed transmitter mode.
The relationship can be simplified by recognizing that a projection onto the space orthogonal to the column space of A_m imposes the same constraint,

0 \cdot w_m = P_{A_m}\, w_m , (9.37)
a_m\, a_m^\dagger\, w_m - \lambda_m\, w_m - \eta_m\, P_{A_m}\, w_m = 0 . (9.39)
w_m = P^\perp_{A_m}\, w_m . (9.40)

0 = a_m\, a_m^\dagger\, P^\perp_{A_m}\, w_m - \lambda_m\, P^\perp_{A_m}\, w_m - \eta_m\, P_{A_m}\, P^\perp_{A_m}\, w_m
= a_m\, a_m^\dagger\, P^\perp_{A_m}\, w_m - \lambda_m\, P^\perp_{A_m}\, w_m
= P^\perp_{A_m}\, a_m\, a_m^\dagger\, P^\perp_{A_m}\, w_m - \lambda_m\, P^\perp_{A_m}\, w_m , (9.41)
where the observation that projection operators are idempotent (which indicates
the operation can be repeated without affecting the result) is employed. This
Once again, the channel response is not typically known; however, by using a reference signal, the beamformer w_m can be estimated by employing Equation (9.25),

w_m \approx \frac{ \hat{P}^\perp_{A_m}\, Z\, P^\perp_{x_m}\, x_m^\dagger }{ \| \hat{P}^\perp_{A_m}\, Z\, P^\perp_{x_m}\, x_m^\dagger \| } , (9.43)

where P^\perp_{x_m} = I - x_m^\dagger\, x_m / (x_m\, x_m^\dagger).
A\, (A^\dagger A)^{-1}\, e_m \propto P^\perp_{A_m}\, a_m . (9.46)

P^\perp_{A_m}\, P_A\, a_m = P_A\, P^\perp_{A_m}\, a_m
= \left[ A\, (A^\dagger A)^{-1} A^\dagger \right] P^\perp_{A_m}\, a_m . (9.47)

e_m \propto A^\dagger\, P^\perp_{A_m}\, a_m
= (P^\perp_{A_m}\, A)^\dagger\, a_m , (9.48)
a_1 = P_1\, a_1 + P^\perp_1\, a_1 . (9.49)
The projection onto the subspace orthogonal to that spanned by the columns of the channel matrix other than the first column is given by

P^\perp_{A_m}\, A = P^\perp_1 \left( P_1 a_1 + P^\perp_1 a_1 \;\;\; a_2 \;\cdots\; a_{n_t} \right)
= \left( P^\perp_1 a_1 \;\;\; 0 \;\cdots\; 0 \right) (9.51)

because P^\perp_{A_m}, which in this example is indicated by P^\perp_1, is constructed to be orthogonal to the subspace containing the vectors a_2, \ldots, a_{n_t}. Consequently, a
form proportional to the selection vector is found,

(P^\perp_{A_m}\, A)^\dagger\, a_1 = \left( P^\perp_1 a_1 \;\;\; 0 \;\cdots\; 0 \right)^\dagger a_1
= \left( a_1^\dagger\, P^\perp_1\, a_1,\; 0,\; \ldots,\; 0 \right)^T
\propto e_1 . (9.52)
Similarly, for any value of m, this relationship holds, so the two beamformers
are the same up to an overall normalization.
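This equivalence is easy to confirm numerically (assumed random amplitude-channel product):

```python
import numpy as np

rng = np.random.default_rng(5)
nt, nr, m = 3, 5, 0

A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
a_m = A[:, m]
A_m = np.delete(A, m, axis=1)   # spatial responses of the other transmitters

# Orthogonal beamformer, Equation (9.31): project a_m away from span(A_m)
P_Am = A_m @ np.linalg.inv(A_m.conj().T @ A_m) @ A_m.conj().T
w_orth = (np.eye(nr) - P_Am) @ a_m

# Zero-forcing beamformer for transmitter m, Equation (9.29)
e_m = np.zeros(nt)
e_m[m] = 1.0
w_zf = A @ np.linalg.inv(A.conj().T @ A) @ e_m

# Equal up to an overall (complex) scale
scale = (w_zf.conj() @ w_orth) / (w_zf.conj() @ w_zf)
print(bool(np.allclose(w_orth, scale * w_zf)))  # True
```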
R̂ = U D U† , (9.61)
P^\perp_{[U_q\, A_m]} = I - [U_q\; A_m] \left( [U_q\; A_m]^\dagger [U_q\; A_m] \right)^{-1} [U_q\; A_m]^\dagger
= I - P^\perp_{U_q}\, A_m \left( A_m^\dagger\, P^\perp_{U_q}\, A_m \right)^{-1} A_m^\dagger - P^\perp_{A_m}\, U_q \left( U_q^\dagger\, P^\perp_{A_m}\, U_q \right)^{-1} U_q^\dagger . (9.62)
The minimum-interference beamformer in external interference can be
constructed by modifying Equation (9.31) such that the beamformer must be
orthogonal to the other MIMO transmitters and orthogonal to the external
interference. The beamformer wm is given by
w_m \propto P^\perp_{[U_q\, A_m]}\, a_m , (9.63)
where the notation from Equation (9.23) is employed. Because the interference cannot be completely removed, the next best approach is to remove as much as possible. This corresponds to the minimum eigenvalue of the interference-plus-noise covariance matrix Q_m. The beamformer w that achieves this goal is given by the eigenvector e_{min} associated with the minimum eigenvalue λ_{min} of Q_m that satisfies

w \propto e_{min}
\lambda_{min}\, e_{min} = Q_m\, e_{min} . (9.65)
\hat{Q}_m = \hat{A}_m\, \hat{A}_m^\dagger + \frac{1}{n_s}\, Z\, P^\perp_{X}\, Z^\dagger , (9.66)

\hat{Q}_m = \frac{1}{n_s}\, Z\, P^\perp_{x_m}\, Z^\dagger . (9.67)
E = W^\dagger Z - X . (9.68)

The mean-squared error \langle \| E \|_F^2 \rangle between the output of a set of beamformers and the transmitted signals is given by

\langle \| E \|_F^2 \rangle = \langle \| W^\dagger Z - X \|_F^2 \rangle
= \langle \mathrm{tr}\{ (W^\dagger Z - X)(W^\dagger Z - X)^\dagger \} \rangle . (9.69)
3 This assumes that the signals from each transmit antenna are uncorrelated.
To minimize this error, the derivative with respect to some parameter α of the matrix of beamformers W is set to zero,

0 = \frac{\partial}{\partial \alpha} \langle \| E \|_F^2 \rangle
= \frac{\partial}{\partial \alpha} \langle \mathrm{tr}\{ (W^\dagger Z - X)(W^\dagger Z - X)^\dagger \} \rangle
= \left\langle \mathrm{tr}\left\{ \frac{\partial W^\dagger}{\partial \alpha}\, Z\, (W^\dagger Z - X)^\dagger \right\} \right\rangle + \mathrm{c.c.}
= \mathrm{tr}\left\{ \frac{\partial W^\dagger}{\partial \alpha} \left( \langle Z\, Z^\dagger \rangle\, W - \langle Z\, X^\dagger \rangle \right) \right\} + \mathrm{c.c.} , (9.70)
where c.c. indicates the complex conjugate of the first term. This relationship is
satisfied if for all variations in beamformers ∂W/∂α the argument of the trace
is zero. Consequently, the term within the parentheses is set to zero,
\langle Z\, Z^\dagger \rangle\, W - \langle Z\, X^\dagger \rangle = 0
\;\Rightarrow\; W = \langle Z\, Z^\dagger \rangle^{-1} \langle Z\, X^\dagger \rangle . (9.71)
This form has an intuitive interpretation. The first term is proportional to the
inverse of the receive covariance (signal-plus-interference-plus-noise) matrix Q ∈
Cn r ×n r and the second term is proportional to an array response estimator.
Consequently, this beamformer attempts to point in the direction of the signals
of interest, but points away from interference sources.
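A sketch of the sample-average form (assumed random channel and unit-variance noise), combining Equations (9.71) and (9.73):

```python
import numpy as np

rng = np.random.default_rng(6)
nt, nr, ns = 2, 4, 20000

A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
X = (rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))) / np.sqrt(2)
N = (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
Z = A @ X + N

# Sample-average MMSE beamformer bank, Equations (9.71) and (9.73):
# W = <Z Z^H>^-1 <Z X^H>, with expectations replaced by sample averages
W = np.linalg.solve(Z @ Z.conj().T, Z @ X.conj().T)
X_hat = W.conj().T @ Z

mse = float(np.mean(np.abs(X_hat - X) ** 2))
print(mse)  # well below the unit signal power per stream
```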
With the assumptions that the transmit covariance matrix is proportional to the identity matrix,⁴ \langle X\, X^\dagger \rangle = n_s\, I, and that the cross covariance is proportional to the channel, \langle Z\, X^\dagger \rangle = n_s\, A, the mean-squared error for the MMSE beamformer is given by
\langle \| E \|_F^2 \rangle = \left\langle \mathrm{tr}\left\{ \left( \langle X\, Z^\dagger \rangle \langle Z\, Z^\dagger \rangle^{-1} Z - X \right) \left( \langle X\, Z^\dagger \rangle \langle Z\, Z^\dagger \rangle^{-1} Z - X \right)^\dagger \right\} \right\rangle
= \left\langle \mathrm{tr}\left\{ \left( A^\dagger Q^{-1} Z - X \right) \left( Z^\dagger Q^{-1} A - X^\dagger \right) \right\} \right\rangle
= n_s\, \mathrm{tr}\left\{ A^\dagger Q^{-1} Q\, Q^{-1} A - 2 A^\dagger Q^{-1} A + I \right\}
= n_s\, \mathrm{tr}\left\{ I - A^\dagger Q^{-1} A \right\}
= n_s\, \mathrm{tr}\left\{ I - A^\dagger (A\, A^\dagger + R)^{-1} A \right\} . (9.72)
For practical problems, the expectations in Equation (9.71) cannot be known exactly. The expectations can be approximated over some finite number of samples n_s. If n_s ≫ n_r and n_s ≫ n_t, then the expectations can be approximated well by

\langle Z\, Z^\dagger \rangle \approx Z\, Z^\dagger
\langle Z\, X^\dagger \rangle \approx Z\, X^\dagger . (9.73)
4 This assumption implies that the MIMO link is operating in an uninformed mode.
where the covariance matrix for the received internal and external interference for the mth transmitter is indicated by Q_m = A_m\, A_m^\dagger + R, assuming external-interference-plus-noise covariance matrix R. It is assumed here that the transmit covariance matrix is proportional to the identity matrix, X_m\, X_m^\dagger / n_s = I_{n_t - 1}.
For the beamformer w_m that maximizes the SINR for the mth transmitter, the SINR is found by

w_m = \mathrm{argmax}_{w_m}\; \frac{ w_m^\dagger\, a_m\, a_m^\dagger\, w_m }{ w_m^\dagger\, Q_m\, w_m } . (9.78)
By employing the change of variables η_m = Q_m^{1/2}\, w_m, the optimization is equivalent to

w_m = Q_m^{-1/2}\; \mathrm{argmax}_{\eta_m}\; \frac{ \eta_m^\dagger\, Q_m^{-1/2}\, a_m\, a_m^\dagger\, Q_m^{-1/2}\, \eta_m }{ \eta_m^\dagger\, \eta_m } . (9.79)
The value of η_m that solves this form is proportional to the eigenvector of the matrix Q_m^{-1/2}\, a_m\, a_m^\dagger\, Q_m^{-1/2}, which is rank-1 and is constructed from the outer
product of the interference-plus-noise whitened (as introduced in Section 8.3.1)
channel vector. The eigenvector that solves this equation is proportional to the
whitened channel vector. Consequently, the beamformer w_m for the mth transmitter that maximizes the SINR_m is given by

w_m = Q_m^{-1/2}\, \eta_m = Q_m^{-1/2}\, Q_m^{-1/2}\, a_m
= Q_m^{-1}\, a_m . (9.80)
While the structure of the beamformer is formally satisfying because the con-
tributions of all interfering sources are reduced by the matrix inverse, the form
assumes exact knowledge of the model parameters. However by using either
Equation (9.66) or Equation (9.67) along with Equation (9.25), an estimate of
the beamformer can be evaluated.
Q_m = Q - a_m\, a_m^\dagger . (9.82)

w_m = (Q - a_m\, a_m^\dagger)^{-1}\, a_m
= (Q - a_m\, a_m^\dagger)^{-1}\, Q\, Q^{-1}\, a_m
= (I - Q^{-1}\, a_m\, a_m^\dagger)^{-1}\, Q^{-1}\, a_m
= \left( I + \frac{ Q^{-1}\, a_m\, a_m^\dagger }{ 1 - a_m^\dagger\, Q^{-1}\, a_m } \right) Q^{-1}\, a_m
= \left( 1 + \frac{ a_m^\dagger\, Q^{-1}\, a_m }{ 1 - a_m^\dagger\, Q^{-1}\, a_m } \right) Q^{-1}\, a_m
= \left( 1 + \frac{ a_m^\dagger\, Q^{-1}\, a_m }{ 1 - a_m^\dagger\, Q^{-1}\, a_m } \right) w_m^{MMSE}
\propto w_m^{MMSE} . (9.83)
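This proportionality can be confirmed numerically (assumed single internal interferer and white external noise):

```python
import numpy as np

rng = np.random.default_rng(7)
nr = 4

# Assumed spatial responses: desired transmitter a0, one interferer a1
a0 = (rng.standard_normal(nr) + 1j * rng.standard_normal(nr)) / np.sqrt(2)
a1 = (rng.standard_normal(nr) + 1j * rng.standard_normal(nr)) / np.sqrt(2)

Qm = np.outer(a1, a1.conj()) + np.eye(nr)   # interference-plus-noise covariance
Q = np.outer(a0, a0.conj()) + Qm            # total receive covariance

w_sinr = np.linalg.solve(Qm, a0)            # max-SINR beamformer, Equation (9.80)
w_mmse = np.linalg.solve(Q, a0)             # MMSE beamformer direction, Q^-1 a0

# Proportionality of Equation (9.83): identical up to a real positive scale
scale = (w_mmse.conj() @ w_sinr).real / (w_mmse.conj() @ w_mmse).real
print(bool(np.allclose(w_sinr, scale * w_mmse)))  # True
```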
The SNR loss is used here as the metric of performance to compare the minimum-interference and MMSE beamformer approaches. The SNR loss provides a measure of the loss caused by mitigating interference. It is given by the ratio of the SNR at the output of an adaptive beamformer in the presence of interference to that in its absence. This metric does not address how well the interference is mitigated. Rather, it provides insight into the SNR cost induced by attempting to mitigate the interference. This may be of value when comparing various system concepts that do or do not require interference mitigation, such as time-division multiple-access schemes. To simplify this analysis, it is assumed that there is a single source of interest and a single interferer.
In the absence of interference, the received signal for some block of data Z from the transmitter of interest is given by

Z = a_0 x_0 + N , (9.84)

where the interfering signal is indicated by the subscript 1. The SNR ρ_{0|1} (as opposed to the SINR) at the output of a beamformer in the presence of an interferer is given by the ratio of the signal power at the output of the beamformer to the noise power at the output of the beamformer,

ρ_{0|1} = \frac{⟨ \|w^† a_0 x_0\|^2 ⟩}{⟨ \|w^† N\|^2 ⟩}
= \frac{⟨ \|x_0\|^2 ⟩ \, |w^† a_0|^2}{⟨ w^† N N^† w ⟩}
= \frac{|w^† a_0|^2}{\|w\|^2} . (9.92)
SNR loss is given by the ratio of the SNR after mitigating interference to the SNR in the absence of interference,

SNR Loss = \frac{ρ_{0|1}}{ρ_0}
= \frac{|w^† a_0|^2 / \|w\|^2}{\|a_0\|^2}
= \frac{|w^† a_0|^2}{\|w\|^2 \|a_0\|^2} . (9.93)
As with many metrics in engineering, the SNR loss is often expressed on a decibel scale. When expressed on a linear scale, its value is bounded between 0 and 1; when expressed in decibels, the sign is sometimes inverted so that the loss is reported as a positive number.
The minimum-interference beamformer projects the signal of interest onto the subspace orthogonal to the interference,

w = \frac{P_1^⊥ a_0}{\|P_1^⊥ a_0\|} ∝ P_1^⊥ a_0 ,

where P_1^⊥ = I − a_1 a_1^† / \|a_1\|^2. The corresponding SNR loss is

SNR loss_{MI} = \frac{|w^† a_0|^2}{\|a_0\|^2 \|w\|^2}
= \frac{ \left| \left[ \left( I − \frac{a_1 a_1^†}{\|a_1\|^2} \right) a_0 \right]^† a_0 \right|^2 }{ \|a_0\|^2 \left\| \left( I − \frac{a_1 a_1^†}{\|a_1\|^2} \right) a_0 \right\|^2 }
= \frac{ \left| a_0^† \left( I − \frac{a_1 a_1^†}{\|a_1\|^2} \right) a_0 \right|^2 }{ \|a_0\|^2 \left| a_0^† \left( I − \frac{a_1 a_1^†}{\|a_1\|^2} \right) a_0 \right| }
= \frac{1}{\|a_0\|^2} a_0^† \left( I − \frac{a_1 a_1^†}{\|a_1\|^2} \right) a_0
= 1 − \frac{|a_1^† a_0|^2}{\|a_0\|^2 \|a_1\|^2} , (9.95)

where the idempotence of the projection operator has been used.
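The closed form in Equation (9.95) can be confirmed with a short numerical sketch (random array responses and NumPy assumed; the variable names are ours).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6  # receive antennas (illustrative)
a0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # signal array response
a1 = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # interferer array response

# Minimum-interference beamformer: project a0 orthogonal to the interferer.
P_perp = np.eye(n) - np.outer(a1, a1.conj()) / np.vdot(a1, a1).real
w = P_perp @ a0

snr_loss = abs(np.vdot(w, a0))**2 / (np.vdot(w, w).real * np.vdot(a0, a0).real)

# Normalized inner product squared, |alpha|^2.
alpha2 = abs(np.vdot(a0, a1))**2 / (np.vdot(a0, a0).real * np.vdot(a1, a1).real)
assert np.isclose(snr_loss, 1 - alpha2)   # Equation (9.95)
```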
where it is assumed that the noise and the signals x_0 and x_1 are all independent and have unit variance per sample. From Equation (2.116), the inverse of the rank-2 update of the identity matrix is

(I + a_0 a_0^† + a_1 a_1^†)^{-1} = I − \left( 1 + \frac{|a_0^† a_1|^2}{γ} \right) \left( \frac{a_0 a_0^†}{1 + a_0^† a_0} + \frac{a_1 a_1^†}{1 + a_1^† a_1} \right) + \frac{1}{γ} \left( a_0^† a_1 \, a_0 a_1^† + a_1^† a_0 \, a_1 a_0^† \right) , (9.99)

where γ = (1 + \|a_0\|^2)(1 + \|a_1\|^2) − |a_0^† a_1|^2.
where

φ = a_0^† a_1 = \|a_0\| \|a_1\| α ,
φ^* = a_1^† a_0 . (9.102)

Here α represents the normalized inner product between the vectors a_0 and a_1, using the definition

α = \frac{a_0^† a_1}{\|a_0\| \|a_1\|} . (9.103)
The cross-covariance between the received data and the reference signal of interest is

⟨Z x_0^†⟩ = ⟨[a_0 x_0 + a_1 x_1 + N] x_0^†⟩ = a_0 ⟨x_0 x_0^†⟩ = a_0 n_s . (9.104)

Consequently, the (unnormalized) MMSE beamformer is

w = Q^{-1} a_0 . (9.105)
Substituting the covariance inverse from Equation (9.99), the MMSE beamformer becomes

w = \left[ I − \left( 1 + \frac{|φ|^2}{γ} \right) \left( \frac{a_0 a_0^†}{1 + a_0^† a_0} + \frac{a_1 a_1^†}{1 + a_1^† a_1} \right) + \frac{1}{γ} \left( a_0^† a_1 \, a_0 a_1^† + a_1^† a_0 \, a_1 a_0^† \right) \right] a_0
= \left[ 1 − \frac{\|a_0\|^2}{1 + \|a_0\|^2} \left( 1 + \frac{|φ|^2}{γ} \right) + \frac{|φ|^2}{γ} \right] a_0 + \left[ \frac{φ^* \|a_0\|^2}{γ} − \frac{φ^*}{1 + \|a_1\|^2} \left( 1 + \frac{|φ|^2}{γ} \right) \right] a_1
= k_0 a_0 + k_1 a_1 , (9.106)

where k_0 and k_1 are used for notational convenience and are given by

γ = 1 + \|a_0\|^2 + \|a_1\|^2 + \|a_0\|^2 \|a_1\|^2 − |φ|^2

k_0 = 1 − \frac{\|a_0\|^2}{1 + \|a_0\|^2} \left( 1 + \frac{|φ|^2}{γ} \right) + \frac{|φ|^2}{γ} = \frac{1 + \|a_1\|^2}{γ}

k_1 = \frac{φ^* \|a_0\|^2}{γ} − \frac{φ^*}{1 + \|a_1\|^2} \left( 1 + \frac{|φ|^2}{γ} \right) = − \frac{φ^*}{γ} . (9.107)

The corresponding SNR loss is

SNR loss_{MMSE} = \frac{|w^† a_0|^2}{\|w\|^2 \|a_0\|^2} . (9.108)
The two terms of interest are |w^† a_0|^2 and \|w\|^2, given by

|w^† a_0|^2 = |a_0^† (k_0 a_0 + k_1 a_1)|^2
= |k_0 \|a_0\|^2 + k_1 φ|^2
= \frac{1}{γ^2} \left[ (1 + \|a_1\|^2) \|a_0\|^2 − |φ|^2 \right]^2
= \frac{1}{γ^2} \left[ (1 + \|a_1\|^2) \|a_0\|^2 − \|a_0\|^2 \|a_1\|^2 |α|^2 \right]^2
= \frac{\|a_0\|^4}{γ^2} \left[ (1 + \|a_1\|^2) − \|a_1\|^2 |α|^2 \right]^2
= \frac{\|a_0\|^4}{γ^2} \left[ 1 + \|a_1\|^2 (1 − |α|^2) \right]^2 , (9.109)

where α is the normalized inner product from Equation (9.103), and
\|w\|^2 = w^† w
= (k_0 a_0 + k_1 a_1)^† (k_0 a_0 + k_1 a_1)
= k_0^2 \|a_0\|^2 + k_0 k_1 φ + k_0 k_1^* φ^* + |k_1|^2 \|a_1\|^2
= \frac{1}{γ^2} \left( [1 + \|a_1\|^2]^2 \|a_0\|^2 − 2 [1 + \|a_1\|^2] |φ|^2 + |φ|^2 \|a_1\|^2 \right)
= \frac{\|a_0\|^2}{γ^2} \left( [1 + \|a_1\|^2]^2 − 2 [1 + \|a_1\|^2] \|a_1\|^2 |α|^2 + \|a_1\|^4 |α|^2 \right)
= \frac{\|a_0\|^2}{γ^2} \left[ |α|^2 + (1 + \|a_1\|^2)^2 (1 − |α|^2) \right] . (9.110)
Consequently, the SNR loss is given by

SNR loss_{MMSE} = \frac{|w^† a_0|^2}{\|w\|^2 \|a_0\|^2} = \frac{ \left[ 1 + \|a_1\|^2 (1 − |α|^2) \right]^2 }{ |α|^2 + (1 + \|a_1\|^2)^2 (1 − |α|^2) } . (9.111)
In the limit of strong interference, the interfering term \|a_1\| becomes large, and the SNR loss converges to that of the minimum-interference beamformer described in Equation (9.95),

\lim_{\|a_1\| → ∞} SNR loss_{MMSE} = \lim_{\|a_1\| → ∞} \frac{ \left[ 1 + \|a_1\|^2 (1 − |α|^2) \right]^2 }{ |α|^2 + (1 + \|a_1\|^2)^2 (1 − |α|^2) }
= \lim_{\|a_1\| → ∞} \frac{ (\|a_1\|^2)^2 (1 − |α|^2)^2 }{ (\|a_1\|^2)^2 (1 − |α|^2) }
= 1 − |α|^2 . (9.112)
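The closed form in Equation (9.111) and its relation to the strong-interference limit can be verified numerically; the sketch below (random draws and NumPy assumed; names are ours) forms the MMSE beamformer directly from the covariance and compares its SNR loss with the prediction.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5  # receive antennas (illustrative)
a0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
a1 = 3.0 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))  # strong-ish interferer

# Total covariance for unit-variance signals and noise, Q = I + a0 a0^H + a1 a1^H.
Q = np.eye(n) + np.outer(a0, a0.conj()) + np.outer(a1, a1.conj())
w = np.linalg.solve(Q, a0)   # MMSE beamformer, Equation (9.105)

loss = abs(np.vdot(w, a0))**2 / (np.vdot(w, w).real * np.vdot(a0, a0).real)

t = np.vdot(a1, a1).real                                        # ||a1||^2
alpha2 = abs(np.vdot(a0, a1))**2 / (np.vdot(a0, a0).real * t)   # |alpha|^2
pred = (1 + t * (1 - alpha2))**2 / (alpha2 + (1 + t)**2 * (1 - alpha2))

assert np.isclose(loss, pred)   # Equation (9.111)
assert pred >= 1 - alpha2       # the loss approaches 1 - |alpha|^2 from above
```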
As an aside, in the case of a single signal of interest and a single interferer, the MMSE beamformer is the maximum-SINR beamformer discussed in Section 9.2.4. Consequently, the above analysis is also valid for the maximum-SINR beamformer for this particular problem definition.
The optimal receiver extracts all of the mutual information between the transmitted signal and the observed signal. However, here it is assumed that the receiver ignores any potential performance gain available from considering the correlations between beamformer outputs. In particular, it is assumed that a beamforming receiver can only decode a single signal at the output of each beamformer. This assumption is not valid for the optimal receiver or for multiple-user approaches that mix temporal and spatial mitigation, as discussed in Section 9.6 and in References [98, 324, 323, 69]. However, this limitation is a reasonable approximation to the bounding performance for receivers that separate the transmitted signals by using receive beamformers while ignoring the correlation between the noise at the beamformer outputs.⁵
y = W^† z = W^† H s + W^† n , (9.113)

H ⇒ W^† H , (9.114)

that is, the beamformer is subsumed into the channel. It is typical for beamformers to attempt to reduce the correlations between beamformer outputs because doing so mitigates the interference associated with other transmit antennas. However, there is typically some remaining correlation between beamformer outputs. Similarly, this interpretation implies that the noise-plus-interference covariance matrix becomes

⟨n n^†⟩ ⇒ W^† ⟨n n^†⟩ W . (9.115)
Depending upon the beamforming approach, the signals of interest may suffer
SNR losses that may be significant, and the noise at the outputs of the beam-
formers may become correlated. The beamformers may or may not attempt to
estimate parameters of the external interference and thus may or may not miti-
gate it.
⁵ Portions of this section are © 2004 IEEE. Reprinted, with permission, from Reference [32].
Under the beamformer, the covariance and channel terms transform as

R ⇒ W^† R W

H \frac{P_o}{n_t} I H^† ⇒ W^† H \frac{P_o}{n_t} I H^† W . (9.116)

If W is invertible, then

c_{bf} = \log_2 \left| I_{n_r} + \frac{P_o}{n_t} (W^† R W)^{-1} W^† H H^† W \right|
= \log_2 \left| I_{n_r} + \frac{P_o}{n_t} W^{-1} R^{-1} (W^†)^{-1} W^† H H^† W \right|
= \log_2 \left| I_{n_r} + \frac{P_o}{n_t} R^{-1} H H^† \right|
= c_{UT} , (9.120)

where the final equality uses the determinant identity |I + W^{-1} M W| = |I + M|,
and the capacity is the same as in the absence of the beamformer. This is not
surprising because the effect of the beamformers W on the channel H in Equation
(9.114) can be reversed if W is invertible.
In the case of a receiver based on beamformers that does not share information
across beamformer outputs, such as MMSE or minimum interference discussed
in Section 9.2, the form of the bound is modified. In this case, there is a separate
beamformer optimized for each transmitter. The interference power that could be
employed to jointly estimate signals instead contributes power to the noise-like
entropy term of the capacity.
We attempt to approximate the effects of ignoring the correlations between
beamformer outputs by evaluating the entropies while ignoring the correlations.
Because knowledge about {s}m is not used by the beamformer to remove inter-
ference for {y}k (for k = m), the entropy for the noise-like component becomes
the sum of entropies assuming independent sources.
Let h_m denote the mth column of the channel matrix H, H_m the channel matrix with the mth column removed, and w_m the beamformer for the mth transmitted stream. The entropy of the mth beamformer output is bounded by

h_{uc,m}(y|H, R) ≤ \log_2 \left\{ πe \left[ w_m^† \left( R + \frac{P_o}{n_t} H H^† \right) w_m \right] \right\}
= \log_2 \left\{ πe \left[ w_m^† \left( R + \frac{P_o}{n_t} H_m H_m^† \right) w_m + \frac{P_o}{n_t} w_m^† h_m h_m^† w_m \right] \right\} . (9.121)

The resulting noise-like entropy h_{uc,m}(y|s, H, R) for the mth beamformer is given by

h_{uc,m}(y|s, H, R) = \log_2 \left\{ πe \left[ w_m^† \left( R + \frac{P_o}{n_t} H_m H_m^† \right) w_m \right] \right\} . (9.122)

Here it is observed that the mean noise-like output power (which includes residual interference signals) of each beamformer is given by

w_m^† \left( R + \frac{P_o}{n_t} H_m H_m^† \right) w_m . (9.123)

The spectral-efficiency bound under the uncorrelated-output assumption is then

c_{uc} = \sum_{m=1}^{n_t} \left[ h_{uc,m}(y|H, R) − h_{uc,m}(y|s, H, R) \right]
= \sum_{m=1}^{n_t} \log_2 \left( 1 + \left[ w_m^† \left( R + \frac{P_o}{n_t} H_m H_m^† \right) w_m \right]^{-1} \frac{P_o}{n_t} |w_m^† h_m|^2 \right) , (9.124)
The inequality is demonstrated by considering the thermal-noise-limited case R = I and defining a_m = \sqrt{P_o/n_t}\, h_m, so that A = \sqrt{P_o/n_t}\, H and

c_{UT} = \log_2 \left| I + \frac{P_o}{n_t} H H^† \right| = \log_2 |I + A A^†| ≥ c_{uc} ,

c_{uc} = \sum_{m=1}^{n_t} \log_2 \left( 1 + \left[ w_m^† \left( I + \frac{P_o}{n_t} H_m H_m^† \right) w_m \right]^{-1} \frac{P_o}{n_t} |w_m^† h_m|^2 \right)
= \sum_{m=1}^{n_t} \log_2 \left( 1 + \left[ w_m^† \left( I + A A^† − a_m a_m^† \right) w_m \right]^{-1} |w_m^† a_m|^2 \right) . (9.125)
Each term is maximized by the maximum-SINR beamformer,

[c_{uc}]_m = \log_2 \left( 1 + \left[ w_m^† (I + A A^† − a_m a_m^†) w_m \right]^{-1} |w_m^† a_m|^2 \right)
≤ [c_{uc}^{MSINR}]_m
= \log_2 λ_{max} \left\{ I + (I + A A^† − a_m a_m^†)^{-1/2} a_m a_m^† (I + A A^† − a_m a_m^†)^{-1/2} \right\}
= \log_2 \left( 1 + a_m^† (I + A A^† − a_m a_m^†)^{-1} a_m \right) , (9.126)

where λ_{max}\{·\} indicates the largest eigenvalue, such that \sum_{m=1}^{n_t} [c_{uc}]_m = c_{uc} and \sum_{m=1}^{n_t} [c_{uc}^{MSINR}]_m = c_{uc}^{MSINR}. Consequently, any spectral-efficiency bound for beamformers under the receiver assumption of uncorrelated beamformer outputs is bounded by

c_{uc}^{MSINR} = \sum_m \log_2 \left( 1 + a_m^† (I + A A^† − a_m a_m^†)^{-1} a_m \right) . (9.127)
The uninformed-transmitter capacity can be written as a telescoping product,

c_{UT} = \log_2 |I + A A^†|
= \log_2 \left( \frac{|I + a_1 a_1^†|}{|I|} \cdot \frac{|I + \sum_{m=1}^{2} a_m a_m^†|}{|I + a_1 a_1^†|} \cdot \frac{|I + \sum_{m=1}^{3} a_m a_m^†|}{|I + \sum_{m=1}^{2} a_m a_m^†|} \cdots \right)
= \sum_m \log_2 \frac{ \left| I + \sum_{j=1}^{m-1} a_j a_j^† + a_m a_m^† \right| }{ \left| I + \sum_{j=1}^{m-1} a_j a_j^† \right| }
= \sum_m \log_2 \left| I + \left( I + \sum_{j=1}^{m-1} a_j a_j^† \right)^{-1} a_m a_m^† \right|
= \sum_m \log_2 \left[ 1 + a_m^† \left( I + \sum_{j=1}^{m-1} a_j a_j^† \right)^{-1} a_m \right]
= \sum_m [c_{UT}]_m , (9.128)
where [c_{UT}]_m indicates the mth term in the sum. Finally, it can be seen that each term of the largest bound c_{uc}^{MSINR} (found in Equation (9.127)) is still less than the corresponding successive-interference-cancellation term of the uninformed-transmitter capacity,

[c_{UT}]_m = \log_2 \left[ 1 + a_m^† \left( I + \sum_{j=1}^{m-1} a_j a_j^† \right)^{-1} a_m \right]
≥ \log_2 \left[ 1 + a_m^† \left( I + A A^† − a_m a_m^† \right)^{-1} a_m \right]
= [c_{uc}^{MSINR}]_m . (9.129)
This follows because I + A A^† − a_m a_m^† = I + \sum_{j ≠ m} a_j a_j^† dominates I + \sum_{j=1}^{m-1} a_j a_j^†, and, for any complex vector x, positive-definite Hermitian matrix B, and positive-semidefinite Hermitian matrix C, x^† (B + C)^{-1} x ≤ x^† B^{-1} x.
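The telescoping decomposition (9.128) and the per-term ordering (9.129) can be checked numerically; the sketch below (illustrative dimensions, a random channel draw, and NumPy assumed; all names are ours) accumulates both sets of terms.

```python
import numpy as np

rng = np.random.default_rng(3)
nr, nt = 4, 3  # receive and transmit antennas (illustrative)
A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)

I = np.eye(nr)
AAH = A @ A.conj().T
c_ut = np.log2(np.linalg.det(I + AAH).real)   # uninformed-transmitter capacity

c_ut_terms = []      # successive-interference-cancellation terms, Equation (9.128)
c_msinr_terms = []   # max-SINR beamformer terms, Equation (9.127)
B = np.eye(nr, dtype=complex)   # running I + sum_{j<m} a_j a_j^H
for m in range(nt):
    a = A[:, m]
    c_ut_terms.append(np.log2(1 + (a.conj() @ np.linalg.solve(B, a)).real))
    B += np.outer(a, a.conj())
    R = I + AAH - np.outer(a, a.conj())     # I + A A^H - a_m a_m^H
    c_msinr_terms.append(np.log2(1 + (a.conj() @ np.linalg.solve(R, a)).real))

assert np.isclose(sum(c_ut_terms), c_ut)                        # telescoping, (9.128)
assert all(u >= s - 1e-12 for u, s in zip(c_ut_terms, c_msinr_terms))  # (9.129)
assert sum(c_msinr_terms) <= c_ut + 1e-12                       # c_uc^MSINR <= c_UT
```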
Iterative receivers are useful when sample matrix inversion (SMI) is not com-
putationally feasible. Implicitly, the typical sample matrix inversion approach
assumes that the environment is blockwise stationary. In some sense, the un-
derlying assumption of some continuously adapting iterative receivers, such as
recursive least squares (RLS) or least mean squares (LMS), can be a better match
to continuously changing environments. In practice, the choice between using a
sample matrix inversion or an iterative approach is usually driven by logisti-
cal and computational considerations. More thorough investigations of RLS and
LMS algorithms for adaptive spectral filtering can be found in Reference [142].
Given a block of data, the least-squares (sample matrix inversion) beamformers are given by

W = (Z Z^†)^{-1} (Z X^†) . (9.132)

The sample estimates of the receive covariance matrix and the data-reference cross-covariance matrix are

Q = \frac{Z Z^†}{n_s}

and

V = \frac{Z X^†}{n_s} , (9.133)

respectively, where n_s is the number of samples in the block of data. For the mth update, Q_m and V_m indicate estimates of the receive covariance matrix Q and the cross-covariance matrix V, respectively.
A column of Z is denoted z_m ∈ C^{n_r × 1} and is the mth observation. A column of X is denoted x_m ∈ C^{n_t × 1} and is the mth vector of known transmitted symbols. The (m+1)th updated estimate of the data-reference cross-covariance matrix
V_{m+1} is given by

V_{m+1} = \frac{m V_m + z_{m+1} x_{m+1}^†}{m+1} . (9.134)
So that the notation does not become too cumbersome, we have dropped the \hat{·} notation for estimated values in this discussion. If the observed data vector z_m is drawn from a stationary distribution, then the estimated data-reference cross-covariance matrix converges to the exact solution V,

\lim_{m → ∞} V_m = V . (9.135)

Similarly, the (m+1)th updated estimate of the receive spatial covariance matrix Q_{m+1} of the received signal is given by

Q_{m+1} = \frac{m Q_m + z_{m+1} z_{m+1}^†}{m+1} , (9.136)

and, under the same assumption for the data vector z, the estimated receive spatial covariance matrix converges to the exact solution Q,

\lim_{m → ∞} Q_m = Q . (9.137)
To track changing environments, the sample averages can be replaced with exponentially weighted updates using a forgetting factor β,

V_{m+1} = \frac{β V_m + z_{m+1} x_{m+1}^†}{β + 1} (9.138)

and

Q_{m+1} = \frac{β Q_m + z_{m+1} z_{m+1}^†}{β + 1} . (9.139)
Updates for the estimate of the inverse of the covariance matrix can be found directly. From Equation (2.113), the rank-one form of the Woodbury matrix-inversion formula is given by

(B + z z^†)^{-1} = B^{-1} − \frac{B^{-1} z z^† B^{-1}}{1 + z^† B^{-1} z} . (9.140)

By using this relationship, the updated estimate of the inverse of the receive covariance matrix Q_{m+1}^{-1} can be found,

Q_{m+1}^{-1} = (β + 1) (β Q_m + z_{m+1} z_{m+1}^†)^{-1}
= (β + 1) \left[ (β Q_m)^{-1} − \frac{(β Q_m)^{-1} z_{m+1} z_{m+1}^† (β Q_m)^{-1}}{1 + z_{m+1}^† (β Q_m)^{-1} z_{m+1}} \right] . (9.141)
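The rank-one inverse update in Equation (9.141) can be checked against a direct inversion; the following sketch (illustrative sizes, a random covariance draw, and NumPy assumed; names are ours) performs one update both ways.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4       # receive antennas (illustrative)
beta = 0.95 # forgetting factor

B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q_m = B @ B.conj().T / n + np.eye(n)   # current covariance estimate (Hermitian PD)
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # new snapshot z_{m+1}

# Direct update and inversion, Equation (9.139).
Q_next = (beta * Q_m + np.outer(z, z.conj())) / (beta + 1)
Q_next_inv_direct = np.linalg.inv(Q_next)

# Rank-one (Woodbury) update of the inverse, Equation (9.141).
Bi = np.linalg.inv(beta * Q_m)
Q_next_inv = (beta + 1) * (Bi - (Bi @ np.outer(z, z.conj()) @ Bi)
                                / (1 + (z.conj() @ Bi @ z).real))

assert np.allclose(Q_next_inv, Q_next_inv_direct)
```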
By combining the results for the cross-covariance matrix update and the inverse of the receive covariance matrix update, the (m+1)th updated estimate of the beamformers W_{m+1} is given by

W_{m+1} = Q_{m+1}^{-1} V_{m+1}
= \left[ (β Q_m)^{-1} − \frac{(β Q_m)^{-1} z_{m+1} z_{m+1}^† (β Q_m)^{-1}}{1 + z_{m+1}^† (β Q_m)^{-1} z_{m+1}} \right] (β V_m + z_{m+1} x_{m+1}^†)
= W_m + \frac{Q_m^{-1} z_{m+1} x_{m+1}^†}{β} − \frac{Q_m^{-1} z_{m+1} z_{m+1}^† Q_m^{-1}}{β + z_{m+1}^† Q_m^{-1} z_{m+1}} \left( V_m + \frac{z_{m+1} x_{m+1}^†}{β} \right) . (9.142)
Define the error of the current beamformers on the new sample,

q_{m+1} = W_m^† z_{m+1} − x_{m+1} . (9.143)
By substituting x_{m+1} in terms of the error into the relationship for the beamformer update, a simpler form is obtained. Using x_{m+1}^† = z_{m+1}^† W_m − q_{m+1}^† and V_m = Q_m W_m, the update term satisfies

β V_m + z_{m+1} x_{m+1}^† = (β Q_m + z_{m+1} z_{m+1}^†) W_m − z_{m+1} q_{m+1}^† ,

so that

W_{m+1} = (β Q_m + z_{m+1} z_{m+1}^†)^{-1} (β V_m + z_{m+1} x_{m+1}^†)
= W_m − (β Q_m + z_{m+1} z_{m+1}^†)^{-1} z_{m+1} q_{m+1}^†
= W_m − \frac{1}{β + z_{m+1}^† Q_m^{-1} z_{m+1}} Q_m^{-1} z_{m+1} q_{m+1}^† , (9.144)

where the final step applies Equation (9.141).
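A single step of the error-driven recursion in Equations (9.143)-(9.144) should reproduce the beamformer computed directly from the updated estimates. The sketch below (illustrative sizes, random draws, and NumPy assumed; names are ours) verifies that.

```python
import numpy as np

rng = np.random.default_rng(5)
nr, nt, beta = 4, 2, 0.9   # illustrative dimensions and forgetting factor

B = rng.standard_normal((nr, nr)) + 1j * rng.standard_normal((nr, nr))
Q = B @ B.conj().T / nr + np.eye(nr)   # current covariance estimate Q_m
V = rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))  # V_m
W = np.linalg.solve(Q, V)              # current beamformers W_m = Q_m^{-1} V_m

z = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)  # new snapshot z_{m+1}
x = rng.standard_normal(nt) + 1j * rng.standard_normal(nt)  # new reference x_{m+1}

# Direct recomputation from the updated estimates, Equations (9.138)-(9.139).
Q1 = (beta * Q + np.outer(z, z.conj())) / (beta + 1)
V1 = (beta * V + np.outer(z, x.conj())) / (beta + 1)
W_direct = np.linalg.solve(Q1, V1)

# Error-driven recursion, Equations (9.143)-(9.144).
q = W.conj().T @ z - x                  # q_{m+1} = W_m^H z_{m+1} - x_{m+1}
Qz = np.linalg.solve(Q, z)              # Q_m^{-1} z_{m+1}
W_rls = W - np.outer(Qz, q.conj()) / (beta + (z.conj() @ Qz).real)

assert np.allclose(W_rls, W_direct)
```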
The goal of the LMS algorithm is to minimize this error. The direction of steepest descent, discussed in Section 2.12, is given by evaluating the additive inverse of the derivative of the error with respect to each of the elements of the beamformer. The nth element of the beamformer is denoted \{w_m\}_n. The complete gradient 2∇_{w_m^*} denotes a vector of Wirtinger-calculus derivatives (as discussed in Section 2.8.4) with respect to each of the beamformer elements. For the error ε_m = w_m^† z_m − x_m, the gradient of the expected squared error is given by

2∇_{w_m^*} ⟨|ε_m|^2⟩ = 2∇_{w_m^*} ⟨ε_m ε_m^*⟩
= 2∇_{w_m^*} ⟨(w_m^† z_m − x_m)(z_m^† w_m − x_m^*)⟩
= 2 ⟨ε_m^* z_m⟩ , (9.147)

so that the difference between the updated and the current receive beamformer is given by

w_{m+1} − w_m ∝ −2∇_{w_m^*} ⟨|ε_m|^2⟩ = −2 ⟨ε_m^* z_m⟩ . (9.148)

The main contribution of the LMS algorithm is to suggest the relatively questionable approximation that the expected value above can be replaced with the instantaneous squared error,

2∇_{w_m^*} ⟨|ε_m|^2⟩ ≈ 2∇_{w_m^*} |ε_m|^2 = 2 ε_m^* z_m , (9.149)

so that

w_{m+1} − w_m ∝ −2 ε_m^* z_m = −2 (z_m^† w_m − x_m^*) z_m . (9.150)

Introducing the step-size constant μ, the LMS update is

w_{m+1} = w_m − μ \, 2 ε_m^* z_m . (9.151)
Smaller values of the constant μ will improve the stability of the beamformer
update by reducing its sensitivity to noise, while larger values of the constant μ
will enable the beamformer to adapt more quickly. If the value of μ is smaller
than the multiplicative inverse of the largest eigenvalue of the receive covariance
matrix, then the beamformer will converge to the MMSE beamformer for a wide-
sense stationary environment.
It is sometimes useful to consider the normalized least-mean-squared (NLMS)
update. For this version of the update, the constant of proportionality μ is re-
placed with μ̃/(z†m zm ). This form reduces the sensitivity to the scale of z when
selecting μ̃.
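The LMS recursion of Equation (9.151) and its convergence toward the MMSE beamformer can be illustrated with a small simulation (single known signal, unit-variance complex noise, illustrative step size, NumPy assumed; all names are ours, and the 30% closeness threshold is an arbitrary illustrative check, not a performance claim).

```python
import numpy as np

rng = np.random.default_rng(6)
nr, ns = 4, 20000
a = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)   # channel draw

x = (rng.integers(0, 2, ns) * 2 - 1).astype(complex)         # BPSK reference symbols
N = (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
Z = np.outer(a, x) + N                                       # received data

Q = np.eye(nr) + np.outer(a, a.conj())   # true receive covariance (unit noise)
w_mmse = np.linalg.solve(Q, a)           # MMSE beamformer

mu = 0.05 / np.linalg.eigvalsh(Q).max()  # step size well below 1/lambda_max
w = np.zeros(nr, dtype=complex)
for m in range(ns):
    err = np.vdot(w, Z[:, m]) - x[m]          # epsilon_m = w^H z_m - x_m
    w = w - mu * 2 * np.conj(err) * Z[:, m]   # Equation (9.151)

# After many samples the LMS weights wander near the MMSE beamformer.
assert np.linalg.norm(w - w_mmse) < 0.3 * np.linalg.norm(w_mmse)
```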
9.6 Multiple-antenna multiuser detector
The underlying assumption in this chapter is that multiple users are transmitting simultaneously in the same band. Often the transmitters use spreading sequences to reduce multiple-access interference. This interference
could be from multiple antennas on a single transmit node, or from multiple
nodes in a network. The significant difference between receivers discussed in this
chapter and those discussed previously is that here the separation in temporal
structure between users is exploited in addition to the differences in spatial re-
sponses. Often it is assumed that these systems are employing a direct-sequence
spread-spectrum technique.
There is some inconsistency in the use of “linear” versus “nonlinear” in dis-
cussions regarding multiple-user receivers. Often these receivers are implemented
as iterative receivers in which the receiver operates on the same block of data
multiple times. An iterative receiver is not linear in some sense. However, if the receiver employs a linear operator applied to some space, then it is generally denoted a linear receiver. In this case, the various receive states are separated by hyperplanes in some high-dimensional space. Conversely, nonlinear receivers separate receive states in both angle and amplitude [184]. To further
complicate this discussion, receivers that exploit spatial and temporal structures
simultaneously are linear in each domain.
There is a significant body of literature dedicated to multiuser detectors (MUD).
A large portion of this literature is dedicated to systems with a single receive
antenna and multiple cochannel users with single transmit antennas. Significant
contributions to this area were made in References [324, 323].
Multiple-antenna multiuser detectors (also denoted multiple-channel multiuser
detectors or MCMUD) have been discussed by a number of authors [98, 38, 335].
While these concepts were developed for cellular networks, they can be applied
to MIMO receivers. In particular, they are well matched to bit-interleaved, coded
modulation approaches [28].
Maximizing the likelihood over the nuisance parameters R and A yields the compressed likelihood

\max_{R, A} p(Z|X; R, A) = \left( \frac{πe}{n_s} \right)^{-n_s n_r} \left| Z P_X^⊥ Z^† \right|^{-n_s} . (9.152)

By substituting this form for the estimate of the channel Â, the probability density is given by

p(Z|X; R, Â) = \frac{1}{|R|^{n_s} π^{n_s n_r}} e^{−tr\{ R^{-1} (Z P_X^⊥)(Z P_X^⊥)^† \}} . (9.156)

Similar to the result found in Equation (9.60), by maximizing the probability density with the above substitution for the nuisance parameter A, for an arbitrary parameter of the interference-plus-noise covariance matrix R, the estimate R̂ is given by

R̂ = \frac{1}{n_s} Z P_X^⊥ Z^† . (9.157)

By substituting this result into the probability density, only the received data matrix and the possible transmitted signals are left,

\log p(Z|X; R̂, Â) = const − n_s \log \left| Z P_X^⊥ Z^† \right| , (9.158)

so that maximizing the likelihood over X is equivalent to minimizing the determinant. Although it is theoretically possible to use the form

Z P_X^⊥ Z^† (9.159)
directly for demodulation, this is computationally very expensive. A more practi-
cal procedure is to pursue an iterative receiver. One approach is to choose a basis
and optimize along each axis of the basis in turn in an alternating projections
optimization approach [76]. By using the result from the previous optimization
step and then optimizing along the next axis, the optimization climbs towards a
peak that is hopefully the global optimum. This iterative receiver can achieve the
maximum-likelihood performance; however, because the optimization criterion is
not guaranteed to be convex, convergence to the global maximum is not guar-
anteed. For many applications, it can be shown empirically that the probability
of convergence to the global maximum is sufficiently high to warrant the significant reduction in computational complexity. A natural choice for bases is the signal
transmitted by each individual transmitter. Consequently, the receiver cycles
through the various rows of the transmitted signal matrix X.
The transmitted signal matrix X ∈ C^{n_t × n_s} can be decomposed into the mth row, denoted here as x ∈ C^{1 × n_s}, and the matrix with the mth row removed, X_m ∈ C^{(n_t−1) × n_s}. We can construct a reordered version X̃ of the matrix X, given by

X̃ = \begin{pmatrix} x \\ X_m \end{pmatrix} . (9.160)
Because row-space projection operators are invariant to reordering of the rows (indeed, to any unitary transformation across the rows), the projection matrices for X and X̃ are equal, P_X^⊥ = P_{X̃}^⊥, where P_{X̃}^⊥ = I − X̃^† (X̃ X̃^†)^{-1} X̃. The matrix P_{X_m}^⊥ that projects onto the subspace orthogonal to the row space of X_m can be factored into the form

P_{X_m}^⊥ = I − X_m^† (X_m X_m^†)^{-1} X_m = U^† U , (9.161)

where the rows of U ∈ C^{(n_s − n_t + 1) × n_s} form an orthonormal basis for the complement of the row space of X_m. By using the definitions

Z_U = Z U^†
x_U = x U^† , (9.162)
the data and the signal are projected onto a basis orthogonal to the estimates of the signals radiated from the other transmitters. It is useful to note that the quadratic form is the same in the original and the projected bases,

Z P_X^⊥ Z^† = Z (P_{X_m} + P_{X_m}^⊥) P_X^⊥ (P_{X_m} + P_{X_m}^⊥) Z^†
= Z P_{X_m}^⊥ P_X^⊥ P_{X_m}^⊥ Z^†
= Z P_{X_m}^⊥ \left[ I − X̃^† (X̃ X̃^†)^{-1} X̃ \right] P_{X_m}^⊥ Z^† , (9.163)

where the terms involving P_{X_m} vanish because the row space of X_m lies inside the row space of X̃. Because P_{X_m}^⊥ X_m^† = 0, only the row x of X̃ survives the projection, and the block-matrix inverse collapses to the scalar (x P_{X_m}^⊥ x^†)^{-1}, so that

Z P_X^⊥ Z^† = Z P_{X_m}^⊥ Z^† − Z P_{X_m}^⊥ x^† (x P_{X_m}^⊥ x^†)^{-1} x P_{X_m}^⊥ Z^†
= Z_U Z_U^† − Z_U x_U^† (x_U x_U^†)^{-1} x_U Z_U^†
= Z_U Z_U^† − Z_U P_{x_U} Z_U^†
= Z_U P_{x_U}^⊥ Z_U^† . (9.165)
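The projection identity in Equation (9.165) can be verified numerically; the sketch below (random BPSK rows, an SVD-derived basis U, and NumPy assumed; names and dimensions are ours) checks the equality of the two quadratic forms.

```python
import numpy as np

rng = np.random.default_rng(7)
nr, nt, ns = 4, 3, 16   # illustrative dimensions

X = np.sign(rng.standard_normal((nt, ns))) + 0j   # known symbol matrix (BPSK rows)
Z = rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))

def p_perp(M):
    """Projector onto the orthogonal complement of the row space of M."""
    return np.eye(M.shape[1]) - M.conj().T @ np.linalg.solve(M @ M.conj().T, M)

m = 0
x = X[m:m + 1, :]                 # the m-th row
Xm = np.delete(X, m, axis=0)      # X with the m-th row removed

# Orthonormal basis U for the complement of the row space of Xm:
# the last ns-(nt-1) right singular vectors of Xm span that complement.
_, _, Vh = np.linalg.svd(Xm, full_matrices=True)
U = Vh[nt - 1:, :]

Z_U = Z @ U.conj().T
x_U = x @ U.conj().T

lhs = Z @ p_perp(X) @ Z.conj().T
rhs = Z_U @ p_perp(x_U) @ Z_U.conj().T   # Equation (9.165)
assert np.allclose(lhs, rhs)
```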
The statistic optimized at each step can then be expressed as

1 − \frac{w^† Z_U x_U^†}{n_s} , (9.167)

where

w = R̂_U^{-1} â ,
R̂_U ≡ \frac{1}{n_s} Z_U Z_U^† = \frac{1}{n_s} Z P_{X_m}^⊥ Z^† ,
â = Z_U x_U^† (x_U x_U^†)^{-1} = Z P_{X_m}^⊥ x^† (x P_{X_m}^⊥ x^†)^{-1} . (9.168)
Consider the example of a single transmitter with array response v observed in noise,

Z = v s + N . (9.170)

With total noise-normalized received power P, the eigenvalues of the true receive covariance matrix are

λ_1\{Q\} = P + 1
λ_2\{Q\} = · · · = λ_m\{Q\} = 1 . (9.173)
[Figure 9.2: eigenvalue (dB) versus eigenvalue number, comparing the estimated eigenvalues ("Estimated λ") with those of the true covariance matrix ("Covariance λ").]
For n_s ≥ n_r, drawing an estimated covariance matrix with a zero eigenvalue has zero probability. In particular, while the "small" eigenvalues of the real covariance matrix are all 1, the noise eigenvalues of the estimated covariance matrix (ignoring any mixing with the received signal) are given by the eigenvalues of a Wishart distribution discussed in Section 3.5. An example, assuming n_s = 16 samples and n_r = 8 receive antennas, of the difference between the eigenvalues is displayed in Figure 9.2. The total noise-normalized received signal power is 10.
Depending upon the algorithm in which the eigenvalues will be used, the dif-
ference between the small noise eigenvalues of the estimated versus the real
covariance matrix may or may not be important. If the covariance matrix is in-
verted, then the small eigenvalues of the estimate can have a significant effect.
This effect can motivate the use of regularization to limit the small eigenvalues.
The range of eigenvalues can be much more dramatic in the case of space-time
covariance matrices that are temporally oversampled.
One approach to regularizing matrices is to perform an eigenvalue decomposition of the matrix of interest Q,

Q = U D U^† , (9.175)

and to floor the eigenvalues at some level a set proportional to the largest eigenvalue by a small loading fraction ε,

\{D̃\}_{m,m} = \begin{cases} \{D\}_{m,m} ; & \{D\}_{m,m} > a \\ a ; & \text{otherwise} \end{cases}
a = ε \, λ_{max}\{Q\} , (9.176)

so that the regularized matrix is given by

Q̃ = U D̃ U^† . (9.177)

A simpler alternative is diagonal loading,

Q̃ = Q + ε \, tr\{Q\} I . (9.178)
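Both regularizations of Equations (9.176)-(9.178) can be sketched as follows (a noise-only sample covariance with n_s = 16, n_r = 8 as in Figure 9.2; NumPy assumed; the loading fraction `eps` is an illustrative choice of ours, not a recommendation from the text).

```python
import numpy as np

rng = np.random.default_rng(8)
nr, ns = 8, 16

# Estimated covariance from few samples; the true covariance is the identity.
N = (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
Q = N @ N.conj().T / ns

eps = 0.05  # loading fraction (illustrative)

# Eigenvalue floor, Equations (9.175)-(9.177).
d, U = np.linalg.eigh(Q)
a = eps * d.max()
Q_floor = (U * np.maximum(d, a)) @ U.conj().T

# Diagonal loading, Equation (9.178).
Q_load = Q + eps * np.trace(Q).real * np.eye(nr)

# Both regularizations raise the smallest eigenvalue, taming the inverse.
assert np.linalg.eigvalsh(Q_floor).min() >= a - 1e-12
assert np.linalg.eigvalsh(Q_load).min() >= d.min() + eps * np.trace(Q).real - 1e-12
```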
Problems
9.1 At high SNR, compare the symbol error performance of ML and MAP decoding for an unencoded QPSK constellation under the assumption that the constellation points {±1, ±1} have
(a) equal symbol probability: p{±1,±1} = 1/4,
(b) symbol probabilities defined by
p{1,±1} = 2/6
p{−1,±1} = 1/6 .
9.2 At SNR of 20 dB per receive antenna (can assume high SNR), compare the
symbol error performance for a MMSE and MI beamformer for an unencoded
QPSK constellation with equal probabilities for each symbol. Assume a four-
antenna receiver in a line-of-sight environment in the far field with a signal of
interest, and a single interferer of arbitrary power, all with known channels.
Assume that the normalized inner product between the array responses of the signal of interest and the interferer is 1/√2.
9.4 By employing the Wirtinger calculus, show that Equation (9.74) is the least
squared error solution for the estimator of X.
9.5 Evaluate the least-squares error beamformer which minimizes the Frobe-
nius norm squared of the error matrix E defined by
E = W† Z − X , (9.179)
and show that it provides the same solution as the approximate MMSE beam-
former found in Equation (9.74).
9.6 Extend the result in Equation (9.125) to include external Gaussian inter-
ference. Show that performance is still bounded by the uninformed transmitter
capacity.
9.7 Show that the LMS beamformer solution converges to the MMSE solution
in the limit of a large number of samples.
9.8 For a four-antenna receiver observing a known signal with 0 dB SNR per
receive antenna in a block-fading i.i.d. Gaussian channel that is static for at least
50 samples over which the beamformers are estimated, numerically evaluate the
average (over many channel draws) estimated signal error as a function of samples
1 to 50 for
(a) RLS
(b) LMS
(c) estimated MMSE using blocks of 10 samples,
where the RLS and LMS have no knowledge of the channel at the first sample.
9.9 For a four-antenna receiver observing a known signal with 0 dB SNR per
receive antenna with a 10 dB INR per receive antenna interferer in a block-
fading i.i.d. Gaussian channel that is static for at least 50 samples over which
the beamformers are estimated, numerically evaluate the average (over many
channel draws) estimated signal error as a function of samples 1 to 50 for
(a) RLS
(b) LMS
(c) estimated MMSE using blocks of 10 samples,
where the RLS and LMS have no knowledge of the channel at the first sample.
9.10 For a 10-antenna receiver observing a known signal with 0 dB SNR per
receive antenna in a block-fading i.i.d. Gaussian channel that is static for the
period of observation over which the beamformers are estimated, numerically
evaluate the average (over many channel draws) estimated signal error using the
estimated MMSE beamformer of the form

w = \left( \frac{Z Z^†}{n_s} + ε I \right)^{-1} \frac{Z X^†}{n_s} , (9.180)

using blocks of five samples, as a function of the diagonal-loading parameter ε for the form described in Equation (9.178).
10 Dispersive and doubly dispersive channels
z(t) ↔ Z(f )
s(t) ↔ S(f )
n(t) ↔ N (f )
h̃(t) ↔ H̃(f ) . (10.2)
10.1 Discretely sampled channel issues
While nearly all modern communication systems use sampled signals, there are
some subtleties to be considered. As an example, consider a physical channel with
an arbitrary delay relative to signal sampling. In general, it will take an infinite
number of channel taps to represent the channel. For many problems, this will
have little effect on the analysis because it will only take a few channel taps to
provide a sufficiently accurate approximation. However, for some problems that
require precise representations (often these are found in theoretical analyses),
misleading results can be generated.
The approaches used to implement delay and Doppler offsets observed through-
out this chapter ignore the effects of noncommutative delay and Doppler opera-
tors. Because a dense set of delay and Doppler taps is assumed in the processing,
the approach is not particularly sensitive to this oversight. However, when one is
attempting to use sparse sets of delay and Doppler taps, more care is required.
For a delay shift d and Doppler-frequency shift f , the effects on a signal s(t)
are sometimes approximated by the operation
ei 2π f t s(t − d) . (10.8)
Two assumptions were used in this formation. First, the velocity difference is
small enough that the frequency offset can be described by a frequency shift.
Second, the delay-shifting operation is applied before the frequency-shifting operation. This choice was arbitrary. A useful model for considering the frequency shift is to induce it via the time dilation t → (1 + ε) t. With independent local oscillators, the time dilation is caused by one clock simply running faster than another. Consider the operators T_d\{·\} and F_ε\{·\} that delay time by d and dilate time by the factor 1 + ε, respectively:¹

T_d\{s(t)\} = s(t − d)
F_ε\{s(t)\} = s((1 + ε) t) . (10.9)

However, if the product of the delay spread and the Doppler-frequency spread is small, then the difference between the two operator orderings is small.

¹ We are not defining time dilation here in the special relativity sense.

10.3 Effect of frequency-selective fading
Here we consider a static channel with delay spread. For multiple-antenna receivers, delay spread can cause the rank of the receive spatial covariance matrix to increase. To demonstrate this effect, consider the following simple two-tap channel model for the received signal z(t) ∈ C^{n_r × 1} as a function of time t impinging upon an array,

z(t) = h_0 s(t) + h_τ s(t − τ) + n(t) ,

where h_0 and h_τ are the array responses at the two delays. The units of power are selected so that the spatial covariance of the thermal noise is given by

⟨n(t) n^†(t)⟩ = I , (10.13)

so that the noise power per receive antenna is 1. The receive spatial covariance matrix Q ∈ C^{n_r × n_r} is given by

Q = ⟨z(t) z^†(t)⟩
= h_0 h_0^† ⟨s(t) s^*(t)⟩ + h_0 h_τ^† ⟨s(t) s^*(t − τ)⟩ + h_τ h_0^† ⟨s(t − τ) s^*(t)⟩ + h_τ h_τ^† ⟨s(t − τ) s^*(t − τ)⟩ + I
= P h_0 h_0^† + P ρ_τ h_0 h_τ^† + P ρ_τ^* h_τ h_0^† + P h_τ h_τ^† + I , (10.14)

where P = ⟨|s(t)|^2⟩ is the received power per tap and ρ_τ is the normalized signal autocorrelation at lag τ.
If the two taps are unresolved so that ρ_τ ≈ 1, the two paths add coherently; consequently, even though there are multiple channel paths, there is a single large signal eigenvalue. Conversely, for taps that are well separated in delay relative to the signal's correlation time,

ρ_τ ≈ 0 . (10.19)
From Equation (10.14), the receiver spatial covariance matrix Q is then approximately given by

Q ≈ P h_0 h_0^† + P h_τ h_τ^† + I .

The mth eigenvalue of the receiver spatial covariance matrix is denoted λ_m\{Q\}. For two-tap channels with taps that are well separated in delay, the eigenvalues are given by

λ_1\{Q\} = 1 + P \, \frac{ \|h_0\|^2 + \|h_τ\|^2 + \sqrt{ (\|h_0\|^2 − \|h_τ\|^2)^2 + 4 |h_0^† h_τ|^2 } }{2}

λ_2\{Q\} = 1 + P \, \frac{ \|h_0\|^2 + \|h_τ\|^2 − \sqrt{ (\|h_0\|^2 − \|h_τ\|^2)^2 + 4 |h_0^† h_τ|^2 } }{2}

λ_m\{Q\} = 1 ; m ∈ \{3, . . . , n_r\} , (10.22)
where Equation (2.85) has been employed. For notational convenience, we will make the following definitions: the normalized inner product between the array responses at the two delays is denoted η, and the ratio of the norms of the array responses is denoted γ,

η = \frac{|h_0^† h_τ|}{\|h_0\| \|h_τ\|}

γ = \frac{\|h_τ\|}{\|h_0\|} . (10.23)

By using these definitions, the eigenvalues of the receive spatial covariance matrix are given by

λ_1\{Q\} = 1 + P \|h_0\|^2 \, \frac{ 1 + γ^2 + \sqrt{ (1 − γ^2)^2 + 4 γ^2 η^2 } }{2}

λ_2\{Q\} = 1 + P \|h_0\|^2 \, \frac{ 1 + γ^2 − \sqrt{ (1 − γ^2)^2 + 4 γ^2 η^2 } }{2}

λ_m\{Q\} = 1 ; m ∈ \{3, . . . , n_r\} . (10.24)
In the special case of equal array-response norms so that γ = 1, the first two eigenvalues reduce to

λ_1\{Q\} = 1 + P \|h_0\|^2 (1 + η)
λ_2\{Q\} = 1 + P \|h_0\|^2 (1 − η) . (10.25)

The ratio of the second to the first eigenvalue in the high-power limit is given by

\frac{λ_2\{Q\}}{λ_1\{Q\}} ≈ \frac{1 − η}{1 + η} . (10.26)

In another special case, if the array-response norms are not equal but the array responses are approximately orthogonal so that η ≈ 0, then the first two eigenvalues are given by

λ_1\{Q\} = 1 + P \|h_0\|^2
λ_2\{Q\} = 1 + P \|h_0\|^2 γ^2 . (10.27)
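The closed-form eigenvalues of Equation (10.24) can be checked against a direct eigendecomposition; the sketch below (random array responses and NumPy assumed; names are ours) does so for a two-tap channel with ρ_τ = 0.

```python
import numpy as np

rng = np.random.default_rng(9)
nr, P = 6, 10.0   # receive antennas and per-tap power (illustrative)
h0 = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)
ht = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)

# Two well-separated taps (rho_tau = 0): Q = P h0 h0^H + P ht ht^H + I.
Q = P * np.outer(h0, h0.conj()) + P * np.outer(ht, ht.conj()) + np.eye(nr)
lam = np.sort(np.linalg.eigvalsh(Q))[::-1]

n0 = np.vdot(h0, h0).real
eta = abs(np.vdot(h0, ht)) / np.sqrt(n0 * np.vdot(ht, ht).real)  # Eq. (10.23)
gamma2 = np.vdot(ht, ht).real / n0

root = np.sqrt((1 - gamma2)**2 + 4 * gamma2 * eta**2)
lam1 = 1 + P * n0 * (1 + gamma2 + root) / 2   # Equation (10.24)
lam2 = 1 + P * n0 * (1 + gamma2 - root) / 2

assert np.isclose(lam[0], lam1)
assert np.isclose(lam[1], lam2)
assert np.allclose(lam[2:], 1.0)   # remaining eigenvalues are noise level
```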
From Equation (8.2), the received data vector z(t) using the standard flat-fading MIMO channel signal model is given by

z(t) = H s(t) + n(t) . (10.28)
A dispersive channel is one that has temporally resolvable delay spread. This
induces frequency-selective channel attenuation. As an extension of Equation
(10.7), for a bandwidth-limited signal and a channel with a finite delay range, the
frequency-selective channel characteristics are incorporated by including channel
taps indicated by delay τm ,
z(t) = \sum_{m=1}^{n_d} H_{τ_m} s(t − τ_m) + n(t) , (10.30)
where Hτ m indicates the channel matrix at the mth delay, and τm the nd re-
solvable delays. In general, a set of physical delay offsets that are not matched
to the regularly sampled delay offsets will require an arbitrarily large number of
sample delays to represent the channel perfectly. However, given a moderate set
of sample delays τm , a reasonably accurate frequency-selective channel can be
constructed.
For a channel represented by nd delays, the space-time channel matrix H̃ ∈
Cn r ×(n t ·n d ) is given by
H̃ = ( H_{τ_1} \; H_{τ_2} \; · · · \; H_{τ_{n_d}} ) . (10.31)
Similarly, the stacked vector of the transmitted signal at the n_d delays, s̃(t) ∈ C^{(n_t · n_d) × 1}, is given by

s̃(t) = \begin{pmatrix} s(t − τ_1) \\ s(t − τ_2) \\ \vdots \\ s(t − τ_{n_d}) \end{pmatrix} . (10.32)

Consequently, the received signal is given by

z(t) = \sum_m H_{τ_m} s(t − τ_m) + n(t) = H̃ s̃(t) + n(t) .
For a regularly sampled signal with sample period T_s, a space-time data matrix Z̃ ∈ C^{(n_r · n_δ) × n_s} is constructed,

Z̃ = \begin{pmatrix} Z_{0·δτ} \\ Z_{1·δτ} \\ Z_{2·δτ} \\ \vdots \\ Z_{(n_δ − 1)·δτ} \end{pmatrix} , (10.35)

where Z_{k·δτ} denotes the data matrix delayed by k·δτ. As a reminder, we use n_δ here rather than n_d because n_δ indicates the number of delays used in the processing rather than the number required to represent the channel with some accuracy.
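The construction of Equation (10.35), and the way delay stacking captures the delay-spread structure missed by a purely spatial covariance, can be sketched as follows (a noise-free two-tap channel at symbol-spaced delays and NumPy assumed; all names are ours, and noise is omitted purely for clarity of the rank counts).

```python
import numpy as np

rng = np.random.default_rng(10)
nr, ns, n_delta = 4, 400, 3   # antennas, samples, processing delays (illustrative)

s = np.sign(rng.standard_normal(ns + 8)) + 0j   # transmitted symbol stream
h0 = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)
h1 = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)

# Two-tap channel, z[k] = h0 s[k] + h1 s[k-1] (noise omitted for clarity).
z = np.outer(h0, s[1:]) + np.outer(h1, s[:-1])

# Space-time data matrix: stack the data at n_delta successive sample delays.
Z_tilde = np.vstack([z[:, n_delta - k : n_delta - k + ns] for k in range(n_delta)])

Q_spatial = z[:, :ns] @ z[:, :ns].conj().T / ns
Q_st = Z_tilde @ Z_tilde.conj().T / ns

# The spatial covariance already has rank 2 from the delay spread; the
# space-time covariance resolves the delay structure (rank n_delta + 1 here),
# a small fraction of its (n_r * n_delta)-dimensional space.
assert np.linalg.matrix_rank(Q_spatial) == 2
assert np.linalg.matrix_rank(Q_st) == n_delta + 1
```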
Here there is potentially some confusion because there is temporal sampling both along the traditional temporal dimension, which is encoded along the rows of the space-time data matrix Z̃, and in delay, which is mixed with the receive antennas along the columns of Z̃.
One of the reasons that this structure is interesting is that it can be used to compensate for the eigenvalue spread observed in the spatial covariance matrix in environments with resolvable delay spread. For the example of a single transmitter in an environment with resolvable delay spread, the fraction of non-noise-level eigenvalues approaches 1/n_r as the number of delays and samples increases.
is given by the convolution of the transmitted signal with the channel,
\[ z(t - \tau) = \int dq \; h(q) \, s(t - \tau - q) , \tag{10.39} \]
where we have abused the notation somewhat, such that here hf (f ) is the Fourier
transform of h(t), and sf (f ) is the Fourier transform of s(t). Implicit in this for-
mulation is the implementation of an infinite-dimensional delay space, which is
an approximation to the case in which the space-delay matrix is very large com-
pared with the delay spread of the channel. In evaluating the space-time covari-
ance matrix, the expectation is evaluated over time and draws of the transmitted
signal, but the channel as a function of delay is assumed to be deterministic. In
a continuous analog to Equation (10.38), the n_r \times n_r cross-covariance matrix associated with the frequencies \{f, f'\} is given by the outer product of the inverse Fourier transform of the space-delay array response,
\[ \left\langle \mathcal{F}_\tau^{-1}\{ z(t - \tau) \} \left( \mathcal{F}_{\tau'}^{-1}\{ z(t - \tau') \} \right)^\dagger \right\rangle , \tag{10.41} \]
where the expectation over the exponential produces a delta function, under the
assumption that the signal is uncorrelated at different frequencies. As a counter
example, cyclostationary signals would have some correlation across frequencies.
The resulting covariance is block diagonal with the outer product of channel responses h_f(f)\, h_f^\dagger(f) at each frequency. For the finite case, with n_\delta delays, the corresponding space-time covariance matrix is of size n_\delta \cdot n_r \times n_\delta \cdot n_r. In the limit of n_\delta becoming large, because each block is a rank-1 n_r \times n_r matrix, the rank of the space-time covariance is given by n_\delta. Consequently, the fraction of the eigenvalues that are not zero is bounded by one over the number of receive channels, 1/n_r.
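The rank accounting above can be checked numerically. This illustrative NumPy sketch (parameters and the noise-free BPSK source are assumptions for the demonstration) drives a single source through an n_d-tap single-transmitter channel, stacks n_\delta delays, and counts the nonzero eigenvalues of the space-time covariance. Circular delays are used so the shift structure is exact:

```python
import numpy as np

rng = np.random.default_rng(2)
n_r, n_d, n_delta, n_s = 4, 2, 8, 4096  # antennas, channel taps, processing delays

h = rng.standard_normal((n_d, n_r)) + 1j * rng.standard_normal((n_d, n_r))
s = (2 * rng.integers(0, 2, n_s) - 1).astype(complex)   # noise-free BPSK source

# Circular delays keep the shift structure exact (no edge effects)
z = sum(np.outer(h[m], np.roll(s, m)) for m in range(n_d))

Z_tilde = np.vstack([np.roll(z, d, axis=1) for d in range(n_delta)])
R = Z_tilde @ Z_tilde.conj().T / n_s    # space-time sample covariance

eig = np.sort(np.linalg.eigvalsh(R))[::-1]
rank = int(np.sum(eig > 1e-8 * eig[0]))
fraction = rank / (n_r * n_delta)
```

The stacked data spans n_d + n_\delta − 1 distinct delayed copies of the source, so the fraction of nonzero eigenvalues is (n_d + n_\delta − 1)/(n_r n_\delta), which approaches 1/n_r as n_\delta grows.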
on the ability for a receiver to decode other signals typically decreases. Thus,
by increasing the number of delays in processing, the typical performance of a
receiver that is observing multiple signals improves; however, this comes at the
cost of an increase in computational complexity.
\[ \hat{S} = \tilde{W}^\dagger \tilde{Z} , \tag{10.47} \]
such that
\[ \left\langle \left\| \tilde{W}^\dagger \tilde{Z} - S \right\|_F^2 \right\rangle \tag{10.48} \]
is minimized. For the sample covariance matrix
\[ \tilde{Z} \tilde{Z}^\dagger \tag{10.51} \]
to be invertible, the number of samples must satisfy
\[ n_s \ge n_r \, n_\delta . \tag{10.52} \]
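A least-squares space-time equalizer along the lines of Equations (10.47)–(10.52) (and anticipating the explicit solution in Equation (10.79)) can be sketched as follows. The BPSK training sequence, the noise level, and the use of circular delays are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n_r, n_d, n_delta, n_s = 4, 2, 3, 2000  # antennas, channel taps, equalizer delays

S = (2 * rng.integers(0, 2, (1, n_s)) - 1).astype(complex)  # BPSK training symbols
H = rng.standard_normal((n_d, n_r, 1)) + 1j * rng.standard_normal((n_d, n_r, 1))

# Frequency-selective channel plus noise (circular delays for simplicity)
Z = sum(H[m] @ np.roll(S, m, axis=1) for m in range(n_d))
Z = Z + 0.05 * (rng.standard_normal(Z.shape) + 1j * rng.standard_normal(Z.shape))

# Stack advanced copies so the equalizer sees the delay spread of each symbol
Z_tilde = np.vstack([np.roll(Z, -d, axis=1) for d in range(n_delta)])

# Least-squares space-time equalizer: W = (Z Z^H)^{-1} (Z S^H)
W = np.linalg.solve(Z_tilde @ Z_tilde.conj().T, Z_tilde @ S.conj().T)
S_hat = W.conj().T @ Z_tilde
ber = float(np.mean(np.sign(S_hat.real) != S.real))
```

With n_s well above n_r n_\delta the sample covariance is comfortably invertible, and the equalized bit decisions are essentially error-free at this SNR.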
Figure (OFDM transmitter chain): data bits (e.g., 01101110...) are coded and modulated, transformed by an IFFT, and extended with a cyclic prefix to form the transmitted signal.
Thus, each symbol is placed in its own subcarrier. The approximate width of
a bin in the frequency (which is approximate because each subcarrier has the
spectral shaping of a sinc function) is given by the bandwidth of the complex
baseband signal divided by the number of samples B/ns . If the width of a bin is
small compared with the inverse of the standard deviation of the delay spread \sigma_d,
\[ \frac{B}{n_s} \ll \frac{1}{\sigma_d} , \tag{10.54} \]
then the frequency-selective fading will typically move slowly across the fre-
quency bins. Consequently, representing the channel as a complex attenuation
in each frequency bin is a good approximation. In this regime, performing nar-
rowband processing within each bin works reasonably well. Approaches to ad-
dress doubly dispersive channels (discussed later in this chapter) using OFDM
waveforms have also been considered [200, 277].
At the OFDM receiver, an FFT is performed upon a block of received data Z \in \mathbb{C}^{n_r \times n_s} to attempt to recover an estimate of the original frequency-domain signal.
However, because of temporal synchronization errors and because of multipath
delay spread, the receiver cannot extract the exact portion of data that was
transmitted. This mismatch in temporal alignment causes degradation in the
10.5 Frequency-selective channel compensation 355
orthogonality assumption. Noting that a cyclic shift at the input of the FFT
induces a benign phase ramp across the output, the adverse effects caused by
delay spread and synchronization error can be mitigated by adding a cyclic
prefix. A portion (ncp samples) of the time-domain signal from the beginning
of the signal is added to the end of the signal at the transmitter, so that the
transmitted signal Y \in \mathbb{C}^{n_t \times (n_s + n_{cp})} is given by
\[ Y = \left( X \;\; x_1 \;\; x_2 \;\; \cdots \;\; x_{n_{cp}} \right) . \tag{10.55} \]
The transmitted signal with cyclic prefix has essentially the same form as the transmitted signal without the cyclic prefix,
\[ \{Y\}_{k,n} = \frac{1}{\sqrt{n_s}} \sum_{m=1}^{n_s} \{S\}_{k,m} \; e^{\, i 2\pi \frac{(m-1)(n-1)}{n_s}} \quad \forall \; n \in \{1, 2, \cdots, n_s + n_{cp}\} . \tag{10.57} \]
The sum over m can be considered the sum over subcarriers that produces the
final time-domain signal. The received signal in the time domain Z for the nth
sample in time and the jth receive antenna is then given by
\[ \{Z\}_{j,n} \approx \frac{1}{\sqrt{n_s}} \sum_{m=1}^{n_s} \sum_{k=1}^{n_t} \{H_m\}_{j,k} \{S\}_{k,m} \; e^{\, i 2\pi \frac{(m-1)(n-1)}{n_s}} + \{N\}_{j,n} , \tag{10.58} \]
where H_m \in \mathbb{C}^{n_r \times n_t} is the channel matrix for the mth subcarrier. The
result is an approximation because the model of a flat-fading channel within a
subcarrier is approximate.
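For a SISO channel shorter than the cyclic prefix, the per-subcarrier model becomes exact. This NumPy sketch uses the common convention of copying the tail of the time-domain symbol to the front; the 3-tap channel is a hypothetical example:

```python
import numpy as np

rng = np.random.default_rng(4)
n_s, n_cp = 64, 8                               # subcarriers, cyclic-prefix length
h = np.array([1.0, 0.5 + 0.3j, 0.2j])           # hypothetical 3-tap SISO channel

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
S = qpsk[rng.integers(0, 4, n_s)]               # frequency-domain QPSK symbols
x = np.fft.ifft(S) * np.sqrt(n_s)               # time-domain OFDM symbol
x_cp = np.concatenate([x[-n_cp:], x])           # cyclic prefix: tail copied to front

y = np.convolve(x_cp, h)[: n_cp + n_s]          # channel as linear convolution, no noise
y_body = y[n_cp:]                               # receiver discards the prefix samples

Z_f = np.fft.fft(y_body) / np.sqrt(n_s)         # back to the frequency domain
H_f = np.fft.fft(h, n_s)                        # per-subcarrier channel coefficients
S_hat = Z_f / H_f                               # one complex gain per frequency bin
```

Because the prefix turns linear convolution into circular convolution, dividing by a single complex coefficient per bin recovers the symbols exactly in this noise-free case.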
The significant advantage of OFDM is the implicit frequency channelization
that, given a sufficient density of frequency bins, enables narrowband process-
ing within each channel. In addition, by employing FFTs, the computational complexity grows only with the logarithm of the number of frequency channels per signaling chip (it is of order n_s \log n_s for the whole block of n_s chips). This increase in computational complexity is much slower than that of most equalization approaches for single-carrier systems.
One of the significant disadvantages of the OFDM approach is that the trans-
mitted signal has a large peak-to-average power ratio, as discussed in Section
18.5. Ignoring the possibility of receivers that can compensate for nonlinearities,
which would be very computationally intensive, the large peak-to-average power
ratio imposes significant linearity requirements on the transmit amplifier. Typ-
ically, improved transmitter amplifier linearity comes at the expense of greater
power dissipation. The large peak-to-average ratio can be understood by noting
that the time-domain signal is constructed by adding a number of indepen-
dent frequency-domain symbols together. Even if the starting frequency-domain
symbols have a constant modulus, by the central limit theorem, the limiting
transmitted signal distribution is Gaussian. While not likely, occasionally values
drawn from a Gaussian distribution can be several times larger than the standard
deviation. Thus, the transmitted signal has a large peak-to-average power ratio.
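The central-limit-theorem argument for the large peak-to-average power ratio is easy to check empirically. This sketch (subcarrier count and trial count are illustrative) measures the PAPR of random QPSK-loaded OFDM symbols:

```python
import numpy as np

rng = np.random.default_rng(5)
n_s, n_trials = 256, 1000
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

papr_db = np.empty(n_trials)
for i in range(n_trials):
    S = qpsk[rng.integers(0, 4, n_s)]       # constant-modulus frequency symbols
    x = np.fft.ifft(S)                      # time-domain OFDM symbol
    p = np.abs(x) ** 2
    papr_db[i] = 10 * np.log10(p.max() / p.mean())

median_papr = float(np.median(papr_db))
```

A single-carrier constant-modulus chip stream would have 0 dB PAPR; here the median lands several decibels higher, consistent with the near-Gaussian time-domain samples.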
The doubly dispersive channel model includes the tap delay characteristics of the
frequency-selective channel model and allows for the model to vary as a function
of time. A general model for the received signal z(t) ∈ Cn r ×1 as a function of time
t that extends the static sampled delay channel model and allows time-varying
channel coefficients is given by
\[ z(t) = \sum_{m=1}^{n_d} H_{\tau_m}(t) \, s(t - \tau_m) + n(t) . \tag{10.59} \]
Similar in form to the static case, the received signal z(t) is given by
\[ z(t) = \sum_{m} H_{\tau_m}(t) \, s(t - \tau_m) + n(t) . \]
frequency \delta_\nu, is constructed by
\[ \tilde{Z} = \begin{pmatrix} Z_{0\,\delta_\tau,\, 0\,\delta_\nu} \\ \vdots \\ Z_{(n_\delta - 1)\,\delta_\tau,\, 0\,\delta_\nu} \\ Z_{0\,\delta_\tau,\, 1\,\delta_\nu} \\ \vdots \\ Z_{(n_\delta - 1)\,\delta_\tau,\, (n_\nu - 1)\,\delta_\nu} \end{pmatrix} , \tag{10.71} \]
where the data matrix for distortions of a particular delay offset τ and frequency
offset ν is given by
\[ Z_{\tau,\nu} = \left( e^{i 2\pi \nu \cdot 0\, T_s} z(0\, T_s - \tau) \;\; e^{i 2\pi \nu \cdot 1\, T_s} z(1\, T_s - \tau) \;\; e^{i 2\pi \nu \cdot 2\, T_s} z(2\, T_s - \tau) \;\; \cdots \;\; e^{i 2\pi \nu [n_s - 1] T_s} z([n_s - 1] T_s - \tau) \right) . \tag{10.72} \]
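The delay-and-Doppler-shifted blocks of Equation (10.72) and their stacking in Equation (10.71) can be sketched as follows. The circular delay and the specific Doppler step are illustrative simplifications:

```python
import numpy as np

rng = np.random.default_rng(6)
n_r, n_s = 4, 128
Ts = 1.0                                    # sample period
d_nu = 1.0 / n_s                            # Doppler-frequency step (hypothetical)
n_delta, n_nu = 3, 2                        # delay and Doppler hypotheses used

Z = rng.standard_normal((n_r, n_s)) + 1j * rng.standard_normal((n_r, n_s))

def shifted_block(Z, tau_idx, nu):
    """Z_{tau,nu}: delay Z by tau_idx samples, then apply the phase ramp e^{i 2 pi nu n Ts}."""
    ramp = np.exp(1j * 2 * np.pi * nu * np.arange(Z.shape[1]) * Ts)
    return np.roll(Z, tau_idx, axis=1) * ramp   # circular delay for simplicity

# Delay index runs fastest within each Doppler offset, matching Equation (10.71)
Z_tilde = np.vstack([shifted_block(Z, d, v * d_nu)
                     for v in range(n_nu) for d in range(n_delta)])
```

The resulting matrix has n_r n_\delta n_\nu rows, one block of n_r rows per delay-Doppler hypothesis.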
\[ \tilde{z}(t) = \begin{pmatrix}
\sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} h_{m\,\delta_\tau,\,k\,\delta_f}\; s(t - [m+0]\,\delta_\tau;\, [k+0]\,\delta_f) \\
\sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} h_{m\,\delta_\tau,\,k\,\delta_f}\; s(t - [m+0]\,\delta_\tau;\, [k+1]\,\delta_f) \\
\vdots \\
\sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} h_{m\,\delta_\tau,\,k\,\delta_f}\; s(t - [m+0]\,\delta_\tau;\, [k+n_\nu]\,\delta_f) \\
\sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} h_{m\,\delta_\tau,\,k\,\delta_f}\; s(t - [m+1]\,\delta_\tau;\, [k+0]\,\delta_f) \\
\sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} h_{m\,\delta_\tau,\,k\,\delta_f}\; s(t - [m+1]\,\delta_\tau;\, [k+1]\,\delta_f) \\
\vdots \\
\sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} h_{m\,\delta_\tau,\,k\,\delta_f}\; s(t - [m+1]\,\delta_\tau;\, [k+n_\nu]\,\delta_f) \\
\vdots \\
\sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} h_{m\,\delta_\tau,\,k\,\delta_f}\; s(t - [m+n_\delta]\,\delta_\tau;\, [k+n_\nu]\,\delta_f)
\end{pmatrix} , \tag{10.76} \]
where the notation s(t; \delta_f) indicates the signal at time t shifted by frequency \delta_f. Similar in form to the space-time covariance matrix, by rearranging the sum so that terms with the same value of delay s(t - m\,\delta_\tau; k\,\delta_f) are grouped,
the rank of the space-time-frequency covariance matrix can be bounded,3 and
the number of contributions to the rank can be found. Because the frequency and
delay contributions are independent, for any given frequency there are nd +nδ −1
delay contributions. Consequently, there are (nd +nδ −1) (nf +nν −1) contributing
terms. This accounting can be observed in the rearranged space-time-frequency
data vector that is given by
\[ z(t) = \begin{pmatrix} h_{0\,\delta_\tau,\,0\,\delta_f} \\ 0 \\ \vdots \\ 0 \end{pmatrix} s(t - 0\,\delta_\tau;\, 0\,\delta_f)
+ \cdots + \begin{pmatrix} 0 \\ \vdots \\ 0 \\ h_{[n_d-1]\,\delta_\tau,\,0\,\delta_f} \\ 0 \\ \vdots \\ 0 \end{pmatrix} s(t - [n_d - 1 + n_\delta - 1]\,\delta_\tau;\, 0\,\delta_f) \]
\[ + \cdots + \begin{pmatrix} 0 \\ \vdots \\ 0 \\ h_{0\,\delta_\tau,\,1\,\delta_f} \\ 0 \\ \vdots \\ 0 \end{pmatrix} s(t - 0\,\delta_\tau;\, 1\,\delta_f)
+ \cdots + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ h_{[n_d-1]\,\delta_\tau,\,[n_f-1]\,\delta_f} \end{pmatrix} s(t - [n_d - 1 + n_\delta - 1]\,\delta_\tau;\, [n_f - 1 + n_\nu - 1]\,\delta_f) . \tag{10.77} \]
Under the assumption that the channel and signal at each delay and frequency
are independent, the rank of the space-time-frequency covariance matrix is given
by the contribution of each of these (nd + nδ − 1)(nf + nν − 1) terms; thus,
Equations (10.74) and (10.75) are shown.
\[ \hat{S} = \tilde{W}^\dagger \tilde{Z} . \tag{10.78} \]
\[ \tilde{W} = \left\langle \tilde{Z} \tilde{Z}^\dagger \right\rangle^{-1} \left\langle \tilde{Z} \tilde{S}^\dagger \right\rangle \approx \left( \tilde{Z} \tilde{Z}^\dagger \right)^{-1} \tilde{Z} \tilde{S}^\dagger , \tag{10.79} \]
Problems
dilation eventually causes a chip slip that cannot be corrected by a simple phase
correction. For a filtered BPSK signal with a carrier frequency of 1 GHz and a
bandwidth of 1 MHz with a relative fractional frequency error of 10−6 between
the transmitter and receiver, evaluate the expected receiver loss because of chip
misalignment as a function of time since a perfect synchronization.
10.3 By using the notation in Section 10.2, consider a loop in parameter space
for delay and Doppler channel operators on a signal. Starting at some point, and
moving operators through the space of delay and Doppler along some path, and
then returning to the original point, the signal should be unaffected, because the
effect should only be determined by the parameters' values. However, evaluate the effect on some signal s(t) of the sequence of operators T_d \, F_\nu \, T_{-d} \, F_{-\nu}, and evaluate the error.
10.4 Consider the eigenvalues of the observed space-time covariance matrix
that is observing critically sampled signals. For a 10 receive antenna array and
a single transmit antenna with a line-of-sight channel (equal channel responses
across receive antennas), evaluate the eigenvalue distribution of the receive space-
time covariance matrix of a 0 dB SNR per receive antenna assuming unit variance
per antenna noise. Evaluate the eigenvalues under the assumption that the space-
time covariance matrix includes
(a) 1 (spatial-only)
(b) 2
(c) 4
delay samples at Nyquist spacing.
10.5 Consider the eigenvalues of the observed space-time covariance matrix
with two delays. The signal and noise are strongly filtered so that they are significantly oversampled (that is, the sampling rate is large compared with the
Nyquist sample rate). For a 10 receive antenna array and a single transmit
antenna with a line-of-sight channel (equal channel responses across receive an-
tennas), evaluate the eigenvalue distribution of the receive space-time covari-
ance matrix of a 0 dB SNR per receive antenna assuming unit variance per
antenna noise in the region of spectral support. Evaluate the eigenvalues approx-
imately under the assumption that signal and noise are temporally oversampled
significantly.
10.6 Consider the signal s(t) and the SISO doubly dispersive channel char-
acterized by a time-varying channel ht (t, τ ) and the delay-frequency channel
hD (fD , τ ); develop the form of a bound on the average squared error in using
the hD (fD , τ ) form under the assumption of a bounded temporal T and bounded
spectral B signal.
10.7 Develop the results in Section 10.6.1 by using discrete rather than integral
Fourier transforms.
10.8 Develop the Doppler-frequency analysis dual of Equation (10.7).
The following sections of the chapter discuss various classes of space-time coding
schemes.
where Q(.) is known as the Q-function, defined in Section 2.14.7, which is the
tail probability of the standard Gaussian probability density function,
\[ Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} dt \; e^{-t^2/2} . \]
Note that we use the notation p_{err}(SNR \,|\, g) to refer to the probability of error as a function of the SNR, given the channel coefficient g.
With Rayleigh fading (complex Gaussian channel coefficients), the magnitude
square of the channel coefficient, that is, g 2 is exponentially distributed (see
Section 3.1.10). Hence, one can derive the marginal probability of error as follows
[255], [314]:
\[ p_{err}(SNR) = \int_0^{\infty} d g^2 \; Q\!\left( \sqrt{g^2 \, SNR} \right) e^{-g^2} = \int_0^{\infty} d\tau \; Q\!\left( \sqrt{\tau \, SNR} \right) e^{-\tau} \]
\[ = \frac{1}{2} - \frac{1}{2} \sqrt{\frac{SNR}{2 + SNR}} = \frac{1}{2\, SNR} + O\!\left( \frac{1}{SNR^2} \right) , \]
where \tau is an integration variable over the channel attenuation, that is, \tau = g^2.
The last equation indicates that a SISO QPSK system has diversity order 1 since
the dominant SNR term has an exponent equal to −1.
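The closed-form marginal error probability above can be checked against a Monte Carlo average over the exponentially distributed channel power. The SNR value and trial count below are illustrative:

```python
import numpy as np
from math import erfc, sqrt

def Q(x):
    """Tail probability of the standard Gaussian density."""
    return 0.5 * erfc(x / sqrt(2))

rng = np.random.default_rng(7)
snr = 10.0                                   # linear SNR
tau = rng.exponential(1.0, 100_000)          # draws of g^2 under Rayleigh fading

# Average the conditional error probability over the fading realizations
p_mc = float(np.mean([Q(sqrt(t * snr)) for t in tau]))
p_closed = 0.5 - 0.5 * sqrt(snr / (2.0 + snr))
```

At moderate SNR the closed form already tracks the diversity-order-1 approximation 1/(2 SNR).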
Note that in a fading system at high SNR, symbol and bit-error events are
typically due to a channel realization that is weak (rather than a spike in the
11.1 Rate diversity trade-off 367
noise), in the sense that the norm square of the channel coefficients is small
compared with the inverse of the SNR. Hence, for a SISO system with channel
coefficient g, for most practical modulation schemes, the probability of error is
approximated as follows [314]:
\[ p_{err} \approx \Pr\!\left( g^2 < \frac{a}{SNR} \right) , \tag{11.2} \]
where a is some constant that depends on the modulation scheme used.
where we recall that g is the magnitude of the channel coefficient for the duration
of communication. At very large SNR, the capacity can be approximated by
\[ c \approx \log_2 \left( g^2 \, SNR \right) . \tag{11.4} \]
Because we assume Rayleigh fading, the magnitude square of the channel coef-
ficient g 2 is exponentially distributed. Hence,
\[ \Pr\!\left( g^2 < SNR^{r-1} \right) = \int_0^{SNR^{r-1}} d\tau \; e^{-\tau} = 1 - e^{-SNR^{r-1}} = 1 - \left( 1 - SNR^{r-1} + o\!\left( SNR^{2(r-1)} \right) \right) = SNR^{r-1} + o\!\left( SNR^{2(r-1)} \right) \]
for large SNR and multiplexing rate r < 1. The function d(r) is the diversity
gain associated with the multiplexing rate r. The previous expression indicates
the rate at which outage probability can be improved at the expense of data rate
and is a fundamental relationship for fading channels.
Consequently, for the SISO Rayleigh channel, the diversity gain d and multiplexing rate r are related by
\[ d(r) = 1 - r \tag{11.8} \]
for high SNR. Any real coding scheme is bounded by this relationship.
For a general MIMO link with nt transmitter and nr receiver antennas, the
optimal diversity-multiplexing trade-off curve was found by Zheng and Tse in
Reference [362]. While a precise analysis of this result is quite complicated, we
present a brief description of their findings here, which are based on the analysis
given in References [362, 314].
The capacity of an uninformed transmitter MIMO link with spatially uncor-
related noise, as discussed in Section 8.3, is given by
\[ c = \log_2 \left| I + \frac{P_o}{n_t} H H^\dagger \right| = \sum_m \log_2 \left( 1 + \frac{a^2 P_o}{n_t} \lambda_m \right) = \sum_m \log_2 \left( 1 + \frac{SNR}{n_t} \lambda_m \right) , \tag{11.9} \]
where P_o is the total noise-normalized power and a is the average attenuation from transmit to receive antenna. The variable \lambda_m is the mth eigenvalue of G G^\dagger, where the matrix G \in \mathbb{C}^{n_r \times n_t} is given by G = H/a, and its entries are drawn from an independent, identically distributed, complex, circularly symmetric, unit-variance Gaussian distribution. The term a^2 P_o is also the total SNR per receive antenna.
Note that at high SNR, the spectral efficiency for such a MIMO system can
grow approximately linearly with the minimum of nr and nt , that is, writing n =
min(nr , nt ), the MIMO link can support a spectral efficiency of approximately
n log2 (SNR), where n is the multiplexing gain provided by the multiple transmit
and receive antennas. For some real system with spectral efficiency R operating
with a multiplexing rate of r ≤ n,
R ≈ r log2 (SNR) . (11.10)
The probability of outage is given by
\[ p_{out} = \Pr(c < R) \approx \Pr\!\left( \sum_m \log_2 \left( 1 + \frac{SNR}{n_t} \lambda_m \right) < r \log_2 SNR \right) . \tag{11.11} \]
Hence, the outage probability is related to the joint distribution of the eigenvalues
of the matrix HH† . The joint distribution of these eigenvalues has a complicated
11.2 Block codes 369
Figure 11.1 Optimal diversity multiplexing trade-off for the MIMO channel. The piecewise-linear curve runs from (0, n_r n_t) at zero multiplexing gain down to zero diversity at the maximum multiplexing gain r = n, with the multiplexing gain r on the horizontal axis.
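Zheng and Tse's optimal trade-off curve in Figure 11.1 is piecewise linear through the points (k, (n_t − k)(n_r − k)) for integer k. A small helper sketching that function (our own construction for illustration, not notation from the text):

```python
import numpy as np

def dmt(r, n_t, n_r):
    """Optimal diversity gain d(r): piecewise linear through the points
    (k, (n_t - k)(n_r - k)) for integer k = 0, ..., min(n_t, n_r)."""
    n = min(n_t, n_r)
    k = int(np.floor(r))
    if k >= n:
        return 0.0
    d_k = (n_t - k) * (n_r - k)
    d_k1 = (n_t - k - 1) * (n_r - k - 1)
    return d_k + (r - k) * (d_k1 - d_k)   # linear interpolation between corners
```

For n_t = n_r = 1 this reduces to the SISO relation d(r) = 1 − r from Equation (11.8).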
\[ H = h = \left( h_1 \;\; h_2 \;\; \cdots \;\; h_{n_t} \right) , \]
\[ C = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_{n_t} \end{pmatrix} s , \qquad N = n , \]
where the entries w_j can be thought of as weights applied to the signals on the antennas of the transmitter. Suppose that the transmitter uses the following w:
\[ w = \frac{1}{\|h\|} h^\dagger . \tag{11.14} \]
Since this w is a unit-norm vector, the transmitted power does not change with
h. Additionally, note that this w precompensates for the phase offset introduced
by the paths between each transmit antenna and the antenna of the receiver such
11.2 Block codes 371
that the signal adds in phase at the receiver. Equation (11.13) then becomes
\[ z = \|h\| \, s + n . \tag{11.15} \]
Let hj be the channel coefficient between the jth transmit antenna to the antenna
of the receiver. Communication takes place over two time slots and the channel
matrix H which is a row vector h here, and is assumed to be constant over the
two time slots, is defined as follows:
\[ h = \left( h_1 \;\; h_2 \right) . \tag{11.17} \]
Hence, the received samples at times 1 and 2 are the entries of the row vector
z = Z given by
\[ z_1 = h_1 s_1 + h_2 s_2 , \tag{11.18} \]
\[ z_2 = -h_1 s_2^* + h_2 s_1^* . \tag{11.19} \]
372 Space-time coding
The receiver constructs a new vector w ∈ C2×1 whose first element is z1 and the
second is z2∗ . We can then write w as follows:
\[ w = \begin{pmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} + \begin{pmatrix} n_1 \\ n_2^* \end{pmatrix} . \tag{11.20} \]
We can thus recover estimates of s1 and s2 by premultiplying w by the following
matrix
\[ \frac{1}{\|h\|} \begin{pmatrix} h_1^* & h_2 \\ h_2^* & -h_1 \end{pmatrix} , \tag{11.21} \]
which yields the following expression:
\[ \begin{pmatrix} \hat{s}_1 \\ \hat{s}_2 \end{pmatrix} = \frac{1}{\|h\|} \begin{pmatrix} h_1^* & h_2 \\ h_2^* & -h_1 \end{pmatrix} \begin{pmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} + \frac{1}{\|h\|} \begin{pmatrix} h_1^* & h_2 \\ h_2^* & -h_1 \end{pmatrix} \begin{pmatrix} n_1 \\ n_2^* \end{pmatrix} \]
\[ = \begin{pmatrix} \|h\| & 0 \\ 0 & \|h\| \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} + \begin{pmatrix} \tilde{n}_1 \\ \tilde{n}_2 \end{pmatrix} = \|h\| \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} + \begin{pmatrix} \tilde{n}_1 \\ \tilde{n}_2 \end{pmatrix} . \tag{11.22} \]
Note that ñ1 and ñ2 are independent CN (0, 1) random variables since n1 and
n∗2 are independent CN (0, 1) variables and the matrix
\[ \frac{1}{\|h\|} \begin{pmatrix} h_1^* & h_2 \\ h_2^* & -h_1 \end{pmatrix} \tag{11.23} \]
has orthonormal columns.
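The orthogonality manipulations in Equations (11.18)–(11.23) can be verified numerically. This noise-free sketch (random channel and symbols, chosen for illustration) reproduces Equation (11.22):

```python
import numpy as np

rng = np.random.default_rng(8)
h = rng.standard_normal(2) + 1j * rng.standard_normal(2)    # h = (h1, h2)
s = rng.standard_normal(2) + 1j * rng.standard_normal(2)    # symbols s1, s2
h1, h2 = h
s1, s2 = s

# Two time slots of the Alamouti code, noise-free for clarity
z1 = h1 * s1 + h2 * s2
z2 = -h1 * np.conj(s2) + h2 * np.conj(s1)

# Receiver stacks (z1, z2*) and applies the combining matrix of Equation (11.21)
w = np.array([z1, np.conj(z2)])
A = np.array([[np.conj(h1), h2],
              [np.conj(h2), -h1]]) / np.linalg.norm(h)
s_hat = A @ w     # equals ||h|| (s1, s2) by Equation (11.22)
```

The effective channel collapses to a scalar gain ‖h‖ on each symbol, which is exactly what makes the linear decoding simple.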
Hence, two independent data symbols can be transmitted over two time in-
tervals. Since at high SNR the probability of bit error is mainly due to poor
fading conditions, the probability of bit error assuming unit transmit power is
approximately equal to the probability that ||h||2 is smaller than the inverse of
the SNR, that is,
\[ \Pr\!\left( \|h\|^2 \le \frac{a}{SNR} \right) \]
for some a. If the vector h is a 1 × 2 row vector of independent, circularly
symmetric, Gaussian random variables, the norm square of the vector, ||h||2 is
distributed as a complex χ2 random variable with two degrees of freedom, or a
real χ2 random variable with four degrees of freedom (see Section 3.1.11) and
\sigma^2 = 1. The CDF of a complex \chi^2 random variable with two degrees of freedom is given in terms of \gamma(k, x), the lower incomplete gamma function, as follows:
\[ P_{\chi^2_C}(x; 2; 1) = \gamma(2, x) = \frac{1}{2} x^2 + o(x^2) , \]
where the last expression follows from Equation (2.272). Hence, the probability of error for the Alamouti code is
\[ \Pr\!\left( \|h\|^2 \le \frac{a}{SNR} \right) \approx \frac{a^2}{2\, SNR^2} + o\!\left( \frac{1}{SNR^2} \right) , \]
which indicates that the diversity order of the Alamouti code is 2. Therefore, the
Alamouti code enables one transmission per symbol time and obtains a diversity
order of 2. Note that the diversity order of 2 is achieved because the Alamouti
code transmits each symbol over each antenna, and so, at high SNR, an error
occurs only if the channel coefficients for both antennas are small. Note that this
is the same diversity order as that achieved by maximal ratio transmission which
was shown previously to achieve a diversity order of nt and one symbol per unit
time. Unlike maximal ratio transmission, however, the Alamouti code does not
require the transmitter to have channel-state information. Transmitter channel-
state information requires significant overhead as channel parameters have to be
estimated at receivers and then fed back to transmitters.
The Alamouti code matrix satisfies
\[ C^\dagger C = \left( |s_1|^2 + |s_2|^2 \right) I . \tag{11.26} \]
This property enables simple linear decoding of the space-time code, as shown
in the previous section. For a general space-time block code for nt transmitter
antennas, let the generator matrix G have the property that
\[ G^\dagger G = \sum_{j=1}^{n_t} |s_j|^2 \; I , \tag{11.27} \]
where the entries of the matrix G are s_1, -s_1, s_2, -s_2, \ldots, s_{n_t}, -s_{n_t}, which represent the transmitted symbols from the n_t antennas. If the s_j s are real and
Equation (11.27) holds, the matrix G is known as a real orthogonal design [305].
Orthogonal designs permit easy maximum-likelihood decoding using only linear
operations as described in [305], which also provides some further examples of
real orthogonal designs for 4 × 4 and 8 × 8 generator matrices. Note that there
are only a small number of real orthogonal designs with all nonzero entries. A
4 × 4 example from Reference [305] is the following:
\[ \begin{pmatrix} s_1 & s_2 & s_3 & s_4 \\ -s_2 & s_1 & -s_4 & s_3 \\ -s_3 & s_4 & s_1 & -s_2 \\ -s_4 & -s_3 & s_2 & s_1 \end{pmatrix} . \tag{11.28} \]
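The defining property of Equation (11.27) is easy to verify for the 4 × 4 design in Equation (11.28); the symbol values below are arbitrary:

```python
import numpy as np

s = np.array([0.7, -1.3, 0.4, 2.1])     # arbitrary real symbols s1..s4
s1, s2, s3, s4 = s

G = np.array([[ s1,  s2,  s3,  s4],
              [-s2,  s1, -s4,  s3],
              [-s3,  s4,  s1, -s2],
              [-s4, -s3,  s2,  s1]])

gram = G.T @ G    # should equal (s1^2 + s2^2 + s3^2 + s4^2) I
```

Because the Gram matrix is a scaled identity, maximum-likelihood decoding decouples into independent per-symbol decisions.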
We refer the reader to specialized texts on space-time coding (for example, Ref-
erences [331, 157, 84]) for further examples and analyses of orthogonal block
codes. The material in this subsection is based on Reference [157].
The benefits provided by a space-time code, in terms of the diversity and cod-
ing gains, can be systematically analyzed by using codeword difference matrices
associated with the space-time coding scheme as introduced in Reference [307],
which is the basis for the discussion in this section. The codeword difference ma-
trix between a pair of transmitted codewords C and Ck is simply the difference
between the two codewords
Dk = C − Ck .
Recall that Ck ∈ Cn t ×n s , so Dk ∈ Cn t ×n s . The codeword difference matrix
Dk is used to bound the probability of a transmitted codeword Ck being erro-
neously decoded as a codeword C at the receiver. Assuming that the channels
between different pairs of antennas of the transmitter and receiver are indepen-
dent, identically distributed, circularly symmetric, Gaussian random variables of
11.3 Performance criteria for space-time codes 375
zero mean and unit variance, the error probability can be bounded from above.
In order to write this bound, for notational simplicity, define the matrix A_{k\ell} as the product of D_{k\ell} and its Hermitian transpose,
\[ A_{k\ell} = D_{k\ell} D_{k\ell}^\dagger , \tag{11.29} \]
with \lambda_m denoting the mth largest nonzero eigenvalue of the matrix A_{k\ell}. Using this notation, the probability of confusing C_\ell with C_k, denoted by \Pr(C_k \to C_\ell), is bounded from above as follows:
\[ \Pr(C_k \to C_\ell) \le \left( \prod_{m=1}^{\mathrm{rank}(A_{k\ell})} \lambda_m \right)^{-n_r} \left( \frac{SNR}{4} \right)^{-\mathrm{rank}(A_{k\ell})\, n_r} . \tag{11.30} \]
The derivation of this inequality, which uses bounds for the tail of the Gaus-
sian probability density function and linear algebra properties, can be found in
Reference [305]. From the right-hand side of Equation (11.30), the probability of decoding the codeword C_k as C_\ell decays as
\[ \frac{1}{SNR^{\,\mathrm{rank}(A_{k\ell})\, n_r}} , \tag{11.31} \]
which means that, with good codewords, one can effectively increase the SNR.
Thus, space-time coding can also provide coding gain to the system. To maximize
the coding gain of a given space-time code, we need to choose the codewords
such that the minimum over all codeword pairs of the quantity in the previous
expression is maximized. In the literature, this is known as the determinant
criterion since the quantity in the parentheses of the previous expression equals
the determinant of Ak if Ak is a full-rank matrix.
Hence, the rank and determinant criteria can be used to design coding schemes
that have the desired diversity order and coding gains. The diversity order is
376 Space-time coding
increased by maximizing the minimum (over all pairs of codewords) rank of the matrix A_{k\ell}, the product of the codeword difference matrix for the kth and \ell th codewords with its Hermitian transpose. The coding gain is maximized by maximizing the minimum over all codeword pairs of the determinant of this matrix.
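The rank and determinant criteria can be evaluated mechanically over all codeword pairs. The tiny three-codeword codebook below is a made-up example for illustration, not a code from the references:

```python
import numpy as np

# Hypothetical toy codebook: each codeword is an n_t x n_s matrix of BPSK entries
codewords = [
    np.array([[ 1.0,  1.0], [ 1.0, -1.0]]),
    np.array([[ 1.0, -1.0], [ 1.0,  1.0]]),
    np.array([[-1.0,  1.0], [-1.0, -1.0]]),
]

min_rank, min_det = np.inf, np.inf
for k in range(len(codewords)):
    for l in range(len(codewords)):
        if k == l:
            continue
        D = codewords[l] - codewords[k]          # codeword difference matrix
        A = D @ D.conj().T                       # A_kl = D D^H
        r = int(np.linalg.matrix_rank(A))
        min_rank = min(min_rank, r)
        nz = np.sort(np.linalg.eigvalsh(A))[::-1][:r]    # nonzero eigenvalues
        min_det = min(min_det, float(np.prod(nz)))       # coding-gain metric
```

With n_r receive antennas, this codebook would achieve diversity order min_rank × n_r; a designer would search for codewords that raise both min_rank and min_det.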
Figure 11.2 8-PSK trellis diagram showing the states and the allowed transitions from left to right. The labels on the arcs represent the transmit symbol corresponding to the state transition.
each sequence of bits. For instance, in the third state, if the data bits are 01, symbols 2 and 1 are transmitted from antennas 1 and 2, respectively, and the next state will be state 2. This idea can be generalized to the scenario of a larger number of transmit antennas, where each arc in the trellis diagram corresponds to a codeword q_1 q_2 \cdots q_{n_t}, whereby q_i is the constellation point transmitted on the ith antenna.
In addition to being described by a trellis, space-time trellis codes can also
be described in terms of an input–output relationship [307]. Consider a 2M -
ary phase-shift-keying transmission where the M bits associated with the tth
transmit symbol are b1t , b2t , . . . , bM t . Let the output at time t at each of the nt
antennas be contained in a vector xt ∈ Cn t ×1 . xt can be expressed as
\[ x_t = \sum_{m=1}^{M} \sum_{k=0}^{K_m - 1} b_{m(t-k)} \, c_{mk} , \tag{11.33} \]
Figure 11.3 QPSK constellation diagram and labelings for Tarokh space-time trellis
code.
The space-time trellis code described in Figure 11.5 can be written in the form of Equation (11.33) as follows:
\[ x_t = \begin{pmatrix} 2 \\ 0 \end{pmatrix} b_{2(t-1)} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} b_{1(t-1)} + \begin{pmatrix} 0 \\ 2 \end{pmatrix} b_{2t} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} b_{1t} , \tag{11.34} \]
where the addition is performed modulo 4.
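Equation (11.34) is a delay-diversity code: antenna 2 transmits the current QPSK symbol index 2b_{2t} + b_{1t} while antenna 1 repeats the previous one. A short sketch (random bits, trellis assumed to start in state 0) makes this explicit:

```python
import numpy as np

rng = np.random.default_rng(9)
n_sym = 10
bits = rng.integers(0, 2, (n_sym, 2))    # (b1_t, b2_t) pairs, two bits per symbol

c1 = np.array([2, 0])    # coefficient vectors from Equation (11.34)
c2 = np.array([1, 0])
c3 = np.array([0, 2])
c4 = np.array([0, 1])

x = np.zeros((n_sym, 2), dtype=int)      # QPSK symbol index per (time, antenna)
b1_prev, b2_prev = 0, 0                  # trellis starts in state 0
for t in range(n_sym):
    b1, b2 = bits[t]
    x[t] = (b2_prev * c1 + b1_prev * c2 + b2 * c3 + b1 * c4) % 4
    b1_prev, b2_prev = b1, b2
```

The test below confirms the delay-diversity structure: antenna 1 at time t equals antenna 2 at time t − 1.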
Another example from Reference [307] is an 8-PSK code given by
\[ x_t = \begin{pmatrix} 4 \\ 0 \end{pmatrix} b_{3(t-1)} + \begin{pmatrix} 2 \\ 0 \end{pmatrix} b_{2(t-1)} + \begin{pmatrix} 5 \\ 0 \end{pmatrix} b_{1(t-1)} + \begin{pmatrix} 0 \\ 4 \end{pmatrix} b_{3t} + \begin{pmatrix} 0 \\ 2 \end{pmatrix} b_{2t} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} b_{1t} , \tag{11.35} \]
where the addition is done modulo 8.
The receiver uses a Viterbi decoder [327] to estimate the maximum-likelihood
transmitted signal over the length of the space-time code. The decision metric
that is used in the Viterbi decoder at a given symbol time t is the following:
\[ \sum_{\ell=1}^{n_r} \left| y_{t\ell} - \sum_{k=1}^{n_t} h_{k\ell} \, q_{kt} \right|^2 , \tag{11.36} \]
Figure 11.4 8-PSK constellation diagram and labelings for Tarokh space-time trellis
code.
Hence, the Viterbi decoder computes the path through the trellis with the lowest
accumulated decision metric. Note that the analysis bounding the probability of
error which led to (11.30) still holds in this case. The codewords correspond to
the different valid sequences through the trellis. Hence, the rank and determinant
criteria developed in Section 11.3 can also be used in the context of space-time
trellis codes.
Chen et al. [57, 58] proposed a different criterion to be used in designing space-time trellis codes when the product of the numbers of transmitter and receiver antennas is moderately high (n_r n_t > 3). They use a central-limit-theorem-based argument to show that the probability of error associated with interpreting a codeword C_k as C_\ell can be bounded from above in the limit as n_r \to \infty:
\[ \lim_{n_r \to \infty} \Pr(C_k \to C_\ell) \le \frac{1}{4} \exp\!\left( -\frac{1}{4}\, n_r \, SNR \sum_{i=1}^{n_t} \lambda_i \right) , \tag{11.37} \]
where \lambda_i, the ith squared singular value of the codeword difference matrix, is as defined in Section 11.3. Thus, when the number of receiver antennas is large, maximizing the trace of the matrix A_{k\ell} should result in a smaller probability of error. This criterion is introduced as the trace criterion by Chen et al. and is used as a design tool to identify space-time trellis codes with low probabilities of error [57, 58].
Figure 11.5 Space-time trellis code for two transmit antennas with 4-PSK and two bits per symbol. The branch labels for the state transitions include 10 11 12 13, 20 21 22 23, and 30 31 32 33.
Using this scheme, Chen et al. provided several different space-time trellis codes for QPSK and 8-PSK systems in [58]. One that is equal in complexity to Tarokh's code from Figure 11.5 and Equation (11.34) is given by the following equation, where addition is done modulo 4:
\[ x_t = \begin{pmatrix} 1 \\ 2 \end{pmatrix} b_{2(t-1)} + \begin{pmatrix} 2 \\ 0 \end{pmatrix} b_{1(t-1)} + \begin{pmatrix} 0 \\ 2 \end{pmatrix} b_{2t} + \begin{pmatrix} 2 \\ 3 \end{pmatrix} b_{1t} . \tag{11.38} \]
This code is shown to outperform the Tarokh code by approximately 2.1 dB when
the number of antennas at the receiver nr = 4. That is to say, the probability
of error achievable with the code in Equation (11.38) is equal to the probability
of error achievable with the code in Equation (11.34) with approximately 2.1
dB higher SNR. For nr = 1, however, the two codes are comparable which
is unsurprising since the Tarokh code was primarily designed for one receiver
antenna.
The following 8-PSK space-time trellis code was given by Chen et al. [58] and
has a complexity comparable to the code given in Equation (11.35),
\[ x_t = \begin{pmatrix} 3 \\ 4 \end{pmatrix} b_{3(t-1)} + \begin{pmatrix} 2 \\ 0 \end{pmatrix} b_{2(t-1)} + \begin{pmatrix} 4 \\ 0 \end{pmatrix} b_{1(t-1)} + \begin{pmatrix} 2 \\ 1 \end{pmatrix} b_{3t} + \begin{pmatrix} 4 \\ 6 \end{pmatrix} b_{2t} + \begin{pmatrix} 0 \\ 4 \end{pmatrix} b_{1t} , \tag{11.39} \]
where the addition is done modulo 8. The performance of this code is comparable
to the code given in Equation (11.35) with nr = 1. For nr = 4, however, this
code is better by approximately 1.7 dB. Note that the 1.7 dB number is obtained
based on an i.i.d., circularly symmetric Gaussian channel model with constant
11.5 Bit-interleaved coded modulation 381
Figure 11.6 Bit-interleaved coded modulation block diagram: binary coding followed by interleaving and binary labeling. The jth data bit is denoted by b_j, with the jth interleaved bit given by b_j. The transmit symbol of the kth antenna is denoted by x_k.
fading across a given block. We refer the reader to Reference [58] for details and additional space-time codes that were found using the trace criterion.
Figure (MIMO bit-interleaved coded modulation chain): data bits b_j are binary coded into c_j, interleaved, converted from serial to parallel, and modulated onto the transmit symbols x_1, x_2, \ldots, x_{n_T}; the receive antennas observe y_1, \ldots, y_{n_R}.
\[ \mathcal{X}_k^0 := \{ x : c_k = 0 \text{ and } x = \phi(c) \} \]
\[ \mathcal{X}_k^1 := \{ x : c_k = 1 \text{ and } x = \phi(c) \} , \]
Linear block codes can be used to perform effective space-time coding by di-
rect modulation of the encoder output [207]. Consider a system with nt trans-
mit antennas and a constellation of size M , that is to say, an M -ary symbol
is transmitted through each antenna. Since a constellation point can represent
m = log2 M bits of information, there are nt m bits of information that can be
represented using m-bit constellations on nt antennas. Hence, we can construct
a “space-time symbol” that can represent nt m bits of information. Now suppose
that the block code is defined over a finite field (Galois field) of order q. Each
symbol in a codeword can be represented by \log_2 q bits, and if
\[ \log_2 q = n_t m , \qquad \text{that is,} \quad q = 2^{n_t m} , \]
we can directly map each codeword symbol into a space-time symbol.
Perhaps the best possible error-control codes for a direct-modulation coding
scheme are low-density parity-check codes (LDPC), which are a class of capacity-
achieving, linear, block codes. These codes are defined by their parity-check
matrices C, which have a certain structure, and comprise mostly zero-entries,
hence the term low-density. Using posterior probability decoding algorithms, these codes have been shown to approach the Shannon capacity to within fractions of a decibel in signal-to-noise ratio. We refer the reader to standard texts on error-control coding such as Reference [191] for further details.
Low-density parity-check codes with direct modulation can perform close to
the outage capacity of MIMO links as illustrated in Figure 11.11 in which a
low-density parity-check code over GF (256) performs close to the ideal outage
capacity for 4×4 and 2×2 MIMO systems. The main drawback of this technique
is that it is computationally intensive as the receiver must perform decoding
over a large finite field. One notable cost is associated with the fact that simple
likelihood ratios are not sufficient for the iterative decoding process since each
codeword symbol can take one of q possible values.
Using these codes and Bayesian belief networks for decoding, Margetts et al., in
Reference [207], find that this technique consistently outperforms the space-time
trellis codes proposed by Chen et al. [57] and with a computational complexity
of O(ns q log q), where ns is the number of symbols per decoding block.
El-Gamal and Damen [86] introduced a framework for creating full-rate, full-diversity space-time codes out of underlying algebraic SISO codes. The general
approach is to divide a space-time codeword into threads over which single-input
single-output codes are used. Each thread is associated with a SISO codeword
and defines the antenna and time slot over which symbols corresponding to the
codeword are transmitted.
Consider a space-time system with nt transmit antennas with a codeword
spanning ns time slots. A thread refers to a set of pairs of antenna and time slot
indices such that all antennas (numbered 1, 2, . . . nt ) and time slots (numbered
1, 2, . . . , ns ) appear, no two threads have identical antenna and time slot pairs,
and the nt antennas appear an equal number of times in each thread. These
requirements ensure that each thread is active during every time slot, each thread
uses all of the antennas equally, and at any given time slot, there is at most one
thread active for each antenna.
A simple example of a thread for a system with ns = nt is the set of antenna
and time-slot pairs {(1, 1), (2, 2), . . . , (nt , nt )}, where the kth antenna is used
at the kth time slot. From [86], by offsetting the antenna indices and incrementing
them modulo nt , we arrive at the following generalization of the above thread for
1 ≤ j ≤ L ≤ nt : the jth thread uses antenna ((k + j − 2) mod nt ) + 1 at the kth
time slot.
Ant. 1: 1 4 3 2
Ant. 2: 2 1 4 3
Ant. 3: 3 2 1 4
Ant. 4: 4 3 2 1

Figure 11.9 Threads for universal space-time coding with four antennas, four time
slots, and four threads. The horizontal and vertical dimensions represent the time
slots and antennas respectively. The numbers correspond to the threads. For
illustration, the antenna and time-slot pairs corresponding to thread three are shaded.
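The threading rule suggested by Figure 11.9 can be generated and checked programmatically. The sketch below is an illustrative reconstruction of the cyclic-offset rule (thread j occupies antenna a at slot t whenever ((a − t) mod nt) + 1 = j), not code from the reference:

```python
# Generate the cyclically offset threads of Figure 11.9 and verify the
# defining properties of a valid threading.

nt = ns = 4                       # antennas and time slots
L = nt                            # number of threads

def thread(j):
    """Set of (antenna, slot) pairs belonging to thread j (1-based)."""
    return {(a, t) for a in range(1, nt + 1) for t in range(1, ns + 1)
            if (a - t) % nt + 1 == j}

threads = [thread(j) for j in range(1, L + 1)]

# Property checks from the text:
for T in threads:
    assert {t for _, t in T} == set(range(1, ns + 1))   # active in every slot
    assert {a for a, _ in T} == set(range(1, nt + 1))   # uses every antenna
# No two threads share an (antenna, slot) pair:
all_pairs = [p for T in threads for p in T]
assert len(all_pairs) == len(set(all_pairs)) == nt * ns
print("all thread properties hold")
```
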
D-BLAST
The Bell Labs layered space-time (BLAST) architecture is a family of space-time
transmission schemes that were developed at Bell Labs. These schemes can be
described by using the universal space-time coding framework discussed in the
previous section. Diagonal-BLAST (D-BLAST) [99] is the first of these.
The D-BLAST scheme uses a diagonal threading scheme as depicted in Fig-
ure 11.10 with coefficients φ1 = φ2 = · · · = φL = 1, where L is the number
of threads. Each thread is at least nt symbols long, and, hence, is transmitted
through all antennas. Since each thread includes transmissions over each of the
nt antennas, each thread can achieve the full spatial diversity of the channel.
[Figure 11.10: D-BLAST diagonal threading with four antennas and three threads.
Each thread occupies one diagonal of the space-time grid; the triangular sets of
space-time slots before the first diagonal and after the last are unused.]
[Figure 11.11: Throughput (b/s/Hz) versus SNR per receive antenna (dB), comparing
the 4 × 4 and 2 × 2 achievable rates with direct GF(256) and GF(16) LDPC
modulation, BICM-ID with 1 and 4 iterations, 64-state space-time trellis codes, and
the 2 × 2 Alamouti scheme.]

Figure 11.12 Excess SNR versus computational complexity (floating point operations
per information bit) for space-time trellis codes, bit-interleaved coded modulation,
and direct modulation. The code used to generate this figure is courtesy of Adam
Margetts and Nicholas Chang.
Note that the direct LDPC modulation using GF (256) can achieve an excess SNR
of less than 1 dB. This near-optimal performance comes at the significant
computational cost of approximately 8 × 104 floating point operations per
information bit.
Problems
By comparing the equations above with that of a SIMO system with four receiver
antennas, show that a diversity order of 4 is achievable with the Alamouti scheme
and two receiver antennas.
11.4 Consider a space-time coding system with nt = 2 transmit antennas,
nr = 2 receive antennas, and coding performed over ns = 2 symbol times. Let
the codewords be as follows:
C_1 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad
C_2 = \begin{pmatrix} 1 & -j \\ j & j \end{pmatrix}, \quad
C_3 = \begin{pmatrix} 1 & 1+j \\ 1 & 1-j \end{pmatrix}, \quad
C_4 = \begin{pmatrix} 1-j & -j \\ 1-j & j \end{pmatrix}. \quad (11.41)
Using the determinant criteria, find the maximum diversity gain achievable using
this space-time code.
11.5 Using the constellation diagram in Figure 11.3 and the space-time trellis
code given in Figure 11.5, please list the transmitted symbols from each transmit
antenna due to the following sequence of bits: 10 11 11 01 10. You should start
at state zero.
11.6 Use the determinant criteria to compute the diversity gain of the Alamouti
code with nr receiver antennas.
11.7 Perform a Monte Carlo simulation of an Alamouti space-time coding sys-
tem with Quadrature Phase-Shift Keying (QPSK) symbols and single-antenna
receivers. Show that the diversity order is what you expect by plotting the log-
arithm of the error probability at high SNR.
11.8 Show that the 4×4 space-time block code described by the real orthogonal
generator matrix in (11.28) has full diversity.
11.9 Explain why the requirement that all antennas are used for any given
thread in the universal space-time code framework described in Section 11.7
results in full diversity gain.
12 2 × 2 Network
12.1 Introduction
Figure 12.1 2 × 2 MIMO channel. Solid arrows indicate signal paths and dashed
arrows indicate interference paths.
z1 = h11 s1 + h21 s2 + n1
z2 = h22 s2 + h12 s1 + n2 ,
Han–Kobayashi scheme
The Han–Kobayashi scheme is known to achieve rates within 1 b/s/Hz of the
capacity region of the Gaussian interference channel as shown in Reference [89].
The basic idea behind this scheme is that each transmitter partitions its data
into two separate streams, a common or public stream that is intended to be
decoded by both receivers, and a private stream that is intended to be decoded
by just the target receiver. By dividing the transmit power (and hence data
rates) appropriately between common and private streams, partial interference
cancellation can be performed by each receiver.
Suppose that the powers allocated by the jth transmitter to its private and
common streams are Ppj and Pcj , and the rates of the private and common
streams of link-j are Rpj and Rcj , respectively. Additionally, suppose that the jth
private and common symbols are spj and scj and the jth transmitter transmits
spj + scj . The sampled signals at receivers 1 and 2 are thus

z_1 = h_{11}(s_{p1} + s_{c1}) + h_{21}(s_{p2} + s_{c2}) + n_1
z_2 = h_{22}(s_{p2} + s_{c2}) + h_{12}(s_{p1} + s_{c1}) + n_2 .
The jth receiver decodes the common streams from both transmitters and sub-
tracts the contribution of the common streams before decoding the private
stream from the jth transmitter, treating the private stream from transmitter
k ≠ j as noise. Thus, the data rates on the private streams must satisfy
R_{p1} < \log_2\left(1 + \frac{P_{p1}|h_{11}|^2}{P_{p2}|h_{21}|^2 + \sigma^2}\right) \quad (12.3)

R_{p2} < \log_2\left(1 + \frac{P_{p2}|h_{22}|^2}{P_{p1}|h_{12}|^2 + \sigma^2}\right). \quad (12.4)
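The private-rate constraints (12.3) and (12.4) are easy to evaluate numerically. In the sketch below, the channel magnitudes and power split are arbitrary illustrative choices:

```python
import math

# Private-rate constraints (12.3)-(12.4) of the Han-Kobayashi scheme for
# sample (assumed) channel gains and private-stream powers.

sigma2 = 1.0
h11, h12, h21, h22 = 1.0, 0.4, 0.3, 1.0       # assumed channel magnitudes
Pp1, Pp2 = 0.5, 0.5                            # private-stream powers

Rp1_max = math.log2(1 + Pp1 * abs(h11)**2 / (Pp2 * abs(h21)**2 + sigma2))
Rp2_max = math.log2(1 + Pp2 * abs(h22)**2 / (Pp1 * abs(h12)**2 + sigma2))
print(Rp1_max, Rp2_max)   # any (Rp1, Rp2) strictly below these is supportable
```
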
1 For expedience, we break with our own convention and use | · | to represent the absolute
value in this chapter.
[Figure 12.2 shows a pentagonal region in the (Rc1 , Rc2 ) plane whose boundary
segments are given by the multiple-access-channel rate constraints on the common
streams at the two receivers, with the private streams treated as noise.]

Figure 12.2 Rate region of common streams for the Han–Kobayashi system, case 1.
The common streams are to be decoded by both receivers. For a given receiver,
we can model the common streams as a channel between the two transmitters
and the given receiver, with the private streams treated as noise. Hence we can
model this portion of the system as a multiple-access channel, for which the
capacity region is known and discussed in more detail in Section 13.2. Thus,
the common rates Rc1 , Rc2 must fall into the intersection of two multiple-access
channel capacity regions, each of which is a pentagon. The intersection of the
two pentagons can take several different forms depending on the parameters of
the system.
The possible intersections (excluding cases where one capacity region is a
subset of the other, and cases that can be constructed by reversing the roles
of the transmit–receive pairs), are illustrated in Figures 12.2 to 12.4. For the
common messages to be decoded with arbitrarily low probability of error by
both receivers, the set of common rates (Rc1 , Rc2 ) must belong to the intersec-
tion of the two pentagons, illustrated using the solid and bold lines. The three
figures represent the different ways that the two multiple-access channels can
intersect.
The achievable rate region using the Han–Kobayashi scheme is the union over
all valid power allocations of all rate pairs (R1 = Rp1 + Rc1 , R2 = Rp2 + Rc2 ) for
which (Rc1 , Rc2 ) fall into one of the rate regions above, with the private rates
satisfying inequalities (12.3) and (12.4).
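A rough numerical sketch of this union is given below. For each split of power between private and common streams, the private rates come from (12.3) and (12.4), and a conservative common-rate point is taken: each common rate is limited by its worst receiver, and the pair is scaled back if it violates the worst MAC sum constraint (scaling down keeps the point inside both pentagons). All channel values are arbitrary assumptions, and this sequential-decoding sweep only lower-bounds the true Han–Kobayashi region:

```python
import math

# Sweep power splits of the Han-Kobayashi scheme and collect achievable
# (R1, R2) pairs for an illustrative symmetric interference channel.

sigma2, P1, P2 = 1.0, 2.0, 2.0
h11, h12, h21, h22 = 1.0, 0.5, 0.5, 1.0

def C(x):
    return math.log2(1 + x)

pairs = []
steps = 10
for i in range(steps + 1):
    for j in range(steps + 1):
        Pp1, Pc1 = P1 * i / steps, P1 * (1 - i / steps)
        Pp2, Pc2 = P2 * j / steps, P2 * (1 - j / steps)
        # Private streams, the other private stream treated as noise:
        Rp1 = C(Pp1 * h11**2 / (Pp2 * h21**2 + sigma2))
        Rp2 = C(Pp2 * h22**2 / (Pp1 * h12**2 + sigma2))
        # Common streams decoded first at both receivers (privates as noise):
        n1 = Pp1 * h11**2 + Pp2 * h21**2 + sigma2   # residual noise at Rx 1
        n2 = Pp1 * h12**2 + Pp2 * h22**2 + sigma2   # residual noise at Rx 2
        Rc1 = min(C(Pc1 * h11**2 / n1), C(Pc1 * h12**2 / n2))
        Rc2 = min(C(Pc2 * h21**2 / n1), C(Pc2 * h22**2 / n2))
        # Enforce the MAC sum constraint at each receiver:
        sum_cap = min(C((Pc1 * h11**2 + Pc2 * h21**2) / n1),
                      C((Pc1 * h12**2 + Pc2 * h22**2) / n2))
        if Rc1 + Rc2 > sum_cap:        # back off proportionally if needed
            scale = sum_cap / (Rc1 + Rc2)
            Rc1, Rc2 = Rc1 * scale, Rc2 * scale
        pairs.append((Rp1 + Rc1, Rp2 + Rc2))

print(max(r1 + r2 for r1, r2 in pairs))  # best sum rate found by the sweep
```
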
[Figure 12.3: another configuration of the intersection of the two common-stream
multiple-access pentagons in the (Rc1 , Rc2 ) plane.]

Figure 12.3 Rate region of common streams for the Han–Kobayashi system, case 2.
[Figure 12.4: a third configuration of the intersection of the two common-stream
multiple-access pentagons in the (Rc1 , Rc2 ) plane.]

Figure 12.4 Rate region of common streams for the Han–Kobayashi system, case 3.
Note that in the example given above, we have illustrated a successive decoding
scheme where the common and private messages are decoded in a particular
sequence. In general, however, better performance (that is, larger achievable rates)
could be obtained by joint decoding of the common and private streams. We
refer the reader to references such as Reference [59], which provides a relatively
complete treatment of this topic.
Recall that R1c and R2c are the rates associated with the common data streams
from transmitters 1 and 2 respectively. Similarly, recall that R1p and R2p are the
rates associated with the private streams of transmitters 1 and 2 respectively.
The various rates and covariance matrices need to satisfy certain requirements
so that the common data streams are decodeable at both receivers for a given
decoding order. For all choices of decoding order, the private rates need to satisfy
R_{1p} < \log\left|I + H_{11} K_{1p} H_{11}^\dagger \left(\sigma^2 I + H_{21} K_{2p} H_{21}^\dagger\right)^{-1}\right|

R_{2p} < \log\left|I + H_{22} K_{2p} H_{22}^\dagger \left(\sigma^2 I + H_{12} K_{1p} H_{12}^\dagger\right)^{-1}\right|.
In the previous two expressions, observe that in decoding the private streams,
each receiver only sees interference from the private stream corresponding to the
other transmitter as the common streams have all been decoded and subtracted
out by the time the private messages are decoded.
We can write different sets of inequalities corresponding to the different de-
coding orders of the common streams. For instance, suppose that receiver 1
decodes its common stream before decoding the common stream from trans-
mitter 2, and, likewise, receiver 2 decodes its common stream before decoding
the common stream from transmitter 1. Then the rates and covariance matrices
must satisfy the following requirements. For receiver 1 to be able to decode the
common stream from transmitter 1, we need
R_{1c} < \log\left|I + H_{11} K_{1c} H_{11}^\dagger \left(\sigma^2 I + H_{11} K_{1p} H_{11}^\dagger + H_{21} (K_{2p} + K_{2c}) H_{21}^\dagger\right)^{-1}\right|. \quad (12.7)
Observe that the matrix that is inverted in the previous expression contains
contributions from the noise power, the private stream from transmitter 1, and
the private and common streams from transmitter 2. For receiver 2 to be able
to decode the common stream from transmitter 1, we need
R_{1c} < \log\left|I + H_{12} K_{1c} H_{12}^\dagger \left(\sigma^2 I + H_{12} K_{1p} H_{12}^\dagger + H_{22} K_{2p} H_{22}^\dagger\right)^{-1}\right|. \quad (12.8)
Observe that the matrix that is inverted in the previous expression contains con-
tributions from the noise power, the private stream from transmitter 1, and the
private stream from transmitter 2. Note that the common stream from trans-
mitter 2 does not contribute to the above expression as it is assumed to have
been decoded before the receiver decodes the common stream from transmit-
ter 1. Likewise, for receiver 1 to be able to decode the common stream from
transmitter 2, we require that
R_{2c} < \log\left|I + H_{21} K_{2c} H_{21}^\dagger \left(\sigma^2 I + H_{11} K_{1p} H_{11}^\dagger + H_{21} K_{2p} H_{21}^\dagger\right)^{-1}\right|, \quad (12.9)
and for receiver 2 to be able to decode the common stream from transmitter 2,
we need
R_{2c} < \log\left|I + H_{22} K_{2c} H_{22}^\dagger \left(\sigma^2 I + H_{22} K_{2p} H_{22}^\dagger + H_{12} (K_{1p} + K_{1c}) H_{12}^\dagger\right)^{-1}\right|. \quad (12.10)
Note that inequalities (12.7) to (12.10) refer to the specific case of the receivers
decoding their respective common streams first followed by the other common
stream. One may write corresponding equations for other decoding orders.
Thus, one can construct an achievable rate region of the MIMO interference
channel as the convex hull of the rate pairs R1 = R1c + R1p and R2 = R2c + R2p .
The convex hull is taken over all possible decoding orders of the common streams
and over all possible covariance matrices K1c , K2c , K1p , and K2p that satisfy the
requirements of their respective decoding orders. Furthermore, the covariance
matrices must respect the following power constraints,
trace (K_{1c} + K_{1p}) ≤ P_1
trace (K_{2c} + K_{2p}) ≤ P_2 ,
where P1 and P2 are the power constraints on transmitters 1 and 2 respectively.
Since the achievable rate region depends on the covariance matrices in a com-
plicated way, visualizing the achievable rate region described above is compli-
cated. Most works in the literature that deal with the capacity region of MIMO
interference channels consider either the sum capacity (for example, Reference
[283]) or specific regimes of operation. For instance, in Reference [282], the ca-
pacity is found for the case that the interference is strong enough that it can be
decoded perfectly and then subtracted out. Note that as in the SISO case, joint
decoding of the common and private streams by the receivers can improve the
achievable rates compared to the sequential decoding described.
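The log-det constraints such as (12.7) and (12.8) can be evaluated directly. In the sketch below, the channels are randomly drawn and all four covariances are simple scaled identities; these are illustrative assumptions, whereas the text optimizes over general covariance matrices and decoding orders:

```python
import numpy as np

# Evaluate the common-stream constraints (12.7)-(12.8) for random channels.
# H[(j, k)] denotes the channel from transmitter j to receiver k.

rng = np.random.default_rng(0)
nt = nr = 2
sigma2 = 1.0
H = {(j, k): (rng.standard_normal((nr, nt))
              + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
     for j in (1, 2) for k in (1, 2)}

P = 4.0
K1p = K1c = K2p = K2c = (P / (2 * nt)) * np.eye(nt)  # equal power split

def rate(Hs, Ks, interferers):
    """log2 |I + Hs Ks Hs^dag (sigma^2 I + sum Hn Kn Hn^dag)^{-1}|."""
    noise = sigma2 * np.eye(nr)
    for Hn, Kn in interferers:
        noise = noise + Hn @ Kn @ Hn.conj().T
    M = np.eye(nr) + Hs @ Ks @ Hs.conj().T @ np.linalg.inv(noise)
    return np.log2(np.linalg.det(M).real)

# (12.7): receiver 1 decodes common stream 1 first.
R1c_rx1 = rate(H[(1, 1)], K1c, [(H[(1, 1)], K1p), (H[(2, 1)], K2p + K2c)])
# (12.8): receiver 2 decodes common stream 1 after its own common stream.
R1c_rx2 = rate(H[(1, 2)], K1c, [(H[(1, 2)], K1p), (H[(2, 2)], K2p)])
R1c = min(R1c_rx1, R1c_rx2)   # common stream 1 must be decodable at both
print(R1c)
```
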
The achievable rate regions described in the previous section can be combined
with appropriate outer bounds in order to characterize the capacity region of the
2 × 2 network. Outer bounds to the capacity region are discussed in this section,
and the discussions are based on the pioneering work of Etkin et al. [89], who
originally derived these bounds and showed that the Han–Kobayashi scheme
described in the previous section is within one bit of the outer bounds.
which is the set of rates achievable as if the interfering paths did not exist,
that is, h12 = h21 = 0. In the high-interference case, the capacity region is the
intersection of two multiple-access channel capacity regions, each corresponding
to the multiple-access channel formed by the two transmitters and one of the
receivers.
Since the case P2 |h21 |2 > P1 |h11 |2 is already treated by the bounds in Equations
(12.11) and (12.12), it is sufficient to consider the case where P2 |h21 |2 < P1 |h11 |2 .
The sum capacity of the one-sided interference channel in this regime
has been found by Sason in Reference [271], for the general one-sided interfer-
ence channel (that is, with the noise not necessarily Gaussian). For the case of
Gaussian noise, the sum capacity is bounded from above by

\log_2\left(1 + \frac{P_1|h_{11}|^2}{\sigma^2}\right) + \log_2\left(1 + \frac{P_2|h_{22}|^2}{\sigma^2 + P_1|h_{12}|^2}\right).
Thus, one can write the following two bounds on the sum rate R1 + R2 of the
2 × 2 Gaussian interference channel:

R_1 + R_2 \le \log_2\left(1 + \frac{P_1|h_{11}|^2}{\sigma^2}\right) + \log_2\left(1 + \frac{P_2|h_{22}|^2}{\sigma^2 + P_1|h_{12}|^2}\right)

R_1 + R_2 \le \log_2\left(1 + \frac{P_2|h_{22}|^2}{\sigma^2}\right) + \log_2\left(1 + \frac{P_1|h_{11}|^2}{\sigma^2 + P_2|h_{21}|^2}\right).
Note that the second bound is simply the first with the roles of the two links
reversed.
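The two one-sided sum-rate bounds above are straightforward to evaluate; both must hold simultaneously, so their minimum is the effective upper bound. The parameters below are arbitrary illustrative values for a symmetric weak interference channel:

```python
import math

# Evaluate the two one-sided sum-rate bounds for sample parameters.

sigma2, P1, P2 = 1.0, 5.0, 5.0
h11, h12, h21, h22 = 1.0, 0.4, 0.4, 1.0

bound_a = (math.log2(1 + P1 * h11**2 / sigma2)
           + math.log2(1 + P2 * h22**2 / (sigma2 + P1 * h12**2)))
bound_b = (math.log2(1 + P2 * h22**2 / sigma2)
           + math.log2(1 + P1 * h11**2 / (sigma2 + P2 * h21**2)))

R_sum_upper = min(bound_a, bound_b)   # both bounds must hold simultaneously
print(bound_a, bound_b, R_sum_upper)
```
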
Noisy-interference bounds
A third type of bound can be found using a different genie-aided channel in which
the genie reveals to a particular receiver a noisy version of the interference
that that link causes to the other receiver. This bound can be explicitly described
by defining the following variables, which represent the interference plus noise
seen at the opposing receiver:

v_1 = h_{12} s_1 + n_2
v_2 = h_{21} s_2 + n_1 .

The genie reveals v1 , the interference plus noise seen at receiver 2, to receiver 1,
and v2 , the interference plus noise seen at receiver 1, to receiver 2. This channel,
where the broken lines represent information provided by the genie, is illustrated
in Figure 12.6.

Figure 12.6 Genie-aided interference channel with interference plus noise revealed to
the cross receivers.
This type of genie-aided network is different from the traditionally used genie-
aided network in that the information provided by the genie cannot be used
by any one node to perfectly cancel out interference. This technique provides a
useful bound to the sum capacity in certain regimes. Using detailed information-
theoretic techniques, it can be shown that an upper bound on the sum rate can
be written as

R_1 + R_2 \le \log_2\left(1 + \frac{P_2|h_{21}|^2}{\sigma^2} + \frac{P_1|h_{11}|^2}{\sigma^2 + P_1|h_{12}|^2}\right) + \log_2\left(1 + \frac{P_1|h_{12}|^2}{\sigma^2} + \frac{P_2|h_{22}|^2}{\sigma^2 + P_2|h_{21}|^2}\right).
Again, we refer the reader to Reference [89] for details of the derivation.
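The genie-aided sum-rate bound above has a compact form in terms of signal-to-noise and interference-to-noise ratios. The sketch below evaluates it for illustrative (assumed) values:

```python
import math

# Genie-aided (noisy-interference) sum-rate bound for sample parameters.
# Dividing numerator and denominator of each fraction by sigma^2 gives
# the SNR/INR form used here.

sigma2, P1, P2 = 1.0, 10.0, 10.0
h11, h12, h21, h22 = 1.0, 0.3, 0.3, 1.0

snr1, snr2 = P1 * h11**2 / sigma2, P2 * h22**2 / sigma2
inr1, inr2 = P2 * h21**2 / sigma2, P1 * h12**2 / sigma2  # interference-to-noise

bound = (math.log2(1 + inr1 + snr1 / (1 + inr2))
         + math.log2(1 + inr2 + snr2 / (1 + inr1)))
print(bound)
```
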
[Figure 12.7 shows transmitter 1 with receivers 1A and 1B, and transmitter 2 with
receiver 2.]

Figure 12.7 Genie-aided interference channel with interference plus noise revealed to
the cross receiver and an additional receiver without aid of the genie.
For this type of network, it can be shown that the sum rate including the rate
achieved at the additional receiver satisfies

2R_1 + R_2 \le \log_2\left(1 + \frac{P_1|h_{11}|^2}{\sigma^2} + \frac{P_2|h_{21}|^2}{\sigma^2}\right) + \log_2\left(\frac{\sigma^2 + P_1|h_{11}|^2}{\sigma^2 + P_1|h_{12}|^2}\right) + \log_2\left(1 + \frac{P_1|h_{12}|^2}{\sigma^2} + \frac{P_2|h_{22}|^2}{\sigma^2 + P_2|h_{21}|^2}\right) \quad (12.15)

R_1 + 2R_2 \le \log_2\left(1 + \frac{P_2|h_{22}|^2}{\sigma^2} + \frac{P_1|h_{12}|^2}{\sigma^2}\right) + \log_2\left(\frac{\sigma^2 + P_2|h_{22}|^2}{\sigma^2 + P_2|h_{21}|^2}\right) + \log_2\left(1 + \frac{P_2|h_{21}|^2}{\sigma^2} + \frac{P_1|h_{11}|^2}{\sigma^2 + P_1|h_{12}|^2}\right). \quad (12.16)
Thus, for the two-user Gaussian interference channel with mixed interference,
that is, when

P_2|h_{21}|^2 \ge P_2|h_{22}|^2
P_1|h_{12}|^2 < P_1|h_{11}|^2 ,

we can write the following bounds that all rate pairs must satisfy:
R_1 \le \log_2\left(1 + \frac{P_1|h_{11}|^2}{\sigma^2}\right) \quad (12.25)

R_2 \le \log_2\left(1 + \frac{P_2|h_{22}|^2}{\sigma^2}\right) \quad (12.26)

R_1 + R_2 \le \log_2\left(1 + \frac{P_1|h_{11}|^2}{\sigma^2}\right) + \log_2\left(1 + \frac{P_2|h_{22}|^2}{\sigma^2 + P_1|h_{12}|^2}\right) \quad (12.27)

R_1 + R_2 \le \log_2\left(1 + \frac{P_2|h_{21}|^2}{\sigma^2} + \frac{P_1|h_{11}|^2}{\sigma^2}\right) \quad (12.28)

R_1 + 2R_2 \le \log_2\left(1 + \frac{P_2|h_{22}|^2}{\sigma^2} + \frac{P_1|h_{12}|^2}{\sigma^2}\right) + \log_2\left(1 + \frac{P_2|h_{22}|^2}{\sigma^2 + P_2|h_{21}|^2}\right) + \log_2\left(1 + \frac{P_2|h_{21}|^2}{\sigma^2} + \frac{P_1|h_{11}|^2}{\sigma^2 + P_1|h_{12}|^2}\right). \quad (12.29)
[Figure 12.8 shows transmitter 1 with receiver 1A, and transmitter 2 with receivers
2A and 2B.]

Figure 12.8 Genie-aided mixed interference channel with interference plus noise
revealed to the cross receivers, the interfering signal revealed at one receiver, and an
additional receiver without aid of the genie.
Except for the last inequality, the remaining expressions are either equivalent to
the weak interference channel, or can be found from the results of the weak inter-
ference channel. For instance, the first two inequalities are based on interference-
free communication and hold in all cases including the strong, weak, and mixed
interference channels.
The last inequality can be found by a genie-aided system with an additional
antenna (acting as an additional user) at node 2, as depicted in Figure 12.8. The
genie reveals the interference plus noise seen at receiver 2A to receiver 1 and the
interference plus noise seen at receiver 1 to receiver 2A. Additionally, the genie
reveals the interfering signal s1 to receiver 2A. Receiver 2B is not aided by the
genie and receives the signal s2 . The last bound can be found using arguments
detailed in Reference [89].
That is to say, the genie provides the jth receiver with the interference caused by
the jth transmitter on the kth receiver, for j ≠ k. We can write two more bounds
that are simply the sum capacities of the multiple-access channels obtained by
allowing each receiver, in turn, to decode the messages intended for both receivers.
The bound in Equation (12.39) applies when receiver 1 is able to decode the
messages intended for both receivers. Writing the singular-value decomposition
of the channel matrix between the jth transmitter and kth receiver as

H_{jk} = U_{jk} \Sigma_{jk} V_{jk}^\dagger , \quad (12.41)
Similarly, the second bound in Equation (12.40) applies when receiver 2 is able
to decode the messages intended for both receivers, which occurs if
\gamma_1 V_{22} \Sigma_{22}^{-2} V_{22}^\dagger - \rho_2 V_{21} \Sigma_{21}^{-2} V_{21}^\dagger \succeq 0 . \quad (12.43)
Suppose now that receiver 1 is decomposed into two separate receivers. Assume
that s2 is revealed to one of the sub-receivers at receiver 1, and that v1 is revealed
to receiver 2. Then, once again generalizing the analysis of Reference [89], a
corresponding bound is found in Reference [243].
Using a multiantenna version of Figure 12.8, we can write the following upper
bound proved in Reference [243].
R_1 + 2R_2 \le \log_2 |K_1| + \log_2 \left|I + \rho_2 R_{22} + \gamma_2 R_{12}\right| + \log_2 \left|I + \rho_2 H_{22} \left(P_2^{-1} + \gamma_1 H_{21}^\dagger H_{21}\right)^{-1} H_{22}^\dagger\right|. \quad (12.48)
Cognitive radio systems are, loosely speaking, radio systems that can sense and
adapt to their environment in an “intelligent” way. Various authors have used
this term to mean different things, and Chapter 16 treats the topic of cognitive
radio and its various definitions in more detail. In this chapter, we consider one
form of cognitive radio whereby a cognitive transmitter–receiver pair, which we
refer to as the secondary link, wishes to transmit simultaneously and in the same
frequency band as an existing legacy link, which we refer to as the primary link.
Here we assume that the primary link must be able to operate at the same
capacity as if the cognitive link were absent. In other words, the capacity of the
primary link must not be diminished by the existence of the cognitive link.
We define two different models for the 2×2 MIMO network, namely a network
with a non-cooperative primary link and a network with a cooperative primary
link. For the non-cooperative primary link model, the primary link operates as
if the secondary link does not exist. Hence, the secondary link must operate in a
manner such that it does not reduce the data rate of the primary link, without
requiring the primary link to modify its behavior. One possible method for this
is for the secondary link to transmit only when the primary link is not accessing
the medium or for the secondary link to transmit in a subspace that is orthogonal
to that used on the primary link.
In the cooperative primary link model, we assume that the primary link will
alter its behavior to accommodate the secondary link but not at the expense of
its communication rate. In other words, the primary link operates in a manner
that is accommodating to the secondary link but without sacrificing its data rate.
More sophisticated assumptions can be made in the cooperative primary link
model as well. For instance, we may allow the primary transmitter to share its
data with the secondary transmitter, which can then encode its transmissions in
a manner that helps the primary link maintain its maximum data rate. We shall
not consider this type of cooperation in this chapter as it involves a high degree
of overhead for the data exchange between the transmitters.
Consider the two-link interference network of Figure 12.1, in which the solid
arrows are signal paths and broken arrows are interference paths. Suppose that
the link between transmitter 1 and receiver 1 is the primary link, and the link
between transmitter 2 and receiver 2 is the secondary link. Let R1 and R2 denote
the data rates on the respective links, and the matrices H_{kj} \in \mathbb{C}^{n_{rj} \times n_{tk}} denote
the channel coefficients between the kth transmitter and jth receiver. With
z_j \in \mathbb{C}^{n_{rj} \times 1} denoting the received-signal vector at receiver j, and s_k the
transmit-signal vector from transmitter k, the following equations hold,
z1 = H11 s1 + H21 s2 + n1 (12.50)
z2 = H22 s2 + H12 s1 + n2 , (12.51)
where n1 and n2 are i.i.d. complex Gaussian noise vectors of variance σ 2 . Let
K_1 \in \mathbb{C}^{n_{t1} \times n_{t1}} and K_2 \in \mathbb{C}^{n_{t2} \times n_{t2}} respectively denote the covariance matrices of
the transmit vectors s1 and s2 , with tr(K_j) \le P enforcing a common
power constraint on each transmitter.
Applying Equation (1) of Reference [92] to our network model, the maximum
rate supportable on link 1 if the signal from transmitter 2 is treated as noise is
given by the following bound
R_1 < \log_2\left|I + H_{11} K_1 H_{11}^\dagger \left(\sigma^2 I + H_{21} K_2 H_{21}^\dagger\right)^{-1}\right|. \quad (12.52)
For the maximum rate supportable on link 2, simply replace 1 with 2 and vice
versa in Equation (12.52) such that
R_2 < \log_2\left|I + H_{22} K_2 H_{22}^\dagger \left(H_{12} K_1 H_{12}^\dagger + \sigma^2 I\right)^{-1}\right|. \quad (12.53)
We shall assume that the secondary (cognitive) transmitter and receiver know
all the channel matrices Hj k and that the number of transmitter antennas at
the secondary transmitter nt2 is greater than the number of receiver antennas
at the primary receiver, nr 1 .
diagonal. The primary transmitter transmits V_1 \Phi_1^{1/2} s_1, resulting in a transmit
covariance matrix at transmitter 1 of K_1 = V_1 \Phi_1 V_1^\dagger. The primary receiver
multiplies the received-signal vector z_1 by U_1^\dagger. This operation effectively produces
a system with n_{r1} parallel channels as follows:

\tilde{z}_1 = U_1^\dagger \left( H_{11} V_1 \Phi_1^{1/2} s_1 + H_{21} s_2 + n_1 \right) \quad (12.55)
= U_1^\dagger H_{11} V_1 \Phi_1^{1/2} s_1 + U_1^\dagger H_{21} s_2 + U_1^\dagger n_1 \quad (12.56)
= \Lambda_1 \Phi_1^{1/2} s_1 + U_1^\dagger H_{21} s_2 + U_1^\dagger n_1 , \quad (12.57)
where the water-filling power allocation is

\phi_{1i} = \left( \eta - \frac{\sigma^2}{\lambda_i} \right)^+ , \quad (12.58)

where λ_i is the ith largest eigenvalue of the matrix H_{11} H_{11}^\dagger. The notation (x)^+
means the maximum of x and zero, in other words, (x)^+ = \max(0, x). The “water
level” η is chosen such that

P = \sum_{i=1}^{n_{t1}} \phi_{1i} . \quad (12.59)
Thus, φ1i gives the power that should be allocated to the stream transmitted
along vi . Note that this scheme achieves the capacity of the MIMO channel in
the absence of interference as described in Section 8.3.2.
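The water-filling allocation of Equations (12.58) and (12.59) can be sketched as follows. The bisection on the water level η and the sample eigenvalues are assumed implementation details, not prescribed by the text:

```python
import numpy as np

# Water-filling: phi_i = (eta - sigma^2 / lambda_i)^+ with eta chosen by
# bisection so that the allocated powers sum to P.

def water_fill(eigs, P, sigma2):
    """Power per spatial mode, given eigenvalues `eigs` of H11 H11^dag."""
    eigs = np.asarray(eigs, dtype=float)
    lo, hi = 0.0, P + sigma2 / eigs.max()     # eta* always lies in [lo, hi]
    for _ in range(100):                      # bisect on the water level eta
        eta = (lo + hi) / 2
        phi = np.maximum(eta - sigma2 / eigs, 0.0)
        if phi.sum() > P:
            hi = eta
        else:
            lo = eta
    return phi

eigs = [2.0, 1.0, 0.25, 0.05]     # assumed eigenvalues of H11 H11^dag
phi = water_fill(eigs, P=2.0, sigma2=1.0)
print(phi, phi.sum())             # weak modes may receive zero power
```

With these sample values the two weakest modes fall below the water level and receive no power, which is exactly the situation exploited by the spatial-scavenging secondary link discussed next.
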
Since the primary link is non-cooperative, one option is for the secondary link
to transmit in a subspace that is orthogonal to the subspace used in the primary
link. Suppose that the water-filling power allocation for the primary link allocates
zero power to K modes, that is, φ1j = 0 for j > nr 1 −K, for some integer K ≥ 0.
Thus, K spatial modes are available for secondary-link transmissions. Note that
this is the spatial analog of spectral scavenging, which is a commonly studied
cognitive radio paradigm.
Suppose that the secondary transmitter transmits K_2^{1/2} s_2 instead of s_2, where
the matrix K_2 \in \mathbb{C}^{n_{t2} \times n_{t2}} is the covariance matrix of the signals transmitted by
transmitter 2. Substituting into Equation (12.57) yields

\tilde{z}_1 = \Lambda_1 \Phi_1^{1/2} s_1 + U_1^\dagger H_{21} K_2^{1/2} s_2 + U_1^\dagger n_1 . \quad (12.60)
To avoid interfering with the primary link, the first n_{r1} − K entries of the second
term on the right-hand side must equal zero, as these correspond to the parallel
channels used by the primary link. Since s_2 can be any vector, the first n_{r1} − K
rows of the matrix U_1^\dagger H_{21} K_2^{1/2} must be all zeros. Since U_1^\dagger H_{21} K_2^{1/2} \in \mathbb{C}^{n_{r1} \times n_{t2}},
it is possible to achieve this requirement if n_{t2} \ge n_{r1} − K. One can express this
requirement in matrix form by defining a diagonal matrix D \in \mathbb{C}^{n_{r1} \times n_{r1}} whose
first n_{r1} − K diagonal entries are unity and the remaining entries are zero. The
requirement that the first n_{r1} − K rows of the matrix U_1^\dagger H_{21} K_2^{1/2} are all zero
can be written as

D\, U_1^\dagger H_{21} K_2^{1/2} = 0 , \quad (12.61)
the secondary link operates as if there is no interference, and the maximum rate
on the second link is given by

R_2 < \log\left|I + \frac{1}{\sigma^2} H_{22} K_2 H_{22}^\dagger\right|. \quad (12.64)
where K2c and K2p are the transmit covariance matrices of the common and
private streams of the secondary transmitter, respectively. In addition, R2c must
be supportable by the secondary link in the presence of interference from the
primary link and by self-interference from the private stream of the secondary
link which is captured by the following inequality,
R_{2c} < \log\left|I + H_{22} K_{2c} H_{22}^\dagger \left(\sigma^2 I + H_{22} K_{2p} H_{22}^\dagger + H_{12} K_1 H_{12}^\dagger\right)^{-1}\right|. \quad (12.66)
To ensure that the private stream of the secondary link does not interfere with
the primary link, the following needs to hold:
D\, U_1^\dagger H_{21} K_{2p}^{1/2} = 0 . \quad (12.67)
Hence, the secondary transmitter needs to find covariance matrices K2c and
K2p as well as rates R2c and R2p such that R2c + R2p is maximized subject to
the constraints in Equations (12.65), (12.66), (12.67) and the power constraint
tr (K2c + K2p ) ≤ P .
Problems
Point-to-point Networks
z = s1 + s2 + n , (13.1)
[Figure 13.1 shows a pentagon in the (R1 , R2 ) plane with corner points A, B, C,
and D. The intercepts on the R1 and R2 axes are log2 (1 + P1 /σ 2 ) and
log2 (1 + P2 /σ 2 ) respectively, and the corner coordinates involve
R′1 = log2 (1 + P1 /(P2 + σ 2 )) and R′2 = log2 (1 + P2 /(P1 + σ 2 )). The capacity region
is inside this pentagon.]

Figure 13.1 Bounds on the capacity region of the multiple-access channel. P1 and P2
are the received powers due to transmitters 1 and 2 respectively, and R1 and R2 are
the data rates per channel use of transmitters 1 and 2 respectively.
where the right-hand side of the previous expression comes from the Shannon
capacity (for example, see Section 5.3) of a link with transmit power budget of
P1 + P2 . These bounds are shown in Figure 13.1. Any rate pair (R1 , R2 ) that can
be decoded with arbitrarily low probability of error must be inside the pentagon
in Figure 13.1 in order to satisfy the bounds given above.
Next, we show that all points inside the pentagon in Figure 13.1 are achievable
with arbitrarily low probability of error. In other words, communication with
arbitrarily low probability of error is possible at all pairs of rates (R1 , R2 ) that
are inside the pentagon in Figure 13.1. For the rest of this chapter, we shall use
the term “achievable” to describe a rate at which communication with arbitrarily
low probability of error is possible.
Consider Figure 13.1. Points A and B are achievable when transmitters 1 and
2 respectively are off, and by using Gaussian code-books at the active transmit-
ter, since with one transmitter off, the system is reduced to an additive white
Gaussian noise channel. Point C is achievable if the receiver first decodes the
signal from transmitter 1 which transmits at rate R1 , while treating the signal
from transmitter 2 as noise which effectively increases the noise variance by P2 .
The signal from transmitter 1 can be decoded with arbitrarily low probability of
error since by treating the signal from transmitter 2 as noise, the Shannon ca-
pacity result given in Section 5.3 indicates that communication with arbitrarily
low probability of error is possible at rates satisfying
P1
R1 < log2 1 + . (13.5)
P2 + σ 2
Therefore, the signal from transmitter 1 can be subtracted from the received
signal with arbitrary accuracy, thereby allowing any rate R2 < log2 (1 + P2 /σ 2 ) to
be achievable. Point D is achievable using the same technique with the roles of
transmitters 1 and 2 reversed. Thus we have shown that all rates at the corner
points of the pentagon in Figure 13.1 are achievable. Finally, any point inside
the pentagon 0ACDB is achievable by time sharing between the strategies used
to achieve the corner points. Therefore, any point inside the pentagon in Figure
13.1 is achievable.
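The corner points of the pentagon, and the fact that the successive-decoding corners C and D achieve the sum-rate bound, can be checked directly. The powers below are arbitrary sample values:

```python
import math

# Corner points of the multiple-access pentagon of Figure 13.1, via the
# successive-decoding argument above.

sigma2, P1, P2 = 1.0, 4.0, 9.0

A = (math.log2(1 + P1 / sigma2), 0.0)                      # Tx 2 off
B = (0.0, math.log2(1 + P2 / sigma2))                      # Tx 1 off
# Point C: decode Tx 1 first (Tx 2 treated as noise), then Tx 2 cleanly.
C = (math.log2(1 + P1 / (P2 + sigma2)), math.log2(1 + P2 / sigma2))
# Point D: roles of the transmitters reversed.
D = (math.log2(1 + P1 / sigma2), math.log2(1 + P2 / (P1 + sigma2)))

# Both corners meet the sum-rate bound log2(1 + (P1 + P2)/sigma2):
sum_bound = math.log2(1 + (P1 + P2) / sigma2)
print(C[0] + C[1], D[0] + D[1], sum_bound)
```
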
For comparison, consider a time-division multiple-access (TDMA) scheme (dis-
cussed in Section 4.3.2) in which transmitter 1 uses the channel for a fraction
α ≤ 1 of the time and transmitter 2 uses the channel for 1 − α of the time. The
achievable rate pairs now must satisfy

R_1 < \alpha \log_2\left(1 + \frac{P_1}{\alpha \sigma^2}\right) \quad (13.6)

R_2 < (1 - \alpha) \log_2\left(1 + \frac{P_2}{(1 - \alpha)\sigma^2}\right). \quad (13.7)
The factors of α and 1 − α that scale the noise power are due to the fact that
P1 and P2 are long-term average power budgets of transmitters 1 and 2 which
are respectively on for fractions α and 1 − α of the time.
By varying α, the rate pair (R1 , R2 ) traces out the dashed line shown in
Figure 13.2. Note that it is possible to meet the maximum sum rate by selecting
α = P_1 / (P_1 + P_2), which yields the following after some algebraic manipulation:

R_1 + R_2 < \log_2\left(1 + \frac{P_1 + P_2}{\sigma^2}\right). \quad (13.8)
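The TDMA boundary of (13.6) and (13.7), and the fact that α = P1/(P1 + P2) meets the sum-rate bound (13.8), can be verified numerically (sample powers only):

```python
import math

# TDMA rate pairs (13.6)-(13.7) as a function of the time-share alpha.

sigma2, P1, P2 = 1.0, 3.0, 1.0

def tdma(alpha):
    R1 = alpha * math.log2(1 + P1 / (alpha * sigma2))
    R2 = (1 - alpha) * math.log2(1 + P2 / ((1 - alpha) * sigma2))
    return R1, R2

alpha_star = P1 / (P1 + P2)
R1, R2 = tdma(alpha_star)
print(R1 + R2, math.log2(1 + (P1 + P2) / sigma2))   # the two values agree
```
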
This analysis extends in a straightforward manner to multiple-access channels
with K users, where the capacity region satisfies the following set of constraints
for every subset T of the K users:

\sum_{k \in T} R_k < \log_2\left(1 + \frac{\sum_{k \in T} P_k}{\sigma^2}\right). \quad (13.9)
[Figure 13.2 shows the pentagon of Figure 13.1 with the dashed rate curve traced
out by varying α.]

Figure 13.2 Capacity region of the multiple-access channel with achievable rates using
TDMA in dashed lines.
strengths of the channels between the transmitter and each mobile, and the
power allocated by the transmitter to each mobile. Suppose that the transmitter
allocates power P1 and P2 to receivers 1 and 2 with P = P1 + P2 and that
the channels between the transmitter and receivers are denoted by h1 and h2
respectively. If the transmitted signal is the superposition of the signals intended
for receiver 1 and receiver 2, we can write the following expression for the signal
at the jth receiver,
zj = hj s1 + hj s2 + nj , (13.11)
Suppose that ||h1 ||2 < ||h2 ||2 . In this case, receiver 2 can decode s1 with
arbitrarily low probability of error provided that s1 is transmitted at a rate
R1 such that receiver 1 can decode s1 with arbitrarily low probability of error.
Hence, receiver 2 can perform successive interference cancellation and remove
the interference contribution caused by the signals intended for receiver 1. This
strategy leads to the following bound on the achievable rate of link 2,
    R2 < log2( 1 + P2 ||h2||² / σ² ).                                   (13.12)
Note that this is the single-user bound, that is, the capacity of the channel
between the transmitter and receiver 2 if receiver 1 was not present.
If receiver 1 treats the signal intended for user 2 as noise, it can then achieve
the following:
    R1 < log2( 1 + P1 ||h1||² / (P2 ||h1||² + σ²) )  b/s/Hz.            (13.13)
which is simply the MIMO channel capacity if both transmitters are treated
as a single transmitter with their antennas pooled together. As in the single-
antenna case, these rates can be achieved by interference cancellation and time
sharing.
The capacity region can then be found as the union of the regions described by
Inequalities (13.15) to (13.17), over all positive-semidefinite matrices T1 and T2 ,
420 Cellular networks
which respect the power constraints. Thus, the capacity region of the two-user
MIMO multiple-access channel is the following:
    ⋃_{tr(Tj) ≤ P, ∀j}  { (R1, R2)  s.t.
        R1 < log2 | I + (1/σ²) H1 T1 H1† | ,
        R2 < log2 | I + (1/σ²) H2 T2 H2† | ,
        R1 + R2 < log2 | I + (1/σ²) H1 T1 H1† + (1/σ²) H2 T2 H2† | } .  (13.18)
In the expression above, the inequalities in the braces are restrictions that the
pair of rates must satisfy for a given set of transmit covariance matrices T1 and
T2 . The union is of all pairs of rates such that the inequalities are satisfied, and
is taken over all covariance matrices that respect the power constraints, as the
transmitters may choose any pair of covariance matrices that satisfy the power
constraints.
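The three determinant bounds in Equation (13.18) can be evaluated for any admissible pair of covariance matrices; the sketch below uses hypothetical dimensions and simple equal-power covariances T1 = T2 = (P/nt)I, which satisfy the trace constraints but are not claimed to be optimal:

```python
import numpy as np

rng = np.random.default_rng(0)
nt, nr, sigma2, P = 2, 3, 1.0, 1.0  # hypothetical dimensions and constraints

H1 = rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))
H2 = rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))
# Simple choice of transmit covariances meeting tr(Tj) <= P: equal power.
T1 = T2 = (P / nt) * np.eye(nt)

def logdet(A):
    # log base-2 determinant via the numerically stable slogdet
    return np.linalg.slogdet(A)[1] / np.log(2)

I = np.eye(nr)
b1 = logdet(I + H1 @ T1 @ H1.conj().T / sigma2)        # bound on R1
b2 = logdet(I + H2 @ T2 @ H2.conj().T / sigma2)        # bound on R2
bsum = logdet(I + (H1 @ T1 @ H1.conj().T + H2 @ T2 @ H2.conj().T) / sigma2)
print(b1, b2, bsum)
```

The sum-rate bound always lies between the larger individual bound and the sum of the two, which is why the region is a pentagon for a fixed (T1, T2).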
Like the single-antenna multiple-access channel, the analysis of the two-user
MIMO multiple-access channel extends in a straightforward way to channels with
K users, which yields the following capacity region:
    ⋃_{tr(Tj) ≤ P, ∀j}  { (R1, . . . , RK)  s.t.
        Σ_{i∈T} Ri < log2 | I + (1/σ²) Σ_{i∈T} Hi Ti Hi† |   ∀ T ⊆ {1, 2, . . . , K} } .  (13.19)
z = H† s + n, (13.20)
where the set A over which the supremum is taken is the set of all K ×K diagonal
matrices with non-negative entries that satisfy the transmit power constraint
tr{D} ≤ P . Note that the supremum refers to the smallest upper bound.
The corresponding capacity region, which holds for systems with multiantenna
receivers as well, is more complicated to describe and was derived in Reference
[341]. In the following, we shall briefly describe the capacity region and refer the
reader to Reference [341] for details of its derivation.
Let the received signal at the kth receiver, which has n_rk antennas, be given
by zk ∈ C^{n_rk × 1} as follows,

    zk = Hk s + nk ,

where Hk ∈ C^{n_rk × nt} is the channel matrix between the base station and the
antennas of the kth receiver. The transmitted signal of the base station is
s ∈ C^{nt × 1}, and nk ∈ C^{n_rk × 1} is a vector of circularly symmetric,
complex, Gaussian noise of variance σ² at the antennas of the kth receiver. The
transmitted signal is a superposition of signals intended for each of the K
receivers. If the vector sk ∈ C^{nt × 1} represents the signal intended for the
kth receiver, the transmitted signal vector is

    s = Σ_{k=1}^{K} sk .
For simplicity, let us again consider the K = 2 case. Using dirty-paper coding,
if the signal for receiver 1 is encoded first, the achievable rate bound on link 1 is
    R1 < I(z1; s1) = log2 ( | σ² I + H1 T1 H1† + H1 T2 H1† | / | σ² I + H1 T2 H1† | )   (13.23)

    R2 < I(z2; s2 | s1) = log2 | I + (1/σ²) H2 T2 H2† | .                               (13.24)
Note that the link to receiver 2 can operate at a rate as if it had perfect knowledge
of the signal from transmitter 1.
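A small sketch of Equations (13.23)-(13.24) with hypothetical channels and covariances; note that R2 is exactly the single-user rate of link 2, reflecting the dirty-paper property:

```python
import numpy as np

rng = np.random.default_rng(1)
nt, n1, n2, sigma2 = 2, 2, 2, 1.0  # hypothetical antenna counts, noise power
H1 = rng.standard_normal((n1, nt)) + 1j * rng.standard_normal((n1, nt))
H2 = rng.standard_normal((n2, nt)) + 1j * rng.standard_normal((n2, nt))
T1 = T2 = 0.5 * np.eye(nt)  # hypothetical transmit covariances

def logdet(A):
    return np.linalg.slogdet(A)[1] / np.log(2)

# Eq. (13.23): receiver 1 sees the (not yet encoded-around) signal for
# receiver 2 as noise.
num = sigma2 * np.eye(n1) + H1 @ (T1 + T2) @ H1.conj().T
den = sigma2 * np.eye(n1) + H1 @ T2 @ H1.conj().T
R1 = logdet(num) - logdet(den)
# Eq. (13.24): dirty-paper coding makes receiver 1's signal invisible to
# receiver 2, leaving the single-user bound.
R2 = logdet(np.eye(n2) + H2 @ T2 @ H2.conj().T / sigma2)
print(R1, R2)
```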
Extending this to K users where the transmissions are encoded in the order
1, 2, . . . K, we have
"K
2 †
σ I + Hk j =k Tj Hk
Rk < I(zk ; sk |s1 , . . . sk −1 ) = log2 "K . (13.25)
†
σ 2 I + j =k +1 Hk Tj Hk
To find the full capacity region, we have to take the convex hull of the union
over all possible encoding orderings and all possible covariance matrices. Note
that the convex hull of a set of points in R^k is the smallest convex set that
contains all of the points.
To describe the general capacity region, we rewrite the above equation for a
general encoding order P so that the rate for the kth receiver to be encoded is
bounded as follows:
Taking the convex hull of the union over all transmit covariances and encoding
orderings, we arrive at the following achievable rate region:
    Convex Hull {  ⋃_{P, Tj s.t. Σj tr(Tj) ≤ P}  ( R_P(1) , R_P(2) , . . . , R_P(K) )  } .  (13.27)
This scheme is capacity achieving, as shown in Reference [341], which uses more
general assumptions, allowing for different numbers of antennas at each mobile
user and arbitrary noise covariance matrices. The proof is rather involved, and
we refer the interested reader there for details.
Figure 13.3 Portion of a cellular network with hexagonal cells. The smallest circle
containing a cell has radius RI and the largest circle contained within a cell has
radius Ro .
For the hexagonal cell model (see Figure 13.3) with minimum base station
separation d, the CDF, PDF, and kth moment of x are given by
    Fx(x) = 0,                                                 if x < 0,

          = (2√3 π/3) (x²/d²),                                 if 0 ≤ x < d/2,

          = (2√3 π/3) (x²/d²) − 4√3 (x²/d²) cos⁻¹( d/(2x) )
              + 2√3 √( x²/d² − 1/4 ),                          if d/2 ≤ x < √3 d/3,

          = 1,                                                 if x ≥ √3 d/3,     (13.28)

    fx(x) = (4π/(√3 d²)) x,                                    if 0 < x < d/2,

          = (4π/(√3 d²)) x − (8√3 x/d²) cos⁻¹( d/(2x) ),       if d/2 < x < √3 d/3,

          = 0,                                                 otherwise,         (13.29)

and

    ⟨x^k⟩ = ( 2√3 / (k + 2) ) (d/2)^k ∫₀^{π/6} dτ / (cos τ)^{k+2} .               (13.30)
This result is due to the fact that x is statistically equivalent to the
distance from a random point in an equilateral triangle of side length d to the
closest vertex of that triangle. The CDF, PDF, and kth moments of the distance
from a random point to the closest vertex of an equilateral triangle are known
and can be found in references such as [210]. Equation (13.29) was given in
Reference [249] without derivation. The interpretation given here is based on
Reference [124].
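The triangle-vertex interpretation can be checked by Monte Carlo: draw uniform points in an equilateral triangle of side d, measure the distance to the closest vertex, and compare the empirical CDF with Equation (13.28). The sampling construction below is a standard one, not from the text:

```python
import math, random

random.seed(42)
d = 1.0  # triangle side length

def F(x):
    # CDF of the distance from a uniform point in an equilateral triangle
    # (side d) to its nearest vertex, Eq. (13.28).
    if x < 0:
        return 0.0
    t = (x / d) ** 2
    if x < d / 2:
        return 2 * math.sqrt(3) * math.pi * t / 3
    if x < d / math.sqrt(3):
        return (2 * math.sqrt(3) * math.pi * t / 3
                - 4 * math.sqrt(3) * t * math.acos(d / (2 * x))
                + 2 * math.sqrt(3) * math.sqrt(t - 0.25))
    return 1.0

# Uniform points in the triangle with vertices A, B, C via the standard
# reflection trick.
A, B, C = (0.0, 0.0), (d, 0.0), (d / 2, d * math.sqrt(3) / 2)
def sample():
    u, v = random.random(), random.random()
    if u + v > 1:
        u, v = 1 - u, 1 - v
    x = A[0] + u * (B[0] - A[0]) + v * (C[0] - A[0])
    y = A[1] + u * (B[1] - A[1]) + v * (C[1] - A[1])
    return min(math.dist((x, y), V) for V in (A, B, C))

N = 200_000
samples = [sample() for _ in range(N)]
for q in (0.3, 0.45, 0.55):
    emp = sum(s <= q * d for s in samples) / N
    print(q, emp, F(q * d))
```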
For the Poisson-cell model described in Section 4.3 with base station density
ρb , the link length x has the following CDF and PDF:
    Fx(x) = 1 − e^{−π ρb x²},   if 0 < x;   0 otherwise,                (13.31)

    fx(x) = 2 π ρb x e^{−π ρb x²},   if 0 < x;   0 otherwise,           (13.32)

and the kth moment of the link length is given by

    ⟨x^k⟩ = ( 1 / (ρb π)^{k/2} ) Γ( 1 + k/2 ) .                         (13.33)
Equation (13.31) is found by noting that the probability that there is no point
from a Poisson point process of intensity ρb in a region of area πx² is
e^{−π ρb x²}. Hence, the probability that there is at least one point from the
Poisson process inside an area πx² is 1 − e^{−π ρb x²}, which is precisely the
probability that the link length is less than x. The PDF is obtained by
differentiating the CDF, and the kth moment is found by direct integration. A
more detailed derivation of the PDF may be found in Reference [210].
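A consistency check of Equations (13.31) and (13.33): link lengths drawn by inverting the CDF should reproduce the kth-moment formula (shown here for k = 1; the density ρb is hypothetical):

```python
import math, random

random.seed(7)
rho_b = 2.0  # hypothetical base-station density (points per unit area)

# Eq. (13.31): CDF of the link length; Eq. (13.33): its kth moment.
def F(x):
    return 1 - math.exp(-math.pi * rho_b * x * x) if x > 0 else 0.0

def moment(k):
    return (rho_b * math.pi) ** (-k / 2) * math.gamma(1 + k / 2)

# Draw link lengths by inverting the CDF with a uniform variate.
N = 100_000
xs = [math.sqrt(-math.log(1 - random.random()) / (math.pi * rho_b))
      for _ in range(N)]
m1 = sum(xs) / N
print(m1, moment(1))
```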
13.3 Linear receivers in cellular networks 425
    z = h1 r1^{−α/2} s1 + Σ_{j=2}^{n} hj rj^{−α/2} sj + n ,             (13.34)
where sj is the transmitted data sample from transmitter j and the vector n
contains i.i.d. complex, circularly symmetric, Gaussian random variables with
variance σ 2 . Note that the channels between each transmitter and the represen-
tative receiver can be factored into a fading component given by the vectors hj
and large-scale path loss which attenuates power by a factor of rj−α . We shall
assume a common path-loss exponent α for the transmitters.
The SINR of the representative link (link 1), denoted by β, is thus
where w is the vector of weights applied by the receiver, and it is assumed
that ⟨|sj|²⟩ = 1.
For the matched-filter receiver, w_MF = h1 as shown in Section 9.2.1. For the
antenna selection receiver, w_AS is a vector of zeros with a single 1 in the
entry corresponding to the largest-magnitude entry of h1. The MMSE receiver
has a more complicated expression that depends on the covariance matrix of
the aggregate interference as well as the channel coefficients between the target
transmitter and the representative receiver.
The next three subsections derive the probability density function of the SINR
under these three receiver structures. The discussion on the matched filter and
antenna selection receivers is based on results from Reference [147], and the
discussion on the MMSE receiver is based on Reference [9].
where h_{jk} is the kth entry of hj and h_{jm} is the entry with the largest
magnitude in hj. Pj is the transmit power of the jth transmitter.
Note that the |h_{jk}|² are i.i.d., unit-mean exponential random variables as
shown in Section 3.1.10. Hence, the CDF of |h_{1m}|² at some x is equal to the
probability that nr i.i.d. exponential random variables are all less than or
equal to x. The CDF of an exponential random variable with unit mean is
1 − e^{−x} for x ≥ 0 and zero otherwise, as shown in Section 3.1.10. Hence, we
have the CDF P_{hm}(x) of the strongest channel between the target transmitter
and the antennas of the receiver as follows:
    P_{hm}(x) = Σ_{k=0}^{nr} (nr choose k) (−1)^k e^{−k x}   for x ≥ 0,   0 otherwise.   (13.38)
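Equation (13.38) is just the binomial expansion of (1 − e^{−x})^{nr}; the sketch below checks the expansion against the direct form and against simulated maxima of exponentials (nr is a hypothetical value):

```python
import math, random
from math import comb

random.seed(3)
nr = 4  # hypothetical number of receive antennas

# Eq. (13.38): CDF of the largest of nr i.i.d. unit-mean exponentials,
# written via the binomial expansion of (1 - e^{-x})^{nr}.
def P_hm(x):
    if x < 0:
        return 0.0
    return sum(comb(nr, k) * (-1) ** k * math.exp(-k * x)
               for k in range(nr + 1))

x = 1.2
direct = (1 - math.exp(-x)) ** nr
emp = sum(max(random.expovariate(1.0) for _ in range(nr)) <= x
          for _ in range(100_000)) / 100_000
print(P_hm(x), direct, emp)
```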
Note that the |h_{jm}|² for j ≠ 1 are simply i.i.d. exponential random
variables. Removing the conditioning with respect to I, we have
    Pr{SINR ≤ x} = ∫ dI Pr{SINR ≤ x | I} p_I(I)

                 = Σ_{k=0}^{nr} (nr choose k) (−1)^k ⟨ exp( −k (x r1^α/P1)(I + σ²) ) ⟩_I

                 = Σ_{k=0}^{nr} (nr choose k) (−1)^k exp( −k (x r1^α/P1) σ² ) ⟨ exp( −k (x r1^α/P1) I ) ⟩_I .   (13.40)
The Laplacian of the interference ΦI (s) depends on the particular network model
with the Laplacian of the interference due to cellular models given in Section
13.3.6.
where h̃j = (h1†/||h1||) hj are i.i.d., circularly symmetric, unit-variance
Gaussian random variables. Additionally, note that the interference power at
the output of the matched-filter receiver, I = Σ_{j=2}^{n} |h̃j|² rj^{−α}, is
statistically identical to the interference seen using the antenna selection
receiver from the last subsection.
To find the CDF of the SINR, we observe that S = ||h1 ||2 is the sum of nr
exponentially distributed random variables, which is a χ2 distributed random
variable with 2nr degrees of freedom as described in Section 3.1.11. The CDF of
S, P_S(·), is given by

    P_S(x) = 1 − e^{−x} Σ_{k=0}^{nr−1} x^k / k! .                       (13.45)
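Equation (13.45) can be checked by simulating S as a sum of nr unit-mean exponentials (hypothetical nr):

```python
import math, random

random.seed(11)
nr = 4  # hypothetical number of receive antennas

# Eq. (13.45): CDF of S = ||h1||^2, a sum of nr unit-mean exponentials
# (a chi-squared variable with 2*nr degrees of freedom, up to scaling).
def P_S(x):
    return 1 - math.exp(-x) * sum(x ** k / math.factorial(k)
                                  for k in range(nr))

x = 3.0
emp = sum(sum(random.expovariate(1.0) for _ in range(nr)) <= x
          for _ in range(100_000)) / 100_000
print(P_S(x), emp)
```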
By substituting for the norm squared of the channel vector S into Equation
(13.44), we can write the CDF of the SINR as
    Pr{SINR ≤ x | I} = Pr{ S r1^{−α} P1 / (I + σ²) ≤ x | I, r1 }

                     = P_S( (x r1^α / P1)(I + σ²) )

                     = 1 − Σ_{k=0}^{nr−1} [ (x r1^α/P1)(I + σ²) ]^k / k! · e^{−(x r1^α/P1)(I + σ²)} .   (13.46)
k =0
where Φ_W(s) = e^{−s σ²} Φ_I(s). Note that the expression for the CDF of the SINR
given in the previous equation could be used to derive the probability of outage
for a wireless system with matched-filter receivers, assuming that an outage event
is defined as the event that the SINR is below some threshold.
Since we assume that the channel parameters are known perfectly by the repre-
sentative receiver here, the maximum SINR beamformer and the MMSE beam-
former are equivalent as shown in Section 9.2.4. The MMSE beamformer is given
by Equation (9.80) as follows,
    w = a R^{−1} h1 ,                                                   (13.51)
where a is a scale factor that does not affect the SINR, and R is the interference
plus noise covariance matrix that is given as follows,
    R = P Σ_{j=2}^{n} hj hj† rj^{−α} + σ² I .                           (13.52)
The SINR associated with the MMSE receiver is equal to that of the maxi-
mum SINR beamformer as shown in Section 9.2.4. Thus, from Section 9.2.4, the
SINR β is
and κ is

    κ = Σ_{2 ≤ n1 < ··· < nk ≤ n} ( r_{n1}^{−α} r1^α )( r_{n2}^{−α} r1^α ) ··· ( r_{nk}^{−α} r1^α ) .   (13.57)
    Pr{SINR ≤ z} = 1 − ∫₀^R ··· ∫₀^R dr2 dr3 ··· drn exp( −z σ² r1^α / P )
                     × Σ_{k=1}^{K} Ak ( z σ² r1^α / P )^{k−1} / (k − 1)! · ( 2 r2 ··· 2 rn ) / R^{2n−2} .   (13.58)
Thus, the right-hand side of Equation (13.61) when evaluated at zero equals
zero. Hence, Equation (13.61) becomes
    ∫₀^R ∫₀^{2π} dr dθ [ 1/(1 + r^{−α} β r1^α) − 1 ] r
        = ( π R² / (1 + R^{−α} β r1^α) ) 2F1( 1, 1; 1 − 2/α; (R^{−α} β r1^α)/(R^{−α} β r1^α + 1) ) − π R² .   (13.64)

Substituting

    1/(1 + r^{−α} β r1^α) − 1 = − (r^{−α} β r1^α)/(1 + r^{−α} β r1^α)   (13.65)
Writing the inverse of the path loss as g = r^α, we have the CDF of g given by

    P_g(g) = 1,                 for R^α < g,
           = g^{2/α} / R² ,     for 0 ≤ g ≤ R^α .                       (13.69)

Taking the derivative with respect to g to obtain the PDF of the inverse path
loss g,

    p_g(g) = 0,                           for R^α < g,
           = (2/(α R²)) g^{2/α − 1} ,     for 0 ≤ g ≤ R^α .             (13.70)
The CDF of the received power due to a randomly located transmitter conditioned
on g is therefore

    P_p(p | g) = Pr{ P h / g ≤ p } = Pr{ h < g p / P }
               = 1 − e^{−g p / P},   for 0 ≤ g;   0 otherwise.          (13.71)
Hence, for g ≥ 0, the CDF of the received power due to a randomly located
transmitter is given by
    P_p(p) = 1 − L{ f_g(g) }( p/P ) ,                                   (13.72)

where L{·} denotes the Laplace transform operator. Using the Laplace transform
differentiation property given in Section 2.11,

    p_p(p) = (1/P) L{ g f_g(g) }( p/P )
           = (2/(α R²)) [ P^{2/α} Γ( (2 + α)/α ) / p^{1 + 2/α}
               − ( R^{2+α} / P ) Ei( −2/α , p R^α / P ) ] ,             (13.73)
where Ei (., .) denotes the exponential integral (see, for example, Reference [2]).
Finally, the Laplace transform of p_p(p) is

    Φ_P(s) = ( 2 s^{2/α} P^{2/α} / (α R²) ) Γ( −2/α ) Γ( (2 + α)/α )
             + 2F1( 1, −2/α; (α − 2)/α; −R^{−α} s P ) .
If there are exactly k interferers, then we have the following for the
Laplacian of the interference powers,

    Φ_I(s) = ( Φ_P(s) )^k .
Suppose instead that the transmitters are distributed on the plane according to
a Poisson point process discussed in Section 3.4, with average density ρ users
per unit area, and that we are only concerned with the interference from users
located within the circular cell. This scenario may arise from an appropriate
frequency reuse scheme where transmitters in nearby cells operate in different
frequency bands and do not contribute appreciably to the interference seen at the
base station as illustrated in Figure 13.4. In this case, k is a Poisson distributed
random variable with mean equal to the average number of transmitters in the
cell which equals AR ρ with AR = πR2 . The probability mass function (PMF),
which is introduced in Chapter 3, of k is
    Pr{k users in cell} = ( (ρ A_R)^k / k! ) e^{−ρ A_R} .               (13.74)

The z-transform for the Poisson PMF is

    e^{ρ A_R (z − 1)} .                                                 (13.75)
Figure 13.4 Circular cell with a Poisson distribution of transmitters. The square in the
middle represents the base station and the crosses represent mobile transmitters.
Note that the z-transform is also known as the probability generating function
for discrete random variables. It is known that the Laplacian for the sum of a
random number k of i.i.d. random variables is simply equal to the z-transform
of the PMF of k evaluated at the Laplacian of a single realization of the random
variable (for example, see [93]). Hence, the Laplacian for the interference from
nodes distributed in the circular cell is
ΦI (s) = eρ A R (z −1)
z =Φ p (s)
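The compounding rule used here — the Laplacian of a Poisson-many sum of i.i.d. terms is the z-transform of the count evaluated at the per-term Laplacian — can be verified by Monte Carlo. The per-interferer power is taken as exponential purely for illustration, since its Laplace transform has a simple closed form:

```python
import math, random

random.seed(5)
lam, mean_x, s = 3.0, 2.0, 0.4  # hypothetical count intensity, mean power, s

# Laplace transform of one exponential interferer's power.
phi_x = 1 / (1 + s * mean_x)
# Compounding rule: evaluate the Poisson PGF e^{lam(z-1)} at z = phi_x.
phi_I = math.exp(lam * (phi_x - 1))

N = 200_000
acc = 0.0
for _ in range(N):
    # Draw k ~ Poisson(lam) by counting unit-rate arrivals in [0, lam).
    k, t = 0, random.expovariate(1.0)
    while t < lam:
        k += 1
        t += random.expovariate(1.0)
    I = sum(random.expovariate(1 / mean_x) for _ in range(k))
    acc += math.exp(-s * I)
est = acc / N
print(phi_I, est)
```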
Note that for specific values of the path-loss exponent α, the Laplacian of the in-
terference takes simpler forms. For α = 4 in particular, Equation (13.76) reduces
to
    Φ_I(s) = exp( π R² ρ ( √(sP)/R² ) [ tan⁻¹( √(sP)/R² ) − π/2 ] ) .   (13.77)
Hexagonal cells
Consider a portion of a cellular network with hexagonal cells as illustrated earlier
in Figure 13.3. Assume that there is no out-of-cell interference, that is, nearby
cells operate at orthogonal frequency bands, and that the transmitters use con-
stant powers. The Laplacian of the total interference observed at the base station
at the center of the middle cell is
where the area of the hexagonal cell AR is given in terms of the minimum base
station separation d as follows,
    A_R = (√3/2) d² ,
and Φk (s; k) is the Laplacian of the interference due to exactly k interferers dis-
tributed randomly, with uniform probability in the hexagonal cell. Then Φk (s; k)
is given by
    Φ_k(s; k) = [ −( 4π/(3√3) ) 2F1( 1, −2/α, 1 − 2/α, −s P_T G_T (√3/d)^α )
        + (1/α)( 4π/(√3 d²) ) (P_T G_T)^{2/α} s^{2/α} Γ( −2/α ) Γ( 1 + 2/α )
        + ( √3 π/2 ) 2F1( 1, −2/α, 1 − 2/α, −s P_T G_T (2/d)^α )
        + Σ_{n=0}^{∞} [ ( (2n)! 8 · 3^n ) / ( 2^{4n+1} (n!)² (2n + 1)(1 − 2n) )
              × 2F1( 1, (2n − 1)/α, 1 + (2n − 1)/α, −s P_T G_T (√3/d)^α )
          − ( 2√3 (2n)! ) / ( (2n + 1)(1 − 2n) 2^{2n} (n!)² )
              × 2F1( 1, (2n − 1)/α, 1 + (2n − 1)/α, −s P_T G_T (2/d)^α ) ] ]^k ,   (13.79)
1 The Laplace transform of the PDF of the interference in hexagonal cells with Rayleigh
fading was found by Yifan Sun (unpublished).
Figure 13.5 Outage probability vs. SINR of an antenna selection receiver in a circular
cell with nr = 2, 4, 8 and 16 receiver antennas.
that the users have single antennas and the base station at the center of the cell
has nr = 2, 4, 8 or 16 receiver antennas and uses the antenna-selection receiver.
The outage probability as a function of the SINR threshold can be calculated by
using Equations (13.41) and (13.77), where the outage probability is the prob-
ability that the SINR is less than or equal to the SINR threshold. The outage
probability is shown in Figure 13.5, which illustrates the diminishing returns of
using antenna selection with large arrays at receivers.
the interference and, hence, works well when the interference is close to being
spatially white. With power control and a large number of interferers, the
interference will be close to spatially white. If the transmit power is
constant (that is, no power control), the received interference power will be
dominated by a few interferers that are close to the receiver and the aggregate
interference will not be close to spatially white.
We shall first present an asymptotic result derived in a general system that
admits, but does not depend on, a cellular architecture or power control. We then
apply this result to cellular systems with power control. Note that power control,
whereby nodes whose channels to their respective base stations are strong trans-
mit with lower power to reduce interference, greatly increases the complexity
of the analysis as the transmit signal powers become dependent on the relative
positions of the mobile nodes and their respective base stations. By employ-
ing an asymptotic analysis, we can address the complexity associated with this
correlation of transmit power and spatial position.
estimates performed when they acted as receivers in the past. The spatial in-
terference covariance matrix can be estimated by receivers by constructing a
sample interference covariance matrix by listening to aggregate transmissions
of the interferers. The asymptotic regime that we consider is when the ratio of
the number of interferers n and the number of receiver antennas nr denoted by
a = n/nr > 0 is a constant.
Let the channel coefficients between the antennas of the jth transmitting node
and the ℓth receiver be contained in the nr × nt matrix γ_{ℓj} H_j.
Following the analysis of Section 8.3.2, the spectral efficiency of the link
between receiver ℓ and transmitter ℓ is given by

    c_ℓ = log2 | I + γ_ℓℓ H_ℓ T_ℓ H_ℓ† ( σ² I + Σ_{j=1, j≠ℓ}^{n} γ_ℓj H_j T_j H_j† )^{−1} | ,   (13.81)

where γ_ℓj is the path loss between node ℓ and node j, and T_j is the transmit
covariance matrix of node j, that is, it is the covariance matrix of the
signals sent on the transmit antennas of node j.
13.4 Linear receivers in cellular networks with power control 439
Next, we shall find the transmit covariance matrix T_ℓ that the ℓth transmitter
uses to maximize c_ℓ from Equation (13.81). Recall that each transmitter only
knows the channel matrix between itself and its target receiver. In other
words, the ℓth transmitter only knows the channel matrix H_ℓℓ.
Performing a singular-value decomposition on the channel matrix H_ℓj between
transmitter ℓ and receiver j yields

    H_ℓj = U_ℓj Σ_ℓj V_ℓj† .                                            (13.82)

Let the vectors v_ℓj and u_ℓj denote the jth columns of the right singular
matrix V_ℓℓ and the left singular matrix U_ℓℓ, respectively, with λ_ℓj
representing the jth largest singular value of H_ℓℓ.
From Section 8.3.2, we know that, without knowledge of the quantity in the
parentheses in Equation (13.81), which is the covariance matrix of the
interference plus noise observed at the antennas of the receiver, to maximize
Equation (13.81) the ℓth transmitter should use a transmit covariance matrix
T_ℓ as follows:

    T_ℓ = V_ℓℓ P_ℓ V_ℓℓ† .                                              (13.83)
Note that random matrices with Gaussian distributed entries maintain their
statistical properties when multiplied by unitary matrices. Thus, we can write
Equation (13.81) as

    c_ℓ = log2 | I + γ_ℓℓ H_ℓℓ V_ℓℓ P_ℓ V_ℓℓ† H_ℓℓ† ( σ² I + Σ_{j=1, j≠ℓ}^{n} γ_ℓj H̃_j P_j H̃_j† )^{−1} | .   (13.86)

Substituting the SVD of the channel, H_ℓℓ = U_ℓℓ Σ_ℓℓ V_ℓℓ†, yields the
spectral efficiency of the ℓth link

    c_ℓ = log2 | I + γ_ℓℓ U_ℓℓ Σ_ℓℓ P_ℓ Σ_ℓℓ† U_ℓℓ† ( σ² I + Σ_{j≠ℓ} γ_ℓj H̃_j P_j H̃_j† )^{−1} |

        = log2 | I + γ_ℓℓ Σ_ℓℓ P_ℓ Σ_ℓℓ† U_ℓℓ† ( σ² I + Σ_{j≠ℓ} γ_ℓj H̃_j P_j H̃_j† )^{−1} U_ℓℓ | .   (13.87)
With [Q]_{jk} denoting the jkth entry of Q, we have the jth diagonal entry of
Q as

    [Q]_{jj} = u_{1j}† ( σ² I + Σ_{i=2}^{n+1} γ_{1i} H̃_{i1} P_i H̃_{i1}† )^{−1} u_{1j} ,   (13.90)

where u_{1k} is the kth column of U_{11}. Thus, the jth diagonal entry of
I + γ_11 Σ_1 P_1 Σ_1† Q is 1 + γ_11 λ_{1j} (1/nr) P_{1j} [Q]_{jj}. Hence, by the
Hadamard inequality (see Section 2.2.3), we can bound Equation (13.89) as

    c1 = log2 | I + γ_11 Σ_1 P_1 Σ_1† Q | ≤ log2 Π_{j=1}^{M} ( 1 + γ_11 λ_{1j} (1/nr) P_{1j} [Q]_{jj} ) ,   (13.91)
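The Hadamard inequality invoked in Equation (13.91) states that the determinant of a Hermitian positive-definite matrix is at most the product of its diagonal entries; a quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5  # hypothetical matrix size

# Build a Hermitian positive-definite matrix A = I + B B^H.
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = np.eye(n) + B @ B.conj().T

# Hadamard inequality: det(A) <= product of the diagonal entries of A.
det_A = np.linalg.det(A).real
diag_prod = np.prod(np.diag(A).real)
print(det_A, diag_prod)
```

Equality holds exactly when A is diagonal, which is why the upper bound in Equation (13.91) is achieved when Q is diagonal.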
where we have used the fact that P1j = 0 for j > M . This upper bound is
achieved if the matrix Q is diagonal. This would be the case if the M data
streams from transmitter 1 do not interfere with each other. Substituting the
definition of [Q]j j from Equation (13.90) yields
    c1 ≤ Σ_{j=1}^{M} log2( 1 + γ_11 λ_{1j} (1/nr) P_{1j} u_{1j}† ( σ² I + Σ_{i=2}^{n+1} γ_{1i} H̃_{i1} P_i H̃_{i1}† )^{−1} u_{1j} ) .   (13.92)
After some matrix manipulations, the upper bound on the spectral efficiency
from Equation (13.92) can be written as
    c1 ≤ Σ_{j=1}^{M} log2( 1 + γ_11 λ_{1j} (1/nr) P_{1j} u_{1j}† ( σ² I + (1/nr) K_1 Φ_1 K_1† )^{−1} u_{1j} ) ,   (13.93)
For the lower bound in Equation (13.95), K̆_j ∈ C^{(nr − M + 1) × nM} are
matrices whose entries are i.i.d., circularly symmetric, complex, Gaussian
random variables of unit variance, and ŭ_j ∈ C^{(nr − M + 1) × 1} are
unit-norm, isotropic, random vectors. The lower bound is achieved when the
receiver uses M − 1 of its degrees of freedom to null the interference from
the M − 1 other streams from the
desired transmitter when it decodes a particular stream from the target trans-
mitter. When the number of receiver antennas nr ≫ M, the degrees of freedom
used for nulling the interference from other streams in the lower bound are a
negligible fraction of the total nr degrees of freedom available at the receiver.
Thus we approximate the spectral efficiency of the representative link in the
high-nr regime by the following:

    c1 ≈ Σ_{j=1}^{M} log2( 1 + γ_1 P_{1j} λ_{1j} (1/nr) u_{1j}† ( σ² I + (1/nr) K_1 Φ_1 K_1† )^{−1} u_{1j} ) .   (13.96)
Note that it can be rigorously shown that in the limit as nr → ∞, the upper
and lower bounds in Equations (13.93) and (13.95) coincide, precisely because
the M − 1 degrees of freedom used for nulling the self-interference in the lower
bound are negligible when nr is large [125].
our model to that of CDMA systems with random spreading codes, we use the
fact that uj are isotropic vectors as follows. Note that u1j in Equation (13.96) is
the jth right singular vector of the matrix H11 , which is assumed to have i.i.d.
CN (0, 1) entries. The right singular vectors of a matrix of i.i.d. complex Gaussian
entries (of which u1j is one) are isotropic, have unit norm (as shown, for example,
in Reference [315]), and are statistically independent of their associated singular
values λ1j due to the isotropic property of random vectors of circularly symmetric
complex Gaussian entries. Furthermore, due to this isotropic property of vectors
with i.i.d. CN (0, 1) entries, u1j can be expressed as
    u_{1j} = g_j / ||g_j|| ,                                            (13.104)

where the entries of g_j are distributed as i.i.d. CN(0, 1), which means that
Equation (13.102) can be written as

    SINR_j = ( 1 / ((1/nr) ||g_j||²) ) γ_1 P_{1j} λ_{1j} (1/nr) g_j† ( σ² I + (1/nr) K_1 Φ_1 K_1† )^{−1} g_j .   (13.105)
The proof of this can be found in Reference [127]. If H_n(x) → H(x), it was
shown in References [13] and [313] that the term

    (1/nr) g_j† ( σ² I + (1/nr) K_1 Φ_1 K_1† )^{−1} g_j

converges with probability 1 to an asymptotic limit, which we define as β_j.
Note that β_j is the unique, non-negative solution for β(z) in the equation

    z β(z) + 1 = β(z) a ∫₀^∞ dH(τ) τ / ( 1 + τ β(z) ) .                 (13.108)
Since (1/nr) ||g_j||² → 1 with probability 1,

    SINR_j → γ_1 P_{1j} λ_{1j} β_j .                                    (13.109)

Thus, the SINR for each of the M modes of the representative link converges with
probability 1 to the right-hand side of Equation (13.109). This implies that the
spectral efficiency, which is simply the sum of log2(1 + SINR_j) over j, converges
with probability 1 as follows

    c1 → Σ_{j=1}^{M} log2( 1 + SINR_j ) .                               (13.110)
for β, becomes

    −σ² β + 1 = β a ∫₀^∞ (1/M) Σ_{j=1}^{M} ( x f_j(x/γ) / (1 + x β) ) dx ,   (13.111)

where γ is the path loss between the representative transmitter and represen-
tative receiver. Because the transmit powers are all equal to P, the probability
density function of the transmit powers is f(x) = δ(x − P) and Equation (13.111)
becomes

    −σ² β + 1 = β a P γ / ( 1 + P γ β ) .                               (13.112)
By applying the quadratic formula and selecting the positive term, the limiting
value for β, which we denote by βep to emphasize that this is from the equal
received power model, is found to be
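The closed form for βep follows from the quadratic formula applied to Equation (13.112); the sketch below (with hypothetical values of σ², a, and Pγ) computes the positive root and verifies that it satisfies the fixed-point equation:

```python
import math

# Hypothetical example values: noise power, load a = n/nr, and the common
# received power P*gamma of the equal-received-power model.
sigma2, a, Pg = 0.1, 0.5, 1.0

# Eq. (13.112) rearranged into sigma2*Pg*b^2 + (sigma2 + (a-1)*Pg)*b - 1 = 0;
# select the positive root as the limiting value beta_ep.
b = sigma2 + (a - 1) * Pg
beta_ep = (-b + math.sqrt(b * b + 4 * sigma2 * Pg)) / (2 * sigma2 * Pg)

# Check the fixed point: -sigma2*beta + 1 = beta*a*Pg/(1 + Pg*beta).
lhs = -sigma2 * beta_ep + 1
rhs = beta_ep * a * Pg / (1 + Pg * beta_ep)
print(beta_ep, lhs, rhs)
```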
Figure 13.6 Simulated and asymptotic spectral efficiency vs. number of antennas
with power control and the ratio of nodes to receive antennas n/nr = 1, for 1,
2, 4, and 8 streams. The dashed lines represent the standard deviation of the
simulated spectral efficiencies. The path loss of each interferer was assumed
to be −125 dB, with noise power of 10⁻¹³ W, and path-loss exponent α = 4. The
representative link had a path loss of −100 dB.
Figure 13.7 Simulated and asymptotic spectral efficiency vs. number of antennas
with power control and the ratio of nodes to receive antennas n/nr = 4, for 1,
2, 4, and 8 streams. The dashed lines represent the standard deviation of the
simulated spectral efficiencies. The path loss of each interferer was assumed
to be −125 dB, with noise power of 10⁻¹³ W, and path-loss exponent α = 4. The
representative link had a path loss of −100 dB.
of the spectral efficiency is evident from the figure since the points representing
different trials of the simulation converge with increasing numbers of receive
antennas nr . Additionally, note that the standard deviation decays with nr ,
which indicates convergence in the mean-square sense. For nr ≥ 14 antennas
and 1000 trials, the largest deviation from the asymptotic prediction is less than
15% for n/nr = 1 and n/nr = 4. For nr ≥ 25, the largest deviation falls below
10% in both cases.
Consider an identical network to the one from Section 13.4, but now assume that
the receivers use matched-filter receivers (see Section 9.2.1) with single stream
transmissions, that is, the covariance matrices at the transmitters are unit rank.
The matched filter is attractive because it is simpler to compute than the MMSE
receiver, and does not require any information about the spatial structure of the
interference, which reduces protocol complexity. The SINR at the output of the
matched filter can be found to converge in probability to [313]
    SINR = P / ( σ² + a ⟨p⟩ ) ,                                         (13.115)

where ⟨p⟩ is the expected value of the received interference power from any
given user in the network. This expression is computed by using the limiting
empirical distribution function of the interference powers seen at the receiver
and the ratio of the number of interferers to receive antennas, a = n/nr.
For systems with power control where the received powers from the interferers
and the representative transmitter are identically P, we have ⟨p⟩ = P and the
SINR is given by [323, 313]

    SINR = P / ( σ² + a P ) = 1 / ( σ²/P + a ) .                        (13.116)
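Equation (13.116) is simple enough to explore directly; with hypothetical P and σ², the limiting matched-filter SINR falls monotonically with the load a = n/nr and saturates at the single-user SNR as a → 0:

```python
# Eq. (13.116): limiting matched-filter SINR with power control, where every
# received power equals P. Hypothetical example values.
P, sigma2 = 1.0, 0.1  # i.e., 10 dB SNR

def mf_sinr(a):
    # a = n/nr is the number of users per receiver degree of freedom.
    return 1 / (sigma2 / P + a)

print(mf_sinr(0.0), mf_sinr(1.0), mf_sinr(4.0))
```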
Figure 13.8 Limiting SINR versus users per receiver degree of freedom n/nr for
linear receivers (matched filter and linear MMSE at 0, 10, and 20 dB SNR). The
transmitter was assumed to not have channel-state information.
If a > 1, however, the MMSE receiver is unable to null out all the interference,
and at high SNR there will be a significant amount of interference remaining at
the output of the MMSE receiver. Note that in the low-SNR cases, the matched
filter and the MMSE receiver have approximately the same asymptotic SINR: when
the SNR is low, the interference is not very significant compared with the
noise, and thus the matched filter, which is optimal in noise, performs nearly
as well as the MMSE receiver.
locations form a homogeneous Poisson point process. In Figure 13.9, one such
case is shown where base stations are at hexagonal grid points with adjacent
base stations separated by distance d.
Suppose that overlaid on this cellular architecture is a circular network with n
wireless transmitters located at random i.i.d. points in a circle of radius R such
that
n = ρw π R 2 , (13.117)
where ρw is the area density of wireless nodes in this network. Let these n
transmitters, which will act as interferers to a representative link, be numbered
2, 3, . . . n + 1. Additionally, suppose that a representative transmitter is at dis-
tance r1 from the representative receiver and is located in the cell associated
with the representative receiver at the origin.
In order to limit the complexity of the following exposition, we assume that
the transmitters have single antennas and no channel-state information, and
that the average power received at a distance r from a transmitter transmitting
with power P̄ is

    P̄ r^{−α} .                                                          (13.118)
Additionally, suppose that the ℓth transmitter transmits with power
n_r^{α/2−1} P_ℓ, and controls its transmit power as follows:

    P_ℓ = min( (p_t / G_t) r_{tℓ}^α , P_max ) ,                         (13.119)

where r_{tℓ} is the distance between the ℓth transmitter and its nearest base
station. Thus, Equation (13.119) models a scheme where the ℓth wireless node
tries to achieve a target received power (relative to path loss) of
n_r^{α/2−1} p_t at its nearest base station, with the maximum power constrained
by P_max. The scale factor of n_r^{α/2−1} applied to the transmit power is used
in order to keep the system interference limited as the number of receive
antennas n_r → ∞; without this scale factor, the MMSE receiver would be able to
suppress interference to levels comparable to the noise, resulting in a system
that is no longer interference limited.
limited. Note that, since both the representative transmitter and all interferers
are assumed to apply this scaling, the scaling does not affect the SINR when the
noise is negligible. For a given set of base stations, the link lengths and hence
transmit powers are independent random variables as they depend solely on the
locations of the wireless nodes, which are independent by assumption. Hence,
results derived using the assumptions of the previous section can be applied to
this network model with the transmit-power PDF fP (p), which depends on the
base station locations.
The SINR in interference-limited networks is known to grow as $n_r^{\alpha/2}$ [123].
Hence, as nr increases, the SINR will increase without bound. Thus, we define a
13.5 Matched-filter receiver in power-controlled cellular networks 451
Figure 13.9 Illustration of base stations on a hexagonal grid with wireless nodes in a circular network.
normalized version of the SINR that normalizes out the rate of growth with the
number of receiver antennas as
$$\beta_{n_r} = n_r^{-\alpha/2}\,\mathrm{SINR}, \qquad (13.120)$$
where the thermal noise power equals $\sigma^2$. On the basis of the analysis of Section
13.4, as long as the empirical distribution of the received interference powers
converges with probability 1 to an asymptotic limit function H(τ ), we will be
able to evaluate the limiting normalized SINR. The normalization of scaling of
the interference powers assures that the empirical distribution of the received
interference powers does indeed converge to a limiting function.
To explicitly show this convergence and to find H(τ ), recall that the represen-
tative receiver at the origin is connected to a representative transmitter that we
call node 1 at a distance r1 from the origin. The remaining transmitting nodes
numbered $2, 3, \ldots, n+1$ are treated as interferers. Let $\tilde p_\ell$ equal the average received power (averaged over the channel fading) from transmitter $\ell$ at each of the $n_r$ antennas of the representative receiver, given by
$$\tilde p_\ell = n_r^{\alpha/2}\, P_\ell\, G_t\, r_\ell^{-\alpha}, \qquad \ell = 2, 3, \ldots, n+1. \qquad (13.121)$$
452 Cellular networks
Figure 13.10 A representative cell in which the $\ell$th wireless node is located, showing the base station, the cell boundary, and the distances $r_\ell$, $r_{b\ell}$, and $d_b$. The distance between the wireless node and the receiver at the origin is bounded below and above by, respectively, the difference between and the sum of the distance from the base station associated with the $\ell$th transmitter to the origin and the radius of the smallest circle centered at the base station that contains the cell.
$$\Pr\{\tilde p_\ell \le x\} = \int dP\, \Pr\{\tilde p_\ell \le x \mid P_\ell = P\}\, f_P(P) = \int dP\, \Pr\!\left\{P\, n_r^{\alpha/2}\, r_\ell^{-\alpha} \le x \,\middle|\, P\right\} f_P(P) = \int dP\, \Pr\!\left\{\frac{r_\ell}{\sqrt{n_r}} \ge \left(\frac{P}{x}\right)^{1/\alpha}\right\} f_P(P). \qquad (13.122)$$
As the number of receiver antennas goes to infinity, the upper and lower bounds
in the expression above approach each other. In other words, as nr → ∞,
$$\Pr\{\tilde p_\ell \le x\} \to \int dP\, \Pr\!\left\{\frac{r_{b\ell}}{\sqrt{n_r}} \ge \left(\frac{P}{x}\right)^{1/\alpha}\right\} f_P(P). \qquad (13.125)$$
Since the transmit power of the $\ell$th node is asymptotically independent of the distance of its base station from the representative receiver at the origin,
$$\Pr\{\tilde p_\ell \le x\} \to \int dP\, \Pr\!\left\{\frac{r_{b\ell}}{\sqrt{n_r}} \ge \left(\frac{P}{x}\right)^{1/\alpha}\right\} f_P(P) \qquad (13.126)$$
$$\to \int dP\, \Pr\!\left\{\frac{r_\ell}{\sqrt{n_r}} \ge \left(\frac{P}{x}\right)^{1/\alpha}\right\} f_P(P) = \int dP\, \left[1 - F_r\!\left(\left(\frac{P}{x}\right)^{1/\alpha}\sqrt{n_r}\right)\right] f_P(P), \qquad (13.127)$$
where Equation (13.127) results from substituting Equation (13.123). Note that
the inequalities in Equation (13.123) hold if the cells are bounded. For the Pois-
son cell model, this may not be the case although the cells have finite area with
probability 1. In the case of Poisson cells, however, the convergence in Equa-
tion (13.126) can be shown by using an alternate technique which is given in
Reference [126]. Substituting Equation (13.117), $a = n/n_r$, and $b = (\pi\rho_w/a)^{\alpha/2}$, the probability that the received power from the $\ell$th interferer at the representative receiver is less than or equal to a value $x$ converges in the following manner:
$$\Pr\{\tilde p_\ell < x\} \to \int dP\, f_P(P)\left[1 - \frac{\pi\rho_w}{a}\left(\frac{P}{x}\right)^{2/\alpha}\right] I_{\{P b < x < \infty\}}$$
$$= F_P\!\left(\frac{x}{b}\right) - \frac{\pi\rho_w}{a}\, x^{-2/\alpha}\left\langle P^{2/\alpha}\right\rangle + \frac{\pi\rho_w}{a}\, x^{-2/\alpha}\int_{x/b}^{\infty} dP\, f_P(P)\, P^{2/\alpha}, \qquad (13.128)$$
where $I_A$ is the indicator function, which equals 1 if the condition $A$ is true and zero otherwise. By the Glivenko–Cantelli theorem (see, for example, Reference [131]), the empirical distribution function of a set of i.i.d. random variables converges uniformly, with probability 1, to its CDF. Hence, the empirical distribution function of the $\tilde p_\ell$s converges with probability 1 to the right-hand side of Equation (13.128), that is, $H(x) = \lim_{n\to\infty}\Pr\{\tilde p_\ell < x\}$.
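The Glivenko–Cantelli convergence invoked above is easy to illustrate numerically. The sketch below (the helper name, distribution, and sample sizes are ours, for illustration only) measures the sup-norm gap between the empirical CDF of i.i.d. exponential samples and their true CDF, which shrinks as the sample size grows:

```python
import math, random

rng = random.Random(7)

def empirical_cdf_gap(n):
    # Sup-norm gap between the empirical CDF of n i.i.d. Exp(1) samples and
    # the true CDF F(x) = 1 - exp(-x); by the Glivenko-Cantelli theorem this
    # gap shrinks to zero with probability 1 as n grows.
    xs = sorted(-math.log(1.0 - rng.random()) for _ in range(n))
    gap = 0.0
    for i, x in enumerate(xs):
        f = 1.0 - math.exp(-x)
        # The empirical CDF jumps from i/n to (i+1)/n at the ith order statistic.
        gap = max(gap, abs((i + 1) / n - f), abs(i / n - f))
    return gap
```

The same uniform-convergence behavior underlies the claim that the empirical distribution of the received interference powers converges to $H(x)$.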
The derivative of $H(x)$ is
$$\frac{dH(x)}{dx} = \frac{2\pi\rho_w}{a\alpha}\left\langle P^{2/\alpha}\right\rangle x^{-\frac{2}{\alpha}-1} - \frac{2\pi\rho_w}{a\alpha}\, x^{-\frac{2}{\alpha}-1}\int_{x/b}^{\infty} d\tau\, f_P(\tau)\,\tau^{2/\alpha}. \qquad (13.129)$$
The first term on the right-hand side of Equation (13.130) is evaluated using Lemma 1 of Reference [123], which yields
$$\int_b^{\infty} \frac{\tau\, dH(\tau)}{1+\tau m} = \frac{2\pi^2\rho_w}{a\alpha}\left\langle P^{2/\alpha}\right\rangle m^{\frac{2}{\alpha}-1}\csc\!\left(\frac{2\pi}{\alpha}\right) - \frac{2\pi\rho_w}{a\alpha}\int_0^{\infty} d\tau\,\frac{\tau^{-2/\alpha}}{1+m\tau}\int_{\tau/b}^{\infty} dx\, f_P(x)\, x^{2/\alpha}. \qquad (13.131)$$
Combining these results, the limiting normalized SINR $\beta$ satisfies the following equation:
$$\frac{\pi}{\alpha}\left\langle P^{2/\alpha}\right\rangle \beta^{2/\alpha}\csc\!\left(\frac{2\pi}{\alpha}\right) - \frac{2\pi\rho_w\,\beta\, r_1^{\alpha-2}}{\alpha}\int_0^{\infty} d\tau\,\frac{\tau^{-2/\alpha}}{1+\tau\beta}\int_{\tau/b}^{\infty} dx\, f_P(x)\, x^{2/\alpha} + \left(1-\frac{2}{\alpha}\right)\frac{\beta\, r_1^{\alpha-2}\,\sigma^2}{2\, G_t\,\rho_w\pi\, P_1^{1-2/\alpha}} = \frac{P_1^{2/\alpha}}{2\,\rho_w\pi\, r_1^2}. \qquad (13.134)$$
We assume that all transmit nodes use Gaussian codebooks and that the receiver uses single-user decoding. Hence, the spectral efficiency of the representative link in the limit is given by the Shannon formula from Equation (5.33) (see also Section 3.2.2):
$$c = \log_2\!\left(1 + n_r^{\alpha/2}\beta\right). \qquad (13.135)$$
This previous expression indicates that the spectral efficiency grows approximately as $\log_2(n_r^{\alpha/2}\beta)$. Hence, with appropriate normalization, the spectral efficiency converges to an asymptotic limit with probability 1 as $n_r \to \infty$. Additionally, the deviation of the mean spectral efficiency from its asymptotic value can also be shown to decay to zero, that is,
$$\langle c\rangle - \log_2\!\left(1 + n_r^{\alpha/2}\beta\right) \to 0. \qquad (13.136)$$
The previous expression implies that the asymptotic spectral efficiency $\log_2(1 + n_r^{\alpha/2}\beta)$ is a good approximation for the mean spectral efficiency, as the difference between the two quantities decays to zero with increasing numbers of receiver antennas $n_r$.
From Equation (13.134), it is unclear what the limiting normalized SINR β is.
To obtain a more meaningful expression for β and the spectral efficiency, we can
simplify Equation (13.134) by showing that the second term on the left-hand
side of Equation (13.134) is small if the number of nodes in the network n is
much larger than the number of antennas at the representative receiver nr , that
is, when b is small. When b is small, the lower limit of the following integral,
∞
2
dx fP (x) x α , (13.137)
τ /b
of interferers is high (that is, large a), Equation (13.134) can be written as3
$$\frac{\pi}{\alpha}\left\langle P^{2/\alpha}\right\rangle \beta^{2/\alpha}\csc\!\left(\frac{2\pi}{\alpha}\right) + \left(1-\frac{2}{\alpha}\right)\frac{\beta\, r_1^{\alpha-2}\,\sigma^2}{2\, G_t\,\rho_w\pi\, P_1^{1-2/\alpha}} \approx \frac{P_1^{2/\alpha}}{2\,\rho_w\pi\, r_1^2}. \qquad (13.138)$$
Suppose now that the link lengths are bounded such that the maximum distance between any transmitting node and its desired receiver satisfies $r_M \le (G_t P_{\max}/p_t)^{1/\alpha}$. We call this the sufficient-power case, since every wireless node can satisfy the target received power (relative to path loss) at its desired receiver. The sufficient-power case corresponds to the base station separation being small enough that the target received power is attained by each wireless node. Substituting Equation (13.119) into Equation (13.141),
$$c \approx \log_2\!\left(1 + \frac{p_t}{G_t}\, r_1^{\alpha}\, G_\alpha\left(\frac{n_r}{\left\langle\left(\frac{p_t}{G_t}\, r_{ti}^{\alpha}\right)^{2/\alpha}\right\rangle \pi\rho_w\, r_1^{2}}\right)^{\alpha/2}\right)$$
$$= \log_2\!\left(1 + G_\alpha\left(\frac{n_r}{\left\langle r_{ti}^{2}\right\rangle \pi\rho_w}\right)^{\alpha/2}\right), \qquad (13.142)$$
which is a function of the second moment of the link lengths arising from the
cell shapes. Hence, we can evaluate the spectral efficiency for different models
3 This approximation requires the solution of Equation (13.134) to be a continuous function of β, which holds if the path-loss exponent α is rational, as Equation (13.134) can then be raised to a sufficiently high power, resulting in a polynomial equation in the limiting normalized SINR β with real coefficients, whose roots are known to vary continuously with those coefficients.
Figure 13.11 Mean spectral efficiency vs. number of receive antennas for ρw = 10−3 and ρw = 10−2 nodes/m2 with unlimited transmit powers (curves labeled by relative base-station densities of 5%, 10%, and 20%).
of the cellular architecture by computing the second moment of the link lengths associated with the cell shape. For the hexagonal-cell model, the second moment of the link lengths can be found using Equation (13.30) with k = 2:
$$\left\langle x^2 \right\rangle = \frac{\sqrt{3}\,\sin\frac{\pi}{6}\left(1 + 2\cos^2\frac{\pi}{6}\right)}{24\cos^3\frac{\pi}{6}}\, d^2 = \frac{5}{36}\, d^2 \approx 0.14\, d^2.$$
Substituting into Equation (13.142) yields the following simple approximation
for the mean spectral efficiency of interference-limited hexagonal-cell systems
with a large number of receive antennas per base station:
$$\langle c\rangle \approx \log_2\!\left(1 + G_\alpha\left(\frac{n_r}{0.14\, d^2\,\pi\rho_w}\right)^{\alpha/2}\right), \qquad (13.143)$$
where the averaging is with respect to the locations of the interferers and repre-
sentative transmitter, and channel fading coefficients.
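The hexagonal-cell approximation of Equation (13.143) is straightforward to evaluate numerically. The sketch below (function names and parameter values are illustrative, not from the text) computes $G_\alpha$ from its definition and the resulting asymptotic mean spectral efficiency:

```python
import math

def g_alpha(alpha):
    # G_alpha = [alpha/(2 pi) * sin(2 pi/alpha)]^(alpha/2), valid for alpha > 2
    return (alpha / (2 * math.pi) * math.sin(2 * math.pi / alpha)) ** (alpha / 2)

def hex_cell_spectral_efficiency(n_r, d, rho_w, alpha=4.0):
    # Asymptotic mean spectral efficiency of Equation (13.143):
    # c ~= log2(1 + G_alpha * (n_r / (0.14 d^2 pi rho_w))^(alpha/2))
    return math.log2(
        1 + g_alpha(alpha) * (n_r / (0.14 * d ** 2 * math.pi * rho_w)) ** (alpha / 2)
    )
```

For α = 4, $G_\alpha$ reduces to $(2/\pi)^2 \approx 0.405$, and the spectral efficiency grows roughly as $\alpha \log_2(n_r)$ once the interference term dominates.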
The mean uplink spectral efficiency from Monte Carlo simulations for wireless
node densities of ρw = 10−3 and ρw = 10−2 nodes/m2 , and unlimited trans-
mit powers per node versus the number of antennas at the representative base
station is illustrated in Figure 13.11. The square and asterisk markers represent
simulations of wireless networks with node densities of 10−2 and 10−3 nodes/m2 ,
respectively, and the solid lines represent the asymptotic mean spectral efficiency
from Equation (13.143).
Note that the asterisk and square markers coincide, indicating that the abso-
lute density of wireless nodes does not affect the mean spectral efficiency, and it is
the relative density of wireless nodes to base stations that matters. Furthermore,
it is clear that the asymptotic approximation Equation (13.143) holds when the
number of receive antennas nr is sufficiently large. For instance, when the base
station density is 20% of the wireless node density, the asymptotic and simulated
mean spectral efficiency are within 10% of each other when the number of receive
antennas nr ≥ 10. For lower densities of base stations, the convergence is slower,
for example, when the base station density is 5% of the wireless node density, the
difference between the simulated and asymptotic mean spectral efficiency drops
below 10% only when nr > 37.
Low power budgets
If $d > \sqrt{3}\,(G_t P_{\max}/p_t)^{1/\alpha}$, each node has a maximum power budget that is so
low that nodes may sometimes fall so far away from the nearest base station that
they cannot meet the target power requirement and hence some fraction of nodes
transmit at full power. In this case, the expected value of the transmit power of the wireless nodes raised to the power $2/\alpha$, $\langle P^{2/\alpha}\rangle$ (which is required to find the mean spectral efficiency using Equation (13.141)), takes the following forms, which can be found using straightforward calculus. If $P_{\max} < (p_t/G_t)\,(d/\sqrt{3})^{\alpha}$, that is, if a randomly located wireless node has some nonzero probability of being unable to achieve the target received power, the following two cases apply.
(1) If $P_{\max} < \frac{p_t}{G_t}\left(\frac{d}{2}\right)^{\alpha}$, then
$$\left\langle P^{2/\alpha}\right\rangle = P_{\max}^{2/\alpha} - \frac{\sqrt{3}\,\pi}{3\, d^2}\left(\frac{G_t}{p_t}\right)^{2/\alpha} P_{\max}^{4/\alpha}. \qquad (13.144)$$
(2) If $\frac{p_t}{G_t}\left(\frac{d}{2}\right)^{\alpha} \le P_{\max} < \frac{p_t}{G_t}\left(\frac{d}{\sqrt{3}}\right)^{\alpha}$, then
$$\left\langle P^{2/\alpha}\right\rangle = P_{\max}^{2/\alpha} - \frac{\sqrt{3}\,\pi}{3\, d^2}\left(\frac{G_t}{p_t}\right)^{2/\alpha} P_{\max}^{4/\alpha} + \frac{2\sqrt{3}}{d^2}\left(\frac{G_t}{p_t}\right)^{2/\alpha} P_{\max}^{4/\alpha}\cos^{-1}\!\left(\frac{d}{2}\left(\frac{p_t}{G_t P_{\max}}\right)^{1/\alpha}\right) + \left(\frac{\sqrt{3}\, d}{12}\left(\frac{p_t}{G_t}\right)^{2/\alpha} - \frac{\sqrt{3}}{6\, d}\, P_{\max}^{2/\alpha}\right)\sqrt{4\left(\frac{G_t P_{\max}}{p_t}\right)^{2/\alpha} - d^2}. \qquad (13.145)$$
When we substitute the appropriate expression for $\langle P^{2/\alpha}\rangle$ from above into Equation (13.141), we obtain the mean spectral efficiency for a link of given length $r_1$, which is then averaged over the PDF of link lengths associated with the hexagonal-cell model,
where the CDF $F_x(x)$ and PDF $f_x(x)$ of the link lengths are given by Equations (13.28) and (13.29), respectively, and the $\langle P^{2/\alpha}\rangle$ values are taken from the previous set of
expressions. The second term on the right-hand side of Equation (13.146) cannot
be easily evaluated in closed form but can be evaluated efficiently using standard
numerical integration techniques.
The mean spectral efficiency versus number of receive antennas for ρw = 10−4 ,
with 200 mW maximum transmit power per wireless node is illustrated in Figure
13.12. The different markers represent the simulated mean spectral efficiencies
for different relative densities of tethered to wireless nodes. The solid lines are the
predicted asymptotic mean spectral efficiencies obtained by numerically evaluat-
ing Equation (13.146). The close agreement between the simulated values and the
asymptotic prediction illustrates the utility of Equation (13.146) in estimating
the mean spectral efficiency.
Random cells
Suppose that instead of at hexagonal lattice sites, the base stations were located
at random points in the plane according to a Poisson point process with intensity
ρt nodes/m2 . The cells generated by such a process have random shapes and
constitute a Poisson–Voronoi tessellation of the plane, where the Voronoi cell
associated with each base station is the subset of the plane that is closer in
Euclidean distance to that base station than any other base station. Figure 13.13
illustrates a portion of such a network. The base stations are the circles and the
cell boundaries are the solid lines.
In general, the distances between wireless nodes and their closest base station
are correlated random variables, which implies that their transmit and received
powers at any receiver are correlated as well. This correlation arises because link
lengths of wireless nodes are related to each other through the random locations of the base stations.

Figure 13.12 Mean spectral efficiency versus the number of base station antennas for ρw = 10−4 nodes/m2 with different relative densities of tethered to wireless nodes (ρt/ρw = 0.05, 0.1, and 0.2, together with the asymptotic prediction). The transmit power budget was 200 mW and the path-loss exponent α = 4.

Intuitively, if a particular link is long, it is likely that that
link is located in a large cell, in which case the nearby wireless nodes will also
tend to have long links, which leads to a correlation between link lengths and
consequently transmit powers.
We cannot directly apply the technique used for the hexagonal-cell system
in the previous section to find the mean spectral efficiency for the random cell
model as it requires the transmit powers of individual wireless nodes to be in-
dependent random variables. However, conditioned on a particular realization of
the base station point process (that is, conditioned on a given set of base station
positions), the transmit powers of the wireless nodes are independent as they
are simply functions of the wireless node locations, which are independent by
assumption. We can then write an expression for the mean spectral efficiency of
links conditioned on a particular realization of the base station process. We then
average over all realizations of the base station process to obtain an expression
for the mean spectral efficiency.
Consider a specific realization of the base station process which we call Πt .
We shall assume that Πt does not result in any Voronoi cell of infinite area.
Realizations of Poisson point processes that result in Voronoi cells of infinite
area are known to be zero probability events (for example, see [298] page 310),
and of course not physically possible. Hence, excluding such realizations does not
influence the mean spectral efficiency when averaged over all possible realizations
of the base station process. Additionally, we shift the coordinates of our system
such that there is a base station at the origin of the system for every realization
of Πt that we consider. We shall analyze a representative link of a length r1
between the base station at the origin and a representative transmitter, which
we assume is independent of Πt for simplicity.4
Conditioned on a realization of the base-station process $\Pi_t$ and link length $r_1$, and using the parameters as defined in (13.139), the mean spectral efficiency is
$$\left\langle c \mid \Pi_t, r_1\right\rangle \approx \log_2\!\left(1 + G_\alpha P_1\left(\frac{n_r}{\left\langle P^{2/\alpha} \mid \Pi_t\right\rangle \pi\rho\, r_1^{2}}\right)^{\alpha/2}\right) \qquad (13.147)$$
$$= \log_2\!\left(1 + G_\alpha P_1\left(\frac{n_r}{\left\langle P^{2/\alpha}\right\rangle \pi\rho\, r_1^{2}}\right)^{\alpha/2}\right). \qquad (13.149)$$
The step from Equation (13.148) to Equation (13.149) is due to the ergodicity of the Poisson–Voronoi tessellation [216]. The ergodicity implies that, with probability 1, properties of different realizations of the Poisson–Voronoi tessellation
4 Note that, in reality, r1 depends on Πt, as the representative link must be contained in the cell associated with the base station at the origin.
will have equal means. Since we have conditioned on there being a point of the base station process at the origin, the resulting process is not ergodic.
As the radius of the circular network $R \to \infty$, however, the influence of the center cell diminishes. This fact implies that $\left\langle P^{2/\alpha} \mid \Pi_t\right\rangle = \left\langle P^{2/\alpha}\right\rangle$ with probability 1. Intuitively, this property holds because typical realizations of the base station point process result in equal values of $\left\langle P^{2/\alpha} \mid \Pi_t\right\rangle$, since the expectation is taken
with respect to all the wireless nodes in the infinite network. Any realization that
does not have this property occurs with zero probability. A detailed discussion on
ergodicity of point processes is beyond the scope of this text but the interested
reader is referred to references such as [299, 71, 70] and [235].
The probability distribution fr (r) of distances r between any wireless node
and its closest base station, for $r \ge 0$, is given by
$$f_r(r) = 2\pi\rho_t\, r\, e^{-\pi\rho_t r^2}. \qquad (13.150)$$
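The distribution in Equation (13.150) can be cross-checked by direct simulation of a Poisson field of base stations. The following sketch (helper names and parameter values are ours, for illustration) estimates the empirical CDF of the nearest-base-station distance and compares it with $1 - e^{-\pi\rho_t r^2}$:

```python
import math, random

def poisson_sample(lam, rng):
    # Knuth's method: count uniform draws until their product drops below e^{-lam}.
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def nearest_bs_distance(rho_t, R, rng):
    # Distance from the origin to the nearest base station of a Poisson
    # process of intensity rho_t, simulated inside a disc of radius R
    # (R chosen large enough that the nearest point almost surely lies inside).
    n = poisson_sample(rho_t * math.pi * R * R, rng)
    d = R
    for _ in range(n):
        d = min(d, R * math.sqrt(rng.random()))  # radius of a uniform point in the disc
    return d

rng = random.Random(1)
rho_t = 1e-2  # base-station density in nodes/m^2, illustrative value
samples = [nearest_bs_distance(rho_t, 30.0, rng) for _ in range(20000)]
r0 = 5.0
empirical = sum(d <= r0 for d in samples) / len(samples)
theoretical = 1.0 - math.exp(-math.pi * rho_t * r0 * r0)  # CDF implied by (13.150)
```

The empirical fraction of samples below any test radius should match the integrated form of Equation (13.150) to within Monte Carlo error.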
$$\left\langle P^{2/\alpha}\right\rangle = \int_0^{\infty} dr\, \min\!\left(\frac{p_t}{G_t}\, r^{\alpha},\, P_{\max}\right)^{2/\alpha} 2\pi\rho_t\, r\, e^{-\pi\rho_t r^2}$$
$$= \int_0^{\left(\frac{G_t P_{\max}}{p_t}\right)^{1/\alpha}} dr\, \left(\frac{p_t}{G_t}\right)^{2/\alpha} 2\pi\rho_t\, r^3\, e^{-\pi\rho_t r^2} + \int_{\left(\frac{G_t P_{\max}}{p_t}\right)^{1/\alpha}}^{\infty} dr\, P_{\max}^{2/\alpha}\, 2\pi\rho_t\, r\, e^{-\pi\rho_t r^2}$$
$$= \left(\frac{p_t}{G_t}\right)^{2/\alpha} \frac{1 - e^{-\pi\rho_t\left(\frac{G_t P_{\max}}{p_t}\right)^{2/\alpha}}}{\pi\rho_t}. \qquad (13.151)$$
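The closed form of Equation (13.151) can be verified against a direct numerical integration of $\langle P^{2/\alpha}\rangle$ over the PDF of Equation (13.150). A minimal sketch (function names and parameter values are illustrative):

```python
import math

def mean_p_2a_closed(p_t, G_t, P_max, rho_t, alpha):
    # Closed form of Equation (13.151).
    r_star2 = (G_t * P_max / p_t) ** (2 / alpha)
    return (p_t / G_t) ** (2 / alpha) * (1 - math.exp(-math.pi * rho_t * r_star2)) / (math.pi * rho_t)

def mean_p_2a_numeric(p_t, G_t, P_max, rho_t, alpha, r_max=50.0, steps=100000):
    # Midpoint-rule integration of E[min((p_t/G_t) r^alpha, P_max)^(2/alpha)]
    # against the nearest-base-station PDF f_r(r) = 2 pi rho_t r exp(-pi rho_t r^2).
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        p = min(p_t / G_t * r ** alpha, P_max) ** (2 / alpha)
        total += p * 2 * math.pi * rho_t * r * math.exp(-math.pi * rho_t * r * r) * dr
    return total
```

Both evaluations should agree closely, confirming the calculus behind the closed form.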
Substituting Equation (13.119) into Equation (13.149) and taking the average with respect to $r$,
$$\langle c\rangle \approx \int_0^{\infty} dr\, \log_2\!\left(1 + \min\!\left(\frac{p_t}{G_t}\, r^{\alpha},\, P_{\max}\right) r^{-\alpha}\, G_\alpha\left(\frac{n_r}{\left\langle P^{2/\alpha}\right\rangle \pi\rho_w}\right)^{\alpha/2}\right) 2\pi\rho_t\, r\, e^{-\pi\rho_t r^2}$$
$$= \int_0^{\left(\frac{G_t P_{\max}}{p_t}\right)^{1/\alpha}} dr\, \log_2\!\left(1 + \frac{p_t}{G_t}\, G_\alpha\left(\frac{n_r}{\left\langle P^{2/\alpha}\right\rangle \pi\rho_w}\right)^{\alpha/2}\right) 2\pi\rho_t\, r\, e^{-\pi\rho_t r^2} + \int_{\left(\frac{G_t P_{\max}}{p_t}\right)^{1/\alpha}}^{\infty} dr\, \log_2\!\left(1 + P_{\max}\, r^{-\alpha}\, G_\alpha\left(\frac{n_r}{\left\langle P^{2/\alpha}\right\rangle \pi\rho_w}\right)^{\alpha/2}\right) 2\pi\rho_t\, r\, e^{-\pi\rho_t r^2}$$
$$= \left(1 - e^{-\pi\rho_t\left(\frac{G_t P_{\max}}{p_t}\right)^{2/\alpha}}\right) \log_2\!\left(1 + \frac{p_t}{G_t}\, G_\alpha\left(\frac{n_r}{\left\langle P^{2/\alpha}\right\rangle \pi\rho_w}\right)^{\alpha/2}\right) + \int_{\left(\frac{G_t P_{\max}}{p_t}\right)^{1/\alpha}}^{\infty} dr\, \log_2\!\left(1 + P_{\max}\, r^{-\alpha}\, G_\alpha\left(\frac{n_r}{\left\langle P^{2/\alpha}\right\rangle \pi\rho_w}\right)^{\alpha/2}\right) 2\pi\rho_t\, r\, e^{-\pi\rho_t r^2}. \qquad (13.152)$$
It is difficult to find a closed-form expression for the second term on the right-
hand side of Equation (13.152). We can use numerical integration to evaluate it.
However, if the transmit power of each wireless node is large (or the density of
base stations is high), Equation (13.151) simplifies to
$$\left\langle P^{2/\alpha}\right\rangle \approx \frac{p_t^{2/\alpha}}{G_t^{2/\alpha}\,\pi\rho_t} \qquad (13.153)$$
because the exponential term becomes negligible compared to unity. Equation (13.152) then simplifies to
$$\langle c\rangle \approx \log_2\!\left(1 + \frac{p_t}{G_t}\, G_\alpha\left(\frac{n_r}{\left\langle P^{2/\alpha}\right\rangle \pi\rho_w}\right)^{\alpha/2}\right). \qquad (13.154)$$
Observe that the mean spectral efficiency does not depend on the specific values
of ρt and ρw but rather on their ratio, which implies scale invariance of the
network when the thermal noise is negligible compared to the interference.
Note that while Equation (13.155) does not depend on the choice of pt , the
original equation used to derive Equation (13.155) was based on the assump-
tion that the system is interference limited, which means Equation (13.155) is
valid only when pt and ρw are sufficiently high that the system is interference
limited. The scale invariance implied by Equation (13.155) indicates that as in
the hexagonal-cell case, constant mean spectral efficiency can be maintained by
fixing the relative density of base stations to wireless nodes.
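The scale invariance discussed above can be checked numerically: substituting the high-power approximation of Equation (13.153) into Equation (13.154) leaves the mean spectral efficiency dependent only on the ratio ρt/ρw. A short sketch (function names and parameter values are illustrative):

```python
import math

def g_alpha(alpha):
    # G_alpha = [alpha/(2 pi) sin(2 pi/alpha)]^(alpha/2)
    return (alpha / (2 * math.pi) * math.sin(2 * math.pi / alpha)) ** (alpha / 2)

def random_cell_c(n_r, p_t, G_t, rho_t, rho_w, alpha=4.0):
    # Equation (13.154) with the high-power approximation (13.153)
    # substituted for <P^(2/alpha)>; algebraically this collapses to
    # log2(1 + G_alpha * (n_r * rho_t / rho_w)^(alpha/2)).
    mean_p_2a = p_t ** (2 / alpha) / (G_t ** (2 / alpha) * math.pi * rho_t)
    return math.log2(
        1 + p_t / G_t * g_alpha(alpha) * (n_r / (mean_p_2a * math.pi * rho_w)) ** (alpha / 2)
    )
```

Scaling both densities by the same factor, or changing the target power $p_t$, leaves the result unchanged, which is the scale invariance claimed in the text.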
Figure 13.14 Mean spectral efficiency of uplink communications with random cells and unlimited transmit powers, versus the number of base station antennas, for ρt/ρw = 0.05, 0.1, and 0.2 together with the asymptotic prediction. The base station and wireless node densities are denoted by ρt and ρw.
Figure 13.15 Mean spectral efficiency of uplink communications with random cells and a 200 mW transmit power limit per node, versus the number of base station antennas, for ρt/ρw = 0.05, 0.1, and 0.2 together with the asymptotic prediction. The base station and wireless node densities are denoted by ρt and ρw.
Figure 13.16 Mean spectral efficiency (b/s/Hz/link) of the uplink with random cells and hexagonal cells, with transmit power limited to 200 mW, versus the number of receiver antennas at the base stations, for relative densities of 5%, 10%, and 20%. Solid and dashed lines represent hexagonal and random cells, respectively.
translation invariant as assumed here, this comparison can shed some insight into the performance differences between a network with completely random placement of base stations (according to a Poisson point process) and a network where base stations are placed with the optimal packing density, at the hexagonal sites.
For systems with limited transmit powers, we can numerically evaluate and
plot equations of the spectral efficiency corresponding to random and hexag-
onal cells as shown in Figure 13.16, where the solid and dashed lines repre-
sent hexagonal and random cells, respectively. The transmit power budget was
200 mW and wireless node density was 10−3 . We simulated relative densities of
base stations to wireless nodes of 5%, 10%, and 20% as shown in the plot. Note
that the difference in mean spectral efficiencies diminishes with the number of
antennas. However, for high base station densities the mean spectral efficiency
for random cells is significantly lower. For instance, with 10 antennas at the base
stations and 20% relative density of base stations to wireless nodes, the mean
spectral efficiency with hexagonal cells is twice that of random cells.
When base station density and/or transmit power budgets are high, the mean
spectral efficiency given by Equation (13.143) can be rewritten in terms of the
13.6 Summary
This chapter covers the main concepts of cellular networks with multiantenna
base stations. We have primarily focused on the uplink part of the network
in part because the uplink is better understood from an information-theoretic perspective, but also because the downlink typically uses orthogonal multiple-access techniques such as orthogonal CDMA, which are technically challenging on the uplink because signals pass through different channels from each mobile user to the base station. Additionally, the base station can perform accurate power control on the downlink, since it draws from the same total power budget for each mobile node and can accurately control the received power at each mobile device. We have also focused on aspects of cellular networks relating to adaptive receivers, and in particular on the distribution of users in space, which is not typically covered in texts. We refer the interested reader to texts that cover more practical aspects of cellular networks, such as more sophisticated power control and channel equalization techniques, for example Reference [328].
Problems
13.1 Consider a multiple-access channel with three transmitters. Show that the sum capacity of this channel is achievable using TDMA, and provide expressions for the fraction of time used by each of the three transmitters.
13.2 Show that the sum capacity of the two-user broadcast channel with additive Gaussian noise, power constraint P = P1 + P2, and channel coefficients h1 and h2 with ||h1 || < ||h2 ||, is given by Equation (13.14). You may wish to use the optimization techniques described in Section 2.12.
(a) Find the CDF of the spectral efficiency of a random link in this system.
(b) Using the previous result and the fact that the integral of 1 minus the CDF
equals the mean of a positive random variable, compute the mean spectral
efficiency of this system.
(c) Assuming that a reuse factor of K is required for the assumption of no out-of-cell interference to hold, compare the mean area spectral efficiency from the asymptotic analysis of Equation (13.143) with a reuse factor of 1 against your answer in the previous part. Make sure you take into account the penalty on the area spectral efficiency due to the reuse factor of K.
14 Ad hoc networks
14.1 Introduction
may not receive data while transmitting and vice versa. This assumption arises
because simultaneous transmission and reception of signals is physically very
difficult for wireless systems due to the large discrepancy between the trans-
mitted and received signal powers, which would result in most of the dynamic
range of analog-to-digital converters at a receiver being taken up by its own
transmit signal. By using multiantenna technology, however, this overwhelming
self-interference can be reduced to manageable levels, resulting in feasible oper-
ation of simultaneous transmit and receive systems such as the system reported
in Reference [37].
The per-link upper bound of $O(1/\sqrt{n})$ from [132] was shown to be achievable
for random traffic patterns by Franceschetti et al. [101] using techniques from
percolation theory. They show that as the number of nodes increases in a fixed
area, high throughput paths spanning the network naturally form. These paths
have high throughput because they contain nodes that are separated from one
another by small distances. Packets going from a source to a destination far away
are then routed through these high-throughput paths. Routing packets in this
manner is shown to support a per-link capacity scaling of $O(1/\sqrt{n})$. Ozgur et al.
[240] proposed a hierarchical cooperation scheme in which a network is divided
into subnetworks which are further divided into sub-subnetworks and so on. This
hierarchical division is combined with distributed MIMO communications where
multiple nodes in a subnetwork cooperate as a virtual antenna array to transmit
data over long distances. Figure 14.1 illustrates how nearby nodes are used in
a link between the node labeled S and the node labeled D. Each square in the
figure represents a subnetwork in which the same virtual array scheme is used.
Ozgur et al. found that with sufficient levels of hierarchy, it is possible to achieve
a per-link capacity scaling of $O(n^{-\epsilon})$, where $\epsilon > 0$ can be made arbitrarily small by using sufficiently many hierarchical levels.
Franceschetti et al. [102] used a degrees-of-freedom argument that arises from
electromagnetic propagation and bounded the per-link rate of ad hoc wireless networks by $O\!\left((\log n)^2/\sqrt{n}\right)$. They used a cut-set bound (for example, see [68])
to derive this result. The cut-set bound essentially states that the sum of all the
individual data rates achievable with arbitrarily low probability of error between
any set of source and destination nodes is less than or equal to the capacity
of a MIMO link where the source nodes act as a unified transmitter and the
destination nodes act as a unified receiver. The term cut-set is used to refer to
the partitioning of nodes in a network into a set of transmitters and receivers
where the boundary between the two sets is known as the cut. In Reference [102],
a circular network is cut into concentric circles. The authors showed that a MIMO
link formed by nodes in the inner circle communicating with nodes outside the circle has a number of degrees of freedom bounded by $O(\sqrt{n}\log n)$, which limits the capacity of the MIMO link formed between nodes on one side of the cut and the other. Adding up the capacities of all the links between nodes inside the inner circle and outside the outer circle leads to an $O\!\left((\log n)^2\sqrt{n}\right)$ scaling law on the total of the capacities of the $n$ links in the network, which in turn leads to the per-link scaling law of $O\!\left((\log n)^2/\sqrt{n}\right)$. The discrepancy between
this result and others such as the Ozgur result are noted by Franceschetti et al.
to be artifacts of unrealistic channel models in these other works.
(1) rj ≤ ν, where ν is a threshold that determines the maximum distance over
which a link can successfully be closed, that is, a link is successful only if the
source and destination are sufficiently close to each other;
(2) there are no other nodes transmitting in a circle of radius (1 + Δ)ν around
node j. The term Δ represents a guard zone around the given receiver.
For the physical model, a link is considered successful if the signal-to-interference-
plus-noise ratio (SINR) exceeds a defined threshold. An additional significant
contribution of this work is the introduction of transport capacity as a metric for
the performance of an ad hoc wireless network. The transport capacity of a link is
the distance-weighted throughput capacity of a link and is the product of the data
rate (bits/second) and distance over which the bits in the link are transported
(meters). Transport capacity thus has nominal units of bit-meters/second.
Hierarchical cooperation
In this section, we briefly describe the hierarchical cooperation scheme of Ozgur
et al. Please refer to Reference [240] for a complete description. Suppose that
n randomly selected source–destination pairs wish to communicate in a square
network of fixed area A in which there are n nodes, that is, each node is a
source as well as a destination. The channel between a pair of nodes separated
by distance r is modeled by a single coefficient of the form r− 2 e−iθ , where α is
α
the path-loss exponent and θ is uniformly distributed from zero to 2π, and are
independent between all pairs of nodes.
They prove that if there exists a communication strategy with network throughput $K n^b$, then there exists another strategy where the throughput is at least $K_j\, n^{1/(2-b)}$, with $0 \le b < 1$, with high probability (that is, with probability approaching 1 as $n \to \infty$).
node in the cluster. Thus, the total number of bits exchanged is $n_{\mathrm{sym}}\, Q\, M^2$. We assume that this exchange of information uses the communication strategy which achieves a throughput of $K n^b$ for a network with $n$ nodes. An additional factor of two occurs to handle the case where source and destination clusters are adjacent.
The total throughput in the network $T(n)$ for this system is now
$$T(n) \ge \frac{n M L}{18\, M^{2-b} L/K + 2\, n_{\mathrm{sym}}\, n + 18\, n_{\mathrm{sym}}\, Q\, M^{2-b}/K}. \qquad (14.5)$$
By setting $M = n^{1/(2-b)}$, Equation (14.5) is maximized, yielding the following bound on the total throughput:
$$T(n) \ge \frac{L}{18 L/K + 2\, n_{\mathrm{sym}} + 18\, n_{\mathrm{sym}}\, Q/K}\; n^{1/(2-b)}. \qquad (14.6)$$
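The bound of Equation (14.6) is simple to evaluate for particular constants. The sketch below (parameter values are illustrative; L, K, n_sym, and Q stand for the constants of the construction above) shows that a larger exponent b of the underlying strategy yields a larger guaranteed throughput for large n:

```python
def throughput_lower_bound(n, L, K, n_sym, Q, b):
    # Equation (14.6): T(n) >= L * n^(1/(2-b)) / (18 L/K + 2 n_sym + 18 n_sym Q / K)
    return L * n ** (1 / (2 - b)) / (18 * L / K + 2 * n_sym + 18 * n_sym * Q / K)
```

As b approaches 1 the exponent 1/(2-b) approaches 1, recovering the near-linear total throughput scaling of the full hierarchical scheme.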
Besides capacity scaling laws, one can also analyze the performance of ad hoc
networks with multiantenna nodes using specific system assumptions. Much work
has been done in characterizing the achievable rates of multiantenna links in
ad hoc wireless networks with specific receiver architectures [123, 149, 9, 165, 194, 321]. While this type of approach may not give a great deal of insight into the ultimate performance limits of such systems, it is immensely useful in practical scenarios, and is particularly attractive because closed-form results can often be obtained. For the remainder of this chapter, we shall focus on analyses
$$c_1 = \sum_{j=1}^{M} \log_2\!\left(1 + n_r^{\alpha/2}\,\lambda_j^{\{\infty\}}\, P_{1j}\,\gamma_1\,\beta\right), \qquad (14.7)$$
$$H(x) = \frac{1}{M}\sum_{j=1}^{M} \int d\tau\, f_j(\tau)\,\Psi(x/\tau). \qquad (14.11)$$
We shall now construct a spatially distributed network model where nodes trans-
mit with random powers that are not dependent on their distance from the rep-
resentative receiver and are distributed spatially. We shall show that an appro-
priately normalized version of the spectral efficiency converges with probability
1 to an asymptotic limit as given in Equation (14.7).
few approximations can be made to yield additional insight into how the various
factors contribute to the limiting SINR. From Problem 14.4, we note that if
n/nr is very large, that is, the number of nodes in the network is much larger
than the number of antennas per receiver, the transmit power limit per node
implies that the second term on the left-hand side of Equation (14.15) is small.
Furthermore, we assume that the thermal noise power is small, which implies
that the third term on the left-hand side of Equation (14.15) is small. Using
these approximations, we have
$$\frac{2\pi^2\rho\,(G_t\beta)^{2/\alpha}}{\alpha}\sum_{j=1}^{M}\left\langle P_j^{2/\alpha}\right\rangle\csc\!\left(\frac{2\pi}{\alpha}\right) \approx 1$$
$$\beta \approx \frac{1}{G_t}\left[\frac{\alpha}{2\pi^2\rho\sum_{j=1}^{M}\left\langle P_j^{2/\alpha}\right\rangle}\sin\!\left(\frac{2\pi}{\alpha}\right)\right]^{\alpha/2}, \qquad (14.16)$$
which, when substituted into Equation (14.14), yields the normalized SINR on the $\ell$th stream in the interference-limited regime, $\eta^{\{\ell\}}$:
$$\eta^{\{\ell\}} \approx \lambda_\ell^{\{\infty\}}\, P_{1\ell}\, r_1^{-\alpha}\left[\frac{\alpha}{2\pi^2\rho\sum_{j=1}^{M}\left\langle P_j^{2/\alpha}\right\rangle}\sin\!\left(\frac{2\pi}{\alpha}\right)\right]^{\alpha/2}$$
$$= \lambda_\ell^{\{\infty\}}\, P_{1\ell}\left[\frac{\alpha}{2\pi^2\rho\, r_1^2\sum_{j=1}^{M}\left\langle P_j^{2/\alpha}\right\rangle}\sin\!\left(\frac{2\pi}{\alpha}\right)\right]^{\alpha/2}. \qquad (14.17)$$
Defining $G_\alpha = \left[\frac{\alpha}{2\pi}\sin\!\left(\frac{2\pi}{\alpha}\right)\right]^{\alpha/2}$, rescaling the normalized SINR by $n_r^{\alpha/2}$, and summing the contributions of the $M$ streams yields the following approximation for the spectral efficiency of link 1 in the interference-limited regime when $n_r$ is large:

$$ c_1 \;\approx\; \sum_{\ell=1}^{M}\log_2\!\left(1 + \lambda_\ell^{\{\infty\}} P_{1\ell}\, G_\alpha\left[\frac{n_r}{\pi\rho r_1^2\sum_{j=1}^{M} P_j^{2/\alpha}}\right]^{\alpha/2}\right). \qquad (14.18) $$
For the 2-class model with link 1 assigned to the first class, Equation (14.18) gives a spectral efficiency $c_1^{2c}$ of

$$ c_1^{2c} \;\approx\; \sum_{\ell=1}^{M}\log_2\!\left(1 + \lambda_\ell^{\{\infty\}} P_1\, G_\alpha\left[\frac{n_r}{\pi\rho r_1^2\, M\!\left(q\, P_1^{2/\alpha} + (1-q)\, P_2^{2/\alpha}\right)}\right]^{\alpha/2}\right). \qquad (14.20) $$
[Figure 14.3 plots mean spectral efficiency against the power allocated to stream 1 (from 0 to 1), with one curve for each of n_r/(πρ r_1²) = 1, 2, 3, 4, and 5.]
Figure 14.3 Approximate mean spectral efficiency vs. power allocated to one stream of
a two-stream system. Note that the optimal power allocation is to assign all power to
one stream or divide the power equally between two streams.
Note from Equation (14.23) that increasing the number of streams M reduces
the quantity inside the log function while increasing the quantity outside the
log function. By relaxing the integer requirement on M , the optimal number of
transmit streams (in the regime where Equation (14.23) is valid) is
$$ M^{\mathrm{opt}} \;=\; \operatorname*{arg\,max}_{x\,\in\,\left\{\left\lfloor \frac{n_r}{K_\alpha \pi\rho r_1^2}\right\rfloor,\;\left\lceil \frac{n_r}{K_\alpha \pi\rho r_1^2}\right\rceil\right\}}\; x\,\log_2\!\left(1 + G_\alpha\left[\frac{n_r}{x\,\pi\rho r_1^2}\right]^{\alpha/2}\right), \qquad (14.24) $$

where

$$ K_\alpha \;=\; \frac{2\pi}{\alpha}\,\csc\!\left(\frac{2\pi}{\alpha}\right)\left(-1 - \frac{\alpha}{2\,W_0\!\left(-\frac{\alpha}{2}e^{-\alpha/2}\right)}\right)^{2/\alpha}, \qquad (14.25) $$
and where W₀(z) is the principal branch of the Lambert W function (see Section 2.14.4). This optimization is the subject of Problem 14.6.
If we substitute the integer-relaxed optimum number of streams from Equation (14.24) into Equation (14.23), we find that the mean spectral efficiency c^{opt} under the optimal number of equal-power streams is

$$ c^{\mathrm{opt}} \;\approx\; \bar{K}_\alpha\,\frac{n_r}{\pi\rho r_1^2}, \qquad (14.26) $$

where

$$ \bar{K}_\alpha \;=\; \frac{1}{K_\alpha}\log_2\!\left(1 + G_\alpha K_\alpha^{\alpha/2}\right). \qquad (14.27) $$
Thus, the mean, per-link spectral efficiency can grow linearly with the number
of antennas, and approximately constant mean spectral efficiency can be main-
tained if the number of receiver antennas increases linearly with user density.
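As a numerical sketch of Equations (14.24)–(14.27) (the helper functions, the pure-Python Lambert-W Newton iteration, and all parameter values below are our own illustrative assumptions, not from the text):

```python
import math

def lambert_w0(z, iters=60):
    """Principal branch W0(z) via Newton iteration (valid for z >= -1/e)."""
    w = 0.0 if z >= 0 else -0.5
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - z) / (ew * (1.0 + w))
    return w

def g_alpha(alpha):
    """G_alpha = [(alpha / 2 pi) sin(2 pi / alpha)]^(alpha/2)."""
    return ((alpha / (2.0 * math.pi)) * math.sin(2.0 * math.pi / alpha)) ** (alpha / 2.0)

def k_alpha(alpha):
    """K_alpha of Equation (14.25)."""
    u = -1.0 - alpha / (2.0 * lambert_w0(-(alpha / 2.0) * math.exp(-alpha / 2.0)))
    return (2.0 * math.pi / alpha) / math.sin(2.0 * math.pi / alpha) * u ** (2.0 / alpha)

def opt_streams(nr, rho, r1, alpha):
    """Equation (14.24): round the integer-relaxed optimum via floor/ceil."""
    x0 = nr / (k_alpha(alpha) * math.pi * rho * r1 ** 2)
    def obj(x):
        return x * math.log2(1.0 + g_alpha(alpha)
                             * (nr / (x * math.pi * rho * r1 ** 2)) ** (alpha / 2.0))
    return max([max(1, math.floor(x0)), max(1, math.ceil(x0))], key=obj)

def c_opt(nr, rho, r1, alpha):
    """Mean spectral efficiency of Equations (14.26)-(14.27)."""
    ka = k_alpha(alpha)
    kbar = math.log2(1.0 + g_alpha(alpha) * ka ** (alpha / 2.0)) / ka
    return kbar * nr / (math.pi * rho * r1 ** 2)
```

For α = 4 this gives K₄ ≈ 3.11, and c^{opt} scales linearly in n_r/(πρr₁²), as the text states.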
Comparison of systems with transmit link CSI and without transmit CSI
Figure 14.4 shows the percentage increase in mean spectral efficiency with transmit link CSI versus the quantity πρr₁², which can be interpreted as the average number of interferers closer to the receiver than its target transmitter, and which was defined as the link rank in Reference [123]. Note that the gain from using transmit CSI is highly dependent on πρr₁². For instance, for πρr₁² = 6 and two transmit streams, transmit link CSI provides a twofold increase in spectral efficiency. The increase in mean spectral efficiency can be greater than threefold for high-rank links. These results indicate that a significant (but not orders-of-magnitude) increase in spectral efficiency is possible with link transmit CSI. Thus, we see that several-fold increases in spectral efficiency are possible using transmit link CSI, especially in networks with long links, high user densities, or both. Hence, transmit link CSI can be useful in environments that are very stable over time, whereby receivers can feed back channel estimates infrequently to transmitters.
[Figure 14.4 shows two curves, for one and two transmit streams, plotted against the mean number of interferers closer than the transmitter, πρr₁².]
Figure 14.4 Percentage increase in mean spectral efficiency of systems with transmit
link CSI over systems without transmit CSI versus πρr12 for nr = 12 antennas per
node.
That is, H̃_ℓ consists of the channel matrices between transmitter ℓ and all unintended receivers. The SLNR associated with link ℓ is defined as follows:

$$ \mathrm{SLNR}_\ell \;=\; \frac{\|\mathbf{H}_{\ell\ell}\mathbf{w}_\ell\|^2}{\sum_{k=1,\,k\neq\ell}^{n}\|\mathbf{H}_{k\ell}\mathbf{w}_\ell\|^2 + n_r\sigma^2}. \qquad (14.29) $$

The summation in the denominator corresponds to the sum of the signal powers due to transmitter ℓ as observed at all unintended receivers, plus the total noise observed at all n_r receive antennas. Using some matrix manipulations, Equation (14.29) can be rewritten as

$$ \mathrm{SLNR}_\ell \;=\; \frac{\|\mathbf{H}_{\ell\ell}\mathbf{w}_\ell\|^2}{\|\tilde{\mathbf{H}}_\ell\mathbf{w}_\ell\|^2 + n_r\sigma^2}. \qquad (14.30) $$

As shown in Reference [267], the transmit weights w_ℓ are given by the eigenvector corresponding to the largest eigenvalue of the matrix

$$ \left(\tilde{\mathbf{H}}_\ell^{\dagger}\tilde{\mathbf{H}}_\ell + \sigma^2 n_r \mathbf{I}\right)^{-1}\mathbf{H}_{\ell\ell}^{\dagger}\mathbf{H}_{\ell\ell}. $$
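The weight computation above can be sketched numerically (a toy construction of our own: the array sizes, noise level, and random channel draw are assumed for illustration). The optimizer of the SLNR quotient is the principal generalized eigenvector:

```python
import numpy as np

rng = np.random.default_rng(0)
nr, nt, n_links = 4, 4, 3     # antennas per node and number of links (assumed)
sigma2 = 0.1

# Direct channel of the link of interest, and the stacked leakage channels
# toward the unintended receivers (the matrix H-tilde of Equation (14.30)).
H_dir = rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))
H_tilde = rng.standard_normal(((n_links - 1) * nr, nt)) \
        + 1j * rng.standard_normal(((n_links - 1) * nr, nt))

def slnr(w):
    """SLNR of Equation (14.30) for a unit-norm weight vector."""
    w = w / np.linalg.norm(w)
    return (np.linalg.norm(H_dir @ w) ** 2
            / (np.linalg.norm(H_tilde @ w) ** 2 + nr * sigma2))

# Principal eigenvector of (H~^H H~ + nr sigma^2 I)^(-1) H^H H.
A = H_dir.conj().T @ H_dir
B = H_tilde.conj().T @ H_tilde + nr * sigma2 * np.eye(nt)
vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
w_opt = vecs[:, np.argmax(vals.real)]

# No other direction achieves a higher SLNR.
for _ in range(100):
    w_rand = rng.standard_normal(nt) + 1j * rng.standard_normal(nt)
    assert slnr(w_opt) >= slnr(w_rand) - 1e-9
```

The generalized-eigenvector form follows because the SLNR is a generalized Rayleigh quotient of the pair (H†H, H̃†H̃ + n_rσ²I).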
In this section, we apply the linear receivers developed in the previous chapter for cellular networks to spatially distributed ad hoc wireless networks. The derivations of the previous chapter will be used here with appropriate modifications.
where r, θ, ℓ, and k are dummy variables. The integrals in the above equation can be evaluated as follows:

$$ \int_0^{R}\!\mathrm{d}r\int_0^{2\pi}\!\mathrm{d}\theta\left(\frac{1}{1 + r^{-\alpha}x r_1^{\alpha}} - 1\right) r \;=\; 2\pi\int_0^{R}\!\mathrm{d}r\left(\frac{1}{1 + r^{-\alpha}x r_1^{\alpha}} - 1\right) r $$
$$ \;=\; \left[-\pi r^2 + \pi r^2\,{}_2F_1\!\left(1,\, -\frac{2}{\alpha};\; 1-\frac{2}{\alpha};\; -r^{-\alpha}x r_1^{\alpha}\right)\right]_{r=0}^{R}. \qquad (14.32) $$

When R = ∞, that is, in the limit as the radius of the circle goes to infinity, the double integral in Equation (14.32) can be directly evaluated to yield

$$ \int_0^{\infty}\!\mathrm{d}r\int_0^{2\pi}\!\mathrm{d}\theta\left(\frac{1}{1 + r^{-\alpha}x r_1^{\alpha}} - 1\right) r \;=\; \frac{2\pi^2}{\alpha}\,\left(x r_1^{\alpha}\right)^{2/\alpha}\csc\!\left(\frac{\pi(2-\alpha)}{\alpha}\right). $$
14.3 Linear receiver structures in spatially distributed networks 485
Using this result and Equation (13.65), the CDF of the SINR is found in Reference [9] to be given by

$$ \Pr(\mathrm{SINR} \le x) \;=\; 1 - \exp\!\left(-\rho K_\alpha x^{2/\alpha} r_1^2 - \sigma^2 x r_1^{\alpha}\right)\sum_{k=0}^{n_r-1}\frac{\left(\rho K_\alpha x^{2/\alpha} r_1^2 + \sigma^2 x r_1^{\alpha}\right)^k}{k!}, \qquad (14.33) $$

where the parameter

$$ K_\alpha \;=\; \frac{2\pi\,\Gamma(2/\alpha)\,\Gamma(1-2/\alpha)}{\alpha}. $$
Note that taking the radius of the circle to infinity while maintaining a constant
density of interferers ρ results in a Poisson point process (PPP) of interferers
discussed in Section 3.4 with density ρ. Hence Equation (14.33) is the CDF of
the SINR of a link of length r1 in a Poisson field of interferers with density ρ
and Rayleigh fading.
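Equation (14.33) is straightforward to evaluate numerically. The sketch below (the function name and parameter values are our own assumptions) exploits the fact that the sum is a regularized incomplete gamma function, so only a short loop is needed:

```python
import math

def sinr_cdf(x, nr, rho, r1, alpha, sigma2=0.0):
    """Pr(SINR <= x) from Equation (14.33)."""
    K = 2.0 * math.pi * math.gamma(2.0 / alpha) * math.gamma(1.0 - 2.0 / alpha) / alpha
    mu = rho * K * x ** (2.0 / alpha) * r1 ** 2 + sigma2 * x * r1 ** alpha
    # 1 - exp(-mu) * sum_{k=0}^{nr-1} mu^k / k!
    term, total = 1.0, 0.0
    for k in range(nr):
        total += term
        term *= mu / (k + 1)
    return 1.0 - math.exp(-mu) * total

# Example with assumed parameters (interference-limited, alpha = 4).
rho, r1, alpha = 1e-3, 35.7, 4.0
p2 = sinr_cdf(1.0, 2, rho, r1, alpha)
p4 = sinr_cdf(1.0, 4, rho, r1, alpha)
```

Adding receive antennas adds terms to the sum, so the outage probability decreases monotonically in n_r, which is the qualitative behaviour shown in Figure 14.5.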
where

$$ \Phi_P(s) \;=\; {}_2F_1\!\left(1,\, -\frac{2}{\alpha};\; \frac{\alpha-2}{\alpha};\; -R^{-\alpha}sP\right) \;-\; \frac{2 s^{2/\alpha}P^{2/\alpha}}{\alpha R^2}\,\Gamma\!\left(\frac{2}{\alpha}\right)\Gamma\!\left(\frac{\alpha-2}{\alpha}\right). \qquad (14.35) $$

If we now take the radius of the circular cell to infinity, we find that the Laplace transform of the total interference equals (see, for instance, [133])

$$ \Phi_I(s) \;=\; \exp\!\left(-\pi\rho\,\Gamma\!\left(1+\frac{2}{\alpha}\right)\Gamma\!\left(1-\frac{2}{\alpha}\right)(sP)^{2/\alpha}\right). \qquad (14.36) $$

Equation (14.36) corresponds to the Laplace transform of the interference due to transmitters distributed according to a Poisson point process on the plane with density
ρ and Rayleigh fading. This Laplace transform can then be used to find the outage probability for the antenna-selection receiver with n_r receive antennas, as derived in Section 13.3.3 and repeated below:

$$ \Pr\{\mathrm{SINR} \le x\} \;=\; \sum_{k=0}^{n_r}\binom{n_r}{k}(-1)^k \exp\!\left(-k\,\frac{x r_1^{\alpha}}{P_1}\,\sigma^2\right)\Phi_I\!\left(k\,\frac{x r_1^{\alpha}}{P_1}\right). \qquad (14.37) $$
The outage probability of the matched-filter receiver can also be found using the Laplace transform, as derived in Section 13.3.3, and is given below:

$$ \Pr\{\mathrm{SINR} \le x\} \;=\; 1 - \sum_{k=0}^{n_r-1}\frac{1}{k!}\left(\frac{x r_1^{\alpha}}{P_1}\right)^{k}(-1)^k\left.\frac{\mathrm{d}^k}{\mathrm{d}s^k}\left[e^{-s\sigma^2}\Phi_I(s)\right]\right|_{s=\frac{x r_1^{\alpha}}{P_1}} $$
$$ \;=\; 1 - \sum_{k=0}^{n_r-1}\frac{1}{k!}\left(-\frac{x r_1^{\alpha}}{P_1}\right)^{k}\left.\frac{\mathrm{d}^k}{\mathrm{d}s^k}\left[e^{-s\sigma^2}\Phi_I(s)\right]\right|_{s=\frac{x r_1^{\alpha}}{P_1}}. \qquad (14.38) $$
Note that Equations (14.37) and (14.38) were first given in Reference [148].
Writing the Laplace transform of Equation (14.36) as $\Phi_I(s) = \exp(-G_1 s^{2/\alpha})$, its first three derivatives are

$$ \frac{\mathrm{d}}{\mathrm{d}s}\Phi_I(s) \;=\; -\frac{2G_1}{\alpha}\, s^{\frac{2}{\alpha}-1}\,e^{-G_1 s^{2/\alpha}}, $$

$$ \frac{\mathrm{d}^2}{\mathrm{d}s^2}\Phi_I(s) \;=\; \frac{2G_1}{\alpha^2}\, e^{-G_1 s^{2/\alpha}}\left[(\alpha-2)\,s^{\frac{2}{\alpha}-2} + 2G_1\, s^{\frac{4}{\alpha}-2}\right], $$

$$ \frac{\mathrm{d}^3}{\mathrm{d}s^3}\Phi_I(s) \;=\; -\frac{4G_1}{\alpha^3}\, e^{-G_1 s^{2/\alpha}}\left[(2 - 3\alpha + \alpha^2)\,s^{\frac{2}{\alpha}-3} + (3\alpha-6)\,G_1\, s^{\frac{4}{\alpha}-3} + 2G_1^2\, s^{\frac{6}{\alpha}-3}\right]. $$
The equations above can be combined with Equation (14.38) to find the CDF of
the outage probability with the matched filter receiver. The outage probability of
14.4 Interference alignment 487
Figure 14.5 Outage probability versus SINR of an MMSE receiver in a Poisson field of interferers with nr = 2, 4, 8, and 16 receiver antennas. For this figure, the density of interferers was ρ = 10−3 nodes/m2, the length of the representative link was r1 = 35.7 m, such that πρ r12 = 1, and the path-loss exponent was α = 4.
the antenna selection receiver is given directly in Equation (14.37). The outage
probabilities for the linear MMSE, matched-filter and antenna-selection receivers
are shown in Figure 14.6 for nr = 2 and nr = 4 receiver antennas. The right-
most plot for each receiver type corresponds to the nr = 4 case and the left-
most corresponds to nr = 2. Notice that the antenna-selection receiver and the
matched filter have similar performance for small numbers of antennas but the
difference increases significantly going from two to four antennas as expected.
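Under the interference-limited assumption (σ² = 0, so the e^{−sσ²} factor drops out) and for n_r ≤ 4, the derivatives of Φ_I listed above suffice to evaluate Equations (14.37) and (14.38) directly. The following is our own numerical sketch with assumed parameter values, not the book's implementation:

```python
import math

def g1(rho, alpha, P=1.0):
    """Exponent constant of Equation (14.36): Phi_I(s) = exp(-G1 s^(2/alpha))."""
    return (math.pi * rho * math.gamma(1.0 + 2.0 / alpha)
            * math.gamma(1.0 - 2.0 / alpha) * P ** (2.0 / alpha))

def phi_derivs(s, G1, alpha):
    """Phi_I and its first three derivatives, per the expressions above."""
    b = 2.0 / alpha
    e = math.exp(-G1 * s ** b)
    d1 = -(2.0 * G1 / alpha) * s ** (b - 1.0) * e
    d2 = (2.0 * G1 / alpha ** 2) * e * ((alpha - 2.0) * s ** (b - 2.0)
                                        + 2.0 * G1 * s ** (2 * b - 2.0))
    d3 = -(4.0 * G1 / alpha ** 3) * e * ((2.0 - 3.0 * alpha + alpha ** 2) * s ** (b - 3.0)
                                         + (3.0 * alpha - 6.0) * G1 * s ** (2 * b - 3.0)
                                         + 2.0 * G1 ** 2 * s ** (3 * b - 3.0))
    return [e, d1, d2, d3]

def mf_outage(x, nr, rho, r1, alpha, P1=1.0):
    """Matched-filter outage, Equation (14.38), sigma^2 = 0, nr <= 4."""
    s = x * r1 ** alpha / P1
    d = phi_derivs(s, g1(rho, alpha, P1), alpha)
    return 1.0 - sum(((-s) ** k / math.factorial(k)) * d[k] for k in range(nr))

def as_outage(x, nr, rho, r1, alpha, P1=1.0):
    """Antenna-selection outage, Equation (14.37), sigma^2 = 0."""
    G1 = g1(rho, alpha, P1)
    return sum(math.comb(nr, k) * (-1.0) ** k
               * math.exp(-G1 * (k * x * r1 ** alpha / P1) ** (2.0 / alpha))
               for k in range(nr + 1))

rho, r1, alpha = 1e-3, 35.7, 4.0
p_mf2 = mf_outage(1.0, 2, rho, r1, alpha)
p_mf4 = mf_outage(1.0, 4, rho, r1, alpha)
```

Both outage probabilities decrease as n_r grows, consistent with Figure 14.6.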
Figure 14.6 Outage probability versus SINR of linear receivers in a Poisson field of interferers with nr = 2 and 4 receiver antennas. The density of interferers is ρ = 10−3 nodes/m2, the length of the representative link is r1 = 35.7 m, such that πρ r12 = 1, and the path-loss exponent is α = 4.
invertible, which holds with probability 1 if the channel coefficients are sampled
from a continuous distribution. We shall now show that transmitter 1 can send
two independent messages to receiver 1, and transmitters 2 and 3 can send one
independent message each to receivers 2 and 3 respectively. Thus, four indepen-
dent messages can be sent in three time slots without interference.
Suppose that transmitter 1 encodes its messages using two vectors v11 and v12, and transmitters 2 and 3 use vectors v21 and v31, respectively. That is to say, if transmitter 1 wishes to send the values s11 and s12 to receiver 1, it sends the entries of the 3 × 1 vector

$$ \mathbf{x}_1 \;=\; s_{11}\mathbf{v}_{11} + s_{12}\mathbf{v}_{12} $$

over the three antennas of the transmitter. Similarly, suppose that transmitters 2 and 3 wish to send the values s21 and s31 to receivers 2 and 3, respectively. They transmit the entries of the vectors x₂ = s₂₁v₂₁ and x₃ = s₃₁v₃₁ on their three antennas.
Interference from transmitters 2 and 3 can be aligned at receiver 1 if

$$ \mathbf{H}_{12}\mathbf{v}_{21} \;=\; \mathbf{H}_{13}\mathbf{v}_{31}. \qquad (14.42) $$

The interfering signal from transmitter 3 and one of the interfering signals from transmitter 1 (the signal associated with s11 in this case) are aligned at receiver 2 if

$$ \mathbf{H}_{21}\mathbf{v}_{11} \;=\; \mathbf{H}_{23}\mathbf{v}_{31}. \qquad (14.43) $$

Similarly, we can align the interference from transmitter 2 and the other interfering signal from transmitter 1 (the signal associated with s12 in this case) at receiver 3 if

$$ \mathbf{H}_{31}\mathbf{v}_{12} \;=\; \mathbf{H}_{32}\mathbf{v}_{21}, \qquad (14.44) $$

where H_{jk} denotes the channel matrix from transmitter k to receiver j.
Equations (14.42), (14.43), and (14.44) can simultaneously be satisfied using the following choices of v_{jk}. We start by setting v21 to the all-ones vector,

$$ \mathbf{v}_{21} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \qquad (14.45) $$

Note that this choice of an all-ones vector is arbitrary. With this choice of v21, we can find v31 by solving Equation (14.42), which yields

$$ \mathbf{v}_{31} = \mathbf{H}_{13}^{-1}\mathbf{H}_{12}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \qquad (14.46) $$

We can now solve for v11 by substituting the above expression for v31 into Equation (14.43), which yields

$$ \mathbf{v}_{11} = \mathbf{H}_{21}^{-1}\mathbf{H}_{23}\mathbf{H}_{13}^{-1}\mathbf{H}_{12}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \qquad (14.47) $$

Finally, we can solve for v12 by substituting v21 into Equation (14.44):

$$ \mathbf{v}_{12} = \mathbf{H}_{31}^{-1}\mathbf{H}_{32}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \qquad (14.48) $$
We should note here that all transmitters need to know the channel coefficients of all receivers, which will incur significant overhead in real systems. In Figure
14.7, interference alignment at receiver 1 is illustrated. The dashed arrows rep-
resent the signals of interest (that is, signals from transmitter 1) and the solid
arrows represent the interfering signals (that is, signals from transmitters 2 and
3). Observe that the two interfering signals lie on a single dimension at receiver
1, which leaves two other dimensions for useful signal communication.
In Figure 14.8, interference alignment at receiver 2 is illustrated. The dashed arrows represent the signals of interest (that is, the signal from transmitter 2), and the solid arrows represent the interfering signals (that is, two signals from transmitter 1 and one from transmitter 3). Observe that the signal from transmitter 3 and one of the signals from transmitter 1 (that is, the signal encoded with v11) lie on a single dimension. Together with the interfering signal from transmitter 1, which is encoded with v12, the total interference occupies two dimensions, leaving an additional dimension for the useful signal.
In Figure 14.9, interference alignment at receiver 3 is illustrated. The dashed arrows represent the signals of interest (that is, the signal from transmitter 3), and the solid arrows represent the interfering signals (that is, two signals from transmitter 1 and one from transmitter 2). Observe that the signal from transmitter 2 and one of the signals from transmitter 1 (that is, the signal encoded with v12) lie on a single dimension. Together with the interfering signal from transmitter 1, which is encoded with v11, the total interference occupies two dimensions, leaving an additional dimension for the useful signal.
Hence, at each receiver, the interfering signals occupy two dimensions, leaving
an additional dimension for the desired signal. Thus, a zero-forcing receiver (see
Section 9.2.2) can be used to decode the signal without interference.
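The alignment conditions and the resulting precoders can be checked numerically. The following sketch (our own; it draws a hypothetical random set of 3 × 3 channels, with H[(j, k)] denoting the channel from transmitter k to receiver j) verifies both the alignment and the decodability at receiver 1:

```python
import numpy as np

rng = np.random.default_rng(1)

def ch():
    # A random 3x3 complex channel; invertible with probability 1.
    return rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

H = {(j, k): ch() for j in (1, 2, 3) for k in (1, 2, 3)}
inv = np.linalg.inv
ones = np.ones(3, dtype=complex)

v21 = ones                                    # (14.45)
v31 = inv(H[(1, 3)]) @ H[(1, 2)] @ ones       # (14.46)
v11 = inv(H[(2, 1)]) @ H[(2, 3)] @ v31        # (14.47)
v12 = inv(H[(3, 1)]) @ H[(3, 2)] @ ones       # (14.48)

# Interference aligns at every receiver ...
assert np.allclose(H[(1, 2)] @ v21, H[(1, 3)] @ v31)   # receiver 1
assert np.allclose(H[(2, 1)] @ v11, H[(2, 3)] @ v31)   # receiver 2
assert np.allclose(H[(3, 1)] @ v12, H[(3, 2)] @ v21)   # receiver 3

# ... while at receiver 1 the two desired streams plus the (aligned)
# interference still span all three dimensions, so zero-forcing works.
S1 = np.column_stack([H[(1, 1)] @ v11, H[(1, 1)] @ v12, H[(1, 2)] @ v21])
assert np.linalg.matrix_rank(S1) == 3
```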
In this example, four concurrent transmissions are possible in a three-dimensional space. Extending this idea to 2m + 1 dimensions, that is, communication
Problems 491
Problems
14.1 Using the Ozgur hierarchical cooperation scheme, compare the bound on
the network throughput capacity given by Equation (14.6), for one and two
levels of hierarchy. In particular, use the bound to estimate the number of nodes
required in the network for the throughput capacity with two levels of hierarchy
to exceed the throughput capacity with one level of hierarchy. Your answer will
illustrate a weakness in using capacity scaling laws to estimate the performance
of practical wireless networks.
14.2 Consider a square wireless network of fixed area which is divided into
n uniform squares. In each of these squares, place a wireless node with uni-
form probability as shown in Figure 14.10, which illustrates a network with
n = 36 nodes. For simplicity, assume that the total interference in the network
is proportional to nα /2 and signal power decays with distance according to the
inverse-power-law model with path-loss exponent α > 2. Using these simplifying
Additionally, the following is known about the upper regularized incomplete gamma function, for a positive real number q and integer L [364]:

$$ \lim_{L\to\infty} Q(L, qL) \;=\; \begin{cases} 0, & \text{if } q > 1 \\ 1, & \text{if } q < 1. \end{cases} $$
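For integer L the upper regularized incomplete gamma function is a Poisson tail, Q(L, x) = e^{−x} Σ_{k=0}^{L−1} x^k/k!, so the quoted limit can be checked directly (a sketch of our own; L = 400 is an arbitrary choice):

```python
import math

def upper_reg_gamma(L, x):
    """Q(L, x) = exp(-x) * sum_{k=0}^{L-1} x^k / k!, for integer L >= 1."""
    term, total = math.exp(-x), 0.0
    for k in range(L):
        total += term
        term *= x / (k + 1)
    return total

# The limiting behaviour quoted above, checked at L = 400.
assert upper_reg_gamma(400, 1.2 * 400) < 0.05   # q > 1: tends to 0
assert upper_reg_gamma(400, 0.8 * 400) > 0.95   # q < 1: tends to 1
```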
Consider a homogeneous Poisson network with a multiantenna receiver and single-antenna transmitters, with i.i.d. Rayleigh fading between all antennas.
(a) Ignoring the contribution of the noise and using the above properties of the
upper regularized incomplete gamma function, show that the SIR converges
in probability to a nonrandom limit if the number of antennas at the receiver
is increased linearly with node density.
(b) The result above suggests that it may be possible to scale ad hoc wireless
networks by increasing the number of receiver antennas with node density.
Discuss the feasibility of doing so.
14.4 Prove that the second term on the left-hand side of Equation (14.15) goes
to zero as n/nr → ∞.
14.6 Derive the integer-relaxed optimum number of streams for the multi-
stream transmissions in an ad hoc wireless network with multiantenna MMSE
receivers given in Equation (14.24).
Assume that the noise power equals P N−α at each antenna of the representative receiver, that all nodes in the network transmit with equal power, with the standard inverse-power-law path loss and i.i.d., unit-variance fading between all antennas in the network. Show that βN = N−α SINR converges with probability 1 to a limit β as n, nr, R → ∞ such that n/nr equals a positive constant c and n = ρπR², with ρ > 0 equal to a nominal density. Find an implicit expression analogous to (14.15) that β must satisfy in this case. This problem is inspired by results in Reference [122].
[Figure 14.11 sketches the magnitude of the channel coefficient across subchannels, grouped into coherence bands.]
Figure 14.11 Illustration of a block-frequency fading channel with six coherence bands and twenty-four subchannels.
In wireless and certain wired networks, multiple users share the same physical
medium. Data communication rates in networks can often be improved by using medium-access control (MAC) protocols, whereby multiple users share the medium in a controlled manner such that the adverse effects of their interfering signals are reduced. A general treatment of this topic can be found in Reference [21]. The main reason for improved data rates with medium-access control is that communication in noise typically occurs at much higher data rates than communication in interference, when the data rates are a function of the signal-to-interference-plus-noise ratio (SINR).
Earlier in the book, we introduced multiple-access schemes such as frequency-division multiple access (FDMA), time-division multiple access (TDMA), code-division multiple access (CDMA), and space-division multiple access (SDMA). Each of these multiple-access schemes attempts to reduce interference by ensuring that multiple links operate in orthogonal or approximately orthogonal spaces, for example by time or frequency division. We did not, however, describe in much detail how the assignments of frequency bands, time slots, or spatial dimensions to users are made.
In cellular telephone networks, the assignments of links to time slots, frequency
bands, or codes can be made by the base station, which controls the behavior of
the mobile units in its own cell. The network topology (where there is a central
control node) and the connection-oriented nature of telephone links where links
stay operational for long periods (seconds or minutes) make this an attractive
approach.
There are, however, many scenarios in which communication is naturally bursty, such as Internet communications. In such systems, communications last on the order of milliseconds and each user spends a large fraction of its time not communicating. Data traffic that is naturally bursty lends itself well to a simple form of TDMA, in which nodes transmit data whenever they need to. Collisions are unlikely if the nodes in the network transmit data very infrequently. Protocols that rely on this burstiness are generally termed contention protocols. For the purpose of simplicity, we limit our discussions to
496 Medium-access-control protocols
[Figure 15.1 plots the curve G e^{−G} against G, which peaks at G = 1.]
Figure 15.1 Probability of successful transmission versus packet arrival rate for slotted
ALOHA.
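The curve in Figure 15.1 is G e^{−G}: with Poisson packet arrivals of rate G per slot, a slot carries a successful transmission exactly when one packet arrives. A minimal sketch:

```python
import math

def slotted_aloha_throughput(G):
    """Expected successful transmissions per slot for Poisson arrival rate G."""
    return G * math.exp(-G)

# Throughput is maximized at G = 1, giving 1/e (about 0.368) packets per slot.
peak = slotted_aloha_throughput(1.0)
```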
be shown to equal 1/(2e) ≈ 18%, as is done in Reference [3]. The factor-of-two loss arises from the fact that, when a given transmission begins, no other transmission may have been initiated in the packet duration preceding the current transmission, in addition to the fact that no other packet transmission may be initiated during the transmission of the given packet.
the medium for some duration of time, and most collisions are limited to the
initiation packets, which are short so that less time is wasted on collisions.
The basic sequence of transmissions for CSMA/CA utilizes two types of control
packets to initiate a transmission, the request-to-send (RTS) packet and the
clear-to-send (CTS) packet. The contents of the RTS packet are as follows.
The CTS packet contains the same information as the RTS packet, except for a
different message ID. The basic sequence of a link is as follows.
(1) An RTS packet is transmitted by a node that desires to transmit data. This
packet informs all nodes within range of the transmit node that it is about
to start transmission and that all nodes except for the intended destination
of its transmission should not agree to receive packets for the duration of
the reservation.
(2) A CTS packet is transmitted by the destination node of the preceding RTS.
This packet informs all nodes within range of the destination node that it
is about to start receiving a data packet and that all other nodes within its
vicinity should not transmit anything for the duration of the reservation.
(3) After the CTS packet is received by the transmit node, it can then start
transmitting its data packet.
(4) After the duration of the reservation is completed, other nodes can then
initiate their own transmissions.
To illustrate, consider the network in Figure 15.4 in which the unfilled circles
represent nodes that have packets to transmit. Suppose that node 1 wishes to
transmit a packet to node 3. Consider the timing diagram in Figure 15.5 where
15.3 Carrier-sense multiple access (CSMA) 501
the numbers on the left indicate the nodes and the packets are represented by
rectangles with the labels indicating the type of packet. The arrows indicate the
propagation of the source packet to the different nodes. For instance, the RTS
packet transmitted by node 1 first arrives at node 2, followed by nodes 3 and 4.
The sequence of events is as follows.
(1) Node 1 initiates a link by transmitting an RTS message with node 3 as the
destination.
(2) The RTS packet from node 1 is received after some propagation delay at
nodes 2, 3, and 4.
(3) Upon receiving the RTS packet from node 1, node 2 waits to see if it can
detect a CTS from node 3. If node 2 detects a CTS packet from node
3, it knows not to transmit because node 3 is about to receive a data
transmission.
(4) Node 3 receives the RTS from node 1 and makes a decision on whether or
not to accept the transmission from node 1.
(5) Node 4 knows not to accept any transmissions after receiving the RTS from
node 1 as it now knows that a data transmission is about to begin in its
vicinity.
(6) Node 3 decides to accept the data packet from node 1 and transmits a CTS
packet.
(7) Node 2 receives the CTS packet from node 3, indicating to node 2 that
Figure 15.6 CSMA/CA network that results in a data packet collision with timing
diagram in Figure 15.7.
node 3 is within range of it and is about to start receiving data. Hence, node 2 knows not to start a transmission of its own.
(8) Node 4 receives the CTS packet from node 3 and knows not to initiate a
transmission.
(9) Node 1 receives the CTS packet from node 3 so it knows that its RTS has
been accepted and that it can now transmit a data packet to node 3.
(10) Node 1 then initiates its data transmission.
(11) After the duration of the reservation expires, node 2 is now free to transmit
data by first sending an RTS packet.
Note that while the RTS/CTS exchange reduces the probability of collision on
the data packet, it is still possible to have data packet collisions in CSMA/CA
systems, as illustrated by the network of Figure 15.6 and the associated tim-
ing diagram in Figure 15.7. Suppose that in Figure 15.6, node 5 cannot detect
transmissions from nodes 1, 2, or 3 either because it is too far away from those
nodes or because of obstacles. Additionally, suppose that node 4 cannot detect
transmissions from nodes 1 or 2. The associated timing diagram is illustrated in
Figure 15.7 where dropped packets (either due to nodes being out of range or
collisions) are represented by the × symbol.
Suppose that node 1 wishes to transmit a packet to node 3 and node 4 wishes
to transmit a packet to node 5. The following is the sequence of transmissions
illustrated by Figure 15.7, which results in a collision on the data packet trans-
mitted by node 1 intended for node 3.
(1) Node 1 initiates transmission by sending an RTS packet.
(2) Node 3 successfully receives the RTS packet from node 1.
(3) Node 4 does not receive the RTS packet from node 1 because it is out of
range.
(4) Node 5 does not receive the RTS packet from node 1 because it is out of
range.
(5) Node 4 initiates its own transmission by sending an RTS message.
(6) Node 5 successfully receives the RTS packet from node 4.
Figure 15.7 CSMA/CA timing diagram with data packet collision. Arrows indicate
transmissions and crosses indicate lost data or control packets.
(7) Node 2 does not receive the RTS packet from node 4 because it is out of
range.
(8) Node 3 initiates transmission of a CTS packet accepting the RTS from node
1 at the same time that the RTS from node 4 arrives at node 3 and, hence,
does not receive the RTS from node 4.
(9) Node 5 transmits a CTS packet accepting the RTS from node 4.
(10) Node 4 receives the CTS packet from node 5.
(11) Node 2 receives the CTS packet from node 3, but does not receive the CTS from node 5 because it is out of range.
(12) Node 3 does not receive the CTS packet from node 5 because it is out of
range.
(13) Node 1 does not receive the CTS packet from node 5 because it is out of
range.
(14) Node 1 receives the CTS packet from node 3.
(15) Node 4 initiates data transmission to node 5.
(16) Node 3 receives data packet from node 4.
(17) Node 1 does not detect data transmission from node 4 because it is out of
range and initiates data transmission to node 3.
(18) Node 3 suffers a dropped data packet because the data packet intended for it
from node 1 arrives when the data packet from node 4 is being transmitted.
Note that the RTS/CTS packet exchange, including the associated delays, makes the CSMA/CA scheme useful only if the size of the data packet is significantly larger than the duration of the RTS/CTS exchange. A closed-form analysis of the efficiency and throughput of CSMA/CA is difficult and is highly dependent on packet sizes and propagation delays. Hence, most studies of the efficiency of such networks are done empirically, either in hardware or in simulation.
15.5.1 Introduction
The ability of antenna arrays to suppress interference can be exploited in the
context of MAC protocols, allowing users to be separated spatially rather than
15.5 Space-division multiple-access (SDMA) protocols 505
$$ \mathbf{z} \;=\; \mathbf{h}_1 s_1 + \sum_{\ell=2}^{M}\mathbf{h}_\ell s_\ell + \mathbf{n}, \qquad (15.1) $$
Figure 15.9 Timing diagram for SDMA protocol. Arrows indicate transmissions.
from the timing diagram). The sequence of events leading up to successful data
transmissions is as follows.
(1) Node 1 transmits an RTS packet containing training data that can be used
by node 2 and any other node that wishes to receive a transmission.
(2) Node 2 receives the RTS packet from node 1 and transmits a CTS packet
indicating that it accepts node 1’s transmission.
(3) Nodes 3 and 4 transmit RTS messages to other nodes, initiating links of their
own.
(4) Node 2 estimates the channel between nodes 3 and 4 to itself from the RTS
messages sent by those nodes.
(5) Node 1 transmits its data packet.
(6) Nodes 3 and 4 transmit data packets to their respective destinations.
(7) Node 2 uses the estimated channels from nodes 1, 3, and 4 to itself to perform
zero-forcing.
(8) The destinations of the packets from nodes 3 and 4 use the channels esti-
mated from the respective RTS packets to perform zero-forcing.
Note that this protocol requires the RTS and CTS packets to be transmitted without collisions by all nodes. The data packets can, however, be transmitted simultaneously, as the required channel estimates can be obtained from the RTS/CTS exchanges. It is also possible not to require destination nodes to transmit CTS packets: because the destination nodes can perform zero-forcing, they technically do not need to indicate to adjacent nodes that they are receiving data, as in the CSMA/CA protocol. However, the CTS packets can still be useful to indicate acceptance of a link request.
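The zero-forcing step in this protocol can be sketched as follows (a toy construction of our own, with an assumed array size and random channels): node 2 stacks the channel vectors estimated from the RTS packets of nodes 1, 3, and 4, and takes a row of the pseudoinverse as its receive weights:

```python
import numpy as np

rng = np.random.default_rng(2)
nr = 4  # receive antennas at node 2 (assumed)

# Channels to node 2 estimated from the RTS packets of nodes 1, 3, and 4.
h1, h3, h4 = (rng.standard_normal(nr) + 1j * rng.standard_normal(nr)
              for _ in range(3))
H = np.column_stack([h1, h3, h4])

# Zero-forcing: the first row of the pseudoinverse passes node 1's signal
# with unit gain while nulling the simultaneous packets of nodes 3 and 4.
w1 = np.linalg.pinv(H)[0]

assert np.isclose(w1 @ h1, 1.0)
assert abs(w1 @ h3) < 1e-9 and abs(w1 @ h4) < 1e-9
```

With n_r = 4 antennas, node 2 has enough degrees of freedom to null two simultaneous interferers while receiving its own packet.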
15.5.3 SPACE-MAC
The SPACE-MAC protocol described in Reference [242] implements a more so-
phisticated version of the simple SDMA protocol described above. In SPACE-
MAC, RTS/CTS exchanges are used to request and accept transmissions as well
as to estimate channel parameters to perform nulling. Unlike the simple proto-
col described in Section 15.5.2, however, SPACE-MAC allows nodes to initiate
links during ongoing data transmissions, using antenna arrays at the transmitters
to place nulls in the directions of nodes receiving data during their RTS/CTS
handshakes.
This ability can be accomplished as follows. Consider a network of four nodes,
each with an antenna array with node 1 wishing to transmit to node 2, and node 3
wishing to transmit to node 4. Figure 15.10 is a timing diagram for the SPACE-
MAC protocol with four users. The dashed lines represent transmissions with
nulling, i.e., the transmitter of the packet places nulls in the direction(s) of the
receivers to which the dashed arrows connect. Hence, the dashed lines represent
very weak signal paths that are assumed to not disrupt ongoing receptions.
Figure 15.10 Timing diagram of SPACE-MAC protocol. The bold arrows indicate
nulls which are placed in the direction of node 2 to avoid interfering with node 2’s
reception.
(8) Node 2 receives the CTS from node 1 and initiates data transmission with
beamforming.
Suppose an additional pair of nodes, nodes 5 and 6, were part of the network but had only two antennas at each node. They would not be able to initiate a link once nodes 1 and 3 have initiated their links, because they do not have sufficient degrees of freedom to null transmissions to both nodes 2 and 4 (that is, the receiver sides of the pre-established links).
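Transmit-side nulling of this kind amounts to projecting the transmit weights onto the null space of the channels toward the protected receivers; a sketch under assumed dimensions (our own construction, with random single-antenna channels toward nodes 2 and 4):

```python
import numpy as np

rng = np.random.default_rng(3)
nt = 4  # transmit antennas at the new initiator (assumed)

# Channel rows toward the two protected receivers (e.g., nodes 2 and 4),
# and toward the intended receiver, estimated from overheard RTS/CTS packets.
G = rng.standard_normal((2, nt)) + 1j * rng.standard_normal((2, nt))
h = rng.standard_normal(nt) + 1j * rng.standard_normal(nt)

# Project the matched-filter weight onto the null space of G: the new
# transmission is invisible at the protected nodes but still couples
# into the intended link.
P = np.eye(nt) - G.conj().T @ np.linalg.inv(G @ G.conj().T) @ G
w = P @ h.conj()
w = w / np.linalg.norm(w)

assert np.allclose(G @ w, 0.0)   # nulls toward both protected receivers
assert abs(h @ w) > 1e-8         # nonzero gain toward the intended receiver
```

With nt = 2 antennas the null space of two protected channels is empty almost surely, which is exactly why nodes 5 and 6 above cannot join.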
array at one of the nodes. The node endowed with the antenna array acts as
a base station or access point, and aids in relaying packets from one node to
another as illustrated in Figure 15.11. Acknowledgment packets and timeouts
can be used to detect dropped packets in this protocol.
The base station uses its antenna array with multiple sets of weights, each tuned to receive a packet from a particular source node while placing nulls in the directions of the other nodes. By doing this, multiple packets can be successfully received by the base station. The base station then forwards the packets to their respective destination nodes using an orthogonal communication protocol.
In order to design a set of weights that simultaneously focuses the signals from
a target transmitter while placing a null in the directions of the others, the base
station needs to estimate the channels between the antenna of the target and its
own antennas, as well as the covariance matrix of the aggregate received signals
from all the interfering transmissions and the target transmitter. The protocol
specifies a method for the base station to acquire these parameters from each
transmitter.
Each time slot is divided into two intervals. The first, of duration Tu, is called the uncertainty window. Nodes that wish to transmit must begin their transmissions at a random time within the uncertainty window. Each transmitted packet is made up of three consecutive intervals, PN1, PN2, and PN3, during which the same pseudorandom sequence is transmitted, followed by data-carrying samples. For this discussion, we include any header, destination, and other housekeeping bits in the data. The duration of the pseudorandom sequence is TPN, and Tu + ε = TPN, where ε is a small positive number. That is to say, the duration of the PN transmission is slightly greater than the length of the uncertainty window. Thus, all other nodes are guaranteed to be transmitting a PN sequence during the transmission of the second PN sequence by any given transmitting
[Figure 15.12 shows the slot structure: an uncertainty window at the start of each slot, packets consisting of PN1, PN2, and PN3 followed by data, and the base station performing channel and covariance matrix estimation during the second PN interval.]
node. This is illustrated in Figure 15.12, whereby during the transmission of PN2
by node 1, node 2 is also transmitting a pseudorandom signal. Similarly, during
the transmission of PN2 by node 2, node 1 is also transmitting a pseudorandom
signal.
Suppose that the system is designed to support K simultaneous receptions.
The start of transmission is detected by a bank of K matched-filter receivers
at the base station that correlate the received signals with the pseudorandom
sequence. The presence of a pseudorandom sequence will manifest itself as a
sudden increase in the output of the matched filters. The start of a packet is
declared if the output of the matched filter exceeds some threshold. Suppose
that the matched filters are numbered 1, 2, . . . , K, where the kth matched filter
is used to detect the start of the kth packet. The output of the kth matched
filter is not compared against a threshold unless the (k − 1)th matched filter
has been triggered. This mechanism enables the bank of matched filters to detect
packets that begin at different times. Note that pseudorandom sequences
will not correlate significantly when offset in time. Therefore, even though the
PN sequence of node 2 is transmitted during the time that the first PN sequence
is transmitted by node 1, it will not contribute significantly to the output of the
first matched filter. The channel and interference covariance matrix estimations
are performed in the interval T_PN immediately following the detection of a
packet. For a more detailed discussion of synchronization issues, please see
Chapter 17.
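The start-of-packet detection described above can be sketched as follows. This is an illustrative toy, not the protocol specification: the Barker-13 preamble, the packet offsets, and the threshold are all made-up choices (a real system would use much longer PN sequences and account for noise).

```python
# Sketch of the bank-of-matched-filters packet-start detection.
# BARKER13 stands in for the PN sequence; offsets and threshold are illustrative.

BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]

def matched_filter(z, pn, n):
    """Correlator output |sum_i z[n+i] pn[i]| at candidate start n."""
    return abs(sum(z[n + i] * pn[i] for i in range(len(pn))))

def detect_starts(z, pn, threshold):
    """Declare a packet start on each rising edge of the matched-filter
    output, mimicking a bank in which filter k is armed only after
    filter k-1 has been triggered."""
    starts, above = [], False
    for n in range(len(z) - len(pn) + 1):
        q = matched_filter(z, pn, n)
        if q > threshold and not above:
            starts.append(n)
        above = q > threshold
    return starts

# Two packets whose PN preambles begin at samples 0 and 20.
signal = [0.0] * 40
for offset in (0, 20):
    for i, chip in enumerate(BARKER13):
        signal[offset + i] += chip

print(detect_starts(signal, BARKER13, threshold=6.0))  # -> [0, 20]
```

Because the Barker sequence has aperiodic autocorrelation sidelobes of magnitude at most 1, the correlator output away from the true starts stays far below the threshold, so the two offset packets are detected separately.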
512 Medium-access-control protocols
The channel and covariance matrix estimations are performed as follows. Sup-
pose that the sampled vector of signals received on each antenna of the base
station at time n is z[n]. Suppose that the jth packet is detected at time n_j
and the ℓth pseudorandom value is p_ℓ. Then, the channel vector estimation for
the first node is performed by weighting the received signal vector by the pseudorandom
value and averaging over n_s samples. Note that the duration of the
n_s samples must be less than T_PN. The estimated channel vector ĥ_1 ∈ C^{n_r×1}
between the antenna of node 1 and the antennas of the base station is then given
by

\hat{h}_1 = \frac{1}{n_s} \sum_{\ell=1}^{n_s} z[n_1 + \ell]\, p_\ell .   (15.7)
Note that the signal contribution from node 2 will average out if ns is large.
The covariance matrix of the signals received at node 1 is estimated by using a
sample covariance matrix as follows:

\hat{R}_1 = \frac{1}{n_s} \sum_{\ell=1}^{n_s} z[n_1 + \ell]\, z^{\dagger}[n_1 + \ell] .
Then, an approximate MMSE receiver to detect the packet from node 1 can
be found as follows:

w_1^{\dagger} = a\, \hat{h}_1^{\dagger} \hat{R}_1^{-1} ,

where a is a scale factor. The data samples from node 1 can then be estimated
as

w_1^{\dagger}\, z[n_1 + 2 n_{pn} + \ell] ,

where n_{pn} is the length of the pseudorandom sequence in samples.
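The estimation chain above, Equation (15.7), the sample covariance, and the approximate MMSE weights, can be sketched numerically. Everything here is hypothetical: the two-antenna channel vectors, noise level, and sequence length are illustrative values, not parameters from the protocol.

```python
import random

random.seed(1)

# Hypothetical two-antenna base station with two overlapping PN transmissions.
n_s = 512
h1 = [1.0 + 0.3j, -0.5 + 0.8j]   # channel of the target, node 1 (made up)
h2 = [0.9 + 0.1j, 0.3 - 0.4j]    # channel of the interferer, node 2 (made up)
p1 = [random.choice((-1.0, 1.0)) for _ in range(n_s)]
p2 = [random.choice((-1.0, 1.0)) for _ in range(n_s)]

def noise():
    return complex(random.gauss(0, 0.1), random.gauss(0, 0.1))

# Received vectors z[l] = h1 p1[l] + h2 p2[l] + noise (both PNs overlap).
z = [[h1[a] * p1[l] + h2[a] * p2[l] + noise() for a in range(2)]
     for l in range(n_s)]

# Equation (15.7): channel estimate by PN-weighted averaging; node 2's
# contribution averages out because p1 and p2 are nearly uncorrelated.
h1_hat = [sum(z[l][a] * p1[l] for l in range(n_s)) / n_s for a in range(2)]

# Sample covariance matrix R1_hat = (1/n_s) sum_l z z^dagger.
R = [[sum(z[l][a] * z[l][b].conjugate() for l in range(n_s)) / n_s
      for b in range(2)] for a in range(2)]

# Explicit 2x2 inverse of the Hermitian matrix R.
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
Rinv = [[R[1][1] / det, -R[0][1] / det],
        [-R[1][0] / det, R[0][0] / det]]

# Approximate MMSE weights w1 = a h1_hat^dagger R^-1 (scale factor a = 1).
w1 = [sum(h1_hat[a].conjugate() * Rinv[a][b] for a in range(2))
      for b in range(2)]

gain_target = abs(sum(w1[b] * h1[b] for b in range(2)))
gain_interf = abs(sum(w1[b] * h2[b] for b in range(2)))
print(gain_interf / gain_target)   # well below 1: node 2 is suppressed
```

The weights pass the target channel with near-unit gain while strongly attenuating the interfering channel; the residual leakage is set by the finite averaging length n_s.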
The sequence shown in Figure 15.12 can be described in words as follows.
(1) The base station is continually monitoring the output of its first matched
filter.
(2) Node 1 commences transmission.
(3) Node 2 commences transmission.
(4) At the end of the pseudorandom sequence transmitted by node 1, the matched
filter at the base station detects the presence of the pseudorandom sequence.
(5) The base station commences channel and covariance matrix estimation to
compute the weights for the packet from node 1.
(6) At the end of the pseudorandom sequence transmitted by node 2, the matched
filter at the base station detects the presence of the pseudorandom sequence.
(7) The base station commences channel and covariance matrix estimation to
compute the weights for the packet from node 2.
(8) When weight estimations are complete, the base station applies the weight
vector for node 1 to detect signals from node 1 and the weight vector for
node 2 to detect signals from node 2.
There are several failure modes for the Ward protocol. Since multiple trans-
missions can be simultaneously received, the traditional definitions of packet
collision do not hold. Packets are not successfully received if any of the following
occur.
(1) Insufficient degrees of freedom at the receiver. With nr antennas at the re-
ceiver, the base station can only place nr − 1 nulls. Hence, at most nr − 1
simultaneous transmissions are possible. If more than nr − 1 packets are
transmitted during any one slot, it is highly likely that none of the packets
will be received successfully as the base station will not be able to null the
interfering packets.
(2) Insufficient resolution at the receiver array. Even though there are sufficient
degrees of freedom at the receive array, if the channel vectors of two transmit
nodes are close, the MMSE receiver may not be able to place a null in the
direction of one of the packets while focusing on the other. This will result
in poor SINR for the packets concerned and will likely result in unsuccessful
reception of those packets. Note that packets from other nodes may still be
received successfully.
(3) Transmissions that commence within one sample of the pseudorandom
sequence. If transmissions from multiple nodes commence within a small
amount of time from each other, the pseudorandom sequences will signifi-
cantly contribute to each other and the matched filter at the base station may
detect a single packet even if there are multiple packets. This is illustrated
in Figure 15.13. The channel estimation algorithm will end up estimating
h1 + h2 .
A detailed analysis incorporating all these failure modes can be found in [337],
which shows that a throughput of approximately 3.6 packets per slot may be
achieved in a system with 10-element arrays and K = 6. Compare this with a
throughput of approximately 0.37 for slotted ALOHA without antenna arrays.
Note here that the overhead associated with the pseudorandom sequence is
assumed to be negligible compared to the data.
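The slotted-ALOHA baseline of roughly 0.37 packets per slot follows from the classical throughput expression S = G e^{−G} for a Poisson offered load of G packets per slot, which peaks at G = 1. A one-line check:

```python
import math

def slotted_aloha_throughput(G):
    """Expected successful packets per slot for Poisson offered load G:
    a slot succeeds only if exactly one packet arrives, S = G e^-G."""
    return G * math.exp(-G)

peak = slotted_aloha_throughput(1.0)   # maximized at G = 1
print(round(peak, 2))                  # -> 0.37 (i.e. 1/e)
```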
Figure 15.13 Timing diagram of the Ward protocol with a collision at the start of
transmission: within the uncertainty window, two transmissions commence within an
interval of one bit of the PN sequence.
MIMA-MAC
The MIMA-MAC frame structure for two simultaneous one-to-one
links, where each receiver has at least two antennas, is illustrated in Figure
15.14.
The MIMA-MAC frame is divided into a contention period, training period,
data period, and acknowledgment period. Except for the data period, all other
periods are divided into two slots (or N slots if N simultaneous transmissions
are desired). The contention period is divided into contention slot 1 (CS1) and
contention slot 2 (CS2). During the contention slots, an RTS/CTS exchange as
described in Section 15.5.3 takes place. The link that succeeds in carrying out the
RTS/CTS exchange in CS1 will not participate in CS2, and the transmit side of
that link will send a training sequence in training slot 1 (TS1) and data during the
data slot. The receiver side of the link will transmit an acknowledgment during
ACK1 if it successfully decodes the data. Likewise, the transmit side of the link
that succeeds in carrying out the RTS/CTS exchange during CS2 will transmit
a training sequence during training slot 2 (TS2) and data during the data slot
(simultaneously with the link that succeeded in CS1). The receiver side of that link
will transmit an acknowledgment packet during acknowledgment slot 2 (ACK2).
During TS1 and TS2, both receivers will estimate the channels between their
antennas and the antennas of the respective transmitting nodes. Hence, at the
conclusion of TS2, both receivers know the channel parameters between their an-
tennas and both transmitters. During the data slot, both transmitters send their
signals simultaneously, and each receiver performs beamforming to null interfer-
ence from the undesired transmitter while focusing on the desired transmitter.
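For a two-antenna receiver, the receive-side nulling can be sketched with a simple zero-forcing choice of weights; the channel vectors below are made up for illustration, and an actual receiver might use MMSE-style weights that trade the null depth against noise.

```python
# Minimal zero-forcing sketch of MIMA-MAC receive-side nulling after TS1/TS2.
# For two antennas, w = (h_int[1], -h_int[0]) gives w . h_int = 0 exactly.

def null_weights(h_int):
    """Weights that place an exact null on the interfering channel h_int."""
    return (h_int[1], -h_int[0])

h_desired = (0.9 + 0.2j, -0.4 + 0.6j)   # channel of the wanted transmitter
h_interf = (0.3 - 0.7j, 0.5 + 0.1j)     # channel of the other link

w = null_weights(h_interf)
out_int = w[0] * h_interf[0] + w[1] * h_interf[1]
out_des = w[0] * h_desired[0] + w[1] * h_desired[1]
print(abs(out_int))        # -> 0.0: interference exactly nulled
print(abs(out_des) > 0.1)  # the desired signal still passes
```

With only two antennas there is exactly one spatial degree of freedom left after the null, which is why MIMA-MAC limits itself to two simultaneous links (or N links with more antennas).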
The protocol also has mechanisms to minimize collisions during the RTS/CTS
exchange and a backoff method to reduce probability of transmissions when con-
gestion is high. The interested reader can consult the original publication of the
protocol in Reference [242] for further details.
The comparison between MIMA-MAC and conventional CSMA/CA can be
made by comparing TDMA to SDMA since conventional CSMA/CA is essen-
tially a form of TDMA. From a degrees-of-freedom perspective, SDMA does not
offer any performance benefit compared to TDMA. However, if we assume that
nodes have short-term power constraints, MIMA-MAC with two simultaneous
transmissions benefits from twice the total transmit power, since two nodes
transmit data simultaneously; in a conventional CSMA/CA protocol, only one
node transmits at a time and, hence, only half the total power is available.
NullHoc protocol
The NullHoc protocol is a medium-access-control protocol that uses beamforming
on both the transmit and receive sides, as opposed to MIMA-MAC, which only uses
receive-side beamforming. A key assumption of the NullHoc protocol is that
channels are reciprocal.
The NullHoc protocol provides for channel estimation and exchange of channel
information between any given node and its nearby nodes. The protocol requires
that the total available bandwidth B be divided into a control channel and a data
channel where a factor 0 < α < 1 is used to assign a fraction of the bandwidth
to the control channel. The protocol allows the channel to be partitioned either
in frequency or in code space where, in the latter case, the control and data
channels use orthogonal sets of codes.
The control channel uses a CSMA/CA-type protocol to reserve the data chan-
nel for the duration of the data channel communication. Assuming that node 1
wishes to communicate with node 2, the following exchange takes place prior to
transmission on the data channel.
(1) Node 1 sends a request-to-send (RTS) packet to node 2. The RTS packet
includes pilot signals to enable other nodes to perform channel estimation
and the set of weights it will use to receive the acknowledgment (ACK)
packet at the end of the data transmission.
Channel estimations are performed by all nodes in the network using the pilot
training sequence. Beamformer weights at the transmit and receive sides are
computed using a zero-forcing-type algorithm specified in [225]. Simulations
indicate that NullHoc can have up to double the throughput of 802.11 when the
number of antennas is large [225].
STI-MAC
The simultaneous-transmissions-in-interference MAC (STI-MAC) protocol spec-
ifies methods for communications in the presence of interference using a com-
bination of multi-carrier CDMA (MC/CDMA) and SDMA with antenna arrays
and was introduced in Reference [289]. The most significant difference between
this protocol and the majority of the protocols described above is that it does
not depend on orthogonal communications; that is to say, the protocol allows for
communication in interference, which is accomplished by using a linear minimum-
mean-square-error receiver that combines the degrees of freedom provided by the
antenna array and the multi-carrier CDMA system. The MMSE receiver does
not completely remove interference but rather optimally balances interference
suppression with noise suppression to maximize the SINR. In contrast, proto-
cols such as SPACE-MAC, NullHoc and MIMA-MAC all depend on the antenna
arrays to completely null the interference.
Like the NullHoc protocol, the STI-MAC protocol depends on a control chan-
nel for session initiation and training, and a data channel for payload transmis-
sions. The protocol has provisions for estimating all the required channel parame-
ters and a novel protest mechanism that allows an ongoing link to protest against
a new user entering the network if the new user’s presence will compromise the
ongoing link. The control channel can be allocated in frequency or time.
Transmissions with and without channel-state information are possible in this
protocol, with slightly different session initiation sequences for each. For
systems with the control channel allocated in time, time is divided periodically into
data and control slots. The control channel is operated using a slotted-ALOHA
protocol.
The operation of STI-MAC in its most basic form, without channel-state in-
formation at the transmitter and no protest messages is illustrated in Figure
15.15. The dashed lines are used to indicate control channel slots. Node 1 wishes
to transmit data to node 2, and node 3 is currently receiving data from some
other node. The control channel transactions needed to establish a link between
node 1 and node 2 are shown in Figure 15.15. Note that all data transmissions
cease during the control channel slots and commence once the control channel
period ends. The following sequence of transactions is depicted in Figure 15.15.
(1) Node 1 transmits a session initiation request packet that contains a prede-
termined training sequence and the address of its target receiver, node 2.
(2) Node 2 estimates the channel parameters between node 1 and itself using
the session initiation packet from node 1 and decodes the session initiation
packet. Node 2 also determines the SINR it expects to see during the data
channel if node 1 transmits. This is done by using the channel parameters es-
timated from node 1’s initiation request packet and an estimated interference
covariance matrix based on previous channel estimations.
(3) Node 3 estimates the channel between node 1 and itself using the session
initiation packet from node 1, and since it is in the midst of receiving data,
it computes the SINR it would observe during the data channel slot if node
1 is transmitting. In this case, node 3 determines that its SINR will be
sufficiently high during the data channel even if node 1 transmits. It there-
fore does not send a protest message.
(4) Node 2 agrees to receive data from node 1 and sends an initiation response
message to node 1 indicating its acceptance. As with the CTS message in
CSMA/CA, this message also indicates to other nodes that node 2 is about
to receive data in the data slot.
(5) The three slots following the initiation response message from node 2 are
protest slots. During this time, any node with an already established link that
determines that a transmission by node 1 during the data channel will cause
its SINR to fall below an acceptable level can send a protest message that
will cause node 1 and node 2 to not commence their link in the next data slot.
(6) If no protests are heard, node 1 may begin transmission at the next data
channel.
(7) Node 3 adapts its MMSE receiver to compensate for the added interference
caused by node 1’s transmission during the data slot. This is accomplished
by augmenting its estimated interference covariance matrix with the channel
information it estimated in step 3.
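The protest decision in steps (3) and (5) can be sketched as a simple SINR test: an ongoing receiver predicts its data-slot SINR if the new node transmits, and protests when it would fall below its required threshold. All of the powers, gains, and thresholds below are made-up scalars for illustration; the actual protocol works with estimated channel matrices and an MMSE receiver.

```python
# Illustrative sketch of the STI-MAC protest decision; all values hypothetical.

def predicted_sinr(p_desired, noise_power, interferer_powers):
    """Desired power over noise plus total interference power."""
    return p_desired / (noise_power + sum(interferer_powers))

def should_protest(p_desired, noise_power, current_interf, g_new, p_new,
                   sinr_required):
    """g_new: channel gain (power) estimated from the new node's session
    initiation packet; p_new: its transmit power. Protest if the predicted
    data-slot SINR would drop below the required level."""
    sinr = predicted_sinr(p_desired, noise_power,
                          current_interf + [g_new * p_new])
    return sinr < sinr_required

# Node 3's link tolerates the new transmission (weak cross channel) ...
print(should_protest(10.0, 0.5, [0.5], g_new=0.1, p_new=1.0,
                     sinr_required=4.0))   # -> False
# ... but would protest if the cross channel were strong.
print(should_protest(10.0, 0.5, [0.5], g_new=2.0, p_new=1.0,
                     sinr_required=4.0))   # -> True
```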
Note that the STI-MAC protocol is susceptible to the hidden-node problem
since the node initiating the new link (node 1 here) does not respond to the initi-
ation response message from the new receiver (node 2 here) with another control
packet. In protocols such as 802.11, this type of packet is used to inform receivers
within range of the new transmitter, but out of range of the new receiver, that
a link is going to be initiated.
Note that STI-MAC has provisions for transmissions with channel-state infor-
mation whereby immediately following the initiation response message, the new
receiver (node 2 in this example) estimates and transmits the covariance matrix
that should be used by the transmitter during the data channel. Other nodes
currently receiving data can hear this information and use it to estimate the
effects of the new transmitter on their ongoing links, which determines whether
or not nodes transmit protest messages (see Figure 15.16).
[Figures 15.15 and 15.16: timing diagrams showing the session initiation (SI)
packet, initiation response (IR), protest window, protest packet (P), and
data-channel slots for nodes N1, N2, and N3; in Figure 15.16 a protest from N3
results in no data transmission.]
Problems
15.1 Derive the throughput capacity of unslotted ALOHA under the assump-
tion that the packet duration is a constant.
15.2 Construct a different scenario where the CSMA/CA protocol fails and
results in a collision during transmission of the data packet. Your answer should
include a timing diagram as well as a figure that describes the relative positions
of nodes and/or obstacles.
15.3 Modify the Ward protocol described in Section 15.5.5 such that it applies
to ad hoc wireless networks, that is networks with one-to-one links. You should
construct a timing diagram for a scenario where a link is successfully established.
15.4 For the carrier-sense-multiple-access system described in Section 15.3,
construct a scenario where a packet collision occurs. Discuss the role of
propagation delays and the amount of time required for sensing the medium
in the probability of a collision.
15.5 Qualitatively explain why a randomized sensing duration could result in
lower probability of collision in carrier-sense-multiple-access systems compared
to a fixed sensing duration.
15.6 Assuming that channels are static, construct a scenario where the SPACE-
MAC protocol described in Section 15.5.3 results in a collision during the data
packet transmission.
15.7 Construct a timing diagram for a simple interference-alignment protocol
with three transmit and receive pairs, each with three antennas. You may use the
system described in Section 14.4. You may assume channel reciprocity between
all antennas, and only consider a case where links are successfully established.
Your answer should indicate when all necessary channel estimations are per-
formed and in which packets channel parameters that cannot be estimated are
exchanged.
15.8 Assuming devices that can simultaneously transmit and receive signals
in different frequency bands, consider the following communications protocol
which is a variant of busy-tone protocols (see, for instance, Reference [20]). The
available bandwidth B is divided into nc data subchannels and nc busy-tone
channels, where each data channel has a corresponding busy-tone channel in a
significantly different frequency range to enable simultaneous transmissions and
receptions. A node that is receiving data in the kth data channel simultaneously
transmits a noise-like busy-tone signal in the kth busy-tone channel. Any node
that wishes to transmit can only do so in a data channel in which the average
received energy is below a threshold.
(a) Describe how this protocol alleviates the hidden node problem.
(b) Describe how this protocol alleviates the exposed node problem.
(c) Suppose that the busy-tone channel occupies a very narrow range of fre-
quencies. Qualitatively describe a failure mode of the protocol that results
in a collision in a data channel when multiple receivers are successfully re-
ceiving data in a given channel, but a new transmitter believes that the data
channel is available and starts transmitting. Hint: the narrow bandwidth of
the busy-tone channel causes this problem.
16 Cognitive radios
where the n_s samples observed at the n_{r1} antennas of the legacy and n_{r2} antennas
of the secondary receivers are indicated by Z_1 ∈ C^{n_{r1}×n_s} and Z_2 ∈ C^{n_{r2}×n_s}. The
n_{t1} antennas of the legacy and n_{t2} antennas of the secondary transmitters
transmit complex baseband sequences indicated by S_1 ∈ C^{n_{t1}×n_s} and S_2 ∈ C^{n_{t2}×n_s}.
All the channel matrices between transmitters and receivers of the legacy and
the secondary links are indicated by H_{1,1} ∈ C^{n_{r1}×n_{t1}}, H_{1,2} ∈ C^{n_{r2}×n_{t1}}, H_{2,2} ∈
C^{n_{r2}×n_{t2}}, and H_{2,1} ∈ C^{n_{r1}×n_{t2}}. Finally, the additive complex circularly
symmetric Gaussian noise is indicated by N_1 ∈ C^{n_{r1}×n_s} and N_2 ∈ C^{n_{r2}×n_s}.
A common use of the term cognitive radio is to denote a radio that finds an
unused portion of spectrum and operates there. The radio determines if a legacy
signal is operating in a given band by exploiting techniques like those discussed
in Section 16.3.
16.2 Cognitive spectral scavenging 523
While finding and using underemployed spectrum seems like a simple enough
prospect, there are a number of practical issues. First, because of spectral licens-
ing reasons, not all empty bands are open for scavenging. A cognitive radio of
this type would have to know the radio’s location and the local applicable regu-
lations. Second, a radio is only useful if at least two radios decide to use the same
band to communicate. In order to achieve this consensus, there are a few possible
approaches. One potential solution is to expect all cognitive radios in a
region to have the same spectral-selection algorithm and consequently come to
the same spectral-selection conclusion, given similar environmental observations.
However, because the observations are not identical, consensus is not guaranteed.
Another potential solution is to employ a preagreed-upon control channel. This
could be in a licensed band. In the control channel, the radios could agree upon
a common spectrally scavenged channel for data communications.
In the previous section, it was assumed that the legacy signal could be de-
tected and avoided. To be a good neighbor, a practical cognitive radio attempts
to avoid interfering with the legacy link by detecting and avoiding used spec-
trum. Depending upon what is known about the signal, there are many detection
approaches (for examples, see Reference [358] and references therein).
q = \|z\|^2 = \sum_m |\{z\}_m|^2 .   (16.2)
The probability density for the integrated energy q of the observed Gaussian
signal z of some variance σ 2 is given by the complex central χ2 distribution with
ns complex degrees of freedom as defined in Section 3.1.11:
p^C_{\chi^2}(q;\, n_s, \sigma^2)\, dq = \frac{q^{n_s - 1}}{(\sigma^2)^{n_s}\, \Gamma(n_s)}\, e^{-q/\sigma^2}\, dq ,   (16.3)
where the result in Equation (3.44) is employed. The probability of a false alarm
Pf a under the assumption of unity noise variance is given by
P_{fa} = \int_{\eta}^{\infty} dq\; p^C_{\chi^2}(q;\, n_s, 1) = 1 - \frac{\gamma(n_s, \eta)}{\Gamma(n_s)} .   (16.5)
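For integer n_s, the regularized lower incomplete gamma function has the closed form that also appears later in Equation (16.22), so Equation (16.5) can be evaluated directly with a short sum. This is a sketch using only the standard library; the threshold values swept below are illustrative.

```python
import math

def p_false_alarm(eta, n_s):
    """Equation (16.5) for integer n_s:
    P_fa = 1 - gamma(n_s, eta)/Gamma(n_s) = e^-eta * sum_{m<n_s} eta^m / m!."""
    return math.exp(-eta) * sum(eta ** m / math.factorial(m)
                                for m in range(n_s))

# With n_s = 10 samples (as in Figure 16.2), sweep the threshold:
for eta in (5.0, 10.0, 20.0):
    print(eta, round(p_false_alarm(eta, 10), 4))
```

As expected, the false-alarm probability falls monotonically as the threshold η is raised above the mean noise energy n_s.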
In Figure 16.2, the probabilities of detection and false alarm are presented
under the assumption of a signal and noise variance of 0 dB with 10 observa-
tions. By evaluating the probability of detection and probability of false alarm
for a range of thresholds, a receiver operating characteristic (ROC) curve can
be generated as discussed in Section 3.7.3. In Figure 16.3, the probability of
detection as a function of probability of false alarm is presented under the as-
sumption of a signal and noise variance of 0 dB with either 1 or 10 observations.
Figure 16.2 Single-antenna energy detection probability of false alarm (black) and
probability of detection (gray) for a Gaussian signal in the presence of Gaussian noise,
assuming an SNR of 0 dB for 10 observations.
[Figure 16.3: receiver operating characteristic showing the probability of
detection as a function of the probability of false alarm.]
Given this model, the distribution for the norm squared of the received signal
vector is the complex noncentral χ2 distribution.
The probability density for the complex noncentral χ² distribution, as defined
in Section 3.1.12, is

p^C_{\chi^2}(q;\, n_s, \sigma_n^2, \nu^C)\, dq = \frac{1}{\sigma_n^2} \left(\frac{q}{\nu^C}\right)^{(n_s-1)/2} e^{-(q + \nu^C)/\sigma_n^2}\, I_{n_s-1}\!\left(\frac{2\sqrt{\nu^C q}}{\sigma_n^2}\right) dq ,   (16.7)
where for unit-variance complex Gaussian noise, the complex noncentrality pa-
rameter ν C is given by
1
νC = 2 {a s}m 2
σn m
2 2
= a s . (16.8)
The cumulative distribution function P^C_{\chi^2} for the complex noncentral χ² random
variable is given by

P^C_{\chi^2}(\eta;\, n_s, \sigma_n^2, \nu^C) = \int_0^{\eta} dq\; p^C_{\chi^2}(q;\, n_s, \sigma_n^2, \nu^C) = e^{-\nu^C} \sum_{m=0}^{\infty} \frac{(\nu^C)^m}{m!}\, \frac{\gamma(m + n_s, \eta)}{\Gamma(m + n_s)} ,   (16.9)
where γ(·, ·) indicates the lower incomplete gamma function that is defined in
Section 2.14.1.
The probability of detection is given by the integral from the threshold η up to
infinity. Consequently, the probability P_d is given by

P_d = 1 - P^C_{\chi^2}(\eta;\, n_s, 1, a^2 \|s\|^2) = 1 - e^{-\nu^C} \sum_{m=0}^{\infty} \frac{(a^2 \|s\|^2)^m}{m!}\, \frac{\gamma(m + n_s, \eta)}{\Gamma(m + n_s)} .   (16.10)
The evaluation of the probability of false alarm P_{fa} is the same as that for the
known Gaussian assumption found in Equation (16.5):

P_{fa} = 1 - \frac{\gamma(n_s, \eta)}{\Gamma(n_s)} .   (16.11)
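The series in Equation (16.10) converges quickly because the Poisson-like weights decay, so a truncated sum suffices. A sketch using the integer-order closed form of Equation (16.22) for the regularized incomplete gammas; the threshold and noncentrality values are illustrative.

```python
import math

def reg_lower_gamma(n, eta):
    """gamma(n, eta)/Gamma(n) for integer n >= 1, via Equation (16.22)."""
    return 1.0 - math.exp(-eta) * sum(eta ** j / math.factorial(j)
                                      for j in range(n))

def p_detect(eta, n_s, nu, terms=150):
    """Equation (16.10), truncating the series over m; the Poisson(nu)
    weight is updated recursively to avoid large factorials."""
    total, weight = 0.0, math.exp(-nu)   # weight at m = 0
    for m in range(terms):
        total += weight * reg_lower_gamma(m + n_s, eta)
        weight *= nu / (m + 1)
    return 1.0 - total

n_s = 10
nu = 10.0        # nu^C = a^2 ||s||^2, e.g. unit-amplitude signal, 10 samples
eta = 15.0
pd = p_detect(eta, n_s, nu)
pfa = 1.0 - reg_lower_gamma(n_s, eta)   # Equation (16.11)
print(pfa < pd < 1.0)   # -> True: the signal shifts energy above threshold
```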
Similar to the process in the previous section, by evaluating the probability
of detection and probability of false alarm for a range of thresholds, a receiver
operating characteristic curve can be generated. In Figure 16.4, the probability
[Figure 16.4: receiver operating characteristic for the unknown deterministic
signal, showing the probability of detection versus the probability of false
alarm.]
which is identified as the background noise variance, to σ²_new, which is the
variance of the new signal plus the old background noise. Under the assumption
of Gaussian signals, the n_s samples from the two distributions (old and new) for
the signals z_old ∈ C^{1×n_s} and z_new ∈ C^{1×n_s} have densities

p(z_{old}) = \frac{1}{\pi^{n_s}\, \sigma_{old}^{2 n_s}}\, e^{-\|z_{old}\|^2 / \sigma_{old}^2}

p(z_{new}) = \frac{1}{\pi^{n_s}\, \sigma_{new}^{2 n_s}}\, e^{-\|z_{new}\|^2 / \sigma_{new}^2} .   (16.12)

It is certainly not required for n_s to be the same for estimating z_old and z_new,
but it will be assumed here for convenience.
Detection is declared if the difference between the power estimates σ̂²_new and
σ̂²_old exceeds some threshold η/n_s:

\hat{\sigma}^2_{new} - \hat{\sigma}^2_{old} > \frac{\eta}{n_s} .   (16.13)
The probability density for either σ̂²_new or σ̂²_old under the assumption of
Gaussian signals and noise is given by the complex χ² distribution,

p^C_{\chi^2}(q;\, n_s, \sigma^2)\, dq = \frac{q^{n_s - 1}}{(\sigma^2)^{n_s}\, \Gamma(n_s)}\, e^{-q/\sigma^2}\, dq ,   (16.14)
where the integrated energy q is given by the sum of the squared magnitudes of
n_s zero-mean complex Gaussians z_m of variance σ²,

q = \sum_{m=1}^{n_s} |z_m|^2 .   (16.15)
The maximum-likelihood estimate of σ² follows from setting the derivative of
the log-likelihood with respect to σ² to zero,

\frac{\partial}{\partial \sigma^2} \log p^C_{\chi^2}(q;\, n_s, \sigma^2) = 0 .   (16.16)

The estimator is unsurprisingly given by

\hat{\sigma}^2 = \frac{q}{n_s} .   (16.17)
For the sake of this discussion, the old and new power levels will be estimated
with the same number of samples n_s. The probability of detection P_d of new
energy, when σ̂²_new − σ̂²_old > η/n_s, occurs when

q_{new} > q_{old} + \eta ,

where q_new and q_old correspond to the new and old energy estimates, respectively.
For some σ²_new and σ²_old, the probability of detection is given in Reference [77].
The probability of detection P_d is given by

P_d = \int_0^{\infty} dq_{old}\; p^C_{\chi^2}(q_{old};\, n_s, \sigma_{old}^2)\, \frac{\Gamma\!\left(n_s,\, \frac{q_{old}+\eta}{\sigma_{new}^2}\right)}{\Gamma(n_s)} ,
where the function Γ(n, a) is the upper incomplete gamma function discussed in
Section 2.14.1 that is defined by

\Gamma(n, a) = \int_a^{\infty} dx\; x^{n-1} e^{-x} .   (16.21)

For integer values of n > 0, the upper incomplete gamma function is given by

\Gamma(n, a) = \Gamma(n)\, e^{-a} \sum_{m=0}^{n-1} \frac{a^m}{m!} .   (16.22)
The inner integral over q_new is then

\int_{q_{old}+\eta}^{\infty} dq_{new}\; p^C_{\chi^2}(q_{new};\, n_s, \sigma_{new}^2) = e^{-\frac{q_{old}+\eta}{\sigma_{new}^2}} \sum_{m=0}^{n_s-1} \frac{1}{m!} \left(\frac{q_{old}+\eta}{\sigma_{new}^2}\right)^m .   (16.23)
By exploiting the binomial theorem, the term (q_{old} + η)^m is expanded as

(q_{old} + \eta)^m = \sum_{k=0}^{m} \binom{m}{k} q_{old}^k\, \eta^{m-k} ,   (16.24)
so that the probability of detection becomes

P_d = \int_0^{\infty} dq_{old}\, \frac{q_{old}^{n_s-1}}{(\sigma_{old}^2)^{n_s}\, \Gamma(n_s)}\, e^{-q_{old}/\sigma_{old}^2}\; e^{-\frac{q_{old}+\eta}{\sigma_{new}^2}} \sum_{m=0}^{n_s-1} \frac{1}{m!\, (\sigma_{new}^2)^m} \sum_{k=0}^{m} \binom{m}{k} q_{old}^k\, \eta^{m-k}

= \int_0^{\infty} dq_{old}\, \frac{1}{(\sigma_{old}^2)^{n_s}\, \Gamma(n_s)}\, e^{-q_{old} \frac{\sigma_{old}^2 + \sigma_{new}^2}{\sigma_{old}^2 \sigma_{new}^2}}\; e^{-\eta/\sigma_{new}^2} \sum_{m=0}^{n_s-1} \frac{1}{m!\, (\sigma_{new}^2)^m} \sum_{k=0}^{m} \binom{m}{k} q_{old}^{n_s+k-1}\, \eta^{m-k} .   (16.27)
By reordering the terms and moving the integral to the end, the probability of
detection P_d becomes

P_d = \frac{1}{(\sigma_{old}^2)^{n_s}\, \Gamma(n_s)}\, e^{-\eta/\sigma_{new}^2} \sum_{m=0}^{n_s-1} \frac{1}{m!\, (\sigma_{new}^2)^m} \sum_{k=0}^{m} \binom{m}{k} \eta^{m-k} \int_0^{\infty} dq_{old}\; q_{old}^{n_s+k-1}\, e^{-q_{old} \frac{\sigma_{old}^2 + \sigma_{new}^2}{\sigma_{old}^2 \sigma_{new}^2}}

= \frac{1}{(\sigma_{old}^2)^{n_s}\, \Gamma(n_s)}\, e^{-\eta/\sigma_{new}^2} \sum_{m=0}^{n_s-1} \frac{1}{m!\, (\sigma_{new}^2)^m} \sum_{k=0}^{m} \binom{m}{k} \eta^{m-k}\, (n_s + k - 1)! \left(\frac{\sigma_{old}^2 \sigma_{new}^2}{\sigma_{old}^2 + \sigma_{new}^2}\right)^{n_s+k}

= e^{-\eta/\sigma_{new}^2} \sum_{m=0}^{n_s-1} \sum_{k=0}^{m} \frac{\eta^{m-k}\, (n_s + k - 1)!}{(\sigma_{old}^2)^{n_s} (\sigma_{new}^2)^m\, (n_s - 1)!\, k!\, (m-k)!} \left(\frac{\sigma_{old}^2 \sigma_{new}^2}{\sigma_{old}^2 + \sigma_{new}^2}\right)^{n_s+k} .   (16.28)

The integral is evaluated by using the parameter definition

b = \frac{\sigma_{old}^2 + \sigma_{new}^2}{\sigma_{old}^2 \sigma_{new}^2}   (16.29)

and the substitution

x = q_{old}\, b .   (16.30)
Figure 16.5 Probability of detection (upper curve, gray) and probability of false alarm
(lower curve, black) of new energy as a function of threshold η for the example of
n_s = 10 samples and with variances of σ²_new = 2 and σ²_old = 1.
By setting σ²_new = σ²_old = σ², the probability of false alarm follows from
Equation (16.28):

P_{fa} = e^{-\eta/\sigma^2} \sum_{m=0}^{n_s-1} \sum_{k=0}^{m} \frac{\eta^{m-k}\, (n_s + k - 1)!}{(\sigma^2)^{n_s} (\sigma^2)^m\, (n_s - 1)!\, k!\, (m-k)!} \left(\frac{\sigma^2 \sigma^2}{\sigma^2 + \sigma^2}\right)^{n_s+k}

= e^{-\eta/\sigma^2} \sum_{m=0}^{n_s-1} \sum_{k=0}^{m} \left(\frac{\eta}{\sigma^2}\right)^{m-k} \frac{(n_s + k - 1)!}{(n_s - 1)!\, k!\, (m-k)!} \left(\frac{1}{2}\right)^{n_s+k} .   (16.31)
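The double sums in Equations (16.28) and (16.31) are directly computable for integer n_s. A sketch with the illustrative parameters of Figure 16.5; in particular, at η = 0 the false-alarm probability comes out as exactly 1/2, matching the limiting behavior discussed below.

```python
import math

def p_detect_new_energy(eta, n_s, var_new, var_old):
    """Equation (16.28), evaluated term by term."""
    c = var_old * var_new / (var_old + var_new)
    total = 0.0
    for m in range(n_s):
        for k in range(m + 1):
            total += (eta ** (m - k) * math.factorial(n_s + k - 1)
                      * c ** (n_s + k)
                      / (var_old ** n_s * var_new ** m
                         * math.factorial(n_s - 1)
                         * math.factorial(k) * math.factorial(m - k)))
    return math.exp(-eta / var_new) * total

def p_false_alarm_new_energy(eta, n_s, var):
    """Equation (16.31): Equation (16.28) with var_new = var_old = var."""
    return p_detect_new_energy(eta, n_s, var, var)

# As in Figure 16.5 (n_s = 10, var_new = 2, var_old = 1):
print(round(p_false_alarm_new_energy(0.0, 10, 1.0), 6))   # -> 0.5
print(p_detect_new_energy(0.0, 10, 2.0, 1.0) > 0.5)       # -> True
```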
For the example of integrating over n_s = 10 samples and with new signal-plus-noise
and old noise variances of σ²_new = 2 and σ²_old = 1, the probabilities of
detection and false alarm as a function of the threshold η are displayed in Figure
16.5. As the threshold value goes to 0, the probability of detection becomes large,
approaching 1, and the probability of false alarm approaches 1/2. The false-alarm
limit occurs because, at threshold value of zero, either the new or old variance
estimate can fluctuate to be larger with equal probability. In Figure 16.6, the
receiver operating curve that gives the probability of detection as a function of
probability of false alarm is displayed, given the above parameters. For compar-
ison, the single-antenna energy detection for an unknown deterministic signal
and a Gaussian signal in the presence of Gaussian noise are presented. As one
would expect, the performance of the new-energy test statistic is worse because
less is known about the environment compared to the other test statistics. At
the probability of detection of 0.8, the false-alarm rate of the new-energy test
statistic is about 0.18, compared with 0.08 for detecting an unknown deterministic
signal.
16.3 Legacy signal detection 533
[Figure 16.6: receiver operating characteristic comparing the new-energy test
statistic against single-antenna energy detection of an unknown deterministic
signal and of a Gaussian signal.]
\varphi(\tau, \omega) = \frac{\int_0^{T} dt\; z(t + \tau/2)\, z^*(t - \tau/2)\, e^{-i \omega t}}{\int_0^{T} dt\; z(t)\, z^*(t)} .   (16.32)
Figure 16.7 Complex ambiguity function surface, φ(τ, ω), for a 200-chip binary
phase-shift-keying signal.
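A discrete sketch of Equation (16.32) can be written for sampled data. For integer sample delays it is convenient to use the asymmetric lag z[t + τ] z*(t) with a circular shift, which differs slightly from the symmetric ±τ/2 form in the text; the 200-chip BPSK signal mirrors Figure 16.7, though the seed and signal are of course made up.

```python
import cmath
import random

random.seed(7)

def ambiguity(z, tau, omega):
    """Normalized ambiguity surface phi(tau, omega) for integer delay tau
    (applied circularly) and frequency offset omega in radians per sample."""
    n = len(z)
    num = sum(z[(t + tau) % n] * z[t].conjugate() * cmath.exp(-1j * omega * t)
              for t in range(n))
    den = sum(abs(v) ** 2 for v in z)
    return num / den

# A 200-chip BPSK pseudorandom signal, as in Figure 16.7.
z = [complex(random.choice((-1.0, 1.0)), 0.0) for _ in range(200)]

print(abs(ambiguity(z, 0, 0.0)))          # -> 1.0: peak at zero delay/offset
print(abs(ambiguity(z, 3, 0.0)) < 0.3)    # sidelobes are small
```

The unit peak at the origin and the low sidelobes away from it are what make the ambiguity surface useful for recognizing and synchronizing to a legacy waveform.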
Z = H S + N ,   (16.33)

where the channel matrix defining the complex attenuation between transmit
and receive antennas is indicated by H ∈ C^{n_r×n_t}, the transmitted signal is given
by S ∈ C^{n_t×n_s}, and the complex additive interference plus noise is denoted
N ∈ C^{n_r×n_s}.
The received signal whitened by the interference-plus-noise covariance matrix R
is denoted here as Z̃,

\tilde{Z} = R^{-1/2} Z .   (16.34)

The total energy ρ of the whitened received signal plus noise received by an
array of antennas is given by the squared Frobenius norm of the whitened
received signal matrix,

\rho = \|\tilde{Z}\|_F^2 = \mathrm{tr}\{\tilde{Z} \tilde{Z}^{\dagger}\} = \sum_{m,n} \left| \{R^{-1/2} H S\}_{m,n} + \{R^{-1/2} N\}_{m,n} \right|^2 .   (16.35)
and the distribution of the total received energy ρ is given by the complex
noncentral χ² distribution with complex degree n_r · n_s, as described in Section
3.1.12. The noncentrality parameter ν^C is given by the sum of the
standard-deviation-normalized Gaussian means,

\nu^C = \frac{1}{\sigma^2} \sum_k |\mu_k|^2 ,   (16.37)

where μ_k and σ² are the mean and variance for each Gaussian. From Equation
(16.36), the standard deviation is one, σ = 1, and the noncentrality parameter
ν^C is given by

\nu^C = \sum_{m,n} \left| \{R^{-1/2} H S\}_{m,n} \right|^2 = \mathrm{tr}\{ R^{-1/2} H S S^{\dagger} H^{\dagger} R^{-1/2} \} = \mathrm{tr}\{ R^{-1} H S S^{\dagger} H^{\dagger} \} ,   (16.38)

so that

\rho \sim p^C_{\chi^2}(\rho;\, n_r n_s,\, \nu^C) .   (16.39)
The probability of detection P_d is given by the probability that the value of the
total received energy ρ exceeds some threshold η,

P_d = 1 - P^C_{\chi^2}(\eta;\, n_r n_s,\, 1,\, \nu^C) .   (16.40)

The probability of false alarm P_{fa} is given by the probability that the received
energy ρ, in the absence of a signal, fluctuates above the threshold η. The
probability density for ρ in this environment is given by

\rho \sim p^C_{\chi^2}(\rho;\, n_r n_s,\, \sigma^2 = 1) .   (16.41)
Figure 16.8 Receiver operating characteristic curve for an unknown deterministic
signal: the probability of detection as a function of the probability of false alarm for
the example of a multiple-antenna receiver with n_r = 4 receive antennas and n_s = 10
samples (black). The noncentrality parameter ν^C is n_r n_s. As a reference, the
single-antenna performance (gray) is also displayed.
energy. Also, we can look for changes in the spatial structure. Additionally, we
can attempt to find changes in specific waveform characteristics. Of these few
examples, we consider the first two in the following discussion.
The change in the total received energy for some number of samples between
the old Z_old and new Z_new observations is given by

\|Z_{new}\|_F^2 - \|Z_{old}\|_F^2 > \eta ,   (16.43)

where η is the test-statistic threshold. However, this approach has the
disadvantage of ignoring the spatial structure of the received signal.
Another approach is to use the old data to estimate the interference-plus-noise
spatial covariance matrix R̂_old,

\hat{R}_{old} = \frac{1}{n_s} Z_{old} Z_{old}^{\dagger} .   (16.44)

Similar to the method used in the previous section, the data are whitened by
using this covariance matrix estimate,

\tilde{Z} = \hat{R}_{old}^{-1/2} Z_{new} .   (16.45)

The total received energy of the estimated whitened data is then given by

\rho = \|\tilde{Z}\|_F^2 = \mathrm{tr}\{ \hat{R}_{old}^{-1} Z_{new} Z_{new}^{\dagger} \} .   (16.46)
If the old and new signals are drawn from the same distribution, then one would
expect that the total receive energy of the estimated whitened data would have
a value near the product of the number of receive antennas and the number of
samples n_r n_s. If it is significantly greater than this value, then it is an indication of new energy. While techniques with this or similar forms can be useful, explicit evaluation of the probability of detection versus the probability of false alarm is challenging.
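Because closed-form ROCs are hard to obtain here, the behavior of the statistic in (16.44)-(16.46) is easy to probe by simulation; the sketch below assumes i.i.d. circular complex Gaussian noise and illustrative dimensions.

```python
# Monte Carlo sketch of the whitened-energy change statistic (16.44)-(16.46).
# All dimensions, trial counts, and the rank-one "new energy" model are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
nr, ns, trials = 4, 100, 500

def cgauss(shape):
    # unit-variance circular complex Gaussian samples
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def stat(z_old, z_new):
    r_old = z_old @ z_old.conj().T / ns                                  # (16.44)
    return np.trace(np.linalg.inv(r_old) @ z_new @ z_new.conj().T).real  # (16.46)

same = [stat(cgauss((nr, ns)), cgauss((nr, ns))) for _ in range(trials)]
elev = [stat(cgauss((nr, ns)), cgauss((nr, ns)) + cgauss((nr, 1)) @ cgauss((1, ns)))
        for _ in range(trials)]

# Same distribution: the statistic sits near nr*ns (slightly above, because
# the inverted finite-sample covariance estimate is biased upward).
assert 0.9 * nr * ns < np.mean(same) < 1.2 * nr * ns
# New energy inflates the statistic well beyond nr*ns.
assert np.mean(elev) > 1.3 * np.mean(same)
```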
Figure 16.9 (a) Displays a notional link of interest and distribution of hidden links in time (t) and frequency (f) in the presence of a waveform. (b) Depicts a notional geometry of transmitter, receiver, and hidden node. The region of disruptive interference is contained within radius r_i. © 2010 IEEE. Reprinted, with permission, from Reference [29].
• The effects of interference on the hidden node can be factored into the proba-
bility of collision and the probability that the interference-to-noise ratio (INR)
at the hidden node exceeds some critical threshold.
• The hidden node location is sampled uniformly over some large area, so we
do not have prior knowledge of what absolute level of power will cause inter-
ference.
• The average channel attenuation from the transmitter to the hidden node can
be accurately modeled by using a power-law attenuation model.
• The link performance can be characterized with reasonable accuracy by the
channel capacity.
• The hidden node does not have an interference-mitigation capability that favors a particular waveform structure.
• Finally, the desired data rate of the link of interest is sufficiently low such that
the link has the freedom to transmit in packets with relatively low spectral-
temporal occupancy. Consequently, the optimization is developed for a single
packet of a given number of information bits.
p_c ∝ (T + T_H)(B + B_H), (16.47)

where T_H and B_H are the temporal and spectral extents of the hidden-node link, because the probability of collision is determined by the fraction of the temporal and spectral space occupied by the links [seen in Figure 16.9(a)]. In the case of a cognitive radio that is attempting to transmit a larger message, the probability of collision p_c is approximated well by

p_c ∝ T B. (16.48)
p_c ∝ 4 T B ∝ T B. (16.49)
If it is assumed that the hidden node is randomly located with uniform density on a plane around the transmitter in a two-dimensional physical space (Figure 16.9(b)), then the probability p_r that the hidden node is within sufficient range to cause disruptive interference is proportional to the area A over which the signal has a sufficient INR, η > η_j, at the hidden node, so that

p_j ∝ T B A. (16.51)
The area is a function of the transmit energy and propagation loss to the hidden
node.
For a SISO system, the information-theoretic bound, which is introduced in
Section 5.3, on the number of bits ninfo that can be transmitted within time T
and bandwidth B is given by
n_info ≤ T B c, (16.52)
c = log_2(1 + γ),
c̃ = log_2(1 + γ/l),
where c is the information-theoretic limit in bits/s/Hz on the SISO spectral
efficiency (assuming a complex modulation), and γ is the SNR at the receiver.
16.4 Optimizing spectral efficiency to minimize network interference 541
The bound is not achievable for finite ninfo , but it is a reasonable approximation
to the limiting performance. To approximate a more realistic rate c̃, it is assumed
that the achieved spectral efficiency is given by the information theoretic capacity
with an additional implementation loss figure l, so that γ → γ/l.
By assuming that the link of interest is approximated well by the modified capacity, the SNR at the receiver can be expressed in terms of the number of bits n_info transmitted and the spectral efficiency:

n_info ≈ T B log_2(1 + γ/l),
γ ≈ l (2^{c̃} − 1). (16.53)
If the channel gain to the hidden node is denoted b² and the channel gain to the receiver of interest is denoted a², then the INR, denoted η, at the hidden node is

η = (b²/a²) γ. (16.54)
By using a simple power-law model for loss, with the channel gain to the hidden node proportional to r^{−α}, the radius r_j at the critical interference level (at which η = η_j) is found by observing that the SNR γ and INR η are related by

γ = (a²/b²) η ∝ (a²/r^{−α}) η ⇒ γ ∝ (a²/r_j^{−α}) η_j,
r_j ∝ γ^{1/α} = l^{1/α} (2^{c̃} − 1)^{1/α}. (16.55)
p_j ∝ T B A
    ≈ (n_info/c̃) A ∝ (n_info/c̃) r_j²
    ∝ (2^{c̃} − 1)^{2/α} / c̃. (16.56)
The optimal spectral efficiency c_opt for some α is given by

∂p_j/∂c̃ ∝ 2^{c̃+1} (2^{c̃} − 1)^{2/α − 1} log(2) / (α c̃) − (2^{c̃} − 1)^{2/α} / c̃² = 0,
c_opt = [α + 2 W_0(−(α/2) e^{−α/2})] / (2 log(2)), (16.57)
where W0 (x) is the product log or principal value of the Lambert W function2
[65] that is discussed in Section 2.14.4. It is remarkable that the optimal spectral efficiency depends exclusively upon the channel exponent. Similar results were found in Reference [94], and when attempting to optimize the spectral partitioning of an interference-limited network [163].
2 The Lambert W function is the inverse function of f (W ) = W eW . The solution of this
function is multiply valued.
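Equation (16.57) is straightforward to evaluate numerically; the sketch below assumes SciPy's `lambertw` for the principal branch W_0.

```python
# Numerical evaluation of the optimal spectral efficiency (16.57).
# scipy.special.lambertw with k=0 gives the principal branch W0.
import numpy as np
from scipy.special import lambertw

def c_opt(alpha):
    # optimal spectral efficiency [b/s/Hz] for channel-gain exponent alpha
    w = lambertw(-(alpha / 2.0) * np.exp(-alpha / 2.0), k=0).real
    return (alpha + 2.0 * w) / (2.0 * np.log(2.0))

# alpha = 2 drives the optimum toward zero; alpha in [3, 4] gives roughly
# 1.3 to 2.3 b/s/Hz, consistent with Figure 16.10.
assert c_opt(2.0) < 0.01
assert 1.2 < c_opt(3.0) < 1.35
assert 2.2 < c_opt(4.0) < 2.4
```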
Figure 16.10 Optimal SISO spectral efficiency c̃ for ideal coding in a static
environment as a function of transmitter-to-hidden-node channel gain exponent, α.
© 2010 IEEE. Reprinted, with permission, from Reference [29].
In Figure 16.10, the optimal spectral efficiency for a given channel exponent,
under the assumption of ideal coding in a static channel, is displayed. In the absence of multipath scattering, the line-of-sight exponent is α = 2 (an anechoic chamber, for example). For α = 2, the optimal spectral efficiency approaches zero. For most
scattering environments, α = 3 to 4 [140] is a more reasonable characterization,
suggesting an optimal spectral efficiency around 2 bits/s/Hz. Heuristically, one
can interpret these results by noting that as the attenuation exponent α in-
creases, the environment attenuates the signal more quickly in range, so a better
strategy is to transmit at higher power (and thus higher spectral efficiency) and
consequently for less time. The shorter transmission reduces the probability of
collision.
Z = HS + N, (16.58)
n_info ≤ T B c,
c = log_2 |I + (P_0/n_t) H H†|, (16.59)
where c is the bounding spectral efficiency, which is achievable as the number of information bits n_info approaches infinity, and P_0 is the total thermal-noise-normalized transmit power. By employing an approach similar to that used for the SISO case, an approximation to a practically achievable rate c̃ is given by modifying the SNR by a loss factor l:
n_info ≤ T B c̃,
c̃ = log_2 |I + (P_0/(l n_t)) H H†|. (16.60)
Implicit in this formulation is the assumption that the interference-plus-noise covariance matrix is proportional to the identity matrix, which is a reasonable model for most interference-avoiding protocols.
Because the capacity is a function of a random SNR matrix, there is not a single solution as there is in the SISO analysis. However, by assuming that the channel matrix H = a G is proportional to a matrix G sampled from an i.i.d. zero-mean, unit-variance complex distribution, where a is the average attenuation, a solution can be found by using an asymptotic analysis in the limit of a large number of antennas, like that employed in Section 8.7. With this model, the term a² P_0 is the average SNR per receive antenna at the receiver of interest. To simplify the analysis, it is assumed that n_r = n_t ≡ n. The optimal spectral efficiency under the assumption of other ratios of the number of transmit to receive antennas can be found by following a similar analysis. The asymptotic capacity
c for the uninformed transmitter, discussed in Section 8.7, is given by
c/n ≈ (a²P_0/log 2) · 3F2([1, 1, 3/2], [2, 3], −4 a²P_0) ≡ f(a²P_0)
    = 4 log(√(4a²P_0 + 1) + 1)/log(4) + √(4a²P_0 + 1)/(a²P_0 log(4))
      − 1/(a²P_0 log(4)) − 2 − 2/log(4), (16.61)
where p Fq is the generalized hypergeometric function [129], as discussed in Sec-
tion 2.14.2, and the function f (x) is used for notational convenience. The ap-
proximation for the achievable rate is given by modifying the SNR term in the
c̃/n ≈ (a²P_0/(l log 2)) · 3F2([1, 1, 3/2], [2, 3], −4 a²P_0/l) ≡ f(a²P_0/l)
    = 4 log(√(4a²P_0/l + 1) + 1)/log(4) + √(4a²P_0/l + 1)/((a²P_0/l) log(4))
      − 1/((a²P_0/l) log(4)) − 2 − 2/log(4). (16.62)
The SNR per receive antenna at the receiver a2 P0 can be expressed in terms
of the number of information bits ninfo transmitted and the spectral efficiency c̃,
n_info = T B n f(a²P_0/l),
a²P_0 = l f^{−1}(c̃/n). (16.63)
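The closed form for f in (16.61)-(16.62) and the inverse required by (16.63) can be sketched numerically; the bisection-based f^{-1} below is an illustrative assumption (any monotone root finder would do).

```python
# Sketch of the asymptotic per-antenna spectral efficiency f(x) from (16.61)
# and a numerical stand-in for the inverse f^{-1} used in (16.63).
import numpy as np

def f(x):
    # x is the per-receive-antenna SNR a^2 P0 (thermal-noise normalized)
    s = np.sqrt(4.0 * x + 1.0)
    l4 = np.log(4.0)
    return (4.0 * np.log(s + 1.0) / l4 + s / (x * l4)
            - 1.0 / (x * l4) - 2.0 - 2.0 / l4)

def f_inv(y, lo=1e-9, hi=1e9):
    # bisection on a geometric grid: f is monotonically increasing in x
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        lo, hi = (mid, hi) if f(mid) < y else (lo, mid)
    return lo

# round-trip consistency check of the inverse
assert abs(f_inv(f(1.0)) - 1.0) < 1e-6
```

At low SNR, f(x) approaches x/log(2), the usual low-SNR capacity slope, which is a quick sanity check on the reconstruction.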
η = b2 P0 . (16.64)
By using a similar analysis to the SISO case and a power-law model for the average channel gain b, the radius of disruptive interference r_j for the average attenuation is found by observing

a²P_0 = (a²/b²) η ∝ (a²/r^{−α}) η ⇒ a²P_0 ∝ (a²/r_j^{−α}) η_j,
r_j ∝ (a²P_0)^{1/α} = l^{1/α} [f^{−1}(c̃/n)]^{1/α}. (16.65)
n
Figure 16.11 Optimal MIMO spectral efficiency for ideal coding in a static
environment as a function of the transmitter-to-hidden-node channel gain exponent.
© 2010 IEEE. Reprinted, with permission, from Reference [29].
Problems
16.4 Extend the evaluation of probability of detection and false alarm for the
single-antenna, new-energy detector found in Equation (16.28) to include unequal
numbers of observations for the old and new variance estimates.
16.5 Under the assumption of a four-antenna receiver and a single transmitter
received at 0 dB SNR per antenna, numerically evaluate the receiver operat-
ing curves for the multiple-antenna, new-energy detectors defined in Equations
(16.43) and (16.46) under the assumption of
(a) ns = 4,
(b) ns = 8,
(c) ns = 32
independent samples.
16.6 Under the assumption of a four-antenna receiver and a single transmitter
with 10 samples received at 0 dB SNR per antenna, consider a modification of
that form in Equation (16.43). Replace the trace with evaluating the maximum
eigenvalue. Numerically compare the receiver operating curves of the form in
Equation (16.43) and the modified form.
16.7 For a SISO channel, under the assumptions discussed in Section 16.4.1,
evaluate an approximate optimal spectral efficiency to minimize interference with
a legacy or hidden-node network under the assumption that the frame is small
compared with the legacy waveform in time and bandwidth.
17 Multiple-antenna acquisition
and synchronization
A receiver cannot decode a signal if it is not aware of the existence of the trans-
mitted signal. Furthermore, if a transmitter and receiver are not aligned in time
and frequency, then the transmitted signal will not make any sense to the re-
ceiver. Consequently, in order to establish a wireless communication link, the
receiver must find or acquire the transmitted signal, and some sort of synchro-
nization in time and frequency between the transmitter and receiver must occur.
In this chapter, the process of acquisition and synchronization is simply denoted
synchronization. In order for two nodes to be synchronized, they must agree
on both the carrier frequency and timing. In this discussion, it is assumed that
any frequency errors are small enough that frequency synchronization can be
achieved after temporal synchronization. Extensions to the discussions provided
here would enable joint temporal and spectral synchronization. In situations in
which coherence is required for long durations, frequency is of greater impor-
tance [244], and the techniques discussed in this chapter need to be modified to
address this sensitivity.
The performance of synchronization or acquisition techniques is often charac-
terized in terms of probability of detecting the signal of interest given that it is
there versus the probability of falsely “detecting” a signal (a false alarm) given
that the signal is absent. The function relating these two probabilities for some
test statistic is often denoted the receiver operating characteristic (ROC) curve
that is discussed in Section 3.7.3.
Synchronization can be the weakest component of a communication link. This
potential weakness is exacerbated when an attempt is made to establish a link in
the presence of interference, which can effectively break many synchronization
approaches. Synchronization performance is a function of both the signal-of-
interest signal-to-noise ratio (SNR) and the interference-to-noise ratio (INR).
Here various approaches for temporal synchronization of multiple-input multiple-
output (MIMO) communication links are introduced.
Synchronization has been studied in a variety of contexts. For single-input
single-output (SISO) systems, synchronization is often achieved by finding the
peak in the correlation between received data and a known reference [287, 255].
These concepts can be extended to MIMO systems [221, 333, 201]. The discussion
in this chapter follows the discussion in Reference [36] closely.1
1 Portions of this chapter are © 2010 IEEE. Reprinted, with permission, from Reference [36].
Here it is assumed that the static signal and interference channels are not frequency selective. Synchronization in frequency-selective channels is beyond the scope of this chapter, but is discussed in Reference [36]. By extending the MIMO model used in Chapter 8 to include a delay τ in time, the received signal z(t) ∈ C^{n_r×1} at the n_r receive antennas as a function of time t is described by using the following form,
z(t) = H s(t − τ ) + n(t) , (17.1)
where H ∈ C^{n_r×n_t} is the flat-fading channel matrix, s(t) ∈ C^{n_t×1} is the vector of transmitted signals at the n_t transmit antennas, and n(t) ∈ C^{n_r×1} is the interference and complex circular additive Gaussian noise.
We can rewrite the MIMO model in terms of sampled blocks of data as dis-
cussed in Section 8.10,
Z = HS + N, (17.2)
where Z ∈ C^{n_r×n_s} is the received data matrix, H ∈ C^{n_r×n_t} is the flat-fading channel matrix, S ∈ C^{n_t×n_s} is the transmitted signal matrix, and N ∈ C^{n_r×n_s} is the noise-plus-interference matrix. For notational convenience, here the received complex baseband signal at some delay τ, the matrix Z_τ ∈ C^{n_r×n_s}, is defined in
terms of the time-dependent form of the received signal
Zτ = (z(0 Ts − τ ) z(1 Ts − τ ) ··· z([ns − 1] Ts − τ )) , (17.3)
where Ts is the sample period, and s(t) and z(t) are the continuous transmit and
received vectors as a function of time t.
Implicit in the following discussion is the concept that the flat-fading channel
does not introduce resolvable delay. The delay introduced by a SISO channel
can be represented by the sum of contributions at various delays. A flat fading
channel could be represented by a single nonzero coefficient in this sum. Similarly,
a MIMO channel can be represented by a sum of channel matrices at various
delays as discussed in Section 10.4.
The error of the best unbiased estimate of the delay is bounded by the Cramer–Rao bound. The Cramer–Rao bound (discussed in Section 3.8) that is developed here for the product of the root-mean-squared baseband complex-signal bandwidth, B_rms, and the standard deviation of delay estimation, σ_τ, is given by

σ_τ = 1/(2π B_rms √ρ),
ρ ≡ n_s tr{P H† R^{−1} H}, (17.4)
17.2 Flat-fading MIMO delay-estimation bound 549
where the spatial covariance matrix of interference plus noise is given by R, and
the transmit array emits ns samples from each antenna with transmit spatial
covariance matrix P. The variable ρ can be interpreted as the total received
integrated SNR (in which the coherent integration of the signal scales as n2s
and the incoherent integration of the noise scales as ns ), where the integration
occurs over the ns independent complex samples and at the output of nt adaptive
beamformers.
If the signals emitted from the transmit antennas are independent, then P = (P_0/n_t) I, where P_0 is the total transmitted power. If ρ ≥ 1/π², then, based on the
Cramer–Rao bound, the error in temporal synchronization in the context of ac-
quisition should be less than a half of a complex sample at the Nyquist sampling
rate. However, at this SNR, the false-alarm rate would be unacceptably large.
Consequently, the typical synchronization performance is not set by the Cramer–
Rao bound, but by the probability of detection and false alarm. Nonetheless, for
completeness, we will present the bound here under the assumption of Gaussian
interference and noise with a signal in the mean.
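A numerical sketch of the bound in (17.4), assuming independent transmit signals (P = (P_0/n_t) I), a noise-only covariance R = I, and illustrative parameter values:

```python
# Delay-estimation Cramer-Rao bound of (17.4); all numerical values below
# (power, RMS bandwidth, channel draw) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
nr, nt = 4, 4
p0 = 2.0        # total thermal-noise-normalized transmit power (assumed)
b_rms = 0.25    # RMS signal bandwidth in units of 1/Ts (assumed)

h = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
r = np.eye(nr)                  # interference-plus-noise covariance (noise only)
p = (p0 / nt) * np.eye(nt)      # independent transmit signals

def delay_std(ns):
    rho = ns * np.trace(p @ h.conj().T @ np.linalg.inv(r) @ h).real   # (17.4)
    return 1.0 / (2.0 * np.pi * b_rms * np.sqrt(rho))

# Quadrupling the number of samples halves the bound (sigma_tau ~ 1/sqrt(ns)).
assert abs(delay_std(128) / delay_std(32) - 0.5) < 1e-9
```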
The delay-estimation bound is developed for the flat-fading MIMO channel model. The probability density function p(z(t); τ) for the received signal z(t) described in Equation (17.1), under the assumption that the signal is transmitted at some delay τ, is given by

p(z(t); τ) = e^{−(z(t) − H s(t−τ))† R^{−1} (z(t) − H s(t−τ))} / (π^{n_r} |R|), (17.5)

s′(t − τ) = ∂s(t − τ)/∂τ, (17.7)
the partial derivative of the log probability density for z(t) is given by
∂ log p(z(t); τ)/∂τ = n†(t) R^{−1} H s′(t − τ) + h.c., (17.8)

where h.c. indicates the Hermitian conjugate of the first term. The partial derivative of the log probability density p(Z_τ) for data matrix Z_τ at delay τ is given by

∂ log p(Z_τ)/∂τ = Σ_m n†(mT_s) R^{−1} H s′(mT_s − τ) + h.c., (17.9)
using the independence of n(t) and s(t), the expectation of cross terms is zero,
and the assumption that s and n are critically sampled (sampled at the Nyquist
rate). Here it is assumed that the channel matrix H is deterministic. By consid-
ering the signals transmitted from each transmitter to be stochastic and to have
the same spectral support with the Fourier transform indicated by s̃(f ), and by
noting the definition for the root-mean-squared bandwidth of the signal Br m s
implies
B²_rms = tr{∫ df f² s̃(f) s̃†(f)} / tr{∫ df s̃(f) s̃†(f)}
       = (1/(2π)²) · tr{s′(t − τ) (s′(t − τ))†} / tr{s(t − τ) s†(t − τ)}. (17.11)
Here it is assumed that s(t) is a complex baseband signal with mean frequency of
zero. The expectation of the derivative found in the right-hand side of Equation
(17.11) simplifies to the following for the Fisher information J,
J = (2π)² B²_rms Σ_m tr{s(mT_s − τ) s†(mT_s − τ) H† R^{−1} H}
  = (2π)² B²_rms n_s tr{P H† R^{−1} H}, (17.12)
where the elements in s(t) are defined to have an expected unit variance. Conse-
quently, the uncertainty bound for the estimation of τ in terms of the standard
deviation is given by
σ_τ = J^{−1/2}. (17.13)
require synchronization no better than half a sample period. Finer timing align-
ment is often left to other components in the receiver. Furthermore, when a link
is being established in a channel that has delay spread caused by multipath scat-
tering, which is common in non-line-of-sight, ground-to-ground communications,
there may not be any single, well-defined delay that specifies synchronization. A
receiver designed to operate in environments with frequency-selective channels
(those with resolvable delay spread) will be able to compensate for some finite
delay spread. Consequently, if the temporal alignment is found to be within some
window (determined by the details of the receiver), then successful synchroniza-
tion can be claimed [319].
Given a discrete set of potential timing offsets, in principle, synchronization is
a multiple statistical hypothesis test. In the limit of a large number of potential
hypotheses with independent measurements, the statistical interdependence di-
minishes. As an example, if one knew that one of two tests must be the correct
synchronization delay, then a “not selected” outcome for one delay strongly af-
fects the likelihood of the other delay being the correct delay. Conversely, for a
billion potential delays, a “not selected” outcome for a given delay has essentially
no effect on the likelihood of another test point. Consequently, synchronization for the situation in which there is no constraint on the set of potential delays can be viewed as a sequence of statistically independent binary hypothesis tests [255, 174]. This approach is also appropriate for typical practical implementations.
At each timing offset, the first hypothesis is that the signal of interest is prop-
erly aligned in time. The second hypothesis is that the signal is misaligned or
does not exist. At each test point in time, a statistical criterion (known as a test statistic) is evaluated, given the observed data. Synchronization is declared if
the test statistic threshold is exceeded. The performance of a synchronization
test statistic is characterized by the probability that synchronization is detected,
given the correct timing offset (within the allowed receiver window), versus the
probability of a false alarm that occurs if synchronization is declared in error.
By varying the threshold, a receiver operating characteristic (ROC) curve (that
is discussed in Section 3.7.3) in the space of the probability of detection versus
the probability of a false alarm can be constructed.
As a practical matter, it is sometimes useful to consider a two-stage synchro-
nization process in which a coarse synchronization search is followed by a fine
synchronization search. Once a coarse synchronization is detected, a search in the
neighborhood of that timing offset may be employed. Particularly for frequency-
selective environments, in which multiple delays in some local region may satisfy
the detection statistic threshold, selecting the delay with the largest test statistic
value may improve link performance, depending upon the receiver.
17.4.1 Correlation
The standard SISO link detection test statistic is given by simply correlating
the observed data with respect to a known reference signal at a given delay τ
[287, 255], which is optimal for a SISO link in additive Gaussian noise. Because
this is a MIMO correlation, the output of the correlation is a matrix, Γ_τ ∈ C^{n_r×n_t}, given by
Γτ = Zτ S† , (17.14)
where the matrix elements correspond to the inner products between the row
vectors in Zτ and the row vectors in the reference signals S. Strong correlation
corresponds to large elements in Γτ . There are a variety of ways of exploiting
the structure in Γτ . As a reference, a noncoherent, multiple-antenna, combin-
ing approach is considered. If the magnitudes squared of the elements in Γτ are
summed, then the expected evaluation of the sum in the absence of the syn-
chronization signal is proportional to the noise variance. This is the test statistic
constructed by noncoherently combining the standard SISO test statistic for each
transmit–receive antenna pair. Consequently, the Frobenius norm squared is a
useful measure of the strength of the correlation, defined by
‖Γ_τ‖²_F = tr{Γ_τ Γ†_τ} = tr{(Z_τ S†)(Z_τ S†)†}
         = Σ_{m,n} |z_m s†_n|², (17.15)
17.4 Test statistics for flat-fading channels 553
where zm and sn indicate the mth and nth row vectors in Zτ and S respectively.
The form ‖Γ_τ‖²_F is sensitive to channel gain and signal power and is bounded by

‖Γ_τ‖²_F ≤ Σ_{m,n} ‖z_m‖² ‖s_n‖² = ‖Z_τ‖²_F ‖S‖²_F. (17.16)
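A minimal simulation of this correlation-based synchronization search, with illustrative dimensions, delay, and SNR:

```python
# Slide the known reference S across a received stream and peak-pick the
# Frobenius norm squared of the correlation matrix, per (17.14)-(17.15).
# The geometry (4x2 link), delay, and signal amplitude are illustrative.
import numpy as np

rng = np.random.default_rng(2)
nr, nt, ns, true_delay = 4, 2, 64, 17

def cgauss(shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

s = cgauss((nt, ns))                       # known reference signal
h = cgauss((nr, nt))                       # flat-fading channel draw
stream = cgauss((nr, ns + 40))             # noise-only background
stream[:, true_delay:true_delay + ns] += 2.0 * (h @ s)   # embedded signal

stats = [np.linalg.norm(stream[:, tau:tau + ns] @ s.conj().T, 'fro') ** 2
         for tau in range(40)]
assert int(np.argmax(stats)) == true_delay
```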
Y = W† Zτ (17.18)
p(Z_τ |S; Ĥ, R) = e^{−tr{(Z_τ P⊥_S)† R^{−1} (Z_τ P⊥_S)}} / (π^{n_r n_s} |R|^{n_s}), (17.29)

where the projection operator P⊥_S projects onto a basis orthogonal to the row space spanned by S. It is defined to be

P⊥_S = I_{n_s} − S†(S S†)^{−1} S. (17.30)
Maximizing the likelihood in Equation (17.29) with respect to an arbitrary pa-
rameter β of R gives
∂p(Z_τ |S; Ĥ, R)/∂β = 0
0 = tr{(Z_τ P⊥_S)(Z_τ P⊥_S)† R^{−2} (∂R/∂β) − n_s R^{−1} (∂R/∂β)}
R̂ = Z_τ P⊥_S Z†_τ / n_s, (17.31)
using matrix derivative identities from Section 2.7.1 and the notion that projec-
tion matrices are idempotent. Substituting this estimator into the likelihood in
Equation (17.29), the maximum probability density is given by
p(Z_τ |S; Ĥ, R̂) = e^{−tr{n_s I_{n_r}}} / (π^{n_r n_s} |Z_τ P⊥_S Z†_τ / n_s|^{n_s})
               = (n_s^{n_r n_s} e^{−n_s n_r} / π^{n_r n_s}) |Z_τ P⊥_S Z†_τ|^{−n_s}. (17.32)
Similarly, maximizing the probability density function with a misaligned syn-
chronization, the received covariance matrix estimate of Q̂ is given by
Q̂ = Z_τ Z†_τ / n_s, (17.33)

which gives the probability density

p(Z_τ |Q̂) = (n_s^{n_r n_s} e^{−n_s n_r} / π^{n_r n_s}) |Z_τ Z†_τ|^{−n_s}. (17.34)
Consequently, the generalized-likelihood ratio test statistic φ_glrt(τ) is given by

φ_glrt(τ) = (|Z_τ Z†_τ| / |Z_τ P⊥_S Z†_τ|)^{n_s} = |I − P_S P_{Z_τ}|^{−n_s}. (17.35)
This test statistic is bounded to values greater than or equal to one. It is in-
teresting to note that the statistic is a function of the row space of S and Zτ
exclusively.
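A sketch of evaluating (17.35); computing the logarithm of the statistic with `slogdet` avoids overflow from the n_s-th power. Dimensions and SNR are illustrative assumptions.

```python
# GLRT synchronization statistic (17.35) via log-determinants.
import numpy as np

rng = np.random.default_rng(3)
nr, nt, ns = 4, 2, 64

def cgauss(shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

s = cgauss((nt, ns))
# projection onto the complement of the row space of S, per (17.30)
p_perp = np.eye(ns) - s.conj().T @ np.linalg.solve(s @ s.conj().T, s)

def log_glrt(z):
    # log phi_glrt = ns * (logdet(Z Z^H) - logdet(Z Pperp Z^H)), cf. (17.35)
    num = np.linalg.slogdet(z @ z.conj().T)[1]
    den = np.linalg.slogdet(z @ p_perp @ z.conj().T)[1]
    return ns * (num - den)

noise_only = log_glrt(cgauss((nr, ns)))
aligned = log_glrt(cgauss((nr, ns)) + cgauss((nr, nt)) @ s)

# The statistic is >= 1 (log >= 0) and much larger with an aligned signal.
assert noise_only >= 0.0 and aligned > noise_only
```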
Here, the test statistic notation includes an explicit reference to the received
and transmitted signal parameterization. The test statistic is a function of the
row space of Zτ and S only. This suggests another approach for developing
new synchronization test statistics that are spatially invariant. While there is no
guarantee that invariance will provide a useful test statistic, there is precedent for using it as motivation [175, 97].
The row spaces of Z_τ and S can be represented by the matrices K and T respectively, such that K†K = P_{Z_τ} and T†T = P_S. A variety of spatially invariant test statistics can be constructed by considering the distances between the subspaces defined by K and T. The distance between the subspaces is not
Problems
17.1 Reformulate the Cramer–Rao bound in Section 17.2 for frequency estima-
tion.
17.2 Reformulate the Cramer–Rao bound in Section 17.2 under the assump-
tions of an uninformed MIMO transmitter, nI strong interferers, and a random
flat-fading i.i.d. complex circular Gaussian channel matrix.
17.3 Reformulate the correlation, MMSE, and GLRT test statistics in terms of
frequency synchronization.
17.4 Evaluate the GLRT under the assumption that the interference-plus-noise covariance matrix is known and given by I.
17.5 For a channel with a number of receivers greater than or equal to trans-
mitters (nr ≥ nt ), develop a test statistic that replaces the beamformers in W
from Equation (17.21) with zero-forcing beamformers using the estimated chan-
nel. Numerically compare performance of the statistic to the performances in
Figure 17.1.
17.6 For the correlation, MMSE, and GLRT test statistics determine numeri-
cally the SNR required for a 4 × 4 link to achieve a probability of false alarm of
less than 10−6 and a probability of detection of at least 0.9 for 32 observations
under an INR per receive antenna of:
(a) −∞ dB,
(b) 20 dB.
18 Practical issues
18.1 Antennas
the assumption of the Gaussian model often perform worse than expected when
exposed to non-Gaussian signals, noise, and interference.
It is relatively common for real noise and interference distributions to have
longer tails than a Gaussian distribution. These occasional large deviations from
the expected signal strength can have significant effects on various algorithms.
As an example, if a soft decoder algorithm is presented with a noise sample that
has a large deviation from the expected Gaussian distribution, then the erro-
neous likelihood for that sample can be extremely large or small. The erroneous
likelihood can then propagate throughout the decoding processing, overwhelm-
ing other reasonable likelihoods. A few samples with a large deviation may cause
a decoding error across the frame. Consequently, while the Gaussian noise signal
might be the worst case from an information-theoretic perspective, in practical
systems non-Gaussian noise can be much worse.
electrical sensitivity of the inputs of the frequency synthesizers, they are often
particularly sensitive to packaging-induced coupling problems. There are two
significant concerns with regard to using these frequency references: accuracy
and phase noise.
18.4.1 Accuracy
Local oscillators vary widely with respect to accuracy. Furthermore, the fre-
quency provided by the frequency reference often changes as a function of tem-
perature and age. An inexpensive crystal oscillator used in consumer electronics
may have an accuracy of 10^{−5}. If the carrier frequency is 1 GHz, then the radio would transmit or receive with a frequency error of 10 kHz. As a point of comparison, a 100 km/h induced Doppler shift produces a frequency shift of 93 Hz.
Consequently, the frequency error caused by the local oscillator can be orders of
magnitude larger than that produced by Doppler shift.
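As a quick check of the arithmetic above:

```python
# Back-of-the-envelope comparison of oscillator error and Doppler shift;
# the 1 GHz carrier, 1e-5 accuracy, and 100 km/h speed come from the text.
carrier_hz = 1e9
osc_error_hz = 1e-5 * carrier_hz            # inexpensive crystal: 10 kHz
v_m_per_s = 100e3 / 3600.0                  # 100 km/h in m/s
doppler_hz = v_m_per_s / 3e8 * carrier_hz   # about 93 Hz

# The oscillator error is orders of magnitude larger than the Doppler shift.
assert osc_error_hz / doppler_hz > 100
```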
There are much better oscillators available. Temperature-compensated crystal oscillators (TCXO) can have accuracies of better than 10^{−6}. Ovenized crystal oscillators (OCXO) can have accuracies of better than 10^{−7}; however, this comes at the expense of size, weight, and power.
High-performance atomic clocks enable accuracies of better than 10^{−14}. These
“clocks” are currently the size of rooms and would make for an inconvenient
mobile phone. There are small atomic clocks with degraded performance. As
technology evolves, these oscillators may become viable for mobile radios.
One approach that allows radios to have high accuracy is to use an external
source as a reference. As an example, base stations of cellular phone systems
typically have access to accurate frequency standards. Mobile units can estimate
and track the frequency from a signal broadcast from the base station. Alterna-
tively, radios can use other external broadcast frequency standards. If the radio
has access to the Global Positioning System (GPS) [218] or some equivalent sys-
tem, an accurate frequency reference can be extracted from a system that has
access to atomic clocks.
Modern radios can be frequency agile. Often radios use a fixed frequency reference and then use a synthesizer to shift the operating frequency. Depending upon the details of the synthesizer, it may require a noticeable amount of time for the frequency at the output of the synthesizer to settle to its final value.
at the output of the frequency reference. Fortunately, crystal oscillators that are
used as references typically have good phase noise characteristics. However, when
frequencies are derived from the crystal oscillators by using synthesizers, phase
noise can be increased significantly. For applications that require the frequency
synthesizer to change frequencies, the settling time of the synthesizer can be an
issue. Often phase noise and settling time are competing requirements because
one characteristic is often improved at the expense of the other.
There are a couple of potentially important adverse effects of phase noise.
Relative phase noise between the transmitter and the receiver will cause the
constellation observed at complex baseband to rotate back and forth. While a
frequency adaptive approach can be employed to compensate for these effects,
this comes at the expense of greater computations and may not be effective in all
situations. If the communication link is using higher-order constellations, then even small amounts of phase noise can cause errors in decoding the signal.
For a multiple-antenna system, if multiple synthesizers are used, then the phase
noise for each spatial channel can be different. As a consequence, small relative
frequency errors can be introduced between spatial channels. These errors can place limits on the null depths produced by spatial interference mitigation. As
discussed in Chapter 10, in some situations these relative frequency effects can be
mitigated by using space-frequency adaptive processing, but a better engineering
solution is probably to remove the source of the relative phase errors. Even if the
multiple-frequency synthesizers are using a common reference, the relative phase
may be different from one power-up cycle to the next. Consequently, unless some
care is taken, the phase calibration of the multiple-antenna system may not be
stable.
slower than modulation, alternative approaches can be used to extend the range
over which a transmitter or receiver can operate. If there is cochannel interference
(that is, multiple signals being received at the same time at the same frequency),
then dynamic range requirements are increased by the ratio of the stronger to
the weaker received signal [14]. To receive the weaker signal, both the strong and
weak signals must fit within the dynamic range of the receiver. Because of the
near-far problem in networks, the range of received powers can be significant.
For many systems, the instantaneous dynamic range requirements of the receiver
are more stringent than for the transmitters. Limitations to dynamic range are
usually the result of various nonlinearities in the transmitter or receiver.
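As a rough worked example of the cochannel requirement (the specific numbers and the 6 dB-per-bit rule of thumb are illustrative, not from the text): if an interferer is 40 dB stronger than the desired signal, and the weak signal itself needs 30 dB above the quantization floor, the receiver must span roughly 70 dB instantaneously.

```python
def required_effective_bits(interference_to_signal_db, signal_headroom_db):
    """Effective ADC bits needed when a strong cochannel interferer and a weak
    desired signal must both fit in the receiver's instantaneous dynamic range,
    using the ~6 dB-per-bit rule of thumb."""
    total_db = interference_to_signal_db + signal_headroom_db
    return total_db / 6.0

# 40 dB stronger interferer + 30 dB headroom for the weak signal
print(required_effective_bits(40.0, 30.0))  # ~11.7 effective bits
```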
18.5.1 Quantization
The most apparent (although often not the most significant) source of
nonlinearity is the analog-to-digital converter (ADC) for the receiver or simi-
larly the digital-to-analog converter (DAC) for the transmitter. In converting a
continuous signal to a set of discrete values, errors in the signal are introduced,
although these effects may not be important if the noise is larger than the errors.
These errors are often referred to as quantization noise. While this “noise”
is clearly not Gaussian, it is often approximated as Gaussian.
Under the assumption of a flat probability distribution across a given quantization
value, the variance of the quantization noise $\sigma_q^2$ in units of bits squared is
given by
$$ \sigma_q^2 = \int_{-1/2}^{1/2} x^2 \, dx = \frac{1}{12} \,. \qquad (18.5) $$
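The 1/12 result in (18.5) can be checked with a short Monte-Carlo sketch (pure Python; the signal range and sample count are arbitrary choices):

```python
import random

# Round-to-nearest quantization error is uniform on [-1/2, 1/2) LSB,
# so its variance should approach 1/12 (Eq. 18.5).
random.seed(1)
n = 200_000
errs = []
for _ in range(n):
    x = random.uniform(0.0, 1023.0)   # signal spanning many code levels
    errs.append(round(x) - x)         # quantization error in LSBs
mean = sum(errs) / n
var = sum(e * e for e in errs) / n - mean ** 2
print(var)  # close to 1/12 ~ 0.0833
```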
The variance of a full-scale signal, of course, depends upon the statistics of the
signal in question. Under the assumption that the signal covering the range 0 to
$2^{n_{\rm bits}}$ is centered at the amplitude $(2^{n_{\rm bits}}-1)/2$ for a digitizer with $n_{\rm bits}$ bits,
the maximum variance of a signal $\sigma_{s,\max}^2$ in units of bits squared is given when
the signal value is near the minimum and maximum values exclusively,
$$ \sigma_{s,\max}^2 = \frac{(2^{n_{\rm bits}}-1)^2}{4} \,. \qquad (18.6) $$
The variance of a signal $\sigma_{s,\rm eq}^2$ in units of bits squared that occupies all amplitudes
with equal likelihood is given by
$$ \sigma_{s,\rm eq}^2 = \frac{\int_0^{2^{n_{\rm bits}}-1} \left[x - (2^{n_{\rm bits}}-1)/2\right]^2 dx}{\int_0^{2^{n_{\rm bits}}-1} dx}
= \frac{(2^{n_{\rm bits}}-1)^3/12}{2^{n_{\rm bits}}-1} = \frac{(2^{n_{\rm bits}}-1)^2}{12} \,. \qquad (18.7) $$
Alternatively, it is commonly assumed in assessments of effective number of bits
that the input to the digitizer is a sinusoid, so that the signal variance $\sigma_{s,\sin}^2$
is given by
$$ \sigma_{s,\sin}^2 = \frac{\int_0^{2\pi} \left[\frac{2^{n_{\rm bits}}-1}{2}\sin\phi\right]^2 d\phi}{\int_0^{2\pi} d\phi}
= \frac{(2^{n_{\rm bits}}-1)^2}{8} \,. \qquad (18.8) $$
The dynamic range $r$ is then defined by the ratio of the largest signal variance to
the quantization noise variance. Depending upon the choice of signal used, the
ratio is given by
$$ r = \frac{\sigma_s^2}{\sigma_q^2} = (2^{n_{\rm bits}})^2\, c \,, \qquad
c = \begin{cases} 3 & \text{max} \\ 1 & \text{equal} \\ 3/2 & \text{sinusoid} \end{cases} \,, \qquad (18.9) $$
where $c$ is a constant that is dependent upon the distribution of the signal, and
it is assumed that $2^{n_{\rm bits}}-1 \approx 2^{n_{\rm bits}}$.
It is common to describe the dynamic range in terms of an effective number
of bits. The observed maximum dynamic range $r$ is then equated to this number
of effective bits. From above, the observed dynamic range in power $r$ is given by
$$ r \approx c\,\big(2^{n_{\rm bits}^{(\rm eff)}}\big)^2 \,, \qquad \frac{r}{c} \approx 4^{n_{\rm bits}^{(\rm eff)}} \,. \qquad (18.10) $$
These relations are typically expressed in terms of decibels as a function of the
number of bits, so the effective number of bits $n_{\rm bits}^{(\rm eff)}$ is given by
$$ 10\log_{10}(r/c) = r[\text{dB}] - c[\text{dB}] \approx 10\log_{10}\big(4^{n_{\rm bits}^{(\rm eff)}}\big) \approx n_{\rm bits}^{(\rm eff)}\, 6\,[\text{dB}] \,, $$
$$ n_{\rm bits}^{(\rm eff)} \approx \frac{r[\text{dB}] - c[\text{dB}]}{6\,[\text{dB}]} \,, \qquad (18.11) $$
where $r[\text{dB}]$ and $c[\text{dB}]$ indicate the dynamic range and distribution constant
expressed on a decibel scale. The values of $c[\text{dB}]$ are given by 4.8 dB, 0 dB,
and 1.8 dB for the maximum, equal, and sinusoidal signal distributions, respectively.
Because it is easy to generate sinusoids in the laboratory, the sinusoidal version
is most often quoted. However, for most qualitative purposes, $r[\text{dB}]/6\,[\text{dB}]$ is a
sufficiently accurate estimate of the number of effective bits.
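As a quick numerical check of (18.9)–(18.11), the sketch below computes the distribution constants and the effective bit count; it uses the exact 10 log₁₀ 4 ≈ 6.02 dB per bit rather than the rounded 6 dB, and the 14-bit example is illustrative.

```python
import math

# Distribution constants c from Eq. (18.9), expressed on a dB scale
C_DB = {
    "max": 10 * math.log10(3.0),       # ~4.8 dB
    "equal": 0.0,                      #  0.0 dB
    "sinusoid": 10 * math.log10(1.5),  # ~1.8 dB
}

def effective_bits(r_db, distribution="sinusoid"):
    """Eq. (18.11): n_eff = (r[dB] - c[dB]) / (10 log10 4)."""
    return (r_db - C_DB[distribution]) / (10 * math.log10(4.0))

# Ideal 14-bit converter driven by a full-scale sinusoid: r = (3/2) * 4^14,
# about 86 dB, which recovers exactly 14 effective bits.
r_db = 10 * math.log10(1.5 * 4 ** 14)
print(f"r = {r_db:.1f} dB -> {effective_bits(r_db):.2f} effective bits")
```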
executed with a finite number of bits. By reducing the number of bits used in
computations, the amount of real estate on a chip or in a system occupied by the
computation can sometimes be reduced significantly. In some cases, reducing the
number of bits used by a computation can reduce the execution time. Conversely,
limiting the number of bits used in a calculation can significantly affect the dy-
namic range supported in a calculation. As a specific example, the inverse of a
covariance matrix can require far more bits than were used to accurately store
the data matrix. This is because information contained in the covariance matrix
is stored in the power domain and thus requires approximately twice as many bits
as the data matrix, which contains amplitude information. Furthermore, matrix
inversion inverts the eigenvalues of the covariance matrix; it is therefore sensitive
to both the largest and smallest values, and errors in the smallest eigenvalues
are exaggerated. One approach to circumvent this problem is to perform
inversions in the amplitude domain, such as by using a QR decomposition to
whiten a vector.
Consider the estimate of the $n_r \times n_r$ covariance matrix $\mathbf{C} \in \mathbb{C}^{n_r \times n_r}$,
constructed by using the $n_s$ samples contained within the data matrix
$\mathbf{Z} \in \mathbb{C}^{n_r \times n_s}$. The estimate and the data-matrix QR
decomposition [117] are given by
$$ \mathbf{C} = \frac{1}{n_s}\, \mathbf{Z}\, \mathbf{Z}^\dagger \,, \qquad \mathbf{Z} = (\mathbf{Q}\,\mathbf{R})^\dagger \,, \qquad (18.12) $$
$$ \phi = \mathbf{z}^\dagger\, \mathbf{C}^{-1}\, \mathbf{z} \,. \qquad (18.13) $$
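A sketch of the amplitude-domain route follows (NumPy is assumed; the dimensions and data are illustrative). From (18.12), $\mathbf{C} = \mathbf{R}^\dagger\mathbf{R}/n_s$, so the whitened power (18.13) reduces to $\phi = n_s\,\|\mathbf{R}^{-\dagger}\mathbf{z}\|^2$, computable by a triangular solve without ever forming or inverting the power-domain matrix $\mathbf{C}$:

```python
import numpy as np

rng = np.random.default_rng(0)
nr, ns = 4, 64

# Complex data matrix Z (nr x ns) and a test vector z
Z = (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
z = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)

# Power-domain route: form C = Z Z'/ns and invert it explicitly
C = Z @ Z.conj().T / ns
phi_direct = np.real(z.conj() @ np.linalg.inv(C) @ z)

# Amplitude-domain route: QR of Z' gives Z' = Q R, so C = R' R / ns and
# phi = ns * ||inv(R') z||^2, obtained by one triangular solve
Q, R = np.linalg.qr(Z.conj().T)       # reduced QR: R is nr x nr
w = np.linalg.solve(R.conj().T, z)    # solve R' w = z
phi_qr = ns * np.real(w.conj() @ w)

print(phi_direct, phi_qr)  # the two routes agree
```

The numerical advantage is that $\mathbf{R}$ has a condition number equal to the square root of that of $\mathbf{C}$, which is why the amplitude-domain computation needs roughly half the bits.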
18.5.5 Spurs
“Spurs” is shorthand for spurious or unwanted signals caused by distortions
or other effects. In almost all real receivers, various low-level spurs
can be observed in the frequency domain. In modern hardware, there are
various clocks, digital signals, and many sources of nonlinearities causing these
unwanted spurs. Depending upon the effort expended to minimize these effects,
spurs can be very small and innocuous or large and disruptive. As an example, a
strong spurious tone in the middle of the intended received signal spectrum can
change the statistics of the signal, potentially raising the bit-error-rate floor.
In general, spurs in real systems can be disruptive to approaches
that are particularly sensitive to the model or statistics of the noise.
For many communication applications, one of the most important system design
trades is performance versus power consumption. While it is clear that the power
amplifier can draw significant power, particularly if high linearity is desired, many
other components can also consume significant power. Mixers and other analog
components can draw nontrivial amounts of power that can be a significant
concern, particularly for low-power systems. For more interesting waveforms,
coding, and algorithms, computations can sometimes be the dominant source
of power consumption. As was mentioned in Chapter 11, the required number
of computations per information bit can vary by several orders of magnitude.
It is difficult to make general comments about power consumption because it
is highly dependent upon the radio technologies involved, which tend to evolve
quickly, and because the importance of power consumption is dependent upon
the requirements of the system. Nonetheless, a system designer must consider
these requirements carefully.
References
[15] C. A. Balanis. Antenna Theory: Analysis and Design. John Wiley & Sons,
Hoboken, New Jersey, 2005.
[16] A. Barabell. Improving the resolution performance of eigenstructure-based
direction-finding algorithms. IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), 8:336–339, April 1983.
[17] P. Bergmans. A simple converse for broadcast channels with additive white
Gaussian noise. IEEE Transactions on Information Theory, 20(2):279–280, 1974.
[18] Dennis S. Bernstein. Matrix Mathematics: Theory, Facts, and Formulas. Prince-
ton University Press, 2009.
[19] C. Berrou, A. Glavieux, and P. Thitimajshima. Near Shannon limit error cor-
recting coding and decoding: turbo-codes. Proceedings of ICC 1993, Geneva,
2:1064–1070, May 1993.
[20] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 1995.
[21] D. P. Bertsekas and R. G. Gallager. Data Networks. Prentice Hall, Upper
Saddle River, NJ, 1987.
[22] Ezio Biglieri. MIMO Wireless Communications. Cambridge University Press,
2007.
[23] M. Biguesh, S. Gazor, and M. H. Shariat. Optimal training sequence for MIMO
wireless systems in colored environments. IEEE Transactions on Signal Process-
ing, 57(8):3144–3153, Aug. 2009.
[24] Patrick Billingsley. Probability and Measure. John Wiley & Sons, Hoboken, New
Jersey, 1995.
[25] C. Bissell. Vladimir Aleksandrovich Kotelnikov: pioneer of the sampling theo-
rem, cryptography, optimal detection, planetary mapping. IEEE Communica-
tions Magazine, 47(10):24–32, Oct. 2009.
[26] B. A. Bjerke and J. G. Proakis. Multiple-antenna diversity techniques for trans-
mission over fading channels. IEEE Wireless Communications and Networking
Conference, 3:1038–1042, 1999.
[27] I. Blake and W. Lindsey. Level-crossing problems for random processes. IEEE
Transactions on Information Theory, 19(3):295–315, May 1973.
[28] D. W. Bliss. Robust MIMO wireless communication in the presence of interference
using ad hoc antenna arrays. Proceedings of MILCOM 03 (Boston), Oct. 2003.
[29] D. W. Bliss. Optimal SISO and MIMO spectral efficiency to minimize hidden-
node network interference. IEEE Communications Letters, 14(7):620–622, July
2010.
[30] D. W. Bliss, A. M. Chan, and N. B. Chang. MIMO wireless communication chan-
nel phenomenology. IEEE Transactions on Antennas and Propagation, 52(8),
Aug. 2004.
[31] D. W. Bliss and K. W. Forsythe. Angle of arrival estimation in the presence
of multiple access interference for CDMA cellular phone systems. Proceedings
of the 2000 IEEE Sensor Array and Multichannel Signal Processing Workshop,
Cambridge, Mass., March 2000.
[32] D. W. Bliss and K. W. Forsythe. Information theoretic comparison of MIMO
wireless communication receivers in the presence of interference. IEEE Asilomar
Conference on Signals, Systems and Computers, 1:866–870, Nov. 2004.
[90] Albert Guillen I Fabregas, Alfonso Martinez, and Giuseppe Caire. Bit-Interleaved
Coded Modulation. Foundations and Trends in Communications and Information
Theory. Now Publishers, 2008.
[91] P. Farnsworth. Television system, 1930. U.S. Patent 1,773,980.
[92] F. R. Farrokhi, G. J. Foschini, A. Lozano, and R. A. Valenzuela. Link-optimal
space-time processing with multiple transmit and receive antennas. IEEE Com-
munications Letters, 5:85–87, March 2001.
[93] William Feller. An Introduction to Probability and Its Applications, Vol. II. John
Wiley & Sons, 1971.
[94] B. A. Fette. Cognitive Radio Technology: 2nd Edition. Elsevier, Burlington, MA,
2009.
[95] L. De Forest. Space telegraphy, 1908. U.S. Patent 879,532.
[96] K. W. Forsythe. Utilizing waveform features for adaptive beamforming and direc-
tion finding with narrowband signals. MIT Lincoln Laboratory Journal, 10(2):99–
126, 1997.
[97] K. W. Forsythe. Performance of space-time codes over a flat-fading channel using
a subspace-invariant detector. IEEE Asilomar Conference on Signals, Systems
and Computers, 1:750–755, Nov. 2002.
[98] K. W. Forsythe, D. W. Bliss, and C. M. Keller. Multichannel adaptive beam-
forming and interference mitigation in multiuser CDMA systems. IEEE Asilomar
Conference on Signals, Systems and Computers, 1:506–510, Oct. 1999.
[99] G. J. Foschini. Layered space-time architecture for wireless communication in
a fading environment when using multi-element antennas. Bell Labs Technical
Journal, 1(2):41–59, Autumn 1996.
[100] Giorgio Franceschetti and Sabatino Stornelli. Wireless Networks: From the Phys-
ical Layer to Communication, Computing, Sensing, and Control. Elsevier Aca-
demic Press, 2006.
[101] M. Franceschetti, O. Dousse, D. N. C. Tse, and P. Thiran. Closing the gap in
the capacity of wireless networks via percolation theory. IEEE Transactions on
Information Theory, 53(3):1009–1018, March 2007.
[102] M. Franceschetti, M. D. Migliore, and P. Minero. The capacity of wireless net-
works: Information-theoretic and physical limits. IEEE Transactions on Infor-
mation Theory, 55(8):3413–3424, July 2009.
[103] J. Freebersyser and B. Leiner. A DoD perspective on mobile ad hoc networks. In
C. E. Perkins, editor, Ad Hoc Networking, pages 29–51. Addison-Wesley, 2001.
[104] B. Friedlander and A. J. Weiss. Direction finding in the presence of mutual cou-
pling. IEEE Transactions on Antennas and Propagation, 39(3):273–284, March
1991.
[105] H. Gao, P. J. Smith, and M. V. Clark. Theoretical reliability of MMSE lin-
ear diversity combining in Rayleigh-fading additive interference channels. IEEE
Transactions on Communications, 46(5):666 –672, May 1998.
[106] W. A. Gardner. Exploitation of spectral redundancy in cyclostationary signals.
IEEE Signal Processing Magazine, 8(2):14–36, April 1991.
[107] Patrick Geddes. The Life and Work of Sir Jagadis C. Bose. Longmans, Green
and Co., London, 1920.
[108] S. I. Gel’fand and M. S. Pinsker. Coding for channel with random parameters.
Problems of Control Theory, 9(1):19–31, 1980.
[109] D. Gerlach and A. Paulraj. Adaptive transmitting antenna methods for mul-
tipath environments. IEEE Global Telecommunications Conference (GLOBE-
COM), 1:425–429, Nov. 1994.
[110] D. Gesbert, H. Bolcskei, D. A. Gore, and A. J. Paulraj. Outdoor MIMO wireless
channels: models and performance prediction. IEEE Transactions on Communi-
cations, 50(12):1926–1934, Dec. 2002.
[111] D. Gesbert, T. Ekman, and N. Christophersen. Capacity limits of dense palm-
sized MIMO arrays. IEEE Global Telecommunications Conference (GLOBE-
COM), 2:1187–1191, Nov. 2002.
[112] M. Godavarti, A. O. Hero III, and T. L. Marzetta. Min-capacity of a multiple-
antenna wireless channel in a static Ricean fading environment. IEEE Transac-
tions on Wireless Communications, 4(4):1715–1723, July 2005.
[113] M. J. E. Golay. Notes on digital coding. Proceedings of the IRE, 37, 1949.
[114] G. D. Golden, G. J. Foschini, R. A. Valenzuela, and P. W. Wolniansky. V-BLAST:
A high capacity space-time architecture for the rich-scattering wireless channel.
Fifth Workshop on Smart Antennas in Wireless Mobile Communications, July
1998.
[115] A. Goldsmith. Wireless Communications. Cambridge University Press, New
York, 2005.
[116] A. Goldsmith, S. A. Jafar, N. Jindal, and S. Vishwanath. Capacity limits of
MIMO channels. IEEE Journal on Selected Areas of Communications, 21(5),
June 2003.
[117] Gene Howard Golub and Charles F. Van Loan. Matrix Computations. Johns
Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press,
Baltimore, 1996.
[118] K. Gomadam, V. R. Cadambe, and S. A. Jafar. Approaching the capacity of wire-
less networks through distributed interference alignment. IEEE Global Telecom-
munications Conference (GLOBECOM), pages 1–6, 2008.
[119] D. A. Gore and A. J. Paulraj. MIMO antenna subset selection with space-time
coding. IEEE Transactions on Signal Processing, 50(10):2580–2588, 2002.
[120] S. Govindasamy. Multiple-Antenna Systems in Ad-Hoc Wireless Networks. Ph. D.
dissertation, Massachusetts Institute of Technology, Department of Electrical En-
gineering and Computer Science, 2008.
[121] S. Govindasamy, F. Antic, D. W. Bliss, and D. Staelin. The performance of linear
multiple-antenna receivers with interferers distributed on a plane. IEEE Inter-
national Workshop on Signal Processing Advances for Wireless Communications,
2005.
[122] S. Govindasamy and D. Bliss. On the spectral efficiency of links with multi-
antenna receivers in non-homogenous wireless networks. In Proceedings of IEEE
ICC, Kyoto, pages 1–6. IEEE, 2011.
[123] S. Govindasamy, D. W. Bliss, and D. H. Staelin. Spectral efficiency in single-hop
ad-hoc wireless networks with interference using adaptive antenna arrays. IEEE
Journal on Selected Areas of Communications, 25(7):1358–1369, Sept. 2007.
[124] S. Govindasamy, D. W. Bliss, and D. H. Staelin. Asymptotic spectral efficiency
of the uplink in spatially distributed wireless networks with multi-antenna base
stations. IEEE Asilomar Conference on Signals, Systems and Computers, 2008.
[144] Heinrich Hertz. Untersuchungen ueber die Ausbreitung der Elektrischen Kraft.
Johann Ambrosius Barth, Leipzig, 1892.
[145] A. Hjorungnes and D. Gesbert. Complex-valued matrix differentiation:
techniques and key results. IEEE Transactions on Signal Processing, 55(6):2740–
2746, June 2007.
[146] Sungook Hong. Wireless: from Marconi’s Black-box to the Audion. Transforma-
tions. MIT Press, 2001.
[147] A. M. Hunter, J. G. Andrews, and S. Weber. Transmission capacity of ad hoc
networks with spatial diversity. IEEE Transactions on Wireless Communications,
7(12), Dec. 2008.
[148] A. M. Hunter, J. G. Andrews, and S. Weber. Transmission capacity of ad hoc
networks with spatial diversity. IEEE Transactions on Wireless Communications,
2009.
[149] A. M. Hunter, J. Andrews, and S. Weber. Transmission capacity of ad hoc
networks with spatial diversity. IEEE Transactions on Wireless Communications,
7(12):5058–5071, 2008.
[150] IEEE. IEEE standard for information technology – telecommunications and in-
formation exchange between systems – local and metropolitan area networks –
specific requirements – part 11: Wireless LAN medium access control (MAC) and
physical layer (PHY) specifications. IEEE Std 802.11-1997, 1997.
[151] IEEE. IEEE standard for information technology – telecommunications and in-
formation exchange between systems – local and metropolitan area networks
– specific requirements part 11: Wireless LAN medium access control (MAC)
and physical layer (PHY) specifications amendment 5: Enhancements for higher
throughput. IEEE Std 802.11n-2009 (Amendment to IEEE Std 802.11-2007 as
amended by IEEE Std 802.11k-2008, IEEE Std 802.11r-2008, IEEE Std 802.11y-
2008, and IEEE Std 802.11w-2009), Oct. 2009.
[152] IEEE. IEEE standard for information technology–telecommunications and in-
formation exchange between systems wireless regional area networks (WRAN)–
specific requirements part 22: Cognitive wireless RAN medium access control
(MAC) and physical layer (PHY) specifications: Policies and procedures for op-
eration in the TV bands. IEEE Std 802.22-2011, pages 1–680, July 2011.
[153] Joseph Mitola III. Cognitive Radio Architecture: The Engineering Foundations
of Radio XML. John Wiley & Sons, Hoboken, New Jersey, 2006.
[154] J. D. Jackson. Classical Electrodynamics. John Wiley & Sons, Hoboken, New
Jersey, 1975.
[155] S. A. Jafar. Exploiting channel correlations – simple interference alignment
schemes with no CSIT. IEEE Global Telecommunications Conference (GLOBE-
COM), pages 1–5, 2010.
[156] S. A. Jafar. Interference Alignment – A New Look at Signal Dimensions in a
Communication Network. Now Publishing, 2011.
[157] Hamid Jafarkhani. Space-Time Coding: Theory and Practice. Cambridge Uni-
versity Press, 2005.
[158] A. K. Jagannatham and B. D. Rao. Cramer–Rao lower bound for con-
strained complex parameters. IEEE Transactions on Signal Processing Letters,
11(11):875–878, Nov. 2004.
[159] Alan T. James. Distributions of matrix variates and latent roots derived from
normal samples. The Annals of Mathematical Statistics, 35(2):475–501, 1964.
[160] Mohinder Jankiraman. Space-Time Codes and MIMO Systems. Artech House,
2004.
[161] Z. Ji and K. J. R. Liu. Dynamic spectrum sharing: a game theoretical overview.
IEEE Communications Magazine, 45(5):88–94, May 2007.
[162] Y. Jiang, J. Li, and W. W. Hager. Joint transceiver design for MIMO commu-
nications using geometric mean decomposition. IEEE Transactions on Signal
Processing, 53(10):3791–3803, Oct. 2005.
[163] N. Jindal, J. G. Andrews, and S. Weber. Bandwidth partitioning in decentralized
wireless networks. IEEE Transactions on Wireless Communications, 7(12):5408–
5419, 2008.
[164] N. Jindal, J. G. Andrews, and S. Weber. Rethinking MIMO for wireless networks:
linear throughput increases with multiple receive antennas. IEEE International
Conference on Communications (ICC), June 2009.
[165] N. Jindal, J. G. Andrews, and S. Weber. Multi-antenna communication in ad
hoc networks: achieving MIMO gains with SIMO transmission. IEEE Transactions
on Communications, 59(2):529–540, 2011.
[166] Y. Jing and B. Hassibi. Distributed space-time coding in wireless relay networks.
IEEE Transactions on Wireless Communications, 5(12):3524–3536, 2006.
[167] J. B. Johnson. Thermal agitation of electricity in conductors. Physical Review,
32:97–109, Jul 1928.
[168] D. Jonsson. Some limit theorems for the eigenvalues of a sample covariance
matrix. Journal of Multivariate Analysis, 12:1–38, 1982.
[169] R. Kahn. The organization of computer resources into a packet radio network.
IEEE Transactions on Communications, 25(1):169–178, Jan. 1977.
[170] S. Karmakar and M. K. Varanasi. Capacity of the MIMO interference channel to
within a constant gap. IEEE International Symposium on Information Theory
(ISIT), pages 2193–2197, July–Aug. 2011.
[171] Alan F. Karr. Probability. Springer-Verlag, 1993.
[172] Steven M. Kay. Fundamentals of Statistical Signal Processing: Estimation Theory.
Prentice Hall, Upper Saddle River, NJ, 1993.
[173] E. J. Kelly and K. W. Forsythe. Adaptive Detection and Parameter Estima-
tion for Multidimensional Signal Models. Technical Report 848, M.I.T. Lincoln
Laboratory, April, 1989.
[174] Maurice Kendall and Alan Stuart. The Advanced Theory of Statistics. Macmillan
Publishing, New York, 1979.
[175] H. S. Kim and A. O. Hero. Comparison of GLR and invariant detectors un-
der structured clutter covariance. IEEE Transactions on Image Processing,
10(10):1509–1520, Oct. 2001.
[176] A. N. Kolmogorov. Stationary sequences in Hilbert space. Bulletin of Moscow
University, 2(6):1–40, 1941.
[177] G. Kramer. Outer bounds on the capacity of Gaussian interference channels.
IEEE Transactions on Information Theory, 50(3):581–586, 2004.
[178] John Daniel Kraus and Daniel A. Fleisch, editors. Electromagnetics with Appli-
cations, 5th Edition. McGraw-Hill, New York, 1999.
[219] J. Mitola III and G. Q. Maguire, Jr. Cognitive radio: making software radios
more personal. IEEE Personal Communications, 6(4):13–18, Aug. 1999.
[220] Sanjit Kumar Mitra. Digital Signal Processing: A Computer Based Approach.
McGraw-Hill, New York, 2006.
[221] A. N. Mody and G. L. Stuber. Synchronization for MIMO OFDM systems. IEEE
Global Telecommunications Conference (GLOBECOM), 1:509–513, 2001.
[222] A. F. Molisch, M. Z. Win, and J. H. Winters. Space-time-frequency (STF) coding
for MIMO-OFDM systems. IEEE Communications Letters, 6(9):370–372, 2002.
[223] R. A. Monzingo and T. W. Miller. Introduction to Adaptive Arrays. John Wiley
& Sons, New York, 1980.
[224] A. S. Motahari and A. K. Khandani. Capacity bounds for the Gaussian interfer-
ence channel. IEEE Transactions on Information Theory, 55(2):620–643, 2009.
[225] J. C. Mundarath, P. Ramanathan, and B. D. Van Veen. A cross layer scheme
for adaptive antenna array based wireless ad hoc networks in multipath environ-
ments. Wireless Networks, 13:597–615, October 2007.
[226] A. F. Naguib, V. Tarokh, N. Seshadri, and A. R. Calderbank. A space-time coding
modem for high-data-rate wireless communications. IEEE Journal on Selected
Areas in Communications, 16(8):1459–1478, 1998.
[227] J. F. Nash. Equilibrium points in n-person games. Proceedings of the National
Academy of Sciences of the United States of America, 36(1):48–49, Jan. 1950.
[228] B. Nazer, S. A. Jafar, M. Gastpar, and S. Vishwanath. Ergodic interference
alignment. IEEE International Symposium on Information Theory (ISIT), pages
1769–1773, 2009.
[229] A. Nehorai and E. Paldi. Vector-sensor array processing for electromagnetic
source localization. IEEE Transactions on Signal Processing, 42(2):376–398,
February 1994.
[230] David L. Nicholson. Spread Spectrum Signal Design: LPE and AJ Systems. Com-
puter Science Press, New York, 1988.
[231] D. Niyato and E. Hossain. Market-equilibrium, competitive, and cooperative
pricing for spectrum sharing in cognitive radio networks: analysis and comparison.
IEEE Transactions on Wireless Communications, 7(11):4273–4283, Nov. 2008.
[232] A. Nuttall. Some integrals involving the Q_M function. IEEE Transactions on
Information Theory, 21(1):95–96, Jan. 1975.
[233] H. Nyquist. Thermal agitation of electric charge in conductors. Physical Review,
32:110–113, July 1928.
[234] H. Ochiai, P. Mitran, H. V. Poor, and V. Tarokh. Collaborative beamforming
for distributed wireless ad hoc sensor networks. IEEE Transactions on Signal
Processing, 53(11):4110–4124, Nov. 2005.
[235] Atsuyuki Okabe, Barry Boots, Kokichi Sugihara, and Sung Nok Chiu. Spatial
Tessellations. Concepts and Applications of Voronoi Diagrams. With a foreword
by DG Kendall. John Wiley & Sons, Hoboken, New Jersey, 2000.
[236] B. M. Oliver. Thermal and quantum noise. Proceedings of the IEEE, 53(5):436–
454, May 1965.
[237] E. Ollila, V. Koivunen, and J. Eriksson. On the Cramér–Rao bound for the
constrained and unconstrained complex parameters. IEEE Sensor Array and
Multichannel Signal Processing Workshop, pages 414–418, July 2008.
[238] Alan V. Oppenheim, Ronald W. Schafer, and John R. Buck. Discrete-Time Signal
Processing. Prentice Hall, Upper Saddle River, NJ, 1999.
[239] H. C. Ørsted, K. Jelved, A. D. Jackson, and O. Knudsen. Selected Scientific
Works of Hans Christian Ørsted. Princeton University Press, 1998.
[240] A. Ozgur, O. Leveque, and D. Tse. Hierarchical cooperation achieves optimal
capacity scaling in ad-hoc networks. IEEE Transactions on Information Theory,
53(10):3549–3572, Oct. 2007.
[241] Athanasios Papoulis and S. Unnikrishna Pillai. Probability, Random Variables,
and Stochastic Processes, 4th Edition. McGraw-Hill, New York, 2002.
[242] M. Park, S.-H. Choi, and S. M. Nettles. Cross-layer MAC design for wireless
networks using MIMO. IEEE Global Telecommunications Conference (GLOBE-
COM), vol. 5, Dec. 2005.
[243] P. A. Parker and D. W. Bliss. Outer bounds for the MIMO interference channel.
IEEE Asilomar Conference on Signals, Systems and Computers, pages 1108–
1112, Oct. 2008.
[244] P. A. Parker, P. Mitran, D. W. Bliss, and V. Tarokh. On bounds and algorithms
for frequency synchronization for collaborative communication systems. IEEE
Transactions on Signal Processing, 56(8):3742–3752, Aug. 2008.
[245] A. J. Paulraj and T. Kailath. Increasing capacity in wireless broadcast systems
using distributed transmission/directional reception (dtdr), 1994. U.S. Patent
5,345,599.
[246] A. J. Paulraj and C. B. Papadias. Space-time processing for wireless communi-
cations. IEEE Signal Processing Magazine, 14(6):49–83, Nov. 1997.
[247] Arogyswami Paulraj, Rohit Nabar, and Dhananjay Gore. Introduction to Space-
Time Wireless Communications. Cambridge University Press, Cambridge, 2003.
[248] S. U. Pillai and C. S. Burrus. Array Signal Processing. Springer-Verlag, 1989.
[249] P. Pirinen. Cellular topology and outage evaluation for DS-UWB system with
correlated lognormal multipath fading. The 17th Annual IEEE International
Symposium on Personal, Indoor and Mobile Radio Communications, 2006.
[250] H. V. Poor. An Introduction to Signal Detection and Estimation. Springer-Verlag,
1994.
[251] H. V. Poor and G. W. Wornell. Wireless Communications: Signal Processing
Perspectives. Prentice Hall, 1998.
[252] David M. Pozar. Microwave Engineering. John Wiley & Sons, Hoboken, New
Jersey, 2005.
[253] N. Prasad and M. K. Varanasi. Outage theorems for MIMO block-fading chan-
nels. IEEE Transactions on Information Theory, 52(12):5284–5296, Dec. 2006.
[254] R. Price and P. E. Green. A communication technique for multipath channels.
Proceedings of the Institute of Radio Engineers, 46(3):555–570, March 1958.
[255] John G. Proakis. Digital Communications. McGraw-Hill, New York, 2001.
[256] John G. Proakis and Dimitris G. Manolakis. Digital Signal Processing. Pearson
Prentice Hall, 2007.
[257] R. W. Heath, Jr. and G. B. Giannakis. Exploiting input cyclostationarity for
blind channel identification in OFDM systems. IEEE Transactions on Signal
Processing, 47(3):848–856, March 1999.
[258] C. Rao and B. Hassibi. Analysis of multiple-antenna wireless links at low SNR.
IEEE Transactions on Information Theory, 50(9):2123–2130, Sept. 2004.
[279] M. Schwartz. Edouard Branly, the coherer, and the Branly effect [history of
communications]. IEEE Communications Magazine, 47(9):20–26, Sept. 2009.
[280] Mischa Schwartz, William R. Bennett, and Seymour Stein. Communication Sys-
tems and Techniques. McGraw-Hill, New York, 1966.
[281] X. Shang, B. Chen, G. Kramer, and H. V. Poor. Interference suppression in the
presence of quantization errors. Allerton Conference on Communication, Control,
and Computing, pages 700–707, Sept. 2008.
[282] X. Shang, B. Chen, G. Kramer, and H. V. Poor. Capacity regions and sum-
rate capacities of vector Gaussian interference channels. IEEE Transactions on
Information Theory, 56(10):5030–5044, Oct. 2010.
[283] Xiaohu Shang, Biao Chen, and Michael J. Gans. On the achievable sum rate for
MIMO interference channels. IEEE Transactions on Information Theory, 52(9),
September 2006.
[284] C. E. Shannon. A mathematical theory of communication. Bell System Technical
Journal, 27:379–423, July 1948.
[285] D. F. Sievenpiper, D. C. Dawson, M. M. Jacob, T. Kanar, S. Kim, J. Long,
and R. G. Quarfoth. Experimental validation of performance limits and design
guidelines for small antennas. IEEE Transactions on Antennas and Propagation,
60(1):8–19, Jan. 2012.
[286] J. W. Silverstein. Eigenvalues and eigenvectors of large dimensional sample co-
variance matrices. Contemporary Mathematics, 50:153–159, 1986.
[287] Bernard Sklar. Digital Communications: Fundamentals and Applications. Pren-
tice Hall, 1988.
[288] S. T. Smith. Statistical resolution limits and the complexified Cramer–Rao
bound. IEEE Transactions on Signal Processing, 53(5):1597–1609, May 2005.
[289] D. H. Staelin, D. W. Bliss Jr, D. A. Hinton, et al. Protocols for multi-antenna ad-
hoc wireless networking in interference environments. PhD thesis, Massachusetts
Institute of Technology, 2010.
[290] D. H. Staelin, A. W. Morgenthaler, and J. A. Kong. Electromagnetic Waves.
Prentice Hall, Englewood Cliffs, NJ, 1994.
[291] William Stallings. Data and Computer Communications. Pearson/Prentice Hall,
2007.
[292] A. Stefanov and T. M. Duman. Turbo coded modulation for wireless communi-
cations with antenna diversity. Proceedings of IEEE Vehicular Technology Con-
ference, Amsterdam, 3:1565–1569, Sept. 1999.
[293] S. Stein. Unified analysis of certain coherent and noncoherent binary communi-
cations systems. IEEE Transactions on Information Theory, 10(1):43–51, Jan.
1964.
[294] Bernard D. Steinberg. Principles of Aperture and Array System Design: Including
Random and Adaptive Arrays. John Wiley & Sons, New York, 1976.
[295] P. Stoica, E.G. Larsson, and A. B. Gershman. The stochastic CRB for array
processing: a textbook derivation. IEEE Signal Processing Letters, 8(5):148–150,
May 2001.
[296] P. Stoica and A. Nehorai. MUSIC, maximum likelihood, and Cramer–Rao bound.
IEEE Transactions on Acoustics, Speech and Signal Processing, 37(5):720–741,
May 1989.
[297] Petre Stoica and Randolph Moses. Introduction to Spectral Analysis. Prentice
Hall, 1997.
[298] D. Stoyan, W. S. Kendall, and J. Mecke. Stochastic Geometry and Its Applica-
tions. John Wiley & Sons, Hoboken, New Jersey, 1995.
[299] Dietrich Stoyan, Wilfrid S. Kendall, and Joseph Mecke. Stochastic Geometry and
Its Applications, 2nd Edition. John Wiley & Sons, 1995.
[300] P. D. Sutton, K. E. Nolan, and L. E. Doyle. Cyclostationary signatures in prac-
tical cognitive radio applications. IEEE Journal on Selected Areas in Communi-
cations, 26(1):13–24, Jan. 2008.
[301] T. Svantesson. A double-bounce channel model for multi-polarized MIMO sys-
tems. IEEE Vehicular Technology Conference, 2:691–695, Fall 2002.
[302] T. Svantesson and A. L. Swindlehurst. A performance bound for prediction of
MIMO channels. IEEE Transactions on Signal Processing, 54(2):520–529, Feb.
2006.
[303] A. Taherpour, M. Nasiri-Kenari, and S. Gazor. Multiple antenna spectrum
sensing in cognitive radios. IEEE Transactions on Wireless Communications,
9(2):814–823, Feb. 2010.
[304] Tapan K. Sarkar et al. History of Wireless. John Wiley & Sons, Hoboken, New
Jersey, 2006.
[305] V. Tarokh, H. Jafarkhani, and A. R. Calderbank. Space-time block codes from
orthogonal designs. IEEE Transactions on Information Theory, 45(5):1456–1467,
July 1999.
[306] V. Tarokh, A. Naguib, N. Seshadri, and A. R. Calderbank. Combined array
processing and space-time coding. IEEE Transactions on Information Theory,
45(4):1121–1128, 1999.
[307] V. Tarokh, N. Seshadri, and A. R. Calderbank. Space-time codes for high data
rate wireless communication: performance criterion and code construction. IEEE
Transactions on Information Theory, 44(2):744–765, March 1998.
[308] I. E. Telatar. Capacity of multi-antenna Gaussian channels. European Transac-
tions on Telecommunications, 10(6):585–595, Nov.–Dec. 1999.
[309] N. Tesla. System of transmission of electrical energy, 1900. U.S. Patent 645,576.
[310] N. Tesla. On light and other high frequency phenomena. Record of Franklin
Institute, 1893.
[311] S. C. Thompson, J. G. Proakis, and J. R. Zeidler. The effectiveness of signal
clipping for PAPR and total degradation reduction in OFDM systems. IEEE
Global Telecommunications Conference (GLOBECOM), vol. 5, Dec. 2005.
[312] H. L. Van Trees. Detection, Estimation, and Modulation Theory, Part I. John
Wiley & Sons, New York, 1968.
[313] D. Tse and S. Hanly. Linear multiuser receivers: effective interference, effec-
tive bandwidth and user capacity. IEEE Transactions on Information Theory,
45(2):641–657, 1999.
[314] David Tse and Pramod Viswanath. Fundamentals of Wireless Communication.
Cambridge University Press, Cambridge, 2005.
[315] Antonia M. Tulino and Sergio Verdu. Random Matrix Theory and Wireless Com-
munications. Now Publishers, 2004.
[316] G. Ungerboeck. Trellis-coded modulation with redundant signal sets Part I:
Introduction. IEEE Communications Magazine, 25(2):5–11, Feb. 1987.
[317] G. Ungerboeck. Trellis-coded modulation with redundant signal sets Part II:
State of the art. IEEE Communications Magazine, 25(2):12–21, Feb. 1987.
[318] A. van den Bos. A Cramer–Rao lower bound for complex parameters. IEEE
Transactions on Signal Processing, 42(10), Oct. 1994.
[319] R. van Nee and R. Prasad. OFDM Wireless Multimedia Communications. Artech
House, Boston, 2000.
[320] M. K. Varanasi, C. T. Mullis, and A. Kapur. On the limitation of linear MMSE
detection. IEEE Transactions on Information Theory, 52(9):4282–4286, Sept.
2006.
[321] R. Vaze and R. W. Heath. Transmission capacity of ad-hoc networks with mul-
tiple antennas using transmit stream adaptation and interference cancellation.
IEEE Transactions on Information Theory, 58(2):780–792, 2012.
[322] S. Verdu. Optimum multi-user signal detection. PhD thesis, Department of
Electrical and Computer Engineering, University of Illinois, Aug. 1984.
[323] S. Verdu and S. Shamai (Shitz). Spectral efficiency of CDMA with random
spreading. IEEE Transactions on Information Theory, 45(2):622–640, March
1999.
[324] Sergio Verdu. Multiuser Detection. Cambridge University Press, Cambridge,
1998.
[325] E. Visotsky and U. Madhow. Space-time transmit precoding with imperfect
feedback. IEEE Transactions on Information Theory, 47(6):2632–2639, Sept.
2001.
[326] P. Viswanath and D. Tse. Sum capacity of the vector Gaussian broadcast channel
and uplink-downlink duality. IEEE Transactions on Information Theory, 49,
August 2003.
[327] A. Viterbi. Error bounds for convolutional codes and an asymptotically optimum
decoding algorithm. IEEE Transactions on Information Theory, 13(2):260–269,
April 1967.
[328] A. J. Viterbi. CDMA: Principles of Spread Spectrum Communication. Addison-
Wesley, 1995.
[329] M. Vu and V. Tarokh. Scaling laws of single-hop cognitive networks. IEEE
Transactions on Wireless Communications, 8(8):4089–4097, Aug. 2009.
[330] Mai Vu and A. Paulraj. MIMO wireless linear precoding. IEEE Signal Processing
Magazine, 24(5):86–105, Sept. 2007.
[331] Branka Vucetic and Jinhong Yuan. Space-Time Coding. John Wiley & Sons,
2003.
[332] J. W. Wallace, Chan Chen, and M. A. Jensen. Key generation exploiting MIMO
channel evolution: Algorithms and theoretical limits. European Conference on
Antennas and Propagation (EuCAP), pages 1499–1503, March 2009.
[333] D. Wang and J. Zhang. Timing synchronization for MIMO OFDM WLAN sys-
tems. IEEE Wireless Communications and Networking Conference, pages 1177–
1182, March 2007.
[334] X. Wang. Volumes of generalized unit balls. Mathematics Magazine, 78(5):390–
395, Dec. 2005.
[335] X. Wang and H. V. Poor. Space-time multiuser detection in multipath CDMA
channels. IEEE Transactions on Signal Processing, 47(9):2356–2374, Sept. 1999.
[336] J. Ward and R. T. Compton, Jr. Improving the performance of a slotted ALOHA
packet radio network with an adaptive array. IEEE Transactions on Communi-
cations, 40(2):292–300, Feb. 1992.
[337] J. Ward and R. T. Compton, Jr. High throughput slotted ALOHA packet
radio networks with adaptive arrays. IEEE Transactions on Communications,
41(3):460–470, March 1993.
[338] W. W. Ward. The NOMAC and Rake systems. Lincoln Laboratory Journal,
5(3):351–366, 1992.
[339] M. Wax and T. Kailath. Detection of signals by information theoretic criteria.
IEEE Transactions on Acoustics, Speech and Signal Processing, 33(2):387–392,
April 1985.
[340] S. Weber, X. Yang, J. G. Andrews, and G. de Veciana. Transmission capac-
ity of wireless ad-hoc networks with outage constraints. IEEE Transactions on
Information Theory, pages 4091–4102, Dec. 2005.
[341] H. Weingarten, Y. Steinberg, and S. Shamai (Shitz). The capacity region of the
Gaussian multiple-input-multiple-output broadcast channel. IEEE Transactions
on Information Theory, 52(9), Sept. 2006.
[342] E. Weinstein and A. J. Weiss. A general class of lower bounds in parameter
estimation. IEEE Transactions on Information Theory, 34:338–342, March 1988.
[343] E. T. Whittaker and G. N. Watson. A Course of Modern Analysis. Cambridge
University Press, Cambridge, 1927.
[344] B. Widrow and S. S. Haykin. Least-Mean-Square Adaptive Filters. John Wiley
& Sons, Hoboken, New Jersey, 2003.
[345] B. Widrow and M. E. Hoff, Jr. Adaptive switching circuits. Convention Record
of IRE WESCON, 4:96–104, 1960.
[346] Norbert Wiener. Extrapolation, Interpolation, and Smoothing of Stationary Time
Series. John Wiley & Sons, New York, 1949.
[347] E. P. Wigner. Characteristic vectors of bordered matrices with infinite dimen-
sions. Annals of Mathematics, 62(3):548, Nov. 1955.
[348] William H. Tranter et al., editors. The Best of the Best: Fifty Years of Commu-
nications and Networking Research. John Wiley & Sons, Hoboken, New Jersey,
2007.
[349] J. Winters. On the capacity of radio communication systems with diversity in a
Rayleigh fading environment. IEEE Journal on Selected Areas in Communica-
tions, 5(5):871–878, June 1987.
[350] J. H. Winters, J. Salz, and R. D. Gitlin. The capacity of wireless communication
systems can be substantially increased by the use of antenna diversity. Proceed-
ings of the 1st International Conference on Universal Personal Communications,
pages 02.01/1–02.01/5, 1992.
[351] J. H. Winters, J. Salz, and R. D. Gitlin. The impact of antenna diversity on the
capacity of wireless communication systems. IEEE Transactions on Communi-
cations, 42(234):1740–1751, 1994.
[352] W. Wirtinger. Zur formalen Theorie der Functionen von mehr complexen
Veränderlichen. Mathematische Annalen, 97(1):357–375, 1927.
[353] P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. A. Valenzuela. V-BLAST:
an architecture for realizing very high data rates over the rich-scattering wireless
channel. Proceedings of the URSI International Symposium on Signals, Systems,
and Electronics (ISSSE), pages 295–300, 1998.
Index

associate, 18
asymptotic eigenvalue densities, 270
atom, 270
attenuation
    line-of-sight, 143
auto-correlation, 87
bound
    spectral efficiency, 243
BPSK, 120
Branly, Edouard Eugene Desire, 3
Braun, Karl Ferdinand, 3
brick-wall filter, 138