
Adaptive Wireless Communications

Adopting a balanced mix of theory, algorithms, and practical design issues, this comprehensive volume explores cutting-edge applications in adaptive wireless communications, and the implications these techniques have for future wireless network performance. Presenting practical concerns in the context of different strands from information theory, parameter estimation theory, array processing, and wireless communications, the authors present a complete picture of the field. Topics covered include advanced multiple-antenna adaptive processing, ad hoc networking, MIMO, MAC protocols, space-time coding, cellular networks, and cognitive radio, with the significance and effects of both internal and external interference a recurrent theme throughout.

A broad, self-contained technical introduction to all the necessary mathematics, statistics, estimation theory, and information theory is included, and topics are accompanied by a range of engaging end-of-chapter problems. With solutions available online, this is the perfect self-study resource for students of advanced wireless systems and wireless industry professionals.

Daniel W. Bliss is an Associate Professor in the School of Electrical, Computer and Energy Engineering at Arizona State University.

Siddhartan Govindasamy is an Assistant Professor of Electrical and Computer Engineering at Franklin W. Olin College of Engineering, Massachusetts.
“An excellent and well-written book. This book is a must for any wireless PHY system
engineer.”
Vahid Tarokh, Harvard University

“Great book! Fills a gap in the wireless communication textbook arena with its com-
prehensive signal-processing focus. It does a nice job of handling the breadth-vs-depth
trade-off in a topic-oriented textbook, and is perfect for beginning graduate students or
practicing engineers who want the best of both worlds: broad coverage of both old and
new topics, combined with mathematical fundamentals and detailed derivations. It provides a great single-reference launching point for readers who want to dive into wireless
communications research and development, particularly those involving multi-antenna
applications. It will become a standard prerequisite for all my graduate students.”
A. Lee Swindlehurst, University of California, Irvine
Adaptive Wireless
Communications
MIMO Channels and Networks

DANIEL W. BLISS
Arizona State University

SIDDHARTAN GOVINDASAMY
Franklin W. Olin College of Engineering, Massachusetts
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town,
Singapore, São Paulo, Delhi, Mexico City
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9781107033207


© Dan Bliss and Siddhartan Govindasamy 2013

Dan Bliss’s contributions are a work of the United States Government and
are not protected by copyright in the United States.

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2013

Printed and bound in the United Kingdom by the MPG Books Group

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data


Bliss, Daniel W., 1966–
Adaptive wireless communications : MIMO channels and networks /
Daniel W. Bliss, Siddhartan Govindasamy.
pages cm
Includes bibliographical references and index.
ISBN 978-1-107-03320-7 (hardback)
1. MIMO systems. 2. Wireless communication systems.
3. Adaptive signal processing. I. Govindasamy, Siddhartan. II. Title.
TK5103.4836.B54 2013
621.384 – dc23 2012049257

ISBN 978-1-107-03320-7 Hardback

Additional resources for this publication at www.cambridge.org/bliss

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
The views expressed are those of the author (D. W. B.) and do not reflect the
official policy or position of the Department of Defense or the U.S. Government.
Contents

Preface page xvii


Acknowledgments xviii

1 History 1
1.1 Development of electromagnetics 1
1.2 Early wireless communications 2
1.3 Developing communication theory 5
1.4 Television broadcast 6
1.5 Modern communications advances 7
1.5.1 Early packet-radio networks 9
1.5.2 Wireless local-area networks 10

2 Notational and mathematical preliminaries 12


2.1 Notation 12
2.1.1 Table of symbols 12
2.1.2 Scalars 12
2.1.3 Vectors and matrices 14
2.1.4 Vector products 16
2.1.5 Matrix products 17
2.2 Norms, traces, and determinants 19
2.2.1 Norm 19
2.2.2 Trace 19
2.2.3 Determinants 19
2.3 Matrix decompositions 21
2.3.1 Eigen analysis 21
2.3.2 Eigenvalues of 2 × 2 Hermitian matrix 22
2.3.3 Singular-value decomposition 22
2.3.4 QR decomposition 23
2.3.5 Matrix subspaces 24
2.4 Special matrix forms 26
2.4.1 Element shifted symmetries 26
2.4.2 Eigenvalues of low-rank matrices 26
2.5 Matrix inversion 27
2.5.1 Inversion of matrix sum 28
2.6 Useful matrix approximations 28


2.6.1 Log determinant of identity plus small-valued matrix 28
2.6.2 Hermitian matrix raised to large power 29
2.7 Real derivatives of multivariate expressions 29
2.7.1 Derivative with respect to real vectors 30
2.8 Complex derivatives 33
2.8.1 Cauchy–Riemann equations 34
2.8.2 Wirtinger calculus for complex variables 35
2.8.3 Multivariate Wirtinger calculus 38
2.8.4 Complex gradient 38
2.9 Integration over complex variables 39
2.9.1 Path and contour integrals 40
2.9.2 Volume integrals 42
2.10 Fourier transform 44
2.10.1 Useful Fourier relationships 45
2.10.2 Discrete Fourier transform 46
2.11 Laplace transform 48
2.12 Constrained optimization 48
2.12.1 Equality constraints 48
2.12.2 Inequality constraints 51
2.12.3 Calculus of variations 53
2.13 Order of growth notation 57
2.14 Special functions 58
2.14.1 Gamma function 58
2.14.2 Hypergeometric series 59
2.14.3 Beta function 61
2.14.4 Lambert W function 61
2.14.5 Bessel functions 62
2.14.6 Error function 63
2.14.7 Gaussian Q-function 63
2.14.8 Marcum Q-function 63
Problems 63

3 Probability and statistics 66


3.1 Probability 66
3.1.1 Bayes’ theorem 66
3.1.2 Change of variables 67
3.1.3 Central moments of a distribution 68
3.1.4 Noncentral moments of a distribution 69
3.1.5 Characteristic function 70
3.1.6 Cumulants of distributions 70
3.1.7 Multivariate probability distributions 70
3.1.8 Gaussian distribution 71
3.1.9 Rayleigh distribution 72
3.1.10 Exponential distribution 73
3.1.11 Central χ2 distribution 73


3.1.12 Noncentral χ2 distribution 75
3.1.13 F distribution 76
3.1.14 Rician distribution 77
3.1.15 Nakagami distribution 78
3.1.16 Poisson distribution 78
3.1.17 Beta distribution 79
3.1.18 Logarithmically normal distribution 79
3.1.19 Sum of random variables 80
3.1.20 Product of Gaussians 81
3.2 Convergence of random variables 81
3.2.1 Convergence modes of random variables 82
3.2.2 Relationship between modes of convergence 83
3.3 Random processes 86
3.3.1 Wide-sense stationary random processes 88
3.3.2 Action of linear-time-invariant systems on wide-sense
stationary random processes 88
3.3.3 White-noise processes 89
3.4 Poisson processes 91
3.5 Eigenvalue distributions of finite Wishart matrices 92
3.6 Asymptotic eigenvalue distributions of Wishart matrices 92
3.6.1 Marcenko–Pastur theorem 94
3.7 Estimation and detection in additive Gaussian noise 95
3.7.1 Estimation in additive Gaussian noise 95
3.7.2 Detection in additive Gaussian noise 96
3.7.3 Receiver operating characteristics 98
3.8 Cramer–Rao parameter estimation bound 99
3.8.1 Real parameter formulation 99
3.8.2 Real multivariate Cramer–Rao bound 102
3.8.3 Cramer–Rao bound for complex parameters 105
Problems 116

4 Wireless communications fundamentals 118


4.1 Communication stack 118
4.2 Reference digital radio link 119
4.2.1 Wireless channel 122
4.2.2 Thermal noise 123
4.3 Cellular networks 125
4.3.1 Frequency reuse 127
4.3.2 Multiple access in cells 128
4.4 Ad hoc wireless networks 132
4.4.1 Achievable data rates in ad hoc wireless networks 134
4.5 Sampled signals 137
Problems 138
5 Simple channels 141


5.1 Antennas 141
5.2 Line-of-sight attenuation 143
5.2.1 Gain versus effective area 144
5.2.2 Beamwidth 147
5.3 Channel capacity 149
5.3.1 Geometric interpretation 149
5.3.2 Mutual information 156
5.3.3 Additive Gaussian noise channel 159
5.3.4 Additive Gaussian noise channel with state 162
5.4 Energy per bit 165
Problems 168

6 Antenna arrays 170


6.1 Wavefront 170
6.1.1 Geometric interpretation 172
6.1.2 Steering vector 173
6.2 Array beam pattern 174
6.2.1 Beam pattern in a plane 176
6.3 Linear arrays 179
6.3.1 Beam pattern symmetry for linear arrays 182
6.3.2 Fourier transform interpretation 182
6.3.3 Continuous Fourier transform approximation 184
6.4 Sparse arrays 186
6.4.1 Sparse arrays on a regular lattice 186
6.4.2 Irregular random sparse arrays 188
6.5 Polarization-diverse arrays 196
6.5.1 Polarization formulation 196
Problems 198

7 Angle-of-arrival estimation 201


7.1 Maximum-likelihood angle estimation with known reference 203
7.2 Maximum-likelihood angle estimation with unknown signal 205
7.3 Beamscan 205
7.4 Minimum-variance distortionless response 207
7.5 MuSiC 208
7.6 Example comparison of spatial energy estimators 210
7.7 Local angle-estimation performance bounds 211
7.7.1 Cramer–Rao bound of angle estimation 211
7.7.2 Cramer–Rao bound: signal in the mean 212
7.7.3 Cramer–Rao bound: random signal 214
7.8 Threshold estimation 218
7.8.1 Types of transmitted signals 219
7.8.2 Known reference signal test statistic 219
7.8.3 Independent Rician random variables 221
7.8.4 Correlated Rician random variables 226


7.8.5 Unknown complex Gaussian signal 231
7.9 Vector sensor 235
Problems 237

8 MIMO channel 239


8.1 Flat-fading channel 239
8.2 Interference 241
8.2.1 Maximizing entropy 242
8.3 Flat-fading MIMO capacity 243
8.3.1 Channel-state information at the transmitter 245
8.3.2 Informed-transmitter (IT) capacity 247
8.3.3 Uninformed-transmitter (UT) capacity 252
8.3.4 Capacity ratio, cIT/cUT 256
8.4 Frequency-selective channels 258
8.5 2 × 2 Line-of-sight channel 259
8.6 Stochastic channel models 264
8.6.1 Spatially uncorrelated Gaussian channel model 265
8.6.2 Spatially correlated Gaussian channel model 266
8.7 Large channel matrix capacity 270
8.7.1 Large-dimension Gaussian probability density 270
8.7.2 Uninformed transmitter spectral efficiency bound 271
8.7.3 Informed transmitter capacity 272
8.8 Outage capacity 275
8.9 SNR distributions 275
8.9.1 Total power 277
8.9.2 Fractional loss 279
8.10 Channel estimation 281
8.10.1 Cramer–Rao bound 283
8.11 Estimated versus average SNR 286
8.11.1 Average SNR 287
8.11.2 Estimated SNR 287
8.11.3 MIMO capacity for estimated SNR in block fading 289
8.11.4 Interpretation of various capacities 290
8.12 Channel-state information at transmitter 291
8.12.1 Reciprocity 291
8.12.2 Channel estimation feedback 292
Problems 293

9 Spatially adaptive receivers 295


9.1 Adaptive spectral filtering 298
9.1.1 Discrete Wiener filter 298
9.2 Adaptive spatial processing 300
9.2.1 Spatial matched filter 301
9.2.2 Minimum-interference spatial beamforming 303
9.2.3 MMSE spatial processing 311


9.2.4 Maximum SINR 313
9.3 SNR loss performance comparison 315
9.3.1 Minimum-interference beamformer 317
9.3.2 MMSE beamformer 318
9.4 MIMO performance bounds of suboptimal adaptive receivers 322
9.4.1 Receiver beamformer channel 323
9.5 Iterative receivers 328
9.5.1 Recursive least squares (RLS) 328
9.5.2 Least mean squares (LMS) 331
9.6 Multiple-antenna multiuser detector 333
9.6.1 Maximum-likelihood demodulation 333
9.7 Covariance matrix conditioning 337
Problems 339

10 Dispersive and doubly dispersive channels 341


10.1 Discretely sampled channel issues 342
10.2 Noncommutative delay and Doppler operations 344
10.3 Effect of frequency-selective fading 345
10.4 Static frequency-selective channel model 348
10.5 Frequency-selective channel compensation 348
10.5.1 Eigenvalue distribution of space-time covariance matrix 349
10.5.2 Space-time adaptive processing 353
10.5.3 Orthogonal-frequency-division multiplexing 354
10.6 Doubly dispersive channel model 356
10.6.1 Doppler-domain representation 357
10.6.2 Eigenvalue distribution of space-time-frequency
covariance matrix 358
10.7 Space-time-frequency adaptive processing 361
10.7.1 Sparse space-time-frequency processing 362
Problems 362

11 Space-time coding 365


11.1 Rate diversity trade-off 365
11.1.1 Probability of error formulation 366
11.1.2 Outage probability formulation 367
11.2 Block codes 369
11.2.1 Alamouti’s code 371
11.2.2 Orthogonal space-time block codes 373
11.3 Performance criteria for space-time codes 374
11.4 Space-time trellis codes 376
11.4.1 Trellis-coded modulation 376
11.4.2 Space-time trellis coding 376
11.5 Bit-interleaved coded modulation 381
11.5.1 Single-antenna bit-interleaved coded modulation 381


11.5.2 Multiantenna bit-interleaved coded modulation 382
11.5.3 Space-time turbo codes 384
11.6 Direct modulation 385
11.7 Universal codes 386
11.8 Performance comparisons of space-time codes 388
11.9 Computations versus performance 388
Problems 390

12 2 × 2 Network 392
12.1 Introduction 392
12.2 Achievable rates of the 2 × 2 MIMO network 393
12.2.1 Single-antenna Gaussian interference channel 393
12.2.2 Achievable rates of the MIMO interference channel 397
12.3 Outer bounds of the capacity region of the Gaussian MIMO
interference channel 399
12.3.1 Outer bounds to the capacity region of the single-antenna
Gaussian interference channel 399
12.3.2 Outer bounds to the capacity region of the Gaussian
interference channel with multiple antennas 405
12.4 The 2 × 2 cognitive MIMO network 408
12.4.1 Non-cooperative primary link 409
12.4.2 Cooperative primary link 412
Problems 412

13 Cellular networks 414


13.1 Point-to-point links and networks 414
13.2 Multiple access and broadcast channels 414
13.3 Linear receivers in cellular networks with Rayleigh fading and
constant transmit powers 422
13.3.1 Link lengths in cellular networks 423
13.3.2 General network model 425
13.3.3 Antenna-selection receiver 425
13.3.4 Matched filter 427
13.3.5 Linear minimum-mean-square-error receiver 429
13.3.6 Laplacian of the interference 432
13.4 Linear receivers in cellular networks with power control 436
13.4.1 System model 437
13.4.2 Optimality of parallelized transmissions with link CSI 438
13.4.3 Asymptotic spectral efficiency of parallelized system 442
13.4.4 Application to power-controlled systems without out-of-cell interference 445
13.4.5 Monte Carlo simulations 446
13.5 Matched-filter receiver in power-controlled cellular networks 448
13.5.1 Application to power-controlled systems with out-of-cell interference 449
13.6 Summary 467
Problems 467

14 Ad hoc networks 470


14.1 Introduction 470
14.1.1 Capacity scaling laws of ad hoc wireless networks 470
14.2 Multiantenna links in ad hoc wireless networks 475
14.2.1 Asymptotic spectral efficiency of ad hoc wireless networks
with limited transmit channel-state information and
minimum-mean-square-error (MMSE) receivers 476
14.2.2 Spatially distributed network model 478
14.2.3 Asymptotic spectral efficiency without transmit
channel-state information 480
14.2.4 Maximum-signal-to-leakage-plus-noise ratio receiver 482
14.3 Linear receiver structures in spatially distributed networks 484
14.3.1 Linear MMSE receivers in Poisson networks 484
14.3.2 Laplacian of the interference in Poisson networks and
matched-filter and antenna-selection receivers 485
14.4 Interference alignment 487
Problems 491

15 Medium-access-control protocols 495


15.1 The need for medium-access control 495
15.2 The ALOHA protocol 496
15.3 Carrier-sense multiple access (CSMA) 498
15.3.1 CSMA with collision avoidance (CSMA/CA) 499
15.4 Non-space-division multiple-access protocols 504
15.5 Space-division multiple-access (SDMA) protocols 504
15.5.1 Introduction 504
15.5.2 A simple SDMA protocol 506
15.5.3 SPACE-MAC 507
15.5.4 The reciprocity assumption 509
15.5.5 Ward protocol 509
15.5.6 Summary of some existing SDMA protocols 513
Problems 518

16 Cognitive radios 520


16.1 Cognitive radio channel 521
16.1.1 Cooperative cognitive links 522
16.2 Cognitive spectral scavenging 522
16.2.1 Orthogonal-frequency-division multiple access 523
16.2.2 Game-theoretical analysis 523
16.3 Legacy signal detection 524


16.3.1 Known training sequence 524
16.3.2 Single-antenna signal energy detection 524
16.3.3 Multiple-antenna legacy signal detection 534
16.4 Optimizing spectral efficiency to minimize network interference 538
16.4.1 Optimal SISO spectral efficiency 540
16.4.2 Optimal MIMO spectral efficiency 542
Problems 545

17 Multiple-antenna acquisition and synchronization 547


17.1 Flat-fading MIMO model 548
17.2 Flat-fading MIMO delay-estimation bound 548
17.3 Synchronization as hypothesis testing 550
17.3.1 Motivations for test statistic approaches 551
17.4 Test statistics for flat-fading channels 552
17.4.1 Correlation 552
17.4.2 MMSE beamformer 553
17.4.3 Generalized-likelihood ratio test 554
17.4.4 Spatial invariance 556
17.4.5 Comparison of performance 557
Problems 557

18 Practical issues 559


18.1 Antennas 559
18.1.1 Electrically small antennas 560
18.1.2 Crossed polarimetric array 560
18.2 Signal and noise model errors 560
18.3 Noise figure 561
18.4 Local oscillators 561
18.4.1 Accuracy 562
18.4.2 Phase noise 562
18.5 Dynamic range 563
18.5.1 Quantization 564
18.5.2 Finite precision 565
18.5.3 Analog nonlinearities 567
18.5.4 Adaptive gain control 568
18.5.5 Spurs 568
18.6 Power consumption 568

References 569
Index 589
Preface

In writing this text, we hope to achieve multiple goals. Firstly, we hope to develop a textbook that is useful as a reference for graduate classes, or as a supplement to advanced undergraduate classes, investigating advanced wireless communications. These topics include adaptive antenna processing, multiple-input multiple-output (MIMO) communications, and wireless networks. Throughout the text, there is a recurring theme of understanding and mitigating both internal and external interference. In addressing these areas of investigation, we explore concepts in information theory, estimation theory, signal processing, and implementation issues as applicable. We attempt to provide a development covering these topics in a reasonably organized fashion. While not always possible, we attempt to be consistent in notation across the text. In addition, we provide problem sets that allow students to investigate these topics more deeply. Secondly, we attempt to organize the topics addressed so that this text will be useful as a reference. To the extent possible, each chapter is reasonably self-contained, although some familiarity with the topic area is assumed. To aid the reader, reviews of many of the mathematical tools needed within the text are collected in Chapters 2 and 3. In addition, an overview of the basics of communications theory is provided in Chapters 4 and 5. Finally, in discussing these topics, we attempt to address a wide range of perspectives appropriate for the serious student of the area. Topics range from information-theoretic bounds, to signal-processing approaches, to practical implementation constraints.
While there are many wonderful texts (and here we list only a subset) that address many of the topics of wireless communications [355, 280, 287, 314, 115, 324, 251, 255, 203], networks [100, 62], signal processing [275, 297, 238, 220, 204], array processing [294, 223, 205, 312, 248, 189], MIMO communications [247, 331, 160, 45, 22, 183, 84], information theory [68, 202, 212], and estimation theory [312, 172, 297], and the serious researcher may wish to collect many of these texts, we hope that the particular collection and presentation of topics here is uniquely useful to researchers in advanced communications.
Acknowledgments

I would like to thank and remember Professor David Staelin of the Massachusetts Institute of Technology, whose interests and insights encouraged the authors to
work together. I would like to thank my coauthor, who worked tirelessly with
me to write this text. I would like to particularly thank Keith Forsythe of MIT
Lincoln Laboratory, from whom I learned an immense amount over the years.
A number of the concepts discussed in this text were developed by him or in
collaboration with him. I will always be in debt for all that I learned from him.
I would also like to thank (or blame) Jim Ward of MIT Lincoln Laboratory who
encouraged me to write this text. Actually, I would like to thank everyone in the
Advanced Sensor Techniques Group at MIT Lincoln Laboratory. I have learned
something from every one of you.
We thank the many individuals who have contributed comments and suggestions for the book: Pat Bidigare, Nick Chang, Glenn Fawcett, Jason Franz, Alan Fenn, Anatoly Goldin, Tim Hancock, Gary Hatke, Yichuan Hu, Scott Johnson, Josh Kantor, Nick Kaminski, Paul Kolodzy, Shawn Kraut, Raymond Louie, Adam Margetts, Matt McKay, Cory Myers, Peter Parker, Thomas Stahlbuhk, Vahid Tarokh, Gary Whipple, and Derek Young. In particular, we thank Bruce McGuffin and Ameya Agaskar, who provided a significant number of comments.
We would like to thank Dorothy Ryan for all her many helpful comments. To the
folks at the Atomic Bean Cafe off of Harvard Square, thanks for all the espressos
and for letting me spend many, many, many hours writing there.
Finally, I would like to thank my family for their support. To my wife Nadya and daughter Coco: you may see more of me now. You can decide if that is good or bad.

Dan Bliss
Cambridge, MA
I would like to thank and remember Professor David H. Staelin, formerly of the Massachusetts Institute of Technology, for his inspiration, guidance, and mentorship, and in particular for introducing me to my coauthor.
I would like to thank my coauthor for his insight, mentorship and for being
the driving force behind this book.
I would also like to thank my former colleague at MIT, Danielle Hinton, in
particular for insightful discussions on multiantenna protocols. I am grateful
to my colleagues at Olin College including Brad Minch, Mark Somerville, and
Vin Manno, for their encouragement and general discussions, both technical and
non-technical. I would also like to thank my students and former students at
Olin College, in particular Yifan Sun, Annie Martin, Rachel Nancollas, Katarina
Miller, Jacob Miller, Jeff Hwang, Sean Shi, Elena Koukina, Yifei Feng, Rui Wang,
Raghu Rangan, Tom Lamar, Avinash Uttamchandani, Ashley Lloyd, Junjie Zhu,
and Chloe Egthebas for their direct and indirect contributions to this work, and
in particular for helping me refine my presentation of some of the material that
has made its way into the book.
Finally, I would like to thank Alo, Antariksh, my parents, parents-in-law, siblings, and the rest of my family for their patience and tireless support.

Siddhartan Govindasamy
Natick, MA
1 History

For better or worse, wireless communications have become integrated into many
aspects of our daily lives. When communication systems work well, they almost
magically enable us to access information from distant, even remote, sources. If
one were to take a modern “smart” phone a couple of hundred years into the past,
one would notice a couple of things very quickly. First, most of the capability of
the phone would be lost because a significant portion of the phone’s capabilities
are based upon access to a communications network. Second, being burned at
the stake as a witch can make for a very bad day.
There are many texts that present the history of wireless communications in
great detail, for example in References [186, 48, 146, 304, 61]. Many of the papers
of historical interest are reprinted in Reference [348]. Because of the rich history
of wireless communications, a comprehensive discussion would require multiple
texts on each area. Here we will present an abridged introduction to the history
of wireless communications, focusing on those topics more closely aligned with
the technical topics addressed later in the text, and we will admittedly miss
numerous important contributors and events.
The early history of wireless communications covers development in basic physics, device physics and component engineering, information theory, and system development. Each of these aspects is important, and modern communication systems depend upon all of them. Modern research continues to develop and refine components and information theory. Economics and politics are an important part of the history of communications, but they are largely ignored here.

1.1 Development of electromagnetics

While he was probably not the first to make the observation that there is a relationship between magnetism and electric current, the Danish physicist Hans Christian Ørsted observed this relationship in 1820 [239] and ignited investigation across Europe. Most famously, he demonstrated that current flowing in a wire would cause a compass to change directions. Partly motivated by Ørsted's results, the English physicist and chemist Michael Faraday made significant advancements in the experimental understanding of electromagnetics [304] in the early 1800s. Importantly for our purposes, he showed that changing current in
one coil could induce current in another remote coil. While this inductive coupling is not the same as the electromagnetic waves used in most modern wireless communications, it is the first step down that path. The Scottish physicist James Clerk Maxwell made amazing and rich contributions to a number of areas of physics. Because of his contributions in the area of electromagnetics [211], the fundamental description of electromagnetics bears his name. While Maxwell might not immediately recognize them in this form, Maxwell's equations in the international system of units (SI) [290, 178] are the fundamental representation of electromagnetics and are given by
∇ · d = ρ
∇ · b = 0
∇ × e = −∂b/∂t
∇ × h = j + ∂d/∂t,   (1.1)

where ∇ indicates a vector of spatial derivatives, · is the inner product, × is the cross product, t is time, ρ is the charge density, j is the current density vector, d is the electric displacement vector, e is the electric field vector, b is the magnetic flux density vector, and h is the magnetic field vector. The electric displacement and electric field are related by

d = εe
b = μh,   (1.2)

where ε is the permittivity and μ is the permeability of the medium. These are the underpinnings of all electromagnetic waves and thus modern communications.
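As a brief numerical aside (a sketch in Python; the vacuum constants are the standard SI values typed in by hand, not taken from the text), plane-wave solutions of Maxwell's equations propagate at speed c = 1/√(με), which in vacuum recovers the speed of light, and λ = c/f then gives the carrier wavelengths of familiar wireless bands:

```python
import math

# Standard SI vacuum constants (permeability in H/m, permittivity in F/m)
mu_0 = 4e-7 * math.pi           # μ, vacuum permeability
eps_0 = 8.8541878128e-12        # ε, vacuum permittivity

# Plane-wave solutions of Maxwell's equations propagate at c = 1/sqrt(μemployees ε)
c = 1.0 / math.sqrt(mu_0 * eps_0)
print(f"c = {c:.6e} m/s")       # ~2.998e8 m/s, the speed of light

# Carrier wavelength λ = c/f for a few familiar wireless bands
for name, f_hz in [("AM broadcast (1 MHz)", 1e6),
                   ("FM broadcast (100 MHz)", 100e6),
                   ("Wi-Fi (2.4 GHz)", 2.4e9)]:
    print(f"{name}: wavelength = {c / f_hz:.3f} m")
```

The half-wavelength scale set by λ = c/f is what determines practical antenna and array dimensions in the systems discussed later in the text.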
In 1888, the German physicist Heinrich Rudolf Hertz convincingly demonstrated the existence of the electromagnetic waves predicted by Maxwell [144, 178]. To demonstrate the electromagnetic waves, he employed a spark-gap transmitter. At the receiver, the electromagnetic waves coupled into a loop with a very small gap across which a spark would appear. The spark-gap transmitter, with various modifications, was a standard tool for wireless communications research and systems for a number of decades that followed.

1.2 Early wireless communications

In the late 1800s, significant and rapid advances were made. Given the proliferation of wireless technologies and the penetration of these technologies into every area of our lives, it is remarkable that before the late 1800s little was known about even the physics of electromagnetics. Over the years, there have been various debates over the primacy of the invention of wireless communications. Who invented wireless communications often comes down to a question of semantics. How many of the components do you need before you call it a radio? As is often
true in science and engineering, it is clear that a large number of individuals performed research in the area of wireless communications or, as it was often called, wireless telegraphy. The following is an incomplete list of important contributors.

In 1872, before Hertz's demonstration, a patent was issued to the American¹ inventor and dentist Mahlon Loomis for wireless telegraphy [193]. While his system reportedly worked with some apparent reliability issues, his contributions were not widely accepted during his life. This lack of acceptance was likely partly due to his inability to place his results in the scientific context of the time.
In 1886, the American physicist Amos Emerson Dolbear received a patent for a wireless communication system [82]. This patent later became a barrier to the enforcement of Guglielmo Marconi's patents on wireless communications in the United States, until the Marconi Company purchased Dolbear's patent. It is worth noting that this work also came before Hertz's demonstration.
In 1890, the French physicist Édouard Eugène Désiré Branly developed an important device used to detect electromagnetic waves. The so-called "coherer" employed a tube containing metal filings filling a gap between two electrodes and exploited a peculiar phenomenon of these filings [279]. When exposed to radio-frequency signals, the filings would fuse or cling together, thus reducing the resistance across the electrodes. The British physicist Sir Oliver Joseph Lodge refined the coherer by adding a "trembler" or "decoherer" that mechanically disrupted the fused connections. Many of the early experiments in wireless communications employed variants of the coherer.
The Serbian-born American engineer Nikola Tesla was one of those larger-than-life characters. He made significant contributions to a number of areas of engineering, but with regard to our interests, he received a patent for wireless transmission of power in 1890 [309] and demonstrated electromagnetic transfer of energy in 1893 [310]. Tesla is rightfully considered one of the significant contributors to the invention of wireless communications.
The Bengal-born Indian scientist Jagadish Chandra Bose contributed significantly to a number of areas of science and engineering. He was one of the early researchers in wireless communication and developed an improved coherer. In 1895, he demonstrated radio communication with a rather dramatic flair [107]. By using a wireless communication link, he remotely set off a small explosive that rang a bell. His improved coherer was a significant contribution to wireless communications. His version of the coherer replaced the metal filings with a metal electrode in contact with a thin layer of oil that was floating on a small pool of mercury. When exposed to radio-frequency signals, the conductivity across the oil film would change. Marconi used a similar coherer for his radio system.
The German physicist Karl Ferdinand Braun developed a number of important
technologies that contributed to the usefulness of wireless communication. He
developed tuned circuits for radio systems, the cat’s whisker detector (really an

¹ Throughout this chapter we employ the common, if imprecise, usage of "American" to indicate a citizen of the United States of America.
early diode), and directional antenna arrays. In 1909, he shared the Nobel Prize
in physics with Guglielmo Marconi for his contributions.
The Russian physicist Alexander Stepanovich Popov presented results on his version of a coherer to the Russian Physical and Chemical Society on May 7th, 1895 [304]. He demonstrated links that transmitted radio waves between buildings. As an indication of the importance of this technology to society, in the Russian Federation, May 7th is celebrated as Radio Day.
The Italian engineer Guglielmo Marconi began research in wireless commu-
nications in 1895 [304] and pursued a sustained, intense, and eventually well-
funded research and development program for many years to follow. He received
the Nobel Prize in physics (with Karl Ferdinand Braun) in 1909 for his contri-
butions to the development of wireless radios [304]. While he is not the inventor
of radio, as is sometimes suggested, his position as principal developer cannot
be dismissed. His research, development, and resulting company provided the
impetus to the commercialization of wireless communications. In 1896 Marconi
moved to England, and during that and the following year he provided a number
of demonstrations of the technology. In 1901, he demonstrated a transatlantic
wireless link, and in 1907 he established a regular transatlantic radio service.
A somewhat amusing (or annoying if you were Marconi) public demonstration
of the effects of potential interference in wireless communications was provided
in 1903 by British magician and inventor Nevil Maskelyne [146]. Maskelyne was
annoyed with Marconi’s broad patents and his claims of security in his wireless
system. During a public demonstration of Marconi’s system for the Royal In-
stitution, Maskelyne repeatedly transmitted the Morse code signal "rats" and
other insulting comments, which were received during the demonstration by the
system that was supposedly immune to such interference. Previously, in 1902,
Maskelyne had developed a signal interception system that was used to receive
signals from Marconi’s ship-to-shore wireless system. Marconi had claimed his
system was immune to such interception because of the precise frequency tuning
required for reception.
In the first few decades of the twentieth century, wireless communication
quickly evolved from a technical curiosity to useful technology. An important
technology that enabled widespread use of wireless communication was amplifi-
cation. The triode vacuum-tube amplifier was developed by American engineer
Lee de Forest. He filed a patent for the triode (originally called the de For-
est valve) in 1907 [95]. The triode enabled increased power at transmitters and
increased sensitivity at receivers. It was the fundamental technology until the
development of the transistor decades later.
In the late 1910s, a number of experimental radio broadcast stations were
constructed [304]. In the early 1920s, the number of radio broadcast stations
exploded, and wireless communications began its integration into everyday life.
During the Second World War, the concept of tactical communications under-
went dramatic development. The radios became small enough and sufficiently
robust that a single soldier could carry them. It became common for various
military organizations to make wireless communications available to relatively
small groups of soldiers, allowing the soldiers to operate with greater effective-
ness and with access to external support. By the end of the Second World War,
American soldiers had access to handheld "handie-talkies" [265], such as the
Motorola SCR-536 or BC-611, that are recognizable as the technical forebears
of modern handheld communications devices.

1.3 Developing communication theory

In 1900, Canadian-born American engineer Reginald Aubrey Fessenden [304]
employed a high-frequency spark-gap transmitter to transmit an audio signal by
using amplitude modulation (AM). Fessenden also developed the concept of the
heterodyne receiver at about the same time, although the device technology
available at that time did not support its use. The heterodyne receiver would
multiply the signal at the carrier frequency by a tone from a local oscillator, so
that the beat frequency was within the audible frequency range.
In 1918, American engineer Edwin Howard Armstrong extended the hetero-
dyne concept, denoted the superheterodyne receiver, by having the mixed signal
beat to a fixed intermediate frequency. A second stage then demodulates the in-
termediate frequency signal down to the audible frequency range. This approach
and similar variants have become the standard for most modern communi-
cations. In 1933, Armstrong also patented another important communications
concept, frequency modulation (FM).
If one had to pick the greatest single contribution to communications, most
researchers would probably identify the formation of information theory [284],
published in 1948 by American mathematician and engineer Claude Elwood
Shannon. In his work, Shannon developed the limits on the capacity of a com-
munication channel in the presence of noise. It is worth noting the contributions
of American engineer Ralph Vinton Lyon Hartley, who developed bounds for the
number of levels per sample with which a communication system can commu-
nicate at a given voltage resolution [138]. Hartley’s results were a precursor to
Shannon’s results.
Shannon's observation that effectively error-free communication is theoretically
possible even in the presence of noise increased the motivation for the
development of error-correction codes. Examples of early block codes to com-
pensate for noise were developed by Swiss-born American mathematician and
physicist Marcel J. E. Golay [113] and American mathematician Richard Wesley
Hamming [134]. Over time, a large number of error-correcting codes and decod-
ing algorithms were developed. The best of these codes closely approached the
Shannon limit.
In work developed during the Second World War and published in 1949, American
mathematician, zoologist, and philosopher Norbert Wiener presented statistical
signal processing [346]. In his text, he developed the statistical signal processing
techniques that dominate signal processing to this day. Addressing a similar set of
technical issues, prolific Russian mathematician Andrey Nikolaevich Kolmogorov
published his results in 1941 [176].
Frequency-hopping modulation enables a narrowband system to operate over
a wider bandwidth by changing the carrier frequency as a function of time. A
variety of versions of frequency hopping were suggested over time, and the iden-
tity of the original developer is probably lost because of the
secrecy surrounding this modulation approach. However, in what must be consid-
ered a relatively unexpected source of contribution to communication modulation
technology, a frequency-hopping patent was given to Austrian-born American
actress Hedy Lamarr (filed as Hedy Kiesler Markey) and American composer
George Antheil [208]. The technology exploited a piano roll as a key to select
carrier frequencies of a frequency-hopping system.
As opposed to frequency hopping, direct-sequence spread spectrum (DSSS)
modulates a relatively narrowband signal with a wideband sequence. The re-
ceiver, knowing this sequence, is able to recover the original narrowband sig-
nal. This technology is exploited by code-division multiple-access (CDMA) ap-
proaches to enable the receiver to disentangle the signals sent from multiple users
at the same time and frequency. The origins of direct-sequence spread spectrum
are partly a question of semantics. An early German patent was given to Ger-
man engineers Paul Kotowski and Kurt Dannehl for a communications approach
that modulated voice with a rotating generator [278]. This approach has a loose
similarity to the digital spreading techniques used by modern communication
systems. In the early 1950s, for direct-sequence spread-spectrum communica-
tions, the noise modulation and correlation (NOMAC) system was developed
and demonstrated at Massachusetts Institute of Technology Lincoln Laboratory
[338]. In 1952, the first tests of the communication system were performed. The
system drew heavily from the doctoral dissertation of American engineer Paul
Eliot Green, Jr. [338], who was one of the significant contributors to the NO-
MAC system at Lincoln Laboratory. Because direct-sequence spread-spectrum
systems are spread over a relatively wide bandwidth, they can temporally re-
solve multipath more easily. Consequently, the received signal can suffer from
intersymbol interference. To compensate for this effect, in 1958, the concept of
the rake receiver, developed by American engineers Robert Price and Paul Eliot
Green, Jr. [254, 338], implemented channel equalization. During the late 1950s, the
ARC-50 radio was designed and tested [278]. Magnavox’s ARC-50 was an oper-
ational radio that is recognizable as a modern direct-sequence spread-spectrum
system.

1.4 Television broadcast

While television technology is not a focus of this text, its importance in the
development of wireless technology cannot be ignored. Given the initial success
of wireless data and then voice radio communications, it didn’t take long for
researchers to investigate the transmission of images. Because of the significant
increase in the amount of information in a video image compared to voice, it
took decades for a viable system to be developed. Early systems often involved
mechanically scanning devices.
German engineers Max Dieckmann and Rudolf Hell patented [81] an electri-
cally scanning tube receiver that is similar in concept to televisions used
for the following seventy years. Apparently, they had difficulty developing their
concept to the point of demonstration.
The largely self-taught American engineer Philo Taylor Farnsworth developed
concepts for the first electronically scanning television receiver (“image dissec-
tor”) that he conceived as a teenager and for which he filed a patent several
years later in 1927 [91]. In 1927, he also demonstrated the effectiveness of his
approach.
During a similar period of time, while working for Westinghouse Laboratories,
Russian-born American engineer Vladimir K. Zworykin filed a patent in 1923
for his version of a tube-based receiver [365]. However, the U.S. Patent Office
awarded primacy of the technology to Farnsworth. In 1939, RCA, the company
for which Zworykin worked, demonstrated a television at the New York World’s
Fair. Regular broadcasts soon began; these are often cited as the beginning of
the modern television broadcast era.

1.5 Modern communications advances

In the modern age of wireless communications, with a few notable exceptions,
it is more difficult to associate particular individuals with significant advances.
During this era, communications systems have become so complicated that large
numbers of individuals contribute to any given radio. It is sometimes easier to
identify individuals who made significant theoretical contributions. However, so
many significant contributions have been made that only a small subset is
identified here: those particularly salient to the discussions found
in the text.
While satellite communications are clearly wireless communications, this type
of communication is not emphasized in this text. Nonetheless, the importance
of satellite communications should not be underestimated. The first communi-
cation satellite, launched in 1958, was named signal communication by orbiting
relay equipment (SCORE) [74]. It was developed under an Advanced Research
Projects Agency (ARPA, later renamed Defense ARPA, or DARPA) program
and demonstrated the viability of these satellites. It used both prerecorded and
receive-and-forward messages that were broadcast on a shortwave signal.
Italian-born American engineer Andrew James Viterbi made numerous con-
tributions to wireless communications. However, his most famous contribution
is the development in 1967 of what is now called the Viterbi algorithm [327].
This algorithm specified the decoding of convolutional codes via a dynamic-
programming approach that tracks the most likely sequences of states. In some ways,
this development marked the beginning of the modern era of communications.
Both in terms of the improvement in receiver performance and the computational
requirements for the receiver, this is a modern algorithm.
From the time of Golay and Hamming, coding theory steadily advanced. By
the end of the 1980s, coding had reached a point of diminishing returns. Over
time, advances slowed and the focus of research was placed on implementations.
However, coding research was reinvigorated in 1993 by the development of turbo
codes by French engineers Claude Berrou and Alain Glavieux, and Thai engineer
Punya Thitimajshima [19]. The principal contribution of these codes is that they
enabled an iterative receiver that significantly improved performance.
One of the defining moments of the modern era was in 1973 when American
engineer Martin Cooper placed the first mobile phone call. His team at Motorola
was the first to develop and integrate their wireless cellular phone system into
the wired phone network [63]. It is somewhat amusing that Cooper’s first phone
call was made to a competing group of cellular engineers at Bell Laboratories.
While numerous researchers contributed significantly to this area of investi-
gation, the Spanish-born American engineer Sergio Verdu is typically identified
as principal developer of multiuser detection (MUD) [322]. In systems in which
multiple users are transmitting signals at the same time and frequency, under
certain conditions, a receiver, even with a single receive antenna, can disentan-
gle the multiple transmitted signals by exploiting the structural differences in
the waveforms of the various signals. Because of the computational complexity
and potential system advances of multiuser detection, this is a quintessentially
modern communications concept.
Numerous researchers suggested multiple-antenna communications systems in
a variety of contexts. These suggestions are both in the context of multiple
antennas at either receiver or transmitter, and in terms of multiuser systems.
For example, multiple-input multiple-output (MIMO) systems were suggested by
American engineers Jack H. Winters, Jack Salz, and Richard D. Gitlin [350, 351],
and by Indian-born American engineers Arogyaswami J. Paulraj and Thomas
Kailath [245]. Because he developed an entire system concept, the initial devel-
opment of MIMO communications concepts is typically attributed to the Amer-
ican engineer Gerard Joseph Foschini who, in his 1996 paper [99], described a
multiple-transmit and multiple-receive antenna communication system. In this
system, the data were encoded across the transmit antennas, and the receiver
disentangled these signals from the multiple transmit antennas.
In order to exploit MIMO communications links, some sort of mapping from
the information bits to the baseband signal must be used. These mappings
are typically called space-time codes. The trivial approach employs a standard
single-antenna modulation and then demultiplexes these signals among the mul-
tiple transmit antennas. However, this approach suffers from poor performance
because the required signal-to-noise ratio (SNR) is set by the SNR from the
transmitter with the weakest propagation. The most basic concept for an effec-
tive space-time code is the Alamouti block code. This code, patented by Iranian-
born American engineers Siavash M. Alamouti and Vahid Tarokh [7], is described
in Reference [8]. Tarokh and his colleagues extended these concepts to include
larger space-time block codes [305] and space-time trellis codes [307].

1.5.1 Early packet-radio networks


The ALOHA system (also known as ALOHAnet), which was developed by Nor-
man Abramson and colleagues at the University of Hawaii beginning in 1968
[3], was one of the first modern wireless networks. The system involved packet
radio transmissions using transceivers distributed on several islands in Hawaii.
The underlying communication protocol used in ALOHAnet is now commonly
known as the ALOHA protocol. The ALOHA protocol uses a simple and elegant
random-access technique well-suited to packet communications systems. This
protocol is described in more detail in Section 15.2. ALOHAnet was operated in
a star network configuration, where a central station routed packets from source
to destination terminals.
Ad hoc wireless networks, which are networks with no centralized control, re-
ceived attention from the United States Department of Defense (DoD) starting
in the early 1970s. The DoD was interested in such networks for their battlefield
survivability and the reduced infrastructure requirements in battlefields, among
other factors [103]. Through ARPA, the DoD developed several packet radio
communications systems such as the packet radio network (PRNet), whose de-
velopment began in 1972 [103]. This network was followed by a packet radio com-
munication system for computer communications networks in San Francisco in
1975. RADIONET, as it was called [169], differed from ALOHAnet in that it had
distributed control of the network management functions and used repeaters for
added reliability and increased coverage. Another notable feature of RADIONET
is its use of spread-spectrum signaling to improve robustness against jamming.
RADIONET was followed by several different efforts by DARPA through the
1970s and early 1980s to develop ad hoc wireless networks for military use.
Notable among these is the low-cost packet radio (LPR) system, which was an
outcome of DARPA’s Survivable Radio Networks (SURAN) program. LPR used
digitally controlled spread-spectrum radios with packet switching operations im-
plemented on an Intel 8086 microprocessor [103].
Another important development in the history of wireless networks is the devel-
opment of the wired Ethernet protocol by Robert Metcalfe and colleagues at the
Xerox Palo Alto Research Center (PARC) in the early to mid 1970s [214]. Eth-
ernet used carrier-sense-multiple-access (CSMA) technology (described in more
detail in Section 15.3) and by 1981 offered packet data communication rates of
10 Mbps in commercially available systems at relatively low cost. In contrast,
wireless packet networks offered data rates of only a few thousand bits per sec-
ond at reasonable costs and equipment size. The enormous data rates offered by
Ethernet at low cost perhaps reduced the interest in developing wireless net-
working technologies for commercial use.
Interest in wireless networks for commercial use increased after the U.S. Fed-
eral Communications Commission (FCC) established the industrial, scientific,
and medical (ISM) frequency bands for unlicensed use in the United States in
1985. The ISM bands are defined in Section 15.247 of the Federal Communica-
tions Commission rules.
for unlicensed use sparked a renewed interest in developing wireless networking
protocols [103].
Other major developments in the late 1980s and 1990s that increased interest
in wireless networks were the growing use of portable computers, the inter-
net, and significant reductions in hardware costs. Since portable-computer users
wanted to access the internet while remaining mobile, wireless networking became
essential.

1.5.2 Wireless local-area networks


In 1997, what may be considered the grand experiment in wireless communi-
cations was initiated: the IEEE 802.11 standard [150], or WiFi, was finalized
for use in the industrial, scientific, and medical frequency band. While wire-
less communications were available previously, WiFi established a standard that
enabled moderately high data rates that could be integrated into interoperable
devices. This personal local-area wireless networking standard allowed individ-
uals to set up their own networks with relative ease at a moderate price. Over
the years, a number of extensions to the original standard have been developed.
Of particular interest is the development of IEEE 802.11n that was finalized in
2009 [151] (although many systems were developed using earlier drafts). This
provided a standard for WiFi multiple-input multiple-output (MIMO) wireless
communications.
The IEEE 802.11 family of standards marked a turning point in the develop-
ment of wireless networks as they were instrumental in making wireless local-area
networks (W-LAN) ubiquitous throughout the world. Wireless LANs running
some version of the IEEE 802.11 protocol have become so common that the
term “WiFi,” commonly used to signify compatibility with the 802.11 standard,
made its debut in the Webster’s New College Dictionary in 2005 [213].
Almost in parallel with the development of the IEEE 802.11 protocols, the
European Telecommunications Standards Institute (ETSI) developed its own
protocol for wireless networking called the HiperLAN (High Performance Radio
LAN). HiperLAN/1 offered data transfer rates in excess of 20 Mb/s and thus
had significantly higher data transmission rates than the existing IEEE 802.11
standard at the time [66]. The IEEE 802.11 standard incorporated a number of
technical extensions that were a good match to the computational capabil-
ities of the time and that provided paths to higher data rates in IEEE 802.11g. Over
time, the HiperLAN standard lost market share to the IEEE 802.11 standards.

At some point between the years 2000 and 2010, a rather significant change
occurred in the use of wireless communications. The dominant use of wireless
communication links transitioned from broadcast systems such as radio or televi-
sion to two-way, personal-use links such as mobile phones or WiFi. At that point,
it came to be considered strange not to be in continuous wireless contact with the
web. Not only did this change our relationship with information, possibly funda-
mentally altering the nature of the human condition, but it also changed forever
the nature of trivia arguments held in bars and pubs around the world.
2 Notational and mathematical preliminaries

This chapter contains a number of useful definitions and relationships used
throughout the text. In the remainder of the text, it is assumed that the reader
has familiarity with these topics. In general, the relationships are stated without
proof, and the reader is directed to dedicated mathematical texts for further
detail [40, 180, 54, 117, 18, 217].

2.1 Notation

2.1.1 Table of symbols

a ∈ S      a is an element of the set S
∃x         there exists an x
a∗         complex conjugate of a
A†         Hermitian conjugate1 of A
∀x         for all x                                    (2.1)

2.1.2 Scalars
A scalar is indicated by a non-bold letter such as a or A. Scalars can be integer
Z, real R, or complex numbers C:

a ∈ Z,
a ∈ R , or
a ∈ C, (2.2)

respectively.
The square root of −1 is indicated by i,

√−1 = i .                                               (2.3)

The Euler formula for an exponential for some real angle α ∈ R in terms of
radians is given by

e^{iα} = cos(α) + i sin(α) .                            (2.4)


1 In some of the engineering literature this operator is indicated by (·)^H.
An arbitrary complex number a ∈ C can be expressed in terms of polar coordi-
nates with a radius ρ ∈ R and an angle α ∈ R,

a = ρ e^{iα}
  = ρ cos(α) + i ρ sin(α) ,                             (2.5)

where the real and imaginary parts of a are indicated by

Re{a} = ρ cos(α)
Im{a} = ρ sin(α) ,                                      (2.6)

respectively. The complex conjugate of a variable is indicated by

a∗ = (ρ e^{iα})∗ = ρ e^{−iα} .                          (2.7)

The value of i can also be expressed in an exponential form,

i = e^{iπ/2 + i2πm}  ∀ m ∈ Z .                          (2.8)

Consequently, exponents of i can be evaluated. For example, the inverse of i is
given by

1/i = i^{−1}
    = e^{−iπ/2}
    = cos(−π/2) + i sin(−π/2)
    = 0 − i .                                           (2.9)
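The polar-form identities of Equations (2.4)-(2.9) are easy to confirm numerically. The following sketch uses Python's standard cmath and math modules; the sample value a = 3 + 4i is an arbitrary choice for illustration, not from the text.

```python
import cmath
import math

# Polar representation a = rho e^{i alpha}, Equations (2.5)-(2.6)
a = 3.0 + 4.0j
rho, alpha = abs(a), cmath.phase(a)
assert abs(a - rho * cmath.exp(1j * alpha)) < 1e-12
assert abs(a.real - rho * math.cos(alpha)) < 1e-12
assert abs(a.imag - rho * math.sin(alpha)) < 1e-12

# Conjugation negates the angle, Equation (2.7)
assert abs(a.conjugate() - rho * cmath.exp(-1j * alpha)) < 1e-12

# The inverse of i, Equation (2.9): 1/i = e^{-i pi/2} = -i
inv_i = cmath.exp(-1j * math.pi / 2)
assert abs(inv_i - (1 / 1j)) < 1e-12
assert abs(inv_i - (-1j)) < 1e-12
```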

The logarithm of variable x ∈ R, assuming base a ∈ R, is indicated

log_a(x) ,                                              (2.10)

such that

log_a(a^y) = y ,                                        (2.11)

under the assumption that a and y are real. When the base is not explicitly
indicated,2 it is assumed that a natural logarithm (base e) is indicated such that

log(x) = log_e(x) .                                     (2.12)

The translation between bases a and b of variable x is given by

log_b(x) = log_a(x) / log_a(b)
         = log_b(a) log_a(x) .                          (2.13)
2 In some texts, it is assumed that log(x) indicates the logarithm base 10 or 2 rather than
the natural logarithm assumed here.
14 Notational and mathematical preliminaries

The logarithm can be expanded about 1,

log(1 + x) = Σ_{m=1}^{∞} (−1)^{m+1} x^m / m ; for |x| < 1
           ≈ x ; for small x.                           (2.14)

Consequently, it can be shown for finite values of x

e^x = lim_{n→∞} (1 + x/n)^n .                           (2.15)
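The base-translation rule of Equation (2.13), the series of Equation (2.14), and the limit of Equation (2.15) can all be checked with the standard math module; the sample values x, a, and b here are arbitrary.

```python
import math

# Base translation, Equation (2.13)
x, a, b = 50.0, 10.0, 2.0
assert abs(math.log(x, b) - math.log(x, a) / math.log(b, a)) < 1e-12
assert abs(math.log(x, b) - math.log(a, b) * math.log(x, a)) < 1e-12

# Series expansion about 1, Equation (2.14), valid for |x| < 1
xs = 0.1
series = sum((-1) ** (m + 1) * xs**m / m for m in range(1, 50))
assert abs(series - math.log(1 + xs)) < 1e-12
assert abs(math.log(1 + 1e-6) - 1e-6) < 1e-9   # log(1 + x) ~ x for small x

# Limit form, Equation (2.15): e^x = lim (1 + x/n)^n, here with x = 1
n = 10**7
assert abs((1 + 1.0 / n) ** n - math.e) < 1e-4
```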
For real base a the logarithm of a complex number z ∈ C, such that z can be
represented in polar notation in Equation (2.5),
z = ρ e^{iα} ,                                          (2.16)

is given by

log_a(z) = log_a(ρ) + i (α + 2πm) / log(a) ; m ∈ Z.     (2.17)
For complex numbers, the logarithm is multivalued because any addition of a
multiple of 2πi to the argument of the exponential provides an equal value for
z. If the imaginary component produced by the natural (base e) logarithm is
greater than −π and less than or equal to π, then it is considered the principal
value.
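The multivalued structure of Equation (2.17) can be demonstrated numerically: every branch exponentiates back to the same z, while only m = 0 gives the principal value. A sketch using the standard cmath module; the base a = 10 and the point z = 2 e^{i 0.5} are arbitrary examples.

```python
import cmath
import math

# A complex point in polar form, Equation (2.16)
z = 2.0 * cmath.exp(1j * 0.5)            # rho = 2, alpha = 0.5
a = 10.0
rho, alpha = abs(z), cmath.phase(z)

# Principal value (m = 0) of log_a(z), Equation (2.17)
principal = math.log(rho, a) + 1j * alpha / math.log(a)
assert abs(cmath.log(z) / math.log(a) - principal) < 1e-12

# Any other branch (m != 0) still satisfies a^{log_a(z)} = z
m = 3
branch = math.log(rho, a) + 1j * (alpha + 2 * math.pi * m) / math.log(a)
assert abs(a ** branch - z) < 1e-9
```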
A Dirac delta function (which technically is not a function) [40] is generally
used within the context of an integral, for example, with a real parameter a, the
integral over the real variable x and well-behaved function f (x)
∫_{−∞}^{∞} dx f (x) δ(x − a) = f (a) .                  (2.18)

The floor and ceiling operators are indicated by

⌊·⌋ and ⌈·⌉ ,                                           (2.19)

which round down and up to the nearest integer, respectively. For example, the
floor and ceiling of 3.7 are given by ⌊3.7⌋ = 3 and ⌈3.7⌉ = 4, respectively.
For many problems the notions of convex and concave functions are useful. A
function is considered convex over some interval if every line segment connecting
two points on its graph lies on or above the graph. Similarly, a function is
considered concave over some interval if every such line segment lies on or below
the graph.

2.1.3 Vectors and matrices


An important concept employed throughout the text is the notion of a vector
space that is discussed in the study of linear algebra. Without significant dis-
cussion, we will assume that vector spaces employed within the text satisfy the
typical requirements of a Hilbert space, having inner products and norms. A
vector is indicated by a bold lowercase letter. For example, a column n-vector of
complex values is indicated by
a ∈ C^{n×1} .                                           (2.20)

A row n-vector is indicated by a bold lowercase letter with an underscore,

a ∈ C^{1×n} .                                           (2.21)

The mth element in a is denoted

(a)_m , or {a}_m .                                      (2.22)

A matrix with m rows and n columns is indicated by a bold uppercase letter,
for example

M ∈ C^{m×n} .                                           (2.23)

The element at the pth row and qth column of M is denoted

(M)_{p,q} or {M}_{p,q} .                                (2.24)
The complex conjugate of vectors and matrices is indicated by
a∗ and
M∗ , (2.25)
where conjugation operates on each element independently. The transpose of
vectors and matrices is indicated by
a^T and M^T ,                                           (2.26)
respectively. The Hermitian conjugate of vectors and matrices is indicated by
a† = (a^T)∗ and
M† = (M^T)∗ ,                                           (2.27)
respectively. A diagonal matrix is indicated by
diag{a_1, a_2, a_3, . . . , a_n} =
    ⎛ a_1   0     0    · · ·   0  ⎞
    ⎜  0   a_2    0               ⎟
    ⎜  0    0    a_3              ⎟ .                   (2.28)
    ⎜  .                  .       ⎟
    ⎝  0                     a_n  ⎠
The Kronecker delta is indicated by
δ_{m,n} = { 1 ; m = n
          { 0 ; otherwise .                             (2.29)
A Hermitian matrix is a square matrix that satisfies
M† = M . (2.30)
An important subclass of Hermitian matrices is the positive-semidefinite matrices. A
positive-semidefinite matrix M ∈ C^{m×m} has the property that for any nonzero
vector x ∈ C^{m×1} the following quadratic form is greater than or equal to zero,
x† M x ≥ 0 . (2.31)
A related matrix is the positive definite matrix that satisfies
x† M x > 0 . (2.32)
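The positive-semidefinite property of Equation (2.31) is easy to verify numerically. A sketch assuming NumPy; the construction M = A†A is a standard way to generate a Hermitian positive-semidefinite test matrix and is our choice, not the text's.

```python
import numpy as np

# Any matrix of the form A^dagger A is Hermitian and positive semidefinite
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))
M = A.conj().T @ A                       # 3 x 3, M^dagger = M

assert np.allclose(M, M.conj().T)        # Hermitian, Equation (2.30)

# Quadratic form x^dagger M x is real and >= 0, Equation (2.31)
x = rng.normal(size=3) + 1j * rng.normal(size=3)
q = x.conj() @ M @ x
assert abs(q.imag) < 1e-12 and q.real >= 0

# Equivalently, all eigenvalues of M are nonnegative
assert np.all(np.linalg.eigvalsh(M) >= -1e-12)
```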
A unitary matrix is a square matrix that satisfies
U† = U^{−1} , so that U† U = U U† = I ,                 (2.33)

where the m × m identity matrix is indicated by I_m, or I if the size is clear from
context, such that

{I_m}_{p,q} = δ_{p,q} ,
p ∈ {1, 2, · · · , m} , q ∈ {1, 2, · · · , m} .          (2.34)
The vector operation is a clumsy concept that maps matrices to vectors. It
could be avoided by employing tensor operations. However, it is sometimes con-
venient for the sake of implementation to consider explicit conversions between
matrices and vectors. The vector operation extracts elements along each column
before moving to the next column. The vector operation of matrix M ∈ CM ×N
is denoted vec(M) and is defined by
{vec(M)}_{(n−1)M+m} = {M}_{m,n} .                       (2.35)
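Because the vector operation of Equation (2.35) extracts elements column by column, it corresponds to column-major (Fortran-order) reshaping. A small NumPy sketch with an arbitrary 3 × 2 example matrix:

```python
import numpy as np

# The vec operation stacks columns, Equation (2.35):
# {vec(M)}_{(n-1)M + m} = {M}_{m,n}
M = np.array([[1, 4],
              [2, 5],
              [3, 6]])
v = M.reshape(-1, order="F")             # column-major ("Fortran") order
assert np.array_equal(v, np.array([1, 2, 3, 4, 5, 6]))

# Element check using 1-based indices m, n as in the text
m, n = 2, 2
assert v[(n - 1) * M.shape[0] + (m - 1)] == M[m - 1, n - 1]
```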

2.1.4 Vector products


The inner product between real vectors is indicated by the dot product,
a · b = a^T b .                                         (2.36)
In this text, the inner product for complex vectors (or Hermitian inner product)
is denoted by
a† b . (2.37)
The inner product can also be denoted by

⟨a, b⟩ = Σ_m {a}_m {b}∗_m .                             (2.38)

Note that the order of conjugation is switched between the vector notation
and the bracket notation, such that

a† b = ⟨b, a⟩ .                                         (2.39)

This switch is performed to be consistent with standard conventions. When using
the phrase "inner product," we will use both forms interchangeably. Hopefully,
the appropriate conjugation will be clear from context.
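The conjugation convention of Equations (2.38)-(2.39) can be checked numerically. A NumPy sketch with arbitrary sample vectors; note that NumPy's vdot conjugates its first argument, so it computes a†b rather than the bracket form.

```python
import numpy as np

a = np.array([1 + 2j, 3 - 1j])
b = np.array([2 - 1j, 1j])

# a^dagger b conjugates the first argument ...
adag_b = np.sum(a.conj() * b)
# ... while the bracket form <a, b> of Equation (2.38) conjugates the second
bracket_ab = np.sum(a * b.conj())
bracket_ba = np.sum(b * a.conj())

# Equation (2.39): a^dagger b = <b, a>
assert np.isclose(adag_b, bracket_ba)
assert np.isclose(adag_b, np.vdot(a, b))   # vdot conjugates its first argument
assert np.isclose(bracket_ab, adag_b.conjugate())
```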

While we will not be particularly concerned about the technical details, the
higher-dimensional space in which the inner products are operating is sometimes
referred to as a Hilbert space. This space can be extended to an infinite dimen-
sional space. For example, a vector a can be indexed by the variable x,

a → f_a(x) ,                                            (2.40)

where the function is defined along the axis x. Inner products in this complex
infinite-dimensional space are given by integrating over the indexing parameter.
In this case it is x. The complex infinite-dimensional inner product between
functions f (x) and g(x) that represent two infinite-dimensional vectors is denoted

⟨f (x), g(x)⟩ = ∫ dx f (x) g∗(x) .                      (2.41)

With this form, a useful inequality can be expressed. The Cauchy–Schwarz in-
equality is given by
|⟨f (x), g(x)⟩|^2 ≤ ⟨f (x), f (x)⟩ ⟨g(x), g(x)⟩ .        (2.42)

This concept can be extended to include a weighting or a measure over the
variable of integration. For example, if the measure is p(x), then the inner product
is given by

⟨f (x), g(x)⟩ = ∫ dx p(x) f (x) g∗(x) .                 (2.43)

The outer product of two vectors a and b is indicated by

a b† . (2.44)

2.1.5 Matrix products


For matrices A ∈ C^{M×K}, B ∈ C^{K×N}, and C ∈ C^{M×N}, the standard matrix
product is given by

C = AB
{C}_{m,n} = Σ_k {A}_{m,k} {B}_{k,n} .                   (2.45)

For matrices A ∈ C^{M×N}, B ∈ C^{M×N}, and C ∈ C^{M×N}, the Hadamard or
element-by-element product is denoted · ⊙ · such that

C = A ⊙ B
{C}_{m,n} = {A ⊙ B}_{m,n}
          = {A}_{m,n} {B}_{m,n} .                       (2.46)
18 Notational and mathematical preliminaries

For matrices A ∈ C^{M×N}, B ∈ C^{J×K}, and C ∈ C^{MJ×NK}, the standard definition
of the Kronecker product [130] is denoted · ⊗ · and is given by

C = A ⊗ B
    ⎛ {A}_{1,1} B   {A}_{1,2} B   {A}_{1,3} B   . . . ⎞
  = ⎜ {A}_{2,1} B   {A}_{2,2} B   {A}_{2,3} B         ⎟ .     (2.47)
    ⎜ {A}_{3,1} B   {A}_{3,2} B                       ⎟
    ⎝      .                                          ⎠

This definition is unfortunate because it is inconsistent with the standard def-
inition of the vector operation. As a consequence, the forms that include in-
teractions between vector operations and Kronecker products are unnecessarily
twisted. Nonetheless, we will use definitions that will keep with the traditional
notation. A few useful relationships are given here:

(A ⊗ B)^T = A^T ⊗ B^T                                   (2.48)
(A ⊗ B)∗ = A∗ ⊗ B∗                                      (2.49)
(A ⊗ B)† = A† ⊗ B†                                      (2.50)
(A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1} ,                        (2.51)

where it is assumed that A and B are not singular for the last relationship. The
Kronecker product obeys distributive and associative properties,

(A + B) ⊗ C = A ⊗ C + B ⊗ C (2.52)
(A ⊗ B) ⊗ C = A ⊗ (B ⊗ C) . (2.53)

The product of Kronecker products is given by

(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD) . (2.54)
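The Kronecker-product identities of Equations (2.48)-(2.51) and the mixed-product rule of Equation (2.54) can be verified numerically. A sketch assuming NumPy; the random complex matrices are arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(1)

def cplx(shape):
    """Random complex test matrix of the given shape."""
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

A, B = cplx((2, 2)), cplx((3, 3))
C, D = cplx((2, 2)), cplx((3, 3))

# Equations (2.48)-(2.51)
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))
assert np.allclose(np.kron(A, B).conj(), np.kron(A.conj(), B.conj()))
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))

# Mixed-product rule, Equation (2.54)
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
```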

For square matrices A ∈ C^{M×M} and B ∈ C^{N×N},

tr{A ⊗ B} = tr{A} tr{B}                                 (2.55)
|A ⊗ B| = |A|^N |B|^M ,                                 (2.56)

where the trace and determinant are defined in Section 2.2. Note that the ex-
ponents M and N are for the size of the opposing matrix. The vector operation
and Kronecker product are related by

vec(a bT ) = b ⊗ a (2.57)
vec(A B C) = (C^T ⊗ A) vec(B) . (2.58)

If the dimensions of A and B are the same and the dimensions of C and D are
the same, then the Hadamard and Kronecker products are related by

(A  B) ⊗ (C  D) = (A ⊗ C)  (B ⊗ D) . (2.59)
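These Kronecker identities are easy to check numerically. The following NumPy sketch (illustrative only, not part of the text) verifies the mixed-product property (2.54) and the vec relation (2.58); it assumes the column-stacking convention for vec, which corresponds to Fortran (column-major) flattening in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((2, 3)), rng.standard_normal((4, 5))
C, D = rng.standard_normal((3, 2)), rng.standard_normal((5, 4))

# Mixed-product property, Eq. (2.54): (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
assert np.allclose(lhs, rhs)

# Eq. (2.58): vec(A B C) = (C^T ⊗ A) vec(B); vec stacks columns,
# i.e. column-major (Fortran-order) flattening
vec = lambda M: M.flatten(order="F")
A2 = rng.standard_normal((2, 3))
B2 = rng.standard_normal((3, 4))
C2 = rng.standard_normal((4, 5))
assert np.allclose(vec(A2 @ B2 @ C2), np.kron(C2.T, A2) @ vec(B2))
```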

2.2 Norms, traces, and determinants

In signal processing for multiple-antenna systems, determinants, traces, and


norms are useful operations.

2.2.1 Norm
The absolute value of a scalar and the L2-norm of a vector are indicated by
either ‖·‖ or ‖·‖₂. We reserve the notation |·| exclusively for the determinant of
a matrix. The absolute value of a scalar a is thus ‖a‖, and the norm of a vector
a is denoted as follows:

‖a‖ = √( Σ_m ‖(a)_m‖² ) = √(a† a) . (2.60)

The p-norm of a vector for values other than 2 is indicated by

‖a‖_p = ( Σ_m ‖(a)_m‖^p )^{1/p} , (2.61)

for p ≥ 1. The Frobenius norm of a matrix is indicated by

‖M‖_F = √( Σ_{m,n} ‖(M)_{m,n}‖² ) = √( tr{M M†} ) . (2.62)
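A brief NumPy illustration of these three norms (a sketch, not from the text; `np.linalg.norm` is used throughout):

```python
import numpy as np

a = np.array([3.0, 4.0])
assert np.isclose(np.linalg.norm(a), 5.0)          # L2 norm, Eq. (2.60)
assert np.isclose(np.linalg.norm(a, ord=1), 7.0)   # p-norm with p = 1, Eq. (2.61)

M = np.array([[1.0, 2.0], [3.0, 4.0]])
fro = np.linalg.norm(M, ord="fro")
# Frobenius norm equals sqrt(tr{M M†}), Eq. (2.62)
assert np.isclose(fro, np.sqrt(np.trace(M @ M.conj().T)))
```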

2.2.2 Trace
The trace of a square matrix M ∈ C^{m×m} of size m is the sum of its diagonal
elements and is indicated by

tr{M} = Σ_m (M)_{m,m} . (2.63)

The trace of a matrix is invariant under a change of bases. The product of two
matrices commutes under the trace operation,
tr{A B} = tr{B A} .
This property can be extended to the product of three (or more) matrices such
that
tr{A B C} = tr{C A B} = tr{B C A} . (2.64)
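The cyclic property (2.64) allows only cyclic shifts, not arbitrary permutations; a quick numerical check (illustrative NumPy sketch, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# Cyclic property of the trace, Eq. (2.64): the product shapes change,
# but every cyclic shift yields the same trace.
t = np.trace(A @ B @ C)
assert np.isclose(t, np.trace(C @ A @ B))
assert np.isclose(t, np.trace(B @ C @ A))
```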

2.2.3 Determinants
The determinant of a square matrix A is indicated by

|A| = Σ_n (A)_{m,n} (−1)^{m+n} |M_{m,n}| , (2.65)

where submatrix Mm ,n is here defined to be the minor of A, which is constructed


by removing the mth row and nth column of A (not to be confused with the
mth, nth element of A or M). The determinant of a 2 × 2 matrix is given by

\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad − bc . (2.66)

The determinant has a number of useful relationships. The determinant of the


product of square matrices is equal to the product of the matrix determinants,

|A B| = |A| |B| = |B A| . (2.67)

For some scalar c ∈ C and matrix M ∈ Cm ×m , the determinant of the product is


the product of the scalar to the mth power times the determinant of the matrix,

|c M| = cm |M| . (2.68)

The determinant of the identity matrix is given by

|I| = 1 , (2.69)

and the determinant of a unitary matrix U has magnitude one,

‖|U|‖ = 1 . (2.70)

Consequently, the determinant of the unitary transformation with unitary matrix


U, defined in Equation (2.33), of a matrix A is the determinant of A,

|U A U† | = |U A| |U† |
= |U† U A|
= |A| . (2.71)

The product of matrices plus the identity matrix commute under the
determinant,

|I + A B| = |I + B A| , (2.72)

where A ∈ Cm ×n and B ∈ Cn ×m are not necessarily square (although AB and


BA are). The inverse of a matrix determinant is equal to the determinant of a
matrix inverse,

|M|−1 = |M−1 | . (2.73)

The Hadamard inequality bounds the determinant of a matrix A whose mth
column is denoted by a_m as follows

‖|A|‖ ≤ ∏_m ‖a_m‖ . (2.74)

Suppose that A is a positive-definite square matrix. We can write A in terms of


another square matrix B as follows

A = B† B (2.75)
|A| = |B|∗ |B| = ‖|B|‖² ≤ ∏_m ‖b_m‖² = ∏_m {A}_{m,m} , (2.76)

noting that b_m† b_m = {A}_{m,m} , and where the inequality is an application of


Equation (2.74). That is to say, the determinant of a positive-definite matrix is
less than or equal to the product of its diagonal elements.
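This bound is easy to confirm numerically for a randomly generated positive-definite matrix (an illustrative NumPy sketch, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B.conj().T @ B  # positive-definite (with probability 1), as in Eq. (2.75)

det_A = np.linalg.det(A).real
diag_prod = np.prod(np.diag(A).real)
# Hadamard's inequality for positive-definite matrices, Eq. (2.76)
assert det_A <= diag_prod + 1e-9
```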

2.3 Matrix decompositions

2.3.1 Eigen analysis


One of the essential tools for signal processing is the eigenvalue decomposition.
A complex square matrix M ∈ CM ×M has M eigenvalues and eigenvectors. The
mth eigenvalue λm , based on some metric for ordering, such as magnitude, and
the corresponding mth eigenvector vm of the matrix M are given by the solution
of

M v m = λ m vm . (2.77)

Sometimes, for clarity, the mth eigenvalue of a matrix M is indicated by λm {M}.


A matrix is denoted positive-definite if all the eigenvalues are real and positive
(λm {M} > 0 ∀ m ∈ {1, . . . , M }), and is denoted positive-semidefinite if all the
eigenvalues are positive or zero (λm {M} ≥ 0 ∀ m ∈ {1, . . . , M }). In cases in
which there are duplicated or degenerate eigenvalues, the eigenvectors are only
determined to within a subspace whose dimension equals the number of degenerate
eigenvalues. Any vector from an orthonormal basis in that subspace would satisfy
Equation (2.77). The extreme example is the identity matrix, for which all the
eigenvalues are the same. In this case, the subspace is the entire space; thus, any
vector satisfies Equation (2.77).
The sum of the diagonal elements of a matrix is indicated by the trace. The
trace is also equal to the sum of the eigenvalues of the matrix,

tr{M} = Σ_m (M)_{m,m} = Σ_m λ_m . (2.78)

The determinant of a matrix is equal to the product of its eigenvalues,

|M| = ∏_m λ_m . (2.79)

While, in general, for some square matrices A and B the eigenvalues of the sum
do not equal the sum of the eigenvalues

λ_m{A + B} ≠ λ_m{A} + λ_m{B} , (2.80)



for the special case of I + A the eigenvalues add,

λm {I + A} = 1 + λm {A} . (2.81)
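Relationship (2.81) can be checked directly (an illustrative NumPy sketch, not from the text; a Hermitian A is assumed so the eigenvalues can be sorted on the real line):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = X + X.conj().T  # Hermitian, so the eigenvalues are real

lam = np.sort(np.linalg.eigvalsh(A))
lam_shift = np.sort(np.linalg.eigvalsh(np.eye(4) + A))
# Eq. (2.81): eigenvalues of I + A are 1 + eigenvalues of A
assert np.allclose(lam_shift, 1.0 + lam)
```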

2.3.2 Eigenvalues of 2 × 2 Hermitian matrix


Given the 2 × 2 Hermitian matrix M ∈ C^{2×2} (that is, a matrix that satisfies
M = M†),

M = \begin{pmatrix} a & c∗ \\ c & b \end{pmatrix} , (2.82)

the eigenvalues of M can be found by exploiting the knowledge of eigenvalue


relationships between the trace and the determinant. Because the matrix is Her-
mitian, the diagonal values (a and b) are real. The trace of M is given by

tr{M} = λ1 + λ2
= a + b. (2.83)

The determinant of the Hermitian matrix M is given by

|M| = λ₁ λ₂ = ab − ‖c‖² . (2.84)

By combining these two results, the eigenvalues can be explicitly found. The
eigenvalues are given by

λ₁ + λ₂ = a + b
λ₁² + λ₁ λ₂ = (a + b) λ₁
0 = λ² − (a + b) λ + ab − ‖c‖²

λ = ( a + b ± √( (a + b)² − 4(ab − ‖c‖²) ) ) / 2
  = ( a + b ± √( (a − b)² + 4‖c‖² ) ) / 2 . (2.85)
As will be discussed in Section 2.3.3, Hermitian matrices constructed from quadratic
forms are positive-semidefinite.
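The closed form (2.85) agrees with a direct eigenvalue computation (an illustrative NumPy sketch with arbitrarily chosen entries, not from the text):

```python
import numpy as np

a, b, c = 2.0, 5.0, 1.0 - 2.0j
M = np.array([[a, np.conj(c)], [c, b]])  # Hermitian 2x2 as in Eq. (2.82)

# Closed-form eigenvalues from the trace/determinant relations, Eq. (2.85)
disc = np.sqrt((a - b) ** 2 + 4 * abs(c) ** 2)
lam_closed = np.sort([(a + b - disc) / 2, (a + b + disc) / 2])

# np.linalg.eigvalsh returns eigenvalues in ascending order
assert np.allclose(np.linalg.eigvalsh(M), lam_closed)
```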

2.3.3 Singular-value decomposition


Another important concept is the singular-value decomposition (SVD). The SVD
of a matrix decomposes a matrix into three matrices: a unitary matrix, a diagonal
matrix containing the singular values, and another unitary matrix,

Q = U S V† , (2.86)

where U and V are unitary matrices and the diagonal matrix

S = \begin{pmatrix} s₁ & 0 & 0 & ⋯ \\ 0 & s₂ & 0 & \\ 0 & 0 & s₃ & \\ ⋮ & & & ⋱ \end{pmatrix} (2.87)

contains the singular values s1 , s2 , . . .. In the decomposition, there is sufficient


freedom to impose the requirement that the singular values are real and positive.
Note that the singular matrix S need not be square. In fact the dimensions of
S are the same as the dimensions of Q since both the right and left singular
matrices U and V are square. The mth column in either U or V is said to be
the mth left-hand or right-hand singular vector associated with the mth singular
value, s_m.
The eigenvalues of the quadratic Hermitian form QQ† are equal to the square
of the singular values of Q,

Q Q† = U S V† V S† U† = U S S† U† , (2.88)

where S S† = diag{ ‖s₁‖², ‖s₂‖², . . . }. The columns of U are the eigenvectors of


Q Q† . The eigenvalues of a Hermitian form QQ† are greater than or equal to
zero, and thus the form QQ† is said to be positive-semidefinite,

λm {QQ† } = (S S† )m ,m ≥ 0 . (2.89)

Notationally, a matrix with all positive eigenvalues is said to be positive-definite,


as defined in Equation (2.32), and is indicated by

M > 0 → λm {M} > 0 ∀m, (2.90)

and a positive-semidefinite matrix, as defined in Equation (2.31), is indicated by

M ≥ 0 → λm {M} ≥ 0 ∀m. (2.91)

The rank of a matrix is the number of nonzero eigenvalues,

rank{M} = #{m : λ_m{M} ≠ 0} , (2.92)

where #{·} is used to indicate the number of entries that satisfy the condition.
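The connection between singular values and the eigenvalues of the Hermitian form Q Q†, Eq. (2.88), can be verified directly (an illustrative NumPy sketch, not from the text):

```python
import numpy as np

rng = np.random.default_rng(4)
Q = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

U, s, Vh = np.linalg.svd(Q)               # Q = U S V†, Eq. (2.86)
lam = np.linalg.eigvalsh(Q @ Q.conj().T)  # real eigenvalues, ascending order

# Eq. (2.88): eigenvalues of Q Q† are the squared singular values of Q
assert np.allclose(np.sort(s ** 2), lam)
# Rank equals the count of nonzero singular values, cf. Eq. (2.92)
assert np.linalg.matrix_rank(Q) == np.sum(s > 1e-10)
```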

2.3.4 QR decomposition
Another common matrix decomposition is the QR factorization. In this decom-
position, some matrix M is factored into a unitary matrix Q and an upper
right-hand triangular matrix R, where an upper right-hand triangular matrix

has the form


⎛ ⎞
r1,1 r1,2 r1,3 ··· r1,n
⎜ 0 r2,2 r2,3 ··· r2,n ⎟
⎜ ⎟
⎜ 0 0 r3,3 ··· r3,n ⎟
R=⎜ ⎟. (2.93)
⎜ .. .. ⎟
⎝ . . ⎠
0 0 0 ··· rn ,n

If the matrix M is square with dimensions n × n, then the decomposition is


given by

M = QR. (2.94)

For a rectangular matrix M ∈ Cm ×n with m > n, the QR decomposition can be


constructed so that
 
R
M=Q , (2.95)
0

where the upper triangular matrix has dimensions R ∈ Cn ×n , and the zero
matrix 0 has dimensions (m − n) × n.
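The rectangular form (2.95) corresponds to NumPy's "complete" QR mode (an illustrative sketch with real matrices for simplicity, not from the text):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((5, 3))

# "complete" mode returns a square Q (5x5) and R stacked over a zero
# block, matching Eq. (2.95)
Q, R = np.linalg.qr(M, mode="complete")
assert Q.shape == (5, 5) and R.shape == (5, 3)
assert np.allclose(Q.T @ Q, np.eye(5))        # Q is orthogonal (unitary)
assert np.allclose(np.tril(R[:3, :], -1), 0)  # upper-triangular top block
assert np.allclose(R[3:, :], 0)               # zero block below
assert np.allclose(Q @ R, M)
```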

2.3.5 Matrix subspaces


Given some vector space, subspaces are some portion of that space. This can be
defined by some linear basis contained within the larger vector space. It is often
useful to describe subspaces by employing projection operators that are orthog-
onal to any part of the vector space not contained within the subspace. Vector
spaces can be constructed by either column vectors or row vectors depending
upon the application.
The matrix M ∈ Cm ×n can be decomposed into components that occupy
orthogonal subspaces that can be denoted by the matrices MA ∈ Cm ×n and
MA ⊥ ∈ Cm ×n such that

M = MA + MA ⊥ . (2.96)

The matrix M_A can be constructed by projecting M onto the subspace spanned
by the columns of the matrix A ∈ C^{m×m′} whose number of columns is less than
or equal to the number of rows, that is, m′ ≤ m. It is assumed here that
A† A is invertible and that we are operating on the column space of the matrix
M, although there is an equivalent row-space formulation. We can construct a
projection matrix or projection operator P_A ∈ C^{m×m} that is given by

P_A = A (A† A)^{−1} A† . (2.97)

For some matrix of an appropriate dimension B, this projection matrix operates


on the column space of B by multiplying the operator by the matrix PA B.

span (A)

Figure 2.1 Illustration of projection operation.

As an aside, it is worth noting that projection matrices are idempotent, i.e.,


PA PA = PA . The matrix MA which is the projection of M onto the subspace
spanned by the columns of A is given by

MA = PA M . (2.98)

The rank of M_A is bounded by the number of columns in A,

rank{M_A} ≤ m′ . (2.99)

The orthogonal projection matrix P_A^⊥ is given by

P_A^⊥ = I − P_A
      = I − A (A† A)^{−1} A† . (2.100)

We define the matrix M_{A⊥} to be the matrix projected onto the basis orthogonal
to A, M_{A⊥} = P_A^⊥ M. Consequently, the matrix M can be decomposed into the
matrices

M = I M
  = (P_A + P_A^⊥) M
  = M_A + M_{A⊥} . (2.101)

To illustrate, consider Figure 2.1. The projection matrix PA projects the vec-
tor v onto a subspace that is spanned by the columns of the matrix A which
is illustrated by the shaded region. The projected vector is illustrated by the
dashed arrow. The associated orthogonal projection P_A^⊥ projects the vector v
onto the subspace orthogonal to that spanned by the columns of A, resulting in
the vector illustrated by the dotted arrow.
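The projection construction and its properties, Eqs. (2.97)–(2.101), can be demonstrated numerically (an illustrative NumPy sketch with real matrices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 2))   # columns span a 2-D subspace of R^5
M = rng.standard_normal((5, 3))

P = A @ np.linalg.inv(A.T @ A) @ A.T   # projection operator, Eq. (2.97)
P_perp = np.eye(5) - P                 # orthogonal projection, Eq. (2.100)

assert np.allclose(P @ P, P)                # idempotent
assert np.allclose(P @ M + P_perp @ M, M)   # decomposition, Eq. (2.101)
assert np.allclose(A.T @ (P_perp @ M), 0)   # M_{A⊥} is orthogonal to span(A)
assert np.linalg.matrix_rank(P @ M) <= 2    # rank bound, Eq. (2.99)
```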

2.4 Special matrix forms

In signal processing applications, a number of special forms of matrices occur


commonly.

2.4.1 Element shifted symmetries


Toeplitz matrices are of particular interest because they are produced in certain
physical examples, and there are fast inversion algorithms for Toeplitz matrices.
While for a general square matrix of size n inversion takes order n³ operations, a Toeplitz
matrix can be inverted in order n² operations [117].
An n × n Toeplitz matrix is a matrix in which the values are equal along
diagonals,

M = \begin{pmatrix} a₀ & a_{−1} & a_{−2} & ⋯ & a_{−n+1} \\ a₁ & a₀ & a_{−1} & & a_{−n+2} \\ a₂ & a₁ & a₀ & & \\ ⋮ & & & ⋱ & a_{−1} \\ a_{n−1} & a_{n−2} & ⋯ & a₁ & a₀ \end{pmatrix} . (2.102)

The Toeplitz matrix is defined by 2n − 1 values. An n × n circulant matrix is


a special form of a Toeplitz matrix such that each row or column is a cyclic
permutation of the previous row or column:

M = \begin{pmatrix} a₀ & a_{n−1} & a_{n−2} & ⋯ & a₁ \\ a₁ & a₀ & a_{n−1} & & a₂ \\ a₂ & a₁ & a₀ & & \\ ⋮ & & & ⋱ & a_{n−1} \\ a_{n−1} & a_{n−2} & ⋯ & a₁ & a₀ \end{pmatrix} . (2.103)

The circulant matrix is defined by n values. An additional property of circulant


matrices is that they can be inverted in the order of n log n operations.
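Both structures can be generated from their defining index rules, {M}_{m,k} = a_{m−k} for Toeplitz and {M}_{m,k} = a_{(m−k) mod n} for circulant (an illustrative NumPy sketch, not from the text; SciPy's `toeplitz`/`circulant` helpers would serve equally well):

```python
import numpy as np

n = 4
a = np.arange(-n + 1, n)  # stores a_{-3}, ..., a_{3}; a[i] holds a_{i-n+1}
# Toeplitz: {M}_{m,k} = a_{m-k}, Eq. (2.102)
T = np.array([[a[(m - k) + n - 1] for k in range(n)] for m in range(n)])
# Values are constant along every diagonal
assert all(T[m, k] == T[m + 1, k + 1]
           for m in range(n - 1) for k in range(n - 1))

c = np.array([1.0, 2.0, 3.0, 4.0])
# Circulant: {M}_{m,k} = c_{(m-k) mod n}, Eq. (2.103)
C = np.array([[c[(m - k) % n] for k in range(n)] for m in range(n)])
# Each column is a cyclic permutation of the previous one
assert np.allclose(np.roll(C[:, 0], 1), C[:, 1])
```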

2.4.2 Eigenvalues of low-rank matrices


A low-rank matrix is a matrix for which some (and usually most) of the eigen-
values are zero. In a variety of applications, such as spatial covariance matrices,
low-rank matrices are constructed with the outer product of vectors.

Rank-1 matrix
For example, a rank-1 square matrix M is constructed by using complex n-vectors
v ∈ Cn ×1 and w ∈ Cn ×1 ,

M = v w† . (2.104)

This matrix has an eigenvector proportional to v and eigenvalue of w† v. The


eigenvalue can be determined directly by noting that the trace of the matrix is
equal to the sum of the eigenvalues which for a rank-1 matrix are all zero except
for one. For comparison, this matrix has a single nonzero singular value given by
‖w‖ ‖v‖.

Rank-2 matrix
A Hermitian rank-2 matrix M can be constructed by using two n-vectors x ∈
Cn ×1 and y ∈ Cn ×1 ,
M = xx† + yy† . (2.105)
The eigenvalues can be found by using the hypothesis that the eigenvector is
proportional to x + ay where a is some undetermined constant. The nonzero
eigenvalues of M are given by λ+ and λ− ,

2
x 2 + y 2 ± ( x 2 − y 2 ) + 4 x† y 2
λ± {M} = . (2.106)
2
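The rank-2 eigenvalue formula (2.106) can be confirmed against a direct eigendecomposition (an illustrative NumPy sketch, not from the text):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)

M = np.outer(x, x.conj()) + np.outer(y, y.conj())  # Eq. (2.105)

nx, ny = np.vdot(x, x).real, np.vdot(y, y).real    # ||x||^2, ||y||^2
cross = abs(np.vdot(x, y)) ** 2                    # ||x† y||^2
disc = np.sqrt((nx - ny) ** 2 + 4 * cross)
lam_closed = [(nx + ny - disc) / 2, (nx + ny + disc) / 2]  # Eq. (2.106)

lam = np.sort(np.linalg.eigvalsh(M))[-2:]  # the two nonzero eigenvalues
assert np.allclose(lam, np.sort(lam_closed))
```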

2.5 Matrix inversion

For a square nonsingular matrix, that is a matrix with all nonzero eigenvalues
so that |M| ≠ 0, the matrix inverse of M satisfies
M−1 M = M M−1 = I . (2.107)
The inverse of the product of nonsingular square matrices is given by
(A B)−1 = B−1 A−1 . (2.108)
The inverse and the Hermitian operations as well as the transpose operations
commute,
(M† )−1 = (M−1 )† and (MT )−1 = (M−1 )T . (2.109)
The SVD, discussed in Section 2.3.3, of the inverse of a matrix is given by

M^{−1} = (U S V†)^{−1} = V S^{−1} U† . (2.110)
It is often convenient to consider 2 × 2 matrices. Their inverse is given by

\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{−1} = (1/(ad − bc)) \begin{pmatrix} d & −b \\ −c & a \end{pmatrix} . (2.111)
The general inverse of a partitioned matrix is given by

\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{−1} = \begin{pmatrix} (A − B D^{−1} C)^{−1} & −A^{−1} B (D − C A^{−1} B)^{−1} \\ −D^{−1} C (A − B D^{−1} C)^{−1} & (D − C A^{−1} B)^{−1} \end{pmatrix} . (2.112)

2.5.1 Inversion of matrix sum


A general form of Woodbury's formula is given by

(M + A B)^{−1} = M^{−1} − M^{−1} A (I + B M^{−1} A)^{−1} B M^{−1} . (2.113)
A special and useful form of Woodbury's formula is used to find the inverse of
the identity matrix plus a rank-1 matrix,

(I + v w†)^{−1} = I − v w† / (1 + w† v) . (2.114)
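The rank-1 form (2.114) is quickly checked against a direct inverse (an illustrative NumPy sketch, not from the text):

```python
import numpy as np

rng = np.random.default_rng(8)
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
w = rng.standard_normal(4) + 1j * rng.standard_normal(4)

I = np.eye(4)
# Direct inverse versus the rank-1 Woodbury form, Eq. (2.114)
lhs = np.linalg.inv(I + np.outer(v, w.conj()))
rhs = I - np.outer(v, w.conj()) / (1 + np.vdot(w, v))  # np.vdot(w, v) = w†v
assert np.allclose(lhs, rhs)
```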
The inverse of the identity matrix plus two rank-1 matrices is also useful. Here
the special case of a Hermitian matrix is considered. The matrix to be inverted
is given by
I + a a† + b b† , (2.115)
where a and b are column vectors of the same size. The inverse is given by

(I + a a† + b b†)^{−1} = I − ( a a†/(1 + a† a) + b b†/(1 + b† b) ) ( 1 + ‖a† b‖²/γ )
    + (1/γ) ( a† b a b† + b† a b a† ) , (2.116)

where here

γ = 1 + a† a + b† b + a† a b† b − ‖a† b‖² . (2.117)
This result can be found by employing Woodbury’s formula with M in Equation
(2.113) given by
M = I + b b†
M^{−1} = I − b b† / (1 + b† b) . (2.118)
Consequently, Woodbury's formula provides the form

(I + a a† + b b†)^{−1} = (I + b b†)^{−1}
    − (I + b b†)^{−1} a ( 1 + a† [I + b b†]^{−1} a )^{−1} a† (I + b b†)^{−1} , (2.119)
which, after a bit of manipulation, is given by the form in Equation (2.116).
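The manipulated form (2.116)–(2.117) can be verified against a direct inverse (an illustrative NumPy sketch, not from the text):

```python
import numpy as np

rng = np.random.default_rng(12)
a = rng.standard_normal(4) + 1j * rng.standard_normal(4)
b = rng.standard_normal(4) + 1j * rng.standard_normal(4)

I = np.eye(4)
aa, bb = np.vdot(a, a).real, np.vdot(b, b).real  # a†a, b†b
ab = np.vdot(a, b)                               # a†b
gamma = 1 + aa + bb + aa * bb - abs(ab) ** 2     # Eq. (2.117)

# Eq. (2.116)
rhs = (I
       - (np.outer(a, a.conj()) / (1 + aa)
          + np.outer(b, b.conj()) / (1 + bb)) * (1 + abs(ab) ** 2 / gamma)
       + (ab * np.outer(a, b.conj())
          + np.conj(ab) * np.outer(b, a.conj())) / gamma)
lhs = np.linalg.inv(I + np.outer(a, a.conj()) + np.outer(b, b.conj()))
assert np.allclose(lhs, rhs)
```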

2.6 Useful matrix approximations

2.6.1 Log determinant of identity plus small-valued matrix


Motivated by a multiple-input multiple-output (MIMO) capacity expression, a
common form seen throughout this text is
c = log2 |I + M| , (2.120)

where M is a Hermitian matrix. Because the determinant of a matrix is given


by the product of the eigenvalues and that

λm {I + M} = 1 + λm {M} , (2.121)

where the mth eigenvalue of a matrix is indicated by λm {·}, the capacity expres-
sion in Equation (2.120) is equal to
 

c = log₂ ∏_m (1 + λ_m{M})
  = Σ_m log₂ (1 + λ_m{M})
  = log₂(e) Σ_m log (1 + λ_m{M})
  ≈ log₂(e) Σ_m λ_m{M} = log₂(e) tr{M} , (2.122)

if it is assumed that λ_m{M} ≪ 1 ∀ m. Here the approximation log(1 + x) ≈ x
for small values of x is employed.
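The small-eigenvalue approximation (2.122) can be checked numerically (an illustrative NumPy sketch, not from the text; the 10⁻⁴ scaling keeps all eigenvalues small):

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
M = 1e-4 * (X @ X.conj().T)  # Hermitian with eigenvalues << 1

exact = np.log2(np.linalg.det(np.eye(4) + M).real)   # log2|I + M|
approx = np.log2(np.e) * np.trace(M).real            # log2(e) tr{M}
# Eq. (2.122): for small eigenvalues the log-det is close to the scaled trace
assert abs(exact - approx) / exact < 0.01
```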

2.6.2 Hermitian matrix raised to large power


Consider the Hermitian matrix M† = M ∈ Cn ×n . Under the assumption that
there is a single largest eigenvalue of M, the eigenvalue and dominant subspace
can be approximated by repeatedly multiplying the matrix by itself,

M^k = (UΛU†)(UΛU†) · · · (UΛU†)
    = U Λ^k U†
    = Σ_{m=1}^{n} λ_m^k u_m u_m†
    ≈ λ₁^k u₁ u₁† , (2.123)

where U is a unitary matrix constructed from the eigenvectors um of M and


Λ = diag{λ1 , λ2 , . . . , λn } is a diagonal matrix containing the eigenvalues of M.
The largest eigenvalue λ1 grows faster than the other eigenvalues as the number
of multiplies grows and eventually dominates the resulting matrix. Here it is
assumed that there is a strict ordering of the largest two eigenvalues λ1 > λ2 .
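This is the basis of the power method; with normalization at each step, repeated multiplication converges to the dominant eigenpair (an illustrative NumPy sketch, not from the text):

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
M = X @ X.conj().T  # Hermitian positive-semidefinite

# Power iteration: repeatedly multiply and renormalize; the component
# along u1 grows as (lambda_1)^k and dominates, cf. Eq. (2.123)
v = rng.standard_normal(5) + 1j * rng.standard_normal(5)
for _ in range(1000):
    v = M @ v
    v /= np.linalg.norm(v)
lam_est = (v.conj() @ M @ v).real  # Rayleigh quotient

lam_max = np.linalg.eigvalsh(M)[-1]
assert np.isclose(lam_est, lam_max, rtol=1e-5)
```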

2.7 Real derivatives of multivariate expressions

The real derivatives of multivariate expressions follow directly from standard


scalar derivatives [217]. For the real variable α, the derivatives of complex

N -vector z and complex matrix M are given by


∂z/∂α = \begin{pmatrix} ∂{z}₁/∂α \\ ∂{z}₂/∂α \\ ⋮ \\ ∂{z}_N/∂α \end{pmatrix} (2.124)

and

{ ∂M/∂α }_{m,n} = ∂{M}_{m,n}/∂α , (2.125)

respectively.
A few useful expressions follow [217]. Under the assumption that the complex
vector z and matrix A are functions of α, the derivative of the quadratic form z† A z
with respect to the real parameter α is given by

∂( z† A z )/∂α = ( ∂z†/∂α ) A z + z† ( ∂A/∂α ) z + z† A ( ∂z/∂α ) . (2.126)
The derivative for the complex invertible matrix M with respect to real param-
eter α can be found by considering the derivative of ∂/∂α (M M^{−1}) = 0, and it
is given by

∂M^{−1}/∂α = −M^{−1} ( ∂M/∂α ) M^{−1} . (2.127)
The derivatives of the determinant and the log determinant of a nonsingular
matrix M, with respect to real parameter α, are given by

∂|M|/∂α = |M| tr{ M^{−1} ∂M/∂α } (2.128)

and

∂ log|M| /∂α = tr{ M^{−1} ∂M/∂α } . (2.129)
The derivative of the trace of a matrix is equal to the trace of the derivative of
the matrix,

∂ tr{M} /∂α = tr{ ∂M/∂α } . (2.130)

2.7.1 Derivative with respect to real vectors


Calculus involving vectors and matrices is useful for problems involving filter-
ing and multiple-antenna processing. Here vectors in real space are considered.
Derivatives with respect to complex variables are considered in Section 2.8.

For a real column vector x of size N, the derivative of a scalar function f(x)
with respect to x is defined to be a row vector given by

∂f(x)/∂x = ( ∂f(x)/∂{x}₁   ∂f(x)/∂{x}₂   ⋯   ∂f(x)/∂{x}_N ) . (2.131)
This is the typical, but not the only, convention possible.
Under certain circumstances, it is convenient to use the gradient operator that
produces a vector or matrix of the same dimension as the object with which the
derivative is taken,
∇_x f(x) = \begin{pmatrix} ∂f(x)/∂{x}₁ \\ ∂f(x)/∂{x}₂ \\ ⋮ \\ ∂f(x)/∂{x}_N \end{pmatrix} , (2.132)

where the scalar function is indicated by f (·), and the gradient is with respect
to the vector x ∈ R^{N×1}, and
∇_A f(A) = \begin{pmatrix} ∂f(A)/∂{A}_{1,1} & ∂f(A)/∂{A}_{1,2} & ⋯ & ∂f(A)/∂{A}_{1,N} \\ ∂f(A)/∂{A}_{2,1} & ∂f(A)/∂{A}_{2,2} & ⋯ & ∂f(A)/∂{A}_{2,N} \\ ⋮ & & & ⋮ \\ ∂f(A)/∂{A}_{M,1} & ∂f(A)/∂{A}_{M,2} & ⋯ & ∂f(A)/∂{A}_{M,N} \end{pmatrix} , (2.133)

where the scalar function is indicated by f (·), and the gradient is with respect
to matrix A ∈ RM ×N .
The Laplacian operator [11] is given by
∇_x² f(x) = ∇_x · ∇_x f(x) . (2.134)
Note that the term “Laplacian” can be used to describe several different quan-
tities or operators. In the context of this book, in particular in Chapters 13 and
14, we also make reference to the Laplacian of a random variable, which is the
Laplace transform of the probability density function of the random variable.
In a Euclidean coordinate system, the Laplacian operator is given by

∇_x² f(x) = Σ_{m=1}^{N} ∂²f(x)/∂{x}_m² . (2.135)

In a three-dimensional space that is defined in polar coordinates with cylindrical
radius ρ, azimuthal angle φ in radians, and height z, the Laplacian operator is
given by

∇² f(ρ, φ, z) = (1/ρ) ∂/∂ρ ( ρ ∂f(ρ, φ, z)/∂ρ )
    + (1/ρ²) ∂²f(ρ, φ, z)/∂φ² + ∂²f(ρ, φ, z)/∂z² . (2.136)

In a three-dimensional space that is defined in spherical coordinates with radius
r, azimuthal angle φ in radians, and angle from zenith (or equivalently from the
north pole) θ, the Laplacian operator is given by

∇² f(r, φ, θ) = (1/r²) ∂/∂r ( r² ∂f(r, φ, θ)/∂r )
    + (1/(r² sin(θ))) ∂/∂θ ( sin(θ) ∂f(r, φ, θ)/∂θ )
    + (1/(r² sin²(θ))) ∂²f(r, φ, θ)/∂φ² , (2.137)
where · here indicates the inner product of the gradient operators.
Some useful evaluations of derivatives are presented in the following. For an
arbitrary vector a that is not a function of the real column vector x, the derivative
of the product of the column vectors is given by
∂(a^T x)/∂x = ( a^T e₁   a^T e₂   a^T e₃   ⋯ )
            = a^T , (2.138)
where em is the column vector of all zeros with the exception of the mth element,
which has a value of 1,
e_m = \begin{pmatrix} 0 \\ ⋮ \\ 1 \\ ⋮ \\ 0 \end{pmatrix} . (2.139)
Similarly, the derivative of the transpose of the product of the vectors with
respect to the transpose of x is given by

∂(x^T a)/∂x^T = \begin{pmatrix} e₁^T a \\ e₂^T a \\ ⋮ \end{pmatrix} = a . (2.140)
Finally, taking the derivatives of the inner product with respect to the opposite
transpositions of x gives the forms
∂(x^T a)/∂x = ( e₁^T a   e₂^T a   e₃^T a   ⋯ )
            = a^T (2.141)
and

∂(a^T x)/∂x^T = \begin{pmatrix} a^T e₁ \\ a^T e₂ \\ ⋮ \end{pmatrix} = a . (2.142)

The matrix A can be decomposed into a set of column vectors a1 , a2 , a3 , . . . ,

A = (a1 a2 a3 ···) (2.143)

or into a set of row vectors b₁, b₂, b₃, . . .

A = \begin{pmatrix} b₁ \\ b₂ \\ b₃ \\ ⋮ \end{pmatrix} . (2.144)
The derivative of the matrix vector product with respect to x is given by

∂(A x)/∂x = ( A e₁   A e₂   A e₃   ⋯ )
          = A (2.145)

and

∂(x^T A)/∂x^T = \begin{pmatrix} e₁^T A \\ e₂^T A \\ ⋮ \end{pmatrix} = \begin{pmatrix} b₁ \\ b₂ \\ ⋮ \end{pmatrix} = A . (2.146)

Another common expression is the quadratic form x^T A x. The derivative of the
quadratic form with respect to x is given by

∂(x^T A x)/∂x = ( e₁^T A x   e₂^T A x   e₃^T A x   ⋯ ) + ( x^T A e₁   x^T A e₂   x^T A e₃   ⋯ )
              = x^T A^T + x^T A = x^T (A + A^T) . (2.147)

Similarly, the derivative of the quadratic form with respect to xT is given by


⎛ T ⎞ ⎛ T ⎞
e1 A x x A e1
∂ T ⎜ eT A x ⎟ ⎜ xT A e2 ⎟
x Ax = ⎝ 2 ⎠+⎝ ⎠
∂xT .. ..
. .
= A x + AT x = (A + AT ) x . (2.148)

2.8 Complex derivatives

Because it is useful to represent many signals in communications with complex


variables, many problems involve functions of complex variables. In evaluating
the derivative of the function of complex variables, it is observed that the deriva-
tive can be dependent upon the direction in which the derivative is taken, for
example, along the real axis versus along the imaginary axis. Functions whose

derivatives are independent of direction are said to be holomorphic. Many dis-


cussions of complex analysis focus upon holomorphic (or analytic) functions.
Holomorphic functions, which have unique derivatives with respect to the com-
plex variable, satisfy the Cauchy–Riemann equations [53]. However, holomor-
phic functions occupy a very special and small subset of all possible complex
functions. The focus on holomorphic functions is problematic because many of
the functions that are important to signal analysis are not holomorphic. Here,
derivatives of holomorphic functions are considered in Section 2.8.1, and non-
holomorphic functions are considered by employing the Wirtinger calculus in
Section 2.8.2.

2.8.1 Cauchy–Riemann equations


In this section, holomorphic functions are discussed followed by an overview of
the calculus for a more general class of functions. If a function of complex variable
z is composed of real functions u and v,

f (z) = f (x, y) = u(x, y) + iv(x, y)


z = x + iy
z ∗ = x − iy , (2.149)

where x and y are real variables, the derivative of f with respect to z is given by
df/dz = lim_{z→z₀} [f(z) − f(z₀)] / (z − z₀)
      = lim_{x→x₀, y→y₀} { [u(x, y) − u(x₀, y₀)] + i[v(x, y) − v(x₀, y₀)] } / { [x − x₀] + i[y − y₀] } . (2.150)
Because the path to z0 cannot matter for holomorphic functions, there is freedom
to approach the point at which the derivative is evaluated by moving along x or
along y. Consequently, the derivative can be expressed by
df/dz = lim_{x→x₀} { [u(x, y₀) − u(x₀, y₀)] + i[v(x, y₀) − v(x₀, y₀)] } / [x − x₀]
      = ( ∂u/∂x + i ∂v/∂x ) |_{z=z₀} . (2.151)
With equal validity, the derivative can be taken along y, so that
df/dz = lim_{y→y₀} { [u(x₀, y) − u(x₀, y₀)] + i[v(x₀, y) − v(x₀, y₀)] } / (i[y − y₀])
      = (1/i) ( ∂u/∂y + i ∂v/∂y ) |_{z=z₀} . (2.152)
In order for the derivative to be independent of direction, the real and imaginary
components of the derivative must be consistent, so the holomorphic function

must satisfy

∂u/∂x = ∂v/∂y
∂u/∂y = −∂v/∂x . (2.153)
These relationships are referred to as the Cauchy–Riemann equations.

2.8.2 Wirtinger calculus for complex variables


Unfortunately, many useful functions do not satisfy the Cauchy–Riemann equa-
tions and alternative formulations are useful [352, 145]; for example, the real func-
tion

g(z) = z 2
= z z∗
= (x + iy) (x − iy)
= x2 + y 2
u(x, y) = x2 + y 2 , v(x, y) = 0 . (2.154)

The Cauchy–Riemann equations are not satisfied in general,


∂u(x, y)/∂x = 2x ≠ ∂v(x, y)/∂y = 0
∂u(x, y)/∂y = 2y ≠ −∂v(x, y)/∂x = 0 . (2.155)
∂y ∂x
However, all is not lost. Real derivatives can be employed that mimic the form
of the complex variables. These derivatives can be used to find stationary points
for optimizations of real functions of complex variables and other calculations
such as Cramér–Rao estimation bounds under special conditions. For the com-
plex scalar z, where the real and imaginary components are denoted x and y, a
vector form of the complex scalar is given by

\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ℜ{z} \\ ℑ{z} \end{pmatrix} . (2.156)

With this notation, a new set of real variables ζ and ζ̄ can be constructed with
a transformation that is proportional to a rotation in the complex plane,

\begin{pmatrix} ζ \\ ζ̄ \end{pmatrix} = \begin{pmatrix} 1 & i \\ 1 & −i \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + iy \\ x − iy \end{pmatrix} . (2.157)

Consequently, the real variables {ζ, ζ̄} can be directly related to the complex
variable z and its complex conjugate z∗. The real components of z can be found

in terms of the complex variable "doppelgangers"³ {ζ, ζ̄} by using the inverse of
the transformation matrix,

\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & i \\ 1 & −i \end{pmatrix}^{−1} \begin{pmatrix} ζ \\ ζ̄ \end{pmatrix} = (1/2) \begin{pmatrix} 1 & 1 \\ −i & i \end{pmatrix} \begin{pmatrix} ζ \\ ζ̄ \end{pmatrix} . (2.158)

By using the above transformation, complex doppelganger derivatives can be
defined by

df/dζ = (1/2) ( ∂f/∂x − i ∂f/∂y ) (2.159)

and

df/dζ̄ = (1/2) ( ∂f/∂x + i ∂f/∂y ) , (2.160)

where the terms z and z∗ in the expression f are replaced with the complex
doppelgangers ζ and ζ̄, respectively. It is worth stressing that the complex dop-
pelgangers are not complex variables. If great care is taken, one can use the
notation in which z and z ∗ are used as the complex doppelgangers directly. It
is probably clear that this approach is ripe for potential confusion because ·∗ is
both an operator and an indicator of an alternate variable. Furthermore, in us-
ing the Wirtinger calculus, we take advantage of underlying symmetries. While
taking a derivative with respect to a single doppelganger variable may be useful
for finding a stationary point (as evaluated in Equation (2.170)), it is not the
complete derivative. As an example, when the value of the gradient is of interest,
typically the full gradient with both derivatives is necessary.
This derivative form [5, 262, 172] is sometimes referred to as Wirtinger cal-
culus, or complex-real (CR) calculus. Given that this is just a derivative under
a change of variables, it is probably unnecessary to give the approach a name.
However, for notational convenience within this text, this approach is referenced
as the Wirtinger calculus. It is worth noting that this definition is not unique.
As an aside, the Cauchy–Riemann equations can be expressed by taking the
derivative with respect to the "conjugate" doppelganger variable,

∂f/∂ζ̄ = 0 . (2.161)

3 The term “doppelgangers” has not been in common use previously. It is used here to stress
the difference between the complex variable and its conjugate, and two real variables used
in their place.

Evaluation of stationary point by using Wirtinger calculus


The standard approach to evaluating the extrema (maximum, minimum, or in-
flection) of a function is to find a stationary point. The stationary point of a real
function g(x, y) satisfies both

∂g(x, y)/∂x = 0  and  ∂g(x, y)/∂y = 0 . (2.162)

When one is searching for a stationary point of a real function with complex
parameter z, it is useful to "rotate" the independent real variables {x, y} into
the space of the doppelganger complex variables {ζ, ζ̄}. The function of the
complex variable z is given by

g(z) = g(x, y) = g̃(ζ, ζ̄) . (2.163)

The Wirtinger calculus is clearer when conjugation of compound expressions


is evaluated and expressed in terms of the variables {ζ, ζ̄}. However, here, at the
risk of some confusion, the doppelganger variables will be expressed by {z, z∗}.
By using the Wirtinger calculus, the following differentiation rules are found:

∂z/∂z = ∂z∗/∂z∗ = 1
∂z/∂z∗ = ∂z∗/∂z = 0 . (2.164)
For example, under Wirtinger calculus, the derivatives with respect to the dop-
pelganger variables of the expression z³ z∗² are given by

∂( z³ z∗² )/∂z = 3z² z∗² (2.165)

and

∂( z³ z∗² )/∂z∗ = 2z³ z∗ . (2.166)
This result is somewhat nonintuitive if you consider the meaning of z and z ∗ .
However, by remembering that here z and z ∗ represent real doppelganger vari-
ables, it is slightly less disconcerting. In particular, the stationary point of a
real function of complex variables expressed in terms of the conjugate variables
g̃(z, z∗) satisfies

∂g̃(z, z∗)/∂z = 0 (2.167)

and

∂g̃(z, z∗)/∂z∗ = 0 . (2.168)

A case of particular interest is if f(z, z∗) is real valued. In this case, the deriva-
tives with respect to z and z∗ will produce the same stationary point,

∂f/∂z∗ = (1/2) ( ∂f/∂x + i ∂f/∂y )
       = [ (1/2) ( ∂f/∂x − i ∂f/∂y ) ]∗
       = ( ∂f/∂z )∗ . (2.169)
In other words, the relationships

∂f/∂z∗ = 0  and  ∂f/∂z = 0 (2.170)

produce the same solution for z.

2.8.3 Multivariate Wirtinger calculus


Given the complex column M-vector z, the Wirtinger calculus discussed in
Section 2.8.2 considers {z}_m and {z}∗_m independent variables. For a vector func-
tion f(z, z∗), where {z}_m and {z}∗_m indicate the doppelganger variables, the
derivative with respect to z is given by

∂f(z, z∗)/∂z = \begin{pmatrix} ∂{f(z, z∗)}₁/∂{z}₁ & ∂{f(z, z∗)}₁/∂{z}₂ & ⋯ & ∂{f(z, z∗)}₁/∂{z}_M \\ ∂{f(z, z∗)}₂/∂{z}₁ & ∂{f(z, z∗)}₂/∂{z}₂ & ⋯ & ∂{f(z, z∗)}₂/∂{z}_M \\ ⋮ & & & ⋮ \\ ∂{f(z, z∗)}_N/∂{z}₁ & ∂{f(z, z∗)}_N/∂{z}₂ & ⋯ & ∂{f(z, z∗)}_N/∂{z}_M \end{pmatrix} . (2.171)
Similarly, the derivative with respect to z∗ is given by

∂f(z, z∗)/∂z∗ = \begin{pmatrix} ∂{f(z, z∗)}₁/∂{z∗}₁ & ⋯ & ∂{f(z, z∗)}₁/∂{z∗}_M \\ ∂{f(z, z∗)}₂/∂{z∗}₁ & ⋯ & ∂{f(z, z∗)}₂/∂{z∗}_M \\ ⋮ & & ⋮ \\ ∂{f(z, z∗)}_N/∂{z∗}₁ & ⋯ & ∂{f(z, z∗)}_N/∂{z∗}_M \end{pmatrix} . (2.172)
By using Wirtinger calculus, the differential of f is given by

df(z, z∗) = ( ∂f(z, z∗)/∂z ) dz + ( ∂f(z, z∗)/∂z∗ ) dz∗ . (2.173)
∂z ∂z

2.8.4 Complex gradient


In many gradient optimization operations, gradients are employed to find the
direction in the tangent space of a function at some point that has the great-
est change. The gradient of some function of the complex vector z = x + iy

constructed from real vectors x and y is most clearly defined by its derivation
in the real space, where the real gradient was discussed in Section 2.7.1. For
some real function f (z), the gradient of f (z) = f (x, y) is probably clearest when
expressed by building a vector from stacking x and y,
 
x
v= , (2.174)
y
so that the gradient is given by
∇v f (x, y) . (2.175)
This gradient can be remapped into a complex gradient by expressing the com-
ponents associated with y as being imaginary,
∇x f (x, y) + i ∇y f (x, y) . (2.176)
Some care needs to be taken in using this form because it can be misleading.
To evaluate a complex gradient, it is sometime useful to evaluate it by using
Wirtinger calculus. The problem with using the Wirtinger calculus to describe
the gradient is that it is not a complete description of the direction of maximum
change. There is some confusion in the literature in how to deal with this issue
[47]. Here we will first employ an explicit real gradient as the reference. Second,
we will abuse the notation of gradient slightly by defining a complete gradient
of a real function as being different from the Wirtinger gradient.
As an example, consider the function
f (z) = z† z
= xT x + y T y . (2.177)
The gradient is given by
∇x (xT x + yT y) + i ∇y (xT x + yT y) = 2x + i 2y . (2.178)
By interpreting Equations (2.159) and (2.160) as gradients, the complete gradient
of a real function in terms of the Wirtinger calculus is given by
∇x f (x, y) + i ∇y f (x, y) = 2 ∇z ∗ f (z) . (2.179)
For the above example, the complete gradient is then given by
∇x (z† z) + i ∇y (z† z) = 2∇z ∗ (z† z)
= 2z. (2.180)
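The complete gradient in Equation (2.180) is easy to check numerically. The sketch below is illustrative only, not part of the text: the finite-difference helper `complete_gradient`, the step size, and the tolerance are our own choices. It builds ∇x f + i ∇y f by central differences and compares it with 2z for f (z) = z†z.

```python
import numpy as np

def complete_gradient(f, z, h=1e-6):
    """Finite-difference complete gradient grad_x f + i grad_y f of a
    real-valued function f of a complex vector z."""
    g = np.zeros_like(z, dtype=complex)
    for m in range(z.size):
        e = np.zeros_like(z)
        e[m] = h
        gx = (f(z + e) - f(z - e)) / (2 * h)            # d f / d x_m
        gy = (f(z + 1j * e) - f(z - 1j * e)) / (2 * h)  # d f / d y_m
        g[m] = gx + 1j * gy
    return g

rng = np.random.default_rng(0)
z = rng.standard_normal(4) + 1j * rng.standard_normal(4)
f = lambda w: np.real(np.vdot(w, w))  # f(z) = z^dagger z, real valued
grad = complete_gradient(f, z)
print(np.allclose(grad, 2 * z, atol=1e-5))  # True
```

The same helper can be used to sanity-check any real-valued objective of a complex vector before handing its gradient to an optimizer.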

2.9 Integration over complex variables

For signal processing applications, two types of integral are used commonly:
contour integrals and volume integrals. A volume may be a simple area as in the
case of a single complex variable, or a hypervolume in the case of a vector space
of complex variables.

2.9.1 Path and contour integrals

A path integral (or line integral) over the complex variable z follows a path in z defined by S. The integral over some function f (z) is represented by

    ∫S dz f (z) .    (2.181)

If the path is closed (forming a loop), then the term contour integral is often used [54, 180, 40]. In contour integration, a particularly useful set of tools is available if the function is differentiable, which identifies it as a holomorphic or analytic function with some countable number of poles. The integrals are often the result of evaluating transformations of functions. If the path S forms a loop, then the path is said to be closed and the notation is given by

    ∮S dz f (z) .    (2.182)

A common approach is to convert a line integral to a closed contour integral by adding a path at a radius of infinity. This technique can be done without modifying the integral if f (z) goes to zero sufficiently quickly when approaching infinity, so that

    ∫_{−∞}^{∞} dz f (z) = ∮S dz f (z) ,    (2.183)

where the original integral is along the real axis.
Contour integrals over holomorphic functions are particularly interesting because the evaluation of the integral is the same under deformations of the path so long as no poles are crossed by the deformation. A pole is a point in the space of z where the value of f (z) becomes unbounded. For example, the function

    f (z) = 1/(z − a)    (2.184)

has a simple pole at z = a. Note that this function is holomorphic (∂f (z)/∂z∗ = 0 in a Wirtinger calculus sense). An integral over a closed path that encloses no poles can be deformed to a path of zero length with a finite integrand and therefore evaluates to zero,

    ∮_{S no poles} dz f (z) = 0 .    (2.185)

In order to deform an integral past a pole, a residue is left. The integral is then given by the sum of the residues created by deforming past the poles located at a_m enclosed within the original path,

    ∮S dz f (z) = 2πi Σ_m Res_{a_m} {f (z)} ,    (2.186)
where Res_{a_m} {f (z)} indicates the residue of the function f (z) at the mth enclosed pole, located at a_m . In general, f (z) can be expressed in terms of a Laurent series [53] about the mth pole a_m ,

    f (z) = Σ_{n=−∞}^{∞} b_n (z − a_m)^n ,    (2.187)

where b_n is the coefficient of the expansion. The residue is given by b_{−1} ,

    Res_{a_m} {f (z)} = b_{−1} .    (2.188)

A function is said to have a pole of order N at a_m if it can be expressed as the ratio of a holomorphic function h(z), such that h(a_m) ≠ 0, and the term (z − a_m)^N ,

    f (z) = h(z)/(z − a_m)^N .    (2.189)

Under the assumption that f (z) has a pole of order N at a_m , the residue is given by

    Res_{a_m} {f (z)} = (1/(N − 1)!) lim_{z→a_m} ∂^{N−1}/∂z^{N−1} [ (z − a_m)^N f (z) ] .    (2.190)

For the special form f (z) = e^{zt} g(z), which is commonly generated in transform problems, the residue for the kth-order pole at a_m of g(z) is given by

    Res^{(k)}_{a_m} {f (z)} = (1/(k − 1)!) lim_{z→a_m} ∂^{k−1}/∂z^{k−1} [ (z − a_m)^k e^{zt} g(z) ] .    (2.191)
The line integral along the real axis of the function f (z) = e^{iωz}/(z² + 1) is an example,

    φ = ∫_{−∞}^{∞} dz e^{iωz}/(z² + 1) = ∮ dz e^{iωz}/(z² + 1) ,    (2.192)

where it is assumed that if ω > 0, then the upper path (via i∞) is taken, as shown in Figure 2.2. Otherwise, the lower path is taken. The poles are at z = ±i. Only z = i is enclosed by the contour when following the upper path. Thus, because this is a simple pole, the residue is given by

    Res_{a=i} {f (z)} = lim_{z→i} (z − i) e^{iωz}/(z² + 1) = e^{−ω}/(2i) .    (2.193)

The integral is given by

    φ = 2πi Res_{a=i} {f (z)} = π e^{−ω} ,    (2.194)
Figure 2.2 Contour of integration using the upper half plane with poles at ±i.

when ω > 0. Similarly, if ω < 0, the lower path encloses the pole at z = −i, so

    φ = π e^{−|ω|} .    (2.195)

We have addressed the cases of poles enclosed by a path and poles outside a path. In the case in which a path is constrained such that a pole lies on the path, the pole contributes half the residue value that it would contribute if it were enclosed. Because of the potential subtleties involved in evaluating contour integrals, consulting a complex analysis reference is recommended [53].
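The closed-form answer from the residue calculation can be cross-checked by brute-force numerical integration along the real axis. This is an illustrative sketch of ours, not from the text; the truncation length L, grid size, and tolerance are arbitrary choices.

```python
import numpy as np
from scipy.integrate import trapezoid

def phi(w, L=400.0, n=1_600_001):
    """Truncated integral of exp(i w x) / (x^2 + 1) over the real axis;
    the odd imaginary part integrates to zero, leaving the cosine part."""
    x = np.linspace(-L, L, n)
    return trapezoid(np.cos(w * x) / (x**2 + 1), x)

for w in (-2.0, 0.5, 3.0):
    print(w, phi(w), np.pi * np.exp(-abs(w)))  # the last two columns agree
```

The agreement with π e^{−|ω|} for both signs of ω illustrates why the contour must be closed in the half plane where the exponential decays.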

2.9.2 Volume integrals

The integral over a volume (in this case an area) in the complex plane can be formally denoted

    ∫ dz dz∗ f (z, z∗) .    (2.196)

Here the notation is borrowed from the Wirtinger calculus, with dz and dz∗ formally acting as independent variables. These integrals are often the result of evaluating probabilities. Often the slightly lazy notation f (z) is used rather than f (z, z∗). Notationally, this form can be extended to the complex n-vector space z ∈ C^{n×1} using the notation

    ∫ dⁿz dⁿz∗ f (z) ,    (2.197)

where dⁿz and dⁿz∗ are shorthand notation for d{z}₁ d{z}₂ · · · d{z}_n and d{z}∗₁ d{z}∗₂ · · · d{z}∗_n .

In general, the integrals need to be converted to the real space of x and y for evaluation. When convenient, the notation

    d²z = dx dy    (2.198)
is employed. Also used is the notation dΩz to indicate the differential hypervolume over the real and imaginary components of z, given by

    dΩz = dx₁ dy₁ dx₂ dy₂ · · · dx_n dy_n ,    (2.199)

where x_m = Re{z}_m and y_m = Im{z}_m . In the case of a matrix Z, the differential volume dΩZ includes differential contributions associated with all elements of the matrix. For the real vector x, the differential hypervolume is given by

    dΩx = d{x}₁ d{x}₂ · · · d{x}_n .    (2.200)

If z is a complex scalar, then the differential hypervolume is indicated by dΩz . Because the two definitions of the differential volumes differ by the value of the Jacobian (determinant of the Jacobian matrix), it is important to stress that dΩz indicates the differential volume in terms of the real and imaginary components of the complex variable and not in terms of the complex variable and its conjugate.
As an example of using the differential volume, the integration over the complex circular Gaussian distribution is considered. The complex circular Gaussian distribution with probability density p̃(z, z∗) for Wirtinger parameters and p(x, y) for real parameters is given by

    p̃(z, z∗) dz dz∗ = (1/(2πσ²)) e^{−|z|²/σ²} dz dz∗ .    (2.201)

By evaluating the Jacobian, the probability can be mapped to the space of x and y,

    J = det [ ∂z/∂x  ∂z/∂y ; ∂z∗/∂x  ∂z∗/∂y ]
      = det [ 1  i ; 1  −i ] = −2i .    (2.202)

Consequently, the probability density functions are related through the magnitude |J| = 2 by

    p̃(z, z∗) dz dz∗ = p̃(z, z∗) |J| dx dy = p(x, y) dx dy = p(x, y) dΩz .    (2.203)

By using the substitutions x = (z + z∗)/2 and y = (z − z∗)/(2i), the probability density function in terms of x and y is given by

    p(x, y) dx dy = (1/(πσ²)) e^{−(x²+y²)/σ²} dx dy
                  = (1/(πσ²)) e^{−(x²+y²)/σ²} dΩz .    (2.204)
This equation can be rewritten as the product of two real Gaussian distributions by noting that the variance of the complex distribution σ² = σ²_Re + σ²_Im (where σ²_Re is the variance of the real part of z and σ²_Im is the variance of the imaginary part of z) is twice the variance of either of the two real distributions, so that σ² = 2σ²_Re . Consequently, the probability density is given by

    p(x, y) dx dy = (1/√(2πσ²_Re)) e^{−x²/(2σ²_Re)} · (1/√(2πσ²_Re)) e^{−y²/(2σ²_Re)} dx dy .    (2.205)
As a result, the integral over x and y is given by

    ∫ dx dy p(x, y) = ∫ dx (1/√(2πσ²_Re)) e^{−x²/(2σ²_Re)} ∫ dy (1/√(2πσ²_Re)) e^{−y²/(2σ²_Re)}
                    = 1 · 1 = 1 .    (2.206)
Similarly, the second moment of the zero-mean complex circular Gaussian distribution, which is formally given by

    φ = ∫ dz dz∗ |z|² e^{−|z|²/σ²}/(2πσ²) ,    (2.207)

is given by

    φ = ∫ dx dy (x² + y²) e^{−(x²+y²)/σ²}/(πσ²)
      = ∫ dx dy x² e^{−(x²+y²)/σ²}/(πσ²) + ∫ dx dy y² e^{−(x²+y²)/σ²}/(πσ²)
      = ∫ dx x² e^{−x²/σ²}/√(πσ²) + ∫ dy y² e^{−y²/σ²}/√(πσ²)
      = √π σ³/(2√(πσ²)) + √π σ³/(2√(πσ²)) = σ² .    (2.208)
Because x and y are often used to indicate variables other than the real and imaginary parts of z, the notations zr and zi may sometimes be invoked instead. With this notation, the differential real-variable area dx dy would be indicated by dΩz = dzr dzi .
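A Monte Carlo check of the normalization and second-moment results above (illustrative only; the sample count, seed, and tolerance are our own choices): drawing z with independent real and imaginary parts of variance σ²/2 each should give E|z|² = σ².

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 2.0          # complex variance sigma^2 = E|z|^2
n = 1_000_000
# real and imaginary parts each carry half of the complex variance
z = np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
second_moment = np.mean(np.abs(z)**2)
print(second_moment)  # close to sigma2 = 2.0
```

The empirical mean of |z|² matches σ², confirming the factor-of-two split between the real and imaginary variances used in Equation (2.205).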

2.10 Fourier transform

A particularly useful transformation for engineering and the sciences is the Fourier transform. The typical interpretation is that the Fourier transform relates a time-domain function to a frequency-domain function.

The transform of a complex function g with a real parameter t (time) to the complex function G with a real parameter f (frequency) is given by

    G(f ) = ∫_{−∞}^{∞} dt e^{−i 2π t f} g(t) .    (2.209)

The inverse transform is given by

    g(t) = ∫_{−∞}^{∞} df e^{i 2π t f} G(f ) .    (2.210)

In terms of the angular frequency ω = 2πf , these transforms⁴ are indicated by

    G(ω) = ∫_{−∞}^{∞} dt e^{−i t ω} g(t) .    (2.213)

The inverse transform is given by

    g(t) = (1/(2π)) ∫_{−∞}^{∞} dω e^{i t ω} G(ω) .    (2.214)

2.10.1 Useful Fourier relationships

If G(f ) is the Fourier transform of g(t), then

    ∫_{−∞}^{∞} dt e^{−i 2π f t} g(t) = G(f )
    ∫_{−∞}^{∞} dt e^{−i 2π f t} g(t − a) = e^{−i a 2π f} G(f )
    ∫_{−∞}^{∞} dt e^{−i 2π f t} g(a t) = (1/|a|) G(f /a)    (2.215)
    ∫_{−∞}^{∞} dt e^{−i 2π f t} (∂^m/∂t^m) g(t) = (i 2π f )^m G(f )
    ∫_{−∞}^{∞} dt e^{−i 2π f t} θ(a t) = (1/|a|) sinc(f /a) ,

where θ(x) is a function equal to 1 if |x| ≤ 1/2 and 0 otherwise. Note that θ(x) is sometimes denoted in the literature by rect(x). Also, sinc(x) is given by sin(πx)/(πx).

If G(f ) and H(f ) are the Fourier transforms of g(t) and h(t), the convolution of g(t) and h(t) is indicated by

    (g ∗ h)(t) = ∫ dx g(x) h(t − x) ,    (2.216)

where x is a variable of integration used to perform the convolution. The Fourier transform of the convolution is given by

    ∫_{−∞}^{∞} dt e^{−i 2π f t} (g ∗ h)(t) = G(f ) H(f ) .    (2.217)

Parseval’s theorem is given by

    ∫ dt g(t) h∗(t) = ∫ df G(f ) H∗(f ) ,    (2.218)

which also implies that the integral over the magnitude squared in either domain is the same,

    ∫ dt |g(t)|² = ∫ df |G(f )|² .    (2.219)

⁴ In some of the literature, the normalization is defined symmetrically in the angular frequency variable,

    G̃(ω) = (1/√(2π)) ∫_{−∞}^{∞} dt e^{−i ω t} g(t) .    (2.211)

The inverse transform is then given by

    g(t) = (1/√(2π)) ∫_{−∞}^{∞} dω e^{i ω t} G̃(ω) .    (2.212)
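As a numerical illustration (ours, not the text's), the Gaussian pair g(t) = e^{−πt²} ↔ G(f ) = e^{−πf²} can be checked by direct quadrature, along with the equal-energy statement of Equation (2.219); the truncation limits are arbitrary.

```python
import numpy as np
from scipy.integrate import quad

g = lambda t: np.exp(-np.pi * t**2)
# forward transform by quadrature; the imaginary (sine) part vanishes by symmetry
G_num = lambda f: quad(lambda t: g(t) * np.cos(2 * np.pi * t * f), -10, 10)[0]

for f in (0.0, 0.5, 1.0):
    print(f, G_num(f), np.exp(-np.pi * f**2))   # last two columns agree

# Parseval: equal energy in both domains
lhs = quad(lambda t: g(t)**2, -10, 10)[0]
rhs = quad(lambda f: np.exp(-np.pi * f**2)**2, -10, 10)[0]
print(lhs, rhs)  # both approximately 1/sqrt(2)
```

The Gaussian is its own transform under this normalization, which makes it a convenient test signal for any numerical transform code.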

2.10.2 Discrete Fourier transform

The discrete form of the Fourier transform is often useful for digital systems that employ sampled data. It is useful for considering the spectral composition of a finite temporal extent of data that has bounded spectral content satisfying the Nyquist criteria. The standard description of the Nyquist criteria states that the spectral content of a band-limited signal of single-sided bandwidth B_s.s. can be represented exactly if the regular real samples are spaced by Ts such that Ts ≤ 1/(2 B_s.s.). While there are both positive and negative frequencies produced by the Fourier transform, they provide redundant information for a real signal; thus, the single-sided bandwidth is often considered. In our discussion, we typically assume that we are working with samples of a complex signal (for example, we might have an analog-to-digital converter⁵ at the receiver for both the real and imaginary components), so we have two real samples for every sample point in time, and the full bandwidth (both positive and negative frequencies) contains useful information. We denote this full bandwidth B = 2 B_s.s. . Consequently, we require that the sample period satisfy

    Ts ≤ 1/B .    (2.220)

It is worth noting that if the spectrum is known to be sparse, then compressive sampling techniques [88] can be used to reduce the total number of samples, but that discussion is beyond the scope of this text.

If the spectral content extends just a little beyond that supported by the Nyquist criteria, then the spectral estimate at the spectral edges will be contaminated by aliasing, in which spectral components at one edge extend beyond the estimated spectral limits and contaminate the estimates at the other end. A set of regularly spaced samples is assumed here. This set may be of finite length. The samples in the time domain (organized as a vector) are represented here by

    {x}m = g([m − 1] Ts) .    (2.221)

The samples in the frequency domain are represented here by

    {y}m = G([m − 1]/Ts) .    (2.222)

⁵ Here we are ignoring quantization effects.

Under the assumption of unitary normalization, the discrete Fourier transform (DFT) for equally spaced samples is given by

    {y}k = (1/√n) Σ_{m=0}^{n−1} e^{−i 2π m k/n} {x}m ;  k = 0, . . . , n − 1 .    (2.223)

The inverse DFT is given by

    {x}m = (1/√n) Σ_{k=0}^{n−1} e^{i 2π m k/n} {y}k ;  m = 0, . . . , n − 1 .    (2.224)

Analogous to aliasing in the frequency domain, when we sample signals in the frequency domain, we may introduce aliasing in the time domain if the signals of interest have temporal extent beyond the Nyquist criteria. Here a symmetric (or unitary) normalization is used in which the summation is multiplied by 1/√n for both transformations. It is also common to use a normalization in which the forward transformation has no normalization term and the inverse transformation has the term 1/n. Depending upon the situation, each normalization has advantages. In this text, the symmetric version is preferred. It is sometimes useful to think of the DFT in terms of a unitary matrix operator, denoted here by F, whose (k, m) entry for k, m = 1, . . . , n is

    {F}k,m = (1/√n) e^{−i (2π/n)(k−1)(m−1)} ,    (2.225)

so that the first row and first column of F have all entries equal to 1/√n.
This DFT matrix satisfies the unitary characteristics,

    F⁻¹ = F†  and  F†F = FF† = I .    (2.226)

By using this definition, the vectors x and y are given by

    y = F x  and  x = F† y .    (2.227)

A computationally efficient implementation of the DFT is the fast Fourier transform (FFT). Often the terms DFT and FFT are used interchangeably. The FFT enables the evaluation of the DFT with order O(n log(n)) operations rather than the O(n²) of the matrix-vector multiply observed in Equation (2.227). This potential computational saving motivates a number of signal processing and communications techniques. As an example, orthogonal-frequency-division multiplexing (OFDM), discussed in Section 10.5.3, typically exploits this computational saving.
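The unitary DFT of Equations (2.223)–(2.227) can be checked against NumPy's FFT, which uses the unnormalized forward convention; this is an illustrative sketch of ours, with n = 8 chosen arbitrarily.

```python
import numpy as np

n = 8
idx = np.arange(n)
# unitary DFT matrix of Eq. (2.225): entries exp(-i 2 pi k m / n) / sqrt(n)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

x = np.random.default_rng(2).standard_normal(n) + 0j
print(np.allclose(F.conj().T @ F, np.eye(n)))          # unitarity, Eq. (2.226)
print(np.allclose(F @ x, np.fft.fft(x) / np.sqrt(n)))  # matches FFT up to 1/sqrt(n)
print(np.allclose(F.conj().T @ (F @ x), x))            # inverse recovers x, Eq. (2.227)
```

All three lines print True: the symmetric normalization differs from NumPy's only by the 1/√n factor, and the inverse is simply the conjugate transpose.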
2.11 Laplace transform

Another transform that is often encountered is the Laplace transform, which can be viewed as a generalization of the Fourier transform. The Laplace transform of a function f (·) is defined as

    L{f (·)}(s) = ∫ dx f (x) e^{−sx} ,    (2.228)

where s is a complex number, s ∈ C. If the integral above does not converge for all s, the region of convergence of the Laplace transform needs to also be specified. The region of convergence refers to the range of values of s for which the Laplace transform converges.

The Laplace transform has properties that are similar to those of the Fourier transform. Suppose that the Laplace transforms of the functions f (t) and g(t) are F (s) and G(s). Then the following properties hold:

    L{a f (t) + b g(t)} = a F (s) + b G(s)    (2.229)

    L{(d/dt) f (t)} = s F (s) .    (2.230)
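A quick numerical check of the definition in Equation (2.228). This sketch is ours, not the text's: the truncation T and the test pair e^{−at} ↔ 1/(s + a) are illustrative, and we use the common one-sided integral over [0, ∞).

```python
import numpy as np
from scipy.integrate import quad

def laplace(f, s, T=200.0):
    """One-sided numerical Laplace transform: int_0^T f(x) exp(-s x) dx,
    for real s > 0 and an integrand that decays before T."""
    return quad(lambda x: f(x) * np.exp(-s * x), 0.0, T)[0]

a, s = 1.5, 2.0
print(laplace(lambda t: np.exp(-a * t), s), 1.0 / (s + a))  # both ~0.2857
```

Linearity (Equation (2.229)) follows immediately from linearity of the integral and can be checked the same way.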

2.12 Constrained optimization

2.12.1 Equality constraints

In many situations, we may wish to optimize a function subject to an equality constraint. This type of problem can be described by

    min f (x) = f (x₁, x₂, . . . , x_n)  such that    (2.231)
    g(x) = g(x₁, x₂, . . . , x_n) = 0 ,    (2.232)

where x ∈ R^{n×1}. The function to be optimized, f (x) ∈ R, is known as the objective function, and g(x) is the constraint function. In other words, the optimization problem described above aims to find the smallest value of f (x) over all points where g(x) = 0. The method of Lagrange multipliers establishes necessary conditions for a point x = (x₁, x₂, . . . , x_n) to be a stationary point (a local minimum, maximum, or inflection point).

The principle behind the Lagrange multiplier has a satisfying geometric interpretation. The main property used in the method of Lagrange multipliers is that the gradients of the constraint and objective functions must be parallel at stationary points. To see why the gradients must be parallel at a given stationary point x̄, consider a two-dimensional example where n = 2. For illustrative purposes, consider Figures 2.3 and 2.4, where an example of an objective function f (x₁, x₂) and a constraint function

    g(x) = x₁ − x₂ = 0
are shown in surface and contour plots, respectively. The minimum occurs at the point marked by the dot in Figure 2.3.

Figure 2.3 Constrained optimization with equality constraints.
As we trace out the path of the constraint function g(x) = 0 on the surface of the objective function f (x), observe that the constraint function intersects the contours of the objective function, except at stationary points (the point from which the dashed arrow originates in Figure 2.4), where the constraint function just touches the contour of the objective function but never crosses it. In other words, the constraint function is tangent to the contour of the objective function at all stationary points. Since the gradient is always perpendicular to tangents, the gradient vector of the constraint function must be parallel to the gradient vector of the objective function at stationary points. Hence, if a point x̄ is a stationary point that satisfies the constraint equation

    g(x) = 0 ,    (2.233)

the gradient vectors of the objective function f (x) and the constraint function g(x) are parallel at x̄. In other words, the gradient vectors must be linearly
Figure 2.4 Constrained optimization with equality constraints, contour plot.

related. Thus, there must exist a λ such that

    ∇x f (x̄) = −λ ∇x g(x̄) ,  and    (2.234)
    g(x̄) = 0 .    (2.235)

The term λ is known as the Lagrange multiplier. We can combine the two equations above by defining a function Λ(x̄, λ) as follows:

    Λ(x̄, λ) = f (x̄) + λ g(x̄) ,

and then writing

    ∇x,λ Λ(x̄, λ) = 0 .    (2.236)

The gradient operator in the equation above is with respect to the elements of x and λ. Taking the gradient with respect to the elements of x ensures that the gradient vectors of the objective function f (x) and the constraint function g(x) are parallel, that is to say, Equation (2.234) is satisfied. Taking the gradient with respect to the Lagrange multiplier λ ensures that the constraint equation is satisfied, since taking the derivative of Equation (2.236) with respect to λ results
precisely in Equation (2.235). Hence, by solving for λ and x in Equation (2.236), one can find all the stationary points.

With multiple constraint functions, the method of Lagrange multipliers can be generalized by using essentially the same arguments presented before. Suppose that the constraint equations are

    g₁(x) = 0    (2.237)
    g₂(x) = 0
    ⋮
    g_K (x) = 0 .

At any stationary point x̄, there must exist λ₁, λ₂, . . . , λ_K such that

    ∇x,λ₁,...,λ_K [ f (x̄) + Σ_{k=1}^{K} λ_k g_k (x̄) ] = 0 .    (2.238)
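For a concrete instance of Equation (2.236), consider the hypothetical toy problem (ours, not the text's) of minimizing f (x) = x₁² + x₂² subject to g(x) = x₁ + x₂ − 1 = 0. Setting ∇Λ = 0 gives a small linear system that can be solved directly.

```python
import numpy as np

# Stationarity of Lambda(x, lam) = x1^2 + x2^2 + lam * (x1 + x2 - 1):
#   2 x1 + lam = 0,  2 x2 + lam = 0,  x1 + x2 = 1
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)  # x1 = x2 = 0.5, lam = -1
```

The quadratic objective makes the stationarity conditions linear; for nonquadratic objectives the same equations would be solved with a nonlinear root finder instead.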

2.12.2 Inequality constraints

We may also wish to optimize an objective function with both equality and inequality constraints. This may be stated mathematically as follows:

    min f (x) = f (x₁, x₂, . . . , x_n)  such that    (2.239)
    h₁(x) = 0
    h₂(x) = 0
    ⋮
    h_K (x) = 0 ,    (2.240)

with the following inequality constraints:

    g₁(x) ≤ 0
    g₂(x) ≤ 0
    ⋮
    g_M (x) ≤ 0 .    (2.241)

If a point x̄ is a local minimum that satisfies the constraints, x̄ must satisfy the Karush–Kuhn–Tucker conditions, which are as follows. There exist μ̄ = (μ̄₁, μ̄₂, . . . , μ̄_M ) and λ̄ = (λ̄₁, λ̄₂, . . . , λ̄_K ) such that

    ∇f (x̄) + Σ_{m=1}^{M} μ̄_m ∇g_m (x̄) + Σ_{k=1}^{K} λ̄_k ∇h_k (x̄) = 0 ,
    μ̄_m g_m (x̄) = 0  for m = 1, 2, . . . , M ,
    μ̄_m ≥ 0  for m = 1, 2, . . . , M .    (2.242)

Note that x̄ must satisfy the constraints of the problem, which are known as the primal feasibility constraints, since without them x̄ cannot be a solution to the optimization problem:

    g_m (x̄) ≤ 0  for m = 1, 2, . . . , M
    h_k (x̄) = 0  for k = 1, 2, . . . , K .    (2.243)

The Karush–Kuhn–Tucker conditions above are sufficient for optimality if f (x) and g_m (x) are convex for m = 1, 2, . . . , M .

Figure 2.5 illustrates a convex function f (x₁, x₂) with the following inequality constraint:

    x₁ − x₂ ≤ 0 .

The arrows in the plot indicate the region of the x₁–x₂ plane that satisfies the boundary conditions. The global minimum is marked by the dot.

In this case, the Karush–Kuhn–Tucker conditions are given by

    ∇f (x̄₁, x̄₂) + μ̄ ∇(x̄₁ − x̄₂) = 0    (2.244)

and

    μ̄ (x̄₁ − x̄₂) = 0    (2.245)
    μ̄ ≥ 0 .

The feasibility conditions are satisfied by any point in the region indicated by the arrows.

Observe that Equation (2.245) can be satisfied either if x̄₁ − x̄₂ = 0, i.e., the optimal point is at the boundary of the constraint function, or if μ̄ = 0, in which case the optimal point is not on the boundary. If the point is not on the boundary, then μ̄ = 0, and Equation (2.244) becomes

    ∇f (x̄₁, x̄₂) = 0 ,    (2.246)

which is a global minimum because of the convexity of f (x₁, x₂).
Now suppose that the constraint function is

    x₂ − x₁ ≤ 0 ,

and the objective function f (x₁, x₂) is the same as before. Figures 2.6 and 2.7 illustrate this case, where the dot indicates the optimal point.

The Karush–Kuhn–Tucker conditions for this optimization problem are

    ∇f (x̄₁, x̄₂) + μ̄ ∇(x̄₂ − x̄₁) = 0    (2.247)

and

    μ̄ (x̄₂ − x̄₁) = 0    (2.248)
    μ̄ ≥ 0 ,
Figure 2.5 Karush–Kuhn–Tucker theorem with interior minima.

with the feasibility constraints satisfied by points in the region of the x₁–x₂ plane indicated by the arrows.

Note that at the minimum, Equation (2.248) is satisfied for any μ̄ because x̄₂ − x̄₁ = 0. Hence Equation (2.247) remains unchanged and is identical to the Lagrange multiplier technique for optimization with equality constraints, for which the optimal point is apparent from Figure 2.7. The global optimality of this point follows from the convexity of f (x₁, x₂).
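A numerical companion to the boundary-minimum discussion, using a hypothetical problem of our own (note that scipy's convention takes inequality constraints as g(x) ≥ 0, hence the sign flip): a convex objective whose unconstrained minimum violates x₁ − x₂ ≤ 0, so the solution lands on the boundary with μ̄ > 0.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2.0)**2 + x[1]**2                # unconstrained minimum at (2, 0)
cons = {"type": "ineq", "fun": lambda x: x[1] - x[0]}  # enforces x1 - x2 <= 0
res = minimize(f, x0=[0.0, 0.0], constraints=[cons])
print(res.x)  # close to the boundary point (1, 1)
```

On the boundary x₁ = x₂ = t the objective becomes (t − 2)² + t², minimized at t = 1, and the KKT stationarity condition there gives μ̄ = 2 > 0, consistent with an active constraint.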

2.12.3 Calculus of variations

A function mapping members of a set of functions to real values is a functional.⁶ In many contexts, we may wish to find a function that maximizes or minimizes a functional, subject to some constraints. For instance, one could be asked to describe the shortest path between two points on the surface of a sphere. Solutions to problems of this form can be found by using the calculus of variations.

Suppose that we wish to find the function y(x) that minimizes the following quantity,

    I(y) = ∫_a^b dx g(x, y(x), y′(x)) ,    (2.249)

⁶ Note that the general definition of a functional is a mapping from a vector space (the space of functions is a vector space) to the real numbers.
Figure 2.6 Karush–Kuhn–Tucker theorem with boundary minima.

where the integrand g(·, ·, ·) defines the functional. We use the notation y′(x) to represent the derivative of y with respect to x, i.e.,

    y′(x) = (d/dx) y(x) .    (2.250)

For all continuous functions that maximize or minimize the quantity I in Equation (2.249), the function y(x) must satisfy

    dg/dy − (d/dx)(dg/dy′) = 0 .    (2.251)

The equation above is known as the Euler–Lagrange differential equation.

The canonical example used to illustrate the calculus of variations is the problem of proving that the shortest path between two points on a plane is given by the straight line connecting the two points. Consider Figure 2.8 and suppose that we wish to travel from the point (a, y(a)) to the point (b, y(b)) along a function y(x).

The length of the curve connecting the point (a, y(a)) to the point (b, y(b)) is given by

    ∫_a^b dx √(1 + ((d/dx) y(x))²) .    (2.252)
Figure 2.7 Karush–Kuhn–Tucker theorem with boundary minima, contour plot.

Figure 2.8 Shortest path between two points determined by using calculus of variations.

We start by defining I(y(·)) as the following operation on the function y(·):

    I(y(·)) = ∫_a^b dx √(1 + ((d/dx) y(x))²) .    (2.253)

If the function y(·) minimizes I(y(·)), then it must satisfy Equation (2.251). Writing the Euler–Lagrange differential equation, we have

    0 = dg/dy − (d/dx) [ y′(x)/√(1 + y′(x)²) ]
      = − y″(x)/√(1 + y′(x)²) + y′(x)² y″(x)/(1 + y′(x)²)^{3/2}
      = − y″(x)/(1 + y′(x)²)^{3/2} .    (2.254)

Hence, the curve that minimizes the distance between the two points (a, y(a)) and (b, y(b)) must have a second derivative that equals zero for all x in [a, b]. If the second derivative is zero in the interval, then for all x in [a, b], it must be the case that

    (d/dx) y(x) = y′(a) .    (2.255)

Integrating both sides with respect to x yields

    y(x) = x y′(a) + A ,    (2.256)

where A is a constant. Hence, we have proved that if y(x) is the curve that minimizes the distance between the points (a, y(a)) and (b, y(b)), it must be a straight line. Note that, technically, we haven't proved that there is a curve that minimizes the distance between those points, but assuming that there is such a curve, we have shown that it is a straight line.
In order to prove that it is necessary for any function y(x) that minimizes (2.249) to satisfy the Euler–Lagrange equation, we first observe that any function y(x) that minimizes Equation (2.249) must satisfy

    I(y(x)) ≤ I(y_ε(x)) ,

where y_ε(x) is a perturbed version of y(x). The perturbed function y_ε(x) is given by

    y_ε(x) = y(x) + ε h(x) .

Here h(x) is any other function with the following properties:

    h(a) = h(b) = 0 .

The last property implies that y(a) = y_ε(a) and y(b) = y_ε(b). Additionally, assume that h(x) is continuous and has a continuous derivative.

If y(x) minimizes I(y(·)), it must be the case that y_ε(x) at ε = 0 minimizes I(y_ε(·)). Hence, the derivative of I(y_ε(x)) with respect to ε must be zero at ε = 0. This requirement leads to

    (d/dε) I(y_ε(x)) = (d/dε) ∫_a^b dx g(x, y_ε(x), y′_ε(x)) .    (2.257)

Moving the derivative into the integral, one finds that

    (d/dε) I(y_ε(x)) = ∫_a^b dx (d/dε) g(x, y_ε(x), y′_ε(x))    (2.258)
                     = ∫_a^b dx [ (dg/dy_ε)(dy_ε/dε) + (dg/dy′_ε)(dy′_ε/dε) ] .    (2.259)

Substituting y_ε(x) = y(x) + ε h(x) yields

    (d/dε) I(y_ε(x)) = ∫_a^b dx [ (dg/dy_ε) h(x) + (dg/dy′_ε) h′(x) ] .    (2.260)

Because y(x) minimizes I(y(·)), dI/dε = 0 at ε = 0. Therefore, at ε = 0, we have

    ∫_a^b dx [ (dg/dy) h(x) + (dg/dy′) h′(x) ] = 0 .    (2.261)

We can then use integration by parts on the second term (the boundary term vanishes because h(a) = h(b) = 0) to write the integral as follows:

    0 = ∫_a^b dx [ dg/dy − (d/dx)(dg/dy′) ] h(x) .

Since the function h(x) is an arbitrary continuous function with a continuous derivative, the above equation can hold only if

    dg/dy − (d/dx)(dg/dy′) = 0 ,

which is the Euler–Lagrange differential equation.

2.13 Order of growth notation

In many areas of engineering, it is useful to understand the order of growth of one function with respect to another. For instance, we may wish to know the rate at which the per-link data rate in an ad hoc wireless network declines with increasing numbers of nodes. The so-called big-O/little-o notation is useful to describe this. This notation is also used in other fields such as computer science and approximation theory, and is also referred to as the Landau notation. Consider two real functions f (x) and g(x). We say that f (x) is “big-O” of g(x), or f (x) = O(g(x)), when there exist constants A and X such that

    f (x) < A g(x)    (2.262)

for all x > X. In other words, the function g(x) grows at least as fast with x as the function f (x). We say that f (x) is “little-o” of g(x), or f (x) = o(g(x)), when

    f (x)/g(x) → 0    (2.263)

as x → ∞. We say that f (x) is “theta of” g(x), or f (x) = Θ(g(x)), if there exist an X and constants A₁ and A₂ such that for all x > X,

    A₁ g(x) ≤ f (x) ≤ A₂ g(x) .    (2.264)

In other words, f (x) and g(x) grow at the same rate for sufficiently large x. Confusingly, it is common practice to use this notation to describe the order of growth of functions when x is close to zero as well as when x is large, as described above. The little-o notation is most commonly used in this context, whereby we say that f (x) is little-o of g(x), or f (x) = o(g(x)), when

    f (x)/g(x) → 0    (2.265)

as x → 0. A common application of the little-o notation is in writing Taylor series expansions of functions for small arguments. For instance, one may write the Taylor series expansion of log(1 + x) for small x as

    log(1 + x) = x + o(x) .    (2.266)
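The little-o statement above can be made concrete numerically (an illustrative sketch of ours; the sample points are arbitrary): the remainder r(x) = log(1 + x) − x satisfies r(x)/x → 0 as x → 0.

```python
import numpy as np

def ratio(x):
    """r(x)/x for the remainder r(x) = log(1 + x) - x."""
    return (np.log1p(x) - x) / x

for x in (1e-1, 1e-2, 1e-3, 1e-4):
    print(x, ratio(x))  # the ratio shrinks roughly like -x/2
```

The printed ratios shrink in proportion to x, consistent with the next Taylor term −x²/2; `np.log1p` is used to avoid cancellation error for small x.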

2.14 Special functions

In this section, we briefly summarize some special functions that are often encountered in wireless communications in general and in this text in particular.

2.14.1 Gamma function

The gamma function Γ(z) is an extension of the factorial function to real and complex numbers and is defined as

    Γ(z) = ∫₀^∞ dτ τ^{z−1} e^{−τ} .    (2.267)

The integral requires analytic continuation to evaluate in the left half plane, and the gamma function is not defined for non-positive integer real values of z. For the special case of positive integer arguments, the gamma function can be expressed in terms of the factorial,

    Γ(n) = (n − 1)!    (2.268)
         = ∏_{m=1}^{n−1} m .    (2.269)
2.14 Special functions 59

Two related functions are the upper and lower incomplete gamma functions
defined respectively as
 ∞
Γ(s, x) = dτ τ s−1 e−τ , (2.270)
x

and
 x
γ(s, x) = dτ τ s−1 e−τ . (2.271)
0

The lower incomplete gamma function γ(s, x) is often encountered in commu-


nication systems since the cumulative distribution function of a χ2 distributed
random variable is proportional to the lower incomplete gamma function. The
following asymptotic expansion of the lower incomplete gamma function is useful
for analyzing the probability of error of wireless communication systems as
x → 0:

    γ(s, x) = (1/s) x^s + o(x^s) .    (2.272)
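The small-argument behavior in Equation (2.272) can be checked numerically. The
helper below sums the standard series γ(s, x) = x^s e^(−x) Σ_k x^k/[s(s+1)···(s+k)];
the function name and term count are illustrative choices:

```python
import math

def lower_gamma(s, x, terms=200):
    # gamma(s, x) = x^s e^{-x} * sum_{k>=0} x^k / (s (s+1) ... (s+k))
    term = 1.0 / s
    total = term
    for k in range(1, terms):
        term *= x / (s + k)
        total += term
    return x ** s * math.exp(-x) * total

# Sanity check against a closed form: gamma(1, x) = 1 - e^{-x}
assert abs(lower_gamma(1.0, 0.7) - (1 - math.exp(-0.7))) < 1e-12

# Small-argument behavior of Equation (2.272): gamma(s, x) ~ x^s / s as x -> 0
s, x = 1.5, 1e-4
assert abs(lower_gamma(s, x) / (x ** s / s) - 1) < 1e-3
```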
s

2.14.2 Hypergeometric series


The hypergeometric series [129] is defined as

    pFq(a_1, . . . , a_p; b_1, . . . , b_q; x) = Σ_{k=0}^∞ [∏_{m=1}^p (a_m)_k / ∏_{n=1}^q (b_n)_k] x^k/k! ,    (2.273)

where (a)_k is known as the Pochhammer symbol, defined as

    (a)_k = Γ(a + k)/Γ(a) = a(a + 1)(a + 2) · · · (a + k − 1) .    (2.274)

From its definition, it can be observed that the hypergeometric series does not
exist for non-positive integer values of bn because it would result in terms with
zero denominators.
The hypergeometric series arises in a variety of contexts; in particular, the
Gauss hypergeometric function (p = 2, q = 1) has some special properties. For
convenience of notation, let a = a_1, b = a_2, and c = b_1. When the argument
is unity, we have

    2F1(a, b; c; 1) = Γ(c) Γ(c − a − b) / [Γ(c − a) Γ(c − b)] .    (2.275)

Other special values include the following:

    log(1 + x) = x 2F1(1, 1; 2; −x) ,    (2.276)

    (1 − x)^(−a) = 2F1(a, b; b; x) for all b except non-positive integers,    (2.277)

    arctan(x) = x 2F1(1/2, 1; 3/2; −x^2) ,    (2.278)

    arcsin(x) = x 2F1(1/2, 1/2; 3/2; x^2) ,    (2.279)

    log(x + √(1 + x^2)) = x 2F1(1/2, 1/2; 3/2; −x^2) ,    (2.280)
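The special values (2.276)-(2.279) are easy to verify numerically by truncating
the series (2.273). The hyp2f1 helper below is a local sketch (not a library
routine), adequate for |x| < 1:

```python
import math

def hyp2f1(a, b, c, x, terms=400):
    # Truncated series (2.273) for p = 2, q = 1; adequate for |x| < 1
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x
        total += term
    return total

x = 0.5
assert abs(x * hyp2f1(1, 1, 2, -x) - math.log(1 + x)) < 1e-12        # (2.276)
assert abs(x * hyp2f1(0.5, 1, 1.5, -x * x) - math.atan(x)) < 1e-12   # (2.278)
assert abs(x * hyp2f1(0.5, 0.5, 1.5, x * x) - math.asin(x)) < 1e-12  # (2.279)
```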

and numerous other values that can be found in the literature. One particular
identity that is not widely available in the literature is the following, which
applies when p and q are integers with 0 < p < q:

    2F1(1, p/q; 1 + p/q; x) = (p/q) Lerch(x, 1, p/q)    (2.281)

                            = −(p/q) Σ_{k=0}^{q−1} (ζ_q^k x^(1/q))^(−p) log(1 − ζ_q^k x^(1/q)) ,    (2.282)

where ζ_q = e^(2πi/q) is the qth root of unity and the Lerch transcendent is
defined as

    Lerch(x, s, a) = Σ_{k=0}^∞ x^k/(a + k)^s ,    (2.283)

which has the following property:

    Lerch(x, s, a) = x Lerch(x, s, a + 1) + 1/(a^2)^(s/2) .    (2.284)

Using this equality, we find

    2F1(1, −p/q; 1 − p/q; x) = −(p/q) Lerch(x, 1, −p/q)    (2.285)

                             = −(p/q) [x Lerch(x, 1, (q − p)/q) + q/p]    (2.286)

                             = (p x/q) Σ_{k=0}^{q−1} (ζ_q^k x^(1/q))^(p−q) log(1 − ζ_q^k x^(1/q)) − 1 .    (2.287)

Euler’s hypergeometric transforms can be used to manipulate the parameters


of the Gauss hypergeometric function. The latter two of the following are useful
for numerically evaluating the hypergeometric series with negative arguments
which have alternating sign terms. The latter two transforms convert negative
2.14 Special functions 61

arguments x to non-negative arguments:

    2F1(a, b; c; x) = (1 − x)^(c−a−b) 2F1(c − a, c − b; c; x)    (2.288)

    2F1(a, b; c; x) = (1 − x)^(−b) 2F1(c − a, b; c; x/(x − 1))    (2.289)

    2F1(a, b; c; x) = (1 − x)^(−a) 2F1(a, c − b; c; x/(x − 1)) .    (2.290)
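As a sketch of why these transforms matter numerically, the check below applies
the transform (2.290) to a negative argument; hyp2f1 is a locally defined
truncated series, not a library routine:

```python
def hyp2f1(a, b, c, x, terms=600):
    # Truncated Gauss hypergeometric series, adequate for |x| < 1
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x
        total += term
    return total

a, b, c, x = 0.3, 1.2, 2.5, -0.9
# Equation (2.290) maps the negative argument x to x/(x - 1) = 9/19 >= 0,
# avoiding the alternating-sign series on the left-hand side.
lhs = hyp2f1(a, b, c, x)
rhs = (1 - x) ** (-a) * hyp2f1(a, c - b, c, x / (x - 1))
assert abs(lhs - rhs) < 1e-9
```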

2.14.3 Beta function


The beta function B(x, y) is defined by

    B(x, y) = ∫_0^1 du u^(x−1) (1 − u)^(y−1)

            = Γ(x) Γ(y)/Γ(x + y) .    (2.291)

Incomplete beta function


The incomplete beta function is defined by

    B(z; x, y) = ∫_0^z du u^(x−1) (1 − u)^(y−1)

               = (z^x/x) 2F1(x, 1 − y; x + 1; z) ,    (2.292)
where 2 F1 (a1 , a2 ; b; z) is the hypergeometric function defined in Section 2.14.2.

Regularized beta function


The regularized beta function is given by the ratio of the incomplete to the
complete beta function,

    I(z; x, y) = B(z; x, y)/B(x, y) .

2.14.4 Lambert W function


The Lambert W (or product logarithm) function, often denoted by W(z), has
multiple branches and cannot be expressed in terms of elementary functions.
It is defined implicitly as the functional inverse of f(w) = w e^w, so that

    z = W(z) e^(W(z)) .    (2.293)

For real arguments, the Lambert W function W(z) has only two real branches:
the principal branch W_0(z) and another branch that is simply denoted by
W_−1(z).
62 Notational and mathematical preliminaries

The principal branch has the following special values:

    W_0(−1/e) = −1
    W_0(0) = 0
    W_0(e) = 1 .

Additionally, the principal branch has a series representation that is valid
for |x| ≤ 1/e,

    W_0(x) = Σ_{k=1}^∞ (−1)^(k−1) [k^(k−2)/(k − 1)!] x^k .
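Since W(z) has no elementary closed form, it is typically computed iteratively.
The sketch below uses Newton's method on w e^w − z = 0 for the principal
branch (the iteration count and starting guess are illustrative choices):

```python
import math

def lambert_w0(z, iters=60):
    # Newton's method on g(w) = w e^w - z; valid here for z >= 0
    w = 0.0 if z < 1 else math.log(z)
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - z) / (ew * (w + 1))
    return w

# Defining relation (2.293) and the special values of the principal branch
for z in (0.25, 1.0, math.e, 10.0):
    w = lambert_w0(z)
    assert abs(w * math.exp(w) - z) < 1e-12
assert lambert_w0(0.0) == 0.0
assert abs(lambert_w0(math.e) - 1.0) < 1e-12
```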

2.14.5 Bessel functions


Bessel functions [129, 343, 53] are the solutions f_α(x) of the differential
equation

    x^2 (∂^2/∂x^2) f_α(x) + x (∂/∂x) f_α(x) + (x^2 − α^2) f_α(x) = 0 .    (2.294)
There are two “kinds” of solution to this equation. Solutions of the first kind
are denoted by Jα (x), and solutions of the second kind are denoted Yα (x) (and
sometimes Nα (x)). The parameter α indicates the “order” of the function. The
Bessel function of the second kind is defined in terms of the first kind by
Jα (x) cos(α π) − J−α (x)
Yα (x) = . (2.295)
sin(α π)
Bessel functions are often the result of integrals of exponentials of
trigonometric functions. The Bessel function and the confluent hypergeometric
function 0F1(·; ·) are related by

    J_α(x) = [(x/2)^α / Γ(α + 1)] 0F1(α + 1; −x^2/4) .    (2.296)
For integer values of the order m, the Bessel function can also be expressed by
using the contour integral form given by

    J_m(x) = (1/2πi) ∮_C dz z^(−m−1) e^((x/2)(z − 1/z)) ,    (2.297)

where the counterclockwise contour C encloses the origin [11].
Modified Bessel functions of the first and second kind are denoted by I_α(x)
and K_α(x), respectively. The modified Bessel function of the first kind is
proportional to the Bessel function with a transformation in the complex plane
of the form

    I_α(x) = (i)^(−α) J_α(i x) .    (2.298)
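For integer order, the series implied by Equation (2.296) gives a simple way to
evaluate J_m(x) and to check the relation (2.298) numerically; bessel_j below
is an illustrative helper, not a library routine:

```python
import math

def bessel_j(m, x, terms=60):
    # Series solution of (2.294) for integer order m >= 0; x may be complex
    return sum((-1) ** k * (x / 2) ** (2 * k + m)
               / (math.factorial(k) * math.factorial(k + m))
               for k in range(terms))

x = 1.3
# Standard recurrence J_{m-1}(x) + J_{m+1}(x) = (2m/x) J_m(x), with m = 1
assert abs(bessel_j(0, x) + bessel_j(2, x) - (2 / x) * bessel_j(1, x)) < 1e-12

# Relation (2.298) at alpha = 0: I_0(x) = J_0(i x)
i0 = sum((x / 2) ** (2 * k) / math.factorial(k) ** 2 for k in range(60))
assert abs(bessel_j(0, 1j * x).real - i0) < 1e-12
```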

For integer order, this form can be expressed in terms of the contour integral

    I_m(x) = (i)^(−m) (1/2πi) ∮_C dz z^(−m−1) e^((ix/2)(z − 1/z))

           = (1/2πi) ∮_C dz z^(−m−1) e^((x/2)(z + 1/z)) ,    (2.299)
where the contour C encloses the origin. The modified Bessel function of the
second kind is given by

    K_α(x) = (π/2) (i)^(α+1) [J_α(i x) + i Y_α(i x)] .    (2.300)

2.14.6 Error function


The error function, often denoted erf(·), is defined by

    erf(x_0) = (2/√π) ∫_0^{x_0} dx e^(−x^2) .    (2.301)

2.14.7 Gaussian Q-function


The Gaussian Q-function, often denoted Q(·), is defined by

    Q(x_0) = (1/√(2π)) ∫_{x_0}^∞ dx e^(−x^2/2) .    (2.302)
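In practice the Gaussian Q-function is computed from the complementary error
function via the standard relation Q(x) = (1/2) erfc(x/√2); a minimal sketch
using Python's standard library:

```python
import math

def q_func(x):
    # Standard relation Q(x) = (1/2) erfc(x / sqrt(2))
    return 0.5 * math.erfc(x / math.sqrt(2))

assert q_func(0.0) == 0.5                              # half the mass above the mean
assert abs(q_func(-1.2) - (1 - q_func(1.2))) < 1e-12   # symmetry Q(-x) = 1 - Q(x)
# Consistency with the error function of Equation (2.301)
assert abs(q_func(0.7) - 0.5 * (1 - math.erf(0.7 / math.sqrt(2)))) < 1e-12
```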

2.14.8 Marcum Q-function


The generalized Marcum Q-function [293, 280, 232, 255] is defined by

    Q_M(ν, μ) = (1/ν^(M−1)) ∫_μ^∞ dy y^M e^(−(y^2+ν^2)/2) I_{M−1}(ν y)

              = e^(−(μ^2+ν^2)/2) Σ_{m=1−M}^∞ (ν/μ)^m I_m(μ ν)

              = (1/2πi) e^(−(μ^2+ν^2)/2) ∮_S dz e^(μ^2/(2z) + ν^2 z/2) / [z^M (1 − z)] ,    (2.303)

where the contour S encloses the pole at zero but not the pole at z = 1. If M
is not specified, a value of 1 is assumed.

Problems

2.1 Evaluate the following expressions.
(a) ℜ{√(e^(−i π))}
(b) log_4(1 + i)
(c) ∫_{−∞}^{∞} dx δ(x − 1) cosh^2[π (x − 1)]/√(2 − x^2)
(d) ‖I_4 a‖
(e) Γ(2)

2.2 For complex vectors a and b, evaluate the following expressions.
(a) rank{a b†}
(b) [I − a(a† a)^(−1) a†] a
(c) [I − a(a† a)^(−1) a†] b(b† b)^(−1) b† a
(d) (I + a a†)^(−1) b if a† b = 0
(e) (I + a a†)^(−1) b if a† b = 1/2
(f) log_2 |I + a a†| if ‖a‖ = 1

2.3 For unit-norm complex vectors a and b, evaluate the following expressions.
(a) λ_m{I + a a† + b b†} if |a† b|^2 = 1/2
(b) tr{I + a a† + b b†}
(c) |I + a a† + b b†|

2.4 For matrices A ∈ C^(m×p) and B ∈ C^(m×q), show that

    |I + A A† + B B†| ≥ |I + A A†| .

2.5 Evaluate the following Wirtinger derivatives (where z* is interpreted as
the doppelganger variable for the conjugation of z).
(a) ∂/∂z* Σ_{m=0}^∞ m z^m
(b) ∂/∂z* z† z
(c) ∂/∂z† z† z
(d) ∂/∂z† [z† A z / z† B z]
(e) ∂/∂z* z† A

2.6 Evaluate the following integrals under the assumption that the closed
contour is a circle of radius 10 centered at the origin.
(a) ∮ dz 1/[(z − 1)^2 z]
(b) ∮ dz 1/[(z − 20)^2 z]
(c) ∮ dz (z − 2)(z − 3)/[(z − 1)^2 z]
(d) ∮ dz z/(z^2 − 1)
(e) ∮ dz e^z/(z − 1)

2.7 Evaluate the following integrals, where V indicates the entire volume
spanned by the variables of integration.
(a) For real variables x and y,
    ∫_{−∞}^∞ dx ∫_{−∞}^∞ dy (x^2 + y^2) e^(−x^2) e^(−y^2)
(b) For complex variable z,
    ∫_V dΩ_z |z|^2 e^(−|z|^2)
(c) For the complex n-vector z,
    ∫_V dΩ_z ‖z‖^2 e^(−‖z‖^2)
2.8 Evaluate the following Gauss hypergeometric function expressions in terms
of common functions.
(a) 2F1(1, 2; 4; 1)
(b) 2F1(1, 1; 2; −1)
(c) 2F1(1/2, 1/2; 3/2; −3)
(d) 2F1(−2, 1/2; 1/2; 1/2)

2.9 By using the calculus of variation, find the shortest distance between a
point on the zenith of a sphere (the north pole) and a point on the equator.
3 Probability and statistics

3.1 Probability

While it is often suppressed when confusion is unlikely, it is pedagogically useful


to differentiate between a random variable and a particular value that a random
variable takes. In this section, we will be explicit in our notation for random
variables or realizations of them. However, throughout the rest of the text, the
formalism will be employed only if confusion is likely. Imagine that a random
variable is denoted X and a value for that random variable is given by x. The
probability Pr{x ∈ S; a} of a continuous real random variable X having a value
x within some set of values S, given some parameter of the distribution a, is
given by the integral of the probability density function (PDF) p_X(x; a) over S,

    Pr{x ∈ S; a} = ∫_S dx p_X(x; a) .    (3.1)

Depending upon the situation, the explicit dependency upon parameters may
be suppressed. The cumulative distribution function (CDF) is the probability
P_X(x_0) that some random variable X is less than or equal to some threshold x_0,

    P_X(x_0) = Pr{x ≤ x_0}

             = ∫_{−∞}^{x_0} dx p_X(x; a) .    (3.2)

3.1.1 Bayes’ theorem


There are a variety of probability densities including: prior probability den-
sity,1 marginal probability density, posterior probability density,2 and conditional
probability density, which are denoted here as

pX (x) : prior probability density


pY (y) : marginal probability density
pX (x|y) : posterior probability density
pY (y|x) : conditional probability density. (3.3)
1 The Latin form a priori is often used to denote this probability.
2 The Latin form a posteriori is often used to denote this probability.

For single variables, and p_Y(y) > 0, this relationship (Bayes' theorem) can be
written in the important form

    p_X(x|y) = p_Y(y|x) p_X(x) / p_Y(y) .    (3.4)
A useful interpretation of this form is to consider the random variable X as the
input to a random process that produces the random variable Y that can be
observed. Thus, the likelihood of a given value x for the random input variable
X is found given the observation y of the output distribution Y . Throughout
statistical signal processing research, a common source of debate is the use of
implicit and sometimes unstated priors in analysis. These priors can dramatically
affect the performance of various algorithms when exercised by using measured
data that often have contributions that do not match simple models.
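A minimal discrete illustration of Equation (3.4): a binary input observed
through a noisy channel (the prior 0.7 and crossover probability 0.1 are
illustrative values only):

```python
# Toy illustration of Bayes' theorem (3.4)
p_x = {0: 0.7, 1: 0.3}                       # prior p_X(x)
p_y_given_x = {(0, 0): 0.9, (1, 0): 0.1,     # conditional p_Y(y|x)
               (0, 1): 0.1, (1, 1): 0.9}     # keys are (y, x)

y = 1                                                  # observed output
p_y = sum(p_y_given_x[(y, x)] * p_x[x] for x in p_x)   # marginal p_Y(y)
posterior = {x: p_y_given_x[(y, x)] * p_x[x] / p_y for x in p_x}

assert abs(sum(posterior.values()) - 1.0) < 1e-12
assert posterior[1] > p_x[1]     # observing y = 1 shifts belief toward x = 1
```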

3.1.2 Change of variables


Consider a random variable X with probability density p_X(x), and a new random
variable Y = f(X). Assuming that the function f(x) is one-to-one and
differentiable, we can find the probability density p_Y(y) of Y using the
following transformation:

    p_Y(y) = p_X(f^(−1)(y)) / |(∂/∂x) f(x)|_{x=f^(−1)(y)} ,    (3.5)

where x = f −1 (y) indicates the inverse function of f (x), and the notation ·|x= x 0
indicates evaluating the expression to the left with the value x0 . However, it
is not uncommon for the inverse to have multiple solutions. If the jth solution
to the inverse at some value y is given by x = f_j^(−1)(y), then the
transformation of densities is given by

    p_Y(y) = Σ_j p_X(f_j^(−1)(y)) / |(∂/∂x) f(x)|_{x=f_j^(−1)(y)} .    (3.6)

Consider a multivariate distribution involving M random variables which is


discussed in greater detail in Section 3.1.7. Define the random vectors X ∈
RM ×1 whose realizations are denoted by x and Y ∈ RM ×1 whose realizations
are denoted by y. The probability density functions of these random vectors are
given by pX (x) and pY (y), respectively. Let the vector function f (x) map x to
y such that y = f (x). If at the point y, there are multiple solutions for x, then
the functional inverse fj−1 (y) is the jth solution. The relationship between the
probability density functions of X and Y is then given by

    p_Y(y) = Σ_j p_X(f_j^(−1)(y)) / |∂f(x)/∂x|_{x=f_j^(−1)(y)} ,    (3.7)

where |∂f(x)/∂x| is the absolute value of the Jacobian determinant associated
with the two random vectors, and the notation |·|_{x=x_0} indicates that the
quantity within the bars is evaluated at x = x_0.
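As a sketch of the multi-branch rule (3.6), take X standard normal and Y = X^2,
which has the two inverse branches x = ±√y; the result is the χ² density with
one degree of freedom encountered later in this chapter:

```python
import math

def p_x(x):
    # Standard normal density for X
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def p_y(y):
    # Y = X^2 has inverse branches x = +sqrt(y) and x = -sqrt(y),
    # each with |df/dx| = 2|x|, so Equation (3.6) sums two terms
    r = math.sqrt(y)
    return (p_x(r) + p_x(-r)) / (2 * r)

# The result is the chi-squared density with one degree of freedom
for y in (0.2, 1.0, 3.7):
    chi2_1 = y ** (-0.5) * math.exp(-y / 2) / (math.sqrt(2) * math.gamma(0.5))
    assert abs(p_y(y) - chi2_1) < 1e-14
```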

3.1.3 Central moments of a distribution


The characteristics of the distribution of random variables are often represented
by the various moments about the mean of the distribution. The expectation of
some function f (x) of the random variable X with probability density function
p_X(x) is indicated by

    ⟨f(X)⟩ = ∫ dx f(x) p_X(x) .    (3.8)

The mean value of the random variable X is given by

    ⟨X⟩ = ∫ dx x p_X(x) .    (3.9)

The mth central³ moment about the mean, indicated here by μ_m, is given by

    μ_m = ∫ dx (x − ⟨X⟩)^m p_X(x) .    (3.10)

By construction, the value of μ_1 is zero. The central moments denoted here as
μ_2, μ_3, and μ_4 are related to the variance, skewness, and excess kurtosis
of a distribution.
The variance of random variable X is given by
The variance of random variable X is given by

    σ_X^2 = ∫ dx (x − ⟨X⟩)^2 p_X(x)

          = μ_2 .    (3.11)

Note that in situations where the random variable of concern is clear, we shall
omit the subscript, denoting the variance simply as σ^2.
The skewness of random variable X is an indication of the asymmetry of a
distribution about its mean. It is given by the third central moment normalized
by the variance to the 3/2 power; thus, it is unitless,

    skew{X} = ∫ dx (x − ⟨X⟩)^3 p_X(x) / (σ^2)^(3/2)

            = μ_3 / (σ^2)^(3/2) .    (3.12)
Finally, the kurtosis of random variable X is a measure of a distribution's
"peakiness." It is given by the fourth central moment normalized by the
variance squared; thus, it is unitless. The excess kurtosis is the ratio of the
fourth cumulant to the square of the second cumulant; cumulants are discussed
in Section 3.1.6. The excess kurtosis is given by subtracting 3 from the
kurtosis,

    excess kurtosis{X} = ∫ dx (x − ⟨X⟩)^4 p_X(x) / (σ^2)^2 − 3

                       = μ_4 / (σ^2)^2 − 3 ,    (3.13)

which has the desirable characteristic of evaluating to zero for Gaussian
distributions. Unfortunately, there is sometimes confusion in the literature as
to whether "kurtosis" indicates kurtosis or excess kurtosis.

3 Central indicates that it is the fluctuation about the mean that is being evaluated.

Jensen’s inequality
Jensen’s inequality can be used to relate the mean of a function of a random
variable to the function of the mean of the random variable. Specifically, Jensen’s
inequality states that for every convex function f (·) of a random variable X,

    ⟨f(X)⟩ ≥ f(⟨X⟩) .    (3.14)

Similarly, for every concave function g(·) of a random variable X,

    ⟨g(X)⟩ ≤ g(⟨X⟩) .    (3.15)
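A quick numerical illustration of (3.14) and (3.15) on an arbitrary positive
sample (the sample values are illustrative):

```python
import math

# f(x) = x^2 is convex and g(x) = log(x) is concave; check (3.14) and (3.15)
samples = [0.5, 1.0, 2.0, 4.0, 8.0]          # illustrative positive sample
mean = sum(samples) / len(samples)

mean_f = sum(x * x for x in samples) / len(samples)
assert mean_f >= mean ** 2                    # <f(X)> >= f(<X>)

mean_g = sum(math.log(x) for x in samples) / len(samples)
assert mean_g <= math.log(mean)               # <g(X)> <= g(<X>)
```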

3.1.4 Noncentral moments of a distribution


The noncentral moments of a distribution for the random variable X are similar
to the central moments with the exception that the mean is not removed from
the expectation. The mth noncentral moment, indicated here by μ_m, is given by

    μ_m = ⟨X^m⟩

        = ∫ dx x^m p_X(x) .    (3.16)

A tool that is sometimes useful in working with problems involving moments


is the moment-generating function M (t; X) for random variable X and dummy
variable t, which is given by

    M(t; X) = ∫ dx p_X(x) e^(t x)

            = ⟨e^(t X)⟩

            = ⟨Σ_{m=0}^∞ (t X)^m / m!⟩ .    (3.17)

The mth moment is found by noting that the mth derivative with respect to t,
evaluated at t = 0, leaves only the mth term in a Taylor expansion of the
exponential,

    (∂^m/∂t^m) M(t; X)|_{t=0} = ⟨X^m⟩ .    (3.18)
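A sketch of Equation (3.18) using the exponential distribution of Section
3.1.10, whose moment-generating function is M(t) = λ/(λ − t) for t < λ (a
standard result, assumed here rather than derived); the derivatives at t = 0
are approximated by finite differences, with an illustrative step size:

```python
import math

lam = 2.0                                    # illustrative rate parameter
M = lambda t: lam / (lam - t)                # MGF of the exponential variable

h = 1e-4                                            # finite-difference step
first_moment = (M(h) - M(-h)) / (2 * h)             # approximates dM/dt at 0
second_moment = (M(h) - 2 * M(0) + M(-h)) / h ** 2  # approximates d^2M/dt^2

assert abs(first_moment - 1 / lam) < 1e-6           # <X>   = 1/lam
assert abs(second_moment - 2 / lam ** 2) < 1e-5     # <X^2> = 2/lam^2
```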

3.1.5 Characteristic function


For a variety of applications, it is useful to consider transforms of the proba-
bility density. The characteristic function is proportional to the inverse Fourier
transform in terms of angular frequency. The integral transform of some density
pX (x) of real random variable X is denoted by the characteristic function φ(s)
and is given by

    φ_X(s) = ∫ dx e^(i x s) p_X(x) ,    (3.19)

for which s is the transformed variable for x that corresponds to the angular
frequency. Note that the characteristic function of a random variable is essen-
tially the Fourier transform (see Section 2.10) of its probability density function.
The moment-generating function, on the other hand, is essentially the Laplace
transform (see Section 2.11) of the PDF evaluated at real values.

3.1.6 Cumulants of distributions


The concepts of cumulants and moments are closely related. Estimating the cu-
mulants of observed signals can be useful in disentangling or detecting signals
[200]. The mth cumulant k_m of a probability distribution is implicitly defined
in terms of the characteristic function of the random variable X by

    log φ_X(s) = Σ_{m=1}^∞ k_m (i s)^m / m! .    (3.20)

In terms of the central moments μ_m, the first few cumulants are given by

    k_1 = μ
    k_2 = μ_2
    k_3 = μ_3
    k_4 = μ_4 − 3 μ_2^2
    k_5 = μ_5 − 10 μ_2 μ_3
    k_6 = μ_6 − 15 μ_2 μ_4 − 10 μ_3^2 + 30 μ_2^3 ,    (3.21)

where μ denotes the mean.

3.1.7 Multivariate probability distributions


The probability density function of multiple random variables indicates the prob-
ability that values of the random variables are within some infinitesimal hyper-
volume about some point in the variable space. The probability density is denoted

pX 1 ,X 2 ,... (x1 , x2 , . . .) . (3.22)



If the random variables are independent, then the joint probability density func-
tion is equal to the product of the individual probability densities,

    p_{X_1,X_2,...}(x_1, x_2, . . .) = ∏_m p_{X_m}(x_m) .    (3.23)

Given some parameter A, the probability of a complex random matrix variable
X having a value X that is contained within a space defined by S is given by

    P_X(X ∈ S; A) = ∫_S dΩ_X p_X(X; A)

                  = ∫_S dΩ_X p_X((X)_{1,1}, (X)_{1,2}, . . . , (X)_{2,1}, . . . ; A) ,    (3.24)

where dΩ_X, discussed in Section 2.9.2, is the notation for the measure and is
given by

    dΩ_X = dℜ{X}_{1,1} dℜ{X}_{1,2} · · · dℜ{X}_{m,n}
           · dℑ{X}_{1,1} dℑ{X}_{1,2} · · · dℑ{X}_{m,n} .    (3.25)

Note that the measure is expressed in terms of the real and imaginary
components of the complex random variable. This convention is not employed
universally, but will typically be assumed within this text. In the case of a
real random variable, the imaginary differentials are dropped.
The probability density function of a given set of random variables xm given
or conditioned on particular values for another set of variables yn is denoted by

pX 1 ,X 2 ,... (x1 , x2 , . . . |y1 , y2 . . .) . (3.26)

Bayes’ theorem relates the conditional and prior probability densities,

    p_{X_1,X_2,...,Y_1,Y_2,...}(x_1, x_2, . . . , y_1, y_2, . . .)
        = p_{X_1,X_2,...}(x_1, x_2, . . . |y_1, y_2, . . .) p_{Y_1,Y_2,...}(y_1, y_2, . . .)
        = p_{Y_1,Y_2,...}(y_1, y_2, . . . |x_1, x_2, . . .) p_{X_1,X_2,...}(x_1, x_2, . . .) .    (3.27)

3.1.8 Gaussian distribution


The Gaussian distribution is an essential distribution in signal processing. Be-
cause of the central limit theorem, processes that combine the effects of many
distributions often converge to the Gaussian distribution. Also, for a given mean
and variance, the entropy (which is used in the evaluation of mutual information)
is maximized for Gaussian distributed signals. The analysis of multiple-antenna
systems will often take advantage of multivariate Gaussian distributions. The
probability density function for a real Gaussian random variable X with value
x, mean μ = ⟨X⟩, and variance σ^2 = ⟨(X − μ)^2⟩ is given by

    p_X(x; μ, σ) dx = [1/√(2πσ^2)] e^(−(x−μ)^2/(2σ^2)) dx .    (3.28)

This normal distribution is often identified by N (μ, σ 2 ). The complex normal


(or Gaussian) distribution assuming circular symmetry for a complex random
variable Z with value z, mean μ, and variance σ^2 is given by

    p_Z(z; μ, σ) dℜz dℑz = (1/πσ^2) e^(−|z−μ|^2/σ^2) dℜz dℑz .    (3.29)

The complex version of the distribution is often denoted CN (μ, σ 2 ).


The probability density for an m by k random matrix Z with value Z ∈ Cm ×k
drawn from a multivariate complex Gaussian is given by

    p_Z(Z; X, R) dΩ_Z = [1/(|R|^k π^(mk))] e^(−tr{(Z−X)† R^(−1) (Z−X)}) dΩ_Z ,    (3.30)

where the mean of Z is given by X ∈ Cm ×k . The covariance of the rows of


the random matrix is R ∈ Cm ×m under the assumption of independent columns
(note that it is possible to define a more general form of Gaussian random matrix
with dependent columns). The notation dΩZ indicates the differential hypervol-
ume in terms of the real parameters {Z}p,q and {Z}p,q , indicating the real and
imaginary part of the elements of Z, where p and q are here indices identifying
elements in the matrix.
The covariance matrix is an important concept used repeatedly throughout
adaptive communications. For some complex random matrix Z with values Z
and mean X, the covariance matrix is given by
    R = ⟨(Z − ⟨Z⟩)(Z − ⟨Z⟩)†⟩ / k

      = ∫ dΩ_Z p_Z(Z; X) (Z − X)(Z − X)† / k .    (3.31)

The covariance is Hermitian, R = R† . In this form, the covariance matrix is a


measure of the cross correlation between the elements along the columns of Z.

3.1.9 Rayleigh distribution


The magnitude of a random variable drawn from a complex, circularly symmetric
Gaussian distribution with variance σ 2 follows the Rayleigh distribution. Here we
will identify this random variable as Q with value q. Suppose that the Gaussian
variable is denoted by Z with value z, and has variance σ 2 . If its magnitude is
denoted by

    q = |z| ,    (3.32)

then the probability density⁴ for the real Rayleigh variable Q is given by

    p_Ray(q) dq = { (2q/σ^2) e^(−q^2/σ^2) dq ;  q ≥ 0    (3.33)
                  { 0 dq ;                      otherwise,

where σ 2 is the variance of the complex Gaussian variable z. The cumulative


distribution function P_Ray(q) for the Rayleigh variable q is given by

    P_Ray(q) = ∫_0^q dx p_Ray(x)

             = { 1 − e^(−q^2/σ^2) ;  q ≥ 0    (3.34)
               { 0 ;                 otherwise.

3.1.10 Exponential distribution


The square of a Rayleigh distributed random variable follows the exponential
distribution. This makes the exponential distribution useful in analyzing the
received power of a narrowband transmission through a Rayleigh fading channel,
because the amplitude of the channel coefficient in that case follows a
Rayleigh distribution. Another example application is the interarrival times
of a one-dimensional Poisson point process (see Section 3.4). The exponential
random variable is parameterized by the inverse of its mean, λ.
The probability density function of the exponential random variable is

    p_Exp(x) dx = { λ e^(−λ x) dx ;  x ≥ 0    (3.35)
                  { 0 dx ;           x < 0.

The cumulative distribution function of the exponential random variable is

    P_Exp(x) = { 1 − e^(−λ x) ;  x ≥ 0    (3.36)
               { 0 ;             x < 0.

Its mean and variance are 1/λ and 1/λ^2, respectively.

3.1.11 Central χ2 distribution


The sum q of the magnitude squared of real, zero-mean, unit-variance Gaussian
variables Xm with value xm is characterized by the χ2 distribution. The sum of
k independent Gaussian variables denoted by q is

    q = Σ_{m=1}^k x_m^2 .    (3.37)

4 This density assumes that the complex variance is given by σ 2 , which is different from a
common assumption that the variance is given for a real variable. Consequently, there are
some subtle scaling differences.

Since the random variables X_m have unit variance and zero mean,

    ⟨X_m^2⟩ = 1 .    (3.38)

The distribution p_χ²(q) for the sum of the magnitude square q of k independent,
zero-mean, unit-variance real Gaussian random variables is given by

    p_χ²(q; k) dq = { [1/(2^(k/2) Γ(k/2))] q^(k/2−1) e^(−q/2) dq ;  q ≥ 0    (3.39)
                    { 0 dq ;                                        otherwise.

The cumulative distribution function P_χ²(q; k) of the χ² random variable is
given by

    P_χ²(q; k) = ∫_0^q dr p_χ²(r; k)

               = { [1/Γ(k/2)] γ(k/2, q/2) ;  q ≥ 0    (3.40)
                 { 0 ;                       otherwise,

where Γ(·) is the standard gamma function, and γ(·, ·) is the lower incomplete
gamma function given by Equation (2.271).
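The CDF (3.40) can be evaluated via a standard series for the lower incomplete
gamma function; for k = 2 it collapses to 1 − e^(−q/2), which the sketch below
verifies (chi2_cdf is a local helper, not a library routine):

```python
import math

def chi2_cdf(q, k, terms=300):
    # P(q; k) = gamma(k/2, q/2) / Gamma(k/2), with the lower incomplete gamma
    # evaluated by the series gamma(s, x) = x^s e^{-x} sum_j x^j/(s(s+1)...(s+j))
    s, x = k / 2, q / 2
    term = total = 1.0 / s
    for j in range(1, terms):
        term *= x / (s + j)
        total += term
    return x ** s * math.exp(-x) * total / math.gamma(s)

# For k = 2 degrees of freedom the CDF collapses to 1 - e^{-q/2}
for q in (0.5, 2.0, 6.0):
    assert abs(chi2_cdf(q, 2) - (1 - math.exp(-q / 2))) < 1e-12
```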

Complex χ² distribution
With a slight abuse of terminology, we define the complex χ² distribution as
the distribution of the sum q of the magnitude squared of n independent
complex, circularly symmetric Gaussian random variables Z_m with values z_m.
The sum q is given by [173]

    q = Σ_{m=1}^n |z_m|^2 ,    (3.41)

    ⟨|Z_m|^2⟩ = σ^2 .    (3.42)

To be clear, the variance detailed here is in terms of the complex Gaussian
variable, and we include the variance explicitly as a parameter of the
distribution. By employing Equation (3.5) and noting that the number of real
degrees of freedom is twice the number of complex degrees of freedom (k = 2n),
the distribution p_χ²^C(q; n, σ^2) for the sum of the magnitude squared q ≥ 0
is given by

    p_χ²^C(q; n, σ^2) dq = [1/(σ^2/2)] p_χ²(q/(σ^2/2); 2n) dq

                         = (2/σ^2) [1/(2^n Γ(n))] (2q/σ^2)^(n−1) e^(−q/σ^2) dq

                         = [q^(n−1)/((σ^2)^n Γ(n))] e^(−q/σ^2) dq ,    (3.43)

where it is assumed that the variance σ^2 of z_m is the same for all m, and the
density is zero for q < 0. The cumulative distribution for q is given by

    P_χ²^C(q; n, σ^2) = ∫_0^q dr p_χ²^C(r; n, σ^2)

                      = [1/Γ(n)] γ(n, q/σ^2) .    (3.44)

3.1.12 Noncentral χ2 distribution


The sum of the squares of k nonzero-mean, unit-variance Gaussian random
variables follows a noncentral χ² distribution. Assume that the variable q is
drawn from the noncentral χ² distribution. Here, the set of k random variables
is defined such that the mth random variable X_m with value x_m has mean μ_m
(not to be confused with the notation used for the central moments) and is
drawn from real unit-variance Gaussian distributions (plural since they have
different means),

    X_m ∼ N(μ_m, 1) .    (3.45)

The random variable q is given by the sum of the magnitude squared of the real
independent Gaussian variables,

    q = Σ_{m=1}^k x_m^2 ,    (3.46)

    μ_m = ⟨X_m⟩ ,    (3.47)

    σ^2 = ⟨|X_m − μ_m|^2⟩ = 1 ,    (3.48)

where here μ_m indicates the mth mean (and not the moment). The probability
density for q ≥ 0 is given by [174]

    p_χ²(q; k, ν) = (1/2) (q/ν)^(k/4−1/2) e^(−(q+ν)/2) I_{k/2−1}(√(ν q)) ,    (3.49)

where I_m(·) indicates the mth order modified Bessel function of the first
kind (discussed in Section 2.14.5), and the density is zero for q < 0. The
noncentrality parameter ν is given by the sum of the squares of the
standard-deviation-normalized means,

    ν = Σ_{m=1}^k μ_m^2 .    (3.50)

The cumulative distribution function for q ≥ 0 is given by [232, 173]

    P_χ²(q; k, ν) = ∫_0^q dr p_χ²(r; k, ν)

                  = 1 − Q_{k/2}(√ν, √q)

                  = e^(−ν/2) Σ_{m=0}^∞ [(ν/2)^m/m!] γ(m + k/2, q/2)/Γ(m + k/2) ,    (3.51)

where QM (·, ·) is the Marcum Q-function discussed in Section 2.14.8, and γ(·, ·)
is the lower incomplete gamma function.

Complex noncentral χ² distribution
If n random variables are drawn from a circularly symmetric complex Gaussian
distribution with complex means μ_m and variance σ^2, then the complex
noncentral χ² distribution can be found by noting that n complex degrees of
freedom correspond to 2n real degrees of freedom, and the complex
noncentrality parameter is given by

    ν^C = Σ_{m=1}^n |μ_m|^2 .    (3.52)

In converting from the real to the complex distribution, a factor of 2 occurs
in multiple changes of variables. In addition to the change in the number of
degrees of freedom, the real and imaginary variances are half the complex
variance, σ_re^2 = σ_im^2 = σ^2/2, where we include the variance σ^2 as a
parameter of the distribution. Consequently, k = 2n, ν = 2ν^C/σ^2, and
q → 2q/σ^2; thus, the complex noncentral χ² distribution for q ≥ 0 is given by

    p_χ²^C(q; n, σ^2, ν^C) dq = (1/σ^2) e^(−(q+ν^C)/σ^2) (q/ν^C)^((n−1)/2) I_{n−1}(2√(ν^C q)/σ^2) dq ,    (3.53)

and is zero for q < 0. The cumulative distribution function for q ≥ 0 is given
by

    P_χ²^C(q; n, σ^2, ν^C) = ∫_0^q dr p_χ²^C(r; n, σ^2, ν^C)

                           = 1 − Q_n(√(2ν^C/σ^2), √(2q/σ^2)) .    (3.54)

3.1.13 F distribution

The F distribution is the probability distribution of the ratio of two
independent central χ² distributed random variables. It is parameterized by
two parameters n_1 and n_2 and has the density function

    p_F(x; n_1, n_2) = { n_1^(n_1/2) n_2^(n_2/2) x^(n_1/2−1) / [B(n_1/2, n_2/2) (n_1 x + n_2)^((n_1+n_2)/2)] ;  x ≥ 0    (3.55)
                       { 0 ;  otherwise.

B(·, ·) here refers to the beta function defined in Section 2.14.3.
The cumulative distribution function of the F-distributed random variable is
given by

    P_F(x; n_1, n_2) = I(n_1 x/(n_2 + n_1 x); n_1/2, n_2/2) ,    (3.56)

where I(·; ·, ·) is the regularized beta function defined in Section 2.14.3.

When viewed as the ratio of two χ² random variables normalized by their
degrees of freedom, the parameters n_1 and n_2 are the degrees of freedom of
the numerator and denominator random variables. In other words, if the random
variables X_1 and X_2 follow χ² distributions with n_1 and n_2 degrees of
freedom, respectively, then the random variable

    Y = (X_1/n_1)/(X_2/n_2) = X_1 n_2 / (X_2 n_1)    (3.57)

follows an F distribution with parameters n_1 and n_2.

3.1.14 Rician distribution


The magnitude of the sum of a real scalar a and a random complex variable z
sampled from a circularly symmetric complex Gaussian distribution with zero
mean and variance σ^2 is given by the random variable Y with value y,

    y = |a + z|

      = √((a + ℜ{z})^2 + ℑ{z}^2) .    (3.58)

The random variable y follows the Rice or Rician distribution, whose
probability density function p_Rice(y) is

    p_Rice(y) dy = { (2y/σ^2) I_0(2 a y/σ^2) e^(−(y^2+a^2)/σ^2) dy ;  y ≥ 0    (3.59)
                   { 0 dy ;                                           otherwise,

where I0 (·) is the zeroth order modified Bessel function of the first kind (discussed
in Section 2.14.5). In channel phenomenology, it is common to describe this
distribution in terms of the Rician K-factor, which is the ratio of the coherent
to the fluctuation power,

    K = a^2/σ^2 .    (3.60)
It may be worth noting that the Rician distribution is often described in terms
of two real Gaussian variables. Consequently, the distribution given here differs
from the common form by replacing σ 2 with σ 2 /2.
The cumulative distribution function for a Rician variable with value greater
than zero is given by

    P_Rice(y_0) = ∫_0^{y_0} dy p_Y(y)

                = ∫_0^{y_0} dy (2y/σ^2) I_0(2 a y/σ^2) e^(−(y^2+a^2)/σ^2)

                = 1 − Q_{M=1}(a√2/σ, y_0√2/σ) ,    (3.61)

where QM (ν, μ) is the Marcum Q-function discussed in Section 2.14.8. The dis-
tribution for the square of a Rician random variable q = y 2 is the complex
noncentral χ2 distribution with one complex degree of freedom.

3.1.15 Nakagami distribution


The Nakagami distribution was developed to fit experimental data for wireless
propagation channels that are not well modeled by either the Rayleigh or Rician
distributions. For random variable X with value x, a Nakagami distribution
p_Nak(x; m, ω) is parameterized by two variables and takes the following form:

    p_Nak(x; m, ω) dx = [2 m^m/(Γ(m) ω^m)] x^(2m−1) e^(−(m/ω) x^2) dx ,    (3.62)
where ω is the second moment of the random variable and m is a parameter
known in the communications literature simply as the “m-parameter.”
The mean and variance are given by

    ⟨X⟩ = [Γ(m + 1/2)/Γ(m)] (ω/m)^(1/2)    (3.63)

and

    var[X] = ω − (ω/m) [Γ(m + 1/2)/Γ(m)]^2 .    (3.64)

Observe that for m = 1 the Nakagami distribution reduces to the Rayleigh
distribution:

    p_Nak(x; 1, ω) = (2/ω) x e^(−x^2/ω) .    (3.65)

With m = (K + 1)^2/(2K + 1), the Nakagami distribution is close to a Rician
distribution. Hence, the Nakagami distribution can be used to model a wider
range of channels than the Rayleigh and Rician channels alone.
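The reduction (3.65) can be confirmed numerically against the Rayleigh density
(3.33) with ω = σ² (the helper names and test points are illustrative):

```python
import math

def p_nak(x, m, w):
    # Nakagami density, Equation (3.62)
    return 2 * m ** m / (math.gamma(m) * w ** m) * x ** (2 * m - 1) \
        * math.exp(-m * x * x / w)

def p_ray(q, sigma2):
    # Rayleigh density, Equation (3.33)
    return (2 * q / sigma2) * math.exp(-q * q / sigma2)

# With m = 1 and omega = sigma^2, the two densities coincide, per (3.65)
sigma2 = 1.7
for x in (0.1, 0.8, 2.3):
    assert abs(p_nak(x, 1, sigma2) - p_ray(x, sigma2)) < 1e-14
```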

3.1.16 Poisson distribution


The Poisson distribution is a discrete probability distribution that is useful
for modeling independent events that occur in some interval (or volume in
general). The probability of n events is characterized by a rate μ and has the
following probability mass function (PMF)

    p_n = μ^n e^(−μ)/n! .    (3.66)

The cumulative distribution function of the Poisson distribution is

    P_n = e^(−μ) Σ_{k=0}^n μ^k/k! ,    (3.67)

and its mean and variance are μ.



The Poisson distribution is useful in calculating the number of arrivals of a
point process in a given interval of length t (or volume in general), where
μ/t is the rate of arrivals. Suppose that the interarrival times of buses at a
bus stop are completely independent and the rate of arrivals is λ. Then the
PMF of the number of buses arriving in an interval of duration τ is

    p_n = (λτ)^n e^(−λτ)/n! .    (3.68)
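A quick numerical check of the PMF (3.66): it normalizes to one and has mean
and variance equal to μ (the rate value 3.2 is illustrative):

```python
import math

def poisson_pmf(n, mu):
    # Equation (3.66)
    return mu ** n * math.exp(-mu) / math.factorial(n)

mu = 3.2                                     # illustrative rate
probs = [poisson_pmf(n, mu) for n in range(100)]

assert abs(sum(probs) - 1.0) < 1e-12                               # normalization
assert abs(sum(n * p for n, p in enumerate(probs)) - mu) < 1e-12   # mean = mu
var = sum((n - mu) ** 2 * p for n, p in enumerate(probs))
assert abs(var - mu) < 1e-12                                       # variance = mu
```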

3.1.17 Beta distribution


The beta distribution can be used to describe the fraction of the vector norm
squared retained when a complex random Gaussian vector of size k + j is
projected into a subspace of size k, as discussed in Reference [173]. The
random variable X with value x is described by the ratio of sums of the
magnitude squared of a set of identically distributed, circularly symmetric
complex Gaussian random variables,

    x = Σ_{m=1}^k |g_m|^2 / (Σ_{m=1}^k |g_m|^2 + Σ_{m=1}^j |g_m′|^2) ,    (3.69)

where Gm is a real Gaussian random variable with the same statistics as Gm

(with values gm and gm , respectively). The probability density of the beta dis-
tribution pβ (x; j, k) is given by
Γ(j + k) j −1
pβ (x; j, k) dx = x (1 − x)k −1 dx , (3.70)
Γ(j) Γ(k)
and the corresponding CDF Pβ (x0 ; j, k) is given by
 x0
Pβ (x0 ; j, k) = dx fβ (x; j, k)
0
Γ(j + k)
= B(x0 ; j, k) , (3.71)
Γ(j) Γ(k)
where B(x; j, k) is the incomplete beta function that is discussed in Section
2.14.3.
Note that while the beta distribution can be used to describe the retained
fraction of the norm square of a (k + j)-dimensional Gaussian random vector
projected onto a k-dimensional space, the beta distribution is more general than
that. As such, the parameters j and k could be non-integers as well.
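As a numerical sanity check on the density (3.70), including non-integer parameters, the sketch below (helper name is ours) integrates pβ(x; j, k) by the midpoint rule and verifies that it integrates to one and has mean j/(j + k), a standard property of this density.

```python
import math

def beta_pdf(x, j, k):
    # Equation (3.70): Gamma(j+k)/(Gamma(j) Gamma(k)) * x^(j-1) (1-x)^(k-1)
    c = math.gamma(j + k) / (math.gamma(j) * math.gamma(k))
    return c * x ** (j - 1) * (1 - x) ** (k - 1)

j, k = 2.5, 3.5   # non-integer parameters are allowed
N = 100_000
xs = [(i + 0.5) / N for i in range(N)]              # midpoint rule on (0, 1)
total = sum(beta_pdf(x, j, k) for x in xs) / N      # ~ 1
mean = sum(x * beta_pdf(x, j, k) for x in xs) / N   # ~ j / (j + k)
```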

3.1.18 Logarithmically normal distribution


One of the standard issues in employing Gaussian distributions to represent
various types of phenomenology is that many real-life distributions have long
tails (that is the probability of large deviations is greater than the Gaussian
distribution would suggest). One distribution with much longer tails is the loga-
rithmically normal distribution (or more commonly log-normal distribution). For

the log-normal random variable X with value x, the probability density function
is given by
\[
p_{\log \mathrm{Norm}}(x; \mu, \sigma^2)\, dx = \frac{1}{x \sqrt{2\pi \sigma^2}}\, e^{-\frac{(\log x - \mu)^2}{2 \sigma^2}}\, dx . \tag{3.72}
\]
The cumulative distribution function is given by
\[
P_{\log \mathrm{Norm}}(x_0; \mu, \sigma^2) = \frac{1}{2} \left[ 1 + \operatorname{erf}\!\left( \frac{\log x_0 - \mu}{\sigma \sqrt{2}} \right) \right] , \tag{3.73}
\]
where erf(·) is the error function discussed in Section 2.14.6.
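Equations (3.72) and (3.73) can be cross-checked against each other: the numerical derivative of the CDF should recover the density. A minimal sketch (function names are ours), using the standard-library `math.erf`:

```python
import math

def lognorm_pdf(x, mu, sigma2):
    # Equation (3.72)
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma2)) \
        / (x * math.sqrt(2 * math.pi * sigma2))

def lognorm_cdf(x, mu, sigma2):
    # Equation (3.73), with erf from the standard library
    return 0.5 * (1 + math.erf((math.log(x) - mu) / math.sqrt(2 * sigma2)))

mu, sigma2 = 0.3, 0.5
x, h = 2.0, 1e-5
# Central-difference derivative of the CDF should match the PDF
numeric_pdf = (lognorm_cdf(x + h, mu, sigma2) - lognorm_cdf(x - h, mu, sigma2)) / (2 * h)
tail = lognorm_cdf(1e6, mu, sigma2)   # should approach 1
```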

3.1.19 Sum of random variables


For a variety of applications, the distribution of the sum of random variables is
desired. The distribution of the sum of independent random variables is given
by the convolution of the distributions of the individual random variables. To
show this, consider two independent random variables X and Y , with x drawn
from pX (x) and y drawn from pY (y), respectively. The sum of the random
variables is Z, such that z = x + y, and the value z is drawn from pZ (z):
\[
p_Z(z) = \int dx \int dy\; p_{X,Y}(x, y)\, p_Z(z|x, y)
= \int dx \int dy\; p_X(x)\, p_Y(y)\, p_Z(z|x, y) , \tag{3.74}
\]

where pX ,Y (x, y) is the joint probability density for x and y, pZ (z|x, y) is the
probability density of z conditioned upon the values of x and y, and x and y are
assumed to be independent. Because z = x + y, the conditional probability is
simple, given by

pZ (z|x, y) = δ(x + y − z) , (3.75)

where δ(·) is the Dirac delta function. Consequently, the distribution for Z is
given by

pZ (z) = dx dy pX (x) pY (y) δ(x + y − z)

= dx pX (x) pY (z − x) , (3.76)

which is the convolution of the distributions for X and Y .


This same result can be evaluated by using characteristic functions of the
distributions. The characteristic function φX (s) of a distribution for the random
variable X, discussed in Section 3.1.5, is given by

φX (s) = dx ei s x pX (x) . (3.77)

Because the convolution observed in Equation (3.76) corresponds to the product


in the transform domain, the characteristic function for Z is given by

φZ (s) = φX (s) φY (s) , (3.78)

and the corresponding probability density function for Z is given by
\[
p_Z(z) = \frac{1}{2\pi} \int ds\; e^{-i s z}\, \phi_X(s)\, \phi_Y(s) . \tag{3.79}
\]
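The convolution result (3.76) has an exact discrete analog: the PMF of a sum of independent discrete random variables is the convolution of their PMFs. The sketch below checks this for the sum of two fair dice against brute-force enumeration (variable names are ours).

```python
from itertools import product

# PMF of one fair die
die = {v: 1 / 6 for v in range(1, 7)}

# Discrete convolution of the two PMFs, the analog of Equation (3.76)
conv = {}
for x, px in die.items():
    for y, py in die.items():
        conv[x + y] = conv.get(x + y, 0.0) + px * py

# Brute-force enumeration of the sum of two dice for comparison
brute = {}
for x, y in product(range(1, 7), repeat=2):
    brute[x + y] = brute.get(x + y, 0.0) + 1 / 36
```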

3.1.20 Product of Gaussians


Another distribution of interest results from the product of random variables.
To show this, consider two independent real random variables X and Y , with x
drawn from pX (x) and y drawn from pY (y), respectively. The product of
the random variables is Z, such that

z = xy. (3.80)

The distribution of z is pZ (z). The probability distribution is given by
\[
p_Z(z) = \int dx \int dy\; p_X(x)\, p_Y(y)\, \delta(x y - z) . \tag{3.81}
\]

The distribution of the product of two independent real zero-mean Gaussian
variables is given by
\[
p_Z(z) = \int dx \int dy\; \frac{e^{-\frac{x^2}{2\sigma_x^2}}}{\sqrt{2\pi \sigma_x^2}}\,
\frac{e^{-\frac{y^2}{2\sigma_y^2}}}{\sqrt{2\pi \sigma_y^2}}\, \delta(x y - z)
= \frac{1}{\pi\, \sigma_x \sigma_y}\, K_0\!\left( \frac{|z|}{\sigma_x \sigma_y} \right) , \tag{3.82}
\]
where K0 (·) is the modified Bessel function of the second kind of order zero
discussed in Section 2.14.5.
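The density (3.82) is easy to probe by Monte Carlo. Rather than evaluating K0 directly (it is not in the Python standard library), the sketch below checks two moments implied by independence: the product has zero mean, and its second moment is σx² σy².

```python
import random

rng = random.Random(7)
sx, sy = 1.5, 0.8       # standard deviations of the two Gaussian factors
N = 400_000

# Products of independent zero-mean Gaussians
z = [rng.gauss(0, sx) * rng.gauss(0, sy) for _ in range(N)]

mean_z = sum(z) / N                 # expect ~ 0
var_z = sum(v * v for v in z) / N   # expect ~ sx^2 * sy^2 (second moment)
```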

3.2 Convergence of random variables

What does it mean for a random variable to converge? While the convergence
of a sequence of deterministic variables to some limit is straightforward, conver-
gence of random variables to limits is more complicated due to their probabilistic
nature. In the following, we define several different modes of convergence of ran-
dom variables, starting with modes of convergence that are typically viewed as
stronger modes of convergence followed by weaker ones. The proofs of these prop-
erties can be found in standard probability texts such as References [131] and
[171].

3.2.1 Convergence modes of random variables


Consider a sequence of random variables Xn , n = 1, 2, . . . and another random
variable X.

Almost-sure convergence
We say that Xn converges with probability 1, or almost surely to a random
variable X, if
\[
\Pr\left\{ \lim_{n \to \infty} X_n = X \right\} = 1 . \tag{3.83}
\]

Almost-sure convergence is typically denoted by either


\[
X_n \xrightarrow{\text{a.s.}} X \quad \text{or} \tag{3.84}
\]
\[
X_n \xrightarrow{\text{w.p.1}} X . \tag{3.85}
\]

Almost-sure convergence simply means that the event that Xn fails to converge
to X has zero probability.

Convergence in quadratic mean


We say that Xn converges in quadratic mean, the mean-square sense, or in the
L2 sense to X if
\[
\lim_{n \to \infty} \left\langle |X_n - X|^2 \right\rangle = 0 . \tag{3.86}
\]

Convergence in quadratic mean is denoted by either


\[
X_n \xrightarrow{L_2} X \quad \text{or} \tag{3.87}
\]
\[
X_n \xrightarrow{\text{q.m.}} X . \tag{3.88}
\]

This mode of convergence simply means that the mean of the squared deviation
of Xn from its limiting value X goes to zero. Note that almost-sure convergence
does not in general imply convergence in mean square or vice versa. However,
suppose that the sum of the mean-square deviations of Xn from its limiting
value X is finite, i.e.,
\[
\sum_{n=1}^{\infty} \left\langle |X_n - X|^2 \right\rangle < \infty ; \tag{3.89}
\]
then it can be shown [131] that
\[
X_n \xrightarrow{\text{a.s.}} X . \tag{3.90}
\]

The idea of convergence in quadratic-mean can be generalized to convergence


in kth mean for k > 0. We say that Xn converges in kth mean to X if
\[
\lim_{n \to \infty} \left\langle |X_n - X|^k \right\rangle = 0 . \tag{3.92}
\]

Additionally, if ℓ ≤ k, then the above expression implies
\[
\lim_{n \to \infty} \left\langle |X_n - X|^{\ell} \right\rangle = 0 . \tag{3.93}
\]

In other words, convergence in the kth mean of a random variable implies con-
vergence in all lower-order means as well.

Convergence in probability
We say that Xn converges in probability to X if for every ε > 0,
\[
\lim_{n \to \infty} \Pr\{ |X_n - X| \geq \varepsilon \} = 0 . \tag{3.94}
\]

Convergence in probability is typically denoted by


\[
X_n \xrightarrow{P} X . \tag{3.95}
\]

Convergence in probability simply means that the probability that the random
variable Xn deviates from X by any positive amount goes to zero as n → ∞.

Convergence in distribution
We say that Xn converges in distribution to a random variable X if the cumu-
lative density functions of Xn converge to the cumulative density function of X
as n → ∞. In other words,

\[
\lim_{n \to \infty} P_{X_n}(x) = P_X(x) \tag{3.96}
\]

at all points x where PX is continuous. That is to say, the cumulative distri-


bution function of Xn converges to the cumulative distribution function of X.
Note that this form of convergence is not really a convergence of the random
variables themselves but rather of their probability distributions. Convergence
in distribution is denoted with the following:
\[
X_n \xrightarrow{D} X \quad \text{or} \tag{3.97}
\]
\[
X_n \xrightarrow{d} X . \tag{3.98}
\]

Convergence in probability implies convergence in distribution.

3.2.2 Relationship between modes of convergence


The different modes of convergence described above are closely related to each
other. In general, almost-sure convergence is the strongest, frequently encoun-
tered form of convergence, implying most other modes of convergence (with the
notable exception of convergence in the kth mean). Convergence in distribution
is generally considered the weakest form of convergence and is implied by all the
other modes of convergence. The following subsections list some of the relation-
ships between the modes of convergence, starting with relationships that hold
for all random variables followed by some special cases.

General relationships
Convergence of a random variable Xn to a random variable X with probability
1 or almost surely implies that the random variable Xn converges in probability
to X as well, since convergence in probability is a weaker notion of convergence
than convergence with probability 1. Similarly, the convergence of Xn to X in
quadratic mean implies convergence in probability of Xn to X. Since convergence
in distribution is weaker than convergence in probability, convergence of random
variables Xn to X in probability implies convergence in distribution of Xn to X.
In other words, the cumulative distribution function of Xn converges to that of
X if Xn converges to X in probability. Mathematically, these relationships can
be written as follows
\[
X_n \xrightarrow{\text{a.s.}} X \implies X_n \xrightarrow{P} X \tag{3.99}
\]
\[
X_n \xrightarrow{\text{q.m.}} X \implies X_n \xrightarrow{P} X \tag{3.100}
\]
\[
X_n \xrightarrow{P} X \implies X_n \xrightarrow{D} X , \tag{3.101}
\]

where =⇒ indicates a mathematical implication.

Some restricted relationships


The previous section describes the relationships between the different modes of
convergence that hold in general, for all random variables. For some special cases,
additional properties may hold.
One special case that proves useful in analyzing communication systems is
the convergence of a random variable that is a continuous function of another
random variable. In the context of communication systems, a useful example is
the convergence of the spectral efficiency log2 (1 + SINR) in the case where the
signal-to-interference-plus-noise ratio (SINR) converges to some value. The basic
result here is that convergence under each mode is preserved under continuous
functions. Mathematically speaking, if f (X) is a continuous function of X, then
\[
X_n \xrightarrow{\text{a.s.}} X \implies f(X_n) \xrightarrow{\text{a.s.}} f(X) \tag{3.102}
\]
\[
X_n \xrightarrow{P} X \implies f(X_n) \xrightarrow{P} f(X) \tag{3.103}
\]
\[
X_n \xrightarrow{D} X \implies f(X_n) \xrightarrow{D} f(X) . \tag{3.104}
\]

Note that if f (X) is bounded in addition to being continuous, i.e., |f (X)| < A
for some finite constant A, then in addition to the property above we also have
\[
f(X_n) \xrightarrow{D} f(X) \implies X_n \xrightarrow{D} X . \tag{3.105}
\]

While convergence in distribution does not imply convergence in probability in


general, when a sequence of random variables converges in distribution to a con-
stant, this property does indeed hold. Observe that if a sequence of random
variables converges in distribution to a constant, the limiting cumulative distri-
bution function will be a step function with the step at the limiting value. In

this case, it can be shown that convergence in probability holds as well. More
formally, suppose that A is a constant, then
\[
X_n \xrightarrow{D} A \implies X_n \xrightarrow{P} A . \tag{3.106}
\]

Sums of random variables maintain their convergence properties for almost-sure


convergence, convergence in probability, and convergence in quadratic mean.
That is to say, if two sequences of random variables each converge in some fashion
to a limit, the sum of the random variables also converges in the same manner to
the sum of the limits. More formally, consider an additional sequence of random
variables Yn for n = 1, 2, . . . . The following properties then hold:
\[
X_n \xrightarrow{\text{a.s.}} X \text{ and } Y_n \xrightarrow{\text{a.s.}} Y, \text{ then } X_n + Y_n \xrightarrow{\text{a.s.}} X + Y \tag{3.107}
\]
\[
X_n \xrightarrow{P} X \text{ and } Y_n \xrightarrow{P} Y, \text{ then } X_n + Y_n \xrightarrow{P} X + Y \tag{3.108}
\]
\[
X_n \xrightarrow{\text{q.m.}} X \text{ and } Y_n \xrightarrow{\text{q.m.}} Y, \text{ then } X_n + Y_n \xrightarrow{\text{q.m.}} X + Y . \tag{3.109}
\]

Note that the above property does not hold in general for convergence in distri-
bution. However, we have the following property, known as Slutsky’s theorem,
which applies when one of the sequences of variables converges to a constant A,
\[
X_n \xrightarrow{D} X \text{ and } Y_n \xrightarrow{D} A, \text{ then } X_n + Y_n \xrightarrow{D} X + A . \tag{3.110}
\]

Note that even almost-sure convergence does not imply convergence of means
in general. In other words, even if a sequence of random variables Xn converges
with probability 1 to a random variable X, it is not necessarily the case that
the expected values of the Xn , i.e., ⟨Xn ⟩, converge to ⟨X⟩. The reason for this
apparent paradox is that convergence of the mean of a random variable depends
on the rate at which the probabilities associated with that random variable
converge, whereas convergence in probability or almost-sure convergence do not
depend on the rate of convergence of the probabilities associated with the random
variables. A simple example that is often given in textbooks on probability is
the following.
Let the random variable Xn take on the following values,
\[
X_n = \begin{cases} n & \text{with probability } \frac{1}{n} \\[0.5ex] 0 & \text{with probability } 1 - \frac{1}{n} \end{cases} . \tag{3.111}
\]

The mean of Xn is always 1, but as n → ∞, the probability that Xn ≠ 0
approaches zero. In other words, Xn converges with probability 1 to zero whereas
⟨Xn ⟩ = 1 for all n.
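This textbook example is simple to simulate. The sketch below (helper name is ours) draws many realizations of Xn for a small and a large n: the sample mean stays near 1 for every n, while the fraction of nonzero draws shrinks like 1/n.

```python
import random

rng = random.Random(3)

def sample_xn(n, rng):
    # Equation (3.111): X_n = n with probability 1/n, and 0 otherwise.
    return n if rng.random() < 1.0 / n else 0

trials = 200_000
results = {}
for n in (10, 1000):
    draws = [sample_xn(n, rng) for _ in range(trials)]
    mean = sum(draws) / trials                              # stays near 1
    p_nonzero = sum(1 for d in draws if d != 0) / trials    # ~ 1/n, shrinking
    results[n] = (mean, p_nonzero)
```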
What is needed for almost-sure convergence or convergence in probability to
imply convergence of the mean is a property called uniform integrability, which
is defined as
\[
\lim_{\nu \to \infty} \sup_n \left\langle |X_n| \, \mathbf{1}_{\{|X_n| > \nu\}} \right\rangle = 0 , \tag{3.112}
\]
where 1{A} = 1 if A is true and 0 otherwise.



The argument of the expectation operator in (3.112) is nonzero only for suitably
large values of |Xn |, i.e., |Xn | > ν. In other words, the expectation is averaging
only “large” values of Xn . The supremum over n outside the expectation looks for
the value of n for which the average value of |Xn | when small values of |Xn | are
forced to zero is largest. Finally, ν is taken to infinity which means that the values
of Xn that are not zeroed in the averaging operation get successively larger.
This property ensures that the mean value of Xn converges at the correct rate
such that convergence in probability implies convergence of the means. We then
have the following property, which states that the absolute value of the deviation
of Xn from X converges to zero, if and only if Xn converges in probability to a
random variable X, and Xn is uniformly integrable. Mathematically, this can be
expressed as
\[
X_n \xrightarrow{P} X \text{ and } X_n \text{ is uniformly integrable} \iff \left\langle |X_n - X| \right\rangle \to 0 . \tag{3.113}
\]

While uniform integrability is hard to prove in many cases, a stronger require-
ment can be used instead of uniform integrability. Suppose that
\[
|X_n| \leq W \quad \forall\, n , \tag{3.114}
\]
\[
\langle W \rangle < \infty , \tag{3.115}
\]
and
\[
X_n \xrightarrow{P} X ; \tag{3.116}
\]
then
\[
\left\langle |X_n - X| \right\rangle \to 0 . \tag{3.117}
\]

This property is known as the dominated convergence theorem as applied to


random variables. The variable Xn is dominated by another random variable
W that has finite mean. The finite mean of W ensures that the mean of Xn
converges at such a rate that convergence in probability will imply convergence
of the means. The proof and detailed analyses of uniform integrability and the
dominated convergence theorem are beyond the scope of this text and can be
found in advanced probability texts such as [24].

3.3 Random processes

While random variables are mappings from an underlying space of events to real
numbers, random processes can be viewed as mappings from an underlying space
of events onto functions. Random processes are essentially random functions and
are useful for describing transmitted signals when the underlying signal sources
are nondeterministic.
Figure 3.1 illustrates a random process X(t) in which elements in an underlying
space of events (which need not be discrete) map onto functions. The set of all

Figure 3.1 Random processes as a mapping from an underlying event space to
functions.

possible functions that X(t) can take is called the ensemble of realizations of the
process X(t).
Note that X(t) for any particular t is simply a random variable. A complete
statistical characterization of a random process requires the description of the
joint probability densities (or distributions), of the random variables X(t) for
all possible values of t. Note that since t is in general uncountable and hence
cannot be enumerated, the joint distribution in general needs to be specified for
a continuum of values of t.
In general, the joint density for all possible t is very difficult to obtain for real-
world signals. If we restrict ourselves to ergodic processes, which, loosely speaking,
are random processes for which single realizations of the process contain the
statistical properties of the entire ensemble, it is possible to estimate certain
statistical properties of the ensemble from a single realization of the process. Of
particular interest are the second-order statistics of ergodic random processes,
which are the mean function

\[
\mu(t) = \langle X(t) \rangle \tag{3.118}
\]

and the autocorrelation function

\[
R_X(\tau_1, \tau_2) = \left\langle X(\tau_1)\, X^*(\tau_2) \right\rangle . \tag{3.119}
\]

It is also possible to define the cross correlation between two random processes
X(t) and Y (t) as follows:

\[
R_{XY}(\tau_1, \tau_2) = \left\langle X(\tau_1)\, Y^*(\tau_2) \right\rangle . \tag{3.120}
\]

Note that the expectations above are taken with respect to the ensemble of
possible realizations of the processes X(t) and Y (t), jointly.

3.3.1 Wide-sense stationary random processes


Random processes for which the mean function is a constant and the autocor-
relation function is dependent only on the difference in the time indices are
called wide-sense stationary (WSS) random processes. A random process X(t)
is wide-sense stationary if the following two conditions hold:

\[
\langle X(t) \rangle = \mu \tag{3.121}
\]
\[
R_X(\tau_1, \tau_2) = \left\langle X((\tau_1 - \tau_2) + t)\, X^*(t) \right\rangle \quad \forall\, t . \tag{3.122}
\]

The autocorrelation function for wide-sense stationary processes is written


with a single index corresponding to the time lag as

\[
R_X(\tau) = \left\langle X(t + \tau)\, X^*(t) \right\rangle . \tag{3.123}
\]

Two processes are jointly wide-sense stationary if they are each wide-sense sta-
tionary and their cross correlation is just a function of the time lag. The cross
correlation of wide-sense stationary processes is usually written with a single
index as follows:

\[
R_{XY}(\tau) = \left\langle X(t + \tau)\, Y^*(t) \right\rangle . \tag{3.124}
\]

The power spectral density (PSD) of a wide-sense stationary random process


is a measure of the average density of power of the process in the frequency
domain. It can be defined as follows:
\[
S_X(f) = \lim_{T \to \infty} \frac{1}{T} \left\langle \left| \int_{-\frac{T}{2}}^{\frac{T}{2}} dt\; x(t)\, e^{-2\pi i f t} \right|^2 \right\rangle .
\]
The Einstein–Wiener–Khinchin theorem further states that if the integral
exists, the PSD is given by
\[
S_X(f) = \int_{-\infty}^{\infty} d\tau\; R_X(\tau)\, e^{-i 2\pi f \tau} . \tag{3.125}
\]

Observe here that the power-spectral density is the Fourier transform of the
autocorrelation function.
Wide-sense stationary processes are good approximations for many nondeter-
ministic signals encountered in the real world, including white noise. Addition-
ally, the effect of linear-time-invariant (LTI) systems on wide-sense stationary
random processes can be characterized readily.

3.3.2 Action of linear-time-invariant systems on wide-sense stationary


random processes
Consider a linear-time-invariant system as in Figure 3.2, where h(t) is the impulse
response of the system.
If x(t) is a deterministic signal, y(t) = x ∗ h(t), where ∗ denotes the convolution
operation. If x(t) is a realization of a random process, this relationship holds, but

Figure 3.2 Linear-time-invariant system.

in many scenarios observing the output signal y(t) in response to one realization
of x(t) may not be very useful to characterize the behavior of the LTI system.
Much more meaningful results can be obtained by characterizing the second-
order statistics of X(t) and Y (t).
Suppose that h(t) is absolutely integrable, i.e.,
 ∞
dt h(t) < ∞ .
−∞

Then it can be shown that X(t) and Y (t) are jointly wide-sense stationary,
provided that X(t) is wide-sense stationary. The cross-correlation function can
be found as follows:
\[
R_{YX}(\tau) = \left\langle Y(t + \tau)\, X^*(t) \right\rangle
= \left\langle \int_{-\infty}^{\infty} d\alpha\; h(\alpha)\, X(t + \tau - \alpha)\, X^*(t) \right\rangle
\]
\[
= \int_{-\infty}^{\infty} d\alpha\; h(\alpha) \left\langle X(t + \tau - \alpha)\, X^*(t) \right\rangle
= \int_{-\infty}^{\infty} d\alpha\; h(\alpha)\, R_X(\tau - \alpha)
= h * R_X(\tau) . \tag{3.126}
\]

Note that the expectation can be taken into the integral because h(t) is absolutely
integrable. Using a similar set of steps, it can be shown that


\[
R_{XY}(\tau) = h' * R_X(\tau) , \tag{3.127}
\]
where h'(t) = h(−t) is a time-reversed version of h(t). Similarly, it can be shown
that
\[
R_Y(\tau) = h * h' * R_X(\tau) . \tag{3.128}
\]
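A discrete-time analog of Equation (3.128) is easy to verify by simulation: for unit-variance white input, the output autocorrelation of an FIR filter at lag k should be Σ_m h[m] h[m+k]. A sketch under those assumptions (names are ours):

```python
import random

rng = random.Random(11)
h = [1.0, 0.5]     # FIR impulse response (example values)
N = 400_000

# Unit-variance white Gaussian input and the filtered output
x = [rng.gauss(0, 1) for _ in range(N)]
y = [h[0] * x[n] + h[1] * x[n - 1] for n in range(1, N)]

def autocorr(sig, lag):
    # Empirical autocorrelation estimate at the given lag
    return sum(sig[n] * sig[n - lag] for n in range(lag, len(sig))) / (len(sig) - lag)

# Discrete analog of Equation (3.128) for white input:
ry0 = autocorr(y, 0)   # expect h[0]^2 + h[1]^2 = 1.25
ry1 = autocorr(y, 1)   # expect h[0]*h[1] = 0.5
ry2 = autocorr(y, 2)   # expect 0 (filter memory is only one sample)
```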

3.3.3 White-noise processes


A random process N (t) is called a white-noise process if it is wide-sense station-
ary and its autocorrelation function is given by

RN (τ ) = N0 δ(τ ) . (3.129)

Note that N0 here is the value of the power spectral density of the white-noise
process since the power-spectral density is simply the Fourier transform of the

autocorrelation. Hence, white-noise processes have a flat PSD since the Fourier
transform of an impulse is a constant. This fact implies that white-noise processes
have infinite bandwidth and so infinite power. In practice, however, all systems
have limited bandwidth and the observed noise is not white.
Additionally, zero-mean, white-noise processes are uncorrelated at different
time samples since, for t1 ≠ t2,
\[
\left\langle N(t_1)\, N^*(t_2) \right\rangle = R_N(t_1 - t_2) = 0 = \left\langle N(t_1) \right\rangle \left\langle N(t_2) \right\rangle . \tag{3.130}
\]

Note that the variance of a sample of a zero-mean, white-noise process is
infinite since
\[
\operatorname{var}(N(t)) = \left\langle N(t)^2 \right\rangle - \left\langle N(t) \right\rangle^2
= R_N(0) = \int_{-\infty}^{\infty} df\; S_N(f) = \infty . \tag{3.131}
\]
The last step follows from the fact that SN (f ) is a constant for all f .
Note that by taking the Fourier transform of Equation (3.128), we find

\[
S_Y(f) = |H(f)|^2\, S_X(f) , \tag{3.132}
\]

where H(f ) is the Fourier transform of h(t). Hence, if a white-noise process N (t)
is filtered through a band-pass filter, the resulting output is no longer white and
may have finite variance.
Perhaps the most commonly used wide-sense stationary random process in the
analysis of wireless communication systems is the white-Gaussian-noise (WGN)
process. The white-Gaussian-noise process is a white-noise process with zero
mean and amplitude distributed as a Gaussian random variable. Since white-
noise processes are uncorrelated at different time instances and uncorrelated
Gaussian random variables are also independent, samples of a white-Gaussian-
noise process at different time instances are independent random variables.
As an example, consider a zero-mean white-Gaussian-noise process N (t) with
power-spectral-density N0 that is filtered through an ideal low-pass filter with
cut-off frequency of ±W and unit height in the pass band. The variance of the
output of the low-pass filter can then be found as follows. Let the output of the
filter be Y (t). Then the variance of Y (t) is
\[
\left\langle Y(t)^2 \right\rangle = R_Y(0) = \int_{-\infty}^{\infty} df\; S_Y(f)
= \int_{\text{pass band}} df\; S_Y(f) = 2 W N_0 . \tag{3.133}
\]

Since linear combinations of Gaussians are still Gaussian, Y (t) is a Gaussian


distributed random variable with zero mean and variance 2 W N0 .

3.4 Poisson processes

A Poisson process is a simple stochastic process that is commonly used to model


discrete events that occur at random times, such as the start times of telephone
calls, radioactive decay of particles, and in simplified models of buses arriving at a
bus stop. The Poisson process is defined in terms of a function N (t), which counts
the number of occurrences of these events (or arrivals as they are commonly
referred to) that occur from some reference time (typically t = 0) to the time
t. In other words, N (t) counts the number of arrivals from time 0 up to time t.
The defining characteristic of a Poisson process is that the numbers of arrivals in
disjoint intervals are independent random variables. That is to say, the random
variables N (b)−N (a) and N (d)−N (c) are independent if the intervals [a, b), i.e.,
t between a and b, and [c, d), i.e., t between c and d, are disjoint. The Poisson
process is characterized by its intensity function λ(t). The intensity function
defines the mean number of arrivals in an interval [a, b) as follows:
 b
N (b) − N (a) = dt λ(t) . (3.134)
a

The homogeneous Poisson process is a Poisson process for which λ(t) = λ, i.e.,
the mean number of arrivals in any interval is simply proportional to the length
of the interval. The following are the main characteristics of a homogeneous
Poisson process with intensity λ.

(1) The numbers of arrivals in disjoint intervals are independent random vari-
ables.
(2) The number of arrivals in a duration τ is a Poisson random variable with
parameter λτ , i.e., the probability mass function is
\[
\Pr\{k \text{ arrivals in any interval of length } \tau\} = \frac{1}{k!} (\lambda \tau)^k e^{-\lambda \tau} . \tag{3.135}
\]
(3) The time between two consecutive arrivals is an exponential random variable
with mean 1/λ.
The Poisson process can also be defined in Rd , where it is referred to as a
Poisson point process (PPP). The defining characteristic of the Poisson point
process is that the number of points in any disjoint subset of Rd are independent
random variables. The number of points in any subset B ∈ Rd is a Poisson
random variable with mean

dx λ(x) . (3.136)
B

Thus, for a homogeneous Poisson point process, the number of points in B is a


Poisson random variable with mean λVol{B} where Vol{B} is the volume of B.
For d = 2, the Poisson point process is a useful model to describe planar
wireless networks with a completely random distribution of users. For the ho-
mogeneous Poisson point process, the probability distributions of the distance

to the nearest, second-nearest user, and so forth, are useful in the analysis of
wireless networks. The probability density function of the distance rk between
an arbitrary point to the kth nearest point of a two-dimensional Poisson point
process can be found as follows [210]. Suppose that the point is the origin, then

Pr{rk ≤ r} = Pr{greater than k − 1 points of the PPP in a circle of radius r}



k −1
−λπ r 2 (λπr2 )m
=1−e (3.137)
m =0
m!
 −1

d 
k
(λπr2 )m
−λπ r 2
fr k (r) = 1−e (3.138)
dr m =0
m!
 −1
1  
k
= 2e−λπ r
2
m(λπr2 )m − (λπr2 )m +1 . (3.139)
m =0
rm!
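The k = 1 case of (3.137), Pr{r1 ≤ r} = 1 − exp(−λπr²), can be checked by simulating a homogeneous PPP on a disc: draw a Poisson number of points with mean λπR², place them uniformly on the disc, and record the nearest distance to the origin. A sketch under those assumptions (helper names are ours):

```python
import math
import random

rng = random.Random(5)
lam = 1.0       # intensity (points per unit area)
R = 3.0         # simulation disc radius; Pr{r1 > R} = exp(-9*pi) is negligible
r_test = 0.5

def poisson_draw(mu, rng):
    # Knuth's multiplicative method for a Poisson random variate
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

trials = 30_000
hits = 0
for _ in range(trials):
    n = poisson_draw(lam * math.pi * R * R, rng)
    # Conditioned on n, points are i.i.d. uniform on the disc:
    # the distance of each point has CDF (d/R)^2, so d = R*sqrt(U).
    dists = [R * math.sqrt(rng.random()) for _ in range(n)]
    if dists and min(dists) <= r_test:
        hits += 1

empirical = hits / trials
analytic = 1 - math.exp(-lam * math.pi * r_test**2)   # Equation (3.137), k = 1
```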

3.5 Eigenvalue distributions of finite Wishart matrices

Consider a matrix G ∈ Cm ×n with m ≤ n, and entries drawn independently and


randomly from a complex circular Gaussian distribution of unit variance and
define
\[
M = \frac{1}{n}\, G G^{\dagger} . \tag{3.140}
\]
The matrix M is known as a Wishart matrix.
Let λ1 , λ2 , . . . , λm denote the ordered eigenvalues of M where λ1 ≤ λ2 ≤ · · · ≤
λm < ∞. Then, the joint probability density function of λ1 , λ2 , . . . , λm is [159]
\[
f_{\lambda_1, \lambda_2, \ldots, \lambda_m}(\lambda_1, \lambda_2, \ldots, \lambda_m)
= K \prod_{i=1}^{m} e^{-\lambda_i}\, \lambda_i^{n-m} \prod_{i<j}^{m} (\lambda_i - \lambda_j)^2 , \tag{3.141}
\]

where K is a constant that ensures the joint distribution integrates to unity.


Note that the marginal probability density of the kth largest eigenvalue, λk ,
is known and can be found in references such as Reference [359].

3.6 Asymptotic eigenvalue distributions of Wishart matrices

For a matrix G ∈ Cm ×n with entries drawn independently and randomly from


a complex circular Gaussian distribution, the distribution of eigenvalues of the
Hermitian matrix M defined in Equation (3.140) converges to an asymptotic
distribution as m → ∞ and n → ∞ under the constraint that the ratio of m to
n is fixed,
\[
r = \frac{m}{n} . \tag{3.142}
\]

While the eigenvalues of M are random for finite values of m and n, the dis-
tribution of eigenvalues converges to a fixed distribution (the Marcenko–Pastur
distribution) as m and n approach ∞ [347, 206, 168, 286, 323, 259, 315, 33]. Be-
cause M grows to be infinite in size, there are correspondingly an infinite number
of eigenvalues for M. The technical tools to develop the resulting eigenvalue dis-
tribution are discussed in the following section (Section 3.6.1). The distribution
for the eigenvalues is given by the sum of a continuous probability distribution
fr (λ) and a discrete point at zero weighted by cr :

fr (λ) + cr δ(λ) , (3.143)

where the constant associated with the “delta function” at 0 is given by
\[
c_r = \max\left( 0,\; 1 - \frac{1}{r} \right) . \tag{3.144}
\]

The first term of the probability measure, fr (λ), is given by
\[
f_r(\lambda) = \begin{cases} \dfrac{\sqrt{(\lambda - a_r)(b_r - \lambda)}}{2\pi \lambda r} & a_r \leq \lambda \leq b_r \\[1ex] 0 & \text{otherwise} \end{cases} , \tag{3.145}
\]
where
\[
a_r = (\sqrt{r} - 1)^2 , \qquad b_r = (\sqrt{r} + 1)^2 . \tag{3.146}
\]

The largest eigenvalue of M is known to converge to br , which for n = m


equals 4.
The infinite-dimensional form can be employed with reasonable fidelity at sur-
prisingly small values of m and n. Depending upon the details of the problem
being addressed, values as small as 4 can be approximated by the infinite di-
mensional distribution. A useful approximation for the kth largest eigenvalue
when m and n are moderately large is obtained by finding the value of λ for
which the limiting distribution function of the eigenvalues Fr (.) takes the value
of (m − k + 1)/m. This approximation can be expressed as follows,

\[
\lambda_k \approx F_r^{-1}\!\left( (m - k + 1)/m \right) , \tag{3.147}
\]
where Fr−1 (·) indicates the function inverse, and
\[
F_r(x) = \frac{1}{8}(a_r + b_r) - \frac{1}{4}\sqrt{a_r b_r}
+ \frac{1}{2\pi}\sqrt{(b_r - x)(x - a_r)}
+ \frac{1}{4\pi}(a_r + b_r) \arcsin\!\left( \frac{a_r + b_r - 2x}{a_r - b_r} \right)
+ \frac{1}{2\pi}\sqrt{a_r b_r} \arctan\!\left( \frac{2 a_r b_r - b_r x - a_r x}{2\sqrt{a_r b_r (b_r - x)(x - a_r)}} \right) , \tag{3.148}
\]

which for the special case of m = n reduces to
\[
F_1(x) = \begin{cases} \dfrac{\pi + \sqrt{4x - x^2} + 2 \arcsin\!\left( \frac{x}{2} - 1 \right)}{2\pi} & \text{if } 0 \leq x < 4 \\[1ex] 1 & x \geq 4 . \end{cases} \tag{3.149}
\]
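Two consistency checks on (3.149) are straightforward: F1 should run from 0 at x = 0 to 1 at x = 4, and its derivative should recover the Marcenko–Pastur density (3.145) with r = 1 (ar = 0, br = 4). A sketch (function names are ours):

```python
import math

def f1(x):
    # Equation (3.145) with r = 1 (a_r = 0, b_r = 4)
    return math.sqrt(x * (4 - x)) / (2 * math.pi * x)

def F1(x):
    # Equation (3.149)
    if x >= 4:
        return 1.0
    return (math.pi + math.sqrt(4 * x - x * x)
            + 2 * math.asin(x / 2 - 1)) / (2 * math.pi)

# CDF endpoints
ends = (F1(0.0), F1(4.0))

# Numerical derivative of F1 should recover the density f1
x0, h = 1.7, 1e-6
deriv = (F1(x0 + h) - F1(x0 - h)) / (2 * h)
```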

3.6.1 Marcenko–Pastur theorem


One of the seminal results in the analyses of asymptotic eigenvalue distributions
[315] of random matrices is the Marcenko–Pastur theorem, which can be used to
find the probability measure in Equation (3.145). In other words, the Marcenko–
Pastur theorem tells us the distribution of the eigenvalues of random matrices
that take the form of covariance matrices, as the dimensions of these matrices
grow large.
The theorem was first derived in Reference [206] and strengthened in Reference
[12], and has proven useful in analyzing the limiting signal-to-interference-plus-
noise ratio (SINR) of wireless links with multiple antennas and/or code-division-
multiple-access (CDMA) systems with random codes as given in references such
as [313] and [323]. The theorem can be condensed into a form suitable for wireless
communications applications as follows.
Consider a matrix G ∈ CN ×n where the entries of G are independent, iden-
tically distributed, zero-mean complex random variables with unit variance and
an n × n diagonal matrix T = diag(τ1 , τ2 , . . . , τn ) where τi ∈ R. Assume that
n, N → ∞ such that n/N → c > 0, a constant. In this asymptotic regime, assume
that the empirical distribution function (e.d.f.) of τ1 , τ2 , . . . , τn converges
with probability 1 to a limiting probability distribution function H(τ ) and that
G and T are independent. Note that the empirical distribution function of a set
of variables is defined as the proportion of those variables that are less than,
or equal to, the argument of the function. That is, consider a set A of N real
numbers. The empirical distribution function of the members of the set A is

\[
\text{e.d.f.}\{A\}(\tau) = \frac{\text{Number of elements in } A \text{ less than or equal to } \tau}{N} . \tag{3.150}
\]
In the limit as n, N → ∞, the empirical distribution function of the eigenvalues
of B = GTG† converges with probability 1 at all points where the empirical
distribution function is continuous, to a nonrandom probability density function
f (τ ) whose Stieltjes transform m(z) defined for complex z satisfies the following
equation:

 ∞
τ
z m(z) + 1 = m(z) c dH(τ ) . (3.151)
0 1 + τ m(z)

Note that Equation (3.145) can be found by setting dH(τ ) equal to the Dirac
measure at 1 and solving Equation (3.151). Also, the Stieltjes transform of dφ(t)

is denoted by mφ (z) and is defined as
\[
m_{\phi}(z) = \int_{-\infty}^{\infty} \frac{1}{t - z}\, d\phi(t) \quad \text{for } \operatorname{Im}(z) < 0 . \tag{3.152}
\]

3.7 Estimation and detection in additive Gaussian noise

3.7.1 Estimation in additive Gaussian noise


A problem frequently encountered in communication systems (for an example see Chapter 8) and many other fields is the estimation of a vector s ∈ C^{p×1} from a noisy observation z ∈ C^{n×1}, where s and z are related by the following equation:

z = H s + n .   (3.153)

The mixing matrix is denoted H ∈ C^{n×p} (in the case of MIMO communications, discussed in Chapter 8, it is called the channel matrix), and n ∈ C^{n×1} is a vector of noise samples.
Given the noisy observation z, one may wish to obtain the maximum-likelihood (ML) estimate of s, which we denote here by ŝ. The maximum-likelihood estimator is defined as

ŝ = arg max_s p(z|s) .   (3.154)

While more practical receiver algorithms are considered in Chapter 9, as an


introduction, the maximum-likelihood signal estimator with known channel and
interference parameters is considered here. By assuming that the noise is sampled
from a complex Gaussian distribution n ∼ CN (0, R), and that the interference-
plus-noise covariance matrix R and channel matrix H are known, the maximum-
likelihood estimate for the transmitted signal s is
ŝ = (H† R⁻¹ H)⁻¹ H† R⁻¹ z .   (3.155)

The previous equation can be derived by starting with the PDF of the conditional probability in Equation (3.154),

p(z|s) = (1/(πⁿ |R|)) exp{ −(z − H s)† R⁻¹ (z − H s) } .   (3.156)

Because the exponential is a monotonically increasing function of its argument, maximizing Equation (3.156) is equivalent to minimizing

(z − H s)† R⁻¹ (z − H s) .   (3.157)

By exploiting the Wirtinger calculus discussed in Section 2.8.2, setting the derivative to zero yields

(d/ds†) (z − H s)† R⁻¹ (z − H s) = 0
H† R⁻¹ H s − H† R⁻¹ z = 0
s = (H† R⁻¹ H)⁻¹ H† R⁻¹ z .   (3.158)
We have assumed here that H† R⁻¹ H is positive-definite, even though it is only guaranteed to be non-negative-definite by virtue of the fact that R is a covariance matrix.
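A minimal numerical sketch of the estimator in Equation (3.155) follows; the dimensions, channel, and covariance below are arbitrary choices for illustration. On a noiseless observation the estimate recovers s exactly, consistent with the derivation above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, p = 6, 2                                  # observation and signal dimensions
H = rng.standard_normal((n_obs, p)) + 1j * rng.standard_normal((n_obs, p))
# A Hermitian positive-definite interference-plus-noise covariance R.
A = rng.standard_normal((n_obs, n_obs)) + 1j * rng.standard_normal((n_obs, n_obs))
R = A @ A.conj().T + n_obs * np.eye(n_obs)

def ml_estimate(z, H, R):
    """s_hat = (H† R^{-1} H)^{-1} H† R^{-1} z, Equation (3.155),
    computed with linear solves rather than explicit inverses."""
    RiH = np.linalg.solve(R, H)                  # R^{-1} H
    return np.linalg.solve(H.conj().T @ RiH, RiH.conj().T @ z)

s = np.array([1.0 + 1.0j, -2.0j])
s_hat = ml_estimate(H @ s, H, R)                 # noiseless z = H s recovers s
```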

3.7.2 Detection in additive Gaussian noise

Vector detection
Consider a system described by Equation (3.153), where the vector s can only take on one of the values s₁, s₂, . . . , s_K. Given the noisy observation z, we wish to detect which of the possible s vectors was actually present such that the probability of making an error P_e is minimized, where

P_e = Pr{ŝ ≠ s} .   (3.159)
For a given observation z, the probability of error is minimized if the estimated value ŝ is such that the conditional probability of s given z is maximized. That is to say, the minimum probability of error estimator of s is

ŝ = arg max_{s′ ∈ {s₁, s₂, ..., s_K}} p(s = s′ | z) .   (3.160)

To illustrate this problem, it is instructive to consider the case when the random vector of interest can take one of two possible values, i.e., K = 2. We can write the conditional probability above using Bayes' rule, discussed in Section 3.1.7, as

p(s|z) = p(z|s) p(s) / p(z) .   (3.161)
Thus, ŝ = s₁ if

p(z|s = s₁) p(s = s₁) / p(z) ≥ p(z|s = s₂) p(s = s₂) / p(z) ,

that is, if

p(z|s = s₁) / p(z|s = s₂) ≥ p(s = s₂) / p(s = s₁) .   (3.162)

The quantity on the left-hand side is known as the likelihood ratio.
Assuming that s₁ and s₂ are equally likely and that n ∼ CN(0, R), we can write Equation (3.162) as

(1/(πⁿ |R|)) exp{ −(z − H s₁)† R⁻¹ (z − H s₁) }
    ≥ (1/(πⁿ |R|)) exp{ −(z − H s₂)† R⁻¹ (z − H s₂) } .   (3.163)
Simplifying by using the fact that the exponential is a monotonically increasing function yields

(z − H s₁)† R⁻¹ (z − H s₁) ≤ (z − H s₂)† R⁻¹ (z − H s₂) .   (3.164)

If the noise samples in the vector n are uncorrelated and have equal variance, R = σ² I and the expression above becomes

‖z − H s₁‖² ≤ ‖z − H s₂‖² .   (3.165)

Since H s′ equals z if s = s′ and no noise is present, the equation above can be thought of as nearest-neighbor detection. If s = s₁, an error occurs with the following probability:

Pr{ŝ = s₂ | s = s₁} = Pr{ ‖z − H s₁‖² ≥ ‖z − H s₂‖² | s = s₁ }
  = Pr{ ‖n‖² ≥ ‖n + H(s₁ − s₂)‖² }
  = Pr{ −(s₁ − s₂)† H† n − n† H(s₁ − s₂) ≥ ‖H(s₁ − s₂)‖² }
  = Pr{ −2 Re{(s₁ − s₂)† H† n} ≥ ‖H(s₁ − s₂)‖² }
  = Pr{ v ≥ ‖H(s₁ − s₂)‖² } ,   (3.166)

where v ∼ N(0, 2σ² ‖H(s₁ − s₂)‖²). Hence Equation (3.166) evaluates to

Pr{ŝ = s₂ | s = s₁} = Q( ‖H(s₁ − s₂)‖ / √(2σ²) ) ,   (3.167)

which by symmetry (because s₁ and s₂ are equally likely) equals the probability of error.
Extending this analysis to systems with a larger number of possible values of the vector s, i.e., K ≥ 2 (for a general R), yields the following expression for the minimum probability of error estimator for s when s is uniformly distributed among s₁, s₂, . . . , s_K:

ŝ = arg min_{s′ ∈ {s₁, s₂, ..., s_K}} (z − H s′)† R⁻¹ (z − H s′) .   (3.168)

If the noise samples are uncorrelated,

ŝ = arg min_{s′ ∈ {s₁, s₂, ..., s_K}} ‖z − H s′‖² .   (3.169)

The probability of error can be bounded from above by finding the worst-case (smallest) difference between H s_ℓ and H s_m. Let

d_min = min_{ℓ, m ∈ {1, 2, ..., K}, ℓ ≠ m} ‖H s_ℓ − H s_m‖ .   (3.170)

Then, the probability of error when s is one of K equally likely vectors is bounded from above by

P_e ≤ Q( d_min / √(2σ²) ) .   (3.171)
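The two-hypothesis analysis can be exercised numerically. The sketch below is illustrative only: the channel, the pair s₁, s₂, and the noise level are arbitrary choices, with σ² set so that the argument of the Q function is 1.5. For K = 2 the expression in Equation (3.167) is the exact pairwise error probability, so a Monte Carlo estimate should match it closely.

```python
import numpy as np
from math import erfc, sqrt

def Q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def detect(z, H, constellation):
    """Nearest-neighbor detector of Equation (3.169) for white noise."""
    return int(np.argmin([np.linalg.norm(z - H @ s) for s in constellation]))

rng = np.random.default_rng(2)
n_obs = 4
H = rng.standard_normal((n_obs, 2)) + 1j * rng.standard_normal((n_obs, 2))
s1, s2 = np.array([1.0, 1.0]), np.array([-1.0, -1.0])

d = np.linalg.norm(H @ (s1 - s2))
sigma2 = (d / 1.5) ** 2 / 2.0            # chosen so d / sqrt(2 sigma^2) = 1.5
pe_bound = Q(d / np.sqrt(2.0 * sigma2))  # Equation (3.167)

# Monte Carlo estimate of Pr{s_hat = s2 | s = s1}.
trials = 20000
noise = np.sqrt(sigma2 / 2.0) * (rng.standard_normal((trials, n_obs))
                                 + 1j * rng.standard_normal((trials, n_obs)))
z = (H @ s1)[None, :] + noise
err = np.linalg.norm(z - H @ s2, axis=1) < np.linalg.norm(z - H @ s1, axis=1)
pe_hat = err.mean()                      # close to pe_bound
```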

Matrix detection in white Gaussian noise

We can extend the vector detection problem of the previous section to a matrix detection problem where the observations are matrices Z ∈ C^{n×m}, the mixing matrix is H ∈ C^{n×p}, the signal matrix is S ∈ C^{p×m}, and the noise matrix is N ∈ C^{n×m}. This class of problem is often encountered in multiple-antenna systems where the rows of the observation matrix represent the samples received at a given antenna of a receiver over multiple time samples. We can thus write a system equation analogous to Equation (3.153) as

Z = H S + N ,   (3.172)

where S can take values of S₁, S₂, . . . , S_K with equal probability. The matrix detection problem can be rewritten by vectorizing the matrices Z, S, and N, whereby the vectors z̄ ∈ C^{nm×1}, s̄ ∈ C^{pm×1}, and n̄ ∈ C^{nm×1} are obtained by stacking up the columns of Z, S, and N, respectively. Additionally, define H̄ ∈ C^{nm×pm} as a block diagonal matrix whose diagonal blocks comprise the matrix H, that is to say,

H̄ = I_m ⊗ H .   (3.173)

Equation (3.172) can thus be written as

z̄ = H̄ s̄ + n̄ .   (3.174)
The probability of error can be bounded by writing

d̄_min = min_{ℓ, m ∈ {1, 2, ..., K}, ℓ ≠ m} ‖H̄ s̄_ℓ − H̄ s̄_m‖   (3.175)
      = min_{ℓ, m ∈ {1, 2, ..., K}, ℓ ≠ m} ‖H S_ℓ − H S_m‖_F .   (3.176)

Recall that ‖A‖_F is the Frobenius norm of A, which is the square root of the sum of the squared magnitudes of all entries of the matrix A. The probability of error is thus bounded from above by

P_e ≤ Q( d̄_min / √(2σ²) ) .   (3.177)
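The vectorization step behind Equations (3.172)–(3.176) can be verified directly. The sketch below assumes the block-diagonal convention H̄ = I_m ⊗ H (copies of H along the diagonal), which is the ordering for which vec(HS) = H̄ vec(S) when columns are stacked.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, m = 4, 3, 5
H = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))
S1 = rng.standard_normal((p, m)) + 1j * rng.standard_normal((p, m))
S2 = rng.standard_normal((p, m)) + 1j * rng.standard_normal((p, m))

vec = lambda M: M.flatten(order="F")     # stack the columns of a matrix
H_bar = np.kron(np.eye(m), H)            # block diagonal with H on each block

# Vectorization identity: vec(H S) = (I_m kron H) vec(S).
lhs = vec(H @ S1)
rhs = H_bar @ vec(S1)

# Distances are preserved, so Equations (3.175) and (3.176) agree:
d_vec = np.linalg.norm(H_bar @ (vec(S1) - vec(S2)))
d_fro = np.linalg.norm(H @ S1 - H @ S2, "fro")
```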

3.7.3 Receiver operating characteristics


We can consider an example problem in which there are two hypotheses: signal of interest present (H₁) or signal of interest absent (H₀). A common technique
for displaying the performance of a particular detection test statistic φ(Z) as a


function of some observed data is the receiver operating characteristic (ROC)
curve. Here, the test statistic is implicitly a function of any parameters known
about the transmitted signals of interest or environment. Detection is declared
if the test statistic achieves or exceeds some threshold η,
φ(Z) ≥ η . (3.178)
Given an ensemble of observations defined by the density p(Z|H₁) in which signal is present (H₁), the probability of detection P_d(η) is defined by

P_d(η) = Pr{φ(Z) ≥ η}
       = ∫ dΩ_Z p(Z|H₁) θ{φ(Z) − η} ,   (3.179)

where the function θ{x} for some real variable x is defined here to be

θ{x} = { 1 ;  x ≥ 0
         0 ;  x < 0 } .   (3.180)
A false alarm occurs when the test statistic exceeds the detection threshold when no signal of interest is present. Given an ensemble of observations defined by the density p(Z|H₀) in which signal is absent (H₀), the probability of false alarm P_fa(η) is defined by

P_fa(η) = Pr{φ(Z) ≥ η}
        = ∫ dΩ_Z p(Z|H₀) θ{φ(Z) − η} .   (3.181)

The receiver operating characteristic curve is given by plotting P_d(η) versus P_fa(η) over all viable threshold values η.
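The threshold sweep described by Equations (3.179) and (3.181) can be traced empirically. The sketch below is a toy example with an assumed energy test statistic and an assumed mean-shift signal model; neither choice comes from the text.

```python
import numpy as np

rng = np.random.default_rng(4)
n_trials, n_samp, mu = 5000, 8, 0.6      # mu: assumed signal level under H1

def stat(Z):
    """Example test statistic phi(Z): the energy of each observation vector."""
    return np.sum(Z ** 2, axis=-1)

t0 = stat(rng.standard_normal((n_trials, n_samp)))        # H0: noise only
t1 = stat(mu + rng.standard_normal((n_trials, n_samp)))   # H1: signal + noise

# Sweep the threshold eta over the pooled statistics.
etas = np.sort(np.concatenate([t0, t1]))
pfa = np.array([(t0 >= eta).mean() for eta in etas])      # Equation (3.181)
pd = np.array([(t1 >= eta).mean() for eta in etas])       # Equation (3.179)
# Plotting pd against pfa traces out the ROC curve.
```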

3.8 Cramer–Rao parameter estimation bound

The Cramer–Rao bound is a standard tool of parameter estimation [312, 172, 250, 295]. It provides a limit on the performance of an unbiased parameter estimator in the presence of noise. To be clear, the Cramer–Rao bound is a local bound. Any contribution to estimation variance caused by significantly different parameters that have nearly ambiguous observations is not included by the Cramer–Rao bound. These nonlocal contributions are considered by the Weiss–Weinstein bound [342, 78], for example.

3.8.1 Real parameter formulation


For an observed variable x ∈ R and parameter θ ∈ R, the probability density
function for x is given by p(x; θ). If an estimate of θ is a function of the observation, denoted θ̂ = f(x), then the variance of the estimate is given by

var(θ̂) = ⟨(θ̂ − θ)²⟩
        = ∫ dx (θ̂ − θ)² p(x; θ) .   (3.182)

If the estimator is unbiased, then the mean of the estimator is the actual parameter,

⟨θ̂⟩ = θ .   (3.183)

The variance of an unbiased estimator is bounded by the inverse of the Fisher information J, given by

⟨(θ̂ − θ)²⟩ ≥ J⁻¹ ,
J = ⟨ (∂ log p(x; θ)/∂θ)² ⟩ .   (3.184)

The basis of this bound is given by the statistical relationship for random variables a and b that is constructed by the Cauchy–Schwarz inequality (defined in Equation (2.42)),

⟨a b⟩² ≤ ⟨a²⟩ ⟨b²⟩ .   (3.185)

By substituting the expressions of interest, this becomes

⟨ (∂ log p(x; θ)/∂θ) (θ̂ − θ) ⟩² ≤ ⟨ (∂ log p(x; θ)/∂θ)² ⟩ ⟨ (θ̂ − θ)² ⟩
                                 = var{θ̂} J .   (3.186)

The Cramer–Rao bound is shown if the left-hand side of Equation (3.186) is found to be one. To show that the left-hand side of the equation is one, it can be expanded into two terms,

⟨ (∂ log p(x; θ)/∂θ) (θ̂ − θ) ⟩ = ⟨ (∂ log p(x; θ)/∂θ) θ̂ ⟩ − ⟨ (∂ log p(x; θ)/∂θ) θ ⟩ .   (3.187)

In the following discussion, it is shown that the first term on the right-hand side of Equation (3.187) is one and the second is zero. Focusing on the first term, the expression simplifies to

⟨ (∂ log p(x; θ)/∂θ) θ̂ ⟩ = ⟨ (1/p(x; θ)) (∂p(x; θ)/∂θ) θ̂ ⟩
  = ∫ dx p(x; θ) (1/p(x; θ)) (∂p(x; θ)/∂θ) θ̂
  = ∫ dx (∂p(x; θ)/∂θ) θ̂
  = (∂/∂θ) ∫ dx p(x; θ) θ̂
  = (∂/∂θ) ⟨θ̂⟩
  = (∂/∂θ) θ = 1 ,   (3.188)

where ⟨θ̂⟩ = θ because by definition the estimator is unbiased. The second right-hand term in Equation (3.187) is given by

⟨ (∂ log p(x; θ)/∂θ) θ ⟩ = ⟨ (1/p(x; θ)) (∂p(x; θ)/∂θ) θ ⟩
  = ∫ dx p(x; θ) (1/p(x; θ)) (∂p(x; θ)/∂θ) θ
  = ∫ dx (∂p(x; θ)/∂θ) θ
  = (∂/∂θ) ∫ dx p(x; θ) θ − ∫ dx p(x; θ) (∂θ/∂θ)
  = ∂θ/∂θ − ∂θ/∂θ = 0 ,   (3.189)

where the observation that ∫ dx p(x; θ) = 1 is employed. Consequently, from


Equation (3.186), the variance of the estimate is greater than or equal to the inverse of the Fisher information,

var{θ̂} ≥ 1/J .   (3.190)

The Fisher information can also be represented by a form involving the second derivative of the log of the probability,

J = ⟨ (∂ log p(x; θ)/∂θ)² ⟩
  = −⟨ ∂² log p(x; θ)/∂θ² ⟩ .   (3.191)

This relationship can be found by first noting that the expectation of the derivative of the log of the probability is zero,

⟨ ∂ log p(x; θ)/∂θ ⟩ = ∫ dx p(x; θ) (∂ log p(x; θ)/∂θ)
  = ∫ dx ∂p(x; θ)/∂θ
  = (∂/∂θ) 1 = 0 .   (3.192)

By using this observation and evaluating the derivative of zero, the second form of the Fisher information is found:

0 = (∂/∂θ) 0 = (∂/∂θ) ⟨ ∂ log p(x; θ)/∂θ ⟩
  = (∂/∂θ) ∫ dx p(x; θ) (∂ log p(x; θ)/∂θ)
  = ∫ dx [ (∂p(x; θ)/∂θ) (∂ log p(x; θ)/∂θ) + p(x; θ) (∂² log p(x; θ)/∂θ²) ]
  = ∫ dx p(x; θ) [ (∂ log p(x; θ)/∂θ)² + ∂² log p(x; θ)/∂θ² ]
⇒ ⟨ (∂ log p(x; θ)/∂θ)² ⟩ = −⟨ ∂² log p(x; θ)/∂θ² ⟩ .   (3.193)

Consequently, the expectations of the square of the derivative of the log-likelihood and of the second derivative of the log-likelihood are equal up to a negative sign.
Note that one example where the Cramer–Rao bound is achieved with equality is when estimating the mean of a Gaussian random variable.
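That example is easy to check numerically: for N i.i.d. samples of N(θ, σ²), the Fisher information is J = N/σ², and the sample mean is unbiased with variance exactly σ²/N. The Monte Carlo sketch below uses arbitrary values of θ, σ², and N.

```python
import numpy as np

rng = np.random.default_rng(5)
theta, sigma2, N = 2.0, 1.5, 50

# Cramer-Rao bound for the mean of N(theta, sigma2) from N samples:
# J = N / sigma2, so var(theta_hat) >= sigma2 / N.
crb = sigma2 / N

trials = 20000
x = theta + np.sqrt(sigma2) * rng.standard_normal((trials, N))
theta_hat = x.mean(axis=1)        # the sample mean achieves the bound
emp_var = theta_hat.var()
```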

3.8.2 Real multivariate Cramer–Rao bound


The Cramer–Rao bound can be extended to multivariate distributions with multiple parameters. The variance of a parameter estimator θ̂ for the mth element of the real parameter vector θ ∈ R^{n×1} is bounded by the (m, m) element of the inverse of the Fisher information matrix J,

⟨ |{θ̂ − θ}_m|² ⟩ ≥ {J⁻¹}_{m,m} ,   (3.194)

or more generally the covariance matrix bound is given by

cov{θ̂} = ⟨ (θ̂ − θ)(θ̂ − θ)ᵀ ⟩ ≥ J⁻¹ ,   (3.195)

where the inequality between matrices is used to indicate that the difference between the matrices (cov{θ̂} − J⁻¹) is positive-semidefinite or, in other words, has non-negative eigenvalues. The Fisher information matrix for observation vector
z with probability density function p(z; θ) is given by

{J}_{m,n} = ⟨ (∂ log p(z; θ)/∂{θ}_m) (∂ log p(z; θ)/∂{θ}_n) ⟩
          = −⟨ ∂² log p(z; θ)/∂{θ}_m ∂{θ}_n ⟩ .   (3.196)

Real multivariate Cramer–Rao bound for complex Gaussian distribution


If the n-vector z ∈ C^{n×1} is sampled from the complex Gaussian distribution with mean μ(θ) and covariance matrix R(θ), then the probability density function is given by

p(z; θ) = (1/(πⁿ |R(θ)|)) e^{ −[z−μ(θ)]† R⁻¹(θ) [z−μ(θ)] }   (3.197)

as a function of the vector of parameters θ.


To evaluate the Fisher information matrix for the Gaussian model, a few notational conveniences are assumed. The parameters α and β are a pair of parameters found in the parameter vector θ. While the mean μ and covariance matrix R are functions of θ, and therefore of α and β, this dependence will not be made explicit for the sake of notational expediency. The Fisher information matrix element {J}_{α,β} associated with the parameter pair α and β is given by

{J}_{α,β} = −⟨ ∂² log p(z; θ)/∂α ∂β ⟩
  = ⟨ (∂²/∂α ∂β) ( [z − μ]† R⁻¹ [z − μ] + log |R| ) ⟩
  = ⟨ (∂/∂α) ( −(∂μ†/∂β) R⁻¹ [z − μ] − [z − μ]† R⁻¹ (∂μ/∂β)
      − [z − μ]† R⁻¹ (∂R/∂β) R⁻¹ [z − μ] + tr{ R⁻¹ (∂R/∂β) } ) ⟩ ,   (3.198)

where the derivative with respect to β has been evaluated. By evaluating both derivatives, the elements of the Fisher information matrix are given by

{J}_{α,β} = ⟨ (∂μ†/∂β) R⁻¹ (∂μ/∂α) + (∂μ†/∂α) R⁻¹ (∂μ/∂β)
  − [z − μ]† R⁻¹ (∂²R/∂α ∂β) R⁻¹ [z − μ]
  + [z − μ]† R⁻¹ (∂R/∂α) R⁻¹ (∂R/∂β) R⁻¹ [z − μ]
  + [z − μ]† R⁻¹ (∂R/∂β) R⁻¹ (∂R/∂α) R⁻¹ [z − μ]
  − tr{ R⁻¹ (∂R/∂α) R⁻¹ (∂R/∂β) } + tr{ R⁻¹ (∂²R/∂α ∂β) } ⟩
= (∂μ†/∂β) R⁻¹ (∂μ/∂α) + (∂μ†/∂α) R⁻¹ (∂μ/∂β)
  + tr{ R⁻¹ (∂R/∂α) R⁻¹ (∂R/∂β) } ,   (3.199)

where the observations that the mean of the data vector with the mean removed is zero, ⟨z − μ⟩ = 0, that the quadratic form can be reordered by using the trace, ⟨v† M v⟩ = tr{⟨v v†⟩ M}, and that the expectation of the outer product of the difference with itself is the interference-plus-noise covariance matrix, ⟨[z − μ][z − μ]†⟩ = R, are all employed. By reverting to the earlier notation and observing that the sum of a variable and its conjugate is equal to twice the real part of the variable, the Fisher information matrix for this Gaussian distribution is given by

{J(θ)}_{m,n} = tr{ R⁻¹(θ) (∂R(θ)/∂{θ}_m) R⁻¹(θ) (∂R(θ)/∂{θ}_n) }
  + 2 Re{ (∂μ†(θ)/∂{θ}_m) R⁻¹(θ) (∂μ(θ)/∂{θ}_n) } ,   (3.200)

where Re{·} indicates the real portion of an expression.
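Equation (3.200) can be validated against a Monte Carlo estimate of the score covariance. The sketch below assumes a hypothetical amplitude-and-phase parameterization μ(θ) = θ₁ e^{iθ₂} w with a known, θ-independent R, so the trace term vanishes.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
w = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R = A @ A.conj().T + n * np.eye(n)     # known Hermitian PD covariance
Ri = np.linalg.inv(R)
L = np.linalg.cholesky(R)

theta = np.array([1.3, 0.4])           # amplitude, phase (arbitrary values)
phase = np.exp(1j * theta[1])
d_mu = np.stack([phase * w,            # d mu / d theta_1
                 1j * theta[0] * phase * w])  # d mu / d theta_2

# Equation (3.200) with R independent of theta:
J = 2.0 * np.real(d_mu.conj() @ Ri @ d_mu.T)

# Monte Carlo: J equals the covariance of the score 2 Re{d_mu† R^{-1} (z - mu)}.
trials = 40000
e = L @ ((rng.standard_normal((n, trials))
          + 1j * rng.standard_normal((n, trials))) / np.sqrt(2.0))
g = 2.0 * np.real(d_mu.conj() @ Ri @ e)
J_mc = (g @ g.T) / trials
```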

Change of variables
It is sometimes convenient to calculate the Fisher information matrix in one basis and then change to another set of variables. By using the matrix form found in Equation (3.196), the covariance bound for the estimation error on a vector of parameters θ is given by

⟨ (θ̂ − θ)(θ̂ − θ)ᵀ ⟩ ≥ J⁻¹
  = ⟨ (∇_θ log p(z; θ)) (∇_θ log p(z; θ))ᵀ ⟩⁻¹ .   (3.201)

If another vector of parameters ν = g(θ) is defined by the vector function g(θ), then the transformed parameter covariance matrix bound is given by

cov{ν} = ⟨ (ν̂ − ν)(ν̂ − ν)ᵀ ⟩
  ≥ (∂g(θ)/∂θ)ᵀ J⁻¹(θ) (∂g(θ)/∂θ)
  = (∂g(θ)/∂θ)ᵀ ⟨ (∂ log p(z; θ)/∂θ) (∂ log p(z; θ)/∂θ)ᵀ ⟩⁻¹ (∂g(θ)/∂θ) .   (3.202)

Reduced Fisher information matrix


Often in the estimation process, some of the parameters are of interest, while other parameters are nuisance parameters that need to be estimated but are not of interest. The existence of these nuisance parameters often increases the bound on the variance of the parameters of interest. The evaluation of the Cramer–Rao bound can sometimes be simplified by employing the reduced Fisher information matrix J^{(r)}. The Fisher information matrix can be partitioned for some set of parameters indicated by a in the presence of nuisance parameters indicated by b with the form

J = [ J_{a,a}  J_{a,b}
      J_{b,a}  J_{b,b} ] .   (3.203)

Because only the {a, a}th component of the inverse is desired, the inverse of the entire information matrix does not need to be evaluated. By using the Sherman relation discussed in Section 2.5, the variance bound for the set of parameters of interest is given using the reduced Fisher information matrix

{J⁻¹}_{a,a} = ( J_{a,a} − J_{a,b} J⁻¹_{b,b} J_{b,a} )⁻¹ = ( J^{(r)}_{a,a} )⁻¹ .   (3.204)

3.8.3 Cramer–Rao bound for complex parameters


In general, Cramer–Rao bounds are a function of a mix of real and complex
parameters. Depending upon the details of the problem, there are a few different
ways to address the inclusion of complex parameters in the evaluation. The most
general solution is to extend the information matrix by mapping the complex
parameters to doublets of real parameters. In special cases, Wirtinger calculus
can be employed to simplify the evaluation. Fortunately, problems of interest
often satisfy the requirements of the special case. Finally, in a more narrowly
defined special case, it is not uncommon for complex parameters to be nuisance
parameters, while some real parameters are the parameters of interest. In addition to the general forms, specific applications to Gaussian distributions are considered.

Real components of complex parameters


For the case of a vector of complex parameters ξ and a vector of real parameters ρ such that the probability distribution for z is given by p(z; ρ, ξ), the general solution is to construct a vector of real parameters θ that is defined by

θ = ( ρᵀ, Re{ξ}ᵀ, Im{ξ}ᵀ )ᵀ .   (3.205)

By using this formulation, the previous results for real parameters can be employed directly. However, the variance bound associated with the mth component of ξ is given by the sum of the variance bounds on its real and imaginary components.

Wirtinger calculus for complex scalar


Within the context of multiple-antenna problems, it is not uncommon to work with complex parameters. While one can consider the real and imaginary components of the complex parameters, as suggested earlier in this section, under special cases the Wirtinger calculus can be applied to simplify the Cramer–Rao bound evaluation. When one is considering using the Wirtinger calculus for the evaluation of Cramer–Rao bounds, it is worth stressing that this approach is not appropriate for all problems, depending upon the definition of the probability distribution [357, 172, 318, 158, 288, 237]. For the sake of this discussion, it is assumed that the parameter is a single complex scalar ξ = α + iβ with real and imaginary components α and β. This development can be extended to include more complex and real parameters.
The form of the Cramer–Rao bound can be found by considering the change of variables discussed in Section 2.8.2 such that the real and imaginary components α and β are converted to the bound for the variables ξ and ξ*. In the Wirtinger calculus sense, the doppelganger parameters ξ and ξ* are considered real. The notation to represent the two-dimensional vector for each doublet is given by

a = ( α ; β )
ν = ( ξ ; ξ* ) .   (3.206)
If the function that converts the real pair to the conjugate pair is defined by ν = g(a), then

ν = [ 1   i
      1  −i ] a .   (3.207)

Consequently, the derivative of g with respect to a is given by

∂g(a)/∂a = [ 1   i
             1  −i ] .   (3.208)

The inverse of this derivative is given by

(∂g(a)/∂a)⁻¹ = (1/2) [ 1   1
                       −i   i ] .   (3.209)

At this point, there is an issue of convention for the variance of ν. Because the doppelganger variables are treated as real, it would be natural to define the variance as the expectation of ν νᵀ. However, we will use the form

cov{ν} = ⟨ (ν̂ − ν)(ν̂ − ν)† ⟩ .   (3.210)

This approach will lead to a swapping of the position of terms in the Fisher information matrix. In this particular case, it is desirable for the diagonal terms to be associated with ξ ξ*, which is the term of interest for complex variables. Everything could be done by using the traditional real covariance definition and then focusing on the off-diagonal elements.
By using the form for change of variables given in Equation (3.202), the covariance matrix for the conjugate pair is given by

cov{ν} = ⟨ (ν̂ − ν)(ν̂ − ν)† ⟩
 = (∂g(a)/∂a) ⟨ (â − a)(â − a)ᵀ ⟩ (∂g(a)/∂a)†
 ≥ (∂g(a)/∂a) J⁻¹(a) (∂g(a)/∂a)†
 = (∂g(a)/∂a) ⟨ (∂ log p(z; a)/∂a) (∂ log p(z; a)/∂a)ᵀ ⟩⁻¹ (∂g(a)/∂a)†
 = [ 1  i ; 1  −i ] ⟨ (∂ log p(z; a)/∂a) (∂ log p(z; a)/∂a)ᵀ ⟩⁻¹ [ 1  i ; 1  −i ]†
 = ( [ 1  i ; 1  −i ]^{−†} ⟨ (∂ log p(z; a)/∂a) (∂ log p(z; a)/∂a)ᵀ ⟩ [ 1  i ; 1  −i ]⁻¹ )⁻¹
 = ( (1/2) [ 1  i ; 1  −i ] ⟨ (∂ log p(z; a)/∂a) (∂ log p(z; a)/∂a)ᵀ ⟩ (1/2) [ 1  1 ; −i  i ] )⁻¹ ,   (3.211)
where the superscript −† indicates the combined Hermitian conjugate and inverse. By making the observation that the Wirtinger partial derivatives are defined by

∂/∂νᵀ = (1/2) [ 1  −i
                1   i ] ∂/∂aᵀ
      = (1/2) [ 1  −i
                1   i ] ( ∂/∂α ; ∂/∂β )
      = ( ∂/∂ξ ; ∂/∂ξ* ) ,   (3.212)

as can be seen in Equations (2.159) and (2.160); similarly, the conjugate version of the above is given by

∂/∂ν† = (1/2) [ 1   i
                1  −i ] ∂/∂aᵀ
      = (1/2) [ 1   i
                1  −i ] ( ∂/∂α ; ∂/∂β )
      = ( ∂/∂ξ* ; ∂/∂ξ ) .   (3.213)

Consequently, the bound can be written

cov{ν} ≥ ( (1/2) [ 1  i ; 1  −i ] ⟨ (∂ log p(z; a)/∂a) (∂ log p(z; a)/∂a)ᵀ ⟩ (1/2) [ 1  1 ; −i  i ] )⁻¹
 = ⟨ (∂ log p(z; ν)/∂ν†) (∂ log p(z; ν)/∂ν†)† ⟩⁻¹
 = [ J_{ξ*,ξ}   J_{ξ*,ξ*}
     J_{ξ,ξ}    J_{ξ,ξ*} ]⁻¹ ,   (3.214)
where the elements J_{ξ,ξ}, J_{ξ,ξ*}, J_{ξ*,ξ}, and J_{ξ*,ξ*} are defined implicitly. For many problems of interest, the off-diagonal elements in cov{ν} are zero. The lower-left element of the Fisher information is given by

J_{ξ,ξ} = ⟨ (∂ log p(z; ν)/∂ξ)² ⟩
        = −⟨ ∂² log p(z; ν)/∂ξ² ⟩ .   (3.215)
As an example, consider the Gaussian distribution with a complex parameterization ξ of the mean given by

p(z; ξ) = (1/(πⁿ |R|)) e^{ −(z − ξw)† R⁻¹ (z − ξw) } ,   (3.216)

where z ∈ C^{n×1} is the observed variable, and w ∈ C^{n×1} and R ∈ C^{n×n} are known parameters. The lower-left element of the Fisher information matrix is given by

J_{ξ,ξ} = −⟨ ∂² log p(z; ν)/∂ξ² ⟩
  = ⟨ (∂²/∂ξ²) (z − ξw)† R⁻¹ (z − ξw) ⟩
  = ⟨ (∂²/∂ξ²) (z* − ξ*w*)ᵀ R⁻¹ (z − ξw) ⟩
  = 0 .   (3.217)

Similarly, the upper-right element of the Fisher information matrix is J_{ξ*,ξ*} = 0. The diagonal elements of the Fisher information matrix are equal. This equality is shown by rearranging the order within the expectation,

J_{ξ,ξ*} = ⟨ (∂ log p(z; ν)/∂ξ) (∂ log p(z; ν)/∂ξ*) ⟩
         = ⟨ (∂ log p(z; ν)/∂ξ*) (∂ log p(z; ν)/∂ξ) ⟩
         = J_{ξ*,ξ} .   (3.218)

Consequently, the covariance matrix of the estimation error is bounded by

cov{ν} ≥ [ J_{ξ*,ξ}      0
           0        J_{ξ,ξ*} ]⁻¹
       = (1/J_{ξ*,ξ}) [ 1  0
                        0  1 ]   (3.219)

under the condition that J_{ξ,ξ} = J_{ξ*,ξ*} = 0. This result can be rewritten as

⟨ |ξ̂ − ξ|² ⟩ ≥ [ ⟨ (∂ log p(z; ν)/∂ξ*) (∂ log p(z; ν)/∂ξ) ⟩ ]⁻¹ .   (3.220)
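For the model of Equation (3.216), the Fisher information evaluates to J_{ξ*,ξ} = w† R⁻¹ w, so Equation (3.220) gives E|ξ̂ − ξ|² ≥ 1/(w† R⁻¹ w). The sketch below, with arbitrary w and R, checks that the whitened matched-filter estimator attains this bound.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5
w = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R = A @ A.conj().T + n * np.eye(n)
Ri = np.linalg.inv(R)
L = np.linalg.cholesky(R)

xi = 0.7 - 0.3j
J = np.real(w.conj() @ Ri @ w)       # J_{xi*,xi} = w† R^{-1} w
crb = 1.0 / J                        # bound of Equation (3.220)

trials = 20000
noise = L @ ((rng.standard_normal((n, trials))
              + 1j * rng.standard_normal((n, trials))) / np.sqrt(2.0))
z = xi * w[:, None] + noise          # z = xi w + n, n ~ CN(0, R)
xi_hat = (w.conj() @ Ri @ z) / J     # unbiased; meets the bound with equality
emp_mse = np.mean(np.abs(xi_hat - xi) ** 2)
```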

Wirtinger calculus for complex multivariate parameters


The general form of the Cramer–Rao bound [172, 288, 237] for the variance of the unbiased estimates of the complex vector ξ is given by

[ ⟨(ξ̂ − ξ)(ξ̂ − ξ)†⟩    ⟨(ξ̂ − ξ)(ξ̂ − ξ)ᵀ⟩
  ⟨(ξ̂ − ξ)*(ξ̂ − ξ)†⟩   ⟨(ξ̂ − ξ)*(ξ̂ − ξ)ᵀ⟩ ] ≥ F⁻¹ ,   (3.221)

where the complete Fisher information matrix F is given by

F = [ J   K
      K*  J* ] .   (3.222)

The complex Fisher information matrix is determined by J under the assumption that the pseudo-Fisher information matrix K is zero. By employing the Wirtinger calculus, the Fisher information matrix and the pseudo-Fisher information matrix are given by

J = J_{ξ*,ξ} = ⟨ (∇_{ξ*} log p(z; ξ)) (∇_ξ log p(z; ξ))ᵀ ⟩
K = K_{ξ*,ξ*} = ⟨ (∇_{ξ*} log p(z; ξ)) (∇_{ξ*} log p(z; ξ))ᵀ ⟩ ,   (3.223)

where the subscripts in the notation J_{ξ*,ξ} and K_{ξ*,ξ*} are used to indicate the parameters with which the derivatives are being taken. From Equation (2.112), the upper-right-hand block of the inverse of the complete Fisher information matrix, {F⁻¹}_{u.r.}, is given by

{F⁻¹}_{u.r.} = −(J − K J^{−*} K*)⁻¹ K J^{−*} ,   (3.224)



where the superscript (·)^{−*} indicates the inverse conjugate of the expression. If it is known that the expectation of the term (ξ̂ − ξ)(ξ̂ − ξ)ᵀ is 0, because of knowledge of the distribution, then the upper-right-hand block of the inverse of the complete Fisher information matrix must also be zero. For bounded (not infinite) J and K and nonzero J, the relationship ⟨(ξ̂ − ξ)(ξ̂ − ξ)ᵀ⟩ = 0 can only be satisfied by a pseudo-Fisher information matrix that is zero, K = 0:

⟨(ξ̂ − ξ)(ξ̂ − ξ)ᵀ⟩ = 0 and J ≠ 0 ⇒ K = 0 .   (3.225)

It is worth stressing that the general form of the definition of the complete Fisher information matrix allows for nonzero pseudo-Fisher information matrices [237]; however, for many problems of interest the pseudo-Fisher information matrices evaluate to the zero matrix, and the bound is given by

⟨(ξ̂ − ξ)(ξ̂ − ξ)†⟩ ≥ J⁻¹_{ξ*,ξ} = J⁻¹   (3.226)

if K = 0. One additional note: because of the matrix inversion, the ordering of the conjugation between the estimation error covariance matrix and the derivatives is reversed.

Wirtinger calculus for complex multivariate Gaussian distribution


As an example, consider the complex Gaussian distribution that is a function of a vector of complex parameters ξ. In particular, we are interested in formulations in which the pseudo-Fisher information matrix is zero. It is assumed here that the mean μ and the covariance matrix R are implicit functions of the Wirtinger doppelgangers ξ and ξ*. The complex Gaussian distribution is given by

p(z; ξ) = (1/(πⁿ |R|)) e^{ −[z−μ]† R⁻¹ [z−μ] } .   (3.227)

The first issue is to evaluate the pseudo-Fisher information matrix K,

K = ⟨ (∇_{ξ*} log p(z; ξ)) (∇_{ξ*} log p(z; ξ))ᵀ ⟩
{K}_{m,n} = ⟨ (∂ log p(z; ξ)/∂{ξ*}_m) (∂ log p(z; ξ)/∂{ξ*}_n) ⟩ .   (3.228)

Recalling the gradient operation under the assumption that the vectors a and b are functions of x while M is not, the gradient of the quadratic form is given by

∇_x (aᵀ M b) = (∇_x aᵀ) M b + (∇_x bᵀ) Mᵀ a .   (3.229)



The derivative of the log of the probability distribution, for the element {ξ*}_n, is given by

∂ log p(z; ξ)/∂{ξ*}_n = (∂μ†/∂{ξ*}_n) R⁻¹ [z − μ] + [z* − μ*]ᵀ R⁻¹ (∂μ/∂{ξ*}_n)
  + [z* − μ*]ᵀ R⁻¹ (∂R/∂{ξ*}_n) R⁻¹ [z − μ] − tr{ R⁻¹ (∂R/∂{ξ*}_n) } .   (3.230)

Here two different regimes are considered: first, the mean μ is a function of the parameter, and second, the covariance R is a function of the parameter. Under the assumption that the mean is a function of the complex parameter, but the covariance is not, the gradient is given by

∇_{ξ*} log p(z; ξ) = (∇_{ξ*} μ†) R⁻¹ [z − μ] + (∇_{ξ*} μᵀ) R⁻ᵀ [z* − μ*] ,   (3.231)

where (·)⁻ᵀ indicates the inverse transpose of the matrix. Because the complex Gaussian distribution is circularly symmetric, the term ⟨(z − μ)(z − μ)ᵀ⟩ = 0, so the terms involving [z − μ][z − μ]ᵀ or its complex conjugate vanish, and the pseudo-Fisher information matrix K is given by

K = ⟨ (∇_{ξ*} log p(z; ξ)) (∇_{ξ*} log p(z; ξ))ᵀ ⟩
  = (∇_{ξ*} μ†) R⁻¹ (∇ᵀ_{ξ*} μ) + (∇_{ξ*} μᵀ) R⁻ᵀ (∇ᵀ_{ξ*} μ*) .   (3.232)

For many calculations, either ∇ᵀ_{ξ*} μ or ∇ᵀ_{ξ*} μ* is zero because the mean is a function of only ξ or only ξ*. Consequently, in this case the pseudo-Fisher information matrix K is zero. If the pseudo-Fisher information matrix K is zero, then the Fisher information matrix is given by

J = ⟨ (∇_{ξ*} log p(z; ξ)) (∇_ξ log p(z; ξ))ᵀ ⟩
  = (∇_{ξ*} μ†) R⁻¹ (∇ᵀ_ξ μ) + (∇_{ξ*} μᵀ) R⁻ᵀ (∇ᵀ_ξ μ*) ,   (3.233)

where the gradients are given by

∇_{ξ*} log p(z; ξ) = (∇_{ξ*} μ†) R⁻¹ [z − μ] + (∇_{ξ*} μᵀ) R⁻ᵀ [z* − μ*]
[∇_ξ log p(z; ξ)]ᵀ = [z − μ]ᵀ R⁻ᵀ (∇ᵀ_ξ μ*) + [z − μ]† R⁻¹ (∇ᵀ_ξ μ) ,   (3.234)

and the observation that ⟨[z − μ][z − μ]ᵀ⟩ = 0 is employed.
If the mean is not a function of the parameters, but the covariance matrix is, then the derivative with respect to the doppelganger variables is given by

∂ log p(z; ξ)/∂{ξ*}_n = [z − μ]† R⁻¹ (∂R/∂{ξ*}_n) R⁻¹ [z − μ] − tr{ R⁻¹ (∂R/∂{ξ*}_n) }

∂² log p(z; ξ)/∂{ξ*}_m ∂{ξ*}_n = −[z − μ]† R⁻¹ (∂R/∂{ξ*}_m) R⁻¹ (∂R/∂{ξ*}_n) R⁻¹ [z − μ]
  + [z − μ]† R⁻¹ (∂²R/∂{ξ*}_m ∂{ξ*}_n) R⁻¹ [z − μ]
  − [z − μ]† R⁻¹ (∂R/∂{ξ*}_n) R⁻¹ (∂R/∂{ξ*}_m) R⁻¹ [z − μ]
  + tr{ R⁻¹ (∂R/∂{ξ*}_m) R⁻¹ (∂R/∂{ξ*}_n) }
  − tr{ R⁻¹ (∂²R/∂{ξ*}_m ∂{ξ*}_n) } .   (3.235)

The expectation of the additive inverse of the second derivative generates the (m, n) element of the pseudo-Fisher information matrix K. This expectation is given by

{K}_{m,n} = −⟨ ∂² log p(z; ξ)/∂{ξ*}_m ∂{ξ*}_n ⟩
          = tr{ R⁻¹ (∂R/∂{ξ*}_m) R⁻¹ (∂R/∂{ξ*}_n) } .   (3.236)

Many useful models for covariance matrices do not satisfy the requirement that this pseudo-Fisher information matrix vanish. Consequently, a real-parameter evaluation is required.

Reduced Fisher information for real parameters with complex nuisance parameters
It is assumed here that a set of real parameters contained in the vector u are the parameters of interest and that the complex parameters are nuisance parameters. Once again, define the stacked vector ν of the doppelganger variables,

ν = ( ξ ; ξ* ) .   (3.237)

By using the notation for the Fisher information matrix

J_{x,y} = ⟨ [∇_x log p(z; x, y)] [∇_y log p(z; x, y)]ᵀ ⟩   (3.238)

for probability density p(z; x, y) over the observation variable z, the complete Fisher information matrix is given by

J = [ J_{u,u}    J_{u,ν}
      J_{ν*,u}   J_{ν*,ν} ] .   (3.239)

By considering the upper-left term of the inverse of the 2 × 2 block matrix from Equation (2.112), the reduced Fisher information matrix is given by

J^{(r)}_{u,u} = J_{u,u} − J_{u,ν} J⁻¹_{ν*,ν} J_{ν*,u}
 = J_{u,u} − ( J_{u,ξ}  J_{u,ξ*} ) J⁻¹_{ν*,ν} ( J_{ξ*,u} ; J_{ξ,u} )
 = J_{u,u} − ( J_{u,ξ}  J_{u,ξ*} ) [ J_{ξ*,ξ}  J_{ξ*,ξ*} ; J_{ξ,ξ}  J_{ξ,ξ*} ]⁻¹ ( J_{ξ*,u} ; J_{ξ,u} ) .   (3.240)
In general, the expression must be evaluated completely. However, as mentioned in the previous section, often some of these terms quickly evaluate to zero. In particular, if the pseudo-information matrix evaluates to zero, the form of the reduced Fisher information matrix simplifies to

J^{(r)}_{u,u} → J_{u,u} − ( J_{u,ξ}  J_{u,ξ*} ) [ J_{ξ*,ξ}  0 ; 0  J_{ξ,ξ*} ]⁻¹ ( J_{ξ*,u} ; J_{ξ,u} )
 = J_{u,u} − J_{u,ξ} J⁻¹_{ξ*,ξ} J_{ξ*,u} − J_{u,ξ*} J⁻¹_{ξ,ξ*} J_{ξ,u}   (3.241)

if J_{ξ*,ξ*} = J_{ξ,ξ} = 0. For many applications, either the second or third term in Equation (3.241) is zero.

Reduced Fisher information for Gaussian distribution with real parameter in the mean
with complex nuisance parameters
The Fisher information matrix for the parameter pair x, y is given by

J_{x,y} = ⟨ [∇_x log p(z; x, y)] [∇_y log p(z; x, y)]ᵀ ⟩ ,   (3.242)

where z is the vector of observations. A common form for the mean that is encountered in array processing is given by

μ(θ) = a ν(u) ,   (3.243)

where a is a complex attenuation scalar and ν(u) is a vector function of a vector of real direction parameters u. Typically, the attenuation scalar a is considered a nuisance parameter. The vector of parameters can be represented by

θ = ( u ; a ; a* ) .   (3.244)
For a complex Gaussian model, the log of the probability density is given by

log p(z; a, u) = −[z − a ν(u)]† R⁻¹ [z − a ν(u)] + const. ,   (3.245)

where R is the spatial covariance matrix of the noise. Given this model for the probability density, the pseudo-information terms for the complex parameters are zero. Consequently, the reduced information matrix is given by Equation (3.241),

J^{(r)}_{u,u} = J_{u,u} − J_{u,a} J⁻¹_{a*,a} J_{a*,u} − J_{u,a*} J⁻¹_{a,a*} J_{a,u} .   (3.246)

The first term is the Fisher information matrix associated with the real direction parameters u. This term is given by the parameter-in-the-mean term found in Equation (3.200),

J_{{u}_m,{u}_n} = 2 Re{ (∂[a ν(u)]†/∂{u}_m) R⁻¹ (∂[a ν(u)]/∂{u}_n) }
J_{u,u} = 2 |a|² Re{ (∇_u ν†(u)) R⁻¹ (∇ᵀ_u ν(u)) }
J_{u,u} = 2 |a|² Re{ V̇† R⁻¹ V̇ } ,   (3.247)

where the matrix of array response vector derivatives V̇ is given by

V̇ = ( ∂ν(u)/∂{u}₁   ∂ν(u)/∂{u}₂   · · · ) .   (3.248)

The attenuation Fisher information terms J_{a*,a} and J_{a,a*} are given by

J_{a*,a} = ⟨ [∇_{a*} log p(z; a, u)] [∇_a log p(z; a, u)]ᵀ ⟩
∇_{a*} log p(z; a, u) = ν†(u) R⁻¹ [z − a ν(u)]
∇_a log p(z; a, u) = [z − a ν(u)]† R⁻¹ ν(u)
J_{a*,a} = ⟨ ν†(u) R⁻¹ [z − a ν(u)] [z − a ν(u)]† R⁻¹ ν(u) ⟩
         = ν†(u) R⁻¹ ν(u)
J_{a*,a} = J_{a,a*} ,   (3.249)
where the two attenuation Fisher information terms are equal because the scalar
derivative terms commute, Ja∗,a = Ja,a∗.
The cross-parameter information terms are given by
Ju,a∗ = ⟨[∇u log p(z; a, u)] [∇a∗ log p(z; a, u)]T⟩
∇u log p(z; a, u) = −∇u([z − a ν(u)]† R−1 [z − a ν(u)])
= a∗ [∇u ν†(u)] R−1 [z − a ν(u)] + a [∇u νT(u)] R−T [z − a ν(u)]∗
∇a∗ log p(z; a, u) = ν†(u) R−1 [z − a ν(u)]
Ju,a∗ = a ⟨[∇u νT(u)] R−T [z − a ν(u)]∗ [z − a ν(u)]T R−T ν∗(u)⟩
= a [∇u νT(u)] R−T ν∗(u) , (3.250)
and for the non-conjugated attenuation term
Ju,a = ⟨[∇u log p(z; a, u)] [∇a log p(z; a, u)]T⟩
= a∗ ⟨[∇u ν†(u)] R−1 [z − a ν(u)] [z − a ν(u)]† R−1 ν(u)⟩
= a∗ [∇u ν†(u)] R−1 ν(u) . (3.251)
From the definition of the information matrix, reversing the parameters is given
by the transpose of the information matrix, so that
Ja∗,u = JTu,a∗ = a ν†(u) R−1 [∇uT ν(u)]
Ja,u = JTu,a = a∗ νT(u) R−T [∇uT ν∗(u)] . (3.252)
The second and third terms for the reduced Fisher information matrix are given
by
Ju,a J−1a∗,a Ja∗,u = |a|2 [∇u ν†(u)] R−1 ν(u) ν†(u) R−1 [∇uT ν(u)] / (ν†(u) R−1 ν(u)) (3.253)
and
Ju,a∗ J−1a,a∗ Ja,u = |a|2 [∇u νT(u)] R−T ν∗(u) νT(u) R−T [∇uT ν∗(u)] / (ν†(u) R−1 ν(u))
= (Ju,a J−1a∗,a Ja∗,u)∗ . (3.254)
Consequently, the reduced Fisher information matrix is given by
J(r)u,u = Ju,u − 2 Re{Ju,a J−1a∗,a Ja∗,u}
= 2 |a|2 Re{V̇† R−1 V̇} − 2 Re{Ju,a J−1a∗,a Ja∗,u}
= 2 |a|2 Re{V̇† R−1 V̇} − 2 |a|2 Re{V̇† R−1 ν(u) ν†(u) R−1 V̇ / (ν†(u) R−1 ν(u))} . (3.255)
By using a spatially whitened set of variables, the reduced information matrix
can be rewritten as
J(r)u,u = 2 |a|2 Re{Ẋ† Ẋ − Ẋ† x(u) [x†(u) x(u)]−1 x†(u) Ẋ} , (3.256)
where the spatially whitened vector and derivative matrix are defined by
x(u) = R−1/2 ν(u)
Ẋ = R−1/2 V̇ . (3.257)
This form can be simplified further by defining the projection operator orthog-
onal to the column space spanned by x(u), which is given by
P⊥x(u) = I − x(u) [x†(u) x(u)]−1 x†(u) . (3.258)
By using this operator, the final form for the reduced Fisher information matrix
is given by
J(r)u,u = 2 |a|2 Re{Ẋ† P⊥x(u) Ẋ} . (3.259)
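The equivalence between the direct reduction of Equation (3.246) and the projection form of Equation (3.259) can be checked numerically. The sketch below is ours, not from the text: it assumes a hypothetical eight-element uniform linear array with a single real direction parameter u, steering-vector entries exp(iπku), and an arbitrary positive-definite noise covariance R.

```python
import numpy as np

# Numerical sanity check (our sketch): single direction parameter u for an
# assumed n-element uniform linear array with response nu(u)_k = exp(i*pi*k*u).
n = 8
u = 0.3
a = 0.8 * np.exp(1j * 0.4)              # complex attenuation (nuisance)
k = np.arange(n)

nu = np.exp(1j * np.pi * k * u)         # array response nu(u)
dnu = 1j * np.pi * k * nu               # derivative d nu / du  (V-dot)

R = np.eye(n) + 0.1 * np.ones((n, n))   # noise spatial covariance (PD)
Ri = np.linalg.inv(R)

# Direct reduction, Eq. (3.246), using Eqs. (3.247) and (3.249)-(3.252);
# the second and third terms are complex conjugates, so they sum to 2 Re{.}.
Juu = 2 * abs(a) ** 2 * np.real(dnu.conj() @ Ri @ dnu)
Jaa = np.real(nu.conj() @ Ri @ nu)             # J_{a*,a} = J_{a,a*}
Jua = a.conjugate() * (dnu.conj() @ Ri @ nu)   # J_{u,a}
Jau = a * (nu.conj() @ Ri @ dnu)               # J_{a*,u}
Jr = Juu - 2 * np.real(Jua * Jau / Jaa)

# Projection form, Eq. (3.259), with the whitened variables of Eq. (3.257).
w, V = np.linalg.eigh(R)
Rm12 = V @ np.diag(w ** -0.5) @ V.T            # R^(-1/2)
x = Rm12 @ nu
Xd = Rm12 @ dnu
P = np.eye(n) - np.outer(x, x.conj()) / np.real(x.conj() @ x)
Jr_proj = 2 * abs(a) ** 2 * np.real(Xd.conj() @ P @ Xd)

print(Jr, Jr_proj)
```

The two expressions agree to numerical precision, which also illustrates that the nuisance attenuation a enters the bound only through |a|2.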
Problems
3.1 Evaluate the variance of a random variable from the log-normal distribu-
tion.
3.2 If the real variables X and Y with values x and y are given by Y = X 2 ,
evaluate the probability density for Y given the density for X for the cases
(a) in general,
(b) if X is given by a Rayleigh distribution.
3.3 Evaluate the characteristic function of the sum of independent variables
drawn from a real Gaussian distribution and a Rayleigh distribution.
3.4 Evaluate the first four central moments of a random variable that is char-
acterized by unit variance and is uniform over phase.
3.5 For an n-vector with norm-square q, randomly drawn from a circularly
symmetric complex Gaussian distribution with a covariance matrix proportional
to the identity matrix, what is the probability that, after projection onto a
subspace of rank 3n/4, at least half of the norm-square of the vector (that is,
q/2) remains, assuming that n is given by
(a) 4,
(b) 8,
(c) 12?
3.6 For large Wishart matrices, constructed by the outer product G G†, where
the matrix G ∈ Cm×n contains entries drawn independently and randomly from
a complex circular Gaussian distribution, evaluate the approximate peak-to-
average eigenvalue ratio under assumptions of
(a) n/m = 1,
(b) n/m = 2,
(c) n/m = 4,
(d) n/m = 16.
3.7 For m independent observations of the complex variable Z with value z, given
by Z = a eiθ + N, where a is an unknown deterministic real amplitude, θ is
an unknown deterministic real phase, and N is additive circularly symmetric
complex Gaussian noise with value n and variance σn2, evaluate the minimum
phase-estimation variance for an unbiased estimator.
3.8 Let h(t) be the impulse response of a linear time-invariant system such that
h(t) is square integrable. Let the input to this system be a stationary random
process X(t). Show that the autocorrelation function of the output process Y (t)
is given by
RY Y (τ) = h(τ) ∗ h∗(−τ) ∗ RX X (τ) (3.260)
and the power spectral density of Y (t) is
SY (f) = |H(f)|2 SX (f) . (3.261)
3.9 Let Y = X1² + X2² + · · · + XK², where the Xi are independent, identically
distributed Gaussian random variables with zero mean and unit variance. Show
that Y follows a χ² distribution with K degrees of freedom.
3.10 Suppose that there are n unit-power transmitters distributed indepen-
dently and identically with uniform probability in a disk of radius R, with polar
coordinates (r1, θ1), (r2, θ2), . . . , (rn, θn). Let the aggregate signal power received
at the center of the disk be given by
I = Σ_{k=1}^{n} rk−α , (3.262)
where α > 2.
(a) Show that the mean of the received power I at the center of the disk is infinite.
(b) Show that the received power I at the center of the disk is finite with
probability 1.
3.11 Use Equation (3.5) to derive Equation (3.59) from Equation (3.53).
4 Wireless communications fundamentals

4.1 Communication stack
For convenience in design, the operations of radios are often broken into a number
of functional layers. The standard version of this stack is referred to as the open
systems interconnection (OSI) model [291], as seen in Figure 4.1. The model
has two groups of layers: host and media. The host layers are the application,
presentation, session, and transport layers. The media layers are the network,
data-link, and physical layers. In many radio systems, some of these layers are
trivial or the division between the layers may be blurred. The OSI stack is
commonly interpreted in terms of wired networks such as the internet. Depending
upon the details of an implementation, various tasks may occupy different layers.
Nonetheless, the OSI layered architecture is useful as a common reference for
discussing radios. In this text, the media layers are of principal importance.
The network layer indicates how data are routed from an information source
to a sink node, as seen in Figure 4.2. In the case of a network with two nodes,
this routing is trivial. In the case of an ad hoc wireless network, the routing
may be both complicated and time varying. The network layer may break a
data sequence at the source node into smaller blocks and then reassemble the
data sequence at the sink node. It also may provide notification of errors to the
transport layer.
The data-link layer controls the flow of data between adjacent nodes in a
network. This layer may provide acknowledgments of received data, and may or
may not contain error checking or correction. Sometimes this layer is broken into
the logical-link-control and media-access-control (MAC) sublayers. The MAC is
used to control the network’s reaction to interference. The interference might be
internal, that is, caused by the network's own links, or external, that is, caused by
a source not under the network’s control. The logical-link-control sublayer is used
by the protocol to control data flow. The logical-link-control sublayer interprets
frame headers for the data-link layer. The MAC specifies a local hardware address
and control of a channel.
The physical layer defines the mapping of information bits to the radiated sig-
nal. The physical layer includes error-correction coding, modulation, and spectral
occupancy. It also includes all the signal processing, at both the transmitter and
receiver.
Figure 4.1 OSI stack.
Figure 4.2 Network of nodes with a connection between a source node and a sink node.

4.2 Reference digital radio link

The basic physical layer of a digital radio link has nine components: data source,
encoding, modulation, upconversion, propagation, downconversion, demodula-
tion, decoding, and data sink. While not all digital radios conform to this struc-
ture, this structure is flexible enough to capture the essential characteristics for
discussion in this text. Here we have distinguished between up/downconversion
and modulation. This distinction is a convenient convention for digital radios.
In practice there is a large variety of data sources. The classic modern example
is the cellular or mobile phone [260]. The modern mobile phone is used for
internet access, data, video, and occasionally voice. For this discussion, we will
focus on voice communications. In the uplink, voice data are sent from the phone
to the base station. The analog voice signal is digitized and compressed by using
a vocoder. There are a variety of approaches to vocoders that in general provide
significant source compression. The raw digitized signal might require a data
rate of as much as 200 kbits/s or more. The signals compressed by vocoders
typically require around 10 kbits/s. These data, along with a number of control
parameters, are the data source.
Figure 4.3 Examples of modulations: BPSK, QPSK, 8-PSK, and 16-QAM.

The encoding of the data typically includes some approach to compensate for
noisy data, denoted forward-error-correction (FEC) encoding. Error-correction
codes introduce extra parity data to compensate for noise in the channel. With a
strong code, multiple errors caused by noise in the channel can be corrected. The
theoretical limit for the amount of data (that is, information not parity) that
can be transmitted over a link in a noisy channel with essentially no errors is
given by the Shannon limit and is discussed in Section 5.3. Modern codes allow
communication links that can closely approach this theoretical limit. Coding
performance and computation complexity can vary significantly.
Following the data encoding is the modulation. Depending upon the details of
the forward-error-correction coding scheme, the error-correction algorithms used
may or may not be strongly coupled with the modulation. As an example, trellis
coding strongly couples modulation and coding [316, 317].
The modulation translates the digital data to a baseband signal for transmis-
sion. The baseband signal is centered at zero frequency. Associated with each
transmit antenna are an in-phase and a quadrature signal. It is often convenient
to view these signals as being complex, with the real component correspond-
ing to the in-phase signal and the imaginary component corresponding to the
quadrature signal. Modulation schemes vary in complexity. Some
examples shown in Figure 4.3 are binary phase-shift keying (BPSK), which uses
symbols
BPSK: {−1, +1} , (4.1)
quadrature phase-shift keying (QPSK), which uses symbols
QPSK: { (−1 − i)/√2 , (−1 + i)/√2 , (+1 + i)/√2 , (+1 − i)/√2 } , (4.2)
M-ary phase-shift keying (PSK), which uses symbols
PSK: {e2πin/M } ; n ∈ {0, . . . , M − 1} , (4.3)
and M-ary quadrature amplitude modulation (QAM), which uses symbols
QAM: {±p ± i q} , (4.4)
such that the regularly spaced values of p and q form a square lattice with M
points. The set of symbols observed in the complex plane is commonly referred
to as a constellation.
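As an illustration (the code and the power normalizations are ours, not from the text), the constellations of Equations (4.1)-(4.4) can be generated and checked for unit average symbol power:

```python
import numpy as np

# Illustrative sketch (ours): the constellations of Eqs. (4.1)-(4.4),
# each checked for unit average symbol power.
bpsk = np.array([-1.0, 1.0])                                      # Eq. (4.1)

qpsk = np.array([-1 - 1j, -1 + 1j, 1 + 1j, 1 - 1j]) / np.sqrt(2)  # Eq. (4.2)

M = 8
psk = np.exp(2j * np.pi * np.arange(M) / M)                       # Eq. (4.3)

# 16-QAM, Eq. (4.4): regularly spaced p and q on a square lattice,
# scaled afterward so the average symbol power is 1.
levels = np.array([-3, -1, 1, 3])
qam16 = np.array([p + 1j * q for p in levels for q in levels])
qam16 = qam16 / np.sqrt(np.mean(np.abs(qam16) ** 2))

for name, c in [("BPSK", bpsk), ("QPSK", qpsk), ("8-PSK", psk),
                ("16-QAM", qam16)]:
    print(name, len(c), round(float(np.mean(np.abs(c) ** 2)), 12))
```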
The overall scale of these modulations is somewhat arbitrary. The important
scale is relative to the interference-plus-noise amplitude at the receiver. For real
systems, channels, filters, and other effects distort these idealized modulations.
Orthogonal-frequency-division multiplexing (OFDM), which is discussed in
greater detail in Section 10.5.3, is a common modulation approach that builds a
composite symbol from a sequence of simpler symbols, such as QPSK. It does this
by transmitting over time the inverse fast Fourier transform (IFFT) of a sequence
of symbols. As a consequence, each simple symbol is associated with a bin of the
fast Fourier transform (FFT), and, given a sufficiently narrow FFT subcarrier,
the communication system comprises a set of flat-fading channels. This is a
useful approach for environments with frequency-selective fading, particularly if
the channels are relatively static over time.
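A minimal sketch of this idea, under idealized assumptions of our choosing (no channel, no cyclic prefix, illustrative subcarrier count):

```python
import numpy as np

# Minimal OFDM sketch (ours; idealized: no channel, no cyclic prefix).
# N QPSK symbols ride on N subcarriers via an IFFT; the receiver's FFT
# maps each one back to its own flat subcarrier.
rng = np.random.default_rng(0)

N = 64
bits = rng.integers(0, 2, size=(N, 2))
qpsk = ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)

tx_time = np.fft.ifft(qpsk)      # composite OFDM symbol transmitted over time
rx_freq = np.fft.fft(tx_time)    # FFT recovers the per-subcarrier symbols

print(np.max(np.abs(rx_freq - qpsk)))
```

With a frequency-selective channel, each subcarrier would instead see its own complex gain, which is what makes the per-subcarrier flat-fading view useful.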
For analog communications, it was common to think of signal modulation
and frequency upconversion both as modulation. For digital communications, it
seems more natural to make the distinction between modulation and frequency
upconversion clearer. The modulation is typically done digitally. The frequency
upconversion may be either digital, analog, or both, depending upon the sys-
tem. Some systems perform the frequency upconversion in multiple steps. For
example, it is often convenient to upconvert to an intermediate frequency (IF)
digitally,1 then to upconvert to the carrier frequency using analog circuitry. This
approach is basically a modern version of a superheterodyne transmitter. Math-
ematically, upconverting to a carrier frequency, f0 , can be performed by mul-
tiplying the complex baseband signal as a function of time, s(t), by the term
e−iω t , where ω = 2πf0 is the angular frequency. The physical signal is given by
the real part of this product,
Re{e−iωt s(t)} = Re{s(t)} cos(ωt) + Im{s(t)} sin(ωt) . (4.5)
This approach takes advantage of the orthogonality of sin and cos. For analog
upconversion, the IF signal is multiplied by a real carrier then filtered, as seen
1 Many modern systems employ direct conversion, avoiding the IF stage because of
integrated circuit (IC) advantages.
Figure 4.4 Example of a communication system with a digital IF upconversion
to the frequency fIF, followed by an analog upconversion to the carrier frequency
fIF + fUP.

Figure 4.5 Examples of multipath scattering in an environment that is observed by
multiple receivers.
in Figure 4.4. The mixer creates images at the sum and difference of the IF
frequency and the analog upconversion frequency. Because of filter design con-
straints, it is helpful to keep the IF frequency reasonably high so that the filter
can easily select one of these images. For logistical reasons, even more stages are
sometimes used.
4.2.1 Wireless channel
Channels have a wide variety of characteristics. A simple static line-of-sight chan-
nel with a single transmit antenna and single receive antenna can be character-
ized by a single complex number. Channels between multiple-antenna transmit-
ters and multiple-antenna receivers are considered in detail in Chapter 8. More
typical and more interesting channels are subject to the effects of time-varying
multipath scattering. For such channels, the transmitted signal bounces off vari-
ous scatterers in the environment. Because of the spatial distribution of scatter-
ers, the receiver observes the transmitted signal coming from a distribution of
angles at various delays, displayed in Figure 4.5. The signal that is observed at
the receiver is the result of the sum of the delayed versions of the transmitted
signal, as discussed in Chapter 10. If these delays are significant compared to
the inverse of the bandwidth of the signal, then there is said to be delay spread.
Delay spread introduces channel sensitivity to frequency. Consequently, channels
with delay spread are said to be frequency selective. If there is motion in the
environment or if the transmitter or receiver is moving, the channel will change
over time. Because the directions to various scatterers are not typically identical,
motion introduces a range of Doppler frequency shifts. In this regime, it is said
that the channel has Doppler spread. The effects of this complicated environment
can be mitigated or even exploited by using adaptive techniques as discussed in
Chapter 10.
For the sake of convenience, the channel attenuation that is caused by mul-
tipath scattering is often factored into a term associated with fading that in-
corporates the variation in the channel due to relative delays and motion, and
a term associated with overall average attenuation. The average attenuation is
typically parameterized by the link length, r. Typically, average signal power in
ad hoc wireless networks is assumed to decay with distance r as r−α e−γ r , where
α is known as the path-loss exponent and γ is an absorption coefficient. In most
works in the literature, γ is set to zero and α > 2.
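A small sketch (ours) of this average path-loss model; the parameter values and distances below are illustrative:

```python
import math

# Sketch (ours) of the average path-loss model r**(-alpha) * exp(-gamma*r);
# alpha, gamma, and the distances below are illustrative values.
alpha = 3.0          # path-loss exponent
gamma = 0.0          # absorption coefficient (commonly set to zero)

def avg_loss_dB(r):
    """Average attenuation in dB at link length r."""
    loss = r ** (-alpha) * math.exp(-gamma * r)
    return -10.0 * math.log10(loss)

# With gamma = 0, doubling the link length costs 10*alpha*log10(2) dB,
# about 9 dB for alpha = 3.
delta = avg_loss_dB(200.0) - avg_loss_dB(100.0)
print(round(delta, 2))
```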
4.2.2 Thermal noise
The introduction of noise is usually associated with the internal noise of the
receiver. In 1928, John Johnson [167] observed that thermal noise in a conduc-
tor was proportional to bandwidth and temperature. This result was discussed
by Harry Nyquist [233]. This noise is associated with the black-body radiation
[236]. It turns out that proportionality to temperature and bandwidth is a low-
frequency approximation. At higher frequencies, the effects of quantum mechan-
ics reduce the noise spectral density. However, for frequencies f of interest, the
classical approximation is accurate because the frequency is far from the quantum
limit,
f ≪ kB TK /h ≈ 6 THz at room temperature, (4.6)
where kB ≈ 1.38 · 10−23 J/K is the Boltzmann constant in SI units, h ≈
6.626 · 10−34 J·s is the Planck constant in SI units, and TK is the absolute tem-
perature, expressed here in Kelvin. The observed receive noise power Pn
is bounded from below by thermal noise
Pn ≥ kB TK B . (4.7)
The bandwidth is given by B. In practice, the noise power is a number of decibels
higher than the thermal limit. In order to characterize the noise in a system, it
is common to cite an effective temperature such as TK = 1500 K, which is much
higher than the real room temperature of around TK = 290 K. Alternatively,
the noise of the system is characterized by a noise figure, fn , which is usually
presented in decibels. Expressed on a linear scale, the noise figure multiplies the
right-hand side of Equation (4.7). The noise figure of a good receiver might be
two to three decibels. However, it is not uncommon to have noise figures a few
decibels higher.
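As a worked example (the bandwidth and noise figure are illustrative values of ours), the thermal noise floor of Equation (4.7) for a 1 MHz bandwidth at room temperature:

```python
import math

# Worked example of Eq. (4.7) (values illustrative): thermal noise floor
# for a 1 MHz bandwidth at room temperature, plus an assumed noise figure.
kB = 1.38e-23             # Boltzmann constant, J/K
TK = 290.0                # room temperature, K
B = 1e6                   # bandwidth, Hz

Pn_watts = kB * TK * B                       # thermal limit, Eq. (4.7)
Pn_dBm = 10 * math.log10(Pn_watts / 1e-3)    # roughly -114 dBm in 1 MHz

nf_dB = 3.0                                  # example receiver noise figure
Pn_rx_dBm = Pn_dBm + nf_dB                   # receiver noise floor, dBm

print(round(Pn_dBm, 1), round(Pn_rx_dBm, 1))
```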
The channel also includes external interference that may come from unin-
tended spectral sidelobes of nearby spectral occupants or competing users at
the same frequency. External interference is a common issue in the industrial-
scientific-medical (ISM) band in which WiFi operates [150]. In the case of ad hoc
wireless networks, this interference may be caused by other users in one’s own
network.
Similar to upconversion, downconversion is used to transform the signal at car-
rier frequency to a complex baseband signal s(t). The downconversion may be
performed in a single step or in multiple steps using an intermediate frequency.
These conversions may be performed digitally or by using analog circuitry. As
an example, a single-stage downconversion can be notionally achieved by multi-
plying the received signal by the complex conjugate of the upconversion, eiω t ,
eiωt Re{s(t) e−iωt } = eiωt (Re{s(t)} cos(ωt) + Im{s(t)} sin(ωt))
= Re{s(t)} (cos²(ωt) + i sin(ωt) cos(ωt)) + Im{s(t)} (i sin²(ωt) + sin(ωt) cos(ωt))
= Re{s(t)} (1/2 + (1/2) cos(2ωt) + (i/2) sin(2ωt))
+ Im{s(t)} (i/2 − (i/2) cos(2ωt) + (1/2) sin(2ωt)) . (4.8)
It is clear from the above form that the signal is broken into a high-frequency
component centered at 2ω and a baseband component near zero frequency. Un-
der the assumption that ω is large compared with 2πB, where B is the signal
bandwidth, the baseband signal can be recovered by applying a lowpass filter.
This will remove the 2ωt terms, giving the downconverted signal,
(1/2)(Re{s(t)} + i Im{s(t)}) = (1/2) s(t) . (4.9)
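The up/downconversion chain of Equations (4.5)-(4.9) can be verified numerically. In this sketch of ours, the sample rate, carrier, and tone are illustrative, and a crude moving-average stands in for a real lowpass filter design:

```python
import numpy as np

# Numerical sketch of Eqs. (4.5)-(4.9) (ours; the sample rate, carrier, and
# crude moving-average lowpass filter are illustrative choices).
fs = 1e6                         # sample rate, Hz
f0 = 100e3                       # carrier frequency, Hz
fb = 2e3                         # baseband tone, Hz
t = np.arange(4096) / fs
w = 2 * np.pi * f0

s = np.exp(2j * np.pi * fb * t)                  # complex baseband s(t)
passband = np.real(np.exp(-1j * w * t) * s)      # physical signal, Eq. (4.5)
mixed = np.exp(1j * w * t) * passband            # downconversion, Eq. (4.8)

# Averaging over one carrier period suppresses the 2*omega terms,
# leaving approximately s(t)/2 as in Eq. (4.9).
L = int(fs / f0)
baseband = np.convolve(mixed, np.ones(L) / L, mode="same")

err = np.max(np.abs(baseband[L:-L] - s[L:-L] / 2))
print(err)
```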
Demodulation covers a range of approaches to working with the received base-
band signal. For multiple-antenna receivers, some form of signal combining can
be used to improve demodulation performance by increasing signal-to-noise ratio
(SNR) or mitigating cochannel interference. Two basic classes of decoders are
the hard and the soft decoder. As an example, consider a complex baseband
QPSK signal in noise, displayed in Figure 4.6. Because of the noise, the received
points lie in a region near but not at the transmitted QPSK symbol. A hard
decision on the two modulated bits can be made based on the quadrant in which
the signal is observed. These hard decisions can be sent to the decoder. In the
example depicted in Figure 4.6, some hard decisions (for example, the dark gray
dots in the bottom left quadrant) will be incorrect. In modern receivers, it is
more common to estimate the likelihood of the possible bit states and pass these
“soft decisions” to the decoder. In general, hard decisions reduce computation
Figure 4.6 Example of an ensemble of points drawn from a complex QPSK
modulation with amplitude 1 in the presence of Gaussian noise with an SNR of 4 dB.
The points at various shades of gray correspond to four originating points.
complexity, while soft decisions improve performance. Soft decisions also blur
the line between demodulation and decoding.
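The two decision styles can be sketched for QPSK as below (our example; the Gray mapping, noise model, and LLR sign convention are assumptions, not taken from the text):

```python
import numpy as np

# Sketch of hard vs. soft QPSK decisions (our example; Gray mapping, the
# noise model, and the LLR sign convention are assumptions).
rng = np.random.default_rng(1)

n = 10000
bits = rng.integers(0, 2, size=(n, 2))
symbols = ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)

snr_dB = 4.0
sigma2 = 10 ** (-snr_dB / 10)         # noise variance per unit-power symbol
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(n)
                               + 1j * rng.standard_normal(n))
received = symbols + noise

# Hard decisions: the quadrant of the received point.
hard = np.stack([(received.real < 0).astype(int),
                 (received.imag < 0).astype(int)], axis=1)

# Soft decisions: per-bit log-likelihood ratios under the Gaussian model;
# a modern decoder would consume these instead of the hard bits.
llr = 2 * np.sqrt(2) * np.stack([received.real, received.imag],
                                axis=1) / sigma2

ber = np.mean(hard != bits)
print(ber)
```

At this SNR a few percent of the hard decisions are wrong, which is what a strong forward-error-correction code is expected to clean up.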
The decoder is intimately tied to the encoder. By using the hard or soft de-
cisions provided by the demodulator, the decoder extracts an estimate of the
original data sequence. Strong decoders can compensate for large symbol error
rates.
Finally, the data sink uses these data. In the case of a mobile phone, the
vocoded signal is reproduced. In the case of a wireless network, the data may
be repackaged and retransmitted along the next link in the network.
4.3 Cellular networks

The cellular network topology is commonly found in mobile telephone systems.
A typical cellular network comprises multiple access points or base stations that
are connected to a tethered infrastructure, and mobile units such as telephones
that communicate with one or more base stations, although each mobile typically
communicates with one base station. This topology enables coverage over large
areas; additionally, through base station control, it ensures that nearby nodes
operate with minimal interference to each other.
The link carrying data from a mobile unit to a base station is called the uplink
and the converse is called the downlink. For a given cell, the uplink is a many-to-
one network as multiple mobile units typically connect to a single base station,
and is called the multiple-access channel in information theory. The downlink
between a single base station and multiple mobile units forms a one-to-many
network, and is called the broadcast channel in information theory.
Figure 4.7 Cellular model with Poisson cells, with base stations denoted by dots.

Base-station locations are selected on the basis of many factors, including
usage patterns, availability of space, and terrain, thereby making detailed math-
ematical analysis extremely difficult. For analytical tractability, two simple mod-
els are often used to describe cellular networks: the Poisson-cell model and the
hexagonal-cell model. In both cases, it is typically assumed that mobile units
communicate with their nearest (in Euclidian distance) base station although in
practice, mobile units typically communicate with the base station with which
they have the highest signal-to-interference-plus-noise ratio (SINR).
In the Poisson-cell model, base stations are assumed to be distributed on a
plane according to a Poisson point process (PPP) with a constant average like-
lihood (or area density) ρb . The Poisson point process is in a sense the most
random distribution of points as it assumes that every point is located indepen-
dently from other points and the number of points in any two disjoint regions
are independent random variables.
Figure 4.7 illustrates the Poisson-cell model in which base stations are denoted
by circles and the lines divide the plane into cells. A mobile unit that falls within a
given cell is typically modeled as communicating with the base station associated
with that cell.
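A minimal simulation of the Poisson-cell model with nearest-base-station association (our sketch; the density and region size are illustrative):

```python
import numpy as np

# Minimal simulation of the Poisson-cell model (our sketch; density and
# region size are illustrative): drop base stations as a Poisson point
# process, then associate a mobile with its nearest base station.
rng = np.random.default_rng(3)

rho_b = 2.0                                     # base stations per unit area
side = 10.0                                     # square observation window
n_bs = rng.poisson(rho_b * side * side)         # Poisson number of points
bs_xy = rng.uniform(0.0, side, size=(n_bs, 2))  # i.i.d. uniform locations

mobile = np.array([side / 2, side / 2])
d = np.linalg.norm(bs_xy - mobile, axis=1)      # Euclidean distances
serving = int(np.argmin(d))                     # nearest-BS association

print(n_bs, serving, round(float(d[serving]), 3))
```

Conditioned on the number of points, a homogeneous Poisson point process places them independently and uniformly, which is exactly the two-step sampling used here.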
Alternatively, base stations can be modeled as located on a hexagonal grid,
which results in hexagonal cells in a honeycomb pattern as in Figure 4.8. As-
suming that the coverage area of a base station is a disk centered on the base
Figure 4.8 Cellular model with hexagonal cells, with base stations denoted by dots.

station, the hexagonal-cell model results in the most efficient coverage of a given
two-dimensional area, that is the fewest number of base stations are required to
cover a given area.
Note that both the Poisson-cell model and the hexagonal-cell model are
opposite extremes. In reality, base-station locations are carefully planned, but
are subject to geographical and usage variances.
4.3.1 Frequency reuse
In many cellular systems, it is desirable for the base stations to have minimal in-
teraction with one another. To minimize interference from nearby cells (intercell
interference), the total communication bandwidth is divided into κ orthogonal
channels, and nearby cells operate on different channels. Hence, mobile units in
nearby channels do not interfere with each other. The number κ is called the
frequency-reuse factor. Note that some authors define the frequency-reuse factor
as 1/κ.
For the Poisson-cell model, the minimum value for κ so that no two adjacent
cells share the same channel is four by the celebrated four-color theorem, which
states that the minimum number of colors required to color an arbitrary two-
dimensional map such that no two adjacent countries have the same color is
four. For regular cells such as in the hexagonal-cell model, a smaller number of
channels may be possible. For instance, Figure 4.9 shows a channel assignment
in which no two adjacent cells share the same band with κ = 3. In practice, κ
may take on values as large as 11, which is done to reduce intercell interference
in systems with small cell sizes.
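One compact construction of a κ = 3 assignment like that of Figure 4.9 (a standard trick, stated here as an assumption rather than taken from the text) labels the hexagonal cell at axial coordinates (q, r) with channel (q + 2r) mod 3; adjacent cells then never share a channel:

```python
# One compact kappa = 3 assignment on a hexagonal grid (a standard
# construction, stated here as an assumption): in axial coordinates (q, r),
# give the cell channel (q + 2*r) % 3.

def channel(q: int, r: int) -> int:
    """Channel index in {0, 1, 2} for the hex cell at axial (q, r)."""
    return (q + 2 * r) % 3

# The six neighbors of a hex cell in axial coordinates.
NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

# Count adjacent pairs sharing a channel over an 11 x 11 patch of cells.
clashes = sum(
    1
    for q in range(-5, 6)
    for r in range(-5, 6)
    for dq, dr in NEIGHBORS
    if channel(q, r) == channel(q + dq, r + dr)
)
print(clashes)
```

The check works because moving to any of the six neighbors changes q + 2r by ±1 or ±2, never by a multiple of 3.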
Figure 4.9 Channel assignment for hexagonal cells with reuse factor three.

Typically, within each cell, transmissions are orthogonalized using some
multiple-access approach such as time-division multiple access (TDMA) or code-
division multiple access (CDMA), which are sometimes combined with space-
division multiple access (SDMA) using sectored or adaptive antennas. Hence, we
may view cellular systems as comprising large-scale frequency-division multiple
access (FDMA) to mitigate intercell interference combined with T/C/SDMA to
mitigate intracell interference. It is not uncommon for CDMA systems to employ
a frequency reuse factor of 1. Consequently, they suffer from intercell interference
which reduces the number of simultaneous users that they can support.
4.3.2 Multiple access in cells
Since nearby cells often operate in different frequency bands (large-scale FDMA),
it is common to assume that, within each cell, nodes do not experience interfer-
ence from other cells, although there has been recent work that explicitly models
out-of-cell interference; that is described in Chapter 13. Additionally, the uplink
from the mobile units to base stations and downlinks from base stations to mobile
units typically occupy different frequency bands.
Within each cell time-division multiple access or code-division multiple access
is typically used and sometimes combined with sectored antennas that essentially
divide users spatially. Time-division multiple access in cellular networks
is conceptually straightforward and is used in systems such as Global System
Figure 4.10 Time slots for time-division multiple access.

for Mobiles (GSM). The base station assigns mobile users to noninterfering time
slots and provides an accurate common timing reference for all in-cell users.
Basic time-division multiple access
In time-division multiple-access systems, the base station divides time into in-
dividual slots and assigns each slot to a single mobile unit for transmission, as
illustrated in Figure 4.10. A duration of time called a frame is divided into K
time slots with guard intervals between the time slots to handle timing offsets
at different mobiles caused by propagation delays and mismatches in timing
synchronization.
Basic code-division multiple access
For code-division multiple-access systems, users are separated by encoding their
information with waveforms that allow their signals to be detangled at the base
station. For instance, on the downlink, the base station typically encodes users’
signals by using orthogonal functions such as the Walsh codes illustrated in
Figure 4.11.
For a simple illustration of orthogonal CDMA, consider a toy example with
four mobile units in a given cell where the base station wishes to communicate
one data symbol per mobile. For the length-4 Walsh functions shown in Figure
4.11, the base station may assign the kth function (or code) to the kth user.
To communicate the data symbol xk to the kth user, the base station transmits
xk ck (t). Assuming an ideal channel, the kth user receives the following signal
yk (t) = Σ_{j=1}^{4} xj cj (t) + nk (t) , (4.10)

where nk (t) is the noise process at the kth receiver. The kth receiver may recover
a noise-corrupted version of xk by filtering the received signal through a filter
matched to its code ck (t) and sampled at time t = 0,
rk (0) = (1/4) ∫_{−∞}^{∞} dτ ck∗ (τ) ( Σ_{j=1}^{4} xj cj (τ) + nk (τ) )
= xk + nk , (4.11)
Figure 4.11 Walsh functions.
where nk = (1/4) ∫_{−∞}^{∞} ck∗ (τ) nk (τ) dτ. Thus, the interference is eliminated as rk (0)
does not contain any contribution intended for the other mobile units.
Note that length-M Walsh functions can also be represented as length-M
vectors, where ck (t) for M = 4 can be represented by the following vectors:
⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞
1 1 1 1
⎜ 1 ⎟ ⎜ 1 ⎟ ⎜ −1 ⎟ ⎜ −1 ⎟
c1 = ⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎝ 1 ⎠ , c2 = ⎝ −1 ⎠ , c3 = ⎝ −1 ⎠ , c4 = ⎝ 1 ⎠ .
⎜ ⎟ (4.12)
1 −1 1 −1
With this representation, the operation of matched filtering followed by sampling
can be interpreted as a simple inner product. A matched filter, discussed for
spatial processing in Section 9.2.1, has a form that has the same structure as
that expected of the signal. Here the matched filter is given by the structure of
the Walsh function spreading sequence.
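The toy downlink example above can be written out directly in the vector representation (our sketch; ideal channel, no noise, for clarity):

```python
import numpy as np

# The toy downlink above, written out (our sketch; ideal channel, no noise).
# Rows of C are the length-4 Walsh codes of Eq. (4.12).
C = np.array([[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1, -1,  1],
              [1, -1,  1, -1]])

x = np.array([1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]) / np.sqrt(2)  # user symbols

tx = C.T @ x         # base station transmits the superposition sum_k x_k c_k
rx = tx              # ideal channel

x_hat = (C @ rx) / 4  # user k despreads: (1/4) c_k^T rx, as in Eq. (4.11)
print(np.max(np.abs(x_hat - x)))
```

Because the codes are orthogonal with c_k^T c_k = 4, each user's inner product nulls the other three symbols exactly.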
On the uplink, mobile users do not encode their signals using orthogonal codes
because the transmitted signals from the mobile units pass through different
channels, thus destroying orthogonality. Additionally, the Doppler spread (dis-
cussed in detail in Chapter 10) resulting from relative motions of the different
mobile units causes signals received by the base station from different mobiles
to no longer be orthogonal. For these reasons, practical CDMA systems utilize
random spreading codes on the uplink from the mobiles to base stations.
Consider the following simple example in which two mobile users (nodes 1 and
2) are in the same cell. Let hk (t) denote a linear-time-invariant (LTI) channel
impulse response between the base station and node k. Furthermore, we shall
assume reciprocity.
On the downlink, let the base station encode the signal intended for the kth
user by using the function dk (t). The received signal at the kth mobile user is
given by
yk (t) = x1 hk ∗ d1 (t) + x2 hk ∗ d2 (t) + nk (t) . (4.13)
Suppose that the kth mobile user is able to invert the channel perfectly by using
an equalizer whose impulse response is fk (t). The equalized signal for user k is
given by
ỹk (t) = x1 d1 (t) + x2 d2 (t) + fk ∗ nk (t) . (4.14)
Mobile node 1 can then match filter ỹk (t) with d1 and sample at time 0 to remove
the interference. Node 2 can do the same with d2 (t).
The situation is different on the uplink from the mobile units to the base
stations. Let the kth user encode its transmit symbol xk by using the function
ck (t). The received signal at the base station is given by
y(t) = x1 h1 ∗ c1 (t) + x2 h2 ∗ c2 (t) + n(t) . (4.15)
In this case, unless h1 (t) and h2 (t) take very specific forms, it is not possible to
simultaneously invert h1 (t) and h2 (t). Any orthogonality associated with c1 (t)
and c2 (t) will be lost, making it not very useful to employ orthogonal codes on
the uplink. Instead, pseudorandom (but known at the base station) codes are
used. If the codes are sufficiently long, they will be nearly orthogonal as the
following analysis illustrates.
Consider a collection of M codes of length M where each code vector has M zero-mean i.i.d. entries of variance 1/M . Let the jth entry of ci be denoted by cij . We then have

c†k ck = Σ_{j=1}^{M} c∗kj ckj = (1/M) Σ_{j=1}^{M} |√M ckj|² . (4.16)

By the law of large numbers, for large M

c†k ck = (1/M) Σ_{j=1}^{M} |√M ckj|² (4.17)
      ≈ var{√M ckj} = 1 . (4.18)

Similarly, when M is large, we have the following for i ≠ k:

c†i ck = (1/M) Σ_{j=1}^{M} M c∗ij ckj (4.19)
      ≈ ⟨M c∗ij ckj⟩ = M ⟨c∗ij⟩ ⟨ckj⟩ = 0 (4.20)

since cij and ckj are zero-mean and uncorrelated random variables.
Nominal beam
pattern for one
directional antenna

Three directional
antennas
combined to
cover cell in three Cell boundary
sectors

Figure 4.12 Illustration of coverage of a nominal sectored cell using directional antennas.

Hence, we conclude that long random codes are nearly orthogonal. Note that, typically, the cij are assigned values of ±1/√M with equal probability, although the above analysis holds for other distributions as well.
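A quick numerical check of this concentration argument (an illustrative sketch; the spreading lengths and random seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random codes with i.i.d. entries of +/- 1/sqrt(M) (zero mean, variance
# 1/M); the lengths and the seed are arbitrary illustrative choices.
for M in (16, 256, 4096):
    codes = rng.choice([-1.0, 1.0], size=(M, 2)) / np.sqrt(M)
    ck, ci = codes[:, 0], codes[:, 1]
    # The norm is exactly 1 for this distribution; the cross-correlation
    # concentrates near 0 at a rate of roughly 1/sqrt(M).
    print(M, ck @ ck, ci @ ck)
```

As M grows, the cross-correlation shrinks like 1/√M while the norm stays pinned at one, which is the near-orthogonality used on the uplink.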

Space division multiple access with sectored antennas


Base stations are often equipped with sectored antennas, which are antennas
that focus their energy in several specific directions. A cell is typically divided
up into sectors with one or more antennas assigned to radiate with a beam
pattern that covers that sector with little energy spillage into adjacent sectors.
Sectored antennas allow base stations to simultaneously transmit to multiple
mobiles in a given cell if they are in different sectors.
Figure 4.12 illustrates a nominal cell that is divided into three sectors. The
solid lines represent the beam patterns of the three directional antennas used
in this sectored antenna. The base station can transmit to three mobile units in different sectors simultaneously with minimal interference except at cell boundaries.

4.4 Ad hoc wireless networks

A simpler network topology is an ad hoc wireless network as illustrated in Figure 4.13. Such a network does not have central controllers (such as base stations in
cellular networks), and data links are typically between a single transmitter and
receiver pair although variations that include one-to-many and many-to-one links
also exist.
Because of the lack of central control, any algorithms used in ad hoc networks
have to be distributed in nature. From a practical standpoint, simple algorithms
are attractive in such networks because the overhead required to synchronize a spatially distributed ad hoc network is high.

Figure 4.13 Ad hoc network with transmitters denoted by dots and receivers by circles.
Characterizing the capacity of such networks is very difficult because there
are many different ways that nodes can cooperate with each other. At the time
of writing, the capacity of 2 × 2 wireless links is still unknown, although it
is known in certain regimes and approximations to the capacity region have
been found. Thus, the problem of characterizing the capacity region of such
networks is very difficult. Most work on the capacity of ad hoc wireless networks
to date has focused on capacity-scaling laws, which are the rates at which the
throughput in the network changes with the number of nodes in the network
n. Two general models are used. Dense networks are networks whose area is
constant and thus density of nodes increases with n. Extended networks are
networks whose area is increased with the number of nodes such that the
area density of nodes is constant. Note that scaling laws are typically given
in “order-of-growth” type expressions. The pre-constants of the order-of-growth
expressions are often ignored, and, as such, these results are usually only useful
when n is very large.
Figure 4.14 illustrates some of the key results on the capacity-scaling laws of
ad hoc wireless networks. The results illustrated here apply to dense networks
with some differences in network models. For instance, the Gupta–Kumar model
[132] does not include channel fading, and the Ozgur et al. [240] model uses a
specific fading model.
For TDMA systems, time is split up among the n nodes and each node is
assigned a time slot in a round-robin fashion. Since each node gets only 1/n th
of the time to communicate, its throughput decays as 1/n. The Gupta–Kumar
work assumes multi-hop communications in which each physical-layer link is
Figure 4.14 Illustration of key results on the capacity-scaling laws of ad hoc wireless networks: per-link capacity versus the number of nodes n, with TDMA scaling as 1/n, multi-hop (Gupta & Kumar (2000) etc.) scaling as 1/√n, and the Ozgur, Leveque, Tse (2007) scheme scaling as a constant. Note that the per-link rates illustrated in this figure are qualitative.
between a given node and one of its nearby nodes. Since the distances between nodes and their nearest neighbors decay with 1/√n, the signal power received by a node from its nearest neighbors increases as n^{α/2}. However, so does the interference power. Thus, the physical-layer links can maintain approximately constant signal-to-interference ratios and approximately constant data rates with increasing n. Because the number of hops required to traverse a fixed distance increases as √n, the data rate also decays approximately as 1/√n. Gupta and
Kumar showed this principle for a specific traffic pattern, and it was extended
to random traffic patterns by Franceschetti et al. [101]. The Ozgur et al. result
uses a hierarchical cooperation scheme with distributed multiple-input multiple-
output (MIMO) links, where collections of nearby nodes act as virtual antenna
arrays. More details on these results are given in Chapter 14.

4.4.1 Achievable data rates in ad hoc wireless networks

Single-antenna systems
A different approach used in analyzing ad hoc wireless networks is to find the
data rates achievable for a given outage probability in networks in which nodes
utilize specific communication schemes. The primary tool for such analyses has
been stochastic geometry, where nodes are typically modeled as distributed on
a plane according to a homogenous Poisson point process that places nodes
randomly with uniform spatial probability density.
For narrowband systems that have nodes distributed according to a Poisson
point process, that transmit with i.i.d. power levels, the aggregate interference
power seen at a typical node in the network can be characterized by its charac-
teristic function, although the CDF and PDF of the interference powers is not
known. For the case of an ad hoc wireless network with α > 2 and the density
of transmitting nodes equaling ρ, the characteristic function of the interference
power I with transform variable s is given in Reference [133]:

⟨e^{−sI}⟩ = exp( −ρπ ⟨(hP)^{2/α}⟩ Γ(1 − 2/α) s^{2/α} ) , (4.21)

where ⟨(hP)^{2/α}⟩ is the expected value of the product of the channel fading coefficients h and the transmit powers P of the transmitting nodes, raised to the power 2/α.
The characteristic function is particularly useful in computing the probability
that the signal-to-interference ratio (SIR) of a representative link exceeds some
threshold in Rayleigh fading channels. This property is the result of the fact
that the received signal power conditioned on the transmit power and link length
in Rayleigh fading channels is distributed as a unit-mean exponential random
variable (see Section 3.1.10) whose CDF P_Exp(x) is

P_Exp(x) = 1 − e^{−x} for x ≥ 0 , and P_Exp(x) = 0 for x < 0 . (4.22)
For x ≥ 0, the complementary cumulative distribution function (CCDF) is 1 − P_Exp(x) = e^{−x}. With S and I representing the received signal and interference powers respectively, the probability that the SIR is greater than some threshold τ is given by

Pr(S/I > τ | I) = Pr(S > τ I | I) (4.23)
                = exp(−τ I) . (4.24)

Writing the probability density function of the interference as p_I and integrating out with respect to the interference power I yields

Pr(S/I > τ) = ∫ dI Pr(S/I > τ | I) p_I(I) (4.25)
            = ∫ dI e^{−τ I} p_I(I) (4.26)
            = ⟨e^{−τ I}⟩ (4.27)
            = exp( −ρπ ⟨(hP)^{2/α}⟩ Γ(1 − 2/α) τ^{2/α} ) . (4.28)
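Equation (4.28) can be checked against a direct Monte Carlo simulation of a Poisson field of interferers. The sketch below makes illustrative assumptions not fixed by the text: unit transmit powers, Rayleigh power fading h ~ Exp(1), α = 4, and a finite disk of radius R standing in for the infinite plane (reasonable here, since distant interferers contribute little when α > 2):

```python
import math

import numpy as np

rng = np.random.default_rng(1)
alpha, rho, tau = 4.0, 0.1, 0.5   # path-loss exponent, interferer density, SIR threshold

# Closed form (4.28) with P = 1 and h ~ Exp(1): E[(hP)^(2/alpha)] = Gamma(1 + 2/alpha).
p_closed = math.exp(-rho * math.pi * math.gamma(1 + 2 / alpha)
                    * math.gamma(1 - 2 / alpha) * tau ** (2 / alpha))

# Monte Carlo over a disk of radius R; the normalized desired-signal
# power is a unit-mean exponential, as in the text's Rayleigh model.
R, trials, hits = 30.0, 10000, 0
for _ in range(trials):
    n = rng.poisson(rho * math.pi * R**2)            # number of interferers
    r = R * np.sqrt(rng.random(n))                   # radii, density proportional to r
    I = np.sum(rng.exponential(1.0, n) * r**-alpha)  # aggregate interference power
    hits += rng.exponential(1.0) > tau * I           # SIR exceeds the threshold
print(p_closed, hits / trials)                       # these should nearly agree
```

With these parameters the closed form and the empirical success probability agree to within Monte Carlo noise.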

Using tools from stochastic geometry, Weber et al. [340] introduced the idea of
transmission capacity, which is the product of the data rate and the maximum
density of nodes in an ad hoc network that achieves a particular SINR for a
given outage probability weighted by the probability. This quantity enables a
more direct analysis of the achievable data rates in such networks. The authors
in Reference [340] use transmission capacity to compare direct-sequence CDMA
that uses matched-filter receivers, with frequency-hopping CDMA in networks
with spatially distributed nodes. They find that the order-of-growth of the trans-
mission capacity with spreading length of frequency-hopping CDMA systems is
larger than that of direct-sequence CDMA systems.
More formally, consider a wireless network with nodes distributed according
to a Poisson point process with intensity (or user density) ρ. Suppose that links in the network have a target data rate of rt, which is achievable if the SINR on each link exceeds a threshold γ. Then define the contention density ρ_ε as the maximum density of nodes such that the probability that the SINR is less than or equal to the threshold γ is less than or equal to ε. In other words,

ρ_ε = max ρ such that Pr(SINR ≤ γ) ≤ ε . (4.29)

The transmission capacity cT is then defined as

cT = ρ_ε (1 − ε) rt . (4.30)

Note that the transmission capacity cT is the product of the maximum density of nodes that achieves the target SINR, the probability of a link achieving the target SINR, and the communication rate that is supportable given that SINR.
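The definition can be turned into a small calculator whenever a closed-form outage expression is available. The sketch below (an illustration, not from the text) uses the Rayleigh-fading result of Equation (4.28) with τ = γ, for which Pr(SINR ≤ γ) = 1 − exp(−ρπκγ^{2/α}) and the outage constraint inverts in closed form; κ and the numerical parameters are arbitrary stand-ins:

```python
import math

def transmission_capacity(gamma, eps, rate, alpha, kappa):
    """c_T = rho_eps * (1 - eps) * rate, using the Rayleigh-fading outage
    expression Pr(SIR <= gamma) = 1 - exp(-rho*pi*kappa*gamma**(2/alpha)).
    kappa = E[(hP)^(2/alpha)] * Gamma(1 - 2/alpha) is a model constant."""
    # Invert the outage constraint Pr(SIR <= gamma) <= eps for the density.
    rho_eps = -math.log(1.0 - eps) / (math.pi * kappa * gamma ** (2.0 / alpha))
    return rho_eps * (1.0 - eps) * rate

# Example: alpha = 4, unit powers with Rayleigh fading.
alpha = 4.0
kappa = math.gamma(1.0 + 2.0 / alpha) * math.gamma(1.0 - 2.0 / alpha)
cT = transmission_capacity(gamma=1.0, eps=0.05, rate=1.0, alpha=alpha, kappa=kappa)
print(cT)
```

Tightening the outage constraint (smaller ε) shrinks the admissible density ρ_ε and, in this regime, the transmission capacity with it.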

Multiple-antenna systems
Antenna arrays in ad hoc wireless networks are useful both in terms of spatial
multiplexing (that is enabling a single user to transmit multiple data streams)
as well as for SDMA. By nulling out the interference contribution from nearby
nodes, it is possible to get significant increases in SINR and hence data rates.
With N antennas per receiver, it is possible to null out the interference from
N − 1 sources in ideal situations; however, this interference mitigation may come
at the expense of some loss of signal SNR at the output of the mitigating receiver.
Alternatively, it is also possible to increase signal power by a factor of N relative
to noise by coherently adding signals from N antennas coming from a target
signal source.
The following simple heuristic argument shows that it is possible to achieve
signal-to-interference-plus-noise ratio scaling on the order of (N/ρ)^{α/2} in ad hoc
wireless networks with a power-law path-loss model.
Consider a receiver with N antennas at the center of a circular network of
interferer density ρ as illustrated by the square in Figure 4.15. Suppose this
receiver uses a fraction ζ of its degrees of freedom to null the ζN − 1 interferers
closest to it. When N is large, these nulled interferers occupy a circle of radius
approximately equal to

ra = √( (ζN − 1)/(πρ) ) ≈ √( ζN/(πρ) ) . (4.31)
Assuming that the interferers are distributed continuously from ra to infinity and integrating their interference contribution, we find that the residual interference grows as ra^{2−α} ∼ (N/ρ)^{1−α/2}. Suppose that the remaining (1 − ζ)N degrees of freedom are used by the receiver to increase the SINR relative to thermal noise and residual interference by a factor N . Then, the SINR grows as (N/ρ)^{α/2}.
In networks with α > 2, which are common, the SINR growth with number of
antennas is greater than linear, which would be the case for simple coherent com-
bination. Additionally, this heuristic analysis indicates that it may be possible to
increase the density of users and maintain the same SINR by linearly increasing the number of antennas per receiver with the node density. This has been shown independently for different sets of assumptions in References [56], [123], and [164].

Figure 4.15 Illustration of interference contribution from planar network with nulling of nearby interference: a nulled region surrounds the receiver, beyond which lie the closest un-nulled interferer and a continuous distribution of un-nulled interferers.
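The heuristic above can be put into numbers. This sketch uses illustrative constants only (unit signal and transmit powers, a nominal noise floor): it computes the nulling radius of Equation (4.31), integrates the un-nulled continuum, and applies the factor-N array gain, so that doubling N at fixed ρ raises the SINR by roughly 2^{α/2}:

```python
import math

def sinr_scaling(N, rho, alpha=4.0, zeta=0.5, noise=1e-6):
    """Heuristic SINR: zeta*N - 1 nearest interferers are nulled, the rest
    are treated as a continuum from radius r_a, and the remaining degrees
    of freedom give a factor-N array gain (illustrative constants)."""
    r_a = math.sqrt((zeta * N - 1) / (math.pi * rho))
    # Residual interference: integral of rho * 2*pi*r * r**(-alpha) from r_a.
    residual = 2 * math.pi * rho * r_a ** (2 - alpha) / (alpha - 2)
    return N * 1.0 / (noise + residual)    # unit signal power assumed

# Doubling N (at fixed rho) should raise the SINR by about 2**(alpha/2) = 4.
s1, s2 = sinr_scaling(64, 1.0), sinr_scaling(128, 1.0)
print(s2 / s1)
```

The ratio comes out near 4 for α = 4, matching the (N/ρ)^{α/2} growth claimed above (the small deviation comes from the "−1" in the nulled-interferer count).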

4.5 Sampled signals

There are some subtleties in considering sampled signals and channels. Some of
these effects with regard to channels are explored in more detail in Section 10.1.
From the basic physics, in some sense, all electromagnetic signals are quantized
because the signal is mediated by photons that have quantized energies. However,
at frequencies of interest for most wireless communications, the energy of a single
photon is so small that it is not useful, although it is an important consideration
for optical communications. The number of photons received at an antenna is so large that statistically the signal can be modeled well as continuous.
Somewhat amusingly, because essentially all modern communications are digital,
the physical signals are quantized in energy and time, although this quantization
has little to do with the physical photon quantization. In general in this text,
it is assumed that any subtleties associated with sampling have been addressed
and we work with the complex baseband signal. However, we will address some
of the sampling issues here.
The fundamental issue is that sampled signals need to be sampled at a rate that satisfies the Nyquist criterion,2 that is, for a band-limited signal of width B (including both positive and negative frequencies), the complex sampling rate fs must be greater than the bandwidth B.3

2 While “Nyquist” is widely used to indicate this criterion, work by John Whittaker, Vladimir Kotelnikov, and Claude Shannon could justify use of their names, and the criterion is sometimes denoted the WKS criterion [25].

Translating between the continuous and the sampled domains typically requires an anti-aliasing filter or pulse-shaping filter. A common example of the pulse-shaping filter is the raised-cosine filter
[255]. This filtering is required because a sampled signal has spectral images at
multiples of the inverse of the sample period. For a continuous signal s(t) ∈ C as
a function of time t, the spectrum S(f) as a function of frequency f is given by

S(f) = ∫ dt e^{−i2π t f} s(t) . (4.32)

The spectrum of the sampled signal Ss(f) with sample spacing T is given by

Ss(f) = ∫ dt e^{−i2π t f} s(t) T Σ_m δ(t − mT)
      = Σ_m e^{−i2π m T f} T s(mT)
      = Σ_m S(f − m/T) , (4.33)

where δ(·) is the delta function, and the scaling by T is for convenience. The
evaluated form in Equation (4.33) is the discrete-time Fourier transform of the
sampled form of the signal. Two issues can be observed from Equation (4.33).
First, for a received signal, if the spectral width of the signal S(f ) is greater
than 1/T the various images of the continuous signal would overlap, resulting
in an inaccurate sampled image. This phenomenon is referred to as aliasing.
Second, if one were to transmit this signal, it would occupy infinite bandwidth.
This is undesirable, and physically unrealizable. A pulse shaping filter is applied
typically to reduce the spectral width of the signal. To get perfect reconstruction
of a band-limited signal (of bandwidth B ≤ 1/T ), one can theoretically employ
a perfect spectral “brick-wall” filter θ(f T) that is spectrally flat within −1/(2T) to 1/(2T) and zero everywhere else. The impulse response of this filter is given by the sinc function, so that the reconstructed signal is given by

s(t) = Σ_m s(mT) sinc( (t − mT)/T ) . (4.34)
Unfortunately, this approach is not achievable because the sinc function is infinite
in extent; however, reconstruction filters that require a small number of samples
can be designed that still work well.
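Both Equations (4.33) and (4.34) lend themselves to a quick numerical check. The sketch below (the test signal, truncation lengths, and evaluation points are arbitrary illustrative choices) verifies that the sampled spectrum repeats with period 1/T, and that a truncated sinc sum recovers an off-grid signal value:

```python
import cmath
import math

T = 1.0                                    # sample spacing
f0 = 0.1                                   # tone frequency, well below 1/(2T)

def s(t):
    return math.cos(2 * math.pi * f0 * t)  # band-limited test signal

def sampled_spectrum(f, M=400):
    # Equation (4.33): S_s(f) = sum_m T exp(-i 2 pi m T f) s(mT), truncated.
    return sum(T * cmath.exp(-2j * math.pi * m * T * f) * s(m * T)
               for m in range(-M, M + 1))

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(t, M=2000):
    # Equation (4.34), truncated to samples m = -M..M; the sinc has
    # infinite extent, so convergence with M is slow.
    return sum(s(m * T) * sinc((t - m * T) / T) for m in range(-M, M + 1))

# Images repeat at multiples of 1/T: shifting f by 1/T multiplies every
# term of the spectrum sum by exp(-i 2 pi m) = 1.
print(abs(sampled_spectrum(0.03) - sampled_spectrum(0.03 + 1.0 / T)))

# An off-grid value is recovered from the samples to good accuracy.
print(s(0.37), reconstruct(0.37))
```

The spectral difference is zero to rounding error, and the reconstructed sample agrees with the continuous signal to within the slow truncation error of the sinc sum.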

3 Note that we dropped the common factor of two because we are assuming complex samples so that B includes both the positive and negative frequencies.

Problems

Figure 4.16 QPSK bit assignments: bits (00), (01), (10), and (11) label the constellation points in the first, second, third, and fourth quadrants of the complex plane, respectively.

4.1 In considering various constellations, in the presence of additive Gaussian noise, the largest probability of confusing one point with another point on the constellation is driven by the distance between the closest points. Compared
to a BPSK constellation, find the relative average power required to hold this
minimum amplitude distance equal for the following constellations:
(a) QPSK,
(b) 8-PSK,
(c) 16-QAM,
(d) 64-QAM,
(e) 256-QAM.

4.2 Under the assumption of 30 ◦C temperature measured at the receiver, find the observed noise power for the following parameters:
(a) noise figure of 2 dB, bandwidth of 10 kHz,
(b) noise figure of 6 dB, bandwidth of 10 MHz.

4.3 In considering a two-stage superheterodyne downconversion to complex baseband for a system with a carrier frequency of 1 GHz and a bandwidth of
40 MHz, find IF frequency ranges such that undesirable images are suppressed
by more than the square of the relative filter sidelobe levels under assumption
of the same filter being used at each stage.
4.4 By assuming that nodes in a wireless network communicate directly with
their nearest neighbor, evaluate the capacity scaling laws for networks that are
constrained to
(a) a linear geometry,
(b) a three-dimensional geometry.

4.5 Consider the constellation diagram for a QPSK system shown in Figure 4.16
with the constellation points assigned to 2-bit sequences. If possible, find an
alternative assignment of bits that leads to a lower average probability of bit
error.
4.6 Consider the following equation:

Z = S + N . (4.35)
Let N be distributed according to a zero-mean, circularly symmetric, Gaussian
random variable with variance 12 σ 2 per real dimension and is independent of S.
The random variable Z is used to estimate S such that the probability of error
in making the estimation is minimized.
(a) Suppose that S = ±V with equal probability. Find the probability of error
in terms of the Q function. Recall that the Q function is the integral of the
tail of a standard Gaussian probability density function, i.e.

Q(t) = ∫_t^∞ dx (1/√(2π)) e^{−x²/2} . (4.36)

(b) Suppose that S = ±U ± U i. Find the probability of error.
(c) How should U and V be related such that the probabilities of error in
the previous two parts are equal? (This question really is about the SNR
requirement for a QPSK system and BPSK system to have the same prob-
ability of symbol error.)
4.7 Suppose that c1, c2, . . . , cM are M × 1 vectors and c†j cj = 1 for all j:

z = Σ_{j=1}^{M} sj cj + n , (4.37)

where the sj terms take on values of ±1 with equal probability and n contains independent, identically distributed Gaussian random variables with zero mean and variance 0.01. Let an estimate of s1 be given by ŝ1 = sign( c†1 z ).
(a) Suppose that c†j c1 = 0 for j ≠ 1. Find the probability that ŝ1 = s1.
(b) Suppose that the entries of the vectors cj are i.i.d. random variables taking values of ±1/√M with equal probability. Find the probability that ŝ1 = s1.
4.8 Consider a network comprising interferers distributed according to a Pois-
son point process with density of interferers ρ and subject to the standard inverse-
power-law path-loss model with path-loss exponent α > 2. Consider a link of
length r in this network between a receiver that is not part of the process and
an additional transmitter at a distance r away. Assuming that the signals are
subject to Nakagami fading with shape parameter μ equaling a positive integer,
find the cumulative distribution function (CDF) of the signal-to-interference ra-
tio of the link in question. Hint: the upper incomplete gamma function Γ(s, x)
for positive integers s can be expressed as follows:

Γ(s, x) = (s − 1)! e^{−x} Σ_{k=0}^{s−1} x^k / k! . (4.38)

This problem was inspired by Reference [148].
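The hint can be sanity-checked numerically. The sketch below (integration parameters are arbitrary) compares the series of Equation (4.38) with a direct numerical integration of the defining integral Γ(s, x) = ∫_x^∞ t^{s−1} e^{−t} dt:

```python
import math

def upper_gamma_series(s, x):
    # Right-hand side of Equation (4.38), valid for positive integers s.
    return math.factorial(s - 1) * math.exp(-x) * sum(x ** k / math.factorial(k)
                                                      for k in range(s))

def upper_gamma_numeric(s, x, steps=50000, t_max=60.0):
    # Midpoint-rule integration of Gamma(s, x) = int_x^inf t^(s-1) e^(-t) dt
    # on a truncated interval; crude but adequate for this check.
    dt = (t_max - x) / steps
    return sum((x + (i + 0.5) * dt) ** (s - 1) * math.exp(-(x + (i + 0.5) * dt))
               for i in range(steps)) * dt

for s, x in [(1, 0.5), (3, 2.0), (5, 1.0)]:
    print(s, x, upper_gamma_series(s, x), upper_gamma_numeric(s, x))
```

For each (s, x) pair the series and the integral agree to well within the quadrature error.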


5 Simple channels

For most wireless communications, channels (what happens between the transmitter and receiver) are complicated things. For the sake of introduction, in
this section we consider a single transmit antenna and receive antenna, residing
in a universe without scatterers or blockage.

5.1 Antennas

The study and design of antennas is a rich field [15]. Here, we focus on a small
set of essential features. The first important concept is that antennas do not
radiate power uniformly in direction or in polarization. The radiated power as
a function of direction is denoted the radiation pattern. If the antenna is small
compared with the wavelength (for example, if the antenna fits easily within
radius of a 1/8 wavelength), then the shape of the radiation pattern is relatively
smooth. However, if the antenna is large compared with the wavelength, then
the radiation pattern can be complicated. Antenna patterns are often displayed
in terms of decibels relative to a notional isotropic antenna (denoted dBi). The
notional isotropic antenna has the same gain over all 4π of solid angle.1 Gain is an
indication of directional preference in the transmission and reception of power.
The axisymmetric radiation pattern for an electrically small (small compared
with a wavelength) dipole antenna is displayed in Figure 5.1. In the standard
spherical coordinates of r, θ, φ, which correspond to the radial distance, the polar
angle, and the azimuthal angle, respectively, the far-field electric field is limited
to components along the direction of θ, denoted eθ . For electrically small dipoles,
the radiation pattern is proportional to [290, 154]

‖eθ‖² ∝ (1/r²) sin²(θ) . (5.1)

1 Consider an object in a three-dimensional space projected onto a unit sphere from a point at the origin; the solid angle encodes the area of the unit sphere that is occupied by the projection. Solid angle is typically normalized such that 4π covers the entire viewable angular area.
One can find this relationship by noting that the radiation pattern must satisfy
the differential equation

( ∇² − (1/c²) ∂²/∂t² ) e(r, t) = 0 , (5.2)

to satisfy Maxwell’s equations, where e(r, t) is the electric field vector as a func-
tion of time t and position r, and is determined by radial distance, polar an-
gle, and azimuthal angle indicated by r, θ, φ. The speed of light is indicated
by c, and ∇2 is the Laplacian operator discussed in Section 2.7.1. Solutions to
this equation are proportional to gradients of spherical harmonics [11] denoted
Yl,m (θ, φ), where l indicates the degree of the harmonic and m is the order of the
harmonic.
By observing various symmetries of the antenna and of the spherical harmonic function, one can determine the contributing degree and order. Here these observations are made without proof; more thorough discussions of spherical harmonics and their symmetries are presented in various texts [11]. Based on the axial symmetry of the antenna, the radiated power must be axisymmetric. Solutions with m ≠ 0 are a function of φ and therefore have azimuthal structure; consequently, order zero (m = 0) is required. Furthermore, the
value of the θ-direction component of {e}θ is the same under parity inversion
such that the direction vector r → −r. A result of this symmetry is that there
is a symmetry in the value eθ above and below the θ = π/2 plane. Spherical
harmonics that observe this symmetry require that only odd values of degree l
are allowed. Here it is assumed that the antenna is small compared with a wave-
length. Given the short length of the antenna compared to a wavelength, it is
difficult to induce complicated radial functions of current flow [15]. The coupling
of current to induce a particular spherical harmonic is proportional to the spher-
ical Bessel function jl (k d) [11], where k is the wavenumber and d is the distance
along the antenna. The l = 1 spherical Bessel function moves the quickest from
zero as a function of k d, and thus corresponds to the solution with the largest
current and thus radiated power. The lowest-order spherical harmonic satisfying
all the symmetries and radiating the most power for an electrically small dipole
is Yl=1,m =0 (θ, φ). Consequently, the electric field that is proportional to the gra-
dient of this function is proportional to sin(θ). The gain is therefore given by
Equation (5.1).
The notional isotropic antenna would radiate equal power in all directions.
Consequently, if the isotropic antenna and the small dipole antenna radiated
the same total power, such that the integrals over a surface at some distance r
are the same, then the peak gain Gdipole , which is at the horizon (θ = π/2), is
∼1.76 dBi,

Gdipole = sin²(π/2) / ⟨sin²(θ)⟩
        = sin²(π/2) ∫dφ dθ sin(θ) / ∫dφ dθ sin(θ) sin²(θ)
        = 3/2 ≈ 1.76 dBi . (5.3)

Figure 5.1 Radiation pattern of a small (length ≪ λ) dipole antenna.
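As a numerical check on Equation (5.3), the sketch below averages sin²(θ) over the sphere with the sin(θ) solid-angle weight (the step count is an arbitrary choice; the azimuthal integrals cancel):

```python
import math

steps = 20000
dtheta = math.pi / steps
# Midpoint-rule integrals over the polar angle theta.
num = sum(math.sin((i + 0.5) * dtheta) ** 3 for i in range(steps)) * dtheta
den = sum(math.sin((i + 0.5) * dtheta) for i in range(steps)) * dtheta
G_dipole = math.sin(math.pi / 2) ** 2 / (num / den)
print(G_dipole, 10 * math.log10(G_dipole))   # 1.5 and about 1.76 dBi
```

The average ⟨sin²(θ)⟩ evaluates to 2/3, so the peak gain is 3/2, i.e., about 1.76 dBi.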
Typically, the peak gain increases as the size of the antenna increases in units of
wavelength.
A second important concept is that the magnitude of the radiation pattern is
the same for transmitting and for receiving. This property is because Maxwell’s
equations are symmetric under time reversal [290, 154]. Consequently, a transmit
and receive pair of antennas observe reciprocity; that is, they will observe the
same channel and antenna gain when the link direction is reversed. If an antenna
tends to radiate power in a particular direction, then it will prefer receiving power
from that same direction. Signals being received from other directions will have
some relative attenuation.

5.2 Line-of-sight attenuation

The attenuation between two antennas in a line-of-sight environment in the far field is proportional to the gain of the transmitter times the effective area of the
receiver divided by the distance between antennas squared. The motivation for
this relationship is given by the effect of the density of the power spreading on the
surface of a sphere. For a given power transmitted in some direction (proportional
to antenna gain), the power received is then given by the product of the flux of
power in that direction times the effective area of the receive antenna, as seen in
Figure 5.2.
144 Simple channels

Surface
2
Area ~ r

Effective
Area Gain

Figure 5.2 Spherical propagation about transmitter.

5.2.1 Gain versus effective area


Consider a signal propagating in both directions between a pair of line-of-sight
antennas (antenna 1 and antenna 2). The transmitted and received powers are
denoted Pt,1 and Pt,2 , and Pr,1 and Pr,2 , respectively. The transmit gains and re-
ceive effective areas are denoted Gt,1 and Gt,2 , and Aeff ,1 and Aeff ,2 , respectively.
The gain is an indication of how much power is radiated or received in a partic-
ular direction. The concept of effective area is a large antenna approximation. If
one considered a very large antenna that was many wavelengths across in both
dimensions, then its ability to collect photons is essentially given by the physical
cross section upon which the photons impinge. As the antenna grows smaller,
the correspondence between physical area and effective area becomes less clear.
In the limiting case of a thin wire dipole antenna of finite length but very small
width, the effective area has little to do with the physical area. Nonetheless,
it is still a useful concept. The relationships between the transmit and receiver
powers are then given by

Pr,2 = α Aeff ,2 Gt,1 Pt,1


Pr,1 = α Aeff ,1 Gt,2 Pt,2 , (5.4)

where α is some constant that incorporates effects such as attenuation due to


propagation. From reciprocity, the attenuation for the link in either direction
must be the same. Consequently, the ratios of the transmit to receive powers
must be the same, Pr,2 /Pt,1 = Pr,1 /Pt,2 , or equivalently,

Aeff ,2 Gt,2
= . (5.5)
Aeff ,1 Gt,1

Thus, the effective area of an antenna in some direction is proportional to the gain of the antenna in that direction. The next question is to determine the constant of proportionality between the effective area of an antenna Aeff and the gain of that antenna G.

Figure 5.3 Antenna in thermal equilibrium: a load connected to an antenna.

To determine the constant of proportionality between the gain and effective area, a thermodynamic argument is invoked [236]. The effective area is defined
by the power received by the antenna divided by the power spectral-density flux
impinging upon the antenna,
Aeff = P / S , (5.6)
where P is the received power spectral density (in units of W/Hz) and S is
the power spectral-density flux (in units of W/(m2 · Hz) for example) under the
assumption that the polarization and direction maximize the received power. An
antenna is in thermal equilibrium with its background in a chamber, as seen in
Figure 5.3.
For radio frequencies, the total power spectral-density flux of blackbody radi-
ation Φf (in units of W/m2 /Hz, for example) can be approximated by [264]
Φf = 2 f² kB T / c² , (5.7)
which is known as the Rayleigh–Jeans law, where f is the frequency, c is the
speed of light, kB is the Boltzmann constant and T is the absolute temperature.
For the system to be in equilibrium, the incoming power and the outgoing power
must be equal. Because power is being received at the antenna from all directions,
P is the sum of power received from all directions, and the effective area and
blackbody power spectral-density flux are denoted Aeff (θ, φ) and S(θ, φ). The
differential of the solid angle is indicated by dΩ = dφ dθ sin(θ). The received
power spectral density is given by

P = ∫ dΩ Aeff(θ, φ) S(θ, φ)
  = ∫_0^{2π} dφ ∫_0^{π} dθ sin(θ) Aeff(θ, φ) (Φf/2) · 1 + ∫_0^{2π} dφ ∫_0^{π} dθ sin(θ) Aeff(θ, φ) (Φf/2) · 0 , (5.8)
where the first term on the right-hand side of Equation (5.8) includes the radia-
tion that matches the polarization of the antenna and the second term includes
the radiation that is orthogonal to the polarization of the antenna. Consequently,
the received power spectral density from the blackbody radiation is given by

P = (2 f² kB T)/(2 c²) ∫_0^{2π} dφ ∫_0^{π} dθ sin(θ) Aeff(θ, φ)
  = 4π (f² kB T / c²) ⟨Aeff(θ, φ)⟩ , (5.9)

where ⟨·⟩ denotes an average over solid angle.
As discussed in Section 4.2.2, at lower frequencies, the thermal power spectral
density due to the resistor and radiated by the antenna is given by

P = kB T . (5.10)

By equating the incoming and outgoing power, the average effective area is found
to be
⟨Aeff(θ, φ)⟩ = λ² / (4π) , (5.11)
by noting that the wavelength is given by λ = c/f . By construction, the average
gain is one:

⟨G(θ, φ)⟩ = 1 . (5.12)

Because the effective area and gain are proportional and their averages are de-
termined here, they are related by
Aeff(θ, φ) / (λ²/4π) = G(θ, φ) , (5.13)
under the assumption that their polarizations are matched.
When the direction parameters are dropped, maximum gain and effective area
are typically assumed, so that the gain and effective area are related by
G = 4π Aeff / λ² . (5.14)
As mentioned previously, the effective area of an antenna is somewhat different
from the physical area. For moderately sized antennas (a few wavelengths by
a few wavelengths), effective area is typically smaller than the physical area.
Effective area is difficult to interpret for wire antennas and electrically small
antennas, although if one imposes the constraint that no physical dimension can
be smaller than some reasonable fraction of a wavelength (something on the
order of 1/3), then it is at least somewhat consistent with the effective area.
While it is generally assumed that the gain indicates the peak gain from the
antenna exclusively, sometimes the inefficiencies due to impedance mismatches
between the amplifier or receiver and the antenna are included with the gain.
In this case, directionality is the gain of the ideal antenna, and the gain is the
product of the directionality and the adverse effects of inefficiencies [15].
5.2 Line-of-sight attenuation 147

Figure 5.4 Notional radiation beamwidth for antenna with lh × lw effective area.

5.2.2 Beamwidth
The beam shape is dependent upon the details of the antenna shape, but the
shape of the main lobe can be approximated by considering a rectangular an-
tenna with area Aeff ≈ A = lw lh , where lw and lh are the width and height
of the antenna, as seen in Figure 5.4. The beamwidths in each direction are approximately $\lambda/l_w$ and $\lambda/l_h$. For a square antenna, $l = l_w = l_h$, of gain $G$, the beamwidth $\Delta\theta$ in each direction is approximately
$$\Delta\theta \approx \frac{\lambda}{l} = \sqrt{\frac{4\pi}{G}} \,. \tag{5.15}$$
There are advantages and disadvantages to having higher-gain antennas. In a
line-of-sight propagation environment, the increased gain provides more signal
power at the receiver. However, this comes at the cost of requiring greater ac-
curacy in pointing the antennas. Furthermore, in non-line-of-sight environments
with significant multipath, there is no one good direction for collecting energy
from the impinging wavefronts. Collecting energy in complicated scattering en-
vironments is one of the advantages of using adaptive antenna arrays versus
high-gain antennas.
As an example, if we imagine a square 20 dBi antenna, the effective area is given by
$$A = \frac{G}{4\pi}\, \lambda^2 \approx 8\, \lambda^2 \tag{5.16}$$
and the approximate beamwidth is given by
$$\Delta\theta \approx \sqrt{\frac{4\pi}{G}} \approx 0.35 \ \text{rad} \,. \tag{5.17}$$
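The arithmetic in this example is easy to check numerically. The following sketch (the function name is ours, not from the text) converts a dBi gain into an effective area in units of $\lambda^2$, using Equation (5.14), and into the approximate beamwidth of Equation (5.15):

```python
import math

def antenna_from_gain(gain_dbi):
    """Effective area (in units of wavelength^2) and approximate beamwidth
    for a square-aperture antenna: A = G*lambda^2/(4*pi), and
    delta_theta ~ sqrt(4*pi/G)."""
    g = 10.0 ** (gain_dbi / 10.0)            # linear gain
    area_wavelengths2 = g / (4.0 * math.pi)  # effective area / lambda^2
    beamwidth_rad = math.sqrt(4.0 * math.pi / g)
    return area_wavelengths2, beamwidth_rad

area, bw = antenna_from_gain(20.0)           # the 20 dBi example
print(f"A = {area:.1f} lambda^2, beamwidth = {bw:.2f} rad")
```

For the 20 dBi example this reproduces $A \approx 8\,\lambda^2$ and $\Delta\theta \approx 0.35$ rad.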
In terrestrial communications, it is not uncommon for communication links to
be blocked by walls, buildings, or foliage. Even when the transmitter and receiver
have a direct path, the propagation is often complicated by the scatterers in
the environment. Nonetheless, line-of-sight propagation is useful as a reference
propagation loss.

For a line-of-sight environment, the received power $P_r$ is related to the transmit power $P_t$ by the free-space attenuation, which is given by
$$P_r = |a|^2 P_t \,, \qquad
|a|^2 = \frac{G_t\, A_{\mathrm{eff}}}{4\pi r^2}
= \frac{G_t\, G_r\, \lambda^2}{(4\pi r)^2} \,, \tag{5.18}$$
where a is the complex attenuation of the signal, Gt and Gr are the gains of the
transmit and receive antennas toward each other, and Aeff is the effective area
of the receive antenna. The distance from the transmitter to the receiver is r.

Geosynchronous orbit link


It is amusing to consider the link from a satellite in geosynchronous orbit to a ground-based receiver. Geosynchronous orbits match the orbital period to
the earth’s rotation. Geosynchronous orbits are convenient for communication
satellites because the satellites appear to be approximately stationary as the
earth rotates. Consequently, the pointing direction of the ground antenna can
be fixed.
We can find the channel attenuation between a geosynchronous satellite and a ground-based receiver for a few typical parameters. The altitude r of a
geosynchronous satellite is about 36 000 km, as seen in Figure 5.5. This is a
relatively long link. One of the digital TV bands occupies spectrum a little
above 10 GHz. Here we will pick 12 GHz or a wavelength λ of about 25 mm. At
this frequency, relatively small antennas can have fairly high gain. Compared to operation at lower carrier frequencies, it is relatively easy to achieve a 30 dBi gain Gr . Satellites’
antennas often have even higher gains. One important limitation is the broadcast
coverage area. If the gain is too high, the beam might not illuminate the required
area on the ground. If one desired a beam using a square antenna that covered
the continental USA, the peak gain Gt is about 28 dBi, although smarter beam
shaping could be used to increase the gain. In practice, a satellite could cover the
same region by using a larger antenna and multiple feeds. Each feed illuminates
the main antenna from a slightly different angle. Consequently, the antenna
illuminates a different region from each feed.
With the nominal parameters given in the previous paragraph, the attenuation
can be calculated. The attenuation through the channel is given by
$$\begin{aligned}
|a|^2 &= \frac{G_t\, G_r\, \lambda^2}{(4\pi r)^2} \\
&= 35 + 30 + 10\log_{10}\{\lambda^2\} - 10\log_{10}\{(4\pi)^2\} - 10\log_{10}\{r^2\} \ \text{[dB]} \\
&\approx 35 + 30 + (-32) - 22 - 151 = -140 \ \text{[dB]} \,, \tag{5.19}
\end{aligned}$$
where the arithmetic was performed on a decibel scale. The attenuation in a real
environment will be slightly worse because of atmospheric and other nonideal
effects.
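The decibel bookkeeping in Equation (5.19) can be reproduced directly from the free-space expression in Equation (5.18). This is a sketch using the example's nominal parameters, not a full link budget (no atmospheric or other nonideal loss terms):

```python
import math

def free_space_attenuation_db(gt_db, gr_db, freq_hz, range_m):
    """|a|^2 = Gt*Gr*lambda^2 / (4*pi*r)^2, evaluated in dB."""
    wavelength = 3.0e8 / freq_hz
    return (gt_db + gr_db
            + 20.0 * math.log10(wavelength)
            - 20.0 * math.log10(4.0 * math.pi * range_m))

# Nominal geosynchronous downlink: 35 dB + 30 dB gains, 12 GHz, 36 000 km
a2_db = free_space_attenuation_db(35.0, 30.0, 12.0e9, 36.0e6)
print(f"|a|^2 = {a2_db:.0f} dB")  # about -140 dB
```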

Figure 5.5 Notional geometry of satellite broadcast received on earth.

5.3 Channel capacity

The single-input single-output (SISO) channel link capacity was described by


Claude Shannon [284, 68]. The capacity provides a theoretical bound on the rate at which data can be transmitted through a noisy channel with an arbitrarily low error rate. Before this result, it was unclear if essentially
error-free performance was achievable given some positive channel noise. The
bound does not provide guidance on how to design a practical link. However, the
notion that the bound is achievable has driven communication systems ever since.
The bound is based on a few simple ideas. The transmit signal as a function
of time is given by s(t). The signal at the receiver has some average power.
The channel, represented by the complex attenuation a, is known. Added to the
communication signal is additive noise as a function of time n(t). The received
signal as a function of time z(t) is given by

$$z(t) = a\, s(t) + n(t) \,. \tag{5.20}$$

Next, two approaches for motivating channel capacity are discussed. The first
is geometric. The second is based on the concept of mutual information. For a
more thorough discussion see [68, 314], and the original discussion [284].

5.3.1 Geometric interpretation


A heuristic approach to motivate channel capacity can be constructed geometrically. Consider a sequence of $n_s$ transmitted symbols, $s(t_0), s(t_1), \ldots, s(t_{n_s-1})$, and corresponding received signals, $z(t_0), z(t_1), \ldots, z(t_{n_s-1})$. This sequence is used to construct a codebook of allowed transmitted symbol sequences such that there is little probability of confusing one entry in the codebook with another
entry even in the presence of the additive complex Gaussian noise. The question
is, how many different distinguishable sequences can exist for a given number of
symbols, signal power, and noise power?
As a simple example, consider a system in which there are only two transmitted
symbols, s(t0 ) = ±2, and s(t1 ) = ±2. Furthermore, consider a very strange
non-Gaussian noise structure such that the values {−3/2, −1/2, 1/2, 3/2} can be

added to s(t0 ), and s(t1 ) with equal probability. The complete list of 64 possible
channel output states is displayed in Figure 5.6. The four values of the s(t0 ),
s(t1 ) pair are represented by the dots, and all of the potential output states of
z(t0 ), z(t1 ) are represented by the intersections of the grid lines.

• The total number of states is 64.
• The number of noise states is 16.
• The number of information states is 4 by construction.

Because of the careful construction of the noise, all four of these states can
be recovered. Consequently, the system can reliably communicate 2 bits. The
potential number of useful information bits can also be seen by considering the
entropy of the received signal and the noise. By analogy to statistical mechanics,
the entropy with equally likely states is given by the logarithm of the number of
potential states. Thus, the entropy associated with z(t0 ), z(t1 ), denoted here as
Hz , is given by

$$H_z = \log_2(64) = 6 \,, \tag{5.21}$$

and the entropy associated with noise, denoted here as Hn , is given by

$$H_n = \log_2(16) = 4 \,. \tag{5.22}$$

The total number of information bits or the capacity C of this channel is given
by

$$C = H_z - H_n = 6 - 4 = 2 \,. \tag{5.23}$$
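The state counting in this toy example can be verified by brute force. The following sketch enumerates the channel outputs and recovers $C = 2$ bits:

```python
import math
from itertools import product

symbols = (-2.0, 2.0)            # allowed values of s(t0) and s(t1)
noise = (-1.5, -0.5, 0.5, 1.5)   # the contrived equally likely noise values

# Every possible received pair (z(t0), z(t1))
outputs = {(s0 + n0, s1 + n1)
           for s0, s1 in product(symbols, repeat=2)
           for n0, n1 in product(noise, repeat=2)}

Hz = math.log2(len(outputs))     # entropy of the received pair
Hn = math.log2(len(noise) ** 2)  # entropy of the noise pair
print(len(outputs), Hz, Hn, Hz - Hn)  # 64 states, 6 - 4 = 2 bits
```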

If one thinks of the sequence of transmitted symbols as occupying a complex


ns -dimensional space, the maximum number of codewords (useful transmitted
symbol sequences) is given by the ratio of the total number of states to the number of states occupied by the noise. Finding the number of codewords in
this manner assumes a best case packing of noise and symbol spacing (as seen
in Figure 5.6). For continuous distributions, a state is a notional concept, con-
venient for developing a geometric understanding. A state is occupied if a draw
from a random distribution lies within some differential hypervolume around a
given point. For states to be occupied with equal probability, the differential
hypervolume must be proportional to the inverse of the probability density at
the local value.
The total number of receive states is a function of the received signal power
and the noise power. One can think of a particular codeword as a vector in
the ns -dimensional space and the noise as a fuzzy ball around the endpoint of
the vector. We assume that the fuzzy ball is given by a Gaussian distribution, a choice that is discussed with greater precision in the next section. For any finite
value of ns complex symbols, the Gaussian fuzzy ball has an arbitrarily large ex-
tent. However, as ns → ∞, the sphere associated with the Gaussian distribution


Figure 5.6 Simple entropy example. There are 16 possible noise states and 64 possible
received states.

hardens. The notion of hardening indicates that the fluctuation about a central
value decreases as the number of symbols increases. Consequently, essentially
all draws from the Gaussian distribution will be arbitrarily close to the surface
of the sphere. As the number of complex symbols ns increases, the probability
of large fluctuations in noise distance decreases. A second implication of con-
sidering the large dimensional limit is that the ns -dimensional hypervolume is
dominated by the contributions at the surface, that is the vectors associated with
codewords are nearly always close to the surface. Thus, by assuming that the
distribution of codewords is statistically uniform across the surface, the number
of states is proportional to the volume. The capacity is given by the ratio of the volume of the hypersphere that signal-plus-noise vectors occupy to the volume of the hypersphere that noise vectors occupy. It is shown later in this section that this packing is achievable. A set of codewords generated by drawing from a complex Gaussian distribution satisfies the requirements of sphere hardening
and uniformity. In Section 5.3.2, the optimality of the Gaussian distribution is
discussed.

Sphere hardening
The magnitude-squared norm (here denoted $x$) of an $n$-dimensional complex noise vector $\mathbf{n}$, with elements sampled from a zero-mean complex Gaussian distribution with variance $\sigma^2$, is given by
$$x = \|\mathbf{n}\|^2 = \sum_m |\{\mathbf{n}\}_m|^2 \,. \tag{5.24}$$
m

The probability distribution for the random variable $x$ (the magnitude-squared norm of $\mathbf{n}$) is given by the complex $\chi^2$ distribution $f_{\chi^2_{\mathbb{C}}}(x; n, \sigma^2)$, discussed in Section 3.1.11, with a variance of $\sigma^2$. The mean of the magnitude-squared norm $x$ for the distribution $f_{\chi^2_{\mathbb{C}}}(x; n, \sigma^2)$ is given by
$$\langle x \rangle = \int_0^\infty dx\; x\, f_{\chi^2_{\mathbb{C}}}(x; n, \sigma^2) = n\, \sigma^2 \,. \tag{5.25}$$

The variance of the magnitude-squared norm $x$ is given by
$$\langle x^2 \rangle - \langle x \rangle^2 = \int_0^\infty dx\; x^2 f_{\chi^2_{\mathbb{C}}}(x; n, \sigma^2) - n^2 \sigma^4 = n(n+1)\,\sigma^4 - n^2 \sigma^4 = n\, \sigma^4 \,. \tag{5.26}$$

For a fuzzy Gaussian ball, the square of the radius is given by $\langle x \rangle$, and the standard deviation of the fluctuation about the mean of $x$ is given by the square root of the variance of $x$. The fuzziness ratio, indicated by the standard deviation relative to the mean square of the radius, is given by
$$\frac{\sqrt{n\,\sigma^4}}{n\,\sigma^2} = \frac{1}{\sqrt{n}} \to 0 \quad \text{as } n \text{ becomes large.} \tag{5.27}$$
Because the ratio goes to zero, it is said that the noise sphere has hardened. This
effect can be observed in Figure 5.7. As n becomes larger, the density function
becomes sharply peaked about the mean. For larger values of n, the noise about
the codeword vector is modeled well by a hard sphere.
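Sphere hardening is easy to observe by simulation. This sketch estimates the fuzziness ratio of Equation (5.27) by Monte Carlo (the sample counts are arbitrary choices):

```python
import math
import random

def fuzziness_ratio(n, trials=4000, sigma2=1.0):
    """Estimate std/mean of the squared norm of an n-dimensional complex
    Gaussian vector; theory predicts 1/sqrt(n)."""
    s = math.sqrt(sigma2 / 2.0)  # std of each real component
    norms = []
    for _ in range(trials):
        x = sum(random.gauss(0.0, s) ** 2 + random.gauss(0.0, s) ** 2
                for _ in range(n))
        norms.append(x)
    mean = sum(norms) / trials
    var = sum((v - mean) ** 2 for v in norms) / trials
    return math.sqrt(var) / mean

random.seed(0)
for n in (10, 100):
    print(n, fuzziness_ratio(n))  # shrinks roughly like 1/sqrt(n)
```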

Volume of hypersphere
For some point in a $k$-dimensional real space, the volume of the hypersphere of some radius $r$ is given by [334]
$$V^{(\mathbb{R})}(k, r) = \frac{\pi^{k/2}}{\Gamma[k/2 + 1]}\, r^k \,. \tag{5.28}$$

For some point in an $m$-dimensional complex space, the volume of the hypersphere is given by
$$V(m, r) = \frac{\pi^m}{\Gamma[m+1]}\, r^{2m} = \frac{\pi^m}{m!}\, r^{2m} \,. \tag{5.29}$$
Note, this is not the volume of an $m$-dimensional hypersphere in a real space because of the doubling of dimensions due to the complex space.


Figure 5.7 Probability density function for fχC2 (x; n, 1/n), with n = 10, 20, 40, . . . , 100.

The volume $V_n$ of the fuzzy complex noise ball for large $m$ is approximated by
$$V_n(m, \sigma) \approx \frac{\pi^m}{m!}\, \langle x \rangle^m = \frac{\pi^m}{m!}\, (\sigma^2 m)^m \,, \tag{5.30}$$

where x denotes the magnitude-squared norm of the noise used in the previous
section. By using essentially the same argument and by noting that the signal and
noise are independent, the variances of the noise and the signal power add. The
volume of the hypersphere associated with the received signal $V_z$ is approximated by
$$V_z(m, \sigma^2 + P_r) \approx \frac{\pi^m}{m!}\, \big([\sigma^2 + P_r]\, m\big)^m \,, \tag{5.31}$$

where Pr is the average receive signal power observed at the receiver in the
absence of noise.

Geometric capacity construction


To simplify the discussion, it is assumed that $a = 1$ in Equation (5.20) without loss of generality. For a large $n_s$-dimensional complex space, corresponding to the $n_s$ complex symbols transmitted, the number of separable codewords $n_{\mathrm{code}}$ can be bounded by the number of fuzzy noise balls that fit into the total hypervolume. This number can be approximated by the ratio of the volumes of the hyperspheres


Figure 5.8 Channel capacity in terms of bits per symbol as a function of SNR.

associated with the noise-plus-signal power to the noise power,
$$n_{\mathrm{code}} \leq \frac{V_z(n_s, \sigma^2 + P_r)}{V_n(n_s, \sigma)} \leq \frac{(\sigma^2 + P_r)^{n_s}}{(\sigma^2)^{n_s}} \,, \tag{5.32}$$
in the limit of a large number of symbols $n_s$. Thus, the upper limit on the number of bits per complex symbol $c$, which is an outer bound on capacity, is given by the $\log_2$ of the number of possible codewords,
$$c = \frac{1}{n_s} \log_2(n_{\mathrm{code}}) = \log_2\left(1 + \frac{P_r}{\sigma^2}\right) \,. \tag{5.33}$$
The ratio of the signal power $P_r$ to noise power $\sigma^2$ is the SNR. The channel capacity as a function of SNR is displayed in Figure 5.8.
This geometric argument sets an upper bound on the data rate. It is based on
some notion of perfectly packing the noise spheres such that there is no potential
confusion between symbols. Somewhat surprisingly, this bound is theoretically
achievable in the limit of a large dimensional space.
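Equation (5.33) is simple enough to tabulate directly; this sketch reproduces the curve displayed in Figure 5.8:

```python
import math

def capacity_bits_per_symbol(snr_db):
    """c = log2(1 + SNR) for a complex baseband symbol, Equation (5.33)."""
    return math.log2(1.0 + 10.0 ** (snr_db / 10.0))

for snr_db in (-10, 0, 10, 20):
    print(f"{snr_db:>4} dB -> {capacity_bits_per_symbol(snr_db):.2f} b/symbol")
```

At 0 dB SNR the capacity is exactly 1 bit per complex symbol, and at high SNR each additional 3 dB buys roughly one more bit.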

Achievability
To demonstrate that the capacity bound is asymptotically achievable, a partic-
ular implementation is proposed. For a given data rate, the probability of error
must go to zero as the number of symbols goes to infinity. The effect of sphere
hardening is exploited here. Consider a random code over m complex symbols
with ncode codewords constructed using vectors drawn randomly from a complex
Gaussian distribution. These codewords would randomly fill the space (lying
near the surface with high probability). Given that a particular codeword was


Figure 5.9 Notional representation of two symbols. The first symbol is displayed
including some distribution of noise. The second symbol is potentially confused with
the first symbol. In this case, the second symbol does not cause confusion because it
is outside the noise volume of the first symbol.

transmitted, the probability $p^{(1)}_{\mathrm{err}}$ that another randomly generated codeword would fall within a confusion distance (determined by the noise power) is given by the ratio of the volume of the hypersphere corresponding to the noise to the volume of the hypersphere corresponding to the signal plus noise,
$$p^{(1)}_{\mathrm{err}} = \frac{V_n(m, \sigma^2)}{V_z(m, \sigma^2 + P_r)} = \left(\frac{\sigma^2}{P_r + \sigma^2}\right)^m \,, \tag{5.34}$$

where the equality is asymptotically valid using the sphere-hardening approximation.
By employing the union bound (which exploits the observation that summing the probabilities of pairwise errors as if they were independent events produces an upper bound on the true error probability), the probability that any of the $n_{\mathrm{code}} - 1$ erroneous codewords might lie within the noise sphere of the correct codeword is bounded by
$$p_{\mathrm{err}} \leq (n_{\mathrm{code}} - 1) \left(\frac{\sigma^2}{P_r + \sigma^2}\right)^m < n_{\mathrm{code}} \left(\frac{\sigma^2}{P_r + \sigma^2}\right)^m \,. \tag{5.35}$$

By using the definition that the coding rate $r$ is the number of bits encoded by the codebook normalized by the $m$ complex symbols used,
$$r = \frac{\log_2(n_{\mathrm{code}})}{m} \,, \tag{5.36}$$

the probability of error is bounded by
$$p_{\mathrm{err}} < 2^{r m} \left(\frac{\sigma^2}{P_r + \sigma^2}\right)^m = 2^{m\,(r - \log_2[1 + P_r/\sigma^2])} \,, \tag{5.37}$$
where the relationship $x = 2^{\log_2(x)}$ is exploited.
The error is driven to zero as the exponent on the right-hand side of Equation (5.37) becomes large and negative. For $P_r > 0$, the exponent of the right-hand side tends to $-\infty$ as $m \to \infty$, if
$$r < \log_2\left(1 + \frac{P_r}{\sigma^2}\right) = c \,. \tag{5.38}$$
Thus, given the Gaussian random code construction, error-free decoding is possible as $m \to \infty$ for rates approaching the channel capacity
$$c = \log_2\left(1 + \frac{P_r}{\sigma^2}\right) \,. \tag{5.39}$$
Here the capacity has been developed under the assumption of a complex
baseband signal. Consequently, there are two orthogonal degrees of freedom (real
and imaginary). In other texts, the capacity is sometimes developed under the
assumption of real variables. In that case, the capacity is half what is displayed
in Equation (5.39). In addition, the real and imaginary components of the noise
would be considered separately. The variance of the real or imaginary component
of the noise would then be half of the complex noise.
In this chapter, c is used to represent the capacity in terms of bits/symbol or
in terms of bits/s/Hz. Elsewhere in the text, c is used to represent the speed of
light. Hopefully, the usage will be obvious from context and will not cause any
confusion.
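The exponential decay of the union bound in Equation (5.37) for rates below capacity is easy to see numerically. This sketch uses an arbitrarily chosen rate and SNR:

```python
import math

def union_bound(m, rate, snr):
    """Upper bound 2^{m (r - log2(1 + SNR))} on the probability of codeword
    error for a rate-r random Gaussian code over m complex symbols,
    Equation (5.37)."""
    c = math.log2(1.0 + snr)
    return 2.0 ** (m * (rate - c))

# rate = 2 b/symbol is below c = log2(11) ~ 3.46 b/symbol at 10 dB SNR
for m in (10, 50, 100):
    print(m, union_bound(m, rate=2.0, snr=10.0))
```

For any rate below capacity the bound shrinks exponentially in the block length $m$, which is the content of Equation (5.38).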

5.3.2 Mutual information


In previous sections, channel capacity was discussed in terms of a geometric
argument. A description of channel capacity in terms of mutual information
and entropy is provided here. The capacity is the mutual information with the
optimal input probability distribution for a given noise distribution and for a
given set of channel constraints [68]. In the channel model used in the previous
section,
$$z = a s + n \,, \tag{5.40}$$
it is convenient to set the units of power such that the coefficient a is 1. Here
we have suppressed the explicit dependence upon time. Throughout this section,
more care is taken notationally with regard to random variables than is taken in
most of the text. A random variable X is indicated with an uppercase character.

Some instance of that variable x is drawn from the probability distribution asso-
ciated with X. Throughout this section, it is assumed that the random variables
are complex.
The maximum information per symbol (or data rate) is given by the mutual information $I(S; Z)$ between the random variables $S$ and $Z$ when the distribution for $S$ is optimized. The mutual information is given by
$$I(S; Z) = \int d^2s\, d^2z\; p(s, z) \log_2\left[\frac{p(s, z)}{p(s)\, p(z)}\right] = h(Z) - h(Z|S) = h(S) - h(S|Z) \,, \tag{5.41}$$

where the temporal parameter t of the random variables has been suppressed
for s and z, and the differential area in the complex space d2 s is described in
Section 2.9.2. Differential entropy for some random variable is indicated by h(·).
The conditional differential entropy is indicated by h(·|·), where the second term
is the constraining condition. The joint probability distribution of S and Z is
indicated by $p(s, z)$. While formally the notation $p_{S,Z}(s, z)$ might be clearer, it is assumed that dropping the subscripts will not cause confusion. Similarly, the
probability distributions of S and Z are indicated by p(s) and p(z), respectively.
The differential entropy h(·) and conditional differential entropy h(·|·) are
named, making a connection with statistical mechanics [264]. The use of the
modifier “differential” indicates that this is the entropy used for continuous ran-
dom variables. While the derivation will not be presented explicitly, the moti-
vation for the mutual information being given by the difference in the entropy
terms is directly related to the geometric discussion in the previous section. In
statistical thermodynamics, entropy is proportional to the log of the number of
possible states Ω. Each state is assumed to be equally likely with probability
$p = 1/\Omega$. Consequently, statistical mechanical entropy is given by
$$k_B \log \Omega = k_B \log \frac{1}{p} = -k_B \log p \,, \tag{5.42}$$

where kB is the Boltzmann constant. This expression is a measure of entropy


in units of joules/kelvin, which relates energy with temperature. In information
theory, it is convenient to ignore the energy discussion and use base 2 rather
than the natural logarithm because information is typically measured in terms
of the number of bits. In addition, the constant of proportionality is dropped. It
is worth noting that in the literature the natural log is used sometimes, and the
units of information are given in “nats.” Specifically, the log2 is replaced with a
natural log in Equation (5.39). A single nat is equivalent to 1/ log(2) ≈ 1.44 bits.
In this text bits are preferred. Unlike in the typical statistical thermodynamics
discussion, the probability of each state may not be equally likely. The entropy
of a random variable X is the expected value of the number of bits required to

specify a state taken over all values of $X$,
$$h(X) = \left\langle \log_2 \frac{1}{p(x)} \right\rangle \,. \tag{5.43}$$

For continuous variables, $h(X)$ is called the differential entropy and is given by
$$h(X) = \int d^2x\; p(x) \log_2\left[\frac{1}{p(x)}\right] = -\int d^2x\; p(x) \log_2[p(x)] \,. \tag{5.44}$$

Similarly, conditional entropy is given by
$$h(X|Y) = \left\langle \log_2 \frac{1}{p(x|y)} \right\rangle = -\int d^2x\, d^2y\; p(x, y) \log_2[p(x|y)] \,, \tag{5.45}$$

where $p(x|y)$ is the probability density of $x$ assuming a given value for $y$. The difference between the entropy and the conditional entropy is given by
$$\begin{aligned}
h(X) - h(X|Y) &= -\int d^2x\; p(x) \log_2[p(x)] + \int d^2x\, d^2y\; p(x, y) \log_2[p(x|y)] \\
&= -\int d^2x\, d^2y\; p(x, y) \log_2[p(x)] + \int d^2x\, d^2y\; p(x, y) \log_2\left[\frac{p(x, y)}{p(y)}\right] \\
&= \int d^2x\, d^2y\; p(x, y) \log_2\left[\frac{p(x, y)}{p(x)\, p(y)}\right] \\
&= I(X; Y) \,, \tag{5.46}
\end{aligned}$$

where the relationship $p(x, y) = p(x|y)\, p(y)$ is employed. By observing the symmetry between $x$ and $y$ in Equation (5.46), it can be seen that the mutual information is also given by
$$I(X; Y) = h(Y) - h(Y|X) = h(X) - h(X|Y) \,. \tag{5.47}$$

The above form is somewhat satisfying heuristically. If the entropy is expressed


in units of bits, then the entropy is the average number of bits required to specify

the state of a random variable. If Z is the observed random variable and S is


the source random variable, then h(Z) is the average number of bits required to
specify the observed state, and h(Z|S) is the average number of bits required
to specify the observed state if the source state is known (this is the average
number of bits required to specify the noise). Consequently, the difference must
be the number of information bits that can be communicated per symbol. The
capacity c is given by maximizing the mutual information with respect to the
transmit probability distribution p(s),

$$c = \max_{p(s)} I(S; Z) \,. \tag{5.48}$$
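For discrete random variables the analogous definitions involve sums rather than integrals, and the mutual information is a short computation. The following sketch (a discrete analog of the definitions above, with a binary symmetric channel as a hypothetical example) evaluates $I(X;Y)$ from a joint probability table:

```python
import math

def mutual_information_bits(p_joint):
    """I(X;Y) = sum_{x,y} p(x,y) log2[ p(x,y) / (p(x) p(y)) ] for a joint
    pmf given as a nested list p_joint[x][y]."""
    px = [sum(row) for row in p_joint]                # marginal of X
    py = [sum(col) for col in zip(*p_joint)]          # marginal of Y
    total = 0.0
    for i, row in enumerate(p_joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:
                total += pxy * math.log2(pxy / (px[i] * py[j]))
    return total

# Binary symmetric channel with crossover 0.1 and a uniform input
eps = 0.1
joint = [[0.5 * (1 - eps), 0.5 * eps],
         [0.5 * eps, 0.5 * (1 - eps)]]
print(mutual_information_bits(joint))  # about 0.531 bits
```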

5.3.3 Additive Gaussian noise channel


By using the definition in Equation (5.20), the three random variables are given
by the received signal Z, the transmitted signal S, and the complex Gaussian
noise N associated with n. By using Equation (5.47), the mutual information
between the channel input S and the received signal Z is given by

$$I(S; Z) = h(Z) - h(Z|S) \,. \tag{5.49}$$

The differential entropy $h(Z|S)$ is simply the differential entropy associated with the noise, $h(N)$,
$$h(Z|S) = h(S + N|S) = h(N) \,. \tag{5.50}$$

This differential entropy evaluation can be seen directly by noting that, under the change of variables $z = s + n$, the probability $p(s + n|s) = p(n)$,
$$\begin{aligned}
h(Z|S) &= -\int d^2s\, d^2z\; p(z, s) \log_2[p(z|s)] \\
&= -\int d^2s\, d^2n\; p(s + n|s)\, p(s) \log_2[p(s + n|s)] \\
&= -\int d^2s\, d^2n\; p(n)\, p(s) \log_2[p(n)] \\
&= -\int d^2n\; p(n) \log_2[p(n)] \\
&= h(N) \,. \tag{5.51}
\end{aligned}$$

Here we use the observation that the integrals over p(s + n|s) and p(n) are the
same even though the distributions themselves are not the same. The differential

entropy for the Gaussian noise can be evaluated directly:
$$\begin{aligned}
h(N) &= -\int d^2n\; p(n) \log_2[p(n)] \\
&= -\int d^2n\; \frac{e^{-|n|^2/\sigma_n^2}}{\pi \sigma_n^2} \log_2\!\left(\frac{e^{-|n|^2/\sigma_n^2}}{\pi \sigma_n^2}\right) \\
&= \int d^2n\; \frac{e^{-|n|^2/\sigma_n^2}}{\pi \sigma_n^2} \left( \log_2[\pi \sigma_n^2] + \log_2(e)\, \frac{|n|^2}{\sigma_n^2} \right) \\
&= \log_2[\pi \sigma_n^2] + \log_2(e) \int d^2n\; \frac{e^{-|n|^2/\sigma_n^2}}{\pi \sigma_n^2}\, \frac{|n|^2}{\sigma_n^2} \\
&= \log_2[\pi \sigma_n^2] + \log_2(e) \\
&= \log_2[\pi\, \sigma_n^2\, e] \,. \tag{5.52}
\end{aligned}$$
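The closed form $h(N) = \log_2(\pi \sigma_n^2 e)$ can be checked by Monte Carlo, since the differential entropy is just the expectation of $-\log_2 p(n)$. A sketch (the sample count is an arbitrary choice):

```python
import math
import random

def complex_gaussian_entropy_mc(sigma2, trials=200000):
    """Monte Carlo estimate of h(N) = E[-log2 p(N)] for zero-mean complex
    Gaussian noise with variance sigma2."""
    s = math.sqrt(sigma2 / 2.0)  # std of each real component
    total = 0.0
    for _ in range(trials):
        nr, ni = random.gauss(0.0, s), random.gauss(0.0, s)
        p = math.exp(-(nr * nr + ni * ni) / sigma2) / (math.pi * sigma2)
        total -= math.log2(p)
    return total / trials

random.seed(1)
print(complex_gaussian_entropy_mc(1.0))  # estimate
print(math.log2(math.pi * math.e))       # exact: about 3.094 bits
```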

The capacity is given by finding the probability distribution that maximizes


the mutual information between the received signal Z and the transmitted signal
S in Equation (5.49). To find this distribution, the calculus of variations is em-
ployed as discussed in Section 2.12.3. To be meaningful, this maximization must
have some physical constraints. In particular, if the power is allowed to go to
infinity, then the capacity can be infinite. The mutual information is maximized
under an average power constraint P0 ,
$$\big\langle |s|^2 \big\rangle \leq P_0 \,. \tag{5.53}$$

The mutual information is maximized by maximizing the entropy of the received signal $Z$ under the average power constraint. The entropy is given by
$$h(Z) = -\int d^2z\; p(z) \log_2[p(z)] \,. \tag{5.54}$$

For some random complex variable $X$, the distribution $p(x)$ that maximizes the entropy can be found using Lagrangian constrained optimization. There are three constraints that $p(x)$ must satisfy. The first two are the basic constraints on a probability density,
• $p(x) \geq 0$
• $\int d^2x\; p(x) = 1$,
and, for the problems under consideration here, there is an average power constraint on $X$ such that $\langle |x|^2 \rangle = \sigma_x^2$,
• $\int d^2x\; |x|^2\, p(x) = \sigma_x^2$.

The variational functional $\phi$, constrained by the Lagrange multipliers $\lambda_1$ and $\lambda_\sigma$, is
$$\phi = \int d^2x\; p(x) \log_2[p(x)] - \lambda_1 \left( \int d^2x\; p(x) - 1 \right) - \lambda_\sigma \left( \int d^2x\; |x|^2\, p(x) - \sigma_x^2 \right) \,. \tag{5.55}$$

The optimal probability density is found by setting the total variation of the functional $\delta\phi$ to zero,
$$\delta\phi = \int d^2x \left( \log_2[p(x)] + \frac{1}{\log 2} - \lambda_1 - \lambda_\sigma |x|^2 \right) \delta p(x) = 0 \,, \tag{5.56}$$
where the relationship $\log_2(x) = \log(x)/\log(2)$ is used. By solving for $p(x)$ such that the total variation is zero for an arbitrary variation $\delta p(x)$, the following form is found,
$$\begin{aligned}
0 &= \log_2[p(x)] + \frac{1}{\log 2} - \lambda_1 - \lambda_\sigma |x|^2 \\
p(x) &= e^{\log 2\,(\lambda_1 + \lambda_\sigma |x|^2) - 1} = a\, e^{\lambda_\sigma |x|^2} \,, \tag{5.57}
\end{aligned}$$
where $a$ is a constant incorporating $e^{\log 2\, \lambda_1 - 1}$, and the factor of $\log 2$ multiplying $\lambda_\sigma$ has been absorbed into a redefinition of $\lambda_\sigma$ (the multipliers are arbitrary until the constraints are applied). Applying the constraints to determine


λ1 and λσ , a distribution p(x) that maximizes the entropy is found to be the
complex Gaussian,

$$1 = \int d^2x\; a\, e^{\lambda_\sigma |x|^2} = -a\, \frac{\pi}{\lambda_\sigma}
\quad \Rightarrow \quad a = -\frac{\lambda_\sigma}{\pi} \,, \tag{5.58}$$
and, by using the notation $x = x_r + i x_i$,
$$\begin{aligned}
\sigma_x^2 &= -\frac{\lambda_\sigma}{\pi} \int d^2x\; |x|^2\, e^{\lambda_\sigma |x|^2} \\
&= -\frac{\lambda_\sigma}{\pi} \int dx_r\, dx_i\; (x_r^2 + x_i^2)\, e^{\lambda_\sigma (x_r^2 + x_i^2)} \\
&= -2\, \frac{\lambda_\sigma}{\pi} \int dx_i\; e^{\lambda_\sigma x_i^2} \int dx_r\; x_r^2\, e^{\lambda_\sigma x_r^2} \\
&= -2\, \frac{\lambda_\sigma}{\pi} \left( \frac{\sqrt{\pi}}{\sqrt{-\lambda_\sigma}} \right) \left( \frac{\sqrt{\pi}}{2\,(-\lambda_\sigma)^{3/2}} \right) = -\frac{1}{\lambda_\sigma} \\
&\Rightarrow \lambda_\sigma = -\frac{1}{\sigma_x^2} \,. \tag{5.59}
\end{aligned}$$


Figure 5.10 Discrete memoryless channel with random state.

Consequently, the distribution achieving the maximum differential entropy is the complex Gaussian,
$$p(x) = \frac{1}{\pi \sigma_x^2}\, e^{-|x|^2/\sigma_x^2} \,. \tag{5.60}$$
Therefore, the distribution for $Z$ that maximizes the mutual information is Gaussian. Because the noise is Gaussian by construction, the distribution for $S$ must also be Gaussian. Because the variance of the sum of independent Gaussian variables is the sum of the variances, the variance of $Z$ is the sum of the received signal power and noise power, $P_r + \sigma_n^2$, and the channel capacity bound is given by
$$\begin{aligned}
I(S; Z) &= h(Z) - h(Z|S) \\
&= h(Z) - h(N) \\
&= \log_2[\pi\, (P_r + \sigma_n^2)\, e] - \log_2[\pi\, \sigma_n^2\, e] \\
&= \log_2\left(1 + \frac{P_r}{\sigma_n^2}\right) \,. \tag{5.61}
\end{aligned}$$
Because the Gaussian distribution maximizes entropy, Gaussian noise also has
the greatest detrimental effect on the mutual information observed in Equation
(5.49).

5.3.4 Additive Gaussian noise channel with state


Another canonical channel in information theory is the discrete memoryless chan-
nel (DMC) with random state known at the encoder. For a single-channel use,
this channel can be described by the following equation and Figure 5.10:
$$z = s + t + n \,, \tag{5.62}$$
where n ∼ N (0, σ 2 ) is additive noise distributed as a zero-mean, Gaussian
random variable of variance σ 2 , s is the transmitted symbol, and t is an in-
terfering signal (also known as the state) that is known noncausally at the
transmitter.

While it may seem unrealistic that the transmitter knows the interfering sig-
nal perfectly, there are many situations in which this model is applicable. For
instance, in broadcast channels where one transmitter has two different informa-
tion symbols to send to two different receivers, the signal intended for a particular
receiver is interference to an unintended receiver. Moreover, since the transmit-
ter is the source of both symbols, it must know the interfering signal perfectly.
Another possible application is in intersymbol-interference (ISI) channels where
successive symbols interfere with one another because of dispersion by the chan-
nel.
The capacity of this channel when the interfering signal t is not necessarily
Gaussian has been found by Gel’fand and Pinsker [108]. To achieve capacity, an
auxiliary random variable U is used to aid in the encoding of the message via a
binning strategy.
Suppose that the transmitter wishes to send a message m (see Figure 5.10) to
the receiver. In the canonical additive-white-Gaussian-noise (AWGN) channel,
the transmitter will transmit an ns -symbol-long codeword, which is used to rep-
resent the message m, whereby each message maps to a single codeword. In the
Gel’fand–Pinsker scheme, each message m maps to several possible codewords,
each of length ns symbols. All the codewords corresponding to a particular mes-
sage m are said to come from the same bin. The codeword that is ultimately
selected to be transmitted is based on the value of the ns state symbols that
will occur during the transmission of the ns symbols that represent the message
m. Hence, since the transmitted codeword is dependent on the state symbols
that will occur during the transmission of that codeword, the transmitter can
precompensate for the effect of the interfering state symbols. By using random,
jointly typical coding and decoding (see, for example, Reference [68]), Gel’fand
and Pinsker show that the capacity of this channel is

$$C = \max_{p(u,s|t)} \left\{ I(U; Z) - I(U; T) \right\} \,, \tag{5.63}$$

where U is an auxiliary random variable that is used to generate the codewords.


A detailed treatment of the general result of Gel’fand–Pinsker is beyond the
scope of this text and can be found in specialized texts on information theory
such as [68]. However, an example based on systems with Gaussian-distributed
noise, and Gaussian-distributed state variables, called dirty-paper coding (DPC)
(introduced in Reference [67]) can be used to illustrate the Gel’fand–Pinsker
result.
Dirty-paper coding or Costa precoding is a technique used to select an appropriate auxiliary random variable $U$. Costa finds that $U = S + \mu T$ can be used to achieve capacity when the noise random variable $N$ and the interfering signal random variable $T$ are Gaussian with variances $\sigma_n^2$ and $\sigma_t^2$, respectively. Using this choice of $U$, observe that
$$\begin{aligned}
I(U; Z) &= H(Z) - H(Z|U) \\
&= H(S + T + N) - H(S + T + N \,|\, S + \mu T) \\
&= H(S + T + N) - H(S + T + N, S + \mu T) + H(S + \mu T) \\
&= \frac{1}{2} \ln\left[ \frac{(P + \sigma_n^2 + \sigma_t^2)(P + \mu^2 \sigma_t^2)}{P \sigma_t^2 (1 - \mu)^2 + \sigma_n^2 (P + \mu^2 \sigma_t^2)} \right] \tag{5.64}
\end{aligned}$$
and that, by making $S$ Gaussian,
$$I(U; T) = \frac{1}{2} \ln\left( 1 + \mu^2\, \frac{\sigma_t^2}{P} \right) \,. \tag{5.65}$$
The rate is then given by
$$I(U; Z) - I(U; T) = \frac{1}{2} \ln\left[ \frac{(P + \sigma_n^2 + \sigma_t^2)(P + \mu^2 \sigma_t^2)}{P \sigma_t^2 (1 - \mu)^2 + \sigma_n^2 (P + \mu^2 \sigma_t^2)} \right] - \frac{1}{2} \ln\left( 1 + \mu^2\, \frac{\sigma_t^2}{P} \right) \,. \tag{5.66}$$
Costa finds that by setting
$$\mu = \frac{P}{P + \sigma_n^2} \,, \tag{5.67}$$
the rate becomes
$$R = I(U; Z) - I(U; T) = \frac{1}{2} \ln\left( 1 + \frac{P}{\sigma_n^2} \right) \,. \tag{5.68}$$
This rate equals the capacity as if the interfering signal does not exist! Note that
all along we have assumed that ⟨|x|²⟩ ≤ P. Thus, the “presubtraction” of the
interference is done in such a manner that the average power of the transmit-
ted signal is unchanged. If a naive presubtraction strategy is used, the average
transmit power would be higher, as the following example illustrates. Consider a
system in which the transmitted symbol at a given time s equals the difference
between the codeword associated with the message m denoted by s̃ and the state
t, that is
s = s̃ − t .   (5.69)
Since the received signal is a superposition of s and t, the receiver only sees the
codeword associated with the message m, i.e. s̃. The average transmit power for
this scheme is
⟨|s|²⟩ = P + σt² .   (5.70)
This power is, of course, higher than the transmit power budget. Thus, dirty-
paper coding precompensates for the interference by cleverly encoding the trans-
mitted signal without a power penalty.
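The claim that the interference power drops out can be checked numerically. The following sketch (an illustrative check, not part of the text) evaluates Equations (5.64)–(5.66) with the choice of μ from Equation (5.67) and compares the result with Equation (5.68):

```python
import math

def dpc_rate(P, sigma_n2, sigma_t2, mu):
    # I(U;Z) - I(U;T) in nats, from Equations (5.64)-(5.66)
    num = (P + sigma_n2 + sigma_t2) * (P + mu**2 * sigma_t2)
    den = P * sigma_t2 * (1 - mu)**2 + sigma_n2 * (P + mu**2 * sigma_t2)
    return 0.5 * math.log(num / den) - 0.5 * math.log(1 + mu**2 * sigma_t2 / P)

P, sigma_n2 = 1.0, 0.5
mu = P / (P + sigma_n2)                    # Costa's choice, Equation (5.67)
awgn = 0.5 * math.log(1 + P / sigma_n2)    # interference-free rate, Equation (5.68)
for sigma_t2 in (0.1, 1.0, 100.0):         # the interference power is irrelevant
    assert abs(dpc_rate(P, sigma_n2, sigma_t2, mu) - awgn) < 1e-12
```

The loop confirms that the achieved rate is independent of σt², which is the essence of the dirty-paper coding result.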
There is a nice geometric interpretation of the dirty-paper coding technique
that is depicted in the following figures, which is based on the presentation in
[314]. Consider a two-dimensional codeword space, and suppose that there are
four possible codewords, each represented by a different shape as depicted in
Figure 5.11. The dirty-paper coding technique extends this constellation by gen-
erating additional codewords that can be used to represent the original four
codewords. Figure 5.12 depicts one such extension. Note that a given constella-
tion point from Figure 5.11 may be represented by any one of the four constel-
lation points of the same shape in Figure 5.12.
For instance, suppose that the transmitter wishes to send the codeword cor-
responding to the circle. It looks in the extended constellation for the circle
that is closest to a scaled version of the state vector μ t. We shall represent the
corresponding codeword by the vector u (note that this is a vector of ns auxil-
iary variables). The transmitter then sends the difference between this auxiliary
vector u and the scaled interfering signal,
s = u − μt. (5.71)
The scaling by μ ensures that the correct amount of power is devoted to
presubtracting the interference signal. In the extreme case where there is no
noise, μ = 1 and the entire interfering signal is presubtracted. When the noise is
much larger than the signal, μ ≪ 1 and most of the transmit energy is used to
transmit the codeword.
The receiver multiplies the received vector y by μ to get
μ y = μ u − μ² t + μ t + μ n   (5.72)
and finds the closest constellation point to μ y in the extended constellation.
Note that when the number of symbols per codeword ns is large, the vector
μ u − μ² t + μ t = μ s + μ t in the previous expression lives with high probability
in an ns-dimensional sphere of radius μ√(ns P) around the point μ t. The scaled
noise vector μ n, that is, the third term on the right-hand side of Equation (5.72),
lives with high probability in an ns-dimensional sphere of radius μ√(ns σn²). Thus, as
the block size increases, that is, ns → ∞, with high probability the total number
of codewords (without the extension) that can be distinct is simply the ratio of
the volume of an ns-dimensional sphere with radius μ√(ns P) to the volume of
an ns-dimensional sphere of radius μ√(ns σn²). Following the analysis of Section
5.3.1, as ns → ∞, the number of bits needed to index the distinct codewords converges to:

ns log2 ( 1 + P/σn² ) ,   (5.73)
which per symbol, is the capacity of the additive white Gaussian noise without
interference! This interpretation of the dirty-paper coding scheme is related to
Tomlinson–Harashima precoding (see Problems).
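The connection can be illustrated with a minimal one-dimensional sketch of the modulo-presubtraction idea. The scalar, noise-free setup and the function names here are illustrative assumptions, not the precoder of Figure 5.14:

```python
def cmod(a, V):
    # Centered modulo: fold a into the interval [-V, V)
    return ((a + V) % (2 * V)) - V

def transmit(msg, t, V):
    # Presubtract the known interference t, then fold back into [-V, V),
    # so the transmit amplitude never exceeds V (no power penalty)
    return cmod(msg - t, V)

def receive(y, V):
    # The channel adds t back, so y = msg - 2kV for some integer k;
    # the same folding recovers the message exactly (noise-free case)
    return cmod(y, V)

V = 1.0
for msg in (-0.75, -0.2, 0.6):
    for t in (-3.3, 0.4, 12.9):        # arbitrarily large known interference
        s = transmit(msg, t, V)
        assert abs(s) <= V             # bounded transmit amplitude
        assert abs(receive(s + t, V) - msg) < 1e-9
```

Compare this with the naive presubtraction of Equation (5.69), whose transmit power grows with the interference power.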
5.4 Energy per bit

For a Shannon channel capacity c in bits/symbol or bits/second/hertz, the actual
link data rate R in bits/second for a frequency-band-limited signal is bounded
Figure 5.11 Original constellation for Gel’fand–Pinsker/dirty-paper coding.
Figure 5.12 Extended constellation for Gel’fand–Pinsker/dirty-paper coding. The
figure shows how the codeword corresponding to the circle is transmitted when the
state vector t is known.

by R ≤ Bc, where B is the bandwidth of the complex signal. If the symbol
rate is equal to bandwidth, then the bound on spectral efficiency in terms of
bits/second/hertz is also c. The noise power σn² can be expressed in terms of the noise spectral density N0,

σn² = N0 B
    = kB T B ,   (5.74)
where kB is the Boltzmann constant (∼ 1.38 · 10−23 J/K) and T is the absolute
temperature as discussed in Section 4.2.2. The bound on the spectral efficiency
c ≥ R/B in bits/second/hertz is given by

c = log2 ( 1 + Pr / (N0 B) ) ,   (5.75)
where Pr is the receive power of the transmitted signal, N0 is the noise spectral
density (assuming complex noise), and B is the bandwidth.
For this simple channel, the SNR is given by
SNR = Pr / (N0 B) .   (5.76)
A related useful measure of signal energy to noise is Eb /N0 , which is sometimes
unfortunately pronounced “ebb-no.” This is the energy per information bit at
the receiver divided by the noise spectral density. The energy is given by the
power divided by the symbol rate. The energy per information bit normalized
by the noise spectral density Eb /N0 is given by
Eb/N0 = Pr / (R N0) = SNR · (B/R) .   (5.77)
Consequently, capacity in terms of Eb /N0 is given by
c ≥ log2 ( 1 + (Eb/N0)(R/B) ) .   (5.78)
The equality is satisfied if the communication rate density R/B is equal to the
capacity c. Thus, the implicit relationship between bounding spectral efficiency
and Eb /N0 is defined as
c = log2 ( 1 + c (Eb/N0) ) .   (5.79)
We can solve for Eb /N0 for a capacity-achieving link:
2^c = 1 + c (Eb/N0)
Eb/N0 = (2^c − 1) / c .   (5.80)
In the limit of low spectral efficiency, we can use the fact that the log of 1 plus a
small number is approximately the small number, as presented in Equation (2.14),
c = log2(e) ln ( 1 + c (Eb/N0) )
  ≈ log2(e) c (Eb/N0) .   (5.81)
Consequently, for small spectral efficiencies c  1, there is a limiting Eb /N0 that
is independent of the exact value of spectral efficiency,
Eb/N0 ≈ 1 / log2(e) = ln 2 ≈ −1.59 dB .   (5.82)
Furthermore, this is the smallest Eb /N0 required for any nonzero spectral effi-
ciency. Links with large spectral efficiencies c  1 require larger Eb /N0 . The
required Eb /N0 as a function of the channel capacity is displayed in Figure 5.13.
Figure 5.13 The required noise-normalized energy per information bit as a function of
the channel capacity in terms of bits per second per hertz.

As the capacity falls below about 0.1 b/s/Hz, the required Eb /N0 does not change
appreciably. The implication is that in this constant Eb /N0 regime the data rate
is proportional to power. The capacity region where this is true is sometimes
denoted the noise-limited regime. Above about 1 b/s/Hz, increasing the data
rate requires an exponential increase in power. The capacity region in which this
is true is sometimes denoted the power-limited regime.
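The two regimes can be made concrete by evaluating Equation (5.80) numerically. This illustrative sketch checks the −1.59 dB floor and the rapidly growing energy cost at high spectral efficiency:

```python
import math

def ebno_db(c):
    # Minimum Eb/N0 in dB for a capacity-achieving link, Equation (5.80)
    return 10 * math.log10((2**c - 1) / c)

limit_db = 10 * math.log10(math.log(2))   # 1/log2(e) = ln 2, Equation (5.82)
assert abs(limit_db - (-1.59)) < 0.01

# Noise-limited regime: essentially flat below about 0.1 b/s/Hz
assert abs(ebno_db(0.01) - limit_db) < 0.02
# Power-limited regime: doubling the rate from 5 to 10 b/s/Hz costs
# far more than a doubling of energy per bit
assert ebno_db(10) - ebno_db(5) > 10
```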
Problems
5.1 For a satellite in geosynchronous orbit about the earth centered over the
continental United States,
(a) find the antenna gain required to cover the continental United States well
(approximately 3 dB down at the northeast corner of Maine and the southwest
corner of California, a span of about 5000 km); and
(b) evaluate the approximate effective area assuming a carrier frequency of
10 GHz.
5.2 In a line-of-sight environment without scatterers, find the largest achievable
data rate between two short dipoles with the same orientation, separated by
1 km, transmitting 1 W, operating at a carrier frequency of 1 GHz, with a
receiver at temperature of 300 K.
5.3 Consider Figure 5.14, which is a block diagram of a Tomlinson–Harashima
precoder developed in the 1970s to mitigate intersymbol-interference. The prin-
ciples of its operation are very similar to the Costa precoding described in this
chapter.
(a) Suppose that the box marked f (·, V ) is removed, i.e., x[k] equals m[k] with
the output of the filter g[k] subtracted out. Find g[k] such that y[k] = x[k].
Figure 5.14 Tomlinson–Harashima precoder.

(b) Suppose that ||m[k]|| ≤ M . What is the largest possible value that x[k] can
take, assuming that the box marked f (·, V ) is still not present?
(c) Please specify f (·, V ) such that ||x[k]|| ≤ V ∀k and show that y[k] = x[k].
5.4 By noting that at low spectral efficiency the best case Eb /N0 ≈ −1.59 dB,
evaluate the minimum received energy required to decode 1000 bits at a temper-
ature of 300 K.
5.5 Evaluate the differential entropy for a real Rayleigh random variable.
6 Antenna arrays
Arrays of antennas can be used to improve the signal-to-noise ratio (SNR) and
to mitigate interference. For many communication links, the propagation envi-
ronment is complicated by scattering that can distort an incoming signal both
in direction and delay. In this chapter’s introductory discussion, a simplifying
assumption is employed. Within this chapter, it is assumed that there is no scat-
tering. An example would be a line-of-sight link in a large anechoic chamber.
In addition, it is assumed that the antenna array is small compared with the
ratio c/B of the propagation speed c to the bandwidth B. As a consequence, it
is assumed that the signal is not dispersive across the antenna array. A disper-
sive channel would have resolvable delay spread across the antenna array. These
restrictions will be removed in subsequent chapters.
6.1 Wavefront
Consider a single transmitter that is a long distance away from an array of receive
antennas. The wavefront that expands in a sphere about the transmitter can be
approximated by a plane near the antenna array as seen in Figure 6.1. In other
words, the transmitter is far enough from the receive antenna array such that
the phase error associated with the plane wave approximation is small. This is
a valid approximation1 when

R ≫ L² / λ ,   (6.1)
if R is the distance from the source, L is the size of the array (L is the largest
distance between any two antennas), and λ is the wavelength. Here it is assumed
that each receive antenna is identical. Because the receive antennas are at slightly
different distances from the source in general, the plane wave impinges upon each
antenna with some relative delay. Under the assumption of a narrowband signal,
that is, a signal that does not have sufficient signal bandwidth B to resolve the
relative antenna locations
B ≪ c / L ,   (6.2)
L
1 The notation ≫ indicates much greater, which is not a precise notion, but is dependent
upon the allowed error in the approximation.
Figure 6.1 Propagation of wavefront across an array that has wavevector k,
wavelength λ, and transmitted signal s(t). The time delay of the signal observed
between two antennas is Δt.

the delays can be approximated well by phase differences of the carrier wave.
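As an illustrative numerical check of the far-field condition (6.1) and the narrowband condition (6.2) (the carrier frequency and array size below are arbitrary example values, not from the text):

```python
c = 3.0e8        # propagation speed (m/s)
f0 = 2.4e9       # example carrier frequency (Hz)
lam = c / f0     # wavelength: 0.125 m
L = 0.5          # example array size (m)

# Far-field condition (6.1): R >> L^2/lambda = 2 m, so ranges of many
# metres are needed before the plane-wave approximation is valid
assert abs(L**2 / lam - 2.0) < 1e-9

# Narrowband condition (6.2): B << c/L = 600 MHz, so bandwidths of a
# few megahertz easily avoid dispersion across this array
assert abs(c / L - 6.0e8) < 1.0
```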
Far from a source, the propagating signal can be approximated by a plane
wave with angular frequency ω = 2πf0 , where f0 is the carrier frequency. The
plane wave is given by a solution ψ(x, t) to the wave equation [154] represented
by the partial differential equation
∇²ψ(x, t) − (1/c²) ∂²ψ(x, t)/∂t² = 0 ,   (6.3)
where t is time, and x ∈ R3×1 is a location in space. The characteristic direction
information of the plane wave is contained within the wavevector, k ∈ R3×1 . The
wavevector points along the direction of propagation and has the magnitude
‖k‖ = 2π/λ = 2π f0 / c .   (6.4)
There are a variety of ways of interpreting the wave equation, depending upon the
application. As an example, the electric field along some direction (the direction
perpendicular to the plane displayed in Figure 6.1 would be useful) could be
given by the real part of ψ(x, t). More complicated polarization and geometries
could be constructed by using different solutions to the wave equation ψ(x, t) for
the electric field along each axis.
The complex amplitude of a propagating plane wave as a function of time and
location (the solution of the wave equation2 ) [154] is characterized by
ψ(x, t) = a e^(−iωt + ik·x) ,   (6.5)
2 If the sign of the terms in the exponent in Equation (6.5) are inverted, then Equation (6.3)
is still satisfied. Unfortunately, both conventions are employed.
where the complex amplitude attenuation is given by a ∈ C (the phase of the
complex attenuation is defined by the position and phase of the source), the
distance x ∈ R3×1 is measured from some arbitrary origin, and k · x = kT x
indicates the inner product. The location referenced by x is valid for any point
in the far field by satisfying Equation (6.1). For an antenna used by the receiver,
the point x at which the field is measured is often called the phase center of
the antenna. Theoretically the field observed by an object of extended size can
be represented by the measurement at a point; however, in practice, the exact
position of the phase center can be a complicated function of the surrounding
environment and of the frequency of operation.
A communication signal can be carried by modulating (slowly, compared with
the carrier wave frequency, modifying the phase and amplitude) this plane
wave with a complex baseband signal, s(t). By assuming that the measurable
electric field in some direction is given by the real part of ψ(x, t), the resulting
complex amplitude of the wavefront has the form
ψ(x, t) = a e^(−iωt + ik·x) s(t − τ0) ,   (6.6)
where τ0 is the time delay for the signal to propagate from the transmitter to
some local axis origin. The location x is measured from this origin.
After frequency downconversion that shifts the signal down to a complex base-
band, removing the carrier frequency term e^(−iωt), the received baseband signal
at the mth receive antenna is given by z_m(t),

z_m(t) = a e^(ik·x_m) s(t − τ0) + n_m(t) ,   (6.7)
where xm is the location for the mth receive antenna and nm (t) is the additive
noise as a function of time.
For the sake of clarity in the following discussion, we will assume in this section
that the noise is negligibly small. The received signal under the narrowband
baseband signal approximation is then an attenuated and delayed version of the
transmitted signal
z_m(t) ≈ a e^(ik·x_m) s(t − τ0) .   (6.8)
6.1.1 Geometric interpretation
The e^(ik·x_m) term can be understood intuitively by recognizing that the phase
difference at each receive antenna is the result of the relative time delay for the
wavefront to impinge upon the various receive antennas, as seen in Figure 6.1.
This time delay is proportional to the relative position of the antennas along the
direction of the wavevector. The component of the displacement from the origin
to the antenna along the wavevector for the mth receive antenna is given by the
inner product of the normalized wavevector and the antenna location vector,
(k/‖k‖) · x_m .   (6.9)
The delay relative to the origin for the mth receive antenna is given by the
relative distance divided by the propagation speed,
Δt_m = (k/‖k‖) · x_m / c .   (6.10)
The relative phase is given in the argument of e^(−iω(t − Δt_m)). By focusing on the
relative phase of the baseband signal, the following antenna-dependent phase
term is found,

e^(iω Δt_m) = e^( iω (k/‖k‖)·x_m / c )
            = e^( i‖k‖ (k/‖k‖)·x_m )
            = e^( i k·x_m ) ,   (6.11)

where the relationship ‖k‖ = 2π/λ = ω/c is used.

6.1.2 Steering vector
The phase relationships, given a set of receive antennas, can be represented
compactly when it is assumed that there is no dispersion across the array, by
using a vector notation. In the absence of noise (in the very high SNR limit),
the received complex baseband signal is proportional to the steering vector,
z ∝ v(k) s(t − τ0 ) . (6.12)
Depending upon what is convenient for the analysis, two different normalizations
are commonly employed for the steering vector: ‖v(k)‖² = 1 or ‖v(k)‖² = nr.
For this discussion, the former is used. Both forms are used in the text. The
receive steering vector v(k) ∈ C^(nr×1) is defined to be

{v(k)}_m = e^(ik·x_m) / √nr .   (6.13)
As an aside, a potential source of confusion is the use of transmit versus receiver
steering vectors by different authors. This decision of usage changes the sign on
the wavevector. Here, it is assumed that the wavevector points along the direction
of propagation. It is important to note that for this form to be valid, it is assumed
that the antenna response of each antenna is identical and that each antenna has
the same polarimetric and angular response for all angles. An example of this is
a set of identical vertical dipole antennas, which are assumed to not interact. At
best, all of these assumptions are only approximately true. As will be discussed in
the following section, the assumption of having a flat phase response as a function
of angle can often be relaxed because many metrics, such as beam patterns, are
insensitive to overall phase, so long as the phase response as a function of angle
is identical for all antennas. For applications for which knowledge of the array
response is important, careful element and array calibration [104, 181] must be
performed. There are numerous sources of errors that create the requirement for
calibration. These include position errors, electrical mismatch of antennas, and
coupling of antennas to the local structure as well as other antennas. In general,
it is possible to extend the steering vector response to include the individual
responses of the each antenna in the array, in which case the steering vector
becomes
{v(k)}_m = A_m(k) e^(ik·x_m) / √nr ,   (6.14)

where A_m(k) is the direction- and antenna-dependent amplitude factor.
While it is typically not useful from a terrestrial communications point of
view, for a variety of applications (such as communications between airborne
platforms, geolocation, and radar), it is sometimes useful to find the direction
or angle of arrival of the signal from a transmitter. Under the assumption that
a single plane wave is impinging upon the array, an estimate of the direction
to a source can be found by maximizing the magnitude of the inner product
between the observed array response vector z (which contains the array’s phase
and amplitude response to the impinging wavefronts plus noise) and a steering
vector as a function of angle,

φ̂ = argmax_φ |v†(φ) z| / ‖v(φ)‖ .   (6.15)
If multiple observations are given of the receive array response, then a rich set
of approaches is available. Some of these techniques are discussed in Chapter 7.
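A minimal sketch of the estimator in Equation (6.15): it builds the steering vector of a uniform linear array (the form derived in Section 6.3) and grid-searches for the angle that maximizes the matched-filter magnitude. The array size, spacing, and grid resolution are illustrative assumptions:

```python
import cmath
import math

def steering(phi, n, d_over_lam):
    # Linear-array steering vector with the propagation-direction sign
    # convention of Equation (6.27); element spacing is d/lambda wavelengths
    return [cmath.exp(-2j * math.pi * m * d_over_lam * math.sin(phi))
            / math.sqrt(n) for m in range(n)]

def estimate_aoa(z, n, d_over_lam, grid=2000):
    # Maximize |v(phi)^H z| over a grid of angles, as in Equation (6.15)
    best_phi, best_val = 0.0, -1.0
    for k in range(grid + 1):
        phi = -math.pi / 2 + math.pi * k / grid
        v = steering(phi, n, d_over_lam)
        val = abs(sum(a.conjugate() * b for a, b in zip(v, z)))
        if val > best_val:
            best_phi, best_val = phi, val
    return best_phi

n, d_over_lam = 8, 0.5
true_deg = 25.0
z = [2.0 * x for x in steering(math.radians(true_deg), n, d_over_lam)]
est_deg = math.degrees(estimate_aoa(z, n, d_over_lam))
assert abs(est_deg - true_deg) < 0.2     # noise-free: grid-limited accuracy
```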
The difference between array response and steering vector is rather subtle.
The steering vector is the theoretical set of antenna array phases and ampli-
tudes that is a function of the angle between the wavefront and some reference
angle on the array. In the absence of the multipath scattering that is assumed
in this chapter, the array response and steering vector are the same for some
given wavefront propagation direction. The observed receive array response may
include noise of the observation. In the more typical environment in which there
is multipath, the array response will be more complicated than is allowed by the
simple model assumed by the steering vector. Both of these terms are differen-
tiated from a transmit or receive beamformer (discussed in Section 6.2) in that
the beamformer is selected by the radio to achieve some goal. The inner product
between the receiver beamformer and the observed array response can be used as
an estimator of the transmitted signal. A reasonable choice for a beamformer in a
scatterer-free environment is to employ the steering vector associated with the
direction to the other radio.
6.2 Array beam pattern

A receive beamformer w ∈ C^(nr×1) contains coefficients that modify the phases
and amplitudes of the nr signals received by an array of antennas and then
sums the results, producing a single stream of data. The beamformer can be
considered a form of spatial filtering, which is analogous to spectral filtering. A
beamformer can be constructed to selectively receive energy from some given
direction. The resulting array beam pattern is a measure of power at the output
of the beamformer (or array factor in amplitude) for signals coming from other
physical directions. In general, the term beamformer can be applied to either
transmission or reception. The mathematical formulation is the same up to a
reversal of time.
There are a variety of approaches for constructing the beamformer. Various
approaches are considered in Chapter 9. The implementation of the beamformer
can be at the carrier frequency; however, in modern digital receivers the
beamforming is typically applied to the complex baseband signal with the equiv-
alent effect. A transmit beamformer is the same up to a time reversal (causing
a conjugation), so that, for transmitting, w∗ would be employed rather than w.
The average amount of power Pw at the output of a beamformer w ∈ Cn r ×1
applied to the received data stream as a function of time z(t) ∈ Cn r ×1 , built
from the vector of received signals in Equation (6.7) is given by
Pw = ⟨ |w† z(t)|² ⟩ / ‖w‖²
   = |w† v(k)|² |a|² ⟨ |s(t)|² ⟩ / ‖w‖² ,   (6.16)
where it is assumed that the received signal consists of a single wavefront propa-
gating along the wavevector k. The normalizing term of w 2 in the denominator
keeps the noise power constant for different scales of the beamformer. Once again,
noise is assumed to be negligibly small for this discussion. The steering vector,
defined in Equation (6.13), indicates the relative phases and amplitudes for the
incoming signal. The mean square of the transmitted signal ⟨|s(t)|²⟩ = Pt is
associated with the transmit power and does not affect the shape of the beam
pattern.
It is sometimes useful to consider the normalized beam pattern ρw (k). It is
constructed so that the matched response (when w ∝ v(k)) would be unity and
is given by
ρ_w(k) = |w† v(k)|² / ( ‖w‖² ‖v(k)‖² ) .   (6.17)

The relative power at the output of the beamformer is given by the square of
the normalized inner product between the beamformer and the steering vector
for wavefronts propagating along various directions.
The beamformer that maximizes ρ_w(k0) for a wavefront propagating along
some direction k0 is the matched beamformer,

w = argmax_w |w† v(k0)|² / ( ‖w‖² ‖v(k0)‖² )
  ∝ v(k0) .   (6.18)
Here, the solution for w is invariant under a change in some nonzero
multiplicative complex constant, so that for some constant a the solution a w
would produce the same normalized beam pattern, ρ_(aw)(k) = ρ_w(k).

6.2.1 Beam pattern in a plane
For geometries such that the source and the array are constrained to be in a
plane (at least approximately), the direction information contained in k can be
expressed with a single angle φ relative to some axis (typically along {x}1 from
Figure 6.1, which is assumed in this discussion). Consequently, the relative beam
pattern can be expressed as
ρ_w(φ) = |w† v(φ)|² / ( ‖w‖² ‖v(φ)‖² ) .   (6.19)
Here, the angle φ indicates the angle between the axis along {x}1 and a ray point-
ing toward the transmitter. In an environment free of scatterers or
multipath, a beamformer that maximizes the power from a direction φ0 is given
by solving Equation (6.18) given the two-dimensional constraint resulting in the
form
w = v(φ0 ) . (6.20)
This beamformer is equal to the expected array response, which in this environ-
ment is given by the steering vector. For a beamformer matched to the array
response of a wavefront coming from φ0 (relative to the {x}1 axis), the accep-
tance of power at the output of the beamformer from angle φ is given by
ρ_(w = v(φ0))(φ) = |v†(φ0) v(φ)|² / ( ‖v(φ0)‖² ‖v(φ)‖² ) .   (6.21)
For a wave propagating along the direction φ + π, the mth element of the
receive steering vector as a function of φ (pointing back toward the source) is
given by
{v(φ)}_m = e^( i (2π/λ) [ {x_m}_1 cos(φ+π) + {x_m}_2 sin(φ+π) ] ) / √nr
         = e^( −i (2π/λ) [ {x_m}_1 cos(φ) + {x_m}_2 sin(φ) ] ) / √nr .   (6.22)
Note that the direction of propagation is in the opposite direction of the angle
to the source. This direction inversion induces a sign flip in the exponential
compared with what one might expect when using the direction of wavefront
propagation convention.
A sidelobe is a local peak of received power in a direction different from the
intended beamformer direction. For many applications, such as geolocation, the
levels of these sidelobes can be important because they can cause confusion with
regard to direction to the signal source. In environments with interfering users,
the beam pattern can be used to reduce power from interfering users at differ-
ent directions. High sidelobes indicate the potential for higher levels of interfer-
ence power at the output of a receive beamformer. A reasonable question is, do
sidelobes matter? For many wireless communication applications, they are not
important. If there is significant multipath such that the notion of line-of-sight
beam patterns is not valid, then the idea of sidelobes for a line-of-sight beam
pattern has little applicability. Also, if there is a single line-of-sight source, then
accepting energy from other directions in addition to receiving energy from the
intended direction will cause no adverse effects. Similarly, given a small number
of interferers, if adaptive processing is used, then the sidelobes can be distorted
to avoid accepting energy from the potential interferers. Once again, there are no
adverse effects for most applications. Conversely, if line-of-sight propagation is a
reasonable model, and either there is a very large number of interferers so that
adaptivity is not effective, or adaptivity is not possible, then sidelobe levels can
be important. If the array is being used for direction of arrival estimation in the
presence of significant noise, then the sidelobes can be important because of the
potential of confusing the correct angle of arrival with an angle corresponding
to a sidelobe direction.

Circular array example
As an example, consider a regular circular receive array with nr element positions
given by xm , such that
{xm }1 = r cos(2πm/nr )
{xm }2 = r sin(2πm/nr ). (6.23)
The steering vector is given by
{v(φ)}_m = e^( −i 2π r [ cos(2πm/nr) cos(φ) + sin(2πm/nr) sin(φ) ] ) / √nr ,   (6.24)
where r is the radius of the array measured in wavelengths, and φ indicates the
direction to the source. Consider an eight-antenna regular circular array with
a radius of one wavelength. The geometry is displayed in Figure 6.2. Here we
employ a matched-filter beamformer optimized for φ = 0. This angle is sometimes
denoted “boresight.” Conversely, the angle along the array is sometimes denoted
“end fire.” While for a circular array these definitions make little sense, they
are used regularly, and are more sensible for linear arrays. The matched-filter
beamformer is equal to the anticipated array response. Relative power at the
output of the beamformer as a function of transmitter angle is given in Figure 6.3.
Figure 6.2 Antenna array geometry for an eight-element array of radius one
wavelength.
Figure 6.3 The beam pattern for eight-element circular array with radius of one
wavelength.

The relative power at the output of the beamformer is typically denoted an
antenna array beam pattern. The region around φ = 0 is the mainlobe of the
pattern. The width of the mainlobe in terms of radians is very approximately
given by the wavelength divided by the aperture, in this case about 1/2 or
a little less than 30 degrees. The aperture is the length of the array.3 For a
beamformer optimized for φ = 0, the amount of power accepted from other
directions can be relatively high. In this example, at the angle of about ±85
degrees, the attenuation is only down by 5 dB. This region of relatively low
attenuation is denoted a sidelobe. It is often desirable to minimize the height
of these sidelobes to minimize interference from undesired sources. The various
approaches to do this include both adaptive and nonadaptive techniques.
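The example can be reproduced numerically. This illustrative sketch evaluates the steering vector of Equation (6.24) and the normalized pattern of Equation (6.21) for the eight-element, one-wavelength-radius array:

```python
import cmath
import math

def circ_steering(phi, n, r_wl):
    # Equation (6.24): n-element circular array of radius r_wl wavelengths
    return [cmath.exp(-2j * math.pi * r_wl
                      * (math.cos(2 * math.pi * m / n) * math.cos(phi)
                         + math.sin(2 * math.pi * m / n) * math.sin(phi)))
            / math.sqrt(n) for m in range(n)]

def pattern_db(phi, phi0, n, r_wl):
    # Normalized beam pattern (6.21) in dB for the matched beamformer w = v(phi0);
    # the steering vectors here are already unit norm
    w, v = circ_steering(phi0, n, r_wl), circ_steering(phi, n, r_wl)
    rho = abs(sum(a.conjugate() * b for a, b in zip(w, v))) ** 2
    return 10 * math.log10(rho)

n, r_wl = 8, 1.0
assert abs(pattern_db(0.0, 0.0, n, r_wl)) < 1e-9    # matched direction: 0 dB
# The sidelobe near +/-85 degrees is only about 5 dB below the peak
assert -7.0 < pattern_db(math.radians(85.0), 0.0, n, r_wl) < -4.0
```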
6.3 Linear arrays
For a linear receive antenna array in the {x}1 –{x}2 plane with the antennas
along the {x}2 axis starting at the origin (as seen in Figure 6.4), with regular
antenna spacing d, the positions are given by
{x_m}_2 = (m − 1) d .   (6.25)
The inner product between the wavevector and the position of the antenna (de-
termined by angle φ) is given by
k · x_m = − 2π (m − 1) d sin(φ) / λ .   (6.26)
In the special case of a line-of-sight transmitter, the array response is given by
{v(φ)}_m = e^( −i 2π (m−1) d sin(φ) / λ ) / √nr ,   (6.27)

where the arbitrary normalization is chosen by using the term √nr so that the
magnitude of the array response is 1,

‖v(φ)‖ = 1 .   (6.28)
Given this formulation, the beam pattern is given by
ρ_(v(φ0))(φ) = | v†(φ) v(φ0) |²
= | (1/nr) Σ_{m=0}^{nr−1} e^( i 2π m d sin(φ)/λ ) e^( −i 2π m d sin(φ0)/λ ) |²
= (1/nr²) | Σ_{m=0}^{nr−1} e^( i 2π m d [ sin(φ) − sin(φ0) ] / λ ) |² .   (6.29)

In particular, consider an eight-antenna regular linear array with spatial sam-
pling of 1/2 wavelength. This sampling is the spatial equivalent of Nyquist
sampling in the temporal domain. The geometry of the array is displayed in
3 The notion of aperture is not precisely defined. Sometimes it is useful to define it in terms
of the root-mean-square size of an array, because this corresponds to a parameter
developed by the Cramer–Rao bound. Often a sufficient definition is the largest length
between any two elements.
Figure 6.4 Antenna array geometry for an eight-element linear array with spacing of
1/2 wavelength.

Figure 6.4. For a matched-filter beamformer optimized for φ = 0 (along the {x}1
axis), with steering vector
v(φ0) = (1/√nr) 1 ,   where 1 = (1, 1, . . . , 1)ᵀ ,   (6.30)
the relative power at the output of the beamformer as a function of transmitter
angle is given in Figure 6.5. As with the circular array, the region around φ = 0
is the mainlobe of the pattern. The width of the mainlobe is approximately
given in radians by the wavelength divided by the aperture. In this case, the
beamwidth is about 1/4 radians or a little less than 15 degrees. In this example,
the peak sidelobes are lower than the peak by about 13 dB. In fact, because of
the rotational symmetry about the {x}2 axis, energy received from various angles
will be equal in response for the line along a cone at the given angle as displayed
in Figure 6.6. In this particular example in a plane, the rotational symmetry
creates an exact forward–backward ambiguity in the beamformer, causing the
acceptance of power to be equal at φ = 0 and φ = 180 degrees.
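This behavior can be checked directly from Equation (6.29). The following illustrative sketch evaluates the pattern of the eight-element, half-wavelength-spaced array, confirming the forward–backward ambiguity and the roughly 13 dB peak sidelobe:

```python
import math

def lin_pattern(phi, phi0, n, d_over_lam):
    # Beam pattern of Equation (6.29) for an n-element uniform linear array
    psi = 2 * math.pi * d_over_lam * (math.sin(phi) - math.sin(phi0))
    re = sum(math.cos(m * psi) for m in range(n))
    im = sum(math.sin(m * psi) for m in range(n))
    return (re**2 + im**2) / n**2

n, d = 8, 0.5
assert abs(lin_pattern(0.0, 0.0, n, d) - 1.0) < 1e-9        # mainlobe peak
assert abs(lin_pattern(math.pi, 0.0, n, d) - 1.0) < 1e-9    # forward-backward ambiguity
# Peak sidelobe: search beyond the first null (near 14.5 degrees)
peak = max(lin_pattern(math.radians(a / 10.0), 0.0, n, d)
           for a in range(150, 901))
assert -14.0 < 10 * math.log10(peak) < -12.0
```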
For isotropic antenna elements, the Nyquist spacing for an antenna array is
d = λ/2. At this spacing, there will be no ambiguities over the range of angles
φ ∈ (−π/2, π/2) for signals in the {x}1 –{x}2 plane. Out of this plane, there can
be some confusion. The direction from which energy is preferentially received
Figure 6.5 The beam pattern for an eight-element linear array with spacing of 1/2
wavelength.

Figure 6.6 For a linear array, the direction to the source and the cone of ambiguity.

is determined unambiguously up to a cone generated by rotating a ray at the
steering angle about an axis along which the antenna array lies as seen in Figure
6.6. If each antenna in the array has some beam pattern so that it only receives
energy over some limited range of angles, the Nyquist sampling distance would
be larger without ambiguity. This discussion assumes that each of the antennas is
pointed in the same direction. While it is convenient to discuss isotropic antennas
here, many practical systems employ antenna arrays with antennas or subarrays
that have some gain on their own.

6.3.1 Beam pattern symmetry for linear arrays


Because of the construction of linear arrays, matched-filter array beam patterns
evaluated for these arrays have a reflection symmetry about the pointing direc-
tion. The beam pattern for an array pointing in direction φ0 receiving power
from φ is given by

$$
\begin{aligned}
\rho_{\mathbf{v}(\phi_0)}(\phi) &= \left|\mathbf{v}^\dagger(\phi)\,\mathbf{v}(\phi_0)\right|^2 \\
&= \frac{1}{n_r^2}\left|\sum_{m=0}^{n_r-1} e^{i\,2\pi m d\,[\sin(\phi)-\sin(\phi_0)]/\lambda}\right|^2 \\
&= \frac{1}{n_r^2}\left|\sum_{m=0}^{n_r-1} \cos\!\left(\frac{2\pi m d\,[\sin(\phi)-\sin(\phi_0)]}{\lambda}\right) + i\,\sin\!\left(\frac{2\pi m d\,[\sin(\phi)-\sin(\phi_0)]}{\lambda}\right)\right|^2 \\
&= \frac{1}{n_r^2}\left(\sum_{m=0}^{n_r-1} \cos\!\left(\frac{2\pi m d\,[\sin(\phi)-\sin(\phi_0)]}{\lambda}\right)\right)^2 + \frac{1}{n_r^2}\left(\sum_{m=0}^{n_r-1} \sin\!\left(\frac{2\pi m d\,[\sin(\phi)-\sin(\phi_0)]}{\lambda}\right)\right)^2 .
\end{aligned} \tag{6.31}
$$

The array response is symmetric under the transformation

[sin(φ) − sin(φ0 )] → −[sin(φ) − sin(φ0 )] . (6.32)

If the reference direction is along boresight φ0 = 0, then the symmetry is observed


for the transformation φ → −φ. This symmetry is broken in two-dimensional
arrays.
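The reflection symmetry of Equation (6.32) can be spot-checked numerically. The sketch below (assuming NumPy; the pointing direction and the offsets in sin(φ) are arbitrary choices) evaluates the pattern at pairs of angles with opposite values of sin(φ) − sin(φ0):

```python
import numpy as np

def ula_response(phi, phi0, n_r=8, d=0.5):
    """rho(phi) = |v^H(phi) v(phi0)|^2 for a linear array, spacing d in wavelengths."""
    m = np.arange(n_r)
    v = np.exp(1j * 2 * np.pi * d * m * np.sin(phi)) / np.sqrt(n_r)
    v0 = np.exp(1j * 2 * np.pi * d * m * np.sin(phi0)) / np.sqrt(n_r)
    return np.abs(np.vdot(v, v0)) ** 2

phi0 = 0.3   # an arbitrary (non-boresight) pointing direction, in radians
for s in (0.1, 0.25, 0.4):
    phi_plus = np.arcsin(np.sin(phi0) + s)    # sin(phi) - sin(phi0) = +s
    phi_minus = np.arcsin(np.sin(phi0) - s)   # sin(phi) - sin(phi0) = -s
    assert np.isclose(ula_response(phi_plus, phi0), ula_response(phi_minus, phi0))
print("reflection symmetry of Eq. (6.32) verified")
```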

6.3.2 Fourier transform interpretation


By considering a discrete sampling in sin(φ), the beamformer can be constructed
from the discrete Fourier transform (DFT) of the antenna weighting vector, which
is defined by the existence of antennas on a regular lattice. We consider a periodic
or regular sampling of the sine of the angle φ,

u = sin(φ) ∈ [−1, 1] . (6.33)

The samples in u are represented by regular samples q Δu, where q ∈ {−M/2, . . . ,


M/2} is the index parameter and Δu is the distance between samples in u. The
value of Δu is chosen so that there are M + 1 samples from u = −1 to u = 1. For
convenience, it is assumed that M is an even integer. The value of the angular
step size Δu is given by
$$ \Delta u = \frac{2}{M + 1 - 1} = \frac{2}{M} . \tag{6.34} $$

For this discussion, it will be convenient to define the antenna weighting vector
$\mathbf{a} \in \mathbb{C}^{M \times 1}$, indexed from 0 to M − 1,
$$ \{\mathbf{a}\}_m = \begin{cases} \{\mathbf{v}(\phi_0)\}_m & m \in \{0, \ldots, n_r - 1\} \\ 0 & m \in \{n_r, \ldots, M - 1\} \end{cases} . \tag{6.35} $$

The vector contains information about both the existence of an antenna at some
lattice position and the phasing of that element.
Given these definitions and the assumption of λ/2 antenna spacing, the beam
pattern is given by

$$
\begin{aligned}
\rho_{\mathbf{a}}(\phi = \arcsin[q\,\Delta u]) &= \left|\mathbf{v}^\dagger(\phi)\,\mathbf{v}(\phi_0)\right|^2 \\
&= \left|\sum_{m=0}^{n_r-1} \{\mathbf{v}(\arcsin[q\,\Delta u])\}_m^*\, \{\mathbf{a}\}_m\right|^2 \\
&= \left|\frac{1}{\sqrt{n_r}} \sum_{m=0}^{n_r-1} e^{i\pi m q\,\Delta u}\, \{\mathbf{a}\}_m\right|^2 \\
&= \left|\frac{1}{\sqrt{n_r}} \sum_{m=0}^{M-1} e^{i\pi m q\,\Delta u}\, \{\mathbf{a}\}_m\right|^2 \\
&= \left|\frac{1}{\sqrt{n_r}} \sum_{m=0}^{M-1} e^{\frac{i 2\pi m q}{M}}\, \{\mathbf{a}\}_m\right|^2 ,
\end{aligned} \tag{6.36}
$$

so that the beam pattern is now evaluated at a discrete set of points determined by q Δu. Here we employ the observation that extending the range of
summation from nr − 1 to M − 1 indices has no effect because of the zero entries
in the antenna weighting vector a. The argument of the norm operator, which
we will denote the complex beam pattern b(q) as a function of q, is given by
$$ b(q) = \frac{1}{\sqrt{n_r}} \sum_{m=0}^{M-1} e^{\frac{i 2\pi m q}{M}}\, \{\mathbf{a}\}_m . \tag{6.37} $$

The M + 1 values of q are given by

q ∈ {−M/2, −M/2 + 1, . . . , M/2 − 1, M/2} . (6.38)

However, because the exponential has the same value for arguments with imagi-
nary components separated by integral multiples of 2π, the first and the (M +1)th
indices are redundant,
$$ e^{\frac{i 2\pi}{M} m q} = e^{\frac{i 2\pi}{M} m (q + M)} . \tag{6.39} $$

So only the values of

q ∈ {−M/2, −M/2 + 1, . . . , M/2 − 1} (6.40)



are necessary. For convenience, consider the admittedly strange ordering of pos-
sible values of q,

$$ q \in \left\{0,\; 1,\; \ldots,\; \frac{M}{2} - 2,\; \frac{M}{2} - 1,\; -\frac{M}{2},\; -\frac{M}{2} + 1,\; -\frac{M}{2} + 2,\; \ldots,\; -1 \right\} . \tag{6.41} $$
Notice that the negative values have been moved to the right-hand side of the
list. Furthermore, because of the same modularity characteristic used above, a
new index variable q  is constructed spanning the same space of angles,

$$ q' \in \{0,\; 1,\; \ldots,\; M - 2,\; M - 1\} . \tag{6.42} $$

The difference between q and q  is analogous to the difference in considering


the discrete Fourier transform of a time domain sequence. The spectrum can be
represented in an approximately symmetric domain about zero frequency, or the
spectrum can be represented by a domain covering zero to approximately twice
the maximum physical frequency.
Given this new index variable q  , the complex amplitude beam pattern b(q  →
q) now indexed by q  rather than q is given by
$$ b(q' \to q) = \frac{1}{\sqrt{n_r}} \sum_{m=0}^{M-1} e^{\frac{i 2\pi m q'}{M}}\, \{\mathbf{a}\}_m . \tag{6.43} $$

By defining the vector b,


$$ \mathbf{b} = \begin{pmatrix} b(q' = 0) \\ b(q' = 1) \\ \vdots \\ b(q' = M - 1) \end{pmatrix}, \tag{6.44} $$
the explicit relationship between the complex beam pattern and the discrete
Fourier transform of the antenna weighting vector can be found,
$$ \mathbf{b} = \sqrt{\frac{M}{n_r}}\, \mathbf{F}\, \mathbf{a} , \tag{6.45} $$
where F is the discrete Fourier transform matrix defined in Equation (2.225).
Consequently, the beam pattern is given by the discrete Fourier transform of the
antenna weighting vector,
$$ \rho_{\mathbf{a}}(\arcsin[q\,\Delta u]) = \frac{M}{n_r} \left| \{\mathbf{F}\,\mathbf{a}\}_{q'} \right|^2 . \tag{6.46} $$
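Equations (6.43)–(6.46) can be exercised directly with a fast Fourier transform. The sketch below (assuming NumPy; M = 64 is an arbitrary lattice size) builds the zero-padded weighting vector of Equation (6.35) for boresight pointing, evaluates the pattern with an FFT, and cross-checks it against direct steering-vector inner products. Note that NumPy's forward FFT uses the e^(−i2πmq/M) convention, so the e^(+i2πmq/M) sum of Equation (6.43) is obtained from the scaled inverse FFT.

```python
import numpy as np

n_r, M = 8, 64                       # 8 antennas at lambda/2 spacing, M-point lattice
a = np.zeros(M, dtype=complex)
a[:n_r] = 1 / np.sqrt(n_r)           # Eq. (6.35) with boresight phasing, phi0 = 0

# Complex beam pattern b(q') of Eq. (6.43) via the scaled inverse FFT.
b = M * np.fft.ifft(a) / np.sqrt(n_r)
rho_fft = np.abs(b) ** 2             # beam pattern samples, consistent with Eq. (6.46)

# Direct evaluation at the same sample directions u = q' * Delta_u, wrapped to [-1, 1)
q = np.arange(M)
u = (2.0 * q / M + 1.0) % 2.0 - 1.0
m = np.arange(n_r)
v = np.exp(1j * np.pi * np.outer(m, u)) / np.sqrt(n_r)   # lambda/2 steering phases
rho_direct = np.abs(v.conj().T @ a[:n_r]) ** 2

assert np.allclose(rho_fft, rho_direct)   # DFT and direct patterns agree
print(round(rho_fft.max(), 6))            # mainlobe peak, normalized to 1.0
```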

6.3.3 Continuous Fourier transform approximation


It is sometimes useful to consider a continuous approximation of the antenna
array because it enables convenient intuition. This approximation is developed by
considering the limiting forms of Equations (6.35) and (6.46). The development
is similar to the discrete version discussed previously.

In the continuous version of the vector inner product, one can think of an
infinite-dimensional vector indexed by some continuous parameter, as introduced
in Section 2.1.4. While we will not be particularly involved in the technical de-
tails, this vector space is sometimes referred to as an infinite-dimensional Hilbert
space. For example, the antenna weighting vector a can be indexed by the po-
sition along the linear array x, under the assumption of some pointing direction
u. The continuous antenna weighting function is denoted

a → fa (x; u) , (6.47)

where the function is defined as a distance along the antenna array x in units of
wavelength. Inner products in this complex infinite-dimensional space are given
by integrating over the indexing parameter, in this case x. The complex
infinite-dimensional inner product between functions f(x) and g(x) is denoted
$$ \langle f(x), g(x) \rangle = \int dx\, f(x)\, g^*(x) . \tag{6.48} $$

The beam pattern is related to the magnitude squared of the Fourier transform
of a continuous version of the antenna weighting vector. Similarly, the continuous
version of complex beam pattern is denoted

b → fb (u) , (6.49)

where function is defined in terms of the direction u = sin(φ). In a continuous


analog to the steering vector, the phasing of fa (x; u) is given by


$$ f_a(x; u) = \begin{cases} 0 & x < 0 \\ \frac{1}{\sqrt{L}}\, e^{-i 2\pi u x} & 0 \le x \le L \\ 0 & x > L \end{cases} , \tag{6.50} $$

where L is the length of the antenna array in units of wavelength. The inner prod-
uct between the continuous steering vectors at some direction u and boresight
u = 0 is given by

$$
\begin{aligned}
\langle f_a(x; u), f_a(x; 0) \rangle &= \int dx\, f_a(x; u)\, f_a^*(x; 0) \\
&= \frac{1}{L} \int_0^L dx\, e^{-i 2\pi u x} \\
&= \frac{-1}{i 2\pi u L} \left( e^{-i 2\pi u L} - 1 \right) \\
&= \frac{e^{-i\pi u L}}{i 2\pi u L} \left( e^{i\pi u L} - e^{-i\pi u L} \right) \\
&= e^{-i\pi u L}\, \frac{\sin(\pi u L)}{\pi u L} \\
&= e^{-i\pi u L}\, \mathrm{sinc}(u L) ,
\end{aligned} \tag{6.51}
$$

where the normalized sinc function4 is given by sinc(x) = sin(πx)/(πx). This


inner product is similar to the Fourier transform of the continuous antenna
weighting pointed toward boresight fb (u). The continuous version of the complex
pattern b is given by the Fourier transform of the antenna weighting function
fa (x; u),

fb (u) = dx e−i2π u x fa (x; 0)
 L
1
=√ dx e−i2π u x
L 0
L
1 i −i2π u x 
= √ e 
L 2πu 0
1 1
=√ (1 − e−i2π u L )
L i2πu

= L e−iπ u L sinc(uL) . (6.52)
Similar to the discrete case, the inner product between steering vectors is related
to the Fourier transform of the array weighting vector,
$$ \langle f_a(x; 0), f_a(x; u) \rangle = \frac{1}{\sqrt{L}}\, f_b(u) . \tag{6.53} $$
The normalized beam pattern is given by the magnitude squared of the normal-
ized inner product of continuous steering vectors or functions,
$$
\begin{aligned}
\rho(u) &= \frac{\left|\langle f_a(x;0), f_a(x;u)\rangle\right|^2}{\langle f_a(x;0), f_a(x;0)\rangle\, \langle f_a(x;u), f_a(x;u)\rangle} \\
&= \left|\langle f_a(x;0), f_a(x;u)\rangle\right|^2 \\
&= \mathrm{sinc}^2(u L) \\
&= \frac{1}{L}\left|f_b(u)\right|^2 .
\end{aligned} \tag{6.54}
$$
By using this analysis, the peak sidelobe in the beam power pattern ρ(u) is
about 13 dB below the peak, as seen in Figure 6.7. The value of the peak side-
lobe can be reduced by using windowing or tapering techniques [137]. However,
windowing increases the width of the mainlobe and causes loss in peak gain.
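A quick numeric check of the 13 dB figure (a sketch assuming NumPy; the search range past the first null at uL = 1 is an arbitrary choice):

```python
import numpy as np

# Scan u L from the first null (u L = 1) out past several sidelobes.
uL = np.linspace(1.0, 4.0, 300001)
rho = np.sinc(uL) ** 2                   # np.sinc is the normalized sinc, sin(pi x)/(pi x)
peak_sl_db = 10 * np.log10(rho.max())    # relative to the mainlobe peak, rho(0) = 1
print(round(peak_sl_db, 2))              # about -13.26 dB
```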

6.4 Sparse arrays

6.4.1 Sparse arrays on a regular lattice


There is no fundamental reason to require filled regular arrays. In fact, for many
applications, performance can be improved by using sparse arrays given the same
number of antennas. For a given number of antenna elements, the width of the
4 While it sometimes unfortunately causes confusion, there are two commonly used
normalizations for the sinc function. In this text, the normalized sinc function (preferred in
the communications and signal processing literature) is used rather than the unnormalized
sinc function, sin(x)/(x), that is commonly used in mathematics.


Figure 6.7 (a) The continuous antenna weighting function in terms of position along
the antenna array, assuming the array is phased to point perpendicularly to the array
(along boresight). (b) The beam pattern of the continuous antenna array
approximation as a function of the product of the direction u = sin(φ) and the array
side L.

mainlobe narrows if a sparse array is used. The approximate mainlobe width of


a linear array is proportional to the inverse of the root mean square antenna
position, under the assumption that the average element position is zero. How-
ever, the sidelobe levels increase. Depending upon the application, the increase
in sidelobe levels may or may not be of interest. For linear antenna arrays con-
strained to a regular lattice of spatial sample points, Equation (6.46) can be
employed by using a sparse antenna weighting vector a in which the values of 1
and 0 are mixed over some aperture (assuming that the array is phased to point
perpendicularly to the array). For a given number of antennas, a sparse array
will have a narrower main beam and higher peak sidelobes. An example form of
a beam pattern for sparse arrays on a regular lattice is given by
$$ \rho(\phi = \arcsin[q\,\Delta u]) = \left| \{\mathbf{b}\}_{q'} \right|^2 , \qquad \mathbf{b} = \sqrt{\frac{M}{n_r}}\, \mathbf{F}\, \mathbf{a} , \tag{6.55} $$
where a notional example of a sparse array is given by
$$ \mathbf{a} = \begin{pmatrix} 1 & \cdots & 0 & \cdots & 1 & \cdots & 0 & \cdots & 1 & 0 & 0 & \cdots & 0 \end{pmatrix}^T . \tag{6.56} $$

For applications such as direction finding, with sparse arrays the sidelobes can be-
come sufficiently high to cause angle-estimation confusion. For communications,
these sidelobes are not typically an issue because, in complicated multipath en-
vironments, knowledge of the direction is not particularly meaningful. However,
the sparse array geometry can improve the spatial diversity, typically improving
performance.
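A small sketch (assuming NumPy; the sparse element placement is a notional choice in the spirit of Equation (6.56), and the lattice size is arbitrary) comparing a filled eight-element array to an eight-element sparse array on the same half-wavelength lattice illustrates both effects:

```python
import numpy as np

def lattice_pattern(occupied, M=512):
    """Boresight beam pattern in u = sin(phi) for antennas on a lambda/2 lattice."""
    n_r = len(occupied)
    a = np.zeros(M, dtype=complex)
    a[np.asarray(occupied)] = 1 / np.sqrt(n_r)   # sparse weighting, as in Eq. (6.56)
    b = M * np.fft.ifft(a) / np.sqrt(n_r)        # beam pattern via the DFT, Eq. (6.55)
    return np.fft.fftshift(np.abs(b) ** 2)       # reorder samples to u in [-1, 1)

def halfwidth(rho):
    """Samples from the central peak out to the first null (mainlobe half-width)."""
    c = np.argmax(rho)
    k = 1
    while rho[c + k] < rho[c + k - 1]:
        k += 1
    return k

def peak_sidelobe(rho):
    c, w = np.argmax(rho), halfwidth(rho)
    mask = np.ones(rho.size, dtype=bool)
    mask[c - w:c + w + 1] = False                # exclude the mainlobe
    return rho[mask].max() / rho[c]

filled = lattice_pattern(range(8))                        # filled 8-element array
sparse = lattice_pattern([0, 3, 7, 12, 18, 25, 33, 42])   # notional 8-element sparse array
assert halfwidth(sparse) < halfwidth(filled)              # narrower mainlobe
assert peak_sidelobe(sparse) > peak_sidelobe(filled)      # but higher peak sidelobes
print("sparse array: narrower mainlobe, higher sidelobes")
```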

6.4.2 Irregular random sparse arrays


There is no fundamental reason to require that antennas are placed on a regular
lattice. Random arrays with continuous irregular spacing are considered here. For
the sake of discussion, it is assumed that the antennas are constrained to a linear
array. Although we typically think of these arrays in the context of reception,
so that we use nr to indicate the number of antennas, here the arrays could be
used for either transmit or receive. The sidelobe distribution is the same.
Approximations for the distribution of peak sidelobes for random arrays are
addressed in References [83, 4, 234, 294]. Given an array of length L with nr
elements placed with a uniform random distribution within the region of length
L in units of wavelength, the probability for the ratio of the magnitude of the
peak sidelobe to the mainlobe (r2 = s.l./m.l.) in terms of power to be less than
some value η 2 is approximately given by
$$ \Pr(r < \eta) \approx \left[ 1 - e^{-n_r \eta^2} \right] e^{-\sqrt{\frac{4\pi n_r}{3}}\, \eta L\, e^{-n_r \eta^2}} , \tag{6.57} $$

which will be developed within this section.


The approximate result in Equation (6.57) for the distribution of the ratio of
peak sidelobe to mainlobe of a beam pattern for a linear random array is found by
estimating the probability P_noCr(η) that the beam pattern does not move from
below the threshold to above the threshold at any point as one sweeps
across in angle. If, as the test angle is changed, the sidelobe moves from below
the threshold to above the threshold, there is said to be an upward crossing. This
probability is multiplied by the probability P_below(η) that the sidelobe at the first angle
tested was below the threshold. Thus, the probability of starting
below the threshold and remaining below the threshold over the field of view is
given by

$$ \Pr(r < \eta) \approx P_{\mathrm{noCr}}(\eta)\, P_{\mathrm{below}}(\eta) . \tag{6.58} $$

The probability of being below the threshold at a given point Pbelow (η) is found
by observing that the sum of independent random phasors tends toward a Gaus-
sian distribution because of the central limit theorem. The power received at the
output of a matched-spatial-filter beamformer constructed for direction u0 =
sin φ0 given an array response from some other direction u = sin φ, for angles
φ0 and φ from boresight, is given by the inner product between the two steering
vectors v†(u0) v(u), where the steering vector $\mathbf{v}(u) \in \mathbb{C}^{n_r \times 1}$. For some direction

u, the value of the complex ratio of the sidelobe to mainlobe amplitude z is given
by

$$
\begin{aligned}
z(u) &= \frac{\mathbf{v}^\dagger(u_0)\, \mathbf{v}(u)}{\mathbf{v}^\dagger(u_0)\, \mathbf{v}(u_0)} = \mathbf{v}^\dagger(u_0)\, \mathbf{v}(u) \\
r(u) &= |z(u)| ,
\end{aligned} \tag{6.59}
$$

where, in this section, it is assumed that the norm of the steering vector is 1,

$$ \|\mathbf{v}(u)\| = 1 \quad \forall\, u . \tag{6.60} $$

To make contact with previous sections, the square of the sidelobe to mainlobe
amplitude ratio is the normalized beam pattern, which is given by

$$ \rho(u) = r^2(u) . \tag{6.61} $$

It is assumed that the steering vector can be constructed by using a simple plane
wave model for a linear array. The element of the steering vector associated with
the mth randomly placed antenna is given by
$$ \{\mathbf{v}(u)\}_m = \frac{1}{\sqrt{n_r}}\, e^{i k u x_m} , \tag{6.62} $$

where k = 2π/λ is the wavenumber for wavelength λ, and xm is the position of


the mth randomly placed antenna. Note that because we assume a linear array,
the distance along the array can be expressed by using the scalar xm . The value
of the ratio of sidelobe to mainlobe amplitude is given by

$$
\begin{aligned}
z(u) &= \mathbf{v}^\dagger(u_0)\, \mathbf{v}(u) \\
&= \frac{1}{n_r} \sum_m e^{-i k u_0 x_m}\, e^{i k u x_m} \\
&= \frac{1}{n_r} \sum_m e^{i k x_m (u - u_0)} .
\end{aligned} \tag{6.63}
$$

By invoking the central limit theorem (assuming u = u0 ) in the limit of a large


number of antennas nr , the probability distribution for z(u) in the sidelobe region
is approximated well by a complex Gaussian distribution. The distribution for
the magnitude of the ratio r(u) is then given by a Rayleigh distribution. The
variance of the sum of independent unit phasors is given by the number of elements n_r.
Because of the 1/n_r amplitude normalization, the mean of the sidelobe-to-mainlobe
power ratio r² (equivalently, the variance of the complex amplitude ratio z) is 1/n_r,
so that the Rayleigh probability density is
$$ dP_{\mathrm{below}}(r) = 2\, n_r\, r\, e^{-n_r r^2}\, dr . \tag{6.64} $$

By integrating the probability density from zero to the threshold η, the value for
the probability of being below the threshold at some sidelobe level at a specific


Figure 6.8 Probability that sidelobe-to-mainlobe ratio for sparse arrays with 25 (light
gray), 20, 15, and 10 (black) randomly placed antennas is less than some value.

direction away from the mainlobe is given by


$$
\begin{aligned}
P_{\mathrm{below}}(\eta) &= \int_0^{\eta} dP_{\mathrm{below}}(r) \\
&= \int_0^{\eta} dr\, 2\, n_r\, r\, e^{-n_r r^2} \\
&= 1 - e^{-n_r \eta^2} .
\end{aligned} \tag{6.65}
$$

This approximation for the probability that the sidelobe-to-mainlobe amplitude ratio r is less than some threshold is displayed in Figure 6.8. This probability
is displayed for 10, 15, 20, and 25 antenna elements. Because the approximation
employs the central limit theorem, a relatively large number of elements is as-
sumed. It is interesting to note that this distribution is not explicitly dependent
upon the size of the array, although it is assumed that the array is sparse, that is,
the aperture is large in units of wavelength compared to the number of antenna
elements. If the array were small in number of elements or aperture, then the
validity of the central limit theorem approximation would be questionable.
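A Monte Carlo spot-check of this Rayleigh approximation (a sketch assuming NumPy; the aperture, element count, test direction, and threshold are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, L_wav = 20, 50.0            # element count and aperture in wavelengths
u, u0 = 0.5, 0.0                 # test direction well away from the mainlobe
trials = 20000

x = rng.uniform(0.0, L_wav, size=(trials, n_r))          # random element positions
z = np.exp(1j * 2 * np.pi * x * (u - u0)).mean(axis=1)   # Eq. (6.63) with k = 2 pi
r = np.abs(z)                    # sidelobe-to-mainlobe amplitude ratio

eta = 0.3
empirical = np.mean(r < eta)
predicted = 1 - np.exp(-n_r * eta**2)        # Rayleigh CDF of Eq. (6.65)
print(round(empirical, 3), round(predicted, 3))   # close agreement expected
```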

Simple approximation
At this point, one could observe that the Nyquist sampling density is 1/(2L), and
that the scanned space for u is from −1 to 1, so that there are up to approximately
2 · 2L/λ − 1 distinct sidelobes. Under the generous approximation that they are
independent, the probability of exceeding the threshold is approximately given
by
$$ \Pr(r < \eta) \approx \left( 1 - e^{-n_r \eta^2} \right)^{4L/\lambda - 1} . \tag{6.66} $$

Threshold crossing formulation


A better approximation [83, 4], by using techniques discussed in Reference [27]
and references therein, is given by constructing the probability Pn oC r (η) that
there are no crossings within the field of view (the domain of potential sidelobe
directions). The probability density of no sidelobe crossing from below to above
the threshold within some small region du of u is denoted pcr (u; η). The proba-
bility of not crossing over the entire observed region is given by integrating over
the probability density as a function of direction u for threshold η,

Pn oC r (η) = 1 − du pcr (u; η)

pcr (u; η) du = Pr {η − r (u) du < r(u) < η}


 ∞  η

pcr (u; η) du = dr dr p(r, r ) ,
0 η −r  (u )du
 ∞ 
= dr p(η, r ) r du , (6.67)
0

where r′ = r′(u) = ∂r(u)/∂u is the derivative of the sidelobe-to-mainlobe
amplitude ratio with respect to u, evaluated at u, and p(r, r′) is the joint
probability density of the sidelobe-to-mainlobe ratio r and the derivative of
the ratio with respect to direction u. The explicit functional dependence of the
sidelobe-to-mainlobe ratio r and its derivative r′ is usually suppressed, but
is sometimes displayed for clarity. The integral over r′ is only over positive values
because only upward crossings are considered.

Gaussian probability density model


The joint probability density of the sidelobe-to-mainlobe
amplitude ratio r and the derivative of the ratio with respect to direction u
(denoted r′) is given by
$$ p(r, r') = \frac{r}{\sqrt{2\pi}\, \sigma_r^2\, \sigma_{r'}}\, e^{-\frac{1}{2}\left(\frac{r^2}{\sigma_r^2} + \frac{r'^2}{\sigma_{r'}^2}\right)} , \tag{6.68} $$

where σ_r² and σ_{r′}² are the variances of the real part of the complex amplitude
ratio and of its derivative, respectively. To develop this density,
a few intermediate results are required. It is assumed that the real z_r and
imaginary z_i parts of the sidelobe-to-mainlobe complex
amplitude ratio z and their derivatives (z′_r and z′_i) can be represented by
independent real Gaussian distributions. The probability density for these variables
is given by
$$ p(z_r, z_i, z'_r, z'_i) = \frac{1}{(2\pi)^2\, \sigma_r \sigma_i \sigma_{r'} \sigma_{i'}}\, e^{-\frac{1}{2}\left(\frac{z_r^2}{\sigma_r^2} + \frac{z_i^2}{\sigma_i^2} + \frac{{z'_r}^2}{\sigma_{r'}^2} + \frac{{z'_i}^2}{\sigma_{i'}^2}\right)} , \tag{6.69} $$
where σ_i² and σ_{i′}² are the variances of the imaginary portion of the amplitude ratio
and of its derivative with respect to the direction parameter u. In the sidelobes,
it is expected that there is symmetry between the real and imaginary portions
of the amplitude ratio. Consequently, the variances for the real and imaginary
parts are equal, σ_i² = σ_r² and σ_{i′}² = σ_{r′}². The real
and imaginary parts of the amplitude ratio can be expressed in terms of polar
coordinates, and are given by
$$ z_r = r \cos(\theta), \qquad z_i = r \sin(\theta) , \tag{6.70} $$
where the parameter θ = arctan(zi /zr ) is implicitly a function of u. The deriva-
tives with respect to the direction parameter u of the real and imaginary part of
the amplitude ratio are given by
$$ z'_r = r' \cos(\theta) - r\, \theta' \sin(\theta), \qquad z'_i = r' \sin(\theta) + r\, \theta' \cos(\theta) , \tag{6.71} $$
where θ is the derivative of θ with respect to u. The sum of the squares of the
real and imaginary components of the amplitude ratio and their derivatives are
given by
$$ z_r^2 + z_i^2 = r^2, \qquad {z'_r}^2 + {z'_i}^2 = r'^2 + r^2 \theta'^2 . \tag{6.72} $$
The probability density in terms of the polar coordinates is given by
 
$$ p(r, r', \theta, \theta') = \frac{r^2}{(2\pi)^2\, \sigma_r^2\, \sigma_{r'}^2}\, e^{-\frac{1}{2}\left(\frac{r^2}{\sigma_r^2} + \frac{r'^2 + r^2 \theta'^2}{\sigma_{r'}^2}\right)} . \tag{6.73} $$
To find the probability density for the magnitude of the amplitude ratio and its
derivative, an integration over the angular components is performed,
 
$$
\begin{aligned}
p(r, r') &= \int d\theta \int d\theta'\, p(r, r', \theta, \theta') \\
&= \int d\theta \int d\theta'\, \frac{r^2}{(2\pi)^2\, \sigma_r^2\, \sigma_{r'}^2}\, e^{-\frac{1}{2}\left(\frac{r^2}{\sigma_r^2} + \frac{r'^2 + r^2\theta'^2}{\sigma_{r'}^2}\right)} \\
&= \int d\theta'\, \frac{r^2}{2\pi\, \sigma_r^2\, \sigma_{r'}^2}\, e^{-\frac{1}{2}\left(\frac{r^2}{\sigma_r^2} + \frac{r'^2 + r^2\theta'^2}{\sigma_{r'}^2}\right)} \\
&= \frac{r^2}{2\pi\, \sigma_r^2\, \sigma_{r'}^2}\, \frac{\sqrt{2\pi}\, \sigma_{r'}}{r}\, e^{-\frac{1}{2}\left(\frac{r^2}{\sigma_r^2} + \frac{r'^2}{\sigma_{r'}^2}\right)} \\
&= \frac{r}{\sqrt{2\pi}\, \sigma_r^2\, \sigma_{r'}}\, e^{-\frac{1}{2}\left(\frac{r^2}{\sigma_r^2} + \frac{r'^2}{\sigma_{r'}^2}\right)} ,
\end{aligned} \tag{6.74}
$$
where the integral over θ is from 0 to 2π, and the integral over θ′ is from −∞ to ∞.
The next issue is to determine the variances of the amplitude ratio and its
derivative. The variance of the sidelobe-to-mainlobe ratio σ 2 is 1/nr . Conse-
quently, the variance of the real (and equivalently the imaginary) component of

the amplitude ratio is given by


$$ \sigma_r^2 = \frac{1}{2\, n_r} . \tag{6.75} $$
The evaluation of the variance of the derivative of the amplitude ratio is slightly
more involved. The variance of the derivative term is given by the negative of the
second derivative of the autocovariance as its argument approaches zero [241],
$$ \sigma_{r'}^2 = -\lim_{\delta u \to 0} \frac{\partial^2}{\partial \delta u^2} R_{z_r, z_r}(\delta u) , \tag{6.76} $$
where Rz r ,z r (δu) is the autocovariance5 of the real component of the amplitude
ratio z_r. This relationship can be understood by noting that for a zero-mean
stationary (which means the statistics are independent of angle far from the
mainlobe) process a(u), the autocovariance R_{a,a}(u_1, u_2) is given by
$$
\begin{aligned}
R_{a,a}(u_1, u_2) &= \langle a(u_1)\, a^*(u_2) \rangle \\
R_{a',a}(u_1, u_2) &= \frac{\partial}{\partial u_1} R_{a,a}(u_1, u_2) \\
R_{a',a'}(u_1, u_2) &= \frac{\partial^2}{\partial u_1\, \partial u_2} R_{a,a}(u_1, u_2) .
\end{aligned} \tag{6.77}
$$
If the process is stationary (angle independent) in u (which is a reasonable as-
sumption far from the mainlobe if the probability distribution of the locations
of the antennas changes slowly across the aperture), then the autocovariance is
a function of the difference δu = u1 − u2 and is given by

Ra  ,a  (δu) = Ra  ,a  (u1, u2)


∂2
= Ra,a (u1 , u2 )
∂u1 ∂u2
∂2
= Ra,a (u1 − u2 )
∂u1 ∂u2
∂2
Ra  ,a  (δu) = − Ra,a (δu) , (6.78)
∂δu2
where ∂/∂u_1 → ∂/∂δu and ∂/∂u_2 → −∂/∂δu because δu = u_1 − u_2. Consequently,
the variance of a′(u) (which is the derivative of the random process
a with respect to the variable u), denoted σ²_{a′}, is found from the negative of the
second derivative of the autocovariance as the lag approaches zero:
$$ \sigma_{a'}^2 = -\lim_{\delta u \to 0} \frac{\partial^2}{\partial \delta u^2} R_{a,a}(\delta u) . \tag{6.79} $$
In the sidelobe region, it is expected that the autocovariance of the real and
imaginary components are approximately equal.
5 Here we use R {·, ·} (·) to indicate covariance rather than correlation used elsewhere in the
text.

To calculate the autocovariance of the amplitude ratio, we will make a couple
of observations. Because, in this section, length is represented in units of
wavelength, the wavenumber k is equal to 2π, and the contribution of the mth
antenna is proportional to e^{i 2π y_m u}. It is assumed here that the antenna elements
are distributed randomly with uniform probability. Consequently, the probability
density for the elements' positions is given by p(y) = 1/L. Thus, the autocovariance
of the real component of the amplitude ratio is given by
$$
\begin{aligned}
R_{z_r,z_r}(\delta u) &= \frac{1}{n_r^2} \sum_{m=1}^{n_r} \int dy\, p(y) \cos(2\pi y u) \cos(2\pi y [u + \delta u]) \\
&= \frac{1}{n_r}\, \frac{\sin\!\left(\frac{\delta u\, k L}{2}\right)}{2\pi L\, \delta u} + \frac{1}{n_r}\, \frac{\sin\!\left(\frac{1}{2}\, 2\pi L\, (\delta u + 2u)\right)}{2\pi L\, (\delta u + 2u)} \\
&\approx \frac{1}{n_r}\, \frac{\sin\!\left(\frac{\delta u\, 2\pi L}{2}\right)}{2\pi L\, \delta u} \\
&= \frac{1}{2\, n_r}\, \mathrm{sinc}(\delta u\, L) ,
\end{aligned} \tag{6.80}
$$
2 nr
where, because we are considering the sidelobes, the second term in the second
line of the above equation (decaying rapidly with the term L [δu + 2u]) is small.
The variance of the derivative of the real part of the amplitude ratio is given by

$$
\begin{aligned}
\sigma_{r'}^2 &= -\lim_{\delta u \to 0} \frac{\partial^2}{\partial \delta u^2} R_{z_r,z_r}(\delta u) \\
&= -\lim_{\delta u \to 0} \frac{1}{n_r}\left[\frac{\sin(\pi L\,\delta u)}{\pi L\,\delta u^3} - \frac{\cos(\pi L\,\delta u)}{\delta u^2} - \frac{\pi L \sin(\pi L\,\delta u)}{2\,\delta u}\right] \\
&= \frac{L^2 \pi^2}{6\, n_r} .
\end{aligned} \tag{6.81}
$$
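Equation (6.81) can be sanity-checked numerically by taking a finite-difference second derivative of the autocovariance in Equation (6.80) at zero lag (a sketch assuming NumPy; the element count, aperture, and step size are arbitrary choices):

```python
import numpy as np

n_r, L = 16, 25.0                               # arbitrary element count and aperture
R = lambda du: np.sinc(du * L) / (2 * n_r)      # Eq. (6.80); np.sinc(x) = sin(pi x)/(pi x)

h = 1e-4 / L                                    # small lag step for finite differences
second_deriv = (R(h) - 2 * R(0.0) + R(-h)) / h**2
sigma_rp2 = -second_deriv                       # Eq. (6.76)
predicted = L**2 * np.pi**2 / (6 * n_r)         # Eq. (6.81)
print(round(sigma_rp2, 3), round(predicted, 3))
```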
By substituting the value for the variances, the joint probability density for
the sidelobe-to-mainlobe ratio and the derivative of the ratio is given by
 
$$
\begin{aligned}
p(r, r') &= \frac{r}{\sqrt{2\pi}\, \sigma_r^2\, \sigma_{r'}}\, e^{-\frac{1}{2}\left(\frac{r^2}{\sigma_r^2} + \frac{r'^2}{\sigma_{r'}^2}\right)} \\
&= \frac{r}{\sqrt{2\pi}\, \frac{1}{2 n_r} \sqrt{\frac{L^2 \pi^2}{6 n_r}}}\, e^{-\frac{1}{2}\left(2 n_r r^2 + \frac{6 n_r r'^2}{L^2 \pi^2}\right)} \\
&= r\, \sqrt{\frac{12\, n_r^3}{\pi^3 L^2}}\, e^{-\frac{1}{2}\left(2 n_r r^2 + \frac{6 n_r r'^2}{L^2 \pi^2}\right)} ,
\end{aligned} \tag{6.82}
$$

where Equations (6.75) and (6.81) are employed to determine the values for σr2
and σr2 . The probability pcr (u; η) of crossing the threshold at some direction
u [formulated in Equation (6.67)] is given by integrating the joint probability
density over all derivative values near the threshold value of interest dr ≈ r du.
The resulting probability density is a function of the direction and threshold

level only, and is given by


$$
\begin{aligned}
p_{\mathrm{cr}}(u; \eta)\, du &= \int_0^{\infty} dr'\, r'\, p(\eta, r')\, du \\
&= \sqrt{\frac{\pi\, n_r}{3}}\, \eta\, L\, e^{-n_r \eta^2}\, du .
\end{aligned} \tag{6.83}
$$

From the above discussion, the total probability of the peak sidelobe being
below the threshold η is approximated by

$$
\begin{aligned}
P(r < \eta) &\approx P_{\mathrm{noCr}}(\eta)\, P_{\mathrm{below}}(\eta) \\
&= P_{\mathrm{noCr}}(\eta) \left[ 1 - e^{-n_r \eta^2} \right] \\
P_{\mathrm{noCr}}(\eta) &= \lim_{M \to \infty} \prod_{m=1}^{M} \left[ 1 - \frac{u_{\max} - u_{\min}}{M}\, p_{\mathrm{cr}}(u; \eta) \right] \\
&= \lim_{M \to \infty} \prod_{m=1}^{M} \left[ 1 - \frac{u_{\max} - u_{\min}}{M}\, p_{\mathrm{cr}}\!\left( u_{\min} + \frac{m-1}{M} (u_{\max} - u_{\min});\, \eta \right) \right] \\
&= \lim_{M \to \infty} \prod_{m=1}^{M} \left[ 1 - \frac{2}{M} \int_0^{\infty} dr'\, r'\, p(\eta, r') \right] \\
&= \lim_{M \to \infty} \prod_{m=1}^{M} \left[ 1 - \frac{2}{M} \sqrt{\frac{\pi\, n_r}{3}}\, \eta\, L\, e^{-n_r \eta^2} \right] \\
&= e^{-\sqrt{\frac{4\pi n_r}{3}}\, \eta\, L\, e^{-n_r \eta^2}} ,
\end{aligned} \tag{6.84}
$$

where u_max and u_min are the limits of direction and are given by 1 and −1
respectively, and p_cr(u; η) is evaluated by using Equation (6.83). Here we employ
the observations that the argument of the product above is independent of u and
that from Equation (2.15) the product can be expressed as an exponential by
using the following relationship,
$$ \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^n = e^x . \tag{6.85} $$
Consequently, the product of the probability of being below the threshold at a
given initial angle times the probability of not crossing the threshold η over the
visible region is given by
$$ \Pr(r < \eta) \approx \left[ 1 - e^{-n_r \eta^2} \right] e^{-\sqrt{\frac{4\pi n_r}{3}}\, \eta\, L\, e^{-n_r \eta^2}} . \tag{6.86} $$

This approximation for the probability of the ratio of peak sidelobe-to-mainlobe


power r2 to be less than some threshold is displayed in Figure 6.9. Here an
ensemble of random arrays with uniform element likelihood under an aper-
ture constraint of 50 wavelengths is displayed for 10, 15, 20, and 25 antenna
elements.


Figure 6.9 Under an aperture constraint of 50 wavelengths, the probability that the
sidelobe-to-mainlobe ratio for random sparse arrays with 25 (light gray), 20, 15, and
10 (black) randomly placed antennas is less than some value.

There are a couple of useful interpretations of the probability Equation (6.86),


depending upon how one wishes to use this result. One use of this result is for
applications in which the random distribution of elements over some given area
cannot be designed. An example might be for randomly distributed nodes in a
sensor network. Given some desirable peak sidelobe level, the number of nodes or
size of aperture can be modified so that there is a high probability that the peak
sidelobe is no larger than some design value. Alternatively, one could use these
probabilities as a system design tool. One can quickly define system parameters
for which some good sparse antenna array exists with some high likelihood.
Later, a specific array can be designed either by some optimization process or
by simply simulating multiple throws of random arrays until one finds an array
that satisfies the sidelobe requirements.

6.5 Polarization-diverse arrays

It is common in discussions of antenna arrays to ignore the existence of polar-


ization. This is a reasonable approximation if the antennas in the array all have
identical polarization. Any mismatch in the polarizations between the trans-
mitter and receiver is folded into the overall attenuation. However, by ignoring
polarization, some opportunities for increasing diversity and improving angle
estimation are missed.

6.5.1 Polarization formulation


A wavefront propagating in free space has the freedom to propagate with some
linear combination of two independent polarizations [154, 290, 229]. These polar-
izations are often expressed as horizontal and vertical, or, by a simple transforma-
tion, left and right circular polarizations. The horizontal and vertical

Figure 6.10 Propagation of wavefront along the {x}1 -axis with horizontal and vertical
polarization.

polarization axes are defined by the directions of the electric field oriented hori-
zontally and vertically in the plane perpendicular to the direction of propagation.
While line-of-sight propagation is considered within this chapter, more generally
it is worth noting that because many channels have complicated multipath scat-
tering, energy propagating along any direction has some probability of getting
from the transmitter to the receiver. Consequently, all directions of propagation
and polarization are of interest. In particular, one can imagine employing an
array of three crossed dipole antennas, all centered at some point.
The simple plane wave description in Equation (6.5) is extended here to include
polarization. The vector of electric field components e(x, t) for each direction
as a function of location and time under the assumption that a plane wave is
propagating along the {x}3 -axis such that {k}1 = {k}2 = 0 is given by
$$ \mathbf{e}(\mathbf{x}, t) = \Re\!\left\{\begin{pmatrix} \psi_1(\mathbf{k}, t) \\ \psi_2(\mathbf{k}, t) \\ 0 \end{pmatrix}\right\} = \Re\!\left\{\begin{pmatrix} a_1 \\ a_2 \\ 0 \end{pmatrix} e^{i(\{\mathbf{k}\}_3 \{\mathbf{x}\}_3 - \omega t)}\right\} , \tag{6.87} $$

where ψ1 (k, t) and ψ2 (k, t) are the plane wave solutions as a function of time
t to the wave equation propagating along k associated with polarization along
the {x}1 -axis and polarization along direction {x}2 -axis, respectively. The pa-
rameters a1 and a2 are the complex amplitudes along the 1 and 2 axes. Given
the defined geometry (as seen in Figure 6.10), the horizontal and vertical po-
larizations of the wavefront are associated with the {x}1 -axis and {x}2 -axis,
respectively. The third element is 0 because there is no electric field along the
direction of propagation in free space.
A horizontally polarized wavefront (as seen in Figure 6.10) is characterized by

$$ a_1 \neq 0 , \quad \text{and} \quad a_2 = 0 . \tag{6.88} $$

Similarly, a vertically polarized wavefront is characterized by

$$ a_2 \neq 0 , \quad \text{and} \quad a_1 = 0 . \tag{6.89} $$

An arbitrary linear polarization is given by

$$ a_1 = r_1 e^{i\phi} , \quad \text{and} \quad a_2 = r_2 e^{i\phi} , \tag{6.90} $$

where r1 and r2 are real parameters, and a1 and a2 have a common complex
phase, φ. This basis corresponds to starting with a horizontally or vertically
polarized wave and rotating the axes (or physically rotating the antenna) about
the direction of propagation. An arbitrary elliptical polarization allows for values

$$ a_1 \neq 0 , \quad \text{and} \quad a_2 \neq 0 . \tag{6.91} $$

Right- and left-handed circularly6 polarized wavefronts correspond to

$$ a_1 = b , \quad \text{and} \quad a_2 = \pm i\, b , \tag{6.92} $$

where b is some complex valued parameter. The positive sign for the ±i term
indicates a right-handed polarization, and the negative sign indicates a left-
handed polarization.
The circular basis for electric polarization, {right, left, propagation}, for ecir
is related to the linear basis for a wavefront propagating along {x}3 by
$$ \mathbf{e}_{\mathrm{cir}}(\mathbf{x}, t) = \Re\!\left\{ \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \psi_1(\mathbf{k}, t) \\ \psi_2(\mathbf{k}, t) \\ 0 \end{pmatrix} \right\} . \tag{6.93} $$
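The change of basis in Equation (6.93) can be checked numerically: a right-hand circularly polarized wavefront of Equation (6.92) should have no left-circular component (a sketch assuming NumPy, using this text's handedness convention, which the footnote notes is arbitrary; the amplitude b is an arbitrary choice):

```python
import numpy as np

# Linear-to-circular change of basis from Eq. (6.93): the rows give the
# {right, left, propagation} components of (a1, a2, 0).
T = np.array([[1/np.sqrt(2), -1j/np.sqrt(2), 0],
              [1/np.sqrt(2),  1j/np.sqrt(2), 0],
              [0,             0,             1]])

b = 0.7 * np.exp(1j * 0.4)                 # arbitrary complex amplitude
rhcp = np.array([b,  1j * b, 0.0])         # a1 = b, a2 = +i b, Eq. (6.92)
lhcp = np.array([b, -1j * b, 0.0])         # a1 = b, a2 = -i b

e_r = T @ rhcp
e_l = T @ lhcp
assert np.isclose(abs(e_r[1]), 0.0)        # RHCP has no left-circular component
assert np.isclose(abs(e_l[0]), 0.0)        # LHCP has no right-circular component
print("circular basis check passed")
```

The transformation is unitary, so total power is preserved between the two bases.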

Problems

6.1 Considering the plane wave approximation for the reception of a narrow-
band signal on a continuous linear antenna array in a plane with a source located
along boresight of the array, evaluate the root-mean-square error as a function
of range R, length L, and signal wavelength.

6.2 Construct unit-normalized steering vectors as a function of azimuthal angle


φ and angle from zenith θ for the following geometries:
(a) an eight-element square with elements at ±1 wavelength and one point halfway
in between each corner along the periphery of the square;
(b) an 11-element spiral that begins at the origin and ends at 2 wavelengths along the {x}1 axis, following the polar form in which the radius of the nth element is rn = a(n − 1) and the angle of the nth element is φn = b(n − 1), where a and b are undetermined coefficients.
6 The definition of left versus right is arbitrary. Some authors employ the reverse of the
definition used here.

6.3 For a four-element linear regular array with 1 wavelength spacing that
incorporates the array element amplitude pattern

    a(θ) = 2 cos[sin(θ) π/2] ,
where the angle θ is measured from boresight of the array:
(a) formulate an unnormalized steering vector in a plane;
(b) assuming the array is pointed at boresight, evaluate the ratio of the power beam pattern of this array to that of an eight-element array with isotropic elements and half-wavelength spacing;
(c) assuming the array is pointed at θ = π/4, evaluate the ratio of the power beam pattern of this array to that of an eight-element array with isotropic elements and half-wavelength spacing.
6.4 For the continuous array construction discussed in Section 6.3.3 that exists over the spatial domain 0 ≤ x ≤ L, find the normalized power beam pattern
under the assumption the receive array uses the following tapering or windowing
functions.
(a) Triangular:

           ⎧ 2x/L ,        0 ≤ x ≤ L/2
    w(x) = ⎨
           ⎩ 2 − 2x/L ,    L/2 < x ≤ L .
(b) Hamming:

    w(x) = 0.54 − 0.46 cos(2πx/L) .
6.5 Consider the linear sparse array problem with randomly moving elements that have uniform probability density, assuming 32 isotropic antennas; find the aperture in terms of wavelengths such that the peak sidelobe is no worse than 5 dB 90% of the time.
6.6 Consider the linear sparse array design problem assuming 32 isotropic antennas; find the aperture in terms of wavelengths such that a designer would likely find an array with a peak sidelobe no worse than 5 dB after ten random array evaluations.
6.7 By assuming that a source is in the plane spanned by {x}1 and {x}2 ,
construct the unnormalized steering vector for an array of three phase centers
with half-wavelength spacing along {x}2 axis, assuming that the elements are
constructed with small electric dipoles and that:
(a) the array elements and single source are vertically (along {x}3 ) polarized;
(b) the array elements are horizontally polarized along the {x}2 axis and the
single source is horizontally polarized (in the {x}1 –{x}2 plane) and per-
pendicular to the direction of propagation;
(c) assuming the array is phased to point at the source, find the ratio of received power for the horizontally polarized to vertically polarized systems as a function of angle.

6.8 By assuming that a source is in the plane spanned by {x}1 and {x}2 ,
construct the unnormalized steering vector for an array of three phase centers
with half wavelength spacing along {x}2 axis, assuming that the elements are
constructed with small electric dipoles, that at each phase center there is an
electric dipole along each axis ({x}1 , {x}2 , and {x}3 ) and that:
(a) the source is vertically (along {x}3) polarized;
(b) the source has arbitrary polarization;
(c) assuming the array is phased to point at the source, find the ratio of received power for the arbitrarily polarized to vertically polarized sources as a function of angle.
7 Angle-of-arrival estimation

Although angle estimation of a source is not typically of significant interest in communication systems, angle-of-arrival estimation is commonly addressed in treatments of multiple-antenna systems, so we consider it in this chapter. In addition, some of the tools and intuition are helpful when consid-
chapter. In addition, some of the tools and intuition are helpful when consid-
ering adaptive multiple-antenna receivers. Furthermore, there are special cases
of communications systems for which line-of-sight propagation is valid and for
which angle-of-arrival estimation is of value. Angle estimation to the source is
sometimes denoted direction finding. There is a large body of work addressing
this topic [294, 223], and numerous useful approaches (for example, those in
References [16, 266]), many of which will be skipped for brevity. In general, the
direction to the source requires both azimuthal and elevation information, but
for most examples here it is assumed that the source is in the plane of the array, so
only azimuthal information encoded in the angle φ is required. A few approaches
are considered here as an introduction to the area.
Within this chapter, it is assumed that any multipath scattering is minimal
and that the signal is not dispersive across the array; that is, the array is small
compared with the speed of light divided by the bandwidth of the signal. This as-
sumption is sometimes denoted the narrowband signal assumption. Furthermore,
to simplify the introduction, it is assumed that the direction can be character-
ized by a single angle φ. Here it is assumed that multiple samples of the received
signal are available. The number of samples is denoted ns. The model of the received signal Z ∈ C^{nr×ns} for the nr receive antennas is given by

    Z = Σ_{m=1}^{nt} am v(φm) sm + N ,   (7.1)

where am is the common complex attenuation from the transmitter to the receiver for the mth (of the nt) sources that has array response v(φm) ∈ C^{nr×1} (or steering vector), which contains the phase differences in propagation because of small relative delays from the transmitter to each receive antenna as discussed in Section 6.1.2. These phases are a function of the propagation wavevector km ∈ C^{3×1}. Array responses for a single incoming signal are expected to exist
somewhere along the array manifold defined by the continuous set of vectors
defined by v(φ) for all φ (or more generally v(k) for all wavevectors k). The
additive noise for the receiver is given by N ∈ C^{nr×ns}. Here it is assumed that

the entries in N are such that the columns are independently drawn from a unit-
variance complex Gaussian distribution with potentially correlated rows. The
transmitted complex baseband signal for the mth single-antenna transmitter is given by sm ∈ C^{1×ns}.
For many of the examples discussed here, it is assumed that the entries in sm
are unknown and are independently drawn from a complex Gaussian distribution
with unit variance, although estimation bounds are considered for both known
signal and Gaussian signal models. On the basis of the assumptions described
here, the signal-plus-noise spatial covariance matrix Q ∈ C^{nr×nr} is given by

    Q = (1/ns) ⟨Z Z†⟩
      = (1/ns) Σ_{m=1}^{nt} |am|² v(φm) ⟨sm s†m⟩ v†(φm) + (1/ns) ⟨N N†⟩
      = Σ_{m=1}^{nt} |am|² v(φm) v†(φm) + R ,   (7.2)

where the external interference-plus-noise covariance matrix is given by R = ⟨N N†⟩/ns, so that the columns in N are drawn from the complex Gaussian distribution CN(0, R), and an average unit-variance signal ⟨sm s†m⟩ = ns is used.
When attempting to estimate the angle of arrival, it is often assumed that
the noise is not spatially correlated. However, by spatially whitening the data
with respect to the interference-plus-noise covariance matrix R, many traditional
approaches can be exploited. An effect of spatial whitening is to flatten the
eigenvalues along the directions in Hilbert space associated with the whitening
matrix. Consequently, if a matrix is whitened with respect to itself, then the
result is the identity matrix that has all unit eigenvalues. A whitened spatial
signal covariance matrix Q̃ is given by

    Q̃ = R−1/2 Q R−1/2
      = (1/ns) R−1/2 ⟨Z Z†⟩ R−1/2
      = (1/ns) Σ_{m=1}^{nt} |am|² R−1/2 v(φm) ⟨sm s†m⟩ v†(φm) R−1/2 + (1/ns) R−1/2 ⟨N N†⟩ R−1/2
      = Σ_{m=1}^{nt} |am|² R−1/2 v(φm) v†(φm) R−1/2 + I ,   (7.3)

where the square root of a matrix R−1/2 satisfies the relationship R−1/2 R−1/2 =
R−1 . Thus, environments with more complicated correlated noise can be consid-
ered. One complication of operating in the whitened space is that the norm of
the steering vector R−1/2 v(φ) may be dependent upon direction.
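Whitening as in Equation (7.3) can be illustrated with a short NumPy sketch. The covariance below and the sample sizes are illustrative assumptions; the check is that whitening a noise-only sample covariance with respect to the true R flattens its eigenvalues toward unity.

```python
import numpy as np

rng = np.random.default_rng(0)
nr, ns = 4, 100000

# An example spatially correlated interference-plus-noise covariance R.
A = rng.standard_normal((nr, nr)) + 1j*rng.standard_normal((nr, nr))
R = A @ A.conj().T/nr + np.eye(nr)

# Matrix square roots of R via its eigendecomposition.
w, V = np.linalg.eigh(R)
R_half = V @ np.diag(np.sqrt(w)) @ V.conj().T
R_invhalf = V @ np.diag(1/np.sqrt(w)) @ V.conj().T

# Draw ns columns of N from CN(0, R) and form the sample covariance.
N = R_half @ (rng.standard_normal((nr, ns)) + 1j*rng.standard_normal((nr, ns)))/np.sqrt(2)
Q_hat = N @ N.conj().T/ns

# Whitened covariance, as in Equation (7.3) with no sources present.
Q_tilde = R_invhalf @ Q_hat @ R_invhalf
print(np.linalg.eigvalsh(Q_tilde))    # all eigenvalues near unity
```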

7.1 Maximum-likelihood angle estimation with known reference

For some applications, much may be known about the waveform that is being
transmitted. The details of what is known about the signal being transmitted
may vary from something about the statistics of the signal to knowing the exact
transmitted signal [31] such as a known training sequence. This knowledge of
the waveform can be exploited to improve the angle-estimation performance.
In this discussion, it is assumed that there is a single source antenna, nt = 1. If the signal of the transmitter of interest s ∈ C^{1×ns} is known, in the presence of Gaussian spatially correlated noise with known spatial covariance R ∈ C^{nr×nr},
then the probability density function of an observed data matrix Z conditioned
upon the known transmitted signal s, the unknown overall complex attenuation
a, and the unknown azimuthal angle φ is given by
    p(Z|s, a, φ) = (1/π^{nr ns}) e^{−tr{[Z − a v(φ) s]† R−1 [Z − a v(φ) s]}} .   (7.4)
Because s is a known reference, we can define ‖s‖² = ns, which is stronger than just knowing that its expectation equals ns.
To find an estimate of the signal direction, the likelihood is maximized. Be-
cause the logarithm monotonically increases with its argument, maximizing the
likelihood is equivalent to maximizing the logarithm of the likelihood. If the
log-likelihood is denoted f (Z|s, a, φ), then it is given by
    f(Z|s, a, φ) = −tr{[Z − a v(φ) s]† R−1 [Z − a v(φ) s]} + b
                 = −tr{R−1/2 [Z − a v(φ) s] [Z − a v(φ) s]† R−1/2} + b
                 = −tr{R−1/2 [Z Z† − a v(φ) s Z† − a* Z s† v†(φ) + ns |a|² v(φ) v†(φ)] R−1/2} + b ,   (7.5)

where b = −log(π^{nr ns}) is a constant containing parameters not dependent upon direction or attenuation. The matrix identity tr{A B} = tr{B A} has been employed.
To remove the nuisance parameter a containing the overall complex attenuation, the log-likelihood is maximized with respect to a,

    ∂f(Z|s, a, φ)/∂a* = tr{s† v†(φ) R−1 [Z − a v(φ) s]} ,   (7.6)
where Wirtinger calculus, discussed in Section 2.8.2, is invoked. Because the
log-likelihood is negative and the expression is quadratic in attenuation, the
stationary point must be a maximum. The likelihood is maximized when
    0 = tr{s† v†(φ) R−1 [Z − a v(φ) s]}
      = tr{v†(φ) R−1 [Z − a v(φ) s] s†}
      = tr{v†(φ) R−1 Z s† − a v†(φ) R−1 v(φ) ns}
    amax = v†(φ) R−1 Z s† / [ns v†(φ) R−1 v(φ)] .   (7.7)

Consequently, the log-likelihood optimized for the nuisance parameter a = amax is given by

    f(Z|s, amax, φ) = −tr{R−1/2 [Z Z† − amax v(φ) s Z† − a*max Z s† v†(φ) + ns |amax|² v(φ) v†(φ)] R−1/2} + b .

Substituting amax from Equation (7.7), each of the two cross terms contributes +|v†(φ) R−1 Z s†|²/[ns v†(φ) R−1 v(φ)], while the quadratic term contributes the same magnitude with the opposite sign, so that

    f(Z|s, amax, φ) = −tr{R−1/2 Z Z† R−1/2} + |v†(φ) R−1 Z s†|² / [ns v†(φ) R−1 v(φ)] + b .   (7.8)

The portion of the log-likelihood that is a function of the direction is given by

    f(Z|s, amax, φ) = |v†(φ) R−1 Z s†|² / [ns v†(φ) R−1 v(φ)] + b2 ,   (7.9)

where b2 = b − tr{R−1/2 Z Z† R−1/2} is another constant independent of direction. The term

    z = Z s† / ns   (7.10)

can be interpreted as an estimator of a vector proportional to the steering vector. The maximum-likelihood estimator of the direction under the assumption of a known reference signal is given by

    φ̂ = argmax_φ |v†(φ) R−1 z|² / [v†(φ) R−1 v(φ)] .   (7.11)

If the norm of the steering vector is independent of direction and the interference-plus-noise covariance matrix R is white (that is, proportional to the identity matrix), then the estimate is given by

    φ̂ = argmax_φ |v†(φ) z|² .   (7.12)
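A minimal simulation of the known-reference estimator of Equation (7.12) for a half-wavelength-spaced linear array in white noise (so R = I) might look as follows; the array size, sample count, and source direction are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
nr, ns = 8, 200
pos = 0.5*np.arange(nr)                    # element positions in wavelengths

def v(u):                                  # steering vector versus u = sin(phi)
    return np.exp(2j*np.pi*pos*u)

u_true, a = 0.3, 1.0                       # 0 dB SNR per antenna in unit-variance noise
s = (rng.standard_normal(ns) + 1j*rng.standard_normal(ns))/np.sqrt(2)  # known reference
noise = (rng.standard_normal((nr, ns)) + 1j*rng.standard_normal((nr, ns)))/np.sqrt(2)
Z = a*np.outer(v(u_true), s) + noise

z = Z @ s.conj()/ns                        # steering-vector estimate, Eq. (7.10)
grid = np.linspace(-1, 1, 2001)
u_hat = grid[np.argmax([abs(v(u).conj() @ z)**2 for u in grid])]   # Eq. (7.12)
print(u_hat)                               # close to u_true
```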

7.2 Maximum-likelihood angle estimation with unknown signal

By assuming that the transmitted signals of interest are randomly drawn from a Gaussian distribution with unknown but deterministic overall complex attenuations am, with steering vectors determined by unknown azimuthal angles φ1, φ2, . . . , φnt, and with interference signals drawn from a Gaussian distribution as described in the previous section, the likelihood of a given value of Z is given by

    p(Z|Q) = (1/(π^{nr ns} |Q|^{ns})) e^{−tr{Z† Q−1 Z}} .   (7.13)
For a given number of transmitters of interest in the presence of some external
interference and noise characterized by interference-plus-noise covariance matrix
R, the receive covariance matrix Q is characterized by

    Q = Σ_{m=1}^{nt} |am|² v(φm) v†(φm) + R .   (7.14)

The maximum-likelihood solution for the angle estimates is given by

    {φ1, φ2, · · · , φnt} = argmax_{am,φm} p(Z|Q)
                         = argmax_{am,φm} (1/(π^{nr ns} |[Σ_{m=1}^{nt} |am|² v(φm) v†(φm)] + R|^{ns}))
                           · e^{−tr{Z† [(Σ_{m=1}^{nt} |am|² v(φm) v†(φm)) + R]−1 Z}} ,   (7.15)
where the values of am are considered nuisance parameters. While this approach
provides the maximum-likelihood solution, for many problems it is too expensive
computationally. In addition, often the number of sources is not known, although
the number of sources of interest can be estimated [339] by considering the eigen-
value distribution of the whitened spatial covariance matrix R−1/2 Q R−1/2 .

7.3 Beamscan

The most direct and possibly the most intuitive approach to estimate the angle of
arrival is to scan a matched filter for all possible expected array responses. This

is similar to the beam-pattern analysis discussed in Section 6.2 and is sometimes denoted beamscan. The beamscan approach is
developed by considering the maximum-likelihood solution under the condition
of a single transmitter and of spatially white noise. Under the assumption that
there is a single random complex Gaussian source (nt = 1), the maximum-
likelihood solution simplifies to

    φ̂ = argmax_{a,φ} p(Z|Q)
       = argmax_{a,φ} (1/(π^{nr ns} |[|a|² v(φ) v†(φ)] + R|^{ns})) · e^{−tr{Z† [|a|² v(φ) v†(φ) + R]−1 Z}} .   (7.16)

Under the assumption of a single source, the determinant of the signal-plus-noise


spatial covariance matrix Q is given by

    |Q| = | |a|² v(φ) v†(φ) + R |
        = | |a|² R−1/2 v(φ) v†(φ) R−1/2 + I | |R|
        = ( |a|² v†(φ) R−1 v(φ) + 1 ) |R|
        = ( |a|² κ + 1 ) |R| ,   (7.17)

where the whitened inner product κ = v† (φ) R−1 v(φ) is defined for convenience.
Because the whitened signal-plus-noise spatial covariance matrix R−1/2 Q R−1/2
is represented by an identity matrix plus a rank-1 matrix (as presented in Equa-
tion (2.114)), its inverse is given by

    Q−1 = ( |a|² v(φ) v†(φ) + R )−1
        = [ R1/2 ( |a|² R−1/2 v(φ) v†(φ) R−1/2 + I ) R1/2 ]−1
        = R−1/2 ( |a|² R−1/2 v(φ) v†(φ) R−1/2 + I )−1 R−1/2
        = R−1/2 ( I − |a|² R−1/2 v(φ) v†(φ) R−1/2 / (1 + |a|² κ) ) R−1/2 .   (7.18)
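The determinant and inverse identities of Equations (7.17) and (7.18) are easy to verify numerically; the covariance, steering vector, and power value below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
nr = 5
vv = np.exp(2j*np.pi*0.5*np.arange(nr)*0.2)      # example steering vector v(phi)
A = rng.standard_normal((nr, nr)) + 1j*rng.standard_normal((nr, nr))
R = A @ A.conj().T/nr + np.eye(nr)               # example interference-plus-noise covariance
a2 = 2.0                                         # |a|^2

Q = a2*np.outer(vv, vv.conj()) + R
kappa = np.real(vv.conj() @ np.linalg.solve(R, vv))   # whitened inner product

# Determinant identity of Eq. (7.17): |Q| = (|a|^2 kappa + 1)|R|.
print((np.linalg.det(Q)/np.linalg.det(R)).real)       # equals a2*kappa + 1

# Rank-1 update inverse of Eq. (7.18), rewritten as R^-1 - a2 R^-1 v v^† R^-1 / (1 + a2 kappa).
Rinv = np.linalg.inv(R)
Qinv = Rinv - a2*(Rinv @ np.outer(vv, vv.conj()) @ Rinv)/(1 + a2*kappa)
print(np.allclose(Qinv @ Q, np.eye(nr)))              # True
```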

Thus, the probability density for the received signal is given by


    p(Z|Q) = (1/(π^{nr ns} [(|a|² κ + 1) |R|]^{ns}))
             · e^{−tr{Z† R−1/2 ( I − |a|² R−1/2 v(φ) v†(φ) R−1/2 / (1 + |a|² κ) ) R−1/2 Z}} .   (7.19)

This likelihood is maximized when the log of the likelihood is maximized, which
is the equivalent of maximizing
  
† −1/2 a 2 R−1/2 v(φ) v† (φ) R−1/2 −1/2
φ̂ = argmaxa,φ tr Z R R Z
1+ a 2κ
− ns log(1 + a 2
κ) . (7.20)

Now a couple of simplifying assumptions are employed. The interference-plus-


noise covariance matrix R is assumed to be spatially white with unit-variance-
normalized thermal noise so that it can be given by the identity matrix, and the
norm of the array response vector v(φ) is a constant,
    R = I
    κ = ‖v(φ)‖² = nr .   (7.21)
The maximization simplifies to

    φ̂ = argmax_{a,φ} tr{|a|² Z† v(φ) v†(φ) Z / (1 + |a|² κ)} − ns log(1 + |a|² κ)
       = argmax_φ v†(φ) Z Z† v(φ) ,   (7.22)
where terms independent of angle φ have been discarded, and the optimization
of angle φ decouples from the optimization of the attenuation. When plotted,
the quadratic term being maximized is a useful estimate of energy received as a
function of angle, ηbs(φ):

    ηbs(φ) = v†(φ) Z Z† v(φ) / ns
           = v†(φ) Q̂ v(φ) ,   (7.23)

where Q̂ = Z Z† /ns is an estimate of the signal-plus-noise covariance matrix.


When the scanned beam in Equation (7.23) points far from a source of energy,
then the value of the beamscan test statistic is approximately the noise energy per
antenna times the number of antennas. When the scanned beam points toward a
source, the output is approximately the received source power per antenna times
the number of antennas squared under the assumption of a fairly strong source.
The term beamscan (or sometimes beamsum) is employed because the matched
filter is scanned across possible angles. Beamscan is an estimate of the spatial en-
ergy response of the array. A variety of techniques with a similar angle-estimation
goal to this approach are available. These estimators provide an estimate of the
energy as a function of direction in a manner similar to the way spectral esti-
mators provide estimates of energy as a function of frequency. Other directional
energy estimation approaches impose additional constraints that distort the en-
ergy estimation and are often denoted pseudospectral estimators.
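As a concrete sketch of Equation (7.23), the following NumPy example (array geometry, SNR, and source location are illustrative assumptions) scans the beamscan statistic over direction and recovers a single source.

```python
import numpy as np

rng = np.random.default_rng(3)
nr, ns = 10, 50
pos = 0.5*np.arange(nr)                    # half-wavelength element positions

def v(u):                                  # steering vector versus u = sin(phi)
    return np.exp(2j*np.pi*pos*u)

# One 0 dB source at sin(phi) = -0.55 in unit-variance white noise.
s = (rng.standard_normal(ns) + 1j*rng.standard_normal(ns))/np.sqrt(2)
noise = (rng.standard_normal((nr, ns)) + 1j*rng.standard_normal((nr, ns)))/np.sqrt(2)
Z = np.outer(v(-0.55), s) + noise
Q_hat = Z @ Z.conj().T/ns                  # estimated covariance

grid = np.linspace(-1, 1, 1001)
eta_bs = np.array([np.real(v(u).conj() @ Q_hat @ v(u)) for u in grid])   # Eq. (7.23)
print(grid[np.argmax(eta_bs)])             # peak near -0.55
```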

7.4 Minimum-variance distortionless response

For some receive beamformer w ∈ C^{nr×1}, which is a function of a reference direction φ, and total receive signal-plus-interference-plus-noise covariance matrix
defined in Equation (7.2), the energy at the output of the beamformer is given
by
ηw (φ) = w† Q w . (7.24)

The minimum-variance distortionless response (MVDR) pseudospatial-spectral


estimator (sometimes denoted the Capon method [51]) attempts to minimize the
energy accepted by the receive beamformer w, while requiring a distortionless
response (where distortionless indicates that the inner product of the beamformer
and ideal array response is a known constant),
w† v(φ) = nr . (7.25)
To minimize ηw (φ) as a function of w subject to the constraint, a Lagrange
multiplier λ can be used. The value of w that minimizes ηw (φ) is given by
    w = argmin_w [ w† Q w − λ w† v(φ) ]
      = λ Q−1 v(φ) .   (7.26)
By imposing the distortionless response constraint w† v(φ) = nr , the values of
λ, and thus w, are found,
    λ = nr / [v†(φ) Q−1 v(φ)]
    w = nr Q−1 v(φ) / [v†(φ) Q−1 v(φ)] .   (7.27)
The minimum-variance distortionless-response energy estimator is then given by
    ηmvdr(φ) = w† Q w
             = ( nr v†(φ) Q−1 / [v†(φ) Q−1 v(φ)] ) Q ( nr Q−1 v(φ) / [v†(φ) Q−1 v(φ)] )
             = nr² / [v†(φ) Q−1 v(φ)] .   (7.28)
Because only estimates of the spatial receive covariance matrix are available, Q
is replaced with the estimate Q̂, so that the estimator is given by
    ηmvdr(φ) ≈ nr² / [v†(φ) Q̂−1 v(φ)] .   (7.29)
With this normalization, when the pseudospectrum estimator defined in Equa-
tion (7.29) is pointed far from a source, the output is approximately the product
of the noise energy per sample and the number of receive antennas. Alterna-
tively, when the pseudospectrum estimator is pointed toward an isolated source,
the output is approximately the product of the signal energy per sample and the
number of receive antennas squared, although the level of the output is sensitive
to any sources with similar array responses.
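A short sketch of the estimator in Equation (7.29) follows; the scenario (one 10 dB source in white noise on a ten-element half-wavelength array) is an illustrative assumption, and a linear solve is used in place of an explicit matrix inverse.

```python
import numpy as np

rng = np.random.default_rng(5)
nr, ns = 10, 100
pos = 0.5*np.arange(nr)

def v(u):                                  # steering vector versus u = sin(phi)
    return np.exp(2j*np.pi*pos*u)

# One 10 dB source at sin(phi) = 0.3 in unit-variance white noise.
s = np.sqrt(10.0)*(rng.standard_normal(ns) + 1j*rng.standard_normal(ns))/np.sqrt(2)
noise = (rng.standard_normal((nr, ns)) + 1j*rng.standard_normal((nr, ns)))/np.sqrt(2)
Z = np.outer(v(0.3), s) + noise
Q_hat = Z @ Z.conj().T/ns

# MVDR pseudospectrum of Eq. (7.29); a linear solve replaces the explicit inverse.
grid = np.linspace(-1, 1, 1001)
eta = np.array([nr**2/np.real(v(u).conj() @ np.linalg.solve(Q_hat, v(u))) for u in grid])
print(grid[np.argmax(eta)])                # peak near 0.3
```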

7.5 MuSiC

Another common spatial pseudospectral estimator is the multiple-signal classification (MuSiC) approach [276, 296]. Here it is presented in the context of a known

noise and interference environment characterized by the spatial interference-plus-


noise covariance matrix R ∈ C^{nr×nr}. In this approach, the signal subspace and the noise subspace in the estimate of the nr receive antenna spatial covariance matrix Q ∈ C^{nr×nr} [defined in Equation (7.2)] are identified. This identifica-
tion is often done by setting some threshold in the eigenvalue distribution of
the spatial covariance matrix Q, so that eigenvalues less than some threshold
are considered noise eigenvalues, and their associated eigenvectors can be used
to span the noise space. For sorted eigenvalues λm +1 > λm ∀ 1 ≤ m < nr , the
spatially whitened covariance matrix can be decomposed as

    R−1/2 Q R−1/2 = Σ_{m=1}^{M} λm em e†m + Σ_{m=M+1}^{nr} λm em e†m ,   (7.30)
                    \______noise______/    \______signal______/

where the first M eigenvalues are below the signal threshold,

M = max{m : λm ≤ threshold} , (7.31)

and the eigenvector for the mth eigenvalue is denoted em . Under the assumption
of unit-norm eigenvectors, the projection operator (as discussed in Section 2.3.5)
for the noise subspace Pnoise ∈ C^{nr×nr} is given by

    Pnoise = Σ_{m=1}^{M} em e†m .   (7.32)

If there is energy coming from some direction φ, then it is expected that the
quadratic form

    v†(φ) R−1/2 Pnoise R−1/2 v(φ)   (7.33)

would be small because array responses would be contained in the signal space,
which is orthogonal to the noise projection matrix Pnoise. Conversely, in other
directions with “noise-like” spatial responses, this quadratic form would be ap-
proximately equal to v† (φ) R−1 v(φ). Thus, the ratio of these two values would
be a reasonable indicator of energy, and the MuSiC spatial pseudospectral estimator ηmusic(φ) is given by

    ηmusic(φ) = v†(φ) R−1 v(φ) / [v†(φ) R−1/2 Pnoise R−1/2 v(φ)] .   (7.34)
Because the spatial signal-plus-noise covariance matrix (or a whitened version of
it) can typically only be estimated, the MuSiC spatial pseudospectral estimator
is generally implemented using the estimated spatial covariance matrix Q̂ (and
if whitening is used, R̂).
To be clear, MuSiC is a relatively poor estimator for energy of received sig-
nals. With this normalization, the pseudospectrum is approximately unity when
directed far from any source. When pseudospectrum is pointed toward a source,
the output is approximately equal to the energy per sample times the number of


Figure 7.1 Beamscan, minimum-variance distortionless response, and MuSiC pseudospectra for a regular, filled, 10-antenna array with 1/2 wavelength spacing. The SNR of the source per receive antenna is 0 dB, and the uncorrelated sources are located at sin(φ) = −0.55, −0.1, 0.1. The receive covariance is estimated using 50 samples, and the assumed noise floor was 0 dB per antenna.

antennas squared. However, any given example is strongly dependent upon the
instantiation of noise, so it fluctuates significantly.
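For white noise (R = I), the MuSiC pseudospectrum of Equation (7.34) reduces to a ratio of nr to the noise-subspace energy. The following sketch (scenario values are illustrative assumptions, with the number of sources taken as known) forms the noise projector from the sample covariance and locates a single source.

```python
import numpy as np

rng = np.random.default_rng(4)
nr, ns = 10, 200
pos = 0.5*np.arange(nr)

def v(u):                                  # steering vector versus u = sin(phi)
    return np.exp(2j*np.pi*pos*u)

# One 10 dB source at sin(phi) = 0.3; white noise, so R = I and no whitening is needed.
s = np.sqrt(10.0)*(rng.standard_normal(ns) + 1j*rng.standard_normal(ns))/np.sqrt(2)
noise = (rng.standard_normal((nr, ns)) + 1j*rng.standard_normal((nr, ns)))/np.sqrt(2)
Z = np.outer(v(0.3), s) + noise
Q_hat = Z @ Z.conj().T/ns

lam, E = np.linalg.eigh(Q_hat)             # eigenvalues in ascending order
E_noise = E[:, :nr-1]                      # all but the largest: noise subspace (one source)
P_noise = E_noise @ E_noise.conj().T       # projection operator, Eq. (7.32)

grid = np.linspace(-1, 1, 2001)
eta = np.array([nr/np.real(v(u).conj() @ P_noise @ v(u)) for u in grid])   # Eq. (7.34), R = I
print(grid[np.argmax(eta)])                # sharp peak near 0.3
```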

7.6 Example comparison of spatial energy estimators

In Figure 7.1, an example of a comparison of pseudospectra is displayed. In this


example, the receive array consists of ten isotropic antenna elements with half-wavelength spacing. The signal-to-noise ratio is assumed to be 0 dB per
receive antenna. Three sources are located at sin(φ) = −0.55, −0.1, and 0.1. The
receive signal-plus-noise covariance matrix is estimated using 50 samples.
As can be observed in the figure, the isolated source sin(φ) = −0.55 can be
identified easily by using any of the statistics. The beamscan pseudospectrum
has the broadest peak, and the MuSiC approach has the narrowest peak. It is
tempting to think that the widths of the peaks are an indication of the angular
accuracy of the approach; however, this is misleading. For the two peaks that
are close together, sin(φ) = −0.1 and 0.1, the story is somewhat more compli-
cated. The minimum-variance distortionless response and the MuSiC approaches
attempt to isolate the two sources, while in the beamscan pseudospectrum the
two peaks might be confused with a single peak. The two peaks are most easily
identified when MuSiC is used.
In this example, it was assumed that the characteristics of the receive an-
tenna arrays were known perfectly. In practice, errors in antenna position or
other mismatches between the assumed array manifold and the real manifold

(denoted calibration errors) can be significant. Each pseudospectrum estimator


has different sensitivities to these errors. It is not uncommon for pseudospectrum
estimators to look promising theoretically, but perform badly in practice because
of calibration errors [356].

7.7 Local angle-estimation performance bounds

For a given array geometry and SNR of the signal of interest, there are two perfor-
mance metrics of interest, as seen in Figure 7.2. The first is the angle-estimation error bound, asymptotic in the number of samples, given by the Cramer–Rao formulation discussed in Section 3.8 and in References [77, 312, 172]. The Cramer–
Rao parameter performance bound is a local bound. It assumes that the probabil-
ity for an estimator to confuse the value of a parameter with a value far from the
actual value is zero. The second metric is the threshold point. This is the point at
which an estimator diverges dramatically from the asymptotic estimation bound.
Because of similarities in the array response (array manifold) at different phys-
ical angles, there is some probability of making large angle-estimation errors by
confusing an observed array response in noise with the wrong region of the array
manifold. These regions of potential confusion can be seen by those regions of
relative high sidelobes in the array response as a function of angle. The high side-
lobes are an indication that, while the angles are significantly different, the array
responses are similar. Consequently, when the angle is estimated in the presence
of significant noise, the array response associated with the erroneous angle at a
large sidelobe can sometimes be a closer match to the observed array response
than the array response associated with the correct angle. The threshold point
is not well defined. However, because the average estimation error typically diverges
quickly as a function of SNR, the exact definition is typically not a significant
concern. There are a variety of bounds that attempt to incorporate the nonlocal
effects, such as the Bhattacharyya, and the Bobrovsky–Zakai bounds. Many of
these bounds are special cases of the more general Weiss–Weinstein bound [342].

7.7.1 Cramer–Rao bound of angle estimation


By employing the reduced Fisher information matrix from Equation (3.204), a
relatively simple form for the angle-estimation performance bound can be found.
The Cramer–Rao bound is discussed in Section 3.8. In the discussion in this
chapter, it is assumed that while potentially spatially correlated, the noise is
drawn independently in each temporal sample.
To simplify the discussion, it is assumed that the source and linear array lie in a
plane. The angle φ is measured from the boresight (the direction perpendicular to
the axis along which the antenna array lies) and is associated with some direction
u = sin(φ). Here two possible models for the signal source are considered.


Figure 7.2 Notional performance of parameter estimation. The high SNR performance
is characterized by the Cramer–Rao bound. Below some threshold point, the
estimation diverges from the Cramer–Rao bound.

• The signaling sequence is known, s ∈ C1×n s , so that the signal is modeled by


the mean of the received signal-plus-noise vector.
• The signaling sequence is random and drawn from a random distribution, so
that the signal contribution is parameterized in its contribution to the received
spatial covariance matrix.

7.7.2 Cramer–Rao bound: signal in the mean


For the first case, a known signal is transmitted. An example might be for a
known training or pilot sequence. In this case, the probability distribution of the
observed signal is given by
    p(Z|s, u; R) = (1/(|R|^{ns} π^{nr ns})) e^{−tr{[Z − a v(u) s]† R−1 [Z − a v(u) s]}} ,   (7.35)

where a is the complex attenuation, and v(u) is the steering vector as a function
of direction parameter u = sin(φ) with φ indicating the angle from boresight. In
this case, the mean is given by
μ = av(u) s , (7.36)
where the reference sequence s is normalized so that ‖s‖² = ns. The covariance matrix R is given by the covariance matrix of the external interference plus noise. The Fisher information matrix for all ns samples is ‖s‖² = ns times the Fisher
information matrix for a single sample. From Section 3.8, in Equation (3.259),
the reduced Fisher information matrix is given by
    J^{(r)}_{u,u}({s}m) = 2 |a|² |{s}m|² Re{ẋ† P⊥_{x(u)} ẋ}

    J^{(r)}_{u,u} = Σ_m J^{(r)}_{u,u}({s}m)
                  = 2 |a|² ns Re{ẋ† P⊥_{x(u)} ẋ} ,   (7.37)

where the spatially whitened vector and its derivative are defined by

    x(u) = R−1/2 v(u)
    ẋ = R−1/2 ∂v(u)/∂u ,   (7.38)
and the projection operator (discussed in Section 2.3.5) for the subspace orthog-
onal to the column space spanned by the whitened array response x(u) is given
by
    P⊥_{x(u)} = I − x(u) [x†(u) x(u)]−1 x†(u) .   (7.39)
For the sake of discussion, it is assumed that there is no external interference
and the units of power are scaled so that
R = I. (7.40)
The reduced Fisher information simplifies to

    J^{(r)}_{u,u} = 2 ns |a|² Re{ (∂v†(u)/∂u) P⊥_{v(u)} (∂v(u)/∂u) } .   (7.41)
As discussed in Section 6.1, the components of the array response or steering
vector v(u) ∈ Cn r ×1 are given by
    {v(u)}m = e^{i k ym u} ,   (7.42)

under the assumption of the normalization ‖v(u)‖² = nr, where ym is the posi-
tion of the mth antenna along the linear array in units of wavelength and k is
the wavenumber or equivalently the magnitude of the wavevector.
The derivative with respect to the direction variable u is given by

    {∂v(u)/∂u}m = i k ym {v(u)}m .   (7.43)
The reduced Fisher information is then given by

    J^{(r)}_{u,u} = 2 ns |a|² Re{ (∂v†(u)/∂u) P⊥_{v(u)} (∂v(u)/∂u) }
                  = 2 ns |a|² Re{ k² Σ_m ym² − (∂v†(u)/∂u) v(u) v†(u) (∂v(u)/∂u) / nr }
                  = 2 ns |a|² k² { Σ_m ym² − (1/nr) |Σ_n yn|² } .   (7.44)

By setting the origin of the y-axis so that the average element position is zero, Σ_n yn = 0, the second term in the braces of Equation (7.44) goes to zero and the Fisher information is given by
    J^{(r)}_{u,u} = 2 ns |a|² k² Σ_m ym²
                  = 2 ns |a|² k² nr σy² ,   (7.45)

and the notation σy² = Σ_m ym²/nr indicates the mean-squared antenna position, under the assumption that the mean position is zero. Consequently, the reduced Fisher information is given by the direction term exclusively. The variance in the estimate of direction u is limited by

    ⟨|û − u|²⟩ ≥ 1 / (2 k² |a|² ns Σ_m ym²)
               = 1 / (2 k² |a|² ns nr σy²) ,   (7.46)

where û is the estimate of u = sin φ. It is interesting to note that the estimation variance bound decreases inversely with the mean-square antenna position σy² and inversely with the integrated SNR (given by |a|² ns, recalling that the noise has unit variance). Here the phrase integrated SNR indicates the ratio of the coherently integrated signal power, which grows as ns², over the incoherently integrated noise, which grows as ns.
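The bound of Equation (7.46) is straightforward to evaluate. The sketch below (array size, sample count, and SNR are illustrative assumptions) uses a half-wavelength-spaced linear array with positions in wavelength units, so that k = 2π.

```python
import numpy as np

# Cramer-Rao bound of Eq. (7.46) for u = sin(phi), known reference, white noise (R = I).
nr, ns = 8, 200
a2 = 1.0                                  # |a|^2: 0 dB SNR per antenna per sample
k = 2*np.pi                               # wavenumber with positions in wavelengths

y = 0.5*np.arange(nr)                     # half-wavelength-spaced element positions
y = y - y.mean()                          # choose the origin so the mean position is zero

var_bound = 1.0/(2*k**2*a2*ns*np.sum(y**2))   # Eq. (7.46)
print(np.sqrt(var_bound))                 # RMS direction-estimation bound
```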

7.7.3 Cramer–Rao bound: random signal


In this scenario, the signal is random. In particular, it is assumed that the signal is drawn from a zero-mean complex circular Gaussian distribution with variance P = |a|² per receive antenna per sample. By construction, the mean of the signal is zero, and thus its derivatives are zero. The nr × nr receive spatial covariance of this signal is given by

Q = R + P v(u) v† (u) , (7.47)

where the interference-plus-noise receive covariance matrix is given by R. For


discussion, it is assumed here that there is no interference and the units of power are defined so that R = I. In this section, it is assumed that the steering vectors v(u) are normalized so that ‖v(u)‖² = nr. For ns independent observations of this signal Z ∈ C^{nr×ns}, the probability density function is given by

    p(Z|P, u) = (1/(|Q|^{ns} π^{nr ns})) e^{−tr{Z† Q−1 Z}}
              = [p(z|P, u)]^{ns} ,   (7.48)

under the assumption of independent columns in Z, where the probability density


for a single column of Z (with a single column denoted z) is given by

1 † −1
p(z|P, u) = e−tr{z Q z} . (7.49)
|Q| π n r

Consequently, because the Fisher information matrix is a function of derivatives of logarithms of the probability density function, the Fisher information matrix for $\mathbf{Z}$ is $n_s$ times the Fisher information matrix for $\mathbf{z}$,

$$ \log p(\mathbf{Z}|P,u) = n_s \log p(\mathbf{z}|P,u) \quad\Rightarrow\quad \mathbf{J}(\mathbf{Z}) = n_s\, \mathbf{J}(\mathbf{z}) . \tag{7.50} $$

As defined in Equation (3.200), the mean portion of the signal implicitly incorporates the multiple samples; however, the covariance portion of the Fisher information matrix does not, and thus includes the coefficient $n_s$. Given some vector of parameters $\boldsymbol{\theta}$, the $\{m,n\}$th component of the Fisher information matrix is given by

$$ \{\mathbf{J}\}_{m,n} = n_s\, \mathrm{tr}\!\left\{ \mathbf{Q}^{-1}(\boldsymbol{\theta})\, \frac{\partial \mathbf{Q}(\boldsymbol{\theta})}{\partial \{\boldsymbol{\theta}\}_m}\, \mathbf{Q}^{-1}(\boldsymbol{\theta})\, \frac{\partial \mathbf{Q}(\boldsymbol{\theta})}{\partial \{\boldsymbol{\theta}\}_n} \right\} . \tag{7.51} $$

The reduced Fisher information for the real direction $u$ and power $P$ parameters is given by

$$ J^{(r)}_{u,u} = J_{u,u} - J_{u,P}\, J_{P,P}^{-1}\, J_{P,u} . \tag{7.52} $$

The derivative of the receive spatial covariance matrix is given by

$$ \frac{\partial}{\partial u} \mathbf{Q}(u) = \frac{\partial}{\partial u}\left[ \mathbf{I} + P\, \mathbf{v}(u)\, \mathbf{v}^\dagger(u) \right] = P \left[ \dot{\mathbf{v}}(u)\, \mathbf{v}^\dagger(u) + \mathbf{v}(u)\, \dot{\mathbf{v}}^\dagger(u) \right] , \tag{7.53} $$

where the notation $\dot{\mathbf{v}}(u)$ indicates the derivative of the steering vector with respect to the direction parameter,

$$ \dot{\mathbf{v}}(u) = \frac{\partial}{\partial u}\, \mathbf{v}(u) . \tag{7.54} $$

By using Equation (2.114), the inverse of the rank-1-plus-identity receive spatial covariance is given by

$$ \mathbf{Q}^{-1} = \mathbf{I} - \frac{P\, \mathbf{v}(u)\, \mathbf{v}^\dagger(u)}{1 + n_r P} . \tag{7.55} $$

For notational convenience, the explicit functional dependence on the direction parameter $u$ is dropped for the remainder of this discussion. The
direction component of the Fisher information matrix is given by

$$ \begin{aligned} J_{u,u} &= n_s\, \mathrm{tr}\!\left\{ \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) P \left( \dot{\mathbf{v}} \mathbf{v}^\dagger + \mathbf{v} \dot{\mathbf{v}}^\dagger \right) \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) P \left( \dot{\mathbf{v}} \mathbf{v}^\dagger + \mathbf{v} \dot{\mathbf{v}}^\dagger \right) \right\} \\ &= n_s P^2 \left[ \mathbf{v}^\dagger \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) \left( \dot{\mathbf{v}} \mathbf{v}^\dagger + \mathbf{v} \dot{\mathbf{v}}^\dagger \right) \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) \dot{\mathbf{v}} + \mathrm{c.c.} \right] \\ &= 2 n_s P^2\, \Re\!\left\{ \left[ \mathbf{v}^\dagger \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) \dot{\mathbf{v}} \right]^2 + \mathbf{v}^\dagger \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) \mathbf{v} \;\; \dot{\mathbf{v}}^\dagger \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) \dot{\mathbf{v}} \right\} \\ &= 2 n_s P^2\, \Re\!\left\{ \frac{ \left( \mathbf{v}^\dagger \dot{\mathbf{v}} \right)^2 }{ \left( 1 + n_r P \right)^2 } + \left( n_r - \frac{n_r^2 P}{1+n_r P} \right) \dot{\mathbf{v}}^\dagger \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) \dot{\mathbf{v}} \right\} , \end{aligned} \tag{7.56} $$

where the notation c.c. indicates the complex conjugate of the previous term. The $m$th element of the array response or steering vector associated with the element at position $y_m$ along the antenna array is given by

$$ \{\mathbf{v}\}_m = e^{i k y_m u} . \tag{7.57} $$

The derivative of the steering vector with respect to the direction parameter is given by

$$ \{\dot{\mathbf{v}}\}_m = \frac{\partial}{\partial u} \{\mathbf{v}\}_m = i k\, y_m\, e^{i k y_m u} , \qquad \mathbf{v}^\dagger \dot{\mathbf{v}} = \sum_m e^{-i k y_m u}\, i k\, y_m\, e^{i k y_m u} = i k \sum_m y_m . \tag{7.58} $$

Similar to the development in the previous section, to simplify the evaluation, the origin of the axis can be chosen so that the average position of the antennas is zero, $\sum_m y_m / n_r = 0$. The inner product between the steering vector and the derivative of the steering vector is then zero,

$$ \sum_m y_m = 0 \;\rightarrow\; \mathbf{v}^\dagger \dot{\mathbf{v}} = 0 . \tag{7.59} $$

The inner product between the derivatives of the steering vector is given by

$$ \dot{\mathbf{v}}^\dagger \dot{\mathbf{v}} = k^2 \sum_m e^{-i k y_m u}\, y_m^2\, e^{i k y_m u} = k^2\, n_r\, \sigma_y^2 . \tag{7.60} $$

By using these results, the component of the Fisher information matrix associated with the direction parameter $u$ is given by

$$ J_{u,u} = 2\, n_s\, P^2 \left( \frac{ n_r + n_r^2 P - n_r^2 P }{ 1 + n_r P } \right) \dot{\mathbf{v}}^\dagger \dot{\mathbf{v}} = 2\, n_s\, P^2 \left( \frac{ n_r }{ 1 + n_r P } \right) k^2\, n_r\, \sigma_y^2 . \tag{7.61} $$

The component of the Fisher information matrix associated with the received signal power is given by

$$ J_{P,P} = n_s\, \mathrm{tr}\!\left\{ \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) \mathbf{v} \mathbf{v}^\dagger \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) \mathbf{v} \mathbf{v}^\dagger \right\} = n_s \left[ \mathbf{v}^\dagger \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) \mathbf{v} \right]^2 = n_s \left( \frac{ n_r }{ 1 + n_r P } \right)^2 . \tag{7.62} $$
The cross-parameter component of the Fisher information matrix is given by

$$ J_{u,P} = n_s\, \mathrm{tr}\!\left\{ \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) P \left( \dot{\mathbf{v}} \mathbf{v}^\dagger + \mathbf{v} \dot{\mathbf{v}}^\dagger \right) \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) \mathbf{v} \mathbf{v}^\dagger \right\} = n_s\, P\; \mathbf{v}^\dagger \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) \left( \dot{\mathbf{v}} \mathbf{v}^\dagger + \mathbf{v} \dot{\mathbf{v}}^\dagger \right) \left( \mathbf{I} - \frac{P\, \mathbf{v} \mathbf{v}^\dagger}{1+n_r P} \right) \mathbf{v} = 0 . \tag{7.63} $$

Because the cross-parameter term is zero by Equation (7.59) (and the power term is nonzero), the reduced Fisher information in Equation (7.41) is the same as the Fisher information without the nuisance parameter, $J^{(r)}_{u,u} = J_{u,u}$.

The variance of the unbiased estimate $\hat{u}$ of $u$ cannot be better than

$$ \sigma_u^2 \ge J_{u,u}^{-1} = \frac{ 1 + n_r P }{ 2\, n_r^2\, n_s\, P^2\, k^2\, \sigma_y^2 } . \tag{7.64} $$

As the SNR $P$ becomes large, the variance bound converges from above to that of the deterministic signal in Equation (7.46).
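The convergence claim can be checked numerically; the parameter values below are arbitrary. The ratio of the two bounds is $(1 + n_r P)/(n_r P)$, which always exceeds 1 and approaches 1 as $P$ grows.

```python
import numpy as np

def var_bound_random(P, n_s, n_r, k, sigma_y2):
    """Eq. (7.64): bound for the random (Gaussian) signal."""
    return (1 + n_r * P) / (2 * n_r**2 * n_s * P**2 * k**2 * sigma_y2)

def var_bound_known(P, n_s, n_r, k, sigma_y2):
    """Eq. (7.46) with sum_m y_m^2 = n_r sigma_y^2 and a^2 = P."""
    return 1.0 / (2 * k**2 * P * n_s * n_r * sigma_y2)

n_s, n_r, k, sigma_y2 = 100, 8, 2 * np.pi, 1.3125
ratios = [var_bound_random(P, n_s, n_r, k, sigma_y2) /
          var_bound_known(P, n_s, n_r, k, sigma_y2) for P in (0.1, 1.0, 1e4)]
```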

7.8 Threshold estimation

The threshold point occurs at the SNR at which the probability of confusing the mainlobe with a sidelobe starts contributing significantly to the angle-estimation error. This notion is not a precise definition; depending upon the details, various systems may have varying sensitivities to the probability of confusion. A variety of techniques are available to extend parameter-estimation bounds to include nonlocal effects. One example is the Weiss–Weinstein bound [342]. Here an approximation is considered: by using the method of intervals [263], nonlocal contributions to the variance are introduced approximately. The total parameter-estimation variance is approximated by combining the local contributions associated with the mainlobe, which are characterized by the Cramer–Rao bound, and the nonlocal contributions, which are approximated by adding the variance contributed by a small number of large sidelobes. These sidelobes correspond to array responses that are similar to that of the mainlobe. This estimate takes the total variance to be the Cramer–Rao bound times the probability that there is no sidelobe confusion, plus the squared error incurred by placing the estimate near some sidelobe peak. For some parameter $\phi$, its estimate $\hat{\phi}$ is given by maximizing some test statistic $t(\phi)$ (or equivalently some spatial spectral estimator),

$$ \hat{\phi} = \operatorname*{argmax}_{\phi}\, \{ t(\phi) \} . \tag{7.65} $$

The test statistic is the one that maximizes the likelihood, given a model for the signal. As an example, consider the single Gaussian signal model in the absence of interference. In this case, finding the peak of the beamscan test statistic is the maximum-likelihood solution. Consequently, $t(\phi)$ is given by

$$ t(\phi) = \frac{1}{n_s}\, \mathbf{v}^\dagger(\phi)\, \mathbf{Z}\, \mathbf{Z}^\dagger\, \mathbf{v}(\phi) . \tag{7.66} $$
The method-of-intervals parameter-estimation variance estimate is given by

$$ \sigma_\phi^2 \approx P_{\mathrm{m.l.}}(\mathrm{SNR})\, \sigma^2_{\mathrm{CR},\phi}(\mathrm{SNR}) + \sum_m P_{\mathrm{s.l.}(m)}(\mathrm{SNR})\, \phi^2_{\mathrm{s.l.},m} , \tag{7.67} $$

where $\sigma^2_{\mathrm{CR},\phi}(\mathrm{SNR})$ is the variance bound provided by the Cramer–Rao bound at some SNR, $P_{\mathrm{m.l.}}(\mathrm{SNR})$ is the probability of not being confused by a sidelobe at some SNR, $P_{\mathrm{s.l.}(m)}(\mathrm{SNR})$ is the probability of being confused by the $m$th sidelobe at some SNR, and $\phi_{\mathrm{s.l.},m}$ is the location of the peak of the $m$th sidelobe. This form can be simplified further by noting that the nonlocal contributions to the error are typically dominated by the largest sidelobe. The probability of confusing the observed array response with the largest sidelobe is denoted $P_{\mathrm{s.l.}}(\mathrm{SNR})$. Consequently, for mainlobe direction $\phi_0$, the variance is approximated by

$$ \sigma_\phi^2 \approx \left[ 1 - P_{\mathrm{s.l.}}(\mathrm{SNR}) \right] \sigma^2_{\mathrm{CR},\phi}(\mathrm{SNR}) + P_{\mathrm{s.l.}}(\mathrm{SNR})\, \left( \phi_{\mathrm{s.l.}} - \phi_0 \right)^2 . \tag{7.68} $$

The probability of confusing a sidelobe $\phi_{\mathrm{s.l.}}$ for the mainlobe $\phi_0$ is given by

$$ P_{\mathrm{s.l.}}(\mathrm{SNR}) = \Pr\{ t(\phi_{\mathrm{s.l.}}) > t(\phi_0) \} = \Pr\!\left\{ \left\| \mathbf{v}^\dagger(\phi_{\mathrm{s.l.}})\, \mathbf{Z} \right\|_F^2 > \left\| \mathbf{v}^\dagger(\phi_{\mathrm{m.l.}})\, \mathbf{Z} \right\|_F^2 \right\} . \tag{7.69} $$

Throughout this section, we will not attempt to be precise about Pr{t(φs.l. ) >
t(φ0 )} versus Pr{t(φs.l. ) ≥ t(φ0 )} because it will not introduce a meaningful
difference.
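A minimal simulation of the estimator in Equations (7.65) and (7.66) is sketched below; the array geometry, source angle, and SNR are assumed values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_s, snr = 8, 200, 1.0           # per-antenna, per-sample SNR (assumed)
k = 2.0 * np.pi
y = 0.5 * np.arange(n_r)
y = y - y.mean()
phi0 = 0.35                           # true source angle in radians

def v(phi):                           # steering vector with ||v||^2 = n_r
    return np.exp(1j * k * y * np.sin(phi))

# One Gaussian source plus unit-variance complex noise
s = np.sqrt(snr / 2) * (rng.standard_normal(n_s) + 1j * rng.standard_normal(n_s))
N = (rng.standard_normal((n_r, n_s)) + 1j * rng.standard_normal((n_r, n_s))) / np.sqrt(2)
Z = np.outer(v(phi0), s) + N

def t(phi):                           # beamscan statistic, Eq. (7.66)
    w = v(phi)
    return np.real(w.conj() @ Z @ Z.conj().T @ w) / n_s

grid = np.linspace(-np.pi / 2, np.pi / 2, 2001)
phi_hat = grid[np.argmax([t(p) for p in grid])]   # Eq. (7.65)
```

At this integrated SNR the estimate lands well inside the mainlobe; lowering the SNR toward the threshold region makes sidelobe outliers appear, which is exactly the effect the method of intervals approximates.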

7.8.1 Types of transmitted signals


Similar to the variety of assumptions about the transmitted signal discussed in
Section 7.7, a number of assumptions can be made about the transmitted signal
when considering the probability of being confused by a sidelobe. Some possible
assumptions are

(1) known (deterministic) sequence,


(2) single observation with deterministic amplitude,
(3) unknown sequence of random complex Gaussian signals,
(4) unknown sequence of deterministic amplitude.

Items (1) and (2) have the same test statistic up to a simple scaling; the probability of confusion for these types of signals is considered in Sections 7.8.3 and 7.8.4. The third type of signal, with a sequence of random complex Gaussian samples, is considered in Section 7.8.5. If the length of the sequence is long, then an unknown sequence of deterministic amplitude can be approximated by the Gaussian signal assumption; however, we will not explicitly evaluate the probability of confusion for that deterministic-sequence case here.

7.8.2 Known reference signal test statistic


For a known reference s, normalized such that s 2 = ns , the variable z =

Z s† / ns , is equivalent to a single observation with a deterministic amplitude.

The ns normalization is to keep the Gaussian noise from growing as the num-
ber of samples ns increases, by employing the result that the inner product of
220 Angle-of-arrival estimation

the vector of Gaussian variables and the reference is a Gaussian variable. The
beamscan angle-of-arrival test statistic for a single observation simplifies to

t(φ) = v† (φ) Q̂ v(φ)


= v† (φ) z 2
, (7.70)

where the observed response is given by z ∈ Cn r ×1 ,

Z s†
z= √
ns
(ã v(φm .l. ) s + N) s†
= √
ns

= ã v(φm .l. ) ns + n
= a v(φm .l. ) + n . (7.71)

Here $\tilde{a}$ indicates the received signal amplitude per receive antenna (implying the steering vector normalization $\|\mathbf{v}(\phi)\|^2 = n_r$), and $\mathbf{N} \in \mathbb{C}^{n_r \times n_s}$ is the additive noise. When the multiple observations under the known-reference assumption collapse to a single observation, the amplitudes for the two cases are related by $a = \sqrt{n_s}\, \tilde{a}$; the single-observation amplitude $a$ again indicates the received signal amplitude per receive antenna, and $\mathbf{n} \in \mathbb{C}^{n_r \times 1}$ is the resulting additive noise. The probability of selecting a sidelobe over the mainlobe is given by the probability that the magnitude of the inner product between the theoretical array response and the observed response is larger for the sidelobe than for the mainlobe,

$$ \Pr\{ t(\phi_{\mathrm{s.l.}}) > t(\phi_0) \} = \Pr\!\left\{ \left\| \mathbf{v}^\dagger(\phi_{\mathrm{s.l.}})\, \mathbf{z} \right\|^2 > \left\| \mathbf{v}^\dagger(\phi_{\mathrm{m.l.}})\, \mathbf{z} \right\|^2 \right\} = \Pr\!\left\{ \left\| \mathbf{v}^\dagger(\phi_{\mathrm{s.l.}})\, \mathbf{z} \right\| > \left\| \mathbf{v}^\dagger(\phi_{\mathrm{m.l.}})\, \mathbf{z} \right\| \right\} . \tag{7.72} $$

Define the normalized inner product between the array responses, $\rho$,

$$ \rho = \frac{ \mathbf{v}^\dagger(\phi_{\mathrm{s.l.}})\, \mathbf{v}(\phi_{\mathrm{m.l.}}) }{ n_r } . \tag{7.73} $$

The probability of selecting the wrong lobe is developed in Sections 7.8.3 and 7.8.4 and is given by

$$ P_{\mathrm{s.l.}} = \frac{1}{2} \left[ 1 - Q_M\!\left( \sqrt{ \frac{a^2 n_r}{2} \left( 1 + \sqrt{1 - |\rho|^2} \right) },\; \sqrt{ \frac{a^2 n_r}{2} \left( 1 - \sqrt{1 - |\rho|^2} \right) } \right) + Q_M\!\left( \sqrt{ \frac{a^2 n_r}{2} \left( 1 - \sqrt{1 - |\rho|^2} \right) },\; \sqrt{ \frac{a^2 n_r}{2} \left( 1 + \sqrt{1 - |\rho|^2} \right) } \right) \right] , \tag{7.74} $$

where the Marcum Q-function $Q_M(\cdot,\cdot)$ is discussed in Section 2.14.8.
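The expression in Equation (7.74) is easy to evaluate with a series implementation of the Marcum Q-function; the Poisson-mixture series below is a standard construction, not taken from the text. Two limiting cases provide sanity checks: for $|\rho| \rightarrow 1$ or $a \rightarrow 0$ the two lobes are statistically indistinguishable and $P_{\mathrm{s.l.}} \rightarrow 1/2$.

```python
import math

def marcum_q1(a, b, terms=300):
    """First-order Marcum Q-function Q_M(a, b), via the series
    Q_M(a, b) = sum_k Pois(k; a^2/2) * Pr{chi^2_(2+2k) > b^2}."""
    total = 0.0
    pois = math.exp(-a * a / 2)          # Poisson weight at k = 0
    chi_term = math.exp(-b * b / 2)      # chi-square tail increment
    chi_sum = chi_term                   # Pr{chi^2_2 > b^2}
    for k in range(terms):
        total += pois * chi_sum
        pois *= (a * a / 2) / (k + 1)
        chi_term *= (b * b / 2) / (k + 1)
        chi_sum += chi_term
    return total

def p_sidelobe(a2_nr, rho_mag):
    """Eq. (7.74): probability of choosing the sidelobe over the mainlobe."""
    s = math.sqrt(1.0 - rho_mag**2)
    hi = math.sqrt(a2_nr / 2 * (1 + s))
    lo = math.sqrt(a2_nr / 2 * (1 - s))
    return 0.5 * (1 - marcum_q1(hi, lo) + marcum_q1(lo, hi))
```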



7.8.3 Independent Rician random variables


To find the probability of confusing a sidelobe with the mainlobe presented in Equation (7.74), it is noted that this is the probability that one Rician random variable fluctuates above another Rician random variable. This is equivalent to asking whether one noncentral $\chi^2$ variable fluctuates above another noncentral $\chi^2$ random variable. To complicate the issue, these two Rician variables are correlated (they share the same signal).

To begin, the probability that one Rician fluctuates above another when the two Rician distributions are independent is discussed in References [280, 293]. The $m$th Rician (specifying 1 or 2) is given by

$$ r_m = \left| a_m + z_m \right| , \tag{7.75} $$

where the random central complex Gaussian variable $z_m$ has variance $\sigma_m^2$. Without loss of generality, the mean parameter $a_m$ is assumed to be real. The probability density for $r_m$ is given by the Rician distribution,

$$ f_m(r_m)\, dr_m = \frac{2 r_m}{\sigma_m^2}\, I_0\!\left( \frac{2 a_m r_m}{\sigma_m^2} \right) e^{-(r_m^2 + a_m^2)/\sigma_m^2}\, dr_m , \tag{7.76} $$
where $I_0(\cdot)$ indicates the modified Bessel function of the first kind, discussed in Section 2.14.5. The probability of the wrong Rician fluctuating to a level higher than the other is given by Reference [293]. The probability of Rician $r_2$ exceeding $r_1$ is given by

$$ \Pr\{ r_2 > r_1 \} = Q_M\!\left( \sqrt{ \frac{2 a_2^2}{\sigma_1^2 + \sigma_2^2} },\; \sqrt{ \frac{2 a_1^2}{\sigma_1^2 + \sigma_2^2} } \right) - \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\, e^{ - \frac{a_1^2 + a_2^2}{\sigma_1^2 + \sigma_2^2} }\, I_0\!\left( \frac{2 a_1 a_2}{\sigma_1^2 + \sigma_2^2} \right) , \tag{7.77} $$

which follows from performing the integral over the second Rician variable $r_2$. In the following discussion, we evaluate this probability by noting that the complementary CDF of the noncentral $\chi^2$ distribution is given by the Marcum Q-function that is discussed in Section 2.14.8. By using the relationship developed in Problem 7.6,

$$ Q_M\!\left( \sqrt{2a}, \sqrt{2b} \right) + Q_M\!\left( \sqrt{2b}, \sqrt{2a} \right) = 1 + e^{-(a+b)}\, I_0\!\left( 2 \sqrt{a b} \right) , \tag{7.78} $$

the probability can also be written as

$$ \Pr\{ r_2 > r_1 \} = \frac{1}{\sigma_1^2 + \sigma_2^2} \left[ \sigma_1^2 + \sigma_2^2\, Q_M\!\left( \sqrt{ \frac{2 a_2^2}{\sigma_1^2 + \sigma_2^2} },\; \sqrt{ \frac{2 a_1^2}{\sigma_1^2 + \sigma_2^2} } \right) - \sigma_1^2\, Q_M\!\left( \sqrt{ \frac{2 a_1^2}{\sigma_1^2 + \sigma_2^2} },\; \sqrt{ \frac{2 a_2^2}{\sigma_1^2 + \sigma_2^2} } \right) \right] . \tag{7.79} $$

To evaluate the probability that one Rician variable fluctuates higher than another Rician variable under the assumption of independence that is given in Equation (7.77), the probability can be written formally as

$$ \Pr\{ r_2 > r_1 \} = \int_0^\infty dr_1 \int_{r_1}^\infty dr_2\, f_1(r_1)\, f_2(r_2) = \int_0^\infty dr_1\, \frac{2 r_1}{\sigma_1^2}\, Q_M\!\left( \frac{\sqrt{2}\, a_2}{\sigma_2},\; \frac{\sqrt{2}\, r_1}{\sigma_2} \right) e^{ - \frac{a_1^2 + r_1^2}{\sigma_1^2} }\, I_0\!\left( \frac{2 a_1 r_1}{\sigma_1^2} \right) , \tag{7.80} $$

where the probability density is defined in Equation (7.76), and the discussion in Section 3.1.14 provides the form for the definite integral. As discussed in Section 2.14.5, the modified Bessel function of the first kind can be expressed as a contour integral [343, 53]. This integral is given by

$$ I_m(z) = \frac{1}{2\pi i} \oint_C dx\, x^{-m-1}\, e^{ \frac{z}{2} \left( x + 1/x \right) } , \tag{7.81} $$

where $C$ is a contour that encircles the origin. The zeroth-order modified Bessel function is given by

$$ I_0(a x) = \frac{1}{2\pi i} \oint_C dp\, \frac{ e^{ a x ( p + 1/p ) / 2 } }{ p } = \frac{1}{2\pi i} \oint_C dp\, \frac{ e^{ ( a^2 p + x^2 / p ) / 2 } }{ p } , \tag{7.82} $$

where the substitution $p \rightarrow p\, a / x$ is used.

By substituting this integral form for the Bessel function into the integral form for the Marcum Q-function, the Marcum Q-function can be expressed in a form with a single real integral,

$$ Q_M(a,b) = \int_b^\infty dx\, x\, e^{ - \frac{a^2 + x^2}{2} }\, I_0(a x) = \int_b^\infty dx\, x\, e^{ - \frac{a^2 + x^2}{2} }\, \frac{1}{2\pi i} \oint_C dp\, \frac{ e^{ ( a^2 p + x^2 / p ) / 2 } }{ p } . \tag{7.83} $$

Because the path of a contour integral of a holomorphic function can be deformed without changing its value as long as no poles are crossed, and because the contribution of the integrand vanishes in the left-half plane and along the connecting arcs at infinite radius, the contour integral can be expressed as a line integral at some finite constant positive offset $\gamma$ from the imaginary axis. The Marcum Q-function is then expressed as

$$ \begin{aligned} Q_M(a,b) &= \int_b^\infty dx\, x\, e^{ - \frac{a^2+x^2}{2} }\, \frac{1}{2\pi i} \oint dp\, \frac{ e^{ (a^2 p + x^2/p)/2 } }{ p } \\ &= \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} dp \int_b^\infty dx\, x\, \frac{ e^{ - \frac{a^2+x^2}{2} }\, e^{ (a^2 p + x^2/p)/2 } }{ p } ; \qquad \gamma > 0 \\ &= \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} dp \int_b^\infty dx\, x\, \frac{ e^{ \left( a^2 (p-1) + x^2 (1/p - 1) \right) / 2 } }{ p } ; \qquad \gamma > 0 \\ &= \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} dp \int_{b^2}^\infty \frac{du}{2}\, \frac{ e^{ \left( a^2 (p-1) + u (1/p - 1) \right) / 2 } }{ p } ; \qquad \gamma > 0 \\ &= - \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} dp\, \frac{ e^{ \left( a^2 (p-1) + b^2 (1/p - 1) \right) / 2 } }{ (1/p - 1)\, p } ; \qquad \Re\{ 1/p - 1 \} < 0 \\ &= \frac{ e^{ -(a^2 + b^2)/2 } }{2\pi i} \oint dp\, \frac{ e^{ \left( a^2 p + b^2/p \right) / 2 } }{ p - 1 } , \end{aligned} \tag{7.84} $$

where the substitution $u = x^2$ is employed, and the final contour encloses the pole at $p = 1$.
For the sake of notational expedience, the following normalized variables are defined:

$$ r = \frac{ r_1 \sqrt{2} }{ \sigma_1 } , \qquad \alpha_1 = \frac{ a_1 \sqrt{2} }{ \sigma_1 } , \qquad \alpha_2 = \frac{ a_2 \sqrt{2} }{ \sigma_2 } , \qquad \nu = \frac{ \sigma_2 }{ \sigma_1 } . \tag{7.85} $$
With these definitions, the probability of one Rician variable fluctuating above the other is given by

$$ \begin{aligned} \Pr\{ r_2 > r_1 \} &= \int_0^\infty dr_1\, \frac{2 r_1}{\sigma_1^2}\, Q_M\!\left( \frac{\sqrt{2}\, a_2}{\sigma_2},\; \frac{\sqrt{2}\, r_1}{\sigma_2} \right) e^{ - \frac{a_1^2 + r_1^2}{\sigma_1^2} }\, I_0\!\left( \frac{2 a_1 r_1}{\sigma_1^2} \right) \\ &= \int_0^\infty dr\, Q_M\!\left( \alpha_2,\; \frac{r}{\nu} \right) r\, e^{ - \frac{\alpha_1^2 + r^2}{2} }\, I_0( \alpha_1 r ) \\ &= \int_0^\infty dr\, \frac{ e^{ - ( \alpha_2^2 + r^2/\nu^2 ) / 2 } }{2\pi i} \oint dp\, \frac{ e^{ \left( \alpha_2^2 p + ( r^2/\nu^2 ) / p \right) / 2 } }{ p - 1 }\; r\, e^{ - \frac{\alpha_1^2 + r^2}{2} }\, I_0( \alpha_1 r ) \\ &= \int_0^\infty dr\, \frac{ e^{ - ( \alpha_2^2 + r^2/\nu^2 ) / 2 } }{2\pi i} \oint dp\, \frac{ e^{ \left( \alpha_2^2 p + ( r^2/\nu^2 ) / p \right) / 2 } }{ p - 1 }\; r\, e^{ - \frac{\alpha_1^2 + r^2}{2} }\, \frac{1}{2\pi i} \oint dq\, \frac{ e^{ \left( \alpha_1^2 q + r^2 / q \right) / 2 } }{ q } , \end{aligned} \tag{7.86} $$
where the contour integrals enclose the poles at $p = 1$ and $q = 0$. Collecting the arguments of the exponentials in Equation (7.86), we find the sum of the arguments,

$$ \frac{ \alpha_2^2 p + \alpha_1^2 q - \alpha_1^2 - \alpha_2^2 + r^2 \left( \frac{1}{\nu^2 p} + \frac{1}{q} - 1 - \frac{1}{\nu^2} \right) }{2} . \tag{7.87} $$

By using the substitution $u = r^2$, the probability of one Rician variable fluctuating higher than another Rician variable becomes

$$ \begin{aligned} \Pr\{ r_2 > r_1 \} &= e^{ - ( \alpha_1^2 + \alpha_2^2 ) / 2 }\, \frac{1}{2\pi i} \oint dp\, \frac{1}{2\pi i} \oint dq \int_0^\infty \frac{du}{2}\; \frac{ e^{ \left( \alpha_2^2 p + \alpha_1^2 q + u \left( \frac{1}{\nu^2 p} + \frac{1}{q} - 1 - \frac{1}{\nu^2} \right) \right) / 2 } }{ q\, (p - 1) } \\ &= e^{ - ( \alpha_1^2 + \alpha_2^2 ) / 2 }\, \frac{1}{2\pi i} \oint dp\, \frac{1}{2\pi i} \oint dq\; \frac{ e^{ \left( \alpha_2^2 p + \alpha_1^2 q \right) / 2 } }{ \left( 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} - \frac{1}{q} \right) q\, (p - 1) } \\ &= e^{ - ( \alpha_1^2 + \alpha_2^2 ) / 2 }\, \frac{1}{2\pi i} \oint dp\, \frac{ e^{ \alpha_2^2 p / 2 } }{ \left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right] (p - 1) }\; \frac{1}{2\pi i} \oint dq\, \frac{ e^{ \alpha_1^2 q / 2 } }{ q - \left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right]^{-1} } . \end{aligned} \tag{7.88} $$

The contour integral over $q$ can be evaluated directly by using residues, as discussed in Section 2.9.1 and in Reference [53], and is given by

$$ \frac{1}{2\pi i} \oint dq\, \frac{ e^{ \alpha_1^2 q / 2 } }{ q - \left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right]^{-1} } = e^{ \frac{\alpha_1^2}{2} \left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right]^{-1} } . \tag{7.89} $$

By incorporating this evaluation, the form for the probability is given by

$$ \Pr\{ r_2 > r_1 \} = \frac{ e^{ - ( \alpha_1^2 + \alpha_2^2 ) / 2 } }{2\pi i} \int_{\gamma_1 - i\infty}^{\gamma_1 + i\infty} dp\, \frac{ e^{ \alpha_2^2 p / 2 }\; e^{ \frac{\alpha_1^2}{2} \left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right]^{-1} } }{ \left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right] (p - 1) } = \frac{ e^{ - ( \alpha_1^2 + \alpha_2^2 ) / 2 } }{2\pi i} \int_{\gamma_1 - i\infty}^{\gamma_1 + i\infty} dp\, \frac{ e^{ \alpha_2^2 p / 2 }\; e^{ \frac{ \alpha_1^2 p }{ 2 \left( p + p/\nu^2 - 1/\nu^2 \right) } } }{ \left[ 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right] (p - 1) } . \tag{7.90} $$

The resulting argument of the product of exponentials within the integral is given by

$$ \frac{1}{2} \left( \alpha_2^2 p + \frac{ \alpha_1^2 p }{ p + p/\nu^2 - 1/\nu^2 } \right) = \frac{ \alpha_2^2 p^2 \left( 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right) + \alpha_1^2 p }{ 2 p \left( 1 + \frac{1}{\nu^2} - \frac{1}{\nu^2 p} \right) } . \tag{7.91} $$

The form of the exponential can be simplified by using the substitution

$$ p \left( 1 + \frac{1}{\nu^2} \right) - \frac{1}{\nu^2} \;\rightarrow\; p , \qquad \text{that is,} \qquad p \;\rightarrow\; \frac{ p + 1/\nu^2 }{ 1 + 1/\nu^2 } . \tag{7.92} $$

This substitution does not affect the contour integral as long as the contour encloses the poles. The probability of one Rician variable fluctuating above another then becomes

$$ \begin{aligned} \Pr\{ r_2 > r_1 \} &= \frac{ e^{ - ( \alpha_1^2 + \alpha_2^2 ) / 2 } }{2\pi i} \oint dp\, \frac{ e^{ \frac{1}{2} \frac{ p + 1/\nu^2 }{ 1 + 1/\nu^2 } \left( \alpha_2^2 + \frac{ \alpha_1^2 }{ p } \right) } \left( p + 1/\nu^2 \right) }{ \left( 1 + 1/\nu^2 \right) p\, (p - 1) } \\ &= \frac{ e^{ - ( \alpha_1^2 + \alpha_2^2 ) / 2 } }{2\pi i} \oint dp\, \frac{ e^{ \frac{ \alpha_2^2 + \alpha_1^2 \nu^2 + \alpha_2^2 \nu^2 p + \alpha_1^2 / p }{ 2 ( 1 + \nu^2 ) } } \left( p + 1/\nu^2 \right) }{ \left( 1 + 1/\nu^2 \right) p\, (p - 1) } \\ &= \frac{ \kappa }{2\pi i} \oint dp\, \frac{ e^{ \frac{ \alpha_2^2 \nu^2 p + \alpha_1^2 / p }{ 2 ( 1 + \nu^2 ) } } \left( p + 1/\nu^2 \right) }{ \left( 1 + 1/\nu^2 \right) p\, (p - 1) } , \end{aligned} \tag{7.93} $$

where the final contour integral encloses poles at $p = 0$ and $p = 1$, and the constant $\kappa$ is given by

$$ \kappa = e^{ - ( \alpha_1^2 + \alpha_2^2 ) / 2 }\, e^{ \frac{ \alpha_2^2 + \alpha_1^2 \nu^2 }{ 2 ( 1 + \nu^2 ) } } = e^{ - \frac{ \alpha_1^2 + \nu^2 \alpha_2^2 }{ 2 ( 1 + \nu^2 ) } } . \tag{7.94} $$

By employing the partial fraction expansion for the denominator of the integrand,

$$ \frac{ p + \gamma }{ p\, (p - 1) } = \frac{ 1 + \gamma }{ p - 1 } - \frac{ \gamma }{ p } , \tag{7.95} $$

the probability becomes

$$ \Pr\{ r_2 > r_1 \} = \frac{ \kappa }{2\pi i} \oint dp\, \frac{ e^{ \frac{ \alpha_2^2 \nu^2 p + \alpha_1^2 / p }{ 2 ( 1 + \nu^2 ) } } \left( p + 1/\nu^2 \right) }{ \left( 1 + 1/\nu^2 \right) p\, (p - 1) } = \frac{ \kappa }{2\pi i} \oint dp\, \frac{ e^{ \frac{ \alpha_2^2 \nu^2 p + \alpha_1^2 / p }{ 2 ( 1 + \nu^2 ) } } }{ 1 + 1/\nu^2 } \left( \frac{ 1 + 1/\nu^2 }{ p - 1 } - \frac{ 1/\nu^2 }{ p } \right) = A_1 - A_2 , \tag{7.96} $$

where the integrals A1 and A2 are defined implicitly by expanding the parenthet-
ical term. By substituting the value of the parameter for κ in Equation (7.94),
the two integrals $A_1$ and $A_2$ are given by

$$ \begin{aligned} A_1 &= \frac{ \kappa }{2\pi i} \oint dp\, \frac{ e^{ \frac{ \alpha_2^2 \nu^2 p + \alpha_1^2 / p }{ 2 ( 1 + \nu^2 ) } } }{ p - 1 } = \frac{ e^{ - \frac{ \alpha_2^2 \nu^2 + \alpha_1^2 }{ 2 ( 1 + \nu^2 ) } } }{2\pi i} \oint dp\, \frac{ e^{ \frac{ \alpha_2^2 \nu^2 p + \alpha_1^2 / p }{ 2 ( 1 + \nu^2 ) } } }{ p - 1 } \\ &= Q_M\!\left( \sqrt{ \frac{ \alpha_2^2 \nu^2 }{ 1 + \nu^2 } },\; \sqrt{ \frac{ \alpha_1^2 }{ 1 + \nu^2 } } \right) = Q_M\!\left( \sqrt{ \frac{ 2 a_2^2 }{ \sigma_1^2 + \sigma_2^2 } },\; \sqrt{ \frac{ 2 a_1^2 }{ \sigma_1^2 + \sigma_2^2 } } \right) \end{aligned} \tag{7.97} $$

and

$$ \begin{aligned} A_2 &= \frac{ \kappa }{ 1 + \nu^2 }\, \frac{1}{2\pi i} \oint dp\, \frac{ e^{ \frac{ \alpha_2^2 \nu^2 p + \alpha_1^2 / p }{ 2 ( 1 + \nu^2 ) } } }{ p } = \frac{ \kappa }{ 1 + \nu^2 }\, I_0\!\left( \frac{ \alpha_1\, \alpha_2\, \nu }{ 1 + \nu^2 } \right) \\ &= \frac{ \sigma_1^2 }{ \sigma_1^2 + \sigma_2^2 }\, e^{ - \frac{ a_1^2 + a_2^2 }{ \sigma_1^2 + \sigma_2^2 } }\, I_0\!\left( \frac{ 2 a_1 a_2 }{ \sigma_1^2 + \sigma_2^2 } \right) . \end{aligned} \tag{7.98} $$

Consequently, the probability that one independent Rician variable fluctuates above another is given by

$$ \Pr\{ r_2 > r_1 \} = Q_M\!\left( \sqrt{ \frac{ 2 a_2^2 }{ \sigma_1^2 + \sigma_2^2 } },\; \sqrt{ \frac{ 2 a_1^2 }{ \sigma_1^2 + \sigma_2^2 } } \right) - \frac{ \sigma_1^2 }{ \sigma_1^2 + \sigma_2^2 }\, e^{ - \frac{ a_1^2 + a_2^2 }{ \sigma_1^2 + \sigma_2^2 } }\, I_0\!\left( \frac{ 2 a_1 a_2 }{ \sigma_1^2 + \sigma_2^2 } \right) . \tag{7.99} $$
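The closed form in Equation (7.99) can be checked by Monte Carlo simulation; the series implementation of the Marcum Q-function and the parameter values below are illustrative assumptions. By symmetry, the probability must be exactly 1/2 when the two Ricians are identically distributed.

```python
import math
import numpy as np

def marcum_q1(a, b, terms=300):
    """Series implementation of the first-order Marcum Q-function."""
    total, pois = 0.0, math.exp(-a * a / 2)
    chi_term = math.exp(-b * b / 2)
    chi_sum = chi_term
    for k in range(terms):
        total += pois * chi_sum
        pois *= (a * a / 2) / (k + 1)
        chi_term *= (b * b / 2) / (k + 1)
        chi_sum += chi_term
    return total

def pr_r2_gt_r1(a1, a2, s1, s2):
    """Eq. (7.99): Pr{r2 > r1} for independent Ricians r_m = |a_m + z_m|,
    where z_m is zero-mean complex Gaussian with variance s_m^2."""
    S = s1**2 + s2**2
    return (marcum_q1(math.sqrt(2 * a2**2 / S), math.sqrt(2 * a1**2 / S))
            - (s1**2 / S) * math.exp(-(a1**2 + a2**2) / S)
            * float(np.i0(2 * a1 * a2 / S)))

# Monte Carlo check with arbitrary parameters
rng = np.random.default_rng(1)
a1, a2, s1, s2, n = 2.0, 1.0, 1.0, 1.5, 400_000
z1 = s1 / np.sqrt(2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
z2 = s2 / np.sqrt(2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
mc = np.mean(np.abs(a2 + z2) > np.abs(a1 + z1))
```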

7.8.4 Correlated Rician random variables


The discussion in the previous section described the evaluation of the probability
of one Rician variable fluctuating above another under the assumption that the
two variables are independent. In general, this is not true. Here the random
variables are associated with inner products between the observed array response
and the theoretical array response for the mainlobe and the sidelobe. These
random variables are correlated.
Here an approach to translate the results from the previous section to the
problem of correlated Rician variables is discussed. A thorough discussion of
correlated Rician random variables can be found in Reference [293], for example.
The Rician variables r1 and r2 are given by the magnitudes of the complex
Gaussian variables x1 and x2 .
In this section, uncorrelated variables are constructed by applying a transformation to the correlated variables. The newly constructed uncorrelated complex Gaussian variables are indicated by the vector

$$ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} , \tag{7.100} $$

and the underlying correlated variables are indicated by

$$ \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \frac{1}{\sqrt{n_r}} \begin{pmatrix} \mathbf{v}^\dagger(\phi_{\mathrm{m.l.}})\, \mathbf{z} \\ \mathbf{v}^\dagger(\phi_{\mathrm{s.l.}})\, \mathbf{z} \end{pmatrix} , \tag{7.101} $$

where the notation from Equation (7.71) is used, and the normalization by $\sqrt{n_r}$ is employed so that the variance of the noise is 1. In both cases, the desire is to determine whether $|x_2| > |x_1|$ or $|y_2| > |y_1|$. With the above vector notation, this can be addressed with the form

$$ \mathbf{x}^\dagger \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \mathbf{x} < 0 \quad\Leftrightarrow\quad |x_1|^2 - |x_2|^2 < 0 \tag{7.102} $$

or

$$ \mathbf{y}^\dagger \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \mathbf{y} < 0 \quad\Leftrightarrow\quad |y_1|^2 - |y_2|^2 < 0 . \tag{7.103} $$

The correlated variables are given by the inner products of the mainlobe array response $\mathbf{v}(u_{\mathrm{m.l.}})$ and the sidelobe array response $\mathbf{v}(u_{\mathrm{s.l.}})$ with the received array response $\mathbf{z} = a\, \mathbf{v}(u_{\mathrm{m.l.}}) + \mathbf{n} \in \mathbb{C}^{n_r \times 1}$,

$$ \mathbf{y} = \frac{1}{\sqrt{n_r}} \begin{pmatrix} \mathbf{v}^\dagger(u_{\mathrm{m.l.}})\, \mathbf{z} \\ \mathbf{v}^\dagger(u_{\mathrm{s.l.}})\, \mathbf{z} \end{pmatrix} = \frac{1}{\sqrt{n_r}} \begin{pmatrix} \mathbf{v}^\dagger(u_{\mathrm{m.l.}}) \left[ a\, \mathbf{v}(u_{\mathrm{m.l.}}) + \mathbf{n} \right] \\ \mathbf{v}^\dagger(u_{\mathrm{s.l.}}) \left[ a\, \mathbf{v}(u_{\mathrm{m.l.}}) + \mathbf{n} \right] \end{pmatrix} = \frac{1}{\sqrt{n_r}} \begin{pmatrix} a\, n_r + \mathbf{v}^\dagger(u_{\mathrm{m.l.}})\, \mathbf{n} \\ a\, \rho\, n_r + \mathbf{v}^\dagger(u_{\mathrm{s.l.}})\, \mathbf{n} \end{pmatrix} , \tag{7.104} $$

where $a$ is the product of the signal amplitude and the channel attenuation, and $\rho$ is the normalized inner product between the theoretical sidelobe and mainlobe array responses. The mean of the correlated random variables is then given by

$$ \langle \mathbf{y} \rangle = \frac{1}{\sqrt{n_r}} \begin{pmatrix} a\, n_r \\ a\, \rho\, n_r \end{pmatrix} . \tag{7.105} $$
The covariance matrix $\mathbf{C} \in \mathbb{C}^{2\times 2}$ of the correlated variables is given by

$$ \mathbf{C} = \left\langle \left[ \mathbf{y} - \langle\mathbf{y}\rangle \right] \left[ \mathbf{y} - \langle\mathbf{y}\rangle \right]^\dagger \right\rangle = \frac{1}{n_r} \begin{pmatrix} \mathbf{v}^\dagger(u_{\mathrm{m.l.}})\, \mathbf{v}(u_{\mathrm{m.l.}}) & \mathbf{v}^\dagger(u_{\mathrm{m.l.}})\, \mathbf{v}(u_{\mathrm{s.l.}}) \\ \mathbf{v}^\dagger(u_{\mathrm{s.l.}})\, \mathbf{v}(u_{\mathrm{m.l.}}) & \mathbf{v}^\dagger(u_{\mathrm{s.l.}})\, \mathbf{v}(u_{\mathrm{s.l.}}) \end{pmatrix} = \begin{pmatrix} 1 & \rho^* \\ \rho & 1 \end{pmatrix} . \tag{7.106} $$
While it is not obvious, there exists a linear transformation that relates the correlated and uncorrelated variables. This transformation simultaneously maintains the difference relationships in Equations (7.102) and (7.103) and decorrelates the variables in $\mathbf{y}$. The invertible linear transformation between the two vectors is given by the matrix $\mathbf{A} \in \mathbb{C}^{2\times 2}$, such that the two vectors are related by

$$ \mathbf{x} = \mathbf{A}\, \mathbf{y} \tag{7.107} $$

and

$$ \mathbf{y} = \mathbf{A}^{-1}\, \mathbf{x} . \tag{7.108} $$

The transformation matrix must satisfy the following relations. First, for the transformed random variables to be uncorrelated, the off-diagonal entries of the transformed covariance matrix must be zero,

$$ \mathbf{A}\, \mathbf{C}\, \mathbf{A}^\dagger = \begin{pmatrix} \sigma_{x_1}^2 & 0 \\ 0 & \sigma_{x_2}^2 \end{pmatrix} ; \tag{7.109} $$

second, the difference between the magnitudes squared of the random variables must be conserved, $|x_1|^2 - |x_2|^2 = |y_1|^2 - |y_2|^2$, which is satisfied by requiring

$$ \mathbf{A}^\dagger \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \mathbf{A} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} . \tag{7.110} $$

As suggested in Reference [293], a linear transform that satisfies these requirements is given by

$$ \mathbf{A} = b \begin{pmatrix} 1 + \beta & (1 - \beta)\, e^{-i\alpha} \\ 1 - \beta & (1 + \beta)\, e^{-i\alpha} \end{pmatrix} , \tag{7.111} $$

where the phase α is defined by ρ = ρ eiα , and b and β are constants to be


determined. The difference relationship requires
   †  
1 0 1 + β (1 − β) e−iα 1 0
A† A = b2
0 −1 1 − β (1 + β) e−iα 0 −1
 
1 + β (1 − β) e−iα
·
1 − β (1 + β) e−iα
 
4β 0
= b2 . (7.112)
0 −4β
Consequently, the value of the scaling constant b is given by
1
b= √ . (7.113)

The value for β is solved by noting that
  
1 1 + β (1 − β) e−iα 1 ρ e−iα
A C A† =
4β 1 − β (1 + β) e−iα ρ eiα 1
 †
1 + β (1 − β) e−iα
·
1 − β (1 + β) e−iα
 
2 1 + β 2 + ρ (1 − β 2 ) 1 − β 2 + ρ (1 + β 2 )
= 2b . (7.114)
1 − β 2 + ρ (1 + β 2 ) 1 + β 2 + ρ (1 − β 2 )
By noting that only diagonal elements should be nonzero, the value for β can be
found by setting the off-diagonal elements to zero,
0 = 1 − β 2 + ρ (1 + β 2 )
1+ ρ
β= . (7.115)
1− ρ
By considering the diagonal entries in Equation (7.114), the variance of the newly
created uncorrelated complex Gaussian variables is given by
1 + β 2 + ρ (1 − β 2 )
σx2 1 = σx2 2 =

= 4b2 (1 + ρ) . (7.116)
Consequently, the transformation matrix is given by

$$ \mathbf{A} = b \begin{pmatrix} 1 + \sqrt{ \frac{ 1 + |\rho| }{ 1 - |\rho| } } & \left( 1 - \sqrt{ \frac{ 1 + |\rho| }{ 1 - |\rho| } } \right) e^{-i\alpha} \\ 1 - \sqrt{ \frac{ 1 + |\rho| }{ 1 - |\rho| } } & \left( 1 + \sqrt{ \frac{ 1 + |\rho| }{ 1 - |\rho| } } \right) e^{-i\alpha} \end{pmatrix} , \tag{7.117} $$

and the inverse of the transformation is given by

$$ \mathbf{A}^{-1} = b \begin{pmatrix} 1 + \sqrt{ \frac{ 1 + |\rho| }{ 1 - |\rho| } } & - \left( 1 - \sqrt{ \frac{ 1 + |\rho| }{ 1 - |\rho| } } \right) \\ - \left( 1 - \sqrt{ \frac{ 1 + |\rho| }{ 1 - |\rho| } } \right) e^{i\alpha} & \left( 1 + \sqrt{ \frac{ 1 + |\rho| }{ 1 - |\rho| } } \right) e^{i\alpha} \end{pmatrix} . \tag{7.118} $$

As an aside, the transformation matrix A is not unitary. The correlated complex


Gaussian variables defined by y are transformed to the uncorrelated variables x
by

x = Ay. (7.119)

Recalling Equation (7.79), the probability of one independent Rician fluctuating above another is given by

$$ \Pr\{ r_2 > r_1 \} = \frac{1}{\sigma_1^2 + \sigma_2^2} \left[ \sigma_1^2 + \sigma_2^2\, Q_M\!\left( \sqrt{ \frac{2 a_2^2}{\sigma_1^2 + \sigma_2^2} },\; \sqrt{ \frac{2 a_1^2}{\sigma_1^2 + \sigma_2^2} } \right) - \sigma_1^2\, Q_M\!\left( \sqrt{ \frac{2 a_1^2}{\sigma_1^2 + \sigma_2^2} },\; \sqrt{ \frac{2 a_2^2}{\sigma_1^2 + \sigma_2^2} } \right) \right] , \tag{7.120} $$

where the means are given by $a_m = | \langle x_m \rangle |$ for $m = 1, 2$. One should note here that $a_m$ is not the signal amplitude-attenuation product indicated by $a$. By using Equation (7.105), the values of the means of the newly created uncorrelated variables are given by

$$ \langle \mathbf{x} \rangle = \mathbf{A}\, \langle \mathbf{y} \rangle = b \begin{pmatrix} 1 + \sqrt{ \frac{1+|\rho|}{1-|\rho|} } & \left( 1 - \sqrt{ \frac{1+|\rho|}{1-|\rho|} } \right) e^{-i\alpha} \\ 1 - \sqrt{ \frac{1+|\rho|}{1-|\rho|} } & \left( 1 + \sqrt{ \frac{1+|\rho|}{1-|\rho|} } \right) e^{-i\alpha} \end{pmatrix} \frac{1}{\sqrt{n_r}} \begin{pmatrix} a\, n_r \\ a\, \rho\, n_r \end{pmatrix} = b\, a\, \sqrt{n_r} \begin{pmatrix} 1 + |\rho| + \sqrt{ 1 - |\rho|^2 } \\ 1 + |\rho| - \sqrt{ 1 - |\rho|^2 } \end{pmatrix} . \tag{7.121} $$

By noting that $\sigma_1^2 = \sigma_2^2 = \sigma_{x_1}^2 = \sigma_{x_2}^2$ from Equation (7.116), the first and second arguments of the Marcum Q-functions in Equation (7.120) are given by

$$ \frac{ 2 a_2^2 }{ \sigma_1^2 + \sigma_2^2 } = \frac{ a_2^2 }{ \sigma_1^2 } = \frac{ b^2 a^2 n_r \left( 1 + |\rho| - \sqrt{ 1 - |\rho|^2 } \right)^2 }{ 4 b^2 \left( 1 + |\rho| \right) } = \frac{ a^2 n_r }{ 2 } \left( 1 - \sqrt{ 1 - |\rho|^2 } \right) \tag{7.122} $$

and

$$ \frac{ 2 a_1^2 }{ \sigma_1^2 + \sigma_2^2 } = \frac{ a_1^2 }{ \sigma_1^2 } = \frac{ a^2 n_r }{ 2 } \left( 1 + \sqrt{ 1 - |\rho|^2 } \right) . \tag{7.123} $$

Consequently, the probability of selecting an erroneous sidelobe used in Equation (7.74) is given by

$$ \Pr\{ r_2 > r_1 \} = \frac{1}{2} \left[ 1 + Q_M\!\left( \sqrt{ \frac{a^2 n_r}{2} \left( 1 - \sqrt{1-|\rho|^2} \right) },\; \sqrt{ \frac{a^2 n_r}{2} \left( 1 + \sqrt{1-|\rho|^2} \right) } \right) - Q_M\!\left( \sqrt{ \frac{a^2 n_r}{2} \left( 1 + \sqrt{1-|\rho|^2} \right) },\; \sqrt{ \frac{a^2 n_r}{2} \left( 1 - \sqrt{1-|\rho|^2} \right) } \right) \right] , \tag{7.124} $$

where the SNR per receive antenna under the assumption of a single observation is $a^2$.
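An end-to-end Monte Carlo check of Equation (7.124): draw the single observation $\mathbf{z} = a\, \mathbf{v}(u_{\mathrm{m.l.}}) + \mathbf{n}$ directly and compare the empirical confusion rate with the closed form. The array geometry, direction pair, and amplitude are assumed for illustration, and the Marcum Q-function uses a standard series (not from the text).

```python
import math
import numpy as np

def marcum_q1(a, b, terms=300):
    """Series implementation of the first-order Marcum Q-function."""
    total, pois = 0.0, math.exp(-a * a / 2)
    chi_term = math.exp(-b * b / 2)
    chi_sum = chi_term
    for k in range(terms):
        total += pois * chi_sum
        pois *= (a * a / 2) / (k + 1)
        chi_term *= (b * b / 2) / (k + 1)
        chi_sum += chi_term
    return total

def p_sl(a, n_r, rho_mag):
    """Eq. (7.124) with per-antenna amplitude a and |rho| from Eq. (7.73)."""
    s = math.sqrt(1.0 - rho_mag**2)
    hi = math.sqrt(a * a * n_r / 2 * (1 + s))
    lo = math.sqrt(a * a * n_r / 2 * (1 - s))
    return 0.5 * (1 + marcum_q1(lo, hi) - marcum_q1(hi, lo))

rng = np.random.default_rng(2)
n_r, a, k = 8, 0.6, 2.0 * np.pi
y = 0.5 * np.arange(n_r)
y = y - y.mean()
v = lambda u: np.exp(1j * k * y * u)
u_ml, u_sl = 0.0, 0.28                      # mainlobe and a competing direction
rho = v(u_sl).conj() @ v(u_ml) / n_r

trials = 200_000
noise = (rng.standard_normal((trials, n_r))
         + 1j * rng.standard_normal((trials, n_r))) / np.sqrt(2)
z = a * v(u_ml) + noise                     # one observation per trial
confused = np.abs(z @ v(u_sl).conj()) > np.abs(z @ v(u_ml).conj())
mc = confused.mean()
```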

7.8.5 Unknown complex Gaussian signal


For a transmitted signal with $n_s$ samples drawn independently from a unit-variance complex Gaussian distribution, represented by the row vector $\mathbf{s} \in \mathbb{C}^{1 \times n_s}$, the beamscan test statistic $t(\phi)$ (discussed in Section 7.3) as a function of angle $\phi$ is given by

$$ t(\phi) = \mathbf{v}^\dagger(\phi)\, \hat{\mathbf{Q}}\, \mathbf{v}(\phi) = \frac{1}{n_s} \left\| \mathbf{v}^\dagger(\phi)\, \mathbf{Z} \right\|^2 , \tag{7.125} $$

where the $n_s$ observed independent samples are contained in the matrix $\mathbf{Z} \in \mathbb{C}^{n_r \times n_s}$,

$$ \mathbf{Z} = a\, \mathbf{v}(\phi_{\mathrm{m.l.}})\, \mathbf{s} + \mathbf{N} , \tag{7.126} $$

$a$ indicates the received signal amplitude per receive antenna (implying the steering vector normalization $\|\mathbf{v}(\phi)\|^2 = n_r$), the complex Gaussian signal is indicated by $\mathbf{s} \in \mathbb{C}^{1 \times n_s}$, and $\mathbf{N} \in \mathbb{C}^{n_r \times n_s}$ is the additive noise.

The probability of the test statistic at some sidelobe fluctuating above the mainlobe, $P_{\mathrm{s.l.}}(\mathrm{SNR})$, is given by

$$ P_{\mathrm{s.l.}}(\mathrm{SNR}) = \Pr\!\left\{ \frac{ \left\| \mathbf{v}^\dagger(\phi_{\mathrm{s.l.}})\, \mathbf{Z} \right\|^2 }{ \left\| \mathbf{v}^\dagger(\phi_{\mathrm{m.l.}})\, \mathbf{Z} \right\|^2 } > 1 \right\} = \Pr\!\left\{ \left\| \mathbf{v}^\dagger(\phi_{\mathrm{s.l.}})\, \mathbf{Z} \right\|^2 - \left\| \mathbf{v}^\dagger(\phi_{\mathrm{m.l.}})\, \mathbf{Z} \right\|^2 > 0 \right\} . \tag{7.127} $$

We construct the outputs of the matched-filter responses associated with the mainlobe, $\mathbf{y}_1$, and the sidelobe, $\mathbf{y}_2$, respectively:

$$ \mathbf{y}_1 = \mathbf{v}^\dagger(\phi_{\mathrm{m.l.}})\, \mathbf{Z} = a\, n_r\, \mathbf{s} + \mathbf{v}^\dagger(\phi_{\mathrm{m.l.}})\, \mathbf{N} , \qquad \mathbf{y}_2 = \mathbf{v}^\dagger(\phi_{\mathrm{s.l.}})\, \mathbf{Z} = a\, n_r\, \rho\, \mathbf{s} + \mathbf{v}^\dagger(\phi_{\mathrm{s.l.}})\, \mathbf{N} , \tag{7.128} $$

where the normalization $\mathbf{v}^\dagger(\phi)\, \mathbf{v}(\phi) = n_r$ is assumed and the normalized correlation variable is defined to be $\rho = \mathbf{v}^\dagger(\phi_{\mathrm{s.l.}})\, \mathbf{v}(\phi_{\mathrm{m.l.}}) / n_r$. It is convenient to construct a matrix with the correlated variables, denoted

$$ \mathbf{Y} = \begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{pmatrix} \in \mathbb{C}^{2 \times n_s} . \tag{7.129} $$
Similar to the discussion in Sections 7.8.2, 7.8.3, and 7.8.4, to evaluate the probability of confusion (which is a function of the F distribution), we need to construct a set of uncorrelated random variables by transforming the correlated random variables. The related uncorrelated variables are denoted

$$ \mathbf{X} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix} \in \mathbb{C}^{2 \times n_s} . \tag{7.130} $$

With the original variables contained within $\mathbf{Y}$ and the related uncorrelated versions contained within $\mathbf{X}$, the two forms are related by the transformation matrix $\mathbf{A} \in \mathbb{C}^{2\times 2}$ under the transformation

$$ \mathbf{X} = \mathbf{A}\, \mathbf{Y} . \tag{7.131} $$

By noting that the magnitude squared of the inner product of the steering vector and the observation matrix can also be expressed by using a trace,

$$ \left\| \mathbf{v}^\dagger(\phi)\, \mathbf{Z} \right\|^2 = \mathrm{tr}\{ \mathbf{v}^\dagger(\phi)\, \mathbf{Z}\, \mathbf{Z}^\dagger\, \mathbf{v}(\phi) \} , \tag{7.132} $$

the difference between the magnitudes squared of the inner products can be expressed by

$$ \left\| \mathbf{v}^\dagger(\phi_{\mathrm{m.l.}})\, \mathbf{Z} \right\|^2 - \left\| \mathbf{v}^\dagger(\phi_{\mathrm{s.l.}})\, \mathbf{Z} \right\|^2 = \mathrm{tr}\!\left\{ \mathbf{Y}^\dagger \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \mathbf{Y} \right\} = \mathrm{tr}\!\left\{ \mathbf{X}^\dagger\, \mathbf{A}^{-\dagger} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \mathbf{A}^{-1}\, \mathbf{X} \right\} . \tag{7.133} $$

By exploiting the relationship found in Equation (7.111), the decorrelating transformation matrix is given by

$$ \mathbf{A} = \frac{1}{\sqrt{4\beta}} \begin{pmatrix} 1 + \beta & (1-\beta)\, e^{-i\alpha} \\ 1 - \beta & (1+\beta)\, e^{-i\alpha} \end{pmatrix} , \tag{7.134} $$

where the parameters $\alpha$ and $\beta$ need to be determined.
Because the Gaussian variables have zero mean,

$$ \langle \mathbf{X} \rangle = \langle \mathbf{Y} \rangle = \mathbf{0} , \tag{7.135} $$

and because the values in $\mathbf{s}$ are drawn independently from sample to sample, the covariance $\mathbf{C}_Y \in \mathbb{C}^{2\times 2}$ of $\mathbf{Y}$ is given by

$$ \mathbf{C}_Y = \left\langle \mathbf{Y}\, \mathbf{Y}^\dagger \right\rangle = \begin{pmatrix} \langle \mathbf{y}_1 \mathbf{y}_1^\dagger \rangle & \langle \mathbf{y}_1 \mathbf{y}_2^\dagger \rangle \\ \langle \mathbf{y}_2 \mathbf{y}_1^\dagger \rangle & \langle \mathbf{y}_2 \mathbf{y}_2^\dagger \rangle \end{pmatrix} = n_s\, n_r^2 \begin{pmatrix} P + 1 & \rho^* \left( P + 1 \right) \\ \rho \left( P + 1 \right) & P\, |\rho|^2 + 1 \end{pmatrix} , \tag{7.136} $$

where the SNR per sample per receive antenna is given by $P = a^2$. In the end, the overall scale will not be significant, so it is convenient to consider the normalized covariance

$$ \tilde{\mathbf{C}}_Y = \frac{1}{n_s\, n_r^2}\, \mathbf{C}_Y . \tag{7.137} $$

The transformed normalized covariance matrix is given by

$$ \mathbf{A}\, \tilde{\mathbf{C}}_Y\, \mathbf{A}^\dagger = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix} . \tag{7.138} $$

The parameters $\alpha$ and $\beta$ for Equation (7.134) can be found to be

$$ \alpha = \arg(\rho) \tag{7.139} $$

and

$$ \beta = \sqrt{ \frac{ (P+1) + \left( P |\rho|^2 + 1 \right) + 2 |\rho| (P+1) }{ (P+1) + \left( P |\rho|^2 + 1 \right) - 2 |\rho| (P+1) } } . \tag{7.140} $$

By evaluating Equation (7.138) with these forms for $\alpha$ and $\beta$, the variances of the uncorrelated variables are given by

$$ \sigma_1^2 = \frac{ 2 (P+1) \left( |\rho| + 1 \right) }{ \beta \left[ P (1 - |\rho|) + 2 \right] - P \left( |\rho| + 1 \right) } , \qquad \sigma_2^2 = \frac{ 2 (P+1) \left( |\rho| + 1 \right) }{ \beta \left[ P (1 - |\rho|) + 2 \right] + P \left( |\rho| + 1 \right) } . \tag{7.141} $$

Because only the relative values of these variances are of interest, it is useful to consider the ratio,

$$ \frac{ \sigma_1^2 }{ \sigma_2^2 } = \frac{ \beta \left[ P (1 - |\rho|) + 2 \right] + P \left( |\rho| + 1 \right) }{ \beta \left[ P (1 - |\rho|) + 2 \right] - P \left( |\rho| + 1 \right) } . \tag{7.142} $$

Because the noise and the signal are assumed to be Gaussian, the difference expressed in Equation (7.133) is the difference between random $\chi^2$ variables. However, also from Equation (7.133), the test expressed as the difference between the magnitudes squared of the vectors can equivalently be expressed as a test on the ratio of these magnitudes squared. The ratio of two degree-normalized central $\chi^2$ variables is given by the F distribution, as discussed in Section 3.1.13. The probability density of a given value of the ratio $q$ is denoted $q \sim p_F(q; d_1, d_2)$, where $d_1$ and $d_2$ indicate the degrees of the $\chi^2$ variables. In our case, the two $\chi^2$ variables have the same degree. The ratio of two equal-degree complex $\chi^2$ random variables with degree $n_s$ is denoted $q$,

$$ q = \frac{ \frac{1}{\sigma_2^2/2} \sum_{m=1}^{n_s} \left| \{\mathbf{x}_2\}_m \right|^2 }{ \frac{1}{\sigma_1^2/2} \sum_{m=1}^{n_s} \left| \{\mathbf{x}_1\}_m \right|^2 } = \frac{ \sigma_1^2 }{ \sigma_2^2 }\, \tilde{q} ; \qquad \tilde{q} = \frac{ \sum_{m=1}^{n_s} \left| \{\mathbf{x}_2\}_m \right|^2 }{ \sum_{m=1}^{n_s} \left| \{\mathbf{x}_1\}_m \right|^2 } , \tag{7.143} $$

where q̃ is the unnormalized ratio associated with the test statistic. Confusion
between the sidelobe and the mainlobe occurs when the random variable q̃ > 1,
so that the probability of selecting the sidelobe over the mainlobe Ps.l. (SNR) is
given by

Ps.l. (SNR) = Pr{q̃ > 1}



σ2
= Pr q > 12
σ2
 ∞
= dq pF (q; 2ns , 2ns )
σ 12 /σ 22

= 1 − PF (σ12 /σ22 ; 2ns , 2ns ) , (7.144)

where PF (q; d1 , d2 ) is the cumulative distribution function for the F distribu-


tion discussed in Section 3.1.13. The cumulative distribution function can be
expressed in terms of beta functions, discussed in Section 2.14.3. The cumulative
distribution function for the F distribution is given by
 
q d1
B ; d21 , d22
q d 1 +d 2
PF (q; d1 , d2 ) =   . (7.145)
B d21 , d22

Consequently, the probability of selecting the sidelobe over the mainlobe Ps.l. (SNR) is given by

Ps.l. (SNR) = 1 − B( (2ns σ1²/σ2²)/(2ns σ1²/σ2² + 2ns); ns, ns ) / B(ns, ns)
            = 1 − B( σ1²/(σ1² + σ2²); ns, ns ) / B(ns, ns) ,   (7.146)

where the ratio of the decorrelated variable variances is given by Equation (7.142)
using Equation (7.140) in which the SNR per sample per receive antenna is given
by P .
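Equation (7.146) is straightforward to evaluate numerically. In the sketch below, the values of σ1², σ2², and ns are illustrative, and `reg_inc_beta` is a simple numerical stand-in for a library incomplete-beta routine:

```python
import math

def reg_inc_beta(x, a, b, steps=20000):
    """Regularized incomplete beta I_x(a, b) by midpoint-rule integration
    (adequate for moderate a = b = n_s; a sketch, not a library routine)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    dt = x / steps
    total = dt * sum(((i + 0.5) * dt) ** (a - 1) * (1 - (i + 0.5) * dt) ** (b - 1)
                     for i in range(steps))
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)  # B(a, b)
    return total / norm

def p_sidelobe(sigma1_sq, sigma2_sq, ns):
    """Equation (7.146): probability of selecting the sidelobe."""
    x = sigma1_sq / (sigma1_sq + sigma2_sq)
    return 1.0 - reg_inc_beta(x, ns, ns)
```

By symmetry of the beta density, equal variances give Ps.l. = 1/2, and the probability falls as σ1²/σ2² grows.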
Figure 7.3 Notional representation of a vector sensor. All three electric (E1 , E2 , E3 )
and magnetic fields (H1 , H2 , H3 ) are measured simultaneously based at a single
phase center. The electric fields are measured by using the dipole antennas, and the
magnetic fields are measured by using the loop antennas.

7.9 Vector sensor

The antenna array discussed previously exploits the relative phase delay induced
by the time delay of signals impinging upon each antenna to determine direc-
tion. A vector sensor employs an array of antennas at a single phase center.
Consequently, there is no relative phase delay. Instead, the vector sensor finds
the direction to a source by comparing the relative amplitudes [229]. The vector
sensor employs elements that are sensitive to electric and magnetic fields along
each axis, as seen in Figure 7.3. Depending upon the direction of the impinging
signal, different elements will couple to the wavefront with different efficiencies.
Because the polarization of the incoming signal is unknown, and different po-
larizations will couple to each antenna with different efficiencies, the incoming
signal polarization must be determined as a nuisance parameter.
The electric and magnetic fields are indicated by
e = (E1 , E2 , E3 )ᵀ   (7.147)

and

h = (H1 , H2 , H3 )ᵀ ,   (7.148)
respectively. Under the assumption of free space propagation, the Poynting vector
[154] with power flux P is given by the cross product of the electric field and
the magnetic field. Here, the unit-norm direction vector u indicates the direction
from the receive array to the source (the opposite direction of the Poynting
vector):
u P = −e × h
u × e = −(1/P) e × h × e
u × e = −(‖e‖²/P) h   (7.149)
by using the relationship
a × b × c = b(a · c) − c(a · b) . (7.150)
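The identity in Equation (7.150), read as a × (b × c), can be verified numerically with a quick self-contained check (the vectors below are arbitrary):

```python
def cross(a, b):
    # ordinary 3-vector cross product
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# check a x (b x c) = b (a . c) - c (a . b) on an arbitrary triple
a, b, c = (1.0, 2.0, -0.5), (0.3, -1.0, 2.0), (-2.0, 0.7, 1.1)
lhs = cross(a, cross(b, c))
rhs = tuple(bi * dot(a, c) - ci * dot(a, b) for bi, ci in zip(b, c))
```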
The six receive signals are given by the three electric field measurements
zE (t) = e + nE (t) (7.151)
and
zH (t) = −(‖e‖²/P) h + nH (t) ,   (7.152)
where the noise for the electric and magnetic field measurements are indicated
by nE (t) ∈ C3×1 and nH (t) ∈ C3×1 , respectively.
The six measured receive signals are a function of direction and polarization,
and are given by
   
⎛ zE (t) ⎞   ⎛  I   ⎞
⎝ zH (t) ⎠ = ⎝ [u×] ⎠ V ξ(t) + n(t) ,   (7.153)

where the noise vector n(t) is given by

n(t) = ⎛ nE (t) ⎞ ,   (7.154)
       ⎝ nH (t) ⎠

the direction cross-product operator [u×] is given by

       ⎛   0     −{u}3    {u}2 ⎞
[u×] = ⎜  {u}3     0     −{u}1 ⎟ ,   (7.155)
       ⎝ −{u}2   {u}1      0   ⎠
and for a given polarization vector ξ(t) ∈ C2×1 , the matrix V ∈ R3×2 that maps
the two polarization components orthogonal to the direction of propagation to
the three spatial dimensions is given by
    ⎛ −sin φ   −cos φ sin θ ⎞
V = ⎜  cos φ   −sin φ sin θ ⎟ .   (7.156)
    ⎝    0       −cos θ     ⎠
Here the angle φ is defined as the angle from the 1 axis in the 1–2 plane, and θ
is the angle from the 3-axis.
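A small sketch of the measurement model in Equation (7.153), using the matrices as printed above (the angles, test vector, and polarization state below are illustrative, and NumPy is assumed):

```python
import numpy as np

phi, theta = 0.4, 1.1          # illustrative azimuth and polar angles
u = np.array([np.sin(theta) * np.cos(phi),
              np.sin(theta) * np.sin(phi),
              np.cos(theta)])  # unit direction vector, theta from the 3-axis

# direction cross-product operator of Equation (7.155)
ux = np.array([[0.0, -u[2], u[1]],
               [u[2], 0.0, -u[0]],
               [-u[1], u[0], 0.0]])

# [u x] a reproduces the ordinary cross product u x a
a = np.array([0.2, -1.3, 0.5])

# stack the electric- and magnetic-field responses as in Equation (7.153)
V = np.array([[-np.sin(phi), -np.cos(phi) * np.sin(theta)],
              [np.cos(phi), -np.sin(phi) * np.sin(theta)],
              [0.0, -np.cos(theta)]])
xi = np.array([1.0 + 0.5j, 0.3 - 0.2j])   # arbitrary polarization state
z = np.vstack([np.eye(3), ux]) @ (V @ xi)  # noiseless six-element response
```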
Because the vector sensor has no aperture, the intrinsic resolution is relatively
poor, of the order one radian. This intrinsic resolution can be determined from
the multiplicative constant term in the Cramer–Rao bound [229]. To achieve
reasonable angle-estimation performance requires beamsplitting under the as-
sumption of a high SNR signal.
[Plot: vertical axis sin φ from −0.4 to 0.4; horizontal axis cos φ from 0.0 to 1.0]

Figure 7.4 Beam pattern as a function of angle φ from boresight of vector sensor
assuming eθ polarization for elevations of 0 (black, in the plane of the transmitter),
45 degrees (dark gray), and 67.5 degrees (light gray).

In Figure 7.4, the beam pattern for a vector sensor is displayed. Only the response to the electric field along the polar direction (eθ, using the notation from Section 5.1) is considered in the beam pattern. Because the vector sensor has no intrinsic aperture, the beamwidth is very wide. In addition to the beam pattern in the plane of the transmitter (0 degrees), patterns for elevations of 45 and 67.5 degrees are shown; both have smaller responses to the in-plane (0 degrees) excitation.

Problems

7.1 Considering a five-element regular linear array with 1/2-wavelength spacing


and isotropic antennas, under the assumption of receive SNR per sample per
antenna of 10 dB and 10 samples independently drawn from a complex Gaussian
distribution for each source, evaluate and plot the pseudospectrum as a function
of direction parameter u = sin φ for angle φ, where φ = 0 is along boresight:
(a) for beamscan and MVDR with a single source at sin φ = −0.3,
(b) for beamscan and MVDR with sources at sin φ = −0.3, 0.4,
(c) for beamscan and MVDR with sources at sin φ = −0.4, −0.3, 0.4,
(d) for beamscan and MVDR with sources at sin φ = −0.8, −0.4, −0.3, 0.0,
0.3, 0.4, 0.8.

7.2 Considering a five-element regular linear array with 1/2-wavelength spacing and isotropic antennas, under the assumption of receive SNR per sample per antenna of 10 dB where all sources coherently transmit the same 10 samples drawn from a complex Gaussian distribution, evaluate and plot the pseudospectrum as a function of direction parameter u = sin φ for angle φ, where φ = 0 is along boresight:
(a) for beamscan and MVDR with a single source at sin φ = −0.3,
(b) for beamscan and MVDR with sources at sin φ = −0.3, 0.4,
(c) for beamscan and MVDR with sources at sin φ = −0.4, −0.3, 0.4,
(d) for beamscan and MVDR with sources at sin φ = −0.8, −0.4, −0.3, 0.0,
0.3, 0.4, 0.8.
7.3 Considering the best unbiased angle-estimator variance for a linear array
with mean-squared antenna position σy2 , with receive SNR per sample per an-
tenna P and direction parameter u = sin φ for angle φ,
(a) evaluate the ratio of best variance of direction parameter u estimation for
a single transmitted sequence that is drawn from a Gaussian distribution
relative to a known sequence, and
(b) discuss the ratio in the regime of small SNR but large number of samples.
7.4 Considering the best unbiased angle-estimator variance for a linear array
with mean-squared antenna position σy2 , with receive energy per sample per
antenna P and direction parameter u = sin φ for angle φ,
(a) evaluate the best unbiased angle-estimator variance for the angle estimate φ for a single transmitted sequence that is drawn from a Gaussian distribution, and
(b) discuss the variance as φ approaches end fire of the array.
7.5 Show that the following relationship between Marcum Q-functions and
Bessel functions is true,
√ √ √ √ √
QM ( 2a, 2b) + QM ( 2b, 2a) = 1 + e−(a+b) I0 (2 a b) . (7.157)
7.6 Considering the problem of angle estimation based upon a single observa-
tion of a narrowband signal of wavelength λ for the antenna array with phase
centers at positions {0, 1, 3, 5, 8}λ/2 along the {x}2 axis, find the approximate
probability of confusing a sidelobe for a mainlobe as a function of per receive
antenna SNR.
7.7 Considering the problem of angle estimation based upon a single observa-
tion of a narrowband signal of wavelength λ for the antenna array with phase
centers at positions {0, 1, 3, 5, 8}λ/2 along the {x}2 axis, evaluate the variance
bound using the method of intervals, keeping only the dominant sidelobe as a
function of per receive antenna SNR.
7.8 Show for Equation (7.134) that the values for the parameters α and β given
in Equations (7.140) and (7.139):
(a) decorrelate the variables y1 and y2 found in Equation (7.129),
(b) produce the variances presented in Equation (7.141).
7.9 For a single source known to be in the {x}1 –{x}2 plane, observed by vec-
tor sensor, evaluate the Cramer–Rao angle-estimation bound as a function of
integrated SNR and polarization.
8 MIMO channel

By using the diversity made available by multiple-antenna communications, links


can be improved [349, 109]. For example, the multiple degrees of freedom can be
used to provide robustness through channel redundancy or increased data rates
by exploiting multiple paths through the environment simultaneously. These ad-
vantages can even potentially be employed to reduce the probability of intercep-
tion [1]. Knowledge of the channel can even be used to generate cryptographic
keys [332].
The basic concept of a multiple-input multiple-output (MIMO) wireless com-
munication link is that a single source distributes data across multiple transmit
antennas, as seen in Figure 8.1. Potentially independent signals from the mul-
tiple transmit antennas propagate through the environment, typically encoun-
tering different channels. The multiple-antenna receiver then disentangles the
signal from the multiple transmitters [99, 209, 308, 33, 116, 258, 84]. There is a
wide range of approaches for distributing the data across the transmitters and
in implementations for the receiver.1
While MIMO communication can operate in line-of-sight environments (at
the expense of some of the typical assumptions used in MIMO communica-
tions), the most common scenario for MIMO communications is to operate in an
environment that is characterized by complicated multipath scattering. Conse-
quently, most, if not all, of the energy observed at the receive array impinges
upon the array from directions different from the direction to the source. Con-
sequently, the line-of-sight environment assumption employed in Chapters 6 and
7 is not valid for most applications of MIMO communications. The environment
in which the link is operating is referred to as the channel. The capacity of a
MIMO link is a function of the structure of this channel, so a number of channel
models are considered in the chapter.

8.1 Flat-fading channel

It is said that a signal is narrowband if the channel between each transmit


and receive antenna can be characterized, to a good approximation, by a single
1 Some sections of this chapter are © 2002 IEEE. Reprinted, with permission, from Reference [33].
Figure 8.1 Notional multiple-input multiple-output (MIMO) wireless communication


link from a transmitter to a receiver. The transmitted signal propagates through some
scattering field.

complex number. This is a valid characterization of the channel when the signal
bandwidth B is small compared to the inverse of the characteristic delay spread
Δt,
B ≪ 1/Δt .   (8.1)
This regime is also described as a flat-fading channel because the same complex
attenuation can be used across frequencies employed by the transmission and is
consequently flat (as opposed to frequency-selective fading). The elements in the
flat-fading channel matrix

H ∈ Cn r ×n t

contain the complex attenuation from each transmitter to each receiver. For
example, the path between the mth transmitter and nth receiver has a complex
attenuation {H}n ,m . A received signal z(t) ∈ Cn r ×1 as a function of time t is
given by

z(t) = H s(t) + n(t) , (8.2)

where the transmitted signal vector and additive noise (including external in-
terference) as a function of time are denoted s(t) ∈ Cn t ×1 and n(t) ∈ Cn r ×1 ,
respectively.
It is often convenient to consider a block of data of ns samples. The received
signal for a block of data with ns samples is given by

Z = HS + N, (8.3)

where the received signal is given by Z ∈ Cn r ×n s , the transmitted signal is


given by S ∈ Cn t ×n s , and the noise (including external interference) is given by
N ∈ Cn r ×n s . The notion that the channel is static for at least ns samples is


implicit in this model.
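As a quick illustration of the block model Z = HS + N (dimensions as above; the sizes, noise level, and i.i.d. complex Gaussian draws below are illustrative, and NumPy is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
nt, nr, ns = 2, 4, 100          # transmit antennas, receive antennas, samples

# draw an i.i.d. complex Gaussian flat-fading channel, signal block, and noise
cgauss = lambda *shape: (rng.standard_normal(shape) +
                         1j * rng.standard_normal(shape)) / np.sqrt(2)
H = cgauss(nr, nt)              # channel matrix, one complex gain per path
S = cgauss(nt, ns)              # transmitted block
N = 0.1 * cgauss(nr, ns)        # additive noise (including interference)

Z = H @ S + N                   # received block, Equation (8.3)
```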

8.2 Interference

Historically, external interference (interference from other communication links)


was easily avoided by employing frequency-division multiple access (FDMA).
However, because of the significant increase in the use of wireless communica-
tions, external interference is becoming an increasingly significant problem. We
typically identify interference from outside the system as external interference. Interference from within a communication system’s own transmitters is typically defined as internal interference. For a MIMO system this distinction is particularly important because the signals from the multiple transmitters of a single node typically interfere with each other at the receiver. One of the important wireless regimes is unregulated or
loosely controlled frequency bands, such as the industrial, scientific, and medi-
cal (ISM) bands. In these bands, various communication systems compete in a
limited spectrum. In addition, wireless ad hoc and cellular networks, discussed
in detail in later chapters, can introduce cochannel interference (that is interfer-
ence at the same frequency). By decreasing sensitivity to interference in wire-
less networks, higher link densities or higher signal-to-noise ratio (SNR) links
can be achieved; thus, network throughput is increased. In any case, the abil-
ity to operate in interference can significantly increase the range of potential
applications. To describe the effects of the interference, two essential charac-
teristics need to be specified: the channel and knowledge of the interference
waveform.
The received signal described in Equation (8.3) is given by

Z = H S + N
  = H S + Σm Jm Tm + Ñ ,   (8.4)

where the interference contained within N is expressed as the sum of the terms
Jm Tm . The term Ñ is the remaining thermal noise. The interference channels
Jm are typically statistically equivalent to those for the signal of interest of the
channel H.
The nature (really the statistics) of the external interference signal can have
a significant effect on a communication system’s performance; thus, priors on
the probability distributions for the interference can have a dramatic effect on
receiver design. As an example, if the interference signal and its channel J T
are known exactly, then the interference has no effect because a receiver that
is aware of these parameters can subtract the contributions of the interference
from the received signal perfectly, assuming a receiver with ideal characteristics.
However, a receiver that cannot take advantage of this knowledge will be forced
242 MIMO channel

to operate in the presence of a large noise-like signal. From a practical point of


view, the effect of known interference (even if the channel is unknown) can be
minimized at the receiver by projecting onto a basis temporally orthogonal to
the interfering signal,

Z̃ = Z P⊥T ,
P⊥T = I − T† (T T† )−1 T .   (8.5)

In the limit of a large number of samples, ns ≫ rank{T}, the loss associated with this projection approaches zero because the size of the total signal space is ns , but the size of the transmitted signal subspace is fixed; thus the fraction of the potential signal subspace that is subtended by the projection operation goes to zero as ns becomes large. If the interfering signal is known exactly by the transmitter, then, in theory, the adverse effect of the interference can be mitigated exactly by using “dirty-paper coding” [67], which is discussed in Section 5.3.4. However, useful implementations of dirty-paper coding remain elusive.
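The projection in Equation (8.5) is easy to sketch numerically (sizes and seed are arbitrary; NumPy assumed): projecting the received block onto the temporal complement of a known interference waveform T annihilates the J T term exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
nr, ns, nj = 4, 200, 1          # receive antennas, samples, interferers

cgauss = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
T = cgauss(nj, ns)              # known interference waveform
J = 10.0 * cgauss(nr, nj)       # strong, unknown interference channel

# orthogonal projector onto the temporal complement of T, Equation (8.5)
P_perp = np.eye(ns) - T.conj().T @ np.linalg.inv(T @ T.conj().T) @ T

Z = J @ T                       # interference-only received block
Z_clean = Z @ P_perp            # projection removes the interference
```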
In the context of information theory, the worst-case distribution for an un-
known noise, or in this case interference signal, is Gaussian. As discussed in
Section 5.3.3, this distribution maximizes the entropy associated with the in-
terference and consequently minimizes the mutual information. As a somewhat
amusing aside, when receivers are optimized for Gaussian interference and noise,
non-Gaussian signals can sometimes be the most disruptive.

8.2.1 Maximizing entropy


As discussed in Section 5.3.3, entropy is a measure of the number of bits re-
quired to specify the state of a random variable with a given distribution. To
evaluate MIMO channel capacity (discussed in Section 8.3), it is useful to iden-
tify the probability distributions that maximize entropy. The differential entropy
(entropy per symbol) of a multivariate random variable is given by


h(x) = − dΩx p(x) log2 [p(x)] , (8.6)

where p(x) is the probability density function for the random vector x ∈ Cn r ×1
and dΩx indicates the differential hypervolume associated with the integration
variable x, as discussed in Section 2.9.2. As discussed in Section 5.3.3, the Gaus-
sian distribution maximizes entropy for a given signal variance. This property is
also true in the case of multivariate distributions. The differential entropy of an
nr -dimensional multivariate mean-zero complex Gaussian distribution denoted
x with covariance matrix R ∈ Cnr×nr is

h(x) = − ∫ dΩx [1/(|R| π^nr)] e−x† R−1 x log₂{ [1/(|R| π^nr)] e−x† R−1 x }
     = − ∫ dΩx [1/(|R| π^nr)] e−x† R−1 x ( −log₂ [|R| π^nr] − log₂ [e] x† R−1 x )
     = log₂ [|R| π^nr] + log₂ [e] ∫ dΩx [x† R−1 x/(|R| π^nr)] e−x† R−1 x
     = log₂ [|R| π^nr] + log₂ [e] ∫ dΩy (y† y/π^nr) e−y† y ;   y = R−1/2 x
     = log₂ [|R| π^nr] + log₂ [e] Σ_{m=1}^{nr} ∫ dΩy |{y}m|² Π_{n=1}^{nr} (e−|{y}n|²/π)
     = log₂ [|R| π^nr] + log₂ [e] nr
     = log₂ ([π e]^nr |R|) ,   (8.7)

where it is observed that the determinant of the interference-plus-noise covari-


ance matrix |R| is the Jacobian associated with the change of variables from x
to y, and the last set of integrals each evaluate to 1 because the integral over
the
% Gaussian& probability density is 1, and the variance of the zero-mean variable
{y}m 2 = 1.
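A Monte Carlo sanity check of Equation (8.7) (illustrative covariance R and seed; NumPy assumed): the sample average of −log₂ p(x) over draws x ∼ CN(0, R) should approach log₂([πe]^nr |R|).

```python
import numpy as np

rng = np.random.default_rng(2)
R = np.array([[2.0, 0.5], [0.5, 1.0]])      # example covariance (nr = 2)
nr = R.shape[0]
Rh = np.linalg.cholesky(R)                  # R = Rh Rh†

# draw complex Gaussian samples x ~ CN(0, R)
n = 200000
w = (rng.standard_normal((nr, n)) + 1j * rng.standard_normal((nr, n))) / np.sqrt(2)
x = Rh @ w

# sample average of -log2 p(x), using p(x) = exp(-x† R^-1 x)/(|R| pi^nr)
Rinv = np.linalg.inv(R)
quad = np.real(np.einsum('in,ij,jn->n', x.conj(), Rinv, x))
h_mc = np.mean(quad * np.log2(np.e)
               + np.log2(np.linalg.det(R).real * np.pi**nr))

h_formula = np.log2((np.pi * np.e)**nr * np.linalg.det(R).real)
```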

8.3 Flat-fading MIMO capacity

The maximum spectral efficiency at which the effective error rate can be driven to
zero (or in other words capacity) for a flat-fading link is found by maximizing the
mutual information [68], as introduced in Section 5.3.2. The spectral efficiency
is defined as the data rate divided by the bandwidth and has the units of bits per second per hertz (b/s/Hz) or equivalently (b/[s Hz]). The units of bits per second per hertz are just bits, although it is sometimes useful to keep the slightly
clumsy longer form because it is suggestive of the underlying meaning. To find
the channel capacity, both an outer bound and an achievability (inner) bound
must be evaluated, and it must be shown that these two bounds are equal. In the
following discussion, it is assumed without proof that Gaussian distributions are
capacity achieving for MIMO links. More thorough discussions are presented in
[308, 68]. There are various levels of channel-state information available to the
transmitter. The spectral efficiency bound increases along with the amount of
information available to the transmitter. As we use it here, the term capacity is
a spectral efficiency bound. However, not all useful spectral efficiency bounds are
capacity; because of some other constraints or lack of channel knowledge, a given
spectral efficiency bound may be less than the channel capacity given complete
channel knowledge. One might argue reasonably that only when the entire system has knowledge of the channel (with the exception of noise) is the maximum achievable spectral efficiency bound the channel capacity. However, in practice it is common to refer to multiple spectral efficiency bounds, under different assumptions on system constraints, as capacity. Given this practice, some care must be taken when a given spectral efficiency bound is identified as channel capacity.
In maximizing the mutual information, a variety of constraints can be imposed.
The most common constraint is the total transmit power. For the MIMO link,
an additional requirement can be placed upon the optimization: knowledge of
channel-state information (CSI) at the transmitter. If the transmitter knows
the channel matrix, then it can alter its transmission strategy (which in theory
can be expressed in terms of the transmit signal covariance matrix) to improve
performance. Conversely, if the channel is not known at the transmitter, and
this is more common in communication systems, then the transmitter is forced
to employ an approach with lower average performance.
Because the channel is represented by a matrix in MIMO communications
compared to a scalar in SISO communications, the notion of channel knowledge
is more complicated. In both cases, the channel state can be completely known
exactly or statistically. However, in the case of MIMO, the notion of statistical
knowledge is even more involved. As an explicit example, all flat-fading SISO
channels of the same attenuation have the same capacity as a function of transmit
power, but all MIMO channel matrices with the same Frobenius norm (which
implies the same average attenuation) do not have the same capacity.
An issue in considering performance of communication systems is in relating
theoretical and experimental analyses of performance. In general, this is true
for both SISO and MIMO systems, although it is slightly more complicated for
MIMO systems. Theoretical discussions of MIMO communications are typically
discussed in terms of average SNR per receive antenna. However, the SNR es-
timate produced from a channel measurement is not the same. Explicitly, this
is understood by noting that the estimate of the SNR for a particular esti-
mated channel instance is not the same as the average SNR for an ensemble of
channels,
ŜNR ∝ ‖Ĥ‖²F /nr   ≠   SNR ∝ ⟨‖H‖²F ⟩/nr ,   (8.8)
where the notation ˆ· indicates an estimated parameter. This difference is dis-
cussed in greater detail in Section 8.11. Implicit in this formulation of SNR is
the notion that each transmit antenna excites the channel with independent sig-
nals with equal power (the optimal solution for the uninformed transmitter). If
the transmit antennas incorporate correlations to take advantage of the channel-
state information (an informed transmitter solution), then this discussion is even
more complicated. A more thorough discussion of channel-state information is
presented in Section 8.3.1.
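The distinction in Equation (8.8) between a per-realization SNR estimate and the ensemble-average SNR can be seen in a short simulation (an i.i.d. CN(0, 1) channel ensemble is assumed for illustration; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
nt, nr, trials = 4, 4, 5000

# per-realization SNR measure ||H||_F^2 / nr for an i.i.d. CN(0, 1) ensemble
cgauss = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
snr_hat = np.array([np.linalg.norm(cgauss(nr, nt), 'fro')**2 / nr
                    for _ in range(trials)])

# the ensemble average <||H||_F^2>/nr is nt for unit-variance entries,
# but any single estimated channel scatters around that value
mean_hat, spread = snr_hat.mean(), snr_hat.std()
```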
If the channel capacity over some bandwidth B is indicated by C, then under


the assumptions of a flat-fading or spectrally constant channel of interest, the
bounding spectral efficiency c and total data rate are related by
C = B c. (8.9)
Often without specifying any underlying assumptions or constraints being con-
sidered (which can lead to confusion), both the bounding spectral efficiencies
and the bounding total data rate are referred to as the channel capacity.
The information theoretic capacity of MIMO systems has been discussed widely,
for example in References [308, 33]. The development of the informed transmit-
ter (“water filling” [68]) and uninformed transmitter approaches is discussed in
Sections 8.3.2 and 8.3.3. The relative performance of these approaches is dis-
cussed in Section 8.3.4. Here the informed transmitter will have access to an
accurate estimate of the channel matrix and a statistical estimate of the inter-
ference. The application of channel information at the transmitter is sometimes
given the somewhat unfortunate name “precoding,” although often this name
implies a suboptimal linear precoding approach [330, 268, 274, 162, 197]. As will
be developed in Sections 8.3.2 and 8.3.3, the capacities (the bounding spectral
efficiency in units of bits per second per hertz) of the informed and uninformed
transmitter in flat-fading environments are given by

c =   max   log₂ |I + R−1 H P H† |   (8.10)
    P: tr{P}≤Po

and

c = log₂ |I + (Po /nt ) R−1 H H† | ,   (8.11)
respectively. In the informed transmitter case, the transmit spatial covariance
matrix P ∈ Cn t ×n t contains the optimized statistical cross correlations between
transmit antennas. The total transmit power is indicated by Po . The interference-
plus-noise spatial covariance matrix is indicated by R ∈ Cn r ×n r . For conve-
nience, it is often assumed that the transmit spatial covariance matrix P and
the interference-plus-noise spatial covariance matrix R ∈ Cn r ×n r are expressed
in units of thermal noise. Under this normalization, a thermal noise covariance
matrix is given by I. As a reminder, we are considering signals in a complex
baseband representation. Consequently, for each symbol there are two degrees
of freedom (real and imaginary), so the “1/2” in the standard form of capacity
“1/2 log2 (1 + SNR)” is not present.
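A numerical sketch of the uninformed-transmitter bound of Equation (8.11) (illustrative channel and sizes; noise-normalized R = I; NumPy assumed). The determinant form agrees with the equivalent sum over squared singular values of the whitened channel:

```python
import numpy as np

rng = np.random.default_rng(4)
nt, nr, Po = 2, 3, 10.0          # antennas and total noise-normalized power

cgauss = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
H = cgauss(nr, nt)
R = np.eye(nr)                   # thermal noise only, unit normalization

# c = log2 |I + (Po/nt) R^-1 H H†|, Equation (8.11)
M = np.eye(nr) + (Po / nt) * np.linalg.inv(R) @ H @ H.conj().T
c_ut = np.log2(np.linalg.det(M).real)

# equivalent form: sum over squared singular values of the whitened channel
sv = np.linalg.svd(H, compute_uv=False)
c_sum = np.sum(np.log2(1.0 + (Po / nt) * sv**2))
```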

8.3.1 Channel-state information at the transmitter


In information theoretic discussions about MIMO links, a variety of models are
used with regard to the knowledge of the channel. There is some confusion with
regards to what is meant by the channel. In the most general sense channel
knowledge would include the complex attenuation from any transmit antenna to
any receive antenna as a function of frequency and time. It would also include
any noise or interference introduced in the channel. In discussions of dirty-paper
coding, introduced in Section 5.3.4, it is the noise or the interference that is
referenced when the concept of channel-state information is considered. In this
chapter, and in most practical wireless communications, the focus of channel
knowledge is the complex attenuation between transmit and receive antennas
that is represented by the channel matrix for MIMO links. Channel-state infor-
mation may also include knowledge of the statistical properties of the interfer-
ence and noise, typically represented by the interference-plus-noise covariance
matrix. It is typically assumed that the receiver can estimate the channel. This
estimation can be done by employing joint data and channel estimation, or, more
typically, by including a known training or pilot sequence with which the channel
can be estimated as part of the transmission.
At the transmitter, access to knowledge of the channel state is problematic. In
rare circumstances, the transmitter can exploit knowledge of the geometry and
an exact model for the environment (such as of line-of-sight channels), but this
approach is rarely valid for terrestrial communications. If there is not a means
for a transmitter to obtain channel-state information, then the transmitter is
said to be uninformed. If there is a communication link from the receiver, then
channel estimates can be sent back from the receiver to the transmitter, and
the link has an informed transmitter. Approaches to efficiently encode channel
estimates have been considered [195] and are discussed in Section 8.12.2. If the
link is bidirectional, on the same frequency and using the same antennas, then
reciprocity can be invoked so that the channel can be estimated while in the
receive mode and then exploited during the transmit mode. As discussed in
Section 8.12.1, there are some technical issues in using the reciprocity approach.
When using either channel-estimation feedback or reciprocity, the time-varying
channels can limit the applicability of these techniques [30]. If the channel is
very stable, which may be true for some static environments, then providing
the transmitter with channel-state information may be viable. If the channel is
dynamic, as in the case of channels with moving transmitters and receivers, the
channel may change significantly before the transmitter can use the channel-state
information. In this case, it is said that the channel-state information is stale.
In reaction to potentially stale channel-state information, one approach is to
provide the transmitter access to statistical characteristics of the channel. As an
example, if the typical distribution of the singular values of the channel matrix
can be estimated, then space-time codes can be modified to take advantage
of these distributions. Explicitly, if channels can be characterized typically by
high-rank channel matrices, then codes with higher rates may be suggested.
Conversely, if the channels can be characterized typically by low-rank channel
matrices, the codes with high spatial redundancy may be suggested. Trading rate
for diversity is discussed in Chapter 11.
In addition, there are different levels of knowledge of interference for the trans-
mitter. If the interference signals are known exactly at the transmitters, then
dirty-paper coding techniques can be employed [67], although practical implementation of dirty-paper coding remains an open area of research. Also, the interference may be known in some statistical sense. The most common example
would be for an interference-plus-noise spatial covariance to be estimated at the
receiver and passed back to the transmitter. As will be shown in this chapter,
for unknown Gaussian interference, the optimal channel-state information is the
spatially whitened channel matrix. A whitened channel matrix is a channel ma-
trix premultiplied by the inverse of the square root of the noise-plus-interference
covariance matrix.
Consequently, there are a variety of levels of knowledge of the channel state
at the transmitter. Some common levels of channel-state information are listed
here.
Channel matrix       Interference
unknown              unknown
known                unknown
transmitter power    unknown
known                known signal
known                known statistically
known statistically  known statistically
This list is provided partly as a warning. It is not uncommon in the literature to make some assumptions about the channel-state information at the transmitter without providing a clear description of these assumptions.

8.3.2 Informed-transmitter (IT) capacity


For narrowband MIMO systems, the coupling between the transmitter and re-
ceiver for each sample in time can be modeled by using Equation (8.2). In this
section, it is assumed that the transmitter is informed with knowledge of the
channel matrix. The transmitter also has knowledge of the spatial interference-
plus-noise covariance matrix, so that the interference is known in a statistical
sense assuming Gaussian interference. For notational convenience, it is also as-
sumed that power is scaled so that the variance of the baseband complex thermal
noise (the noise in the absence of external interference) associated with each re-
ceive antenna is 1.
By using the definitions in Equation (8.2), the capacity is given by the maxi-
mum of the mutual information [68] as is discussed in Section 5.3.2,
I(z, s) = ⟨ log₂ [ p(z|s)/p(z) ] ⟩
        = h(z) − h(z|s) ,   (8.12)
and is maximized over the source conditional probability density p(s|P) subject
to various transmit constraints on the transmit spatial covariance matrix P ∈
Cn t ×n t . The differential entropies for the received signal and for the
received signal conditioned by the transmitted signal are given by h(z) and
h(z|s), respectively. For the sake of notational convenience, the explicit parame-
terization of time z(t) ⇒ z is suppressed. Here the maximum mutual information
provides an outer bound on the spectral efficiency. As discussed in Section 5.3
and discussed for MIMO systems in [308], the mutual information is maximized
by employing a Gaussian distribution for s. The worst-case noise plus interfer-
ence is given by a Gaussian distribution for n. The probability distribution for
the received signal given the transmitted signal p(z|s) is given by
p(z|s) = [1/(|R| π^nr )] e−(z−H s)† R−1 (z−H s) .   (8.13)

The probability distribution for the received signal without knowledge of what
is being transmitted p(z) is typically modeled by
p(z) = [1/(|Q| π^nr )] e−z† Q−1 z ,   (8.14)
where the combined spatial covariance matrix Q ∈ Cn r ×n r is given by

Q = R + H P H† . (8.15)

The differential entropy for the received signal given knowledge of what is
transmitted h(z|s) is just the entropy of the Gaussian noise plus the interference
h(n) because n = z − H s and is given by

h(z|s) = h(n)
= log2 (π n r en r |R|) . (8.16)

Similarly, the entropy for the received signal is given by

h(z) = log2 (π n r en r |R + H P H† |) . (8.17)

Consequently, the mutual information in units of bits per second per hertz is given by

I(z, s) = h(z) − h(z|s)
        = log₂ ([π e]^nr |R + H P H† |) − log₂ ([π e]^nr |R|)
        = log₂ |I + R−1 H P H† |
        = log₂ |I + R−1/2 H P H† R−1/2 | .   (8.18)
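The equality of the two determinant forms in Equation (8.18) can be checked numerically (arbitrary sizes and random Hermitian positive-definite matrices; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
nt, nr = 2, 3
cgauss = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)

H = cgauss(nr, nt)
A = cgauss(nr, nr); R = A @ A.conj().T + np.eye(nr)    # Hermitian positive definite
B = cgauss(nt, nt); P = B @ B.conj().T                 # valid transmit covariance

# R^(-1/2) from the eigendecomposition of R
w, U = np.linalg.eigh(R)
R_isqrt = U @ np.diag(w**-0.5) @ U.conj().T

d1 = np.linalg.det(np.eye(nr) + np.linalg.inv(R) @ H @ P @ H.conj().T)
d2 = np.linalg.det(np.eye(nr) + R_isqrt @ H @ P @ H.conj().T @ R_isqrt)
```

Both determinants equal |R + H P H†|/|R|, which is why the whitened form can be used interchangeably.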

There are a variety of possible constraints on P, depending on the assumed transmitter limitations. As an example, one might imagine imposing a peak power constraint upon each transmit antenna. In the following discussion, it is assumed that the fundamental limitation is the total power transmitted. The optimization of the n_t × n_t noise-normalized transmit covariance matrix, P = ⟨s s†⟩, is constrained by the total thermal-noise-normalized transmit power, P_o. This optimization falls under the category of nonlinear programming. A unique solution exists if the Karush–Kuhn–Tucker (KKT) conditions are satisfied, as discussed in Section 2.12.2. If different transmit powers are allowed
at each antenna, the total power constraint can be enforced by using the form tr{P} ≤ P_o. The channel capacity is achieved if the channel is known by both the transmitter and receiver, giving

$$ c_{IT} = \sup_{P:\, \operatorname{tr}(P) = P_o} \log_2 \left| I_{n_r} + R^{-1/2} H P H^\dagger R^{-1/2} \right| \,. \qquad (8.19) $$

To avoid radiating negative power, the additional constraint P ≥ 0 (that is, all the eigenvalues of P are greater than or equal to 0) requires that only a subset of transmit covariance eigenvalues will be used. Much of the literature invokes
the water-filling solution for capacity [314, 68] by employing the standard KKT
solution at this point, where the mth eigenvalue for the optimized transmit
covariance matrix P is given by

$$ \lambda_m\{P\} = \left( \nu - \frac{1}{\lambda_m \left\{ R^{-1/2} H H^\dagger R^{-1/2} \right\}} \right)^{\!+} \,. \qquad (8.20) $$

The notation (a)^+ = max(0, a) indicates that the value of the argument is limited to non-negative values. The parameter ν is varied so that the following condition is satisfied:

$$ \sum_m \left( \nu - \frac{1}{\lambda_m \left\{ R^{-1/2} H H^\dagger R^{-1/2} \right\}} \right)^{\!+} = P_o \,. \qquad (8.21) $$

The informed-transmitter capacity is then given by

$$ c_{IT} = \sum_m \log_2 \left( 1 + \lambda_m \left\{ R^{-1/2} H H^\dagger R^{-1/2} \right\} \lambda_m\{P\} \right) \,. \qquad (8.22) $$

We redevelop the same capacity, explicitly providing a useful form. The whitened channel R^{-1/2} H can be represented by the singular-value decomposition

$$ R^{-1/2} H = (U\ \ \bar{U}) \begin{pmatrix} D & 0 \\ 0 & \bar{D} \end{pmatrix} (W\ \ \bar{W})^\dagger \,, \qquad (8.23) $$

where the nonzero singular values and corresponding singular vectors are partitioned into two sets. A subset of n_+ singular values of the whitened channel matrix is contained in the diagonal matrix D ∈ R^{n_+ × n_+}, and the remaining min(n_r, n_t) − n_+ are contained in the diagonal matrix D̄.
In the following discussion, we develop the criteria for finding the subset of whitened channel matrix singular values. The corresponding left and right singular vectors are contained in U, Ū, W, and W̄. The columns of U ∈ C^{n_r × n_+} are orthonormal, and the columns of W ∈ C^{n_t × n_+} are orthonormal. For some subset (contained in D) of whitened channel singular values, the subspace of the nonzero eigenvectors of P is constrained to be orthogonal to the columns of W̄, so that the term R^{-1/2} H P H† R^{-1/2} simplifies to

$$ \begin{aligned} R^{-1/2} H P H^\dagger R^{-1/2} &= R^{-1/2} H P\, (W\ \ \bar{W}) \begin{pmatrix} D & 0 \\ 0 & \bar{D} \end{pmatrix}^{\!\dagger} (U\ \ \bar{U})^\dagger \\ &= R^{-1/2} H P\, W D^\dagger U^\dagger = U D W^\dagger P\, W D^\dagger U^\dagger \,. \qquad (8.24) \end{aligned} $$

Given this decomposition, the transmit covariance P is optimized for possible subsets (contained in D) of whitened channel singular values. The best solution that satisfies the constraint of positive transmit power, P > 0 (indicating that all the eigenvalues of the transmit covariance matrix are positive), is selected. However, as one would expect, it is the singular values with larger magnitudes (modes with better propagation) that are more helpful.

For a given test evaluation with n_+ channel modes, the spectral-efficiency optimization can be written by using the noise-free receive covariance C ∈ C^{n_+ × n_+} that satisfies the relationship

$$ C = D W^\dagger P\, W D^\dagger \,. \qquad (8.25) $$

The total transmit power is given by

$$ P_o \ge \operatorname{tr}\{P\} = \operatorname{tr}\{W^\dagger P\, W\} = \operatorname{tr}\{D^{-1} C\, (D^\dagger)^{-1}\} = \operatorname{tr}\{(D^\dagger D)^{-1} C\} \qquad (8.26) $$

because all of the power in P is contained in the subspace defined by the orthonormal matrix W; replacing the transmit covariance matrix with the quadratic form W† P W does not change the total power, tr{P} = tr{W† P W}. Because D is a real, symmetric, diagonal matrix, the transpose contribution of the Hermitian conjugate has no effect; thus, D† D = D* D. The capacity (optimized spectral
efficiency) is given by

$$ c_{IT} = \sup_{P:\, \operatorname{tr}(P) = P_o} \log_2 \left| I_{n_r} + U D W^\dagger P\, W D^\dagger U^\dagger \right| = \sup_{C:\, \operatorname{tr}(C (D^\dagger D)^{-1}) = P_o} \log_2 \left| I_{n_+} + C \right| \,. \qquad (8.27) $$

By employing a Lagrangian multiplier η′ to enforce the constraint, the optimal noise-free receive covariance matrix can be found by evaluating the derivative with respect to some arbitrary parameter α of the noise-free receive covariance matrix C = C(α), so that

$$ 0 = \frac{\partial}{\partial \alpha} \left( \log_2 |I + C| - \eta' \operatorname{tr}\{C (D^\dagger D)^{-1}\} \right) = \log_2(e)\, \operatorname{tr}\!\left\{ (I + C)^{-1} \frac{\partial C}{\partial \alpha} \right\} - \eta' \operatorname{tr}\!\left\{ (D^\dagger D)^{-1} \frac{\partial C}{\partial \alpha} \right\} \,. \qquad (8.28) $$
To simplify the expression, the notation η = η′/log2(e) is used. This relationship is satisfied if C is given by the diagonal matrix

$$ C = \frac{1}{\eta} D^\dagger D - I_{n_+} \,. \qquad (8.29) $$
The value for the Lagrangian multiplier η is found by imposing the total power constraint,

$$ P_o = \operatorname{tr}\{P\} = \operatorname{tr}\{C (D^\dagger D)^{-1}\} = \operatorname{tr}\!\left\{ \frac{I}{\eta} - (D^\dagger D)^{-1} \right\} \;\Rightarrow\; \frac{1}{\eta} = \frac{P_o + \operatorname{tr}\{(D^\dagger D)^{-1}\}}{n_+} \,. \qquad (8.30) $$
Consequently, the noise-free receive covariance matrix C is given by

$$ C = \frac{P_o + \operatorname{tr}\{(D^\dagger D)^{-1}\}}{n_+}\, D^\dagger D - I_{n_+} \,. \qquad (8.31) $$
The non-negative power constraint is satisfied if the eigenvalues of the transmit covariance matrix are non-negative,

$$ P \ge 0 \,, \qquad W^\dagger P\, W = C\, (D^\dagger D)^{-1} \ge 0 \,. \qquad (8.32) $$

The capacity is maximized by employing the largest subset of channel singular values such that the above constraint is satisfied using Equation (8.31). The constraint can be rewritten such that the values of the diagonal matrix D†D must satisfy

$$ \{D^\dagger D\}_{m,m} \ge \frac{n_+}{P_o + \operatorname{tr}\{(D^\dagger D)^{-1}\}} \,. \qquad (8.33) $$
Assuming that the selected singular values of the channel matrix contained in D are sorted by their magnitude, if Equation (8.33) is not satisfied for some {D†D}_{m,m}, it will not be satisfied for any smaller {D†D}_{m,m}. As a reminder, the diagonal entries in the diagonal matrix D†D ∈ R^{n_+ × n_+} contain the n_+ largest eigenvalues of the whitened channel matrix,

$$ \lambda_m \left\{ R^{-1/2} H H^\dagger R^{-1/2} \right\} \,, \qquad (8.34) $$

where λ_m{·} indicates the mth eigenvalue.


The resulting capacity, found by substituting Equation (8.31) into Equation (8.27), is given by

$$ c_{IT} = \log_2 \left| \frac{P_o + \operatorname{tr}\{(D^\dagger D)^{-1}\}}{n_+}\, D^\dagger D \right| \,. \qquad (8.35) $$
In this discussion, it is assumed that the environment is stationary over a


period long enough for the error associated with channel estimation to vanish
asymptotically. In order to study the typical performance of quasistationary
channels sampled from a given probability distribution, capacity is averaged
over an ensemble of quasistationary environments. Under the ergodic assumption (that is, the ensemble average is equal to the time average), the mean capacity ⟨c_IT⟩ is the channel capacity.
It is worth noting that this informed transmitter capacity is based upon an
average total transmit power, tr{P} ≤ Po . For some practical systems, this may
not be the best constraint. If the transmitter is operating near its peak power
output, then the power limit may be imposed on a per transmit element basis
so that

$$ \{P\}_{m,m} \le \frac{P_o}{n_t} \,, \qquad (8.36) $$
which is a different optimization from the one discussed in this section and is beyond the scope of this text. In addition, there are other, typically suboptimal, approaches, sometimes denoted precoding, that may be logistically desirable [330, 268, 274, 162, 197]. Precoding techniques can be extended to consider the interaction between channel dynamics and the accuracy of channel-state feedback [325].

8.3.3 Uninformed-transmitter (UT) capacity


If the channel and interference are stochastic and not known at the transmitter,
then the optimal transmission strategy for an isolated link is to transmit equal
power from each antenna, P = (P_o/n_t) I_{n_t} [308]. This optimization becomes more
complicated in the context of a network, as discussed in Chapters 12, 13, and 14.
Assuming that the receiver can accurately estimate the channel, but the trans-
mitter does not attempt to optimize its output to compensate for the channel,
the maximum spectral efficiency under the assumption of a diagonal transmit
covariance matrix is given by

$$ c_{UT} = \log_2 \left| I_{n_r} + \frac{P_o}{n_t} R^{-1} H H^\dagger \right| \,. \qquad (8.37) $$
This is a common transmit constraint as it may be difficult to provide the trans-
mitter channel estimates. In the following discussion, it is shown that this trans-
mitter strategy is optimal in an average sense.
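Equation (8.37) reduces to a single determinant evaluation. The sketch below is our own construction (the function and variable names are assumptions, not from the text):

```python
# Sketch of the uninformed-transmitter spectral efficiency of Equation
# (8.37); function and variable names are ours, not from the text.
import numpy as np

def capacity_ut(H, R, Po):
    """c_UT = log2|I + (Po/nt) R^{-1} H H'| in b/s/Hz."""
    nr, nt = H.shape
    M = np.eye(nr) + (Po / nt) * np.linalg.solve(R, H @ H.conj().T)
    return np.linalg.slogdet(M)[1] / np.log(2.0)

# For an identity 2x2 channel with white noise and Po = 2, each antenna
# carries unit SNR, so c_UT = 2 * log2(1 + 1) = 2 b/s/Hz.
c_ut = capacity_ut(np.eye(2), np.eye(2), Po=2.0)
```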
This strategy can be demonstrated under the assumption of a random Gaus-
sian (sometimes denoted Rayleigh) channel with independent elements. Forms
for the capacity under other channel-fading models, such as Rician [112], are also
possible to develop. Under this assumption, any unitary transformation of the whitened channel matrix is just as likely as any other, such that the probability density p(R^{-1/2} H) satisfies

$$ p(R^{-1/2} H) = p(R^{-1/2} H\, U) \qquad (8.38) $$
for any unitary matrix U. The goal is to optimize the transmit covariance matrix
P. Because any Hermitian matrix can be constructed by U P U† , starting with
a diagonal matrix P, there is no reason to consider any transmit covariance
matrices with off-diagonal elements. Another way to view this is that under the
random channel matrix assumption, the transmitter cannot have knowledge of
any preferred direction. If the whitened channel matrix can be represented by the singular-value decomposition R^{-1/2} H = Ũ S̃ Ṽ†, then the ergodic (average over time) capacity is given by

$$ \langle c_{UT} \rangle = \left\langle \log_2 \left| I + \tilde{S} \tilde{S}^\dagger\, \tilde{V}^\dagger P\, \tilde{V} \right| \right\rangle = \left\langle \sum_m \log_2 \left( 1 + \lambda_m \{ \tilde{S} \tilde{S}^\dagger\, \tilde{V}^\dagger P\, \tilde{V} \} \right) \right\rangle \,, \qquad (8.39) $$

where Ṽ is a random unitary matrix, and S̃ is a random diagonal matrix. Because the logarithm is compressive, the largest average sum will occur if the diagonal contributions in the power covariance matrix are equal. This is a consequence of the fact that, for some real values a_m, the sum of the logarithms of 1 + a_m, explicitly

$$ \sum_m \log(1 + a_m) \,, \qquad (8.40) $$

under the constraint that \sum_m a_m is constant, is maximized when the various elements are equal, a_m = a_n. For a given total power, the maximum average
capacity occurs when the variance in eigenvalues is minimized. This minimum
variance occurs when the diagonal entries in the transmit covariance matrix
P are equal. Consequently, under the assumption of an uninformed transmitter, when Equation (8.38) is satisfied, the optimal transmit covariance matrix is given by

$$ P = \frac{P_o}{n_t}\, I \,. \qquad (8.41) $$

High SNR limit


In the limit of the absence of external interference, R → I, the ratio of the capacity of the MIMO link, c_UT, to the capacity of the single-input single-output (SISO) link, c_SISO, in the limit of high SNR is given by the number of antennas used by the MIMO link. The SISO link has a channel attenuation a. By using the notation λ_m{·} to indicate the mth eigenvalue of the argument (sorted so that the largest eigenvalue is given by m = 1), the ratio of capacities is
given by

$$ \begin{aligned} \frac{c_{UT}}{c_{SISO}} &= \frac{\log_2 \left| I_{n_r} + \frac{P_o}{n_t} H H^\dagger \right|}{\log_2 (1 + a^2 P_o)} = \frac{\sum_{m=1}^{n_r} \log_2 \lambda_m \left\{ I_{n_r} + \frac{P_o}{n_t} H H^\dagger \right\}}{\log_2 (1 + a^2 P_o)} \\ &= \frac{\sum_{m=1}^{\min(n_r, n_t)} \log_2 \left( 1 + \frac{P_o}{n_t} \lambda_m \{ H H^\dagger \} \right)}{\log_2 (1 + a^2 P_o)} \approx \frac{\sum_{m=1}^{\min(n_r, n_t)} \left[ \log_2 (P_o) + \log_2 \lambda_m \left\{ \frac{1}{n_t} H H^\dagger \right\} \right]}{\log_2 (P_o) + \log_2 (a^2)} \\ &\to \frac{\sum_{m=1}^{\min(n_r, n_t)} \log_2 (P_o)}{\log_2 (P_o)} = \min(n_r, n_t) \qquad (8.42) \end{aligned} $$
in the limit of large transmit power. The convergence to this asymptotic result
is very slow. Consequently, this often-quoted result is mildly misleading because
practical systems typically work in SNR regimes for which this limit is not valid.
Furthermore, the advantages of MIMO are often in the statistical diversity it
provides, which improves the robustness of the link. Nonetheless, the above result
and the following sections can be used to provide some insight into potential
performance improvements or limits when used properly.
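The slow convergence is easy to exhibit numerically. The example below is our own, with an assumed eigenvalue spread; it evaluates the ratio of Equation (8.42) at 0 dB and 60 dB.

```python
# Deterministic illustration (our example) of Equation (8.42): the
# ratio c_UT/c_SISO approaches min(nr, nt) = 4, but slowly when the
# channel eigenvalues are spread.
import numpy as np

lam = np.array([4.0, 1.0, 0.25, 0.0625])   # assumed eigenvalues of H H'
nt = 4                                     # nr = nt = 4
a2 = lam.sum() / 16.0                      # mean SISO attenuation a^2

def ratio(Po):
    c_mimo = np.sum(np.log2(1.0 + Po * lam / nt))
    c_siso = np.log2(1.0 + a2 * Po)
    return c_mimo / c_siso

r_low, r_high = ratio(1.0), ratio(1e6)     # 0 dB and 60 dB
# Even at 60 dB the ratio (about 3.7) is still short of min(nr, nt) = 4.
```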

High SNR and higher INR limit


Here we develop the ratio of the capacity with external interference, c_UT,I, to the capacity without it, c_UT,NI, for n_i infinite-power interference sources in the limit of high SNR. Implicit in the following discussion is the notion that the interference power
SNR. Implicit in the following discussion is the notion that the interference power
is growing faster than the signal power of the intended signal. Furthermore, the
assumption is employed that the interference can be modeled by asymptotically
high-power Gaussian signals that are spatially correlated such that they are
completely contained within a subspace of the interference-plus-noise covariance
matrix. This assumption is important because in practice nonideal limitations
of receivers typically cause the rank of the interference to grow as the power in-
creases. This rank increase will overwhelm the degrees of freedom of the receiver.
Here we will assume an ideal receiver and perfect parameter (that is, channel
and interference-plus-noise covariance matrix) estimation.
It is assumed here that the receive interference-plus-noise spatial covariance
matrix has the form
R = I + J J† , (8.43)
where power is normalized so that thermal noise contributes the identity matrix
I to the covariance matrix and the receive spatial covariance matrix of the ni
interferers is given by J J†. Here the columns of the interference channel matrix J ∈ C^{n_r × n_i} contain the array responses times the receive power for each
interferer. As the interference power increases, the inverse of the spatial covari-
ance matrix approaches a projection matrix P⊥ J that is orthogonal to the space
spanned by J,

$$ R^{-1} = \left( I + J J^\dagger \right)^{-1} = I - J \left( I + J^\dagger J \right)^{-1} J^\dagger \to I - J \left( J^\dagger J \right)^{-1} J^\dagger = P_J^\perp \,, \qquad (8.44) $$

by employing Equation (2.113) in the limit of large interference power compared


to the noise. If the number of interfering antennas is equal to or larger than the number of receive antennas, n_i ≥ n_r, then the capacity is zero, because the projection matrix is orthogonal to the entire space, except for a set of cases of zero measure in which the spatial responses of the interferers are contained completely in a subspace that is smaller than the number of interferers. The ratio of the high-interference to no-interference capacities in the high-interference limit
is given by

$$ \begin{aligned} \frac{c_{UT,I}}{c_{UT,NI}} &= \frac{\log_2 \left| I_{n_r} + \frac{P_o}{n_t} P_J^\perp H H^\dagger \right|}{\log_2 \left| I_{n_r} + \frac{P_o}{n_t} H H^\dagger \right|} = \frac{\log_2 \left| I_{n_r} + \frac{P_o}{n_t} P_J^\perp H H^\dagger P_J^\perp \right|}{\log_2 \left| I_{n_r} + \frac{P_o}{n_t} H H^\dagger \right|} \\ &= \frac{\sum_{m=1}^{\min(n_r - n_i,\, n_t)} \log_2 \left( 1 + \frac{P_o}{n_t} \lambda_m \left\{ P_J^\perp H H^\dagger P_J^\perp \right\} \right)}{\sum_{m=1}^{\min(n_r, n_t)} \log_2 \left( 1 + \frac{P_o}{n_t} \lambda_m \{ H H^\dagger \} \right)} \,. \qquad (8.45) \end{aligned} $$

In the limit of high SNR, the capacity ratio is given by

$$ \begin{aligned} \frac{c_{UT,I}}{c_{UT,NI}} &\to \frac{\sum_{m=1}^{\min(n_r - n_i,\, n_t)} \left[ \log_2 \left( \frac{P_o}{n_t} \right) + \log_2 \lambda_m \left\{ P_J^\perp H H^\dagger P_J^\perp \right\} \right]}{\sum_{m=1}^{\min(n_r, n_t)} \left[ \log_2 \left( \frac{P_o}{n_t} \right) + \log_2 \left( \lambda_m \{ H H^\dagger \} \right) \right]} \\ &\to \frac{\sum_{m=1}^{\min(n_r - n_i,\, n_t)} \log_2 \left( \frac{P_o}{n_t} \right)}{\sum_{m=1}^{\min(n_r, n_t)} \log_2 \left( \frac{P_o}{n_t} \right)} \to \frac{\min(n_r - n_i,\, n_t)}{\min(n_r, n_t)} \,, \qquad (8.46) \end{aligned} $$
where we employ the observations that the summation only needs to occur over
arguments of the logarithm that are not unity, and that the eigenvalues of the
finite channel components are small compared to the large power term. The con-
vergence to the final result is relatively slow. In general, the theoretical capacity
is not significantly affected as long as the number of antennas is much larger
than the number of interferers.
A practical issue with this analysis is that at very high INR, the model that
J can be completely contained within a subspace fails because of more subtle
physical effects. As examples, the effects of dispersion (that is, resolvable delay spread) across the array or limited receiver linearity can cause the rank of the interference
covariance to increase. However, for many practical INRs, the analysis is a useful
approximation.
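The subspace limit of Equation (8.44) can be checked directly. The snippet below is our own construction, with an arbitrary random interference subspace.

```python
# Numerical check (our construction) of the limit in Equation (8.44):
# R^{-1} = (I + J J')^{-1} approaches the orthogonal projection
# matrix P_J_perp as the interference power grows.
import numpy as np

rng = np.random.default_rng(1)
nr, ni = 4, 2
V = rng.standard_normal((nr, ni)) + 1j * rng.standard_normal((nr, ni))
P_perp = np.eye(nr) - V @ np.linalg.inv(V.conj().T @ V) @ V.conj().T

def r_inv(power):
    J = np.sqrt(power) * V            # interference columns scaled in power
    return np.linalg.inv(np.eye(nr) + J @ J.conj().T)

err_hi = np.linalg.norm(r_inv(1e8) - P_perp)   # nearly a projection
err_lo = np.linalg.norm(r_inv(1.0) - P_perp)   # far from it at low power
```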

8.3.4 Capacity ratio, c_IT/c_UT


In general, the informed transmitter has higher capacity than the uninformed
transmitter. However, there is also increased overhead. It is reasonable for a
system designer to ask the question, is the increase in performance worth the
overhead? In general, there are a number of subtleties related to system limita-
tions and specifics of the assumed phenomenology which will often drive expected
performance. However, here a few limiting cases are considered that will aid in
developing system design intuition.

High SNR limit


At high SNR, cI T and cU T converge. At high SNR, for finite interference, all
the available modes are employed; that is, n+ is equal to the minimum of the
number of transmit and receive antennas, n+ = min(nr , nt ). If the number of
receive antennas is larger than or equal to the number of transmit antennas
nr ≥ nt , then this convergence can be observed in the large Po limit of the ratio
of Equations (8.35) and (8.37),

$$ \begin{aligned} \frac{c_{IT}}{c_{UT}} &= \frac{\log_2 \left| \frac{P_o + \operatorname{tr}\{(H^\dagger R^{-1} H)^{-1}\}}{n_t}\, H^\dagger R^{-1} H \right|}{\log_2 \left| I_{n_t} + \frac{P_o}{n_t} H^\dagger R^{-1} H \right|} \\ &= \frac{n_t \log_2 \left( \frac{P_o + \operatorname{tr}\{(H^\dagger R^{-1} H)^{-1}\}}{n_t} \right) + \log_2 \left| H^\dagger R^{-1} H \right|}{\log_2 \left| I_{n_t} + \frac{P_o}{n_t} H^\dagger R^{-1} H \right|} \,. \qquad (8.47) \end{aligned} $$

In the limit of large SNR, P_o ≫ tr{(H† R^{-1} H)^{-1}}, the difference between the various channel eigenvalues becomes unimportant, and the capacity ratio is given by

$$ \begin{aligned} \frac{c_{IT}}{c_{UT}} &\to \frac{n_t \log_2 \left( \frac{P_o}{n_t} \right) + \log_2 \left| H^\dagger R^{-1} H \right|}{\log_2 \left| \frac{P_o}{n_t} H^\dagger R^{-1} H \right|} \\ &= \frac{n_t \log_2 \left( \frac{P_o}{n_t} \right) + \log_2 \left| H^\dagger R^{-1} H \right|}{n_t \log_2 \left( \frac{P_o}{n_t} \right) + \log_2 \left| H^\dagger R^{-1} H \right|} = 1 \,. \qquad (8.48) \end{aligned} $$

The convergence to one is relatively slow. If the number of transmit antennas


is greater than the number of receive antennas nt > nr (considered in Exercise
8.1), then the result is essentially the same.

Low SNR limit


At low SNR, the informed transmitter selects the dominant singular value (n+ =
1) of the whitened channel. Essentially, the system is selecting matched trans-
mit and receive beamformers that have the best attenuation path through the
whitened channel. The corresponding eigenvalue of the dominant mode is given
by

$$ d = \lambda_{\max} \left\{ R^{-1/2} H H^\dagger R^{-1/2} \right\} = D^\dagger D \,, \qquad (8.49) $$

where in this limit the matrix D collapses to the scalar dominant singular value because n_+ = 1. In this limit of low SNR, the ratio of the informed to the uninformed capacity, c_IT/c_UT, is given by
 
$$ \begin{aligned} \frac{c_{IT}}{c_{UT}} &= \frac{\log_2 \left( \frac{P_o + d^{-1}}{n_+}\, d \right)}{\log_2 \left| I_{n_r} + \frac{P_o}{n_t} R^{-1/2} H H^\dagger R^{-1/2} \right|} \\ &= \frac{\log_2 (1 + P_o\, d)}{\sum_m \log_2 \left( 1 + \lambda_m \left\{ \frac{P_o}{n_t} R^{-1/2} H H^\dagger R^{-1/2} \right\} \right)} \,. \qquad (8.50) \end{aligned} $$

In the low SNR limit, the eigenvalues are small, so the lowest-order term in the
logarithmic expansion about one is a good approximation; thus, the capacity
ratio is given by

$$ \begin{aligned} \frac{c_{IT}}{c_{UT}} &\to \frac{P_o\, d}{\sum_m \lambda_m \left\{ \frac{P_o}{n_t} R^{-1/2} H H^\dagger R^{-1/2} \right\}} = \frac{\lambda_{\max} \{ R^{-1/2} H H^\dagger R^{-1/2} \}}{\frac{1}{n_t} \sum_m \lambda_m \left\{ R^{-1/2} H H^\dagger R^{-1/2} \right\}} \\ &= \frac{\lambda_{\max} \{ H^\dagger R^{-1} H \}}{\frac{1}{n_t} \sum_m \lambda_m \{ H^\dagger R^{-1} H \}} \,, \qquad (8.51) \end{aligned} $$

by using Equation (8.35) with n+ = 1 and Equation (8.37). Given this low SNR
asymptotic result, a few observations can be made. The spectral-efficiency ratio is given by the ratio of the maximum to the average eigenvalue of the whitened channel matrix H† R^{-1} H. If the channel is rank one, such as in the case of a multiple-input single-output (MISO) system, the ratio is approximately equal to n_t. Finally, in the special, if physically unlikely, case in which R^{-1/2} H H† R^{-1/2} has a flat (that is, all equal) eigenvalue distribution, the optimal transmit covariance matrix is not unique. Nonetheless, the ratio c_IT/c_UT approaches one.
It is worth repeating here that, when embedded within a wireless network, the
optimization and potential performance benefits are not the same as an isolated
link discussed above.
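A quick numerical check of the low-SNR limit (8.51), with an assumed eigenvalue spread of our own choosing:

```python
# Low-SNR sketch (our example) of Equation (8.51): c_IT/c_UT approaches
# the maximum-to-average eigenvalue ratio of the whitened channel.
import numpy as np

lam = np.array([2.0, 0.5, 0.25, 0.05])  # eigenvalues of R^{-1/2} H H' R^{-1/2}
nt = len(lam)
Po = 1e-4                               # deep in the low-SNR regime

c_it = np.log2(1.0 + Po * lam.max())    # single dominant mode, n+ = 1
c_ut = np.sum(np.log2(1.0 + Po * lam / nt))
limit = lam.max() / lam.mean()          # predicted asymptote, about 2.86
```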

8.4 Frequency-selective channels

In environments in which there is frequency-selective fading, the channel ma-


trix H(f ) and the interference-plus-noise spatial covariance matrix R(f ) are
functions of frequency f . Receiver approaches for frequency-selective channels
are considered in more detail in Chapter 10; however, for completeness, it is
discussed briefly here in the context of MIMO capacity. By exploiting the or-
thogonality of frequency channels, the capacity in frequency-selective fading can
be calculated using an extension of Equations (8.35) and (8.37). As a reminder,
in each frequency bin, only 1/n_f of the total transmit power is employed. Similarly, in each frequency bin only 1/n_f of the noise power is received. Consequently, if the power is
evenly distributed among frequency bins, the noise-normalized transmit power
Po is the same, independent of bin size. For the uninformed transmitter, this
even distribution assumption leads to the frequency-selective spectral-efficiency
bound,
$$ \begin{aligned} c_{UT,FS} &= \frac{\int df\; c_{UT}(P_o; H(f), R(f))}{\int df} \approx \frac{\sum_{n=1}^{n_f} \Delta f\, \log_2 \left| I + \frac{P_o}{n_t} R^{-1}(f_n)\, H(f_n)\, H^\dagger(f_n) \right|}{\sum_{n=1}^{n_f} \Delta f} \\ &\approx \frac{1}{n_f} \log_2 \left| I + \frac{P_o}{n_t} \check{R}^{-1} \check{H} \check{H}^\dagger \right| \,, \qquad (8.52) \end{aligned} $$
where the distance between frequency samples is given by Δf, and the n_f-bin frequency-partitioned channel matrix Ȟ is given by

$$ \check{H} \equiv \begin{pmatrix} H(f_1) & 0 & \cdots & 0 \\ 0 & H(f_2) & & \vdots \\ \vdots & & \ddots & \\ 0 & \cdots & & H(f_{n_f}) \end{pmatrix} \,, \qquad (8.53) $$

and the frequency-partitioned interference-plus-noise spatial covariance matrix is given by

$$ \check{R} \equiv \begin{pmatrix} R(f_1) & 0 & \cdots & 0 \\ 0 & R(f_2) & & \vdots \\ \vdots & & \ddots & \\ 0 & \cdots & & R(f_{n_f}) \end{pmatrix} \,. \qquad (8.54) $$
In order to construct the discrete approximation, it is assumed that any variation
in channel or interference-plus-noise covariance matrix within a frequency bin is
insignificant.
For the informed-transmitter channel capacity, power is optimally distributed among both spatial modes and frequency channels. The capacity can be expressed as

$$ c_{IT,FS} \approx \max_{\check{P}} \frac{1}{n_f} \log_2 \left| I + \check{R}^{-1} \check{H} \check{P} \check{H}^\dagger \right| \,, \qquad (8.55) $$
which is maximized by Equation (8.35) with the appropriate substitutions for
the frequency-selective channel, and diagonal entries in D in Equation (8.33) are
selected from the eigenvalues of ȞȞ† . Because of the block diagonal structure of
Ȟ, the (nt · nf ) × (nt · nf ) space-frequency noise-normalized transmit covariance
matrix P̌ is a block diagonal matrix, normalized so that in each frequency bin
the average noise-normalized transmit power is Po , which can be expressed as
tr{P̌}/nf = Po . There are a number of potential issues related to the use of
discretely sampled channels. Some of these effects are discussed in greater detail
in Section 10.1.
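The equivalence between the per-bin average and the block-diagonal form of Equation (8.52) can be verified numerically; this sketch (our own construction, with assumed names) uses random per-bin channels.

```python
# Sketch (our construction) of Equation (8.52): with power split evenly
# across bins, the frequency-selective uninformed capacity equals the
# average of the per-bin flat-fading capacities.
import numpy as np

rng = np.random.default_rng(2)
nr = nt = 2
nf = 4
H_bins = [rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))
          for _ in range(nf)]
Po = 4.0

def c_ut(H):
    M = np.eye(nr) + (Po / nt) * (H @ H.conj().T)
    return np.linalg.slogdet(M)[1] / np.log(2.0)

c_fs = np.mean([c_ut(H) for H in H_bins])          # average over bins

# The same value from the block-diagonal stacked channel (H-check):
H_blk = np.zeros((nr * nf, nt * nf), dtype=complex)
for n, H in enumerate(H_bins):
    H_blk[n * nr:(n + 1) * nr, n * nt:(n + 1) * nt] = H
M = np.eye(nr * nf) + (Po / nt) * (H_blk @ H_blk.conj().T)
c_blk = np.linalg.slogdet(M)[1] / np.log(2.0) / nf
```

Because the determinant of a block-diagonal matrix factors into per-block determinants, the two computations agree exactly.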

8.5 2 × 2 Line-of-sight channel

While the 2 × 2 line-of-sight link is not a common terrestrial communications


problem, for instructive purposes, it is useful to consider it [33] because explicit
analytic expressions are tractable. Here it is assumed that all antennas are iden-
tical and transmit and receive isotropically. If one imagines an environment in
which the transmit and receive arrays exist in the absence of any obstructions,
then this is a line-of-sight environment. While this phrase is used commonly in
an informal way, there can be some confusion as to what is assumed. Here it
is also assumed that there are no significant scatterers, so that the knowledge
of the antenna geometry is sufficient to determine the channel. It is also typi-


cally assumed that each array is relatively small, so that each array is not able
to resolve the antennas of the opposing array. In the following discussion, the
assumption of small arrays is explicitly and parametrically broken.
If it is assumed that the arrays are small, so that the arrays cannot resolve
the antennas of the opposing array, then the channel can be characterized by
a rank-1 matrix. This matrix is proportional to the outer product of steering
vectors from each array pointing at the other,

H = avw† , (8.56)

where a is the overall complex attenuation, and v and w are the receive and
transmit array steering vectors, respectively.
To further study the line-of-sight model, we consider an example 2 × 2 channel
in the absence of external interference, and in which the transmit and receive
arrays grow. To visualize the example, one can imagine a receive array and a
transmit array each with two antennas so that the antennas are located at the
corners of a rectangle, as seen in Figure 8.2. The ratio of the larger to the smaller
channel matrix eigenvalues can be changed by varying the shape of the rectangle.
When the rectangle is very asymmetric (wide but short) with the arrays being
far from each other, the rank-1 channel matrix is recovered. The columns of the
channel matrix H can be viewed as the receive-array response vectors, one vector
for each transmit antenna,

$$ H = \sqrt{2}\, \begin{pmatrix} a_1 v_1 & a_2 v_2 \end{pmatrix} \,, \qquad (8.57) $$
where a_1 and a_2 are constants of proportionality (equal to the root-mean-squared transmit-to-receive attenuation for transmit antennas 1 and 2, respectively) that take into account geometric attenuation and antenna gain effects, and v_1 and v_2 are unit-norm array response vectors of the receive array pointing at transmit antenna 1 and transmit antenna 2, respectively. The √2 compensates for the use of the unit-norm array response vectors. For the purpose of this discussion, it is assumed that the overall attenuations are equal, a = a_1 = a_2, which is valid if the rectangle deformation does not significantly affect overall transmitter-to-receiver distances.
The capacity of the 2 × 2 MIMO system is a function of the channel singular
values and the total transmit power. The eigenvalues of the channel matrix inner-product form

$$ H^\dagger H = 2 a^2 \begin{pmatrix} 1 & (v_1^\dagger v_2)^* \\ v_1^\dagger v_2 & 1 \end{pmatrix} \qquad (8.58) $$
Figure 8.2 Simple line-of-sight 2 × 2 channel.

are given by

$$ \mu = 2 a^2\, \frac{(1 + 1) \pm \sqrt{(1 - 1)^2 + 4\, |v_1^\dagger v_2|^2}}{2} \,; \qquad \mu_1 = 2 a^2 \left( 1 + |v_1^\dagger v_2| \right) , \quad \mu_2 = 2 a^2 \left( 1 - |v_1^\dagger v_2| \right) , \qquad (8.59) $$

by using the results from Section 2.3.2.


From the above result, the normalized inner product of the array responses
pointed at each transmit antenna is the important parameter. To make contact
with physical space, it is useful to parameterize the “distance” (the inner prod-
uct) between array responses by the physical angle between them. To generalize
this angle, it is useful to express the angle in units of beamwidths. There are a
variety of ways in which beamwidths can be defined. A common definition is the
distance in angle between the points at which the beam is down from its peak
by 3 dB. Here a somewhat more formal definition is employed. The separation
between receive array responses is described in terms of the unitless parameter
generalized beamwidths b introduced in Reference [96], where the distance in
generalized beamwidths is defined by

$$ b = \frac{2}{\pi} \arccos \left( \left| v_1^\dagger v_2 \right| \right) \,. \qquad (8.60) $$

It is assumed here that ‖v_1‖ = ‖v_2‖ = 1. The beamwidth separation indicates


the angular difference normalized by the width of the beam. As it is defined, the
generalized beamwidth separation varies from 0 corresponding to the same array
response, to 1 corresponding to orthogonal array responses. For small angular
separations, this definition of beamwidths closely approximates many ad hoc
definitions for physical arrays. One of the useful applications for the generalized
beamwidth definition is for complicated scattering environments for which the
ad hoc or physical definitions might be difficult to interpret.
Figure 8.3 Eigenvalues of H H† (normalized by a², in dB) for a 2 × 2 line-of-sight channel as a function of array generalized beamwidth separation. © IEEE 2002. Reprinted, with permission, from Reference [33].

The eigenvalues μ1 and μ2 are displayed in Figure 8.3 as a function of general-


ized beamwidth separation. When the transmit and receive arrays are small,
indicated by a small separation in beamwidths, one eigenvalue is dominant.
As the array apertures become larger, indicated by a larger separation, one
array’s individual elements can be resolved by the other array. Consequently,
the smaller eigenvalue increases. Conversely, the larger eigenvalue decreases
slightly.
Equations (8.33) and (8.35) are employed to determine the capacity for the
2 × 2 system. The water-filling technique first must determine if both modes
in the channel are employed. Both modes are used if the following condition is
satisfied,

$$ \mu_2 > \frac{2}{P_o + \frac{1}{\mu_1} + \frac{1}{\mu_2}} \,, \qquad P_o > \frac{1}{\mu_2} - \frac{1}{\mu_1} = \frac{|v_1^\dagger v_2|}{a^2 \left( 1 - |v_1^\dagger v_2|^2 \right)} \,, \qquad (8.61) $$

assuming μ1 > μ2 , where Po is the total noise-normalized power. If the condition


is not satisfied, then only the stronger channel mode is employed and the capacity,
from Equation (8.35), is given by

$$ c_{IT} = \log_2 (1 + \mu_1 P_o) = \log_2 \left( 1 + 2 a^2 \left[ 1 + |v_1^\dagger v_2| \right] P_o \right) ; \qquad (8.62) $$
Figure 8.4 The informed-transmitter capacity of a 2 × 2 line-of-sight channel, assuming generalized beamwidth separations of 0.1 (solid) and 0.9 (dashed). © IEEE 2002. Reprinted, with permission, from Reference [33].

otherwise, both modes are used and the capacity is given by

$$ \begin{aligned} c_{IT} &= \log_2 \left| \begin{pmatrix} \frac{P_o + \mu_1^{-1} + \mu_2^{-1}}{2}\, \mu_1 & 0 \\ 0 & \frac{P_o + \mu_1^{-1} + \mu_2^{-1}}{2}\, \mu_2 \end{pmatrix} \right| = 2 \log_2 \left( \frac{\mu_1 \mu_2 P_o + \mu_1 + \mu_2}{2 \sqrt{\mu_1 \mu_2}} \right) \\ &= 2 \log_2 \left( a^2 P_o \left[ 1 - |v_1^\dagger v_2|^2 \right] + 1 \right) - \log_2 \left( 1 - |v_1^\dagger v_2|^2 \right) \,. \qquad (8.63) \end{aligned} $$
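Equations (8.59) through (8.63) can be packaged as a function of the inner product k = |v1† v2| and the mean SISO SNR a²P_o. This parameterization is our own sketch, not code from the text.

```python
# Sketch (our parameterization) of the 2x2 line-of-sight informed-
# transmitter capacity, Equations (8.59)-(8.63), as a function of
# k = |v1' v2| and the mean SISO SNR a^2 Po.
import numpy as np

def c_it_los(k, a2Po):
    mu1 = 2.0 * (1.0 + k)                     # larger eigenvalue / a^2
    if k < 1.0 and a2Po > k / (1.0 - k**2):   # condition (8.61): two modes
        return (2.0 * np.log2(a2Po * (1.0 - k**2) + 1.0)
                - np.log2(1.0 - k**2))        # Equation (8.63)
    return np.log2(1.0 + mu1 * a2Po)          # Equation (8.62), one mode

c_close = c_it_los(k=0.99, a2Po=0.1)    # small separation, low SNR
c_far = c_it_los(k=0.10, a2Po=100.0)    # large separation, high SNR
```

At low SNR the nearly collinear responses (small beamwidth separation) win through coherent gain; at high SNR the well-separated responses win by supporting two strong modes, mirroring Figure 8.4.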

The resulting capacity as a function of a2 Po (mean SISO SNR) for two


beamwidth separations, 0.1 and 0.9, is displayed in Figure 8.4. At low a2 Po ,
the capacity associated with small beamwidth separation performs best. In this
regime, capacity is linear with receive power, and small beamwidth separation
increases the coherent gain. At high a2 Po , large beamwidth separation produces
a higher capacity as the optimal MIMO system distributes the energy between
modes.
The above discussion is useful to help develop some intuition, although it is unreasonable to expect communication systems to have access to widely separated antennas in most situations. However, most terrestrial communications are characterized
by complicated multipath scattering. In complicated multipath environments,
small arrays employ scatterers to create virtual arrays of a much larger effective
aperture. The effect of the scatterers upon capacity depends on their number
and distribution in the environment. The individual antenna elements can be
resolved by the larger effective aperture produced by the scatterers. The larger
effective aperture increases the distance between transmit antennas in terms
of generalized beamwidths. As was demonstrated in Figure 8.3, the ability to
resolve antenna elements is related to the number of large singular values of the
channel matrix and thus the capacity.

8.6 Stochastic channel models

In rich scattering environments, the propagation from each transmitter to each


receiver appears to be uncorrelated for arrays that are not oversampled spa-
tially. One can think of the channel matrix as a stochastic variable being drawn
from some distribution. For many applications, it is reasonable to assume that
the channel is approximately static for some period of time, changing relatively
slowly. Furthermore, it is often reasonable to assume that the distribution is
stationary over even longer intervals of time.
The description of the MIMO channel can be relatively complicated. Each
environment may have its own channel probability distribution. Because there
is no one correct answer when one is considering channel phenomenology, it is
useful to have a variety of approaches from which one can select [110, 301, 30],
depending upon the characteristics of the particular environment of interest. In-
evitably, a study of the channel phenomenology for each particular environment
must be performed [30]; however, there is value to studies of system performance
based on simple parametric models. When characterizing system performance,
it is useful to simulate a distribution of channels. As an example, some modulations or space-time codes may have an advantage relative to other codes, depending upon the channel distributions. This is easy to see when comparing
modulation approaches in channels that do or do not have frequency selectivity.
Assuming simple receivers, some modulations are sensitive to frequency-selective
fading. Similarly, there is a rate versus diversity trade-off [361] when considering
space-time coding approaches. Practical optimization of this coding trade-off is
sensitive to the channel correlation.
Here the flat-fading (not frequency-selective) channel is considered. From Equation (8.11), it can be observed that capacity is a function of the eigenvalues of H H†,

$$ c_{UT} = \log_2 \left| I + \frac{P_o}{n_t} H H^\dagger \right| = \sum_m \log_2 \left( 1 + \frac{P_o}{n_t} \lambda_m \{ H H^\dagger \} \right) \qquad (8.64) $$
under the simplification that the interference-plus-noise covariance matrix be-
comes the identity matrix R = I. The total noise-normalized transmit power is
given by Po , and λm {·} indicates mth eigenvalue.
The channel can be decomposed into the product of two unitary matrices U
and V and a diagonal matrix D by using the singular value decomposition,

H = U D V† . (8.65)

As discussed in Section 8.3, the capacity is insensitive to the structure in U and V, and is dependent upon the singular values contained in D,

λ_m{H H†} = λ_m{U D V† V D† U†} = λ_m{D D†} = |{D}_{m,m}|² .           (8.66)
By focusing on the capacity, it can be seen that the important characteristics
of the channel are the overall attenuation and the shape of the singular-value
distribution. It is often convenient to separate these two characteristics. To help
disentangle these two characteristics, a normalized channel F is defined here such
that
H = a F ,                                                              (8.67)

where the real parameter a is the average attenuation defined by

a² = ⟨ ‖H‖²_F ⟩ / (nt nr) ,                                            (8.68)

and

⟨ ‖F‖²_F ⟩ = nt nr .                                                   (8.69)

By using this normalization, the average SNR per receive antenna is given by

SNR = a² Po .                                                          (8.70)
Because of the redundant freedom among the average attenuation a², the noise-normalized total transmit power Po, and the channel matrix H, it is sometimes assumed that H = F, such that ⟨ ‖H‖²_F ⟩ = nt nr, depending upon the situation.
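The separation in Equation (8.67) can be sketched numerically. This is a minimal NumPy example; the channel draw, its scale, and the array dimensions are arbitrary choices standing in for a measured channel:

```python
import numpy as np

rng = np.random.default_rng(0)
nr, nt = 4, 4

# Stand-in for a measured channel realization (arbitrary scale).
H = 0.1 * (rng.standard_normal((nr, nt))
           + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)

# Average attenuation, Equation (8.68), estimated from this single draw.
a2 = np.linalg.norm(H, 'fro')**2 / (nt * nr)

# Normalized channel F of Equation (8.67); by construction it satisfies
# ||F||_F^2 = nt * nr, Equation (8.69).
F = H / np.sqrt(a2)
print(np.linalg.norm(F, 'fro')**2)
```
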

8.6.1 Spatially uncorrelated Gaussian channel model


At the other end of the spectrum of channel models from the line-of-sight model
is the uncorrelated Gaussian channel model. If the signal associated with a par-
ticular transmit antenna seen at a receiver is the result of the superposition of a
large number of independent random scatterers, then the central limit theorem
suggests that the channel matrix element can be drawn from a Gaussian distribu-
tion. This model is the most commonly assumed one in the literature discussing
MIMO communication. The channel matrices are drawn independently from a
circular complex Gaussian distribution, such that
H ∼ p(H) = e^{−tr{ a⁻² H H† }} / ( π^{nt nr} a^{2 nt nr} ) .           (8.71)

This model is based upon the notion that the environment is full of scatterers.
The signal seen at each receive antenna is the sum of a random set of wavefronts
bouncing off the scatterers. For a SIMO system under the assumptions of a

nondispersive array response (bandwidths small compared with the ratio of the
speed of light divided by the array size) and scatterers in the far field of the
array, the channel vector h_m ∈ C^{nr×1} (the mth column of H) is given by

h_m = Σ_n a_{m,n} v(k_{m,n}) ∼ g ,                                     (8.72)

where the vector g is drawn from the limiting (that is, large number of scatterers) distribution, v(k_{m,n}) ∈ C^{nr×1} is the array response for a single wavefront associated with the direction of the wavevector k_{m,n}, and a_{m,n} is a random complex scalar. The values of a_{m,n} are determined by the propagation from the transmitter impinging on the array from direction k_{m,n}. For physically reasonable distributions of a_{m,n} and k_{m,n}, in complicated multipath environments, the central limit theorem [241] drives the probability distributions for the entries in h_m to independent complex circular Gaussian distributions. Consequently, by employing the assumption that all transmit–receive pairs are uncorrelated, the entries in H are drawn independently from a complex circular Gaussian distribution.
The random matrix with elements drawn from a complex circular Gaussian
distribution with unit variance is often indicated by G, such that

H = aG, (8.73)

where a² is the average SISO attenuation in power. The expected Frobenius norm squared of G is given by

⟨ ‖G‖²_F ⟩ = Σ_{m,n} ⟨ |{G}_{m,n}|² ⟩ = nr nt .                        (8.74)
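A quick Monte Carlo check of the normalization in Equation (8.74); the array sizes and trial count below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
nr, nt, trials = 4, 4, 2000

# Unit-variance complex circular Gaussian entries: the real and
# imaginary parts each carry variance 1/2.
G = (rng.standard_normal((trials, nr, nt))
     + 1j * rng.standard_normal((trials, nr, nt))) / np.sqrt(2)

# Sample mean of ||G||_F^2 over the trials approaches nr * nt.
mean_fro2 = np.mean(np.sum(np.abs(G)**2, axis=(1, 2)))
print(mean_fro2)
```
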

8.6.2 Spatially correlated Gaussian channel model


While the independent complex circular Gaussian model is an important stochas-
tic model for describing distributions of channel matrices, it can miss some im-
portant characteristics sometimes present in real channels. In particular, because
of the details of the environment, there can be spatial correlations in the direc-
tions to scatterers. The most general Gaussian model is constructed by using a
coloring matrix M such that

vec{H} ∝ M vec{G} ,                                                    (8.75)

where the random matrix G ∈ C^{nr×nt} has entries drawn independently from a complex circular Gaussian distribution, and all the cross correlations are contained in M ∈ C^{nr nt × nr nt}. By employing a slightly more constrained physical
model, a simplified formalism is constructed. Imagine that the environment is
full of scatterers in the far field of the transmitter and receiver; however, the

scattering field as seen by either the transmit or the receive array from the other array subtends a limited field of view. As a consequence, the random channel has spatial correlation. For the described situation, the model for the channel [110, 30],

H ∝ Mr G Mt† ,                                                         (8.76)

can be employed, so that

vec{H} ∝ (Mt* ⊗ Mr) vec{G} .                                           (8.77)
Consequently, this model is sometimes denoted the Kronecker channel. The ma-
trices Mr and Mt introduce spatial correlation associated with the receiver and
transmitter respectively.
The spatial coloring matrices Mr and Mt can be decomposed by using a singular-value decomposition such that

Mr = Ur Dr Vr†                                                         (8.78)

and

Mt = Ut Dt Vt† .                                                       (8.79)

The spatially correlated channel can then be represented by

H = a Ur Dr Vr† G Vt Dt Ut†
  = a Ur Dr G′ Dt Ut† ,                                                (8.80)

where G and G′ = Vr† G Vt are matrices with elements drawn independently from a complex circular unit-variance Gaussian distribution. The matrices G and G′ are related by a unitary transformation, and the two are statistically equivalent because a unitary transformation of a complex circular unit-variance Gaussian matrix with independent elements produces another such matrix.
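The effect of the coloring matrices can be checked numerically: for H = Mr G Mt†, the receive-side covariance satisfies ⟨H H†⟩ = tr{Mt† Mt} Mr Mr†, since ⟨G A G†⟩ = tr{A} I for i.i.d. unit-variance entries. This sketch uses arbitrary hypothetical coloring matrices, chosen only for the check:

```python
import numpy as np

rng = np.random.default_rng(2)
nr, nt, trials = 3, 2, 20000

# Hypothetical coloring matrices (any fixed full-rank choices work here).
Mr = rng.standard_normal((nr, nr)) + 1j * rng.standard_normal((nr, nr))
Mt = rng.standard_normal((nt, nt)) + 1j * rng.standard_normal((nt, nt))

G = (rng.standard_normal((trials, nr, nt))
     + 1j * rng.standard_normal((trials, nr, nt))) / np.sqrt(2)
H = Mr @ G @ Mt.conj().T  # Kronecker model, Equation (8.76)

# Empirical receive-side covariance versus the closed form.
emp = np.einsum('tij,tkj->ik', H, H.conj()) / trials
theory = np.trace(Mt.conj().T @ Mt) * (Mr @ Mr.conj().T)
print(np.max(np.abs(emp - theory)) / np.max(np.abs(theory)))
```
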

Reduced-rank channels
When one is simulating channels, random unitary and Gaussian matrices can be
generated for a given average attenuation a and diagonal matrices Dr and Dt .
There is significant literature on selecting values for the average SISO attenuation, a [140, 260, 188]. However, it is less clear how to determine values for Dr and Dt. One model is to assume that the diagonal values are given by some specified number of equal-valued elements and zero otherwise, of the form

Dr = [ I_{mr}   0
        0       0 ] ,                                                  (8.81)
where mr sets the rank of Dr . The form of Dt is given by replacing mr with
mt . In the channel model given in Equation (8.80), the unitary matrices Ur
and Ut are full rank by construction. The Gaussian matrix G can, in principle,
have any rank; however, the size of the set of reduced-rank Gaussian matrices

is vanishingly small compared to the size of the set of full-rank matrices (that
is the matrix G is full rank with probability one). Thus, the set of reduced-
rank Gaussian matrices forms a set of zero measure and can be ignored for any
practical discussion.
Because the rank of a matrix produced by the product of matrices can be
no more than the smallest rank of the constituent matrices, this form would
produce a channel matrix with a rank limited by the smaller of mr and mt . For
the rank to be reduced, the unitary matrices U_G and V_G in the singular value decomposition of the Gaussian matrix G = U_G D_G V_G† would have to transform
(which is a rotation in some sense) the subspace of Dr and Dt such that there is
no overlap on one dimension. Given the random nature of the matrix G, this is
extremely unlikely. Consequently, from any practical point of view, the rank of
the channel is given by min(mr , mt ). Under this model, the expected Frobenius
norm squared of the channel matrix is given by

⟨ ‖H‖²_F ⟩ = a² ⟨ tr{ Ur Dr G′ Dt Ut† Ut Dt† G′† Dr† Ur† } ⟩
           = a² ⟨ tr{ Dr G′ Dt (Dr G′ Dt)† } ⟩
           = a² Σ_{jr=1}^{mr} Σ_{jt=1}^{mt} ⟨ |{G′}_{jr,jt}|² ⟩
           = a² mr mt ,                                                (8.82)

where Dr and Dt are Hermitian and idempotent (Dr Dr = Dr) in this particular situation. The notion of a reduced-rank channel model has a mixed set of
ular situation. The notion of a reduced-rank channel model has a mixed set of
implications. From a phenomenological point of view, it is unlikely for any envi-
ronment to be so free of scatterers that the channel is actually reduced in rank.
However, it is possible for the smaller (in magnitude) singular values to be small
enough that they are not of value from a communication point of view under
the assumption of finite SNR. For example, imagine a case in which equal power
is transmitted in the strongest and the weakest singular values. If the magni-
tude squared of the smallest singular value is 40 dB smaller than that of the largest, then the power coupled into this smallest mode contributes only 0.01% of the total received power, and has little effect on performance. A water-
filling solution would avoid using this small mode for any reasonable transmit
power.
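The water-filling behavior described above can be sketched directly. Here two hypothetical mode gains, separated by 40 dB, stand in for the channel eigenmodes; the standard water-filling allocation puts no power on the weak mode at this transmit power:

```python
import numpy as np

# Hypothetical noise-normalized mode gains, 40 dB apart.
d = np.array([1.0, 1e-4])
Po = 1.0  # noise-normalized total transmit power

# Standard water-filling: p_m = max(0, mu - 1/d_m) with sum(p_m) = Po.
# Try successively smaller active sets until all powers are positive.
d_sorted = np.sort(d)[::-1]
for k in range(len(d_sorted), 0, -1):
    active = d_sorted[:k]
    mu = (Po + np.sum(1.0 / active)) / k  # water level
    p = mu - 1.0 / active
    if np.all(p > 0):
        break

print(p)  # only the strong mode is loaded
```
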

Exponential shaping model


An alternative approach for values contained in Dr and Dt is to assume a
shaped distribution of singular values. An approach that matches measured
channel distributions with reasonable fidelity [30] is to assume an exponential
shaping.
The spatial correlation matrices can be factored so that the receive shaping matrix Dr = Δ_αr and the transmit shaping matrix Dt = Δ_αt are

positive-semidefinite diagonal matrices. The channel is then given by

H = a Ur Δ_αr G Δ_αt Ut† ,                                             (8.83)

Δ_α = √n diag{ α⁰, α¹, …, α^{n−1} } / √( tr[ (diag{ α⁰, α¹, …, α^{n−1} })² ] ) ,   (8.84)

where the shaping parameter α and the number of antennas n stand for either αr and nr or αt and nt, respectively. For many environments of interest, the environments at the transmitter and receiver are similar; assuming that the numbers of transmit and receive antennas are equal, the two arrays have similar spatial correlation characteristics. In this regime, the diagonal shaping matrices can be set equal, Δ_α = Δ_αr = Δ_αt, producing the new random channel matrix H.
The form of shaping matrix Δα given here is arbitrary, but has the satisfying
characteristics that in the limit of α → 0, only one singular value remains large,
and in the limit of α → 1, a spatially uncorrelated Gaussian matrix is produced.
Furthermore, empirically this model provides good fits to experimental distri-
butions [30]. The normalization for Δ_α is chosen so that the expected value of ‖H‖²_F is a² nt nr.
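A small helper makes the normalization of Equation (8.84) concrete; by construction tr{Δ_α²} = n, and α → 1 recovers the identity (spatially uncorrelated) case:

```python
import numpy as np

def shaping_matrix(alpha, n):
    """Diagonal shaping matrix of Equation (8.84):
    sqrt(n) * diag{alpha^0, ..., alpha^(n-1)} / sqrt(tr(diag{...}^2))."""
    d = alpha ** np.arange(n)
    return np.sqrt(n) * np.diag(d) / np.sqrt(np.sum(d**2))

D = shaping_matrix(0.5, 4)
print(np.trace(D @ D))                   # = n = 4 by construction
print(np.diag(shaping_matrix(1.0, 4)))   # alpha -> 1 gives the identity
```
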

Rician MIMO channel


The correlating matrix approach is only one of a variety of possible modeling ap-
proaches. Another approach is to assume that there is a line-of-sight contribution
in addition to the contribution from a rich scattering field. This approach is the
spatial extension to the Rician channel [314]. If the transmit and receive arrays
are small and far from each other, then the line-of-sight component (specular) of
the channel can be characterized by the rank-1 contribution v w† , where v and
w are deterministic, given by the array response to a plane wave. The stochastic
contribution from the rich scattering field is given by G that has entries drawn
independently from a circular complex Gaussian distribution. The channel by
this model is given by

H = a ( √( K/(K+1) ) √(nt nr) (v w†)/(‖v‖ ‖w‖) + √( 1/(K+1) ) G ) ,    (8.85)

where K is the K-factor that sets the relative contribution of the specular and random components. The normalizations of the specular contri-
bution and the Gaussian components are defined so that the expected square of
the Frobenius norm of the channel is equal to a2 nt nr . The expectation of the

norm squared is given by

⟨ ‖H‖²_F ⟩ = a² ⟨ tr{ ( √( K/(K+1) ) √(nt nr) (v w†)/(‖v‖‖w‖) + √( 1/(K+1) ) G )
                    · ( √( K/(K+1) ) √(nt nr) (v w†)/(‖v‖‖w‖) + √( 1/(K+1) ) G )† } ⟩
           = a² ( (K/(K+1)) nt nr + (1/(K+1)) ⟨ tr{ G G† } ⟩ )
           = a² nt nr ,                                                (8.86)

where the observation that G has zero mean is used to remove the cross terms.
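The normalization of Equation (8.85) can be verified by Monte Carlo. The array responses below are hypothetical uniform-linear-array steering vectors chosen only for illustration, as are K, a, and the dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
nr, nt, K, a, trials = 4, 4, 3.0, 0.7, 4000

# Hypothetical plane-wave array responses (unit-modulus entries).
v = np.exp(1j * np.pi * 0.3 * np.arange(nr))
w = np.exp(1j * np.pi * 0.1 * np.arange(nt))
spec = (np.sqrt(nt * nr) * np.outer(v, w.conj())
        / (np.linalg.norm(v) * np.linalg.norm(w)))

G = (rng.standard_normal((trials, nr, nt))
     + 1j * rng.standard_normal((trials, nr, nt))) / np.sqrt(2)

# Rician channel draws, Equation (8.85).
H = a * (np.sqrt(K / (K + 1)) * spec + np.sqrt(1.0 / (K + 1)) * G)

# Sample mean of ||H||_F^2 approaches a^2 * nt * nr, Equation (8.86).
mean_fro2 = np.mean(np.sum(np.abs(H)**2, axis=(1, 2)))
print(mean_fro2, a**2 * nt * nr)
```
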

8.7 Large channel matrix capacity

As has been discussed previously, a common channel modeling approach is to


construct a matrix G by independently drawing matrix elements from a unit-
variance complex Gaussian distribution, mimicking independent Rayleigh fading,

H = aG. (8.87)

This matrix is characterized by a relatively flat distribution of singular values


and is an appropriate model for very rich multiple scattering environments.
In the limit of a large channel matrix, the eigenvalue probability density func-
tion for a Wishart matrix with the form (1/nt )GG† asymptotically approaches
the Marcenko–Pastur distribution [206, 315], as is discussed in Section 3.6. Of
course, implemented systems will have a finite number of antenna elements; how-
ever, because the shape of the typical eigenvalue distributions quickly converges
to that of the asymptotic distribution, insight can be gained by considering the
infinite dimensional case.

8.7.1 Large-dimension Gaussian probability density


The probability that a randomly chosen eigenvalue of the nr ×nr Wishart matrix
(1/nt )GG† is less than μ is denoted Pκ (μ). Here G is an nr × nt matrix, and
the ratio of nr to nt is given by κ = nr /nt . As discussed in Section 3.6, in the
limit of nr → ∞, the probability measure associated with the distribution Pκ (μ)
is given by

pκ(μ) + cκ δ(μ) ,                                                      (8.88)

where the constant associated with the "delta function" or atom at 0 is given by

cκ = max( 0, 1 − 1/κ ) .                                               (8.89)

Figure 8.5 Eigenvalue probability density function for the complex Gaussian channel ((1/nt)GG†), assuming an equal number of transmitters and receivers (κ = 1) in the infinite dimension limit. © 2002 IEEE. Reprinted, with permission, from Reference [33].

The first term of the probability measure, pκ(μ), is given by

pκ(μ) = √( (μ − aκ)(bκ − μ) ) / (2π μ κ)   for aκ ≤ μ ≤ bκ ,
pκ(μ) = 0   otherwise,                                                 (8.90)

where

aκ = (√κ − 1)² ,
bκ = (√κ + 1)² .                                                       (8.91)
The eigenvalue probability density function for this matrix expressed using a
decibel scale is displayed in Figure 8.5. By using the probability density func-
tion, the large matrix eigenvalue spectrum can be constructed and is depicted in
Figure 8.6.
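The rapid convergence to the asymptotic shape is easy to see numerically: even for a few hundred antennas (an arbitrary size chosen here), the empirical eigenvalues of (1/nt)GG† hug the Marcenko–Pastur limits. For κ = 1 the support is [0, 4] and the mean eigenvalue is 1:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400  # nr = nt = n, so kappa = 1

G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
lam = np.linalg.eigvalsh(G @ G.conj().T / n)  # real eigenvalues, ascending

# Support approaches [a_1, b_1] = [0, 4]; the mean eigenvalue is 1.
print(lam.min(), lam.max(), lam.mean())
```
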

8.7.2 Uninformed transmitter spectral efficiency bound


In the large matrix limit, the uninformed transmitter spectral efficiency bound, defined in Equation (8.37) and discussed in References [259, 34, 33], can be expressed in terms of a continuous eigenvalue distribution,

c_UT = log2 | I_nr + (Po/nt) H H† |
     = log2 | I_nr + a² Po (1/nt) G G† |
     = Σ_m log2( 1 + a² Po λ_m{ (1/nt) G G† } )
     ≈ nr ∫₀^∞ dμ pκ(μ) log2( 1 + μ a² Po ) ,                          (8.92)

Figure 8.6 Peak-normalized eigenvalue spectrum for the complex Gaussian channel ((1/nt)GG†), assuming an equal number of transmitters and receivers (κ = 1) in the infinite dimension limit. © 2002 IEEE. Reprinted, with permission, from Reference [33].

where λ_m{·} indicates the mth eigenvalue, and the continuous form is asymptotically exact. This integral is discussed in Reference [259].² The normalized asymptotic capacity as a function of a² Po and κ, c_UT/nr ≈ Φ(a² Po; κ), is given by

Φ(x; κ) = ν [ log2( (x/ν) w+ ) + ((1 − ρ)/ρ) log2( 1/(1 − w−) ) − w−/(ρ log(2)) ] ,

w± = 1/2 + ρ/2 + ν/(2x) ± (1/2) √( (1 + ρ + ν/x)² − 4ρ ) ,

ρ = min( κ, 1/κ ) ,   ν = 1/max(1, κ) .                                (8.93)
In the special case of M = nt = nr, the capacity is given by

c_UT/M ≈ (a² Po / log(2)) ₃F₂( [1, 1, 3/2], [2, 3], −4 a² Po )         (8.94)
       = ( √(4 a² Po + 1) + 4 a² Po log( √(4 a² Po + 1) + 1 ) − 2 a² Po (1 + log(4)) − 1 ) / ( a² Po log(4) ) ,   (8.95)

where pFq is the generalized hypergeometric function discussed in Section 2.14.2.
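The closed form of Equation (8.95) can be checked against a single large random draw; the per-antenna spectral efficiency of a finite matrix is already close to the asymptote. The matrix size and SNR below are arbitrary choices:

```python
import numpy as np

def cap_asym(x):
    """Equation (8.95): asymptotic c_UT / M for kappa = 1 at a^2 Po = x."""
    s = np.sqrt(4 * x + 1)
    return (s + 4 * x * np.log(s + 1) - 2 * x * (1 + np.log(4)) - 1) / (x * np.log(4))

rng = np.random.default_rng(5)
M, x = 300, 1.0  # M = nt = nr, x = a^2 Po

G = (rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))) / np.sqrt(2)
lam = np.linalg.eigvalsh(G @ G.conj().T / M)
c_per_antenna = np.sum(np.log2(1 + x * lam)) / M

print(c_per_antenna, cap_asym(x))
```
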

8.7.3 Informed transmitter capacity


Similarly, in the large matrix limit, the informed transmitter capacity, defined in
Equation (8.35), can be expressed in terms of a continuous eigenvalue distribution
[34, 33]. To make the connection with the continuous eigenvalue probability
² Equation (8.93) is expressed in terms of bits rather than nats as it is in Reference [259].

density defined in Equation (8.88), D from Equation (8.35) is replaced with D = a² nt Λ, where the diagonal entries of Λ contain the selected eigenvalues of (1/nt) G G†,

c_IT = log2 | ( ( Po + (1/(a² nt)) tr{Λ⁻¹} ) / n+ ) a² nt Λ |
     ≈ g nr log2( ( a² nt Po + nr ∫_{μcut}^∞ dμ pκ(μ) (1/μ) ) / (g nr) )
       + nr ∫_{μcut}^∞ dμ pκ(μ) log2(μ) ,                              (8.96)

where g is the fraction of channel modes used by the transmitter,

g = n+/nr ≈ ∫_{μcut}^∞ dμ pκ(μ) ,                                      (8.97)

and μcut is the minimum eigenvalue used by the transmitter, given implicitly by the continuous version of Equation (8.33),

d_m = nt a² μ > n+ / ( Po + (nr/(a² nt)) ∫_{μcut}^∞ dμ pκ(μ) (1/μ) ) ,

μcut = κ ∫_{μcut}^∞ dμ pκ(μ) / ( a² Po + κ ∫_{μcut}^∞ dμ pκ(μ) (1/μ) ) .   (8.98)
The approximations are asymptotically exact in the limit of large nr .
For a finite transmit power, the capacity continues to increase as the number
of antennas increases. Each additional antenna increases the effective area of the
receive system. Eventually, this model breaks down as the number of antennas
becomes so large that any additional antenna is electromagnetically shielded by
existing antennas. However, finite random channel matrices quickly approach the
shape of the infinite model. Consequently, it is useful to consider the antenna-
number normalized capacity c_IT/nr. The normalized capacity is given by

c_IT/nr ≈ g log2( ( a² Po + κ ∫_{μcut}^∞ dμ pκ(μ) (1/μ) ) / (κ g) )
          + ∫_{μcut}^∞ dμ pκ(μ) log2(μ) .                              (8.99)

By using the asymptotic eigenvalue probability density function given in Equation (8.90), the integrals in Equations (8.98) and (8.99) can be evaluated. The relatively concise results for κ = 1 are displayed here:

∫_{μcut}^∞ dμ p_{κ=1}(μ) = 1 − ( √( (4 − μcut) μcut ) + 4 arcsin( √μcut / 2 ) ) / (2π) ,   (8.100)


Figure 8.7 Asymptotic large-dimension Gaussian channel antenna-number-normalized spectral efficiency bounds, c_IT/M (solid) and c_UT/M (dashed) (b/s/Hz/M), as a function of attenuated noise-normalized power (a² Po), assuming an equal number of transmitters and receivers (κ = 1, M = nt = nr). © 2002 IEEE. Reprinted, with permission, from Reference [33].

and

∫_{μcut}^∞ dμ p_{κ=1}(μ) (1/μ) = −1/2 + (1/π) √( (4 − μcut)/μcut ) + (1/π) arcsin( √μcut / 2 ) .   (8.101)
To calculate the capacity, the following integral must also be evaluated,

∫_{μcut}^∞ dμ p_{κ=1}(μ) log2(μ)
   = ( √μcut / (π log(4)) ) [ 4 ₃F₂( [1/2, 1/2, 1/2], [3/2, 3/2], μcut/4 )
     + ( √(4 − μcut) − (4/√μcut) arcsec( 2/√μcut ) ) (1 − log(μcut))
     − (2π/√μcut) log(μcut) ] .                                        (8.102)
By implicitly solving for the cut-off eigenvalue μcut, the capacity as a function of a² Po is evaluated and is displayed in Figure 8.7. The uninformed transmitter spectral efficiency bound is plotted for comparison. For small a² Po, μcut approaches the maximum eigenvalue supported by pκ(μ). In this regime, the ratio of the informed transmitter to the uninformed transmitter capacity c_IT/c_UT approaches 4. To be clear, this limiting value for the ratio occurs in the case of a symmetric number of transmitters and receivers. Conversely, at large a² Po, the normalized informed transmitter and uninformed transmitter spectral efficiency bounds converge, as predicted by Equation (8.48).

8.8 Outage capacity

The term outage capacity is poorly named because it is not really a capacity.
However, it is a useful concept for comparing various practical systems. In par-
ticular, it is useful for comparing various space-time codes and receivers. Given
the assumption of a stochastic model of the channel drawn from a stationary dis-
tribution defined by a few parameters, the outage capacity is defined to be the
rate achieved at a given SNR with some probability [253]. Because the capacity
is dependent upon the given channel matrix, under the assumption of stochastic
channel, the capacity becomes a stochastic variable. If the capacity for some at-
tenuated total transmit power a2 Po (the average SISO SNR), channel matrix H,
and interference-plus-noise spatial covariance matrix R is given by c(a2 Po , H, R)
for some random distribution of channels H, then P(c ≥ η) is the probability that the capacity for a given channel draw is greater than or equal to a given
spectral efficiency η. When the capacity for a given channel draw is greater than
or equal to the desired rate, then the link can close, theoretically. Explicitly, the
probability that the link can close, Pclose , is given by
Pr[ c(a² Po, H, R) ≥ η ] = Pclose .                                    (8.103)

If the link does not close, then it is said to be in outage. Implicit in this assump-
tion is the assumption that the capacity is being evaluated for a single carrier
link in a flat-fading environment. When the link has access to alternative types
of diversity, the discussion becomes more complicated.
Typically, the probability of closing is fixed and the SNR is varied to achieve
this probability. As an example, the outage capacities under the assumptions of
90% and 99% probabilities of closure for an uncorrelated Gaussian 4 × 4 MIMO
channel link in the absence of interference are displayed as a function of a2 Po
(average SNR per receive antenna) in Figure 8.8. The curves in this figure are
constructed by empirically evaluating the cumulative distribution function of
spectral efficiency for each SNR value. The values for 90% and 99% probability
of closing the link are extracted from the distributions.
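The empirical procedure described above can be sketched as follows: draw many channels, compute the uninformed-transmitter spectral efficiency for each, and read off a percentile of the resulting distribution. The dimensions, SNR, and trial count below are arbitrary illustration values:

```python
import numpy as np

rng = np.random.default_rng(6)
nr = nt = 4
a2Po = 10.0    # a^2 Po, i.e., 10 dB average SNR per receive antenna
trials = 2000

caps = np.empty(trials)
for i in range(trials):
    G = (rng.standard_normal((nr, nt))
         + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    # Uninformed-transmitter spectral efficiency for this channel draw.
    A = np.eye(nr) + (a2Po / nt) * (G @ G.conj().T)
    caps[i] = np.log2(np.linalg.det(A).real)

# Rate supported with 90% probability of closing the link:
# the 10th percentile of the spectral-efficiency distribution.
eta_90 = np.quantile(caps, 0.10)
print(eta_90, caps.mean())
```
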

8.9 SNR distributions

In Section 8.3.4, the channel capacity in the limit of high and low SNR was con-
sidered. Here the discussion of approximations to the uninformed transmitter
capacity in the limit of low SNR is extended. While capacity is the most funda-
mental metric of performance for a wireless communication system, it is often
useful to consider the distribution of SNR, particularly at low SNR. At lower
SNR, capacity is proportional to SNR. In addition, for practical systems, SNR is often much easier to measure directly.

Figure 8.8 Outage capacity for a 4 × 4 MIMO link under the assumption of an uncorrelated Gaussian channel as a function of SNR per receive antenna (a² Po). Link closure probabilities of 90% and 99% are displayed.

The uninformed transmitter spectral efficiency bound in the presence of interference is given by

c_UT = log2 | I_nr + (Po/nt) R⁻¹ H H† | .                              (8.104)
The form of the spectral efficiency bound can be simplified by considering the low SNR and strong interference limits. In the low SNR regime, the determinant can be approximated using the trace of the whitened received signal spatial covariance matrix by using the approximation log(I + M) ≈ M for small M,

c_UT = log2 | I_nr + (Po/nt) R⁻¹ H H† |
     = (1/log 2) tr log( I_nr + (Po/nt) R⁻¹ H H† )
     ≈ (1/log 2) tr( (Po/nt) R⁻¹ H H† ) .                              (8.105)
By defining R ≡ I + Pint M M† and using Woodbury's formula from Section 2.5, the inverse of the interference-plus-noise covariance matrix is given by

(I + Pint M M†)⁻¹ = I − Pint M ( I + Pint M† M )⁻¹ M† ,                (8.106)

which in the strong interference regime becomes the projection operator

(I + Pint M M†)⁻¹ ≈ P⊥_M ,
P⊥_M ≡ I − M (M† M)⁻¹ M† ,                                             (8.107)
projecting onto a basis orthogonal to the space spanned by M. Consequently,
the low SNR capacity in the presence of strong interference is given by


c_UT ≈ (1/log 2) (Po/nt) tr( P⊥_M H H† )
     = (1/log 2) (Po/nt) Σ_n ‖P⊥_M h_n‖² ,                             (8.108)

where h_n is the nth column of the channel matrix H, and we have made use of the idempotent property of projection matrices. In this limit, the spectral efficiency bound can be expressed as the sum of beamformer outputs, each having an array-signal-to-noise ratio (ASNR) (which is the SNR at the output of a receive beamformer),

c_UT ≈ (1/log 2) (Po/nt) Σ_n h_n† P⊥_M h_n
     ≡ (1/log 2) (Po/nt) Σ_{n=1}^{nt} |w_n† h_n|² ,   w_n = P⊥_M h_n / ‖P⊥_M h_n‖ ,
     ≡ (1/log 2) Σ_{m=1}^{nt} ASNR_m
     ≡ (1/log 2) ζ ,                                                   (8.109)

where ζ is the sum of ASNRs optimized for each transmit antenna.
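The chain of approximations in Equations (8.105)–(8.109) can be exercised numerically: with a strong interferer and low SNR, the exact bound of Equation (8.104) is well approximated by the projected-channel expression of Equation (8.108). All parameter values below are arbitrary test choices:

```python
import numpy as np

rng = np.random.default_rng(7)
nr, nt, ni = 4, 4, 1
Po, Pint = 0.01, 1e6  # low SNR, strong interference

H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
Mi = (rng.standard_normal((nr, ni)) + 1j * rng.standard_normal((nr, ni))) / np.sqrt(2)

# Exact bound, Equation (8.104), with R = I + Pint * M M†.
R = np.eye(nr) + Pint * (Mi @ Mi.conj().T)
B = np.eye(nr) + (Po / nt) * np.linalg.solve(R, H @ H.conj().T)
c_exact = np.log2(np.linalg.det(B).real)

# Low-SNR, strong-interference approximation, Equation (8.108).
P_perp = np.eye(nr) - Mi @ np.linalg.inv(Mi.conj().T @ Mi) @ Mi.conj().T
c_approx = (Po / nt) * np.sum(np.abs(P_perp @ H)**2) / np.log(2)

print(c_exact, c_approx)  # close in this regime
```
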

8.9.1 Total power


As an example, the uncorrelated Gaussian channel model is assumed. The channel matrix is proportional to a matrix G ∈ C^{nr×nt}, where the entries are independently drawn from a unit-norm complex circular Gaussian distribution such that

H = a G .                                                              (8.110)

By using the notation that g_n is the nth column of G, the low SNR spectral efficiency bound (that is, when a² Po is small) in the presence of strong interference is given by

c_UT ≈ (1/log 2) (a² Po/nt) Σ_n ‖P⊥_M g_n‖²
     = (1/log 2) (a² Po/nt) Σ_n g_n† U U† P⊥_M U U† g_n
     = (1/log 2) (a² Po/nt) Σ_n (g′_n)† J_K g′_n
     = (1/log 2) (a² Po/nt) Σ_{m=1}^{K·nt} |g_m|² ,                    (8.111)

where U is a unitary matrix that diagonalizes the projection matrix such that the first K diagonal elements are one and all the other matrix elements are zero, represented by J_K, assuming K is the rank of the projection matrix,

K = nr − ni ,                                                          (8.112)

where ni is the number of interferers for nr > ni. While the particular values change, the statistics of the Gaussian vector g are not changed by an arbitrary unitary transformation U g, so the statistics of each element of g_n and g′_n = U† g_n are the same. Here g_m is used to indicate a set of random scalar variables sampled from a unit-norm complex circular Gaussian distribution. As a consequence, the statistical distribution of the approximate spectral efficiency bound is represented by a complex χ²-distribution. The array-signal-to-noise ratio, denoted ASNR_m, is the SNR at the output of the beamformer associated with the mth transmitter,
is the SNR at the output of the beamformer associated with the mth transmitter,
assuming that the strong interferer is spatially mitigated. In the low SNR regime,
the presence of the other transmitters does not affect the optimal beamformer
output SNR.
The probability density function of the complex χ²-distribution from Section 3.1.11 is given by

p^C_χ²(x; N) = x^{N−1} e^{−x} / Γ(N) .                                 (8.113)

Thus, the probability density function for the low SNR sum of ASNRs ζ can be expressed as

p(ζ) ≈ ( nt/(a² Po) ) p^C_χ²( nt ζ/(a² Po) ; (nr − ni) · nt ) .        (8.114)
Similarly, the cumulative distribution function (CDF) for the complex χ²-distribution is given by

P^C_χ²(x₀; N) = ∫₀^{x₀} dx p^C_χ²(x; N)
             = 1 − γ(N, x₀)/Γ(N) ,                                     (8.115)

where γ(N, x₀) is the (upper) incomplete gamma function. Consequently, the CDF for the sum of the ASNRs ζ is given by

P(ζ) = P^C_χ²( nt ζ/(a² Po) ; K · nt )
     = 1 − γ( [nr − ni] · nt , nt ζ/(a² Po) ) / Γ( [nr − ni] · nt ) .  (8.116)
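For integer N, the regularized upper incomplete gamma function in Equation (8.116) reduces to a finite sum, Γ(N, x)/Γ(N) = e^{−x} Σ_{k=0}^{N−1} x^k/k!, so the CDF can be evaluated without special-function libraries. A Monte Carlo check (all parameter values are arbitrary):

```python
import numpy as np
from math import exp, factorial

def cdf_sum_asnr(zeta, nt, nr, ni, a2Po):
    """CDF of the sum of ASNRs, Equation (8.116), using the finite-sum
    form of the regularized upper incomplete gamma for integer N."""
    N = (nr - ni) * nt
    x = nt * zeta / a2Po
    return 1.0 - exp(-x) * sum(x**k / factorial(k) for k in range(N))

rng = np.random.default_rng(8)
nt, nr, ni, a2Po = 2, 3, 1, 1.0
N = (nr - ni) * nt

# Per Equation (8.111), zeta = (a^2 Po / nt) * sum of N unit-mean |g|^2 terms.
g2 = np.abs((rng.standard_normal((50000, N))
             + 1j * rng.standard_normal((50000, N))) / np.sqrt(2))**2
zeta = (a2Po / nt) * np.sum(g2, axis=1)

z0 = 1.0
print(np.mean(zeta <= z0), cdf_sum_asnr(z0, nt, nr, ni, a2Po))
```
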
As an example, the CDFs for 2 × 2, 3 × 3, and 4 × 4 MIMO systems, with and without a single strong interferer, are compared in Figure 8.9. The horizontal
axis is normalized such that the average SISO total receive power, normalized by
a2 Po , is 0 dB. In this flat, block-fading environment, the SISO system (not shown)

Figure 8.9 CDFs for total receive power at the output of beamformers, which is the sum of ASNRs (SASNR or ζ), for 2 × 2, 3 × 3, and 4 × 4 MIMO systems, with (solid) and without (dashed) a single strong interferer.

would perform badly. In the low SNR regime, the information-theoretic spectral
efficiency bound is proportional to the sum of ASNRs ζ. Thus, the probability
of outage is given by the complementary CDF (that is, 1 − CDF). For example,
the 99% reliability (or outage capacity) is associated with the sum of ASNRs ζ
at the probability of 0.01. Because of the spatial diversity and because of the
receive-array gain, performance improves as the number of antennas increases.
In the presence of a strong interferer, the 4 × 4 MIMO system receives more than
13 dB more power than the 2 × 2 system if 99% reliability is required. At this
reliability, the 3 × 3 MIMO system has only lost 3 dB compared to the average
SISO channel. At this reliability, a SISO system would suffer significant losses
compared to the MIMO systems and would have infinite loss in the presence of
a strong interferer.

8.9.2 Fractional loss


The distribution of fractional loss caused by spatially mitigating strong interfer-
ence can be found in a manner similar to that in Section 8.9.1. Using Equation
(8.108), the ratio of low SNR capacity with and without the presence of the
strong interference is given by

η ≡ c_UT^int / c_UT ≈ Σ_{m=1}^{K·nt} |g_m|² / ( Σ_{n=1}^{K·nt} |g_n|² + Σ_{n=K·nt+1}^{nr·nt} |g_n|² ) ,   (8.117)

which is described by the beta distribution, discussed in Section 3.1.17, where the difference between the number of receivers and strong interferers is indicated by K = nr − ni. Here, channel coefficients are proportional to g_m ∼ CN(0, 1), drawn from a circularly symmetric complex Gaussian distribution. The probability density of the beta distribution, which describes the statistical

Figure 8.10 CDFs for fractional power loss caused by spatial mitigation of a single strong interferer for 2 × 2, 3 × 3, and 4 × 4 MIMO systems.

distribution of the ratio

Σ_{m=1}^{j} |g_m|² / ( Σ_{m=1}^{j} |g_m|² + Σ_{m=1}^{k} |g′_m|² ) ,    (8.118)

where g′_m is a Gaussian random variable with the same statistics as g_m, is given by

p_β(x; j, k) = ( Γ(j + k) / (Γ(j) Γ(k)) ) x^{j−1} (1 − x)^{k−1} ,      (8.119)
and the corresponding CDF is given by

P_β(x₀; j, k) = ∫₀^{x₀} dx p_β(x; j, k)
             = ( Γ(j + k) / (Γ(j) Γ(k)) ) B(x₀; j, k) ,                (8.120)

where B(x; j, k) is the incomplete beta function. Consequently, the CDF P(η) of the fractional loss η due to mitigating an interferer is given by

P(η) ≈ P_β( η; K · nt, [nr − K] · nt )
     ≈ P_β( η; [nr − ni] · nt, ni · nt ) .                             (8.121)
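With integer shape parameters the incomplete beta function also reduces to a binomial sum, which gives a simple numerical check of Equation (8.121). The system sizes below are arbitrary:

```python
import numpy as np
from math import comb

def beta_cdf(x, j, k):
    """Regularized incomplete beta for integer j, k:
    P(Beta(j,k) <= x) = sum_{i=j}^{j+k-1} C(j+k-1, i) x^i (1-x)^(j+k-1-i)."""
    n = j + k - 1
    return sum(comb(n, i) * x**i * (1 - x)**(n - i) for i in range(j, n + 1))

rng = np.random.default_rng(9)
nr = nt = 3
ni = 1
j = (nr - ni) * nt   # numerator degrees of freedom in Equation (8.117)
k = ni * nt

g2 = np.abs((rng.standard_normal((40000, nr * nt))
             + 1j * rng.standard_normal((40000, nr * nt))) / np.sqrt(2))**2
eta = np.sum(g2[:, :j], axis=1) / np.sum(g2, axis=1)

x0 = 0.5
print(np.mean(eta <= x0), beta_cdf(x0, j, k))
```
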
As an example, the comparison of the total ASNR loss CDFs for 2 × 2, 3 × 3,
and 4 × 4 MIMO systems with a single strong interferer is shown in Figure 8.10.
At a 99% reliability (or outage capacity) the sum of ASNRs ζ loss is no worse
than −3.3 dB for a 4 × 4 MIMO system, but is worse than −12 dB for a 2 × 2
system.
The advantage of multiple transmitters is illustrated in Figure 8.11. Given
four receive antennas and the same total transmit power, there is a significant
difference in the performance of a 1 × 4 system versus a 4 × 4 system. At the
99% reliability level, the sum of ASNRs ζ losses are −6.7 dB versus −3.3 dB,

Figure 8.11 CDFs for fractional power loss caused by spatial mitigation of a single strong interferer for 1, 2, or 4 transmit antennas, assuming 4 receive antennas.

respectively. At a 99.9% reliability, the difference between the sum of ASNRs ζ losses is nearly 6 dB. The difference in performance is caused by the ability of the
multiple transmitters to distribute information among multiple spatial modes,
ensuring that the spatial mitigation of the strong interferer does not accidentally
remove signal-of-interest power.

8.10 Channel estimation

Although joint channel estimation and decoding is possible, for most decoding
approaches and for any informed transmitter approach, an estimate of the chan-
nel is required. While, given some model for the environment, training or channel
probing sequences [139] can be designed to improve performance [23], here it is
assumed that the sequences associated with each transmitter are independent.
The flat-fading MIMO model, assuming nt transmit antennas, nr receive anten-
nas, and ns complex baseband samples is given by

Z = H S + N
  = √(Po/nt) H X + N
  = A X + N ,                                                          (8.122)

as defined for Equation (8.3), where the amplitude-channel product is given by

A = √(Po/nt) H ,                                                       (8.123)

and the normalized reference signal (also known as a training or pilot sequence) X ∈ C^{nt×ns} is given by

X = S / √(Po/nt)                                                       (8.124)

and is normalized so that

⟨ X X† ⟩ = ns I .                                                      (8.125)

The “thermal” noise at each receive antenna can be characterized by the variance.
Here it is assumed that the units of power are defined so that the “thermal” noise
variance is one. It is also assumed that the channel is temporally static.
Here it is worth noting that, in the literature, the normalization of the chan-
nel is not always clear. Because the transmit power can be absorbed within the
amplitude-channel product matrix estimate A or within the transmit signal S, its
definition is ambiguous. Within the context of the discussion of theoretical capac-
ity it is often convenient to explicitly express the power, so that the transmitted
signal contains the square root of the power; however, in channel estimation, it
is often assumed that the reference signal has some arbitrary normalization, and
the transmit power is subsumed into the channel estimate. While this ambiguity
can cause confusion, it is typically reasonably clear from context. Nonetheless,
within the context of this section, we will endeavor to be slightly more precise
to avoid any confusion.
Under the assumption of Gaussian noise and external interference, the prob-
ability of the ns received samples, given the reference signal, is

p(Z|X; A, R) = e^{−tr{(Z−AX)† R^{−1} (Z−AX)}} / ( π^{nr ns} |R|^{ns} ) ,  (8.126)
where the interference-plus-noise spatial covariance matrix is given by
R = ⟨N N†⟩ / ns .  (8.127)
Maximizing with respect to an arbitrary parameter α of A gives the following
estimator:
∂p(Z|X; A, R) / ∂α = 0
⇒ (Z − Â X) X† = 0
⇒ Â = Z X† (X X†)^{−1} .  (8.128)

This maximum-likelihood derivation of the estimator for the channel A is also


the least-squared estimate of the channel. A remarkable observation is that the
estimator does not use any knowledge of the interference-plus-noise spatial co-
variance matrix. However, channel-estimation performance is affected by inter-
ference and noise.
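The least-squares estimator of Equation (8.128) is easy to exercise numerically. The following sketch (an illustration only, not from the text; it assumes NumPy and arbitrarily chosen dimensions, reference sequence, and noise level) applies a random amplitude-channel product to a QPSK training sequence and recovers it:

```python
import numpy as np

rng = np.random.default_rng(0)
nt, nr, ns = 4, 4, 256

# Normalized QPSK reference signal X with X X^H ≈ ns I.
X = (rng.choice([-1, 1], (nt, ns)) + 1j * rng.choice([-1, 1], (nt, ns))) / np.sqrt(2)

# Random amplitude-channel product A and additive noise N (arbitrary levels).
A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
N = (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2) * 0.1

Z = A @ X + N

# Least-squares / maximum-likelihood estimate: A_hat = Z X^H (X X^H)^{-1}.
A_hat = Z @ X.conj().T @ np.linalg.inv(X @ X.conj().T)

print(np.linalg.norm(A_hat - A) / np.linalg.norm(A))  # small fractional error
```

Note that, as stated above, the estimator uses no knowledge of the interference-plus-noise covariance.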

8.10.1 Cramer–Rao bound


Cramer–Rao bounds were introduced in Section 3.8. The variance of a parameter
is given by the inverse of the Fisher information matrix
var{(Â)m,n} = [J^{−1}]_{{m,n},{m,n}} .  (8.129)

Here for notational convenience, the couplet {m, n} indicates an index into a
vector of size m · n. Similarly, {m, n}, {j, k} is used to specify an element of a
matrix at row {m, n} and column {j, k}. This is done to avoid using the vector
operation defined in Equation (2.35). For some known reference sequence X, the
received signal mean Y is given by

Y = ⟨Z⟩ = ⟨A X + N⟩
  = A X ,  (8.130)

under the assumption of zero-mean noise.


Because the channel matrix is not present in the covariance

∂R/∂(A)m ,n = 0 , (8.131)

and the derivative of one conjugation with respect to the other is zero,


∂A / ∂(A)*m,n = 0 ,  (8.132)

the only contributing factor to the Fisher information from Section 3.8 for a
Gaussian model is given by

{J}_{{m,n},{j,k}} = −⟨ ∂² log p(Z|A, X) / (∂(A)*m,n ∂(A)j,k) ⟩
= ∂²/(∂(A)*m,n ∂(A)j,k) tr{ (Z − A X)† R^{−1} (Z − A X) }
= tr{ (∂(A X)†/∂(A)*m,n) R^{−1} (∂(A X)/∂(A)j,k) }
= tr{ X† (∂(A*)^T/∂(A)*m,n) R^{−1} (∂A/∂(A)j,k) X } ,  (8.133)

where Wirtinger derivatives are being used. The derivatives of the channel are
given by

∂A/∂(A)j,k = e_j e_k^T
∂A†/∂(A)*m,n = e_n e_m^T ,  (8.134)
∂(A)∗m ,n

where the em vector indicates a vector of zeros with a one at the mth row,
e_m = (0, …, 0, 1, 0, …, 0)^T ,  (8.135)

Interference free
For the sake of discussion, first consider the interference-plus-noise covariance in
the absence of interference, given by

R = I_{nr} ,  (8.136)

where power is normalized so that the noise per channel is unity. The information
matrix is then given by
{J}_{{m,n},{j,k}} = tr{ X† (∂(A*)^T/∂(A)*m,n) R^{−1} (∂A/∂(A)j,k) X }
= tr{ (e_m e_n† X)† I^{−1} e_j e_k† X }
= tr{ (e_m e_n† X)† e_j e_k† X }
= x_k x_n† δ_{m,j} ,  (8.137)

where xm indicates a row vector containing the mth row of the reference sequence
X, and δm ,j is the Kronecker delta function. For sufficiently long sequences of
xm ,

x_k x_n† ≈ ns δ_{k,n}  (8.138)

is a good approximation. Under this approximation, the sequences are approxi-


mately orthogonal. As a consequence, the Fisher information matrix becomes

{J}_{{m,n},{j,k}} ≈ ns δ_{m,j} δ_{k,n} .  (8.139)

The information matrix is thus diagonal with equal values along the diagonal.
For the interference-free channel estimate, the variance of the estimate is given
by
var{(Â)m,n} = [J^{−1}]_{{m,n},{m,n}}
            ≈ 1/ns .  (8.140)
It is sometimes useful to consider the estimation variance normalized by the
mean variance of the channel because it is the fractional error that typically

drives any performance degradation in communication systems. The mean vari-


ance of an element in the channel matrix is given by
⟨‖A‖²_F⟩ / (nr nt) = a² Po / nt ,  (8.141)
under the assumption that A is drawn randomly from a stationary distribution.
The ratio of estimation variance to channel matrix element variance is then given
by

var{(Â)m,n} / ( ⟨‖A‖²_F⟩/(nr nt) ) = var{(Â)m,n} / (a² Po/nt)
= var{(Ĥ)m,n} / a²
≈ nt / (ns a² Po)
= (nt/ns) (1/SNR) ,  (8.142)
where SNR indicates the mean SISO signal-to-noise ratio.
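The 1/ns bound of Equation (8.140) can be checked by Monte Carlo simulation. The sketch below (illustrative only; it assumes NumPy, with arbitrarily chosen dimensions, seed, and trial count) compares the sample variance of one element of the channel-estimate error against the bound:

```python
import numpy as np

rng = np.random.default_rng(1)
nt, nr, ns, trials = 2, 2, 64, 2000

# Fixed QPSK reference and channel; unit-variance complex noise (R = I).
X = (rng.choice([-1, 1], (nt, ns)) + 1j * rng.choice([-1, 1], (nt, ns))) / np.sqrt(2)
A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
P = X.conj().T @ np.linalg.inv(X @ X.conj().T)   # least-squares projection

err = np.empty(trials, dtype=complex)
for t in range(trials):
    N = (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
    A_hat = (A @ X + N) @ P
    err[t] = A_hat[0, 0] - A[0, 0]

print(np.var(err), 1.0 / ns)   # sample variance is close to the 1/ns bound
```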

In interference
From above in Equation (8.133), the Fisher information matrix is given by

{J}_{{m,n},{j,k}} = tr{ X† (∂(A*)^T/∂(A)*m,n) R^{−1} (∂A/∂(A)j,k) X }
= tr{ x_n† e_m† R^{−1} e_j x_k } .  (8.143)

The information matrix is given by


{J}_{{m,n},{j,k}} = tr{ x_n† e_m† R^{−1} e_j x_k }
= e_m† R^{−1} e_j x_k x_n†
= {R^{−1}}_{m,j} x_k x_n†
⇒ J = R^{−1} ⊗ (X X†)
  J^{−1} = R ⊗ (X X†)^{−1} .  (8.144)

The variance of the channel estimate is then given by

var({Â}m,n) = {R}m,m {(X X†)^{−1}}n,n .  (8.145)

By using the same approximation employed previously for the interference-free


bound, for which the mean of the reference signal is zero and covariance of the
reference signal is approximately proportional to the identity matrix

X X† ≈ ns I_{nt} ,  (8.146)

the information matrix becomes


J ≈ ns R^{−1} ⊗ I_{nt}
J^{−1} ≈ (1/ns) R ⊗ I_{nt} ,  (8.147)
and the corresponding channel-estimation variance is given by
var({Â}m,n) ≈ (1/ns) {R}m,m .  (8.148)
For a rank-1 interferer, the interference-plus-noise covariance matrix is given by
R = I_{nr} + a_i² P_i v v† ,  (8.149)
where the array response v (containing the complex attenuations from the single
interfering transmitter to each receive antenna) is constrained by ‖v‖² = nr, with
interference received power per antenna a_i² P_i. For this covariance structure under
the approximately orthogonal signal assumption, the variance of the channel
estimation becomes
var({Â}m,n) ≈ (1/ns) {I_{nr} + a_i² P_i v v†}m,m
            = (1/ns) (1 + a_i² P_i |{v}m|²) .  (8.150)
ns
Once again, it is sometimes useful to consider the ratio of the channel estima-
tion variance to the average attenuation,
var{(Â)m,n} / ( ⟨‖A‖²_F⟩/(nr nt) ) ≈ ( nt/(ns a² Po) ) (1 + a_i² P_i |{v}m|²)
≈ nt / (ns SINRm) ,  (8.151)
where SINRm is the received signal-to-interference-plus-noise ratio at the mth
receive antenna.
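The rank-1 interference result of Equation (8.150) can also be verified numerically. The following sketch (illustrative only; NumPy, with arbitrarily chosen dimensions, interference power, and seed) adds a rank-1 interferer to the noise and compares the empirical estimation variance against the prediction:

```python
import numpy as np

rng = np.random.default_rng(2)
nt, nr, ns, trials = 2, 4, 64, 2000
ai2Pi = 10.0                                   # interference power per antenna

# Interferer array response, normalized so that ||v||^2 = nr.
v = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)
v *= np.sqrt(nr) / np.linalg.norm(v)

X = (rng.choice([-1, 1], (nt, ns)) + 1j * rng.choice([-1, 1], (nt, ns))) / np.sqrt(2)
P = X.conj().T @ np.linalg.inv(X @ X.conj().T)

err = np.empty(trials, dtype=complex)
for t in range(trials):
    N = (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
    si = (rng.standard_normal(ns) + 1j * rng.standard_normal(ns)) / np.sqrt(2)
    Ni = N + np.sqrt(ai2Pi) * np.outer(v, si)  # rank-1 interference plus noise
    err[t] = (Ni @ P)[0, 0]                    # channel-estimate error, element (0, 0)

predicted = (1 + ai2Pi * abs(v[0]) ** 2) / ns  # Equation (8.150)
print(np.var(err), predicted)
```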

8.11 Estimated versus average SNR

It would seem that the SNR of a signal would be an easy parameter to define.
However, its definition is somewhat problematic for block-fading MIMO chan-
nels. As was suggested in Section 8.10, the channel estimate, average attenuation,
and transmit power are coupled.
In attempting to analyze space-time codes, which are discussed in Chapter 11
in an absolute context, there are a number of technical issues. The first is that
theoretical analyses of space-time codes often do not lend themselves to experi-
mental interpretation. A primary concern is the definition of SNR. In addition,
the performance of space-time codes is typically dependent upon channel delay
properties. Delay spread in the channel translates to spectral diversity that can
be exploited by a communication system.

8.11.1 Average SNR


In most discussions of space-time codes found in the literature, the code perfor-
mance is evaluated in terms of bit error rate as a function of average SNR. This
average is taken over random channels and noise. Often the entries in the channel
and noise are assumed to be drawn independently from complex circularly sym-
metric Gaussian distributions, as discussed in Section 8.6.1. The random channel
variable is often given in the context of a random channel F ∈ C^{nr×nt} and some
overall average channel attenuation a such that the channel matrix is given by

H = aF, (8.152)

where F is a random variable with the normalization ⟨‖F‖²_F⟩ = nt nr .


By evaluating the expectation over noise, channels, and transmitted signal,
the average SNR per receive channel is given by

SNRave = ⟨SNR⟩
= ⟨‖H S‖²_F⟩ / ⟨‖N‖²_F⟩
= ⟨tr{H S S† H†}⟩ / ⟨tr{N† N}⟩
= ⟨tr{H† H S S†}⟩ / (ns nr)
= tr{ ⟨H† H⟩ ⟨S S†⟩ } / (ns nr)
= tr{ ⟨H† H⟩ (Po/nt) I } / nr
= ⟨‖F‖²_F⟩ a² Po / (nt nr)
= a² Po ,  (8.153)

where ⟨S S†⟩ = ns (Po/nt) I. This average SNR is equivalent to the average single-


input single-output (SISO) SNR, assuming the same total power. When devel-
oping a simulation, it is often assumed that power is defined in units of noise
and the average attenuation is set to one, a² = 1. One can generate random channels
and noise while scaling the transmit power to determine the error performance.
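A simulation of exactly this kind can be used to confirm Equation (8.153). The sketch below (illustrative only; NumPy, with arbitrarily chosen dimensions and power, a² = 1 absorbed explicitly) averages the received signal and noise powers over random channels and shows the ratio converging to a² Po:

```python
import numpy as np

rng = np.random.default_rng(3)
nt, nr, ns, trials = 2, 3, 100, 500
a2, Po = 1.0, 4.0        # average attenuation and total transmit power (noise units)

sig, noise = 0.0, 0.0
for _ in range(trials):
    F = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    H = np.sqrt(a2) * F  # <||F||_F^2> = nt * nr
    Xs = (rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))) / np.sqrt(2)
    S = np.sqrt(Po / nt) * Xs                 # <S S^H> = ns (Po/nt) I
    N = (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
    sig += np.linalg.norm(H @ S) ** 2
    noise += np.linalg.norm(N) ** 2

print(sig / noise, a2 * Po)   # ratio converges to a^2 * Po
```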

8.11.2 Estimated SNR


The determination of SNR in an experiment can be different because the ex-
pectation often cannot be taken over a set of channels with a fixed average
attenuation. Consequently, the estimate of the average attenuation and channel

variation become coupled. Imagine the extreme case for which a single carrier
system is employed with a static channel. The code is not exercised over an
ensemble of channel matrices, and the estimated SNR is biased by the single
particular channel draw.
As described in Section 8.10, for some ns samples, the single-carrier channel
response Z ∈ C^{nr×ns} can be expressed by

Z = AX + N, (8.154)


where A = √(Po/nt) H denotes the amplitude-channel product, X ∈ C^{nt×ns} is
the normalized transmitted signal, and N ∈ C^{nr×ns} is the additive noise. Under
the assumption of a known transmit training sequence X, the channel can be
estimated. In estimating the channel, an amplitude-normalized version X of the
reference is typically employed such that

‖X‖²_F = nt ns .  (8.155)

The least-squared-error channel estimator is given by

Â = Z X† (X X†)^{−1}
  = A X X† (X X†)^{−1} + N X† (X X†)^{−1}
  = A + N X† (X X†)^{−1} ,  (8.156)

where ˆ· indicates the estimated parameter. Recall that the expected independent
channel takes the form ⟨‖H‖²_F⟩ = a² nt nr . From this, we can define an estimated
receive SNR, which is coupled with the particular realization of the channel,

(a² Po/nt)ˆ = ‖Â‖²_F / (nt nr) .  (8.157)

The estimated SNR per receive antenna for a given channel realization and esti-
mation is given by

SNRˆ = (a² Po)ˆ
     = ‖Z X† (X X†)^{−1}‖²_F / nr .  (8.158)

In the limit of large integrated SNR for the channel estimate, the estimation
error approaches zero and

Â → A .  (8.159)

In this limit, a simple relationship between the estimated and average SNR can
be found,
SNRˆ = ‖A‖²_F / nr
     = ‖H‖²_F (Po/nt) / nr
     = ( ‖F‖²_F / (nt nr) ) SNRave .  (8.160)
To be clear, even though the channel is estimated perfectly, there is a bias in the
SNR estimate.
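The bias relation of Equation (8.160) can be made concrete with a single channel draw. The sketch below (illustrative only; NumPy, with arbitrarily chosen dimensions and power, a² = 1) evaluates the well-estimated-channel limit of the estimated SNR and checks that it equals the average SNR scaled by ‖F‖²_F/(nt nr):

```python
import numpy as np

rng = np.random.default_rng(4)
nt, nr = 2, 2
Po = 10.0                                    # total power in noise units, a^2 = 1

F = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
A = np.sqrt(Po / nt) * F                     # amplitude-channel product, H = F

snr_ave = Po                                 # a^2 * Po, Equation (8.153)
snr_hat = np.linalg.norm(A) ** 2 / nr        # well-estimated limit, Equation (8.160)

# The estimated SNR is biased by the particular channel draw:
print(snr_hat, snr_ave * np.linalg.norm(F) ** 2 / (nt * nr))
```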
This discussion becomes somewhat complicated in the context of frequency-
selective fading. Specifically, if we assume that orthogonal-frequency-division
multiplexing (OFDM) modulation is employed where there is a large number
of carriers, then an average SNR may be estimated. Under the assumptions of a
constant average attenuation across frequency, and of significant resolvable delay
spread, which indicates significant frequency-selective fading, the average SNR
can be found by averaging across the SNR estimates (remembering to perform
this on a linear scale). In this regime, the average estimated SNR converges to
the average SNR,
⟨SNRˆ⟩ → ⟨SNR⟩ .  (8.161)

However, if there is not significant delay spread, then there will not be a large
set of significantly different channel matrices to average over. Consequently, the
average SNR and the average estimated SNR will not be the same.

8.11.3 MIMO capacity for estimated SNR in block fading


It is assumed here that we have block fading, such that for each draw from
the probability distribution for the channel matrix, the channel matrix is con-
stant for a long period of time. There are various ways of describing the limit-
ing performance of a MIMO link. The information-theoretic capacity of MIMO
systems with uninformed transmitters is considered here in terms of the er-
godic capacity and the outage capacity, each in terms of either the average
SNR or the SNR measured for a given draw of the channel matrix, which is
denoted here the estimated SNR. Here the estimated SNR is assumed to be
accurate.
From Equation (8.37), the capacity of a MIMO system for a particular channel
in the absence of external interference is given by
cUT = log₂ | I_{nr} + (Po/nt) H H† | .  (8.162)
The ergodic capacity, introduced in Section 8.3.2, is the average capacity in
which the expectation is taken over a distribution of channel matrices.

Consequently, the ergodic capacity in the absence of external interference is


given by

ce = ⟨cUT⟩
   = ⟨ log₂ | I_{nr} + (Po/nt) H H† | ⟩
   = ⟨ log₂ | I_{nr} + (a² Po/nt) F F† | ⟩ ,  (8.163)
where the only remaining random variable is the channel distribution.
From this expression, it can be observed that the standard formulation of the
ergodic capacity is given in the context of the average SNR, which is a parameter
that may or may not be able to be estimated in an experimental context. We
can reformulate the relationship between the estimated and the average SNR for
the asymptotic estimated channel (that is, the well-estimated channel) found in
Equation (8.160) to replace the argument in the capacity,
a² Po / nt = SNRˆ · nr / ‖F‖²_F .  (8.164)
Consequently, the ergodic capacity as a function of estimated SNR is given by
c_{e,SNRˆ} = ⟨ log₂ | I_{nr} + SNRˆ (nr/‖F‖²_F) F F† | ⟩ ,  (8.165)

where the expectation is taken over channel realizations.
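The ergodic and outage capacities in terms of the average SNR are straightforward to estimate by Monte Carlo. The following sketch (illustrative only; NumPy, with arbitrarily chosen dimensions, SNR, and trial count) draws random channels, evaluates Equation (8.162) for each, and reports the mean and a 99% outage point:

```python
import numpy as np

rng = np.random.default_rng(5)
nt, nr, trials = 2, 2, 4000
snr = 10.0                                        # a^2 * Po in linear units

caps = np.empty(trials)
for t in range(trials):
    F = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    M = np.eye(nr) + (snr / nt) * F @ F.conj().T
    caps[t] = np.log2(np.abs(np.linalg.det(M)))   # Equation (8.162)

print(np.mean(caps))                              # ergodic capacity, Equation (8.163)
print(np.percentile(caps, 1))                     # 99% outage capacity
```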


Similar to the discussion with regard to the ergodic capacity in Section 8.11.3,
the outage capacity can be represented in terms of the estimated rather than the
average SNR. The outage capacity c_{o,SNRˆ} is found by implicitly solving

Pr{ c( SNRˆ a² nt nr / ‖H‖²_F , H, R ) ≥ η } = Pclose .  (8.166)

8.11.4 Interpretation of various capacities


In Figure 8.12, the ergodic and outage capacities for both the average and es-
timated SNRs are presented. In general, all four curves are useful in different
regimes. In simulations it is easy to use the average SNR. Both the outage and
ergodic capacities can be evaluated. Often in practical applications, it is the out-
age capacity that is of interest because it is a closer match to the use of modern
communications that allow frames to be resent in the case of outages. That is,
the receiver of the dropped frame asks the transmitter to resend the frame. For
experimental applications with limited frequency or temporal diversity, only the
estimated SNR is available. Good space-time codes can operate within a few deci-
bels of this estimated SNR capacity. For applications with significant diversity,
such as OFDM systems with very large symbols and significant delay spread,

[Figure 8.12 here: spectral efficiency (b/s/Hz) versus SNR per receive antenna (dB) for the four capacity curves described in the caption.]

Figure 8.12 Performance bounds for a 2 × 2 MIMO system under the assumptions:
“99Out Est SNR” – the outage capacity with a 99% probability of closing the link
under the assumption that the SNR per receive channel is estimated from the
received signal; “<Cap> Est SNR” – the average or ergodic capacity under the
assumption that the SNR is estimated from the received signal; “99Out Ave SNR” –
the outage capacity with a 99% probability of closing the link under the assumption
that the SNR is the average SNR given by a distribution of channels; “<Cap> Ave
SNR” – the average or ergodic capacity under the assumption that the SNR is the
average SNR given by a distribution of channels.

the coding should approach the ergodic capacity and the average estimated SNR
should approach the average SNR.

8.12 Channel-state information at transmitter

To perform informed transmit approaches, channel-state information (CSI) must


be known at the transmitter. There are two ways typically used to provide this
information, the first is reciprocity and the second is a feedback channel link. The
channel may be tracked [190] or estimated in blocks of data. In either approach,
a bidirectional link is assumed. Implicit in this discussion is the notion that the
channel is sufficiently stable so that an estimate constructed at one point in time
is a reasonable estimate at some later point in time [30]. Under some models, the
changing channel can be predicted for some period of time [49, 302], although
that possibility is not considered here.

8.12.1 Reciprocity
For two radios, the reciprocity approach takes advantage of the physical property
that, after time reversal is taken into account, the channel is in principle the same

from radio 1 to radio 2 as it is from radio 2 to radio 1. As an example, in a flat-


fading environment if the channel from radio 2 to 1 is given by H2→1 , then the
channel from radio 1 to 2 is given by

H_{1→2} = H^T_{2→1} .  (8.167)

There are, however, a few technical caveats to this expectation. Radios of-
ten use different frequencies for different link directions. In general, the chan-
nels in multipath environments decorrelate quickly as a function of frequency.
If frequency-division multiple access (FDMA) is employed, then reciprocity may
not provide an accurate estimate of the reverse link.
Another potential practical issue is that the path within the radio is not ex-
actly the same between the transmit and the receive paths. From a complex
baseband processing point of view, these hardware chains are part of the chan-
nel. In principle, these effects can be mitigated by calibration. However, it is a
new requirement placed upon the hardware that is not typically considered.
The most significant potential concern is that the reciprocity approach only
captures the interference structure at the local radio and does not capture the
interference structure at the “other” radio. The optimal informed transmitter
strategy is to use the interference-plus-noise whitened channel estimate. As an
example, in a flat-fading environment the channel from radio 1 to 2 is given
by H1→2 , the interference-plus-noise covariance matrix at radio 2 is denoted
R2 , and the whitened channel matrix used by the optimal strategy is given
by R2^{−1/2} H_{1→2} . While a reasonable estimate for the channel can be made by
reciprocity, the interference structure R2 cannot be estimated by a reciprocity
approach. Consequently, for many informed transmitter applications, reciprocity
is not a valid option.
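The whitened channel referred to above can be computed in several ways; the sketch below (illustrative only; NumPy, with arbitrarily chosen dimensions and a hypothetical rank-1 interference covariance) uses a Cholesky factor of R2, which gives a matrix equivalent to R2^{−1/2} H_{1→2} up to a unitary transformation:

```python
import numpy as np

rng = np.random.default_rng(6)
nr, nt = 4, 2

H12 = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)

# Interference-plus-noise covariance at radio 2: identity plus a rank-1 interferer.
v = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)
R2 = np.eye(nr) + 10.0 * np.outer(v, v.conj())

# One convenient square root of R2^{-1} comes from the Cholesky factor R2 = L L^H,
# giving the whitened channel L^{-1} H.
L = np.linalg.cholesky(R2)
Hw = np.linalg.solve(L, H12)

# The whitened-channel Gram matrix equals H^H R2^{-1} H, as the optimal strategy uses.
print(np.allclose(Hw.conj().T @ Hw, H12.conj().T @ np.linalg.inv(R2) @ H12))
```

The point made in the text is visible here: constructing Hw requires R2, which is observed at radio 2 and cannot be obtained at radio 1 by reciprocity alone.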

8.12.2 Channel estimation feedback


A variety of techniques provide feedback of channel estimates. This can be done
directly or indirectly and with various degrees of fidelity. As an indirect approach,
the transmitter can search for the best channel. In this approach, the receiver
provides the transmitter a simple feedback message. At each iteration, the feed-
back message contains information indicating if the last iteration’s transmit ap-
proach was better or worse. The message may be simple or provide derivative
information.
More commonly considered is a technique that provides a direct channel es-
timate in the feedback message. One of the issues when implementing this ap-
proach is how to represent the channel (really the whitened channel) for feedback.
Typically, lossy source compression of the channel estimate is considered. As an ex-
ample, having the receiver feed back beamforming vectors drawn from a sparse set
of points on a Grassmannian manifold is considered in References [198, 195].
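The flavor of such codebook feedback can be sketched as follows (illustrative only; NumPy, with a random unit-norm codebook standing in for the Grassmannian-designed codebooks of References [198, 195], and arbitrarily chosen sizes). The receiver selects the codeword with the largest beamforming gain and feeds back only its index:

```python
import numpy as np

rng = np.random.default_rng(7)
nt, bits = 4, 4                                # feed back 4 bits -> 16 codewords

# Random unit-norm codebook (a stand-in for a Grassmannian design).
C = rng.standard_normal((2 ** bits, nt)) + 1j * rng.standard_normal((2 ** bits, nt))
C /= np.linalg.norm(C, axis=1, keepdims=True)

# Channel from the nt transmit antennas to a single receive antenna.
h = rng.standard_normal(nt) + 1j * rng.standard_normal(nt)

# Receiver picks the codeword maximizing beamforming gain; only idx is fed back.
idx = int(np.argmax(np.abs(C @ h.conj()) ** 2))
gain = np.abs(C[idx] @ h.conj()) ** 2 / np.linalg.norm(h) ** 2

print(idx, gain)   # fractional gain relative to ideal matched beamforming (at most 1)
```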

Problems

8.1 Develop the result in Equation (8.48) for the case in which nt > nr .

8.2 Under the assumption of an uninformed transmitter, 4 × 4 MIMO system


and an i.i.d. complex Gaussian channel, find the probability that at least 50% of
the energy remains after mitigating
(a) 1
(b) 2
(c) 3
strong interferers.

8.3 Evaluate the capacity in Equation (8.52) if the interference, noise, and
channel are all not frequency selective.

8.4 Consider the received signal projected onto a subspace orthogonal to a
known interference, as presented in Equation (8.5).
(a) Express the projection in terms of the least-squares channel estimation.
(b) Express the average fractional signal loss due to the temporal mitigation in
terms of the number of samples.

8.5 Reevaluate the relations in Equation (8.48) under the assumption that the
interference of rank ni is much larger than the signal SNR (INR ≫ SNR).

8.6 Evaluate the informed to uninformed capacity ratio cI T /cU T in the limit
of an infinite number of transmit and receive antennas (as discussed in Section
8.7) and in the limit of low SNR as a function of the ratio of receive to transmit
antennas κ.

8.7 For a frequency-selective 2 × 2 MIMO channel that is characterized with


reasonable accuracy by two frequency bins with channel values
H̆ = ⎛  1  −3   0   0 ⎞
    ⎜  3  −1   0   0 ⎟
    ⎜  0   0   1   2 ⎟
    ⎝  0   0  −2  −1 ⎠ ,  (8.168)

evaluate the informed transmitter capacity expressed in Equation (8.55) as a


function of per receive antenna SNR.

8.8 Consider the outage capacities (at 90% probability of closing) presented in
Equation (8.103) for a 4×4 flat fading channel matrix characterized by the expo-
nential shaping model introduced in Equation (8.83). Assume that the transmit
and receive shaping parameters have the same value α. As a function of per re-
ceive antenna SNR (−10 dB to 20 dB), numerically evaluate and plot the ratio of

the outage capacity (at 90% probability of closing) for the values of exponential
shaping parameter α:
(a) α = 0.25
(b) α = 0.5
(c) α = 0.75
to the outage capacity assuming that α = 1.
9 Spatially adaptive receivers

The essential characteristic of an adaptive receiver is to be aware of its environ-


ment and to adjust its behavior to improve its typical performance. One might
notice that this description is relatively close to the definition of a cognitive
radio, which will be discussed in Chapter 16, but the flexibility of an adaptive
receiver is typically limited to estimating parameters used in its algorithms at the
receiver. When arrays of antennas are employed, the adaptive receiver usually
attempts to reduce the adverse effect of interference and to increase the gain on
signals of interest [294, 246, 223, 205, 312, 248, 189, 26, 182]. This interference
can be internal or external as discussed in Sections 4.1 and 8.2.
In Figure 9.1, a typical communication chain is depicted. The information is
encoded and modulated. While a continuous formulation is possible, most mod-
ern communication systems use sampled approaches, and the sampled approach
will be employed here. The nt transmitters by ns samples signal matrix is
indicated by S ∈ C^{nt×ns} . For convenience, the signal S is represented
at complex baseband. At the receiver, the transmitted signal, corrupted by the
channel, is observed by nr receive antennas, represented at complex baseband
by Z ∈ C^{nr×ns} . In many receivers, to reduce complexity, an estimate of the
transmitted signal is evaluated as Ŝ.
For the approaches considered within this chapter, it is assumed that a por-
tion of the transmitted signal, represented by S, or more precisely a normalized
version of the transmitted signal X ∈ C^{nt×ns} , is “known” at the receiver. The
normalization chosen here is
‖X‖²_F = nt ns ,  (9.1)
such that the transmitted signal matrix S is given by
S = √(Po/nt) X ,  (9.2)
where Po is the total noise-normalized power (really energy per sample). It is
common for communication systems to transmit a predefined signal for a portion
of a transmission frame. This portion of the transmitted signal is referred to as a
reference signal, as a pilot sequence, or as a training sequence. The beamformer
is constructed by using these training data and is then applied to extract an
estimate of the transmitted signal in some region of time near the training data.

Figure 9.1 Basic communication chain. Depending upon the implementation, the
coding and modulation may be a single block. Similarly, the receiver and decoding
may be a single block.

In some sense, the “right” answer is to not separate coding and modulation
or even channel estimation. The optimal receiver would instead make a direct
estimate of the transmitted information based upon the observed signal and a
model of the channel. In reality, the definition of optimal is dependent upon the
problem definition (for example, incorporating the cost of computations). Ignor-
ing external constraints, the optimal receiver in terms of receiver performance is
the maximum a posteriori (MAP) solution, that is, it maximizes the posterior
probability.
To begin, we consider the maximum a posteriori solution for a known channel.
Consider an information symbol represented by a single element selection from a
set α ∈ {α1 , α2 , . . . , αM } (which may take many bits or a sequence of channel
uses to represent). Given this notation, the maximum a posteriori solution is
formally given by

α̂ = argmax_α p_α(α|Z)
   = argmax_α p(Z|α) p_α(α) / p(Z)
   = argmax_α p(Z|α) p_α(α) ,  (9.3)

where Bayes’ theorem, as introduced in Equation (3.4), has been applied. As


introduced in Section 3.1.7, pα (α|Z) indicates the posterior probability distribu-
tion for information symbols given some observed data, pα (α) indicates the prior
probability distribution of the information symbols, and p(Z|α) is the conditional
probability distribution of observing Z given some information symbol. Because
it is expected that all information symbols are equally likely pα (αm ) = pα (αn ),
the maximum a posteriori solution is given by the maximum-likelihood (ML)
solution.
As a specific example, in the case of the Gaussian MIMO channel, described
in Equation (8.3), with external interference-plus-noise covariance matrix R ∈
C^{nr×nr} , under the assumption of a known channel matrix, the Gaussian like-
lihood is maximized by searching over each transmitted sequence hypothesis

αm ⇒ Sm , where the subscript m indicates the mth hypothesis. The maximum-


likelihood solution for the estimated symbol α̂, which corresponds to sequence
Ŝ, is given by
α̂ ⇐ argmax_{Sm} ( 1 / (π^{nr ns} |R|^{ns}) ) e^{−tr{(Z−H Sm)† R^{−1} (Z−H Sm)}} .  (9.4)
Evaluating this form directly for any but the smallest set of hypotheses is unten-
able, although approaches to reduce the cost of the search have been considered
as discussed in Section 9.6 and in Reference [72], for example.
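For a very small constellation and antenna count, the exhaustive search of Equation (9.4) is tractable and easy to demonstrate. The sketch below (illustrative only; NumPy, with arbitrarily chosen dimensions, white noise so that R = I, and a single channel use) enumerates all QPSK transmit vectors and picks the one minimizing the whitened residual:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(8)
nt, nr = 2, 3

H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
Rinv = np.eye(nr)                              # white interference-plus-noise

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
hyps = [np.array(s) for s in product(qpsk, repeat=nt)]   # all 16 transmit vectors

s_true = hyps[5]
z = H @ s_true + 0.05 * (rng.standard_normal(nr) + 1j * rng.standard_normal(nr))

# Maximizing the Gaussian likelihood is minimizing (z - H s)^H R^{-1} (z - H s).
def metric(s):
    e = z - H @ s
    return np.real(e.conj() @ Rinv @ e)

s_hat = min(hyps, key=metric)
print(np.allclose(s_hat, s_true))   # expected to recover s_true at this noise level
```

The cost of this search grows as M^{nt} per channel use, which is why the reduced-complexity approaches mentioned above matter in practice.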
This optimization can be extended to a general maximum a posteriori receiver
by including a searching over all possible channels H, and external-interference-
plus-noise covariance matrix R, in which case the estimate of the symbol α̂,
associated with Sm , is given by
α̂ ⇐ argmax_{Sm; H, R} ( 1 / (π^{nr ns} |R|^{ns}) ) e^{−tr{(Z−H Sm)† R^{−1} (Z−H Sm)}} p_H(H) p_R(R) p_α(α) ,  (9.5)
where pH (H) is the probability density function for channel matrices, and pR (R)
is the probability density function for interference-plus-noise covariance matrices.
More typically, the likelihoods for each possible symbol as a function of time
are passed to a decoder that estimates the underlying information bits. The
decoder used in practice is usually far less computationally expensive than the
full maximum-likelihood or maximum a posteriori search.
A suboptimal approximation of the maximum-likelihood receiver searches over
potential beamformers and consequently does not require explicit training data.
An example is a receiver in which beamformers are guessed. For each guess, de-
coding is attempted. Once reasonable decoding performance is achieved, remod-
ulated data (decoded data that are passed through the encoding and modulation
blocks to build the reference) can be used as training data. This process can be
repeated and receiver performance iteratively improves. A version of this receiver
is discussed in Section 9.6.
For many practical receiver implementations, it is useful to employ training
and to separate the receiver into components. One common approach is to per-
form adaptive spatial processing before decoding. At the output of the spatial
processing stage, an estimate of the transmitted signal Ŝ, or more precisely X̂, is
provided. This estimate may be in the form of hard decisions for the symbols or
may be a continuous approximation of the transmitted signal that can be used
to calculate likelihoods of various symbols.
In this chapter, a number of approaches for adaptive processing are discussed.
Often the adaptive spatial processing is employed to remove both internal and
external interference. When mitigating interference for which the signal is known
(as in the case of a MIMO training sequences) or can be estimated, the inter-
ference can also be mitigated by using temporal interference mitigation. Most
of these techniques discussed in this chapter focus on spatial processing; in
Section 9.6, a combination of spatial and temporal processing is considered.

9.1 Adaptive spectral filtering

There is a strong connection between approaches used for spectral filtering and
those used for adaptive spatial processing. As an introduction, the Wiener spec-
tral filter [346, 142, 256, 273] is considered here for a single-input single-output
(SISO) link. Specifically, we will consider a sampled Wiener filter applied to
spectral compensation, which is the minimum-mean-squared error (MMSE) rake
receiver [254]. The name rake receiver comes from the appearance of the original
physical implementation of the rake receiver, in which the mechanical taps off
a transmission line, used to introduce contributions at various delays, looked
like the teeth of a garden rake. This filter attempts to compensate for the
effects of a frequency-selective channel.
The effect of channel delay is to introduce intersymbol interference, in which
delay spread in the channel introduces copies of previous symbols at the
current sample. Given some finite bandwidth signal, the channel
can be accurately represented with the sampled channel if the bandwidth B
satisfies Ts < 1/B. Note that the standard Nyquist factor of two for real signals
is absent because these are complex samples and, consequently, we are taking B
to span both the positive and negative frequencies at baseband.
To be clear, there are a number of issues related to the use of discretely sampled
channels. In particular, scatterers placed off the sample points in time can require
large numbers of taps to accurately represent the channel effect. These effects
are discussed in greater detail in Sections 4.5 and 10.1.

9.1.1 Discrete Wiener filter


For a transmitted complex baseband signal s(t) ∈ C, a received complex base-
band signal z(t) ∈ C, and channel impulse response h(t) ∈ C in additive Gaussian
noise n(t) ∈ C, the receive signal is given by the convolution of the transmitted
signal and the channel plus noise,

z(t) = ∫ dτ h(τ) s(t − τ) + n(t) .  (9.6)

In order to approximately reconstruct the original signal with minimum error,


a rake receiver with coefficients wm is applied to the received signal,



ŝ(t) = Σ_m w*_m z(t − m Ts) ,  (9.7)

where ŝ(t) is the estimate of the transmitted signal, and Ts is the sample period
for which it is assumed that the sampling is sufficient to satisfy Nyquist sampling

requirements. For an error ε(t), the mean-squared error is given by

ε(t) = Σ_m w*_m z(t − m Ts) − s(t)
⟨|ε(t)|²⟩ = ⟨ | Σ_m w*_m z(t − m Ts) − s(t) |² ⟩ .  (9.8)

The MMSE solution is found by taking a derivative of the mean-squared error
with respect to some parameter α of the filter coefficients wm ,

∂/∂α ⟨|ε(t)|²⟩ = 0  (9.9)
= ∂/∂α ⟨ ( Σ_m w*_m z(t − m Ts) − s(t) ) ( Σ_n w*_n z(t − n Ts) − s(t) )* ⟩
= ∂/∂α [ Σ_{m,n} w*_m ⟨z(t − m Ts) z*(t − n Ts)⟩ w_n
        − Σ_m w*_m ⟨z(t − m Ts) s*(t)⟩ − Σ_n ⟨s(t) z*(t − n Ts)⟩ w_n + ⟨s(t) s*(t)⟩ ] .

Constructing the filter vector w, autocorrelation matrix Q, and the cross-correlation
vector v by using the definitions

{w}m = wm
{Q}m,n = ⟨z(t − m Ts) z*(t − n Ts)⟩
{v}m = ⟨z(t − m Ts) s*(t)⟩ ,   (9.10)

the derivative of the mean-squared error or average error power1 can be written
as

∂/∂α ⟨|ε(t)|²⟩ = ∂/∂α [ w† Q w − w† v − v† w + ⟨s(t) s*(t)⟩ ]
              = (∂w†/∂α) [Q w − v] + h.c. ,   (9.11)
where h.c. indicates the Hermitian conjugate of the first term. This equation is
solved by setting the non-varying term to zero, so that the filter vector w is given
by

Q w = v
w = Q−1 v ;   Q > 0 .   (9.12)

1 Strictly speaking, the output of the beamformer should be parameterized in terms of
energy per symbol, but it is common to refer to this parameterization in terms of power.
Because the duration of a symbol is known, the translation between energy per symbol
and power is a known constant.
This result is known as the Wiener–Hopf equation [346, 256]. The result can be
formulated in more general terms, but this approach is relatively intuitive. Thus,
the MMSE estimate of the transmitted signal ŝ(t) is given by

ŝ(t) = ∑m w*m z(t − m Ts)
wm = {Q−1 v}m .   (9.13)
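The Wiener–Hopf solution in Equations (9.10)–(9.13) can be exercised numerically. The sketch below is a minimal numpy illustration with hypothetical values (a two-tap channel, five filter taps, and an arbitrary noise level); it forms Q and v as sample averages over delayed copies of z(t) and then solves Q w = v:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scenario: QPSK-like symbols through a two-tap sampled channel.
ns = 20000
s = (rng.choice([-1.0, 1.0], ns) + 1j * rng.choice([-1.0, 1.0], ns)) / np.sqrt(2)
h = np.array([1.0, 0.5 + 0.3j])  # sampled channel taps (arbitrary values)
noise = 0.1 * (rng.standard_normal(ns) + 1j * rng.standard_normal(ns)) / np.sqrt(2)
z = np.convolve(s, h)[:ns] + noise

# Delayed copies of z: row m holds z(t - m Ts).
ntaps = 5
Zd = np.array([np.roll(z, m) for m in range(ntaps)])

# Sample-average versions of the definitions in Equation (9.10).
Q = (Zd @ Zd.conj().T) / ns   # {Q}_{m,n} = <z(t - m Ts) z*(t - n Ts)>
v = (Zd @ s.conj()) / ns      # {v}_m = <z(t - m Ts) s*(t)>

# Wiener-Hopf equation (9.12): Q w = v.
w = np.linalg.solve(Q, v)

# Equation (9.13): s_hat(t) = sum_m w_m* z(t - m Ts).
s_hat = w.conj() @ Zd

mse_filtered = np.mean(np.abs(s_hat - s) ** 2)
mse_raw = np.mean(np.abs(z - s) ** 2)
```

With enough samples, the sample correlations approach the true ones, and the equalized estimate exhibits far lower mean-squared error than the raw received signal.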

9.2 Adaptive spatial processing

By using the same notation as that found in Equation (8.122), the nt transmitter
by nr receiver sampled flat-fading MIMO channel model can be given by either
of two forms depending upon whether the power parameter is absorbed into the
transmitted signal or the channel matrix in which the received data matrix Z is
given by either
Z = HS + N
  = H √(Po/nt) X + N ;   S = √(Po/nt) X ,

or

Z = A X + N ;   A = √(Po/nt) H ,   (9.14)

where the received signal is indicated by Z ∈ C^(nr×ns), the channel matrix is
indicated by H ∈ C^(nr×nt), the transmitted signal is indicated by S ∈ C^(nt×ns),
the noise plus interference is indicated by N ∈ C^(nr×ns), the amplitude-channel
product is indicated by A ∈ C^(nr×nt), the normalized transmitted signal is indi-
cated by X ∈ C^(nt×ns), and the total thermal-noise-normalized power is indicated
by Po . It may be overly pedantic to differentiate between these two forms (X
versus S) because it is typically clear from context. Nonetheless, for clarity in
this chapter, we will maintain this notation.
By employing a linear operator, denoted the beamforming matrix W ∈ Cn r ×n t ,
an estimate of the normalized transmitted signals X̂ is given by
X̂ = W† Z , (9.15)
where the columns of the beamforming matrix W contain a beamformer associ-
ated with a particular transmitter. The complex coefficients within each column
are conjugated and multiplied by data sequences from each receive antenna data
stream. These modified data streams are then summed and are used to attempt
to reconstruct the signal associated with a given transmitter.

Adaptive spatial processing is sometimes referred to as spatial filtering. The


word filter is used because of the strong formal connection between spatial pro-
cessing and spectral processing. The spatial location of the antennas corresponds
to the delay taps in a spectral filter. The spatial direction corresponds to the fre-
quency.
There is an unlimited number of potential receive beamformer approaches.
Four approaches are discussed here: matched filter in Section 9.2.1, minimum
interference (zero forcing) in Section 9.2.2, MMSE in Section 9.2.3, and maximum
SINR in Section 9.2.4.

9.2.1 Spatial matched filter


The concept of a matched filter is used to construct a beamformer that max-
imizes the receive power to thermal noise ratio associated with a particular
transmitter. The beamformer that maximizes this power has a structure that is
matched to the received array response for a particular transmitter. This type of
beamformer is sometimes denoted a maximum ratio combiner (MRC). Note, this
formulation does not necessarily maximize the signal-to-interference-plus-noise
ratio (SINR). In the case of a line-of-sight environment, the filter corresponds to
the steering vector for the direction to the particular transmitter.
Here the beamformer for each transmitter is constructed individually. The
beamformer for the mth transmitter is wm ∈ Cn r ×1 , and the channel between
the mth transmitter and the receive array (the mth column of the amplitude-
channel product A) is given by am ∈ Cn r ×1 . The transmit sequence from the
mth transmitter is given by xm ∈ C1×n s . The received signal matrix Z is given
by

Z = AX + N

= a m xm + N . (9.16)
m

By ignoring the signal from other transmitters (the internal interference) and
the external interference plus noise, the power at the output of the beamformer
associated with a particular transmit antenna Qm is given by
Qm = (1/ns) ⟨ ‖w†m am xm‖² ⟩
   = (1/ns) tr{ w†m am ⟨xm x†m⟩ a†m wm }
   = tr{ w†m am a†m wm } ,   (9.17)

where the expectation is over the transmitted signal, and the average power of
the signal transmitted by the mth antenna is normalized to be one. Because the
unconstrained beamformer that maximizes the power at the output has infinite
coefficients, some constraint on the beamformer is required. Here, it is required

that the squared norm of the beamformer be unity,

‖wm‖² = 1 .   (9.18)

The beamformer that maximizes the average power under this constraint is found
by using the method of Lagrangian multipliers discussed in Section 2.12.1:

0 = ∂/∂α ( Qm − λm ‖wm‖² )
  = ∂/∂α tr{ w†m am a†m wm − λm w†m wm }
  = tr{ ( am a†m wm − λm wm ) ∂w†m/∂α } + h.c. ,   (9.19)

where λm is the Lagrangian multiplier, α is some arbitrary parameter of wm ,


and h.c. indicates the Hermitian conjugate of the first term. This form is solved
by the nontrivial eigenvector that satisfies

am a†m wm = λm wm
wm = am / ‖am‖ .   (9.20)

The beamformer is matched to the array response or equivalently the appropriate


column of the channel matrix. This result is consistent with the intuition provided
by the Cauchy–Schwarz inequality, which requires that

|w†m am| ≤ ‖wm‖ ‖am‖ .   (9.21)

The equality is achieved only if wm ∝ am.
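The extremal property in Equations (9.20) and (9.21) is easy to verify numerically. In the sketch below (numpy, with a randomly drawn vector standing in for am), no random unit-norm beamformer beats the matched filter's output power:

```python
import numpy as np

rng = np.random.default_rng(1)
nr = 8

# Randomly drawn stand-in for the amplitude-channel vector a_m.
a_m = (rng.standard_normal(nr) + 1j * rng.standard_normal(nr)) / np.sqrt(2)

# Matched spatial filter of Equation (9.20): unit norm, aligned with a_m.
w_mf = a_m / np.linalg.norm(a_m)

def output_power(w):
    # Signal power |w^H a_m|^2 at the beamformer output (unit-power symbols).
    return np.abs(w.conj() @ a_m) ** 2

p_mf = output_power(w_mf)

# Cauchy-Schwarz, Equation (9.21): no other unit-norm beamformer does better.
p_best_random = 0.0
for _ in range(1000):
    w = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)
    w /= np.linalg.norm(w)
    p_best_random = max(p_best_random, output_power(w))
```

The matched filter attains the Cauchy–Schwarz bound ‖am‖² exactly, while every random draw falls below it.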


At this point, it is worth mentioning that the amplitude-channel vector am is
typically not known prior to the symbol estimation and must be estimated. The
transmitter of interest (the mth) emits the sequence xm . Similar to the develop-
ment of Equation (8.128), by assuming a Gaussian model, the log-likelihood is
given by
log p(Z) = −tr{ (Z − am xm − Am Xm)† R−1 (Z − am xm − Am Xm) }
           − ns log |R| + const. ,   (9.22)

where the channel matrix without the channel vector from the transmitter of
interest Am ∈ Cn r ×(n t −1) is given by

Am = (a1 · · · am−1 am+1 · · · ant) .   (9.23)

Similarly, the transmit signal matrix without the transmitter of interest
Xm ∈ C^((nt−1)×ns) is given by

Xm = (xT1 · · · xTm−1 xTm+1 · · · xTnt)T .   (9.24)



The maximum-likelihood estimate (evaluated in Problem 9.3) of the amplitude-
channel vector am under the Gaussian model is given by

âm = (Z P⊥Xm x†m) / (xm P⊥Xm x†m) ,   (9.25)

where the projection operator P⊥Xm orthogonal to the row space of Xm is
defined by

P⊥Xm = I − X†m (Xm X†m)−1 Xm .   (9.26)

To evaluate the beamformer, here it is assumed that X is a known training
sequence. By using Equation (9.25), the beamformer wm is given by

wm ≈ (Z P⊥Xm x†m) / ‖Z P⊥Xm x†m‖ .   (9.27)

If the transmit sequences are approximately orthogonal, which is typically true,
then âm = Z x†m (xm x†m)−1.
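As a concrete check of this estimator, the following numpy sketch uses arbitrary, hypothetical values for the array size, training length, and noise level; it builds P⊥Xm from Equation (9.26) and applies Equation (9.25) to recover the channel column of one transmitter:

```python
import numpy as np

rng = np.random.default_rng(2)
nr, nt, ns = 6, 3, 200   # hypothetical array size and training length

A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
X = (rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))) / np.sqrt(2)
N = 0.05 * (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns)))
Z = A @ X + N

m = 0
x_m = X[m]                      # training row of the transmitter of interest
X_m = np.delete(X, m, axis=0)   # training rows of the other transmitters

# Projection orthogonal to the row space of X_m, Equation (9.26).
P_perp = np.eye(ns) - X_m.conj().T @ np.linalg.inv(X_m @ X_m.conj().T) @ X_m

# Maximum-likelihood estimate of the amplitude-channel vector, Equation (9.25).
a_hat = (Z @ P_perp @ x_m.conj()) / (x_m @ P_perp @ x_m.conj())

rel_err = np.linalg.norm(a_hat - A[:, m]) / np.linalg.norm(A[:, m])
```

For a training sequence much longer than the number of transmitters, the relative estimation error is small and shrinks with the training length.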

9.2.2 Minimum-interference spatial beamforming


While the matched-filter beamformer discussed in the previous section ignored
the presence of external interference or even other transmit antennas, minimum-
interference beamformers focus on the effects of interference. Depending upon
the constraints and regime of operation assumed in their development, these
beamformers go by a variety of names, such as minimum interference, zero
forcing [199], and killer weights [32].
The form of a minimum-interference receiver varies based upon the assump-
tions employed. The receiver may or may not assume that the external inter-
ference plus noise is spatially uncorrelated. In addition, the number of transmit
and external interference sources may or may not be larger than the number
of receive antennas. In the regime in which the number of receive antennas is
equal to or larger than the number of transmit and interference sources, the
minimum-interference beamformer has a convenient property that the signals
associated with each beamformer output stream are uncorrelated. It is, conse-
quently, a decorrelating beamformer. For analysis, this can be useful because
correlations can complicate analytic results. However, the minimum-interference
beamformer can often overreact to the presence of interference by attempting to
null interference that is weak, unnecessarily using degrees of freedom that con-
sequently reduce the signal-to-interference-plus-noise ratio (SINR) at the output
of the beamformers.

Channel inversion or zero forcing


A common beamforming approach is the channel inversion technique, which is
also denoted the zero-forcing receiver. This approach is a spatial extension to the

spectral zero-forcing equalizer [199] and reconstructs the transmitted sequences


exactly in the absence of noise if nr ≥ nt. In this approach, the beamform-
ers WZF ∈ C^(nr×nt) are constructed using the pseudoinverse of the amplitude-
channel product, revealing the transmit sequence corrupted by noise,

WZF = A (A† A)−1
X̂ = W†ZF Z = (A† A)−1 A† Z
  = (A† A)−1 A† A X + (A† A)−1 A† N
  = X + (A† A)−1 A† N .   (9.28)

The outputs of this beamformer, under the assumption of perfect channel knowl-
edge and at least as many receive antennas as transmit antennas (nr ≥ nt ), have
no contributions from other transmitters. The beamformer adapted for the mth
transmitter wm is given by

wm = WZF em
   = A (A† A)−1 em ,   (9.29)

where the selection vector {em }n = δm ,n is given by the Kronecker delta, that
is one if m and n are equal and zero otherwise.
Because the channel is not known, this beamformer wm must be approximated
by an estimate of the channel. By substituting the maximum-likelihood channel
estimate, under the Gaussian interference and noise model that is found in Equa-
tion (8.128), into Equation (9.29), the estimated channel inversion beamformer
is found,

wm ≈ Â (Â† Â)−1 em
   = Z X† (X X†)−1 [(X X†)−1 X Z† Z X† (X X†)−1]−1 em
   = Z X† [X Z† Z X†]−1 X X† em
   = Z X† [X Z† Z X†]−1 X x†m ,   (9.30)

where to estimate the beamformer it is assumed that X is a known training


sequence.
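The exact-reconstruction property of Equation (9.28) can be confirmed directly. The sketch below (numpy, random channel and training matrices, noiseless for clarity) applies W†ZF to the received matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
nr, nt, ns = 6, 3, 50   # more receive antennas than transmitters

A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
X = (rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))) / np.sqrt(2)
Z = A @ X   # noiseless received matrix for clarity

# Zero-forcing beamformers of Equation (9.28): W = A (A^dagger A)^{-1}.
W_zf = A @ np.linalg.inv(A.conj().T @ A)
X_hat = W_zf.conj().T @ Z

recovered = np.allclose(X_hat, X)   # exact recovery in the absence of noise
```

With noise present, the same receiver returns X plus the amplified noise term (A†A)−1A†N, which is the source of the zero-forcing noise-enhancement penalty.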

Orthogonal beamformer
Imagine a scenario in which there is a MIMO link with no external interference.
The interference from other transmitters within the MIMO link is minimized by
constructing a beamformer wm for each transmit antenna that is orthogonal to
the spatial subspace spanned by the spatial responses of the other transmitters,

wm ∝ P⊥Am am ,   (9.31)

where P⊥Am ∈ C^(nr×nr) is the operator that projects onto a column space orthog-
onal to the spatial response of the interfering transmitters. This construction is
heuristically satisfying because the beamformer begins with the matched-filter

array response and then projects orthogonal to the subspace occupied by the internal
interference. If there were no interference, then the beamformer would become
the matched filter. We will show that the beamformer constructed by using this
model is proportional to the zero-forcing beamformer and is therefore equivalent.
By using Equation (9.23), the projection operator that projects onto a basis
orthogonal to the receive array spatial responses of all the other transmitters
P⊥A m is given by

P⊥Am = I − PAm
PAm = Am (A†m Am)−1 A†m .   (9.32)

This form can be found by considering the beamformer that minimizes the in-
terference. The average interference power at the output of a beamformer from
other transmitters of the MIMO link Q(int)m is given by

Q(int)m ∝ (1/ns) ⟨ ‖w†m Am Xm‖² ⟩
        = (1/ns) w†m Am ⟨Xm X†m⟩ A†m wm
        = w†m Am A†m wm ,   (9.33)

where the expectation is evaluated over the ensemble of transmitted sequences.


It is assumed that the cross correlation between normalized transmitted signals
is zero and the power from each transmitter is one,2
⟨Xm X†m⟩ = ns I(nt−1) .   (9.34)

By minimizing the expected interference power under this constraint, the beam-
former is found. The constraint on the norm of wm is enforced by using the
method of Lagrangian multipliers discussed in Section 2.12.1,

0 = ∂/∂α ( Q(int)m − λm ‖wm‖² )
  = ∂/∂α tr{ w†m Am A†m wm − λm w†m wm }
  = tr{ ( Am A†m wm − λm wm ) ∂w†m/∂α } + h.c. ,   (9.35)

where λm is the Lagrangian multiplier, α is some arbitrary parameter of wm,
and h.c. indicates the Hermitian conjugate of the first term. The beamformer
lives in the subspace spanned by the eigenvectors associated with the eigenvalues
with zero value,

0 · wm = Am A†m wm .   (9.36)

2 Implicit in this formulation is the assumption that the MIMO system is operating in an
uninformed transmitter mode.

The relationship can be simplified by recognizing that a projection onto the space
orthogonal to the column space of the amplitude-channel product Am imposes the
same constraint,

0 · wm = PAm wm ,   (9.37)

by multiplying both sides of Equation (9.36) by Am (A†m Am )−2 A†m . An orthog-


onal beamformer must satisfy this relationship.
If the numbers of receive and transmit antennas are equal, nr = nt , then
the beamformer is uniquely determined by the above equation. However, if
nr > nt , then the beamformer has only been determined up to a subspace. While
any beamformer that satisfies the constraint in Equation (9.37) is a minimum-
interference beamformer, the remaining degrees of freedom can be used to in-
crease the expected power from the signal of interest. In other words, in the space
of possible beamformers that satisfy the constraint, we want to find the beam-
former that maximizes signal power. Consequently, it is desirable to maximize
the inner product between the beamformer and the amplitude-channel vector,
|w†m am|, subject to the orthogonality constraint.
The manipulation is similar to that performed for the matched-filter beam-
former in Section 9.2.1 with the additional constraint that w†m PAm wm = 0,
0 = ∂/∂α tr{ w†m am a†m wm − λm w†m wm − ηm w†m PAm wm }
  = tr{ ( am a†m wm − λm wm − ηm PAm wm ) ∂w†m/∂α } + h.c. ,   (9.38)

where λm and ηm are Lagrangian multipliers, α is some arbitrary parameter of
wm, and h.c. indicates the Hermitian conjugate of the first term. This relation-
ship is satisfied when

am a†m wm − λm wm − ηm PAm wm = 0 .   (9.39)

The second constraint can be satisfied by requiring that the beamformer be
limited to a subspace spanned by P⊥Am because PAm P⊥Am = 0. Consequently,
the beamformer wm satisfies

wm = P⊥Am wm .   (9.40)

By substituting this relationship into Equation (9.39), the constrained form is
found,

0 = am a†m P⊥Am wm − λm P⊥Am wm − ηm PAm P⊥Am wm
  = am a†m P⊥Am wm − λm P⊥Am wm
  = P⊥Am am a†m P⊥Am wm − λm P⊥Am wm ,   (9.41)

where the observation that projection operators are idempotent (which indicates
the operation can be repeated without affecting the result) is employed. This

eigenvalue problem is solved by


wm = P⊥Am am / ‖P⊥Am am‖ .   (9.42)

Once again, the channel response is not typically known; however, by using a
reference signal, the beamformer wm can be estimated by employing Equation
(9.25),

wm ≈ (P̂⊥Am Z P⊥Xm x†m) / ‖P̂⊥Am Z P⊥Xm x†m‖ ,   (9.43)

where an estimate for the projection matrix is given by

P̂⊥Am ≈ I − Âm (Â†m Âm)−1 Â†m .   (9.44)

Similar to the development of Equation (9.25), the maximum-likelihood estimate
for Am is given by

Âm = Z P⊥xm X†m (Xm P⊥xm X†m)−1 ,   (9.45)

where P⊥xm = I − x†m xm/(xm x†m).

Equivalence of the zero-forcing and orthogonal beamformers


The connection between the orthogonal beamformer found in Equation (9.31)
and the zero-forcing beamformer found in Equation (9.29) can be shown by
demonstrating that they are proportional,

A (A† A)−1 em ∝ P⊥Am am .   (9.46)

To show this proportionality, we note that P⊥Am am must lie within the subspace
spanned by the matrix A, because PA and PAm can be jointly diagonalized and
thus commute,

P⊥Am PA am = PA P⊥Am am
           = [A (A† A)−1 A†] P⊥Am am .   (9.47)

Here the projector onto the column space of the amplitude-channel product A has
no effect on the right-hand side of the above equation because am is contained
within the column space of A. Consequently, the two beamformers must be
proportional to each other if

em ∝ A† P⊥Am am
   = (P⊥Am A)† am ,   (9.48)

where the Hermitian property of projection matrices is exploited. Without loss
of generality, the first transmitter can be designated the transmitter of interest,
so that m = 1. The channel response associated with the transmitter of interest
a1 can be decomposed into two orthogonal subspaces defined by P1 = PAm and
P⊥1 = P⊥Am for m = 1,

a1 = P1 a1 + P⊥1 a1 .   (9.49)

The amplitude-channel matrix A can then be expressed

A = ( P1 a1 + P⊥1 a1   a2 · · · ant ) .   (9.50)

The projection onto the subspace orthogonal to that spanned by the columns of
the channel matrix other than the first column is given by

P⊥Am A = P⊥1 ( P1 a1 + P⊥1 a1   a2 · · · ant )
       = ( P⊥1 a1   0 · · · 0 )   (9.51)

because P⊥Am, which in this example is indicated by P⊥1, is constructed to be
orthogonal to the subspace containing the vectors a2 · · · ant. Consequently, a
form proportional to the selection vector is found,
(P⊥Am A)† a1 = ( P⊥1 a1   0 · · · 0 )† a1
             = ( a†1 P⊥1 a1   0 · · · 0 )T
             ∝ e1 .   (9.52)

Similarly, for any value of m, this relationship holds, so the two beamformers
are the same up to an overall normalization.
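This proportionality can also be checked numerically. The following numpy sketch draws a random amplitude-channel matrix and compares the orthogonal beamformer of Equation (9.31) with the mth column of the zero-forcing matrix of Equation (9.29):

```python
import numpy as np

rng = np.random.default_rng(4)
nr, nt = 6, 3
A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)

m = 0
a_m = A[:, m]
A_m = np.delete(A, m, axis=1)   # channel matrix without the m-th column

# Orthogonal beamformer of Equation (9.31): project a_m away from the interference.
P_Am = A_m @ np.linalg.inv(A_m.conj().T @ A_m) @ A_m.conj().T
w_orth = (np.eye(nr) - P_Am) @ a_m

# Zero-forcing beamformer of Equation (9.29): m-th column of A (A^dagger A)^{-1}.
w_zf = (A @ np.linalg.inv(A.conj().T @ A))[:, m]

ratio = w_zf / w_orth   # elementwise ratio should be a constant
```

The elementwise ratio is a single constant, confirming that the two constructions differ only by an overall normalization.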

Minimum interference in external interference


Up to this point in the discussion of the minimum interference beamformer, it has
been assumed that the channels to the interfering transmitters are known or can
be estimated. External interference is that which is caused by sources for which
there is insufficient information to estimate the channel explicitly. As an example,
the timing and frame structure of external interference may not be known, and
more importantly, the training sequences may not be known. However, the effect
on the receiver as expressed in terms of the interference-plus-noise covariance
matrix R ∈ Cn r ×n r can be estimated.
The interference may be sampled from any probability distribution, but it
is often assumed that the interference is drawn from a Gaussian distribution.
The Gaussian distribution represents the worst-case distribution in terms of the
adverse effect on link capacity because it has the maximum entropy. Further-
more, many distributions can be modeled with reasonable fidelity by the Gaus-
sian distribution. The important parameter of the distribution is the external

interference-plus-noise covariance matrix R,

R = (1/ns) ⟨N N†⟩ .   (9.53)

Thus, the probability distribution for the received signal p(Z|X; A, R) is given
by the complex Gaussian distribution

p(Z|X; A, R) = (1 / (π^(ns nr) |R|^ns)) exp( −tr{(Z − A X)† R−1 (Z − A X)} ) .   (9.54)
π n s n r |R|n s
If the thermal noise is normalized to unity per receive antenna, then the
interference-plus-noise covariance matrix R can be expressed by
R = J J† + I , (9.55)
where the external interference is characterized by J J† . Along each column of J
is the receiver array response times the amplitude associated with a particular
interferer.
Quoting the result found in Equation (8.128), under the Gaussian interference
and noise, the maximum-likelihood estimate is given by
 = Z X† (X X† )−1 . (9.56)
By employing this estimate for the channel matrix, the log-likelihood is given by

log p(Z|Â, R; X) = −tr{ (Z − Z X† (X X†)−1 X)† R−1 (Z − Z X† (X X†)−1 X) }
                   − ns log |R| + const.
                 = −tr{ (Z P⊥X)† R−1 (Z P⊥X) } − ns log |R| + const. ,   (9.57)

where the projection operator P⊥X removes components within the subspace as-
sociated with the row space (that is, the temporal space) of the normalized
transmit reference matrix X,

P⊥X = I − X† (X X†)−1 X .   (9.58)
By setting the derivative of the log-likelihood with respect to some parameter α of
the interference-plus-noise covariance matrix R to zero, the maximum-likelihood
estimator is found,

∂/∂α log p(Z|Â, R; X) = −∂/∂α [ tr{(Z P⊥X)† R−1 (Z P⊥X)} + ns log |R| ]
  = tr{ (Z P⊥X)† R−1 (∂R/∂α) R−1 (Z P⊥X) } − ns tr{ R−1 (∂R/∂α) }
  = tr{ ( R−1 Z P⊥X Z† R−1 − ns R−1 ) (∂R/∂α) }
  = 0 .   (9.59)

The solution of this equation provides an estimate for the interference-plus-noise
covariance matrix R̂ given by

R̂ = (1/ns) Z P⊥X Z† .   (9.60)
An eigenvalue decomposition can be used to find the subspace in which the
interference exists,

R̂ = U D U† , (9.61)

where U is a unitary matrix and D is a diagonal matrix containing the eigen-


values of R̂. Selecting the q largest eigenvalues and collecting the corresponding
columns in U, the matrix Uq ∈ Cn r ×q is constructed. There are a variety of
techniques for selecting the correct value of q. A common approach is to set
a threshold some fixed value above the expected noise level. The selection of
eigenvalues for a related problem is discussed for angle-of-arrival estimation in
Section 7.5.
Once the orthonormal matrix Uq is selected, a modified version of the
minimum-interference beamformer can be constructed. The projection opera-
tor P⊥[Uq Am] orthogonal to both the external interference associated with Uq
and the internal interference associated with Am is given by

P⊥[Uq Am] = I − [Uq Am] ([Uq Am]† [Uq Am])−1 [Uq Am]†
          = I − P⊥Uq Am (A†m P⊥Uq Am)−1 A†m − P⊥Am Uq (U†q P⊥Am Uq)−1 U†q .   (9.62)
The minimum-interference beamformer in external interference can be
constructed by modifying Equation (9.31) such that the beamformer must be
orthogonal to the other MIMO transmitters and orthogonal to the external
interference. The beamformer wm is given by

wm ∝ P⊥[Uq Am] am ,   (9.63)

where Equation (8.128) can be used to estimate am and Am. This beamformer
form does not have a sensible interpretation if the number of identified interferers
q plus the number of transmitters nt is greater than the number of receivers.
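The steps above can be sketched end to end. The numpy example below uses hypothetical values (one strong external interferer, q = 1, and a 500-sample training block); it estimates R̂ from Equation (9.60), extracts the dominant eigenvector per Equation (9.61), and forms the projection beamformer of Equation (9.63), here with the true channel A standing in for its estimate:

```python
import numpy as np

rng = np.random.default_rng(5)
nr, nt, ns, q = 8, 2, 500, 1

A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
X = (rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))) / np.sqrt(2)

# One strong external interferer (array response j_vec) plus unit thermal noise.
j_vec = 3.0 * (rng.standard_normal(nr) + 1j * rng.standard_normal(nr)) / np.sqrt(2)
x_int = (rng.standard_normal(ns) + 1j * rng.standard_normal(ns)) / np.sqrt(2)
N = np.outer(j_vec, x_int) \
    + (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
Z = A @ X + N

# Interference-plus-noise covariance estimate, Equation (9.60).
P_perp_X = np.eye(ns) - X.conj().T @ np.linalg.inv(X @ X.conj().T) @ X
R_hat = (Z @ P_perp_X @ Z.conj().T) / ns

# Eigendecomposition (9.61); keep the q dominant eigenvectors.
eigval, U = np.linalg.eigh(R_hat)   # eigh returns ascending eigenvalues
U_q = U[:, -q:]

# The dominant subspace should align with the interferer response j_vec.
alignment = np.linalg.norm(U_q.conj().T @ j_vec) / np.linalg.norm(j_vec)

# Beamformer of Equation (9.63), projecting away both interference subspaces.
m = 0
B = np.concatenate([U_q, A[:, 1:]], axis=1)   # external subspace + other transmitters
P_perp_B = np.eye(nr) - B @ np.linalg.inv(B.conj().T @ B) @ B.conj().T
w_m = P_perp_B @ A[:, m]

leakage = np.abs(w_m.conj() @ j_vec) / (np.linalg.norm(w_m) * np.linalg.norm(j_vec))
```

Because the interferer is strong, its subspace is well estimated, and the projected beamformer passes only a small residual of the interference.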

Over-constrained minimum interference


In the over-constrained regime, in which the number of external interferers plus
the number of transmitters is greater than the number of receive antennas, the
minimum-interference beamformer is proportional to the eigenvector associated
with the minimum eigenvalue of the interference-plus-noise covariance matrix
Qm ∈ C^(nr×nr),

Qm = Am (⟨Xm X†m⟩/ns) A†m + J J† + I ,   (9.64)

where the notation from Equation (9.23) is employed. Because the interference
cannot be completely removed, the next best option is to suppress as much of it
as possible. This corresponds to the minimum eigenvalue of the interference-plus-
noise covariance matrix Qm. The beamformer w that achieves this goal is given
by the eigenvector emin associated with the minimum eigenvalue λmin of Qm that
satisfies

w ∝ emin
λmin emin = Qm emin .   (9.65)

By using Equations (9.60) and (9.45), an estimate for the interference-plus-noise-
covariance matrix Q̂m can be evaluated, so that

Q̂m = Âm Â†m + (1/ns) Z P⊥X Z† ,   (9.66)

where for this estimation it is assumed that X is a known training sequence. An


alternative estimate that is asymptotically equal to Equation (9.66) in the limit
of a large number of samples3 is given by

Q̂m = (1/ns) Z P⊥xm Z† .   (9.67)
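A small numerical example illustrates the over-constrained case. In the sketch below (numpy, with five hypothetical interferers seen by only four antennas), the minimum-eigenvalue beamformer of Equation (9.65) attains the smallest achievable output interference-plus-noise power:

```python
import numpy as np

rng = np.random.default_rng(6)
nr = 4

# Over-constrained example: five interferers seen by four receive antennas.
J = 2.0 * (rng.standard_normal((nr, 5)) + 1j * rng.standard_normal((nr, 5))) / np.sqrt(2)
A_m = (rng.standard_normal((nr, 2)) + 1j * rng.standard_normal((nr, 2))) / np.sqrt(2)

# Covariance of Equation (9.64) with <X_m X_m^dagger>/ns = I.
Q_m = A_m @ A_m.conj().T + J @ J.conj().T + np.eye(nr)

# Minimum-eigenvalue beamformer, Equation (9.65).
eigval, E = np.linalg.eigh(Q_m)   # ascending eigenvalues
w = E[:, 0]                       # eigenvector of the smallest eigenvalue

residual = np.real(w.conj() @ Q_m @ w)   # output interference-plus-noise power
```

For a unit-norm beamformer, the output power w†Qmw is lower-bounded by the minimum eigenvalue, and this choice of w meets the bound with equality.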

9.2.3 MMSE spatial processing


There is a strong formal similarity between the spectral Wiener filter discussed
in Section 9.1 and the adaptive MMSE spatial filter. To be clear, it is commonly
assumed, as it is here, that MMSE indicates a linear MMSE implementation for
adaptive spatial processing. As one might expect, the performance of the MMSE
beamformer is problematic when the number of transmitters is larger than the
number of receivers [320].
The error matrix E ∈ Cn t ×n s at the output of the beamformer W ∈ Cn r ×n t
is given by

E = W† Z − X . (9.68)
The mean-squared error ⟨‖E‖²F⟩ between the output of a set of beamformers
and transmitted signals is given by

⟨‖E‖²F⟩ = ⟨ ‖W† Z − X‖²F ⟩
        = ⟨ tr{(W† Z − X)(W† Z − X)†} ⟩ .   (9.69)

3 This assumes that the signals from each transmit antenna are uncorrelated.

To minimize this error, the derivative with respect to some parameter α of the
matrix of beamformers W is set to zero,

∂/∂α ⟨‖E‖²F⟩ = 0
  = ∂/∂α ⟨ tr{(W† Z − X)(W† Z − X)†} ⟩
  = ⟨ tr{ Z (W† Z − X)† ∂W†/∂α } ⟩ + c.c.
  = tr{ ( ⟨Z Z†⟩ W − ⟨Z X†⟩ ) ∂W†/∂α } + c.c. ,   (9.70)

where c.c. indicates the complex conjugate of the first term. This relationship is
satisfied if, for all variations in the beamformers ∂W/∂α, the argument of the
trace is zero. Consequently, the term within the parentheses is set to zero,

⟨Z Z†⟩ W − ⟨Z X†⟩ = 0
W = ⟨Z Z†⟩−1 ⟨Z X†⟩ .   (9.71)
This form has an intuitive interpretation. The first term is proportional to the
inverse of the receive covariance (signal-plus-interference-plus-noise) matrix Q ∈
Cn r ×n r and the second term is proportional to an array response estimator.
Consequently, this beamformer attempts to point in the direction of the signals
of interest, but points away from interference sources.
With the assumptions that the transmit covariance matrix is proportional to
the identity matrix,4 ⟨X X†⟩ = ns I, and that the cross covariance is proportional
to the channel, ⟨Z X†⟩ = ns A, the mean-squared error for the MMSE beam-
former is given by

⟨‖E‖²F⟩ = ⟨ tr{ (⟨X Z†⟩⟨Z Z†⟩−1 Z − X)(⟨X Z†⟩⟨Z Z†⟩−1 Z − X)† } ⟩
        = ⟨ tr{ (A† Q−1 Z − X)(Z† Q−1 A − X†) } ⟩
        = ns tr{ A† Q−1 Q Q−1 A − 2 A† Q−1 A + I }
        = ns tr{ I − A† Q−1 A }
        = ns tr{ I − A† (A A† + R)−1 A } ,   (9.72)

where Q = ⟨Z Z†⟩/ns = A A† + R denotes the received signal covariance matrix.
For practical problems, the expectations in Equation (9.71) cannot be known
exactly. The expectations can be approximated over some finite number of sam-
ples ns. If ns ≫ nr and ns ≫ nt, then the expectations can be approximated
well by

⟨Z Z†⟩ ≈ Z Z†
⟨Z X†⟩ ≈ Z X† .   (9.73)
4 This assumption implies that the MIMO link is operating in an uninformed mode.

By using these relationships, the set of approximate MMSE beamformers in the
columns of W are given by

W ≈ (Z Z†)−1 Z X† ,   (9.74)
which is also the least-squared error solution.
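Because Equation (9.74) is exactly the least-squares solution, the resulting beamformers minimize the sample mean-squared error over all linear receivers. The numpy sketch below (arbitrary random channel and noise level) verifies this against the zero-forcing receiver of Equation (9.28):

```python
import numpy as np

rng = np.random.default_rng(7)
nr, nt, ns = 6, 2, 2000

A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
X = (rng.standard_normal((nt, ns)) + 1j * rng.standard_normal((nt, ns))) / np.sqrt(2)
N = 0.3 * (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
Z = A @ X + N

# Sample-based MMSE (least-squares) beamformers, Equation (9.74).
W = np.linalg.solve(Z @ Z.conj().T, Z @ X.conj().T)
mse_mmse = np.mean(np.abs(W.conj().T @ Z - X) ** 2)

# Zero-forcing receiver of Equation (9.28) for comparison.
W_zf = A @ np.linalg.inv(A.conj().T @ A)
mse_zf = np.mean(np.abs(W_zf.conj().T @ Z - X) ** 2)
```

Since zero forcing is just one particular linear receiver, its sample mean-squared error can never fall below that of the least-squares solution.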

9.2.4 Maximum SINR


A reasonable goal for a beamformer is for it to maximize the SINR performance
for a given transmitter. Here it is assumed that both the internal and the external
interference are mitigated spatially. To be clear, as will be discussed in Section
9.4, for MIMO systems this beamformer is not necessarily optimal in terms
of capacity if the receiver does not take into account the correlations between
beamformer outputs. The output of a beamformer x̂m associated with the mth
transmitter is given by

x̂m = w†m Z ,   (9.75)
where x̂m ∈ C1×n s is the output of the beamformer trained for the mth trans-
mitter, and wm ∈ Cn r ×1 is the receive beamformer. The received data matrix
can be represented by
Z = AX + N
Z = am xm + Am Xm + N , (9.76)
where the channel and transmitted signal associated with the mth transmitter
are indicated by am ∈ Cn r ×1 and xm ∈ C1×n s and the Am ∈ Cn r ×(n t −1) and
Xm ∈ C(n t −1)×n s indicate the channel matrix without the column associated
with the mth transmitter and the transmit signal matrix without the row asso-
ciated with the mth transmitter, respectively.
The SINR at the output of the mth beamformer is given by the ratio of
power at the output of the beamformer associated with the signal to the power
associated with the interference plus noise. For a particular beamformer adapted
to a particular channel realization (which implies that the expectation is taken
over the noise and training sequences), the SINRm is given by
SINRm = ⟨‖w†m am xm‖²⟩ / ⟨‖w†m (Am Xm + N)‖²⟩
      = ( w†m am ⟨xm x†m⟩ a†m wm ) / ( w†m (Am ⟨Xm X†m⟩ A†m + ⟨N N†⟩) wm )
      = ( ns w†m am a†m wm ) / ( w†m (ns Am A†m + ns R) wm )
      = ( w†m am a†m wm ) / ( w†m Qm wm ) ,   (9.77)

where the covariance matrix for the received internal and external interference
for the mth transmitter is indicated by Qm = Am A†m + R, assuming external-
interference-plus-noise covariance matrix R. It is assumed here that the transmit
covariance matrix is proportional to the identity matrix, ⟨Xm X†m⟩/ns = I(nt−1).
For the beamformer wm that maximizes the SINR for the mth transmitter,
the SINR is found by

wm = argmax_w ( w† am a†m w ) / ( w† Qm w ) .   (9.78)

By employing the change of variables ηm = Qm^(1/2) wm, the optimization is equiv-
alent to

wm = Qm^(−1/2) argmax_ηm ( η†m Qm^(−1/2) am a†m Qm^(−1/2) ηm ) / ( η†m ηm ) .   (9.79)
The value of ηm that solves this form is proportional to the eigenvector of the
matrix Qm^(−1/2) am a†m Qm^(−1/2), which is rank-1 and is constructed from the outer
product of the interference-plus-noise whitened (as introduced in Section 8.3.1)
channel vector. The eigenvector that solves this equation is proportional to the
whitened channel vector. Consequently, the beamformer wm for the mth trans-
mitter that maximizes the SINRm is given by

wm = Qm^(−1/2) ηm = Qm^(−1/2) Qm^(−1/2) am
   = Qm^(−1) am .   (9.80)
While the structure of the beamformer is formally satisfying because the con-
tributions of all interfering sources are reduced by the matrix inverse, the form
assumes exact knowledge of the model parameters. However by using either
Equation (9.66) or Equation (9.67) along with Equation (9.25), an estimate of
the beamformer can be evaluated.
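The optimality of Equation (9.80) can be checked by direct search. In the sketch below (numpy, with a randomly drawn channel, two internal interferers, and unit-noise R), no random beamformer exceeds the SINR of Qm−1 am:

```python
import numpy as np

rng = np.random.default_rng(8)
nr = 6

a_m = (rng.standard_normal(nr) + 1j * rng.standard_normal(nr)) / np.sqrt(2)
A_m = (rng.standard_normal((nr, 2)) + 1j * rng.standard_normal((nr, 2))) / np.sqrt(2)
Q_m = A_m @ A_m.conj().T + np.eye(nr)   # internal interference plus unit noise (R = I)

def sinr(w):
    # Output SINR of Equation (9.77) for beamformer w.
    return np.real(np.abs(w.conj() @ a_m) ** 2 / (w.conj() @ Q_m @ w))

w_opt = np.linalg.solve(Q_m, a_m)       # Equation (9.80): w_m = Q_m^{-1} a_m
best = sinr(w_opt)

worst_violation = 0.0
for _ in range(1000):
    w = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)
    worst_violation = max(worst_violation, sinr(w) - best)
```

The SINR is invariant to scaling of w, so no normalization of the candidates is needed; the maximum-SINR solution also does at least as well as the simple matched filter w = am.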

Maximum SINR and MMSE beamformers equivalence


It is interesting that the maximum SINR and MMSE beamformers are pro-
portional to each other and are consequently equivalent in SNR, SINR, and
capacity terms under the assumption of a single transmitter or orthogonal train-
ing sequences, such that ⟨X X†⟩ ∝ I. To demonstrate this, consider the maximum
SINR w^(MSINR)_m and MMSE w^(MMSE)_m beamformers optimized for the mth
transmitter,

w^(MSINR)_m = Qm^(−1) am
w^(MMSE)_m = ⟨Z Z†⟩−1 ⟨Z X†⟩ em
           = Q−1 am ,   (9.81)
where it has been assumed that training sequences from each antenna are or-
thogonal. By assuming the signals associated with each transmitter in the MIMO
system are uncorrelated, the received signal covariance matrix Q and the

internal-plus-external interference-plus-noise covariance matrix Qm are related
by

Qm = Q − am a†m .   (9.82)

The equivalence between beamformers is demonstrated if the two are shown
to be proportional:

w^(MSINR)_m = Qm^(−1) am
  = (Q − am a†m)−1 am
  = (Q − am a†m)−1 Q Q−1 am
  = (I − Q−1 am a†m)−1 Q−1 am
  = ( I + (Q−1 am a†m) / (1 − a†m Q−1 am) ) Q−1 am
  = ( 1 + (a†m Q−1 am) / (1 − a†m Q−1 am) ) Q−1 am
  = ( 1 + (a†m Q−1 am) / (1 − a†m Q−1 am) ) w^(MMSE)_m
  ∝ w^(MMSE)_m .   (9.83)
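This proportionality is easy to confirm numerically. The sketch below (numpy, with an arbitrary positive-definite Qm) builds Q from Equation (9.82) and checks that Qm−1 am and Q−1 am differ only by a scalar:

```python
import numpy as np

rng = np.random.default_rng(9)
nr = 5

a_m = (rng.standard_normal(nr) + 1j * rng.standard_normal(nr)) / np.sqrt(2)
Q_m = np.eye(nr) + np.diag(0.5 * rng.random(nr))   # arbitrary interference-plus-noise covariance
Q = Q_m + np.outer(a_m, a_m.conj())                # Equation (9.82): Q = Q_m + a_m a_m^dagger

w_msinr = np.linalg.solve(Q_m, a_m)   # maximum-SINR beamformer, Q_m^{-1} a_m
w_mmse = np.linalg.solve(Q, a_m)      # MMSE beamformer, Q^{-1} a_m

ratio = w_msinr / w_mmse              # elementwise ratio should be a constant scalar
```

Because a beamformer's SINR is unchanged by a scalar multiple, the two receivers deliver identical SINR and capacity in this setting.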

9.3 SNR loss performance comparison

The SNR loss is used here as the metric of performance to compare the minimum
interference and MMSE beamformer approaches. The SNR loss provides a mea-
sure of the loss caused by mitigating interference. It is given by the ratio of the
output SNRs of an adaptive beamformer in the presence and absence of interference.
This metric does not address how well the interference is mitigated. Rather, it
provides insight into the cost in reducing SNR induced by attempting to mitigate
the interference. This may be of value when comparing various system concepts
that do or do not require interference mitigation such as time-division multiple-
access schemes. To simplify this analysis, it is assumed that there is a single
source of interest and a single interferer.
In the absence of interference, the received signal for some block of data $\mathbf{Z}$ from the transmitter of interest is given by
$$\mathbf{Z} = \mathbf{a}_0\,\mathbf{x}_0 + \mathbf{N}\,. \quad (9.84)$$
In the above equation, the received signal matrix is indicated by $\mathbf{Z} \in \mathbb{C}^{n_r \times n_s}$, the amplitude-channel product vector is indicated by $\mathbf{a}_0 \in \mathbb{C}^{n_r \times 1}$, the transmitted signal row vector is indicated by $\mathbf{x}_0 \in \mathbb{C}^{1 \times n_s}$, and the noise is indicated by $\mathbf{N} \in \mathbb{C}^{n_r \times n_s}$.
316 Spatially adaptive receivers

In the line-of-sight environment without scatterers, the amplitude-channel vector $\mathbf{a}_0$ is proportional to the steering vector $\mathbf{v}(\theta_0)$,
$$\mathbf{a}_0 = a_0\,\mathbf{v}(\theta_0)\,, \quad (9.85)$$
where $a_0$ is the constant of proportionality and $\theta_0$ is the direction to the transmitter of interest. It is sometimes conceptually useful to display SNR loss as a function of the angle between the signal of interest and the interfering signal. However, for most of this discussion, the array response is represented in the more general form of $\mathbf{a}_0$.
The SNR at the output of the adaptive beamformer $\mathbf{w}_0$ is given by the ratio of the signal power to the noise power and is defined to be $\rho_0$,
$$\rho_0 = \frac{\left\langle \|\mathbf{w}_0^\dagger\,\mathbf{a}_0\,\mathbf{x}_0\|^2 \right\rangle}{\left\langle \|\mathbf{w}_0^\dagger\,\mathbf{N}\|^2 \right\rangle} = \frac{\left\langle \|\mathbf{x}_0\|^2 \right\rangle \left|\mathbf{w}_0^\dagger\,\mathbf{a}_0\right|^2}{n_s\,\mathbf{w}_0^\dagger\,\mathbf{I}_{n_r}\,\mathbf{w}_0} = \frac{\left|\mathbf{w}_0^\dagger\,\mathbf{a}_0\right|^2}{\|\mathbf{w}_0\|^2}\,. \quad (9.86)$$
For beamformer $\mathbf{w}_0$, it is assumed that the noise covariance matrix is proportional to the identity matrix,
$$\left\langle \mathbf{N}\,\mathbf{N}^\dagger \right\rangle = n_s\,\mathbf{I}_{n_r}\,, \quad (9.87)$$
and that the transmit sequence is normalized such that it has unit variance per sample,
$$\left\langle \|\mathbf{x}_0\|^2 \right\rangle = n_s\,. \quad (9.88)$$
As was discussed in Section 9.2.1, the optimal adaptive beamformer in the absence of interference is the matched spatial filter,
$$\mathbf{w}_0 \propto \mathbf{a}_0\,. \quad (9.89)$$
Consequently, the SNR at the output of the matched spatial filter is given by
$$\rho_0 = \frac{\left|\mathbf{a}_0^\dagger\,\mathbf{a}_0\right|^2}{\|\mathbf{a}_0\|^2} = \frac{\left|\mathbf{a}_0^\dagger\,\mathbf{a}_0\right|^2}{\mathbf{a}_0^\dagger\,\mathbf{a}_0} = \|\mathbf{a}_0\|^2\,. \quad (9.90)$$
The received signal matrix $\mathbf{Z}$ in the presence of a single interferer is given by
$$\mathbf{Z} = \mathbf{a}_0\,\mathbf{x}_0 + \mathbf{a}_1\,\mathbf{x}_1 + \mathbf{N}\,, \quad (9.91)$$

where the interfering signal is indicated by the subscript 1. The SNR $\rho_{0|1}$ (as opposed to the SINR) at the output of a beamformer in the presence of an interferer is given by the ratio of the signal power at the output of the beamformer to the noise power at the output of the beamformer,
$$\rho_{0|1} = \frac{\left\langle \|\mathbf{w}^\dagger\,\mathbf{a}_0\,\mathbf{x}_0\|^2 \right\rangle}{\left\langle \|\mathbf{w}^\dagger\,\mathbf{N}\|^2 \right\rangle} = \frac{\left\langle \|\mathbf{x}_0\|^2 \right\rangle \left|\mathbf{w}^\dagger\,\mathbf{a}_0\right|^2}{\mathbf{w}^\dagger \left\langle \mathbf{N}\,\mathbf{N}^\dagger \right\rangle \mathbf{w}} = \frac{\left|\mathbf{w}^\dagger\,\mathbf{a}_0\right|^2}{\|\mathbf{w}\|^2}\,. \quad (9.92)$$

SNR loss is given by the ratio of the SNR after mitigating interference to the SNR in the absence of interference,
$$\mathrm{SNR\ loss} = \frac{\rho_{0|1}}{\rho_0} = \frac{\left|\mathbf{w}^\dagger\,\mathbf{a}_0\right|^2 / \|\mathbf{w}\|^2}{\|\mathbf{a}_0\|^2} = \frac{\left|\mathbf{w}^\dagger\,\mathbf{a}_0\right|^2}{\|\mathbf{w}\|^2\,\|\mathbf{a}_0\|^2}\,. \quad (9.93)$$

As with many metrics in engineering, the SNR loss ratio is often expressed on a decibel scale. On a linear scale, its value is bounded by 0 and 1. However, when it is expressed on a decibel scale, the sign is sometimes inverted.

9.3.1 Minimum-interference beamformer


The SNR loss for the minimum-interference beamformer, assuming a single-antenna transmitter and a single-antenna interferer, is found by substituting the minimum-interference beamformer into Equation (9.93). By using the notation $\mathbf{P}_1^\perp$ for the spatial projection matrix $\mathbf{P}_{\mathbf{A}_m}^\perp$ with $m = 1$, the beamformer $\mathbf{w}$ under this simplified scenario is given by
$$\mathbf{w} = \frac{\mathbf{P}_1^\perp\,\mathbf{a}_0}{\|\mathbf{P}_1^\perp\,\mathbf{a}_0\|} \propto \mathbf{P}_1^\perp\,\mathbf{a}_0 = \left[\mathbf{I} - \mathbf{a}_1\left(\mathbf{a}_1^\dagger\,\mathbf{a}_1\right)^{-1}\mathbf{a}_1^\dagger\right]\mathbf{a}_0 = \left[\mathbf{I} - \frac{\mathbf{a}_1\,\mathbf{a}_1^\dagger}{\|\mathbf{a}_1\|^2}\right]\mathbf{a}_0\,. \quad (9.94)$$

By using this form of the minimum-interference beamformer, the SNR loss is given by
$$\mathrm{SNR\ loss}_{\rm MI} = \frac{\left|\mathbf{w}^\dagger\,\mathbf{a}_0\right|^2}{\|\mathbf{a}_0\|^2\,\|\mathbf{w}\|^2}$$
$$= \frac{\left|\left[\left(\mathbf{I} - \frac{\mathbf{a}_1\,\mathbf{a}_1^\dagger}{\|\mathbf{a}_1\|^2}\right)\mathbf{a}_0\right]^\dagger \mathbf{a}_0\right|^2}{\|\mathbf{a}_0\|^2\,\left\|\left(\mathbf{I} - \frac{\mathbf{a}_1\,\mathbf{a}_1^\dagger}{\|\mathbf{a}_1\|^2}\right)\mathbf{a}_0\right\|^2}$$
$$= \frac{\left|\mathbf{a}_0^\dagger\left(\mathbf{I} - \frac{\mathbf{a}_1\,\mathbf{a}_1^\dagger}{\|\mathbf{a}_1\|^2}\right)\mathbf{a}_0\right|^2}{\|\mathbf{a}_0\|^2\,\mathbf{a}_0^\dagger\left(\mathbf{I} - \frac{\mathbf{a}_1\,\mathbf{a}_1^\dagger}{\|\mathbf{a}_1\|^2}\right)\mathbf{a}_0}$$
$$= \frac{1}{\|\mathbf{a}_0\|^2}\,\mathbf{a}_0^\dagger\left(\mathbf{I} - \frac{\mathbf{a}_1\,\mathbf{a}_1^\dagger}{\|\mathbf{a}_1\|^2}\right)\mathbf{a}_0 = 1 - \frac{\left|\mathbf{a}_1^\dagger\,\mathbf{a}_0\right|^2}{\|\mathbf{a}_0\|^2\,\|\mathbf{a}_1\|^2}\,, \quad (9.95)$$
where the simplifications use the fact that the projection matrix is Hermitian and idempotent.
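Equation (9.95) can be verified numerically. A minimal sketch, assuming NumPy; the array size, seed, and random array responses are illustrative, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n_r = 6                                                     # hypothetical array size

a0 = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)
a1 = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)

# Minimum-interference beamformer: project a0 away from a1, Equation (9.94)
P1_perp = np.eye(n_r) - np.outer(a1, a1.conj()) / np.linalg.norm(a1) ** 2
w = P1_perp @ a0

# SNR loss, Equation (9.93), against the closed form 1 - |alpha|^2, Equation (9.95)
snr_loss = abs(w.conj() @ a0) ** 2 / (np.linalg.norm(w) ** 2 * np.linalg.norm(a0) ** 2)
alpha = a0.conj() @ a1 / (np.linalg.norm(a0) * np.linalg.norm(a1))
assert np.isclose(snr_loss, 1 - abs(alpha) ** 2)
```

The loss depends only on the normalized inner product between the two array responses, as the closed form indicates.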

9.3.2 MMSE beamformer


Similarly, the SNR loss for the MMSE beamformer, assuming a single-antenna transmitter and a single-antenna interferer, is found by substituting the MMSE beamformer into Equation (9.93). By quoting the result in Equation (9.71), the MMSE beamformers $\mathbf{W}$ are given by
$$\mathbf{W} = \left(\mathbf{Z}\,\mathbf{Z}^\dagger\right)^{-1}\left(\mathbf{Z}\,\mathbf{X}^\dagger\right)\,. \quad (9.96)$$
Under the assumption that there is a single transmitter of interest (otherwise multiple beamformers would be needed), the beamformer $\mathbf{w} \in \mathbb{C}^{n_r \times 1}$ is given by
$$\mathbf{w} = \left(\mathbf{Z}\,\mathbf{Z}^\dagger\right)^{-1}\left(\mathbf{Z}\,\mathbf{x}_0^\dagger\right)\,, \quad (9.97)$$

where $\mathbf{x}_0$ is the normalized transmitted sequence of the signal of interest. The spatial received covariance matrix is given by
$$\mathbf{Q} = \frac{1}{n_s}\left\langle \mathbf{Z}\,\mathbf{Z}^\dagger \right\rangle = \frac{1}{n_s}\left\langle \left[\mathbf{a}_0\,\mathbf{x}_0 + \mathbf{a}_1\,\mathbf{x}_1 + \mathbf{N}\right]\left[\mathbf{a}_0\,\mathbf{x}_0 + \mathbf{a}_1\,\mathbf{x}_1 + \mathbf{N}\right]^\dagger \right\rangle = \mathbf{a}_0\,\mathbf{a}_0^\dagger + \mathbf{a}_1\,\mathbf{a}_1^\dagger + \mathbf{I}_{n_r}\,, \quad (9.98)$$

where it is assumed that the noise, x0 , x1 are all independent and have unit
variance per sample. From Equation (2.116), the inverse of the rank-2 matrix

plus the identity matrix is given by
$$\left(\mathbf{I} + \mathbf{a}_0\,\mathbf{a}_0^\dagger + \mathbf{a}_1\,\mathbf{a}_1^\dagger\right)^{-1} = \mathbf{I} - \left(\frac{\mathbf{a}_0\,\mathbf{a}_0^\dagger}{1 + \mathbf{a}_0^\dagger\,\mathbf{a}_0} + \frac{\mathbf{a}_1\,\mathbf{a}_1^\dagger}{1 + \mathbf{a}_1^\dagger\,\mathbf{a}_1}\right)\left(1 + \frac{\left|\mathbf{a}_0^\dagger\,\mathbf{a}_1\right|^2}{\gamma}\right) + \frac{1}{\gamma}\left(\mathbf{a}_0^\dagger\,\mathbf{a}_1\,\mathbf{a}_0\,\mathbf{a}_1^\dagger + \mathbf{a}_1^\dagger\,\mathbf{a}_0\,\mathbf{a}_1\,\mathbf{a}_0^\dagger\right)\,, \quad (9.99)$$
where
$$\gamma = 1 + \mathbf{a}_0^\dagger\,\mathbf{a}_0 + \mathbf{a}_1^\dagger\,\mathbf{a}_1 + \mathbf{a}_0^\dagger\,\mathbf{a}_0\,\mathbf{a}_1^\dagger\,\mathbf{a}_1 - \left|\mathbf{a}_0^\dagger\,\mathbf{a}_1\right|^2 = \left(1 + \mathbf{a}_0^\dagger\,\mathbf{a}_0\right)\left(1 + \mathbf{a}_1^\dagger\,\mathbf{a}_1\right) - \left|\mathbf{a}_0^\dagger\,\mathbf{a}_1\right|^2\,. \quad (9.100)$$
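Because the closed-form inverse in Equation (9.99) is easy to mistype, a direct numerical check is worthwhile. A NumPy sketch with arbitrary random vectors (the dimension and seed are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n_r = 5
a0 = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)
a1 = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)

p0, p1 = np.linalg.norm(a0) ** 2, np.linalg.norm(a1) ** 2
phi = a0.conj() @ a1
gamma = (1 + p0) * (1 + p1) - abs(phi) ** 2                # Equation (9.100)

# Closed-form inverse of I + a0 a0^dag + a1 a1^dag, Equation (9.99)
I = np.eye(n_r)
inv_closed = (I
              - (np.outer(a0, a0.conj()) / (1 + p0)
                 + np.outer(a1, a1.conj()) / (1 + p1)) * (1 + abs(phi) ** 2 / gamma)
              + (phi * np.outer(a0, a1.conj())
                 + phi.conjugate() * np.outer(a1, a0.conj())) / gamma)

Q = I + np.outer(a0, a0.conj()) + np.outer(a1, a1.conj())
assert np.allclose(inv_closed, np.linalg.inv(Q))
```

The formula agrees with a brute-force matrix inversion, as expected.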

The received signal covariance matrix $\mathbf{Q}$, represented as an identity matrix plus the sum of two rank-1 matrices, is given by
$$\mathbf{Q} = \mathbf{I} + \mathbf{a}_0\,\mathbf{a}_0^\dagger + \mathbf{a}_1\,\mathbf{a}_1^\dagger\,. \quad (9.101)$$

The inner product $\phi$ of the vectors $\mathbf{a}_0$ and $\mathbf{a}_1$ is given by
$$\phi = \mathbf{a}_0^\dagger\,\mathbf{a}_1 = \|\mathbf{a}_0\|\,\|\mathbf{a}_1\|\,\alpha\,, \qquad \phi^* = \mathbf{a}_1^\dagger\,\mathbf{a}_0\,. \quad (9.102)$$
Here $\alpha$ represents the normalized inner product between the vectors $\mathbf{a}_0$ and $\mathbf{a}_1$, using the definition
$$\alpha = \frac{\mathbf{a}_0^\dagger\,\mathbf{a}_1}{\|\mathbf{a}_0\|\,\|\mathbf{a}_1\|}\,. \quad (9.103)$$

The received signal versus reference correlation term is given by
$$\left\langle \mathbf{Z}\,\mathbf{x}_0^\dagger \right\rangle = \left\langle \left[\mathbf{a}_0\,\mathbf{x}_0 + \mathbf{a}_1\,\mathbf{x}_1 + \mathbf{N}\right]\mathbf{x}_0^\dagger \right\rangle = \mathbf{a}_0\left\langle \mathbf{x}_0\,\mathbf{x}_0^\dagger \right\rangle = \mathbf{a}_0\,n_s\,. \quad (9.104)$$

The MMSE beamformer $\mathbf{w}$ under this simplified scenario is given by
$$\mathbf{w} = \mathbf{Q}^{-1}\,\mathbf{a}_0\,. \quad (9.105)$$

By substituting the form found in Equation (9.99), the MMSE beamformer is given by
$$\mathbf{w} = \mathbf{a}_0 - \left(\frac{\mathbf{a}_0\,\mathbf{a}_0^\dagger}{1 + \mathbf{a}_0^\dagger\,\mathbf{a}_0} + \frac{\mathbf{a}_1\,\mathbf{a}_1^\dagger}{1 + \mathbf{a}_1^\dagger\,\mathbf{a}_1}\right)\left(1 + \frac{\left|\mathbf{a}_0^\dagger\,\mathbf{a}_1\right|^2}{\gamma}\right)\mathbf{a}_0 + \frac{1}{\gamma}\left(\mathbf{a}_0^\dagger\,\mathbf{a}_1\,\mathbf{a}_0\,\mathbf{a}_1^\dagger + \mathbf{a}_1^\dagger\,\mathbf{a}_0\,\mathbf{a}_1\,\mathbf{a}_0^\dagger\right)\mathbf{a}_0$$
$$= \mathbf{a}_0 - \left(\frac{\mathbf{a}_0^\dagger\,\mathbf{a}_0}{1 + \mathbf{a}_0^\dagger\,\mathbf{a}_0}\,\mathbf{a}_0 + \frac{\mathbf{a}_1^\dagger\,\mathbf{a}_0}{1 + \mathbf{a}_1^\dagger\,\mathbf{a}_1}\,\mathbf{a}_1\right)\left(1 + \frac{\left|\mathbf{a}_0^\dagger\,\mathbf{a}_1\right|^2}{\gamma}\right) + \frac{\mathbf{a}_0^\dagger\,\mathbf{a}_1\,\mathbf{a}_1^\dagger\,\mathbf{a}_0}{\gamma}\,\mathbf{a}_0 + \frac{\mathbf{a}_1^\dagger\,\mathbf{a}_0\,\mathbf{a}_0^\dagger\,\mathbf{a}_0}{\gamma}\,\mathbf{a}_1$$
$$= \left[1 - \frac{\|\mathbf{a}_0\|^2}{1 + \|\mathbf{a}_0\|^2}\left(1 + \frac{|\phi|^2}{\gamma}\right) + \frac{|\phi|^2}{\gamma}\right]\mathbf{a}_0 + \left[\frac{\phi^*\,\|\mathbf{a}_0\|^2}{\gamma} - \frac{\phi^*}{1 + \|\mathbf{a}_1\|^2}\left(1 + \frac{|\phi|^2}{\gamma}\right)\right]\mathbf{a}_1$$
$$= k_0\,\mathbf{a}_0 + k_1\,\mathbf{a}_1\,, \quad (9.106)$$

where $k_0$ and $k_1$ are used for notational convenience and, with $\gamma = 1 + \|\mathbf{a}_0\|^2 + \|\mathbf{a}_1\|^2 + \|\mathbf{a}_0\|^2\,\|\mathbf{a}_1\|^2 - |\phi|^2$, are given by
$$k_0 = 1 - \frac{\|\mathbf{a}_0\|^2}{1 + \|\mathbf{a}_0\|^2}\left(1 + \frac{|\phi|^2}{\gamma}\right) + \frac{|\phi|^2}{\gamma} = \frac{1 + \|\mathbf{a}_1\|^2}{\gamma}$$
$$k_1 = \frac{\phi^*\,\|\mathbf{a}_0\|^2}{\gamma} - \frac{\phi^*}{1 + \|\mathbf{a}_1\|^2}\left(1 + \frac{|\phi|^2}{\gamma}\right) = -\frac{\phi^*}{\gamma}\,. \quad (9.107)$$
It is worth noting that $k_0$ is real while $k_1$ is in general complex.


Substituting the above form for $\mathbf{w}$ into Equation (9.93), the SNR loss for the MMSE beamformer is given by
$$\mathrm{SNR\ loss}_{\rm MMSE} = \frac{\left|\mathbf{w}^\dagger\,\mathbf{a}_0\right|^2}{\|\mathbf{w}\|^2\,\|\mathbf{a}_0\|^2}\,. \quad (9.108)$$

The two terms of interest are $\left|\mathbf{w}^\dagger\,\mathbf{a}_0\right|^2$ and $\|\mathbf{w}\|^2$, given by
$$\left|\mathbf{w}^\dagger\,\mathbf{a}_0\right|^2 = \left|\mathbf{a}_0^\dagger\left(k_0\,\mathbf{a}_0 + k_1\,\mathbf{a}_1\right)\right|^2 = \left|k_0\,\|\mathbf{a}_0\|^2 + k_1\,\phi\right|^2$$
$$= \frac{1}{\gamma^2}\left[\left(1 + \|\mathbf{a}_1\|^2\right)\|\mathbf{a}_0\|^2 - |\phi|^2\right]^2 = \frac{1}{\gamma^2}\left[\left(1 + \|\mathbf{a}_1\|^2\right)\|\mathbf{a}_0\|^2 - \|\mathbf{a}_0\|^2\,\|\mathbf{a}_1\|^2\,|\alpha|^2\right]^2$$
$$= \frac{\|\mathbf{a}_0\|^4}{\gamma^2}\left[\left(1 + \|\mathbf{a}_1\|^2\right) - \|\mathbf{a}_1\|^2\,|\alpha|^2\right]^2 = \frac{\|\mathbf{a}_0\|^4}{\gamma^2}\left[1 + \|\mathbf{a}_1\|^2\left(1 - |\alpha|^2\right)\right]^2\,, \quad (9.109)$$
where $\alpha$ is the normalized inner product from Equation (9.103), and
$$\|\mathbf{w}\|^2 = \mathbf{w}^\dagger\,\mathbf{w} = \left(k_0\,\mathbf{a}_0 + k_1\,\mathbf{a}_1\right)^\dagger\left(k_0\,\mathbf{a}_0 + k_1\,\mathbf{a}_1\right) = k_0^2\,\|\mathbf{a}_0\|^2 + k_0\,k_1\,\phi + k_0\,k_1^*\,\phi^* + |k_1|^2\,\|\mathbf{a}_1\|^2$$
$$= \frac{1}{\gamma^2}\left(\left[1 + \|\mathbf{a}_1\|^2\right]^2\|\mathbf{a}_0\|^2 - 2\left[1 + \|\mathbf{a}_1\|^2\right]|\phi|^2 + |\phi|^2\,\|\mathbf{a}_1\|^2\right)$$
$$= \frac{\|\mathbf{a}_0\|^2}{\gamma^2}\left(\left[1 + \|\mathbf{a}_1\|^2\right]^2 - 2\left[1 + \|\mathbf{a}_1\|^2\right]\|\mathbf{a}_1\|^2\,|\alpha|^2 + \|\mathbf{a}_1\|^4\,|\alpha|^2\right)$$
$$= \frac{\|\mathbf{a}_0\|^2}{\gamma^2}\left[|\alpha|^2 + \left(1 + \|\mathbf{a}_1\|^2\right)^2\left(1 - |\alpha|^2\right)\right]\,. \quad (9.110)$$
Consequently, the SNR loss is given by
$$\mathrm{SNR\ loss}_{\rm MMSE} = \frac{\left|\mathbf{w}^\dagger\,\mathbf{a}_0\right|^2}{\|\mathbf{w}\|^2\,\|\mathbf{a}_0\|^2} = \frac{\left[1 + \|\mathbf{a}_1\|^2\left(1 - |\alpha|^2\right)\right]^2}{|\alpha|^2 + \left(1 + \|\mathbf{a}_1\|^2\right)^2\left(1 - |\alpha|^2\right)}\,. \quad (9.111)$$
In the limit of strong interference, the interfering term $\|\mathbf{a}_1\|$ becomes large, and the SNR loss converges to that of the minimum-interference beamformer described in Equation (9.95),
$$\lim_{\|\mathbf{a}_1\| \to \infty} \mathrm{SNR\ loss}_{\rm MMSE} = \lim_{\|\mathbf{a}_1\| \to \infty} \frac{\left[1 + \|\mathbf{a}_1\|^2\left(1 - |\alpha|^2\right)\right]^2}{|\alpha|^2 + \left(1 + \|\mathbf{a}_1\|^2\right)^2\left(1 - |\alpha|^2\right)} = \lim_{\|\mathbf{a}_1\| \to \infty} \frac{\left[\|\mathbf{a}_1\|^2\left(1 - |\alpha|^2\right)\right]^2}{\left(\|\mathbf{a}_1\|^2\right)^2\left(1 - |\alpha|^2\right)} = 1 - |\alpha|^2\,. \quad (9.112)$$
As an aside, in the case of a single signal of interest and a single interferer, the MMSE beamformer is the maximum SINR beamformer discussed in Section 9.2.4. Consequently, the above analysis is also valid for the maximum SINR beamformer for this particular problem definition.
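The closed form in Equation (9.111) and the strong-interference limit in Equation (9.112) can both be checked numerically. A sketch assuming NumPy, with arbitrary random array responses and two illustrative interference amplitudes (these values are hypothetical, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n_r = 5
a0 = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)

for scale in (1.0, 1e4):                    # moderate and very strong interference
    a1 = scale * (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r))
    Q = np.eye(n_r) + np.outer(a0, a0.conj()) + np.outer(a1, a1.conj())
    w = np.linalg.solve(Q, a0)              # MMSE beamformer, Equation (9.105)

    snr_loss = abs(w.conj() @ a0) ** 2 / (np.linalg.norm(w) ** 2 * np.linalg.norm(a0) ** 2)

    p1 = np.linalg.norm(a1) ** 2
    alpha2 = abs(a0.conj() @ a1) ** 2 / (np.linalg.norm(a0) ** 2 * p1)
    closed = (1 + p1 * (1 - alpha2)) ** 2 / (alpha2 + (1 + p1) ** 2 * (1 - alpha2))
    assert np.isclose(snr_loss, closed)     # Equation (9.111)

# Strong interference: the loss approaches the minimum-interference result 1 - |alpha|^2
assert np.isclose(snr_loss, 1 - alpha2, rtol=1e-3)   # Equation (9.112)
```

At large interference power, the MMSE solution effectively nulls the interferer, recovering the projection-beamformer loss.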

9.4 MIMO performance bounds of suboptimal adaptive receivers

Many of the advantages made possible by MIMO systems, as discussed in Chapter 8, depend strongly upon the details of the receiver implementation [28, 32].
An important motivation for using suboptimal receivers is computational com-
plexity. Some suboptimal receivers were discussed earlier in this chapter. Compu-
tational complexity of various receivers can vary by many orders of magnitude.
The performance of MIMO links is strongly tied to the details of the coding
and modulation in addition to the receiver, and there is no simple method for
estimating overall link performance. The demodulation performance variation
between various coding and receiver combinations can also be dramatic [8, 207].
However, it is desirable to develop a set of bounds that are independent of the
details of the coding. Rather, these bounds assume ideal codes, but potentially
suboptimal spatial receivers. By incorporating various receiver constraints as
part of the MIMO channel, estimates on the performance bounds under these
constraints can be found.
For uninformed transmitter MIMO links, we consider in this section three receivers: the minimum-interference, the MMSE, and the optimal. The optimal receiver
achieves capacity. An additional variable considered in this section is the effect
of estimating the interference-plus-noise covariance matrix. For computational
and logistical reasons, many receivers make the simplifying assumption that the
interference-plus-noise covariance matrix is proportional to the identity matrix.
If a carrier-sense multiple access (CSMA) approach is employed, this simplifying
assumption may have some validity, although it is common in wireless networks
to operate in dynamic interference. In the presence of interference, this simplify-
ing model assumption can have a significant adverse effect upon link performance
compared to the optimal receiver.
The information-theoretic capacity of MIMO systems was discussed in Section
8.3 for a flat-fading environment. As discussed in that section, the bounds on
spectral efficiency can be separated into two classes that are defined by whether
the system has an informed transmitter (in which the transmitter has the channel matrix and a statistical characterization of the external interference) or an uninformed transmitter (in which only the receiver has channel state information). Here, bounds for the uninformed transmitter are considered.
The basic premise of the development of the bounds discussed in this sec-
tion is that the receiver will disregard some of the information available to it.
In particular, by considering a set of new channels defined by the output of
a set of beamformers optimized for each transmitter in turn, the performance
bounds are developed. In principle, an invertible operation on the received signal, such as a beamformer, would have no effect upon the mutual information between the transmitted signal and the observed signal. However, here
it is assumed that the receiver ignores any potential performance gain available
from considering the correlations between beamformer outputs. In particular,
it is assumed that a beamforming receiver can only decode a single signal at
the output of each beamformer. This assumption is not valid for the optimal re-
ceiver or multiple-user type approaches that mix temporal and spatial mitigation
as discussed in Section 9.6 and in References [98, 324, 323, 69]. However, this
limitation is a reasonable approximation to the bounding performance for some
receivers that separate the transmitted signal by using receive beamformers and
ignoring the correlation between noise at the beamformer outputs.5

9.4.1 Receiver beamformer channel


In order to determine the effects of adaptive beamforming techniques, one can incorporate the beamformer as part of the channel [32]. This can be done by considering a set of adaptive receive beamformers in the columns of $\mathbf{W} \in \mathbb{C}^{n_r \times n_t}$, each optimized for a given transmit antenna. If a single temporal sample of the observed signal at the receive array is given by $\mathbf{z} \in \mathbb{C}^{n_r \times 1}$, then the corresponding single sample of the signal vector $\mathbf{y} \in \mathbb{C}^{n_t \times 1}$ at the output of a set of beamformers optimized for each transmitter is given by
$$\mathbf{y} = \mathbf{W}^\dagger\,\mathbf{z} = \mathbf{W}^\dagger\,\mathbf{H}\,\mathbf{s} + \mathbf{W}^\dagger\,\mathbf{n}\,, \quad (9.113)$$

for the transmitted signal $\mathbf{s} \in \mathbb{C}^{n_t \times 1}$ and the external-interference-plus-noise $\mathbf{n} \in \mathbb{C}^{n_r \times 1}$. The dimension at the output of the beamformer is given by the number of transmit antennas rather than the number of receive antennas. This interpretation of the effective channel implies
$$\mathbf{H} \Rightarrow \mathbf{W}^\dagger\,\mathbf{H}\,, \quad (9.114)$$

that is, the beamformer is subsumed into the channel. It is typical for beamform-
ers to attempt to reduce the correlations between beamformer outputs because
they mitigate interference associated with other transmit antennas. However,
there is typically some remaining correlation between beamformer outputs. Similarly, this interpretation implies that the noise-plus-interference covariance matrix becomes
$$\left\langle \mathbf{n}\,\mathbf{n}^\dagger \right\rangle \Rightarrow \mathbf{W}^\dagger \left\langle \mathbf{n}\,\mathbf{n}^\dagger \right\rangle \mathbf{W}\,. \quad (9.115)$$

Depending upon the beamforming approach, the signals of interest may suffer
SNR losses that may be significant, and the noise at the outputs of the beam-
formers may become correlated. The beamformers may or may not attempt to
estimate parameters of the external interference and thus may or may not miti-
gate it.
5 Portions of this section are © 2004 IEEE. Reprinted, with permission, from Reference [32].

To analyze the various performance bounds, the entropies associated with different beamformer models are developed. If the receiver can take into account the correlations between the beamformer outputs, then the entropy is given by the entropy for $\mathbf{y} = \mathbf{W}^\dagger\,\mathbf{z}$ from Equation (8.17) rather than for $\mathbf{z}$, under the replacements
$$\mathbf{R} \Rightarrow \mathbf{W}^\dagger\,\mathbf{R}\,\mathbf{W}$$
$$\mathbf{H}\,\mathbf{P}\,\mathbf{H}^\dagger \Rightarrow \frac{P_o}{n_t}\,\mathbf{W}^\dagger\,\mathbf{H}\,\mathbf{H}^\dagger\,\mathbf{W}\,. \quad (9.116)$$

The resulting bound on spectral efficiency, which is analogous to that developed in Section 8.3 for the uninformed transmitter, is given by
$$h_{\rm bf}(\mathbf{y}|\mathbf{H},\mathbf{R}) = \log_2\left|\pi e\left(\mathbf{W}^\dagger\,\mathbf{R}\,\mathbf{W} + \frac{P_o}{n_t}\,\mathbf{W}^\dagger\,\mathbf{H}\,\mathbf{H}^\dagger\,\mathbf{W}\right)\right| \quad (9.117)$$
$$h_{\rm bf}(\mathbf{y}|\mathbf{s},\mathbf{H},\mathbf{R}) = \log_2\left|\pi e\,\mathbf{W}^\dagger\,\mathbf{R}\,\mathbf{W}\right|\,. \quad (9.118)$$

The resulting capacity is given by
$$c_{\rm bf}\big|_{\rm UT} = \log_2\left|\mathbf{I}_{n_t} + \left(\mathbf{W}^\dagger\,\mathbf{R}\,\mathbf{W}\right)^{-1}\frac{P_o}{n_t}\,\mathbf{W}^\dagger\,\mathbf{H}\,\mathbf{H}^\dagger\,\mathbf{W}\right|\,, \quad (9.119)$$
where the identity matrix has the dimension of the beamformer output.

If $\mathbf{W}$ is invertible (which requires $n_t = n_r$), then
$$c_{\rm bf}\big|_{\rm UT} = \log_2\left|\mathbf{I}_{n_r} + \mathbf{W}^{-1}\,\mathbf{R}^{-1}\,\mathbf{W}^{-\dagger}\,\mathbf{W}^\dagger\,\mathbf{H}\,\mathbf{H}^\dagger\,\mathbf{W}\,\frac{P_o}{n_t}\right| = \log_2\left|\mathbf{I}_{n_r} + \frac{P_o}{n_t}\,\mathbf{R}^{-1}\,\mathbf{H}\,\mathbf{H}^\dagger\right| = c_{\rm UT}\,, \quad (9.120)$$
and the capacity is the same as in the absence of the beamformer. This is not surprising because the effect of the beamformers $\mathbf{W}$ on the channel $\mathbf{H}$ in Equation (9.114) can be reversed if $\mathbf{W}$ is invertible.
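The invariance in Equation (9.120) can be illustrated numerically. A sketch assuming NumPy; the square invertible beamformer, channel, and interference covariance are arbitrary random choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(4)
n_r = n_t = 4                                # invertible W requires a square system
Po_over_nt = 1.0

H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
B = rng.standard_normal((n_r, n_r)) + 1j * rng.standard_normal((n_r, n_r))
R = np.eye(n_r) + 0.1 * (B @ B.conj().T)     # positive definite interference-plus-noise
W = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))  # invertible w.p. 1

def log2det(M):
    return np.linalg.slogdet(M)[1] / np.log(2)

c_ut = log2det(np.eye(n_r) + Po_over_nt * np.linalg.inv(R) @ H @ H.conj().T)
WRW = W.conj().T @ R @ W
c_bf = log2det(np.eye(n_t) + Po_over_nt * np.linalg.inv(WRW) @ W.conj().T @ H @ H.conj().T @ W)
assert np.isclose(c_ut, c_bf)                # Equation (9.120)
```

Any invertible linear front end leaves the determinant, and therefore the capacity, unchanged.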
In the case of a receiver based on beamformers that does not share information
across beamformer outputs, such as MMSE or minimum interference discussed
in Section 9.2, the form of the bound is modified. In this case, there is a separate
beamformer optimized for each transmitter. The interference power that could be
employed to jointly estimate signals instead contributes power to the noise-like
entropy term of the capacity.
We attempt to approximate the effects of ignoring the correlations between
beamformer outputs by evaluating the entropies while ignoring the correlations.
Because knowledge about {s}m is not used by the beamformer to remove inter-
ference for {y}k (for k = m), the entropy for the noise-like component becomes
the sum of entropies assuming independent sources.

The entropy for the $m$th beamformer $h_{{\rm uc},m}(\mathbf{y}|\mathbf{H},\mathbf{R})$ is bounded by
$$h_{{\rm uc},m}(\mathbf{y}|\mathbf{H},\mathbf{R}) \le \log_2\left[\pi e\left(\mathbf{w}_m^\dagger\,\mathbf{R}\,\mathbf{w}_m + \frac{P_o}{n_t}\,\mathbf{w}_m^\dagger\,\mathbf{H}\,\mathbf{H}^\dagger\,\mathbf{w}_m\right)\right]$$
$$= \log_2\left[\pi e\left(\mathbf{w}_m^\dagger\left(\mathbf{R} + \frac{P_o}{n_t}\,\mathbf{H}_m\,\mathbf{H}_m^\dagger\right)\mathbf{w}_m + \frac{P_o}{n_t}\,\mathbf{w}_m^\dagger\,\mathbf{h}_m\,\mathbf{h}_m^\dagger\,\mathbf{w}_m\right)\right]\,, \quad (9.121)$$
where $\mathbf{h}_m$ is the $m$th column of $\mathbf{H}$ and $\mathbf{H}_m$ is the channel matrix with the $m$th column removed. The resulting noise-like entropy $h_{{\rm uc},m}(\mathbf{y}|\mathbf{s},\mathbf{H},\mathbf{R})$ for the $m$th beamformer is given by
$$h_{{\rm uc},m}(\mathbf{y}|\mathbf{s},\mathbf{H},\mathbf{R}) = \log_2\left[\pi e\,\mathbf{w}_m^\dagger\left(\mathbf{R} + \frac{P_o}{n_t}\,\mathbf{H}_m\,\mathbf{H}_m^\dagger\right)\mathbf{w}_m\right]\,. \quad (9.122)$$
Here it is observed that the mean noise-like output power (which includes residual interference signals) of each beamformer is given by
$$\mathbf{w}_m^\dagger\left(\mathbf{R} + \frac{P_o}{n_t}\,\mathbf{H}_m\,\mathbf{H}_m^\dagger\right)\mathbf{w}_m\,. \quad (9.123)$$

The resulting approximate spectral-efficiency bound $c_{\rm uc}$ (which is not the channel capacity in general) for beamformers under the receiver assumption of uncorrelated residuals is defined by
$$c_{\rm uc} = \sum_m^{n_t}\left[h_{{\rm uc},m}(\mathbf{y}|\mathbf{H},\mathbf{R}) - h_{{\rm uc},m}(\mathbf{y}|\mathbf{s},\mathbf{H},\mathbf{R})\right]$$
$$= \sum_m^{n_t}\log_2\left[1 + \left(\mathbf{w}_m^\dagger\left(\mathbf{R} + \frac{P_o}{n_t}\,\mathbf{H}_m\,\mathbf{H}_m^\dagger\right)\mathbf{w}_m\right)^{-1}\frac{P_o}{n_t}\left|\mathbf{w}_m^\dagger\,\mathbf{h}_m\right|^2\right]\,, \quad (9.124)$$
where the beamformer represented by $\mathbf{w}_m$ depends upon the choice of receiver.


As one would intuitively expect, the uninformed transmitter MIMO capacity is an upper bound on the beamformer channel capacity,6 $c_{\rm UT} \ge c_{\rm uc}$. For simplicity, it is assumed that $\mathbf{R} = \mathbf{I}$; the result can be generalized for a nonzero external covariance matrix (see Problem 9.6). In addition, to simplify this evaluation notationally, the amplitude-channel product $\mathbf{A} = \sqrt{P_o/n_t}\,\mathbf{H}$ is employed. The

6 This argument is due to suggestions made by Keith Forsythe.



inequality is demonstrated by
$$c_{\rm UT} = \log_2\left|\mathbf{I} + \frac{P_o}{n_t}\,\mathbf{H}\,\mathbf{H}^\dagger\right| = \log_2\left|\mathbf{I} + \mathbf{A}\,\mathbf{A}^\dagger\right| \ge c_{\rm uc}\,,$$
with
$$c_{\rm uc} = \sum_m^{n_t}\log_2\left[1 + \left(\mathbf{w}_m^\dagger\left(\mathbf{I} + \frac{P_o}{n_t}\,\mathbf{H}_m\,\mathbf{H}_m^\dagger\right)\mathbf{w}_m\right)^{-1}\frac{P_o}{n_t}\left|\mathbf{w}_m^\dagger\,\mathbf{h}_m\right|^2\right]$$
$$= \sum_m^{n_t}\log_2\left[1 + \left(\mathbf{w}_m^\dagger\left[\mathbf{I} + \mathbf{A}\,\mathbf{A}^\dagger - \mathbf{a}_m\,\mathbf{a}_m^\dagger\right]\mathbf{w}_m\right)^{-1}\left|\mathbf{w}_m^\dagger\,\mathbf{a}_m\right|^2\right]\,, \quad (9.125)$$
where $\mathbf{a}_m$ is the amplitude-channel product associated with the $m$th transmitter.


For the $m$th beamformer, the spectral-efficiency bound, denoted here as $[c_{\rm uc}]_m$, can be no larger than the spectral-efficiency bound $[c_{\rm uc}^{\rm MSINR}]_m$ under the assumption of the maximum SINR beamformer (which is equivalent to the MMSE beamformer in this case, as discussed in Section 9.2.4) associated with the $m$th transmitter,

$$[c_{\rm uc}]_m = \log_2\left[1 + \left(\mathbf{w}_m^\dagger\left[\mathbf{I} + \mathbf{A}\,\mathbf{A}^\dagger - \mathbf{a}_m\,\mathbf{a}_m^\dagger\right]\mathbf{w}_m\right)^{-1}\left|\mathbf{w}_m^\dagger\,\mathbf{a}_m\right|^2\right]$$
$$\le [c_{\rm uc}^{\rm MSINR}]_m$$
$$= \log_2\left[\lambda_{\rm max}\left\{\mathbf{I} + \left(\mathbf{I} + \mathbf{A}\,\mathbf{A}^\dagger - \mathbf{a}_m\,\mathbf{a}_m^\dagger\right)^{-1/2}\mathbf{a}_m\,\mathbf{a}_m^\dagger\left(\mathbf{I} + \mathbf{A}\,\mathbf{A}^\dagger - \mathbf{a}_m\,\mathbf{a}_m^\dagger\right)^{-1/2}\right\}\right]$$
$$= \log_2\left[1 + \mathbf{a}_m^\dagger\left(\mathbf{I} + \mathbf{A}\,\mathbf{A}^\dagger - \mathbf{a}_m\,\mathbf{a}_m^\dagger\right)^{-1}\mathbf{a}_m\right]\,, \quad (9.126)$$
where $\lambda_{\rm max}\{\cdot\}$ indicates the largest eigenvalue, such that $\sum_{m=1}^{n_t}[c_{\rm uc}]_m = c_{\rm uc}$ and $\sum_{m=1}^{n_t}[c_{\rm uc}^{\rm MSINR}]_m = c_{\rm uc}^{\rm MSINR}$. Consequently, any spectral-efficiency bound for beamformers under the receiver assumption of uncorrelated channels is bounded by
$$c_{\rm uc}^{\rm MSINR} = \sum_m\log_2\left[1 + \mathbf{a}_m^\dagger\left(\mathbf{I} + \mathbf{A}\,\mathbf{A}^\dagger - \mathbf{a}_m\,\mathbf{a}_m^\dagger\right)^{-1}\mathbf{a}_m\right]\,. \quad (9.127)$$

The uninformed transmitter capacity can be rewritten in its successive interference cancellation form,
$$c_{\rm UT} = \log_2\left|\mathbf{I} + \mathbf{A}\,\mathbf{A}^\dagger\right| = \log_2 \prod_m \frac{\left|\mathbf{I} + \sum_{j=1}^{m-1}\mathbf{a}_j\,\mathbf{a}_j^\dagger + \mathbf{a}_m\,\mathbf{a}_m^\dagger\right|}{\left|\mathbf{I} + \sum_{j=1}^{m-1}\mathbf{a}_j\,\mathbf{a}_j^\dagger\right|}$$
$$= \sum_m\log_2\left|\mathbf{I} + \left(\mathbf{I} + \sum_{j=1}^{m-1}\mathbf{a}_j\,\mathbf{a}_j^\dagger\right)^{-1}\mathbf{a}_m\,\mathbf{a}_m^\dagger\right|$$
$$= \sum_m\log_2\left[1 + \mathbf{a}_m^\dagger\left(\mathbf{I} + \sum_{j=1}^{m-1}\mathbf{a}_j\,\mathbf{a}_j^\dagger\right)^{-1}\mathbf{a}_m\right] = \sum_m [c_{\rm UT}]_m\,, \quad (9.128)$$

where $[c_{\rm UT}]_m$ indicates the $m$th term in the sum. Finally, it can be seen that each term of the largest bound $c_{\rm uc}^{\rm MSINR}$ (found in Equation (9.127)) is still less than the corresponding successive interference cancellation term of the uninformed transmitter capacity,
$$[c_{\rm UT}]_m = \log_2\left[1 + \mathbf{a}_m^\dagger\left(\mathbf{I} + \sum_{j=1}^{m-1}\mathbf{a}_j\,\mathbf{a}_j^\dagger\right)^{-1}\mathbf{a}_m\right] \ge \log_2\left[1 + \mathbf{a}_m^\dagger\left(\mathbf{I} + \mathbf{A}\,\mathbf{A}^\dagger - \mathbf{a}_m\,\mathbf{a}_m^\dagger\right)^{-1}\mathbf{a}_m\right] = [c_{\rm uc}^{\rm MSINR}]_m\,. \quad (9.129)$$

This bound is found by observing that
$$\mathbf{I} + \mathbf{A}\,\mathbf{A}^\dagger - \mathbf{a}_m\,\mathbf{a}_m^\dagger = \left(\mathbf{I} + \sum_{j=1}^{m-1}\mathbf{a}_j\,\mathbf{a}_j^\dagger\right) + \left(\sum_{j=m+1}^{n_t}\mathbf{a}_j\,\mathbf{a}_j^\dagger\right) \quad (9.130)$$
and that, for any complex vector $\mathbf{x}$, positive-definite Hermitian matrix $\mathbf{B}$, and positive-semidefinite Hermitian matrix $\mathbf{C}$,
$$\mathbf{x}^\dagger\left(\mathbf{B} + \mathbf{C}\right)^{-1}\mathbf{x} \le \mathbf{x}^\dagger\,\mathbf{B}^{-1}\,\mathbf{x}\,, \quad (9.131)$$
where the equality is achieved if $\mathbf{C} = \mathbf{0}$.
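The chain of results in Equations (9.127)–(9.131) can be exercised numerically: the successive interference cancellation terms sum exactly to $c_{\rm UT}$, and each dominates the corresponding max-SINR term. A NumPy sketch with an arbitrary random channel (dimensions and seed are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)
n_r, n_t = 6, 4
A = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
I = np.eye(n_r)

def log2det(M):
    return np.linalg.slogdet(M)[1] / np.log(2)

c_ut = log2det(I + A @ A.conj().T)

c_ut_terms = []     # successive interference cancellation terms, Equation (9.128)
c_msinr_terms = []  # per-transmitter max-SINR bound terms, Equation (9.127)
for m in range(n_t):
    a_m = A[:, m]
    B = I + A[:, :m] @ A[:, :m].conj().T                  # I + sum over previous transmitters
    c_ut_terms.append(np.log2(1 + np.real(a_m.conj() @ np.linalg.solve(B, a_m))))
    C = I + A @ A.conj().T - np.outer(a_m, a_m.conj())    # all other transmitters included
    c_msinr_terms.append(np.log2(1 + np.real(a_m.conj() @ np.linalg.solve(C, a_m))))

assert np.isclose(c_ut, sum(c_ut_terms))                  # telescoping identity (9.128)
assert all(ut >= ms - 1e-12 for ut, ms in zip(c_ut_terms, c_msinr_terms))  # (9.129)
assert c_ut >= sum(c_msinr_terms) - 1e-9                  # c_UT >= c_uc^MSINR
```

The per-term inequality is exactly the quadratic-form bound of Equation (9.131), since the "all other transmitters" matrix adds a positive-semidefinite term to the partial-sum matrix.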



9.5 Iterative receivers

Iterative receivers are useful when sample matrix inversion (SMI) is not com-
putationally feasible. Implicitly, the typical sample matrix inversion approach
assumes that the environment is blockwise stationary. In some sense, the un-
derlying assumption of some continuously adapting iterative receivers, such as
recursive least squares (RLS) or least mean squares (LMS), can be a better match
to continuously changing environments. In practice, the choice between using a
sample matrix inversion or an iterative approach is usually driven by logisti-
cal and computational considerations. More thorough investigations of RLS and
LMS algorithms for adaptive spectral filtering can be found in Reference [142].

9.5.1 Recursive least squares (RLS)


The basic concept of the recursive least squares beamformer is to recursively estimate the two components of the MMSE beamformer, which is approximated by the least-squares beamformer under finite sample support. The estimates of the inverse of the covariance matrix and of the cross-correlation matrix are modified at each update [142]. By quoting the result and by using the notation of Section 9.2.3, the MMSE beamformers $\mathbf{W} \in \mathbb{C}^{n_r \times n_t}$ are given by
$$\mathbf{W} = \left(\mathbf{Z}\,\mathbf{Z}^\dagger\right)^{-1}\left(\mathbf{Z}\,\mathbf{X}^\dagger\right)\,. \quad (9.132)$$

Here it is assumed that the reference signal $\mathbf{X} \in \mathbb{C}^{n_t \times n_s}$ is known. Decision feedback extensions to this approach, not discussed here, employ estimates of the transmitted signal as a reference. The spatial covariance matrix $\mathbf{Q} \in \mathbb{C}^{n_r \times n_r}$ and the data-reference cross-covariance matrix $\mathbf{V} \in \mathbb{C}^{n_r \times n_t}$ are given by
$$\mathbf{Q} = \frac{\mathbf{Z}\,\mathbf{Z}^\dagger}{n_s} \quad \text{and} \quad \mathbf{V} = \frac{\mathbf{Z}\,\mathbf{X}^\dagger}{n_s}\,, \quad (9.133)$$
respectively, where $n_s$ is the number of samples in the block of data. For the $m$th update, $\mathbf{Q}_m$ and $\mathbf{V}_m$ indicate estimates of the receive covariance matrix $\mathbf{Q}$ and the cross-covariance matrix $\mathbf{V}$, respectively. A column of $\mathbf{Z}$ is denoted $\mathbf{z}_m \in \mathbb{C}^{n_r \times 1}$ and is the $m$th observation. A column of $\mathbf{X}$ is denoted $\mathbf{x}_m \in \mathbb{C}^{n_t \times 1}$ and is the $m$th vector of known transmitted symbols.
The $(m+1)$th updated estimate of the data-reference cross-covariance matrix $\mathbf{V}_{m+1}$ is given by
$$\mathbf{V}_{m+1} = \frac{m\,\mathbf{V}_m + \mathbf{z}_{m+1}\,\mathbf{x}_{m+1}^\dagger}{m+1}\,. \quad (9.134)$$

So that the notation does not become too cumbersome, the $\hat{\cdot}$ notation for estimated values has been dropped in this discussion. If the observed data vector $\mathbf{z}_m$ is drawn from a stationary distribution, then the estimated data-reference cross-covariance matrix converges to the exact solution $\mathbf{V}$,
$$\lim_{m \to \infty} \mathbf{V}_m = \mathbf{V}\,. \quad (9.135)$$
Similarly, the $(m+1)$th updated estimate of the receive spatial covariance matrix $\mathbf{Q}_{m+1}$ of the received signal is given by
$$\mathbf{Q}_{m+1} = \frac{m\,\mathbf{Q}_m + \mathbf{z}_{m+1}\,\mathbf{z}_{m+1}^\dagger}{m+1}\,, \quad (9.136)$$
and, under the same assumption for the data vector $\mathbf{z}$, the estimated receive spatial covariance matrix converges to the exact solution $\mathbf{Q}$,
$$\lim_{m \to \infty} \mathbf{Q}_m = \mathbf{Q}\,. \quad (9.137)$$

In practice, environments are not completely stationary. Consequently, it is of some value to include a memory limitation. This can be done by including a weighting parameter $\beta$ rather than a function of $m$. The value of the weighting parameter $\beta$ is typically fixed. For large $m$, the older contributions are given a smaller weighting compared to recent data; over successive updates, the weight of older contributions falls exponentially. While this exponential weighting does allow the beamformer to adapt to nonstationary environments, the beamformer will not converge to the exact solution. Under this weighting, the estimation updates for the $(m+1)$th estimates of the data-reference cross-covariance matrix $\mathbf{V}_{m+1}$ and of the receive spatial covariance matrix $\mathbf{Q}_{m+1}$ are given by
$$\mathbf{V}_{m+1} = \frac{\beta\,\mathbf{V}_m + \mathbf{z}_{m+1}\,\mathbf{x}_{m+1}^\dagger}{\beta + 1} \quad (9.138)$$
and
$$\mathbf{Q}_{m+1} = \frac{\beta\,\mathbf{Q}_m + \mathbf{z}_{m+1}\,\mathbf{z}_{m+1}^\dagger}{\beta + 1}\,. \quad (9.139)$$

The value of the weighting parameter $0 \le \beta \le 1$ is typically set near 1, although the exact value needs to be matched to the dynamics of the environment.

Updates for the estimate of the inverse of the covariance matrix can be found directly. From Equation (2.113), the Woodbury formula is given by
$$\left(\mathbf{M} + \mathbf{A}\,\mathbf{B}\right)^{-1} = \mathbf{M}^{-1} - \mathbf{M}^{-1}\,\mathbf{A}\left(\mathbf{I} + \mathbf{B}\,\mathbf{M}^{-1}\,\mathbf{A}\right)^{-1}\mathbf{B}\,\mathbf{M}^{-1}\,. \quad (9.140)$$
By using this relationship, the updated estimate of the inverse of the receive covariance matrix $\mathbf{Q}_{m+1}^{-1}$ can be found,
$$\mathbf{Q}_{m+1}^{-1} = (\beta + 1)\left(\beta\,\mathbf{Q}_m + \mathbf{z}_{m+1}\,\mathbf{z}_{m+1}^\dagger\right)^{-1} = (\beta + 1)\left[\left(\beta\,\mathbf{Q}_m\right)^{-1} - \frac{\left(\beta\,\mathbf{Q}_m\right)^{-1}\mathbf{z}_{m+1}\,\mathbf{z}_{m+1}^\dagger\left(\beta\,\mathbf{Q}_m\right)^{-1}}{1 + \mathbf{z}_{m+1}^\dagger\left(\beta\,\mathbf{Q}_m\right)^{-1}\mathbf{z}_{m+1}}\right]\,. \quad (9.141)$$
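The rank-1 update in Equation (9.141) can be checked against a direct matrix inversion. A NumPy sketch with an arbitrary positive-definite $\mathbf{Q}_m$ (the dimension, $\beta$, and seed are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
n_r, beta = 4, 0.95
C = rng.standard_normal((n_r, n_r)) + 1j * rng.standard_normal((n_r, n_r))
Qm = np.eye(n_r) + 0.5 * C @ C.conj().T / n_r              # Hermitian positive definite
z = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)

# Rank-1 Woodbury update of the inverse, Equation (9.141)
Bi = np.linalg.inv(beta * Qm)
Qinv_update = (beta + 1) * (Bi - (Bi @ np.outer(z, z.conj()) @ Bi)
                            / (1 + np.real(z.conj() @ Bi @ z)))

Q_next = (beta * Qm + np.outer(z, z.conj())) / (beta + 1)  # Equation (9.139)
assert np.allclose(Qinv_update, np.linalg.inv(Q_next))
```

Avoiding the full inversion at each step is what makes the recursive form attractive computationally.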

By combining the results for the cross-correlation matrix update and the inverse of the receive covariance matrix update, the $(m+1)$th updated estimate of the beamformers $\mathbf{W}_{m+1}$ is given by
$$\mathbf{W}_{m+1} = \mathbf{Q}_{m+1}^{-1}\,\mathbf{V}_{m+1}$$
$$= \left[\left(\beta\,\mathbf{Q}_m\right)^{-1} - \frac{\left(\beta\,\mathbf{Q}_m\right)^{-1}\mathbf{z}_{m+1}\,\mathbf{z}_{m+1}^\dagger\left(\beta\,\mathbf{Q}_m\right)^{-1}}{1 + \mathbf{z}_{m+1}^\dagger\left(\beta\,\mathbf{Q}_m\right)^{-1}\mathbf{z}_{m+1}}\right]\left(\beta\,\mathbf{V}_m + \mathbf{z}_{m+1}\,\mathbf{x}_{m+1}^\dagger\right)$$
$$= \left[\mathbf{Q}_m^{-1} - \frac{\mathbf{Q}_m^{-1}\,\mathbf{z}_{m+1}\,\mathbf{z}_{m+1}^\dagger\,\mathbf{Q}_m^{-1}}{\beta + \mathbf{z}_{m+1}^\dagger\,\mathbf{Q}_m^{-1}\,\mathbf{z}_{m+1}}\right]\left(\mathbf{V}_m + \frac{\mathbf{z}_{m+1}\,\mathbf{x}_{m+1}^\dagger}{\beta}\right)$$
$$= \mathbf{W}_m + \mathbf{Q}_m^{-1}\,\frac{\mathbf{z}_{m+1}\,\mathbf{x}_{m+1}^\dagger}{\beta} - \frac{\mathbf{Q}_m^{-1}\,\mathbf{z}_{m+1}\,\mathbf{z}_{m+1}^\dagger\,\mathbf{Q}_m^{-1}}{\beta + \mathbf{z}_{m+1}^\dagger\,\mathbf{Q}_m^{-1}\,\mathbf{z}_{m+1}}\left(\mathbf{V}_m + \frac{\mathbf{z}_{m+1}\,\mathbf{x}_{m+1}^\dagger}{\beta}\right)\,. \quad (9.142)$$

It is often convenient to express the $(m+1)$th update of the beamformer in terms of the error $\mathbf{q}_{m+1}$ between the output of the beamformer and the reference,
$$\mathbf{q}_{m+1} = \mathbf{W}_m^\dagger\,\mathbf{z}_{m+1} - \mathbf{x}_{m+1}\,. \quad (9.143)$$
By substituting $\mathbf{x}_{m+1}^\dagger = \mathbf{z}_{m+1}^\dagger\,\mathbf{W}_m - \mathbf{q}_{m+1}^\dagger$ into the beamformer update of Equation (9.142) and collecting terms, all of the contributions except a single rank-1 correction cancel, leaving the simpler form
$$\mathbf{W}_{m+1} = \mathbf{W}_m - \left(\frac{1}{\beta + \mathbf{z}_{m+1}^\dagger\,\mathbf{Q}_m^{-1}\,\mathbf{z}_{m+1}}\right)\mathbf{Q}_m^{-1}\,\mathbf{z}_{m+1}\,\mathbf{q}_{m+1}^\dagger\,. \quad (9.144)$$
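The recursions above can be collected into a working sketch. The code below (assuming NumPy; the dimensions, noise level, training symbols, and $\beta$ are illustrative choices, not from the text) runs the error-form updates of Equations (9.141) and (9.144) alongside the direct covariance recursions of Equations (9.138) and (9.139), and confirms that the recursive beamformer reproduces $\mathbf{W}_m = \mathbf{Q}_m^{-1}\mathbf{V}_m$ exactly, up to rounding:

```python
import numpy as np

rng = np.random.default_rng(7)
n_r, n_t, n_s, beta = 4, 2, 100, 0.99

A = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
X = np.sign(rng.standard_normal((n_t, n_s)))               # known BPSK training
Z = A @ X + 0.1 * (rng.standard_normal((n_r, n_s)) + 1j * rng.standard_normal((n_r, n_s)))

# Recursive form: track only W and Q^{-1}
W = np.zeros((n_r, n_t), dtype=complex)
Qinv = np.eye(n_r, dtype=complex)
# Direct form: track the exponentially weighted estimates Q and V themselves
Q = np.eye(n_r, dtype=complex)
V = np.zeros((n_r, n_t), dtype=complex)

for m in range(n_s):
    z, x = Z[:, m], X[:, m]
    q = W.conj().T @ z - x                                 # error, Equation (9.143)
    u = Qinv @ z
    g = beta + np.real(z.conj() @ u)
    W = W - np.outer(u, q.conj()) / g                      # weight update, Equation (9.144)
    Qinv = (Qinv - np.outer(u, u.conj()) / g) * ((beta + 1) / beta)  # Equation (9.141)
    Q = (beta * Q + np.outer(z, z.conj())) / (beta + 1)    # Equation (9.139)
    V = (beta * V + np.outer(z, x.conj())) / (beta + 1)    # Equation (9.138)

assert np.allclose(W, np.linalg.solve(Q, V))               # W_m = Q_m^{-1} V_m throughout
assert np.allclose(Qinv, np.linalg.inv(Q))
```

The recursion performs only rank-1 updates per sample, avoiding a full matrix inversion at every step.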

9.5.2 Least mean squares (LMS)


The least mean squares algorithm has a simple interpretation: it attempts to modify the beamformer along a direction that minimizes the error based upon the current observation [345, 344]. As with recursive least squares, it is assumed here that the reference signal $\mathbf{x}$ is known. Decision feedback extensions to this approach, not discussed here, employ estimates of the transmitted signal as a reference. In this discussion, a beamformer for each transmit antenna is constructed independently, so the beamformer associated with each transmit antenna at the $m$th update, $\mathbf{w}_m$, can be considered separately. The error for the current output with the current beamformer is
$$\epsilon_m = \mathbf{w}_m^\dagger\,\mathbf{z}_m - x_m\,, \quad (9.145)$$
where $x_m$ is the transmitted signal for the $m$th update.


The expected squared-error power at the $m$th update is given by
$$\left\langle |\epsilon_m|^2 \right\rangle = \left\langle \left|\mathbf{w}_m^\dagger\,\mathbf{z}_m - x_m\right|^2 \right\rangle\,. \quad (9.146)$$

The goal of the LMS algorithm is to minimize this error. The direction of steepest descent, discussed in Section 2.12, is given by evaluating the additive inverse of the derivative of the error with respect to each of the elements of the beamformer. The $n$th element of the beamformer is denoted $\{\mathbf{w}_m\}_n$. The complete gradient, denoted by $2\nabla_{\mathbf{w}_m^*}$, is a vector of Wirtinger-calculus derivatives (as discussed in Section 2.8.4) with respect to each of the beamformer elements. The gradient of the error is given by
$$2\nabla_{\mathbf{w}_m^*}\left\langle |\epsilon_m|^2 \right\rangle = 2\nabla_{\mathbf{w}_m^*}\left\langle \epsilon_m\,\epsilon_m^* \right\rangle = 2\nabla_{\mathbf{w}_m^*}\left\langle \left(\mathbf{w}_m^\dagger\,\mathbf{z}_m - x_m\right)\left(\mathbf{z}_m^\dagger\,\mathbf{w}_m - x_m^*\right) \right\rangle = 2\left\langle \epsilon_m^*\,\mathbf{z}_m \right\rangle\,, \quad (9.147)$$

so that the difference between the updated and the current receive beamformer is given by
$$\mathbf{w}_{m+1} - \mathbf{w}_m \propto -2\nabla_{\mathbf{w}_m^*}\left\langle |\epsilon_m|^2 \right\rangle = -2\left\langle \epsilon_m^*\,\mathbf{z}_m \right\rangle\,. \quad (9.148)$$

The main contribution of the LMS algorithm is to suggest the relatively questionable approximation that the expected value above can be replaced with the instantaneous squared-error form
$$2\nabla_{\mathbf{w}_m^*}\left\langle |\epsilon_m|^2 \right\rangle \approx 2\nabla_{\mathbf{w}_m^*}|\epsilon_m|^2 = 2\,\epsilon_m^*\,\mathbf{z}_m\,. \quad (9.149)$$

By using the above approximation, the LMS update to the beamformer is given by
$$\mathbf{w}_{m+1} - \mathbf{w}_m \propto -2\,\epsilon_m^*\,\mathbf{z}_m = -2\left(\mathbf{z}_m^\dagger\,\mathbf{w}_m - x_m^*\right)\mathbf{z}_m\,. \quad (9.150)$$

To reduce the sensitivity to noise, a gradient attenuation constant is introduced. If the constant is given by $\mu$, then the updated beamformer $\mathbf{w}_{m+1}$ is given by
$$\mathbf{w}_{m+1} = \mathbf{w}_m - 2\,\mu\,\epsilon_m^*\,\mathbf{z}_m\,. \quad (9.151)$$

Smaller values of the constant $\mu$ will improve the stability of the beamformer update by reducing its sensitivity to noise, while larger values of the constant $\mu$ will enable the beamformer to adapt more quickly. If the value of $\mu$ is smaller than the multiplicative inverse of the largest eigenvalue of the receive covariance matrix, then the beamformer will converge to the MMSE beamformer for a wide-sense stationary environment.
It is sometimes useful to consider the normalized least-mean-squares (NLMS) update. For this version of the update, the constant of proportionality $\mu$ is replaced with $\tilde{\mu}/(\mathbf{z}_m^\dagger\,\mathbf{z}_m)$. This form reduces the sensitivity to the scale of $\mathbf{z}$ when selecting $\tilde{\mu}$.
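A minimal LMS sketch, assuming NumPy and a hypothetical single-transmitter scenario (the array size, step size, noise level, and seed are illustrative choices, not from the text). After adaptation, the weights should approach the block least-squares beamformer:

```python
import numpy as np

rng = np.random.default_rng(8)
n_r, n_s, mu = 4, 2000, 0.02

a0 = (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)) / np.sqrt(2)
x = np.sign(rng.standard_normal(n_s))                      # known BPSK reference
Z = np.outer(a0, x) + 0.3 * (rng.standard_normal((n_r, n_s))
                             + 1j * rng.standard_normal((n_r, n_s)))

w = np.zeros(n_r, dtype=complex)
for m in range(n_s):
    z = Z[:, m]
    err = w.conj() @ z - x[m]                              # epsilon_m, Equation (9.145)
    w = w - 2 * mu * err.conjugate() * z                   # LMS update, Equation (9.151)

# Compare against the block least-squares (MMSE-style) beamformer
w_mmse = np.linalg.solve(Z @ Z.conj().T / n_s, Z @ x.conj() / n_s)
assert np.linalg.norm(w - w_mmse) / np.linalg.norm(w_mmse) < 0.3
```

The residual gap is the misadjustment: the stochastic gradient never settles exactly on the Wiener solution, and the gap grows with the step size $\mu$.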

9.6 Multiple-antenna multiuser detector

The underlying assumption in this chapter is that multiple users are transmit-
ting simultaneously in the same band at the same time. Often the transmitters
use spreading sequences to reduce multiple-access interference. This interference
could be from multiple antennas on a single transmit node, or from multiple
nodes in a network. The significant difference between receivers discussed in this
chapter and those discussed previously is that here the separation in temporal
structure between users is exploited in addition to the differences in spatial re-
sponses. Often it is assumed that these systems are employing a direct-sequence
spread-spectrum technique.
There is some inconsistency in the use of “linear” versus “nonlinear” in dis-
cussions regarding multiple-user receivers. Often these receivers are implemented
as iterative receivers in which the receiver operates on the same block of data
multiple times. An iterative receiver is not linear in some sense. However, if the
receiver employs a linear operator applied to some space, then it is generally denoted a linear receiver. In this case, the various receive states are separated by a hyperplane in some high-dimensional space. Conversely, nonlinear
receivers separate receive states in both angle and amplitude [184]. To further
complicate this discussion, receivers that exploit spatial and temporal structures
simultaneously are linear in each domain.
There is a significant body of literature dedicated to multiuser detectors (MUDs). A large portion of this literature addresses systems with a single receive antenna and multiple cochannel users, each with a single transmit antenna. Significant contributions to this area were made in References [324, 323].
Multiple-antenna multiuser detectors (also denoted multiple-channel multiuser
detectors or MCMUD) have been discussed by a number of authors [98, 38, 335].
While these concepts were developed for cellular networks, they can be applied
to MIMO receivers. In particular, they are well matched to bit-interleaved, coded
modulation approaches [28].

9.6.1 Maximum-likelihood demodulation


The maximum-likelihood formulation introduced in Equation (9.4) is extended here to take advantage of multiple receive antennas. The multiple-channel multiuser detector discussed in References [98, 38] is presented here.7 Under a Gaussian noise model assumption, the maximum-likelihood statistic is given by
$$\max_{\mathbf{R},\mathbf{A}}\, p(\mathbf{Z}|\mathbf{X};\mathbf{R},\mathbf{A}) = \left(\frac{\pi e}{n_s}\right)^{-n_s n_r}\left|\mathbf{Z}\,\mathbf{P}_\mathbf{X}^\perp\,\mathbf{Z}^\dagger\right|^{-n_s}\,, \quad (9.152)$$

7 Portions of this section are © 1999 IEEE. Reprinted, with permission, from Reference [98].

where the matrix
$$\mathbf{P}_\mathbf{X}^\perp = \mathbf{I}_{n_s} - \mathbf{P}_\mathbf{X}\,, \qquad \mathbf{P}_\mathbf{X} = \mathbf{X}^\dagger\left(\mathbf{X}\,\mathbf{X}^\dagger\right)^{-1}\mathbf{X}\,, \quad (9.153)$$
given that $\mathbf{P}_\mathbf{X}^\perp$ projects onto the orthogonal complement of the row space of $\mathbf{X}$. The determinant of $\mathbf{Z}\,\mathbf{P}_\mathbf{X}^\perp\,\mathbf{Z}^\dagger$ is minimized to demodulate the signals for all transmitters jointly.
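The determinant statistic can be exercised with a brute-force search over a tiny BPSK block. This is for illustration only — exhaustive search is exponential in $n_t n_s$, which is exactly why the iterative receivers discussed next are used — and all parameters in the NumPy sketch below are hypothetical. Note that the statistic depends on $\mathbf{X}$ only through its row space, so the transmitted matrix is recovered up to per-row sign flips and row ordering, an ambiguity that must be resolved by other means, such as known training symbols:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(9)
n_r, n_t, n_s = 4, 2, 6                      # tiny block so exhaustive search is feasible

A = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
X_true = np.sign(rng.standard_normal((n_t, n_s)))          # BPSK symbols
Z = A @ X_true + 0.1 * (rng.standard_normal((n_r, n_s))
                        + 1j * rng.standard_normal((n_r, n_s)))

def ml_statistic(X):
    """Determinant statistic |Z P_X^perp Z^dag| from Equation (9.152); smaller is better."""
    P_X = X.conj().T @ np.linalg.pinv(X @ X.conj().T) @ X  # pinv guards rank-deficient X
    P_perp = np.eye(n_s) - P_X
    return np.real(np.linalg.det(Z @ P_perp @ Z.conj().T))

best = min((np.array(bits, dtype=float).reshape(n_t, n_s)
            for bits in product([-1.0, 1.0], repeat=n_t * n_s)),
           key=ml_statistic)

def canon(X):
    # Normalize each row's sign by its first symbol, then sort rows
    return sorted(tuple(row * row[0]) for row in X)

assert canon(best) == canon(X_true)
```

At this noise level, the exhaustive minimization recovers the transmitted block up to the row-space ambiguity described above.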
This result is developed in the following derivation. Under the assumption of Gaussian external interference and noise, the probability density of the received signal is given by
$$p(\mathbf{Z}|\mathbf{X};\mathbf{R},\mathbf{A}) = \frac{1}{|\mathbf{R}|^{n_s}\,\pi^{n_s n_r}}\,e^{-\mathrm{tr}\left\{(\mathbf{Z} - \mathbf{A}\,\mathbf{X})^\dagger\,\mathbf{R}^{-1}\,(\mathbf{Z} - \mathbf{A}\,\mathbf{X})\right\}}\,. \quad (9.154)$$
The maximum likelihood is found by jointly maximizing the probability density over $\mathbf{X}$ and the nuisance parameters of the channel matrix $\mathbf{A}$ and the external-interference-plus-noise covariance matrix $\mathbf{R}$. These estimates were found in Equations (8.128) and (9.60), respectively; those results are quoted here. As presented in Section 8.10, by maximizing the log of the probability distribution with respect to an arbitrary parameter of the channel matrix, the estimate of the channel $\hat{\mathbf{A}}$ is found,
$$\hat{\mathbf{A}} = \mathbf{Z}\,\mathbf{X}^\dagger\left(\mathbf{X}\,\mathbf{X}^\dagger\right)^{-1}\,. \quad (9.155)$$

By substituting this form for the estimate of the channel Â, the probability
density is given by
p(Z|X; R, Â) = |R|^{−ns} π^{−ns·nr} e^{−tr{(Z P⊥_X)† R^{−1} (Z P⊥_X)}} .   (9.156)
Similar to the result found in Equation (9.60), by maximizing the probability
density with the above substitution for the nuisance parameter A, for an arbi-
trary parameter of the interference-plus-noise covariance matrix R, the estimate
R̂ is given by
R̂ = (1/ns) Z P⊥_X Z† .   (9.157)
By substituting this result into the probability density, only the received data
matrix and the possible transmitted signals are left,
p(Z|X; R̂, Â) ∝ |Z P⊥_X Z†|^{−ns} .   (9.158)
Although it is theoretically possible to use the form
|Z P⊥_X Z†|   (9.159)
directly for demodulation, this is computationally very expensive. A more practi-
cal procedure is to pursue an iterative receiver. One approach is to choose a basis
and optimize along each axis of the basis in turn in an alternating projections
optimization approach [76]. By using the result from the previous optimization

step and then optimizing along the next axis, the optimization climbs towards a
peak that is hopefully the global optimum. This iterative receiver can achieve the
maximum-likelihood performance; however, because the optimization criterion is
not guaranteed to be convex, convergence to the global maximum is not guar-
anteed. For many applications, it can be shown empirically that the probability
of convergence to the global maximum is sufficient to warrant the significant
reduction in computational complexity. A natural choice for bases is the signal
transmitted by each individual transmitter. Consequently, the receiver cycles
through the various rows of the transmitted signal matrix X.
The transmitted signal matrix X ∈ C^{nt×ns} can be decomposed into the mth
row, denoted here as x ∈ C^{1×ns}, and the matrix with the mth row removed,
Xm ∈ C^{(nt−1)×ns}. We can construct a reordered version X̃ of the matrix X,
given by

X̃ = ( x ; Xm ) ,   (9.160)

where the semicolon indicates vertical stacking of the row blocks.
Because row space projection operators are invariant to reordering of rows (ac-
tually to any unitary transformation across the rows), the projection matrices for
the matrices X and X̃ are equal, so that P⊥_X = P⊥_X̃, where
P⊥_X̃ = I − X̃† (X̃ X̃†)^{−1} X̃. The
matrix P⊥_Xm that projects onto a subspace orthogonal to the row space of Xm
can be factored into the form

P⊥_Xm = I − X†m (Xm X†m)^{−1} Xm
      = U† U ,   (9.161)

where the rows of U ∈ C^{(ns−nt+1)×ns} form an orthonormal basis for the orthogonal
complement of the row space of Xm. By using the definitions

ZU = Z U†
xU = x U† ,   (9.162)
the data and signal are projected onto a basis orthogonal to the estimates of
signals radiated from the other transmitters. It is useful to note that the two
quadratic forms are the same in the original or the projected bases (the second
equality below uses P⊥_X P_Xm = 0, which holds because the row space of Xm
lies within the row space of X; block-matrix rows are separated by semicolons),

Z P⊥_X Z† = Z (P⊥_Xm + P_Xm) P⊥_X (P⊥_Xm + P_Xm) Z†
          = Z P⊥_Xm P⊥_X P⊥_Xm Z†
          = Z P⊥_Xm [I − X̃† (X̃ X̃†)^{−1} X̃] P⊥_Xm Z†
          = Z P⊥_Xm [I − (x† X†m) (x x†, x X†m ; Xm x†, Xm X†m)^{−1} (x ; Xm)] P⊥_Xm Z†
          = Z P⊥_Xm [I − (x†  0) (α, · ; ·, ·) (x ; 0)] P⊥_Xm Z† ,   (9.163)

where · indicates terms in which we are not interested, and

α = [x x† − x X†m (Xm X†m)^{−1} Xm x†]^{−1}
  = (x P⊥_Xm x†)^{−1}
  = (xU x†U)^{−1} ,   (9.164)

so that from Equation (9.163)

Z P⊥_X Z† = Z P⊥_Xm [I − x† (xU x†U)^{−1} x] P⊥_Xm Z†
          = Z P⊥_Xm Z† − ZU x†U (xU x†U)^{−1} xU Z†U
          = ZU Z†U − ZU P_xU Z†U
          = ZU P⊥_xU Z†U .   (9.165)

Consequently, the determinant that is found in Equation (9.152) can be fac-
tored into a term with and without reference to x,

|ZU P⊥_xU Z†U| = |ZU Z†U − ZU P_xU Z†U|
              = |ZU Z†U| |I_{nr} − ZU P_xU Z†U (ZU Z†U)^{−1}|
              = |ZU Z†U| |I − P_xU Z†U (ZU Z†U)^{−1} ZU|
              = |ZU Z†U| |I_{ns} − P_xU P_ZU| .   (9.166)

Because the first term is free of x, demodulation is performed by minimizing the
second term. Furthermore, because x is a row vector, the second term can be
simplified and interpreted in terms of a beamformer,

|I_{ns} − P_xU P_ZU| = 1 − (xU x†U)^{−1} xU P_ZU x†U
                     = 1 − w† ZU x†U / ns ,   (9.167)
where

w = R̂U^{−1} â ,
R̂U ≡ (1/ns) ZU Z†U = (1/ns) Z P⊥_Xm Z† ,
â = ZU x†U (xU x†U)^{−1}
  = Z P⊥_Xm x† (x P⊥_Xm x†)^{−1} .   (9.168)

The nr × 1 vector, w, contains the receive beamforming weights, R̂U is the


interference-mitigated signal-plus-noise covariance matrix estimate, and â is the
channel estimate associated with x with Xm mitigated temporally. Demodulation
is performed by maximizing the inner product of the beamformer output, w† ZU ,
and the interference-mitigated reference signal, xU .
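A minimal sketch of one per-transmitter step of this receiver, assuming the quantities of Equation (9.168). Because P⊥_Xm = U†U, the quadratic forms can be computed with the projector directly rather than an explicit orthonormal basis U; the function and variable names here are illustrative, not from the text.

```python
import numpy as np

def mcmud_step(Z, x, Xm):
    """One alternating-projections step: the beamformer of Eq. (9.168) and
    the demodulation statistic of Eq. (9.167) for a candidate signal x.

    Z  : (nr, ns)   received data
    x  : (ns,)      candidate signal for the transmitter of interest
    Xm : (nt-1, ns) current signal estimates for the remaining transmitters
    """
    ns = Z.shape[1]
    # Temporal mitigation of the others: P_perp = I - Xm^H (Xm Xm^H)^{-1} Xm
    P_perp = np.eye(ns) - Xm.conj().T @ np.linalg.solve(Xm @ Xm.conj().T, Xm)
    ZU = Z @ P_perp                               # projected data
    xU = x @ P_perp                               # projected reference
    R_hat = ZU @ ZU.conj().T / ns                 # mitigated covariance estimate
    a_hat = ZU @ xU.conj() / np.real(xU @ xU.conj())  # channel estimate for x
    w = np.linalg.solve(R_hat, a_hat)             # w = R_hat^{-1} a_hat
    statistic = np.real(w.conj() @ ZU @ xU.conj()) / ns
    return w, statistic
```

The receiver cycles through the transmitters, maximizing the statistic over candidate signals x for each while the signals of the others are held fixed.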

9.7 Covariance matrix conditioning

For a variety of reasons, such as insufficient training samples or correlated noise,
matrices can be rank deficient or poorly conditioned. That is, the matrix will
have eigenvalues that are exactly or nearly zero. Covariance matrix estimates
that are formed from the outer product of data matrices will in general not be
rank deficient if the number of samples is greater than or equal to the number of
spatial dimensions.
As an example, if it is assumed that in a flat-fading environment a single
spatially correlated signal is present in addition to uncorrelated Gaussian noise,
then the covariance estimate is full rank. With nr receive antennas and ns sam-
ples, the estimate of the spatial receive covariance matrix Q̂ ∈ Cn r ×n r formed
from the noisy data matrix Z ∈ Cn r ×n s is given by
Q̂ = (1/ns) Z Z† ,   (9.169)
where

Z = vs + N . (9.170)

The array response vector and transmitted sequence are denoted v ∈ Cn r ×1


and s ∈ C1×n s , respectively. The variable N ∈ Cn r ×n s represents the Gaussian
noise. Here the noise is normalized such that the expectation of the noise spatial
covariance matrix is the identity matrix,
⟨(1/ns) N N†⟩ = I_{nr} .   (9.171)
The covariance matrix Q is given by

Q = ⟨(1/ns) Z Z†⟩
  = P v v† + I_{nr} ,   (9.172)

where the total noise-normalized received power is given by P = ⟨s s†⟩/ns . It is
assumed that ‖v‖² = 1. The eigenvalues of the covariance matrix Q are given
by

λ1{Q} = P + 1
λ2{Q} = · · · = λ_{nr}{Q} = 1 .   (9.173)

It is important to note that the eigenvalues of the covariance matrix and
estimated covariance matrix are not equal in general,

λm{Q} ≠ λm{Q̂} .   (9.174)

In particular, the smallest eigenvalues of the estimated covariance matrix can be
dramatically different. If the number of samples is greater than or equal to the
number of receivers, ns ≥ nr, then the likelihood of an estimated covariance
matrix with a zero eigenvalue has zero probability. In particular, while the
“small” eigenvalues of the real covariance matrix are all 1, the noise eigenval-
ues of the estimated covariance matrix (ignoring any mixing with the received
signal) are given by the eigenvalues of a Wishart distribution discussed in Sec-
tion 3.5. An example, assuming ns = 16 samples and nr = 8 receive antennas,
of the difference between the eigenvalues is displayed in Figure 9.2. The total
noise-normalized received signal power is 10.

Figure 9.2 Comparison of estimated versus actual eigenvalue distributions. An
ensemble of 10 estimated eigenvalue distributions is displayed. The number of
independent samples is ns = 16, and the number of receive antennas is nr = 8.
The total noise-normalized received signal power is 10.
Depending upon the algorithm in which the eigenvalues will be used, the dif-
ference between the small noise eigenvalues of the estimated versus the real
covariance matrix may or may not be important. If the covariance matrix is in-
verted, then the small eigenvalues of the estimate can have a significant effect.
This effect can motivate the use of regularization to limit the small eigenvalues.
The range of eigenvalues can be much more dramatic in the case of space-time
covariance matrices that are temporally oversampled.
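The eigenvalue spread of Figure 9.2 is easy to reproduce numerically. A sketch of one channel draw follows, assuming for simplicity a fixed unit-norm array response along the first antenna and a binary ±√P sequence (both illustrative choices, not from the text):

```python
import numpy as np

# Setup of Figure 9.2: nr = 8 antennas, ns = 16 samples, one rank-1 signal
# with total noise-normalized power P = 10 in unit-variance complex noise.
rng = np.random.default_rng(1)
nr, ns, P = 8, 16, 10.0

v = np.zeros(nr); v[0] = 1.0                        # unit-norm array response
s = np.sqrt(P) * rng.choice([-1.0, 1.0], size=ns)   # <s s^H>/ns = P exactly
N = (rng.standard_normal((nr, ns)) + 1j * rng.standard_normal((nr, ns))) / np.sqrt(2)
Z = np.outer(v, s) + N                              # Eq. (9.170)

Q_hat = Z @ Z.conj().T / ns                         # Eq. (9.169)
eig_est = np.sort(np.linalg.eigvalsh(Q_hat))[::-1]
eig_true = np.array([P + 1.0] + [1.0] * (nr - 1))   # Eq. (9.173)

# The small estimated eigenvalues scatter well below their true value of 1,
# which is what makes inverting Q_hat without regularization hazardous.
print(np.round(eig_est, 2))
print(eig_true)
```

Repeating this over many draws reproduces the ensemble scatter displayed in the figure.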
One approach to regularizing matrices is to perform an eigenvalue decompo-
sition of the matrix of interest Q,

Q = U D U† , (9.175)

where U is a unitary matrix containing the eigenvectors of the matrix of in-


terest, and diagonal matrix D contains the associated eigenvalues. The matrix
is regularized by imposing a lower limit on the eigenvalues of the matrix of

interest,
{D̃}_{m,m} = {D}_{m,m} ;  {D}_{m,m} > a
{D̃}_{m,m} = a ;  otherwise
a = ε λmax{Q} ,   (9.176)

where a is a constant of regularization. It is often set by some small scalar ε
times the maximum eigenvalue of the matrix of interest. The regularized matrix
Q̃ is then given by

Q̃ = U D̃ U† . (9.177)

While effective, the above approach is relatively computationally expensive. A


relatively inexpensive alternative is to exploit the observation that adding a term
proportional to the identity matrix does not change the eigenvector structure.
Furthermore, by using the fact that the trace of a matrix is the sum of its
eigenvalues combined with the observation that the sum of eigenvalues can be
used as a mediocre approximation to the peak eigenvalue, a reduced-computation
regularized matrix Q̃ can be constructed,

Q̃ = Q + tr{Q} ε I .   (9.178)

In general, the value of diagonal loading is determined by examining the perfor-
mance of the algorithm in which the matrix Q̃ is being used.
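Both regularization approaches amount to a few lines of linear algebra. A sketch follows (NumPy; `eps` plays the role of the small scalar ε in the text, and the function names are illustrative):

```python
import numpy as np

def regularize_eigenfloor(Q, eps):
    """Eigenvalue-floor regularization of Eqs. (9.175)-(9.177):
    decompose Q = U D U^H and raise eigenvalues below a = eps * lambda_max."""
    lam, U = np.linalg.eigh(Q)                 # Hermitian eigendecomposition
    a = eps * lam.max()
    return (U * np.maximum(lam, a)) @ U.conj().T

def regularize_diagonal(Q, eps):
    """Diagonal loading of Eq. (9.178): Q + tr{Q} eps I.  Cheaper, since it
    avoids the eigendecomposition and leaves the eigenvectors unchanged."""
    return Q + eps * np.real(np.trace(Q)) * np.eye(Q.shape[0])
```

Applied to a rank-deficient matrix, both leave the dominant eigenvectors untouched while bounding the smallest eigenvalues away from zero, so the regularized matrix can be inverted safely.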

Problems

9.1 At high SNR, compare the symbol error performance of ML and MAP de-
coding for an unencoded QPSK constellation under the assumption that the
points of the constellation {±1, ±1} have
(a) equal symbol probability: p{±1,±1} = 1/4,
(b) symbol probabilities defined by

p{1,±1} = 2/6
p{−1,±1} = 1/6 .

9.2 At SNR of 20 dB per receive antenna (can assume high SNR), compare the
symbol error performance for a MMSE and MI beamformer for an unencoded
QPSK constellation with equal probabilities for each symbol. Assume a four-
antenna receiver in a line-of-sight environment in the far field with a signal of
interest, and a single interferer of arbitrary power, all with known channels.
Assume that the normalized inner product between the array responses of the
signal of interest and the interferer is 1/√2.

9.3 Develop the estimators expressed in Equations (9.25) and (9.45).



9.4 By employing the Wirtinger calculus, show that Equation (9.74) is the least-
squared-error solution for the estimator of X.
9.5 Evaluate the least-squares error beamformer which minimizes the Frobe-
nius norm squared of the error matrix E defined by
E = W† Z − X , (9.179)
and show that it provides the same solution as the approximate MMSE beam-
former found in Equation (9.74).
9.6 Extend the result in Equation (9.125) to include external Gaussian inter-
ference. Show that performance is still bounded by the uninformed transmitter
capacity.
9.7 Show that the LMS beamformer solution converges to the MMSE solution
in the limit of a large number of samples.
9.8 For a four-antenna receiver observing a known signal with 0 dB SNR per
receive antenna in a block-fading i.i.d. Gaussian channel that is static for at least
50 samples over which the beamformers are estimated, numerically evaluate the
average (over many channel draws) estimated signal error as a function of samples
1 to 50 for
(a) RLS
(b) LMS
(c) estimated MMSE using blocks of 10 samples,
where the RLS and LMS have no knowledge of the channel at the first sample.
9.9 For a four-antenna receiver observing a known signal with 0 dB SNR per
receive antenna with a 10 dB INR per receive antenna interferer in a block-
fading i.i.d. Gaussian channel that is static for at least 50 samples over which
the beamformers are estimated, numerically evaluate the average (over many
channel draws) estimated signal error as a function of samples 1 to 50 for
(a) RLS
(b) LMS
(c) estimated MMSE using blocks of 10 samples,
where the RLS and LMS have no knowledge of the channel at the first sample.
9.10 For a 10-antenna receiver observing a known signal with 0 dB SNR per
receive antenna in a block-fading i.i.d. Gaussian channel that is static for the
period of observation over which the beamformers are estimated, numerically
evaluate the average (over many channel draws) estimated signal error using the
estimated MMSE beamformer of the form

w = (Z Z†/ns + ε I)^{−1} Z X†/ns ,   (9.180)
using blocks of five samples as a function of diagonal loading for the form de-
scribed in Equation (9.178).
10 Dispersive and doubly dispersive channels

Frequency-selective channels are caused by delay spread in the channel. When


delay spread is introduced into the channel model, intersymbol interference is ob-
served at the receiver. Intersymbol interference denotes the effect of the channel
introducing contamination to the current sample from previous samples. If the
communication system does not compensate for this effect, the performance of
the link can be degraded significantly. The adverse effects of delay spread can be
even more dramatic if a strong interferer that is observed by a multiple-antenna
receiver has a channel that is frequency selective. For example, consider a single-
antenna interferer. Without delay spread, a capable multiple-antenna receiver
can mitigate the effects of the interference. In channels with significant delay
spread, the rank of the interference spatial receive covariance matrix can grow
from rank-1 to full rank, because each receive symbol can contain contributions
from multiple transmit symbols at various relative delays propagation through
channels that cause independent spatial responses. Without changing the pro-
cessing approach, this full-rank interference covariance matrix can overwhelm
the communications link.
The frequency-selective channel can be represented in the frequency domain
by employing a channel representation with coefficients at various frequencies,
or in the time domain by employing a channel representation with coefficients at
various delays (delay taps). To complicate the channel problem, if the channel is
not static because of the motion of the transmitter, receiver, or scatterers, then
compensating for delay spread can be more difficult. This dynamic channel can
be represented by explicitly employing time-varying channels or by employing
a channel representation with coefficients at various Doppler-frequency offsets
(Doppler taps). The use of Doppler-frequency offsets covers a number of poten-
tial issues that are not technically caused by the Doppler effect. These include
local oscillator (or frequency synthesizer) frequency offsets and low-frequency
local oscillator phase noise. We will not be precise in differentiating these effects
because they look similar from a channel and processing perspective. A channel
with significant delay spread and Doppler-frequency spread is denoted doubly
dispersive.

10.1 Discretely sampled channel issues

To represent a doubly dispersive channel, one can employ a channel that is a


continuous channel as a function of time and frequency. This channel can be
approximated well (although not exactly in general) by a finite number of taps
in delay and Doppler frequency. For a channel with a limited delay range for a
bandwidth-limited signal, the number of channel delay taps is denoted nd . For a
channel with limited Doppler frequency range, the number of frequency taps is
denoted nf . Because a bandwidth-limited signal is not temporally limited and
because a temporally limited signal is not bandwidth limited, these constraints
are intrinsically incompatible. However, this is a typical problem in communica-
tions, and as a practical approximation for many problems, this formulation can
work well. In general, all channel models are approximations. Similarly, a num-
ber of taps can be employed for delay and Doppler-frequency processing. The
numbers of taps are denoted nδ and nν , respectively. To be clear, the numbers
of significant channel and processing taps are not typically equal.
For the sake of introduction, consider a static SISO channel. For a transmitted
complex baseband signal s(t) ∈ C, a received complex baseband signal z(t) ∈ C,
and infinite-bandwidth channel impulse response h̃(t) ∈ C in additive Gaussian
noise n(t) ∈ C, the receive signal z(t) as a function of time is given by the
convolution of the transmitted signal and the channel plus noise. If the channel
contains some set of discrete scatterers with amplitude am ∈ C at relative delay
τm ∈ R, then the channel can be described by

z(t) = ∫ dτ h̃(τ) s(t − τ) + n(t)
h̃(τ) = Σ_m a_m δ(τ − τ_m) .   (10.1)

If the channel representation h̃(τ ) is constructed from delta functions associated


with point scatterers, it can support signals with infinite bandwidth. In general,
it is not possible to represent this channel with a discrete regularly sampled chan-
nel model, which requires a finite channel spectral support. The solution is to
represent the channel with the same spectral support as the complex signal s(t),
which we assume has bandwidth B (including both positive and negative fre-
quencies). By assuming that the noise and signal have the same spectral support
(bandwidth B), the received signal can be represented in the spectral domain
with the following temporal versus spectral correspondences

z(t) ↔ Z(f )
s(t) ↔ S(f )
n(t) ↔ N (f )
h̃(t) ↔ H̃(f ) . (10.2)

Consequently, the frequency domain version of Equation (10.1) is given by


Z(f ) = H̃(f ) S(f ) + N (f ) . (10.3)
If the spectral support of the signal S(f ) is limited to bandwidth B, then the
signal is not changed by applying a perfect filter of bandwidth B (including both
positive and negative frequencies), so that
 
S(f) = θ(f/B) S(f) ,   (10.4)
where the function θ(x) is 1 for −1/2 ≤ x < 1/2 and zero otherwise. Conse-
quently, the frequency-domain version of the channel model in Equation (10.3)
can be written as
 
Z(f) = H̃(f) θ(f/B) S(f) + N(f) ,   (10.5)
which corresponds to

z(t) = ∫ dτ h(τ) s(t − τ) + n(t)

h(τ) = ∫ df e^{i 2π τ f} H̃(f) θ(f/B)
     = ∫ dν h̃(ν) B sinc([τ − ν] B)
     = ∫ dν Σ_m a_m δ(ν − τ_m) B sinc([τ − ν] B)
     = Σ_m a_m B sinc([τ − τ_m] B) ,   (10.6)

where ν is a dummy variable corresponding to relative delay. Given this


bandwidth-limited channel impulse response, a discrete version of the channel
model can be constructed,

z(t) = Σ_m h_m s(t − m Ts) + n(t) ,   (10.7)

where hm = Ts h(m Ts ) indicates a discrete bandwidth-limited representation of


the continuous complex channel attenuation associated with the mth delay tap
of the channel, and Ts ≤ 1/B is the sample period. For many problems, it is as-
sumed that the channel that is provided for analyses is from a bandwidth-limited
sampled estimate, so this discussion is unnecessary. The issue often arises when
channels are constructed from explicit physical channel models with arbitrary
delays.
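The bandwidth-limited tap construction of Equations (10.6)-(10.7) can be sketched directly, assuming Nyquist-rate sampling Ts = 1/B (and noting that `np.sinc(x)` is sin(πx)/(πx), matching the sinc convention here):

```python
import numpy as np

def channel_taps(amps, delays, B, n_taps):
    """Discrete taps h_m = Ts h(m Ts) for a discrete-scatterer channel,
    using h(tau) = sum_m a_m B sinc((tau - tau_m) B) from Eq. (10.6).

    amps, delays : scatterer amplitudes a_m and delays tau_m
    B            : signal (and channel) bandwidth; sample period Ts = 1/B
    """
    Ts = 1.0 / B
    t = np.arange(n_taps) * Ts
    # With Ts * B = 1, the taps are a sinc interpolation of the scatterers
    return sum(a * np.sinc((t - tau) * B) for a, tau in zip(amps, delays))
```

A scatterer located exactly on a sampling instant produces a single nonzero tap, while a scatterer between sampling instants leaks into many taps, which is why an off-grid physical delay formally requires an infinite number of channel taps.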
In an approach that is formally similar to the discussion above, channels may
have Doppler-frequency spread rather than delay spread. Furthermore, there is a
discretized version of the effects of Doppler-frequency spread. Instead of limited
bandwidth, the channel must have limited temporal extent to satisfy Nyquist in
the Doppler-frequency domain.

While nearly all modern communication systems use sampled signals, there are
some subtleties to be considered. As an example, consider a physical channel with
an arbitrary delay relative to signal sampling. In general, it will take an infinite
number of channel taps to represent the channel. For many problems, this will
have little effect on the analysis because it will only take a few channel taps to
provide a sufficiently accurate approximation. However, for some problems that
require precise representations (often these are found in theoretical analyses),
misleading results can be generated.

Sampled doubly dispersive channels


In sampled representations of doubly dispersive channels (that is, a channel that
induces delay and frequency spread), there is an intrinsic problem. In order to
satisfy Nyquist in time, a bandwidth-limited signal is required. In general, a
bandwidth-limited signal implies a signal of infinite temporal extent. In order to
satisfy Nyquist in frequency, a temporally limited signal is required. This tempo-
rally limited signal implies a signal of infinite spectral extent. Consequently, theo-
retically, sampled doubly dispersive channels are problematic. However, practi-
cally, discrete doubly dispersive channel representations are useful. With a suf-
ficient number of samples, the errors caused by limited spectral and temporal
extents can be made small.

10.2 Noncommutative delay and Doppler operations

The approaches used to implement delay and Doppler offsets observed through-
out this chapter ignore the effects of noncommutative delay and Doppler opera-
tors. Because a dense set of delay and Doppler taps is assumed in the processing,
the approach is not particularly sensitive to this oversight. However, when one is
attempting to use sparse sets of delay and Doppler taps, more care is required.
For a delay shift d and Doppler-frequency shift f , the effects on a signal s(t)
are sometimes approximated by the operation

ei 2π f t s(t − d) . (10.8)

Two assumptions were used in this formulation. First, the velocity difference is
small enough that the frequency offset can be described by a frequency shift.
Second, the delay-shifting operation is applied before the frequency-shifting op-
eration. This choice was arbitrary. A useful model for considering the frequency
shift is to induce the frequency shift via a time dilation¹ of 1 + ε. With independent
local oscillators, the time dilation is caused by one clock simply running faster
than another. Consider the operators Td{·} and Fε{·} that delay time by d and
dilate time by 1 + ε, respectively,

Td{s(t)} = s(t − d)
Fε{s(t)} = s([1 + ε] t) .   (10.9)

¹ We are not defining time dilation here in the special relativity sense.

Note that the operators do not in general commute,

Td{Fε{s(t)}} ≠ Fε{Td{s(t)}}
s([1 + ε][t − d]) ≠ s([1 + ε] t − d) .   (10.10)

However, if the product of the delay spread and the Doppler-frequency spread is
small, then the difference between the two operator orderings is small.
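A quick numerical illustration of Equation (10.10), using an arbitrary narrowband test signal (the waveform and the parameter values are illustrative assumptions):

```python
import numpy as np

# Delay-then-dilate versus dilate-then-delay, cf. Eq. (10.10).
s = lambda t: np.exp(1j * 2 * np.pi * 0.1 * t)   # example complex exponential
t = np.linspace(0.0, 10.0, 1001)
d, eps = 0.3, 1e-3                               # delay d, dilation 1 + eps

fe_of_td = s((1 + eps) * t - d)                  # F_eps{ T_d{ s(t) } }
td_of_fe = s((1 + eps) * (t - d))                # T_d{ F_eps{ s(t) } }

# The time arguments differ by exactly eps * d, so the worst-case phase
# mismatch is 2*pi*0.1*eps*d radians: negligible when eps * d is small.
mismatch = np.max(np.abs(fe_of_td - td_of_fe))
print(mismatch)
```

The mismatch is nonzero, confirming the noncommutativity, yet orders of magnitude below the signal amplitude because the delay-dilation product ε·d is small.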

10.3 Effect of frequency-selective fading

Here we consider a static channel with delay spread. For multiple-antenna re-
ceivers, the effect of delay spread can cause the rank of the receive spatial co-
variance matrix to increase. To demonstrate this effect, consider the following
simple two-tap channel model of a transmitted signal s(t) for the received signal
z(t) ∈ Cn r ×1 as a function of time t impinging upon an array

z(t) = h0 s(t) + hτ s(t − τ ) + n(t) , (10.11)

where h0 ∈ Cn r ×1 and hτ ∈ Cn r ×1 are the receive-array responses for the first


and second arriving wavefronts. The complex additive noise is given by n(t) ∈
Cn r ×1 . The second wavefront is delayed by time τ . The average transmitted
power P is given by

P = ⟨s(t) s∗(t)⟩ .   (10.12)

The units of power are selected so that the spatial covariance of the thermal
noise is assumed to be given by
⟨n(t) n†(t)⟩ = I ,   (10.13)

so that noise power per receive antenna is 1. The receive spatial covariance matrix
Q ∈ Cn r ×n r is given by
Q = ⟨z(t) z†(t)⟩
  = h0 h†0 ⟨s(t) s∗(t)⟩ + h0 h†τ ⟨s(t) s∗(t − τ)⟩
  + hτ h†0 ⟨s(t − τ) s∗(t)⟩ + hτ h†τ ⟨s(t − τ) s∗(t − τ)⟩ + I
  = P h0 h†0 + P ρτ h0 h†τ + P ρ∗τ hτ h†0 + P hτ h†τ + I ,   (10.14)

where the autocorrelation parameter ρτ is given by


ρτ = ⟨s(t) s∗(t − τ)⟩ / P .   (10.15)

Temporally unresolved multipath scattering


In the case of unresolved multipath, the delay τ approaches zero, so that the
transmitted signal and its slightly delayed version are approximately equal,
s(t) ≈ s(t − τ). The receive spatial covariance matrix Q becomes a rank-1
matrix plus the identity matrix,

Q → P (h0 + hτ ) (h0 + hτ )† + I . (10.16)

The eigenvalues {λ1 , λ2 , . . . , λn r } of the spatial covariance matrix are given by

λ1 = P (h0 + hτ )† (h0 + hτ ) + 1 (10.17)

and

λm = 1 for m > 1. (10.18)

Consequently, even though there are multiple channel paths, there is a single
large signal eigenvalue.

Temporally resolved multipath scattering


In the case of resolved multipath, the relative delay is large enough so that the
received data appear to contain two signal versions. The transmitted signals are
approximately independent because the autocorrelation between the two delays
is small. For the sake of discussion, consider a scenario in which the autocorre-
lation ρτ at some large delay τ is zero to a good approximation,

ρτ ≈ 0 . (10.19)

From Equation (10.14), the receiver spatial covariance matrix Q is then approx-
imately given by

Q ≈ P h0 h†0 + P hτ h†τ + I . (10.20)

The mth eigenvalue of the receiver spatial covariance matrix λm {Q} is given by

λm {Q} ≈ P λm {h0 h†0 + hτ h†τ } + 1 . (10.21)

For two-tap channels with taps that are well separated in delay, the eigenvalues
are given by

λ1{Q} = 1 + P [ ‖h0‖² + ‖hτ‖² + √( (‖h0‖² − ‖hτ‖²)² + 4 |h†0 hτ|² ) ] / 2
λ2{Q} = 1 + P [ ‖h0‖² + ‖hτ‖² − √( (‖h0‖² − ‖hτ‖²)² + 4 |h†0 hτ|² ) ] / 2
λm{Q} = 1 ;  m ∈ {3, . . . , nr} ,   (10.22)

where Equation (2.85) has been employed. For notational convenience, we will
make the following definitions. The normalized inner product between the array

responses at the different delays is given by η, and the ratio of the norms of the
array responses is given by γ,

η = |h†0 hτ| / (‖h0‖ ‖hτ‖)
γ = ‖hτ‖ / ‖h0‖ .   (10.23)

By using these definitions, the eigenvalues λ1{·}, λ2{·}, and the rest λm{·} of
the receive spatial covariance matrix are given by

λ1{Q} = 1 + P ‖h0‖² [ 1 + γ² + √( (1 − γ²)² + 4 γ² η² ) ] / 2
λ2{Q} = 1 + P ‖h0‖² [ 1 + γ² − √( (1 − γ²)² + 4 γ² η² ) ] / 2
λm{Q} = 1 ;  m ∈ {3, . . . , nr} .   (10.24)

In the special case of equal array response norms so that γ = 1, the first two
eigenvalues are given by

{λ1{Q}, λ2{Q}} = 1 + P ‖h0‖² (1 ± η) .   (10.25)

The ratio of the second to the first eigenvalue in the high-power limit is given by

λ2{Q} / λ1{Q} ≈ (1 − η) / (1 + η) .   (10.26)

In another special case, if array response norms are not equal, but the array
responses are approximately orthogonal so that η ≈ 0, then the first two eigen-
values are given by

λ1{Q} = 1 + P ‖h0‖²
λ2{Q} = 1 + P ‖h0‖² γ² .   (10.27)

The approximate orthogonality assumption may not be bad in situations in which
the channels are random and the number of antennas is large. The ratio of the
second to the first eigenvalue in the high-power limit is given by

λ2{Q} / λ1{Q} ≈ γ² .   (10.28)
Here, given resolvable delay spread in the received signal, it is seen in Equa-
tions (10.26) and (10.28) that the rank of the receive covariance matrix for a
single transmitter increases from rank-1 to rank-2. As the number of temporally
resolvable multipath components increases, so does the rank of the receive spa-
tial covariance matrix. This fact has implications for using the receiver spatial
degrees of freedom for interference mitigation.
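The closed-form eigenvalues of Equation (10.24) are straightforward to check against a direct eigendecomposition of the covariance in Equation (10.20). A sketch with random channel draws (the values of nr and P are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
nr, P = 6, 100.0
h0 = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)
ht = rng.standard_normal(nr) + 1j * rng.standard_normal(nr)

# Resolved two-tap covariance, Eq. (10.20): rank-2 signal plus identity noise
Q = P * np.outer(h0, h0.conj()) + P * np.outer(ht, ht.conj()) + np.eye(nr)
lam = np.sort(np.linalg.eigvalsh(Q))[::-1]

n0sq = np.linalg.norm(h0) ** 2
eta = abs(h0.conj() @ ht) / (np.linalg.norm(h0) * np.linalg.norm(ht))  # Eq. (10.23)
gamma = np.linalg.norm(ht) / np.linalg.norm(h0)
root = np.sqrt((1 - gamma**2) ** 2 + 4 * gamma**2 * eta**2)
lam1 = 1 + P * n0sq * (1 + gamma**2 + root) / 2    # Eq. (10.24)
lam2 = 1 + P * n0sq * (1 + gamma**2 - root) / 2

print(lam[:2], lam1, lam2)    # lam[2:] remain at the noise level of 1
```

Two eigenvalues sit far above the noise floor while the remaining nr − 2 stay at 1, which is the rank growth from rank-1 to rank-2 discussed above.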

10.4 Static frequency-selective channel model

From Equation (8.2), the received data vector z(t) using the standard flat-fading
MIMO channel signal model is given by

z(t) = H s(t) + n(t) . (10.29)

A dispersive channel is one that has temporally resolvable delay spread. This
induces frequency-selective channel attenuation. As an extension of Equation
(10.7), for a bandwidth-limited signal and a channel with a finite delay range, the
frequency-selective channel characteristics are incorporated by including channel
taps indicated by delay τm ,

z(t) = Σ_{m=1}^{nd} H_{τm} s(t − τm) + n(t) ,   (10.30)

where Hτ m indicates the channel matrix at the mth delay, and τm the nd re-
solvable delays. In general, a set of physical delay offsets that are not matched
to the regularly sampled delay offsets will require an arbitrarily large number of
sample delays to represent the channel perfectly. However, given a moderate set
of sample delays τm , a reasonably accurate frequency-selective channel can be
constructed.
For a channel represented by nd delays, the space-time channel matrix H̃ ∈
Cn r ×(n t ·n d ) is given by
 
H̃ = [ H_{τ1}  H_{τ2}  · · ·  H_{τ_{nd}} ] .   (10.31)

Similarly, the matrix of the transmitted signal at the nd delays s̃(t) ∈ C(n t · n d )×1
is given by
s̃(t) = ( s(t − τ1)
         s(t − τ2)
           ⋮
         s(t − τ_{nd}) ) .   (10.32)
Consequently, the received signal is given by

z(t) = Σ_m H_{τm} s(t − τm) + n(t)
     = H̃ s̃(t) + n(t) .   (10.33)
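Equations (10.31)-(10.33) amount to block-stacking, which a few NumPy lines make explicit (the function name and dimensions are illustrative):

```python
import numpy as np

def stacked_channel(H_taps, s_delayed):
    """Form H_tilde, s_tilde, and the noiseless received vector of
    Eqs. (10.31)-(10.33).

    H_taps    : list of nd channel matrices H_{tau_m}, each (nr, nt)
    s_delayed : list of nd transmit vectors s(t - tau_m), each (nt,)
    """
    H_tilde = np.hstack(H_taps)           # Eq. (10.31): nr x (nt * nd)
    s_tilde = np.concatenate(s_delayed)   # Eq. (10.32): (nt * nd,)
    return H_tilde, s_tilde, H_tilde @ s_tilde
```

The stacked product H̃ s̃(t) reproduces the tap-sum form of Equation (10.30) exactly.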

10.5 Frequency-selective channel compensation

As discussed in Chapter 9, there are two different perspectives for compensating


for channel distortions. One can modify the model to match the data, or one
can modify the data to match the model. Under the modify-the-data category,

the typical approach for compensating for frequency-selective fading in SISO


channels is equalization that introduces a tap delay line into the processing.
This approach can be extended to the multiple-antenna application by using
what is denoted either as space-time adaptive equalization (STAE) or space-
time adaptive processing (STAP).
Another approach for compensating for frequency-selective channels is to em-
ploy orthogonal-frequency-division multiplexing (OFDM). Either data-modifying
or model-modifying approaches can employ an OFDM signaling structure. There
are direct extensions to OFDM for multiple-antenna systems. For either ap-
proach, it is implicitly assumed that the channel does not vary significantly
during the coherent processing interval. In the case of OFDM, the typical ap-
proach is to apply narrowband multiple-antenna processing within each OFDM
carrier.

10.5.1 Eigenvalue distribution of space-time covariance matrix


In general, adaptive processing can be used to compensate for many potential
distortions of the transmitted signal. The most common is the frequency-selective
fading induced by resolvable delay spread in the channel. By defining the received
signal data matrix Zτ ∈ Cn r ×n s distorted by a delay τ ,

Zτ = [ z(0 Ts − τ)  z(1 Ts − τ)  z(2 Ts − τ)  · · ·  z([ns − 1] Ts − τ) ] ,   (10.34)

for a regularly sampled signal with sample period Ts , a space-time data matrix
Z̃ ∈ C(n r ·n δ )×n s is constructed,
Z̃ = ( Z_{0·δτ}
      Z_{1·δτ}
      Z_{2·δτ}
        ⋮
      Z_{(nδ−1)·δτ} ) .   (10.35)
As a reminder, we use nδ here rather than nd because nδ indicates the number of
delays used in the processing rather than that required to represent the channel
with some accuracy.
Here there is potentially some confusion because there is temporal sampling
both along the traditional temporal sampling dimension which is encoded along
the rows of space-time data matrix Z̃ and in delay which is mixed with the
receive antennas along the columns of space-time data matrix Z̃.
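A sketch of building Z̃ from a sampled data block follows, assuming the tap spacing equals the sample period and that samples before the block are replaced by zeros (both simplifying assumptions; the function name is illustrative):

```python
import numpy as np

def space_time_stack(Z, n_delta):
    """Stack n_delta delayed copies of the data matrix Z as in Eq. (10.35).

    Z : (nr, ns) received data sampled at Ts, tap spacing delta_tau = Ts.
    Samples before the start of the block are zero-padded, a common
    edge-handling simplification.
    Returns Z_tilde of shape (nr * n_delta, ns).
    """
    nr, ns = Z.shape
    blocks = []
    for m in range(n_delta):
        Zm = np.zeros_like(Z)
        Zm[:, m:] = Z[:, : ns - m]        # Z delayed by m samples
        blocks.append(Zm)
    return np.vstack(blocks)
```

The space-time covariance estimate of Equation (10.36) is then formed from the outer product of this stacked matrix with itself.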
One of the reasons that this structure is interesting is that it can be used
to compensate for the eigenvalue spread observed in the spatial covariance ma-
trix in environments with resolvable delay spread. For the example of a single
transmitter in an environment with resolvable delay spread, the fraction of non-
noise-level eigenvalues approaches 1/nr as the number of delays and samples

becomes large.² The space-time covariance Q̃ ∈ C^{nr·nδ × nr·nδ} is given by


Q̃ = (1/ns) ⟨Z̃ Z̃†⟩ .   (10.36)
For the example of a single transmitter, the delay-dependent channel matrix Hτ ,
from Equation (10.30), collapses to a vector hτ ∈ Cn r ×1 as a function of delay
τ . Consequently, the received signal, under the assumption of a sampled channel
matrix, is given by

z(t) = Σ_m h_{τm} s(t − τm) + n(t) ,   (10.37)

where s(t) is the transmitted signal.


The ability to mitigate interference is related to the fraction of the total space
occupied by the interference as determined by the distribution of eigenvalues. It
will be useful to consider the discrete Fourier transform along the delay dimen-
sion. This transform does not affect the eigenvalue distribution. The eigenvalues
of the space-time covariance matrix λm {Q̃} are a function of the delay spread
and the number of delays used in the construction of Z̃. The mth eigenvalue of
the space-time covariance matrix is given by
\begin{align}
\lambda_m\!\left\{\tilde{\mathbf{Q}}\right\} &= \frac{1}{n_s}\,\lambda_m\!\left\{\left\langle \tilde{\mathbf{Z}}\,\tilde{\mathbf{Z}}^\dagger \right\rangle\right\} \notag\\
&= \frac{1}{n_s}\,\lambda_m\!\left\{\mathbf{U} \left\langle \tilde{\mathbf{Z}}\,\tilde{\mathbf{Z}}^\dagger \right\rangle \mathbf{U}^\dagger\right\} \notag\\
&= \frac{1}{n_s}\,\lambda_m\!\left\{(\mathbf{F}^\dagger_{n_\delta} \otimes \mathbf{I}_{n_r}) \left\langle \tilde{\mathbf{Z}}\,\tilde{\mathbf{Z}}^\dagger \right\rangle (\mathbf{F}_{n_\delta} \otimes \mathbf{I}_{n_r})\right\}, \tag{10.38}
\end{align}
ns
where U indicates some unitary matrix, and Fn δ is the discrete Fourier transform
matrix of size nδ with unitary normalization; thus, the form F†n δ ⊗ In r is unitary.
Here the observation that the eigenvalue distribution of a matrix is invariant
under arbitrary unitary transformations is employed.
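This invariance is easy to verify numerically. The following sketch (numpy assumed; the dimensions are illustrative, not taken from the text) builds a sample space-time covariance from random data and confirms that the unitary transform $\mathbf{F}^\dagger_{n_\delta} \otimes \mathbf{I}_{n_r}$ of Equation (10.38) leaves its eigenvalues unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_delta, n_s = 4, 3, 200  # illustrative sizes: antennas, delays, samples

# Random complex data standing in for the space-time data matrix Z-tilde
Z = (rng.standard_normal((n_r * n_delta, n_s))
     + 1j * rng.standard_normal((n_r * n_delta, n_s)))
Q = (Z @ Z.conj().T) / n_s  # sample space-time covariance, as in Eq. (10.36)

# Unitary DFT along the delay dimension, Kronecker-multiplied with I_{n_r}
F = np.fft.fft(np.eye(n_delta), norm="ortho")  # unitary DFT matrix
U = np.kron(F.conj().T, np.eye(n_r))

ev_Q = np.sort(np.linalg.eigvalsh(Q))
ev_UQU = np.sort(np.linalg.eigvalsh(U @ Q @ U.conj().T))
print(np.allclose(ev_Q, ev_UQU))  # True: eigenvalues unchanged
```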

Covariance matrix rank by using continuous approximation

In this section, we show that the fraction of the noise-free space-time covariance space occupied by the interference asymptotically approaches $1/n_r$. For discussion, consider a data matrix in which the noise is zero. The set of $n_r$ rows of the space-time data matrix $\tilde{\mathbf{Z}}$ associated with delay $\tau$ is given by the vector $\mathbf{z}(t-\tau)$ evaluated at the sampled values of time $t$.
as a function of sampled values of time t. To determine the asymptotic limit
of the eigenvalue distribution of the space-time covariance matrix, we will con-
sider the limiting continuous form of the Fourier transform along the channel
delays. The limiting case will provide a channel formulation that allows for an
infinite delay range and an infinitesimal channel sampling period. For a single
transmitter propagating through a frequency-selective channel, this data vector
2 The authors would like to thank Shawn Kraut for his thoughtful comments and
suggestions on this topic area.
is given by the convolution of the transmitted signal with the channel,

$$ \mathbf{z}(t-\tau) = \int dq\, \mathbf{h}(q)\, s(t-\tau-q), \tag{10.39} $$

where $\mathbf{h}(q) \in \mathbb{C}^{n_r \times 1}$ is the channel response as a function of delay. In the continuous limit, the inverse Fourier transform of the space-time data matrix along the delay space, $(\mathbf{F}^\dagger_{n_\delta} \otimes \mathbf{I}_{n_r})\, \tilde{\mathbf{Z}}$, is associated with the continuous form
 
\begin{align}
\mathcal{F}^{-1}_\tau\{\mathbf{z}(t-\tau)\} &= \int d\tau\, e^{i2\pi f \tau} \int dq\, \mathbf{h}(q)\, s(t-\tau-q) \notag\\
&= e^{i2\pi f t} \int dq\, e^{-i2\pi f q}\, \mathbf{h}(q) \int d\tau'\, e^{-i2\pi f \tau'}\, s(\tau'); \qquad \tau' = t-\tau-q \notag\\
&= e^{i2\pi f t}\, \mathbf{h}_f(f)\, s_f(f), \tag{10.40}
\end{align}

where we have abused the notation somewhat, such that here hf (f ) is the Fourier
transform of h(t), and sf (f ) is the Fourier transform of s(t). Implicit in this for-
mulation is the implementation of an infinite-dimensional delay space, which is
an approximation to the case in which the space-delay matrix is very large com-
pared with the delay spread of the channel. In evaluating the space-time covari-
ance matrix, the expectation is evaluated over time and draws of the transmitted
signal, but the channel as a function of delay is assumed to be deterministic. In
a continuous analog to Equation (10.38), the nr × nr cross-covariance matrix as-
sociated with the frequencies {f, f  } of the outer product of the inverse Fourier
transform of the space-delay array response
$$ \left\langle \mathcal{F}^{-1}_\tau\{\mathbf{z}(t-\tau)\}\, \left[\mathcal{F}^{-1}_{\tau'}\{\mathbf{z}(t-\tau')\}\right]^\dagger \right\rangle, \tag{10.41} $$

where the expectation is taken over time, is given by


$$ \left\langle \mathcal{F}^{-1}_\tau\{\mathbf{z}(t-\tau)\}\, \left[\mathcal{F}^{-1}_{\tau'}\{\mathbf{z}(t-\tau')\}\right]^\dagger \right\rangle \propto \left\langle e^{i2\pi(f-f')t} \right\rangle \left\langle s_f(f)\, s_f^*(f') \right\rangle \mathbf{h}_f(f)\, \mathbf{h}_f^\dagger(f') = \left\langle |s_f(f)|^2 \right\rangle \mathbf{h}_f(f)\, \mathbf{h}_f^\dagger(f), \tag{10.42} $$

where the expectation over the exponential produces a delta function, under the
assumption that the signal is uncorrelated at different frequencies. As a counterexample, cyclostationary signals would have some correlation across frequencies.
The resulting covariance is block diagonal with the outer product of channel
responses hf (f ) h†f (f ) at each frequency. For the finite case, with nδ delays, the
corresponding space-time covariance matrix is size nδ · nr × nδ · nr . In the limit
of nδ becoming large, because each block is rank-1 out of a nr × nr matrix, the
rank of the space-time covariance is given by nδ . Consequently, the fraction of
the eigenvalues that are not zeros is bounded by one over the number of receive
channels 1/nr .
Covariance matrix rank for finite samples


Once again, consider a data matrix in the absence of noise. We assume here that
the channel can be represented by nd delays. With a finite number of delays in
the space-time data matrix, the rank of the space-time covariance matrix can be
bounded by
$$ \mathrm{rank} \leq n_d + n_\delta - 1 \tag{10.43} $$
out of a dimension of nδ nr . Consequently, the fractional space-time covariance
matrix rank (that is, the rank divided by the total number of degrees of freedom)
is given by
$$ \text{frac rank} \leq \frac{n_d + n_\delta - 1}{n_\delta\, n_r}, \tag{10.44} $$
which approaches 1/nr in the limit of large nδ .
To develop the result in Equation (10.44), consider a space-time data vector $\tilde{\mathbf{z}}(t) \in \mathbb{C}^{n_\delta n_r \times 1}$, which is a single column from Equation (10.35) at some time $t$. The space-time covariance is then given by $\tilde{\mathbf{Q}} = \langle \tilde{\mathbf{z}}(t)\, \tilde{\mathbf{z}}^\dagger(t) \rangle$. Under the assumption that the channel and data matrix are sampled with the same period $\delta_\tau$, the space-time data vector has the form
$$ \tilde{\mathbf{z}}(t) = \begin{pmatrix} \sum_{m=0}^{n_d-1} \mathbf{h}_m\, s(t-[m+0]\,\delta_\tau) \\ \sum_{m=0}^{n_d-1} \mathbf{h}_m\, s(t-[m+1]\,\delta_\tau) \\ \sum_{m=0}^{n_d-1} \mathbf{h}_m\, s(t-[m+2]\,\delta_\tau) \\ \vdots \\ \sum_{m=0}^{n_d-1} \mathbf{h}_m\, s(t-[m+n_\delta-1]\,\delta_\tau) \end{pmatrix}. \tag{10.45} $$
By rearranging the sum so that terms with the same value of delay s(t − kδτ ) are
combined, the rank of the space-time covariance matrix can be bounded. The
space-time data vector also has the form
$$ \tilde{\mathbf{z}}(t) = \begin{pmatrix} \mathbf{h}_0 \\ \mathbf{0} \\ \mathbf{0} \\ \vdots \\ \mathbf{0} \end{pmatrix} s(t - 0\,\delta_\tau) + \begin{pmatrix} \mathbf{h}_1 \\ \mathbf{h}_0 \\ \mathbf{0} \\ \vdots \\ \mathbf{0} \end{pmatrix} s(t - 1\,\delta_\tau) + \cdots + \begin{pmatrix} \mathbf{0} \\ \mathbf{0} \\ \vdots \\ \mathbf{0} \\ \mathbf{h}_{[n_d-1]} \end{pmatrix} s(t - [n_d-1+n_\delta-1]\,\delta_\tau). \tag{10.46} $$
Under the assumption that the channel and signal at each delay are independent,
the rank of the space-time covariance matrix is given by the contribution of each
of these nd + nδ − 1 terms; thus, Equations (10.43) and (10.44) are verified. As
the fraction of space that a signal occupies decreases, the adverse effect it has
on the ability of a receiver to decode other signals typically decreases. Thus, by increasing the number of delays in processing, the typical performance of a receiver that is observing multiple signals improves; however, this comes at the cost of an increase in computational complexity.
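The finite-sample rank bound above is easy to confirm numerically. The following sketch (numpy assumed; all sizes are illustrative) builds the noise-free space-time data matrix for a single transmitter with $n_d$ channel taps, processed with $n_\delta$ delays across $n_r$ antennas, and checks that the covariance rank equals $n_d + n_\delta - 1$ out of $n_\delta n_r$ dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_r, n_d, n_delta, n_s = 6, 3, 4, 4000  # antennas, channel taps, processing delays, samples

# i.i.d. symbols and a random n_d-tap single-transmitter (SIMO) channel, noise-free
s = rng.standard_normal(n_s + 16) + 1j * rng.standard_normal(n_s + 16)
h = rng.standard_normal((n_d, n_r)) + 1j * rng.standard_normal((n_d, n_r))
t0 = 8  # offset so that all delayed slices stay in range

# Stack n_delta delayed copies of z(t) = sum_m h_m s(t - m) into Z-tilde, Eq. (10.35)
Zt = np.vstack([
    sum(np.outer(h[m], s[t0 - m - j : t0 - m - j + n_s]) for m in range(n_d))
    for j in range(n_delta)
])

# Rank of the sample covariance versus the bound n_d + n_delta - 1 and full dimension
rank = np.linalg.matrix_rank(Zt @ Zt.conj().T / n_s)
print(rank, n_d + n_delta - 1, n_delta * n_r)
```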

10.5.2 Space-time adaptive processing


In general, approaches that are applicable to adaptive spatial processing can be
extended to adaptive space-time processing. As an example, in an extension to
the spatial beamformer discussed in Section 9.2.3, the estimate of the transmit-
ted signal Ŝ at the output of a linear space-time minimum-mean-square error
(MMSE) beamformer W̃ is given by

Ŝ = W̃† Z̃ , (10.47)

such that
$$ \left\langle \left\| \tilde{\mathbf{W}}^\dagger \tilde{\mathbf{Z}} - \mathbf{S} \right\|_F^2 \right\rangle \tag{10.48} $$

is minimized. The space-time adaptive beamformer is given by


$$ \tilde{\mathbf{W}} = \left\langle \tilde{\mathbf{Z}}\,\tilde{\mathbf{Z}}^\dagger \right\rangle^{-1} \left\langle \tilde{\mathbf{Z}}\,\tilde{\mathbf{S}}^\dagger \right\rangle \approx \left(\tilde{\mathbf{Z}}\,\tilde{\mathbf{Z}}^\dagger\right)^{-1} \tilde{\mathbf{Z}}\,\tilde{\mathbf{S}}^\dagger, \tag{10.49} $$

where the distorted transmitted training sequence is given by


$$ \tilde{\mathbf{S}} = \begin{pmatrix} \mathbf{S}_{0\,\delta_\tau} \\ \mathbf{S}_{1\,\delta_\tau} \\ \mathbf{S}_{2\,\delta_\tau} \\ \vdots \\ \mathbf{S}_{(n_\delta-1)\,\delta_\tau} \end{pmatrix} \in \mathbb{C}^{(n_t n_\delta) \times n_s}, \tag{10.50} $$
where $\mathbf{S}_\tau$ indicates the data matrix shifted to delay $\tau$. The data matrix outer-product term

$$ \tilde{\mathbf{Z}}\,\tilde{\mathbf{Z}}^\dagger \tag{10.51} $$

in the beamformer definition in Equation (10.49) must be nonsingular in the approximate form. A necessary condition for this to be true is that

$$ n_s \geq n_r\, n_\delta. \tag{10.52} $$

However, because of the potential correlation between samples, this condition may not be sufficient. As an example, it is common to use temporally oversampled data, such that the sample rate is larger than the rate required to support the received bandwidth-limited signal. Approaches to address the resulting poorly conditioned matrices are discussed in Section 9.7. One of the most common techniques is diagonal loading, for which $\tilde{\mathbf{Z}}\tilde{\mathbf{Z}}^\dagger \rightarrow \tilde{\mathbf{Z}}\tilde{\mathbf{Z}}^\dagger + \epsilon\,\mathbf{I}$, where $\epsilon$ is
[Figure 10.1 Notional construction of the OFDM transmit signal: data bits (e.g., 01101110...) pass through coding/modulation, an IFFT, and cyclic-prefix insertion to produce the transmitted signal.]

an appropriately scaled small number. This technique was discussed in greater detail in Section 9.7.
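The space-time MMSE beamformer with diagonal loading can be sketched as follows (numpy assumed; the sizes, noise level, and loading factor are illustrative choices rather than values from the text). Row 0 of $\tilde{\mathbf{W}}^\dagger \tilde{\mathbf{Z}}$ estimates the undelayed training sequence:

```python
import numpy as np

rng = np.random.default_rng(2)
n_r, n_d, n_delta, n_s, off = 4, 2, 3, 2000, 8

# BPSK training through a random 2-tap SIMO channel plus noise
s = np.sign(rng.standard_normal(n_s + 2 * off)).astype(complex)
h = rng.standard_normal((n_d, n_r)) + 1j * rng.standard_normal((n_d, n_r))
L = n_s + n_delta
z = sum(np.outer(h[m], s[off - m : off - m + L]) for m in range(n_d))
z = z + 0.05 * (rng.standard_normal(z.shape) + 1j * rng.standard_normal(z.shape))

# Space-time data matrix (delayed copies of z) and delay-aligned training, Eqs. (10.35), (10.50)
Zt = np.vstack([z[:, n_delta - j : n_delta - j + n_s] for j in range(n_delta)])
St = np.vstack([s[off + n_delta - j : off + n_delta - j + n_s] for j in range(n_delta)])

# MMSE weights with diagonal loading: Z Z-dagger -> Z Z-dagger + eps I, Eq. (10.49)
R = Zt @ Zt.conj().T
eps = 1e-2 * np.trace(R).real / R.shape[0]
W = np.linalg.solve(R + eps * np.eye(R.shape[0]), Zt @ St.conj().T)

s_hat = (W.conj().T @ Zt)[0]  # estimate of the undelayed training sequence
target = s[off + n_delta : off + n_delta + n_s]
ber = np.mean(np.sign(s_hat.real) != target.real)
print(ber)  # essentially zero at this SNR
```

At this high SNR, the loaded beamformer recovers the training bits essentially without error; the loading mainly guards against the poorly conditioned covariances described above.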

10.5.3 Orthogonal-frequency-division multiplexing


While there are a number of subtleties and variants of orthogonal-frequency-
division multiplexing (OFDM) modulation, the basic premise is that blocks of
symbols are defined and constructed in the frequency domain; an inverse fast Fourier transform (IFFT) then converts the signal to the time domain, and the result is transmitted, as seen in Figure 10.1. If the ns samples in the frequency domain
for the nt transmitters are indicated by S ∈ Cn t ×n s , then the time domain
representation X ∈ Cn t ×n s is given by
$$ \mathbf{X} = \mathbf{S}\,\mathbf{F}^\dagger, \qquad \{\mathbf{X}\}_{k,n} = \frac{1}{\sqrt{n_s}} \sum_{m=0}^{n_s-1} \{\mathbf{S}\}_{k,m}\, e^{i 2\pi \frac{m n}{n_s}}. \tag{10.53} $$

Thus, each symbol is placed in its own subcarrier. The approximate width of a bin in frequency (approximate because each subcarrier has the
spectral shaping of a sinc function) is given by the bandwidth of the complex
baseband signal divided by the number of samples B/ns . If the width of a bin is
small compared with the inverse of the standard deviation of the delay spread σd
$$ \frac{B}{n_s} \ll \frac{1}{\sigma_d}, \tag{10.54} $$
then the frequency-selective fading will typically move slowly across the fre-
quency bins. Consequently, representing the channel as a complex attenuation
in each frequency bin is a good approximation. In this regime, performing nar-
rowband processing within each bin works reasonably well. Approaches to ad-
dress doubly dispersive channels (discussed later in this chapter) using OFDM
waveforms have also been considered [200, 277].
At the OFDM receiver, an FFT is performed upon a block of received data $\mathbf{Z} \in \mathbb{C}^{n_r \times n_s}$ to attempt to recover an estimate of the original frequency-domain signal.
However, because of temporal synchronization errors and because of multipath
delay spread, the receiver cannot extract the exact portion of data that was
transmitted. This mismatch in temporal alignment causes degradation in the
orthogonality assumption. Noting that a cyclic shift at the input of the FFT
induces a benign phase ramp across the output, the adverse effects caused by
delay spread and synchronization error can be mitigated by adding a cyclic
prefix. A portion (ncp samples) of the time-domain signal from the beginning
of the signal is added to the end of the signal at the transmitter, so that the
transmitted signal $\mathbf{Y} \in \mathbb{C}^{n_t \times (n_s + n_{cp})}$ is given by

$$ \mathbf{Y} = \begin{pmatrix} \mathbf{X} & \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_{n_{cp}} \end{pmatrix}, \tag{10.55} $$

where xm is the mth column of X. Because of the modularity of the exponential


$$ e^{i 2\pi \frac{m (n + n_s)}{n_s}} = e^{i 2\pi \frac{m n}{n_s}}, \tag{10.56} $$

the transmitted signal with cyclic prefix has essentially the same form as the
transmitted signal without the cyclic prefix,

$$ \{\mathbf{Y}\}_{k,n} = \frac{1}{\sqrt{n_s}} \sum_{m=1}^{n_s} \{\mathbf{S}\}_{k,m}\, e^{i 2\pi \frac{(m-1)(n-1)}{n_s}} \quad \forall\; n \in \{1, 2, \cdots, n_s + n_{cp}\}. \tag{10.57} $$

The sum over m can be considered the sum over subcarriers that produces the
final time-domain signal. The received signal in the time domain Z for the nth
sample in time and the jth receive antenna is then given by

$$ \{\mathbf{Z}\}_{j,n} \approx \frac{1}{\sqrt{n_s}} \sum_{m=1}^{n_s} \sum_{k=1}^{n_t} \{\mathbf{H}_m\}_{j,k}\, \{\mathbf{S}\}_{k,m}\, e^{i 2\pi \frac{(m-1)(n-1)}{n_s}} + \{\mathbf{N}\}_{j,n}, \tag{10.58} $$

where here Hm ∈ Cn r ×n t is the channel matrix for the mth subcarrier. The
result is an approximation because the model of a flat-fading channel within a
subcarrier is approximate.
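A minimal numerical sketch of this mechanism (numpy assumed; sizes and channel taps are illustrative): with a cyclic prefix at least as long as the channel delay spread, the linear channel acts as a circular convolution over the block, so each subcarrier is exactly equalized by a single complex coefficient. The common convention of copying the tail of the block to the front is used here; it is equivalent to the text's construction of appending the head to the end, up to the choice of receive window.

```python
import numpy as np

rng = np.random.default_rng(3)
n_sc, n_cp = 64, 8                      # subcarriers and cyclic-prefix length
h = np.array([1.0, 0.5 + 0.3j, 0.25])   # illustrative 3-tap channel

# QPSK symbols on each subcarrier -> time domain via unitary IFFT, Eq. (10.53)
S = (rng.choice([-1.0, 1.0], n_sc) + 1j * rng.choice([-1.0, 1.0], n_sc)) / np.sqrt(2)
x = np.fft.ifft(S, norm="ortho")
y = np.concatenate([x[-n_cp:], x])      # prepend cyclic prefix

# Apply the channel, discard the prefix, FFT, and equalize each bin by H_k
z = np.convolve(y, h)[n_cp : n_cp + n_sc]
H = np.fft.fft(h, n_sc)                 # per-subcarrier channel coefficients
S_hat = np.fft.fft(z, norm="ortho") / H
print(np.allclose(S_hat, S))  # True: perfect recovery without noise
```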
The significant advantage of OFDM is the implicit frequency channelization
that, given a sufficient density of frequency bins, enables narrowband process-
ing within each channel. In addition, by employing FFTs, the computational
complexity increases by order of the logarithm of the number of frequency chan-
nels per signaling chip (because it grows order ns log ns for the whole block of
ns chips). This increase in computational complexity is much slower than most
equalization approaches for single-carrier systems.
One of the significant disadvantages of the OFDM approach is that the trans-
mitted signal has a large peak-to-average power ratio, as discussed in Section
18.5. Ignoring the possibility of receivers that can compensate for nonlinearities,
which would be very computationally intensive, the large peak-to-average power
ratio imposes significant linearity requirements on the transmit amplifier. Typ-
ically, improved transmitter amplifier linearity comes at the expense of greater
power dissipation. The large peak-to-average ratio can be understood by noting
that the time-domain signal is constructed by adding a number of indepen-
dent frequency-domain symbols together. Even if the starting frequency-domain
symbols have a constant modulus, by the central limit theorem, the limiting
transmitted signal distribution is Gaussian. While not likely, occasionally values
drawn from a Gaussian distribution can be several times larger than the standard
deviation. Thus, the transmitted signal has a large peak-to-average power ratio.
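The central-limit argument can be illustrated directly (numpy sketch with illustrative sizes): even though each subcarrier carries a constant-modulus QPSK symbol, the time-domain OFDM signal exhibits peaks many dB above its average power:

```python
import numpy as np

rng = np.random.default_rng(4)
n_sc, n_sym = 256, 2000  # subcarriers and number of OFDM symbols (illustrative)

# Constant-modulus QPSK symbols in the frequency domain
S = np.exp(1j * (np.pi / 2) * rng.integers(0, 4, (n_sym, n_sc)) + 1j * np.pi / 4)
x = np.fft.ifft(S, axis=1, norm="ortho")  # time-domain OFDM symbols

# Peak-to-average power ratio of each symbol, in dB
papr_db = 10 * np.log10(np.abs(x).max(axis=1) ** 2
                        / np.mean(np.abs(x) ** 2, axis=1))
print(float(np.median(papr_db)), float(papr_db.max()))
```

A single-carrier constant-modulus signal would have 0 dB PAPR; here the median and worst-case PAPR are many dB higher, which is what drives the transmit-amplifier linearity requirement.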

Effects of external interference


Because OFDM uses a cyclic prefix to maintain orthogonality of carriers, OFDM
can be particularly sensitive to external interference. Typically, such interfer-
ence will not be synchronized with an appropriate cyclic prefix. In multipath environments with multiple-antenna receivers, despite the multiple-carrier approach, the rank of the
external interference in each frequency bin can increase rapidly with channel de-
lay spread. When using FFTs for spectral analysis, nonrectangular windowing is
typically used to reduce the spectral sidelobes [172]. There are many windowing
approaches: for example, Hamming, Hanning, Blackman, and Chebyshev win-
dows. However, these windowing approaches break the orthogonality between
carriers. In principle, these effects can be traded and the windows can be opti-
mized [64].

10.6 Doubly dispersive channel model

The doubly dispersive channel model includes the tap delay characteristics of the
frequency-selective channel model and allows for the model to vary as a function
of time. A general model for the received signal z(t) ∈ Cn r ×1 as a function of time
t that extends the static sampled delay channel model and allows time-varying
channel coefficients is given by


$$ \mathbf{z}(t) = \sum_{m=1}^{n_d} \mathbf{H}_{\tau_m}(t)\, \mathbf{s}(t - \tau_m) + \mathbf{n}(t), \tag{10.59} $$

where the function Hτ m (t) ∈ Cn r ×n t indicates the time-varying channel matrix


at relative delay τm at time t for nd resolvable delays. Given a sufficient set
of values for τm , this model can accurately represent a time-varying frequency-
selective channel for bandwidth-limited signals.
Under the assumption of a bandwidth-limited signal with a channel that can be represented by $n_d$ delays, the time-varying space-time channel matrix $\tilde{\mathbf{H}}(t) \in \mathbb{C}^{n_r \times (n_t \cdot n_d)}$ is given by

$$ \tilde{\mathbf{H}}(t) = \begin{pmatrix} \mathbf{H}_{\tau_1}(t) & \mathbf{H}_{\tau_2}(t) & \cdots & \mathbf{H}_{\tau_{n_d}}(t) \end{pmatrix}. \tag{10.60} $$

Similar in form to the static case, the received signal z(t) is given by

$$ \mathbf{z}(t) = \sum_m \mathbf{H}_{\tau_m}(t)\, \mathbf{s}(t - \tau_m) + \mathbf{n}(t) = \tilde{\mathbf{H}}(t)\, \tilde{\mathbf{s}}(t) + \mathbf{n}(t). \tag{10.61} $$


10.6 Doubly dispersive channel model 357

10.6.1 Doppler-domain representation


While the temporal dynamics of the channel can be accurately expressed by em-
ploying time-varying coefficients in the channel, for some applications it is useful
to consider a Doppler-domain representation for the time-varying channel. Here
we will consider a continuous SISO channel first and then extend the discussion
to a discrete MIMO channel.

Doppler-domain SISO channel


As an example, consider a time-varying, noise-free, SISO channel without delay
spread,
$$ z(t) = h_t(t)\, s_t(t), \tag{10.62} $$
where z(t), ht (t), and st (t) indicate the complex baseband received signal, the
complex attenuation, and the transmitted signal, respectively, in the temporal
domain. Transforming to the frequency domain, the transmitted signal is given
by
$$ s_f(f) = \int_{-\infty}^{\infty} dt\, e^{-i 2\pi f t}\, s_t(t). \tag{10.63} $$

The complex attenuation in the frequency domain $h_D(f)$ can be expressed in the Doppler domain,

$$ h_D(f) = \int_{-\infty}^{\infty} dt\, e^{-i 2\pi f t}\, h_t(t), \tag{10.64} $$

where $h_D(f)$ is the complex attenuation in the frequency domain (associated with Doppler-frequency shifts).
By noting that the multiplication in the temporal domain is convolution in
the frequency domain,
$$ h_t(t)\, s_t(t) \Leftrightarrow h_D(f) * s_f(f). \tag{10.65} $$
Under the assumption of a limited temporal signal, the transform relationship is
explicitly given by
\begin{align}
h_t(t)\, s_t(t) &= \int_{-\infty}^{\infty} df\, e^{i 2\pi f t}\, h_D(f) * s_f(f) \notag\\
&= \int_{-\infty}^{\infty} df\, e^{i 2\pi f t} \int_{-\infty}^{\infty} df_D\, h_D(f_D)\, s_f(f - f_D) \notag\\
&= \int_{-\infty}^{\infty} df_D\, h_D(f_D)\, e^{i 2\pi f_D t}\, s_t(t), \tag{10.66}
\end{align}

where the convolution $h_D(f) * s_f(f)$ is given by $\int df_D\, h_D(f_D)\, s_f(f - f_D)$.
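The discrete analog of Equations (10.65) and (10.66) can be checked numerically (numpy sketch): multiplying by a time-varying gain in the time domain corresponds to a circular convolution with the gain's Doppler spectrum in the frequency domain. The $1/n$ factor below reflects the unnormalized FFT convention of `numpy.fft`:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 512
s_t = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # signal samples
h_t = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # time-varying gain

# Spectra (unnormalized DFT): h_D is the Doppler spectrum of the gain
h_D = np.fft.fft(h_t)
s_f = np.fft.fft(s_t)

# Multiplication in time <-> circular convolution in frequency (scaled by 1/n)
lhs = np.fft.fft(h_t * s_t)
rhs = np.fft.ifft(np.fft.fft(h_D) * np.fft.fft(s_f)) / n  # circular convolution of h_D and s_f
print(np.allclose(lhs, rhs))  # True
```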

Doubly dispersive SISO channel


As discussed in the introduction of this chapter, this formulation is useful if
the signal is both bandwidth and temporally limited, which is not possible.
However, for many problems, this formulation can be employed to a good approximation.
For the SISO case of a time-varying channel with delay spread, denoted ht (t, τ )
at time t and relative delay τ , the above time-varying channel can be extended
to include delays, so that the noise-free received signal z(t) is given by

$$ z(t) = \int d\tau\, h_t(t, \tau)\, s_t(t - \tau). \tag{10.67} $$

By repeating the discussion of transforming to the Doppler domain as above,


and noting that the Fourier transform of the delayed transmitted signal is given
by
$$ \int_{-\infty}^{\infty} dt\, e^{-i 2\pi f t}\, s_t(t - \tau) = e^{-i 2\pi f \tau} \int_{-\infty}^{\infty} dt'\, e^{-i 2\pi f t'}\, s_t(t') = e^{-i 2\pi f \tau}\, s_f(f), \tag{10.68} $$

the received signal is approximated by



\begin{align}
z(t) &= \int d\tau\, h_t(t, \tau)\, s_t(t - \tau) \notag\\
&= \int d\tau \int df\, e^{i 2\pi f t} \int df_D\, h_D(f_D, \tau)\, s_f(f - f_D)\, e^{-i 2\pi (f - f_D)\tau} \notag\\
&= \int d\tau \int df_D\, h_D(f_D, \tau)\, e^{i 2\pi f_D \tau} \int df\, e^{i 2\pi f (t - \tau)}\, s_f(f - f_D) \notag\\
&= \int d\tau \int df_D\, h_D(f_D, \tau)\, e^{i 2\pi f_D \tau} \int df'\, e^{i 2\pi (f' + f_D)(t - \tau)}\, s_f(f') \notag\\
&= \int d\tau \int df_D\, h_D(f_D, \tau)\, e^{i 2\pi f_D t}\, s_t(t - \tau). \tag{10.69}
\end{align}

Doubly dispersive MIMO channel


By extending this discussion to the MIMO channel, the noise-free received signal
as a function of time z(t) ∈ Cn r ×1 is given by

$$ \mathbf{z}(t) = \int d\tau \int df_D\, \mathbf{H}_D(f_D, \tau)\, e^{i 2\pi f_D t}\, \mathbf{s}(t - \tau), \tag{10.70} $$

where the function $\mathbf{H}_D(f_D, \tau) \in \mathbb{C}^{n_r \times n_t}$ indicates the channel in a delay and Doppler-domain representation.

10.6.2 Eigenvalue distribution of space-time-frequency covariance matrix


Received signal matrices with a lattice of various delay or frequency offset dis-
tortions can be used in processing by the receiver. A space-time-frequency data
matrix Z̃ ∈ C(n r ·n δ ·n ν )×n s that is regularly sampled in delay δτ and in Doppler
frequency $\delta_\nu$, is constructed by

$$ \tilde{\mathbf{Z}} = \begin{pmatrix} \mathbf{Z}_{0\,\delta_\tau,\,0\,\delta_\nu} \\ \vdots \\ \mathbf{Z}_{(n_\delta-1)\,\delta_\tau,\,0\,\delta_\nu} \\ \mathbf{Z}_{0\,\delta_\tau,\,1\,\delta_\nu} \\ \vdots \\ \mathbf{Z}_{(n_\delta-1)\,\delta_\tau,\,(n_\nu-1)\,\delta_\nu} \end{pmatrix}, \tag{10.71} $$

where the data matrix for distortions of a particular delay offset τ and frequency
offset ν is given by

$$ \mathbf{Z}_{\tau,\nu} = \begin{pmatrix} e^{i 2\pi \nu\, 0\, T_s}\, \mathbf{z}(0\,T_s - \tau) & e^{i 2\pi \nu\, 1\, T_s}\, \mathbf{z}(1\,T_s - \tau) & \cdots & e^{i 2\pi \nu\, [n_s-1]\, T_s}\, \mathbf{z}([n_s-1]\,T_s - \tau) \end{pmatrix}. \tag{10.72} $$

The received space-time-frequency covariance matrix $\tilde{\mathbf{Q}} \in \mathbb{C}^{n_r n_\delta n_\nu \times n_r n_\delta n_\nu}$ is given by

$$ \tilde{\mathbf{Q}} = \frac{1}{n_s}\left\langle \tilde{\mathbf{Z}}\,\tilde{\mathbf{Z}}^\dagger \right\rangle. \tag{10.73} $$
In the absence of noise, the rank of the space-time-frequency covariance matrix
Q̃ can be found by extension of the rank of the space-time covariance matrix
found in Section 10.5.1. With a finite number of delays and frequency taps in
the space-time-frequency data matrix, the rank of the space-time-frequency covariance matrix can be bounded by

$$ \mathrm{rank} \lesssim (n_d + n_\delta - 1)\,(n_f + n_\nu - 1) \tag{10.74} $$

out of a dimension of $n_\delta n_\nu n_r$. Note that the approximation is due to the limitation of having both finite spectral and temporal extent. If it were possible to have
signals that have both finite spectral and temporal support, then the inequality
would be exact. Consequently, the fraction of the space-time covariance matrix
rank is given by
$$ \text{frac rank} \lesssim \frac{(n_d + n_\delta - 1)\,(n_f + n_\nu - 1)}{n_\delta\, n_\nu\, n_r}, \tag{10.75} $$
which approaches 1/nr in the limit of large nδ and nν .
To develop the above result, consider a space-time-frequency data vector $\tilde{\mathbf{z}}(t) \in \mathbb{C}^{n_\delta n_\nu n_r \times 1}$, which is a single column from Equation (10.71), at some time $t$. The space-time-frequency covariance is then given by $\tilde{\mathbf{Q}} = \langle \tilde{\mathbf{z}}(t)\, \tilde{\mathbf{z}}^\dagger(t) \rangle$ (which is equivalent to the definition in Equation (10.73)). Under the assumption that the sampled channel in the Doppler domain $\mathbf{h}_{\delta_\tau, \delta_f}$ and the data matrix are sampled with the same period $\delta_\tau$ and frequency resolution $\delta_f$, the space-time-frequency
data vector has the form

$$ \tilde{\mathbf{z}}(t) = \begin{pmatrix}
\sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} \mathbf{h}_{m\delta_\tau,\,k\delta_f}\, s(t-[m+0]\,\delta_\tau;\, [k+0]\,\delta_f) \\
\sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} \mathbf{h}_{m\delta_\tau,\,k\delta_f}\, s(t-[m+0]\,\delta_\tau;\, [k+1]\,\delta_f) \\
\vdots \\
\sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} \mathbf{h}_{m\delta_\tau,\,k\delta_f}\, s(t-[m+0]\,\delta_\tau;\, [k+n_\nu-1]\,\delta_f) \\
\sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} \mathbf{h}_{m\delta_\tau,\,k\delta_f}\, s(t-[m+1]\,\delta_\tau;\, [k+0]\,\delta_f) \\
\sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} \mathbf{h}_{m\delta_\tau,\,k\delta_f}\, s(t-[m+1]\,\delta_\tau;\, [k+1]\,\delta_f) \\
\vdots \\
\sum_{m=0}^{n_d-1} \sum_{k=0}^{n_f-1} \mathbf{h}_{m\delta_\tau,\,k\delta_f}\, s(t-[m+n_\delta-1]\,\delta_\tau;\, [k+n_\nu-1]\,\delta_f)
\end{pmatrix}, \tag{10.76} $$

where the notation s(t; δf ) indicates the signal at time t that is shifted by fre-
quency δf . Similar to the form of the space-time covariance matrix, by rearranging
the sum so that terms with the same value of delay s(t − mδτ ; kδf ) are grouped,
the rank of the space-time-frequency covariance matrix can be bounded,3 and
the number of contributions to the rank can be found. Because the frequency and
delay contributions are independent, for any given frequency there are nd +nδ −1
delay contributions. Consequently, there are (nd +nδ −1) (nf +nν −1) contributing
terms. This accounting can be observed in the rearranged space-time-frequency
data vector that is given by

$$ \tilde{\mathbf{z}}(t) = \begin{pmatrix} \mathbf{h}_{0\,\delta_\tau,\,0\,\delta_f} \\ \mathbf{0} \\ \vdots \\ \mathbf{0} \end{pmatrix} s(t - 0\,\delta_\tau;\, 0\,\delta_f) + \cdots + \begin{pmatrix} \mathbf{0} \\ \vdots \\ \mathbf{h}_{[n_d-1]\,\delta_\tau,\,0\,\delta_f} \\ \vdots \\ \mathbf{0} \end{pmatrix} s(t - [n_d-1+n_\delta-1]\,\delta_\tau;\, 0\,\delta_f) $$

$$ + \cdots + \begin{pmatrix} \mathbf{0} \\ \vdots \\ \mathbf{h}_{0\,\delta_\tau,\,1\,\delta_f} \\ \vdots \\ \mathbf{0} \end{pmatrix} s(t - 0\,\delta_\tau;\, 1\,\delta_f) + \cdots + \begin{pmatrix} \mathbf{0} \\ \mathbf{0} \\ \vdots \\ \mathbf{0} \\ \mathbf{h}_{[n_d-1]\,\delta_\tau,\,[n_f-1]\,\delta_f} \end{pmatrix} s(t - [n_d-1+n_\delta-1]\,\delta_\tau;\, [n_f-1+n_\nu-1]\,\delta_f). \tag{10.77} $$

³ This argument is due to Shawn Kraut.

Under the assumption that the channel and signal at each delay and frequency
are independent, the rank of the space-time-frequency covariance matrix is given
by the contribution of each of these (nd + nδ − 1)(nf + nν − 1) terms; thus,
Equations (10.74) and (10.75) are shown.
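The counting argument can be verified numerically (a numpy sketch with small illustrative sizes; the Doppler bin spacing is an arbitrary choice): a noise-free channel with $n_d$ delay taps and $n_f$ Doppler taps, processed over $n_\delta$ delays and $n_\nu$ frequency shifts, produces a data matrix of rank $(n_d + n_\delta - 1)(n_f + n_\nu - 1)$ out of $n_r n_\delta n_\nu$ dimensions:

```python
import numpy as np

rng = np.random.default_rng(6)
n_r, n_d, n_f, n_delta, n_nu, n_s = 3, 2, 2, 2, 2, 400
dv = 0.0137  # Doppler bin spacing (arbitrary illustrative value)

s = rng.standard_normal(n_s + 8) + 1j * rng.standard_normal(n_s + 8)
h = rng.standard_normal((n_d, n_f, n_r)) + 1j * rng.standard_normal((n_d, n_f, n_r))
t = np.arange(n_s)

def shifted(a, b):
    """Signal delayed by a samples and Doppler-shifted by b * dv."""
    return s[4 + t - a] * np.exp(2j * np.pi * b * dv * t)

# Noise-free space-time-frequency data matrix: stacked delay/Doppler-shifted blocks
Zt = np.vstack([
    sum(np.outer(h[m, k], shifted(m + j, k + l))
        for m in range(n_d) for k in range(n_f))
    for l in range(n_nu) for j in range(n_delta)
])

# Rank versus the bound and the full dimension
rank = np.linalg.matrix_rank(Zt)
print(rank, (n_d + n_delta - 1) * (n_f + n_nu - 1), n_r * n_delta * n_nu)
```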

10.7 Space-time-frequency adaptive processing

Space-time-frequency adaptive processing is a direct extension of space-time adaptive processing. A space-time-frequency data matrix $\tilde{\mathbf{Z}} \in \mathbb{C}^{(n_r \cdot n_\delta \cdot n_\nu) \times n_s}$ is defined in Equation (10.71). The dimensionality of $\tilde{\mathbf{Z}}$ grows quickly with $n_r$, $n_\delta$,
and nν . For the processing to work well, the dimensionality of included distortions
must cover those in the environment (nδ > nd and nν > nf ); however, for large
numbers of distortions, this processing approach may be untenable.
In general, approaches that are applicable to adaptive spatial processing can
be extended to adaptive space-time-frequency processing. Similar to the space-
time adaptive processing, the space-time-frequency MMSE receive estimate of
the transmitted signal is given by

Ŝ = W̃† Z̃ . (10.78)

The space-time-frequency adaptive beamformer is given by

$$ \tilde{\mathbf{W}} = \left\langle \tilde{\mathbf{Z}}\,\tilde{\mathbf{Z}}^\dagger \right\rangle^{-1} \left\langle \tilde{\mathbf{Z}}\,\tilde{\mathbf{S}}^\dagger \right\rangle \approx \left(\tilde{\mathbf{Z}}\,\tilde{\mathbf{Z}}^\dagger\right)^{-1} \tilde{\mathbf{Z}}\,\tilde{\mathbf{S}}^\dagger, \tag{10.79} $$
where the distorted training sequence is given by


$$ \tilde{\mathbf{S}} = \begin{pmatrix} \mathbf{S}_{0\,\delta_\tau,\,0\,\delta_\nu} \\ \mathbf{S}_{1\,\delta_\tau,\,0\,\delta_\nu} \\ \mathbf{S}_{2\,\delta_\tau,\,0\,\delta_\nu} \\ \vdots \\ \mathbf{S}_{(n_\delta-1)\,\delta_\tau,\,0\,\delta_\nu} \\ \mathbf{S}_{0\,\delta_\tau,\,1\,\delta_\nu} \\ \mathbf{S}_{1\,\delta_\tau,\,1\,\delta_\nu} \\ \mathbf{S}_{2\,\delta_\tau,\,1\,\delta_\nu} \\ \vdots \\ \mathbf{S}_{(n_\delta-1)\,\delta_\tau,\,(n_\nu-1)\,\delta_\nu} \end{pmatrix}. \tag{10.80} $$
Here, Sτ ,ν is defined similarly to Zτ ,ν . A necessary condition to evaluate the
space-time-frequency beamformer is that
$$ n_s \geq n_r\, n_\delta\, n_\nu. \tag{10.81} $$

10.7.1 Sparse space-time-frequency processing


As discussed previously, the dimensionality of the space-time-frequency covariance matrix can grow quickly as the number of delay and Doppler-frequency taps increases. This dimensionality can quickly become an issue for some applications.
For some special channels, the channel taps can be sparse. In this regime, it is
useful to perform the processing by using an operator algebra approach [141].
There are numerous difficulties in applying sparse processing, and in a large
portion of parameter space, it is simply not possible to implement a sparse so-
lution. Determining the applicability is a strong function of degrees of freedom,
phenomenology, and prior knowledge.

Problems

10.1 Consider a static SISO channel that is represented by

$$ \tilde{h}(\tau) = a\, \delta(\tau - T), \tag{10.82} $$

where the notation in Section 10.1 is employed. For a complex signal bandwidth B = 1/Ts, evaluate the discrete channel representation for:
(a) T = 0,
(b) T = Ts/2,
(c) T = Ts/4.
10.2 The frequency-shifting model (displayed in Equation (10.8)) is commonly used in analyses rather than the more accurate time-dilation approach (displayed in Equation (10.9)). A significant issue in practical systems is that the time dilation eventually causes a chip slip that cannot be corrected by a simple phase correction. For a filtered BPSK signal with a carrier frequency of 1 GHz and a bandwidth of 1 MHz, with a relative fractional frequency error of $10^{-6}$ between the transmitter and receiver, evaluate the expected receiver loss because of chip misalignment as a function of time since a perfect synchronization.
10.3 By using the notation in Section 10.2, consider a loop in parameter space
for delay and Doppler channel operators on a signal. Starting at some point, and
moving operators through the space of delay and Doppler along some path, and
then returning to the original point, the signal should be unaffected, because the
effect should only be determined by the parameters’ values. However, evaluate
the effect on some signal $s(t)$ of the following sequence of operators, $\mathbf{T}_d\, \mathbf{F}_\nu\, \mathbf{T}_{-d}\, \mathbf{F}_{-\nu}$, and evaluate the error.
10.4 Consider the eigenvalues of the observed space-time covariance matrix
that is observing critically sampled signals. For a 10 receive antenna array and
a single transmit antenna with a line-of-sight channel (equal channel responses
across receive antennas), evaluate the eigenvalue distribution of the receive space-
time covariance matrix of a 0 dB SNR per receive antenna assuming unit variance
per antenna noise. Evaluate the eigenvalues under the assumption that the space-
time covariance matrix includes
(a) 1 (spatial-only)
(b) 2
(c) 4
delay samples at Nyquist spacing.
10.5 Consider the eigenvalues of the observed space-time covariance matrix
with two delays. The signal and noise are strongly filtered so that they are significantly oversampled (that is, the sampling rate is large compared to the
Nyquist sample rate). For a 10 receive antenna array and a single transmit
antenna with a line-of-sight channel (equal channel responses across receive an-
tennas), evaluate the eigenvalue distribution of the receive space-time covari-
ance matrix of a 0 dB SNR per receive antenna assuming unit variance per
antenna noise in the region of spectral support. Evaluate the eigenvalues approx-
imately under the assumption that signal and noise are temporally oversampled
significantly.
10.6 Consider the signal s(t) and the SISO doubly dispersive channel char-
acterized by a time-varying channel ht (t, τ ) and the delay-frequency channel
hD (fD , τ ); develop the form of a bound on the average squared error in using
the hD (fD , τ ) form under the assumption of a bounded temporal T and bounded
spectral B signal.
10.7 Develop the results in Section 10.6.1 by using discrete rather than integral
Fourier transforms.
10.8 Develop the Doppler-frequency analysis dual of Equation (10.7).
10.9 Consider a simple discrete-time channel whose impulse response is given by

$$ h(m) = \delta_{m,0} + \tfrac{1}{2}\,\delta_{m,1} + \tfrac{1}{4}\,\delta_{m,2}, $$
where δm ,k is the Kronecker delta function defined in Section 2.1.3. Using a
computer, generate 10 000 orthogonal-frequency-division-multiplexing (OFDM)
symbols using 16 carriers with each carrier transmitting a BPSK symbol taking
values of ±1 with equal probability. Using the inverse fast Fourier transform
(IFFT), convert each OFDM symbol into its time-domain representation. Cre-
ate two vectors to store the time-domain samples, $\mathbf{s}_{zp}$ and $\mathbf{s}_{cp}$. The vector $\mathbf{s}_{zp}$ should contain the time-domain representation of the OFDM symbols with three samples of zero padding between consecutive symbols, and $\mathbf{s}_{cp}$ should contain a three-sample cyclic prefix. Convolve $\mathbf{s}_{zp}$ and $\mathbf{s}_{cp}$ with the channel impulse response $h(m)$ and add pseudorandom complex Gaussian noise of mean zero and variance 0.09 per sample to the results of the convolutions. A convenient way to do this is to represent the impulse response in a vector $\mathbf{h}$, create a vector of noise samples $\mathbf{n}$, and write

$$ \mathbf{z}_{zp} = \mathbf{h} * \mathbf{s}_{zp} + \mathbf{n}, \qquad \mathbf{z}_{cp} = \mathbf{h} * \mathbf{s}_{cp} + \mathbf{n}, $$

where $*$ denotes convolution.
Therefore, $\mathbf{z}_{zp}$ and $\mathbf{z}_{cp}$ simulate received samples in an OFDM system with zero padding and a cyclic prefix, respectively. Decode the OFDM symbols by selecting
appropriate portions of zz p and zcp , using a fast Fourier transform (FFT) to
convert the time-domain samples into frequency-domain values, and checking the sign of the frequency-domain values to decode the BPSK data. Estimate the bit
error rate by comparing the decoded data to the transmitted data for both the
zero-padding and cyclic prefix schemes. This computer exercise is intended to
illustrate the effects of using a cyclic-prefix versus simple zero padding for an
OFDM system.
11 Space-time coding

As is true for single-input single-output (SISO) communication links, there are


many approaches for coding multiple-input multiple-output (MIMO) systems in
an attempt to approach the channel capacity. Many of the space-time coding ap-
proaches have analogs in the SISO regime. The multiple antennas of MIMO sys-
tems enable spatial diversity, increased data rates, and interference suppression.
In general, there are trades in these characteristics, depending upon the coding
and receiver approach. One of the most important is the trade-off between data rate and diversity, whereby the data communication rate is reduced to improve the probability of error or outage.
the data rate is sacrificed to improve robustness.
There have been numerous contributions in the field of space-time coding. The
major contributions include References [8, 99, 307, 305, 306, 361, 362, 314, 57,
58, 292, 86, 207, 166, 119, 87, 222, 270, 42, 43, 269, 226, 196]. Of particular note
are Reference [8] which introduced what is now known as the Alamouti code, a
simple and elegant space-time code for systems with two transmitter antennas,
Reference [307] which introduced systematic methods to evaluate space-time
codes, and Reference [361] which analyzed the fundamental trade-offs between
diversity and rate in multiantenna systems.

11.1 Rate diversity trade-off

The multiple transmit antennas of a MIMO system can be employed to increase


the diversity or the data rate of a given link (see References [361, 314]). The trade-
off between the rate and diversity can be analyzed either for specific modulation
schemes or in a more general form using an outage capacity formulation. In the
former, a fraction of the maximum data rate achievable on a given link using a
particular modulation scheme is sacrificed to reduce the probability of symbol
or bit error. In the latter, a fraction of the maximum rate is sacrificed to ensure
that under different realizations of the fading process, the probability that the
capacity of the link is below some target rate is reduced. Most of the practical
work in the field of space-time coding has focused on the probability of error
formulation as it relates directly to specific coding and modulation schemes.
The next two subsections describe these in more detail with specific examples.
366 Space-time coding

The following sections of the chapter discuss various classes of space-time coding
schemes.

11.1.1 Probability of error formulation


The average probability of bit or symbol error for a multiple-antenna communication system depends strongly on the type of modulation used (for example, binary-phase-shift keying (BPSK) vs. quadrature-amplitude modulation
(QAM)) and algorithms employed at the transmitters and receivers (for example,
transmitting independent data through each antenna or repeating the same infor-
mation on each antenna). Exact expressions for symbol and bit-error probabilities
of these systems may be difficult to find and, moreover, may differ significantly in
form, complicating comparative analyses of such systems. In general, however, for
high SNR, the probability of symbol error of most practical modulation schemes
can be bounded in an order-of-growth sense. That is to say,
 
$$p_{\mathrm{err}}(\mathrm{SNR}, R) \le a\,\mathrm{SNR}^{-d} + O\!\left(\mathrm{SNR}^{-d-1}\right), \qquad (11.1)$$

where d is known as the diversity coefficient or diversity order and a is a constant.
We use the notation $p_{\mathrm{err}}(\mathrm{SNR}, R)$ to represent the probability of error as a
function of the SNR and rate R.
For instance, for a SISO system using quadrature-phase-shift keying (QPSK)
and channel coefficient g, the probability of error is
 
$$p_{\mathrm{err}}(\mathrm{SNR}\,|\,g) = Q\!\left(\sqrt{|g|^2\,\mathrm{SNR}}\right),$$

where Q(.) is known as the Q-function, defined in Section 2.14.7, which is the
tail probability of the standard Gaussian probability density function,
$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} dt\; e^{-t^2/2}.$$
Note that we use the notation $p_{\mathrm{err}}(\mathrm{SNR}\,|\,g)$ to refer to the probability of error as
a function of the SNR, given the channel coefficient g.
With Rayleigh fading (complex Gaussian channel coefficients), the magnitude
square of the channel coefficient, that is, $|g|^2$, is exponentially distributed (see
Section 3.1.10). Hence, one can derive the marginal probability of error as follows
[255], [314]:
$$p_{\mathrm{err}}(\mathrm{SNR}) = \int_0^{\infty} d|g|^2\; Q\!\left(\sqrt{|g|^2\,\mathrm{SNR}}\right) e^{-|g|^2} = \int_0^{\infty} d\tau\; Q\!\left(\sqrt{\tau\,\mathrm{SNR}}\right) e^{-\tau}$$
$$= \frac{1}{2} - \frac{1}{2}\sqrt{\frac{\mathrm{SNR}}{2 + \mathrm{SNR}}} = \frac{1}{2\,\mathrm{SNR}} + O\!\left(\frac{1}{\mathrm{SNR}^2}\right),$$
where $\tau$ is an integration variable over the channel attenuation, that is, $\tau = |g|^2$.
The last equation indicates that a SISO QPSK system has diversity order 1 since
the dominant SNR term has an exponent equal to −1.
Note that in a fading system at high SNR, symbol and bit-error events are
typically due to a channel realization that is weak (rather than a spike in the
noise), in the sense that the norm square of the channel coefficients is small
compared with the inverse of the SNR. Hence, for a SISO system with channel
coefficient g, for most practical modulation schemes, the probability of error is
approximated as follows [314]:
$$p_{\mathrm{err}} \approx \Pr\!\left(|g|^2 < \frac{a}{\mathrm{SNR}}\right), \qquad (11.2)$$
where a is some constant that depends on the modulation scheme used.
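As a quick numerical check on this high-SNR behavior, the short Python sketch below (the function names are ours, not the book's) evaluates the closed-form Rayleigh-averaged error probability derived above and compares it with the 1/(2 SNR) approximation:

```python
import math

def q_func(x):
    # Gaussian tail probability Q(x), via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def p_err_rayleigh(snr):
    # Average of Q(sqrt(tau * SNR)) over tau ~ Exp(1):
    # (1/2) * (1 - sqrt(SNR / (2 + SNR)))
    return 0.5 * (1.0 - math.sqrt(snr / (2.0 + snr)))

for snr_db in (10, 20, 30):
    snr = 10.0 ** (snr_db / 10.0)
    print(snr_db, p_err_rayleigh(snr), 1.0 / (2.0 * snr))
```

At 30 dB the exact value and 1/(2 SNR) agree to within a fraction of a percent, consistent with a diversity order of 1.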

11.1.2 Outage probability formulation


One can also analyze the diversity order of a system independent of the modula-
tion scheme by using the capacity of the system, replacing the error event with
an outage event. Specifically, we analyze the probability that the capacity of the
system is below a target rate under fading.
For example, consider the capacity of a slow-fading SISO link as discussed in
Section 5.3. The capacity is given by
 
$$c = \log_2\!\left(1 + |g|^2\,\mathrm{SNR}\right), \qquad (11.3)$$

where we recall that the channel coefficient g is constant for the duration
of communication. At very large SNR, the capacity can be approximated by
 
$$c \approx \log_2\!\left(|g|^2\,\mathrm{SNR}\right). \qquad (11.4)$$

Suppose that a link wishes to communicate at a rate R. The link is in outage if
c < R. Outage occurs if $|g|$ happens to be small for the duration of the communication. To reduce the probability of outage, suppose that the rate R is a fraction
r of the capacity of the channel assuming g = 1. That is to say,

$$R = r \log_2(1 + \mathrm{SNR}) \approx r \log_2(\mathrm{SNR}). \qquad (11.5)$$

Thus, the probability of outage, assuming Rayleigh fading, is [362]

$$p_{\mathrm{out}} = \Pr(c < R) \approx \Pr\!\left(\log_2(|g|^2\,\mathrm{SNR}) < r \log_2 \mathrm{SNR}\right)$$
$$= \Pr\!\left(|g|^2\,\mathrm{SNR} < \mathrm{SNR}^r\right) = \Pr\!\left(|g|^2 < \mathrm{SNR}^{r-1}\right). \qquad (11.6)$$

Because we assume Rayleigh fading, the magnitude square of the channel coefficient $|g|^2$ is exponentially distributed. Hence,
$$\Pr\!\left(|g|^2 < \mathrm{SNR}^{r-1}\right) = \int_0^{\mathrm{SNR}^{r-1}} d\tau\; e^{-\tau} = 1 - e^{-\mathrm{SNR}^{r-1}}$$
$$= 1 - \left(1 - \mathrm{SNR}^{r-1} + O\!\left(\mathrm{SNR}^{2(r-1)}\right)\right) \approx \mathrm{SNR}^{r-1} = \mathrm{SNR}^{-d(r)} \qquad (11.7)$$

for large SNR and multiplexing rate r < 1. The function d(r) is the diversity
gain associated with the multiplexing rate r. The previous expression indicates
the rate at which outage probability can be improved at the expense of data rate
and is a fundamental relationship for fading channels.
Consequently, for the SISO Rayleigh channel, the diversity gain d and multiplexing rate r are related by
$$d(r) = 1 - r \qquad (11.8)$$
for high SNR. Any real coding scheme is bounded by this relationship.
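The SNR exponent in the outage expression can be estimated numerically. The sketch below (helper names hypothetical) measures the slope of log p_out versus log SNR for the Rayleigh SISO link:

```python
import math

def p_out_siso(snr, r):
    # Rayleigh SISO outage probability: Pr(|g|^2 < SNR^(r-1)) = 1 - exp(-SNR^(r-1))
    return 1.0 - math.exp(-(snr ** (r - 1.0)))

def snr_exponent(r, snr1=1e3, snr2=1e4):
    # Slope of log p_out versus log SNR; approaches r - 1 at high SNR
    return (math.log(p_out_siso(snr2, r)) - math.log(p_out_siso(snr1, r))) / \
           (math.log(snr2) - math.log(snr1))

print(snr_exponent(0.5))  # close to r - 1 = -0.5
```

Reducing the rate fraction r makes the exponent more negative and steepens the outage decay, which is the rate-diversity trade-off in action.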
For a general MIMO link with nt transmitter and nr receiver antennas, the
optimal diversity-multiplexing trade-off curve was found by Zheng and Tse in
Reference [362]. While a precise analysis of this result is quite complicated, we
present a brief description of their findings here, which are based on the analysis
given in References [362, 314].
The capacity of an uninformed transmitter MIMO link with spatially uncor-
related noise, as discussed in Section 8.3, is given by
 
$$c = \log_2\left|\mathbf{I} + \frac{P_o}{n_t}\,\mathbf{H}\mathbf{H}^{\dagger}\right| = \sum_m \log_2\!\left(1 + \frac{a^2 P_o}{n_t}\,\lambda_m\right) = \sum_m \log_2\!\left(1 + \frac{\mathrm{SNR}}{n_t}\,\lambda_m\right), \qquad (11.9)$$

where $P_o$ is the total noise-normalized power and a is the average attenuation
from transmit to receive antenna. The variable $\lambda_m$ is the mth eigenvalue of $\mathbf{G}\mathbf{G}^{\dagger}$,
where $\mathbf{G} \in \mathbb{C}^{n_r \times n_t}$ is given by $\mathbf{G} = \mathbf{H}/a$, and its entries are drawn from an
independent, identically distributed, complex, circularly symmetric, unit-variance Gaussian
distribution. The term $a^2 P_o$ is also the total SNR per receive antenna.
Note that at high SNR, the spectral efficiency for such a MIMO system can
grow approximately linearly with the minimum of nr and nt , that is, writing n =
min(nr , nt ), the MIMO link can support a spectral efficiency of approximately
n log2 (SNR), where n is the multiplexing gain provided by the multiple transmit
and receive antennas. For some real system with spectral efficiency R operating
with a multiplexing rate of r ≤ n,
$$R \approx r \log_2(\mathrm{SNR}). \qquad (11.10)$$
The probability of outage is given by
$$p_{\mathrm{out}} = \Pr(c < R) \approx \Pr\!\left(\sum_m \log_2\!\left(1 + \frac{\mathrm{SNR}}{n_t}\,\lambda_m\right) < r \log_2 \mathrm{SNR}\right). \qquad (11.11)$$

Hence, the outage probability is related to the joint distribution of the eigenvalues
of the matrix HH† . The joint distribution of these eigenvalues has a complicated
Figure 11.1 Optimal diversity-multiplexing trade-off for the MIMO channel: diversity gain d(r) versus multiplexing gain r, a piecewise-linear curve through the points $(0,\, n_r n_t)$, $(1,\, (n_r - 1)(n_t - 1))$, $(2,\, (n_r - 2)(n_t - 2))$, $\ldots$, $(\min(n_r, n_t),\, 0)$.

relationship, which results in a complicated analysis of the diversity-multiplexing
trade-off. The analysis carried out in [362] shows that the optimal diversity-multiplexing trade-off d(r) is a piece-wise linear function between the following
points in order:
$$\{0,\, n_r n_t\},\; \{1,\, (n_r - 1)(n_t - 1)\},\; \{2,\, (n_r - 2)(n_t - 2)\},\; \ldots,\; \{n,\, (n_r - n)(n_t - n)\}. \qquad (11.12)$$
This curve is given in Figure 11.1, which illustrates the systematic trade-off
between the maximum multiplexing gain of n = min(nr , nt ) and the maximum
diversity gain of $n_r n_t$. Another observation that is often made is that if the
number of antennas at the transmitter and at the receiver are each increased by one, the
entire curve shifts to the right by one, increasing the multiplexing gain for a
given diversity gain by one.
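The piecewise-linear trade-off curve is straightforward to evaluate directly. The sketch below (function name hypothetical) interpolates between the corner points (k, (nr − k)(nt − k)):

```python
def dmt(r, nr, nt):
    # Optimal diversity-multiplexing trade-off: piecewise-linear interpolation
    # of the corner points (k, (nr - k) * (nt - k)), k = 0, ..., min(nr, nt)
    n = min(nr, nt)
    assert 0 <= r <= n
    k = min(int(r), n - 1)  # index of the segment containing r
    d_lo = (nr - k) * (nt - k)
    d_hi = (nr - k - 1) * (nt - k - 1)
    return d_lo + (r - k) * (d_hi - d_lo)

# A 4 x 4 link: maximum diversity gain 16 at r = 0, maximum multiplexing gain 4
print(dmt(0, 4, 4), dmt(1, 4, 4), dmt(4, 4, 4))  # 16 9 0
```

The one-antenna shift observation can be checked as well: dmt(r + 1, nr + 1, nt + 1) equals dmt(r, nr, nt).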

11.2 Block codes

In general, most practical space-time coding schemes encode information over


multiple symbols and antennas. Consider a space-time processing system in
which the transmitter has nt antennas, the receiver has nr antennas, and a
block of length ns is used. The input–output relationship of the system when
operating in frequency-flat fading can be represented by the following equation
$$\mathbf{Z} = \mathbf{H}\mathbf{C} + \mathbf{N}. \qquad (11.13)$$
The matrix $\mathbf{C} \in \mathbb{C}^{n_t \times n_s}$ represents the transmitted codeword on each of the
$n_t$ antennas for each sample, $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$ represents the channel coefficients
between the transmitter and receiver antennas, and $\mathbf{N} \in \mathbb{C}^{n_r \times n_s}$ is a matrix of
additive noise samples. The structure of the codewords $\mathbf{C}$ is determined by the
coding scheme used.

Maximal ratio transmission


Maximal ratio transmission is a simple signal processing technique for multiple-
input, single-output (MISO) channels where transmissions are encoded over one
symbol. It is the transmit-side analog of the spatial matched-filter receiver de-
scribed in Section 9.2.1, and is also the water filling solution for the MIMO
channel described in Section 8.3 with one antenna at the receiver. The main idea
behind maximal ratio transmission is that the signals at the transmitter antennas
are phased in such a manner that they add coherently at the receiver antenna.
Consider a system with nt transmit antennas and a single receive antenna, and
the transmission of a single symbol s, that is, ns = 1. Since there is one antenna
at the receiver, that is nr = 1, the matrices in Equation (11.13) are either row
or column vectors as follows:

$$\mathbf{H} = \mathbf{h} = (h_1\;\; h_2\;\; \cdots\;\; h_{n_t})$$
$$\mathbf{C} = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_{n_t} \end{pmatrix} s$$
$$\mathbf{N} = n,$$

where we recall that $n_t$ is the number of antennas at the transmitter, $h_1, \ldots, h_{n_t}$
are the channel coefficients between the antennas of the transmitter and the
receiver, and n is a zero-mean, circularly symmetric, Gaussian random variable
of variance $\sigma^2$, which represents the noise at the antenna of the receiver.
The vector
$$\mathbf{w} = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_{n_t} \end{pmatrix}$$

can be thought of as weights applied to the signals on the antennas of the trans-
mitter. Suppose that the transmitter uses the following w:
$$\mathbf{w} = \frac{1}{\|\mathbf{h}\|}\, \mathbf{h}^{\dagger}. \qquad (11.14)$$
Since this w is a unit-norm vector, the transmitted power does not change with
h. Additionally, note that this w precompensates for the phase offset introduced
by the paths between each transmit antenna and the antenna of the receiver such
that the signal adds in phase at the receiver. Equation (11.13) then becomes

$$z = \|\mathbf{h}\|\, s + n. \qquad (11.15)$$

From (11.2), the probability of error is approximately equal to the probability
that $\|\mathbf{h}\|^2\,\mathrm{SNR}$ is small, that is,
$$\Pr\!\left(\|\mathbf{h}\|^2 \le \frac{a}{\mathrm{SNR}}\right),$$
for some positive a. If the channel coefficients are independently faded, circularly
symmetric, Gaussian random variables, ||h||2 is a real χ2 random variable with
2 nt degrees of freedom or a complex χ2 random variable with nt degrees of
freedom as described in Section 3.1.11. The above probability is found using the
CDF of χ2 random variables
$$\Pr\!\left(\|\mathbf{h}\|^2 \le \frac{a}{\mathrm{SNR}}\right) = \frac{1}{\Gamma(n_t)}\,\gamma\!\left(n_t, \frac{a}{2\,\mathrm{SNR}}\right) \approx \frac{1}{n_t!}\left(\frac{a}{2\,\mathrm{SNR}}\right)^{n_t} + o\!\left(\frac{1}{\mathrm{SNR}^{n_t}}\right),$$
where the last expression follows from Equation (2.272). Hence, a diversity order
of nt is possible with nt transmit antennas.
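The coherent-combining idea can be made concrete with a small sketch (channel values drawn from the assumed i.i.d. complex Gaussian model; noise omitted so that the algebra is visible): with weights w = h†/||h||, the received sample collapses to ||h|| s.

```python
import math, random

random.seed(1)
nt = 4
# Hypothetical i.i.d. unit-variance, circularly symmetric complex Gaussian channel
h = [complex(random.gauss(0.0, math.sqrt(0.5)), random.gauss(0.0, math.sqrt(0.5)))
     for _ in range(nt)]
norm_h = math.sqrt(sum(abs(hk) ** 2 for hk in h))

w = [hk.conjugate() / norm_h for hk in h]  # unit-norm maximal-ratio weights
s = 1.0 + 0.0j                             # transmitted symbol

# Each antenna transmits w_k * s; the channel applies h_k and the receiver sums
z = sum(hk * wk for hk, wk in zip(h, w)) * s
coherent = abs(z - norm_h * s) < 1e-12
print(coherent)  # True
```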

11.2.1 Alamouti’s code


Alamouti’s code, introduced in Reference [8], assumes two transmit antennas
and possibly multiple receiver antennas, although we shall focus on the single
receiver antenna case here. In other words, we assume nt = 2 and nr = 1. The
Alamouti code is implemented over two time slots with two data symbols s1
and s2 . The codeword matrix is constructed using the data symbols and their
conjugates as follows:
$$\mathbf{C} = \begin{pmatrix} s_1 & -s_2^* \\ s_2 & s_1^* \end{pmatrix}. \qquad (11.16)$$

Let $h_j$ be the channel coefficient between the jth transmit antenna and the antenna
of the receiver. Communication takes place over two time slots, and the channel
matrix $\mathbf{H}$, which is a row vector $\mathbf{h}$ here and is assumed to be constant over the
two time slots, is defined as follows:

$$\mathbf{h} = (h_1\;\; h_2). \qquad (11.17)$$

Hence, the received samples at times 1 and 2 are the entries of the row vector
$\mathbf{z} = \mathbf{Z}$ given by
$$z_1 = h_1 s_1 + h_2 s_2 + n_1 \qquad (11.18)$$
$$z_2 = -h_1 s_2^* + h_2 s_1^* + n_2, \qquad (11.19)$$
where $n_1$ and $n_2$ are the additive noise samples.
The receiver constructs a new vector $\mathbf{w} \in \mathbb{C}^{2\times 1}$ whose first element is $z_1$ and the
second is $z_2^*$. We can then write $\mathbf{w}$ as follows:
$$\mathbf{w} = \begin{pmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} + \begin{pmatrix} n_1 \\ n_2^* \end{pmatrix}. \qquad (11.20)$$
We can thus recover estimates of $s_1$ and $s_2$ by premultiplying $\mathbf{w}$ by the following
matrix:
$$\frac{1}{\|\mathbf{h}\|} \begin{pmatrix} h_1^* & h_2 \\ h_2^* & -h_1 \end{pmatrix}, \qquad (11.21)$$
which yields the following expression:
$$\begin{pmatrix} \hat{s}_1 \\ \hat{s}_2 \end{pmatrix} = \frac{1}{\|\mathbf{h}\|} \begin{pmatrix} h_1^* & h_2 \\ h_2^* & -h_1 \end{pmatrix} \begin{pmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} + \frac{1}{\|\mathbf{h}\|} \begin{pmatrix} h_1^* & h_2 \\ h_2^* & -h_1 \end{pmatrix} \begin{pmatrix} n_1 \\ n_2^* \end{pmatrix}$$
$$= \begin{pmatrix} \|\mathbf{h}\| & 0 \\ 0 & \|\mathbf{h}\| \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} + \begin{pmatrix} \tilde{n}_1 \\ \tilde{n}_2 \end{pmatrix} = \|\mathbf{h}\| \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} + \begin{pmatrix} \tilde{n}_1 \\ \tilde{n}_2 \end{pmatrix}. \qquad (11.22)$$
Note that $\tilde{n}_1$ and $\tilde{n}_2$ are independent $\mathcal{CN}(0, 1)$ random variables since $n_1$ and
$n_2^*$ are independent $\mathcal{CN}(0, 1)$ variables and the matrix
$$\frac{1}{\|\mathbf{h}\|} \begin{pmatrix} h_1^* & h_2 \\ h_2^* & -h_1 \end{pmatrix} \qquad (11.23)$$
has orthonormal columns.
Hence, two independent data symbols can be transmitted over two time in-
tervals. Since at high SNR the probability of bit error is mainly due to poor
fading conditions, the probability of bit error assuming unit transmit power is
approximately equal to the probability that ||h||2 is smaller than the inverse of
the SNR, that is,
$$\Pr\!\left(\|\mathbf{h}\|^2 \le \frac{a}{\mathrm{SNR}}\right)$$
for some a. If the vector h is a 1 × 2 row vector of independent, circularly
symmetric, Gaussian random variables, the norm square of the vector, ||h||2 is
distributed as a complex χ2 random variable with two degrees of freedom, or a
real χ2 random variable with four degrees of freedom (see Section 3.1.11) and
σ 2 = 1. The CDF of a complex χ2 random variable with two degrees of freedom
is given in terms of γ(k, x), the lower incomplete gamma function as follows:
$$P_{\chi^2_{\mathbb{C}}}(x; 2; 1) = \frac{\gamma(2, x)}{\Gamma(2)} = \frac{1}{2}\, x^2 + o(x^2),$$
where the last expression follows from Equation (2.272). Hence the probability
of error for the Alamouti code is
  
$$\Pr\!\left(\|\mathbf{h}\|^2 \le \frac{a}{\mathrm{SNR}}\right) \approx \frac{a^2}{2\,\mathrm{SNR}^2} + o\!\left(\frac{1}{\mathrm{SNR}^2}\right),$$
which indicates that the diversity order of the Alamouti code is 2. Therefore, the
Alamouti code enables one transmission per symbol time and obtains a diversity
order of 2. Note that the diversity order of 2 is achieved because the Alamouti
code transmits each symbol over each antenna, and so, at high SNR, an error
occurs only if the channel coefficients for both antennas are small. Note that this
is the same diversity order as that achieved by maximal ratio transmission, which
was shown previously to achieve a diversity order of $n_t$ (here $n_t = 2$) and one symbol per unit
time. Unlike maximal ratio transmission, however, the Alamouti code does not
require the transmitter to have channel-state information. Transmitter channel-state information requires significant overhead, as channel parameters have to be
estimated at receivers and then fed back to transmitters.
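The decoupling in the decoding step can be traced numerically. The sketch below (symbol and channel values chosen arbitrarily, noise omitted to expose the algebra) encodes two symbols with the Alamouti codeword and recovers scaled copies of them with the matched matrix:

```python
import math, random

random.seed(2)

def alamouti_demo():
    # Hypothetical flat channel h = (h1, h2) and two unit-energy symbols
    h1 = complex(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0))
    h2 = complex(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0))
    s1, s2 = (1 + 1j) / math.sqrt(2), (1 - 1j) / math.sqrt(2)

    # Received samples over the two time slots (noise omitted)
    z1 = h1 * s1 + h2 * s2
    z2 = -h1 * s2.conjugate() + h2 * s1.conjugate()

    # Stack w = (z1, z2*) and premultiply by the matched matrix over ||h||
    norm_h = math.sqrt(abs(h1) ** 2 + abs(h2) ** 2)
    w1, w2 = z1, z2.conjugate()
    s1_hat = (h1.conjugate() * w1 + h2 * w2) / norm_h
    s2_hat = (h2.conjugate() * w1 - h1 * w2) / norm_h
    # The channel decouples: s_hat = ||h|| * s for each symbol
    return abs(s1_hat - norm_h * s1) + abs(s2_hat - norm_h * s2)

print(alamouti_demo() < 1e-12)  # True
```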

11.2.2 Orthogonal space-time block codes


The Alamouti code is an example of an orthogonal space-time block code. Space-time block codes, like linear error-correction codes, can be described by a generator matrix $\mathbf{G} \in \mathbb{C}^{n_s \times n_t}$. (Note that the generator matrix here is unrelated
to generator functions.) Each row of the generator matrix describes the signals
transmitted on each antenna at a given time slot. For instance, consider the
following generator matrix
$$\mathbf{G} = \begin{pmatrix} g_{11} & g_{12} & \cdots & g_{1 n_t} \\ g_{21} & g_{22} & \cdots & g_{2 n_t} \\ \vdots & \vdots & & \vdots \\ g_{n_s 1} & g_{n_s 2} & \cdots & g_{n_s n_t} \end{pmatrix}. \qquad (11.24)$$

The kth row represents the $n_t$ symbols transmitted at time slot k.
Hence, the Alamouti space-time code can be described by using the following
generator matrix:
$$\mathbf{G} = \begin{pmatrix} s_1 & s_2 \\ -s_2^* & s_1^* \end{pmatrix}, \qquad (11.25)$$

where we recall that the two symbols to be transmitted are s1 and s2 .


An orthogonal space-time block code is one in which the generator matrix of
the code has orthogonal columns over the transmitted symbols. The orthogonal
columns of the generator matrix enable relatively low complexity decoding of
the transmitted data, as the following example of the Alamouti code illustrates.
For the Alamouti space-time block code in Equation (11.16), observe that the
product of the Hermitian transpose of the generator matrix with the generator
matrix is diagonal for any choice of $s_1, s_2$ since
$$\begin{pmatrix} s_1 & s_2 \\ -s_2^* & s_1^* \end{pmatrix}^{\dagger} \begin{pmatrix} s_1 & s_2 \\ -s_2^* & s_1^* \end{pmatrix} = \begin{pmatrix} |s_1|^2 + |s_2|^2 & 0 \\ 0 & |s_1|^2 + |s_2|^2 \end{pmatrix}. \qquad (11.26)$$
This property enables simple linear decoding of the space-time code, as shown
in the previous section. For a general space-time block code for nt transmitter
antennas, let the generator matrix G have the property that

nt
G† G = |sj |2 I (11.27)
j =1

where the entries of the matrix $\mathbf{G}$ are $s_1, -s_1, s_2, -s_2, \ldots, s_{n_t}, -s_{n_t}$, which represent the transmitted symbols from the $n_t$ antennas. If the $s_j$ are real and
Equation (11.27) holds, the matrix $\mathbf{G}$ is known as a real orthogonal design [305].
Orthogonal designs permit easy maximum-likelihood decoding using only linear
operations as described in [305], which also provides some further examples of
real orthogonal designs for 4 × 4 and 8 × 8 generator matrices. Note that there
are only a small number of real orthogonal designs with all nonzero entries. A
4 × 4 example from Reference [305] is the following:
$$\begin{pmatrix} s_1 & s_2 & s_3 & s_4 \\ -s_2 & s_1 & -s_4 & s_3 \\ -s_3 & s_4 & s_1 & -s_2 \\ -s_4 & -s_3 & s_2 & s_1 \end{pmatrix}. \qquad (11.28)$$
We refer the reader to specialized texts on space-time coding (for example, Ref-
erences [331, 157, 84]) for further examples and analyses of orthogonal block
codes. The material in this subsection is based on Reference [157].
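The defining property of a real orthogonal design is easy to verify numerically for the 4 × 4 example above. The sketch below (helper name ours) forms the Gram matrix of the design for arbitrary real symbols and checks that it is the scaled identity:

```python
def gram(g):
    # G^T G for a real generator matrix stored as nested lists
    n = len(g[0])
    return [[sum(g[k][i] * g[k][j] for k in range(len(g))) for j in range(n)]
            for i in range(n)]

s1, s2, s3, s4 = 1.0, -2.0, 3.0, 0.5  # arbitrary real symbols
G = [[ s1,  s2,  s3,  s4],
     [-s2,  s1, -s4,  s3],
     [-s3,  s4,  s1, -s2],
     [-s4, -s3,  s2,  s1]]
A = gram(G)
total = s1 ** 2 + s2 ** 2 + s3 ** 2 + s4 ** 2
ok = all(abs(A[i][j] - (total if i == j else 0.0)) < 1e-12
         for i in range(4) for j in range(4))
print(ok)  # True
```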

11.3 Performance criteria for space-time codes

The benefits provided by a space-time code, in terms of the diversity and cod-
ing gains, can be systematically analyzed by using codeword difference matrices
associated with the space-time coding scheme as introduced in Reference [307],
which is the basis for the discussion in this section. The codeword difference matrix between a pair of transmitted codewords $\mathbf{C}_\ell$ and $\mathbf{C}_k$ is simply the difference
between the two codewords:
$$\mathbf{D}_{k\ell} = \mathbf{C}_\ell - \mathbf{C}_k.$$
Recall that $\mathbf{C}_k \in \mathbb{C}^{n_t \times n_s}$, so $\mathbf{D}_{k\ell} \in \mathbb{C}^{n_t \times n_s}$. The codeword difference matrix
$\mathbf{D}_{k\ell}$ is used to bound the probability of a transmitted codeword $\mathbf{C}_k$ being erroneously decoded as a codeword $\mathbf{C}_\ell$ at the receiver. Assuming that the channels
between different pairs of antennas of the transmitter and receiver are independent, identically distributed, circularly symmetric, Gaussian random variables of
zero mean and unit variance, the error probability can be bounded from above.
In order to write this bound, for notational simplicity, define the matrix $\mathbf{A}_{k\ell}$ as
the product of $\mathbf{D}_{k\ell}$ and its Hermitian transpose as follows:
$$\mathbf{A}_{k\ell} = \mathbf{D}_{k\ell}\, \mathbf{D}_{k\ell}^{\dagger}, \qquad (11.29)$$

with $\lambda_m$ denoting the mth largest nonzero eigenvalue of the matrix $\mathbf{A}_{k\ell}$. Using
this notation, the probability of confusing $\mathbf{C}_\ell$ with $\mathbf{C}_k$, denoted by $\Pr(\mathbf{C}_k \to \mathbf{C}_\ell)$,
is bounded from above as follows:
$$\Pr(\mathbf{C}_k \to \mathbf{C}_\ell) \le \left(\prod_{m=1}^{\mathrm{rank}(\mathbf{A}_{k\ell})} \lambda_m\right)^{-n_r} \times \left(\frac{\mathrm{SNR}}{4}\right)^{-\mathrm{rank}(\mathbf{A}_{k\ell})\, n_r}. \qquad (11.30)$$
4
The derivation of this inequality, which uses bounds for the tail of the Gaus-
sian probability density function and linear algebra properties, can be found in
Reference [305]. From the right-hand side of Equation (11.30), the probability of
decoding the codeword $\mathbf{C}_k$ as $\mathbf{C}_\ell$ decays as
$$\frac{1}{\mathrm{SNR}^{\mathrm{rank}(\mathbf{A}_{k\ell})\, n_r}}. \qquad (11.31)$$

Hence, to maximize the diversity order, the codewords should be chosen to
maximize the rank of the codeword difference matrix $\mathbf{D}_{k\ell}$ over all distinct k and
$\ell$. This requirement is known as the rank criterion.
Additionally, since the codeword matrices are of dimensions $n_t \times n_s$, the largest
possible rank of $\mathbf{A}_{k\ell}$ is $\min(n_t, n_s)$. Since, in most systems, the latency associated
with encoding across $n_s$ samples can be large compared to realistic numbers of
antennas $n_t$, it is typically the case that $n_t \le n_s$, which indicates that the
maximum possible diversity order for realistic $n_r \times n_t$ MIMO systems is $n_r n_t$.
Rewriting Equation (11.30) as
$$\Pr(\mathbf{C}_k \to \mathbf{C}_\ell) \le \left(\left(\prod_{m=1}^{\mathrm{rank}(\mathbf{A}_{k\ell})} \lambda_m\right)^{\frac{1}{\mathrm{rank}(\mathbf{A}_{k\ell})}} \frac{\mathrm{SNR}}{4}\right)^{-\mathrm{rank}(\mathbf{A}_{k\ell})\, n_r}, \qquad (11.32)$$
we can observe that the SNR is scaled by the factor
$$\left(\prod_{m=1}^{\mathrm{rank}(\mathbf{A}_{k\ell})} \lambda_m\right)^{\frac{1}{\mathrm{rank}(\mathbf{A}_{k\ell})}},$$

which means that, with good codewords, one can effectively increase the SNR.
Thus, space-time coding can also provide coding gain to the system. To maximize
the coding gain of a given space-time code, we need to choose the codewords
such that the minimum over all codeword pairs of the quantity in the previous
expression is maximized. In the literature, this is known as the determinant
criterion since the quantity in the parentheses of the previous expression equals
the determinant of $\mathbf{A}_{k\ell}$ if $\mathbf{A}_{k\ell}$ is a full-rank matrix.
Hence, the rank and determinant criteria can be used to design coding schemes
that have the desired diversity order and coding gains. The diversity order is
increased by maximizing the minimum (over all pairs of codewords) rank of the
matrix $\mathbf{A}_{k\ell}$, which is the product of the codeword difference matrix associated
with the kth and $\ell$th codewords and its Hermitian transpose. The coding gain is maximized by maximizing
the minimum over all codeword pairs of the determinant of this matrix.
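The rank and determinant criteria can be applied mechanically to codeword pairs. As a sketch (with an arbitrarily chosen pair of Alamouti codewords), the code below forms A = D D† for a codeword difference and confirms it is full rank, which is why the Alamouti code achieves full transmit diversity:

```python
def codeword(s1, s2):
    # Alamouti codeword, nt x ns = 2 x 2
    return [[s1, -s2.conjugate()], [s2, s1.conjugate()]]

def a_matrix(c1, c2):
    # A = D D^H with D = C1 - C2
    d = [[c1[i][j] - c2[i][j] for j in range(2)] for i in range(2)]
    return [[sum(d[i][k] * d[j][k].conjugate() for k in range(2))
             for j in range(2)] for i in range(2)]

# Two distinct codewords differing in the first symbol (hypothetical pair)
C1 = codeword(1 + 0j, 1 + 0j)
C2 = codeword(-1 + 0j, 1 + 0j)
A = a_matrix(C1, C2)
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(abs(det) > 1e-9)  # True: rank 2, so diversity order 2 * nr
```

A code design based on the determinant criterion would maximize the minimum of |det A| over all codeword pairs.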

11.4 Space-time trellis codes

11.4.1 Trellis-coded modulation


Trellis-coded modulation (TCM) is a technique to combine error-control coding
with modulation and typically results in better performance than systems that
optimize error-control coding and modulation separately [316, 317]. The coding
process is described in terms of a trellis diagram which describes state transi-
tions and output symbols that correspond to input bits. In Figure 11.2, a 4-state
trellis from [316] is illustrated. This code uses an 8-PSK constellation whose
constellation points are labeled 0, 1, . . . , 7. Two uncoded data bits are mapped
into a 3-bit, 8-PSK constellation point. The states are labeled 0, 1, 2, 3. The arcs
represent state transitions that occur in response to two input bits. The labels
on the arcs represent the constellation points that are transmitted. The partic-
ular mapping of arcs to input bit sequences is not important. For this example,
suppose that the arcs emanating from each node correspond to input bit pairs
00, 01, 10, and 11 from top to bottom. Starting with state 0, the sequence 11 01
00 is encoded into the constellation points 6, 5, and 2. The state transitions that
occur are 0 → 1, 1 → 2, and 2 → 1. The dashed, dotted, and bold lines repre-
sent the transitions due to the first, second, and third bit pairs, respectively, in
Figure 11.2.
At the receiver, maximum-likelihood decoding of the sequence of transmitted
symbols could be performed based on the received symbols. The maximum-
likelihood decoder finds the path along the trellis that is most likely given the
sequence of received symbols. Since the maximum-likelihood decoder is typi-
cally computationally expensive, it is common to use an approximation to the
maximum-likelihood decoder such as the Viterbi decoder outlined in Section
11.4.2.

11.4.2 Space-time trellis coding


Tarokh et al. [307] proposed using trellis coding in the context of space-time cod-
ing. They present several different trellis diagrams for systems with nt transmit
antennas, nr receiver antennas, and corresponding QPSK and 8-PSK constella-
tions, which are shown in Figures 11.3 and 11.4, respectively. The trellis codes
they specify include symbols that are to be transmitted on each antenna. For
instance, in Figure 11.5, a code for two transmit antennas with QPSK and four
states originally proposed in Reference [307] is illustrated. The state labels cor-
respond to the QPSK symbols to be transmitted on each antenna as a result of
Figure 11.2 8-PSK trellis diagram showing the states (0 through 3) and allowed
transitions from left to right. The labels on the arcs represent the transmit symbols
corresponding to the state transitions.

each sequence of bits. For instance, in the third state, if the data bits are 01,
symbols 2 and 1 are transmitted from antennas 1 and 2, respectively, and the
next state will be state 2. This idea can be generalized to the scenario of a larger
number of transmit antennas, where each arc in the trellis diagram corresponds
to a codeword $q_1 q_2 \cdots q_{n_t}$, whereby $q_i$ is the constellation point transmitted on
the ith antenna.
In addition to being described by a trellis, space-time trellis codes can also
be described in terms of an input–output relationship [307]. Consider a $2^M$-ary phase-shift-keying transmission where the M bits associated with the tth
transmit symbol are $b_{1t}, b_{2t}, \ldots, b_{Mt}$. Let the output at time t at each of the $n_t$
antennas be contained in a vector $\mathbf{x}_t \in \mathbb{C}^{n_t \times 1}$. $\mathbf{x}_t$ can be expressed as
$$\mathbf{x}_t = \sum_{m=1}^{M} \sum_{k=0}^{K_m - 1} b_{m(t-k)}\, \mathbf{c}_{mk}, \qquad (11.33)$$

where $\mathbf{c}_{mk}$ are coefficient vectors whose entries are in $\{0, 1, \ldots, 2^M - 1\}$ and $K_m$
is the memory depth of the encoder associated with the mth bit of the symbol.
Note that the summation in Equation (11.33) is done modulo $2^M$.
Figure 11.3 QPSK constellation diagram and labelings for the Tarokh space-time trellis
code: points 0, 1, 2, and 3 lie on the unit circle at angles 0°, 90°, 180°, and 270°,
respectively.

The space-time trellis code described in Figure 11.5 can be written in the form
of Equation (11.33) as follows:
$$\mathbf{x}_t = b_{2(t-1)} \begin{pmatrix} 2 \\ 0 \end{pmatrix} + b_{1(t-1)} \begin{pmatrix} 1 \\ 0 \end{pmatrix} + b_{2t} \begin{pmatrix} 0 \\ 2 \end{pmatrix} + b_{1t} \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad (11.34)$$
where the addition is performed modulo 4.
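Equation (11.34) is a delay-diversity encoder: antenna 2 transmits the current QPSK label and antenna 1 repeats the previous one. A minimal sketch (the (b2, b1) ordering of the input bit pairs is our assumption):

```python
def st_trellis_encode(bit_pairs):
    # Delay-diversity encoder of the four-state QPSK code: antenna 2 sends the
    # current label 2*b2 + b1 (mod 4); antenna 1 repeats the previous label
    b2_prev, b1_prev = 0, 0  # encoder starts in state 0
    out = []
    for b2, b1 in bit_pairs:
        x1 = (2 * b2_prev + b1_prev) % 4  # antenna 1
        x2 = (2 * b2 + b1) % 4            # antenna 2
        out.append((x1, x2))
        b2_prev, b1_prev = b2, b1
    return out

print(st_trellis_encode([(1, 1), (0, 1), (0, 0)]))  # [(0, 3), (3, 1), (1, 0)]
```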
Another example from Reference [307] is an 8-PSK code given by
$$\mathbf{x}_t = b_{3(t-1)} \begin{pmatrix} 4 \\ 0 \end{pmatrix} + b_{2(t-1)} \begin{pmatrix} 2 \\ 0 \end{pmatrix} + b_{1(t-1)} \begin{pmatrix} 5 \\ 0 \end{pmatrix} + b_{3t} \begin{pmatrix} 0 \\ 4 \end{pmatrix} + b_{2t} \begin{pmatrix} 0 \\ 2 \end{pmatrix} + b_{1t} \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad (11.35)$$
where the addition is done modulo 8.
The receiver uses a Viterbi decoder [327] to estimate the maximum-likelihood
transmitted signal over the length of the space-time code. The decision metric
that is used in the Viterbi decoder at a given symbol time t is the following:
$$\sum_{\ell=1}^{n_r} \left| y_{\ell t} - \sum_{k=1}^{n_t} h_{k\ell}\, q_{kt} \right|^2, \qquad (11.36)$$

where $y_{\ell t}$ is the received symbol at antenna $\ell$ at time t, $q_{kt}$ is the transmitted
symbol from antenna k at time t, and $h_{k\ell}$ is the channel coefficient between
transmit antenna k and receive antenna $\ell$. Observe that the second summation
corresponds to the received symbol at antenna $\ell$ at time t in the absence of noise.
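The decision metric can be sketched directly (variable names and values hypothetical; a full Viterbi search would accumulate this metric along candidate paths through the trellis):

```python
def branch_metric(y, h, q):
    # Squared distance between each received sample and the noiseless
    # hypothesis sum_k h[k][l] * q[k], summed over the receive antennas l
    total = 0.0
    for l in range(len(y)):
        pred = sum(h[k][l] * q[k] for k in range(len(q)))
        total += abs(y[l] - pred) ** 2
    return total

# One receive antenna, two transmit antennas (hypothetical values)
h = [[1 + 0j], [0 + 1j]]                     # h[k][l]
q = [1 + 0j, -1 + 0j]                        # candidate transmit symbols
y = [sum(h[k][0] * q[k] for k in range(2))]  # noiseless received sample
print(branch_metric(y, h, q))  # 0.0 for the correct hypothesis
```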
Figure 11.4 8-PSK constellation diagram and labelings for the Tarokh space-time
trellis code: points 0 through 7 are placed counterclockwise around the unit circle,
with point 0 at angle 0°.

Hence, the Viterbi decoder computes the path through the trellis with the lowest
accumulated decision metric. Note that the analysis bounding the probability of
error which led to (11.30) still holds in this case. The codewords correspond to
the different valid sequences through the trellis. Hence, the rank and determinant
criteria developed in Section 11.3 can also be used in the context of space-time
trellis codes.
Chen et al. [57, 58] proposed a different criterion to be used in designing space-time trellis codes when the product of the number of transmitter and receiver
antennas is moderately high ($n_r n_t > 3$). They use a central-limit-theorem-based
argument to show that the probability of error associated with interpreting a
codeword $\mathbf{C}_k$ as $\mathbf{C}_\ell$ can be bounded from above in the limit as $n_r \to \infty$:
$$\lim_{n_r \to \infty} \Pr(\mathbf{C}_k \to \mathbf{C}_\ell) \le \frac{1}{4} \exp\!\left(-\frac{1}{4}\, n_r\, \mathrm{SNR} \sum_{i=1}^{n_t} \lambda_i\right), \qquad (11.37)$$

where $\lambda_i$, the ith squared singular value of the codeword difference matrix, is as
defined in Section 11.3. Thus, when the number of receiver antennas is large,
maximizing the trace of the matrix $\mathbf{A}_{k\ell}$ should result in a smaller probability
of error. This criterion is introduced as the trace criterion by Chen et al. and is
used as a design tool to identify space-time trellis codes with low probabilities
of error [57, 58].
Figure 11.5 Space-time trellis code for two transmit antennas with 4-PSK and two bits
per symbol. The branch labels for the four states are 00 01 02 03, 10 11 12 13,
20 21 22 23, and 30 31 32 33, where a label jk indicates that symbol j is transmitted
on antenna 1 and symbol k on antenna 2.

Using this scheme, Chen et al. provided several different space-time trellis
codes for QPSK and 8-PSK systems in [58]. One that is equal in complexity to
Tarokh's code from Figure 11.5 and Equation (11.34) is given by the following
equation, where addition is done modulo 4:
$$\mathbf{x}_t = b_{2(t-1)} \begin{pmatrix} 1 \\ 2 \end{pmatrix} + b_{1(t-1)} \begin{pmatrix} 2 \\ 0 \end{pmatrix} + b_{2t} \begin{pmatrix} 0 \\ 2 \end{pmatrix} + b_{1t} \begin{pmatrix} 2 \\ 3 \end{pmatrix}. \qquad (11.38)$$

This code is shown to outperform the Tarokh code by approximately 2.1 dB when
the number of antennas at the receiver is $n_r = 4$. That is to say, the probability
of error achievable with the code in Equation (11.38) is equal to the probability
of error achievable with the code in Equation (11.34) at approximately 2.1 dB
higher SNR. For $n_r = 1$, however, the two codes are comparable, which
is unsurprising since the Tarokh code was primarily designed for one receiver
antenna.
The following 8-PSK space-time trellis code was given by Chen et al. [58] and
has a complexity comparable to the code given in Equation (11.35):
$$\mathbf{x}_t = b_{3(t-1)} \begin{pmatrix} 3 \\ 4 \end{pmatrix} + b_{2(t-1)} \begin{pmatrix} 2 \\ 0 \end{pmatrix} + b_{1(t-1)} \begin{pmatrix} 4 \\ 0 \end{pmatrix} + b_{3t} \begin{pmatrix} 2 \\ 1 \end{pmatrix} + b_{2t} \begin{pmatrix} 4 \\ 6 \end{pmatrix} + b_{1t} \begin{pmatrix} 0 \\ 4 \end{pmatrix}, \qquad (11.39)$$

where the addition is done modulo 8. The performance of this code is comparable
to the code given in Equation (11.35) with nr = 1. For nr = 4, however, this
code is better by approximately 1.7 dB. Note that the 1.7 dB number is obtained
based on an i.i.d., circularly symmetric Gaussian channel model with constant
Figure 11.6 Bit-interleaved coded modulation block diagram: binary coding, followed
by interleaving and binary labeling. The jth coded bit is denoted by $b_j$, with the jth
interleaved bit given by $c_j$. The transmit symbol of the kth antenna is denoted by $x_k$.

fading across a given block. We refer the reader to Reference [58] for details and
additional space-time codes that were found using the trace criteria.

11.5 Bit-interleaved coded modulation

11.5.1 Single-antenna bit-interleaved coded modulation


Zehavi introduced the idea of bit-interleaved coded modulation (BICM) as a
technique to improve the performance of code diversity over fading channels
through bit-wise interleaving in Reference [360]. Detailed analyses and perfor-
mance criteria for BICM can also be found in Reference [90].
The basic idea behind bit-interleaved coded modulation is to spread the bits
corresponding to a codeword across multiple transmit symbols, and in the case
of channels fading over time, multiple channel realizations. The spreading of the
bits is accomplished by rearranging the sequence of modulated bits using an
interleaver with some predetermined (typically pseudorandom) sequence. The
basic structure of a bit-interleaved coded modulator is shown in Figure 11.6.
A block of n data bits is first encoded using some binary error-control code
of rate R = n/N to produce a codeword of length N bits. The N bits of the
codeword produced by the binary encoder are permuted using some permutation
function π. The bits in the permuted codeword are then broken into groups of M
bits and mapped into constellation points corresponding to some $2^M$-ary QAM
constellation.
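The interleave/de-interleave round trip described above can be sketched as follows (function names and the pseudorandom permutation are illustrative assumptions):

```python
import random

def bicm_transmit(bits, m, seed=0):
    # Pseudorandom bit interleaving, then grouping the permuted bits into
    # m-bit labels for a 2^m-ary constellation
    rng = random.Random(seed)
    perm = list(range(len(bits)))
    rng.shuffle(perm)
    interleaved = [bits[p] for p in perm]
    symbols = [interleaved[i:i + m] for i in range(0, len(interleaved), m)]
    return perm, symbols

def bicm_deinterleave(interleaved, perm):
    # The receiver inverts the known permutation before binary decoding
    out = [0] * len(interleaved)
    for i, p in enumerate(perm):
        out[p] = interleaved[i]
    return out

bits = [1, 0, 1, 1, 0, 0, 1, 0]
perm, syms = bicm_transmit(bits, m=2)
flat = [b for s in syms for b in s]
print(bicm_deinterleave(flat, perm) == bits)  # True
```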
With sufficient interleaving depth, each transmitted symbol would have contributions from bits from M different codewords. Furthermore, the bits of a given
modulated symbol will be effectively uncorrelated as they will belong to different
codewords. If a symbol error occurs at the decoder, the resulting erroneous bits
will be spread over several codewords. Hence, each codeword will have a small
number of errors, increasing the chances that the errors can be corrected. In fast-fading channels where channel realizations change rapidly over time, interleaving
Figure 11.7 Multiantenna bit-interleaved coded modulation block diagram: binary
coding produces coded bits $b_j$, which are interleaved into bits $c_j$, converted from serial
to parallel, and modulated onto the transmit symbols $x_1, \ldots, x_{n_t}$ of the $n_t$ antennas.

provides a diversity benefit whereby the bits corresponding to a particular codeword are transmitted through multiple channel realizations.

11.5.2 Multiantenna bit-interleaved coded modulation


The extension of bit-interleaved coded modulation to multiantenna systems was
introduced in Reference [46]. Results from physical implementations can be found
in Reference [38]. The discussion in this subsection is based on these works.
For multiantenna bit-interleaved coded modulation, the coded bits are interleaved as in the SISO case, and the resulting bits are parallelized and then
modulated for transmission on each antenna, as shown in Figure 11.7. Note that
the binary encoder takes a string of input data bits and produces a string of
coded bits bj that are then permuted using an interleaver to produce a string of
bits cj. Note that cj = bk for some k that is determined by the interleaver, and
the mapping from the coded bits bj to the interleaved sequence of coded bits cj
is known at the receiver.
Blocks of m nt bits from the interleaver are then parallelized into nt symbols
of m bits each, which are then modulated and transmitted in parallel on the nt
antennas. The modulations thus use constellations of size 2^m. The transmitted
signal on antenna j for a given symbol time is represented by xj. Note that we
have suppressed the time index for notational simplicity.
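The serial-to-parallel step can be sketched as follows, assuming QPSK (m = 2). The particular bit-to-point labeling is an arbitrary illustrative choice, not a prescribed mapping from the text.

```python
import numpy as np

def serial_to_parallel_modulate(interleaved_bits, n_t, m):
    """Map one block of m*n_t interleaved bits to n_t constellation
    points, one per transmit antenna (QPSK table shown for m = 2)."""
    assert interleaved_bits.size == m * n_t
    qpsk = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)
    groups = interleaved_bits.reshape(n_t, m)   # serial-to-parallel
    idx = groups @ (1 << np.arange(m)[::-1])    # m bits -> symbol index
    return qpsk[idx]                            # x_1, ..., x_{n_t}

x = serial_to_parallel_modulate(np.array([0, 1, 1, 0, 0, 0, 1, 1]),
                                n_t=4, m=2)
```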
Since the bits for a given codeword are spread by the interleaver and transmitted over multiple antennas, each codeword contains bits that were transmitted
through multiple channel realizations, one for each antenna and one for each
coherence interval (duration for which channel parameters remain constant) over
which the coding/decoding takes place. Thus, spatial and time diversity can be
obtained.
Figure 11.8 Multiantenna bit-interleaved coded modulation receiver. The received
symbols on the kth antenna are denoted by yk, with the log-likelihood ratio for the
bit cj given by λ(cj). The log-likelihoods corresponding to the jth deinterleaved bit
bj are denoted by λ(bj).

A general receiver structure for a bit-interleaved coded modulation system is
shown in Figure 11.8. In general, we assume that the receiver performs some
variant of maximum-likelihood decoding. As such, appropriate log-likelihood ratios
(which are sufficient statistics) are all that are required for decoding. To illustrate
the operation of the receiver, consider the transmission of the bits c1, . . . , c_{m nt},
which we represent in a vector c for simplicity. Suppose that these bits are
encoded into nt transmit symbols corresponding to a given symbol time that we
represent by the vector x ∈ C^{nt×1}, where

x = (x1, x2, . . . , x_{nt})^T.

Let the mapping from the m nt bits to constellation points be denoted by
ϕ, whereby x = ϕ(c). The sampled received signals from the antennas of the
receiver are contained in the vector y, where

y = (y1, y2, . . . , y_{nr})^T = H x + w.

The matrix H ∈ C^{nr×nt} contains the narrowband fading coefficient between
transmit antenna k and receiver antenna j in its jkth entry, which we denote
by hjk, and the vector w ∈ C^{nr×1} contains i.i.d. circularly symmetric Gaussian
random variables of variance σ², that is, w ∼ CN(0, σ² I).
The receiver computes a log-likelihood ratio of ck, k = 1, . . . , m nt, based on
the received samples y. That is to say, for each ck, the receiver computes the
log-likelihood ratio λ(·),

λ(ck) = log [ Pr(ck = 1 | y) / Pr(ck = 0 | y) ].
Assuming that the transmit symbols are equally likely, and the entries of y are
conditionally independent if x is known, using some probabilistic manipulations,
we can write

λ(ck) = log [ Σ_{x ∈ Xk1} exp(−||y − H x||²/σ²) / Σ_{x ∈ Xk0} exp(−||y − H x||²/σ²) ],

where Xk0 is the set of x corresponding to all combinations of c1, . . . , c_{m nt} for
which ck = 0, and Xk1 is the set of x corresponding to all combinations of
c1, . . . , c_{m nt} for which ck = 1. That is to say:

Xk0 := {x : ck = 0 and x = ϕ(c)}
Xk1 := {x : ck = 1 and x = ϕ(c)},
where the := symbol denotes a definition. The de-interleaver reorders the
likelihood ratios according to the inverse of the mapping used to order bk to
ck. Hence, λ(bk) is the likelihood ratio of bit bk, which was computed based on
the symbol time corresponding to the transmission of bk. The likelihood ratios
can then be used by the binary decoder to estimate the transmitted bits.
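The log-likelihood computation above can be sketched by brute-force enumeration of the constellation sets. The function and the toy BPSK mapping phi below are our own illustrative constructions; the cost grows as 2^{m nt}, which is why practical receivers use approximations such as max-log or sphere-decoder-based LLRs.

```python
import numpy as np
from itertools import product

def bit_llrs(y, H, sigma2, phi, m, n_t):
    """Exhaustive per-bit LLRs lambda(c_k) for MIMO BICM.

    phi maps a tuple of m*n_t bits to the transmit vector x.
    Complexity is O(2^(m*n_t)): only practical for small systems.
    """
    n_bits = m * n_t
    # Likelihood metric exp(-||y - Hx||^2 / sigma^2) for every bit pattern.
    metric = {c: np.exp(-np.linalg.norm(y - H @ phi(c)) ** 2 / sigma2)
              for c in product((0, 1), repeat=n_bits)}
    llrs = np.empty(n_bits)
    for k in range(n_bits):
        num = sum(p for c, p in metric.items() if c[k] == 1)  # set X_k^1
        den = sum(p for c, p in metric.items() if c[k] == 0)  # set X_k^0
        llrs[k] = np.log(num / den)
    return llrs

# Toy check: n_t = 2, BPSK (m = 1), identity channel, noiseless receive.
phi = lambda c: 1.0 - 2.0 * np.array(c)   # bit 0 -> +1, bit 1 -> -1
y = phi((1, 0))                           # transmit c = (1, 0)
llrs = bit_llrs(y, np.eye(2), 0.1, phi, m=1, n_t=2)
# LLR sign recovers each bit: positive for c_k = 1, negative for c_k = 0.
```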

11.5.3 Space-time turbo codes


As with other forms of codes, turbo codes can also be used in the context of
space-time coding [292]. Turbo codes are forward-error-correction codes, origi-
nally introduced for SISO channels in Reference [19], that perform close to the
Shannon capacity. The general idea of a turbo code in a SISO channel is that the
block of information bits is encoded by using two separate encoders. The input
bits to one of the encoders are an interleaved version of the input bits to the
other encoder. At the receiver, iterative decoding is performed where the bits
corresponding to one of the encoders are decoded using soft decisions; that is,
the decoder specifies the probability that a given bit is a 1. The decoder then
uses these probabilities to decode the bits from the other encoder, yielding more
soft decisions, which are in turn used to decode the bits from the first encoder.
This process is iterated a number of times.
The transmit side of space-time turbo-coded systems is identical in principle
to Figure 11.7 with the binary encoder implementing a turbo code. Note that
the interleaver in the encoder of Figure 11.7 is not the same as the interleaver
used in turbo coding. In general, a different interleaver is used in the turbo
encoder/decoder. The receiver side may be described as in Figure 11.8, with
the turbo decoder taking the place of the binary decoder. Note that alternative
strategies for decoding turbo space-time codes that make use of iterations of
estimated prior probabilities for cj have also been proposed in works such as
[46]. The interested reader should consult a text on turbo coding for space-time
systems such as [136] for further details.

11.6 Direct modulation

Linear block codes can be used to perform effective space-time coding by direct
modulation of the encoder output [207]. Consider a system with nt transmit
antennas and a constellation of size M, that is to say, an M-ary symbol
is transmitted through each antenna. Since a constellation point can represent
m = log2 M bits of information, there are nt m bits of information that can be
represented using m-bit constellations on nt antennas. Hence, we can construct
a "space-time symbol" that can represent nt m bits of information. Now suppose
that the block code is defined over a finite field (Galois field) of order q. Each
symbol in a codeword can be represented by log2 q bits, and if

log2 q = nt m,   that is,   q = 2^{nt m},

we can directly map each codeword symbol into a space-time symbol.
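As an illustrative sketch (not code from Reference [207]), one GF(256) codeword symbol can be split into nt = 4 groups of m = 2 bits, each mapped to a QPSK point; the bit ordering and constellation labeling are arbitrary choices made here.

```python
import numpy as np

def gf_symbol_to_space_time(symbol, n_t=4, m=2):
    """Map one GF(2^(n_t*m)) codeword symbol, an integer in
    [0, 2^(n_t*m)), to n_t constellation points, one per antenna."""
    assert 0 <= symbol < 2 ** (n_t * m)
    qpsk = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)
    # Slice the n_t*m-bit integer into n_t groups of m bits each.
    bit_groups = [(symbol >> i) & ((1 << m) - 1)
                  for i in range(0, n_t * m, m)]
    return qpsk[np.array(bit_groups)]   # the "space-time symbol"

x = gf_symbol_to_space_time(0xB4)   # one GF(256) symbol -> 4 QPSK symbols
```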
Perhaps the best possible error-control codes for a direct-modulation coding
scheme are low-density parity-check (LDPC) codes, which are a class of capacity-achieving linear block codes. These codes are defined by their parity-check
matrices C, which have a certain structure and comprise mostly zero entries,
hence the term low-density. Using posterior-probability decoding algorithms,
these codes have been shown to approach the Shannon capacity to within fractions of
a decibel in signal-to-noise ratio. We refer the reader to standard texts on error-control coding such as Reference [191] for further details.
Low-density parity-check codes with direct modulation can perform close to
the outage capacity of MIMO links as illustrated in Figure 11.11 in which a
low-density parity-check code over GF (256) performs close to the ideal outage
capacity for 4×4 and 2×2 MIMO systems. The main drawback of this technique
is that it is computationally intensive as the receiver must perform decoding
over a large finite field. One notable cost is associated with the fact that simple
likelihood ratios are not sufficient for the iterative decoding process since each
codeword symbol can take one of q possible values.
Using these codes and Bayesian belief networks for decoding, Margetts et al., in
Reference [207], find that this technique consistently outperforms the space-time
trellis codes proposed by Chen et al. [57], with a computational complexity
of O(ns q log q), where ns is the number of symbols per decoding block.

11.7 Universal codes

El-Gamal and Damen [86] introduced a framework for creating full-rate, full-diversity space-time codes out of underlying algebraic SISO codes. The general
approach is to divide a space-time codeword into threads over which single-input
single-output codes are used. Each thread is associated with a SISO codeword
and defines the antenna and time-slot pairs over which symbols corresponding to the
codeword are transmitted.
Consider a space-time system with nt transmit antennas with a codeword
spanning ns time slots. A thread refers to a set of pairs of antenna and time slot
indices such that all antennas (numbered 1, 2, . . . nt ) and time slots (numbered
1, 2, . . . , ns ) appear, no two threads have identical antenna and time slot pairs,
and the nt antennas appear an equal number of times in each thread. These
requirements ensure that each thread is active during every time slot, each thread
uses all of the antennas equally, and at any given time slot, there is at most one
thread active for each antenna.
A simple example of a thread for a system with ns = nt, which we denote by
ℓ1, is

ℓ1 = {(1, 1), (2, 2), (3, 3), . . . , (nt, nt)},   (11.40)

where the kth antenna is used at the kth time slot. From [86], by offsetting the
antenna indices and incrementing them modulo nt, we arrive at the following
generalization of the above thread for 1 ≤ j ≤ L ≤ nt:

ℓj = {(mod(j − 1, nt) + 1, 1), (mod(j, nt) + 1, 2), . . . , (mod(j + ns − 2, nt) + 1, ns)}.

As an example, consider a system with nt = ns = L = 4. The threads given by
the expression above are shown in Figure 11.9.
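As a sketch, the thread construction can be generated programmatically. The closed-form antenna index used below, mod(j + t − 2, nt) + 1 for thread j at time slot t, is inferred from the example thread and Figure 11.9 rather than taken verbatim from [86].

```python
def threads(n_t, n_s, L):
    """Generate threads as (antenna, time-slot) pairs, 1-indexed.

    Thread j uses antenna mod(j + t - 2, n_t) + 1 at time slot t,
    so thread 1 is the main diagonal (1,1), (2,2), ..., (n_t, n_t).
    """
    return [[((j + t - 2) % n_t + 1, t) for t in range(1, n_s + 1)]
            for j in range(1, L + 1)]

for th in threads(4, 4, 4):
    print(th)
# No two threads share an (antenna, time-slot) pair, and each thread
# uses every antenna exactly once, matching Figure 11.9.
```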
In conjunction with this framework, Reference [86] introduces a new class of
space-time codes called threaded algebraic space-time (TAST) codes, for which
an example that achieves full diversity is provided. The space-time codewords
for this class of space-time codes are generated as follows. Suppose that there are
K information-bearing symbols (for example, bits) in a vector u. For simplicity,
let us assume that K/L is an integer. Partition the entries of u into L vectors uj
of length K/L, for j = 1, 2, . . . , L (note that this can be generalized to unequal uj).
A SISO code that achieves full diversity (called a component code) is used to
encode the vector uj into a length ns vector sj . The component codes could in
general be different for each thread. If the component codes are the same on
each thread, the threaded space-time code is called symmetric. Each codeword
sj is multiplied by a coefficient φj. Multiplication by these coefficients enables
the codewords to occupy independent algebraic subspaces. The coefficients φj
are defined as

φ1 = 1, φ2 = φ^{1/nt}, . . . , φL = φ^{(L−1)/nt}.
        TS 1   TS 2   TS 3   TS 4
Ant. 1    1      4      3      2
Ant. 2    2      1      4      3
Ant. 3    3      2      1      4
Ant. 4    4      3      2      1

Figure 11.9 Threads for universal space-time coding with four antennas, four time
slots, and four threads. The horizontal and vertical dimensions represent the time slots
and antennas, respectively. The numbers correspond to the threads.

A space-time formatter is used to assign the codeword φj sj to the jth thread,
which determines the sequence of antenna and time-slot pairs over which the
samples of the vector φj sj are transmitted.
The number φ is carefully chosen and depends on the structure of the com-
ponent codes that are used. Several examples of codes and corresponding φs are
presented in [86]. These φj lie in linearly independent algebraic subspaces and,
hence, φj sj and φk sk for k ≠ j lie in different algebraic subspaces and essentially
do not interfere with each other in the appropriate algebraic space.
As is shown in [86], one can find values of φ to achieve full diversity. Additionally, by choosing the number of threads L equal to the minimum of the number
of antennas at the transmitter or receiver, that is, L = min(nt, nr), it is shown in
Reference [86] that full rate can be achieved. Furthermore, with this choice of the
number of threads L, maximum-likelihood decoding can be performed using a
sphere decoder with polynomial complexity for moderate SNRs. We refer the
interested reader to the original source, that is, Reference [86], or a text specializing
in space-time coding such as Reference [157] for details.

D-BLAST
The Bell Labs layered space-time (BLAST) architecture is a family of space-time
transmission schemes that were developed at Bell Labs. These schemes can be
described by using the universal space-time coding framework discussed in the
previous section. Diagonal-BLAST (D-BLAST) [99] is the first of these.
The D-BLAST scheme uses a diagonal threading scheme, as depicted in Figure 11.10, with coefficients φ1 = φ2 = · · · = φL = 1, where L is the number
of threads. Each thread is at least nt symbols long and, hence, is transmitted
through all antennas. Since each thread includes transmissions over each of the
nt transmitter antennas and each symbol is received by nr receiver antennas,
with proper coding the full diversity of nt nr is achievable by using D-BLAST.

Figure 11.10 Antenna/time-slot assignments for Diagonal-BLAST. Three threads are
shown in this example. The shaded regions correspond to unused time/antenna
combinations.
D-BLAST requires computationally intensive decoding, as the receiver has to
perform joint maximum-likelihood decoding of all the streams. A simpler method of
decoding is to use vertically aligned layers in a method known as Vertical-BLAST
(V-BLAST) [353, 114]. Note that V-BLAST reduces to simply transmitting
independent data on each antenna, that is to say, the codewords are not transmitted
across multiple antennas.

11.8 Performance comparisons of space-time codes

For single-carrier systems, it is convenient to compare the performance of
different space-time codes in terms of their distance from the outage capacity. In
Figure 11.11, various space-time codes are compared to outage capacities assuming a 90% probability of closure. As an example, the 2 × 2 Alamouti space-time
code sits a little under 6 dB off the 2 × 2 outage capacity.1

11.9 Computations versus performance

Since maximum-likelihood decoding of space-time codes is computationally
prohibitive, it is common practice to use suboptimal space-time coding schemes,
1 These results are courtesy of Adam Margetts and Nicholas Chang.



Figure 11.11 Outage capacity for a 2 × 2 and 4 × 4 MIMO channel as a function of
average SNR per receive antenna (a² Po), assuming a 90% probability of link closure
(10% outage). The performance of various space-time codes is compared: Alamouti's
code, bit-interleaved coded modulation (BICM), a 64-state space-time trellis code,
and direct modulation Galois field low-density parity-check (LDPC) space-time codes
with 16 and 256 symbols. Courtesy of Adam Margetts and Nicholas Chang.

which are computationally efficient but may have suboptimal performance. A
useful method to compare the performance of suboptimal space-time coding
schemes is by using a metric called the excess SNR, which was introduced in [55].
The excess SNR is defined as the additional SNR required for a given suboptimal
coding scheme to achieve the same frame-error rate (FER) as the optimal coding scheme. Thus, the excess SNR for a given coding scheme characterizes the
amount of additional transmit power required for that coding scheme to achieve
a frame-error rate equal to that of the optimal coding scheme. In Figure 11.11, the
excess SNR corresponds to the difference in decibels between the specific coding
schemes (for example, 4 × 4 bit-interleaved coded modulation) and the related
system bound.
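The excess-SNR definition lends itself to a small numerical sketch. The interpolation routine and the synthetic FER curves below are our own illustration, not the procedure used in [55].

```python
import numpy as np

def excess_snr_db(snr_db, fer_scheme, fer_optimal, target_fer=0.1):
    """Excess SNR in dB: extra SNR a suboptimal scheme needs to reach
    the same frame-error rate as the optimal (bound) curve. FER curves
    are assumed monotonically decreasing in SNR; interpolation is done
    in log10(FER)."""
    def snr_at_target(fer):
        # np.interp needs increasing x, so reverse the decreasing curves.
        return np.interp(np.log10(target_fer),
                         np.log10(fer)[::-1], snr_db[::-1])
    return snr_at_target(fer_scheme) - snr_at_target(fer_optimal)

# Synthetic curves: the suboptimal scheme's FER curve is the optimal
# curve shifted right by exactly 2 dB, so the excess SNR is 2 dB.
snr = np.linspace(2.0, 12.0, 21)
fer_opt = 10.0 ** (-snr / 5.0)
fer_sub = 10.0 ** (-(snr - 2.0) / 5.0)
print(round(excess_snr_db(snr, fer_sub, fer_opt), 3))  # 2.0
```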
In Figure 11.12, the excess SNR versus computational complexity for various
space-time coding schemes is shown [55]. The vertical axis illustrates the gap
between the ideal SNR required to achieve 10% outage and the SNR required
to achieve the same outage probability for a particular coding scheme. The
horizontal axis indicates the number of floating-point operations required per
information bit.

Figure 11.12 Excess SNR versus computational complexity for space-time trellis codes,
bit-interleaved coded modulation, and direct modulation. The code used to generate
this figure is courtesy of Adam Margetts and Nicholas Chang.

Note that the direct LDPC modulation using GF(256) can achieve an excess
SNR of lower than 1 dB. This near-optimal performance comes at the significant
computational cost of approximately 8 × 10^4 floating-point operations per
information bit.

Problems

11.1 Using Monte Carlo simulations to generate channel matrices H, please
plot the empirical outage probability versus r for a system with two transmit
and two receive antennas, nr = nt = 2, utilizing Equation (11.11). You may use an
SNR of 10 dB.
11.2 Evaluate the diversity order of a SIMO system in a Rayleigh-faded additive
white Gaussian noise channel when the receiver uses the spatial matched-filter
receiver described in Section 9.2.1.
11.3 Consider a MIMO system with nt = nr = 2 antennas at the transmitter
and receiver. Assume that the Alamouti scheme is used to encode transmissions
and let hj k denote the channel coefficient between the jth transmitter antenna
and the kth receiver antenna. Additionally, let zjk be the sampled received signal
on the jth antenna at the kth time slot, and write the following:

ŝ1 = h∗11 z11 + h∗12 z12 + h∗21 z21 + h∗22 z22
ŝ2 = h∗12 z11 − h∗11 z12 + h∗22 z21 − h∗21 z22.

By comparing the equations above with that of a SIMO system with four receiver
antennas, show that a diversity order of 4 is achievable with the Alamouti scheme
and two receiver antennas.
11.4 Consider a space-time coding system with nt = 2 transmit antennas,
nr = 2 receive antennas, and coding performed over ns = 2 symbol times. Let
the codewords be as follows:

C1 = [ 1   1 ;  1   1 ],     C2 = [ 1   −j ;  j   j ],
C3 = [ 1   1+j ;  1   1−j ],     C4 = [ 1−j   −j ;  1−j   j ].   (11.41)
Using the determinant criteria, find the maximum diversity gain achievable using
this space-time code.
11.5 Using the constellation diagram in Figure 11.3 and the space-time trellis
code given in Figure 11.5, please list the transmitted symbols from each transmit
antenna due to the following sequence of bits: 10 11 11 01 10. You should start
at state zero.
11.6 Use the determinant criteria to compute the diversity gain of the Alamouti
code with nr receiver antennas.
11.7 Perform a Monte Carlo simulation of an Alamouti space-time coding
system with Quadrature Phase-Shift Keying (QPSK) symbols and single-antenna
receivers. Show that the diversity order is what you expect by plotting the
logarithm of the error probability at high SNR.
11.8 Show that the 4×4 space-time block code described by the real orthogonal
generator matrix in (11.28) has full diversity.
11.9 Explain why the requirement that all antennas are used for any given
thread in the universal space-time code framework described in Section 11.7
results in full diversity gain.
12 2 × 2 Network

12.1 Introduction

In this chapter, we analyze the performance of networks with two multiantenna
transmit nodes and two multiantenna receive nodes. The canonical 2 × 2 network
is illustrated in Figure 12.1. Transmitter 1, equipped with nt1 antennas, wishes
to communicate with receiver 1, which has nr1 antennas, and transmitter 2,
equipped with nt2 antennas, wishes to communicate with receiver 2, which has
nr2 antennas. The signal from transmitter 1 acts as interference to receiver 2
and vice versa.
Even for this simple network, fundamental capacity results are still unknown.
For instance, the capacity region of the 2 × 2 network even in the SISO case
under general assumptions is unknown. For certain special cases, it is possible to
derive the capacity of such channels, in particular when the interfering signals
are strong such as in References [52, 272] and [282]. Most works in the literature
have focused on deriving outer bounds to the capacity region such as References
[177, 224, 10], and [89] for SISO systems, and References [243] and [281] for
MIMO systems. Achievable rates of such networks under different sets of
assumptions, such as in References [135, 271, 59] and [283], have also been found.
Additionally, in Reference [89], the capacity region of the SISO Gaussian inter-
ference channel is derived to within one bit/second/Hz using an achievable rate
region based on the Han–Kobayashi scheme introduced in Reference [135], and
on novel outer bounds. Recently, the interference channel has been analyzed in
the high-SNR regime, where interference alignment, introduced in Reference [50],
has been shown to provide enormous network-wide performance improvements.
The 2 × 2 interference channel can also be analyzed in the context of a cognitive
radio link, whereby one of the links, designated as the cognitive link, needs to
operate without disrupting an existing legacy link.
The 2 × 2 network is useful in the context of larger networks as well. Since the
overhead associated with cooperation, particularly for multiantenna networks,
can be quite significant, real-world implementations of multiantenna networks
will likely have cooperation between only a limited number of nodes. Additionally, it is worth noting that the simpler 2 × 1 and 1 × 2 Gaussian channels
are described in the context of broadcast and multiple-access channels in
Chapter 13.
Figure 12.1 2 × 2 MIMO channel. Solid arrows indicate signal paths and dashed
arrows indicate interference paths.

The most common approach to analyzing the capacity of communication
systems is to first find an upper bound to the capacity and show that the upper
bound is achievable. As of this writing, achievable upper bounds to the capacity
region of the interference channel for a general set of parameters are not known,
but upper bounds that are achievable to within one bit are known through the
findings of Etkin et al. in Reference [89].
A thorough treatment of this subject requires detailed information-theoretic
arguments that are beyond the scope of this text. However, we summarize some
general techniques used for finding the upper bound to the capacity region of
the 2 × 2 interference network in this section and leave the motivated reader to
consult a text on information theory such as Reference [68] for the details.
We start by discussing the achievable rates of the 2×2 MIMO network followed
by discussing upper bounds to the capacity region. We then analyze the 2 × 2
network in the cognitive-radio context whereby one of the links is assumed to be
a legacy link and the other, a cognitive link which is not allowed to disrupt the
legacy link.

12.2 Achievable rates of the 2 × 2 MIMO network

12.2.1 Single-antenna Gaussian interference channel


The general capacity region of the 2 × 2 network with single-antenna nodes has
been an open problem for a long time in the field of information theory [68].
When the noise at each receiver is Gaussian, the 2 × 2 network is known as the
Gaussian interference channel. The sampled received signal at receivers 1 and 2
of a narrowband, flat-fading, Gaussian interference channel can be respectively
represented as follows:

z1 = h11 s1 + h21 s2 + n1
z2 = h22 s2 + h12 s1 + n2 ,

where hjk is the channel between transmitter j and receiver k, nj are CN(0, σ²)
random variables representing noise, and sj is the transmitted sample of
transmitter j. The Gaussian interference channel assumes that the data transmitted
by the jth transmitter are intended for the jth receiver only. Let the
communication rate between transmitter j and receiver j be represented by Rj. The
capacity region of this network is the set of rate pairs (R1, R2) for which
communication at arbitrarily low probability of error is possible, subject to the power
constraints1 ⟨|sj|²⟩ = Pj and Pj ≤ P, for j = 1, 2.

Han–Kobayashi scheme
The Han–Kobayashi scheme is known to achieve rates within 1 b/s/Hz of the
capacity region of the Gaussian interference channel as shown in Reference [89].
The basic idea behind this scheme is that each transmitter partitions its data
into two separate streams, a common or public stream that is intended to be
decoded by both receivers, and a private stream that is intended to be decoded
by just the target receiver. By dividing the transmit power (and hence data
rates) appropriately between common and private streams, partial interference
cancellation can be performed by each receiver.
Suppose that the powers allocated by the jth transmitter to its private and
common streams are Ppj and Pcj , and the rates of the private and common
streams of link-j are Rpj and Rcj , respectively. Additionally, suppose that the jth
private and common symbols are spj and scj and the jth transmitter transmits
spj + scj . The sampled signals at receivers 1 and 2 are thus

z1 = h11 sc1 + h11 sp1 + h21 sc2 + h21 sp2 + n1 (12.1)


z2 = h22 sc2 + h22 sp2 + h12 sc1 + h12 sp1 + n2 . (12.2)

The jth receiver decodes the common streams from both transmitters and sub-
tracts the contribution of the common streams before decoding the private
stream from the jth transmitter, treating the private stream from transmitter
k ≠ j as noise. Thus, the data rates on the private streams must satisfy

Rp1 < log2( 1 + Pp1 |h11|² / (Pp2 |h21|² + σ²) )   (12.3)
Rp2 < log2( 1 + Pp2 |h22|² / (Pp1 |h12|² + σ²) ).   (12.4)
1 For expedience, we break with our own convention and use | · | to represent the absolute
value in this chapter.
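To make inequalities (12.3) and (12.4) concrete, the private-rate bounds can be evaluated numerically. The sketch below is illustrative only; the function name and the sample channel gains are hypothetical.

```python
import numpy as np

def private_rate_bounds(h, p_private, sigma2):
    """Right-hand sides of (12.3) and (12.4): each receiver treats the
    other link's private stream as noise. h[j][k] is the channel gain
    from transmitter j+1 to receiver k+1."""
    r1 = np.log2(1.0 + p_private[0] * abs(h[0][0]) ** 2
                 / (p_private[1] * abs(h[1][0]) ** 2 + sigma2))
    r2 = np.log2(1.0 + p_private[1] * abs(h[1][1]) ** 2
                 / (p_private[0] * abs(h[0][1]) ** 2 + sigma2))
    return r1, r2

# Symmetric example: unit direct gains, weak cross gains, equal powers.
r1, r2 = private_rate_bounds([[1.0, 0.3], [0.3, 1.0]], [1.0, 1.0], 0.1)
```

By symmetry of this example, both private streams see the same bound.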
Figure 12.2 Rate region of common streams for Han–Kobayashi system, case 1.

The common streams are to be decoded by both receivers. For a given receiver,
we can model the common streams as a channel between the two transmitters
and the given receiver, with the private streams treated as noise. Hence we can
model this portion of the system as a multiple-access channel, for which the
capacity region is known and discussed in more detail in Section 13.2. Thus,
the common rates Rc1 , Rc2 must fall into the intersection of two multiple-access
channel capacity regions, each of which is a pentagon. The intersection of the
two pentagons can take several different forms depending on the parameters of
the system.
The possible intersections (excluding cases where one capacity region is a
subset of the other, and cases that can be constructed by reversing the roles
of the transmit–receive pairs) are illustrated in Figures 12.2 to 12.4. For the
common messages to be decoded with arbitrary low probability of error by
both receivers, the set of common rates Rc1 , Rc2 must belong to the intersec-
tion of the two pentagons illustrated using the solid and bold lines. The three
figures represent the different ways that the two multiple-access channels can
intersect.
The achievable rate region using the Han–Kobayashi scheme is the union over
all valid power allocations of all rate pairs (R1 = Rp1 + Rc1 , R2 = Rp2 + Rc2 ) for
which (Rc1 , Rc2 ) fall into one of the rate regions above, with the private rates
satisfying inequalities (12.3) and (12.4).
Figure 12.3 Rate region of common streams for Han–Kobayashi system, case 2.

Figure 12.4 Rate region of common streams for Han–Kobayashi system, case 3.

Note that in the example given above, we have illustrated a successive decoding
scheme where the common and private messages are decoded in a particular
sequence. In general, however, better performance (that is, larger achievable rates)
could be obtained by joint decoding of the common and private streams. We
refer the reader to references such as Reference [59], which provides a relatively
compact description of the Han–Kobayashi achievable rate region when joint
decoding is used.

12.2.2 Achievable rates of the MIMO interference channel


While Han–Kobayashi-type achievable rate regions can be constructed for the
MIMO interference channel, their construction is significantly more complicated
than in the SISO case, as described in Reference [283], on which the development
in this subsection is based. The added complication arises from the fact that for
MIMO channels, all the transmit covariance matrices have to be jointly
optimized, compared to the SISO interference channel where only the joint power
allocations have to be optimized.
Consider the following two equations, which describe the MIMO Gaussian
interference channel:

z1 = H11 s1 + H21 s2 + n1 (12.5)


z2 = H22 s2 + H12 s1 + n2 . (12.6)

The vectors z1 ∈ C^{nr1×1} and z2 ∈ C^{nr2×1} are the sampled received signals at
the antennas of receivers 1 and 2, respectively. The signals transmitted by
transmitters 1 and 2 are s1 ∈ C^{nt1×1} and s2 ∈ C^{nt2×1}, respectively, and the vectors
n1 ∈ C^{nr1×1} and n2 ∈ C^{nr2×1} represent i.i.d. circularly symmetric complex
Gaussian noise at the antennas of each receiver.
Suppose that each transmitter partitions its transmit data streams into two
independent streams, a common stream to be decoded at both receivers and a
private stream to be decoded at the target receiver only. Let sc1 and sc2 be the
transmitted signals at a given sample time that encode the common data from
transmitters 1 and 2. Let sp1 and sp2 be the transmitted signals at a given sample
time that encode the private data from transmitters 1 and 2 respectively. For the
SISO case, we defined power allocations corresponding to these four streams. In
the MIMO case, however, power allocations alone will not suffice, as the
covariance matrices of the transmitted signals influence the spatial structure
of the signals and interference. Hence, we need to define covariance matrices
associated with the signals encoding the private and common data streams for
each transmitter. Let the following respectively denote the covariance matrices
associated with the common streams of transmitters 1 and 2 and the private
streams of transmitters 1 and 2:
K1c = ⟨s1c s†1c⟩
K2c = ⟨s2c s†2c⟩
K1p = ⟨s1p s†1p⟩
K2p = ⟨s2p s†2p⟩ .
Recall that R1c and R2c are the rates associated with the common data streams
from transmitters 1 and 2 respectively. Similarly, recall that R1p and R2p are the
rates associated with the private streams of transmitters 1 and 2 respectively.
The various rates and covariance matrices need to satisfy certain requirements
so that the common data streams are decodable at both receivers for a given
decoding order. For all choices of decoding order, the private rates need to satisfy

R1p < log |I + H11 K1p H†11 (σ²I + H21 K2p H†21)^{-1}|

R2p < log |I + H22 K2p H†22 (σ²I + H12 K1p H†12)^{-1}| .

In the previous two expressions, observe that in decoding the private streams,
each receiver only sees interference from the private stream corresponding to the
other transmitter as the common streams have all been decoded and subtracted
out by the time the private messages are decoded.
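Log-determinant expressions of this form are straightforward to evaluate numerically. The sketch below is a minimal numpy illustration with hypothetical channel matrices and covariances (none of the values come from the text); it computes the right-hand side of the R1p bound by whitening against the residual private-stream interference.

```python
import numpy as np

def mimo_rate(Hs, Ks, Hi, Ki, sigma2):
    """log2 det(I + Hs Ks Hs^dag (sigma^2 I + Hi Ki Hi^dag)^{-1}):
    rate with one interferer whose covariance is treated as noise."""
    nr = Hs.shape[0]
    noise = sigma2 * np.eye(nr, dtype=complex) + Hi @ Ki @ Hi.conj().T
    _, ld = np.linalg.slogdet(np.eye(nr) + Hs @ Ks @ Hs.conj().T @ np.linalg.inv(noise))
    return ld / np.log(2)

# Hypothetical 2x2 channels and white private-stream covariances.
rng = np.random.default_rng(0)
ch = lambda: rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
H11, H21 = ch(), ch()
K1p, K2p = 0.5 * np.eye(2), 0.5 * np.eye(2)
R1p_bound = mimo_rate(H11, K1p, H21, K2p, sigma2=1.0)
```

Removing the interference term (setting K2p to zero) can only increase the computed rate, which makes a convenient sanity check.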
We can write different sets of inequalities corresponding to the different de-
coding orders of the common streams. For instance, suppose that receiver 1
decodes its common stream before decoding the common stream from trans-
mitter 2, and, likewise, receiver 2 decodes its common stream before decoding
the common stream from transmitter 1. Then the rates and covariance matrices
must satisfy the following requirements. For receiver 1 to be able to decode the
common stream from transmitter 1, we need

R1c < log |I + H11 K1c H†11 (σ²I + H11 K1p H†11 + H21 (K2p + K2c) H†21)^{-1}| . (12.7)

Observe that the matrix that is inverted in the previous expression contains
contributions from the noise power, the private stream from transmitter 1, and
the private and common streams from transmitter 2. For receiver 2 to be able
to decode the common stream from transmitter 1, we need

R1c < log |I + H12 K1c H†12 (σ²I + H12 K1p H†12 + H22 K2p H†22)^{-1}| . (12.8)

Observe that the matrix that is inverted in the previous expression contains con-
tributions from the noise power, the private stream from transmitter 1, and the
private stream from transmitter 2. Note that the common stream from trans-
mitter 2 does not contribute to the above expression as it is assumed to have
been decoded before the receiver decodes the common stream from transmit-
ter 1. Likewise, for receiver 1 to be able to decode the common stream from
transmitter 2, we require that

R2c < log |I + H21 K2c H†21 (σ²I + H11 K1p H†11 + H21 K2p H†21)^{-1}| , (12.9)
and for receiver 2 to be able to decode the common stream from transmitter 2,
we need

R2c < log |I + H22 K2c H†22 (σ²I + H22 K2p H†22 + H12 (K1p + K1c) H†12)^{-1}| . (12.10)
Note that inequalities (12.7) to (12.10) refer to the specific case of the receivers
decoding their respective common streams first followed by the other common
stream. One may write corresponding equations for other decoding orders.
Thus, one can construct an achievable rate region of the MIMO interference
channel as the convex hull of the rate pairs R1 = R1c + R1p and R2 = R2c + R2p .
The convex hull is taken over all possible decoding orders of the common streams
and over all possible covariance matrices K1c , K2c , K1p , and K2p that satisfy the
requirements of their respective decoding orders. Furthermore, the covariance
matrices must respect the following power constraints,
trace (K1c + K1p) ≤ P1
trace (K2c + K2p) ≤ P2 ,
where P1 and P2 are the power constraints on transmitters 1 and 2 respectively.
Since the achievable rate region depends on the covariance matrices in a
complicated way, it is difficult to visualize the region described above.
Most works in the literature that deal with the capacity region of MIMO
interference channels consider either the sum capacity (for example, Reference
[283]) or specific regimes of operation. For instance, in Reference [282], the ca-
pacity is found for the case that the interference is strong enough that it can be
decoded perfectly and then subtracted out. Note that as in the SISO case, joint
decoding of the common and private streams by the receivers can improve the
achievable rates compared to the sequential decoding described.
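The construction above can be explored numerically by sampling random covariance splits and evaluating the rate constraints for one decoding order. The sketch below uses hypothetical channels and a naive random sampler (not an optimizer): it transcribes the constraints of Equations (12.7)–(12.10) for the own-common-first decoding order, together with the private-rate expressions, and collects achievable rate pairs. A true achievable region would require optimizing the covariances and taking the convex hull over all decoding orders.

```python
import numpy as np

def rate(Hs, Ks, sigma2, interferers):
    """log2 det(I + Hs Ks Hs^dag (sigma^2 I + sum_j Hj Kj Hj^dag)^{-1})."""
    nr = Hs.shape[0]
    cov = sigma2 * np.eye(nr, dtype=complex)
    for H, K in interferers:
        cov = cov + H @ K @ H.conj().T
    _, ld = np.linalg.slogdet(np.eye(nr) + Hs @ Ks @ Hs.conj().T @ np.linalg.inv(cov))
    return ld / np.log(2)

def random_split(P, n, rng):
    """Hypothetical sampler: random PSD pair scaled so trace(Kc + Kp) = P."""
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Kc, Kp = A @ A.conj().T, B @ B.conj().T
    scale = P / np.trace(Kc + Kp).real
    return scale * Kc, scale * Kp

rng = np.random.default_rng(1)
ch = lambda: rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
H11, H12, H21, H22 = ch(), ch(), ch(), ch()

pairs = []
for _ in range(200):
    K1c, K1p = random_split(1.0, 2, rng)
    K2c, K2p = random_split(1.0, 2, rng)
    # Own-common-first decoding order: constraints in the style of (12.7)-(12.10).
    R1c = min(rate(H11, K1c, 1.0, [(H11, K1p), (H21, K2p + K2c)]),
              rate(H12, K1c, 1.0, [(H12, K1p), (H22, K2p)]))
    R2c = min(rate(H22, K2c, 1.0, [(H22, K2p), (H12, K1p + K1c)]),
              rate(H21, K2c, 1.0, [(H11, K1p), (H21, K2p)]))
    # Private rates once the common streams have been removed.
    R1p = rate(H11, K1p, 1.0, [(H21, K2p)])
    R2p = rate(H22, K2p, 1.0, [(H12, K1p)])
    pairs.append((R1c + R1p, R2c + R2p))

best_sum = max(r1 + r2 for r1, r2 in pairs)
```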

12.3 Outer bounds of the capacity region of the Gaussian


MIMO interference channel

The achievable rate regions described in the previous section can be combined
with appropriate outer bounds in order to characterize the capacity region of the
2 × 2 network. Outer bounds to the capacity region are discussed in this section,
and the discussions are based on the pioneering work of Etkin et al. [89], who
originally derived these bounds and showed that the Han–Kobayashi scheme
described in the previous section is within one bit of the outer bounds.

12.3.1 Outer bounds to the capacity region of the single-antenna


Gaussian interference channel
The general techniques used to derive outer bounds to the capacity region
are to treat the channel as a combination of broadcast and/or multiple-access
channels, and/or to use genie-aided methods. Genie-aided methods refer to a


class of methods in which one or more users is given information that it would
not normally have access to. For instance, for the interference channel described
by Equations (12.1) and (12.2), a possible genie-aided system would be one where
receiver 1 knows s2 , information that could be provided to receiver 1 by a genie.
We refer the motivated reader to the original source, Reference [89], for details.
An outer bound to the capacity region of the Gaussian interference channel
can be described by a set of inequalities. For each of the inequalities, we shall
summarize the general techniques used to find these bounds.

Strong interference bounds


The first set of bounds that can be written are based on interference-free
communication. These bounds are achievable if h12 and h21 are zero, in which case,
there is no interference. Another scenario is in the so-called strong interference
regime where the received interference power is greater than the received signal
power, that is,

P2 |h21 |2 > P1 |h11 |2 (12.11)

and

P1 |h12 |2 > P2 |h22 |2 , (12.12)

where the transmit powers of transmitters 1 and 2 are P1 and P2 respectively.


When Equations (12.11) and (12.12) hold, the capacity region is the following:

R1 ≤ log2 (1 + P1 |h11|²/σ²) (12.13)

R2 ≤ log2 (1 + P2 |h22|²/σ²) , (12.14)

which is the set of rates achievable as if the interfering paths did not exist,
that is, h12 = h21 = 0. In the high-interference case, the capacity region is the
intersection of two multiple-access channel capacity regions, each corresponding
to the multiple-access channel formed by the two transmitters and one of the
receivers.
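A quick numerical check of the strong-interference condition and of the corresponding interference-free rates can be written as follows (a sketch; the function and variable names are illustrative, not from the text):

```python
import numpy as np

def strong_interference(P1, P2, h11, h12, h21, h22):
    """True when both received interference powers exceed the received
    signal powers, i.e. Equations (12.11) and (12.12) both hold."""
    return (P2 * abs(h21)**2 > P1 * abs(h11)**2
            and P1 * abs(h12)**2 > P2 * abs(h22)**2)

def single_user_rates(P1, P2, h11, h22, sigma2):
    """Interference-free rate pair, Equations (12.13) and (12.14)."""
    R1 = np.log2(1 + P1 * abs(h11)**2 / sigma2)
    R2 = np.log2(1 + P2 * abs(h22)**2 / sigma2)
    return R1, R2
```

In the strong-interference regime, these single-user rates describe the corner of the capacity region, which is the intersection of the two multiple-access regions discussed above.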

One-sided interference channel bounds


A second pair of bounds can be written by using two genie-aided channels. In
the first, a genie provides s2 to receiver 1, and in the second a genie provides
s1 to receiver 2. Since the latter is simply the former case with roles reversed
it is sufficient to analyze just one of the cases. A genie-aided system in which
receiver 1 knows the transmitted signal of transmitter 2, s2 (presumably revealed
by a genie), is equivalent to a one-sided interference channel, which is depicted
in Figure 12.5, since receiver 1 can subtract out the signal from transmitter 2.
Figure 12.5 One-sided interference channel.

Since the case P2 |h21|² > P1 |h11|² is already covered by the strong-interference
conditions in Equations (12.11) and (12.12), it is sufficient to consider the case where

P2 |h21|² < P1 |h11|² .

The sum capacity of the one-sided interference channel when P2 |h21 |2 < P1 |h11 |2
has been found by Sason in Reference [271], for the general one-sided interfer-
ence channel (that is, with the noise not necessarily Gaussian). For the case of
Gaussian noise, the sum capacity is bounded from above by the following:

log2 (1 + P1 |h11|²/σ²) + log2 (1 + P2 |h22|²/(σ² + P1 |h12|²)) .
Thus, one can write the following two bounds on the sum rate R1 + R2 of the
2 × 2 Gaussian interference channel,
   
P1 |h11 |2 P2 |h22 |2
R1 + R2 ≤ log2 1 + + log2 1 + 2
σ2 σ + P1 |h12 |2

   
P2 |h22 |2 P1 |h22 |2
R1 + R2 ≤ log2 1 + + log 1 + .
σ2 2
σ 2 + P2 |h21 |2
Note that the second bound is simply the first with the roles of the transmitter
and receiver reversed.
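The two one-sided sum-rate bounds can be packaged into a small helper; the following is a direct transcription of the two expressions above (not an optimized routine), returning the tighter of the two.

```python
import numpy as np

def one_sided_sum_bound(P1, P2, h11, h12, h21, h22, sigma2=1.0):
    """Minimum of the two genie-aided (one-sided channel) sum-rate bounds."""
    b1 = (np.log2(1 + P1 * abs(h11)**2 / sigma2)
          + np.log2(1 + P2 * abs(h22)**2 / (sigma2 + P1 * abs(h12)**2)))
    # Same bound with the roles of the two links reversed.
    b2 = (np.log2(1 + P2 * abs(h22)**2 / sigma2)
          + np.log2(1 + P1 * abs(h11)**2 / (sigma2 + P2 * abs(h21)**2)))
    return min(b1, b2)
```

With both cross gains set to zero the helper reduces to the interference-free sum rate, as expected.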

Noisy-interference bounds
A third type of bound can be found using a different genie-aided channel in which
the genie reveals to a particular receiver a noisy version of the interference that
its own link causes at the other receiver.

Figure 12.6 Genie-aided interference channel with interference plus noise revealed to
cross receivers.

This bound can be explicitly described by
defining the following variables, which represent the interference plus noise seen
at the opposing receiver:

v1 = h12 s1 + n2
v2 = h21 s2 + n1 .

The genie reveals the noisy interference at receiver 2, v1 , to receiver 1 and the
noisy interference at receiver 1, v2 , to receiver 2. Note that v1 is the interference
plus noise seen at receiver 2 and v2 is the interference plus noise seen at receiver
1. This channel, where the broken lines represent information provided by the
genie, is illustrated in Figure 12.6.
This type of genie-aided network is different from the traditionally used genie-
aided network in that the information provided by the genie cannot be used
by any one node to perfectly cancel out interference. This technique provides a
useful bound to the sum capacity in certain regimes. Using detailed information-
theoretic techniques, it can be shown that an upper bound on the sum rate can
be written as
 
P2 |h21 |2 P1 |h11 |2
R1 + R2 ≤ log2 1 + +
σ2 σ 2 + P1 |h12 |2
 
P1 |h12 |2 P2 |h22 |2
+ log2 1 + + .
σ2 σ 2 + P2 |h21 |2

Again, we refer the reader to Reference [89] for details of the derivation.
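The noisy-interference sum-rate bound is likewise a direct transcription; the sketch below evaluates the two log terms above for given powers and channel gains (illustrative function name, not from the text).

```python
import numpy as np

def noisy_genie_sum_bound(P1, P2, h11, h12, h21, h22, sigma2=1.0):
    """Sum-rate upper bound from the noisy-interference genie argument."""
    t1 = np.log2(1 + P2 * abs(h21)**2 / sigma2
                 + P1 * abs(h11)**2 / (sigma2 + P1 * abs(h12)**2))
    t2 = np.log2(1 + P1 * abs(h12)**2 / sigma2
                 + P2 * abs(h22)**2 / (sigma2 + P2 * abs(h21)**2))
    return t1 + t2
```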

Noisy-interference bounds with additional receiver


A fourth type of bound can be found by using a similar genie-aided system as
before, but introducing a second receiver for link 1 that does not have the aid of
the genie. This channel can be described by Figure 12.7.
Figure 12.7 Genie-aided interference channel with interference plus noise revealed to
cross receiver and an additional receiver without aid of the genie.

For this type of network, it can be shown that the sum rate, including the rate
achieved at the additional receiver, satisfies

2R1 + R2 ≤ log2 (1 + P1 |h11|²/σ² + P2 |h21|²/σ²) + log2 ((σ² + P1 |h11|²)/(σ² + P1 |h12|²))
         + log2 (1 + P1 |h12|²/σ² + P2 |h22|²/(σ² + P2 |h21|²)) . (12.15)

A similar bound can be written if an additional receiver 2 were introduced instead
of an additional receiver 1, as follows:

R1 + 2R2 ≤ log2 (1 + P2 |h22|²/σ² + P1 |h12|²/σ²) + log2 ((σ² + P2 |h22|²)/(σ² + P2 |h21|²))
         + log2 (1 + P2 |h21|²/σ² + P1 |h11|²/(σ² + P1 |h12|²)) . (12.16)

Thus, we can say that the capacity region of the two-user Gaussian interference
channel when both interference channels are weaker than the corresponding di-
rect channels, that is,

P2 |h21 |2 < P2 |h22 |2


P1 |h12 |2 < P1 |h11 |2 ,

must simultaneously satisfy all of the following inequalities:

R1 ≤ log2 (1 + P1 |h11|²/σ²) (12.17)

R2 ≤ log2 (1 + P2 |h22|²/σ²) (12.18)

R1 + R2 ≤ log2 (1 + P1 |h11|²/σ²) + log2 (1 + P2 |h22|²/(σ² + P1 |h12|²)) (12.19)

R1 + R2 ≤ log2 (1 + P2 |h22|²/σ²) + log2 (1 + P1 |h11|²/(σ² + P2 |h21|²)) (12.20)

R1 + R2 ≤ log2 (1 + P2 |h21|²/σ² + P1 |h11|²/(σ² + P1 |h12|²)) (12.21)
        + log2 (1 + P1 |h12|²/σ² + P2 |h22|²/(σ² + P2 |h21|²)) (12.22)

2R1 + R2 ≤ log2 (1 + P1 |h11|²/σ² + P2 |h21|²/σ²) + log2 ((σ² + P1 |h11|²)/(σ² + P1 |h12|²))
         + log2 (1 + P1 |h12|²/σ² + P2 |h22|²/(σ² + P2 |h21|²)) (12.23)

R1 + 2R2 ≤ log2 (1 + P2 |h22|²/σ² + P1 |h12|²/σ²) + log2 ((σ² + P2 |h22|²)/(σ² + P2 |h21|²))
         + log2 (1 + P2 |h21|²/σ² + P1 |h11|²/(σ² + P1 |h12|²)) . (12.24)
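This family of inequalities is easy to transcribe into a checker for a candidate rate pair; the sketch below is a direct transcription of the weak-interference outer bounds (12.17)–(12.24) with illustrative shorthand g for received powers, not an optimized implementation.

```python
import numpy as np

def weak_case_outer_bounds(R1, R2, P1, P2, h11, h12, h21, h22, sigma2=1.0):
    """True if (R1, R2) satisfies all of (12.17)-(12.24)."""
    g11 = P1 * abs(h11)**2   # signal power at receiver 1
    g12 = P1 * abs(h12)**2   # interference from transmitter 1 at receiver 2
    g21 = P2 * abs(h21)**2   # interference from transmitter 2 at receiver 1
    g22 = P2 * abs(h22)**2   # signal power at receiver 2
    s, L = sigma2, np.log2
    checks = [
        R1 <= L(1 + g11 / s),
        R2 <= L(1 + g22 / s),
        R1 + R2 <= L(1 + g11 / s) + L(1 + g22 / (s + g12)),
        R1 + R2 <= L(1 + g22 / s) + L(1 + g11 / (s + g21)),
        R1 + R2 <= L(1 + g21 / s + g11 / (s + g12))
                   + L(1 + g12 / s + g22 / (s + g21)),
        2 * R1 + R2 <= L(1 + g11 / s + g21 / s) + L((s + g11) / (s + g12))
                       + L(1 + g12 / s + g22 / (s + g21)),
        R1 + 2 * R2 <= L(1 + g22 / s + g12 / s) + L((s + g22) / (s + g21))
                       + L(1 + g21 / s + g11 / (s + g12)),
    ]
    return all(checks)
```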
For the case where one of the interference channels is weaker than the direct
channel but the other is not, that is,

P2 |h21 |2 ≥ P2 |h22 |2
P1 |h12 |2 < P1 |h11 |2 ,

we can write the following bounds that all rate pairs must satisfy:

R1 ≤ log2 (1 + P1 |h11|²/σ²) (12.25)

R2 ≤ log2 (1 + P2 |h22|²/σ²) (12.26)

R1 + R2 ≤ log2 (1 + P1 |h11|²/σ²) + log2 (1 + P2 |h22|²/(σ² + P1 |h12|²)) (12.27)

R1 + R2 ≤ log2 (1 + P2 |h21|²/σ² + P1 |h11|²/σ²) (12.28)

R1 + 2R2 ≤ log2 (1 + P2 |h22|²/σ² + P1 |h12|²/σ²) + log2 (1 + P2 |h22|²/(σ² + P2 |h21|²))
         + log2 (1 + P2 |h21|²/σ² + P1 |h11|²/(σ² + P1 |h12|²)) . (12.29)

Figure 12.8 Genie-aided mixed interference channel with interference plus noise
revealed to the cross receivers, interfering signal revealed at one receiver, and an
additional receiver without aid of the genie.

Except for the last inequality, the remaining expressions are either equivalent to
the weak interference channel, or can be found from the results of the weak inter-
ference channel. For instance, the first two inequalities are based on interference-
free communication and hold in all cases including the strong, weak, and mixed
interference channels.
The last inequality can be found by a genie-aided system with an additional
antenna (acting as an additional user) at node 2, as depicted in Figure 12.8. The
genie reveals the interference plus noise seen at receiver 2A to receiver 1 and the
interference plus noise seen at receiver 1 to receiver 2A. Additionally, the genie
reveals the interfering signal s1 to receiver 2A. Receiver 2B is not aided by the
genie and receives the signal s2 . The last bound can be found using arguments
detailed in Reference [89].

12.3.2 Outer bounds to the capacity region of the Gaussian interference


channel with multiple antennas
The general ideas used in bounding the capacity region of the single-antenna
Gaussian interference channel can be extended to multiantenna systems as done
in Reference [243]. In this section we summarize the outer bounds provided in
that work. Note that very recently, these bounds have been improved on in
Reference [170]. Assuming that the signals received at the antennas of users 1
and 2 are z1 ∈ C^{nr1×1} and z2 ∈ C^{nr2×1} respectively, we write the following:

z1 = √ρ1 H11 s1 + √γ1 H21 s2 + n1
z2 = √ρ2 H22 s2 + √γ2 H12 s1 + n2 .
We shall assume that n1 ∈ C^{nr1×1} and n2 ∈ C^{nr2×1} comprise i.i.d., circularly
symmetric, complex, Gaussian random variables with unit variance, and the
channel matrix between the jth transmit node and the kth receive node is given
by Hjk ∈ C^{nrk×ntj}.
Let the achievable rates on link 1 and link 2 be denoted by R1 and R2 , that
is, links 1 and 2 can simultaneously operate at rates R1 and R2 respectively
with arbitrarily low probability of error. We can write the following two bounds
on the rates of links 1 and 2, which are based on single-user communication
(that is, no interference):

R1 ≤ log2 |I + ρ1 H11 H†11|

R2 ≤ log2 |I + ρ2 H22 H†22| .

We can further write a bound on the sum capacity as follows,

R1 + R2 ≤ log2 |K1 | + log2 |K2 | , (12.30)

where the matrices K1 ∈ C^{nr1×nr1} and K2 ∈ C^{nr2×nr2} are defined as follows:

K1 = I + γ1 R21 + ρ1 H11 (T1^{-1} + γ2 H†12 H12)^{-1} H†11 , (12.31)

K2 = I + γ2 R12 + ρ2 H22 (T2^{-1} + γ1 H†21 H21)^{-1} H†22 , (12.32)

and

R11 = H11 T1 H†11 , (12.33)
R22 = H22 T2 H†22 , (12.34)
R12 = H12 T1 H†12 , (12.35)
R21 = H21 T2 H†21 . (12.36)
Note here that Tj = ⟨sj s†j⟩ ∈ C^{ntj×ntj} is the transmit covariance matrix of
the jth transmitter. Hence the matrix Rjk ∈ C^{nrk×nrk} is the covariance matrix
of signals received at the nrk antennas of the kth receiver which are due to the
transmission of the jth transmitter. We assume here that the transmit covariance
matrices Tj have been optimized.
This bound can be proved by using a genie-aided network of the form depicted
in Figure 12.6, as shown in Reference [243]. The genie provides the following
signals to receivers 1 and 2 respectively,

v2 = √γ2 H12 s1 + n2 (12.37)

v1 = √γ1 H21 s2 + n1 . (12.38)

That is to say, the genie provides the jth receiver with the interference caused by
the jth transmitter at the kth receiver, for j ≠ k. We can write two more bounds
that are simply the sum capacities of the multiple-access channels obtained by
removing receivers 1 and 2 respectively, as follows:

R1 + R2 < log2 |I + ρ1 R11 + γ1 R21| (12.39)

R1 + R2 < log2 |I + ρ2 R22 + γ2 R12| . (12.40)

The bound in Equation (12.39) applies when receiver 1 is able to decode the
messages intended for both receivers. Writing the singular-value decomposition
of the channel matrix between the jth transmitter and kth receiver as follows,

Hjk = Ujk Σjk V†jk , (12.41)

the bound in Equation (12.39) applies when


γ2 V11 Σ11^{-2} V†11 − ρ1 V12 Σ12^{-2} V†12 ≥ 0 . (12.42)

Similarly, the second bound in Equation (12.40) applies when receiver 2 is able
to decode the messages intended for both receivers, which occurs if
γ1 V22 Σ22^{-2} V†22 − ρ2 V21 Σ21^{-2} V†21 ≥ 0 . (12.43)

If we assume that a genie provides s2 to receiver 1, we have a one-sided inter-


ference channel. That is to say, receiver 1 effectively sees no interference from
transmitter 2. Then, generalizing the analysis in Reference [89], the following
bound is found in Reference [243]:

R1 + R2 ≤ log2 |I + ρ1 R11| + log2 |I + (I + γ2 R12)^{-1} ρ2 R22| . (12.44)

Similarly, if the genie reveals s1 to receiver 2, we have

R1 + R2 ≤ log2 |I + ρ2 R22| + log2 |I + (I + γ1 R21)^{-1} ρ1 R11| . (12.45)

Suppose now that receiver 1 is decomposed into two separate receivers. Assume
that s2 is revealed to one of the sub-receivers at receiver 1, and that v1 is revealed
to receiver 2. Then, once again generalizing the analysis of Reference [89], in
Reference [243] the following bound is found

2R1 + R2 ≤ log2 |I + ρ1 R11 + γ1 R21| + log2 |I + ρ1 R11|
         + log2 |(I + γ2 R12)^{-1} K2| , (12.46)

which holds if Equation (12.42) is true.


Switching the roles of links 1 and 2 (that is receiver 2 is now decomposed into
two separate receivers), yields the following

R1 + 2R2 ≤ log2 |I + ρ2 R22 + γ2 R12| + log2 |I + ρ2 R22|
         + log2 |(I + γ1 R21)^{-1} K1| , (12.47)

which holds if Equation (12.43) is true.


Using a multiantenna version of Figure 12.8, we can write the following upper
bound proved in Reference [243].
R1 + 2R2 ≤ log2 |K1| + log2 |I + ρ2 R22 + γ2 R12|
         + log2 |I + ρ2 H22 (P2^{-1} + γ1 H†21 H21)^{-1} H†22| . (12.48)

Once again, switching the roles of links 1 and 2, we have


2R1 + R2 ≤ log2 |K2| + log2 |I + ρ1 R11 + γ1 R21|
         + log2 |I + ρ1 H11 (P1^{-1} + γ2 H†12 H12)^{-1} H†11| . (12.49)

This collection of inequalities essentially represents the MIMO extension of the


outer bounds for the SISO channel given in the previous section.

12.4 The 2 × 2 cognitive MIMO network

Cognitive radio systems are, loosely speaking, radio systems that can sense and
adapt to their environment in an “intelligent” way. Various authors have used
this term to mean different things, and Chapter 16 treats the topic of cognitive
radio and its various definitions in more detail. In this chapter, we consider one
form of cognitive radio whereby a cognitive transmitter–receiver pair, which we
refer to as the secondary link, wishes to transmit simultaneously and in the same
frequency band as an existing legacy link, which we refer to as the primary link.
Here we assume that the primary link must be able to operate at the same capacity
as if the cognitive link were absent. In other words, the capacity of the primary
link must not be diminished by the existence of the cognitive link.
We define two different models for the 2×2 MIMO network, namely a network
with a non-cooperative primary link and a network with a cooperative primary
link. For the non-cooperative primary link model, the primary link operates as
if the secondary link does not exist. Hence, the secondary link must operate in a
manner such that it does not reduce the data rate of the primary link, without
requiring the primary link to modify its behavior. One possible method for this
is for the secondary link to transmit only when the primary link is not accessing
the medium or for the secondary link to transmit in a subspace that is orthogonal
to that used on the primary link.
In the cooperative primary link model, we assume that the primary link will
alter its behavior to accommodate the secondary link but not at the expense of
its communication rate. In other words, the primary link operates in a manner
that is accommodating to the secondary link but without sacrificing its data rate.
More sophisticated assumptions can be made in the cooperative primary link
model as well. For instance, we may allow the primary transmitter to share its
data with the secondary transmitter, which can then encode its transmissions in
a manner that helps the primary link maintain its maximum data rate. We shall
not consider this type of cooperation in this chapter as it involves a high degree
of overhead for the data exchange between the transmitters.
Consider the two-link interference network of Figure 12.1, in which the solid
arrows are signal paths and broken arrows are interference paths. Suppose that
the link between transmitter 1 and receiver 1 is the primary link, and the link
between transmitter 2 and receiver 2 is the secondary link. Let R1 and R2 denote
the data rates on the respective links, and the matrices Hkj ∈ C^{nrj×ntk} denote
the channel coefficients between the kth transmitter and jth receiver. With
zj ∈ C^{nrj×1} denoting the received-signal vector at receiver j, and sk the transmit-
signal vector from transmitter k, the following equations hold:
z1 = H11 s1 + H21 s2 + n1 (12.50)
z2 = H22 s2 + H12 s1 + n2 , (12.51)
where n1 and n2 are i.i.d. complex Gaussian noise vectors of variance σ². Let
K1 ∈ C^{nt1×nt1} and K2 ∈ C^{nt2×nt2} respectively denote the covariance matrices of
the vectors of transmit samples s1 and s2, with tr(Kj) ≤ P enforcing a common
power constraint on each transmitter.
Applying Equation (1) of Reference [92] to our network model, the maximum
rate supportable on link 1 if the signal from transmitter 2 is treated as noise is
given by the following bound:

R1 < log2 |I + H11 K1 H†11 (σ²I + H21 K2 H†21)^{-1}| . (12.52)


For the maximum rate supportable on link 2, simply replace 1 with 2 and vice
versa in Equation (12.52), such that

R2 < log2 |I + H22 K2 H†22 (σ²I + H12 K1 H†12)^{-1}| . (12.53)

We shall assume that the secondary (cognitive) transmitter and receiver know
all the channel matrices Hjk and that the number of transmit antennas at
the secondary transmitter, nt2, is greater than the number of receive antennas
at the primary receiver, nr1.

12.4.1 Non-cooperative primary link


Since the primary link does not cooperate with the secondary link, we fix the
transmit covariance matrix of the primary link to equal the transmit covariance
matrix that maximizes the data rate on the primary link.
The primary link can operate at its maximum rate R1m by using a scheme
motivated by the singular-value decomposition of the matrix H11 as follows:

H11 = U1 Λ1 V†1 , (12.54)

where U1 ∈ C^{nr1×nr1} is the left singular matrix, V1 ∈ C^{nt1×nt1} is the right
singular matrix, and Λ1 ∈ C^{nr1×nt1} contains the singular values of H11 on its
diagonal. The primary transmitter transmits V1 Φ1^{1/2} s1 , resulting in a transmit
covariance matrix at transmitter 1 of K1 = V1 Φ1 V1† . The primary receiver
multiplies the receive-signal vector z1 by U†1. This operation effectively produces
a system with nr1 parallel channels as follows:

z̃1 = U†1 (H11 V1 Φ1^{1/2} s1 + H21 s2 + n1) (12.55)
   = U†1 H11 V1 Φ1^{1/2} s1 + U†1 H21 s2 + U†1 n1 (12.56)
   = Λ1 Φ1^{1/2} s1 + U†1 H21 s2 + U†1 n1 , (12.57)

where Φ1 = diag(φ11 , φ12 , . . . , φ1nt1) contains the power allocations given by a
water-filling algorithm, that is,

φ1i = (η − σ²/λi)^+ , (12.58)

where λi is the ith largest eigenvalue of the matrix H11 H†11. The notation (x)^+
means the maximum of x or zero, in other words, (x)^+ = max(0, x). The "water
level" η is chosen such that

P = Σ_{i=1}^{nt1} φ1i . (12.59)

Thus, φ1i gives the power that should be allocated to the stream transmitted
along vi . Note that this scheme achieves the capacity of the MIMO channel in
the absence of interference as described in Section 8.3.2.
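The allocation of Equations (12.58) and (12.59) can be computed by a simple bisection on the water level η, since the total allocated power is monotone in η. The sketch below is a minimal numpy illustration (bisection is one of several standard ways to find η; the tolerance and iteration count are arbitrary choices).

```python
import numpy as np

def waterfill(H, P, sigma2):
    """Water-filling over the eigenmodes of H H^dag:
    phi_i = (eta - sigma^2 / lambda_i)^+ with sum_i phi_i = P."""
    lam = np.linalg.eigvalsh(H @ H.conj().T)[::-1]   # eigenvalues, descending
    lam = lam[lam > 1e-12]                           # drop unusable (zero) modes

    def total(eta):
        return np.maximum(eta - sigma2 / lam, 0.0).sum()

    lo, hi = 0.0, P + (sigma2 / lam).max()           # total(hi) >= P by construction
    for _ in range(100):                             # bisect on the water level
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if total(mid) < P else (lo, mid)
    eta = 0.5 * (lo + hi)
    return np.maximum(eta - sigma2 / lam, 0.0)
```

For an identity channel all modes are equally good, so the power splits evenly; for strongly unequal eigenvalues and small P, weak modes receive zero power, which is exactly the situation the secondary link exploits below.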
Since the primary link is non-cooperative, one option is for the secondary link
to transmit in a subspace that is orthogonal to the subspace used in the primary
link. Suppose that the water-filling power allocation for the primary link allocates
zero power to K modes, that is, φ1j = 0 for j > nr1 − K, for some integer K ≥ 0.
Thus, K spatial modes are available for secondary-link transmissions. Note that
this is the spatial analog of spectral scavenging, which is a commonly studied
cognitive radio paradigm.
Suppose that the secondary transmitter transmits K2^{1/2} s2 instead of s2 , where
the matrix K2 ∈ C^{nt2×nt2} is the covariance matrix of the signals transmitted by
transmitter 2. Substituting into Equation (12.57) yields

z̃1 = Λ1 Φ1^{1/2} s1 + U†1 H21 K2^{1/2} s2 + U†1 n1 . (12.60)

To avoid interfering with the primary link, the first nr1 − K entries of the second
term on the right-hand side must equal zero as these correspond to the parallel
channels used by the primary link. Since s2 can be any vector, the first nr1 − K
rows of the matrix U†1 H21 K2^{1/2} must be all zeros. Since U†1 H21 K2^{1/2} ∈ C^{nr1×nt2},
it is possible to achieve this requirement if nt2 ≥ nr1 − K. One can express this
requirement in matrix form by writing a diagonal matrix D ∈ C^{nr1×nr1} whose
first nr1 − K diagonal entries are unity and the remaining entries are zero. The
requirement that the first nr1 − K rows of the matrix U†1 H21 K2^{1/2} are all zero
can be written as

D U†1 H21 K2^{1/2} = 0 , (12.61)

where 0 is a matrix whose entries are all zero.


With the transmit covariance matrix K2 , the maximum rate of the sec-
ondary link R2 is given by the mutual information between the transmit and
received signals of the secondary link, which can be found by applying Equation
(1) from Reference [92]:

R2 < log |I + H22 K2 H†22 (σ²I + H12 K1 H†12)^{-1}| , (12.62)


which can be maximized with respect to K2 subject to Equation (12.61), which
ensures zero interference to the primary link, and the power constraint
tr(K2) ≤ P .
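A covariance satisfying the zero-interference constraint can be built by confining the transmitted signal to the null space of the protected rows of U†1 times the cross channel from the secondary transmitter to the primary receiver (H21 in the notation of Equation (12.50)). The sketch below is one way to do this (the random inner covariance is an arbitrary illustrative choice, not an optimized one):

```python
import numpy as np

def nullspace_covariance(U1, H21, m, P, rng):
    """Build K2 = B Q B^dag with range(B) in the null space of the first
    m rows of U1^dag H21, so D U1^dag H21 K2^{1/2} = 0, and trace(K2) = P."""
    A = (U1.conj().T @ H21)[:m, :]           # rows protecting the primary modes
    _, s, Vh = np.linalg.svd(A)              # full SVD: Vh is nt2 x nt2
    rank = int(np.sum(s > 1e-10))
    B = Vh[rank:, :].conj().T                # orthonormal null-space basis
    d = B.shape[1]
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q = G @ G.conj().T                       # arbitrary PSD inner covariance
    Q *= P / np.trace(Q).real                # meet the power constraint exactly
    return B @ Q @ B.conj().T

# Illustrative setup: nr1 = 2 protected modes, nt2 = 3 secondary antennas.
rng = np.random.default_rng(2)
U1 = np.eye(2)                               # identity receive basis, for illustration
H21 = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))
K2 = nullspace_covariance(U1, H21, m=2, P=1.0, rng=rng)
```

With nt2 > nr1 − K the null space is nonempty, which mirrors the antenna-count assumption made earlier in the section.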
The secondary link can perform interference cancellation if the primary link's
rate R1m is smaller than the mutual information between s1 and z2 . That is, if

R1m < log |I + H12 K1 H†12 (σ²I + H22 K2 H†22)^{-1}| , (12.63)


the secondary link operates as if there is no interference and the maximum rate
on the second link is given by the following bound:

R2 < log |I + (1/σ²) H22 K2 H†22| . (12.64)

To maximize R2 in this case, Equation (12.64) needs to be maximized subject
to Equation (12.63), which ensures that the interference from the primary link
can be decoded; Equation (12.61), which ensures that there is no interference to
the transmissions of the primary link; and tr(K2) ≤ P , which enforces a power
constraint.
In this subsection, we have assumed that the primary transmitter knows the
channel between itself and its target receiver H11 . This information enabled the
primary transmitter to spatially encode its transmissions to increase capacity.
When the primary transmitter does not have channel-state information, the op-
timal behavior of the transmitter is to transmit independent data streams on
each antenna. In this case, the primary link always uses all its available channel
modes. Thus, for systems without channel-state information at the transmit-
ter of the legacy link, or for systems with transmit channel-state information
but with all channel modes used by the legacy link, the secondary transmitter
must encode its signals such that they are nulled at each antenna of the primary
receiver. Problem 12.4 explores such a system in more detail.
12.4.2 Cooperative primary link


Suppose that the primary link is willing to cooperate with the secondary link
and can cancel interference due to the secondary link. The secondary transmitter
can then transmit two separate independent data streams. The first is a private,
noninterfering stream transmitted in the unused subspace of the primary link as
described in the previous subsection. The other is a common stream at rate R2c
in the subspace used by the primary link, but at a rate low enough that it can
be decoded and subtracted out by the primary receiver. This second stream is
referred to as a common stream as it is intended to be decoded and subtracted
by receiver 1 and receiver 2.
To ensure that the common stream is decodable at the primary receiver, R2c
must satisfy

R2c < log |I + H21 K2c H†21 (σ²I + H21 K2p H†21 + H11 K1 H†11)^{-1}| , (12.65)


where K2c and K2p are the transmit covariance matrices of the common and
private streams of the secondary transmitter, respectively. In addition, R2c must
be supportable by the secondary link in the presence of interference from the
primary link and by self-interference from the private stream of the secondary
link, which is captured by the following inequality:

R2c < log |I + H22 K2c H†22 (σ²I + H22 K2p H†22 + H12 K1 H†12)^{-1}| . (12.66)


To ensure that the private stream of the secondary link does not interfere with
the primary link, the following needs to hold:

D U†1 H21 K2p^{1/2} = 0 . (12.67)

Hence, the secondary transmitter needs to find covariance matrices K2c and
K2p as well as rates R2c and R2p such that R2c + R2p is maximized subject to
the constraints in Equations (12.65), (12.66), (12.67) and the power constraint
tr (K2c + K2p ) ≤ P .
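Both constraints on R2c share the same log-determinant form, so the supportable common rate for fixed covariances is the minimum of the two. The sketch below evaluates that minimum with hypothetical channels and covariances (none of the values come from the text); searching over K2c and K2p would then maximize R2c + R2p under the constraints above.

```python
import numpy as np

def common_rate_bound(Hs, Kc, sigma2, interference):
    """log2 det(I + Hs Kc Hs^dag (sigma^2 I + sum_i Hi Ki Hi^dag)^{-1}),
    the common form of bounds (12.65) and (12.66)."""
    nr = Hs.shape[0]
    cov = sigma2 * np.eye(nr, dtype=complex)
    for H, K in interference:
        cov = cov + H @ K @ H.conj().T
    _, ld = np.linalg.slogdet(np.eye(nr) + Hs @ Kc @ Hs.conj().T @ np.linalg.inv(cov))
    return ld / np.log(2)

# Hypothetical channels and covariances for illustration only.
rng = np.random.default_rng(3)
ch = lambda: rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
H11, H12, H21, H22 = ch(), ch(), ch(), ch()
K1, K2c, K2p = np.eye(2), 0.3 * np.eye(2), 0.2 * np.eye(2)

# R2c must satisfy both (12.65) (primary receiver) and (12.66) (secondary receiver).
R2c_max = min(common_rate_bound(H21, K2c, 1.0, [(H21, K2p), (H11, K1)]),
              common_rate_bound(H22, K2c, 1.0, [(H22, K2p), (H12, K1)]))
```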

Problems

12.1 Consider the 2 × 2 MIMO interference channel with Han–Kobayashi en-


coding as described in Section 12.2.2. For each of the possible choices of decoding
orders, please write the inequalities that govern the common data rates.

12.2 Consider an extension of the Han–Kobayashi scheme for a network with


three transmitter–receiver pairs. How many streams should each transmitter
employ to achieve all possible combinations of partial interference cancellation?
Also, qualitatively describe the achievable rate region for this type of network.
12.3 For the 2 × 2 MIMO interference channel as described in Section 12.3.2,


show that receiver 1 is able to decode both s1 and s2 with arbitrarily low prob-
ability of error if (12.42) is satisfied.
12.4 Consider the 2 × 2 MIMO cognitive radio channel described in Section
12.4, but now assume that the primary transmitter does not have channel-state
information and hence transmits equal power and independent data on each of
its antennas. Derive an expression that the transmit covariance matrix of the
secondary link (that is, K2 ) needs to satisfy in order not to interfere with the
primary link.
12.5 Consider the 2 × 2 MIMO cognitive radio channel described in Problem
12.4, but assume that the cognitive transmitter knows the transmit signal of
the primary transmitter s1 . Show how this information could be used by the
cognitive link to increase its data rate compared to the previous problem.
12.6 Suppose that a 2 × 2 MIMO system has nt1 and nt2 antennas at transmit-
ters 1 and 2 and nr 1 and nr 2 antennas at receivers 1 and 2 respectively. Assume
that 1 < nt1 < nr 2 , nt2 = 1 and nr 1 > 1. Find the transmit covariance matrix T1
of transmitter 1 that minimizes the interference caused on receiver 2 assuming
transmitter 1 and receiver 1 have full channel-state information (that is, they
know all the channel matrices in the network) and receiver 2 knows only the
channel vector between transmitter 2 and itself.
12.7 Consider a 2 × 2 MIMO channel with a legacy link and a cognitive link.
Under what conditions on the relative numbers of antennas at all nodes can
the cognitive link operate without disrupting the legacy link? Assume that the
legacy link does not change its behavior in response to the presence of the
cognitive link.
12.8 Consider a legacy 2×2 link for which each transmitter has a single antenna
and each receiver has two antennas. It is assumed that the receivers use zero-
forcing to cancel the interference from their respective undesired transmitters.
Assume that a cognitive link with two transmitter antennas wishes to operate
in the same frequency band as this existing 2 × 2 link in a manner such that
the existing links are completely unaffected, that is, neither their communication
rates nor their behavior changes. Show that it is possible for the cognitive link to
operate with nonzero rate by appropriately phasing transmit signals. You may
make reasonable assumptions on the realizations of the various channel matrices.
13 Cellular networks
13.1 Point-to-point links and networks

The simplest wireless communication link is between a single transmitter and a single receiver. In point-to-point systems, data communication rates depend on
factors such as bandwidth, signal power, noise power, acceptable bit-error rate,
and spatial degrees of freedom.
Many wireless systems, however, comprise multiple interacting links. The pa-
rameters and trade-offs associated with point-to-point links hold for networks as
well. Additional factors play a role in networks, however. For instance, interfer-
ence between links can reduce data communication rates. An exciting possibility
is for nodes to cooperate and help convey data for each other, which has the po-
tential to increase data communication rates. Table 13.1 summarizes some of the
key common and differentiating features of point-to-point links versus networks.
In this chapter, we analyze the performance of various multiantenna approaches
in the context of cellular networks whereby signal and interference strengths are
influenced by the spatial distribution of nodes and base stations. Note that we
use the term cellular in a broader context than many works in the literature,
which refer specifically to mobile telephone systems. Here we consider any kind
of network with one-to-many (downlink) and many-to-one topologies (uplink).
For most of this chapter except for Section 13.5.1, we shall focus on character-
izing systems without out-of-cell interference, whereby we assume that there is
some channel allocation mechanism with a reuse factor that results in negligible
out-of-cell interference.
Examples include wireless networks with access points acting like base stations
and sensor networks with data-collection nodes acting like base stations. Note
that our focus here is on the different receiver algorithms and the impact of
the spatial distribution of nodes and base stations on the performance of such
systems. We refer the interested reader to specialized texts on cellular mobile
networks, such as [328] and [192], for detailed discussions of such systems.

13.2 Multiple access and broadcast channels

In the absence of out-of-cell interference, the uplink and downlink of cellular networks can be viewed as the canonical multiple-access channel (MAC) and
Table 13.1 Key features of wireless networks

Point-to-point                        Networks

Signal power, noise power             Signal, interference, and noise powers
Signal-to-noise ratio (SNR)           Signal-to-interference-plus-noise ratio (SINR)
Bandwidth                             Per-link bandwidth
Bit-error rate (BER)                  Bit-error rate
Processing latency                    Processing and protocol latency
Local optimization                    Network or local optimization
Simple point-to-point protocols       Sophisticated multiple-access protocols
Point-to-point topology               Multiple network topologies
                                      Fairness
broadcast channel (BC) respectively. The multiple-access channel is essentially a many-to-one network and the broadcast channel is a one-to-many network. In
this section, we analyze these channels from an information theoretic point of
view by studying the capacity regions of these channels. The developments here
are standard and are discussed in texts such as [68] and [314].

Capacity region of the SISO multiple-access channel
For simplicity, consider a multiple-access channel with two transmitters and one
receiver, where the signals from transmitters 1 and 2 are denoted by $s_1$ and $s_2$
respectively, with average power constraints $\langle \|s_1\|^2 \rangle \le P_1$ and $\langle \|s_2\|^2 \rangle \le P_2$.
Suppose that the complex baseband received signal is

z = s1 + s2 + n , (13.1)

where n is a complex, white Gaussian noise process with variance σ 2 . Let R1 and R2 denote the rates of link 1 between transmitter 1 and the receiver, and
link 2 between transmitter 2 and the receiver respectively. For arbitrarily low
probability of error, the following must hold,
 
$$R_1 < \log_2\left(1 + \frac{P_1}{\sigma^2}\right) \tag{13.2}$$

$$R_2 < \log_2\left(1 + \frac{P_2}{\sigma^2}\right) , \tag{13.3}$$
which correspond to the capacity of each of the channels with the other turned
off.
To find a third bound, let’s suppose that transmitter 1 and transmitter 2 can
cooperate and share their power. The sum rate without cooperation must be less
than or equal to that with cooperation. Then we have
 
$$R_1 + R_2 < \log_2\left(1 + \frac{P_1 + P_2}{\sigma^2}\right) , \tag{13.4}$$
[Figure: pentagon-shaped capacity region with corner points A, B, C, and D; the axis intercepts are $\log_2(1 + P_1/\sigma^2)$ and $\log_2(1 + P_2/\sigma^2)$, and $R'_1 = \log_2(1 + P_1/(P_2 + \sigma^2))$, $R'_2 = \log_2(1 + P_2/(P_1 + \sigma^2))$.]

Figure 13.1 Bounds on the capacity region of the multiple-access channel. P1 and P2
are the received power due to transmitters 1 and 2 respectively, and R1 and R2 are
the data rates per channel use of transmitters 1 and 2 respectively.

where the right-hand side of the previous expression comes from the Shannon
capacity (for example, see Section 5.3) of a link with transmit power budget of
P1 + P2 . These bounds are shown in Figure 13.1. Any rate pair (R1 , R2 ) that can
be decoded with arbitrarily low probability of error must be inside the pentagon
in Figure 13.1 in order to satisfy the bounds given above.
Next, we show that all points inside the pentagon in Figure 13.1 are achievable
with arbitrarily low probability of error. In other words, communication with
arbitrarily low probability of error is possible at all pairs of rates (R1 , R2 ) that
are inside the pentagon in Figure 13.1. For the rest of this chapter, we shall use
the term “achievable” to describe a rate at which communication with arbitrarily
low probability of error is possible.
Consider Figure 13.1. Points A and B are achievable when transmitters 1 and
2 respectively are off, and by using Gaussian code-books at the active transmit-
ter, since with one transmitter off, the system is reduced to an additive white
Gaussian noise channel. Point C is achievable if the receiver first decodes the
signal from transmitter 1 which transmits at rate R1 , while treating the signal
from transmitter 2 as noise which effectively increases the noise variance by P2 .
The signal from transmitter 1 can be decoded with arbitrarily low probability of
error since by treating the signal from transmitter 2 as noise, the Shannon ca-
pacity result given in Section 5.3 indicates that communication with arbitrarily
low probability of error is possible at rates satisfying
 
$$R_1 < \log_2\left(1 + \frac{P_1}{P_2 + \sigma^2}\right) . \tag{13.5}$$

Therefore, the signal from transmitter 1 can be subtracted from the received
signal with arbitrary accuracy, thereby allowing any rate $R_2 < \log_2\left(1 + P_2/\sigma^2\right)$ to
be achievable. Point D is achievable using the same technique with the roles of
transmitters 1 and 2 reversed. Thus we have shown that all rates at the corner
points of the pentagon in Figure 13.1 are achievable. Finally, any point inside
the pentagon 0ACDB is achievable by time sharing between the strategies used
to achieve the corner points. Therefore, any point inside the pentagon in Figure
13.1 is achievable.
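The corner points can be evaluated numerically. The following Python sketch (our illustration; the function name and example powers are not from the text) computes the four corner points for example powers and confirms that both successive-cancellation orders meet the sum-rate bound of Equation (13.4):

```python
import math

def mac_corner_points(P1, P2, sigma2):
    """Corner points of the two-user Gaussian MAC pentagon (rates in b/s/Hz)."""
    C1 = math.log2(1 + P1 / sigma2)            # single-user bound, Eq. (13.2)
    C2 = math.log2(1 + P2 / sigma2)            # single-user bound, Eq. (13.3)
    R1p = math.log2(1 + P1 / (P2 + sigma2))    # user 1 decoded first, Eq. (13.5)
    R2p = math.log2(1 + P2 / (P1 + sigma2))    # user 2 decoded first
    # A: transmitter 1 off; B: transmitter 2 off; C and D: the two SIC orders.
    return {"A": (0.0, C2), "B": (C1, 0.0), "C": (R1p, C2), "D": (C1, R2p)}

corners = mac_corner_points(P1=4.0, P2=2.0, sigma2=1.0)
sum_capacity = math.log2(1 + (4.0 + 2.0) / 1.0)  # Eq. (13.4)
```

Both points C and D lie on the sum-rate face of the pentagon, which is why either decoding order (followed by time sharing) suffices to reach the whole region boundary.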
For comparison, consider a time-division multiple-access (TDMA) scheme (dis-
cussed in Section 4.3.2) in which transmitter 1 uses the channel for a fraction
α ≤ 1 of the time and transmitter 2 uses the channel for 1 − α of the time. The
achievable rate pairs now must satisfy
 
$$R_1 < \alpha \log_2\left(1 + \frac{P_1}{\alpha \sigma^2}\right) \tag{13.6}$$

$$R_2 < (1 - \alpha) \log_2\left(1 + \frac{P_2}{(1 - \alpha) \sigma^2}\right) . \tag{13.7}$$
The factors of α and 1 − α that scale the noise power are due to the fact that
P1 and P2 are long-term average power budgets of transmitters 1 and 2 which
are respectively on for fractions α and 1 − α of the time.
By varying α, the rate pair (R1 , R2 ) traces out the dashed line shown in
Figure 13.2. Note that it is possible to meet the maximum sum-rate by selecting
$\alpha = P_1/(P_1 + P_2)$, which yields the following after some algebraic manipulation:

$$R_1 + R_2 < \log_2\left(1 + \frac{P_1 + P_2}{\sigma^2}\right) . \tag{13.8}$$
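The TDMA trade-off of Equations (13.6)–(13.7) is easy to check numerically; the sketch below (ours, with illustrative names) confirms that the split $\alpha = P_1/(P_1+P_2)$ meets the sum capacity while other splits fall short:

```python
import math

def tdma_rates(alpha, P1, P2, sigma2):
    """Achievable (R1, R2) under TDMA, Eqs. (13.6)-(13.7); P1, P2 are long-term powers."""
    R1 = alpha * math.log2(1 + P1 / (alpha * sigma2)) if alpha > 0 else 0.0
    R2 = (1 - alpha) * math.log2(1 + P2 / ((1 - alpha) * sigma2)) if alpha < 1 else 0.0
    return R1, R2

P1, P2, sigma2 = 4.0, 2.0, 1.0
alpha_star = P1 / (P1 + P2)               # the sum-rate-optimal split, Eq. (13.8)
R1, R2 = tdma_rates(alpha_star, P1, P2, sigma2)
sum_capacity = math.log2(1 + (P1 + P2) / sigma2)
```

Note that the bursty transmission implied by the scaled noise terms is what lets TDMA touch the sum-rate face at a single point, even though it cannot reach the rest of the pentagon boundary.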
This analysis extends in a straightforward manner to multiple-access channels
with K users, where the capacity region satisfies the following set of constraints:

$$\sum_{k \in \mathcal{T}} R_k < \log_2\left(1 + \frac{\sum_{k \in \mathcal{T}} P_k}{\sigma^2}\right) \tag{13.9}$$
for all $\mathcal{T}$, which are subsets of the integers $1, 2, \ldots, K$.
As in the two-transmitter case, the sum capacity is obtained by combining the powers of the K users, which gives the following sum capacity,

$$\sum_{k=1}^{K} R_k < \log_2\left(1 + \frac{\sum_{k=1}^{K} P_k}{\sigma^2}\right) . \tag{13.10}$$
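A rate vector can be tested for membership in the K-user region of Equation (13.9) by enumerating all $2^K - 1$ nonempty subsets, as in this illustrative Python sketch (names ours):

```python
import math
from itertools import combinations

def in_mac_region(rates, powers, sigma2):
    """Check membership in the K-user MAC capacity region, Eq. (13.9):
    every nonempty subset T of users must satisfy the corresponding sum-rate bound."""
    K = len(rates)
    for size in range(1, K + 1):
        for T in combinations(range(K), size):
            bound = math.log2(1 + sum(powers[k] for k in T) / sigma2)
            if sum(rates[k] for k in T) >= bound:
                return False
    return True
```

The exponential number of constraints is why, for large K, one usually works with the polymatroid structure of the region rather than enumerating subsets directly.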

Capacity region of the broadcast channel
The downlink channel from the base station to the mobile users is an example of
a broadcast channel where one source transmits information to multiple users.
In general, we assume that the transmitter sends a distinct message to each
receiver.
As in the multiple-access channel, we first consider a channel with two mobile
users. Unlike for the multiple-access channel, we have to explicitly consider the
Figure 13.2 Capacity region of the multiple-access channel with achievable rates using
TDMA in dashed lines.

strengths of the channels between the transmitter and each mobile, and the
power allocated by the transmitter to each mobile. Suppose that the transmitter
allocates power P1 and P2 to receivers 1 and 2 with P = P1 + P2 and that
the channels between the transmitter and receivers are denoted by h1 and h2
respectively. If the transmitted signal is the superposition of the signals intended
for receiver 1 and receiver 2, we can write the following expression for the signal
at the jth receiver,

zj = hj s1 + hj s2 + nj , (13.11)

where $n_j$ is circularly symmetric, complex, Gaussian noise of variance $\sigma^2$, that is, $n_j$ is distributed as $\mathcal{CN}(0, \sigma^2)$, and $\langle \|s_j\|^2 \rangle \le P_j$.

Suppose that ||h1 ||2 < ||h2 ||2 . In this case, receiver 2 can decode s1 with
arbitrarily low probability of error provided that s1 is transmitted at a rate
R1 such that receiver 1 can decode s1 with arbitrarily low probability of error.
Hence, receiver 2 can perform successive interference cancellation and remove
the interference contribution caused by the signals intended for receiver 1. This
strategy leads to the following bound on the achievable rate of link 2,

 
$$R_2 < \log_2\left(1 + \frac{P_2 \|h_2\|^2}{\sigma^2}\right) . \tag{13.12}$$

Note that this is the single-user bound, that is, the capacity of the channel
between the transmitter and receiver 2 if receiver 1 was not present.
If receiver 1 treats the signal intended for user 2 as noise, it can then achieve
the following:
 
$$R_1 < \log_2\left(1 + \frac{P_1 \|h_1\|^2}{P_2 \|h_1\|^2 + \sigma^2}\right) \ \text{b/s/Hz} . \tag{13.13}$$

The sum capacity of this channel is given by
 
$$R_1 + R_2 < \log_2\left(1 + \frac{P \|h_2\|^2}{\sigma^2}\right) , \tag{13.14}$$

where $P = P_1 + P_2$ is the total transmit power.

The proof of this is the subject of Problem 13.2.
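The achievable pairs above can be traced by sweeping the power split; the following Python sketch (our illustration, with assumed gain values) does so for the degraded two-user channel:

```python
import math

def bc_rates(P, split, g1, g2, sigma2):
    """Superposition-coding rate pair for the two-user Gaussian broadcast channel with
    channel gains g1 = ||h1||^2 < g2 = ||h2||^2 and powers P1 = split*P, P2 = (1-split)*P.
    Receiver 1 treats s2 as noise (Eq. 13.13); receiver 2 cancels s1 first (Eq. 13.12)."""
    P1, P2 = split * P, (1.0 - split) * P
    R1 = math.log2(1 + P1 * g1 / (P2 * g1 + sigma2))
    R2 = math.log2(1 + P2 * g2 / sigma2)
    return R1, R2

# Sweeping the power split traces the boundary of the achievable region.
boundary = [bc_rates(P=10.0, split=s / 10.0, g1=0.5, g2=2.0, sigma2=1.0) for s in range(11)]
```

At the extremes of the sweep, all power goes to one user and the corresponding single-user bound is recovered.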
So far, we have presented achievable rates for the two-user broadcast
channel. Proving that this indeed is the upper bound on the achievable rates
and, hence, is the capacity region of the system is quite difficult and the reader
is referred to Reference [17] for the detailed proof of this.

Vector multiple-access channel
The capacity region of the vector multiple-access channel (that is, multiple-
access channel with multiantenna transmitters and base station) turns out to be
a relatively straightforward extension of the single-antenna case. For simplicity,
consider a multiple-access channel with one receiver and two transmitters where
the transmitters each have nt antennas and the receiver has nr antennas. Let the
channel between the jth transmitter and the base station be denoted by Hj . Then,
by using the results for the single-user MIMO channel described in Section 8.3,
we can write the following two inequalities for R1 and R2 , the rates of user 1
and 2 respectively as a function of their respective transmit covariance matrices
T1 , T2 :
 
$$R_1 < \log_2\left|\mathbf{I} + \frac{1}{\sigma^2} \mathbf{H}_1 \mathbf{T}_1 \mathbf{H}_1^{\dagger}\right| \tag{13.15}$$

$$R_2 < \log_2\left|\mathbf{I} + \frac{1}{\sigma^2} \mathbf{H}_2 \mathbf{T}_2 \mathbf{H}_2^{\dagger}\right| . \tag{13.16}$$

Furthermore, we can write the following sum-rate constraint,

$$R_1 + R_2 < \log_2\left|\mathbf{I} + \frac{1}{\sigma^2} \mathbf{H}_1 \mathbf{T}_1 \mathbf{H}_1^{\dagger} + \frac{1}{\sigma^2} \mathbf{H}_2 \mathbf{T}_2 \mathbf{H}_2^{\dagger}\right| , \tag{13.17}$$

which is simply the MIMO channel capacity if both transmitters are treated
as a single transmitter with their antennas pooled together. As in the single-
antenna case, these rates can be achieved by interference cancellation and time
sharing.
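For a quick numerical feel for these bounds, the log-determinants can be evaluated with plain complex arithmetic for $n_t = n_r = 2$; the helper names below are ours, and $\mathbf{T}_j = t\,\mathbf{I}$ is a simple, generally suboptimal choice of transmit covariance:

```python
import math

def herm2(A):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[A[0][0].conjugate(), A[1][0].conjugate()],
            [A[0][1].conjugate(), A[1][1].conjugate()]]

def gram2(H):
    """G = H H^dagger for a 2x2 complex matrix H."""
    Hh = herm2(H)
    return [[sum(H[i][k] * Hh[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def logdet_bound(Hs, T_scale, sigma2):
    """log2|I + (1/sigma2) * sum_j H_j T_j H_j^dagger| with the choice T_j = T_scale*I.
    With one channel this is Eq. (13.15)/(13.16); with both it is the sum bound (13.17)."""
    M = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for H in Hs:
        G = gram2(H)
        for i in range(2):
            for j in range(2):
                M[i][j] += (T_scale / sigma2) * G[i][j]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]  # determinant is real for Hermitian M
    return math.log2(det.real)

I2 = [[1 + 0j, 0j], [0j, 1 + 0j]]
R1_bound = logdet_bound([I2], 1.0, 1.0)       # single-user bound for an identity channel
sum_bound = logdet_bound([I2, I2], 1.0, 1.0)  # joint (pooled-antenna) bound
```

For these example channels the sum bound is strictly smaller than the total of the two individual bounds, illustrating that the constraint (13.17) is the binding face of the region.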
The capacity region can then be found as the union of the regions described by
Inequalities (13.15) to (13.17), over all positive-semidefinite matrices T1 and T2 ,
which respect the power constraints. Thus, the capacity region of the two-user
MIMO multiple-access channel is the following:
$$\bigcup_{\operatorname{tr}(\mathbf{T}_j) \le P,\ \forall j} \left\{ (R_1, R_2) \ \text{s.t.} \ \begin{array}{l} R_1 < \log_2\left|\mathbf{I} + \frac{1}{\sigma^2} \mathbf{H}_1 \mathbf{T}_1 \mathbf{H}_1^{\dagger}\right| \\ R_2 < \log_2\left|\mathbf{I} + \frac{1}{\sigma^2} \mathbf{H}_2 \mathbf{T}_2 \mathbf{H}_2^{\dagger}\right| \\ R_1 + R_2 < \log_2\left|\mathbf{I} + \frac{1}{\sigma^2} \mathbf{H}_1 \mathbf{T}_1 \mathbf{H}_1^{\dagger} + \frac{1}{\sigma^2} \mathbf{H}_2 \mathbf{T}_2 \mathbf{H}_2^{\dagger}\right| \end{array} \right\} \tag{13.18}$$

In the expression above, the inequalities in the braces are restrictions that the
pair of rates must satisfy for a given set of transmit covariance matrices T1 and
T2 . The union is of all pairs of rates such that the inequalities are satisfied, and
is taken over all covariance matrices that respect the power constraints, as the
transmitters may choose any pair of covariance matrices that satisfy the power
constraints.
As in the single-antenna case, the analysis of the two-user MIMO multiple-access channel extends in a straightforward way to multiple-access channels with K users, which yields the following capacity region:
 
$$\bigcup_{\operatorname{tr}(\mathbf{T}_j) \le P,\ \forall j} \left\{ (R_1, \ldots, R_K) \ \text{s.t.} \ \sum_{i \in \mathcal{T}} R_i < \log_2\left|\mathbf{I} + \frac{1}{\sigma^2} \sum_{i \in \mathcal{T}} \mathbf{H}_i \mathbf{T}_i \mathbf{H}_i^{\dagger}\right| \ \forall \mathcal{T} \subseteq \{1, 2, \ldots, K\} \right\} . \tag{13.19}$$

Vector broadcast channel
Unlike the multiple-access channel, the multiantenna broadcast channel is not a
simple extension of the single-antenna case. Consider a system with nt antennas
at the base station and K single-antenna receivers. Suppose that H ∈ Cn t ×K
represents the channel coefficients between the antennas of the base station and
the antennas of the K receivers, and the thermal noise at the receivers is rep-
resented by n ∈ CK ×1 ∼ CN (0, I). Note that with this notation, the sampled
base-band equivalent signals at the receivers are denoted by z ∈ CK ×1
as follows

z = H† s + n, (13.20)

where s ∈ Cn t ×1 represents the transmitted samples on each of the transmit antennas.
The sum capacity of this broadcast channel was found in Reference [326] to
equal
 
 
$$c_{\text{sum}} = \sup_{\mathbf{D} \in \mathcal{A}} \log_2\left|\mathbf{I} + \mathbf{H} \mathbf{D} \mathbf{H}^{\dagger}\right| , \tag{13.21}$$

where the set A over which the supremum is taken is the set of all K ×K diagonal
matrices with non-negative entries that satisfy the transmit power constraint
tr{D} ≤ P . Note that the supremum refers to the smallest upper bound.
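Because the supremum in Equation (13.21) is over diagonal matrices only, even a crude random search gives a useful lower bound on it; this Python sketch (ours) assumes $n_t = 2$ transmit antennas:

```python
import math, random

def bc_sum_rate(H_cols, d):
    """log2|I + H D H^dagger| for nt = 2 transmit antennas; H_cols[k] is the channel
    column of user k and d[k] >= 0 the kth diagonal entry of D (Eq. 13.21)."""
    M = [[1 + 0j, 0j], [0j, 1 + 0j]]          # accumulate I + sum_k d_k h_k h_k^dagger
    for h, dk in zip(H_cols, d):
        for i in range(2):
            for j in range(2):
                M[i][j] += dk * h[i] * h[j].conjugate()
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return math.log2(det.real)

def csum_lower_bound(H_cols, P, trials=2000, seed=0):
    """Lower-bound the supremum in Eq. (13.21) by sampling random diagonal D with tr(D) = P."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        w = [rng.random() for _ in H_cols]
        s = sum(w)
        best = max(best, bc_sum_rate(H_cols, [P * wi / s for wi in w]))
    return best
```

For orthogonal user channels the search converges toward the equal-power allocation, which is the true optimum in that special case.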
The corresponding capacity region, which holds for systems with multiantenna
receivers as well, is more complicated to describe and was derived in Reference
[341]. In the following, we shall briefly describe the capacity region and refer the
reader to Reference [341] for details of its derivation.
Let the received signal at the kth receiver which has nr k antennas be given by
zk ∈ Cn r k ×1 as follows,

zk = Hk s + nk ,

where Hk ∈ Cn r k ×n t is the channel matrix between the base station and the
antennas of the kth receiver. The transmitted signal of the base station is s ∈
Cn t ×1 , and nk ∈ Cn r k ×1 is a vector of circularly symmetric, complex, Gaussian
noise of variance σ 2 at the antennas of the kth receiver. The transmitted signal
is a superposition of signals intended for each of the K receivers. If the vector
sk ∈ Cn t ×1 represents the signal intended for the kth receiver, the transmitted
signal vector is

$$\mathbf{s} = \sum_{k=1}^{K} \mathbf{s}_k .$$

The capacity region of this channel is achieved by dirty-paper coding described in Section 5.3.4. The dirty-paper coding for the broadcast channel is done succes-
sively. Let the kth user to be encoded be denoted by P(k). We use this notation
to emphasize the fact that user k need not be the kth user to be encoded. This
property is necessary since the final capacity region is given in terms of all possi-
ble encoding orders. The first user to be encoded, P(1), is assigned a codeword s1
that is a function of the data intended for that user. The kth user to be encoded,
P(k), is assigned a codeword sk that is a function of the codeword intended for
that user with the signal intended for users P(1), P(2), . . . P(k − 1) effectively
presubtracted using the dirty-paper coding strategy. Thus, user P(K) sees an ef-
fectively interference-free signal, and user P(k) effectively sees interference only
from users P(k + 1), . . . P(K). Note that P here refers to an ordering of the
integers 1, 2, . . . K. Let the covariance matrices of the codewords be denoted by
$$\mathbf{T}_k = \left\langle \mathbf{s}_k \mathbf{s}_k^{\dagger} \right\rangle . \tag{13.22}$$

For simplicity, let us again consider the K = 2 case. Using dirty-paper coding,
if the signal for receiver 1 is encoded first, the achievable rate bound on link 1 is
 
$$R_1 < I(\mathbf{z}_1; \mathbf{s}_1) = \log_2 \frac{\left|\sigma^2 \mathbf{I} + \mathbf{H}_1 \mathbf{T}_1 \mathbf{H}_1^{\dagger} + \mathbf{H}_1 \mathbf{T}_2 \mathbf{H}_1^{\dagger}\right|}{\left|\sigma^2 \mathbf{I} + \mathbf{H}_1 \mathbf{T}_2 \mathbf{H}_1^{\dagger}\right|} \tag{13.23}$$

$$R_2 < I(\mathbf{z}_2; \mathbf{s}_2 \,|\, \mathbf{s}_1) = \log_2\left|\mathbf{I} + \frac{1}{\sigma^2} \mathbf{H}_2 \mathbf{T}_2 \mathbf{H}_2^{\dagger}\right| . \tag{13.24}$$
Note that the link to receiver 2 can operate at a rate as if it had perfect knowledge
of the signal from transmitter 1.
Extending this to K users where the transmissions are encoded in the order
$1, 2, \ldots, K$, we have

$$R_k < I(\mathbf{z}_k; \mathbf{s}_k \,|\, \mathbf{s}_1, \ldots, \mathbf{s}_{k-1}) = \log_2 \frac{\left|\sigma^2 \mathbf{I} + \mathbf{H}_k \sum_{j=k}^{K} \mathbf{T}_j \mathbf{H}_k^{\dagger}\right|}{\left|\sigma^2 \mathbf{I} + \mathbf{H}_k \sum_{j=k+1}^{K} \mathbf{T}_j \mathbf{H}_k^{\dagger}\right|} . \tag{13.25}$$

To find the full capacity region, we have to take the convex hull of the union over
all possible encoding orderings and all possible covariance matrices. Note that
the convex hull of a set of points in Rk is the convex set that has the minimum
volume such that all the points are in the convex set.
To describe the general capacity region, we rewrite the above equation for a
general encoding order P so that the rate for the kth receiver to be encoded is
bounded as follows:

$$R_{\mathcal{P}(k)} < I(\mathbf{z}_{\mathcal{P}(k)}; \mathbf{s}_{\mathcal{P}(k)} \,|\, \mathbf{s}_{\mathcal{P}(1)}, \ldots, \mathbf{s}_{\mathcal{P}(k-1)}) = \log_2 \frac{\left|\sigma^2 \mathbf{I} + \mathbf{H}_{\mathcal{P}(k)} \sum_{j=k}^{K} \mathbf{T}_{\mathcal{P}(j)} \mathbf{H}_{\mathcal{P}(k)}^{\dagger}\right|}{\left|\sigma^2 \mathbf{I} + \mathbf{H}_{\mathcal{P}(k)} \sum_{j=k+1}^{K} \mathbf{T}_{\mathcal{P}(j)} \mathbf{H}_{\mathcal{P}(k)}^{\dagger}\right|} . \tag{13.26}$$

Taking the convex hull of the union over all transmit covariances and encoding
orderings, we arrive at the following achievable rate region:
$$\text{Convex Hull} \left\{ \bigcup_{\mathcal{P},\, \mathbf{T}_j \ \text{s.t.} \ \sum_j \operatorname{tr}(\mathbf{T}_j) \le P} \left( R_{\mathcal{P}(1)}, R_{\mathcal{P}(2)}, \ldots, R_{\mathcal{P}(K)} \right) \right\} . \tag{13.27}$$

This scheme is capacity achieving, as shown in Reference [341], which uses more
general assumptions, allowing for different numbers of antennas at each mobile
user and arbitrary noise covariance matrices. The proof is rather involved, and
we refer the interested reader there for details.
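For K = 2 and single-antenna receivers, Equations (13.23) and (13.24) reduce to scalars; the sketch below (our illustration, using the simple and generally suboptimal choice $\mathbf{T}_k = (P_k/n_t)\mathbf{I}$) evaluates both bounds:

```python
import math

def dpc_rates_two_users(h1, h2, P1, P2, sigma2):
    """Dirty-paper-coding rate bounds with user 1 encoded first (Eqs. 13.23-13.24),
    for single-antenna receivers and the simple (generally suboptimal) choice
    T_k = (P_k / nt) I, so h T_k h^dagger reduces to (P_k / nt) ||h||^2."""
    nt = len(h1)
    g1 = sum(abs(c) ** 2 for c in h1)
    g2 = sum(abs(c) ** 2 for c in h2)
    # User 1 still sees the (not-yet-encoded) signal for user 2 as interference.
    R1 = math.log2((sigma2 + (P1 / nt) * g1 + (P2 / nt) * g1) / (sigma2 + (P2 / nt) * g1))
    # User 2 is encoded with s1 known, so its interference is effectively presubtracted.
    R2 = math.log2(1 + (P2 / nt) * g2 / sigma2)
    return R1, R2
```

Swapping the encoding order exchanges the roles of the two users, which is exactly why the full region in Equation (13.27) unions over all orderings before taking the convex hull.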

13.3 Linear receivers in cellular networks with Rayleigh fading and constant transmit powers

In this section, we study the performance of some linear receiver structures in cellular networks. Linear receivers are attractive as they are computationally in-
expensive compared to optimal receivers, which attempt to decode interfering sig-
nals. In this section, we consider three representative linear receivers, namely the
antenna-selection receiver discussed in Section 13.3.3, the matched-filter (MF)
receiver discussed in Section 9.2.1, and the linear minimum mean-square-error
(MMSE) receiver discussed in Section 9.2.3. We shall analyze the performance of
a representative link in such a network with multiple antennas at the receivers,
in the presence of frequency-flat Rayleigh fading.
The linear minimum-mean-squared error (MMSE) receiver discussed previ-
ously is optimal in the sense of maximizing the SINR. However, it requires
Figure 13.3 Portion of a cellular network with hexagonal cells. The smallest circle
containing a cell has radius RI and the largest circle contained within a cell has
radius Ro .

knowledge of channel parameters between the antennas of the desired transmitter and the receiver as well as the covariance matrix of the interference observed
at the antennas of the receiver. Any network protocol utilizing the linear MMSE
receiver will thus need a mechanism to allow receivers to estimate this informa-
tion rapidly. Additionally, the linear MMSE receiver requires a matrix inversion
which can be computationally expensive in low-cost systems in rapidly chang-
ing environments. Thus, while the matched filter receiver and selection combiner
have strictly worse performance than the MMSE receiver, they are attractive
because of lower complexity.
We start by analyzing systems that do not use power control, where all
transmitters use a fixed transmit power of P . We shall approximate the cells as
circles of radius R for simplicity and assume that there are a Poisson number of
transmitters distributed uniformly randomly in the circle with mean number of
points equal to ρ π R2 . For the selection combiner and matched-filter, we provide
a framework for evaluating the CDF of the SINR with hexagonal cells and provide
bounds for the CDF for the MMSE receiver.
Note that the derivations in the remainder of this chapter will be used in the
next chapter on ad hoc wireless networks as well since the expressions we find
can be adapted to ad hoc wireless networks.

13.3.1 Link lengths in cellular networks
In the analysis of cellular systems, it is useful to characterize the distribution of
link-lengths that arise from the cellular model as it impacts the distribution of
signal and interference strengths. Suppose that a given wireless node is randomly
located at some point on a cellular network and establishes a link of length x
with the base station that is closest in Euclidean distance to it.
For the hexagonal cell model (see Figure 13.3) with minimum base station
separation d, the CDF, PDF, and kth moment of x are given by

$$F_x(x) = \begin{cases} 0, & \text{if } x < 0 \\[4pt] \dfrac{2\sqrt{3}\,\pi x^2}{3 d^2}, & \text{if } 0 \le x < \dfrac{d}{2} \\[8pt] \dfrac{2\sqrt{3}\,\pi x^2}{3 d^2} - \dfrac{4\sqrt{3}\, x^2}{d^2} \cos^{-1}\!\left(\dfrac{d}{2x}\right) + 2\sqrt{3}\, \sqrt{\dfrac{x^2}{d^2} - \dfrac{1}{4}}, & \text{if } \dfrac{d}{2} \le x < \dfrac{\sqrt{3}\,d}{3} \\[8pt] 1, & \text{if } x \ge \dfrac{\sqrt{3}\,d}{3}, \end{cases} \tag{13.28}$$

$$f_x(x) = \begin{cases} \dfrac{4\pi}{\sqrt{3}\,d^2}\, x, & \text{if } 0 < x < \dfrac{d}{2} \\[8pt] \dfrac{4\pi}{\sqrt{3}\,d^2}\, x - \dfrac{8\sqrt{3}\, x}{d^2} \cos^{-1}\!\left(\dfrac{d}{2x}\right), & \text{if } \dfrac{d}{2} < x < \dfrac{\sqrt{3}\,d}{3} \\[8pt] 0, & \text{otherwise}, \end{cases} \tag{13.29}$$

and

$$\left\langle x^k \right\rangle = \frac{2\sqrt{3}}{k+2} \left(\frac{d}{2}\right)^k \int_0^{\pi/6} \frac{d\tau}{(\cos\tau)^{k+2}} . \tag{13.30}$$
This result is due to the fact that x is statistically equivalent to the distance
between a random point in an equilateral triangle of side length d to the closest
vertex of that triangle. The CDF, PDF, and kth moments of a random point
to the closest vertex of an equilateral triangle are known and can be found in
references such as [210]. Equation (13.29) was given in Reference [249] without
derivation. The interpretation given here is based on Reference [124].
For the Poisson-cell model described in Section 4.3 with base station density
ρb , the link length x has the following CDF and PDF:
$$F_x(x) = \begin{cases} 1 - e^{-\pi \rho_b x^2}, & \text{if } 0 < x \\ 0, & \text{otherwise}, \end{cases} \tag{13.31}$$

$$f_x(x) = \begin{cases} 2\pi \rho_b\, x\, e^{-\pi \rho_b x^2}, & \text{if } 0 < x \\ 0, & \text{otherwise}, \end{cases} \tag{13.32}$$

and the kth moment of the link length is given by

$$\left\langle x^k \right\rangle = \frac{1}{(\rho_b \pi)^{k/2}}\, \Gamma\!\left(1 + \frac{k}{2}\right) . \tag{13.33}$$
Equation (13.31) is found by noting that the probability that there is no point from a Poisson point process of intensity $\rho_b$ in a region of area $\pi x^2$ is $e^{-\pi \rho_b x^2}$. Hence, the probability that there is at least one point from the Poisson process inside an area $\pi x^2$ is $1 - e^{-\pi \rho_b x^2}$, which is precisely the probability that the link length is less than x. The PDF is obtained by differentiating the CDF, and the kth moment is found by direct integration. A more detailed derivation of the PDF may be found in Reference [210].
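Equations (13.31) and (13.33) are easy to cross-check by inverse-transform sampling; the following sketch (ours) compares the sampled mean link length with the k = 1 moment:

```python
import math, random

def poisson_link_cdf(x, rho_b):
    """CDF of the nearest-base-station distance under a Poisson field, Eq. (13.31)."""
    return 1 - math.exp(-math.pi * rho_b * x**2) if x > 0 else 0.0

def poisson_link_moment(k, rho_b):
    """kth moment of the link length, Eq. (13.33)."""
    return math.gamma(1 + k / 2) / (rho_b * math.pi) ** (k / 2)

# Inverse-transform sampling: solve F_x(x) = U for a uniform U in (0, 1).
rng = random.Random(2)
rho_b = 0.5
samples = [math.sqrt(-math.log(1 - rng.random()) / (math.pi * rho_b)) for _ in range(50000)]
mean_mc = sum(samples) / len(samples)
mean_an = poisson_link_moment(1, rho_b)  # = Gamma(1.5) / sqrt(pi * rho_b)
```

The link length here is Rayleigh distributed, which is why both the CDF inversion and the moment formula come out in closed form.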
13.3.2 General network model
Consider a representative receiver with nr antennas communicating with a single-
antenna transmitter located at a distance r1 from the receiver, in the presence
of n − 1 other transmitters which are interferers to the representative link, and
distributed in an i.i.d. fashion in a circular network of radius R. Suppose that
the transmitters are numbered j = 1, 2, . . . , n, where n may be infinite for the
case of the infinite-plane network. Let the distances between the corresponding
transmitters and the representative receiver be denoted by r1 , r2 , . . . , rn . Using
the inverse power-law model for path loss, we have the following expression for the
vector of received samples at the nr antennas of the representative receiver,

$$\mathbf{z} = \mathbf{h}_1 r_1^{-\alpha/2} s_1 + \sum_{j=2}^{n} \mathbf{h}_j r_j^{-\alpha/2} s_j + \mathbf{n} , \tag{13.34}$$

where sj is the transmitted data sample from transmitter j and the vector n
contains i.i.d. complex, circularly symmetric, Gaussian random variables with
variance σ 2 . Note that the channels between each transmitter and the represen-
tative receiver can be factored into a fading component given by the vectors hj
and large-scale path loss which attenuates power by a factor of rj−α . We shall
assume a common path-loss exponent α for the transmitters.
The SINR of the representative link (link 1), denoted by β, is thus

$$\beta = \frac{\|\mathbf{w}^{\dagger} \mathbf{h}_1\|^2 r_1^{-\alpha}}{\sum_{j=2}^{n} \|\mathbf{w}^{\dagger} \mathbf{h}_j\|^2 r_j^{-\alpha} + \sigma^2 \|\mathbf{w}\|^2} , \tag{13.35}$$

where $\mathbf{w}$ is the vector of weights applied by the receiver, and it is assumed that $\langle \|s_j\|^2 \rangle = 1$.
For the matched-filter receiver, wM F = h1 as shown in Section 9.2.1. For
the antenna selection receiver, wA S is a vector of zeros with a single 1 in the
entry corresponding to the largest magnitude entry in h1 . The MMSE receiver
has a more complicated expression that depends on the covariance matrix of
the aggregate interference as well as the channel coefficients between the target
transmitter and the representative receiver.
The next three subsections derive the probability density function of the SINR
under these three receiver structures. The discussion on the matched filter and
antenna selection receivers is based on results from Reference [147], and the
discussion on the MMSE receiver is based on Reference [9].

13.3.3 Antenna-selection receiver
The SINR for the antenna-selection receiver can be rewritten as follows:
$$\beta = \frac{\|h_{1m}\|^2 r_1^{-\alpha} P_1}{\sum_{j=2}^{n} \|h_{jm}\|^2 r_j^{-\alpha} P_j + \sigma^2} , \tag{13.36}$$
where $h_{jk}$ is the kth entry of $\mathbf{h}_j$, and m is the index of the largest-magnitude entry of $\mathbf{h}_1$. $P_j$ is the transmit power of the jth transmitter.
Note that ||hj k ||2 are i.i.d., unit-mean exponential random variables as shown
in Section 3.1.10. Hence, the CDF of ||h1m ||2 at some x is equal to the probability
that nr i.i.d. exponential random variables are less than or equal to x. The CDF
of an exponential random variable with unit mean is 1 − e−x for x ≥ 0 and zero
otherwise, as shown in Section 3.1.10. Hence, the CDF of ||h1m ||2 is

$$P_{h_m}(x) = \left(1 - e^{-x}\right)^{n_r} \quad \text{for } x \ge 0 . \tag{13.37}$$

By using the binomial expansion,

$$(1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k ,$$

we have the CDF $P_{h_m}(x)$ of the strongest channel between the target transmitter and the antennas of the receiver as follows:

$$P_{h_m}(x) = \begin{cases} \sum_{k=0}^{n_r} \binom{n_r}{k} (-1)^k e^{-kx} & \text{for } x \ge 0 \\ 0 & \text{otherwise.} \end{cases} \tag{13.38}$$

Thus, if we write the total interference power $I = \sum_{j=2}^{n} \|h_{jm}\|^2 r_j^{-\alpha} P_j$, conditioned on I and link length $r_1$, the CDF of the SINR is
 
$$\begin{aligned} \Pr\{\text{SINR} \le x \,|\, I\} &= \Pr\left\{ \frac{\|h_{1m}\|^2 r_1^{-\alpha} P_1}{I + \sigma^2} \le x \,\Big|\, I, r_1 \right\} \\ &= \Pr\left\{ \|h_{1m}\|^2 \le \frac{x r_1^{\alpha}}{P_1} (I + \sigma^2) \,\Big|\, I, r_1 \right\} \\ &= P_{h_m}\!\left( \frac{x r_1^{\alpha}}{P_1} (I + \sigma^2) \right) \\ &= \sum_{k=0}^{n_r} \binom{n_r}{k} (-1)^k \exp\left( -k \frac{x r_1^{\alpha}}{P_1} (I + \sigma^2) \right) . \end{aligned} \tag{13.39}$$

Note that $\|h_{jm}\|^2$ for $j \neq 1$ are simply i.i.d. exponential random variables. Removing the conditioning with respect to I, we have

$$\begin{aligned} \Pr\{\text{SINR} \le x\} &= \int dI \, \Pr\{\text{SINR} \le x \,|\, I\} \, p_I(I) \\ &= \sum_{k=0}^{n_r} \binom{n_r}{k} (-1)^k \left\langle \exp\left( -k \frac{x r_1^{\alpha}}{P_1} (I + \sigma^2) \right) \right\rangle_I \\ &= \sum_{k=0}^{n_r} \binom{n_r}{k} (-1)^k \exp\left( -k \frac{x r_1^{\alpha}}{P_1} \sigma^2 \right) \left\langle \exp\left( -k \frac{x r_1^{\alpha}}{P_1} I \right) \right\rangle_I , \end{aligned} \tag{13.40}$$

where the subscript I on the expectation operator $\langle \cdot \rangle_I$ is used to emphasize that the expectation is with respect to the interference power. Note that $\langle e^{-sX} \rangle$
of a random variable X is the Laplacian of X which is defined as the Laplace transform of the PDF of X, where the Laplace transform is defined in Section
2.11. Note that, aside from this chapter and Chapter 14, the term Laplacian is
used in the context of vector calculus as defined in Section 2.7.1.
Denoting the Laplacian of the interference power by ΦI (s), we have the fol-
lowing result,
$$\Pr\{\text{SINR} \le x\} = \sum_{k=0}^{n_r} \binom{n_r}{k} (-1)^k \exp\left( -k \frac{x r_1^{\alpha}}{P_1} \sigma^2 \right) \Phi_I\!\left( k \frac{x r_1^{\alpha}}{P_1} \right) . \tag{13.41}$$

The Laplacian of the interference ΦI (s) depends on the particular network model
with the Laplacian of the interference due to cellular models given in Section
13.3.6.
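Conditioned on a fixed interference power I, Equation (13.39) can be checked directly against simulation, since $\|h_{1m}\|^2$ is the maximum of $n_r$ unit-mean exponentials; a Python sketch (ours, with illustrative parameter values):

```python
import math, random

def selection_cdf_conditional(x, nr, r1, alpha, P1, I, sigma2):
    """Conditional CDF of the selection-combiner SINR given interference power I, Eq. (13.39)."""
    a = x * r1**alpha / P1 * (I + sigma2)
    return sum(math.comb(nr, k) * (-1)**k * math.exp(-k * a) for k in range(nr + 1))

# Monte Carlo check: ||h_{1m}||^2 is the max of nr i.i.d. unit-mean exponentials.
rng = random.Random(3)
nr, r1, alpha, P1, I, sigma2 = 4, 1.0, 3.0, 1.0, 0.5, 0.5
x0 = 1.0
trials = 40000
hits = 0
for _ in range(trials):
    hmax = max(rng.expovariate(1.0) for _ in range(nr))
    sinr = hmax * r1**-alpha * P1 / (I + sigma2)
    hits += sinr <= x0
empirical = hits / trials
```

The binomial-expansion form of Equation (13.39) is, of course, just $(1 - e^{-a})^{n_r}$ evaluated at the conditional threshold, which the test below also verifies.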

13.3.4 Matched filter
The SINR β for the matched-filter receiver can be written as follows:
 
$$\beta = \frac{\left| \mathbf{h}_1^{\dagger} \mathbf{h}_1 \right|^2 r_1^{-\alpha}}{\sum_{j=2}^{n} \left| \mathbf{h}_1^{\dagger} \mathbf{h}_j \right|^2 r_j^{-\alpha} + \sigma^2 \|\mathbf{h}_1\|^2} . \tag{13.42}$$

Dividing the numerator and denominator by ||h1 ||2 ,

$$\beta = \frac{\|\mathbf{h}_1\|^2 r_1^{-\alpha}}{\sum_{j=2}^{n} \left| \frac{\mathbf{h}_1^{\dagger}}{\|\mathbf{h}_1\|} \mathbf{h}_j \right|^2 r_j^{-\alpha} + \sigma^2} . \tag{13.43}$$

A linear combination of circularly symmetric, complex, Gaussian random variables is still a circularly symmetric, complex, Gaussian random variable. Hence $\frac{\mathbf{h}_1^{\dagger}}{\|\mathbf{h}_1\|} \mathbf{h}_j$ is a circularly symmetric Gaussian random variable with unit variance, because the entries of $\mathbf{h}_j$ are i.i.d. circularly symmetric Gaussians with unit variance and $\frac{\mathbf{h}_1^{\dagger}}{\|\mathbf{h}_1\|}$ has unit norm. We can thus find the SINR β by simplifying Equation (13.43) as follows:

$$\beta = \frac{\|\mathbf{h}_1\|^2 r_1^{-\alpha}}{\sum_{j=2}^{n} \left| \tilde{h}_j \right|^2 r_j^{-\alpha} + \sigma^2} , \tag{13.44}$$

where $\tilde{h}_j = \frac{\mathbf{h}_1^{\dagger}}{\|\mathbf{h}_1\|} \mathbf{h}_j$ are i.i.d., circularly symmetric, unit-variance Gaussian random variables. Additionally, note that the interference power in the output of the matched-filter receiver, $I = \sum_{j=2}^{n} |\tilde{h}_j|^2 r_j^{-\alpha}$, is statistically identical to the interference seen using the antenna-selection receiver from the last subsection.
To find the CDF of the SINR, we observe that S = ||h1 ||2 is the sum of nr
exponentially distributed random variables, which is a χ2 distributed random
variable with 2nr degrees of freedom as described in Section 3.1.11. The CDF of
S, $P_S(\cdot)$, is given by

$$P_S(x) = 1 - \sum_{k=0}^{n_r - 1} \frac{x^k}{k!}\, e^{-x} . \tag{13.45}$$

By substituting for the norm squared of the channel vector S into Equation
(13.44), we can write the CDF of the SINR as
 
$$\begin{aligned} \Pr\{\text{SINR} \le x \,|\, I\} &= \Pr\left\{ \frac{S r_1^{-\alpha} P_1}{I + \sigma^2} \le x \,\Big|\, I, r_1 \right\} \\ &= P_S\!\left( \frac{x r_1^{\alpha}}{P_1} (I + \sigma^2) \right) \\ &= 1 - \sum_{k=0}^{n_r - 1} \frac{\left( \frac{x r_1^{\alpha}}{P_1} (I + \sigma^2) \right)^k}{k!}\, e^{-\frac{x r_1^{\alpha}}{P_1} (I + \sigma^2)} . \end{aligned} \tag{13.46}$$

If we write the interference plus the thermal noise as W = I + σ 2 , we can write the conditional CDF of the SINR above as follows:

$$\Pr\{\text{SINR} \le x \,|\, I\} = \Pr\{\text{SINR} \le x \,|\, W\} = 1 - \sum_{k=0}^{n_r - 1} \frac{\left( \frac{x r_1^{\alpha}}{P_1} \right)^k W^k}{k!}\, e^{-\frac{x r_1^{\alpha}}{P_1} W} . \tag{13.47}$$

Removing the conditioning on W,

$$\Pr\{\text{SINR} \le x\} = 1 - \sum_{k=0}^{n_r - 1} \frac{\left( \frac{x r_1^{\alpha}}{P_1} \right)^k}{k!} \int dw \, w^k \, e^{-\frac{x r_1^{\alpha}}{P_1} w}\, p_W(w) . \tag{13.48}$$

By using the frequency differentiation property of the Laplace transform discussed in Section 2.11, the CDF of the SINR is

\Pr\{\mathrm{SINR} \le x\} = 1 - \sum_{k=0}^{n_r - 1} \frac{ \left( \frac{x\, r_1^{\alpha}}{P_1} \right)^{k} }{k!} (-1)^{k} \left. \frac{d^{k}}{ds^{k}} \Phi_W(s) \right|_{s = \frac{x\, r_1^{\alpha}}{P_1}} = 1 - \sum_{k=0}^{n_r - 1} \frac{1}{k!} \left( \frac{-x\, r_1^{\alpha}}{P_1} \right)^{k} \left. \frac{d^{k}}{ds^{k}} \Phi_W(s) \right|_{s = \frac{x\, r_1^{\alpha}}{P_1}} ,  (13.49)

where \Phi_W(s) = e^{-s\sigma^2}\,\Phi_I(s). Note that the expression for the CDF of the SINR given in the previous equation could be used to derive the probability of outage for a wireless system with matched-filter receivers, assuming that an outage event is defined as the event that the SINR is below some threshold.

13.3.5 Linear minimum-mean-square-error receiver

The MMSE weight vector


The MMSE receiver attempts to minimize the mean of the squared error between its estimate and the true value of the transmitted data sample s₁. From Equation (9.71), we find that the MMSE weight vector w is given by

\mathbf{w} = \left\langle \mathbf{z}\,\mathbf{z}^{\dagger} \right\rangle^{-1} \mathbf{h}_1 .  (13.50)

Since we assume that the channel parameters are known perfectly by the representative receiver here, the maximum-SINR beamformer and the MMSE beamformer are equivalent, as shown in Section 9.2.4. The MMSE beamformer is given by Equation (9.80) as follows,

\mathbf{w} = a\, \mathbf{R}^{-1} \mathbf{h}_1 ,  (13.51)

where a is a scale factor that does not affect the SINR, and R is the interference-plus-noise covariance matrix, given as follows:

\mathbf{R} = P \sum_{j=2}^{n} \mathbf{h}_j \mathbf{h}_j^{\dagger}\, r_j^{-\alpha} + \sigma^2 \mathbf{I} .  (13.52)

The SINR associated with the MMSE receiver is equal to that of the maximum-SINR beamformer, as shown in Section 9.2.4. Thus, from Section 9.2.4, the SINR β is

\beta = \frac{ P\, r_1^{-\alpha}\, \mathbf{w}^{\dagger} \mathbf{h}_1 \mathbf{h}_1^{\dagger} \mathbf{w} }{ \mathbf{w}^{\dagger} \mathbf{R}\, \mathbf{w} } .  (13.53)

Substituting the MMSE weight vector from Equation (13.51) and noticing that any scale factor on w will cancel in the numerator and denominator, we find the SINR β to be given by

\beta = \frac{ P\, r_1^{-\alpha}\, \mathbf{h}_1^{\dagger} \mathbf{R}^{-\dagger} \mathbf{h}_1 \mathbf{h}_1^{\dagger} \mathbf{R}^{-1} \mathbf{h}_1 }{ \mathbf{h}_1^{\dagger} \mathbf{R}^{-\dagger} \mathbf{R}\, \mathbf{R}^{-1} \mathbf{h}_1 } = P\, r_1^{-\alpha}\, \mathbf{h}_1^{\dagger} \mathbf{R}^{-1} \mathbf{h}_1 .  (13.54)
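The equivalence in Equation (13.54) can be confirmed numerically. In the sketch below, the dimensions, powers, and link lengths are arbitrary illustrative values; the MMSE weight is formed explicitly and its output SINR is compared with the closed form:

```python
import numpy as np

# Form the MMSE weight w = R^{-1} h_1 and compare the resulting output SINR
# with the closed form beta = P * r1^{-alpha} * h_1^dagger R^{-1} h_1.
rng = np.random.default_rng(2)
nr, n, alpha, P, sigma2 = 4, 6, 4.0, 1.0, 0.1
r = rng.uniform(10, 50, size=n)          # r[0] is the desired link length
H = (rng.standard_normal((nr, n)) + 1j * rng.standard_normal((nr, n))) / np.sqrt(2)

h1 = H[:, 0]
R = sigma2 * np.eye(nr, dtype=complex)   # interference-plus-noise covariance, Eq. (13.52)
for j in range(1, n):
    R += P * r[j] ** -alpha * np.outer(H[:, j], H[:, j].conj())

w = np.linalg.solve(R, h1)               # MMSE weight (up to an irrelevant scale)
num = P * r[0] ** -alpha * np.abs(w.conj() @ h1) ** 2
den = (w.conj() @ R @ w).real
beta_direct = num / den
beta_closed = (P * r[0] ** -alpha * (h1.conj() @ np.linalg.solve(R, h1))).real
print(beta_direct, beta_closed)          # the two values agree
```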

Cumulative distribution function of the SINR


As mentioned in the previous section, the CDF of the SINR can be used to
characterize the probability of outage if we assume that a link is in outage if
the SINR is below some threshold. The derivation of the CDF of the SINR
with Rayleigh fading and linear MMSE receivers with transmitters distributed
in a circle given here is based on Reference [9]. Conditioned on the distances
between the transmitting nodes and the representative receiver, that is, fixing
the variables r1 , . . . rn , the CDF of the SINR can be found to be equal to the
430 Cellular networks

following as derived in Reference [105]:


  K  2 α k −1
σ 2 r1α  Ak σ r1
Pr{SINR ≤ z} = 1 − exp −z z , (13.55)
P (k − 1)! P
k =1

where the coefficients A_k are defined as

A_k = \begin{cases} 1, & n_r \ge n - 1 + k \\[4pt] \dfrac{ 1 + \sum_{\ell=1}^{n_r - k} \kappa_{\ell}\, z^{\ell} }{ \prod_{j=2}^{n} \left( 1 + z\, r_j^{-\alpha} r_1^{\alpha} \right) } , & \text{otherwise,} \end{cases}  (13.56)

and κ_ℓ is

\kappa_{\ell} = \sum_{2 \le n_1 < \cdots < n_{\ell} \le n} \left( r_{n_1}^{-\alpha} r_1^{\alpha} \right) \left( r_{n_2}^{-\alpha} r_1^{\alpha} \right) \cdots \left( r_{n_{\ell}}^{-\alpha} r_1^{\alpha} \right) .  (13.57)

Equation (13.55) can be viewed as a conditional CDF, where the conditioning is with respect to the link lengths r₂, r₃, ..., r_n. If the interferers are distributed independently and with uniform probability inside a circle of radius R, then the CDF of the SINR can be found by integrating over all possible link lengths of the interferers:

\Pr\{\mathrm{SINR} \le z\} = 1 - \int_0^{R} \!\cdots\! \int_0^{R} dr_2\, dr_3 \cdots dr_n\; \exp\left( -z\,\frac{\sigma^2 r_1^{\alpha}}{P} \right) \sum_{k=1}^{K} \frac{A_k}{(k-1)!} \left( z\,\frac{\sigma^2 r_1^{\alpha}}{P} \right)^{k-1} \frac{2 r_2 \cdots 2 r_n}{R^{2n-2}} .  (13.58)

If we assume that the number of interferers n is a random variable, with the number of interferers in the circle of radius R being a Poisson random variable with mean πR²ρ, and use the fact that the link lengths r₂, ..., r_n are independent random variables, the CDF of the SINR β in Equation (13.58) was expressed in Reference [9] as

P_{\beta}(\beta) = 1 - \exp\left( -\sigma^2 \beta r_1^{\alpha} \right) \exp\left( \rho \int_0^{R}\!\!\int_0^{2\pi} \left( \frac{1}{1 + r^{-\alpha} \beta r_1^{\alpha}} - 1 \right) r \, dr \, d\theta \right) \times \sum_{\ell=0}^{n_r - 1} \sum_{k=0}^{\ell} \frac{1}{k!\,(\ell-k)!} \left( \sigma^2 \beta r_1^{\alpha} \right)^{\ell-k} \left( \rho \int_0^{R}\!\!\int_0^{2\pi} \frac{r^{-\alpha} \beta r_1^{\alpha}}{1 + r^{-\alpha} \beta r_1^{\alpha}} \, r \, dr \, d\theta \right)^{k} .  (13.59)

The integrals in the above equation can be evaluated as

\int_0^{R}\!\!\int_0^{2\pi} \left( \frac{1}{1 + r^{-\alpha}\beta r_1^{\alpha}} - 1 \right) r \, dr \, d\theta = 2\pi \int_0^{R} \left( \frac{1}{1 + r^{-\alpha}\beta r_1^{\alpha}} - 1 \right) r \, dr = \left[ -\pi r^2 + \pi r^2 \, {}_2F_1\!\left( 1, -\frac{2}{\alpha}; 1 - \frac{2}{\alpha}; -r^{-\alpha}\beta r_1^{\alpha} \right) \right]_{r=0}^{R} ,  (13.60)

where {}_2F_1(a, b; c; z) is the Gauss hypergeometric function described in Section 2.14.2. Applying the first Euler identity from Equation (2.289) to Equation (13.60) yields

\int_0^{R}\!\!\int_0^{2\pi} \left( \frac{1}{1 + r^{-\alpha}\beta r_1^{\alpha}} - 1 \right) r \, dr \, d\theta = \left[ -\pi r^2 + \frac{\pi r^2}{1 + r^{-\alpha}\beta r_1^{\alpha}} \, {}_2F_1\!\left( 1, 1; 1 - \frac{2}{\alpha}; \frac{r^{-\alpha}\beta r_1^{\alpha}}{1 + r^{-\alpha}\beta r_1^{\alpha}} \right) \right]_{r=0}^{R} .  (13.61)

At r = 0, the hypergeometric function reduces to

{}_2F_1\!\left( 1, 1; 1 - \frac{2}{\alpha}; 1 \right) .  (13.62)

When the argument is unity, the hypergeometric function can be expressed as a ratio of products of gamma functions according to the Gauss identity given in Equation (2.275). Applying the Gauss identity, we have

{}_2F_1\!\left( 1, 1; 1 - \frac{2}{\alpha}; 1 \right) = \frac{ \Gamma\!\left( 1 - \frac{2}{\alpha} \right) \Gamma\!\left( -\frac{2}{\alpha} - 1 \right) }{ \Gamma\!\left( -\frac{2}{\alpha} \right)^2 } .  (13.63)

Thus, the right-hand side of Equation (13.61) when evaluated at zero equals zero. Hence, Equation (13.61) becomes

\int_0^{R}\!\!\int_0^{2\pi} \left( \frac{1}{1 + r^{-\alpha}\beta r_1^{\alpha}} - 1 \right) r \, dr \, d\theta = \pi R^2 - \frac{\pi R^2}{1 + R^{-\alpha}\beta r_1^{\alpha}} \, {}_2F_1\!\left( 1, 1; 1 - \frac{2}{\alpha}; \frac{R^{-\alpha}\beta r_1^{\alpha}}{R^{-\alpha}\beta r_1^{\alpha} + 1} \right) .  (13.64)

Substituting

\frac{1}{1 + r^{-\alpha}\beta r_1^{\alpha}} - 1 = -\frac{r^{-\alpha}\beta r_1^{\alpha}}{1 + r^{-\alpha}\beta r_1^{\alpha}}  (13.65)

into Equation (13.64) yields

\left( \rho \int_0^{R}\!\!\int_0^{2\pi} \frac{r^{-\alpha}\beta r_1^{\alpha}}{1 + r^{-\alpha}\beta r_1^{\alpha}} \, r \, dr \, d\theta \right)^{k} = (-\rho)^{k} \left( \pi R^2 - \frac{\pi R^2}{1 + R^{-\alpha}\beta r_1^{\alpha}} \, {}_2F_1\!\left( 1, 1; 1 - \frac{2}{\alpha}; \frac{R^{-\alpha}\beta r_1^{\alpha}}{R^{-\alpha}\beta r_1^{\alpha} + 1} \right) \right)^{k} .  (13.66)

Thus, for a network with a random number n of interferers, where n is a Poisson random variable with mean πρR², with the interferers distributed uniformly randomly on a circle of radius R, the CDF of the SINR with a linear MMSE receiver in Rayleigh fading is

F_{\beta}^{\mathrm{MMSE}}(\beta) = 1 - \exp\left( -\sigma^2 \beta r_1^{\alpha} \right) \cdot \exp\left( \rho\pi R^2 - \frac{\rho\pi R^2}{1 + R^{-\alpha}\beta r_1^{\alpha}} \, {}_2F_1\!\left( 1, 1; 1 - \frac{2}{\alpha}; \frac{R^{-\alpha}\beta r_1^{\alpha}}{R^{-\alpha}\beta r_1^{\alpha} + 1} \right) \right) \cdot \sum_{\ell=0}^{n_r - 1} \sum_{k=0}^{\ell} \frac{1}{k!\,(\ell-k)!} \left( \sigma^2 \beta r_1^{\alpha} \right)^{\ell-k} \cdot \left( -\rho\pi R^2 + \frac{\rho\pi R^2}{1 + R^{-\alpha}\beta r_1^{\alpha}} \, {}_2F_1\!\left( 1, 1; 1 - \frac{2}{\alpha}; \frac{R^{-\alpha}\beta r_1^{\alpha}}{R^{-\alpha}\beta r_1^{\alpha} + 1} \right) \right)^{k} .  (13.67)

13.3.6 Laplacian of the interference


For both the matched-filter and antenna-selection receiver, the Laplacian of the
interference plays a key role in the statistical properties of the SINR. We remind
the reader that, in this chapter and the next, we use the term Laplacian to refer
to the Laplace transform of the PDF of a random variable. In this section, we
derive the Laplacian of the interference due to randomly located transmitters
distributed in a circle of radius R. Additionally, we provide without proof an
exact expression for the Laplacian with transmitters distributed randomly, with
uniform probability in a hexagonal domain.
The general technique used here is first to find the Laplacian of the interference from a single transmitter randomly distributed in the domain of interest, and then to use the fact that the Laplacian of the sum of k i.i.d. random variables is simply the Laplacian of a single random variable raised to the kth power.

Laplacian of the interference in a circular network


First consider a circular network of radius R with a receiver in the center of the circle. A point randomly placed with uniform probability on this circle has the following CDF:

P_r(r) = \begin{cases} 1, & R < r \\ \dfrac{r^2}{R^2}, & 0 \le r \le R . \end{cases}  (13.68)

Writing the inverse of the path loss as g = r^{\alpha}, we have the CDF of g given by

P_g(g) = \begin{cases} 1, & R^{\alpha} < g \\ \dfrac{g^{2/\alpha}}{R^2}, & 0 \le g \le R^{\alpha} . \end{cases}  (13.69)

Taking the derivative with respect to g to obtain the PDF of the inverse path loss g,

p_g(g) = \begin{cases} 0, & R^{\alpha} < g \\ \dfrac{2\, g^{\frac{2}{\alpha}-1}}{\alpha R^2}, & 0 \le g \le R^{\alpha} . \end{cases}  (13.70)

The CDF of the received power due to a randomly located transmitter conditioned on g is therefore

P_p(p\,|\,g) = \Pr\left\{ \frac{h\,P}{g} \le p \right\} = \Pr\left\{ h < \frac{p\,g}{P} \right\} = \begin{cases} 1 - e^{-g\,p/P}, & 0 \le g \\ 0, & \text{otherwise.} \end{cases}  (13.71)
Hence, for g ≥ 0, the CDF of the received power due to a randomly located transmitter is given by

P_p(p) = 1 - \mathcal{L}\{f_g(g)\}\!\left( \frac{p}{P} \right) ,  (13.72)

where \mathcal{L}\{\cdot\} denotes the Laplace transform operator. Using the Laplace transform differentiation property given in Section 2.11,

p_p(p) = \frac{1}{P}\,\mathcal{L}\{g\, f_g(g)\}\!\left( \frac{p}{P} \right) = \frac{2}{\alpha R^2}\left( \frac{P^{1+\frac{2}{\alpha}}\,\Gamma\!\left( \frac{2+\alpha}{\alpha} \right)}{p^{1+\frac{2}{\alpha}}} - R^{2+\alpha}\,\mathrm{Ei}\!\left( -\frac{2}{\alpha},\, \frac{p}{P}\,R^{\alpha} \right) \right) ,  (13.73)

where Ei(·, ·) denotes the exponential integral (see, for example, Reference [2]). Finally, the Laplace transform of p_p(p) is

\Phi_P(s) = \frac{2\,(s P)^{\frac{2}{\alpha}}}{\alpha R^{2}}\,\Gamma\!\left( -\frac{2}{\alpha} \right)\Gamma\!\left( \frac{2+\alpha}{\alpha} \right) + {}_2F_1\!\left( 1, -\frac{2}{\alpha}; \frac{-2+\alpha}{\alpha}; -R^{-\alpha}\, s\, P \right) .

If there are exactly k interferers, then the Laplacian of the interference power is

\Phi_I(s) = \left( \Phi_P(s) \right)^{k} .

Suppose instead that the transmitters are distributed on the plane according to a Poisson point process, discussed in Section 3.4, with average density ρ users per unit area, and that we are only concerned with the interference from users located within the circular cell. This scenario may arise from an appropriate frequency-reuse scheme where transmitters in nearby cells operate in different frequency bands and do not contribute appreciably to the interference seen at the base station, as illustrated in Figure 13.4. In this case, k is a Poisson distributed random variable with mean equal to the average number of transmitters in the cell, which equals A_R ρ with A_R = πR². The probability mass function (PMF), which is introduced in Chapter 3, of k is

\Pr\{k \text{ users in cell}\} = \frac{(\rho A_R)^{k}}{k!}\, e^{-\rho A_R} .  (13.74)

The z-transform for the Poisson PMF is

e^{\rho A_R (z-1)} .  (13.75)

Figure 13.4 Circular cell with a Poisson distribution of transmitters. The square in the
middle represents the base station and the crosses represent mobile transmitters.

Note that the z-transform is also known as the probability generating function for discrete random variables. It is known that the Laplacian for the sum of a random number k of i.i.d. random variables is simply equal to the z-transform of the PMF of k evaluated at the Laplacian of a single realization of the random variable (for example, see [93]). Hence, the Laplacian for the interference from nodes distributed in the circular cell is

\Phi_I(s) = \left. e^{\rho A_R (z-1)} \right|_{z=\Phi_P(s)} = \exp\left( \rho A_R \left[ \Phi_P(s) - 1 \right] \right) .  (13.76)

Note that for specific values of the path-loss exponent α, the Laplacian of the interference takes simpler forms. For α = 4 in particular, Equation (13.76) reduces to

\Phi_I(s) = \exp\left( \pi R^{2} \rho \left[ \frac{\sqrt{sP}}{R^{2}}\,\tan^{-1}\!\left( \frac{\sqrt{sP}}{R^{2}} \right) - \frac{\pi}{2}\,\frac{\sqrt{sP}}{R^{2}} \right] \right) .  (13.77)
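The α = 4 closed form in Equation (13.77) can be checked against direct numerical integration of the exponent in Equation (13.76); the parameter values below are arbitrary:

```python
import numpy as np

# Compare Equation (13.77) for alpha = 4 with a direct numerical evaluation of
# Phi_I(s) = exp(rho * 2*pi * integral_0^R (1/(1 + s*P*r^-4) - 1) r dr).
R, rho, P, s = 50.0, 1e-3, 1.0, 2.0

r = np.linspace(1e-6, R, 200_001)
f = (1.0 / (1.0 + s * P * r ** -4.0) - 1.0) * r
integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(r))   # trapezoidal rule
numeric = np.exp(rho * 2.0 * np.pi * integral)

q = np.sqrt(s * P) / R ** 2
closed = np.exp(np.pi * R ** 2 * rho * (q * np.arctan(q) - np.pi / 2 * q))
print(numeric, closed)    # both close to 0.9930
```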

Hexagonal cells
Consider a portion of a cellular network with hexagonal cells as illustrated earlier
in Figure 13.3. Assume that there is no out-of-cell interference, that is, nearby
cells operate at orthogonal frequency bands, and that the transmitters use con-
stant powers. The Laplacian of the total interference observed at the base station
at the center of the middle cell is

Φ(s) = exp (ρ AR [Φk (s; 1) − 1]) , (13.78)



where the area of the hexagonal cell A_R is given in terms of the minimum base-station separation d as follows,

A_R = \frac{\sqrt{3}}{2}\, d^{2} ,

and Φ_k(s; k) is the Laplacian of the interference due to exactly k interferers distributed randomly, with uniform probability, in the hexagonal cell. Then Φ_k(s; k) is given by

\Phi_k(s;k) = \Bigg[ -\frac{4\pi}{3\sqrt{3}}\; {}_2F_1\!\left( 1, -\frac{2}{\alpha}, 1-\frac{2}{\alpha}, -s\,P_T G_T\left( \frac{\sqrt{3}}{d} \right)^{\alpha} \right) + \frac{1}{\alpha}\cdot\frac{4\pi}{\sqrt{3}\, d^{2}}\, (P_T G_T)^{\frac{2}{\alpha}}\, s^{\frac{2}{\alpha}}\, \Gamma\!\left( -\frac{2}{\alpha} \right) \Gamma\!\left( 1+\frac{2}{\alpha} \right) + \frac{\sqrt{3}\,\pi}{2}\; {}_2F_1\!\left( 1, -\frac{2}{\alpha}, 1-\frac{2}{\alpha}, -s\,P_T G_T\left( \frac{2}{d} \right)^{\alpha} \right) + \sum_{n=0}^{\infty} \frac{(2n)!\; 8\cdot 3^{n}}{2^{4n+1}\,(n!)^{2}\,(2n+1)(1-2n)}\; {}_2F_1\!\left( 1, \frac{2n-1}{\alpha}, 1+\frac{2n-1}{\alpha}, -s\,P_T G_T\left( \frac{\sqrt{3}}{d} \right)^{\alpha} \right) - \sum_{n=0}^{\infty} \frac{2\sqrt{3}\;(2n)!}{(2n+1)(1-2n)\, 2^{2n}\,(n!)^{2}}\; {}_2F_1\!\left( 1, \frac{2n-1}{\alpha}, 1+\frac{2n-1}{\alpha}, -s\,P_T G_T\left( \frac{2}{d} \right)^{\alpha} \right) \Bigg]^{k} ,  (13.79)

where {}_2F_1(a, b, c, z) is the Gauss hypergeometric function. Equation (13.79) is found by using the PDF of link lengths associated with hexagonal cells given by Equation (13.28), Laplace transform properties, and appropriate Taylor series expansions.¹

Note that while Equation (13.79) is computationally expensive to evaluate, it only has to be computed for different values of the path-loss exponent α. The terms involving d, P_T, and G_T can be collected into a single term of the form s\,d^{-\alpha}\,P_T G_T. By using the frequency-scaling property of Laplace transforms given in Section 2.11, one could compute the Laplacian for different values of d, P_T, and G_T from a “standard” computation of the Laplacian for any particular α.

Outage probability of antenna-selection receiver


As an example, consider a circular cell of radius R = 50 m with path-loss exponent α = 4 and a random distribution of users with density 10⁻³/m². Assume
1 The Laplace transform of the PDF of the interference in hexagonal cells with Rayleigh
fading was found by Yifan Sun (unpublished).

Figure 13.5 Outage probability vs. SINR of an antenna-selection receiver in a circular cell with nr = 2, 4, 8 and 16 receiver antennas.

that the users have single antennas and the base station at the center of the cell
has nr = 2, 4, 8 or 16 receiver antennas and uses the antenna-selection receiver.
The outage probability as a function of the SINR threshold can be calculated by
using Equations (13.41) and (13.77), where the outage probability is the prob-
ability that the SINR is less than or equal to the SINR threshold. The outage
probability is shown in Figure 13.5, which illustrates the diminishing returns of
using antenna selection with large arrays at receivers.
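A Monte Carlo version of this computation can be sketched as follows. Note that the antenna-selection rule used here (pick the antenna with the largest instantaneous SINR), the fixed desired-link length r₁ = 25 m, and the neglect of thermal noise are illustrative assumptions, not values taken from the text:

```python
import numpy as np

# Monte Carlo sketch of the outage computation behind Figure 13.5, under the
# illustrative assumptions noted above (interference-limited, r1 = 25 m).
rng = np.random.default_rng(4)
R, rho, alpha, r1 = 50.0, 1e-3, 4.0, 25.0
trials, threshold = 2000, 1.0            # threshold = 1 corresponds to 0 dB

def outage(nr):
    fails = 0
    for _ in range(trials):
        n = rng.poisson(rho * np.pi * R ** 2)        # interferers in the cell
        r = R * np.sqrt(rng.uniform(size=n))         # uniform positions in the disc
        s_pow = rng.exponential(size=nr) * r1 ** -alpha        # desired power per antenna
        i_pow = rng.exponential(size=(nr, n)) * r ** -alpha    # interference per antenna
        sinr = s_pow / (i_pow.sum(axis=1) + 1e-300)  # guard the rare empty-cell case
        fails += sinr.max() <= threshold
    return fails / trials

for nr in (2, 4, 8, 16):
    print(nr, outage(nr))    # outage probability shrinks as nr grows
```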

13.4 Linear receivers in cellular networks with power control

In this section we study the uplink of power-controlled cellular systems when


the base stations are equipped with antenna arrays.2 Since the uplink on cellular
systems is many-to-one, the base station can control the power of the mobile
units to ensure fairness and increased performance. We shall again focus our at-
tention on linear receivers, in particular the matched-filter and MMSE receivers.
Currently, most base stations with multiple antennas use their antennas to in-
crease diversity and do not use more sophisticated signal processing techniques
such as the MMSE. Note that the matched-filter receiver is particularly sensitive
to variations in received power as it does not consider the spatial structure of
² Portions of this section are © 2008, 2012 IEEE. Reprinted, with permission, from References [124, 125].

the interference and, hence, works well when the interference is close to being spatially white. With power control and a large number of interferers, the aggregate interference will be close to spatially white. If the transmit power is constant (that is, no power control), the received interference power will be dominated by a few interferers that are close to the receiver, and the aggregate interference will not be close to spatially white.
We shall first present an asymptotic result derived in a general system that
admits, but does not depend on, a cellular architecture or power control. We then
apply this result to cellular systems with power control. Note that power control,
whereby nodes whose channels to their respective base stations are strong trans-
mit with lower power to reduce interference, greatly increases the complexity
of the analysis as the transmit signal powers become dependent on the relative
positions of the mobile nodes and their respective base stations. By employ-
ing an asymptotic analysis, we can address the complexity associated with this
correlation of transmit power and spatial position.

13.4.1 System model


Consider a circular wireless network where there are n + 1 transmit nodes and an
unspecified number of receive nodes. Let R1 represent a representative receiver
located at the center of the network. Let Tj denote the jth transmitter and
Rj be the receiver that is in a link with the jth transmitter. We shall analyze
a representative link between R1 and T1 where Tj for j = 2, 3, . . . n + 1 are
cochannel interferers to the representative link.
Each transmitting node in the network has n_t antennas and the representative receiver has n_r. We assume frequency-flat fading, where the channel between the jth antenna of transmitting node ℓ and the kth antenna of the representative receiver is modeled as \sqrt{\gamma_{\ell}}\, g_{jk}^{\ell}, where γ_ℓ is the path loss between transmitting node ℓ and the representative receiver and g_{jk}^{\ell} is a zero-mean, unit-variance complex Gaussian random variable. We make the standard assumption that nodes
transmit using Gaussian codebooks. Each transmitting node knows the channel
coefficients between itself and its target receiver, but not any other nodes. We
refer to this assumption as transmit link channel-state information (CSI). Note
that for the purposes of this chapter we refer to channel state information as
information concerning the statistical properties of signals and interference, or
channel vectors and matrices. In contrast, in the discussion of dirty-paper coding
in Chapter 4, channel state at the transmitter refers to the interfering signals
themselves.
Note that transmit link CSI is identical to the information required for the
informed transmitter in isolated MIMO links presented in Section 8.3.1. The re-
ceivers know the channel coefficients between themselves and their target trans-
mitters and the spatial covariance matrix of the interference. Note that transmit
link CSI can be obtained in half-duplex systems with reciprocal channels if chan-
nels do not vary rapidly in time. In such systems, transmitters can use channel

estimates performed when they acted as receivers in the past. The spatial in-
terference covariance matrix can be estimated by receivers by constructing a
sample interference covariance matrix by listening to aggregate transmissions
of the interferers. The asymptotic regime that we consider is when the ratio of
the number of interferers n and the number of receiver antennas nr denoted by
a = n/nr > 0 is a constant.

13.4.2 Optimality of parallelized transmissions with link CSI


Since it is assumed that transmitters do not know the channels between themselves and receivers other than their target receivers, they are not able to encode their transmissions to minimize the interference they cause to unintended receivers. Hence, it is optimal for nodes to parallelize the channels between themselves and their targets using a singular-value decomposition and to transmit independent data streams on each parallelized channel with some power allocation, as shown in Reference [39]. We shall assume that all transmit nodes parallelize the channels between themselves and their respective receivers, and transmit independent data streams on a fixed number of parallel channel modes M ≤ n_t. We assume that the ℓth transmit node allocates power \frac{1}{n_r}P_{\ell j} to its jth mode for ℓ = 1, 2, ..., n and j = 1, 2, ..., M. Here P_{\ell j} is a nominal power allocated to the jth transmit stream by transmitter ℓ. We use this power allocation to maintain a constant amount of received power as the number of receiver antennas increases to infinity.

Let the P_{\ell j} be i.i.d. over all nodes, that is, over all ℓ. This assumption is reasonable given that the transmitters do not know the channel parameters between themselves and other nodes. For a given transmitter, however, we allow arbitrary correlation between the powers allocated to its M streams. In other words, the P_{\ell j} are i.i.d. over ℓ and may be correlated over the streams j for a given transmitter ℓ. Let the PDF and CDF of P_{\ell j} for all ℓ and each j be denoted by f_j(x) and F_j(x) respectively, and additionally assume that the following bound on the transmit power holds:

\sum_{j=1}^{M} P_{\ell j} \le P_{\max} .  (13.80)

Let the channel coefficients between the antennas of transmitting node ℓ and the jth receiver be contained in the n_r × n_t matrix \sqrt{\gamma_{\ell j}}\,\mathbf{H}_{\ell j}.
Following the analysis of Section 8.3.2, the spectral efficiency of link ℓ between R_ℓ and T_ℓ is given by

c_{\ell} = \log_2 \left| \mathbf{I} + \gamma_{\ell\ell}\,\mathbf{H}_{\ell\ell}\,\mathbf{T}_{\ell}\,\mathbf{H}_{\ell\ell}^{\dagger} \left( \sigma^2 \mathbf{I} + \sum_{j=1,\,j\ne\ell}^{n} \gamma_{j\ell}\,\mathbf{H}_{j\ell}\,\mathbf{T}_{j}\,\mathbf{H}_{j\ell}^{\dagger} \right)^{-1} \right| ,  (13.81)

where γ_{jℓ} is the path loss between node j and node ℓ, and T_j is the transmit covariance matrix of node j, that is, the covariance matrix of the signals sent on the transmit antennas of node j.

Next, we shall find the transmit covariance matrix T_ℓ that the ℓth transmitter uses to maximize c_ℓ from Equation (13.81). Recall that each transmitter only knows the channel matrix between itself and its target receiver; in other words, the ℓth transmitter only knows the channel matrix H_{ℓℓ}.

Performing a singular-value decomposition on the channel matrix H_{ℓj} between transmitter ℓ and receiver j yields

\mathbf{H}_{\ell j} = \mathbf{U}_{\ell j}\,\boldsymbol{\Sigma}_{\ell j}\,\mathbf{V}_{\ell j}^{\dagger} .  (13.82)

Let the vectors v_{1j} and u_{1j} denote the jth columns of the right singular matrix V_{11} and left singular matrix U_{11} respectively, with λ_{1j} representing the jth largest singular value of H_{11}.

From Section 8.3.2, we know that, without knowledge of the quantity in the parentheses in Equation (13.81), which is the covariance matrix of the interference plus noise observed at the antennas of the receiver, to maximize Equation (13.81) the ℓth transmitter should use a transmit covariance matrix T_ℓ as follows:

\mathbf{T}_{\ell} = \mathbf{V}_{\ell\ell}\,\mathbf{P}_{\ell}\,\mathbf{V}_{\ell\ell}^{\dagger}  (13.83)

with P_ℓ ∈ C^{n_t × n_t} a diagonal matrix containing the power allocations on each of the modes by the ℓth transmitter. In other words,

\mathbf{P}_{\ell} = \frac{1}{n_r}\,\mathrm{diag}\left( P_{\ell 1}, P_{\ell 2}, \ldots, P_{\ell n_t} \right) .  (13.84)

Recall that only the strongest M modes are allocated nonzero power, that is, P_{\ell k} = 0 for k = M+1, M+2, ..., n_t, which means that P_ℓ has rank M. If we assume that all transmitters in the network use transmit covariance matrices of this form, substituting Equation (13.83) into Equation (13.81) yields the spectral efficiency on the ℓth link as

c_{\ell} = \log_2 \left| \mathbf{I} + \gamma_{\ell\ell}\,\mathbf{H}_{\ell\ell}\mathbf{V}_{\ell\ell}\mathbf{P}_{\ell}\mathbf{V}_{\ell\ell}^{\dagger}\mathbf{H}_{\ell\ell}^{\dagger} \left( \sigma^2 \mathbf{I} + \sum_{j=1,\,j\ne\ell}^{n} \gamma_{j\ell}\,\mathbf{H}_{j\ell}\mathbf{V}_{jj}\mathbf{P}_{j}\mathbf{V}_{jj}^{\dagger}\mathbf{H}_{j\ell}^{\dagger} \right)^{-1} \right| .  (13.85)

Note that random matrices with Gaussian-distributed entries maintain their statistical properties when multiplied by unitary matrices. Thus, we can write Equation (13.85) as

c_{\ell} = \log_2 \left| \mathbf{I} + \gamma_{\ell\ell}\,\mathbf{H}_{\ell\ell}\mathbf{V}_{\ell\ell}\mathbf{P}_{\ell}\mathbf{V}_{\ell\ell}^{\dagger}\mathbf{H}_{\ell\ell}^{\dagger} \left( \sigma^2 \mathbf{I} + \sum_{j=1,\,j\ne\ell}^{n} \gamma_{j\ell}\,\tilde{\mathbf{H}}_{j\ell}\mathbf{P}_{j}\tilde{\mathbf{H}}_{j\ell}^{\dagger} \right)^{-1} \right| ,  (13.86)

where \tilde{\mathbf{H}}_{j\ell} = \mathbf{H}_{j\ell}\mathbf{V}_{jj} are identically distributed as H_{jℓ} for any choice of V_{jj}, where j ≠ ℓ.

Substituting the SVD of the channel H_{ℓℓ} = U_{ℓℓ}Σ_{ℓℓ}V_{ℓℓ}^† yields the spectral efficiency of the ℓth link

c_{\ell} = \log_2 \left| \mathbf{I} + \gamma_{\ell\ell}\,\mathbf{U}_{\ell\ell}\boldsymbol{\Sigma}_{\ell\ell}\mathbf{P}_{\ell}\boldsymbol{\Sigma}_{\ell\ell}^{\dagger}\mathbf{U}_{\ell\ell}^{\dagger} \left( \sigma^2 \mathbf{I} + \sum_{j=1,\,j\ne\ell}^{n} \gamma_{j\ell}\,\tilde{\mathbf{H}}_{j\ell}\mathbf{P}_{j}\tilde{\mathbf{H}}_{j\ell}^{\dagger} \right)^{-1} \right| = \log_2 \left| \mathbf{I} + \gamma_{\ell\ell}\,\boldsymbol{\Sigma}_{\ell\ell}\mathbf{P}_{\ell}\boldsymbol{\Sigma}_{\ell\ell}^{\dagger}\,\mathbf{U}_{\ell\ell}^{\dagger} \left( \sigma^2 \mathbf{I} + \sum_{j=1,\,j\ne\ell}^{n} \gamma_{j\ell}\,\tilde{\mathbf{H}}_{j\ell}\mathbf{P}_{j}\tilde{\mathbf{H}}_{j\ell}^{\dagger} \right)^{-1}\mathbf{U}_{\ell\ell} \right| ,  (13.87)

where λ_{ℓj} denotes the jth largest of the magnitude-squared singular values of the channel matrix H_{ℓℓ} between the ℓth transmitter and the ℓth receiver.
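The matrix manipulations leading to Equation (13.87) can be verified numerically; in the sketch below, the interference-plus-noise covariance and all dimensions are arbitrary illustrative choices:

```python
import numpy as np

# With the transmit covariance T = V P V^dagger and the SVD H = U Sigma V^dagger,
# H T H^dagger = U Sigma P Sigma^dagger U^dagger, so the spectral-efficiency
# determinant can be rewritten with U moved inside the inverse, as in (13.87).
rng = np.random.default_rng(5)
nr, nt, gamma = 5, 3, 0.7

H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
U, s, Vh = np.linalg.svd(H)                       # H = U @ Sigma @ Vh
Sigma = np.zeros((nr, nt)); Sigma[:nt, :nt] = np.diag(s)

P = np.diag([2.0, 1.0, 0.0])                      # power on the strongest M = 2 modes only
T = Vh.conj().T @ P @ Vh                          # Eq. (13.83): T = V P V^dagger

A = (rng.standard_normal((nr, nr)) + 1j * rng.standard_normal((nr, nr))) / np.sqrt(2)
M = A @ A.conj().T + np.eye(nr)                   # a positive-definite interference-plus-noise covariance
Minv = np.linalg.inv(M)

lhs = np.linalg.det(np.eye(nr) + gamma * H @ T @ H.conj().T @ Minv)
rhs = np.linalg.det(np.eye(nr) + gamma * Sigma @ P @ Sigma.conj().T @ U.conj().T @ Minv @ U)
print(lhs.real, rhs.real)    # the two determinants agree
```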
We shall now proceed by deriving upper and lower bounds on the spectral efficiency of the representative link (that is, link 1), c₁. We will then show that the upper and lower bounds converge when the number of antennas at the receivers and the number of nodes in the network get large. To find the upper bound, let us write the product of the three right-most matrices in the determinant expression above for ℓ = 1 as

\mathbf{Q} = \mathbf{U}_{11}^{\dagger} \left( \sigma^2 \mathbf{I} + \sum_{j=2}^{n+1} \gamma_{j1}\,\tilde{\mathbf{H}}_{j1}\mathbf{P}_{j}\tilde{\mathbf{H}}_{j1}^{\dagger} \right)^{-1} \mathbf{U}_{11} .  (13.88)

We can now write Equation (13.87) as

c_1 = \log_2 \left| \mathbf{I} + \gamma_{11}\,\boldsymbol{\Sigma}_{11}\mathbf{P}_{1}\boldsymbol{\Sigma}_{11}^{\dagger}\mathbf{Q} \right| .  (13.89)

With [Q]_{jk} denoting the jkth entry of Q, we have the jth diagonal entry of Q as

[\mathbf{Q}]_{jj} = \mathbf{u}_{1j}^{\dagger} \left( \sigma^2 \mathbf{I} + \sum_{k=2}^{n+1} \gamma_{k1}\,\tilde{\mathbf{H}}_{k1}\mathbf{P}_{k}\tilde{\mathbf{H}}_{k1}^{\dagger} \right)^{-1} \mathbf{u}_{1j} ,  (13.90)

where u_{1j} is the jth column of U_{11}. Thus, the jth diagonal entry of \mathbf{I} + \gamma_{11}\boldsymbol{\Sigma}_{11}\mathbf{P}_{1}\boldsymbol{\Sigma}_{11}^{\dagger}\mathbf{Q} is 1 + \gamma_{11}\lambda_{1j}\frac{1}{n_r}P_{1j}[\mathbf{Q}]_{jj}. Hence, by the Hadamard inequality (see Section 2.2.3), we can bound Equation (13.89) as

c_1 = \log_2 \left| \mathbf{I} + \gamma_{11}\boldsymbol{\Sigma}_{11}\mathbf{P}_{1}\boldsymbol{\Sigma}_{11}^{\dagger}\mathbf{Q} \right| \le \log_2 \left( \prod_{j=1}^{M} \left( 1 + \gamma_{11}\,\lambda_{1j}\,\frac{1}{n_r}P_{1j}\,[\mathbf{Q}]_{jj} \right) \right) ,  (13.91)

where we have used the fact that P_{1j} = 0 for j > M. This upper bound is achieved if the matrix Q is diagonal. This would be the case if the M data
streams from transmitter 1 do not interfere with each other. Substituting the definition of [Q]_{jj} from Equation (13.90) yields

c_1 \le \sum_{j=1}^{M} \log_2 \left( 1 + \gamma_{11}\,\lambda_{1j}\,\frac{1}{n_r}P_{1j}\,\mathbf{u}_{1j}^{\dagger} \left( \sigma^2 \mathbf{I} + \sum_{k=2}^{n+1} \gamma_{k1}\,\tilde{\mathbf{H}}_{k1}\mathbf{P}_{k}\tilde{\mathbf{H}}_{k1}^{\dagger} \right)^{-1} \mathbf{u}_{1j} \right) .  (13.92)
After some matrix manipulations, the upper bound on the spectral efficiency from Equation (13.92) can be written as

c_1 \le \sum_{j=1}^{M} \log_2 \left( 1 + \gamma_{11}\,\lambda_{1j}\,\frac{1}{n_r}P_{1j}\,\mathbf{u}_{1j}^{\dagger} \left( \sigma^2 \mathbf{I} + \frac{1}{n_r}\mathbf{K}_1\boldsymbol{\Phi}_1\mathbf{K}_1^{\dagger} \right)^{-1} \mathbf{u}_{1j} \right) ,  (13.93)

where the diagonal matrix Φ₁ ∈ R^{nM×nM} contains the average received powers from each active stream of the interferers as its diagonal entries:

\boldsymbol{\Phi}_1 = \mathrm{diag}\big( \gamma_{21}P_{21}, \ldots, \gamma_{21}P_{2M},\; \gamma_{31}P_{31}, \ldots, \gamma_{31}P_{3M},\; \ldots,\; \gamma_{(n+1)1}P_{(n+1)1}, \ldots, \gamma_{(n+1)1}P_{(n+1)M} \big) .  (13.94)

The entries of the matrix K₁ ∈ C^{n_r×nM}, which comprises the products of the channel matrices between the interferers and the representative receiver and unitary matrices, are i.i.d. circularly symmetric, complex, Gaussian random variables of unit variance. The vectors u_{1j} ∈ C^{n_r×1} are unit-norm isotropic random vectors that are mutually orthogonal, and recall that γ_{ℓ1} is the path loss between the ℓth transmitter and the representative receiver.
We can also write a lower bound on the spectral efficiency of the representative link c₁ by assuming that the representative receiver performs successive decoding of the M streams from the representative transmitter. When decoding a particular stream, the receiver is assumed to use zero-forcing (see Section 9.2.2) to perfectly null the interference due to the remaining M−1 modes from the representative transmitter. This strategy yields a spectral efficiency that does not include the self-interference from the streams of the representative transmitter, but at the loss of M−1 degrees of freedom, which are used up for the zero-forcing and are no longer available for interference mitigation. Note that since this is a specific procedure, the spectral efficiency achieved by this procedure is a lower bound on the spectral efficiencies that can be achieved on the representative link. The spectral efficiency has the following lower bound:

c_1 \ge \sum_{j=1}^{M} \log_2 \left( 1 + \gamma_{11}\,\frac{1}{n_r}P_{1j}\,\lambda_{1j}\,\breve{\mathbf{u}}_{j}^{\dagger} \left( \sigma^2 \mathbf{I} + \frac{1}{n_r}\breve{\mathbf{K}}_{j}\boldsymbol{\Phi}_1\breve{\mathbf{K}}_{j}^{\dagger} \right)^{-1} \breve{\mathbf{u}}_{j} \right) .  (13.95)

For the lower bound in Equation (13.95), \breve{\mathbf{K}}_j ∈ C^{(n_r−M+1)×nM} are matrices whose entries are i.i.d., circularly symmetric, complex, Gaussian random variables of unit variance, and \breve{\mathbf{u}}_j ∈ C^{(n_r−M+1)×1} are unit-norm, isotropic, random vectors. The lower bound is achieved when the receiver uses M − 1 of its degrees of freedom to null the interference from the M − 1 other streams from the

desired transmitter when it decodes a particular stream from the target transmitter. When the number of receiver antennas n_r ≫ M, the degrees of freedom used for nulling the interference from other streams in the lower bound are a negligible fraction of the total n_r degrees of freedom available at the receiver. Thus we approximate the spectral efficiency of the representative link in the high-n_r regime by the following:

c_1 \approx \sum_{j=1}^{M} \log_2 \left( 1 + \gamma_{11}\,\frac{1}{n_r}P_{1j}\,\lambda_{1j}\,\mathbf{u}_{1j}^{\dagger} \left( \sigma^2 \mathbf{I} + \frac{1}{n_r}\mathbf{K}_1\boldsymbol{\Phi}_1\mathbf{K}_1^{\dagger} \right)^{-1} \mathbf{u}_{1j} \right) .  (13.96)

Note that it can be rigorously shown that in the limit as nr → ∞, the upper
and lower bounds in Equations (13.93) and (13.95) coincide, precisely because
the M − 1 degrees of freedom used for nulling the self-interference in the lower
bound are negligible when nr is large [125].

13.4.3 Asymptotic spectral efficiency of parallelized system


In the previous section, we found upper and lower bounds on the spectral efficiency of the representative link in terms of matrices of the channel coefficients, signal powers, and path losses. In this section, we derive the spectral efficiency in the asymptotic regime where the number of antennas at the representative receiver n_r and the number of nodes in the network n go to infinity with their ratio a = n/n_r being a positive constant. The asymptotic spectral efficiency is derived by showing that the upper and lower bounds coincide in the limit.

Suppose that the CDF of the path losses from the interferers to the representative receiver γ_{21}, γ_{31}, ..., γ_{(n+1)1} is Ψ_{n_r}(τ) and that in the limit as n, n_r → ∞ with n/n_r = a, Ψ_{n_r}(τ) converges to a limiting CDF Ψ(τ). Then the spectral efficiency of link 1 given by Equation (13.96) converges with probability 1 to c_1^{\{\infty\}}, which is given by

c_1^{\{\infty\}} = \sum_{j=1}^{M} \log_2 \left( 1 + \lambda_{j}^{\{\infty\}}\,P_{1j}\,\gamma_{11}\,\beta \right) ,  (13.97)

where β is the unique, non-negative solution to the equation

\sigma^2\,\beta + \beta\,a \int_0^{\infty} \frac{x}{1 + x\,\beta}\, dH(x) = 1 ,  (13.98)

where

H(x) = \frac{1}{M} \sum_{j=1}^{M} \int d\tau\, f_j(\tau)\, \Psi(x/\tau)  (13.99)

and \lambda_j^{\{\infty\}} is the limiting value of the jth eigenvalue of an n_r × n_r Wishart matrix \frac{1}{n_r}\mathbf{G}\mathbf{G}^{\dagger}, where G is an n_r × n_t matrix with i.i.d. CN(0,1) entries (see Section 3.5). In particular, if n_r, n_t → ∞ such that n_r/n_t = d > 0,

\lambda_1^{\{\infty\}} = \lambda_2^{\{\infty\}} = \cdots = \lambda_M^{\{\infty\}} = \left( 1 + \sqrt{1/d} \right)^2  (13.100)

and if n_t is a finite constant,

\lambda_1^{\{\infty\}} = \lambda_2^{\{\infty\}} = \cdots = \lambda_M^{\{\infty\}} = 1 .  (13.101)
Note that λ₁^{{∞}} is really the transmit beamforming gain that is obtained when the transmitter phases its signals in favorable directions. In the absence of transmitter CSI, the transmit beamforming gain is just unity. This means that if the number of transmit antennas per node is fixed, in the limit as the number of receiver antennas goes to infinity, the transmit CSI is not useful. The reason for this is that the receive array is able to obtain all the beamforming gain that the transmit array can provide, and since the number of transmit antennas is kept constant as the number of receive antennas n_r goes to infinity, the cost of using a fraction of the receiver degrees of freedom for beamforming becomes negligible.

To prove this result, consider the expression for the spectral efficiency of the representative receiver given in Equation (13.96). It turns out that the effective SINR at the output of the receive beamformer of the jth mode from the representative transmitter, SINR_j, at the representative receiver converges with probability 1 to λ_j^{{∞}} P_{1j} γ_{11} β, as shown in the next subsection. The continuity of the log function implies that log₂(1 + SINR_j) converges with probability 1 to the limit log₂(1 + λ_j^{{∞}} P_{1j} γ_{11} β) (see Section 3.2.2). The limit of the sum of terms that converge with probability 1 is known to equal the sum of the individual limits (see Section 3.2.2). Hence the sum over j of log₂(1 + SINR_j) converges with probability 1 to the sum of log₂(1 + λ_j^{{∞}} P_{1j} γ_{11} β).
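The fixed-point equation (13.98) is straightforward to solve by simple iteration. In the sketch below, the discrete distribution standing in for H(x) is an arbitrary illustrative choice, not one derived from the text:

```python
import numpy as np

# Solve sigma^2*beta + beta*a*E[X/(1 + X*beta)] = 1 (Equation (13.98)) by
# iterating beta <- 1 / (sigma^2 + a*E[X/(1 + X*beta)]) for a discrete H.
sigma2, a = 0.1, 0.5
x = np.array([0.2, 1.0, 5.0])        # support of the interference-power distribution H
w = np.array([0.5, 0.3, 0.2])        # the corresponding probabilities

beta = 1.0
for _ in range(200):
    beta = 1.0 / (sigma2 + a * np.sum(w * x / (1.0 + x * beta)))

residual = sigma2 * beta + beta * a * np.sum(w * x / (1.0 + x * beta)) - 1.0
print(beta, residual)                # residual is essentially zero
```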

Convergence of the SINR


From Equation (13.93), let us define the SINR of the jth data stream from the representative transmitter at the beamformer output of the representative receiver by the following expression:

\mathrm{SINR}_j = \gamma_{11}\,\frac{1}{n_r}P_{1j}\,\lambda_{1j}\,\mathbf{u}_{1j}^{\dagger} \left( \sigma^2 \mathbf{I} + \frac{1}{n_r}\mathbf{K}_1\boldsymbol{\Phi}_1\mathbf{K}_1^{\dagger} \right)^{-1} \mathbf{u}_{1j} .  (13.102)

Note that technically Equation (13.93) is an upper bound, so SINR_j is really the SINR of the jth stream assuming a system that can achieve the upper bound. Since we shall subsequently show that this upper bound is indeed achievable in the limit, this notation is consistent in the limit. In the asymptotic regime described here, SINR_j converges with probability 1 to an asymptotic limit given by

\gamma_{11}\,P_{1j}\,\lambda_{j}^{\{\infty\}}\,\beta ,  (13.103)

where β is the unique, non-negative solution for β in Equation (13.98), and the λ_j^{{∞}} are defined in Equations (13.100) and (13.101).

The proof of this result uses the fact that the SINR associated with a CDMA receiver using MMSE estimation and random i.i.d. spreading codes converges with probability 1 under certain conditions, as shown in Reference [13]. To map

our model to that of CDMA systems with random spreading codes, we use the
fact that uj are isotropic vectors as follows. Note that u1j in Equation (13.96) is
the jth right singular vector of the matrix H11 , which is assumed to have i.i.d.
CN (0, 1) entries. The right singular vectors of a matrix of i.i.d. complex Gaussian
entries (of which u1j is one) are isotropic, have unit norm (as shown, for example,
in Reference [315]), and are statistically independent of their associated singular
values λ1j due to the isotropic property of random vectors of circularly symmetric
complex Gaussian entries. Furthermore, due to this isotropic property of vectors
with i.i.d. CN (0, 1) entries, u1j can be expressed as
u_{1j} = g_j / ||g_j|| ,   (13.104)
where the entries of gj are distributed as i.i.d. CN (0, 1), which means that Equa-
tion (13.102) can be written as
SINR_j = ( 1 / ((1/n_r) ||g_j||²) ) γ_1 P_{1j} (1/n_r) λ_{1j} (1/n_r) g_j† ( σ² I + (1/n_r) K_1 Φ_1 K_1† )^{−1} g_j .   (13.105)
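The isotropy property behind Equation (13.104) is easy to illustrate numerically. The following Python sketch (ours, not the text's; the matrix sizes are arbitrary choices) draws right singular vectors of i.i.d. CN(0,1) matrices and normalized Gaussian vectors, and compares the distribution of a single coordinate:

```python
import numpy as np

rng = np.random.default_rng(0)
nr, nt, trials = 32, 8, 2000

coord_svd = np.empty(trials)
coord_gauss = np.empty(trials)
for t in range(trials):
    # H with i.i.d. CN(0,1) entries
    H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    _, _, Vh = np.linalg.svd(H)
    u = Vh.conj().T[:, 0]          # dominant right singular vector of H
    g = (rng.standard_normal(nt) + 1j * rng.standard_normal(nt)) / np.sqrt(2)
    coord_svd[t] = np.abs(u[0]) ** 2
    coord_gauss[t] = np.abs(g[0] / np.linalg.norm(g)) ** 2

# Isotropy implies E|u[0]|^2 = 1/nt for both constructions.
print(coord_svd.mean(), coord_gauss.mean())  # both near 1/8
```

Both sample means approach 1/n_t, consistent with a unit-norm isotropic vector.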
If the empirical distribution function (EDF), which is defined in Section 3.6.1
as the proportion of a set of values that are less than or equal to the argument
of the function, of the diagonal entries of Φ1 converges with probability 1 to a
limiting function H(τ ), the term
(1/n_r) g† ( σ² I + (1/n_r) K_1 Φ_1 K_1† )^{−1} g   (13.106)
was shown in Reference [13] to converge with probability 1 to an asymptotic
limit. Let Hn (x) denote the empirical distribution function of the interference
powers for any given n, that is, Hn (x) is the empirical distribution function
of the entries of the diagonal matrix Φ1 . Then as n, nr → ∞ with a = n/nr
the empirical distribution function of the received interference powers Hn (x),
converges to a limiting function H(x) as
H_n(x) → H(x) = (1/M) Σ_{j=1}^{M} ∫ dτ f_j(τ) Ψ(x/τ) .   (13.107)
The proof of this can be found in Reference [127]. If Hn (x) → H(x), it was
shown in References [13] and [313] that the term
(1/n_r) g_j† ( σ² I + (1/n_r) K_1 Φ_1 K_1† )^{−1} g_j
converges with probability 1 to an asymptotic limit, which we define as βj . Note
that βj is the unique, non-negative solution for β(z) in the equation

z β(z) + 1 = a β(z) ∫_0^∞ dH(τ) τ / (1 + τ β(z))   (13.108)
13.4 Linear receivers in cellular networks with power control 445
when the dummy variable z = −σ². Note that all the β_j's for j = 1, 2, . . . , M
converge to the same value. Additionally, note that

1 / ( (1/n_r) ||g_j||² ) → 1

with probability 1 by the strong law of large numbers. If n_t → ∞ as n_r → ∞
with n_r/n_t = d, then (1/n_r) λ_{1j} for j = 1, 2, . . . , M is known to converge with
probability 1 to an asymptotic limit λ_j^{∞} = (1 + √d)² (for example, see
Reference [315]). If n_t is a finite constant, by the strong law of large numbers
(1/n_r) λ_{1j} for j = 1, 2, . . . , M converges with probability 1 to unity.
Hence, each random term in Equation (13.102) converges with probability 1
to a constant. It is known that for sequences of random variables Xn and Yn ,
if Xn → X and Yn → Y where the convergence is with probability 1, then
Xn Yn → X Y with probability 1 as well (see Section 3.2.2). Hence, Equation
(13.102) converges with probability 1 as follows,

SINR_j → γ_1 P_{1j} λ_j^{∞} β_j .   (13.109)

Thus, the SINR for each of the M modes of the representative link converges with
probability 1 to the right-hand side of Equation (13.109). This implies that the
spectral efficiency, which is simply the sum of log2 (1 + SINRj ) over j, converges
with probability 1 as follows


c_1 → Σ_{j=1}^{M} log₂(1 + SINR_j) .   (13.110)

13.4.4 Application to power-controlled systems without out-of-cell interference
We shall now apply the results of the previous section to cellular systems with
power control, where users are restricted to a single cell. Assume that there are
n + 1 transmit nodes in the cell and consider the spectral efficiency of a rep-
resentative link between one of the transmit nodes and a representative receive
node. The remaining n transmitters are interferers to the representative link. To
apply the results of the previous section to power-controlled systems, we can
assume that the transmit power from each user is a constant P , and the path
losses between each transmitter and the representative receiver are all equal to γ.
Hence, when averaged over the realizations of the channel matrices Hij ∈ Cn r ×n t ,
which are unit variance, circularly symmetric, complex, Gaussian random vari-
ables, the average power received on each antenna from a given transmitter
equals P γ.
In this case, the CDF of path losses Ψ(x) can be expressed in terms of a “step”
function with a step at γ, and Equation (13.98), which is the implicit equation
for β, becomes

−σ² β + 1 = β a ∫_0^∞ dx (1/(M γ)) Σ_{j=1}^{M} x f_j(x/γ) / (1 + x β) ,   (13.111)

where γ is the common path loss between each interfering transmitter and the
representative receiver. Because the transmit powers are all equal to P, the probability
density function of the transmit powers f (x) = δ(x − P ) and Equation (13.111)
becomes
−σ² β + 1 = β a P γ / (1 + P γ β) .   (13.112)
By applying the quadratic formula and selecting the positive term, the limiting
value for β, which we denote by βep to emphasize that this is from the equal
received power model, is found to be

β_ep = (1 − a)/(2σ²) − 1/(2Pγ) + √( (1 − a)²/(4σ⁴) + (1 + a)/(2σ²Pγ) + 1/(4P²γ²) ) .   (13.113)
Note that the equal received power model can be used to model large cell
networks with power control as nodes that are closer to base stations can re-
duce their transmit power to meet a target received power requirement at the
base station.
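As a quick numerical check (ours, not the text's), the closed form of Equation (13.113) can be substituted back into the implicit relation of Equation (13.112):

```python
import numpy as np

def beta_ep(a, P, gamma, sigma2):
    """Closed-form limit of Equation (13.113) (equal received powers)."""
    Pg = P * gamma
    return ((1 - a) / (2 * sigma2) - 1 / (2 * Pg)
            + np.sqrt((1 - a) ** 2 / (4 * sigma2 ** 2)
                      + (1 + a) / (2 * sigma2 * Pg)
                      + 1 / (4 * Pg ** 2)))

# Verify that beta_ep satisfies the implicit Equation (13.112):
# -sigma^2 beta + 1 = a beta P gamma / (1 + P gamma beta).
a, P, gamma, sigma2 = 4.0, 1.0, 10.0 ** (-125 / 10), 1e-13
b = beta_ep(a, P, gamma, sigma2)
lhs = -sigma2 * b + 1
rhs = a * b * P * gamma / (1 + P * gamma * b)
print(b, lhs, rhs)  # lhs and rhs agree
```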
Thus, from Equation (13.97) the limiting spectral efficiency on the represen-
tative link for this model is

c_{1ep} = Σ_{j=1}^{M} log₂(1 + λ_j^{∞} P γ_{11} β_ep) .   (13.114)

13.4.5 Monte Carlo simulations

Monte Carlo simulations of systems with power control, whose spectral efficiency
is given by Equation (13.114) with λ_{1j}^{∞} approximated by Equation (3.147), are
shown in this section. We assumed σ² = 1 × 10⁻¹³ W. We simulated systems with
the ratio of interferers to receiver antennas n/n_r = 1 and n/n_r = 4, with a
common path loss to the representative receiver of γ = −125 dB. The representative
transmitter had a path loss of −100 dB to the representative receiver. Each
experiment was repeated 1000 times. Each transmitting node transmitted
P = 1/M W on each of the M data streams. For both models we simulated
M = 1, 2, 4, and 8 streams per transmitting node with equal numbers of antennas
at all transmitters and at the representative receiver, that is, n_r = n_t.
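A stripped-down, single-stream (n_t = M = 1) version of such a Monte Carlo simulation can be sketched as follows. The code is illustrative (it is not the authors' simulation); it compares the finite-n_r MMSE SINR of Equation (13.102) with the limit γ₁₁ P β_ep:

```python
import numpy as np

rng = np.random.default_rng(1)

def crandn(*shape):
    # i.i.d. CN(0,1) entries
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

sigma2 = 1e-13                 # noise power (W)
gamma_i = 10 ** (-125 / 10)    # common interferer path loss (-125 dB)
gamma_1 = 10 ** (-100 / 10)    # representative-link path loss (-100 dB)
P = 1.0                        # transmit power, single stream (M = 1)
a = 4.0                        # interferers per receive antenna, n/nr

def sim_sinr(nr, trials=50):
    """Average MMSE output SINR of Equation (13.102) for nt = 1."""
    n = int(a * nr)
    vals = []
    for _ in range(trials):
        h = crandn(nr)                     # representative channel
        K = crandn(nr, n)                  # interferer channels
        R = sigma2 * np.eye(nr) + (gamma_i * P / nr) * (K @ K.conj().T)
        vals.append(gamma_1 * P * (h.conj() @ np.linalg.solve(R, h)).real / nr)
    return float(np.mean(vals))

# Asymptote: SINR -> gamma_1 P beta_ep, with beta_ep from Equation (13.113)
# evaluated at the common interference received power P * gamma_i.
Pg = P * gamma_i
beta = ((1 - a) / (2 * sigma2) - 1 / (2 * Pg)
        + np.sqrt((1 - a) ** 2 / (4 * sigma2 ** 2)
                  + (1 + a) / (2 * sigma2 * Pg) + 1 / (4 * Pg ** 2)))
print(sim_sinr(8), sim_sinr(64), gamma_1 * P * beta)
```

The simulated SINR approaches the asymptote as n_r grows, mirroring the convergence seen in Figures 13.6 and 13.7.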
In Figures 13.6 and 13.7, results of the simulations for n/nr = 1 and n/nr = 4,
respectively, are shown, along with the asymptotic spectral efficiency predicted
by Equation (13.114). The points represent a random sampling of 100 trials
from the 1000 trials of the simulation for each nr . The standard deviation of the
spectral efficiency from 1000 trials is plotted using dashed lines. The convergence

Figure 13.6 Simulated and asymptotic spectral efficiency vs. number of antennas with
power control and the ratio of nodes to receive antennas n/n_r = 1. The dashed lines
represent the standard deviation of the simulated spectral efficiencies. The path loss
of each interferer was assumed to be −125 dB, with noise power of 10⁻¹³ W, and
path loss exponent α = 4. The representative link had a path loss of −100 dB.


Figure 13.7 Simulated and asymptotic spectral efficiency vs. number of antennas with
power control and the ratio of nodes to receive antennas n/n_r = 4. The dashed lines
represent the standard deviation of the simulated spectral efficiencies. The path loss
of each interferer was assumed to be −125 dB, with noise power of 10⁻¹³ W, and
path loss exponent α = 4. The representative link had a path loss of −100 dB.
of the spectral efficiency is evident from the figures since the points representing
different trials of the simulation converge with increasing numbers of receive
antennas nr . Additionally, note that the standard deviation decays with nr ,
which indicates convergence in the mean-square sense. For nr ≥ 14 antennas
and 1000 trials, the largest deviation from the asymptotic prediction is less than
15% for n/nr = 1 and n/nr = 4. For nr ≥ 25, the largest deviation falls below
10% in both cases.

13.5 Matched-filter receiver in power-controlled cellular networks
Consider an identical network to the one from Section 13.4, but now assume that
the receivers use matched-filter receivers (see Section 9.2.1) with single stream
transmissions, that is, the covariance matrices at the transmitters are unit rank.
The matched filter is attractive because it is simpler to compute than the MMSE
receiver, and does not require any information about the spatial structure of the
interference, which reduces protocol complexity. The SINR at the output of the
matched filter can be found to converge in probability to [313]
SINR = P / (σ² + a ⟨p⟩) ,   (13.115)
where ⟨p⟩ is the expected value of the received interference power from any given
user in the network. This expression is computed by using the limiting empirical
distribution function of the interference powers seen at the receiver and the ratio
of the number of interferers to receive antennas a = n/nr .
For systems with power control where the received powers from interferers and
the representative transmitter are identically P , then p  = P and the SINR is
given by [323, 313],
P 1
SINR = = 1 , (13.116)
σ2 + a P P σ2 + a

where σP2 is the normalized SNR at the input to the receiver.


In Figure 13.8, plots of the limiting SINR of the linear minimum-mean-square-
error and matched-filter receivers in different normalized SNR conditions are
shown. The limiting SINR for the MMSE receiver is given by Equation (13.103)
with λ_1^{∞} = 1 since we assume no channel-state information at the transmitter.
Observe from the plot that the MMSE receiver significantly outperforms the
matched filter when the SNR is high. This is because in the high SNR regime, the
MMSE receiver is able to use the spatial structure of the interference to null out
interferers, whereas the matched-filter receiver simply focuses on the signals arriving
from the target transmitter. This difference in operation of the matched-filter
and MMSE structures also explains the significant decrease in the asymptotic
SINR for the high SNR cases when the ratio of interferers to receiver antennas
a = 1, because for a ≤ 1 the MMSE receiver is able to null out all interferers.

Figure 13.8 Limiting SINR versus users per degree of freedom for linear receivers. The
transmitter was assumed to not have channel-state information.

If a > 1, however, the MMSE receiver is unable to null out all the interference,
and at high SNR, there will be a significant amount of interference remaining
after the MMSE receiver. Note that in the low SNR cases, the matched filter
and the MMSE have approximately the same asymptotic SINR because when
the SNR is low, the interference is not very significant compared with the noise
and thus the matched filter which is optimal in noise performs nearly as well as
the MMSE receiver.
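The two limiting curves compared in Figure 13.8 are straightforward to reproduce numerically. A sketch (ours) using Equation (13.116) for the matched filter and the limit P β_ep from Equations (13.103) and (13.113), with λ_1^{∞} = 1 and received powers normalized so that σ² = 1:

```python
import numpy as np

def sinr_mf(a, snr):
    # Matched-filter limit, Equation (13.116), with snr = P / sigma^2.
    return 1.0 / (1.0 / snr + a)

def sinr_mmse(a, snr):
    # MMSE limit P * beta_ep, Equation (13.113), with received powers
    # normalized so that P gamma = snr and sigma^2 = 1.
    return snr * ((1 - a) / 2 - 1 / (2 * snr)
                  + np.sqrt((1 - a) ** 2 / 4 + (1 + a) / (2 * snr)
                            + 1 / (4 * snr ** 2)))

a = np.logspace(-2, 2, 9)
for snr_db in (0, 10, 20):
    snr = 10.0 ** (snr_db / 10)
    print(snr_db, 10 * np.log10(sinr_mf(a, snr)[4]),
          10 * np.log10(sinr_mmse(a, snr)[4]))  # values at a = 1
```

The printout reproduces the qualitative behavior discussed above: a large MMSE advantage at 20 dB SNR, and near-identical performance at 0 dB.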

13.5.1 Application to power-controlled systems with out-of-cell interference
We shall now consider a cellular architecture with power control and out-of-cell
interference. In particular, we consider the case in which the frequency reuse
factor is one, that is, all cells operate in the same frequency band. To derive
asymptotic expressions for the spectral efficiency, we apply the results of Section
13.4 to a cellular architecture.
Consider a plane that has base stations at arbitrary locations with area density
ρt . Each cell is associated with one base station and comprises the region of the
plane that is closer to that base station than any other. Thus, the cells constitute
a Voronoi tessellation [235] of the plane with the base station locations as the
generator points. We shall assume that there exists a base station at the origin,
which we shall refer to as the representative receiver. We further assume that all
cells are of bounded area, which holds for most models that have regular cells
(for example, hexagonal cells) and holds with probability 1 if the base station
locations form a homogeneous Poisson point process. In Figure 13.9, one such
case is shown where base stations are at hexagonal grid points with adjacent
base stations separated by distance d.
Suppose that overlaid on this cellular architecture is a circular network with n
wireless transmitters located at random i.i.d. points in a circle of radius R such
that

n = ρ_w π R² ,   (13.117)

where ρw is the area density of wireless nodes in this network. Let these n
transmitters, which will act as interferers to a representative link, be numbered
2, 3, . . . n + 1. Additionally, suppose that a representative transmitter is at dis-
tance r1 from the representative receiver and is located in the cell associated
with the representative receiver at the origin.
In order to limit the complexity of the following exposition, we assume that
the transmitters have single antennas and no channel-state information, and that
average power received at a distance r from a transmitter transmitting with
power P̄ is

P̄ r^{−α} .   (13.118)

Additionally, suppose that the ℓth transmitter transmits with power n_r^{α/2−1} P_ℓ,
and controls its transmit power as follows:

P_ℓ = min( (p_t/G_t) r_{tℓ}^α , P_max ) ,   (13.119)

where r_{tℓ} is the distance between the ℓth transmitter and its nearest base station.
Thus, Equation (13.119) models a scheme where the ℓth wireless node tries to
achieve a target received power (relative to path loss) of n_r^{α/2−1} p_t at its nearest
base station, with the maximum power constrained by P_max. The scale factor
of n_r^{α/2−1} is applied to the transmit power in order to keep the system
interference limited as the number of receive antennas n_r → ∞, since without this
scale factor the MMSE receiver would be able to suppress interference to levels
comparable to the noise, resulting in a system that is no longer interference
limited. Note that, since both the representative transmitter and all interferers
are assumed to apply this scaling, the scaling does not affect the SINR when the
noise is negligible. For a given set of base stations, the link lengths and hence the
transmit powers are independent random variables as they depend solely on the
locations of the wireless nodes, which are independent by assumption. Hence,
results derived using the assumptions of the previous section can be applied to
this network model with the transmit-power PDF f_P(p), which depends on the
base station locations.
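The power-control rule of Equation (13.119) is a one-line computation; the following sketch uses hypothetical parameter values (a 1 nW target received power and a 200 mW budget):

```python
import numpy as np

def transmit_power(r_t, p_t, G_t, P_max, alpha):
    """Power-control rule of Equation (13.119): aim for target received power
    p_t (relative to path loss) at the nearest base station, distance r_t away,
    capped by the power budget P_max."""
    return np.minimum((p_t / G_t) * r_t ** alpha, P_max)

# Hypothetical numbers: path-loss exponent 4, unit gain, 1 nW target, 200 mW budget.
r = np.array([50.0, 150.0, 400.0])     # distances to nearest base station (m)
P = transmit_power(r, p_t=1e-9, G_t=1.0, P_max=0.2, alpha=4)
print(P)  # the two distant nodes saturate at P_max
```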
The SINR in interference-limited networks is known to grow as n_r^{α/2} [123].
Hence, as nr increases, the SINR will increase without bound. Thus, we define a

Figure 13.9 Illustration of base stations on a hexagonal grid with wireless nodes in a
circular network.

normalized version of the SINR that normalizes out the rate of growth with the
number of receiver antennas as
β_{n_r} = n_r^{−α/2} SINR ,   (13.120)
and the thermal noise is equal to σ². On the basis of the analysis of Section
13.4, as long as the empirical distribution of the received interference powers
converges with probability 1 to an asymptotic limit function H(τ ), we will be
able to evaluate the limiting normalized SINR. The scaling applied to the
interference powers ensures that the empirical distribution of the received
interference powers does indeed converge to a limiting function.
To explicitly show this convergence and to find H(τ ), recall that the represen-
tative receiver at the origin is connected to a representative transmitter that we
call node 1 at a distance r1 from the origin. The remaining transmitting nodes
numbered 2, 3, . . . , n + 1 are treated as interferers. Let p̃_ℓ equal the average re-
ceived power (averaged over the channel fading) from transmitter ℓ at all the n_r
antennas of the representative receiver and be given by

p̃_ℓ = n_r^{α/2} P_ℓ G_t r_ℓ^{−α}   for ℓ = 2, 3, . . . , n + 1 .   (13.121)

Figure 13.10 A representative cell in which the ℓth wireless node is located. The
distance between the wireless node and the receiver at the origin is bounded between
the difference and the sum of the distance from the origin to the base station
associated with the ℓth transmitter and the radius of the smallest circle centered at
that base station that contains its cell.

Recall that n wireless interferers are distributed in a circle of radius R centered
at the origin. The CDF of the average received power p̃_ℓ from a wireless node ℓ
is


Pr{p̃_ℓ ≤ x} = ∫ dP Pr{ p̃_ℓ ≤ x | P_ℓ = P } f_P(P)
            = ∫ dP Pr{ P n_r^{α/2} r_ℓ^{−α} ≤ x | P_ℓ = P } f_P(P)
            = ∫ dP Pr{ r_ℓ/√n_r ≥ (P/x)^{1/α} | P_ℓ = P } f_P(P) .   (13.122)

Note that r_ℓ and P_ℓ are correlated, as P_ℓ depends on the distance of
the wireless node from its closest base station and the locations of the base
stations are fixed. Suppose that the closest base station to wireless node ℓ is at
distance r_{bℓ} from the origin and the radius of the smallest circle that contains
the cell associated with that base station is d_{bℓ}, which we assume is bounded. A
representative diagram of this is given in Figure 13.10. Observe that the distance
between the wireless node and the receiver at the origin must satisfy

r_{bℓ} − d_{bℓ} ≤ r_ℓ ≤ r_{bℓ} + d_{bℓ} .   (13.123)

Substituting the inequalities in Equation (13.123) into Equation (13.122) yields


∫ dP Pr{ (r_{bℓ} − d_{bℓ})/√n_r ≥ (P/x)^{1/α} | P_ℓ = P } f_P(P) ≤ Pr{p̃_ℓ ≤ x}
  ≤ ∫ dP Pr{ (r_{bℓ} + d_{bℓ})/√n_r ≥ (P/x)^{1/α} | P_ℓ = P } f_P(P) .   (13.124)

As the number of receiver antennas goes to infinity, the upper and lower bounds
in the expression above approach each other. In other words, as nr → ∞,
Pr{p̃_ℓ ≤ x} → ∫ dP Pr{ r_{bℓ}/√n_r ≥ (P/x)^{1/α} | P_ℓ = P } f_P(P) .   (13.125)

Since the transmit power of the ℓth node is asymptotically not dependent on the
distance of its base station from the representative receiver at the origin,
Pr{p̃_ℓ ≤ x} → ∫ dP Pr{ r_{bℓ}/√n_r ≥ (P/x)^{1/α} } f_P(P)   (13.126)
            → ∫ dP Pr{ r_ℓ/√n_r ≥ (P/x)^{1/α} } f_P(P)
            = ∫ dP [ 1 − F_r( √n_r (P/x)^{1/α} ) ] f_P(P) ,   (13.127)

where Equation (13.127) results from substituting Equation (13.123). Note that
the inequalities in Equation (13.123) hold if the cells are bounded. For the Pois-
son cell model, this may not be the case although the cells have finite area with
probability 1. In the case of Poisson cells, however, the convergence in Equa-
tion (13.126) can be shown by using an alternate technique which is given in
Reference [126]. Substituting Equation (13.117), a = n/n_r, and b = (π ρ_w / a)^{α/2},
the probability that the received power from the ℓth interferer at the repre-
sentative receiver is less than or equal to a value x converges in the following
manner
Pr{p̃_ℓ < x} → ∫ dP [ 1 − (π ρ_w / a) (P/x)^{2/α} ] I{P b < x < ∞} f_P(P)

            = F_P(x/b) − (π ρ_w / a) x^{−2/α} ⟨P^{2/α}⟩
              + (π ρ_w / a) x^{−2/α} ∫_{x/b}^∞ dP f_P(P) P^{2/α} ,   (13.128)
where I_A is the indicator function, which equals 1 if the condition A is true and
zero otherwise. By the Glivenko–Cantelli theorem (see, for example, References
[131] and [79]), the empirical distribution function of a set of i.i.d. random
variables converges uniformly, with probability 1, to its CDF. Hence, the
empirical distribution function of the p̃_ℓ's converges with probability 1 to the
right-hand side of Equation (13.128), that is, H(x) is the limit of Pr{p̃_ℓ < x}.
The derivative of H(x) is

dH(x)/dx = (2 π ρ_w / (a α)) ⟨P^{2/α}⟩ x^{−2/α − 1} − (2 π ρ_w / (a α)) x^{−2/α − 1} ∫_{x/b}^∞ dτ f_P(τ) τ^{2/α} .   (13.129)
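Equation (13.128) can be checked by simulation for the simplest case of a common transmit power P₀, that is, f_P(P) = δ(P − P₀), for which H(x) reduces to 1 − (π ρ_w/a)(P₀/x)^{2/α} for x > P₀ b. A sketch (ours, with arbitrary parameter values):

```python
import numpy as np

rng = np.random.default_rng(2)

alpha, a, rho_w = 4.0, 2.0, 1e-3   # path-loss exponent, n/nr, node density
nr = 400
n = int(a * nr)
R = np.sqrt(n / (np.pi * rho_w))   # network radius from Equation (13.117)
P0 = 1.0                           # common transmit power: f_P(P) = delta(P - P0)
b = (np.pi * rho_w / a) ** (alpha / 2)

# Received powers p~ = nr^{alpha/2} P0 r^{-alpha} for nodes uniform in the disc
# (many i.i.d. placements, pooled to estimate the empirical CDF).
r = R * np.sqrt(rng.random(200 * n))
p = nr ** (alpha / 2) * P0 * r ** (-alpha)

# Point-mass power distribution: Equation (13.128) reduces to
# H(x) = 1 - (pi rho_w / a)(P0/x)^{2/alpha} for x > P0 b.
for x in (2 * P0 * b, 5 * P0 * b, 20 * P0 * b):
    H = 1 - (np.pi * rho_w / a) * (P0 / x) ** (2 / alpha)
    print(np.mean(p < x), H)  # empirical vs limiting CDF
```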

The right-hand side of Equation (13.108) thus becomes



m a ∫_b^∞ τ dH(τ) / (1 + τ m) = m a ∫_0^∞ dτ (2 π ρ_w / (a α)) ⟨P^{2/α}⟩ τ^{−2/α} / (1 + m τ)
    − m a ∫_0^∞ dτ (2 π ρ_w / (a α)) ( τ^{−2/α} / (1 + m τ) ) ∫_{τ/b}^∞ dx f_P(x) x^{2/α} .   (13.130)

The first term on the right-hand side of Equation (13.130) is evaluated using
Lemma 1 of Reference [123], which yields
m a ∫_b^∞ τ dH(τ) / (1 + τ m) = (2 π ρ_w / α) ⟨P^{2/α}⟩ m^{2/α} π csc(2π/α)
    − (2 π ρ_w m / α) ∫_0^∞ dτ ( τ^{−2/α} / (1 + m τ) ) ∫_{τ/b}^∞ dx f_P(x) x^{2/α} .   (13.131)

Substituting Equation (13.131) into Equation (13.108) yields


z m(z) + 1 = (2 π ρ_w / α) ⟨P^{2/α}⟩ m(z)^{2/α} π csc(2π/α)
    − (2 π ρ_w m(z) / α) ∫_0^∞ dτ ( τ^{−2/α} / (1 + m(z) τ) ) ∫_{τ/b}^∞ dx f_P(x) x^{2/α} .   (13.132)

Since β_{n_r} → β = m(−σ²), substituting z = −σ² and β = m(z) into Equation
(13.132) yields an expression for β as follows:

−σ² β + 1 = (2 π ρ_w / α) ⟨P^{2/α}⟩ β^{2/α} π csc(2π/α)
    − (2 π ρ_w β / α) ∫_0^∞ dτ ( τ^{−2/α} / (1 + β τ) ) ∫_{τ/b}^∞ dx f_P(x) x^{2/α} .   (13.133)

Rearranging terms yields the following result. As the number of interferers n →
∞, the number of antennas n_r → ∞, and the outer radius of the network R → ∞
such that a = n/n_r > 0 and ρ_w = n/(π R²) are constants, then β_{n_r} → β with
probability 1, where β is the unique, non-negative, real solution to the following
equation:

(π/α) ⟨P^{2/α}⟩ β^{2/α} csc(2π/α)
  − ( 2 β r_1^{α−2} / (α P_1^{1−2/α}) ) ∫_0^∞ dτ ( τ^{−2/α} / (1 + τ β) ) ∫_{τ/b}^∞ dx f_P(x) x^{2/α}
  + β r_1^{α−2} σ² / (2 G_t ρ_w π P_1^{1−2/α}) = P_1^{2/α} / (2 ρ_w π r_1²) .   (13.134)

We assume that all transmit nodes use Gaussian codebooks and the receiver uses
single-user decoding. Hence, the spectral efficiency of the representative link in
the limit is given by the Shannon formula from Equation (5.33) as follows:

c = log₂(1 + SINR) = log₂(1 + n_r^{α/2} β) .

Since the log function is continuous, as n, n_r → ∞ such that a = n/n_r and
ρ_w = n/(π R²) are constants, the following expression holds with probability 1
(for example, see Section 3.2.2):

c − log₂(n_r^{α/2}) → log₂(β) .   (13.135)

The previous expression indicates that the spectral efficiency grows approximately
as log₂(n_r^{α/2} β). Hence, with appropriate normalization, the spectral efficiency
converges to an asymptotic limit with probability 1 as n_r → ∞. Additionally,
the deviation of the mean spectral efficiency from its asymptotic value can also
be shown to decay to zero, that is,

| ⟨c⟩ − log₂(1 + n_r^{α/2} β) | → 0 .   (13.136)

The previous expression implies that the asymptotic spectral efficiency
log₂(1 + n_r^{α/2} β) is a good approximation for the mean spectral efficiency as the difference
between the two quantities decays to zero with increasing numbers of receiver
antennas nr .
From Equation (13.134), it is unclear what the limiting normalized SINR β is.
To obtain a more meaningful expression for β and the spectral efficiency, we can
simplify Equation (13.134) by showing that the second term on the left-hand
side of Equation (13.134) is small if the number of nodes in the network n is
much larger than the number of antennas at the representative receiver nr , that
is, when b is small. When b is small, the lower limit of the following integral,

∫_{τ/b}^∞ dx f_P(x) x^{2/α} ,   (13.137)

approaches infinity; hence, the integral approaches zero as b → 0. This property
can be proved more rigorously as in Lemma 3 of Reference [125]. Thus, when
the ratio of the number of interferers to the number of antennas at the
representative receiver is high (that is, large a), Equation (13.134) can be written as³
(π/α) ⟨P^{2/α}⟩ β^{2/α} csc(2π/α) + β r_1^{α−2} σ² / (2 G_t ρ_w π P_1^{1−2/α}) ≈ P_1^{2/α} / (2 ρ_w π r_1²) .   (13.138)

Additionally, in the interference-limited regime, we assume that σ² is sufficiently
small that the second term on the left-hand side of Equation (13.138) is dom-
inated by the first. Equating the first term on the left-hand side of Equation
(13.138) with the right-hand side, substituting the definition of the normalized
SINR β_{n_r}, and rearranging terms yields the following approximation for the
SINR when n_r is large:
SINR ≈ P_1 G_α ( n_r / (⟨P^{2/α}⟩ π ρ_w r_1²) )^{α/2} ,   (13.139)

where

G_α = ( (α/(2π)) sin(2π/α) )^{α/2} .   (13.140)
Hence, for large nr , the mean spectral efficiency given the length of the repre-
sentative link r1 and its transmit power P1 can then be approximated as
C(r_1, P_1) ≈ log₂( 1 + P_1 G_α ( n_r / (⟨P^{2/α}⟩ π ρ_w r_1²) )^{α/2} ) .   (13.141)
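For α = 4, the constant of Equation (13.140) is G₄ = ((4/(2π)) sin(π/2))² = (2/π)² ≈ 0.405. A small numerical sketch of Equations (13.140) and (13.141) (the inputs in the example call are hypothetical):

```python
import numpy as np

def G(alpha):
    # Equation (13.140)
    return (alpha / (2 * np.pi) * np.sin(2 * np.pi / alpha)) ** (alpha / 2)

def mean_se(r1, P1, nr, rho_w, alpha, EP2a):
    """Equation (13.141); EP2a is the moment <P^{2/alpha}> of the transmit
    powers (all inputs here are hypothetical, for illustration)."""
    sinr = P1 * G(alpha) * (nr / (EP2a * np.pi * rho_w * r1 ** 2)) ** (alpha / 2)
    return np.log2(1 + sinr)

print(G(4.0))                  # (2/pi)^2 for alpha = 4
print(mean_se(r1=100.0, P1=0.1, nr=16, rho_w=1e-4, alpha=4.0, EP2a=0.1 ** 0.5))
```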

Suppose now that the link lengths are bounded such that the maximum distance
between any transmitting node and its desired receiver r_M ≤ (G_t P_max/p_t)^{1/α}. We
call this the sufficient power case since every wireless node can satisfy the target
received power (relative to path loss) at its desired receiver. The sufficient power
case corresponds to the base station separation being small enough that the
target received power is attained by each wireless node. Substituting Equation
(13.119) into Equation (13.141),
c ≈ log₂( 1 + (p_t/G_t) r_1^α G_α ( n_r / ( (p_t/G_t)^{2/α} ⟨r_{tℓ}²⟩ π ρ_w r_1² ) )^{α/2} )

  = log₂( 1 + G_α ( n_r / ( ⟨r_{tℓ}²⟩ π ρ_w ) )^{α/2} ) ,   (13.142)

which is a function of the second moment of the link lengths arising from the
cell shapes. Hence, we can evaluate the spectral efficiency for different models
³ This approximation requires the solution of Equation (13.134) to be a continuous function
of β, which holds if the path loss exponent α is rational, as Equation (13.134) can then be
raised to a sufficiently high power, resulting in a polynomial equation in the limit of the
normalized SINR β, with real coefficients, which are known to have continuous roots.

Figure 13.11 Mean spectral efficiency vs. number of receive antennas for ρ_w = 10⁻³
and ρ_w = 10⁻² nodes/m² with unlimited transmit powers.

of the cellular architecture by computing the second moment of the link lengths
associated with the cell shape. For the hexagonal-cell model, the second moment
of the link lengths can be found using Equation (13.30) with k = 2:

⟨x²⟩ = ( √3 sin(π/6) (1 + 2 cos²(π/6)) / (24 cos³(π/6)) ) d² = (5/36) d² ≈ 0.14 d² .
Substituting into Equation (13.142) yields the following simple approximation
for the mean spectral efficiency of interference-limited hexagonal-cell systems
with a large number of receive antennas per base station:
c ≈ log₂( 1 + G_α ( n_r / (0.14 d² π ρ_w) )^{α/2} ) ,   (13.143)

where the averaging is with respect to the locations of the interferers and repre-
sentative transmitter, and channel fading coefficients.
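A numerical sketch of Equation (13.143) (ours; the base station spacing and node density are hypothetical values):

```python
import numpy as np

def hex_cell_se(nr, d, rho_w, alpha):
    # Equation (13.143): interference-limited hexagonal cells.
    G = (alpha / (2 * np.pi) * np.sin(2 * np.pi / alpha)) ** (alpha / 2)
    return np.log2(1 + G * (nr / (0.14 * d ** 2 * np.pi * rho_w)) ** (alpha / 2))

# Hypothetical scenario: base stations 500 m apart, 1e-4 wireless nodes per m^2.
for nr in (4, 16, 64):
    print(nr, hex_cell_se(nr, d=500.0, rho_w=1e-4, alpha=4.0))
```

Only the product d² ρ_w (the number of wireless nodes per cell area) enters, consistent with the observation below that relative, not absolute, densities matter.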
The mean uplink spectral efficiency from Monte Carlo simulations for wireless
node densities of ρw = 10−3 and ρw = 10−2 nodes/m2 , and unlimited trans-
mit powers per node versus the number of antennas at the representative base
station is illustrated in Figure 13.11. The square and asterisk markers represent
simulations of wireless networks with node densities of 10−2 and 10−3 nodes/m2 ,
respectively, and the solid lines represent the asymptotic mean spectral efficiency
from Equation (13.143).
Note that the asterisk and square markers coincide, indicating that the abso-
lute density of wireless nodes does not affect the mean spectral efficiency, and it is
the relative density of wireless nodes to base stations that matters. Furthermore,
it is clear that the asymptotic approximation Equation (13.143) holds when the
number of receive antennas n_r is sufficiently large. For instance, when the base
station density is 20% of the wireless node density, the asymptotic and simulated
mean spectral efficiency are within 10% of each other when the number of receive
antennas nr ≥ 10. For lower densities of base stations, the convergence is slower,
for example, when the base station density is 5% of the wireless node density, the
difference between the simulated and asymptotic mean spectral efficiency drops
below 10% only when nr > 37.

Low power budgets
If d > √3 (G_t P_max/p_t)^{1/α}, each node has a maximum power budget that is so
low that nodes may sometimes fall so far away from the nearest base station that
they cannot meet the target power requirement, and hence some fraction of nodes
transmit at full power. In this case, the expected value of the transmit power
of the wireless nodes raised to the power 2/α, ⟨P^{2/α}⟩ (which is required to find
the mean spectral efficiency using Equation (13.141)), takes the following forms,
which can be found by using straightforward calculus. If P_max < (p_t/G_t)(d/√3)^α,
that is, if a randomly located wireless node has some nonzero probability of
being unable to achieve the target received power, the following two cases
apply.

(1) If P_max < (p_t/G_t)(d/2)^α, then

⟨P^{2/α}⟩ = P_max^{2/α} − ( √3 π / (3 d²) ) (G_t/p_t)^{2/α} P_max^{4/α} .   (13.144)

(2) If (p_t/G_t)(d/2)^α ≤ P_max < (p_t/G_t)(√3 d/3)^α, then

⟨P^{2/α}⟩ = P_max^{2/α} − ( π √3 / (3 d²) ) (p_t/G_t)^{−2/α} P_max^{4/α}
    + ( 2 √3 / d² ) (p_t/G_t)^{−2/α} P_max^{4/α} cos⁻¹( (d/2) ( p_t / (G_t P_max) )^{1/α} )
    + ( ( √3 d / 12 ) (p_t/G_t)^{2/α} − ( 5 √3 / (6 d) ) P_max^{2/α} ) √( 4 (G_t P_max/p_t)^{2/α} − d² ) .   (13.145)
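Case (1) can be checked by Monte Carlo integration over a hexagonal cell (a sketch, not from the text; the cell is modeled with inradius d/2 and a uniformly placed node):

```python
import numpy as np

rng = np.random.default_rng(3)
d, alpha, pt, Gt = 1.0, 4.0, 1.0, 1.0
Pmax = (pt / Gt) * (0.4 * d) ** alpha    # case (1): power saturates at r = 0.4 d < d/2

def hex_samples(m):
    """Uniform points in a regular hexagonal cell of inradius d/2
    (circumradius d/sqrt(3)), via rejection sampling from a bounding square."""
    Rc = d / np.sqrt(3)
    out = np.empty((0, 2))
    while out.shape[0] < m:
        cand = rng.uniform(-Rc, Rc, size=(2 * m, 2))
        keep = np.ones(cand.shape[0], dtype=bool)
        for t in (0.0, np.pi / 3, 2 * np.pi / 3):   # slab normals
            keep &= np.abs(cand[:, 0] * np.cos(t) + cand[:, 1] * np.sin(t)) <= d / 2
        out = np.vstack([out, cand[keep]])
    return out[:m]

r = np.linalg.norm(hex_samples(200_000), axis=1)
P = np.minimum((pt / Gt) * r ** alpha, Pmax)     # power-control rule (13.119)
mc = np.mean(P ** (2 / alpha))

closed = (Pmax ** (2 / alpha)
          - (np.sqrt(3) * np.pi / (3 * d ** 2)) * (Gt / pt) ** (2 / alpha)
          * Pmax ** (4 / alpha))
print(mc, closed)  # agree to Monte Carlo accuracy
```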

When we substitute the appropriate expression for ⟨P^{2/α}⟩ from above into Equa-
tion (13.141), we obtain the mean spectral efficiency for a link of given length r_1.
Averaged over the PDF of link lengths associated with the hexagonal-cell model,
the mean spectral efficiency is


⟨c⟩ = ∫_0^{(G_t P_max/p_t)^{1/α}} dx log₂( 1 + (p_t/G_t) x^α G_α ( n_r / (⟨P^{2/α}⟩ π ρ_w x²) )^{α/2} ) f_x(x)
    + ∫_{(G_t P_max/p_t)^{1/α}}^{√3 d/3} dx log₂( 1 + G_α P_max ( n_r / (⟨P^{2/α}⟩ π ρ_w x²) )^{α/2} ) f_x(x)

    = F_x( (G_t P_max/p_t)^{1/α} ) log₂( 1 + (p_t/G_t) G_α ( n_r / (⟨P^{2/α}⟩ π ρ_w) )^{α/2} )
    + ∫_{(G_t P_max/p_t)^{1/α}}^{√3 d/3} dx log₂( 1 + G_α P_max ( n_r / (⟨P^{2/α}⟩ π ρ_w x²) )^{α/2} ) f_x(x) ,   (13.146)

where the CDF F_x(x) and PDF f_x(x) of the link lengths are given by Equations
(13.28) and (13.29) respectively, and the moments ⟨P^{2/α}⟩ are from the previous set of
expressions. The second term on the right-hand side of Equation (13.146) cannot
expressions. The second term on the right-hand side of Equation (13.146) cannot
be easily evaluated in closed form but can be evaluated efficiently using standard
numerical integration techniques.
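The numerical integration is routine. The sketch below (ours) evaluates the second term of Equation (13.146) with the trapezoidal rule; since Equations (13.28) and (13.29) are not reproduced in this section, a hypothetical link-length PDF for a node uniform in a hexagonal cell stands in for f_x:

```python
import numpy as np

d, alpha, pt, Gt, Pmax = 1000.0, 4.0, 1e-9, 1.0, 0.2
rho_w, nr, EP2a = 1e-4, 16, 0.01      # EP2a stands in for <P^{2/alpha}>
Ga = (alpha / (2 * np.pi) * np.sin(2 * np.pi / alpha)) ** (alpha / 2)

def f_x(x):
    """Hypothetical link-length PDF for a node uniform in a hexagonal cell of
    inradius d/2 (a stand-in for Equation (13.29), not reproduced here)."""
    A = np.sqrt(3) / 2 * d ** 2        # cell area
    ratio = np.clip(d / (2 * np.maximum(x, 1e-12)), -1.0, 1.0)
    theta = np.where(x <= d / 2, 2 * np.pi, 2 * np.pi - 12 * np.arccos(ratio))
    return np.where(x <= d / np.sqrt(3), x * theta / A, 0.0)

def trapezoid(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Second term of Equation (13.146): contribution of power-limited nodes.
lo = (Gt * Pmax / pt) ** (1 / alpha)
x = np.linspace(lo, np.sqrt(3) * d / 3, 2000)
integrand = np.log2(1 + Ga * Pmax
                    * (nr / (EP2a * np.pi * rho_w * x ** 2)) ** (alpha / 2)) * f_x(x)
term2 = trapezoid(integrand, x)
print(term2)
```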
The mean spectral efficiency versus number of receive antennas for ρ_w = 10⁻⁴ nodes/m²,
with 200 mW maximum transmit power per wireless node is illustrated in Figure
13.12. The different markers represent the simulated mean spectral efficiencies
for different relative densities of tethered to wireless nodes. The solid lines are the
predicted asymptotic mean spectral efficiencies obtained by numerically evaluat-
ing Equation (13.146). The close agreement between the simulated values and the
asymptotic prediction illustrates the utility of Equation (13.146) in estimating
the mean spectral efficiency.

Random cells
Suppose that instead of at hexagonal lattice sites, the base stations were located
at random points in the plane according to a Poisson point process with intensity
ρ_t nodes/m². The cells generated by such a process have random shapes and
constitute a Poisson–Voronoi tessellation of the plane, where the Voronoi cell
associated with each base station is the subset of the plane that is closer in
Euclidean distance to that base station than any other base station. Figure 13.13
illustrates a portion of such a network. The base stations are the circles and the
cell boundaries are the solid lines.
In general, the distances between wireless nodes and their closest base station
are correlated random variables, which implies that their transmit and received
powers at any receiver are correlated as well. This correlation arises because link
lengths of wireless nodes are related to each other through the random locations

Figure 13.12 Mean spectral efficiency for ρw = 10−4 nodes/m2 with different relative
densities of tethered to wireless nodes. The transmit power budget was 200 mW and
path-loss exponent α = 4.

Figure 13.13 Illustration of network with base stations at random locations.
of the base stations. Intuitively, if a particular link is long, it is likely that that
link is located in a large cell, in which case the nearby wireless nodes will also
tend to have long links, which leads to a correlation between link lengths and
consequently transmit powers.
We cannot directly apply the technique used for the hexagonal-cell system
in the previous section to find the mean spectral efficiency for the random cell
model as it requires the transmit powers of individual wireless nodes to be in-
dependent random variables. However, conditioned on a particular realization of
the base station point process (that is, conditioned on a given set of base station
positions), the transmit powers of the wireless nodes are independent as they
are simply functions of the wireless node locations, which are independent by
assumption. We can then write an expression for the mean spectral efficiency of
links conditioned on a particular realization of the base station process. We then
average over all realizations of the base station process to obtain an expression
for the mean spectral efficiency.
Consider a specific realization of the base station process which we call Πt .
We shall assume that Πt does not result in any Voronoi cell of infinite area.
Realizations of Poisson point processes that result in Voronoi cells of infinite
area are known to be zero probability events (for example, see [298] page 310),
and of course not physically possible. Hence, excluding such realizations does not
influence the mean spectral efficiency when averaged over all possible realizations
of the base station process. Additionally, we shift the coordinates of our system
such that there is a base station at the origin of the system for every realization
of Πt that we consider. We shall analyze a representative link of length r1 between the base station at the origin and a representative transmitter, which we assume is independent of Πt for simplicity.4
Conditioned on a realization of the base-station process Πt and the link length r1, and using the parameters defined in Equation (13.139), the mean spectral efficiency is

$$\langle c \mid \Pi_t, r_1 \rangle \approx \log_2\!\left(1 + G_\alpha\, P_1 \left(\frac{n_r}{\langle P^{2/\alpha} \mid \Pi_t \rangle\, \pi \rho_w\, r_1^2}\right)^{\alpha/2}\right). \tag{13.147}$$
Hence, taking the expectation of ⟨c | Πt, r1⟩ with respect to Πt,

$$\langle c \mid r_1 \rangle \approx \left\langle \log_2\!\left(1 + G_\alpha\, P_1 \left(\frac{n_r}{\langle P^{2/\alpha} \mid \Pi_t \rangle\, \pi \rho_w\, r_1^2}\right)^{\alpha/2}\right) \right\rangle \tag{13.148}$$

$$= \log_2\!\left(1 + G_\alpha\, P_1 \left(\frac{n_r}{\langle P^{2/\alpha}\rangle\, \pi \rho_w\, r_1^2}\right)^{\alpha/2}\right). \tag{13.149}$$
The step from Equation (13.148) to Equation (13.149) is due to the ergodicity of the Poisson–Voronoi tessellation [216]. The ergodicity implies that, with probability 1, properties of different realizations of the Poisson–Voronoi tessellation will have equal means. Since we have conditioned on the fact that there is a point of the base station process at the origin, the resulting process is, strictly speaking, not ergodic, as it is conditioned on there being a cell with an associated base station at the origin. As the radius of the circular network R → ∞, however, the influence of the center cell diminishes. This fact implies that ⟨P^(2/α) | Πt⟩ = ⟨P^(2/α)⟩ with probability 1. Intuitively, this property holds because typical realizations of the base station point process result in equal values of ⟨P^(2/α) | Πt⟩, since the expectation is taken with respect to all the wireless nodes in the infinite network. Any realization that does not have this property occurs with zero probability. A detailed discussion of the ergodicity of point processes is beyond the scope of this text, but the interested reader is referred to references such as [299, 71, 70] and [235].

4 Note that in reality, r1 depends on Πt as the representative link must be contained in the cell associated with the base station at the origin.
The probability distribution fr(r) of the distance r between any wireless node and its closest base station, for r ≥ 0, is given by

$$f_r(r) = 2 \pi \rho_t\, r\, e^{-\pi \rho_t r^2}. \tag{13.150}$$
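Equation (13.150) is the standard nearest-point distance distribution of a Poisson process, and it is straightforward to spot-check by simulation. The sketch below is our own illustration with arbitrary parameter values: it drops the expected number of base stations uniformly in a large disk and compares the empirical mean nearest-base-station distance with 1/(2√ρt), the mean of the density in Equation (13.150).

```python
import math
import random

# Rough spot-check of Equation (13.150). Base stations are dropped uniformly
# in a disk whose expected point count matches a Poisson process of intensity
# rho_t; the distance from the origin to the nearest base station should then
# follow f_r(r) = 2 pi rho_t r exp(-pi rho_t r^2), with mean 1/(2 sqrt(rho_t)).
# Parameter values are illustrative only.
random.seed(7)
rho_t = 1e-4       # base-station density (nodes/m^2)
R = 2000.0         # simulation disk radius (m), large relative to 1/sqrt(rho_t)
n_bs = int(rho_t * math.pi * R * R)   # expected number of base stations

def nearest_bs_distance():
    """Distance from the origin to the closest of n_bs uniform points in the disk."""
    # For a uniform point in a disk, the distance to the origin is R*sqrt(U),
    # so the nearest distance is R*sqrt(min of n_bs uniforms).
    return R * math.sqrt(min(random.random() for _ in range(n_bs)))

trials = 2000
mean_dist = sum(nearest_bs_distance() for _ in range(trials)) / trials
predicted = 1.0 / (2.0 * math.sqrt(rho_t))   # 50 m for these parameters
print(mean_dist, predicted)
```

With these values the empirical mean lands close to the 50 m prediction.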

By using Equation (13.150) and the power control of Equation (13.119),

$$\langle P^{2/\alpha} \rangle = \int_0^{\infty} dr\, \min\!\left(\frac{p_t\, r^{\alpha}}{G_t},\, P_{\max}\right)^{2/\alpha} 2 \pi \rho_t\, r\, e^{-\pi \rho_t r^2}$$
$$= \int_0^{(G_t P_{\max}/p_t)^{1/\alpha}} dr \left(\frac{p_t}{G_t}\right)^{2/\alpha} 2 \pi \rho_t\, r^3\, e^{-\pi \rho_t r^2} + \int_{(G_t P_{\max}/p_t)^{1/\alpha}}^{\infty} dr\, P_{\max}^{2/\alpha}\, 2 \pi \rho_t\, r\, e^{-\pi \rho_t r^2}$$
$$= \left(\frac{p_t}{G_t}\right)^{2/\alpha} \frac{1 - e^{-\pi \rho_t (G_t P_{\max}/p_t)^{2/\alpha}}}{\pi \rho_t}. \tag{13.151}$$
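The closed form of Equation (13.151) can be sanity-checked by averaging min(pt r^α/Gt, Pmax)^(2/α) directly over link lengths r drawn from the density of Equation (13.150). In this illustrative sketch the value of Gt and the 200 mW budget follow the simulation settings quoted later in the section, while pt is an assumed target received power.

```python
import math
import random

# Monte Carlo check of Equation (13.151): average min(p_t r^a/G_t, P_max)^(2/a)
# over r drawn from the nearest-base-station density of Equation (13.150).
random.seed(3)
rho_t = 1e-4      # base-station density (nodes/m^2)
alpha = 4.0       # path-loss exponent
G_t   = 1e-5      # path-loss constant (m^alpha)
p_t   = 1e-11     # assumed target received power (W)
P_max = 0.2       # transmit power budget (W)

# Closed form from Equation (13.151).
closed = ((p_t / G_t) ** (2.0 / alpha)
          * (1.0 - math.exp(-math.pi * rho_t * (G_t * P_max / p_t) ** (2.0 / alpha)))
          / (math.pi * rho_t))

# Monte Carlo: draw r by inverting the CDF 1 - exp(-pi rho_t r^2).
N = 200_000
acc = 0.0
for _ in range(N):
    r = math.sqrt(-math.log(1.0 - random.random()) / (math.pi * rho_t))
    acc += min(p_t * r ** alpha / G_t, P_max) ** (2.0 / alpha)
mc = acc / N

print(closed, mc)   # the two agree to within Monte Carlo error
```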

Substituting Equation (13.119) into Equation (13.149) and taking the average with respect to r,

$$\langle c \rangle \approx \int_0^{\infty} dr\, \log_2\!\left(1 + \min\!\left(\frac{p_t\, r^{\alpha}}{G_t},\, P_{\max}\right) r^{-\alpha}\, G_\alpha \left(\frac{n_r}{\langle P^{2/\alpha}\rangle\, \pi \rho_w}\right)^{\alpha/2}\right) 2 \pi \rho_t\, r\, e^{-\pi \rho_t r^2}$$

$$= \int_0^{(G_t P_{\max}/p_t)^{1/\alpha}} dr\, \log_2\!\left(1 + \frac{p_t}{G_t}\, G_\alpha \left(\frac{n_r}{\langle P^{2/\alpha}\rangle\, \pi \rho_w}\right)^{\alpha/2}\right) 2 \pi \rho_t\, r\, e^{-\pi \rho_t r^2}$$
$$\qquad + \int_{(G_t P_{\max}/p_t)^{1/\alpha}}^{\infty} dr\, \log_2\!\left(1 + P_{\max}\, r^{-\alpha}\, G_\alpha \left(\frac{n_r}{\langle P^{2/\alpha}\rangle\, \pi \rho_w}\right)^{\alpha/2}\right) 2 \pi \rho_t\, r\, e^{-\pi \rho_t r^2}$$

$$= \left(1 - e^{-\pi \rho_t (G_t P_{\max}/p_t)^{2/\alpha}}\right) \log_2\!\left(1 + \frac{p_t}{G_t}\, G_\alpha \left(\frac{n_r}{\langle P^{2/\alpha}\rangle\, \pi \rho_w}\right)^{\alpha/2}\right)$$
$$\qquad + \int_{(G_t P_{\max}/p_t)^{1/\alpha}}^{\infty} dr\, \log_2\!\left(1 + P_{\max}\, r^{-\alpha}\, G_\alpha \left(\frac{n_r}{\langle P^{2/\alpha}\rangle\, \pi \rho_w}\right)^{\alpha/2}\right) 2 \pi \rho_t\, r\, e^{-\pi \rho_t r^2}. \tag{13.152}$$
It is difficult to find a closed-form expression for the second term on the right-
hand side of Equation (13.152). We can use numerical integration to evaluate it.
However, if the transmit power of each wireless node is large (or the density of
base stations is high), Equation (13.151) simplifies to

$$\langle P^{2/\alpha} \rangle \approx \frac{(p_t/G_t)^{2/\alpha}}{\pi \rho_t} \tag{13.153}$$

because the exponential term becomes negligible compared to unity. Equation (13.152) then simplifies to

$$\langle c \rangle \approx \log_2\!\left(1 + \frac{p_t}{G_t}\, G_\alpha \left(\frac{n_r}{\langle P^{2/\alpha}\rangle\, \pi \rho_w}\right)^{\alpha/2}\right). \tag{13.154}$$

By substituting Equation (13.153) into Equation (13.154), we find that in the regime where all nodes transmit with high power, the mean spectral efficiency is approximately

$$\langle c \rangle \approx \log_2\!\left(1 + G_\alpha \left(\frac{n_r\, \rho_t}{\rho_w}\right)^{\alpha/2}\right). \tag{13.155}$$

Observe that the mean spectral efficiency does not depend on the specific values
of ρt and ρw but rather on their ratio, which implies scale invariance of the
network when the thermal noise is negligible compared to the interference.
Note that while Equation (13.155) does not depend on the choice of pt , the
original equation used to derive Equation (13.155) was based on the assump-
tion that the system is interference limited, which means Equation (13.155) is
valid only when pt and ρw are sufficiently high that the system is interference
limited. The scale invariance implied by Equation (13.155) indicates that as in
the hexagonal-cell case, constant mean spectral efficiency can be maintained by
fixing the relative density of base stations to wireless nodes.
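The derivation above is easy to exercise numerically. The sketch below implements Equation (13.152) in pure Python, using the closed form of Equation (13.151) and a trapezoidal rule for the tail integral; Gα, the constant from the asymptotic SINR expression, is set to 1 purely for illustration, and the other values are likewise assumptions. Raising the power budget should make the result collapse onto the interference-limited expression of Equation (13.155).

```python
import math

# Numerical evaluation of Equation (13.152), with <P^{2/alpha}> from the
# closed form (13.151) and a trapezoidal rule for the tail integral.
# G_alpha is set to 1 purely for illustration.

def mean_P_2a(p_t, G_t, P_max, rho_t, alpha):
    """Closed form of Equation (13.151)."""
    u = math.pi * rho_t * (G_t * P_max / p_t) ** (2.0 / alpha)
    return (p_t / G_t) ** (2.0 / alpha) * (1.0 - math.exp(-u)) / (math.pi * rho_t)

def mean_c(p_t, G_t, P_max, rho_t, rho_w, alpha, n_r, G_alpha=1.0):
    """Mean spectral efficiency per Equation (13.152)."""
    S = (n_r / (mean_P_2a(p_t, G_t, P_max, rho_t, alpha) * math.pi * rho_w)) ** (alpha / 2.0)
    r0 = (G_t * P_max / p_t) ** (1.0 / alpha)      # power-limit threshold radius
    first = (1.0 - math.exp(-math.pi * rho_t * r0 ** 2)) \
        * math.log2(1.0 + (p_t / G_t) * G_alpha * S)

    def integrand(r):
        return math.log2(1.0 + P_max * r ** (-alpha) * G_alpha * S) \
            * 2.0 * math.pi * rho_t * r * math.exp(-math.pi * rho_t * r * r)

    # Trapezoidal rule; the Rayleigh weight kills the integrand a few mean
    # cell radii beyond r0, so a finite upper limit suffices.
    r_end = r0 + 10.0 / math.sqrt(rho_t)
    n = 20000
    h = (r_end - r0) / n
    second = sum((0.5 if k in (0, n) else 1.0) * integrand(r0 + k * h)
                 for k in range(n + 1)) * h
    return first + second

rho_t, rho_w, alpha, n_r = 1e-4, 1e-3, 4.0, 16
p_t, G_t = 1e-11, 1e-5
c_limited = mean_c(p_t, G_t, 0.2, rho_t, rho_w, alpha, n_r)   # 200 mW budget
c_highpow = mean_c(p_t, G_t, 1e6, rho_t, rho_w, alpha, n_r)   # effectively unlimited
c_asym = math.log2(1.0 + (n_r * rho_t / rho_w) ** (alpha / 2.0))  # Equation (13.155)
print(c_limited, c_highpow, c_asym)
```

With the large budget the numerical evaluation and Equation (13.155) agree to numerical precision, while the 200 mW budget gives a different operating point.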
Figure 13.14 Mean spectral efficiency of uplink communications with random cells and unlimited transmit powers. The base station and wireless node densities are denoted by ρt and ρw. (Axes: number of base station antennas versus mean spectral efficiency in b s−1 Hz−1; curves for ρt/ρw = 0.2, 0.1, 0.05 and the asymptotic prediction.)
Monte Carlo simulations
Since the expressions for the mean spectral efficiency were derived asymptotically, it is useful to verify by simulation the extent to which they are accurate for finite systems. Equations (13.152) and (13.155), for instance, can be tested by running simulations of the network topology as follows. We placed base stations in a circular network of radius 2R, where the numbers of base stations were selected to achieve relative densities of tethered to wireless nodes of 20%, 10%, and 5%. The network of base stations was then re-centered such that a base station exists at the origin; 4000 wireless nodes were then placed in a circular network of radius R, centered on the network of base stations, with R selected to achieve a wireless node density of 1 × 10−3 nodes/m2. By ensuring that the network of base stations extends beyond the edge of the circular network containing the wireless nodes, we reduce edge effects. For the results presented in this section, each experiment detailed above was repeated 5000 times, and the spectral efficiency of a randomly selected link in the center-most cell was collected and averaged to further mitigate edge effects. The transmit power of each wireless node was set according to Equation (13.119) with Pmax = ∞ (to simulate the sufficient-power case) or Pmax = 200 mW. We assumed Gt = 10−5 mα (note that the units of Gt are meters raised to the path-loss exponent α), a thermal noise power of 10−14 W, and α = 4.
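The topology-generation step just described can be sketched compactly. This is a deliberately stripped-down illustration rather than the full experiment: it uses fewer nodes than the 4000 above, re-centers the base-station network on the base station closest to the origin, and stops after assigning transmit powers via the power control of Equation (13.119); the receiver processing and spectral-efficiency estimation are omitted, and all variable names are our own.

```python
import math
import random

# Stripped-down sketch of the topology generation described in the text
# (fewer nodes, and no receiver processing). Values are illustrative.
random.seed(11)
rho_w = 1e-3                 # wireless-node density (nodes/m^2)
ratio = 0.10                 # relative density of base stations to wireless nodes
n_w   = 500                  # wireless nodes (the text uses 4000)
R     = math.sqrt(n_w / (math.pi * rho_w))            # disk radius giving rho_w
n_bs  = int(ratio * rho_w * math.pi * (2.0 * R) ** 2)  # base stations in radius 2R
p_t, G_t, alpha, P_max = 1e-11, 1e-5, 4.0, 0.2

def disk_point(radius):
    """Uniform random point in a disk of the given radius."""
    r = radius * math.sqrt(random.random())
    th = 2.0 * math.pi * random.random()
    return (r * math.cos(th), r * math.sin(th))

bs = [disk_point(2.0 * R) for _ in range(n_bs)]
# Re-center the base-station network so that a base station sits at the origin.
cx, cy = min(bs, key=lambda p: p[0] ** 2 + p[1] ** 2)
bs = [(x - cx, y - cy) for (x, y) in bs]

nodes = [disk_point(R) for _ in range(n_w)]
powers = []
for (x, y) in nodes:
    r_link = min(math.hypot(x - bx, y - by) for (bx, by) in bs)  # closest base station
    powers.append(min(p_t * r_link ** alpha / G_t, P_max))  # power control (13.119)

frac_capped = sum(1 for p in powers if p == P_max) / n_w
print(n_bs, max(powers), frac_capped)
```

Most nodes end up transmitting at the 200 mW cap with these densities, which is consistent with the wide spread of link lengths in random cells.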
Results of Monte Carlo simulations and the asymptotic expression given by
Equation (13.155) for systems with unlimited transmit powers per node are
shown in Figure 13.14. Note that the simulations match the asymptotic results
Figure 13.15 Mean spectral efficiency of uplink communications with random cells and 200 mW transmit power limit per node. The base station and wireless node densities are denoted by ρt and ρw. (Axes: number of base station antennas versus mean spectral efficiency in b s−1 Hz−1; curves for ρt/ρw = 0.2, 0.1, 0.05 and the asymptotic prediction.)
to within 10% when nr ≥ 13 for a relative tethered-to-wireless node density of 20%. For lower relative densities, the convergence is slower: for 10% relative density, the simulations match the asymptotic expression to within 10% only when nr ≥ 20, and for 5% relative density only when nr ≥ 46. The rate of convergence for random cells is slower than that for hexagonal cells because the range of transmit powers is much larger for random cells.
Simulation results of systems with a 200 mW transmit power limit are illustrated in Figure 13.15. The target received power pt was set such that the target
SNR, pt /σ 2 = 30 dB. For relative tethered-to-wireless node densities of 20% and
10%, the simulated mean spectral efficiencies are within 10% of the asymptotic
prediction when nr ≥ 10. For 5% relative density, the agreement is within 10%
for nr ≥ 13. The convergence of the simulated mean spectral efficiencies to the
asymptotic values is faster for systems with limited transmit power as the range
of transmit powers in the network is smaller when there is a bound on the trans-
mit power. These simulations indicate that the asymptotic expressions are useful
characterizations of realistic systems.

The cost of random cells
A natural question at this juncture is how much spectral efficiency is lost by a random placement of base stations compared with a uniform placement on a hexagonal grid. While in real life propagation losses are not
Figure 13.16 Mean spectral efficiency of the uplink with random cells and hexagonal cells and transmit power limited to 200 mW. Solid and dashed lines represent hexagonal and random cells respectively. (Axes: number of receiver antennas at base stations versus mean spectral efficiency in b/s/Hz per link; curves for relative densities of 20%, 10%, and 5%.)
translation invariant as assumed here, this comparison sheds some light on the performance differences between a network with a completely random placement of base stations (according to a Poisson point process) and a network where base stations are placed with the optimal packing density, at the hexagonal lattice sites.
For systems with limited transmit powers, we can numerically evaluate and plot the spectral-efficiency equations corresponding to random and hexagonal cells, as shown in Figure 13.16, where the solid and dashed lines represent hexagonal and random cells, respectively. The transmit power budget was 200 mW and the wireless node density was 10−3 nodes/m2. We simulated relative densities of
base stations to wireless nodes of 5%, 10%, and 20% as shown in the plot. Note
that the difference in mean spectral efficiencies diminishes with the number of
antennas. However, for high base station densities the mean spectral efficiency
for random cells is significantly lower. For instance, with 10 antennas at the base
stations and 20% relative density of base stations to wireless nodes, the mean
spectral efficiency with hexagonal cells is twice that of random cells.
When base station density and/or transmit power budgets are high, the mean spectral efficiency given by Equation (13.143) can be rewritten in terms of the effective base station density ρh = 1/(π d²) as

$$\langle c \rangle \approx \log_2\!\left(1 + 1.95\, G_\alpha \left(\frac{n_r\, \rho_h}{\rho_w}\right)^{\alpha/2}\right), \tag{13.156}$$

which, when compared to the mean spectral efficiency for random cells given by Equation (13.155), indicates a factor of approximately 2 that appears inside the logarithm. At low effective SINRs, we can use the approximation log(1 + x) ≈ x, which indicates that for a given density of base stations, a hexagonal-grid placement of base stations yields approximately twice the mean spectral efficiency of networks with base stations at random locations.
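The factor-of-two statement can be checked directly on the logarithm: a constant factor on the effective SINR (here 1.95) carries through to the spectral efficiency almost exactly at low SINR but is strongly compressed at high SINR. The operating points below are arbitrary.

```python
import math

# How a 1.95x SINR advantage translates into spectral efficiency at different
# operating points; at low SINR, log2(1 + 1.95 x)/log2(1 + x) -> 1.95.
for x in (10.0, 1.0, 0.1, 0.01, 0.001):
    gain = math.log2(1.0 + 1.95 * x) / math.log2(1.0 + x)
    print(x, round(gain, 3))
```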
This indicates that several-fold (but not orders-of-magnitude) gains in mean spectral efficiency can be achieved by distributing base stations evenly in planar networks rather than randomly on the plane, and furthermore that the difference diminishes with the number of base station antennas.
13.6 Summary
This chapter covers the main concepts of cellular networks with multiantenna base stations. We have primarily focused on the uplink part of the network, in part because the uplink is better understood from an information-theoretic perspective, but also because the downlink typically uses orthogonal multiple-access techniques such as orthogonal CDMA, which are technically challenging on the uplink because signals pass through a different channel from each mobile user to the base station. Additionally, on the downlink the base station can perform accurate power control, since it draws from the same total power budget for the transmissions to each mobile node. We have also focused on aspects of cellular networks relating to adaptive receivers, and in particular on the distribution of users in space, which is not typically covered in texts. We refer the interested reader to texts that focus on more practical aspects of cellular networks, such as more sophisticated power control and channel equalization techniques; see, for example, Reference [328].

Problems
13.1 Consider a multiple-access channel with three transmitters. Show that the
sum capacity of this channel is achievable using TDMA and provide expressions
for the fraction of time used by each of the three transmitters.
13.2 Show that the sum capacity of the two user broadcast channel with addi-
tive Gaussian noise with power constraint P = P1 + P2 and channel coefficients
of h1 and h2 with ||h1 || < ||h2 || is given by Equation (13.14). You may wish to
use the optimization techniques described in Section 2.12.
13.3 Consider a narrow-band broadcast channel with transmit power P, and a
Poisson distributed number of receivers distributed i.i.d. with uniform probability
in a circle of radius R, and mean number of receivers μ. Assuming the inverse-
power-law path-loss model and independent Rayleigh fading between all nodes,
derive the CDF of the sum capacity of the broadcast channel. Note that the sum
capacity of a broadcast channel with K users and narrow-band fading coefficients
h1 , h2 , . . . , hK is
 
$$\max_{k} \log_2\!\left(1 + \frac{P\, \|h_k\|^2}{\sigma^2}\right). \tag{13.157}$$
13.4 Consider a cellular network approximated by a circular cell of radius Rc .
Assume that there is a Poisson distributed number of single-antenna wireless
nodes in the cell with mean number of nodes equal to ρw π Rc2 , independent
Rayleigh fading between all nodes, and a transmit power budget of P per wireless
node. Assume that the base station is connected to an infrastructure link with
capacity B log2 (1+g Pb /σb2 ), where g is a unit mean exponential random variable.
Assuming that the wireless nodes have a bandwidth of B, find the probability that
the sum capacity of the multiple-access channel formed by the wireless nodes and
the base station at the origin exceeds the capacity of the infrastructure link.
13.5 Consider the network model of Problem 13.4 but assume that the average received powers from the n interferers in the network are i.i.d. random variables taking the values P1 or P2 with probabilities q and 1 − q respectively.
(a) Find the asymptotic spectral efficiency when the number of streams M = 1,
and the number of receiver antennas nr and interferers n go to infinity with
a constant ratio a. Hint: you will find the formula for the roots of a cubic
polynomial useful in this question.
(b) Explain a context where this network model could be useful.
(c) Write a computer simulation to verify your answer to part (a) with P1 = 1,
P2 = 0.5 and q = 0.9.
13.6 Plot and qualitatively describe the capacity region of the vector multiple-access channel with two users. Explain how it differs from the multiple-access channel capacity region for single-antenna systems.
13.7 Find the CDF of the SINR for a multiantenna receiver in a circular
cell communicating with a single-antenna transmitter in the presence of equal-
transmit power, single-antenna interferers distributed according to a Poisson
point process on the plane, assuming that the receiver knows only the spatial
covariance matrix of the interferers in the circular cell as in Section 13.3.5. In
other words, the out-of-cell interference is modeled as white noise here.
13.8 Consider a hexagonal-cell system with no out-of-cell interference, the power control algorithm of Equation (13.119) with Pmax = ∞, and single-antenna transmitters with nr antennas at the base station. You may approximate
the hexagonal cell by a circle.
(a) Find the CDF of the spectral efficiency of a random link in this system.
(b) Using the previous result and the fact that the integral of 1 minus the CDF
equals the mean of a positive random variable, compute the mean spectral
efficiency of this system.
(c) Assuming that a reuse factor of K is required for the assumption of no out-of-cell interference to hold, compare the mean area spectral efficiency obtained from the asymptotic analysis of Equation (13.143) with a reuse factor of 1 against your answer in the previous part. Make sure you take into account the penalty on the area spectral efficiency due to the reuse factor of K.
14 Ad hoc networks

14.1 Introduction
As described in Chapter 4, ad hoc wireless networks are networks with no centralized control. Compare this with cellular networks, where the base station/access point acts as a controlling authority to synchronize and coordinate transmissions.

14.1.1 Capacity scaling laws of ad hoc wireless networks
The capacity of ad hoc wireless networks is difficult to compute as there are a
large number of ways that nodes may cooperate in such a network. For instance,
nodes may assist other nodes in forwarding packets or they may schedule their
transmissions in a manner that is helpful for other nodes. Most existing capacity
results for ad hoc networks have focused on scaling laws. Specifically, capacity
scaling laws describe how the capacity of a network scales with the number or
density of nodes.
A seminal result in this field is that of Gupta and Kumar [132], who consider n nodes distributed randomly in a disk of fixed area and a multi-hop protocol. They found that the per-link capacity in such a network must decay as O(1/√n) (see Section 2.13 for a description of the order-of-growth notation); that is to say, as the number of nodes grows large, the per-link rate must decay inversely with the square root of the number of nodes. Furthermore, they provide a specific traffic pattern that achieves this rate of capacity decay, and show that it is possible to achieve a per-link scaling law of O(1/√(n log n)) with a random traffic pattern.
The main idea behind Gupta and Kumar’s result is that physical links will be
between nearest neighbors and packets will be routed in a multi-hop fashion.
As n increases, the density of nodes increases and the distances between nearest
neighbors decrease as the square root of node density. The physical links between
nearest neighbors are found to be able to maintain the same data rate with high
probability. The reduction in capacity results from the increased packet-forwarding burden that arises from shorter hop lengths, which necessitates a greater number of hops for a packet to traverse any constant distance.
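The shrinking-hop-length intuition is easy to visualize numerically: with the area held fixed, quadrupling the number of nodes should roughly halve the typical nearest-neighbor distance. A small illustrative simulation (parameters arbitrary):

```python
import math
import random

# Mean nearest-neighbor distance of a representative node among n uniform
# points in the unit disk; with fixed area this shrinks like 1/sqrt(n).
random.seed(5)

def mean_nn_distance(n, trials=500):
    total = 0.0
    for _ in range(trials):
        pts = []
        for _ in range(n):
            r = math.sqrt(random.random())       # uniform over the unit disk
            th = 2.0 * math.pi * random.random()
            pts.append((r * math.cos(th), r * math.sin(th)))
        x0, y0 = pts[0]                          # a representative node
        total += min(math.hypot(x0 - x, y0 - y) for (x, y) in pts[1:])
    return total / trials

d_100, d_400 = mean_nn_distance(100), mean_nn_distance(400)
print(d_100, d_400, d_100 / d_400)   # ratio near 2, i.e. distance ~ 1/sqrt(n)
```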
Note that most works in the literature assume that each node can only per-
form half-duplex communications, that is, a node may not transmit and receive
at the same time. Hence, nodes that are forwarding packets on behalf of others
Figure 14.1 Illustration of Ozgur hierarchical cooperation protocol.

may not receive data while transmitting and vice versa. This assumption arises
because simultaneous transmission and reception of signals is physically very
difficult for wireless systems due to the large discrepancy between the trans-
mitted and received signal powers, which would result in most of the dynamic
range of analog-to-digital converters at a receiver being taken up by its own
transmit signal. By using multiantenna technology, however, this overwhelming
self-interference can be reduced to manageable levels, resulting in feasible oper-
ation of simultaneous transmit and receive systems such as the system reported
in Reference [37].

The per-link upper bound of O(1/√n) from [132] was shown to be achievable
for random traffic patterns by Franceschetti et al. [101] using techniques from
percolation theory. They show that as the number of nodes increases in a fixed
area, high throughput paths spanning the network naturally form. These paths
have high throughput because they contain nodes that are separated from one
another by small distances. Packets going from a source to a destination far away
are then routed through these high-throughput paths. Routing packets in this

manner is shown to support a per-link capacity scaling of O (1/ n). Ozgur et al.
[240] proposed a hierarchical cooperation scheme in which a network is divided
into subnetworks which are further divided into sub-subnetworks and so on. This
hierarchical division is combined with distributed MIMO communications where
multiple nodes in a subnetwork cooperate as a virtual antenna array to transmit
data over long distances. Figure 14.1 illustrates how nearby nodes are used in
a link between the node labeled S and the node labeled D. Each square in the
figure represents a sub network in which the same virtual array scheme is used.
Ozgur et al. found that with sufficient levels of hierarchy, it is possible to achieve
a per-link capacity scaling of O(n^(−ε)), where ε > 0 can be made arbitrarily small by using sufficiently many hierarchical levels.
Franceschetti et al. [102] used a degrees-of-freedom argument that arises from electromagnetic propagation and bounded the per-link rate of ad hoc wireless networks by O((log n)²/√n). They used a cut-set bound (for example, see [68]) to derive this result. The cut-set bound essentially states that the sum of all the individual data rates achievable with arbitrarily low probability of error between any set of source and destination nodes is less than or equal to the capacity of a MIMO link where the source nodes act as a unified transmitter and the destination nodes act as a unified receiver. The term cut-set refers to the partitioning of the nodes in a network into a set of transmitters and a set of receivers, where the boundary between the two sets is known as the cut. In Reference [102], a circular network is cut into concentric circles. The authors showed that the MIMO link formed by nodes in the inner circle communicating with nodes outside the circle has a number of degrees of freedom bounded by O(√n log n), which limits the capacity of the MIMO link formed between nodes on one side of the cut and the other. Adding up the capacities of all the links between nodes inside the inner circle and outside the outer circle leads to an O((log n)² √n) scaling law on the total of the capacities of the n links in the network, which in turn leads to the per-link scaling law of O((log n)²/√n). The discrepancy between this result and others, such as the Ozgur result, is noted by Franceschetti et al. to be an artifact of unrealistic channel models in these other works.

Gupta and Kumar capacity scaling law
The work of Gupta and Kumar [132] analyzes the throughput capacity scaling
of a wireless network with n nodes distributed randomly and with uniform prob-
ability in the interior of a disk with radius R. Note that they also consider the
surface of a sphere in their analysis.
They define two models for the success or failure of a link: the protocol model and the physical model. In the protocol model, a link between node i (transmitter) and node j (receiver) of length r_ij is successful if

(1) r_ij ≤ ν, where ν is a threshold that determines the maximum distance over which a link can successfully be closed, that is, a link is successful only if the source and destination are sufficiently close to each other;
(2) there are no other nodes transmitting in a circle of radius (1 + Δ)ν around node j. The term Δ represents a guard zone around the given receiver.
For the physical model, a link is considered successful if the signal-to-interference-
plus-noise ratio (SINR) exceeds a defined threshold. An additional significant
contribution of this work is the introduction of transport capacity as a metric for
the performance of an ad hoc wireless network. The transport capacity of a link is
the distance-weighted throughput capacity of a link and is the product of the data
rate (bits/second) and distance over which the bits in the link are transported
(meters). Transport capacity thus has nominal units of bit-meters/second.
For the protocol model, the authors show that the per-link throughput capacity is of order 1/√(n log₂(n)) with probability approaching unity as n → ∞. That is to say, there exists a constant K1 for which a per-link throughput capacity of

$$\frac{K_1}{\sqrt{n \log_2(n)}} \tag{14.1}$$

is achievable with probability approaching 1 as n → ∞, and there exists another constant K2 for which a per-link throughput capacity of

$$\frac{K_2}{\sqrt{n \log_2(n)}} \tag{14.2}$$

is not achievable with probability approaching 1 as n → ∞.


For the physical model, the authors show that the per-link throughput capacity is bounded from above by a function of order 1/√n with probability approaching unity as n → ∞. That is to say, there exists a constant K3 for which a per-link throughput capacity of

$$\frac{K_3}{\sqrt{n}} \tag{14.3}$$

is not achievable with probability approaching 1 as n → ∞.

Additionally, they show that it is possible to achieve a per-link throughput capacity of order 1/√(n log₂(n)) with probability approaching unity as n → ∞. That is to say, there exists a constant K4 for which a per-link throughput capacity of

$$\frac{K_4}{\sqrt{n \log_2(n)}} \tag{14.4}$$

is achievable with probability approaching unity as n → ∞. Note that this latter lower bound was exceeded for a square network by the scheme proposed by Franceschetti et al. [101].
Hierarchical cooperation
In this section, we briefly describe the hierarchical cooperation scheme of Ozgur
et al. Please refer to Reference [240] for a complete description. Suppose that
n randomly selected source–destination pairs wish to communicate in a square
network of fixed area A in which there are n nodes, that is, each node is a
source as well as a destination. The channel between a pair of nodes separated
by distance r is modeled by a single coefficient of the form r^(−α/2) e^(−iθ), where α is the path-loss exponent and θ is uniformly distributed from zero to 2π, and the coefficients are independent between all pairs of nodes.
They prove that if there exists a communication strategy with network throughput K n^b, then there exists another strategy where the throughput is at least Kj n^(1/(2−b)) with 0 ≤ b < 1, with high probability (that is, approaching 1 as n → ∞).
By applying this recursively, the total throughput capacity scaling approaches n, that is, linear capacity scaling with the number of nodes.
They introduce a protocol that achieves this scaling law by communicating
in stages. Suppose we have a scheme with a throughput of K0 (for example,
time-division-multiple-access (TDMA)). Then a source S communicates with its
destination D as illustrated in Figure 14.1. The network is divided into clusters
of M nearby nodes. For simplicity, we assume that each square in Figure 14.1
contains exactly M nodes. First, the source S transmits a block of L bits to each
of the M − 1 nodes in its cluster, which is illustrated by the arrows from S to
its nearby nodes in Figure 14.1. This transmission is done using TDMA with a
reuse factor of nine, that is, one in every nine clusters can communicate at the
same time. The number nine comes from the manner in which TDMA is done in
this example where the square network is divided into smaller squares, each of
which contains a cluster of nodes. At each time slot, only one out of nine clusters is allowed to transmit, such that when a given cluster is transmitting, there
can be no other transmitting clusters that are adjacent. For simplicity, assume
that there are exactly M nodes in each cluster and assume that there exists a
communications strategy in this network where n nodes can communicate with
throughput K nb where b > 0. The communication in the network takes place
in stages. In the first stage each node in a given cluster exchanges L bits it
wishes to communicate with all the remaining nodes in the cluster. There are
approximately M 2 exchanges of L bits that have to take place and since there is
a strategy for communicating with throughput K nb with n nodes, this exchange
of information takes 2M 2−b L/K time units. Note the factor of two is required to
handle the special case of the source and destination located in adjacent clusters
(see Section IV of Reference [240] for details). Since only one of nine clusters can
transmit at a given time, across the entire network, this exchange of information
takes 18M 2−b L/K time units.
In the next stage, for each source–destination pair, the M nodes in its cluster
transmit the LM bits in a multiple-input multiple-output (MIMO) fashion to
the M nodes in the cluster of the destination (dashed lines in Figure 14.1) using
TDMA and require a total of 2 nsym n time slots, where nsym is the number of symbols required to transmit the M L bits in the MIMO phase from the source cluster to the destination cluster. The factor of two again is required to handle the
special case where the source and destination clusters are adjacent. Note that
the MIMO communication here does not require channel information at the
transmitter and thus corresponds to the uninformed transmitter MIMO channel
described in Section 8.3.3.
The M nodes in the receive cluster each quantize their nsym received symbols to Q bits, and transmit them to the nodes in the destination cluster one at a time using the scheme used to distribute bits to the transmit cluster in the first stage. This step is illustrated by the arrows going into D from its nearby nodes in Figure 14.1. This stage requires 2 nsym Q M^(2−b)/K time slots, as each node in the cluster now has to transmit Q bits per symbol, for nsym symbols, to every other node in the cluster. Thus, the total number of bits exchanged is nsym Q M².
We assume that this exchange of information uses the communication strategy
which achieves a throughput of K nb for a network with n nodes. An additional
factor of two occurs to handle the case where source and destination clusters are
adjacent.
The total throughput in the network T(n) for this system is now

T(n) \geq \frac{n M L}{18 M^{2-b} L/K + 2 n_{sym} n + 18 n_{sym} Q M^{2-b}/K}.    (14.5)

By setting M = n^{1/(2-b)}, Equation (14.5) is maximized, yielding the following bound
on the total throughput:

T(n) \geq \frac{L}{18 L/K + 2 n_{sym} + 18 n_{sym} Q/K}\, n^{1/(2-b)}.    (14.6)

Hence, starting with a communication strategy that achieves a throughput of
K n^b, we have constructed a scheme that can achieve a throughput of K_j n^{1/(2-b)}.
If we start with TDMA, which can achieve a per-link throughput of K_0,
we can achieve a throughput of K_1 n^{1/2}, which in turn implies that we can achieve a
throughput proportional to n^{2/3}, and so on, where the K_j are constants. Continuing this
recursion, we can achieve a throughput of K_j n^{1-\epsilon}, where \epsilon is arbitrarily close to zero.
Note that because the pre-constant for this hierarchical cooperation scheme
decays so rapidly with the number of levels of hierarchy, using multiple levels
of hierarchy is beneficial only when the number of nodes in the network is very
large as shown in Problem 14.1. This underscores the fact pointed out in Section
VI-A of Reference [240] that capacity scaling laws are useful as coarse architec-
tural guidelines but may not indicate the best approach communications systems
designers should take when designing real systems. We should also stress that
the analysis here assumed the specific protocol described in Reference [240], which
was designed to achieve a capacity scaling law rather than to serve as a protocol suited
to a fixed, finite number of nodes. Thus, it is likely that the general hierarchical
protocol can be improved for fixed numbers of nodes.
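The exponent recursion just described — each level maps a throughput exponent b to 1/(2 - b), starting from b = 0 for TDMA — can be checked with a few lines of code. This is a sketch of the arithmetic only, not of the protocol itself:

```python
# Exponent recursion for hierarchical cooperation:
# a scheme with throughput ~ n^b yields one with throughput ~ n^(1/(2-b)).
def next_exponent(b):
    return 1.0 / (2.0 - b)

b = 0.0            # level 0: TDMA, constant per-link throughput (b = 0)
exponents = [b]
for _ in range(9):
    b = next_exponent(b)
    exponents.append(b)

# exponents = 0, 1/2, 2/3, 3/4, ..., i.e., k/(k+1) after k levels,
# approaching 1 (linear scaling) as the number of levels grows.
```

The slow approach of k/(k + 1) toward 1, combined with the rapidly shrinking pre-constant, is exactly why Problem 14.1 finds that extra hierarchy levels pay off only for very large networks.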

14.2 Multiantenna links in ad hoc wireless networks

Besides capacity scaling laws, one can also analyze the performance of ad hoc
networks with multiantenna nodes using specific system assumptions. Much work
has been done in characterizing the achievable rates of multiantenna links in
ad hoc wireless networks with specific receiver architectures [123, 149, 9, 165,
194, 321]. While this type of approach may not give a great deal of insight into
the ultimate performance limits of such systems, they are immensely useful in
practical scenarios, and are particularly attractive as closed-form results can
often be obtained. For the remainder of this chapter, we shall focus on analyses
of systems with specific system parameters, such as the type of multiantenna
receiver used.
The performance of multiantenna links in ad hoc wireless networks varies
depending on the availability of channel state information (CSI) and the
transmitter/receiver type. Note that in this chapter, the term channel state information
refers to the statistical properties of the received signal, or to the channel
coefficients between nodes. This usage is in contrast to Section 5.3.4, where the
channel state information at the transmitter refers to the interfering signal itself.
Asymptotic techniques have proved useful in characterizing the spectral effi-
ciency for such systems. Of particular interest is the Marcenko–Pastur theorem
which is described in Section 3.6.1. We first present a technique to find the
limiting spectral efficiency of a representative wireless link in an ad hoc wireless
network when the transmitters have CSI corresponding to the channel coefficients
between themselves and their target receivers. We follow this with a technique
to compute the spectral efficiency when transmitters have no CSI.

14.2.1 Asymptotic spectral efficiency of ad hoc wireless networks with limited transmit channel-state information and minimum-mean-square-error (MMSE) receivers
In this section, we shall apply the analysis of Section 13.4 to an ad hoc wireless
network model.1 We shall consider the spectral efficiency of a representative
link (link 1), in the presence of spatially distributed interferers. Assume that the
system model of Section 13.4 holds. Recall that each transmitter has n_t antennas
and transmits independent data streams from M channel modes as defined in
Section 13.4. Assume that in the limit as n, n_r \to \infty with n/n_r = a, the PDF
of received interference powers \Psi_{n_r}(\tau) converges to a limiting function \Psi(\tau).
Then, the spectral efficiency of link 1 given by Equation (13.96) converges with
probability 1 to

c_1 = \sum_{j=1}^{M} \log_2\left(1 + n_r^{\alpha/2}\, \lambda_j^{\{\infty\}}\, P_{1j}\, \gamma_1\, \beta\right),    (14.7)

where \beta is the unique, non-negative solution to the equation

1 - \sigma^2 \beta = \beta\, a \int_0^\infty \frac{x}{1 + x\beta}\, dH(x).    (14.8)
In Equation (14.7), P_{1j} is the power allocated to the jth stream by the representative
transmitter and \gamma_1 is the path loss between the representative transmitter
and receiver. The array gain resulting from the transmit precoding is captured
in the term \lambda_j^{\{\infty\}}, which is the limiting value of the jth eigenvalue of an n_r \times n_r
Wishart matrix G G^\dagger / n_r, where G is an n_r \times n_t matrix with i.i.d. CN(0, 1)

^1 Portions of this section are © 2012 IEEE. Reprinted, with permission, from Reference [125].

Figure 14.2 Illustration of wireless network with representative link.

entries. In particular, if n_r, n_t \to \infty such that n_r/n_t = d > 0,

\lambda_1^{\{\infty\}} = \lambda_2^{\{\infty\}} = \cdots = \lambda_M^{\{\infty\}} = (1 + d)^2,    (14.9)

and if n_t is a finite constant,

\lambda_1^{\{\infty\}} = \lambda_2^{\{\infty\}} = \cdots = \lambda_M^{\{\infty\}} = 1.    (14.10)

Recall that a = n/n_r, \beta is the limit of a normalized version of the SINR on
each stream, and x is a dummy integration variable. Here, H(x) is the limit of
the empirical distribution function of the received interference powers,

H(x) = \frac{1}{M} \sum_{j=1}^{M} \int d\tau\, f_j(\tau)\, \Psi(x/\tau).    (14.11)
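For a given limiting distribution H, the fixed point of Equation (14.8) is easy to find numerically, since the left-hand side minus the right-hand side is monotone in \beta. The sketch below uses a toy point-mass distribution of interference powers — an illustrative assumption of ours, chosen so that the fixed point also has a closed form to check against — rather than the full network model:

```python
import math

def solve_beta(a, sigma2, h_atoms):
    """Solve 1 - sigma^2*beta = a*beta*sum_i p_i*x_i/(1 + x_i*beta), as in
    Equation (14.8), for beta >= 0 by bisection. h_atoms is a list of
    (x_i, p_i) atoms of the limiting interference-power distribution H."""
    def g(beta):
        integral = sum(p * x / (1.0 + x * beta) for x, p in h_atoms)
        return sigma2 * beta + a * beta * integral - 1.0
    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:          # g is increasing in beta, so expand the bracket
        hi *= 2.0
    for _ in range(200):        # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Point mass at x = 1 with a = 1, sigma^2 = 0.1:
# 1 - 0.1*beta = beta/(1 + beta), i.e., beta^2 + beta - 10 = 0.
beta = solve_beta(a=1.0, sigma2=0.1, h_atoms=[(1.0, 1.0)])
```

For this toy case the quadratic gives \beta = (\sqrt{41} - 1)/2 \approx 2.70, which the bisection reproduces.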

We shall now construct a spatially distributed network model in which nodes
transmit with random powers that do not depend on their distance from the
representative receiver. We shall show that an appropriately normalized version
of the spectral efficiency converges with probability 1 to an asymptotic limit as
given in Equation (14.7).

14.2.2 Spatially distributed network model

Consider a circular wireless network of radius R with n wireless transmitters at
random i.i.d. points in the circle such that

n = \rho\, \pi R^2.    (14.12)
The representative receiver R_1 is assumed to be at the origin of the circle and
the representative transmitter T_1 is at a distance r_1 from R_1, as shown in Figure
14.2. The n interferers are in links with other receivers whose locations do not
impact the representative link. Let r_\ell denote the distance between transmitting
node \ell and the origin, and let the representative link be denoted link 1. The path
loss is \gamma_\ell = G_t r_\ell^{-\alpha} with \alpha > 2, which is a standard model for spatially distributed
networks.
As in Section 13.4, we shall assume that the jth node transmits with power
n_r^{\alpha/2-1} P_j, where P_j for j = 1, 2, \ldots, n are i.i.d. random variables. This assumption
enables the asymptotic analysis of the SINR as n_r \to \infty. Without this assumption,
as n_r \to \infty, the thermal noise would eventually dominate the interference,
as the MMSE receiver reduces interference to levels comparable with
the thermal noise, and the system would no longer be interference limited.
It is known from Reference [123] that as n_r \to \infty, the SINR in the interference-limited
regime for systems without transmit CSI grows as n_r^{\alpha/2}, as is pointed out
in Section 13.5.1. To avoid singularities, we define a normalized SINR for the jth
data stream from the representative receiver as

\eta_{n_r}^{\{j\}} = n_r^{-\alpha/2}\, \mathrm{SINR}_j.    (14.13)

As the number of interferers n and the number of receive antennas n_r \to \infty with
n/n_r = a, the normalized SINR for stream \ell, \eta_{n_r}^{\{\ell\}}, approaches an asymptotic limit
with probability 1 as follows:

\eta_{n_r}^{\{\ell\}} \to \eta^{\{\ell\}} = P_{1\ell}\, \lambda_\ell^{\{\infty\}}\, G_t\, r_1^{-\alpha}\, \beta,    (14.14)
where the limit of the normalized, per-stream SINR \beta satisfies the following
equation,

\frac{2\pi^2 \rho\, (G_t \beta)^{2/\alpha}}{\alpha} \sum_{j=1}^{M} \left\langle P_j^{2/\alpha} \right\rangle \csc\!\left(\frac{2\pi}{\alpha}\right) - \frac{2\pi\rho\beta}{\alpha} \int_0^\infty d\tau\, \frac{\tau^{-2/\alpha}}{1 + \tau\beta} \sum_{j=1}^{M} \int_{\tau/b}^{\infty} dx\, f_j(x)\, x^{2/\alpha} + \beta\sigma^2 = 1,    (14.15)

where the parameter b = (\pi\rho\, n_r/n)^{\alpha/2}, and \left\langle P_j^{2/\alpha} \right\rangle is the expected value of the
transmit power allocated by the interferers to their jth strongest stream, raised
to the power 2/\alpha. This result is proved in Reference [127].
The relationships among the various parameters, such as the SINR, the number of receiver
antennas, the interferer density, and the path loss, are not clear from Equation (14.15). A
few approximations can be made to yield additional insight into how the various
factors contribute to the limiting SINR. From Problem 14.4, we note that if
n/nr is very large, that is, the number of nodes in the network is much larger
than the number of antennas per receiver, the transmit power limit per node
implies that the second term on the left-hand side of Equation (14.15) is small.
Furthermore, we assume that the thermal noise power is small, which implies
that the third term on the left-hand side of Equation (14.15) is small. Using
these approximations, we have

\frac{2\pi^2 \rho\, (G_t \beta)^{2/\alpha}}{\alpha} \sum_{j=1}^{M} \left\langle P_j^{2/\alpha} \right\rangle \csc\!\left(\frac{2\pi}{\alpha}\right) \approx 1

\beta \approx \frac{1}{G_t} \left[ \frac{\alpha}{2\pi^2 \rho \sum_{j=1}^{M} \left\langle P_j^{2/\alpha} \right\rangle} \sin\!\left(\frac{2\pi}{\alpha}\right) \right]^{\alpha/2},    (14.16)

which, when substituted into Equation (14.14), yields the normalized SINR in the
interference-limited regime on the \ell th stream, \eta^{\{\ell\}}:

\eta^{\{\ell\}} \approx \lambda_\ell^{\{\infty\}}\, P_{1\ell}\, r_1^{-\alpha} \left[ \frac{\alpha}{2\pi^2 \rho \sum_{j=1}^{M} \left\langle P_j^{2/\alpha} \right\rangle} \sin\!\left(\frac{2\pi}{\alpha}\right) \right]^{\alpha/2}
 = \lambda_\ell^{\{\infty\}}\, P_{1\ell} \left[ \frac{\alpha}{2\pi^2 \rho\, r_1^2 \sum_{j=1}^{M} \left\langle P_j^{2/\alpha} \right\rangle} \sin\!\left(\frac{2\pi}{\alpha}\right) \right]^{\alpha/2}.    (14.17)

Defining G_\alpha = \left[\frac{\alpha}{2\pi} \sin\!\left(\frac{2\pi}{\alpha}\right)\right]^{\alpha/2}, rescaling the normalized SINR by n_r^{\alpha/2}, and
summing the contribution from M streams yield the following approximation
for the spectral efficiency of link 1 in the interference-limited regime when n_r is
large,

c_1 \approx \sum_{\ell=1}^{M} \log_2\!\left(1 + \lambda_\ell^{\{\infty\}}\, P_{1\ell}\, G_\alpha \left[ \frac{n_r}{\pi\rho r_1^2 \sum_{j=1}^{M} \left\langle P_j^{2/\alpha} \right\rangle} \right]^{\alpha/2} \right).    (14.18)

We illustrate the utility of Equation (14.18) by using the constant transmit
power and 2-class models from Section 13.4.4. For the constant power model
(recall that the transmitters use equal power on each of their M transmit streams
here), Equation (14.18) gives a spectral efficiency c_1^{ep} (recall that the
ep superscript denotes equal-power streams) of

c_1^{ep} \approx \sum_{\ell=1}^{M} \log_2\!\left(1 + \lambda_\ell^{\{\infty\}}\, G_\alpha \left[ \frac{n_r}{\pi\rho r_1^2 M} \right]^{\alpha/2} \right).    (14.19)

For the 2-class model with link 1 assigned to the first class, Equation (14.18)
gives a spectral efficiency c_1^{2c} of

c_1^{2c} \approx \sum_{\ell=1}^{M} \log_2\!\left(1 + \lambda_\ell^{\{\infty\}}\, P_1\, G_\alpha \left[ \frac{n_r}{\pi\rho r_1^2 M \left( q\, P_1^{2/\alpha} + [1-q]\, P_2^{2/\alpha} \right)} \right]^{\alpha/2} \right).    (14.20)
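The equal-power approximation is simple enough to evaluate directly. The sketch below is our own illustration under stated assumptions — \lambda_\ell^{\{\infty\}} = 1 (finite n_t), unit power split equally across streams so that the interference term reduces to n_r/(\pi\rho r_1^2 M), and illustrative parameter values:

```python
import math

def c_ep(nr_over_rho_area, M, alpha=4.0, lam=1.0):
    """Approximate equal-power spectral efficiency in the interference-limited
    regime: sum over M streams of log2(1 + lam*G_alpha*(c0/M)^(alpha/2)),
    where c0 = n_r / (pi * rho * r_1^2)."""
    G_alpha = (alpha / (2.0 * math.pi)
               * math.sin(2.0 * math.pi / alpha)) ** (alpha / 2.0)
    sinr = lam * G_alpha * (nr_over_rho_area / M) ** (alpha / 2.0)
    return M * math.log2(1.0 + sinr)

# With n_r/(pi*rho*r_1^2) = 10 and alpha = 4, compare stream counts.
rates = {M: c_ep(10.0, M) for M in (1, 2, 4)}
```

The per-stream SINR falls as M grows but the stream count multiplies the rate, so an intermediate M wins — the trade-off that Equation (14.24) later resolves in closed form.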

14.2.3 Asymptotic spectral efficiency without transmit channel-state information
A similar analysis can be carried out for systems with no transmit CSI, as first
presented in [121] and subsequently in [123].^2 The main difference between the
two systems is that, without any transmit CSI, transmitters are not able to
steer their transmit signals in directions that are stronger between themselves
and their target receivers. Hence, the loss in performance comes from the
unavailability of the beamforming gain that transmit link CSI provides. Note that
the beamforming gain with transmit link CSI comes from the singular values of
the channel matrix between the transmitter and receiver, which is represented
by the \lambda_\ell^{\{\infty\}} terms in Equation (14.18).
Without transmit CSI, it can be shown that as the number of interferers n and the
number of receive antennas n_r \to \infty with n/n_r = a, the normalized SINR for
stream \ell, \eta_{n_r}^{\{\ell\}}, approaches an asymptotic limit with probability 1 as follows:

\eta_{n_r}^{\{\ell\}} \to \eta^{\{\ell\}} = P_{1\ell}\, G_t\, r_1^{-\alpha}\, \beta,    (14.21)

where the proof follows the steps used to prove Equation (14.14), with
\lambda_\ell^{\{\infty\}} equal to unity since it is not possible to obtain beamforming gain without
transmit CSI, and with the limiting value of \beta given by Equation (14.8) for a general
distribution of received powers (subject to the convergence criteria required for
Equation (14.8)). For the spatially distributed model, the limiting value of the
normalized SINR \beta is given by Equation (14.15).
For the spatially distributed network model, using a similar set of approximations
as that used for systems with transmit link CSI, the mean of the spectral
efficiency in the interference-limited regime, \langle c_1^{nc} \rangle (the superscript nc denotes
a system with no CSI), can be approximated as

\langle c_1^{nc} \rangle \approx \sum_{\ell=1}^{M} \log_2\!\left(1 + P_{1\ell}\, G_\alpha \left[ \frac{n_r}{\pi\rho r_1^2 \sum_{j=1}^{M} \left\langle P_j^{2/\alpha} \right\rangle} \right]^{\alpha/2} \right).    (14.22)

The power allocations in Equation (14.22) can then be optimized to maximize
the mean spectral efficiency. However, Equation (14.22) is nonconvex in the powers
allocated to the streams by the transmitters in the network, which makes the
optimization of the power allocations difficult. For M = 2, the
optimization can be computed in closed form, where either all power is allocated
^2 Portions of this section are © 2007 IEEE. Reprinted, with permission, from Reference [123].

Figure 14.3 Approximate mean spectral efficiency (b/s/Hz) versus the power allocated to
one stream of a two-stream system, for n_r/(\pi\rho r_1^2) = 1 through 7. Note that the
optimal power allocation is to assign all power to one stream or to divide the power
equally between the two streams.

to one stream, or equal power is allocated to both streams, as done in
Reference [120]. For other values of M, it can be numerically verified that the best
power allocation is to allocate zero power to some fraction of the M streams and
equal power to the rest. Tables of values for the numerical optimization can be
found in Reference [120]. Figure 14.3 illustrates the optimal power allocations
for M = 2 for different values of n_r/(\pi\rho r_1^2), with \alpha = 4. Hence, the optimal power
allocation becomes an optimization over the number of equal-power streams. The
mean spectral efficiency with M equal-power streams is

\langle c_1^{nc} \rangle \approx M \log_2\!\left(1 + P\, G_\alpha \left[ \frac{n_r}{\pi\rho r_1^2 M \left\langle P^{2/\alpha} \right\rangle} \right]^{\alpha/2} \right).    (14.23)
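The behavior shown in Figure 14.3 can be reproduced by sweeping the power split of a two-stream system through the no-CSI approximation. The sketch below is our own illustration, under assumptions we choose for concreteness: \alpha = 4, unit total power, and every transmitter in the network using the same split (p, 1 - p):

```python
import math

def c_two_stream(p, c0, alpha=4.0):
    """Approximate mean spectral efficiency for M = 2 when each transmitter
    splits unit power as (p, 1-p); c0 = n_r / (pi * rho * r_1^2)."""
    G_alpha = (alpha / (2.0 * math.pi)
               * math.sin(2.0 * math.pi / alpha)) ** (alpha / 2.0)
    powers = (p, 1.0 - p)
    denom = sum(q ** (2.0 / alpha) for q in powers)   # sum_j <P_j^(2/alpha)>
    rate = 0.0
    for q in powers:
        if q > 0.0:
            rate += math.log2(1.0 + q * G_alpha * (c0 / denom) ** (alpha / 2.0))
    return rate

def best_split(c0):
    grid = [k / 100.0 for k in range(50, 101)]  # by symmetry, p in [0.5, 1]
    return max(grid, key=lambda p: c_two_stream(p, c0))
```

Consistent with the figure, small n_r/(\pi\rho r_1^2) favors putting all power on one stream, while large values favor the equal split.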

Note from Equation (14.23) that increasing the number of streams M reduces
the quantity inside the log function while increasing the quantity outside the
log function. By relaxing the integer requirement on M, the optimal number of
transmit streams (in the regime where Equation (14.23) is valid) is

M^{opt} = \underset{x \in \left\{ \left\lfloor \frac{n_r}{K_\alpha \pi\rho r_1^2} \right\rfloor,\ \left\lceil \frac{n_r}{K_\alpha \pi\rho r_1^2} \right\rceil \right\}}{\arg\max}\ x \log_2\!\left(1 + G_\alpha \left[\frac{n_r}{x\, \pi\rho r_1^2}\right]^{\alpha/2}\right),    (14.24)

where

K_\alpha = \frac{2\pi}{\alpha} \csc\!\left(\frac{2\pi}{\alpha}\right) \left[ -1 - \frac{\alpha}{2 W_0\!\left(-\frac{\alpha}{2} e^{-\alpha/2}\right)} \right]^{2/\alpha},    (14.25)
and where W_0(z) is the Lambert W function (see Section 2.14.4). This optimization
is the subject of Problem 14.6.
If we substitute the integer-relaxed optimum number of streams, M = n_r/(K_\alpha \pi\rho r_1^2),
into Equation (14.23), we find that the mean spectral efficiency \langle c^{opt} \rangle under
the optimal number of equal-power streams is

\langle c^{opt} \rangle \approx K'_\alpha\, \frac{n_r}{\pi\rho r_1^2},    (14.26)

where

K'_\alpha = \frac{1}{K_\alpha} \log_2\!\left(1 + G_\alpha \left(K_\alpha\right)^{\alpha/2}\right).    (14.27)

Thus, the mean per-link spectral efficiency can grow linearly with the number
of antennas, and an approximately constant mean spectral efficiency can be
maintained if the number of receiver antennas increases linearly with the user density.
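To make Equations (14.24)–(14.27) concrete, the sketch below evaluates K_\alpha and K'_\alpha for \alpha = 4, computing the required Lambert W value by bisection so no special-function library is needed, and checks the stream-count rule of Equation (14.24) against a brute-force integer search. The network parameter c0 is an illustrative choice of ours:

```python
import math

def lambert_w0(z):
    """Principal branch of the Lambert W function on [-1/e, 0), found by
    bisection: w*exp(w) is increasing on [-1, 0]."""
    lo, hi = -1.0, 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def constants(alpha):
    """G_alpha, K_alpha of Equation (14.25), K'_alpha of Equation (14.27)."""
    G = (alpha / (2.0 * math.pi)
         * math.sin(2.0 * math.pi / alpha)) ** (alpha / 2.0)
    w = lambert_w0(-(alpha / 2.0) * math.exp(-alpha / 2.0))
    K = (2.0 * math.pi / alpha) / math.sin(2.0 * math.pi / alpha) \
        * (-1.0 - alpha / (2.0 * w)) ** (2.0 / alpha)
    Kp = math.log2(1.0 + G * K ** (alpha / 2.0)) / K
    return G, K, Kp

alpha = 4.0
G, K, Kp = constants(alpha)

# Equation (14.24): with c0 = n_r/(pi*rho*r_1^2) = 12, the optimum stream
# count should be the floor or ceiling of c0/K_alpha.
c0 = 12.0
rate = lambda m: m * math.log2(1.0 + G * (c0 / m) ** (alpha / 2.0))
m_exhaustive = max(range(1, 13), key=rate)
m_rule = max((math.floor(c0 / K), math.ceil(c0 / K)), key=rate)
```

For \alpha = 4 this gives K_\alpha \approx 3.11 and K'_\alpha \approx 0.74 bits per antenna per unit link rank, and the floor/ceiling rule agrees with the exhaustive search.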

Comparison of systems with transmit link CSI and without transmit CSI
Figure 14.4 shows the percentage increase in mean spectral efficiency with transmit
link CSI versus the quantity \pi\rho r_1^2, which can be interpreted as the average number
of interferers closer to the receiver than its target transmitter and was defined as
the link rank in Reference [123]. Note that the gain from using transmit
CSI is highly dependent on the quantity \pi\rho r_1^2. For instance, for \pi\rho r_1^2 = 6 and
two transmit streams, transmit link CSI provides a twofold increase in spectral
efficiency. The increase in mean spectral efficiency can be greater than threefold
for high-rank links. These results indicate that a significant (but not orders-of-magnitude)
increase in spectral efficiency is possible with link transmit CSI.^3
Thus, we see that several-fold increases in spectral efficiency are possible using
transmit link CSI, especially in networks with long links, high user densities, or both.
Hence, transmit link CSI can be useful in environments which are very stable
over time, whereby receivers can feed back channel estimates infrequently to
transmitters.

14.2.4 Maximum-signal-to-leakage-plus-noise ratio receiver
The maximum-signal-to-leakage-plus-noise ratio (Max-SLNR) beamformer struc-
ture is a heuristic beamforming strategy in which transmitters attempt to be
good neighbors in a network by reducing the effects of the interference they
cause on unintended receivers. Such a system has been implemented experimentally,
with findings reported in Reference [37], which also considers space-time
coding extensions to this system and simultaneous transmission and reception.^3

^3 Portions of this section are © 2012 IEEE. Reprinted, with permission, from Reference [125].

Figure 14.4 Percentage increase in mean spectral efficiency of systems with transmit
link CSI over systems without transmit CSI, versus \pi\rho r_1^2, for n_r = 12 antennas per
node and 1, 2, 4, and 8 streams.

A theoretical study of such a system was done in Reference [267], in which the term
Max-SLNR was introduced. Note that Chapter 16 considers other strategies for
nodes to act as good neighbors in the context of cognitive radio.
This technique could be thought of as a “reverse MMSE” strategy where trans-
mitters attempt to maximize the ratio of signal power delivered to their desired
receivers to the sum of the powers they deliver to unintended receivers and noise.
The term leakage is used to describe the signal that is delivered to unintended
targets.
Suppose that the channel matrix between transmitter \ell and receiver j is H_{\ell j}.
We assume that in addition to H_{\ell\ell}, transmit node \ell has knowledge of the matrix
\tilde{H}_\ell \tilde{H}_\ell^\dagger, where the matrix \tilde{H}_\ell contains the combined channels of the unintended
receivers as follows:

\tilde{H}_\ell = [H_{\ell 1}\ H_{\ell 2} \cdots H_{\ell(\ell-1)}\ H_{\ell(\ell+1)} \cdots H_{\ell n}].    (14.28)

That is, \tilde{H}_\ell consists of the channel matrices between transmitter \ell and all
unintended receivers. The SLNR associated with link \ell is defined as follows:

\mathrm{SLNR}_\ell = \frac{\| H_{\ell\ell} w_\ell \|^2}{\sum_{k=1, k \neq \ell}^{n} \| H_{\ell k} w_\ell \|^2 + n_r \sigma^2},    (14.29)

where the vector of weights applied at the antennas of transmitter \ell is given by
w_\ell, and the noise variance per receive antenna is \sigma^2. The numerator of Equation
(14.29) is simply the signal power received at the target receiver of transmitter \ell.

The summation in the denominator corresponds to the sum of the signal powers
due to transmitter  as observed at all unintended receivers plus the total noise
observed at all nr receive antennas. Using some matrix manipulations Equation
(14.29) can be rewritten as
||Hii w ||2
SLNR = . (14.30)
||H̃ w ||2 + nr σ 2
As shown in Reference [267], the transmit weights w are given by the eigenvector
corresponding to the largest eigenvalue of the matrix:
 −1
Hii H†ii H̃ H̃† + σ 2 nr I .
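The Max-SLNR weights require only a few lines of linear algebra. The sketch below maximizes the SLNR of Equation (14.29) directly as a generalized Rayleigh quotient, an equivalent transmit-side formulation; the network size, antenna counts, noise level, and synthetic i.i.d. CN(0, 1) channels are illustrative assumptions of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n, nt, nr, sigma2 = 4, 3, 3, 0.1   # illustrative sizes, our own choices

# H[k]: channel from transmitter 0 to receiver k (i.i.d. CN(0,1) entries);
# receiver 0 is the intended target, receivers 1..n-1 see leakage.
H = [(rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt)))
     / np.sqrt(2.0) for _ in range(n)]

def slnr(w):
    """Equation (14.29) for link 0: signal to leakage-plus-noise ratio."""
    sig = np.linalg.norm(H[0] @ w) ** 2
    leak = sum(np.linalg.norm(H[k] @ w) ** 2 for k in range(1, n))
    return sig / (leak + nr * sigma2)

# The SLNR is a generalized Rayleigh quotient w^H A w / w^H B w; it is
# maximized by the principal eigenvector of B^{-1} A.
A = H[0].conj().T @ H[0]
B = sum(H[k].conj().T @ H[k] for k in range(1, n)) + nr * sigma2 * np.eye(nt)
vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
w_opt = vecs[:, np.argmax(vals.real)]
```

Because the quotient is maximized exactly by the principal eigenvector, any other weight vector yields a lower SLNR, which is easy to check numerically against random weights.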

14.3 Linear receiver structures in spatially distributed networks

In this section, we discuss the linear receivers introduced in the previous chapter
for cellular networks in the context of spatially distributed ad hoc wireless
networks. The derivations of the previous chapter will be used here with
appropriate modifications.

14.3.1 Linear MMSE receivers in Poisson networks
Recall from Section 13.3.5 that the density of interferers in a circular network of
radius R is \rho, and \alpha is the path-loss exponent that controls signal power decay
with distance. Recall that the CDF of the SINR for the linear-MMSE receiver
located in the center of a circular cell of radius R, with interferers distributed
uniformly randomly, is given by Equation (13.59) in Section 13.3.4:

\Pr(\mathrm{SINR} \leq x) = 1 - \exp\!\left(-\sigma^2 x r_1^\alpha\right) \exp\!\left( \rho \int_0^R \int_0^{2\pi} dr\, d\theta \left[ \frac{1}{1 + r^{-\alpha} x r_1^\alpha} - 1 \right] r \right)
\times \sum_{\ell=0}^{n_r - 1} \sum_{k=0}^{\ell} \frac{1}{k!(\ell-k)!} \left( \sigma^2 x r_1^\alpha \right)^{\ell-k} \left[ \rho \int_0^R \int_0^{2\pi} dr\, d\theta\, \frac{r^{-\alpha} x r_1^\alpha}{1 + r^{-\alpha} x r_1^\alpha}\, r \right]^k,    (14.31)

where r, \theta, \ell, and k are dummy variables. The integrals in the above equation
can be evaluated as follows:

\int_0^R \int_0^{2\pi} dr\, d\theta \left( \frac{1}{1 + r^{-\alpha} x r_1^\alpha} - 1 \right) r = 2\pi \int_0^R dr \left( \frac{1}{1 + r^{-\alpha} x r_1^\alpha} - 1 \right) r
 = \left[ -\pi r^2 + \pi r^2\, {}_2F_1\!\left(1, -\frac{2}{\alpha};\ 1 - \frac{2}{\alpha};\ -r^{-\alpha} x r_1^\alpha\right) \right]_{r=0}^{R}.    (14.32)
When R = \infty, that is, in the limit as the radius of the circle goes to infinity, the
double integral in Equation (14.32) can be directly evaluated to yield

\int_0^\infty \int_0^{2\pi} dr\, d\theta \left( \frac{1}{1 + r^{-\alpha} x r_1^\alpha} - 1 \right) r = \frac{2\pi^2}{\alpha} \left( x r_1^\alpha \right)^{2/\alpha} \csc\!\left( \frac{\pi(2-\alpha)}{\alpha} \right).

Using this result and Equation (13.65), the CDF of the SINR is found in
Reference [9] to be given by

\Pr(\mathrm{SINR} \leq x) = 1 - \exp\!\left( -\rho K_\alpha x^{2/\alpha} r_1^2 - \sigma^2 x r_1^\alpha \right) \sum_{k=0}^{n_r - 1} \frac{\left( \rho K_\alpha x^{2/\alpha} r_1^2 + \sigma^2 x r_1^\alpha \right)^k}{k!},    (14.33)

where the parameter

K_\alpha = \frac{2\pi\, \Gamma(2/\alpha)\, \Gamma(1 - 2/\alpha)}{\alpha}.
Note that taking the radius of the circle to infinity while maintaining a constant
density of interferers \rho results in a Poisson point process (PPP) of interferers,
discussed in Section 3.4, with density \rho. Hence, Equation (14.33) is the CDF of
the SINR of a link of length r_1 in a Poisson field of interferers with density \rho
and Rayleigh fading.
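Equation (14.33) is straightforward to evaluate numerically. The sketch below, with \sigma^2 = 0 and \pi\rho r_1^2 = 1 (parameter choices of ours, matching the example later in this section), computes the interference-limited MMSE outage probability:

```python
import math

def mmse_outage(x, nr, rho, r1, alpha=4.0, sigma2=0.0):
    """CDF of the SINR for the linear MMSE receiver in a Poisson field of
    interferers with Rayleigh fading, Equation (14.33)."""
    K_alpha = 2.0 * math.pi * math.gamma(2.0 / alpha) \
              * math.gamma(1.0 - 2.0 / alpha) / alpha
    mu = rho * K_alpha * x ** (2.0 / alpha) * r1 ** 2 \
         + sigma2 * x * r1 ** alpha
    partial = sum(mu ** k / math.factorial(k) for k in range(nr))
    return 1.0 - math.exp(-mu) * partial

rho = 1e-3                                # nodes per square meter
r1 = math.sqrt(1.0 / (math.pi * rho))     # chosen so that pi*rho*r1^2 = 1
p = mmse_outage(1.0, nr=4, rho=rho, r1=r1)   # outage at SINR = 0 dB
```

With these values the outage at 0 dB is about 7.5% for n_r = 4, and it drops sharply as n_r grows — the behavior plotted in Figure 14.5.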

14.3.2 Laplacian of the interference in Poisson networks and matched-filter and antenna-selection receivers

For both the matched-filter and antenna-selection receivers, the Laplacian of the
interference plays a key role in the statistical properties of the SINR. Recall that
the Laplacian of a random variable X is the expected value of e^{-sX} for complex
s, and is also the Laplace transform (see Section 2.11) of the PDF of the random
variable X. In Section 13.3.6, the Laplacian of the interference was derived for a
circular network in Rayleigh fading. To obtain the result for an infinite network,
one just has to take the radius of the circular cell to infinity. Recall from Section
13.3.6 that the Laplacian of the interference from a circular cell with radius R is

\Phi_I(s) = \left. e^{\rho A_R (z - 1)} \right|_{z = \Phi_P(s)} = \exp\left[ \rho A_R \left( \Phi_P(s) - 1 \right) \right],    (14.34)

where A_R = \pi R^2 is the area of the cell and

\Phi_P(s) = \frac{2 s^{2/\alpha} P^{2/\alpha}}{\alpha R^2}\, \Gamma\!\left(-\frac{2}{\alpha}\right) \Gamma\!\left(\frac{2+\alpha}{\alpha}\right) + {}_2F_1\!\left(1, -\frac{2}{\alpha};\ \frac{\alpha - 2}{\alpha};\ -R^{-\alpha} s P\right).    (14.35)
(14.35)
If we now take the radius of the circular cell to infinity, we find that the Laplacian
of the total interference equals (see, for instance, [133])

\Phi_I(s) = \exp\!\left( -\pi\rho\, \Gamma\!\left(1 + \frac{2}{\alpha}\right) \Gamma\!\left(1 - \frac{2}{\alpha}\right) s^{2/\alpha} \right).    (14.36)
Equation (14.36) corresponds to the Laplacian of the interference due to
transmitters distributed according to a Poisson point process on the plane with density
\rho and Rayleigh fading. This Laplacian can then be used to find the outage
probability for the antenna-selection receiver with n_r receive antennas, as derived in
Section 13.3.3 and repeated below:

\Pr\{\mathrm{SINR} \leq x\} = \sum_{k=0}^{n_r} \binom{n_r}{k} (-1)^k \exp\!\left( -k\, \frac{x r_1^\alpha}{P_1}\, \sigma^2 \right) \Phi_I\!\left( k\, \frac{x r_1^\alpha}{P_1} \right).    (14.37)
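Combining Equations (14.36) and (14.37) gives a closed-form outage expression that is easy to evaluate. The sketch below uses \sigma^2 = 0, P_1 = 1, \alpha = 4, and \pi\rho r_1^2 = 1 — illustrative values of ours, matching the example at the end of this section:

```python
import math

def phi_I(s, rho, alpha=4.0):
    """Laplacian of the interference, Equation (14.36)."""
    return math.exp(-math.pi * rho * math.gamma(1.0 + 2.0 / alpha)
                    * math.gamma(1.0 - 2.0 / alpha) * s ** (2.0 / alpha))

def selection_outage(x, nr, rho, r1, alpha=4.0, sigma2=0.0, P1=1.0):
    """Antenna-selection outage probability, Equation (14.37)."""
    s1 = x * r1 ** alpha / P1
    return sum(math.comb(nr, k) * (-1) ** k
               * math.exp(-k * s1 * sigma2) * phi_I(k * s1, rho, alpha)
               for k in range(nr + 1))

rho = 1e-3
r1 = math.sqrt(1.0 / (math.pi * rho))     # chosen so that pi*rho*r1^2 = 1
p2 = selection_outage(1.0, nr=2, rho=rho, r1=r1)
p4 = selection_outage(1.0, nr=4, rho=rho, r1=r1)
```

The alternating binomial sum stays inside [0, 1], and adding antennas (n_r = 2 versus 4) reduces the outage, though far less dramatically than for the MMSE receiver of Equation (14.33).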

The outage probability of the matched-filter receiver can also be found using the
Laplacian, as derived in Section 13.3.3, and is given below:

\Pr\{\mathrm{SINR} \leq x\} = 1 - \sum_{k=0}^{n_r - 1} (-1)^k \frac{\left( \frac{x r_1^\alpha}{P_1} \right)^k}{k!} \left. \frac{d^k}{ds^k} \left[ e^{-s\sigma^2} \Phi_I(s) \right] \right|_{s = \frac{x r_1^\alpha}{P_1}}
 = 1 - \sum_{k=0}^{n_r - 1} \frac{1}{k!} \left( -\frac{x r_1^\alpha}{P_1} \right)^k \left. \frac{d^k}{ds^k} \left[ e^{-s\sigma^2} \Phi_I(s) \right] \right|_{s = \frac{x r_1^\alpha}{P_1}}.    (14.38)

Note that Equations (14.37) and (14.38) were first given in Reference [148].

Example calculations of outage probability of linear receivers in Poisson networks
Consider a Poisson point process of interferers with density 10^{-3} nodes/m^2
and path-loss exponent \alpha = 4. The length of the representative
link is fixed such that \pi\rho r_1^2 = 1. The transmitters have single antennas
and the representative receiver has n_r antennas. We assume that the noise power
is negligible, that is, \sigma^2 = 0.
If the representative receiver uses the linear MMSE receiver, the outage
probability as a function of the SINR can be calculated using Equation (14.33). The
outage probability is shown in Figure 14.5 for n_r = 2, 4, 8, and 16 antennas. Note
that increasing the number of antennas dramatically reduces the probability of
outage for a given SINR.
The outage probability of the matched-filter receiver is given in Equation
(14.38). Notice from Equation (14.38) that the first three derivatives of \Phi_I(s) are
required to evaluate the CDF of the SINR for the matched-filter receiver with
n_r = 4 antennas. Let us write G_1 = \pi\rho\, \Gamma(1 + 2/\alpha)\, \Gamma(1 - 2/\alpha) for simplicity.
Then we have the following:

\frac{d}{ds} \Phi_I(s) = -\frac{2G_1}{\alpha}\, s^{2/\alpha - 1} \exp\!\left( -G_1 s^{2/\alpha} \right),

\frac{d^2}{ds^2} \Phi_I(s) = \frac{2G_1}{\alpha^2}\, e^{-G_1 s^{2/\alpha}} \left[ (\alpha - 2)\, s^{2/\alpha - 2} + 2 G_1\, s^{4/\alpha - 2} \right],

\frac{d^3}{ds^3} \Phi_I(s) = -\frac{4G_1}{\alpha^3}\, e^{-G_1 s^{2/\alpha}} \left[ (2 - 3\alpha + \alpha^2)\, s^{2/\alpha - 3} + (3\alpha - 6)\, G_1\, s^{4/\alpha - 3} + 2 G_1^2\, s^{6/\alpha - 3} \right].
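Derivative expressions like these are easy to get wrong by hand, and a quick finite-difference check (with illustrative parameters of ours) confirms the first one:

```python
import math

alpha, rho = 4.0, 1e-3
G1 = math.pi * rho * math.gamma(1.0 + 2.0 / alpha) \
     * math.gamma(1.0 - 2.0 / alpha)

def phi_I(s):
    # Laplacian of the interference, Equation (14.36).
    return math.exp(-G1 * s ** (2.0 / alpha))

def dphi_I(s):
    # Closed form of d(Phi_I)/ds given above.
    return -(2.0 * G1 / alpha) * s ** (2.0 / alpha - 1.0) * phi_I(s)

s, h = 1.0, 1e-6
numeric = (phi_I(s + h) - phi_I(s - h)) / (2.0 * h)   # central difference
```

The same check applies to the second and third derivatives by differencing `dphi_I` itself.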

The equations above can be combined with Equation (14.38) to find the CDF of
the SINR (the outage probability) with the matched-filter receiver. The outage probability of
Figure 14.5 Outage probability versus SINR of an MMSE receiver in a Poisson field of
interferers, for n_r = 2, 4, 8, and 16 receiver antennas. For this figure, the density of
interferers is \rho = 10^{-3} nodes/m^2, the length of the representative link is r_1 = 35.7 m,
such that \pi\rho r_1^2 = 1, and the path-loss exponent is \alpha = 4.

the antenna-selection receiver is given directly in Equation (14.37). The outage
probabilities for the linear MMSE, matched-filter, and antenna-selection receivers
are shown in Figure 14.6 for n_r = 2 and n_r = 4 receiver antennas. The right-most
curve for each receiver type corresponds to the n_r = 4 case and the left-most
to n_r = 2. Notice that the antenna-selection receiver and the
matched filter have similar performance for small numbers of antennas, but the
difference increases significantly in going from two to four antennas, as expected.

14.4 Interference alignment

Interference alignment has recently emerged as a powerful technique for interference
mitigation in high signal-to-noise-ratio (SNR) scenarios. The basic idea
of interference alignment is to exploit channel diversity (for example, time, fre-
quency, or space) whereby transmitters encode their signals in such a manner
that interfering signals occupy a smaller number of dimensions than the total
number of interfering signals. In this section, we describe the basic properties of
interference alignment. The presentation here is based on Reference [50], which
introduced the idea of interference alignment.
Consider a network with three transmitters and three receivers, with three
antennas at each transmitter and receiver. We assume a frequency-flat,
fast-fading model where the channels between all antennas are independent,
identically distributed random variables from a continuous distribution. Let the
channel matrix Hj k ∈ C3×3 denote the matrix of channel coefficients between
transmitter j and receiver k. We shall assume that all the channel matrices are
Figure 14.6 Outage probability versus SINR of linear receivers (matched filter,
antenna selection, and MMSE) in a Poisson field of interferers, for n_r = 2 and 4
receiver antennas. The density of interferers is \rho = 10^{-3} nodes/m^2, the length of the
representative link is r_1 = 35.7 m, such that \pi\rho r_1^2 = 1, and the path-loss exponent
is \alpha = 4.

invertible, which holds with probability 1 if the channel coefficients are sampled
from a continuous distribution. We shall now show that transmitter 1 can send
two independent messages to receiver 1, and transmitters 2 and 3 can send one
independent message each to receivers 2 and 3 respectively. Thus, four indepen-
dent messages can be sent in three time slots without interference.
Suppose that transmitter 1 encodes its messages using two vectors v11 and v12 ,
and transmitters 2 and 3 use vectors v21 and v31 . That is to say, if transmitter
1 wishes to send the values s11 and s12 to receiver 1, it sends the entries of the
3 × 1 vector

s11 v11 + s12 v12 (14.39)

over the three antennas of the transmitter. Similarly, suppose that transmitters
2 and 3 wish to send the values s21 and s31 to receivers 2 and 3, respectively.
They respectively transmit entries of the following vectors on the three antennas,

s21 v21 (14.40)


s31 v31 . (14.41)

The interfering signals from transmitters 2 and 3 are aligned at receiver 1 if

H21 v21 = H31 v31 . (14.42)



The interfering signal from transmitter 3 and one of the interfering signals from
transmitter 1 (the signal associated with s11 in this case) are aligned at receiver
2 if

H12 v11 = H32 v31 . (14.43)

Similarly, we can align interference from transmitter 2 and one of the interfering
signals from transmitter 1 (the signal associated with s12 in this case) at receiver
3 if

H13 v12 = H23 v21 . (14.44)

Equations (14.42), (14.43), and (14.44) can be satisfied simultaneously using the
following choices of v_{jk}. We start by setting v21 to the all-ones vector,

v_{21} = (1, 1, 1)^T.    (14.45)

Note that this choice of an all-ones vector is arbitrary. With this choice of v21,
we can find v31 by solving Equation (14.42), which yields

v_{31} = H_{31}^{-1} H_{21}\, (1, 1, 1)^T.    (14.46)

We can now solve for v11 by substituting the above expression for v31 into
Equation (14.43), which yields

v_{11} = H_{12}^{-1} H_{32} H_{31}^{-1} H_{21}\, (1, 1, 1)^T.    (14.47)

Finally, we can solve for v12 by substituting for v21 into Equation (14.44):

v_{12} = H_{13}^{-1} H_{23}\, (1, 1, 1)^T.    (14.48)

We should note here that all transmitters need to know the channel coefficients
of all receivers which will incur significant overhead in real systems. In Figure
14.7, interference alignment at receiver 1 is illustrated. The dashed arrows rep-
resent the signals of interest (that is, signals from transmitter 1) and the solid
arrows represent the interfering signals (that is, signals from transmitters 2 and
3). Observe that the two interfering signals lie on a single dimension at receiver
1, which leaves two other dimensions for useful signal communication.
In Figure 14.8, interference alignment at receiver 2 is illustrated. The dashed
arrows represent the signals of interest (that is, the signal from transmitter 2),
Figure 14.7 Interference alignment at receiver 1.

Figure 14.8 Interference alignment at receiver 2.

and the solid arrows represent the interfering signals (that is, two signals from
transmitter 1 and one from transmitter 3). Observe that the signal from
transmitter 3 and one of the signals from transmitter 1 (that is, the signal encoded
with v11) lie on a single dimension. Together with the interfering signal
from transmitter 1, which is encoded with v12, the total interference occupies
two dimensions, leaving an additional dimension for the useful signal.
In Figure 14.9, interference alignment at receiver 3 is illustrated. The dashed
arrows represent the signals of interest (that is, the signal from transmitter 3),
and the solid arrows represent the interfering signals (that is, two signals from
transmitter 1 and one from transmitter 2). Observe that the signal from
transmitter 2 and one of the signals from transmitter 1 (that is, the signal encoded
with v12) lie on a single dimension. Together with the interfering signal
from transmitter 1, which is encoded with v11, the total interference occupies
two dimensions, leaving an additional dimension for the useful signal.
Hence, at each receiver, the interfering signals occupy two dimensions, leaving
an additional dimension for the desired signal. Thus, a zero-forcing receiver (see
Section 9.2.2) can be used to decode the signal without interference.
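The alignment construction can be verified numerically. The sketch below draws i.i.d. CN(0, 1) channels (with H[j, k] denoting the transmitter-j-to-receiver-k matrix and a random seed of our choosing), solves the alignment conditions of Equations (14.42)–(14.44) for the beamforming vectors, and checks the interference dimension at each receiver:

```python
import numpy as np

rng = np.random.default_rng(1)

def cn(shape):
    """i.i.d. CN(0,1) entries."""
    return (rng.standard_normal(shape)
            + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)

# H[j, k]: 3x3 channel from transmitter j to receiver k, j, k in {1, 2, 3}.
H = {(j, k): cn((3, 3)) for j in (1, 2, 3) for k in (1, 2, 3)}

v21 = np.ones(3, dtype=complex)                   # Equation (14.45)
v31 = np.linalg.solve(H[3, 1], H[2, 1] @ v21)     # aligns at receiver 1
v11 = np.linalg.solve(H[1, 2], H[3, 2] @ v31)     # aligns at receiver 2
v12 = np.linalg.solve(H[1, 3], H[2, 3] @ v21)     # aligns at receiver 3

rank = lambda A: np.linalg.matrix_rank(A, tol=1e-8)

# Interference spans 1 dimension at receiver 1 and 2 at receivers 2 and 3;
# with the desired signal(s) added, all 3 dimensions are filled, so a
# zero-forcing receiver can separate the streams.
i1 = np.column_stack([H[2, 1] @ v21, H[3, 1] @ v31])
i2 = np.column_stack([H[1, 2] @ v11, H[1, 2] @ v12, H[3, 2] @ v31])
i3 = np.column_stack([H[1, 3] @ v11, H[1, 3] @ v12, H[2, 3] @ v21])
full1 = np.column_stack([i1, H[1, 1] @ v11, H[1, 1] @ v12])
```

The rank checks confirm the dimension-counting argument made in Figures 14.7–14.9 for a generic channel draw.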
Figure 14.9 Interference alignment at receiver 3.

In this example, four concurrent transmissions are possible in a three-dimensional
space. Extending this idea to 2m + 1 dimensions, that is, communication over
2m + 1 antennas with independent fading coefficients, it is shown in Reference
[50] that it is possible to achieve

    (3m + 1) / (2m + 1)    (14.49)
degrees of freedom for the 3 × 3 interference channel. Additionally, Reference [50]
shows that, for a K × K interference channel, by extending the communications
over a large number of dimensions, it is possible to achieve arbitrarily close to K/2
degrees of freedom per antenna. That is to say, each user gets to communicate
on approximately half the available dimensions. It is worth noting that further
extensions of interference alignment for systems with imperfect channel-state
information, etc., have been proposed in the literature, for example, in References
[118], [44], [155], [228] and [156], where the last reference is a monograph that
contains a comprehensive survey of the topic.
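As a quick numerical check of Equation (14.49), a few lines of Python (our own sketch, not from the text) show the achieved degrees of freedom approaching the K/2 = 3/2 limit as the number of dimensions grows:

```python
# Degrees of freedom achieved by interference alignment for the 3-user
# interference channel over 2m + 1 dimensions, per Equation (14.49).
def dof_3user(m: int) -> float:
    return (3 * m + 1) / (2 * m + 1)

# m = 1 recovers the worked example: four streams in three dimensions.
print(dof_3user(1))       # 4/3 ≈ 1.333
# As m grows, the value approaches 3/2 = K/2 for K = 3 users, i.e.,
# each user communicates on roughly half the available dimensions.
print(dof_3user(10_000))
```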

Problems

14.1 Using the Ozgur hierarchical cooperation scheme, compare the bound on
the network throughput capacity given by Equation (14.6), for one and two
levels of hierarchy. In particular, use the bound to estimate the number of nodes
required in the network for the throughput capacity with two levels of hierarchy
to exceed the throughput capacity with one level of hierarchy. Your answer will
illustrate a weakness in using capacity scaling laws to estimate the performance
of practical wireless networks.

14.2 Consider a square wireless network of fixed area which is divided into
n uniform squares. In each of these squares, place a wireless node with uni-
form probability as shown in Figure 14.10, which illustrates a network with
n = 36 nodes. For simplicity, assume that the total interference in the network
is proportional to n^(α/2) and signal power decays with distance according to the
inverse-power-law model with path-loss exponent α > 2. Using these simplifying
assumptions, show that a multi-hop protocol can achieve a per-link throughput
capacity of the order 1/√n as n → ∞.

Figure 14.10 Illustration of a square network with n = 36 nodes.
14.3 The upper regularized gamma function is defined as the ratio of an upper
incomplete gamma function and a gamma function (see Section 2.14.1) as follows:
    Q(ν, x) = Γ(ν, x) / Γ(ν).

When the first parameter ν = L is an integer, the upper regularized gamma
function simplifies to a sum of elementary functions as follows,

    Q(L, x) = e^(−x) Σ_{k=0}^{L−1} x^k / k!.

Additionally, the following is known about the upper regularized gamma function
for a positive real number q and integer L [364],

    lim_{L→∞} Q(L, qL) = 0, if q ≥ 1, and 1, if q < 1.
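These identities are easy to check numerically. The sketch below is our own illustration; `q_upper` is a hypothetical helper that evaluates the elementary-sum form in the log domain so that large L and x do not overflow, and it also exhibits the sharp threshold of Q(L, qL):

```python
import math

def q_upper(L: int, x: float) -> float:
    """Regularized upper gamma function Q(L, x) for integer L >= 1 and x > 0,
    via Q(L, x) = e^{-x} * sum_{k=0}^{L-1} x^k / k!, computed in the log
    domain (log-sum-exp) to avoid overflow for large L and x."""
    log_terms = [-x + k * math.log(x) - math.lgamma(k + 1) for k in range(L)]
    m = max(log_terms)
    return math.exp(m) * sum(math.exp(t - m) for t in log_terms)

# Sanity check: Q(1, x) = Γ(1, x) / Γ(1) = e^{-x}.
assert abs(q_upper(1, 0.5) - math.exp(-0.5)) < 1e-12

# Threshold behavior of Q(L, qL) for a moderately large L:
L = 2000
print(q_upper(L, 1.2 * L))  # q > 1: essentially 0
print(q_upper(L, 0.8 * L))  # q < 1: essentially 1
```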
Consider a homogeneous Poisson network with a multiantenna receiver and single-
antenna transmitters, with i.i.d. Rayleigh fading between all antennas.
(a) Ignoring the contribution of the noise and using the above properties of the
upper regularized incomplete gamma function, show that the SIR converges
in probability to a nonrandom limit if the number of antennas at the receiver
is increased linearly with node density.

(b) The result above suggests that it may be possible to scale ad hoc wireless
networks by increasing the number of receiver antennas with node density.
Discuss the feasibility of doing so.

14.4 Prove that the second term on the left-hand side of Equation (14.15) goes
to zero as n/nr → ∞.

14.5 Consider an ad hoc network with interferers distributed according to a


homogeneous Poisson point process on the plane with single-antenna transmitters
and a multiantenna receiver with nr antennas at the origin. Assume that the
receiver uses a zero-forcing algorithm and cancels the interference due to the
nr − 1 closest interferers to it. Assuming Rayleigh fading across all antennas
and noting that the unitary transformation of a matrix with i.i.d. circularly
symmetric Gaussian random variables does not change the statistical properties
of the matrix, find the CDF of the SINR for this system.

14.6 Derive the integer-relaxed optimum number of streams for the multi-
stream transmissions in an ad hoc wireless network with multiantenna MMSE
receivers given in Equation (14.24).

14.7 Consider a circular network of radius R with a multiantenna receiver with


nr antennas at the origin. Let n interferers be independently distributed in the
circular network such that the distribution of their distances from the origin is
uniform, that is the probability density function of the distance of an interferer,
r, from the origin is
    p(r) = 1/R, if 0 ≤ r ≤ R, and 0 otherwise.    (14.50)

Assume that the noise power is equal to P N^(−α) at each antenna of the
representative receiver and all nodes transmit with equal power in the network,
with the standard inverse-power-law path loss, and i.i.d., unit-variance fading
between all antennas in the network. Show that βN = N^(−α) SINR converges
with probability 1 to a limit β as n, nr, R → ∞ such that n/nr equals a positive
constant c, and n = ρR with ρ > 0 equal to a nominal density. Find an implicit expression
analogous to (14.15) that β needs to satisfy in this case. This problem is inspired
by results in Reference [122].

14.8 Consider an ad hoc wireless network with a representative receiver at the


origin and a representative transmitter at a fixed distance rT from the origin.
Assume that this link operates in the presence of interferers that are modeled
according to a homogeneous Poisson point process on the plane with density
ρ. Assume that all users have single antennas and the channel between any
wireless node and any other wireless node follows a block fading model with
Nc coherence bands and M frequency subchannels. The channels between all
transmitter–receiver pairs on the Nc coherence bands are i.i.d. Rayleigh random
Figure 14.11 Illustration of a block-frequency fading channel with six coherence
bands and twenty-four subchannels.

variables. Figure 14.11 illustrates the channel strengths between a transmitter–


receiver pair with Nc = 6 coherence bands and M = 24 subchannels, with four
subchannels per coherence band. Each transmitter picks one subchannel out of
the strongest coherence band between itself and its target receiver, on which to
transmit. Compute the CDF of the SINR of the representative link. You may
wish to use the result on the antenna selection receiver to solve this problem.
This problem is inspired by the results in [128].
14.9 Consider the antenna selection receiver in a spatially distributed network
whose SINR CDF conditioned on the link-length r1 is given by Equation (14.37).
Suppose that r1 is distributed according to a nearest-neighbor Poisson distribu-
tion whereby r1 is the distance between a reference point such as the origin, to
the nearest point of a homogeneous Poisson point process with intensity ρc. Note
that the PDF of r1 is given by Equation (13.32). Please find a closed-form ex-
pression for the CDF of the SINR (without the conditioning on r1 ). This problem
is inspired by the results in [128].
15 Medium-access-control protocols

15.1 The need for medium-access control

In wireless and certain wired networks, multiple users share the same physical
medium. Data communication rates in networks can often be improved by us-
ing medium-access control (MAC) protocols, whereby multiple users share the
medium in a controlled manner such that the adverse effects of their interfering
signals are reduced. A general treatment of this topic can be found in Reference
[21]. The main reason for improved data rates with medium-access control is
that communication in noise typically tends to be at much higher data rates
than communication in interference if the data rates are a function of the signal-
to-interference-plus-noise ratio (SINR).
Earlier in the book, we introduced multiple-access schemes such as frequency-
division-multiple access (FDMA), time-division-multiple access (TDMA), code-
division-multiple access (CDMA) and space-division-multiple access (SDMA).
Each of these multiple-access schemes attempts to reduce interference by ensur-
ing that multiple links operate in orthogonal or approximately orthogonal spaces,
such as by time or frequency division. We did not, however, describe in much
detail how the assignments of frequency bands, time slots, or spatial dimensions
to users are made.
In cellular telephone networks, the assignments of links to time slots, frequency
bands, or codes can be made by the base station, which controls the behavior of
the mobile units in its own cell. The network topology (where there is a central
control node) and the connection-oriented nature of telephone links where links
stay operational for long periods (seconds or minutes) make this an attractive
approach.
There are, however, many scenarios in which communication is naturally
bursty, such as internet communications. In such systems, communications last
on the order of milliseconds and each user spends a large fraction of its time
not communicating. Communication data that are naturally bursty lend them-
selves well to a simple form of TDMA, where nodes transmit data when they
need to. Collisions will not be likely if the nodes in the network transmit data
very infrequently. Protocols that rely on this burstiness are generally termed
contention protocols. For the purpose of simplicity, we limit our discussions to

simple protocol scenarios, particularly with respect to the effect of propagation


delays.

15.2 The ALOHA protocol

An improvement over the “transmit-when-you-need-to” protocol is the ALOHA


protocol, which was introduced for packet radio communications in the 1970s at
the University of Hawaii, reported in Reference [3]. This system assumes that
there is a mechanism to detect packet collisions. In the ALOHA protocol, a node
that has a packet to transmit does so with probability p. If it does transmit and
a collision occurs, the node retransmits the packet after some random amount
of time has passed. A random wait time is used since a deterministic wait time
will surely result in another collision.
In general, the closed-form analysis of the throughput associated with any
protocol is difficult. However, a variant of the ALOHA system called slotted
ALOHA can be analyzed without too much difficulty. In slotted ALOHA, all
nodes are assumed to be synchronized in time and time is divided into equal-
length slots. Transmissions can only begin at the start of a slot and must last
the duration of the slot.
Consider a slotted ALOHA system where packets are transmitted in the net-
work according to a Poisson distribution with average rate of G packets per slot.
Such a system can arise in a network with a large number of nodes n, all of
which have backlogged packets (i.e., the nodes have collided packets awaiting
retransmissions), and transmit in a given slot with some probability. Suppose
that the probability that any given node transmits is
    p = G/n.
Since there are n nodes in the network, the average rate of packets transmitted in
the network is G. Note that in a network, the parameter G can be controlled by
changing the probability of transmission and retransmission of collided packets.
The probability of a successful transmission, ps, in a given slot is equal to the
probability of exactly one transmission in that slot, which is equivalent to a
single arrival of a Poisson point process (see Section 3.1.16) in that slot. This
probability is given by
    ps = G e^(−G).
Note that G is the average number of transmissions in the network during a given
slot. For very small G, we would expect the probability of successful transmission
to increase with G as it is very unlikely that more than one transmission would
occur when G is very small, but nevertheless a transmission is necessary for there
to be a successful transmission. But as G increases further, the probability of
successful transmissions reduces very rapidly as it becomes likely that multiple
transmissions will occur at the same time resulting in collisions.


Figure 15.1 Probability of successful transmission versus packet arrival rate for slotted
ALOHA.

In Figure 15.1, the probability of a successful transmission versus G is illus-


trated. Note that the probability of successful transmission has a single maxi-
mum point. Taking the derivative of the probability of successful transmission
and setting to zero to find the maximum point yields
    d ps / dG = e^(−G) − G e^(−G) = 0,
for which G = 1 is a solution. With G = 1, the probability of successful
transmission is 1/e. Thus, the average throughput of a backlogged, slotted
ALOHA system, when every user has a packet to send and does so with the
optimal choice of transmit probability, is

    1/e ≈ 36%.
The probability of no transmissions in a given slot is the probability of no Poisson
arrival of packets in that slot, which equals
    e^(−G) = 1/e
for the optimal value of G = 1. Hence, in slotted ALOHA, approximately 36% of
the time is spent with successful transmissions, approximately 36% of the time
is spent idling, and the remainder, approximately 28% of the time, is wasted on
collisions.
A similar analysis can be carried out for systems that are unslotted but with
fixed packet duration. The throughput with the optimal packet arrival rate can
Figure 15.2 Illustration of hidden-node problem.

be shown to equal 1/(2e) ≈ 18%, as is done in Reference [3]. The factor of two loss
arises from the fact that, when a given transmission begins, there cannot be a
transmission initiated in the packet duration preceding the current transmission,
in addition to the fact that there cannot be another packet transmission initiated
during the transmission of the given packet.
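The slotted-ALOHA analysis above is easy to confirm by Monte Carlo simulation. The sketch below (our own illustration, not from the text) draws the number of transmissions in each slot as a Poisson variate with mean G and tallies success, idle, and collision slots:

```python
import random

def slotted_aloha(G: float, slots: int = 200_000, seed: int = 1):
    """Return (success, idle, collision) fractions for slotted ALOHA with
    Poisson(G) transmissions per slot. The analysis predicts G e^{-G},
    e^{-G}, and 1 - (1 + G) e^{-G}, respectively."""
    rng = random.Random(seed)
    counts = [0, 0, 0]  # success, idle, collision
    for _ in range(slots):
        # Poisson(G) variate: count unit-rate arrivals in an interval of length G.
        k, t = 0, rng.expovariate(1.0)
        while t < G:
            k += 1
            t += rng.expovariate(1.0)
        if k == 1:
            counts[0] += 1
        elif k == 0:
            counts[1] += 1
        else:
            counts[2] += 1
    return tuple(c / slots for c in counts)

success, idle, collision = slotted_aloha(G=1.0)
print(success, idle, collision)  # success and idle each close to 1/e ≈ 0.368
```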

15.3 Carrier-sense multiple access (CSMA)

In slotted ALOHA systems, a node decides whether or not to transmit in a


passive manner, without actively trying to sense if another transmission is in
progress. A simple modification to the ALOHA protocol is to have nodes sense
the medium before transmission. A transmission can only be initiated if the trans-
mitting node does not detect any ongoing transmission. This type of protocol is
called carrier-sense multiple access (CSMA).
There are some obvious drawbacks to this system. For instance, in a congested
CSMA system at the conclusion of a transmission, multiple nodes could sense
that the medium is free and initiate transmissions at approximately the same
time, resulting in collisions.
Another scenario in which CSMA fails is called the hidden-node problem (or
hidden-terminal problem), which is illustrated in Figure 15.2. Suppose that T1
and T2 wish to transmit packets to R1 and R2 respectively. The circles indicate
the range over which T1 and T2 are able to sense a transmission. For simplicity,
let us assume that the range at which a transmission can be sensed is equal
to the range at which a transmission can cause a collision. In this situation, if
T2 initiates a transmission, T1 will not be able to detect it and may initiate a
transmission of its own, thereby causing a packet collision at R2.
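The hidden-node condition can be stated geometrically: two transmitters are hidden from each other when their separation exceeds the sensing range while a receiver lies within range of both. A tiny sketch (our own, with hypothetical coordinates, and with sensing range taken equal to collision range as in the text):

```python
import math

def dist(a, b):
    """Euclidean distance between two points in the plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Hypothetical positions on a line, mimicking Figure 15.2 (range = 1.0).
T1, R2, T2 = (0.0, 0.0), (0.9, 0.0), (1.8, 0.0)
sense_range = 1.0

# T1 and T2 are hidden from each other: neither can sense the other,
# yet both transmissions reach (and can collide at) R2.
hidden = (dist(T1, T2) > sense_range
          and dist(T1, R2) <= sense_range
          and dist(T2, R2) <= sense_range)
print(hidden)  # True
```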

Figure 15.3 Illustration of exposed-node problem.

A converse to the hidden-node problem is the exposed-node problem, which


does not result in a collision but in inefficient use of the medium. Consider
Figure 15.3. If T2 initiates a transmission intended for R2, T1 will not initiate
a transmission to R1 even though it could safely communicate simultaneously
with R1 without disrupting the link between T2 and R2. Note that we have
illustrated the hidden- and exposed-node problems by using ranges of detection
that are circles. However, one could easily create scenarios in which obstacles
placed appropriately cause the same problems. The hidden-node problem and
the fact that the probability of multiple, simultaneous transmissions in congested
networks is high make it necessary to control the rate of transmission attempts
to reduce collisions. As a result, the throughput of real CSMA systems is close
to that of ALOHA systems.
An additional enhancement to the CSMA protocol, which provides a signifi-
cant improvement to the throughput when packet sizes are large, is CSMA with
collision avoidance (CSMA/CA) presented in the next subsection.

15.3.1 CSMA with collision avoidance (CSMA/CA)


The CSMA/CA protocol uses a sequence of control messages to initiate the
transmission of a long data packet. The sequence of transmissions is designed to
address both the hidden- and exposed-node problems and to reduce the proba-
bility of a collision on the data packet. The startup sequence essentially reserves

Figure 15.4 Example network to illustrate operation of CSMA/CA. The unfilled
circles represent nodes that have data to transmit and the filled circles represent
nodes that are receiving data.

the medium for some duration of time, and most collisions are limited to the
initiation packets, which are short so that less time is wasted on collisions.
The basic sequence of transmissions for CSMA/CA utilizes two types of control
packets to initiate a transmission, the request-to-send (RTS) packet and the
clear-to-send (CTS) packet. The contents of the RTS packet are as follows.

• Message ID – identifies this as an RTS packet.
• MAC address of target node – this address uniquely identifies the intended
destination of the data packet.
• MAC address of source node – this address uniquely identifies the source of
the data packet.
• Length of the reservation – this tells all the nodes within range the duration of
the reservation so they know how long to wait to initiate new transmissions.
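As a concrete sketch of these fields (our own illustration; the field names and widths are hypothetical, and real IEEE 802.11 frame headers differ in layout), the two control packets might be modeled as:

```python
from dataclasses import dataclass

@dataclass
class ControlPacket:
    """Fields common to the RTS and CTS packets described above."""
    message_id: str       # "RTS" or "CTS"
    target_mac: bytes     # uniquely identifies the intended destination
    source_mac: bytes     # uniquely identifies the source of the data packet
    reservation_us: int   # duration of the medium reservation, in microseconds

rts = ControlPacket("RTS",
                    target_mac=b"\x00\x11\x22\x33\x44\x55",
                    source_mac=b"\x66\x77\x88\x99\xaa\xbb",
                    reservation_us=2000)
# The CTS carries the same information under a different message ID.
cts = ControlPacket("CTS", rts.target_mac, rts.source_mac, rts.reservation_us)
```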

The CTS packet contains the same information as the RTS packet, except for a
different message ID. The basic sequence of a link is as follows.

(1) An RTS packet is transmitted by a node that desires to transmit data. This
packet informs all nodes within range of the transmit node that it is about
to start transmission and that all nodes except for the intended destination
of its transmission should not agree to receive packets for the duration of
the reservation.
(2) A CTS packet is transmitted by the destination node of the preceding RTS.
This packet informs all nodes within range of the destination node that it
is about to start receiving a data packet and that all other nodes within its
vicinity should not transmit anything for the duration of the reservation.
(3) After the CTS packet is received by the transmit node, it can then start
transmitting its data packet.
(4) After the duration of the reservation is completed, other nodes can then
initiate their own transmissions.

To illustrate, consider the network in Figure 15.4 in which the unfilled circles
represent nodes that have packets to transmit. Suppose that node 1 wishes to
transmit a packet to node 3. Consider the timing diagram in Figure 15.5 where

Figure 15.5 CSMA/CA timing diagram for successful transmission.

the numbers on the left indicate the nodes and the packets are represented by
rectangles with the labels indicating the type of packet. The arrows indicate the
propagation of the source packet to the different nodes. For instance, the RTS
packet transmitted by node 1 first arrives at node 2, followed by nodes 3 and 4.
The sequence of events is as follows.

(1) Node 1 initiates a link by transmitting an RTS message with node 3 as the
destination.
(2) The RTS packet from node 1 is received after some propagation delay at
nodes 2, 3, and 4.
(3) Upon receiving the RTS packet from node 1, node 2 waits to see if it can
detect a CTS from node 3. If node 2 detects a CTS packet from node
3, it knows not to transmit because node 3 is about to receive a data
transmission.
(4) Node 3 receives the RTS from node 1 and makes a decision on whether or
not to accept the transmission from node 1.
(5) Node 4 knows not to accept any transmissions after receiving the RTS from
node 1 as it now knows that a data transmission is about to begin in its
vicinity.
(6) Node 3 decides to accept the data packet from node 1 and transmits a CTS
packet.
(7) Node 2 receives the CTS packet from node 3, indicating to node 2 that

Figure 15.6 CSMA/CA network that results in a data packet collision with timing
diagram in Figure 15.7.

node 3 is within range of it and is about to start receiving data. Hence,
node 2 knows not to start a transmission of its own.
(8) Node 4 receives the CTS packet from node 3 and knows not to initiate a
transmission.
(9) Node 1 receives the CTS packet from node 3 so it knows that its RTS has
been accepted and that it can now transmit a data packet to node 3.
(10) Node 1 then initiates its data transmission.
(11) After the duration of the reservation expires, node 2 is now free to transmit
data by first sending an RTS packet.
Note that while the RTS/CTS exchange reduces the probability of collision on
the data packet, it is still possible to have data packet collisions in CSMA/CA
systems, as illustrated by the network of Figure 15.6 and the associated tim-
ing diagram in Figure 15.7. Suppose that in Figure 15.6, node 5 cannot detect
transmissions from nodes 1, 2, or 3 either because it is too far away from those
nodes or because of obstacles. Additionally, suppose that node 4 cannot detect
transmissions from nodes 1 or 2. The associated timing diagram is illustrated in
Figure 15.7 where dropped packets (either due to nodes being out of range or
collisions) are represented by the × symbol.
Suppose that node 1 wishes to transmit a packet to node 3 and node 4 wishes
to transmit a packet to node 5. The following is the sequence of transmissions
illustrated by Figure 15.7, which results in a collision on the data packet trans-
mitted by node 1 intended for node 3.
(1) Node 1 initiates transmission by sending an RTS packet.
(2) Node 3 successfully receives the RTS packet from node 1.
(3) Node 4 does not receive the RTS packet from node 1 because it is out of
range.
(4) Node 5 does not receive the RTS packet from node 1 because it is out of
range.
(5) Node 4 initiates its own transmission by sending an RTS message.
(6) Node 5 successfully receives the RTS packet from node 4.

Figure 15.7 CSMA/CA timing diagram with data packet collision. Arrows indicate
transmissions and crosses indicate lost data or control packets.

(7) Node 2 does not receive the RTS packet from node 4 because it is out of
range.
(8) Node 3 initiates transmission of a CTS packet accepting the RTS from node
1 at the same time that the RTS from node 4 arrives at node 3 and, hence,
does not receive the RTS from node 4.
(9) Node 5 transmits a CTS packet accepting the RTS from node 4.
(10) Node 4 receives the CTS packet from node 5.
(11) Node 2 receives the RTS packet from node 3, but does not receive the RTS
from node 5 because it is out of range.
(12) Node 3 does not receive the CTS packet from node 5 because it is out of
range.
(13) Node 1 does not receive the CTS packet from node 5 because it is out of
range.
(14) Node 1 receives the CTS packet from node 3.
(15) Node 4 initiates data transmission to node 5.
(16) Node 3 receives data packet from node 4.
(17) Node 1 does not detect data transmission from node 4 because it is out of
range and initiates data transmission to node 3.
(18) Node 3 suffers a dropped data packet because the data packet intended for it
from node 1 arrives when the data packet from node 4 is being transmitted.

Note that the RTS/CTS packet exchange, including the associated delays, makes
the CSMA/CA scheme useful only if the size of the data packet is significantly
larger than the duration of the RTS/CTS exchange. A closed-form analysis of
the efficiency and throughput of CSMA/CA is difficult and is highly dependent
on packet sizes and propagation delays. Hence, most studies on the efficiency of
such networks are done empirically, either using hardware or simulation.

15.4 Non-space-division multiple-access protocols

The simplest way to exploit multiantenna systems in wireless networks is to


simply use the multiplexing or beamforming abilities afforded by multiantenna
systems in conjunction with a noninterference protocol such as CSMA/CA. Com-
pare this to the protocols described in Section 15.5, where the antenna arrays
are explicitly used to enable simultaneous transmissions.
Non-space-division multiple-access (non-SDMA) multiantenna protocols operate
in a similar fashion to single-antenna protocols, in which transmissions within
interfering range of each other do not overlap in time or frequency (usually time).
The multiple antennas can then be used for multiplexing transmissions and/or
beamforming. The main difference between non-SDMA multiantenna protocols and
single-antenna protocols is that provisions need to be included in the protocol to
estimate and exchange channel parameters. For instance, in a typical CSMA/CA
protocol, the RTS message would include training signals designed for MIMO
channel estimation. The CTS message sent by the target receiver would include
the estimated channel parameters so that the transmitter would be able to appro-
priately encode its transmissions, such as by using a singular-value decomposition
as described in Chapter 8.
One such protocol is the IEEE 802.11n protocol [151]. The basic operation of
this protocol follows its previous versions (IEEE 802.11a/b/g) with a CSMA/CA-
based system. Enhancements are provided to the basic CSMA/CA protocol to
enable operations in full MIMO mode (i.e., parallelized, multistream transmis-
sions), spatial multiplexing mode (where transmitters do not have channel-state
information but transmit independent data streams through each antenna), and
antenna selection where the best among the available pairs of antennas is used
for communications. Note that these MIMO encoding techniques have been de-
scribed in Chapter 8.

15.5 Space-division multiple-access (SDMA) protocols

15.5.1 Introduction
The ability of antenna arrays to suppress interference can be exploited in the
context of MAC protocols, allowing users to be separated spatially rather than

Figure 15.8 SDMA network.

by frequency or in time. Consider a system with nr antenna elements at a re-


ceiver and M users. Traditional communication protocols such as ALOHA and
CSMA/CA are based on orthogonal communication whereby only one link oc-
cupies a particular frequency in a given time, i.e., multiple links are not allowed
to interfere with each other. Events in which multiple transmissions occur at the
same time and in the same frequency are considered collisions.
With antenna arrays, it is possible to achieve orthogonal communications even
though transmissions occur simultaneously and in the same frequency band since
users can be spatially separated. An nr element receiver can null signals arriving
from nr − 1 users while maintaining a desired signal level from the target user as
described in Section 9.2.2. This nulling can be done by projecting the received
signal onto the subspace orthogonal to the subspace spanned by the channel
vectors of the unintended targets. Suppose that a single receiver with nr antennas
wishes to detect the signal from a particular transmitter in the presence of M −1
interferers as illustrated in Figure 15.8 where M = 2.
Let the channel from the target mobile transmitter to the representative receiver
be represented by h1 ∈ C^(nr×1) and the channels from the unintended
transmitters to the receiver be denoted by hℓ ∈ C^(nr×1), ℓ = 2, 3, . . . , M. The
received signal vector is then

    z = h1 s1 + Σ_{ℓ=2}^{M} hℓ sℓ + n,    (15.1)

where sℓ are the transmitted symbols and n ∈ C^(nr×1) is a vector of i.i.d.
circularly symmetric, complex Gaussian CN(0, σ²) noise entries.

Figure 15.9 Timing diagram for SDMA protocol. Arrows indicate transmissions.

By using the analysis in Section 9.2.2, the weight vector w ∈ C^(nr×1), which
nulls the signals arriving from the M − 1 interferers, is given by

    A = (h2 | h3 | · · · | hM)    (15.2)
    v† = h1† (I − A (A† A)^(−1) A†)    (15.3)
    w = v / ||v||.    (15.4)
If nr ≥ M , there will not be any residual interference at the output of the
receiver. Hence, orthogonal communications are possible in the same time and
frequency space by using antenna arrays.
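Equations (15.2)–(15.4) translate directly into a few lines of NumPy. The sketch below (our own, with randomly drawn Rayleigh channel vectors) confirms that the resulting weight vector nulls the M − 1 interferers while retaining gain on the target channel:

```python
import numpy as np

rng = np.random.default_rng(0)
nr, M = 4, 3  # receive antennas; one target user plus M - 1 interferers

def rayleigh(n):
    """i.i.d. circularly symmetric complex Gaussian channel vector."""
    return (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

h1 = rayleigh(nr)                                          # target channel
A = np.column_stack([rayleigh(nr) for _ in range(M - 1)])  # interferers, (15.2)

# v† = h1† (I − A (A† A)^{-1} A†): project h1 onto the subspace orthogonal
# to the interference, Equation (15.3). The projector is Hermitian, so v = P h1.
P = np.eye(nr) - A @ np.linalg.inv(A.conj().T @ A) @ A.conj().T
v = P @ h1
w = v / np.linalg.norm(v)   # unit-norm weight vector, Equation (15.4)

print(np.abs(w.conj() @ A).max())  # responses toward interferers: ~0
print(np.abs(w.conj() @ h1))       # response toward the target: > 0
```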
The SPACE-MAC protocol described in Section 15.5.3 uses this general idea to
accommodate multiple simultaneous transmissions in the same frequency band.
Instead of scheduling transmissions in time or frequency, this technique requires
all receivers to estimate the matrix of channel vectors of the interferers A, in
addition to the channel vector of the target transmitter h1 . Hence, protocols that
use orthogonal spatial communications must have a mechanism for estimating
these vectors.

15.5.2 A simple SDMA protocol


A simple protocol for a spatial zero-forcing system is shown in Figure 15.9. In
this example, node 1 wishes to transmit a data packet to node 2, with nodes 3
and 4 as potential interferers that wish to transmit packets to other nodes (omitted

from the timing diagram). The sequence of events leading up to successful data
transmissions is as follows.

(1) Node 1 transmits an RTS packet containing training data that can be used
by node 2 and any other node that wishes to receive a transmission.
(2) Node 2 receives the RTS packet from node 1 and transmits a CTS packet
indicating that it accepts node 1’s transmission.
(3) Nodes 3 and 4 transmit RTS messages to other nodes, initiating links of their
own.
(4) Node 2 estimates the channel between nodes 3 and 4 to itself from the RTS
messages sent by those nodes.
(5) Node 1 transmits its data packet.
(6) Nodes 3 and 4 transmit data packets to their respective destinations.
(7) Node 2 uses the estimated channels from nodes 1, 3, and 4 to itself to perform
zero-forcing.
(8) The destinations of the packets from nodes 3 and 4 use the channels esti-
mated from the respective RTS packets to perform zero-forcing.

Note that this protocol requires the RTS and CTS packets to be transmitted
without collisions by all nodes. The data packets can, however, be transmitted
simultaneously as the required channel estimations can be performed by using
the RTS/CTS exchanges. It is also possible to omit the CTS packets from the
destination nodes: unlike in the CSMA/CA protocol, the destinations do not
strictly need to warn adjacent nodes that they are receiving data, because they
can null any interfering transmissions by zero-forcing. However, the CTS packets
could still be useful to indicate acceptance of a link request.

15.5.3 SPACE-MAC
The SPACE-MAC protocol described in Reference [242] implements a more so-
phisticated version of the simple SDMA protocol described above. In SPACE-
MAC, RTS/CTS exchanges are used to request and accept transmissions as well
as to estimate channel parameters to perform nulling. Unlike the simple proto-
col described in Section 15.5.2, however, SPACE-MAC allows nodes to initiate
links during ongoing data transmissions, using antenna arrays at the transmitters
to place nulls in the directions of nodes receiving data during their RTS/CTS
handshakes.
This ability can be accomplished as follows. Consider a network of four nodes,
each with an antenna array with node 1 wishing to transmit to node 2, and node 3
wishing to transmit to node 4. Figure 15.10 is a timing diagram for the SPACE-
MAC protocol with four users. The dashed lines represent transmissions with
nulling, i.e., the transmitter of the packet places nulls in the direction(s) of the
receivers to which the dashed arrows connect. Hence, the dashed lines represent
very weak signal paths that are assumed to not disrupt ongoing receptions.


Figure 15.10 Timing diagram of SPACE-MAC protocol. The bold arrows indicate
nulls which are placed in the direction of node 2 to avoid interfering with node 2’s
reception.

(1) Node 1 transmits an RTS packet requesting to transmit to node 2 with a
default beamforming vector at its transmitter.
(2) Nodes 2, 3, and 4 estimate the channels between themselves and node 1,
using the signals from the RTS packet from node 1.
(3) Node 2 accepts the RTS from node 1 by transmitting a CTS packet.
(4) Nodes 3 and 4 hear the CTS packet from node 2 which informs them that
node 2 has accepted the request from node 1. Nodes 3 and 4 also estimate
the channels between node 2 and themselves, using the signals from the CTS
packet.
(5) After receiving the CTS packet from node 2, node 1 begins transmitting data
intended for node 2.
(6) During the data transmission from node 1, node 3 sends an RTS packet
initiating a link with node 4 but performs transmit beamforming by using
the estimated channel from itself to node 2 such that the signal to node 2 is
nulled, thus not interfering with node 2’s reception of the data packet from
node 1.
(7) Node 4 sends a CTS accepting node 3’s request to initiate a link and performs
transmit beamforming by using the estimated channel from itself to node 2
such that the signal to node 2 is nulled, thus not interfering with node 2’s
reception of the data packet from node 1.

(8) Node 3 receives the CTS from node 4 and initiates data transmission with
beamforming, keeping its null in the direction of node 2.
Suppose an additional pair of nodes, nodes 5 and 6, were part of the network,
but with only two antennas at each node. They would not be able to initiate a
link once nodes 1 and 3 have initiated their links because they do not have
sufficient degrees of freedom to null transmissions to both nodes 2 and 4 (that
is, the receiver sides of the pre-established links).
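The transmit-side nulling in steps (6) and (7) amounts to projecting a beamforming vector onto the null space of the estimated channel to node 2. A minimal sketch, assuming a four-antenna transmitter, numpy, and a randomly drawn channel (none of these values come from the protocol itself):

```python
import numpy as np

rng = np.random.default_rng(1)
n_t = 4  # transmit antennas at node 3 (assumed)

# Channel from node 3's array to node 2, estimated from node 2's CTS (step 4).
g2 = rng.standard_normal(n_t) + 1j * rng.standard_normal(n_t)

v0 = np.ones(n_t) / np.sqrt(n_t)  # default beamforming vector (step 1)

# Subtract the component of v0 along g2 so node 2 sits in a transmit null.
# (np.vdot conjugates its first argument, so this nulls g2's conjugate
# response; under the opposite convention the projection idea is identical.)
v = v0 - g2 * (np.vdot(g2, v0) / np.vdot(g2, g2))
v = v / np.linalg.norm(v)  # restore unit transmit power

print(abs(np.vdot(g2, v)))  # ~0: node 2 sees a null
```

Each additional receiver to be protected consumes one more spatial degree of freedom, which is exactly why the two-antenna nodes 5 and 6 above cannot join once nodes 2 and 4 must both be nulled.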

15.5.4 The reciprocity assumption


SPACE-MAC, the simple SDMA protocol in Section 15.5.2, and many other
proposed MIMO-MAC protocols rely on the key assumption that the channels
between nodes are reciprocal. That is, the channel between a pair of nodes with
node A transmitting and node B receiving equals the channel between the nodes
if node B were transmitting and node A receiving.
While the physical medium between the antennas of a pair of nodes is well
approximated as reciprocal, transmit and receive signal paths within the radios
in most systems are different. For instance, most systems use different amplifiers
for transmitting and receiving signals, resulting in different transmit and receive
signal paths. The correct operation of this protocol would thus require that
the transmit and receive paths be well calibrated or that the differences between
the two paths be well characterized. Additional parameters to compensate for
transmitter–receiver path differences could then be communicated during the
RTS/CTS exchange.
For instance, consider a flat-fading channel model in which the channels (in-
cluding transmit and receive signal paths) between the antennas of nodes i and
j, with node i transmitting and node j receiving, are represented by H_ji ≠ H_ij.
Suppose that the channel between nodes i and j can be decomposed as

    H_ji = R_j G_ji T_i ,                                  (15.5)

with T_i representing the characteristics of the transmit signal path in node i
(assumed to not depend on j), R_j representing the characteristics of the receive
signal path in node j (assumed to not depend on i), and G_ji = G_ij representing
the physical channel between the antennas of the nodes i and j. Suppose that T_i
and R_j are precharacterized. If T_i and R_i are included as fields in the RTS and
CTS packets, a node that has estimated H_ji from the RTS (or CTS) transmission
from node i can estimate H_ij as follows:

    H_ij = R_i R_j⁻¹ H_ji T_i⁻¹ T_j ,                      (15.6)

assuming that T_i and R_j are invertible.
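Equation (15.6) can be checked numerically. The sketch below (numpy, three-antenna nodes, randomly drawn hardware responses, all of which are illustrative assumptions) constructs a reciprocal physical channel G, forms the forward channel per Eq. (15.5), and recovers the reverse channel:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3  # antennas per node (assumed)

def rand_mat():
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

G = rand_mat()                       # physical channel, reciprocal: G_ji = G_ij
T_i, T_j, R_i, R_j = (rand_mat() for _ in range(4))

H_ji = R_j @ G @ T_i                 # node i transmits, node j receives, Eq. (15.5)
H_ij = R_i @ G @ T_j                 # node j transmits, node i receives

# Node j knows H_ji (estimated from node i's RTS) plus the precharacterized
# hardware responses exchanged in the RTS/CTS fields; apply Eq. (15.6).
H_ij_est = R_i @ np.linalg.inv(R_j) @ H_ji @ np.linalg.inv(T_i) @ T_j

print(np.allclose(H_ij_est, H_ij))   # True
```

The cancellation is exact because R_j⁻¹ and T_i⁻¹ strip the receive and transmit hardware from H_ji, leaving the reciprocal G to be re-dressed with the other direction's hardware responses.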

15.5.5 Ward protocol


The Ward protocol introduced in References [336, 337] is a technique to improve
the performance of a slotted ALOHA network by using an adaptive antenna

Figure 15.11 Network model for Ward protocol.

array at one of the nodes. The node endowed with the antenna array acts as
a base station or access point, and aids in relaying packets from one node to
another as illustrated in Figure 15.11. Acknowledgment packets and timeouts
can be used to detect dropped packets in this protocol.
The base station uses its antenna array with multiple sets of weights, each
tuned to receive a packet from a particular source node while placing nulls in the
directions of the other nodes. By doing this, multiple packets can be successfully
received by the base station. The base station then forwards packets to their
respective destination nodes by using an orthogonal communication protocol.
In order to design a set of weights that focuses on the signal from a target
transmitter while simultaneously placing nulls in the directions of the others, the base
station needs to estimate the channels between the antenna of the target and its
own antennas, as well as the covariance matrix of the aggregate received signals
from all the interfering transmissions and the target transmitter. The protocol
specifies a method for the base station to acquire these parameters from each
transmitter.
Each time slot is divided into two intervals. The first T_u time units are called
the uncertainty window. Nodes that wish to transmit must begin their transmis-
sions at a random time in the uncertainty window. Each transmitted packet
is made up of three consecutive intervals, PN1, PN2, and PN3, during which the
same pseudorandom sequence is transmitted, followed by data-carrying samples.
For this discussion, we include any header, destination, and other housekeeping
bits in the data. The duration of the pseudorandom sequence is T_PN, and
T_u + ε = T_PN, where ε is a small positive number. That is to say, the duration
of the PN transmission is slightly greater than the length of the uncertainty
window. Thus, all other nodes are guaranteed to be transmitting a PN sequence
during the transmission of the second PN sequence by any given transmitting


Figure 15.12 Timing diagram of the Ward protocol.

node. This is illustrated in Figure 15.12, whereby during the transmission of PN2
by node 1, node 2 is also transmitting a pseudorandom signal. Similarly, during
the transmission of PN2 by node 2, node 1 is also transmitting a pseudorandom
signal.
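This overlap guarantee follows directly from T_u + ε = T_PN and can be checked numerically. The values of T_u and ε below are arbitrary illustrative choices: whenever two nodes start within the uncertainty window, one node's second PN interval always falls entirely inside the other node's three-PN preamble.

```python
import random

T_u = 10.0
eps = 0.5
T_pn = T_u + eps  # PN duration slightly exceeds the uncertainty window

random.seed(0)
for _ in range(10_000):
    t1 = random.uniform(0.0, T_u)  # start time of node 1
    t2 = random.uniform(0.0, T_u)  # start time of node 2
    # Node 1's second PN interval:
    a, b = t1 + T_pn, t1 + 2 * T_pn
    # Node 2 is sending PN over [t2, t2 + 3 * T_pn]; require full overlap.
    assert t2 <= a and b <= t2 + 3 * T_pn

print("PN2 always overlaps the other node's PN transmission")
```

The worst cases are the window edges: even when one node starts at time 0 and the other at T_u, the ε margin keeps the second PN interval inside the other node's preamble.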
Suppose that the system is designed to support K simultaneous receptions.
The start of transmission is detected by a bank of K matched-filter receivers
at the base station that correlate the received signals with the pseudorandom
sequence. The presence of a pseudorandom sequence will manifest itself as a
sudden increase in the output of the matched filters. The start of a packet is
declared if the output of the matched filter exceeds some threshold. Suppose
that the matched filters are numbered 1, 2, . . . , K, where the kth matched filter
is used to detect the start of the kth packet. The output of the kth matched
filter is not compared against a threshold unless the (k − 1)st matched filter
has been triggered. This mechanism enables the bank of matched filters to de-
tect packets that begin at different times. Note that a pseudorandom sequence
correlates only weakly with a time-offset copy of itself. Therefore, even though the PN sequence of
node 2 is transmitted during the time that the first PN sequence is transmitted on
node 1, it will not contribute significantly to the output of the first matched fil-
ter. The channel and interference covariance matrix estimations are performed in
the interval T_PN immediately following the detection of a packet. For a more
detailed discussion of synchronization issues please see Chapter 17.
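The detection mechanism above can be sketched as a sliding correlation. The PN length, noise level, start offset, and the numpy dependency below are illustrative assumptions, not values specified by the protocol:

```python
import numpy as np

rng = np.random.default_rng(3)
n_pn = 63  # PN length in samples (assumed)
pn = rng.choice([-1.0, 1.0], size=n_pn)

# Node 1 begins its PN1 PN2 PN3 preamble at a random sample in the slot.
start = 40
rx = 0.05 * rng.standard_normal(400)          # background noise
rx[start:start + 3 * n_pn] += np.tile(pn, 3)  # three repetitions of the PN

# Sliding matched filter: correlate every length-n_pn window with pn.
mf = np.array([rx[k:k + n_pn] @ pn for k in range(len(rx) - n_pn)]) / n_pn

# Aligned windows produce a spike of height ~1; misaligned windows stay small
# because the offset autocorrelation of the PN sequence is weak.
print(mf[start] > 0.9, int(np.argmax(mf)) % n_pn == start % n_pn)
```

Thresholding the matched-filter output therefore localizes each packet's PN boundaries even while other nodes' PN sequences overlap in time.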

The channel and covariance matrix estimations are performed as follows. Sup-
pose that the sampled vector of signals received on the antennas of the base
station at time n is z[n]. Suppose that the jth packet is detected at time n_j
and the ℓth pseudorandom value is p_ℓ. Then, the channel vector estimation for
the first node is performed by weighting the received signal vector by the pseu-
dorandom value and averaging over n_s samples. Note that the duration of the
n_s samples must be less than T_PN. The estimated channel vector ĥ1 ∈ C^{n_r×1}
between the antenna of node 1 and the antennas of the base station is then given
by

    ĥ1 = (1/n_s) Σ_{ℓ=1}^{n_s} z[n_1 + ℓ] p_ℓ .           (15.7)

Note that the signal contribution from node 2 will average out if n_s is large.
The covariance matrix of the signals received at the base station is estimated by
using a sample covariance matrix as follows:

    R̂1 = (1/n_s) Σ_{ℓ=1}^{n_s} z[n_1 + ℓ] z†[n_1 + ℓ] .

Then, an approximate MMSE receiver to detect the packet from node 1 can
be found as follows:

    w1† = a ĥ1† R̂1⁻¹ ,

where a is a scale factor. The data samples from node 1 can then be estimated
as

    w1† z[n_1 + 2 n_pn + ℓ] ,

where n_pn is the length of the pseudorandom sequence in samples.
The sequence shown in Figure 15.12 can be described in words as follows.
(1) The base station is continually monitoring the output of its first matched-
filter.
(2) Node 1 commences transmission.
(3) Node 2 commences transmission.
(4) At the end of the pseudorandom sequence transmitted by node 1, the matched
filter at the base station detects the presence of the pseudorandom sequence.
(5) The base station commences channel and covariance matrix estimation to
compute the weights for the packet from node 1.
(6) At the end of the pseudorandom sequence transmitted by node 2, the matched
filter at the base station detects the presence of the pseudorandom sequence.
(7) The base station commences channel and covariance matrix estimation to
compute the weights for the packet from node 2.
(8) When weight estimations are complete, the base station applies the weight
vector for node 1 to detect signals from node 1 and the weight vector for
node 2 to detect signals from node 2.

There are several failure modes for the Ward protocol. Since multiple trans-
missions can be simultaneously received, the traditional definitions of packet
collision do not hold. Packets are not successfully received if any of the following
occur.
(1) Insufficient degrees of freedom at the receiver. With n_r antennas at the re-
ceiver, the base station can place only n_r − 1 nulls. Hence, at most n_r − 1
simultaneous transmissions are possible. If more than n_r − 1 packets are
transmitted during any one slot, it is highly likely that none of the packets
will be received successfully as the base station will not be able to null the
interfering packets.
(2) Insufficient resolution at the receiver array. Even though there are sufficient
degrees of freedom at the receive array, if the channel vectors of two transmit
nodes are close, the MMSE receiver may not be able to place a null in the
direction of one of the packets while focusing on the other. This will result
in poor SINR for the packets concerned and will likely result in unsuccessful
reception of those packets. Note that packets from other nodes may still be
received successfully.
(3) Transmissions that commence within one sample of the pseudorandom
sequence. If transmissions from multiple nodes commence within a small
amount of time from each other, the pseudorandom sequences will correlate
significantly with each other, and the matched filter at the base station may
detect a single packet even if there are multiple packets. This is illustrated
in Figure 15.13. The channel estimation algorithm will end up estimating
h1 + h2 .
A detailed analysis incorporating all these failure modes can be found in [337],
which shows that a throughput of approximately 3.6 packets per slot may be
successfully transmitted in a system with 10-element arrays and K = 6. Com-
pare this with a throughput of approximately 0.37 for slotted ALOHA without
antenna arrays. Note here that the overhead associated with the pseudorandom
sequence is assumed to be negligible compared to the data.
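The slotted-ALOHA baseline quoted above follows from the standard throughput formula S(G) = G e^{−G} for Poisson offered load G, which peaks at G = 1. A quick check:

```python
import math

def slotted_aloha_throughput(G):
    # Expected successful packets per slot for Poisson offered load G:
    # a slot succeeds when exactly one packet arrives in it.
    return G * math.exp(-G)

peak = slotted_aloha_throughput(1.0)
print(round(peak, 2))  # 0.37, the single-antenna figure quoted above
```

The roughly tenfold gain reported for the Ward protocol thus comes from the array's ability to accept several overlapping packets per slot rather than at most one.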

15.5.6 Summary of some existing SDMA protocols


In this section, we summarize some existing protocols that use multiple antennas
to enable simultaneous transmissions.

Mitigating interference with multiple antennas MAC (MIMA-MAC)


The MIMA-MAC protocol, which stands for mitigating interference with multi-
ple antennas medium-access control, uses SDMA to
enable multiple simultaneous transmissions [242]. The protocol assumes that all
nodes in the network are synchronized and time is slotted. Each slot corresponds
to a MIMA-MAC frame during which multiple transmitter-receiver pairs can
communicate. The structure of the MIMA-MAC frame for two simultaneous


Figure 15.13 Timing diagram of the Ward protocol with a collision at the start of
transmission.


Figure 15.14 Frame structure for the MIMA-MAC protocol.

links, where each receiver has at least two antennas, is illustrated in Figure
15.14.
The MIMA-MAC frame is divided into a contention period, training period,
data period, and acknowledgment period. Except for the data period, all other
periods are divided into two slots (or N slots if N simultaneous transmissions
are desired). The contention period is divided into contention slot 1 (CS1) and
contention slot 2 (CS2). During the contention slots, an RTS/CTS exchange as
described in Section 15.5.3 takes place. The link that succeeds in carrying out the
RTS/CTS exchange in CS1 will not participate in CS2, and the transmit side of
that link will send a training sequence in training slot 1 (TS1) and data during the
data slot. The receiver side of the link will transmit an acknowledgment during
ACK1 if it successfully decodes the data. Likewise, the transmit side of the link

that succeeds in carrying out the RTS/CTS exchange during CS2 will transmit
a training sequence during training slot 2 (TS2) and data during the data slot
(simultaneous to the link that succeeded in CS1). The receiver side of that link
will transmit an acknowledgment packet during acknowledgment slot 2 (ACK2).
During TS1 and TS2, both receivers will estimate the channels between their
antennas and the antennas of the respective transmitting nodes. Hence, at the
conclusion of TS2, both receivers know the channel parameters between their an-
tennas and both transmitters. During the data slot, both transmitters send their
signals simultaneously, and each receiver performs beamforming to null interfer-
ence from the undesired transmitter while focusing on the desired transmitter.
The protocol also has mechanisms to minimize collisions during the RTS/CTS
exchange and a backoff method to reduce probability of transmissions when con-
gestion is high. The interested reader can consult the original publication of the
protocol in Reference [242] for further details.
The comparison between MIMA-MAC and conventional CSMA/CA can be
made by comparing TDMA to SDMA since conventional CSMA/CA is essen-
tially a form of TDMA. From a degrees-of-freedom perspective, SDMA does not
offer any performance benefit compared to TDMA. However, if we assume that
nodes have short-term power constraints, MIMA-MAC with two simultaneous
transmissions puts twice the total transmit power into the channel, since two
nodes transmit data simultaneously, whereas in a conventional CSMA/CA pro-
tocol only one node transmits at a time.

NullHoc protocol
The NullHoc protocol is a medium-access control protocol that uses beamforming
on both the transmit and receive sides, as opposed to MIMA-MAC, which only uses
receive-side beamforming. A key assumption of the NullHoc protocol is that
channels are reciprocal.
The NullHoc protocol provides for channel estimation and exchange of channel
information between any given node and its nearby nodes. The protocol requires
that the total available bandwidth B be divided into a control channel and a data
channel where a factor 0 < α < 1 is used to assign a fraction of the bandwidth
to the control channel. The protocol allows the channel to be partitioned either
in frequency or in code space where, in the latter case, the control and data
channels use orthogonal sets of codes.
The control channel uses a CSMA/CA-type protocol to reserve the data chan-
nel for the duration of the data channel communication. Assuming that node 1
wishes to communicate with node 2, the following exchange takes place prior to
transmission on the data channel.

(1) Node 1 sends a request-to-send (RTS) packet to node 2. The RTS packet
includes pilot signals to enable other nodes to perform channel estimation
and the set of weights it will use to receive the acknowledgment (ACK)
packet at the end of the data transmission.

(2) If node 2 is able to accept, it sends a clear-to-send (CTS) packet to node 1.
The CTS message contains pilot signals, the antenna weights node 2 will use
during data reception, and the transmit antenna weights it plans to use for
transmitting the ACK.
(3) Node 1 responds with a data-send (DS) control message that contains the
weights it intends to use for its transmit beamformer during the data trans-
missions.

Channel estimations are performed by all nodes in the network using the pi-
lot training sequences. Beamformer weights on the transmit and receive sides are
computed using a zero-forcing-type algorithm specified in [225]. Simulations in-
dicate that NullHoc can achieve up to double the throughput of 802.11 when the
number of antennas is large [225].

STI-MAC
The simultaneous-transmissions-in-interference MAC (STI-MAC) protocol spec-
ifies methods for communications in the presence of interference using a com-
bination of multi-carrier CDMA (MC/CDMA) and SDMA with antenna arrays
and was introduced in Reference [289]. The most significant difference between
this protocol and the majority of the protocols described above is that it does
not depend on orthogonal communications; that is to say, the protocol allows for
communication in interference, which is accomplished by using a linear minimum-
mean-square-error receiver that combines the degrees of freedom provided by the
antenna array and the multi-carrier CDMA system. The MMSE receiver does
not completely remove interference but rather optimally balances interference
suppression with noise suppression to maximize the SINR. In contrast, proto-
cols such as SPACE-MAC, NullHoc and MIMA-MAC all depend on the antenna
arrays to completely null the interference.
Like the NullHoc protocol, the STI-MAC protocol depends on a control chan-
nel for session initiation and training, and a data channel for payload transmis-
sions. The protocol has provisions for estimating all the required channel parame-
ters and a novel protest mechanism that allows an ongoing link to protest against
a new user entering the network if the new user’s presence will compromise the
ongoing link. The control channel can be allocated in frequency or time.
Transmissions with and without channel-state information are possible in this
protocol, with slightly different session initiation sequences for each. For sys-
tems with the control channel allocated in time, time is divided periodically into
data and control slots. The control channel is operated using a slotted-ALOHA
protocol.
The operation of STI-MAC in its most basic form, without channel-state in-
formation at the transmitter and no protest messages, is illustrated in Figure
15.15. The dashed lines are used to indicate control channel slots. Node 1 wishes
to transmit data to node 2, and node 3 is currently receiving data from some
other node. The control channel transactions needed to establish a link between

node 1 and node 2 are shown in Figure 15.15. Note that all data transmissions
cease during the control channel slots and commence once the control channel
period ends. The following sequence of transactions is depicted in Figure 15.15.
(1) Node 1 transmits a session initiation request packet that contains a prede-
termined training sequence and the address of its target receiver, node 2.
(2) Node 2 estimates the channel parameters between node 1 and itself using
the session initiation packet from node 1 and decodes the session initiation
packet. Node 2 also determines the SINR it expects to see during the data
channel if node 1 transmits. This is done by using the channel parameters es-
timated from node 1’s initiation request packet and an estimated interference
covariance matrix based on previous channel estimations.
(3) Node 3 estimates the channel between node 1 and itself using the session
initiation packet from node 1, and since it is in the midst of receiving data,
it computes the SINR it would observe during the data channel slot if node
1 is transmitting. In this case, node 3 determines that its SINR will be
sufficiently high during the data channel even if node 1 transmits. It there-
fore does not send a protest message.
(4) Node 2 agrees to receive data from node 1 and sends an initiation response
message to node 1 indicating its acceptance. As with the CTS message in
CSMA/CA, this message also indicates to other nodes that node 2 is about
to receive data in the data slot.
(5) The three slots following the initiation response message from node 2 are
protest slots. During this time, any node with an already established link that
determines that a transmission by node 1 during the data channel will cause
its SINR to fall below an acceptable level can send a protest message that
will cause node 1 and node 2 to not commence their link in the next data slot.
(6) If no protests are heard, node 1 may begin transmission at the next data
channel.
(7) Node 3 adapts its MMSE receiver to compensate for the added interference
caused by node 1’s transmission during the data slot. This is accomplished
by augmenting its estimated interference covariance matrix with the channel
information it estimated in step 3.
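The SINR computation behind the protest decision (steps 2, 3, and 7) can be sketched as follows. For a linear MMSE receiver with desired channel h and interference-plus-noise covariance Q, the output SINR at unit transmit power is h† Q⁻¹ h, and admitting the new transmitter augments Q with the rank-one term g g†. The dimensions, noise level, threshold, and numpy dependency below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n_r = 4  # receive antennas at node 3 (assumed)

h = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)  # ongoing link
g = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)  # new user (step 3)

Q = 0.1 * np.eye(n_r)  # current interference-plus-noise covariance (noise only)

def mmse_sinr(h, Q):
    # Output SINR of the linear MMSE receiver: h^H Q^{-1} h (unit signal power).
    return float(np.real(np.vdot(h, np.linalg.solve(Q, h))))

sinr_now = mmse_sinr(h, Q)
sinr_new = mmse_sinr(h, Q + np.outer(g, g.conj()))  # augmented covariance, step 7

SINR_MIN = 10.0  # acceptable level (assumed); protest if the new link breaks it
print(sinr_now > sinr_new, sinr_new < SINR_MIN)
```

A rank-one positive update can only reduce h† Q⁻¹ h, so the predicted SINR never improves when a new transmitter is admitted; the protest fires only when it falls below the acceptable level.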
Note that the STI-MAC protocol is susceptible to the hidden-node problem
since the node initiating the new link (node 1 here) does not respond to the initi-
ation response message from the new receiver (node 2 here) with another control
packet. In protocols such as 802.11, this type of packet is used to inform receivers
within range of the new transmitter, but out of range of the new receiver, that
a link is going to be initiated.
Note that STI-MAC has provisions for transmissions with channel-state infor-
mation whereby immediately following the initiation response message, the new
receiver (node 2 in this example) estimates and transmits the covariance matrix
that should be used by the transmitter during the data channel. Other nodes
currently receiving data can hear this information and use it to estimate the

Figure 15.15 Timing diagram of STI-MAC with no transmit channel-state information
and no protest messages.

Figure 15.16 Timing diagram of STI-MAC with no transmit channel-state information
and a protest message.

effects of the new transmitter on their ongoing links, which determines whether
or not nodes transmit protest messages (see Figure 15.16).

Problems

15.1 Derive the throughput capacity of unslotted ALOHA under the assump-
tion that the packet duration is a constant.
15.2 Construct a different scenario where the CSMA/CA protocol fails and
results in a collision during transmission of the data packet. Your answer should
include a timing diagram as well as a figure that describes the relative positions
of nodes and/or obstacles.
15.3 Modify the Ward protocol described in Section 15.5.5 such that it applies
to ad hoc wireless networks, that is, networks with one-to-one links. You should
construct a timing diagram for a scenario where a link is successfully established.
15.4 For the carrier-sense-multiple-access system described in Section 15.3,
construct a scenario where a packet collision occurs. Discuss the role of
propagation delays and the amount of time required for sensing the medium
in the probability of a collision.
15.5 Qualitatively explain why a randomized sensing duration could result in
lower probability of collision in carrier-sense-multiple-access systems compared
to a fixed sensing duration.
15.6 Assuming that channels are static, construct a scenario where the SPACE-
MAC protocol described in Section 15.5.3 results in a collision during the data
packet transmission.
15.7 Construct a timing diagram for a simple interference-alignment protocol
with three transmit and receive pairs, each with three antennas. You may use the
system described in Section 14.4. You may assume channel reciprocity between
all antennas, and only consider a case where links are successfully established.
Your answer should indicate when all necessary channel estimations are per-
formed and in which packets channel parameters that cannot be estimated are
exchanged.
15.8 Assuming devices that can simultaneously transmit and receive signals
in different frequency bands, consider the following communications protocol
which is a variant of busy-tone protocols (see, for instance, Reference [20]). The
available bandwidth B is divided into nc data subchannels and nc busy-tone
channels, where each data channel has a corresponding busy-tone channel in a
significantly different frequency range to enable simultaneous transmissions and
receptions. A node that is receiving data in the kth data channel simultaneously
transmits a noise-like busy-tone signal in the kth busy-tone channel. Any node
that wishes to transmit can only do so in a data channel in which the average
received energy is below a threshold.
(a) Describe how this protocol alleviates the hidden node problem.
(b) Describe how this protocol alleviates the exposed node problem.
(c) Suppose that the busy-tone channel occupies a very narrow range of fre-
quencies. Qualitatively describe a failure mode of the protocol that results
in a collision in a data channel when multiple receivers are successfully re-
ceiving data in a given channel, but a new transmitter believes that the data
channel is available and starts transmitting. Hint: the narrow bandwidth of
the busy-tone channel causes this problem.
16 Cognitive radios

The nomenclature of cognitive radio, suggested in Reference [219], indicates the
concept of a radio that is flexible in terms of its strategy or etiquette so that it
can respond to the needs of the users and environment. The fundamental notion
of the cognitive radio is that it is aware of its users and environment and makes
decisions that maximize the link performance while minimizing adverse effects
on other links in its or a legacy network [94].
A significant driver to the investigation of cognitive radios is the observation
that allocated spectrum is not always utilized well [6]. Some researchers in cog-
nitive radios focus on the technology used to describe the logic and rules used by
the radio [153]. Game-theoretic models for developing dynamic spectrum access
techniques have been investigated [161], and punishment techniques have been
considered for users that behave poorly [354]. Conversely, some research focuses
on information-theoretic investigations [80, 329] that often assume some level of
cooperation between cognitive radios. At a deeper level, an aggressive definition
would be for a cognitive radio to try strategies and then learn from them how
well a particular strategy works, although we will not require that high level of
cognition.
As an aside, it has been common in engineering literature to require that a cog-
nitive system be of greater sophistication than an adaptive system. Amusingly,
the engineering literature is at odds with the typical usage of these terms. As
one can immediately recognize, a system must be cognizant of its environment
before it can be adaptive to it. Consequently, typical usage of these terms would
indicate that adaptive systems must have greater sophistication than cognitive
systems. However, we will not attempt to correct this abuse of the language.
Many basic cognitive concepts are employed by communication systems with-
out using the term cognitive. As an example, WiFi systems typically employ a
carrier-sense scheme to avoid interfering with other communication links. In addi-
tion, many of these systems can switch to other carrier frequencies if a particular
frequency channel seems to be oversubscribed. On the basis of link attenuation,
the modulation, coding, and data rate are adapted. While these are cognitive capa-
bilities in a general sense, these radios are generally not given credit for their
cognitive skills.
The cognitive term is usually applied to systems that attempt to modify their
waveforms to specifically reduce interference. A potential source of confusion in

the discussion of cognitive radios is in the assumption of homogeneous versus
heterogeneous radio environments. Some of the literature assumes that all the
radios operate in a cooperative cognitive fashion, while other papers assume
that cognitive radios must operate in the presence of legacy radio links. Early
practical applications of cognitive radios often assume the latter. A significant
motivation for operating in bands allocated to legacy systems is the observation
that many spectral allocations are underutilized. A specific example is the televi-
sion band. Because of the frequency-reuse model employed in the United States
by the Federal Communications Commission (FCC), the spectral allocations for
television broadcast stations tend to have relatively sparse spatial-spectral
occupancy. A cognitive radio solution, which is addressed in IEEE 802.22 [152], is
to have radios find television bands that are underused and operate low-power
links in these bands. In order to do this “safely,” the cognitive radios need to
have some understanding of signals in the legacy system and of the potential
adverse effects on the legacy links.
Another driving force behind the development of cognitive radios is the devel-
opment of software-definable radios [219]. The dramatic increase in computation
capabilities and the decrease in power requirements have allowed implementation
of complex radio waveforms on these flexible platforms; thus, software-definable
radios enable waveforms to be developed after the radio hardware has been con-
structed on a given platform. These waveforms can be optimized for the specific
needs of the link environment.
In this chapter, we provide a survey of some of the concepts associated with
cognitive radios, while placing greater emphasis on the signal processing tools
necessary to detect legacy systems [303] and to minimize adverse effects on net-
works. Some of the concepts related to the information theoretic treatment of
the MIMO cognitive interference channel are discussed in Section 12.4. To be
fair, maybe this chapter should have been entitled “Tools for cognitive radio.”

16.1 Cognitive radio channel

The basic cognitive problem is similar in topology to the 2 × 2 interference
channel that was discussed in Chapter 12. As displayed in Figure 16.1, the typical
cognitive radio channel is characterized by a legacy transmitter and receiver, and
by a secondary transmitter and receiver. The difference between the cognitive
radio channel and the classic 2 × 2 channel is that the optimization metric is
different. The secondary transmitter is obligated to minimize the adverse effect
on the legacy system [329].
Under the assumption of flat-fading channels, the simple channel can be
characterized by the relationships for received signals
\[
Z_1 = H_{1,1} S_1 + H_{2,1} S_2 + N_1 ,
\qquad
Z_2 = H_{2,2} S_2 + H_{1,2} S_1 + N_2 ,
\tag{16.1}
\]

Figure 16.1 Layout of the basic cognitive radio problem.

where the $n_s$ samples observed at the $n_{r1}$ antennas of the legacy and $n_{r2}$ antennas
of the secondary receivers are indicated by $Z_1 \in \mathbb{C}^{n_{r1}\times n_s}$ and $Z_2 \in \mathbb{C}^{n_{r2}\times n_s}$. The
$n_{t1}$ antennas of the legacy and $n_{t2}$ antennas of the secondary transmitters transmit
complex baseband sequences indicated by $S_1 \in \mathbb{C}^{n_{t1}\times n_s}$ and $S_2 \in \mathbb{C}^{n_{t2}\times n_s}$.
The channel matrices between the transmitters and receivers of the legacy and
secondary links are indicated by $H_{1,1} \in \mathbb{C}^{n_{r1}\times n_{t1}}$, $H_{1,2} \in \mathbb{C}^{n_{r2}\times n_{t1}}$, $H_{2,2} \in
\mathbb{C}^{n_{r2}\times n_{t2}}$, and $H_{2,1} \in \mathbb{C}^{n_{r1}\times n_{t2}}$. Finally, the additive complex circularly
symmetric Gaussian noise is indicated by $N_1 \in \mathbb{C}^{n_{r1}\times n_s}$ and $N_2 \in \mathbb{C}^{n_{r2}\times n_s}$.
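As an illustrative sketch, the flat-fading model of Equation (16.1) can be simulated directly with a few lines of linear algebra. The dimensions, QPSK constellation, and unit noise variance below are arbitrary assumptions for the example, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
nr1, nr2, nt1, nt2, ns = 2, 2, 2, 2, 100  # assumed example dimensions

def cgauss(shape, var=1.0):
    # circularly symmetric complex Gaussian samples of the given variance
    return np.sqrt(var / 2) * (rng.standard_normal(shape)
                               + 1j * rng.standard_normal(shape))

def qpsk(shape):
    # unit-power QPSK transmit sequence
    return (rng.choice([-1, 1], shape) + 1j * rng.choice([-1, 1], shape)) / np.sqrt(2)

# flat-fading channel matrices between each transmitter and receiver
H11, H21 = cgauss((nr1, nt1)), cgauss((nr1, nt2))
H22, H12 = cgauss((nr2, nt2)), cgauss((nr2, nt1))
S1, S2 = qpsk((nt1, ns)), qpsk((nt2, ns))

# Equation (16.1): each receiver observes its own link plus cross interference
Z1 = H11 @ S1 + H21 @ S2 + cgauss((nr1, ns))
Z2 = H22 @ S2 + H12 @ S1 + cgauss((nr2, ns))
```

The cross terms $H_{2,1} S_2$ and $H_{1,2} S_1$ are the interference contributions that the cognitive optimization metric penalizes.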

16.1.1 Cooperative cognitive links


It is sometimes assumed in the information-theoretic literature discussing the
cognitive radio channel that the users can cooperate at some level [80, 329].
Potentially, some portion of the message could be shared and some portion pro-
tected [187]. It is worth noting, however, that for the vast majority of radio
systems this assumption is unrealistic.
Under the assumption that the cognitive link operates in the presence of an-
other link in the environment rather than avoiding it, it is in the cognitive radio’s
interest to employ some type of interference mitigation. In Reference [80], bounds
are investigated on achievable radio link performance for the interference channel
in which one of the transmitters has full or partial knowledge of the transmission
of the other transmitter. By employing variants of dirty-paper coding (discussed
in Section 5.3.4), performance limits are constructed.

16.2 Cognitive spectral scavenging

A common use of the term cognitive radio is to denote a radio that finds an
unused portion of spectrum and operates there. The radio determines if a legacy
signal is operating in a given band by exploiting techniques like those discussed
in Section 16.3.

While finding and using underemployed spectrum seems like a simple enough
prospect, there are a number of practical issues. First, for spectral-licensing
reasons, not all empty bands are open for scavenging. A cognitive radio of
this type would have to know the radio's location and the locally applicable
regulations. Second, a radio is only useful if at least two radios decide to use the same
band to communicate. In order to achieve this consensus, there are a few possible
approaches. One potential solution is to expect all cognitive radios in a
region to have the same spectral-selection algorithm and consequently come to
the same spectral-selection conclusion, given similar environmental observations.
However, because the observations are not identical, consensus is not guaranteed.
Another potential solution is to employ a preagreed-upon control channel. This
could be in a licensed band. In the control channel, the radios could agree upon
a common spectrally scavenged channel for data communications.

16.2.1 Orthogonal-frequency-division multiple access


By assuming that the spectrum of interest is reasonably constrained to a region
that can be covered by an orthogonal frequency-division multiplexing (OFDM)
symbol, orthogonal frequency-division multiple access (OFDMA) can be em-
ployed for frequency scavenging on a finer spectral scale. OFDMA is a multiple-
access extension to OFDM that is discussed in Section 10.5.3. In its typical usage,
OFDMA allocates subsets of subcarriers to various users. In a cognitive radio,
the concept can be extended by allowing users to avoid subcarriers occupied by
legacy users. It should be noted that if the legacy users are not synchronized
in time and frequency with a compatible OFDM modulation, the guarantee
of orthogonality is violated, as discussed in Section 10.5.3. Consequently, this
approach is not always applicable.

16.2.2 Game-theoretical analysis


It may be of use to consider the strategies of cognitive radios in terms of game
theory [161]. The users may be cooperative or non-cooperative. In the case of
non-cooperative users, it is useful to establish rules that are self-enforcing such
that the non-cooperative users will find the Nash equilibrium [227]. Under this
equilibrium, a set of user strategies is found such that no single user can do better
by unilaterally changing its own strategy while all other users hold theirs fixed.
This equilibrium point is sometimes denoted a Pareto optimal solution. As an
extension, economic considerations can be included. If the legacy or primary
users of a set of frequency allocations are aware of the option of leasing their
allocations to secondary users, then they may decide that it is of greater value to
lease these allocations. The parameters of the lease may include power, frequency,
and duration. Multiple secondary users may vie for allocations, and multiple
primary users may be interested in leasing. Consequently, a market is established
[231].

16.3 Legacy signal detection

In the previous section, it was assumed that the legacy signal could be de-
tected and avoided. To be a good neighbor, a practical cognitive radio attempts
to avoid interfering with the legacy link by detecting and avoiding used spec-
trum. Depending upon what is known about the signal, there are many detection
approaches (for examples, see Reference [358] and references therein).

16.3.1 Known training sequence


The simplest problem is given by a situation in which the details of the legacy
waveform are known completely. In particular, if there is a known pilot or training
sequence, then specific test statistics can be constructed to identify the presence
of the training sequence. Techniques such as those described in Chapter 17 can
be employed. If, however, less is known about the system, then more general
techniques must be used.

16.3.2 Single-antenna signal energy detection


There are a variety of approaches for a single-antenna system to detect the
presence of a legacy signal. These vary in sophistication from simple energy
detectors, to chip-rate detectors, to detectors that exploit detailed information
of the waveform.
The most basic signal detector is the energy detector. If the noise level is
known, the basic question is, does the received signal energy integrated over
some period of time and at some frequency exceed that expected by the noise?
Furthermore, signals can be “channelized” or equivalently filtered to sets of fre-
quency bins if the energy is expected to have a temporal-spectral structure.
Here three types of energy detectors are considered. Detection is identified
when the observed energy exceeds some threshold under some model for the
signal and the noise background:
• Gaussian signal in Gaussian noise of known variance,
• unknown deterministic signal in Gaussian noise of known variance,
• change in estimated signal variance.
The first two detectors are useful for reasonably static environments for which
the noise floor can be estimated well enough that the knowledge of its variance
can be approximated as exact. For a dynamic environment, the third test statistic
attempts to capture increases in the observed signal variance to identify a new
signal in the environment.

Estimated Gaussian signal in Gaussian noise


If the complex Gaussian noise is known to have some variance, $\sigma^2 = 1$, then
the probability of detection $P_d$ versus the probability of false alarm $P_{fa}$
of a complex Gaussian signal can be calculated as a function of some power
threshold $\eta$.
The probability density of the measurement of signal energy for $n_s$ sampled
observations of a complex Gaussian distribution is given by the complex $\chi^2$
density with $n_s$ complex degrees of freedom that was discussed in Section 3.1.11.
The $n_s$ samples of the observed signal are given by the row vector $z \in \mathbb{C}^{1\times n_s}$. It
is assumed here that the signal power is given by $\sigma_s^2$ with noise power
$\sigma_n^2 = 1$. When the legacy signal is present, under the assumption that the signal
is incoherent with respect to the noise, the total variance is given by $\sigma_s^2 + 1$. The
observed integrated energy $q$ of the sampled signal is given by
\[
q = \|z\|^2 = \sum_m |\{z\}_m|^2 .
\tag{16.2}
\]
The probability density for the integrated energy $q$ of the observed Gaussian
signal $z$ of some variance $\sigma^2$ is given by the complex central $\chi^2$ distribution with
$n_s$ complex degrees of freedom as defined in Section 3.1.11:
\[
p^{\mathbb{C}}_{\chi^2}(q;\, n_s, \sigma^2)\, dq
= \frac{q^{n_s-1}}{(\sigma^2)^{n_s}\, \Gamma(n_s)}\, e^{-q/\sigma^2}\, dq .
\tag{16.3}
\]

A detection is declared if some threshold $\eta$ is exceeded. The probability of
detection $P_d$ for some threshold is then given by
\[
P_d = \int_\eta^\infty dq\; p^{\mathbb{C}}_{\chi^2}(q;\, n_s, \sigma_s^2+1)
= 1 - \frac{1}{\Gamma(n_s)}\, \gamma\!\left(n_s, \frac{\eta}{\sigma_s^2+1}\right) ,
\tag{16.4}
\]

where the result in Equation (3.44) is employed. The probability of a false alarm
$P_{fa}$ under the assumption of unity noise variance is given by
\[
P_{fa} = \int_\eta^\infty dq\; p^{\mathbb{C}}_{\chi^2}(q;\, n_s, 1)
= 1 - \frac{1}{\Gamma(n_s)}\, \gamma(n_s, \eta) .
\tag{16.5}
\]
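As a numerical sketch (not from the text), Equations (16.4) and (16.5) can be evaluated with the regularized lower incomplete gamma function; `scipy.special.gammainc` computes $\gamma(a, x)/\Gamma(a)$ directly. The helper names and parameter values below are illustrative choices.

```python
import numpy as np
from scipy.special import gammainc  # regularized lower incomplete gamma: γ(a, x)/Γ(a)

def pfa_energy(eta, ns):
    # Eq. (16.5): false-alarm probability with unit noise variance
    return 1.0 - gammainc(ns, eta)

def pd_energy(eta, ns, sigma_s2):
    # Eq. (16.4): detection probability for a Gaussian signal of power sigma_s2
    return 1.0 - gammainc(ns, eta / (sigma_s2 + 1.0))

# sweeping the threshold traces a ROC curve, as in Figure 16.3
eta = np.linspace(0.0, 40.0, 400)
roc = list(zip(pfa_energy(eta, 10), pd_energy(eta, 10, 1.0)))
```

At a threshold of zero both probabilities are one; raising the threshold trades detections for fewer false alarms.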

In Figure 16.2, the probabilities of detection and false alarm are presented
under the assumption of a signal and noise variance of 0 dB with 10 observa-
tions. By evaluating the probability of detection and probability of false alarm
for a range of thresholds, a receiver operating characteristic (ROC) curve can
be generated as discussed in Section 3.7.3. In Figure 16.3, the probability of
detection as a function of probability of false alarm is presented under the as-
sumption of a signal and noise variance of 0 dB with either 1 or 10 observations.


Figure 16.2 Single-antenna energy detection probability of false alarm (black) and
probability of detection (gray) for a Gaussian signal in the presence of Gaussian noise,
assuming an SNR of 0 dB for 10 observations.


Figure 16.3 Single-antenna energy detection probability of detection as a function of
probability of false alarm for a Gaussian signal in the presence of Gaussian noise,
assuming an integrated SNR of 0 dB and 10 dB for 1 (black) and 10 observations
(gray).

Under the assumption of 10 samples, the receiver operating characteristic curve
is noticeably improved. One can interpret this improvement in performance by
observing that large fluctuations in noise are relatively uncommon, so by observ-
ing a large number of samples, the contributions of possibly confusing fluctua-
tions are reduced.

Unknown deterministic signal in Gaussian noise of known variance


Under the model of an unknown but deterministic signal $a\,s$ plus complex
Gaussian noise of known variance $\sigma_n^2 = 1$, the norm squared of the received row
vector of $n_s$ complex samples is given by $q$,
\[
q = \| a\,s + n \|^2 .
\tag{16.6}
\]
Given this model, the distribution for the norm squared of the received signal
vector is the complex noncentral $\chi^2$ distribution.
The probability density for the complex noncentral $\chi^2$ distribution as defined
in Section 3.1.12 is
\[
p^{\mathbb{C}}_{\chi^2}(q;\, n_s, \sigma_n^2, \nu^{\mathbb{C}})\, dq
= \frac{1}{\sigma_n^2}
\left( \frac{q}{\sigma_n^2\, \nu^{\mathbb{C}}} \right)^{(n_s-1)/2}
e^{-\left( q/\sigma_n^2 + \nu^{\mathbb{C}} \right)}\,
I_{n_s-1}\!\left( 2 \sqrt{ \nu^{\mathbb{C}} q / \sigma_n^2 } \right) dq ,
\tag{16.7}
\]
where for unit-variance complex Gaussian noise, the complex noncentrality
parameter $\nu^{\mathbb{C}}$ is given by
\[
\nu^{\mathbb{C}} = \frac{1}{\sigma_n^2} \sum_m |\{a\,s\}_m|^2
= |a|^2\, \|s\|^2 .
\tag{16.8}
\]

The cumulative distribution function $P^{\mathbb{C}}_{\chi^2}$ for the complex noncentral $\chi^2$ random variable is
given by
\[
P^{\mathbb{C}}_{\chi^2}(\eta;\, n_s, \sigma_n^2, \nu^{\mathbb{C}})
= \int_0^\eta dq\; p^{\mathbb{C}}_{\chi^2}(q;\, n_s, \sigma_n^2, \nu^{\mathbb{C}})
= e^{-\nu^{\mathbb{C}}} \sum_{m=0}^{\infty}
\frac{(\nu^{\mathbb{C}})^m}{m!}\,
\frac{\gamma(m+n_s, \eta)}{\Gamma(m+n_s)} ,
\tag{16.9}
\]

where γ(·, ·) indicates the lower incomplete gamma function that is defined in
Section 2.14.1.
The probability of detection is given by the integral from the threshold $\eta$ up to
infinity. Consequently, the probability $P_d$ is given by
\[
P_d = 1 - P^{\mathbb{C}}_{\chi^2}\!\left(\eta;\, n_s, 1, |a|^2 \|s\|^2\right)
= 1 - e^{-\nu^{\mathbb{C}}} \sum_{m=0}^{\infty}
\frac{\left(|a|^2 \|s\|^2\right)^m}{m!}\,
\frac{\gamma(m+n_s, \eta)}{\Gamma(m+n_s)} .
\tag{16.10}
\]

The evaluation of the probability of false alarm $P_{fa}$ is the same as that for the
known Gaussian assumption found in Equation (16.5):
\[
P_{fa} = 1 - \frac{1}{\Gamma(n_s)}\, \gamma(n_s, \eta) .
\tag{16.11}
\]
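The series in Equation (16.10) truncates rapidly because its terms carry a Poisson weighting in $\nu^{\mathbb{C}}$, so it is easy to evaluate numerically. The sketch below, with arbitrary example parameters and illustrative helper names, also checks the truncated series against a Monte Carlo draw of $q = \|a\,s + n\|^2$.

```python
import numpy as np
from math import exp, factorial
from scipy.special import gammainc  # regularized lower incomplete gamma

def pd_deterministic(eta, ns, nu, terms=100):
    # Truncated series of Eq. (16.10); nu = |a|^2 ||s||^2 is the noncentrality
    acc = sum(nu ** m / factorial(m) * gammainc(m + ns, eta) for m in range(terms))
    return 1.0 - exp(-nu) * acc

# Monte Carlo check of q = ||a s + n||^2 with unit-variance complex noise
rng = np.random.default_rng(1)
ns, nu, eta, trials = 10, 10.0, 15.0, 200_000
s = np.sqrt(nu / ns) * np.ones(ns)              # deterministic signal, ||s||^2 = nu
n = (rng.standard_normal((trials, ns))
     + 1j * rng.standard_normal((trials, ns))) / np.sqrt(2)
q = np.sum(np.abs(s + n) ** 2, axis=1)
pd_mc = np.mean(q > eta)
```

Under these assumptions the empirical detection rate and the truncated series agree to within Monte Carlo error.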
Similar to the process in the previous section, by evaluating the probability
of detection and probability of false alarm for a range of thresholds, a receiver
operating characteristic curve can be generated. In Figure 16.4, the probability


Figure 16.4 Single-antenna energy detection probability of detection as a function of
probability of false alarm for an unknown deterministic signal (dashed) and a
Gaussian signal (solid) in the presence of Gaussian noise, assuming an integrated
SNR of 0 dB and 10 dB for 1 (black) and 10 observations (gray).

of detection as a function of probability of false alarm is presented as dashed
lines under the assumption of a signal with noise variance of 0 dB with either
1 or 10 observations and with a noncentrality parameter of either 1 or 10. For
comparison, the receiver operating characteristic curves, copied from Figure 16.3,
are also displayed. In general, the development using the unknown deterministic
approach provides a slightly improved probability of detection compared to the
Gaussian model. This is not surprising in that the model is more constrained
than the Gaussian model. However, the improvement is slight.

New energy legacy signal detection


A direct extension to the energy detector is the new-energy detector. In this
version, a change in the energy level is detected. This is similar to the problem
of considering the above Gaussian energy detection approach in which the noise
variance must be estimated. It may be worth noting that the same new-energy
test statistic can be used to detect a departing signal by swapping the data
used for “new” and “old” references. Under the assumption that the variance
of the noise level can be estimated over a long period of time, the uncertainty
in this variance goes to zero, and the result is identical to the energy detection
performance in the previous sections. However, if the power levels must be esti-
mated over a short period of time because the environment is changing relatively
quickly, then estimates of the variance will have some uncertainty that adversely
affects the detection performance.
The performance of the test statistic is determined by the uncertainty of the
power or variance estimation. Specifically, the signal variance changes from $\sigma_{\mathrm{old}}^2$,
which is identified as the background noise variance, to $\sigma_{\mathrm{new}}^2$, which is the
variance of the new signal plus the old background noise. Under the assumptions of
Gaussian signals, $n_s$ samples from two distributions (old and new) for the signals
$z_{\mathrm{old}} \in \mathbb{C}^{1\times n_s}$ and $z_{\mathrm{new}} \in \mathbb{C}^{1\times n_s}$ are given by
\[
p(z_{\mathrm{old}}) = \frac{1}{\pi^{n_s}\, \sigma_{\mathrm{old}}^{2 n_s}}\,
e^{-\|z_{\mathrm{old}}\|^2 / \sigma_{\mathrm{old}}^2} ,
\qquad
p(z_{\mathrm{new}}) = \frac{1}{\pi^{n_s}\, \sigma_{\mathrm{new}}^{2 n_s}}\,
e^{-\|z_{\mathrm{new}}\|^2 / \sigma_{\mathrm{new}}^2} .
\tag{16.12}
\]
It is certainly not required for $n_s$ to be the same for estimating $z_{\mathrm{old}}$ and $z_{\mathrm{new}}$,
but it will be assumed here for convenience.
Detection is declared if the difference between the power estimates $\hat\sigma_{\mathrm{new}}^2$ and
$\hat\sigma_{\mathrm{old}}^2$ exceeds some threshold $\eta/n_s$,
\[
\hat\sigma_{\mathrm{new}}^2 - \hat\sigma_{\mathrm{old}}^2 > \frac{\eta}{n_s} .
\tag{16.13}
\]
The probability densities for $\hat\sigma_{\mathrm{new}}^2$ and $\hat\sigma_{\mathrm{old}}^2$ under the assumption of Gaussian
signals and noise are given by the complex $\chi^2$ distribution,
\[
p^{\mathbb{C}}_{\chi^2}(q;\, n_s, \sigma^2)\, dq
= \frac{q^{n_s-1}}{(\sigma^2)^{n_s}\, \Gamma(n_s)}\, e^{-q/\sigma^2}\, dq ,
\tag{16.14}
\]
where the integrated energy $q$ is given by the sum of the squared magnitudes of
$n_s$ zero-mean complex Gaussians $z_m$ of variance $\sigma^2$,
\[
q = \sum_{m=1}^{n_s} |z_m|^2 .
\tag{16.15}
\]

The maximum-likelihood estimate for the variance $\sigma^2$ is found by maximizing
the probability density, given the observed signal. Setting the derivative of the
$\chi^2$ density to zero to maximize the likelihood gives
\[
\frac{\partial}{\partial \sigma^2}\, p^{\mathbb{C}}_{\chi^2}(q;\, n_s, \sigma^2)
= \frac{q^{n_s-1}\, e^{-q/\sigma^2}\, \left(q - n_s \sigma^2\right)}
{(\sigma^2)^{n_s+2}\, \Gamma(n_s)}
= 0 .
\tag{16.16}
\]
The estimator is unsurprisingly given by
\[
\hat\sigma^2 = \frac{q}{n_s} .
\tag{16.17}
\]
For the sake of this discussion, the old and new power levels will be estimated
with the same number of samples $n_s$. The probability of detection $P_d$ of new
energy, when $\hat\sigma_{\mathrm{new}}^2 - \hat\sigma_{\mathrm{old}}^2 > \eta/n_s$, occurs when
\[
q_{\mathrm{new}} - q_{\mathrm{old}} > \eta ,
\tag{16.18}
\]



where $q_{\mathrm{new}}$ and $q_{\mathrm{old}}$ correspond to the new and old energy estimates, respectively.
For some $\sigma_{\mathrm{new}}^2$ and $\sigma_{\mathrm{old}}^2$, the probability of detection is given in Reference [77].
The probability of detection $P_d$ is given by
\[
P_d = \Pr(q_{\mathrm{new}} > q_{\mathrm{old}} + \eta)
= \int_0^\infty dq_{\mathrm{old}} \int_{q_{\mathrm{old}}+\eta}^\infty dq_{\mathrm{new}}\;
p^{\mathbb{C}}_{\chi^2}(q_{\mathrm{new}};\, n_s, \sigma_{\mathrm{new}}^2)\,
p^{\mathbb{C}}_{\chi^2}(q_{\mathrm{old}};\, n_s, \sigma_{\mathrm{old}}^2) .
\tag{16.19}
\]

The inner integral is given by
\[
\int_{q_{\mathrm{old}}+\eta}^\infty dq_{\mathrm{new}}\;
p^{\mathbb{C}}_{\chi^2}(q_{\mathrm{new}};\, n_s, \sigma_{\mathrm{new}}^2)
= \frac{\Gamma\!\left(n_s,\, \frac{q_{\mathrm{old}}+\eta}{\sigma_{\mathrm{new}}^2}\right)}{\Gamma(n_s)} ,
\tag{16.20}
\]
where the function $\Gamma(n, a)$ is the upper incomplete gamma function discussed in
Section 2.14.1 that is defined by
\[
\Gamma(n, a) = \int_a^\infty dx\; x^{n-1} e^{-x} .
\tag{16.21}
\]

For integer values of $n > 0$, the upper incomplete gamma function is given by
\[
\Gamma(n, a) = \Gamma(n)\, e^{-a} \sum_{m=0}^{n-1} \frac{a^m}{m!} .
\tag{16.22}
\]

Consequently, the inner integral is given by
\[
\int_{q_{\mathrm{old}}+\eta}^\infty dq_{\mathrm{new}}\;
p^{\mathbb{C}}_{\chi^2}\!\left(q_{\mathrm{new}};\, n_s, \sigma_{\mathrm{new}}^2\right)
= e^{-\frac{q_{\mathrm{old}}+\eta}{\sigma_{\mathrm{new}}^2}}
\sum_{m=0}^{n_s-1} \frac{(q_{\mathrm{old}}+\eta)^m}{m!\, (\sigma_{\mathrm{new}}^2)^m} .
\tag{16.23}
\]

By exploiting the binomial theorem, the term $(q_{\mathrm{old}}+\eta)^m$ is given by
\[
(q_{\mathrm{old}}+\eta)^m
= \sum_{k=0}^m \binom{m}{k}\, q_{\mathrm{old}}^k\, \eta^{m-k} ,
\tag{16.24}
\]

where the binomial coefficient is indicated by
\[
\binom{m}{k} = \frac{m!}{k!\, (m-k)!} .
\tag{16.25}
\]

With this form, the inner integral in Equation (16.19) is given by
\[
e^{-\frac{q_{\mathrm{old}}+\eta}{\sigma_{\mathrm{new}}^2}}
\sum_{m=0}^{n_s-1} \frac{1}{m!\, (\sigma_{\mathrm{new}}^2)^m}
\sum_{k=0}^m \binom{m}{k}\, q_{\mathrm{old}}^k\, \eta^{m-k} .
\tag{16.26}
\]

The probability of detection $P_d$ is then given by
\[
\begin{aligned}
P_d &= \int_0^\infty dq_{\mathrm{old}}\;
p^{\mathbb{C}}_{\chi^2}(q_{\mathrm{old}};\, n_s, \sigma_{\mathrm{old}}^2)\;
e^{-\frac{q_{\mathrm{old}}+\eta}{\sigma_{\mathrm{new}}^2}}
\sum_{m=0}^{n_s-1} \frac{1}{m!\, (\sigma_{\mathrm{new}}^2)^m}
\sum_{k=0}^m \binom{m}{k}\, q_{\mathrm{old}}^k\, \eta^{m-k} \\
&= \int_0^\infty dq_{\mathrm{old}}\;
\frac{q_{\mathrm{old}}^{n_s-1}}{(\sigma_{\mathrm{old}}^2)^{n_s}\, \Gamma(n_s)}\,
e^{-\frac{q_{\mathrm{old}}}{\sigma_{\mathrm{old}}^2}}\;
e^{-\frac{q_{\mathrm{old}}+\eta}{\sigma_{\mathrm{new}}^2}}
\sum_{m=0}^{n_s-1} \frac{1}{m!\, (\sigma_{\mathrm{new}}^2)^m}
\sum_{k=0}^m \binom{m}{k}\, q_{\mathrm{old}}^k\, \eta^{m-k} \\
&= \int_0^\infty dq_{\mathrm{old}}\;
\frac{1}{(\sigma_{\mathrm{old}}^2)^{n_s}\, \Gamma(n_s)}\,
e^{-q_{\mathrm{old}} \frac{\sigma_{\mathrm{old}}^2 + \sigma_{\mathrm{new}}^2}{\sigma_{\mathrm{old}}^2 \sigma_{\mathrm{new}}^2}}\;
e^{-\frac{\eta}{\sigma_{\mathrm{new}}^2}}
\sum_{m=0}^{n_s-1} \frac{1}{m!\, (\sigma_{\mathrm{new}}^2)^m}
\sum_{k=0}^m \binom{m}{k}\, q_{\mathrm{old}}^{n_s+k-1}\, \eta^{m-k} .
\end{aligned}
\tag{16.27}
\]
By reordering the terms and moving the integral to the end, the probability of
detection $P_d$ becomes
\[
\begin{aligned}
P_d &= \frac{1}{(\sigma_{\mathrm{old}}^2)^{n_s}\, \Gamma(n_s)}\,
e^{-\frac{\eta}{\sigma_{\mathrm{new}}^2}}
\sum_{m=0}^{n_s-1} \frac{1}{m!\, (\sigma_{\mathrm{new}}^2)^m}
\sum_{k=0}^m \binom{m}{k}\, \eta^{m-k}
\int_0^\infty dq_{\mathrm{old}}\; q_{\mathrm{old}}^{n_s+k-1}\,
e^{-q_{\mathrm{old}} \frac{\sigma_{\mathrm{old}}^2 + \sigma_{\mathrm{new}}^2}{\sigma_{\mathrm{old}}^2 \sigma_{\mathrm{new}}^2}} \\
&= \frac{1}{(\sigma_{\mathrm{old}}^2)^{n_s}\, \Gamma(n_s)}\,
e^{-\frac{\eta}{\sigma_{\mathrm{new}}^2}}
\sum_{m=0}^{n_s-1} \frac{1}{m!\, (\sigma_{\mathrm{new}}^2)^m}
\sum_{k=0}^m \binom{m}{k}\, \eta^{m-k}\,
(n_s+k-1)! \left( \frac{\sigma_{\mathrm{old}}^2 \sigma_{\mathrm{new}}^2}{\sigma_{\mathrm{old}}^2 + \sigma_{\mathrm{new}}^2} \right)^{n_s+k} \\
&= e^{-\frac{\eta}{\sigma_{\mathrm{new}}^2}}
\sum_{m=0}^{n_s-1} \sum_{k=0}^m
\frac{\eta^{m-k}\, (n_s+k-1)!}
{(\sigma_{\mathrm{old}}^2)^{n_s}\, (\sigma_{\mathrm{new}}^2)^m\, (n_s-1)!\, k!\, (m-k)!}
\left( \frac{\sigma_{\mathrm{old}}^2 \sigma_{\mathrm{new}}^2}{\sigma_{\mathrm{old}}^2 + \sigma_{\mathrm{new}}^2} \right)^{n_s+k} .
\end{aligned}
\tag{16.28}
\]
The integral is found by using the parameter definition
\[
b = \frac{\sigma_{\mathrm{old}}^2 + \sigma_{\mathrm{new}}^2}{\sigma_{\mathrm{old}}^2 \sigma_{\mathrm{new}}^2}
\tag{16.29}
\]
and the substitution
\[
x = q_{\mathrm{old}}\, b .
\tag{16.30}
\]

Figure 16.5 Probability of detection (upper curve, gray) and probability of false alarm
(lower curve, black) of new energy as a function of threshold $\eta$ for the example of
$n_s = 10$ samples and with variances of $\sigma_{\mathrm{new}}^2 = 2$ and $\sigma_{\mathrm{old}}^2 = 1$.

The probability of false alarm $P_{fa}$ is given by the probability that the estimated
variance fluctuates above the threshold even though the actual variance remains
the same. This is given by setting $\sigma_{\mathrm{new}}^2 = \sigma_{\mathrm{old}}^2 = \sigma^2$
in Equation (16.28). By using this substitution, the probability of false alarm
$P_{fa}$ is given by
\[
P_{fa} = e^{-\frac{\eta}{\sigma^2}}
\sum_{m=0}^{n_s-1} \sum_{k=0}^m
\left( \frac{\eta}{\sigma^2} \right)^{m-k}
\frac{(n_s+k-1)!}{(n_s-1)!\, k!\, (m-k)!}
\left( \frac{1}{2} \right)^{n_s+k} .
\tag{16.31}
\]
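As a sketch with arbitrary example parameters, the closed form of Equation (16.28) can be checked against a Monte Carlo draw of the two block energies; setting the two variances equal recovers the false-alarm expression of Equation (16.31), including its limit of 1/2 at zero threshold. The helper names are illustrative.

```python
import numpy as np
from math import exp, factorial

def pd_new_energy(eta, ns, var_new, var_old):
    # Closed form of Eq. (16.28); var_new == var_old gives the Pfa of Eq. (16.31)
    ratio = var_old * var_new / (var_old + var_new)
    total = 0.0
    for m in range(ns):
        for k in range(m + 1):
            total += (eta ** (m - k) * factorial(ns + k - 1) * ratio ** (ns + k)
                      / (var_old ** ns * var_new ** m
                         * factorial(ns - 1) * factorial(k) * factorial(m - k)))
    return exp(-eta / var_new) * total

# Monte Carlo check of Pr(q_new - q_old > eta) for complex Gaussian blocks
rng = np.random.default_rng(2)
ns, eta, trials = 10, 5.0, 200_000

def block_energy(var):
    z = np.sqrt(var / 2) * (rng.standard_normal((trials, ns))
                            + 1j * rng.standard_normal((trials, ns)))
    return np.sum(np.abs(z) ** 2, axis=1)

pd_mc = np.mean(block_energy(2.0) - block_energy(1.0) > eta)
```

The empirical rate and the closed form agree to within Monte Carlo error under these assumptions.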

For the example of integrating over $n_s = 10$ samples and with new signal-plus-noise
and old noise variances of $\sigma_{\mathrm{new}}^2 = 2$ and $\sigma_{\mathrm{old}}^2 = 1$, the probabilities of
detection and false alarm as a function of the threshold $\eta$ are displayed in Figure
16.5. As the threshold value goes to 0, the probability of detection becomes large,
approaching 1, and the probability of false alarm approaches 1/2. The false-alarm
limit occurs because, at a threshold value of zero, either the new or old variance
estimate can fluctuate to be larger with equal probability. In Figure 16.6, the
receiver operating curve that gives the probability of detection as a function of
probability of false alarm is displayed, given the above parameters. For compar-
ison, the single-antenna energy detection for an unknown deterministic signal
and a Gaussian signal in the presence of Gaussian noise are presented. As one
would expect, the performance of the new-energy test statistic is worse because
less is known about the environment compared to the other test statistics. At
the probability of detection of 0.8, the false-alarm rate of the new-energy test


Figure 16.6 Single-antenna receiver operating characteristic curve for new-energy
detection (dashed black), giving the probability of detection as a function of the
probability of false alarm for the example of $n_s = 10$ samples and with variances of
$\sigma_{\mathrm{new}}^2 = 2$ and $\sigma_{\mathrm{old}}^2 = 1$. For comparison, the upper two curves are the single-antenna
energy detection for an unknown deterministic signal (gray) and a Gaussian signal
(black) in the presence of Gaussian noise.

statistic is about 0.18, compared with 0.08 for detecting an unknown deterministic
signal.

Single-antenna chip-rate legacy signal detection


The chip-rate (or equivalently symbol-rate) detector exploits the cyclostationary
characteristic that many communication signals exhibit [230, 73, 257, 106, 300].
This detector detects the regular changes that occur as the signal transitions
from one symbol to the next. The cyclostationary characteristic of some signals
is also employed to achieve synchronization [41]. The periodicity of these changes
is detectable and can be observed in the complex ambiguity function (CAF)
surface that depicts the cyclostationary characteristic of the waveform. For some
period of integration $T$, this surface is given by the autocorrelation $\phi(\tau, \omega)$ under
shifts in delay $\tau$ and angular frequency $\omega$,
\[
\phi(\tau, \omega)
= \frac{\int_0^T dt\; z(t+\tau/2)\, z^*(t-\tau/2)\, e^{-i\omega t}}
{\int_0^T dt\; z(t)\, z^*(t)} .
\tag{16.32}
\]
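As a numerical sketch, a discrete analogue of Equation (16.32) can expose the chip-rate line of a rectangular-pulse BPSK signal. For simplicity a one-sided lag replaces the symmetric $\pm\tau/2$ shifts, and the oversampling factor and sequence length are assumptions that mirror the description of the figure example.

```python
import numpy as np

rng = np.random.default_rng(3)
sps = 5                               # samples per chip (five-times oversampling)
chips = rng.choice([-1.0, 1.0], 200)  # 200-chip BPSK sequence
z = np.repeat(chips, sps)             # rectangular-pulse baseband waveform

def caf(z, tau, alpha):
    # Discrete analogue of Eq. (16.32): Fourier coefficient of the lag product,
    # normalized by the total signal energy (one-sided lag for simplicity)
    prod = z[tau:] * np.conj(z[:-tau])
    t = np.arange(prod.size)
    return np.sum(prod * np.exp(-2j * np.pi * alpha * t)) / np.sum(np.abs(z) ** 2)

peak = abs(caf(z, 2, 1 / sps))   # cycle frequency at the chip rate: strong line
off = abs(caf(z, 2, 0.13))       # incommensurate cycle frequency: noise level
```

With these parameters the chip-rate cycle frequency produces a clear peak while an arbitrary offset frequency does not, which corresponds to the artifact at 1/5th of the measurement rate in Figure 16.7.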

In Figure 16.7, an example is displayed of the cyclostationary structure of a
binary phase-shift-keying (BPSK) sequence with 200 chips sampled at five
times the chip rate. The artifact from the chip rate can be observed in the CAF
surface at 1/5th of the measurement rate. While


Figure 16.7 Complex ambiguity function surface, φ(τ, ω), for a 200-chip binary
phase-shift-keying signal.

filtering or pulse shaping employed by many communication systems to suppress


spectral sidelobes can reduce the strength of the cyclostationary peaks, most
single-carrier communication systems have sufficiently strong cyclostationary peaks
to be detected.

16.3.3 Multiple-antenna legacy signal detection


A receiver with multiple antennas enables an interesting set of extensions to
the single-antenna signal detectors. In general, the ability to detect signals is
improved, potentially reducing the probability of interfering with the legacy sys-
tem.
By using the MIMO channel defined in Equation (8.3) with $n_s$ samples of
received signals on $n_r$ antennas, the received signal is given by the received
signal matrix $Z \in \mathbb{C}^{n_r \times n_s}$,
\[
Z = H\,S + N ,
\tag{16.33}
\]
where the channel matrix defining the complex attenuation between transmit
and receive antennas is indicated by $H \in \mathbb{C}^{n_r \times n_t}$, the transmitted signal is given
by $S \in \mathbb{C}^{n_t \times n_s}$, and the complex additive interference plus noise is denoted
$N \in \mathbb{C}^{n_r \times n_s}$.

Multiple-antenna energy detection


If the interference-plus-noise covariance matrix $R \in \mathbb{C}^{n_r \times n_r}$ is known or
estimated well, then it is useful to consider the whitened data matrix that is defined
here as $\tilde{Z}$,
\[
\tilde{Z} = R^{-1/2}\, Z .
\tag{16.34}
\]
The total energy $\rho$ of the whitened received signal plus noise received by an
array of antennas is given by the Frobenius norm squared of the received signal
matrix,
\[
\rho = \|\tilde{Z}\|_F^2 = \mathrm{tr}\{\tilde{Z}\tilde{Z}^\dagger\}
= \sum_{m,n} \left| \{R^{-1/2} H S\}_{m,n} + \{R^{-1/2} N\}_{m,n} \right|^2 .
\tag{16.35}
\]
m ,n

By construction, the whitened interference plus noise is sampled from a circular
complex Gaussian distribution with unit variance, such that
\[
\left\langle \left| \{R^{-1/2} N\}_{m,n} \right|^2 \right\rangle = 1 ,
\tag{16.36}
\]
and the distribution of total received energy $\rho$ is given by the complex noncentral
$\chi^2$ distribution of complex degree $n_r \cdot n_s$, as described in Section 3.1.12. The
noncentrality parameter $\nu^{\mathbb{C}}$ is given by the sum of the standard-deviation-normalized
Gaussian means,
\[
\nu^{\mathbb{C}} = \frac{1}{\sigma^2} \sum_k |\mu_k|^2 ,
\tag{16.37}
\]

where $\mu_k$ and $\sigma^2$ are the mean and variance for each Gaussian. From Equation
(16.36), the standard deviation is one, $\sigma = 1$, and the noncentrality parameter
$\nu^{\mathbb{C}}$ is given by
\[
\nu^{\mathbb{C}}
= \sum_{m,n} \left| \{R^{-1/2} H S\}_{m,n} \right|^2
= \mathrm{tr}\left\{ R^{-1/2} H S S^\dagger H^\dagger R^{-1/2} \right\}
= \mathrm{tr}\left\{ R^{-1} H S S^\dagger H^\dagger \right\} .
\tag{16.38}
\]

The complex noncentrality parameter $\nu^{\mathbb{C}}$ can be interpreted as the total
integrated spatially whitened energy received across time and antennas. The value
of total received energy $\rho$ is drawn from the noncentral complex $\chi^2$ distribution
given by
\[
\rho \sim p^{\mathbb{C}}_{\chi^2}(\rho;\, n_r n_s, \nu^{\mathbb{C}}) .
\tag{16.39}
\]

The probability of detection $P_d$ is given by the probability that the value of total
received energy $\rho$ exceeds some threshold $\eta$,
\[
\begin{aligned}
P_d &= \Pr\{\rho > \eta\}
= \int_\eta^\infty d\rho\; p^{\mathbb{C}}_{\chi^2}(\rho;\, n_r n_s, \nu^{\mathbb{C}})
= 1 - \int_0^\eta d\rho\; p^{\mathbb{C}}_{\chi^2}(\rho;\, n_r n_s, \nu^{\mathbb{C}}) \\
&= 1 - P^{\mathbb{C}}_{\chi^2}\!\left(\eta;\, n_r n_s,
\mathrm{tr}\{R^{-1} H S S^\dagger H^\dagger\}\right) \\
&= 1 - e^{-\mathrm{tr}\{R^{-1} H S S^\dagger H^\dagger\}}
\sum_{m=0}^{\infty}
\frac{\left(\mathrm{tr}\{R^{-1} H S S^\dagger H^\dagger\}\right)^m}{m!}\,
\frac{\gamma(m + n_r n_s, \eta)}{\Gamma(m + n_r n_s)} .
\end{aligned}
\tag{16.40}
\]

The probability of false alarm $P_{fa}$ is given by the probability that the received
energy $\rho$, in the absence of a signal, fluctuates above the threshold $\eta$. The
probability density for $\rho$ in this environment is given by
\[
\rho \sim p^{\mathbb{C}}_{\chi^2}(\rho;\, n_r n_s, \sigma^2 = 1) .
\tag{16.41}
\]
This false-alarm probability is given by
\[
P_{fa} = \int_\eta^\infty d\rho\; p^{\mathbb{C}}_{\chi^2}(\rho;\, n_r n_s, \sigma^2 = 1)
= 1 - P^{\mathbb{C}}_{\chi^2}(\eta;\, n_r n_s, \sigma^2 = 1)
= 1 - \frac{1}{\Gamma(n_r n_s)}\, \gamma(n_r n_s, \eta) .
\tag{16.42}
\]
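A numerical sketch of Equations (16.40) and (16.42), using the same truncated-series approach as in the single-antenna case; the parameters mirror the Figure 16.8 example ($n_r = 4$, $n_s = 10$, $\nu^{\mathbb{C}} = n_r n_s$) but are otherwise arbitrary, and the helper names are illustrative.

```python
import numpy as np
from math import exp, factorial
from scipy.special import gammainc  # regularized lower incomplete gamma

def pfa_array(eta, nr, ns):
    # Eq. (16.42): false-alarm probability for whitened noise-only data
    return 1.0 - gammainc(nr * ns, eta)

def pd_array(eta, nr, ns, nu, terms=160):
    # Truncated series of Eq. (16.40); nu = tr{R^-1 H S S' H'}
    acc = sum(nu ** m / factorial(m) * gammainc(m + nr * ns, eta)
              for m in range(terms))
    return 1.0 - exp(-nu) * acc

# ROC sweep for the Figure 16.8 example: nr = 4, ns = 10, nu = nr * ns
nr, ns = 4, 10
roc = [(pfa_array(eta, nr, ns), pd_array(eta, nr, ns, float(nr * ns)))
       for eta in np.linspace(1.0, 120.0, 60)]
```

Because the whitened energy integrates over $n_r n_s$ degrees of freedom instead of $n_s$, the array detector separates the detection and false-alarm curves much more sharply than the single-antenna detector.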

As an example, the receiver operating characteristic curve, giving the probability
of detection as a function of the probability of false alarm for a multiple-antenna
receiver with $n_r = 4$ receive antennas and $n_s = 10$ samples, is depicted in
Figure 16.8. In the example, the noncentrality parameter is assumed to be
$\nu^{\mathbb{C}} = n_r n_s$. For comparison, the single-antenna deterministic-signal test
statistic in Gaussian noise is provided. The multiple-antenna version provides
significantly improved performance.

Multiple-antenna new-energy detection


For the single-antenna receiver, the new energy was characterized by a change
in the estimated power observed at the receiver. Similarly, the detection of new
energy by a multiple-antenna system is achieved by considering the change in the
multiple-antenna extension of the power estimate, the spatial covariance estimate.
There are a very large number of potential methods for measuring a change in a
multiple-antenna signal. The difference between two matrices can be
measured in a variety of ways. We can search for changes in total observed


Figure 16.8 Receiver operating characteristic curve for an unknown deterministic signal:
the probability of detection as a function of the probability of false alarm for a
multiple-antenna receiver with $n_r = 4$ receive antennas and $n_s = 10$
samples (black). The noncentrality parameter $\nu^{\mathbb{C}}$ is $n_r n_s$. As reference, the
single-antenna performance (gray) is also displayed.

energy. Also, we can look for changes in the spatial structure. Additionally, we
can attempt to find changes in specific waveform characteristics. Of these few
examples, we consider the first two in the following discussion.
The change in the total receive energy for some number of samples between
the old $Z_{\mathrm{old}}$ and new $Z_{\mathrm{new}}$ observations is given by
\[
\|Z_{\mathrm{new}}\|_F^2 - \|Z_{\mathrm{old}}\|_F^2 > \eta ,
\tag{16.43}
\]
where $\eta$ is the test-statistic threshold. However, this approach has the
disadvantage of ignoring the spatial structure of the received signal.
Another approach is to consider using the old data to estimate the
interference-plus-noise spatial covariance matrix $\hat{R}_{\mathrm{old}}$,
\[
\hat{R}_{\mathrm{old}} = \frac{1}{n_s}\, Z_{\mathrm{old}}\, Z_{\mathrm{old}}^\dagger .
\tag{16.44}
\]
Similar to the method used in the previous section, the data are whitened by
using this covariance matrix estimate,
\[
\tilde{Z} = \hat{R}_{\mathrm{old}}^{-1/2}\, Z_{\mathrm{new}} .
\tag{16.45}
\]
The total receive energy of the estimated whitened data is then given by
\[
\rho = \|\tilde{Z}\|_F^2
= \mathrm{tr}\{\hat{R}_{\mathrm{old}}^{-1}\, Z_{\mathrm{new}}\, Z_{\mathrm{new}}^\dagger\} .
\tag{16.46}
\]

If the old and new signals are drawn from the same distribution, then one would
expect that the total receive energy of the estimated whitened data would have
a value near the product of the number of receive antennas and the number of
samples nr ns . If it is significantly greater than this value, then it is an indica-
tion of new energy. While techniques with this or similar forms can be useful,
explicit evaluation of the probability of detection versus the probability of false
alarm is challenging.
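A small Monte Carlo sketch, with arbitrary dimensions and a rank-one emitter as the assumed new signal, illustrates the behavior of the statistic in Equation (16.46): with no new signal the statistic concentrates near $n_r n_s$ (slightly above it, because the covariance estimate is itself noisy), and a new emitter pushes it well above that level.

```python
import numpy as np

rng = np.random.default_rng(5)
nr, ns, trials = 4, 40, 2000   # ns > nr so that the covariance estimate inverts

def cgauss(shape):
    # unit-variance circularly symmetric complex Gaussian samples
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def whitened_energy(new_signal):
    Z_old = cgauss((nr, ns))                      # noise-only reference block
    Z_new = cgauss((nr, ns))
    if new_signal:                                # assumed rank-one new emitter
        Z_new = Z_new + cgauss((nr, 1)) @ cgauss((1, ns))
    R_old = Z_old @ Z_old.conj().T / ns           # Eq. (16.44)
    # Eq. (16.46): whitened energy of the new block
    return np.trace(np.linalg.inv(R_old) @ (Z_new @ Z_new.conj().T)).real

rho_h0 = np.array([whitened_energy(False) for _ in range(trials)])
rho_h1 = np.array([whitened_energy(True) for _ in range(trials)])
```

Comparing the two empirical means shows the separation that a threshold test on $\rho$ would exploit, even without a closed-form receiver operating characteristic.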

16.4 Optimizing spectral efficiency to minimize network interference

In previous sections, an implicit assumption has been employed. It has been
assumed that the cognitive radio could determine that it is operating in a white
space or open space. This is a spectral region in which the cognitive radio could
operate without concern of interfering. Here it is assumed that the radio is going
to operate in a regime in which a legacy radio network is operating. The legacy
radio network is assumed to be underutilizing its resources such that the cognitive
radio can also operate in the band. The legacy radio is transmitting packets
pseudorandomly over frequency and time, and transmitters are located randomly
with an uniform distribution over a two-dimensional space.
If the cognitive radio has a packet of some number of information bits to
deliver to its receiver, then the goal is to optimize the waveform to minimize
the adverse effect on the legacy system; that is, the cognitive radio would like to
optimize its waveform to be a good neighbor. This is an example of the hidden-node
or hidden-terminal problem. The geometry, which is depicted in Figure 16.1, is
characterized by four nodes: a transmitter of interest, a receiver of interest, a
legacy transmitter, and legacy (hidden-node) receiver [94]. Given a reasonable set
of assumptions, the optimal radio waveform is dependent upon the environmental
characteristics exclusively. This optimization can also be applied to waveform
optimization for symmetric ad hoc wireless networks. Both SISO and uninformed
transmitter MIMO wireless communication links are considered. This analysis
follows closely that presented in [94, 29, 35].1
It is assumed that links in a legacy ad hoc wireless network are communicating
with random occupancy in frequency and time, as seen in Figure 16.9(a). The
geometry of a particular transmitter, receiver, and hidden node is depicted in
Figure 16.9(b).
There are several assumptions used in this analysis.

• The probability of interfering is relatively small, which is equivalent to saying
that the spatial, temporal, and spectral occupancy of the network is not particu-
larly high, so that the probability of multiple collisions can be ignored.
1 Portions of this section are © 2010 IEEE. Reprinted, with permission, from Reference [29].

Figure 16.9 (a) Displays a notional link of interest and distribution of hidden links in
time (t) and frequency (f ) in the presence of a waveform. (b) Depicts a notional
geometry of transmitter, receiver, and hidden node. The region of disruptive
interference is contained within radius rj. © 2010 IEEE. Reprinted, with permission,
from Reference [29].

• The effects of interference on the hidden node can be factored into the proba-
bility of collision and the probability that the interference-to-noise ratio (INR)
at the hidden node exceeds some critical threshold.
• The hidden node location is sampled uniformly over some large area, so we
do not have prior knowledge of what absolute level of power will cause inter-
ference.
• The average channel attenuation from the transmitter to the hidden node can
be accurately modeled by using a power-law attenuation model.
• The link performance can be characterized with reasonable accuracy by the
channel capacity.
• The hidden node does not have some interference mitigation capability that
prefers a particular waveform structure.
• Finally, the desired data rate of the link of interest is sufficiently low such that
the link has the freedom to transmit in packets with relatively low spectral-
temporal occupancy. Consequently, the optimization is developed for a single
packet of a given number of information bits.

For the transmitter of interest to cause disruptive interference at the hidden
node, the interfering signal must satisfy two requirements. First, it must overlap
with a hidden link spectrally and temporally such that transmissions collide. The
probability of collision (sufficient overlap) is denoted pc . Second, the interfering
signal must be of sufficient strength at the hidden node to cause disruptive
interference, assuming sufficient overlap in time and frequency. The probability
that the distance from the transmitter to hidden node is within sufficient range to
cause disruptive interference is denoted pr . Because we expect the probabilities of
collision and being within range to be independent, the probability of disruptive
interference is denoted pi and is given by the product of pc and pr .

16.4.1 Optimal SISO spectral efficiency


For a fixed message length, given the assumptions stated in the introduction, this
functional dependence is developed by noting that the probability of collision
pc is linearly related to the transmitted duration and bandwidth

pc ∝ (T + TH) (B + BH) , (16.47)

where TH and BH are the temporal and spectral extents of the hidden-node link;
the collision probability follows from the fraction of the temporal and spectral
space occupied by the links [seen in Figure 16.9(a)]. In the case of a cognitive
radio that is attempting to transmit a message that is large compared with the
hidden-node packets, the probability of collision pc is approximated well by

pc ∝ T B . (16.48)

Thus, the probability of collision is proportional to the area subtended by the
link in the temporal-spectral space, under the assumption that the distribution
of packet occupancy over frequency and time is uniform. Similarly, in the case
of an ad hoc network for which all waveforms are being optimized jointly, the
durations and bandwidths of the optimized waveform of interest and the hidden
waveforms are the same, so that the probability of collision pc is given by

pc ∝ 4 T B ∝ T B . (16.49)
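The proportionality in Equation (16.47) is easy to check by simulation. The sketch below (assuming NumPy; the packet extents and region sizes are arbitrary illustrative values) draws packet positions uniformly over a time-frequency region and estimates the collision probability:

```python
import numpy as np

rng = np.random.default_rng(1)
T, T_H = 1.0, 2.0        # packet durations: link of interest and hidden link
B, B_H = 1.0, 1.0        # packet bandwidths
L_t, L_f = 100.0, 50.0   # extent of the shared time-frequency region
N = 200_000              # Monte Carlo trials

t1, t2 = rng.uniform(0, L_t, N), rng.uniform(0, L_t, N)
f1, f2 = rng.uniform(0, L_f, N), rng.uniform(0, L_f, N)
# Two packets collide when their time intervals AND frequency intervals overlap.
overlap_t = (t1 < t2 + T_H) & (t2 < t1 + T)
overlap_f = (f1 < f2 + B_H) & (f2 < f1 + B)
p_hat = np.mean(overlap_t & overlap_f)
p_pred = (T + T_H) * (B + B_H) / (L_t * L_f)   # Eq. (16.47), up to edge effects
print(p_hat, p_pred)
```

The estimate agrees with the (T + TH)(B + BH) scaling up to small boundary corrections of order (T/L_t)^2, consistent with the low-occupancy assumption.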

If it is assumed that the hidden node is randomly located on a plane with
uniform density with respect to the transmitter in a two-dimensional physical
space [Figure 16.9(b)], then the probability pr that the hidden node is within
sufficient range to cause disruptive interference is proportional to the area A
over which the signal has a sufficient INR, η > ηj, at the hidden node

pr ∝ A(η > ηj) . (16.50)

Consequently, to a good approximation, the probability of disruptive interference
pi is given by

pi ∝ T B A . (16.51)

The area is a function of the transmit energy and propagation loss to the hidden
node.
For a SISO system, the information-theoretic bound, which is introduced in
Section 5.3, on the number of bits ninfo that can be transmitted within time T
and bandwidth B is given by

ninfo ≤ T B c , (16.52)
c = log2(1 + γ) ,
c̃ = log2(1 + γ/l) ,

where c is the information-theoretic limit in bits/s/Hz on the SISO spectral
efficiency (assuming a complex modulation), and γ is the SNR at the receiver.

The bound is not achievable for finite ninfo, but it is a reasonable approximation
to the limiting performance. To approximate a more realistic rate c̃, it is assumed
that the achieved spectral efficiency is given by the information-theoretic capacity
with an additional implementation loss figure l, so that γ → γ/l.
By assuming the link of interest can be approximated by the modified capacity,
the SNR at the receiver can be expressed in terms of the number of bits ninfo
transmitted and the spectral efficiency

ninfo ≈ T B log2(1 + γ/l) ,
γ ≈ l (2^{c̃} − 1) . (16.53)

If the channel gain to the hidden node is denoted b^2 and the channel gain to the
receiver of interest is denoted a^2, then the INR, denoted η, at the hidden node
is

η = (b^2/a^2) γ . (16.54)

By using a simple power-law model for loss, with the channel gain to the hidden
node proportional to r^{−α}, the radius rj at the critical interference level (at which
η = ηj) is found by observing that the SNR γ and the INR η are related by

γ = (a^2/b^2) η ∝ (a^2/r^{−α}) η ⇒ γ = (a^2/rj^{−α}) ηj ,
rj ∝ γ^{1/α} = l^{1/α} (2^{c̃} − 1)^{1/α} . (16.55)

Consequently, the probability of disruptive interference pi for the SISO system
is given by

pi ∝ T B A ≈ (ninfo/c̃) A ∝ (ninfo/c̃) rj^2
   ∝ (2^{c̃} − 1)^{2/α} / c̃ . (16.56)

The optimal spectral efficiency copt for some α is given by

∂pi/∂c̃ ∝ [2^{c̃+1} (2^{c̃} − 1)^{2/α−1} log(2)] / (c̃ α) − (2^{c̃} − 1)^{2/α} / c̃^2 = 0 ,
copt = [α + 2 W0(−(α/2) e^{−α/2})] / (2 log(2)) , (16.57)
where W0(x) is the product log or principal value of the Lambert W function2
[65] that is discussed in Section 2.14.4. It is remarkable that the optimal spectral
efficiency is dependent upon the channel exponent exclusively. Similar results
were found in Reference [94], and when attempting to optimize the spectral
partitioning of an interference-limited network [163].
2 The Lambert W function is the inverse function of f (W ) = W eW . The solution of this
function is multiply valued.
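Equation (16.57) can be checked numerically. The sketch below (assuming SciPy is available) evaluates the closed form via the principal branch of the Lambert W function and compares it against a direct numerical minimization of the objective in Equation (16.56):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import lambertw

def c_opt(alpha):
    """Closed-form optimum of Eq. (16.57), using the principal branch W0."""
    w = lambertw(-(alpha / 2.0) * np.exp(-alpha / 2.0), k=0).real
    return (alpha + 2.0 * w) / (2.0 * np.log(2.0))

def p_interference(c, alpha):
    """Objective of Eq. (16.56), up to a constant: (2^c - 1)^(2/alpha) / c."""
    return (2.0**c - 1.0) ** (2.0 / alpha) / c

alpha = 4.0
c_closed = c_opt(alpha)
c_numeric = minimize_scalar(p_interference, bounds=(0.05, 20.0),
                            args=(alpha,), method="bounded").x
print(c_closed, c_numeric)   # both near 2.3 b/s/Hz for alpha = 4
```

For α = 4 both approaches give about 2.3 b/s/Hz, and the optimum increases monotonically with α, consistent with Figure 16.10.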

Figure 16.10 Optimal SISO spectral efficiency c̃ [b/s/Hz] for ideal coding in a static
environment as a function of transmitter-to-hidden-node channel gain exponent, α.
© 2010 IEEE. Reprinted, with permission, from Reference [29].

In Figure 16.10, the optimal spectral efficiency for a given channel exponent,
under the assumption of ideal coding in a static channel, is displayed. In the
absence of multipath scattering, the line-of-sight exponent is α = 2 (an anechoic
chamber, for example). For α = 2, the optimal spectral efficiency approaches zero. For most
scattering environments, α = 3 to 4 [140] is a more reasonable characterization,
suggesting an optimal spectral efficiency around 2 bits/s/Hz. Heuristically, one
can interpret these results by noting that as the attenuation exponent α in-
creases, the environment attenuates the signal more quickly in range, so a better
strategy is to transmit at higher power (and thus higher spectral efficiency) and
consequently for less time. The shorter transmission reduces the probability of
collision.

16.4.2 Optimal MIMO spectral efficiency


The analysis for a MIMO link is similar to that for the SISO link. It is assumed
that the hidden node does not have an interference mitigation approach that
prefers a particular waveform structure, such as spatial interference mitigation
capability. This assumption is consistent with current communication systems.
Under the assumption of particular hidden-node mitigation strategies, the fol-
lowing analysis can be modified to address those assumptions; however, this is
beyond the scope of this chapter.
It is assumed that the nt × nr (number of transmit antennas by number of
receive antennas) MIMO channel is not frequency selective. The received signal,
discussed in detail in Chapter 8, is given by

Z = HS + N, (16.58)
16.4 Optimizing spectral efficiency to minimize network interference 543

where Z ∈ C^{nr×ns} is the received signal, S ∈ C^{nt×ns} is the transmitted signal,
H ∈ C^{nr×nt} is the channel matrix, and N ∈ C^{nr×ns} is the noise. The number of
transmitted symbols is ns.
By employing the results from Chapter 8, for a MIMO system with an unin-
formed transmitter (a transmitter without channel-state information), the
information-theoretic bound on the number of bits ninfo transmitted is given by

ninfo ≤ T B c ,
c = log2 |I + (P0/nt) H H†| , (16.59)
where c is the bounding spectral efficiency, which is achievable as the number
of information bits ninfo approaches infinity, and P0 is the total thermal-noise-
normalized transmit power. By employing an approach similar to that used for
the SISO case, an approximation to a practical achievable rate c̃ is given by
modifying the SNR by a loss factor l

ninfo ≤ T B c̃ ,
c̃ = log2 |I + (P0/(l nt)) H H†| . (16.60)
Implicit in this formulation is the assumption that the interference-plus-noise
covariance matrix is proportional to the identity matrix, which is a reasonable
model for most interference-avoiding protocols.
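A small numerical sketch of Equation (16.60) (assuming NumPy; the channel, power, and loss values are illustrative) uses a log-determinant to evaluate the uninformed-transmitter rate:

```python
import numpy as np

def mimo_rate(H, P0, loss=1.0):
    """Spectral efficiency of Eq. (16.60): log2 |I + (P0/(l nt)) H H^dagger|."""
    n_r, n_t = H.shape
    M = np.eye(n_r) + (P0 / (loss * n_t)) * (H @ H.conj().T)
    sign, logdet = np.linalg.slogdet(M)   # numerically stable log-determinant
    return logdet / np.log(2.0)

# Sanity check: for an identity channel the determinant factors, so the
# rate is nt * log2(1 + P0/(l nt)).
n, P0, loss = 4, 10.0, 2.0
c_id = mimo_rate(np.eye(n), P0, loss)
print(c_id, n * np.log2(1.0 + P0 / (loss * n)))
```

Using `slogdet` rather than forming the determinant directly avoids overflow when the number of antennas or the SNR is large.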
Because the capacity is a function of a random SNR matrix, there is not a
single solution as there is in the SISO analysis. However, by assuming that the
channel matrix H = a G is proportional to a matrix G sampled from an i.i.d.
zero-mean, unit-variance complex distribution, where a is the average
attenuation, and by employing an asymptotic analysis in the limit of a large
number of antennas like that employed in Section 8.7, a solution can be found.
With this model, the term a^2 P0 is the average SNR per receive antenna at the
receiver of interest. To simplify the analysis, it is assumed that nr = nt ≡ n. The
optimal spectral efficiency under the assumption of other ratios of the number of
transmit to receive antennas can be found by following a similar analysis. The
asymptotic capacity c for the uninformed transmitter, discussed in Section 8.7,
is given by

c/n ≈ [a^2 P0 / log(2)] 3F2([1, 1, 3/2], [2, 3], −4 a^2 P0) ≡ f(a^2 P0)
    = 4 log(√(4 a^2 P0 + 1) + 1) / log(4) + √(4 a^2 P0 + 1) / [a^2 P0 log(4)]
      − 1 / [a^2 P0 log(4)] − 2 − 2 / log(4) , (16.61)
where pFq is the generalized hypergeometric function [129], as discussed in Sec-
tion 2.14.2, and the function f(x) is used for notational convenience. The ap-
proximation for the achievable rate is given by modifying the SNR term in the
capacity with a loss parameter l, producing the form

c̃/n ≈ [a^2 P0 / (l log(2))] 3F2([1, 1, 3/2], [2, 3], −4 a^2 P0/l) ≡ f(a^2 P0/l)
    = 4 log(√(4 a^2 P0/l + 1) + 1) / log(4) + √(4 a^2 P0/l + 1) / [(a^2 P0/l) log(4)]
      − 1 / [(a^2 P0/l) log(4)] − 2 − 2 / log(4) . (16.62)

The SNR per receive antenna at the receiver, a^2 P0, can be expressed in terms
of the number of information bits ninfo transmitted and the spectral efficiency c̃,

ninfo = T B n f(a^2 P0 / l) ,
a^2 P0 = l f^{−1}(ninfo / (T B n)) . (16.63)

Unfortunately, a simple formulation of the functional inverse of f(x) (denoted
f^{−1}(y)) is not available; however, it is tractable numerically. By using a model
similar to that of the link of interest, if the channel to the hidden node is given by
Hhn = b Ghn, then the average INR per receive antenna η at the hidden node
is given by

η = b^2 P0 . (16.64)

By using an analysis similar to the SISO case and a power-law model for the
average channel gain b, the radius of disruptive interference rj for the average
attenuation is found by observing

a^2 P0 = (a^2/b^2) η ∝ (a^2/r^{−α}) η ⇒ a^2 P0 = (a^2/rj^{−α}) ηj ,
rj ∝ (a^2 P0)^{1/α} = l^{1/α} [f^{−1}(ninfo/(T B n))]^{1/α} . (16.65)
n

Consequently, the probability of disruptive interference pi for the MIMO system
is given approximately by

pi ∝ T B A = (ninfo/c̃) A ∝ (ninfo/c̃) rj^2
   ∝ [f^{−1}(β)]^{2/α} / β , (16.66)

where the spectral efficiency normalized by the number of antennas is given by
β ≡ c̃/n, and the probability is said to be approximate because of the potential
channel fluctuations to both the receiver and to the hidden node. The optimal
spectral efficiency normalized by the number of antennas, βopt, for a given channel
attenuation exponent α is given approximately by

βopt = argmin_β [f^{−1}(β)]^{2/α} / β . (16.67)

Figure 16.11 Optimal MIMO spectral efficiency c̃/n [b/s/Hz] for ideal coding in a
static environment as a function of the transmitter-to-hidden-node channel gain
exponent. © 2010 IEEE. Reprinted, with permission, from Reference [29].
In Figure 16.11, the optimal spectral efficiency per antenna for a given channel
exponent, under the assumption of ideal coding in a static channel, is displayed.
For most scattering environments, α = 3 to 4 [140] is a reasonable characteriza-
tion, suggesting a spectral efficiency per antenna of a little more than 1 bit/s/Hz.
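Because f^{−1} has no simple closed form, Equation (16.67) is most easily evaluated numerically. The sketch below (assuming SciPy; the bracketing intervals and search bounds are illustrative choices) inverts the closed form of Equation (16.61) by root finding and minimizes the objective over β:

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

LOG4 = np.log(4.0)

def f(rho):
    """Closed form of Eq. (16.61): asymptotic spectral efficiency per antenna
    (nr = nt) at per-receive-antenna SNR rho = a^2 P0."""
    s = np.sqrt(4.0 * rho + 1.0)
    return (4.0 * np.log(s + 1.0) / LOG4 + s / (rho * LOG4)
            - 1.0 / (rho * LOG4) - 2.0 - 2.0 / LOG4)

def f_inv(beta):
    """Numerical inverse of f; f is monotone increasing in the SNR."""
    return brentq(lambda rho: f(rho) - beta, 1e-9, 1e9)

def beta_opt(alpha):
    """Eq. (16.67): argmin over beta of [f^{-1}(beta)]^(2/alpha) / beta."""
    obj = lambda beta: f_inv(beta) ** (2.0 / alpha) / beta
    return minimize_scalar(obj, bounds=(0.05, 10.0), method="bounded").x

print(f(0.01))                       # low-SNR limit: f(rho) ~ rho / ln 2
print(beta_opt(3.0), beta_opt(4.0))  # optimal per-antenna spectral efficiencies
```

The minimum in β is quite shallow, so moderate deviations from βopt cost the hidden-node network relatively little; the optimum grows with the attenuation exponent α, as in Figure 16.11.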

Problems

16.1 Consider a single observation at an nr-antenna receiver with known noise
covariance matrix R = I. Evaluate the probability of detection and the probability
of false alarm under the assumption of a single transmitter in a Gaussian channel
with a constant-modulus transmit signal, as a function of the SNR per receive
antenna.

16.2 For a single-antenna energy detection approach under the assumptions of


circular complex Gaussian noise and a signal with SNR of 0 dB, numerically
find the minimum number of samples required to achieve a false-alarm rate of
no larger than 10−6 and a probability of detection of no less than 0.9, for the
models of
(a) Gaussian signal in Gaussian noise of known variance,
(b) unknown deterministic signal in Gaussian noise of known variance.

16.3 Considering a single-antenna, new-energy detector, by assuming 10 inde-


pendent observations, for probability of false alarm of 10−5 numerically find the
required SNR for the probability of detection to be at least 0.9.

16.4 Extend the evaluation of probability of detection and false alarm for the
single-antenna, new-energy detector found in Equation (16.28) to include unequal
numbers of observations for the old and new variance estimates.
16.5 Under the assumption of a four-antenna receiver and a single transmitter
received at 0 dB SNR per antenna, numerically evaluate the receiver operat-
ing curves for the multiple-antenna, new-energy detectors defined in Equations
(16.43) and (16.46) under the assumption of
(a) ns = 4,
(b) ns = 8,
(c) ns = 32
independent samples.
16.6 Under the assumption of a four-antenna receiver and a single transmitter
with 10 samples received at 0 dB SNR per antenna, consider a modification of
that form in Equation (16.46). Replace the trace with the maximum
eigenvalue. Numerically compare the receiver operating curves of the form in
Equation (16.46) and the modified form.
16.7 For a SISO channel, under the assumptions discussed in Section 16.4.1,
evaluate an approximate optimal spectral efficiency to minimize interference with
a legacy or hidden-node network under the assumption that the frame is small
compared with the legacy waveform in time and bandwidth.
17 Multiple-antenna acquisition
and synchronization

A receiver cannot decode a signal if it is not aware of the existence of the trans-
mitted signal. Furthermore, if a transmitter and receiver are not aligned in time
and frequency, then the transmitted signal will not make any sense to the re-
ceiver. Consequently, in order to establish a wireless communication link, the
receiver must find or acquire the transmitted signal, and some sort of synchro-
nization in time and frequency between the transmitter and receiver must occur.
In this chapter, the process of acquisition and synchronization is simply denoted
synchronization. In order for two nodes to be synchronized, they must agree
on both the carrier frequency and timing. In this discussion, it is assumed that
any frequency errors are small enough that frequency synchronization can be
achieved after temporal synchronization. Extensions to the discussions provided
here would enable joint temporal and spectral synchronization. In situations in
which coherence is required for long durations, frequency is of greater impor-
tance [244], and the techniques discussed in this chapter need to be modified to
address this sensitivity.
The performance of synchronization or acquisition techniques is often charac-
terized in terms of probability of detecting the signal of interest given that it is
there versus the probability of falsely “detecting” a signal (a false alarm) given
that the signal is absent. The function relating these two probabilities for some
test statistic is often denoted the receiver operating characteristic (ROC) curve
that is discussed in Section 3.7.3.
Synchronization can be the weakest component of a communication link. This
potential weakness is exacerbated when an attempt is made to establish a link in
the presence of interference, which can effectively break many synchronization
approaches. Synchronization performance is a function of both the signal-of-
interest signal-to-noise ratio (SNR) and the interference-to-noise ratio (INR).
Here various approaches for temporal synchronization of multiple-input multiple-
output (MIMO) communication links are introduced.
Synchronization has been studied in a variety of contexts. For single-input
single-output (SISO) systems, synchronization is often achieved by finding the
peak in the correlation between received data and a known reference [287, 255].
These concepts can be extended to MIMO systems [221, 333, 201]. The discussion
in this chapter follows the discussion in Reference [36] closely.1
1 Portions of this chapter are © 2010 IEEE. Reprinted, with permission, from Reference [36].

17.1 Flat-fading MIMO model

Here it is assumed that the static signal and interference channels are not frequency
selective. Synchronization in frequency-selective channels is beyond the scope of
this chapter, but is discussed in Reference [36]. By extending the MIMO model
used in Chapter 8 to include a delay τ in time, the received signal z(t) ∈ C^{nr×1}
at the nr receive antennas at time t is described by using the following form,

z(t) = H s(t − τ) + n(t) , (17.1)
where H ∈ C^{nr×nt} is the flat-fading channel matrix, s(t) ∈ C^{nt×1} is the vector
of transmitted signals at the nt transmit antennas, and n(t) ∈ C^{nr×1} is the
interference plus complex circular additive Gaussian noise.
We can rewrite the MIMO model in terms of sampled blocks of data as dis-
cussed in Section 8.10,

Z = H S + N , (17.2)

where Z ∈ C^{nr×ns} is the received data matrix, H ∈ C^{nr×nt} is the flat-fading
channel matrix, S ∈ C^{nt×ns} is the transmitted signal matrix, and N ∈ C^{nr×ns} is
the noise-plus-interference matrix. For notational convenience, here the received
complex baseband signal at some delay τ, the matrix Zτ ∈ C^{nr×ns}, is defined in
terms of the time-dependent form of the received signal

Zτ = (z(0 Ts − τ) z(1 Ts − τ) · · · z([ns − 1] Ts − τ)) , (17.3)

where Ts is the sample period, and s(t) and z(t) are the continuous transmitted and
received vectors as a function of time t.
Implicit in the following discussion is the concept that the flat-fading channel
does not introduce resolvable delay. The delay introduced by a SISO channel
can be represented by the sum of contributions at various delays. A flat-fading
channel could be represented by a single nonzero coefficient in this sum. Similarly,
a MIMO channel can be represented by a sum of channel matrices at various
delays as discussed in Section 10.4.

17.2 Flat-fading MIMO delay-estimation bound

The best unbiased estimate of the delay can be found by evaluating the Cramer–
Rao bound. The Cramer–Rao bound (discussed in Section 3.8) that is developed
here for the product of the root-mean-squared baseband complex-signal band-
width Brms and the standard deviation of delay estimation στ is given by

στ = 1 / (2π Brms √ρ) ,
ρ ≡ ns tr{P H† R^{−1} H} , (17.4)

where the spatial covariance matrix of the interference plus noise is given by R,
and the transmit array emits ns samples from each antenna with transmit spatial
covariance matrix P. The variable ρ can be interpreted as the total received
integrated SNR (in which the coherent integration of the signal scales as ns^2
and the incoherent integration of the noise scales as ns), where the integration
occurs over the ns independent complex samples and at the output of nt adaptive
beamformers.
If the signals emitted from the transmit antennas are independent, then P =
(P0/nt) I, where P0 is the total transmitted power. If ρ ≥ 1/π^2, then, based on the
Cramer–Rao bound, the error in temporal synchronization in the context of ac-
quisition should be less than half of a complex sample at the Nyquist sampling
rate. However, at this SNR, the false-alarm rate would be unacceptably large.
Consequently, the typical synchronization performance is not set by the Cramer–
Rao bound, but by the probability of detection and false alarm. Nonetheless, for
completeness, we will present the bound here under the assumption of Gaussian
interference and noise with a signal in the mean.
The delay-estimation bound is developed for the flat-fading MIMO channel
model. The probability density function p(z(t); τ) for the received signal z(t)
described in Equation (17.1), under the assumption that the signal is transmitted
at some delay τ, is given by

p(z(t); τ) = exp(−[z(t) − H s(t − τ)]† R^{−1} [z(t) − H s(t − τ)]) / (π^{nr} |R|) , (17.5)

where R ∈ C^{nr×nr} is the spatial interference-plus-noise covariance matrix defined
by

R = ⟨n(t) n†(t)⟩ . (17.6)

By using the notation

s′(t − τ) = ∂s(t − τ)/∂τ , (17.7)

the partial derivative of the log probability density for z(t) is given by

∂ log p(z(t); τ)/∂τ = n†(t) R^{−1} H s′(t − τ) + h.c. , (17.8)

where h.c. indicates the Hermitian conjugate of the first term. The partial deriva-
tive of the log probability density p(Zτ) for data matrix Zτ at delay τ is given
by

∂ log p(Zτ)/∂τ = Σm n†(mTs) R^{−1} H s′(mTs − τ) + h.c. (17.9)

The Fisher information J, from Section 3.8, is given by

J = ⟨[∂ log p(Zτ)/∂τ]^2⟩ (17.10)
  = ⟨| Σm n†(mTs) R^{−1} H s′(mTs − τ) + h.c. |^2⟩
  = Σm ⟨n†(mTs) R^{−1} H s′(mTs − τ) [s′(mTs − τ)]† H† R^{−1} n(mTs)⟩
  = Σm tr{⟨s′(mTs − τ) [s′(mTs − τ)]†⟩ H† R^{−1} H} ,

where the expectation of the cross terms is zero by the independence of n(t)
and s(t), and it is assumed that s and n are critically sampled (sampled at the Nyquist
rate). Here it is assumed that the channel matrix H is deterministic. By consid-
ering the signals transmitted from each transmitter to be stochastic and to have
the same spectral support, with the Fourier transform indicated by s̃(f), and by
noting that the definition of the root-mean-squared bandwidth of the signal Brms
implies

Brms^2 = tr{⟨∫ df f^2 s̃(f) s̃†(f)⟩} / tr{⟨∫ df s̃(f) s̃†(f)⟩}
       = [1/(2π)^2] tr{⟨s′(t − τ) [s′(t − τ)]†⟩} / tr{⟨s(t − τ) s†(t − τ)⟩} . (17.11)
Here it is assumed that s(t) is a complex baseband signal with a mean frequency of
zero. The expectation of the derivative found on the right-hand side of Equation
(17.11) simplifies the Fisher information J to

J = (2π)^2 Brms^2 Σm tr{⟨s(mTs − τ) s†(mTs − τ)⟩ H† R^{−1} H}
  = (2π)^2 Brms^2 ns tr{P H† R^{−1} H} , (17.12)

where the elements of s(t) are defined to have unit variance. Conse-
quently, the uncertainty bound for the estimation of τ in terms of the standard
deviation is given by

στ = J^{−1/2} (17.13)

and is equal to the form in Equation (17.4).
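The bound of Equation (17.4) is straightforward to evaluate for a given channel and covariance model. The sketch below (assuming NumPy; the array sizes, power, and bandwidth are illustrative values) computes στ for an i.i.d. Rayleigh channel with white noise and independent transmit streams:

```python
import numpy as np

def delay_std_bound(H, R, P, n_s, B_rms):
    """Cramer-Rao bound of Eq. (17.4) on the delay-estimation standard deviation."""
    rho = n_s * np.trace(P @ H.conj().T @ np.linalg.solve(R, H)).real
    return 1.0 / (2.0 * np.pi * B_rms * np.sqrt(rho))

rng = np.random.default_rng(2)
n_t, n_r, n_s = 2, 4, 64
P0, B_rms = 4.0, 1.0e6      # noise-normalized total power; rms bandwidth in Hz

H = (rng.standard_normal((n_r, n_t))
     + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2.0)
P = (P0 / n_t) * np.eye(n_t)    # independent transmit streams
R = np.eye(n_r)                 # spatially white interference plus noise

sigma_tau = delay_std_bound(H, R, P, n_s, B_rms)
# With R = I and P = (P0/nt) I, rho reduces to n_s (P0/nt) ||H||_F^2.
rho_direct = n_s * (P0 / n_t) * np.linalg.norm(H, 'fro')**2
print(sigma_tau, 1.0 / (2.0 * np.pi * B_rms * np.sqrt(rho_direct)))
```

For colored interference, replacing R with the interference-plus-noise covariance shows how spatial mitigation (through R^{−1}) recovers delay-estimation accuracy.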

17.3 Synchronization as hypothesis testing

As discussed in the previous section, synchronization is, in principle, a continuous
parameter-estimation problem. In practice, most digital communication systems

require synchronization no better than half a sample period. Finer timing align-
ment is often left to other components in the receiver. Furthermore, when a link
is being established in a channel that has delay spread caused by multipath scat-
tering, which is common in non-line-of-sight, ground-to-ground communications,
there may not be any single, well-defined delay that specifies synchronization. A
receiver designed to operate in environments with frequency-selective channels
(those with resolvable delay spread) will be able to compensate for some finite
delay spread. Consequently, if the temporal alignment is found to be within some
window (determined by the details of the receiver), then successful synchroniza-
tion can be claimed [319].
Given a discrete set of potential timing offsets, in principle, synchronization is
a multiple statistical hypothesis test. In the limit of a large number of potential
hypotheses with independent measurements, the statistical interdependence di-
minishes. As an example, if one knew that one of two tests must be the correct
synchronization delay, then a “not selected” outcome for one delay strongly af-
fects the likelihood of the other delay being the correct delay. Conversely, for a
billion potential delays, a “not selected” outcome for a given delay has essentially
no effect on the likelihood of another test point. Consequently, synchronization
for the situation in which there is no constraint on the set of potential delays
can be viewed as a sequence of statistically independent binary hypothesis tests
[255, 174]. This approach is also appropriate for typical practical implementa-
tions.
At each timing offset, the first hypothesis is that the signal of interest is prop-
erly aligned in time. The second hypothesis is that the signal is misaligned or
does not exist. At each test point in time, a statistical criterion (known as test
statistic) is evaluated, given the observed data. Synchronization is declared if
the test statistic threshold is exceeded. The performance of a synchronization
test statistic is characterized by the probability that synchronization is detected,
given the correct timing offset (within the allowed receiver window), versus the
probability of a false alarm that occurs if synchronization is declared in error.
By varying the threshold, a receiver operating characteristic (ROC) curve (that
is discussed in Section 3.7.3) in the space of the probability of detection versus
the probability of a false alarm can be constructed.
As a practical matter, it is sometimes useful to consider a two-stage synchro-
nization process in which a coarse synchronization search is followed by a fine
synchronization search. Once a coarse synchronization is detected, a search in the
neighborhood of that timing offset may be employed. Particularly for frequency-
selective environments, in which multiple delays in some local region may satisfy
the detection statistic threshold, selecting the delay with the largest test statistic
value may improve link performance, depending upon the receiver.

17.3.1 Motivations for test statistic approaches


Considering synchronization as a detection problem, a variety of approaches
are explored in this chapter. Multiple-antenna detection test statistics assuming
flat-fading and frequency-selective channels are developed. Techniques employed
to develop test statistics include correlation [287, 255]. New techniques based on
least-squared channel estimation [30], MMSE equalizer formulation [115, 223],
the generalized-likelihood ratio test (GLRT) [312, 173], and those inspired by
test statistic invariance [275, 175] are introduced.
In composite hypothesis tests where the prior distributions of the nuisance
parameters are unknown (as is the case studied in the sequel), the optimal test
statistic is not clearly specified [312]. However, it has been observed that for
most physical problems, the GLRT can give “satisfactory” results [312, 174]. In
addition, by restricting the class of decision rules for test statistics, it is possible
to derive invariant statistics [275, 179] that have provided desirable receiver
operating characteristics.

17.4 Test statistics for flat-fading channels

In this section, a number of synchronization test statistics are introduced. In the
following discussions, it is useful to think of the reference being at a fixed delay
and for the sampled test data at various delays Zτ to be compared with the
reference signal by using some test statistic.

17.4.1 Correlation
The standard SISO link detection test statistic is given by simply correlating
the observed data with respect to a known reference signal at a given delay τ
[287, 255], which is optimal for a SISO link in additive Gaussian noise. Because
this is a MIMO correlation, the output of the correlation is a matrix, Γτ ∈
Cn r ×n t , given by
Γτ = Zτ S† , (17.14)
where the matrix elements correspond to the inner products between the row
vectors in Zτ and the row vectors in the reference signals S. Strong correlation
corresponds to large elements in Γτ . There are a variety of ways of exploiting
the structure in Γτ . As a reference, a noncoherent, multiple-antenna, combin-
ing approach is considered. If the magnitudes squared of the elements in Γτ are
summed, then the expected evaluation of the sum in the absence of the syn-
chronization signal is proportional to the noise variance. This is the test statistic
constructed by noncoherently combining the standard SISO test statistic for each
transmit–receive antenna pair. Consequently, the Frobenius norm squared is a
useful measure of the strength of the correlation, defined by
‖Γτ‖_F^2 = tr{Γτ Γτ†} = tr{(Zτ S†)(Zτ S†)†}
         = Σ_{m,n} |zm sn†|^2 , (17.15)

where zm and sn indicate the mth and nth row vectors in Zτ and S respectively.
The form ‖Γτ‖²_F is sensitive to channel gain and signal power and is bounded by

‖Γτ‖²_F ≤ Σ_{m,n} ‖zm‖² ‖sn‖² = ‖Zτ‖²_F ‖S‖²_F .  (17.16)

To construct a test statistic φcor(τ) that is insensitive to variations in channel
gain or transmit power, the Frobenius norm squared can be normalized by the
sums of the squared norms of the row vectors of Zτ and S:
φcor(τ) = ‖Zτ S†‖²_F / (‖Zτ‖²_F ‖S‖²_F) .          (17.17)
This is similar to an approach that is suggested by [221]. The test statistic
described here is bounded by zero and one. Other approaches, such as using the
maximum magnitude value of the elements in Γτ , are suggested in [201].
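As an illustrative sketch (not from the text; the function name, dimensions, and test data below are our own choices), the normalized statistic of Equation (17.17) can be written in a few lines of NumPy:

```python
import numpy as np

def phi_cor(Z_tau: np.ndarray, S: np.ndarray) -> float:
    """Normalized MIMO correlation statistic of Equation (17.17).

    Z_tau : (n_r, n_s) received data at candidate delay tau
    S     : (n_t, n_s) known reference signal
    Returns a value between zero and one.
    """
    Gamma = Z_tau @ S.conj().T                   # correlation matrix, (n_r, n_t)
    num = np.linalg.norm(Gamma, "fro") ** 2      # ||Z_tau S^H||_F^2
    den = (np.linalg.norm(Z_tau, "fro") ** 2
           * np.linalg.norm(S, "fro") ** 2)      # ||Z_tau||_F^2 ||S||_F^2
    return float(num / den)
```

When the data is aligned (Zτ ≈ H S), the statistic is large; for noise-only data it falls toward roughly 1/ns on average.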

17.4.2 MMSE beamformer


An important tool in multiple-antenna signal processing is the MMSE beam-
former introduced in Section 9.2.3. In the context of a synchronization test statis-
tic, the beamformer is used to spatially mitigate interference and produce a set
of data streams (one for each transmitter). This technique is introduced in the
context of synchronization of a SIMO system in Reference [215].
The output of a set of beamformers, given by

Y = W† Zτ (17.18)

produces nt rows of data in Y ∈ C^{nt×ns}, where each column of W ∈ C^{nr×nt}
is optimized for a particular transmitter. The beamformers are optimized so
that the error between the reference signal for a particular transmitter and the
beamformer output (for that transmitter) is minimized when properly aligned in
time. If the synchronization sequence is not properly aligned, then beamformer
optimization will be essentially random, and the beamformer outputs will
have reduced power compared to those under correct time alignment.
As discussed in Section 9.3.2, the MMSE beamformer is found by minimizing
the expected magnitude squared of the error ε,

⟨|ε|²⟩ = (1/ns) ⟨‖W† Zτ − S‖²_F⟩
       = (1/ns) ⟨tr{(W† Zτ − S)(W† Zτ − S)†}⟩ ,    (17.19)
where the expectation is taken with respect to the noise. Minimizing the error,
defined in Equation (17.19), with respect to W gives
0 = (1/ns) ⟨Zτ (Zτ† W − S†)⟩ .                     (17.20)
ns
554 Multiple-antenna acquisition and synchronization

For a reasonably large number of samples ns ≫ nr , the expectation can be
approximated well by using the average over the number of samples so that the
MMSE beamformers W are given by
W ≈ (Zτ Zτ†)⁻¹ Zτ S† .                             (17.21)
The total energy, which is calculated by noncoherently combining the output of
each beamformer, is given by the Frobenius norm squared of Y; thus, an MMSE
test statistic can be formed,
‖Y‖²_F ≈ ‖[(Zτ Zτ†)⁻¹ Zτ S†]† Zτ‖²_F = ‖S P_{Zτ}‖²_F ,   (17.22)

where P_{Zτ} ≡ Zτ† (Zτ Zτ†)⁻¹ Zτ is the operator that projects onto the row space
spanned by Zτ . Given the form of Equation (17.22), a useful normalization that
limits the MMSE test statistic φmmse(τ) within the range [0, 1] is given by

φmmse(τ) = ‖S P_{Zτ}‖²_F / ‖S‖²_F .                (17.23)
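A minimal sketch of Equation (17.23) follows (our own construction; the pseudoinverse form Zτ⁺ Zτ equals Zτ†(Zτ Zτ†)⁻¹ Zτ when Zτ has full row rank, and is used here because it also handles rank-deficient data gracefully):

```python
import numpy as np

def phi_mmse(Z_tau: np.ndarray, S: np.ndarray) -> float:
    """MMSE synchronization statistic of Equation (17.23), in [0, 1]."""
    # P projects onto the row space of Z_tau; the pseudoinverse form
    # Z^+ Z matches Z^H (Z Z^H)^{-1} Z for full row rank Z_tau.
    P = np.linalg.pinv(Z_tau) @ Z_tau
    num = np.linalg.norm(S @ P, "fro") ** 2      # ||S P_Z||_F^2
    return float(num / np.linalg.norm(S, "fro") ** 2)
```

For noiseless aligned data Zτ = H S (with full-rank H), the row spaces coincide and the statistic equals one; for noise-only data it falls toward roughly nr/ns.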

17.4.3 Generalized-likelihood ratio test


The generalized-likelihood ratio test (GLRT) [312, 173] is a likelihood ratio test
for two hypotheses in which the parameters of the probability density function
are estimated using maximum-likelihood estimators under the assumption that
each hypothesis is true. The first hypothesis is that the synchronization signal is
present and aligned in time, and the second is that the synchronization signal is
absent or misaligned. The generalized-likelihood ratio test statistic φglrt(τ) has
the form

φglrt(τ) = max_{H,R} p(Zτ|S; H, R) / max_Q p(Zτ|Q) ,   (17.24)
where R is the receive spatial covariance matrix of the noise and interference,
and Q is the receive spatial covariance matrix for a missing or misaligned signal
of interest plus the contribution for interference and noise contained in R. The
probability density for each hypothesis is given by p(Zτ |S; H, R) and p(Zτ |Q).
By using this standard statistical approach, a new synchronization test statistic
is constructed.
While the internal noise is typically sampled from a Gaussian distribution, the
interference and misaligned signal can be sampled from a wide range of distribu-
tions. Many standards operating in the industrial, scientific, and medical (ISM)
band employ orthogonal-frequency-division multiplexing (OFDM) signals that
can be approximated well by Gaussian distributions, while other signal types
may be sampled from other distributions. It is assumed that misaligned refer-
ence signals and external interference can be modeled reasonably well by being
sampled from complex Gaussian distributions. Because much of the strength of
the approach is in how it exploits the spatial structure of the signals, the appli-
cability of the approach is wider than might be expected naively. Given detailed
knowledge of a system’s modulation, alternative GLRTs can be developed. By


assuming a Gaussian model, the probability density function of noise and inter-
ference in the absence of synchronization is given by
p(Zτ|Q) = exp(−tr{Zτ† Q⁻¹ Zτ}) / (π^{nr ns} |Q|^{ns}) .   (17.25)
Given the total average signal-of-interest transmit power P0 , the total noise-plus-
interference-plus-misaligned-signal covariance matrix Q is given by
Q = R + (P0/nt) H ⟨s(t) s†(t)⟩ H† ,                (17.26)

if the signal is misaligned. If the signal is absent, then Q = R, where
R = ⟨N N†⟩/ns is the noise-plus-interference covariance matrix. The vector of com-
plex baseband signals transmitted from each antenna as a function of time is
given by s(t). Because this transmitted signal may contain unknown data, it can
only be characterized in a statistical sense. The probability density function with
the synchronization signal present and properly aligned in time is given by
p(Zτ|S; H, R) = exp(−tr{(Zτ − H S)† R⁻¹ (Zτ − H S)}) / (π^{nr ns} |R|^{ns}) .   (17.27)
Given the above form, to maximize the likelihood at some given delay, the prob-
ability must also be maximized with respect to two nuisance parameters, R and
H. As presented in Equation (8.128), maximizing with respect to an arbitrary
parameter α of H gives the following estimator:
∂p(Zτ|S; H, R)/∂α = 0  →  (Zτ − ĤS)S† = 0
Ĥ = Zτ S† (SS†)⁻¹ .                                (17.28)
This is the channel estimator used in Chapter 2. Substituting this estimate into
Equation (17.27) produces

p(Zτ|S; Ĥ, R) = exp(−tr{(Zτ P_S^⊥)† R⁻¹ (Zτ P_S^⊥)}) / (π^{nr ns} |R|^{ns}) ,   (17.29)
where the projection operator P_S^⊥ projects onto a basis orthogonal to the row
space spanned by S. It is defined to be

P_S^⊥ = I_{ns} − S†(SS†)⁻¹ S .                     (17.30)
Maximizing the likelihood in Equation (17.29) with respect to an arbitrary pa-
rameter β of R gives
∂p(Zτ|S; Ĥ, R)/∂β = 0

0 = tr{(Zτ P_S^⊥)(Zτ P_S^⊥)† R⁻² ∂R/∂β − ns R⁻¹ ∂R/∂β}

R̂ = Zτ P_S^⊥ Zτ† / ns ,                           (17.31)
using matrix derivative identities from Section 2.7.1 and the notion that projec-
tion matrices are idempotent. Substituting this estimator into the likelihood in
Equation (17.29), the maximum probability density is given by
p(Zτ|S; Ĥ, R̂) = exp(−tr{ns I_{nr}}) / (π^{nr ns} |Zτ P_S^⊥ Zτ† / ns|^{ns})
              = (ns^{nr ns} e^{−ns nr} / π^{nr ns}) |Zτ P_S^⊥ Zτ†|^{−ns} .   (17.32)
Similarly, maximizing the probability density function with a misaligned syn-
chronization signal, the received covariance matrix estimate Q̂ is given by

Q̂ = Zτ Zτ† / ns ,                                  (17.33)
which gives the probability density
p(Zτ|Q̂) = (ns^{nr ns} e^{−ns nr} / π^{nr ns}) |Zτ Zτ†|^{−ns} .   (17.34)
Consequently, the generalized-likelihood ratio test statistic φglrt(τ) is given by

φglrt(τ) = |Zτ Zτ†|^{ns} / |Zτ P_S^⊥ Zτ†|^{ns} = |I − P_S P_{Zτ}|^{−ns} .   (17.35)

This test statistic is bounded to values greater than or equal to one. It is in-
teresting to note that the statistic is a function of the row space of S and Zτ
exclusively.
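A numerically safer way to evaluate Equation (17.35) is in the log domain, since determinants raised to the ns power overflow quickly; the sketch below (our own, using NumPy's slogdet; the function name and test data are assumptions) returns log φglrt(τ):

```python
import numpy as np

def log_phi_glrt(Z_tau: np.ndarray, S: np.ndarray) -> float:
    """Log of the GLRT statistic of Equation (17.35):
    n_s * (log|Z Z^H| - log|Z P_S_perp Z^H|), which is >= 0."""
    n_s = Z_tau.shape[1]
    # P_S_perp = I - S^H (S S^H)^{-1} S, from Equation (17.30)
    P_S_perp = np.eye(n_s) - S.conj().T @ np.linalg.solve(S @ S.conj().T, S)
    ld_full = np.linalg.slogdet(Z_tau @ Z_tau.conj().T)[1]
    ld_proj = np.linalg.slogdet(Z_tau @ P_S_perp @ Z_tau.conj().T)[1]
    return float(n_s * (ld_full - ld_proj))
```

When the reference is aligned, projecting out the rows of S removes nearly all of the received energy, ld_proj collapses, and the log statistic becomes large.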

17.4.4 Spatial invariance


An observation about the generalized-likelihood ratio test statistic is that it is
spatially invariant [275]. As an example, under the transformations Zτ → A Zτ
and S → B S for some arbitrary nonsingular square matrices A and B, the
GLRT statistic is unchanged,

φglrt(τ; A Zτ, B S) = φglrt(τ; Zτ, S) .            (17.36)

Here, the test statistic notation includes an explicit reference to the received
and transmitted signal parameterization. The test statistic is a function of the
row space of Zτ and S only. This suggests another approach for developing
new synchronization test statistics that are spatially invariant. While there is no
guarantee that invariance will provide a useful test statistic, there is precedence
for using it as motivation [175, 97].
The row spaces of Zτ and S can be represented by the matrices K and T
respectively, such that K†K = P_{Zτ} and T†T = P_S. A variety of spatially in-
variant test statistics can be constructed by considering the distances between
the subspaces defined by K and T. The distance between the subspaces is not
uniquely defined. Distance between subspaces can be expressed in terms of
principal angles [117, 85], which are a multidimensional subspace extension to the
angle between vectors and are defined implicitly in
K T† = U diag{cos(a)} V† , (17.37)
where the vector a contains the principal angles and U and V are orthonormal
matrices. Here the cosine of a vector indicates the vector constructed from the
cosine of the individual elements. Given this set of principal angles, a number
of possible measures of the distance between subspaces are available [85]. The
following is a subset: arc length, minimum angle, chord, or the product:
• arc length: φArcLen(τ) = ‖a‖⁻¹
• minimum angle: φMinAng(τ) = (min{a})⁻¹
• chord length: φChord(τ) = ‖2 sin{a/2}‖⁻¹
• product: φProd(τ) = (∏m sin{am})⁻¹ .
Interestingly, for this environment, the product-motivated test statistic and the
generalized-likelihood ratio test statistic are the same up to an exponential coeffi-
cient and, therefore, have the same performance characteristics. The relationship
between the two forms is given by
φglrt(τ) = |I − P_S P_{Zτ}|^{−ns}
         = |I − (K T†)(T K†)|^{−ns}
         = |I − U diag{cos(a)} V† V diag{cos(a)} U†|^{−ns}
         = |diag{sin²(a)}|^{−ns}
         = (∏m sin{am})^{−2ns} = φProd^{2ns}(τ) .  (17.38)
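This equivalence is easy to check numerically. In the sketch below (our own construction, with arbitrary dimensions), orthonormal row-space bases K and T come from SVDs, the principal angles come from the singular values of K T† per Equation (17.37), and the two forms are compared in the log domain:

```python
import numpy as np

rng = np.random.default_rng(3)
n_r, n_t, n_s = 3, 2, 12
S = rng.standard_normal((n_t, n_s)) + 1j * rng.standard_normal((n_t, n_s))
Z = rng.standard_normal((n_r, n_s)) + 1j * rng.standard_normal((n_r, n_s))

# Orthonormal bases with K^H K = P_Z and T^H T = P_S (rows span row spaces)
K = np.linalg.svd(Z, full_matrices=False)[2]
T = np.linalg.svd(S, full_matrices=False)[2]

# Principal angles: the singular values of K T^H are cos(a), Equation (17.37)
cos_a = np.linalg.svd(K @ T.conj().T, compute_uv=False)
sin2_a = 1.0 - cos_a**2

# log of the GLRT form |I - P_S P_Z|^{-n_s}, built from the same bases
P_Z = K.conj().T @ K
P_S = T.conj().T @ T
log_glrt = -n_s * np.linalg.slogdet(np.eye(n_s) - P_S @ P_Z)[1]

# log of the product form (prod_m sin a_m)^{-2 n_s}
log_prod_form = -n_s * np.sum(np.log(sin2_a))
```

The two log values agree to machine precision, which is the relationship φglrt(τ) = φProd^{2ns}(τ) stated above.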

17.4.5 Comparison of performance


In Figure 17.1, the performances are compared in the presence of a single 10 dB
INR interferer. In interference, the correlator test statistic performs poorly. How-
ever, the MMSE and the GLRT test statistics perform well in the presence of
interference. Both the MMSE and the GLRT test statistics attempt to compen-
sate spatially for the effect of the interference and consequently perform better.
The best performance over the entire ROC curve is given by the GLRT statistic.
The implication is that the GLRT would enable acquisition for sets of environmental
parameters for which the correlator test statistic would rarely work.

Problems

17.1 Reformulate the Cramer–Rao bound in Section 17.2 for frequency estima-
tion.

Figure 17.1 Comparison of receiver operating characteristics in terms of probability of
missed detection of the signal as a function of the probability of false alarm for 2 dB
SNR per symbol per receive antenna, a sequence length of 16 symbols, and a 10 dB
INR interferer in a flat-fading environment. Correlation (Cor),
minimum-mean-square-error (MMSE), and generalized-likelihood ratio test (GLRT)
flat-fading test statistics are used. For reference, the random (Rand) test curve is
displayed. © 2010 IEEE. Reprinted, with permission, from Reference [36].

17.2 Reformulate the Cramer–Rao bound in Section 17.2 under the assump-
tions of an uninformed MIMO transmitter, nI strong interferers, and a random
flat-fading i.i.d. complex circular Gaussian channel matrix.
17.3 Reformulate the correlation, MMSE, and GLRT test statistics in terms of
frequency synchronization.
17.4 Evaluate the GLRT under the assumption that the interference-plus-noise
covariance matrix is known and given by I.
17.5 For a channel with a number of receivers greater than or equal to trans-
mitters (nr ≥ nt ), develop a test statistic that replaces the beamformers in W
from Equation (17.21) with zero-forcing beamformers using the estimated chan-
nel. Numerically compare performance of the statistic to the performances in
Figure 17.1.
17.6 For the correlation, MMSE, and GLRT test statistics determine numeri-
cally the SNR required for a 4 × 4 link to achieve a probability of false alarm of
less than 10−6 and a probability of detection of at least 0.9 for 32 observations
under an INR per receive antenna of:
(a) −∞ dB,
(b) 20 dB.
18 Practical issues

In many texts on communication, including this one, it is often assumed that
transmitters and receivers are ideal. This assumption is never true. In practice, to
get a communication system to operate, more time and effort are often invested
in compensating for imperfections than in designing the ideal communication
system. Furthermore, many system and algorithm design choices can increase or
decrease the sensitivity of the communications system to the imperfections of
the constituent components.
In this chapter, a few practical issues are addressed, including noise models,
noise figure, power consumption, antennas, local oscillators, and dynamic range.
These are only a small fraction of all practical issues faced by radio designers,
but these are presented to sensitize the system designer to these potential issues.
Many of these issues are addressed with greater depth in texts such as References
[261, 252].

18.1 Antennas

In order for signals to be radiated or received, the electromagnetic energy is
coupled through an antenna [15]. As discussed in Section 5.1, antennas can sig-
nificantly affect the signal being transmitted or received. Each antenna has some
directional and polarimetric response. In addition, the antenna has a frequency
response. In the context of this text, the effects of the antennas are typically
folded into the channel. Finally, if the impedance of the antenna is not matched
well to the transmitter or receiver, then inefficiencies are introduced. There is
constant drive to reduce the size of radio systems. This places additional con-
straints on antenna and antenna array design. When placed in close proximity to
conducting material, antennas often have characteristics that differ significantly
from their idealized properties. Often the antennas become feeds for the larger
conductive structure in which they are embedded. In the context of this text,
one of the important issues is the density of elements that can be supported in
some finite-size radio [111].

18.1.1 Electrically small antennas


Antennas that are small compared with a wavelength introduce additional com-
plications. Simply put, it is difficult to couple into an antenna that is small
because there is not much room for induced currents. A solution is to couple to
the antenna via a strong resonance (a high quality factor Q circuit). This in effect
limits the bandwidth of the signal that is transmitted or received. It is assumed
that the antenna under analysis is contained within a sphere of radius a. This
bandwidth is described in terms of the resonance strength Q of the resonant
circuit
Q ≈ f0 / Δf ,                                      (18.1)

where f0 is the carrier frequency, and Δf is the bandwidth. An approximate
bound on the minimum strength of resonance is given in Reference [60] as
Q ≳ (c / (2π f0 a))³ ,                             (18.2)
where c is the speed of propagation.
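For a feel of the numbers, the following sketch applies Equation (18.2) and then Equation (18.1) to estimate the usable bandwidth; the 915 MHz carrier and 1 cm enclosing radius are our own illustrative assumptions, not values from the text:

```python
import math

c = 3.0e8    # propagation speed (m/s)
f0 = 915e6   # carrier frequency (Hz); an ISM-band example (assumed)
a = 0.01     # radius (m) of the sphere enclosing the antenna (assumed)

ka = 2.0 * math.pi * f0 * a / c   # electrical size of the antenna
q_min = (1.0 / ka) ** 3           # Q >~ (c / (2 pi f0 a))^3, Equation (18.2)
bw_max = f0 / q_min               # Delta f ~ f0 / Q, from Equation (18.1)
# q_min is roughly 142, so bw_max is only a few MHz at 915 MHz
```

Even a 1 cm antenna at 915 MHz is bandwidth-limited to a few megahertz under this idealized bound, which illustrates why electrically small antennas constrain wideband designs.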
This bound was developed under the assumption that there is nothing near the
antenna. In most applications, radios are near other conductive objects. In such
cases, the electrically small antennas become feeds for the conductive structures
nearby, which can significantly affect the impedance and characteristics of the
antenna. In reality, the design of antennas is affected by a range of issues and
constraints, and actual performance can vary widely [285].

18.1.2 Crossed polarimetric array


It is often asked how small an array can be placed in a radio. In the limiting case
of a small enclosing volume, there is no room for traditional arrays. However,
under the assumption that the elements can be electrically small, a polarimetric
array (as discussed in Section 6.5) can be constructed with three elements. An-
other potential solution is an array of small electric dipole antennas lying along
each of the spatial axes in which large reactance is supported by the driving
circuits.

18.2 Signal and noise model errors

In theoretical analyses, it is commonly assumed that signals, interference, and
noise are drawn from Gaussian distributions. Sometimes non-Gaussian structures
of the signal are considered, but the possibility of non-Gaussian noise and inter-
ference is often ignored. From a capacity point of view, the Gaussian models for
noise and interference are the worst cases. However, algorithms developed under
the assumption of the Gaussian model often perform worse than expected when
exposed to non-Gaussian signals, noise, and interference.
It is relatively common for real noise and interference distributions to have
longer tails than a Gaussian distribution. These occasional large deviations from
the expected signal strength can have significant effects on various algorithms.
As an example, if a soft decoder algorithm is presented with a noise sample that
has a large deviation from the expected Gaussian distribution, then the erro-
neous likelihood for that sample can be extremely large or small. The erroneous
likelihood can then propagate throughout the decoding processing, overwhelm-
ing other reasonable likelihoods. A few samples with a large deviation may cause
a decoding error across the frame. Consequently, while the Gaussian noise signal
might be the worst case from an information-theoretic perspective, in practical
systems non-Gaussian noise can be much worse.

18.3 Noise figure


As discussed in Section 4.2.2, the theoretical noise power Pn^(th) of a load observed
over some bandwidth B is given by

Pn^(th) = kB T B ,                                 (18.3)
where kB is the Boltzmann constant, and T is the absolute temperature. In
real systems, the observed noise power is greater than that expected, given the
ambient temperature. This difference is often denoted the noise figure¹ fn , so
the observed noise power Pn is given by
Pn = fn Pn(th) . (18.4)
With effort, noise figures of a couple of decibels are possible, although some
systems have noise figures that are much larger. While it is common to focus
on the noise figure of the receiver, transmitter noise can be an issue for systems
with very high transmit dynamic range requirements.
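As a quick numeric sketch of Equations (18.3) and (18.4) (the 1 MHz bandwidth and 5 dB noise figure below are assumed values, not from the text):

```python
import math

k_B = 1.380649e-23   # Boltzmann constant (J/K)
T = 290.0            # absolute temperature (K), standard reference
B = 1.0e6            # observation bandwidth (Hz), assumed

p_th = k_B * T * B                         # theoretical noise power, Eq. (18.3)
p_th_dbm = 10.0 * math.log10(p_th / 1e-3)  # about -114 dBm in 1 MHz
f_n_db = 5.0                               # assumed noise figure (dB)
p_obs_dbm = p_th_dbm + f_n_db              # observed noise floor, Eq. (18.4)
```

The familiar rule of thumb "−174 dBm/Hz plus 10 log10 of the bandwidth plus the noise figure" falls out directly.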

18.4 Local oscillators

Nearly all modern communication systems use a local oscillator as a frequency
reference. Often the reference is a crystal oscillator that provides a frequency
reference at some convenient value. The frequency is then shifted to the desired
carrier or intermediate frequency by using some sort of frequency synthesizer.
One approach for doing this is to use a phase-locked loop with a frequency
divider at the output of the voltage controlled oscillator (VCO). Because of the
electrical sensitivity of the inputs of the frequency synthesizers, they are often
particularly sensitive to packaging-induced coupling problems. There are two
significant concerns with regard to using these frequency references: accuracy
and phase noise.

¹ There is a convention to denote this effect as noise figure when expressed on a decibel scale
and noise factor when expressed on a linear scale. Here we will not respect this convention
because it should be clear from context if the effect is expressed on a linear or decibel scale.

18.4.1 Accuracy
Local oscillators vary widely with respect to accuracy. Furthermore, the fre-
quency provided by the frequency reference often changes as a function of tem-
perature and age. An inexpensive crystal oscillator used in consumer electronics
may have an accuracy of 10−5 . If the carrier frequency is 1 GHz, then the radio
would transmit or receive with a frequency error of 10 kHz. As a point of com-
parison, a 100 km/h induced Doppler shift produces a frequency shift of 93 Hz.
Consequently, the frequency error caused by the local oscillator can be orders of
magnitude larger than that produced by Doppler shift.
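The comparison in the paragraph above is easy to reproduce:

```python
f0 = 1.0e9           # carrier frequency (Hz)
accuracy = 1.0e-5    # inexpensive crystal oscillator accuracy
lo_error = accuracy * f0      # 10 kHz local-oscillator frequency error

v = 100.0 / 3.6      # 100 km/h expressed in m/s
c = 3.0e8            # speed of propagation (m/s)
doppler = v * f0 / c          # about 93 Hz Doppler shift
# The oscillator error exceeds the Doppler shift by over two orders of magnitude.
```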
There are much better oscillators available. Temperature-compensated oscil-
lators (TCXO) can have accuracies of better than 10−6 . Ovenized crystal oscil-
lators (OCXO) can have accuracies of better than 10−7 ; however, this comes at
the expense of size, weight, and power.
High-performance atomic clocks enable accuracies of better than 10−14 . These
“clocks” are currently the size of rooms and would make for an inconvenient
mobile phone. There are small atomic clocks with degraded performance. As
technology evolves, these oscillators may become viable for mobile radios.
One approach that allows radios to have high accuracy is to use an external
source as a reference. As an example, base stations of cellular phone systems
typically have access to accurate frequency standards. Mobile units can estimate
and track the frequency from a signal broadcast from the base station. Alterna-
tively, radios can use other external broadcast frequency standards. If the radio
has access to the Global Positioning System (GPS) [218] or some equivalent sys-
tem, an accurate frequency reference can be extracted from a system that has
access to atomic clocks.
Modern radios can be frequency agile. Often radios use a fixed frequency ref-
erence and then use a synthesizer to shift the operating frequency. Depending
upon the details of the synthesizer, it may require a noticeable amount of time
for the frequency at the output of the synthesizer to settle to its final value.

18.4.2 Phase noise


In addition to the mean accuracy of the frequency provided by a frequency
reference, the instantaneous phase noise can be a significant concern [185]. By
observing the power spectral density produced by a reference, one can see that the
power peaks strongly near the intended frequency (depending upon the accuracy
of the reference), and then decreases quickly as the observation frequency shifts
away from the peak. Phase noise is described by this finite width of the spectrum
at the output of the frequency reference. Fortunately, crystal oscillators that are
used as references typically have good phase noise characteristics. However, when
frequencies are derived from the crystal oscillators by using synthesizers, phase
noise can be increased significantly. For applications that require the frequency
synthesizer to change frequencies, the settling time of the synthesizer can be an
issue. Often phase noise and settling time are competing requirements because
one characteristic is often improved at the expense of the other.
There are a couple of potentially important adverse effects of phase noise.
Relative phase noise between the transmitter and the receiver will cause the
constellation observed at complex baseband to rotate back and forth. While a
frequency adaptive approach can be employed to compensate for these effects,
this comes at the expense of greater computations and may not be effective in all
situations. If the communication link is using higher-order constellations, then
small amounts of phase noise can cause errors in decoding the signal.
For a multiple-antenna system, if multiple synthesizers are used, then the phase
noise for each spatial channel can be different. As a consequence, small relative
frequency errors can be introduced between spatial channels. These errors can
place limits on the null depths produced by spatial interference mitigation. As
discussed in Chapter 10, in some situations these relative frequency effects can be
mitigated by using space-frequency adaptive processing, but a better engineering
solution is probably to remove the source of the relative phase errors. Even if the
multiple-frequency synthesizers are using a common reference, the relative phase
may be different from one power-up cycle to the next. Consequently, unless some
care is taken, the phase calibration of the multiple-antenna system may not be
stable.

18.5 Dynamic range

One of the common assumptions in discussions of communication systems is
that of infinite dynamic range, that is, the ratio of the strongest to the weak-
est signal that a radio can transmit or receive. Depending upon the amount
of effort invested in hardware, the dynamic range of the hardware may vary
from several decibels to over a hundred. There are a number of reasons why the
dynamic range of a radio’s transmit or receive chain can be of concern. If the
modulation approach has a large peak-to-average power ratio, as in the case of
orthogonal frequency-division multiplexing (OFDM) signals or high-order con-
stellations, then the dynamic range of the signal places stricter requirements on
the hardware. There are a range of issues associated with clipping the levels of an
OFDM signal, including spectral growth and violation of orthogonality [75, 311].
In addition, because of channel attenuation variations, the signal power at the
receiver can vary significantly more. If the transmitter employs power control,
it may have to vary the power significantly to compensate for the channel at-
tenuation. Because the time scale of channel variation tends to be significantly
slower than modulation, alternative approaches can be used to extend the range
over which a transmitter or receiver can operate. If there is cochannel interference
(that is, multiple signals being received at the same time at the same frequency),
then dynamic range requirements are increased by the ratio of the stronger to
the weaker received signal [14]. To receive the weaker signal, both the strong and
weak signals must fit within the dynamic range of the receiver. Because of the
near-far problem in networks, the range of received powers can be significant.
For many systems, the instantaneous dynamic range requirements of the receiver
are more stringent than for the transmitters. Limitations to dynamic range are
usually the result of various nonlinearities in the transmitter or receiver.

18.5.1 Quantization
The most apparent (although not always the most significant) source of
nonlinearity is the analog-to-digital converter (ADC) for the receiver or simi-
larly the digital-to-analog converter (DAC) for the transmitter. In converting a
continuous signal to a set of discrete values, errors in the signal are introduced,
although these effects may not be important if the noise is larger than the er-
rors. These errors are often referred to as quantization noise. While this “noise”
is clearly not Gaussian, it is often approximated as Gaussian.
Under the assumption of a flat probability distribution across a given quanti-
zation value, the variance of the quantization noise σq² in units of bits squared is
given by

σq² = ∫_{−1/2}^{1/2} dx x² = 1/12 .                (18.5)
The variance of a full-scale signal, of course, depends upon the statistics of the
signal in question. Under the assumption that the signal covering the range 0 to
2^{nbits} − 1 is centered at the amplitude (2^{nbits} − 1)/2 for a digitizer with nbits bits,
the maximum variance of a signal σ²_{s,max} in units of bits squared is achieved when
the signal value is near the minimum and maximum values exclusively,

σ²_{s,max} = (2^{nbits} − 1)² / 4 .                (18.6)
The variance of a signal σ²_{s,eq} in units of bits squared that occupies all amplitudes
with equal likelihood is given by

σ²_{s,eq} = ∫_0^{2^{nbits}−1} dx [x − (2^{nbits} − 1)/2]² / ∫_0^{2^{nbits}−1} dx
          = [(2^{nbits} − 1)³/12] / (2^{nbits} − 1) = (2^{nbits} − 1)²/12 .   (18.7)
Alternatively, it is commonly assumed in assessments of effective number of bits
that the input to the digitization is a sinusoid so that the signal variance σ²_{s,sin}
is given by

σ²_{s,sin} = ∫_0^{2π} dφ [(2^{nbits} − 1)/2 · sin(φ)]² / ∫_0^{2π} dφ
           = (2^{nbits} − 1)² / 8 .                (18.8)
The dynamic range r is then defined by the ratio of the largest signal variance to
the quantization noise variance. Depending upon the choice of signal used, the
ratio is given by
r = σs² / σq² = (2^{nbits})² c ,
c = { 3   ; max
      1   ; equal                                  (18.9)
      3/2 ; sinusoid } ,

where c is a constant that is dependent upon the distribution of the signal, and
it is assumed that 2^{nbits} − 1 ≈ 2^{nbits}.
It is common to describe the dynamic range in terms of an effective number
of bits. The observed maximum dynamic range r is then equated to this number
of effective bits. From above, the observed dynamic range in power r is given by
r ≈ c (2^{n_bits^(eff)})²
r/c ≈ 4^{n_bits^(eff)} .                           (18.10)

These relations are typically expressed in terms of decibels as a function of the
number of bits, so the effective number of bits n_bits^(eff) is given by

10 log10(r/c) = r[dB] − c[dB] ≈ 10 log10(4^{n_bits^(eff)}) ≈ n_bits^(eff) · 6[dB]
n_bits^(eff) ≈ (r[dB] − c[dB]) / 6[dB] ,           (18.11)
where r[dB] and c[dB] indicate the dynamic range and distribution constant
expressed on a decibel scale. The values of c[dB] are given by 4.8 dB, 0 dB,
and 1.8 dB for maximum, equal, and sinusoidal signal distributions respectively.
Because it is easy to generate sinusoids in the laboratory, the sinusoidal version
is most often quoted. However, for most qualitative purposes, r[dB]/6[dB] is a
sufficiently accurate estimate for the number of effective bits.
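Equation (18.11) translates directly into code; the function below is our own sketch, with the distribution constants c[dB] taken from the text:

```python
import math

def effective_bits(r_db: float, signal: str = "sinusoid") -> float:
    """Effective number of bits from a dynamic range r expressed in dB,
    per Equation (18.11)."""
    c_db = {"max": 10.0 * math.log10(3.0),               # ~4.8 dB
            "equal": 0.0,                                # 0 dB
            "sinusoid": 10.0 * math.log10(1.5)}[signal]  # ~1.8 dB
    db_per_bit = 10.0 * math.log10(4.0)                  # ~6 dB per bit
    return (r_db - c_db) / db_per_bit
```

With the commonly quoted sinusoidal convention, a measured 74 dB dynamic range corresponds to roughly 12 effective bits.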

18.5.2 Finite precision


In addition to the quantization resulting from digitization, digital computation
can have an effect on dynamic range. In simulations it is common to use floating-
point calculations. However, computations executed within a radio are often
executed with a finite number of bits. By reducing the number of bits used in
computations, the amount of real estate on a chip or in a system occupied by the
computation can sometimes be reduced significantly. In some cases, reducing the
number of bits used by a computation can reduce the execution time. Conversely,
limiting the number of bits used in a calculation can significantly affect the dy-
namic range supported in a calculation. As a specific example, the inverse of a
covariance matrix can require far more bits than were used to accurately store
the data matrix. This is because information contained in the covariance matrix
is stored in the power domain, which requires approximately twice as many bits
as the data matrix, which contains amplitude information. Furthermore, matrix
inversion inverts the eigenvalues of the covariance matrix. It is, therefore, sensitive
to both the largest and smallest values, and errors in the smallest eigenvalues
are exaggerated.
inversions in the amplitude domain, such as using QR decomposition to whiten
a vector.
Consider the estimate of the nr × nr covariance matrix C ∈ C^{nr×nr}, con-
structed by using the ns samples contained within the data matrix Z ∈ C^{nr×ns}.
The estimate and the data matrix QR decomposition [117] are given by

C = (1/ns) Z Z†
Z = (Q R)† ,                                       (18.12)

where Q ∈ C^{ns×nr} has orthonormal columns, and R ∈ C^{nr×nr} is an upper
triangular matrix. To be clear, this notation for R is not consistent with the us-
age in the rest of the text. However, the notation used here is common in the
literature discussing QR decomposition. A variety of computations contain
the quadratic form

φ = z† C−1 z . (18.13)

An example is the log-likelihood calculation. By employing the QR decomposition
of C, the quadratic form is simplified to the inner product of two whitened
vectors y ∈ C^{nr×1},

φ = ns z† [(Q R)† (Q R)]⁻¹ z
  = ns z† [R† R]⁻¹ z
  = (√ns z† R⁻¹) (√ns [R†]⁻¹ z)
  = y† y
y = √ns [R†]⁻¹ z .                                 (18.14)

The whitened vector y can be found by solving the relation

R† y = √ns z ,                                     (18.15)

which, because R is upper triangular, can be done by employing the computa-


tionally efficient method of back substitution [117]. In this case, because of the
Hermitian conjugate, it is technically a forward substitution. Because the covari-
ance is never explicitly evaluated, the dynamic range of the values involved in
the calculation are significantly reduced.
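As a concrete illustration, the following NumPy sketch (the matrix sizes and variable names are illustrative, not from the text) evaluates the quadratic form both directly and through the QR decomposition with an explicit forward substitution; the $\sqrt{n_s}$ scaling is chosen so that $y^\dagger y$ reproduces $z^\dagger C^{-1} z$ exactly for $C = Z Z^\dagger / n_s$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_s = 4, 100

# Complex Gaussian data matrix Z (n_r x n_s) and a test vector z
Z = (rng.standard_normal((n_r, n_s))
     + 1j * rng.standard_normal((n_r, n_s))) / np.sqrt(2)
z = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)

# Direct evaluation: phi = z^H C^{-1} z with C = Z Z^H / n_s
C = Z @ Z.conj().T / n_s
phi_direct = (z.conj() @ np.linalg.solve(C, z)).real

# QR route: Z^H = Q R, so Z Z^H = R^H R; solve R^H y = sqrt(n_s) z
_, R = np.linalg.qr(Z.conj().T)       # reduced QR of the n_s x n_r matrix
L = R.conj().T                        # R^H is lower triangular
b = np.sqrt(n_s) * z
y = np.zeros(n_r, dtype=complex)
for i in range(n_r):                  # forward substitution
    y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
phi_qr = (y.conj() @ y).real

print(np.isclose(phi_direct, phi_qr))
```

The triangular solve touches only amplitude-domain quantities, so the largest intermediate values are far smaller than those appearing in an explicit covariance inverse.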

18.5.3 Analog nonlinearities

Along the transmit and receive signal chain there are a number of possible sources
of nonlinearity. The two most common sources are amplifiers and mixers. In
the transmitter, there may be a series of amplifiers along the signal chain. The
linearity of amplifiers can often be improved at the expense of greater power
dissipation. For the transmitter, the power amplifier is often the most significant
source of nonlinearity because it is the most sensitive to the increase in power
required to improve linearity. If the nonlinearity of the amplifier is memoryless,
then the nonlinear function can be estimated and the inverse function can be
applied to correct for the effect until hard clipping of the signal occurs. In general,
these nonlinearities have memory; consequently, the correction function is more
complicated. Systems with nonlinearities and memory can be characterized by
using Volterra series. However, the dimensionality of this expansion can quickly
become overwhelming, so pruned versions have been investigated for various
applications [363, 143].
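A toy sketch of memoryless predistortion follows; the compressive amplifier model $y = x - a x^3$ and its first-order polynomial inverse are assumptions made for this example, not a model from the text.

```python
import numpy as np

a = 0.05                                # assumed cubic compression coefficient
amplifier = lambda x: x - a * x**3      # memoryless nonlinearity (toy model)
predistort = lambda x: x + a * x**3     # first-order polynomial inverse

x = np.linspace(-1.0, 1.0, 201)         # drive levels below hard clipping
err_raw = np.max(np.abs(amplifier(x) - x))
err_pd = np.max(np.abs(amplifier(predistort(x)) - x))
print(err_raw, err_pd)
```

The residual error after predistortion is an order of magnitude smaller than the raw compression error; higher-order inverses (or Volterra-based correction when the nonlinearity has memory) reduce it further.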
With very high sample rate analog-to-digital and digital-to-analog converters,
direct radio-frequency-to-digital conversion is possible, allowing the mixing
to be done in the digital domain. However, motivated by limits in power, cost,
and dynamic range, in nearly all modern communication systems the complex
baseband signal and the carrier frequency are connected by one or more analog
mixers. These mixers are by necessity nonlinear. Ideally, the resulting effects
(images) fall at frequencies outside the range of interest, allowing them to
be filtered out. However, to the extent that the mixing and filtering are not ideal,
they can introduce distortions into the signal. As an example, if two relatively
closely spaced tones are observed by a receiver, then some of the distortions
observed are low-level tones at various multiples of the difference between the
two tones. As the power of the two input tones is increased, the strength of all
of these intermodulation signals grows.
One of the ways in which the nonideal effects of mixers are characterized
is the third-order intercept point (IP3). For a receiver, this is the theoretical
input power at which the extrapolated power of the third-order intermodulation
products equals that of the fundamental tones. Consequently, larger values of the
third-order intercept point are better.
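The two-tone behavior can be reproduced numerically. In this sketch (the tone frequencies, cubic coefficient, and FFT length are arbitrary choices for the example), doubling the input amplitude, a 6 dB increase, raises the third-order product at $2f_1 - f_2$ by a factor of eight, i.e., 3 dB per dB of input.

```python
import numpy as np

def im3_level(amp, f1=50, f2=55, n=4096, a3=1e-3):
    """Pass two equal tones through y = x + a3*x^3 and return the
    spectral magnitude of the third-order product at 2*f1 - f2."""
    t = np.arange(n)
    x = amp * (np.cos(2 * np.pi * f1 * t / n)
               + np.cos(2 * np.pi * f2 * t / n))
    y = x + a3 * x**3                  # memoryless cubic nonlinearity
    spec = np.abs(np.fft.rfft(y)) / n
    return spec[2 * f1 - f2]           # intermodulation bin (45 here)

lo, hi = im3_level(1.0), im3_level(2.0)
growth_db = 20 * np.log10(hi / lo)
print(growth_db)
```

Extrapolating this 3-dB-per-dB growth until the intermodulation products would match the fundamentals gives the third-order intercept point.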

18.5.4 Adaptive gain control

If only a single signal is being received, limited dynamic range can be circumvented
to some extent by employing an adaptive gain control (AGC) system that
attenuates the signal when the received power exceeds the dynamic range that
the receiver supports. However, adaptive gain control can also introduce problems
by changing the attenuation during coherent processing intervals, causing
amplitude and phase modulation. Particularly for systems that receive multiple
signals with large dynamic range, care must be taken in using adaptive gain
control.
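A small simulation (the signal, interval length, and step size are arbitrary choices for the example) shows the mechanism: a gain step in the middle of a coherent processing interval both reduces the coherent peak and spreads energy into amplitude-modulation sidebands.

```python
import numpy as np

n, k = 1024, 100
t = np.arange(n)
tone = np.exp(2j * np.pi * k * t / n)   # unit-amplitude complex tone

gain = np.ones(n)
gain[n // 2:] = 0.5                     # AGC halves the gain mid-interval
spec = np.abs(np.fft.fft(gain * tone)) / n

peak = spec[k]                          # coherent peak: 0.75 instead of 1.0
sidelobe = np.delete(spec, k).max()     # AM sidebands created by the step
print(peak, sidelobe)
```

With a constant gain the entire unit amplitude would integrate into the single bin $k$; the mid-interval step loses a quarter of the coherent gain and leaves spectral leakage near the tone.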

18.5.5 Spurs
"Spurs" is shorthand for spurious, or unwanted, signals caused by distortions
or other effects. In almost all real receivers, various low-level spurs can be
observed in the frequency domain. Modern hardware contains various clocks,
digital signals, and many sources of nonlinearity that cause these unwanted
spurs. Depending upon the effort expended to minimize these effects, spurs can
be very small and innocuous or large and disruptive. As an example, a strong
spurious tone in the middle of the intended received signal spectrum can change
the statistics of the signal, potentially raising the bit-error-rate floor. In general,
spurs in real systems can be disruptive to approaches that are particularly
sensitive to the model or statistics of the noise.

18.6 Power consumption

For many communication applications, one of the most important system design
trade-offs is performance versus power consumption. While it is clear that the
power amplifier can draw significant power, particularly if high linearity is desired,
many other components can also consume significant power. Mixers and other
analog components can draw nontrivial amounts of power, a significant concern
particularly for low-power systems. For more sophisticated waveforms, coding,
and algorithms, computation can sometimes be the dominant source of power
consumption. As was mentioned in Chapter 11, the required number of
computations per information bit can vary by several orders of magnitude.
It is difficult to make general statements about power consumption because it
is highly dependent upon the radio technologies involved, which tend to evolve
quickly, and because the importance of power consumption depends upon the
requirements of the system. Nonetheless, a system designer must consider these
requirements carefully.
References

[1] A. O. Hero III. Secure space-time communication. IEEE Transactions on Infor-
mation Theory, 49(12):3235–3249, Dec. 2003.
[2] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions. Dover
Publications, New York, 1970.
[3] N. Abramson. The ALOHA system – another alternative for computer commu-
nications. Proceedings of the Fall Joint Computer Conference AFIPS, 1970.
[4] V. D. Agrawal and Y. T. Lo. Mutual coupling in the phased arrays of randomly
spaced antennas. IEEE Transactions on Antennas and Propagation, 20:288–295,
May 1972.
[5] L. V. Ahlfors. Complex Analysis. McGraw-Hill, New York, 1953.
[6] I. F. Akyildiz, W.-Y. Lee, M. C. Vuran, and S. Mohanty. NeXt generation/
dynamic spectrum access/cognitive radio wireless networks: a survey. Elsevier
Computer Networks, 50(13):2127–2159, Sept. 2006.
[7] S. Alamouti and V. Tarokh. Transmitter diversity technique for wireless commu-
nications, 2001. U.S. Patent 6,185,258.
[8] S. M. Alamouti. A simple transmit diversity technique for wireless communica-
tions. IEEE Journal in Selected Areas in Communications, 16:1451–1458, Oct.
1998.
[9] O. B. S. Ali, C. Cardinal, and F. Gagnon. Performance of optimum combining
in a Poisson field of interferers and Rayleigh fading channels. IEEE Transactions
on Wireless Communications, 9(8):2461–2467, Aug. 2010.
[10] V. S. Annapureddy and V. V. Veeravalli. Gaussian interference networks: Sum
capacity in the low-interference regime and new outer bounds on the capacity
region. IEEE Transactions on Information Theory, 55(7):3032–3050, 2009.
[11] George B. Arfken and Hans-Jurgen Weber. Mathematical Methods for Physicists.
Elsevier, 2005.
[12] Z. D. Bai and J. W. Silverstein. On the empirical distribution of eigenvalues of
a class of large dimensional random matrices. Journal of Multivariate Analysis,
54:175–192, 1995.
[13] Z. D. Bai and J. W. Silverstein. On the signal-to-interference-ratio of CDMA
systems in wireless communications. Annals of Applied Probability, 17(1):81–
101, 2007.
[14] O. Bakr, M. Johnson, R. Mudumbia, and U. Madhow. Interference suppression
in the presence of quantization errors. Allerton Conference on Communication,
Control, and Computing, pages 1161–1168, Oct. 2009.

[15] C. A. Balanis. Antenna Theory: Analysis and Design. John Wiley & Sons,
Hoboken, New Jersey, 2005.
[16] A. Barabell. Improving the resolution performance of eigenstructure-based
direction-finding algorithms. IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), 8:336–339, April 1983.
[17] P. Bergmans. A simple converse for broadcast channels with additive white
Gaussian noise. IEEE Transactions on Information Theory, 20(2):279–280, 1974.
[18] Dennis S. Bernstein. Matrix Mathematics: Theory, Facts, and Formulas. Prince-
ton University Press, 2009.
[19] C. Berrou, A. Glavieux, and P. Thitimajshima. Near Shannon limit error correcting
coding and decoding: turbo-codes. Proceedings of ICC 1993, Geneva,
2:1064–1070, May 1993.
[20] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 1995.
[21] D. P. Bertsekas and R. G. Gallager. Data Networks. Prentice-Hall, Upper Saddle
River, NJ, 1987.
[22] Ezio Biglieri. MIMO Wireless Communications. Cambridge University Press,
2007.
[23] M. Biguesh, S. Gazor, and M. H. Shariat. Optimal training sequence for MIMO
wireless systems in colored environments. IEEE Transactions on Signal Process-
ing, 57(8):3144–3153, Aug. 2009.
[24] Patrick Billingsley. Probability and Measure. John Wiley & Sons, Hoboken, New
Jersey, 1995.
[25] C. Bissell. Vladimir Aleksandrovich Kotelnikov: pioneer of the sampling theo-
rem, cryptography, optimal detection, planetary mapping. IEEE Communica-
tions Magazine, 47(10):24–32, Oct. 2009.
[26] B. A. Bjerke and J. G. Proakis. Multiple-antenna diversity techniques for trans-
mission over fading channels. IEEE Wireless Communications and Networking
Conference, 3:1038–1042, 1999.
[27] I. Blake and W. Lindsey. Level-crossing problems for random processes. IEEE
Transactions on Information Theory, 19(3):295–315, May 1973.
[28] D. W. Bliss. Robust MIMO wireless communication in the presence of interference
using ad hoc antenna arrays. Proceedings of MILCOM 03 (Boston), Oct. 2003.
[29] D. W. Bliss. Optimal SISO and MIMO spectral efficiency to minimize hidden-
node network interference. IEEE Communications Letters, 14(7):620–622, July
2010.
[30] D. W. Bliss, A. M. Chan, and N. B. Chang. MIMO wireless communication chan-
nel phenomenology. IEEE Transactions on Antennas and Propagation, 52(8),
Aug. 2004.
[31] D. W. Bliss and K. W. Forsythe. Angle of arrival estimation in the presence
of multiple access interference for CDMA cellular phone systems. Proceedings
of the 2000 IEEE Sensor Array and Multichannel Signal Processing Workshop,
Cambridge, Mass., March 2000.
[32] D. W. Bliss and K. W. Forsythe. Information theoretic comparison of MIMO
wireless communication receivers in the presence of interference. IEEE Asilomar
Conference on Signals, Systems and Computers, 1:866–870, Nov. 2004.

[33] D. W. Bliss, K. W. Forsythe, A. O. Hero, and A. F. Yegulalp. Environmental
issues for MIMO capacity. IEEE Transactions on Signal Processing, 50(9):2128–
2142, Sept. 2002.
[34] D. W. Bliss, K. W. Forsythe, and A. F. Yegulalp. MIMO communication capacity
using infinite dimension random matrix eigenvalue distributions. IEEE Asilomar
Conference on Signals, Systems and Computers, 2:969–974, Nov. 2001.
[35] D. W. Bliss and S. Govindasamy. Minimizing hidden-node network interference
by optimizing SISO and MIMO spectral efficiency. IEEE Asilomar Conference
on Signals, Systems and Computers, pages 1588–1592, Nov. 2010.
[36] D. W. Bliss and P. A. Parker. Temporal synchronization of MIMO wireless
communication in the presence of interference. IEEE Transactions on Signal
Processing, 58(3):1794–1806, Mar. 2010.
[37] D. W. Bliss, P. A. Parker, and A. R. Margetts. Simultaneous transmission and
reception for improved wireless network performance. Conference Proceedings of
the IEEE Statistical Signal Processing Workshop, pages 478–482, Aug. 2007.
[38] D. W. Bliss, P. H. Wu, and A. M. Chan. Multichannel multiuser detection
of space-time turbo codes: Experimental performance results. IEEE Asilomar
Conference on Signals, Systems and Computers, 2:1343–1348, Nov. 2002.
[39] R. S. Blum. MIMO capacity with interference. IEEE Journal on Selected Areas
in Communications, 21(5):793–801, June 2003.
[40] Mary L. Boas. Mathematical Methods in the Physical Sciences. John Wiley &
Sons, Hoboken, New Jersey, 2006.
[41] H. Bolcskei. Blind estimation of symbol timing and carrier frequency offset in
wireless OFDM systems. IEEE Transactions on Communications, 49(6):988–999,
June 2001.
[42] H. Bolcskei and A. J. Paulraj. Performance of space-time codes in the presence
of spatial fading correlation. IEEE Asilomar Conference on Signals, Systems and
Computers, 1:687–693, Oct. 2000.
[43] H. Bolcskei and A. J. Paulraj. Space-frequency codes for broadband fading channels.
Proceedings of the IEEE International Symposium on Information Theory
(ISIT), page 219, 2001.
[44] H. Bolcskei and I. J. Thukral. Interference alignment with limited feedback. IEEE
International Symposium on Information Theory (ISIT), pages 1759–1763, 2009.
[45] Helmut Bölcskei. Space-Time Wireless Systems: From Array Processing to MIMO
Communications. Cambridge University Press, 2006.
[46] Joseph J. Boutros, Francesc Boixadera, and Catherine Lamy. Bit-interleaved
coded modulations for multiple-input multiple-output channels. IEEE Sympo-
sium on Spread-Spectrum Technology and Applications, Sept. 2000.
[47] D. H. Brandwood. A complex gradient operator and its application in adaptive
array theory. IEE Proceedings of Microwaves, Optics and Antennas, 130(1):11–
16, Feb. 1983.
[48] Ira S. Brodsky. The History of Wireless: How Creative Minds Produced Technol-
ogy for the Masses. Telescope Books, 2008.
[49] D. W. Browne, M. W. Browne, and M. P. Fitz. Singular value decomposi-
tion of correlated MIMO channels. IEEE Global Telecommunications Conference
(GLOBECOM), Dec. 2006.

[50] V. Cadambe and S. A. Jafar. Interference alignment and degrees of freedom
of the K-user interference channel. IEEE Transactions on Information Theory,
54(8), Aug. 2008.
[51] J. Capon. High-resolution frequency-wavenumber spectrum analysis. Proceedings
of the IEEE, 57(8):1408–1418, Aug. 1969.
[52] A. Carleial. A case where interference does not reduce capacity (corresp.). IEEE
Transactions on Information Theory, 21(5):569–570, 1975.
[53] George F. Carrier, Max Krook, and Carl E. Pearson. Functions of Complex
Variable: Theory and Technique. Hod Books, Ithaca, NY, 1983.
[54] George F. Carrier, Max Krook, and Carl E. Pearson. Functions of a Complex
Variable: Theory and Technique. Classics in Applied Mathematics. Society for
Industrial and Applied Mathematics, 2005.
[55] N. B. Chang, A. R. Margetts, and A. L. McKellips. Performance and complexity
tradeoffs of space-time modulation and coding schemes. IEEE Asilomar Confer-
ence on Signals, Systems and Computers, pages 1446–1450, 2009.
[56] B. Chen and M. J. Gans. MIMO communications in ad-hoc networks. IEEE
Transactions on Signal Processing, 54:2773–2783, July 2006.
[57] Z. Chen, J. Yuan, and B. Vucetic. Improved space-time trellis coded modulation
scheme on slow Rayleigh fading channels. Electronics Letters, 37(7):440–441,
March 2001.
[58] Z. Chen, J. Yuan, and B. Vucetic. An improved space-time trellis coded modu-
lation scheme on slow Rayleigh fading channels. IEEE International Conference
on Communications, 4:1110–1116, 2001.
[59] H. F. Chong, M. Motani, H. K. Garg, and H. El Gamal. On the Han–Kobayashi
region for the interference channel. IEEE Transactions on Information Theory,
54(7):3188–3195, 2008.
[60] L. J. Chu. Physical limitations of omni-directional antennas. Journal of Applied
Physics, 19(12):1163–1175, Dec. 1948.
[61] Lewis Coe. Wireless Radio: A Brief History. McFarland, 1996.
[62] Cristina Comaniciu, Narayan B. Mandayam, and H. Vincent Poor. Wireless Networks:
Multiuser Detection in Cross-Layer Design. Springer, 2005.
[63] J. P. Conti. The 10 greatest communications inventions. Communications Engi-
neer, 5(1):14–21, 2007.
[64] A. D. Copeland, D. W. Bliss, and A. L. McKellips. Optimal windowing in MIMO
OFDM for network interference suppression. IEEE Asilomar Conference on Sig-
nals, Systems and Computers, pages 1699–1703, Nov. 2009.
[65] R. M. Corless, G. H. Gonnet, D. E. G. Hare, D. J. Jeffrey, and D. E. Knuth. On
the Lambert W function. Advances in Computational Mathematics, 5:329–359,
Dec. 1996.
[66] L. M. Correia and R. Prasad. An overview of wireless broadband communications.
IEEE Communications Magazine, 35(1):28–33, Jan. 1997.
[67] M. Costa. Writing on dirty paper. IEEE Transactions on Information Theory,
29(3):439–441, May 1983.
[68] T. M. Cover and J. A. Thomas. Elements of Information Theory, 2nd Edition.
John Wiley & Sons, New York, 2006.

[69] H. Dai, A. F. Molisch, and H. V. Poor. Downlink capacity of interference-limited
MIMO systems with joint detection. IEEE Transactions on Wireless Communi-
cations, 3(2), March 2004.
[70] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes,
Volume I: Elementary Theory and Methods. Springer Verlag, 2008.
[71] D.J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes,
Volume II: General Theory and Structure, volume 2. Springer Verlag, 2008.
[72] M. O. Damen, H. El Gamal, and G. Caire. On maximum-likelihood detection
and the search for the closest lattice point. IEEE Transactions on Information
Theory, 49(10):2389–2402, Oct. 2003.
[73] A. V. Dandawate and G. B. Giannakis. Statistical tests for presence of cyclosta-
tionarity. IEEE Transactions on Signal Processing, 42(9):2355–2369, Sept. 1994.
[74] Michael D’Antonio. A Ball, A Dog, and a Monkey: 1957 – The Space Race
Begins. Simon & Schuster, New York, 2007.
[75] D. Dardari. Joint clip and quantization effects characterization in OFDM re-
ceivers. IEEE Transactions on Circuits and Systems, 53(8):1741–1748, Aug.
2006.
[76] Jon Dattorro. Convex Optimization & Euclidean Distance Geometry. Meboo
Publishing, 2008.
[77] D. F. Delong. Multiple signal direction finding with thinned linear arrays. Tech-
nical Report TST-68, DTIC:ADA128924, MIT Lincoln Laboratory, April 1983.
[78] D. F. Delong. Use of the Weiss–Weinstein bound to compare the direction-finding
performance of sparse arrays. Technical Report AST-17, M.I.T. Lincoln Labora-
tory, Aug. 1991.
[79] L. Devroye. Bounds for the uniform deviation of empirical measures. Journal of
Multivariate Analysis, 12:72–79, 1982.
[80] N. Devroye, P. Mitran, and V. Tarokh. Achievable rates in cognitive radio chan-
nels. IEEE Transactions on Information Theory, 52(5):1813–1827, May 2006.
[81] M. Dieckmann and R. Hell. Lichtelektrische Bildzerlegerröhre für Fernseher, 1927.
German Patent 450,187.
[82] A. E. Dolbear. Mode of electric communication, 1886. U.S. Patent 350,299.
[83] M. Donvito and S. Kassam. Characterization of the random array peak sidelobe.
IEEE Transactions on Antennas and Propagation, 27(3):379–385, May 1979.
[84] Tolga M. Duman and Ali Ghrayeb. Coding for MIMO Communication Systems.
John Wiley & Sons, 2008.
[85] A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with
orthogonality constraints. SIAM Journal on Matrix Analysis and Applications,
20(2):303–353, 1999.
[86] H. El Gamal and M. O. Damen. Universal space-time coding. IEEE Transactions
on Information Theory, 49(5):1097–1119, May 2003.
[87] H. El Gamal and A. R. Hammons. A new approach to layered space-time coding
and signal processing. IEEE Transactions on Information Theory, 47(6):2321–
2334, 2001.
[88] Y. C. Eldar and G. Kutyniok. Compressed Sensing: Theory and Applications.
Cambridge University Press, 2012.
[89] R. Etkin, D. Tse, and H. Wang. Gaussian interference channel capacity to within
one bit. IEEE Transactions on Information Theory, 54(12), Dec. 2008.

[90] Albert Guillen I Fabregas, Alfonso Martinez, and Giuseppe Caire. Bit-Interleaved
Coded Modulation. Foundations and Trends in Communications and Information
Theory. Now Publishers, 2008.
[91] P. Farnsworth. Television system, 1930. U.S. Patent 1,773,980.
[92] F. R. Farrokhi, G. J. Foschini, A. Lozano, and R. A. Valenzuela. Link-optimal
space-time processing with multiple transmit and receive antennas. IEEE Com-
munications Letters, 5:85–87, March 2001.
[93] William Feller. An Introduction to Probability and Its Applications, Vol. II. John
Wiley & Sons, 1971.
[94] B. A. Fette. Cognitive Radio Technology: 2nd Edition. Elsevier, Burlington, MA,
2009.
[95] L. De Forest. Space telegraphy, 1908. U.S. Patent 879,532.
[96] K. W. Forsythe. Utilizing waveform features for adaptive beamforming and direc-
tion finding with narrowband signals. MIT Lincoln Laboratory Journal, 10(2):99–
126, 1997.
[97] K. W. Forsythe. Performance of space-time codes over a flat-fading channel using
a subspace-invariant detector. IEEE Asilomar Conference on Signals, Systems
and Computers, 1:750–755, Nov. 2002.
[98] K. W. Forsythe, D. W. Bliss, and C. M. Keller. Multichannel adaptive beam-
forming and interference mitigation in multiuser CDMA systems. IEEE Asilomar
Conference on Signals, Systems and Computers, 1:506–510, Oct. 1999.
[99] G. J. Foschini. Layered space-time architecture for wireless communication in
a fading environment when using multi-element antennas. Bell Labs Technical
Journal, 1(2):41–59, Autumn 1996.
[100] Giorgio Franceschetti and Sabatino Stornelli. Wireless Networks: From the Phys-
ical Layer to Communication, Computing, Sensing, and Control. Elsevier Aca-
demic Press, 2006.
[101] M. Franceschetti, O. Dousse, D. N. C. Tse, and P. Thiran. Closing the gap in
the capacity of wireless networks via percolation theory. IEEE Transactions on
Information Theory, 53(3):1009–1018, March 2007.
[102] M. Franceschetti, M. D. Migliore, and P. Minero. The capacity of wireless net-
works: Information-theoretic and physical limits. IEEE Transactions on Infor-
mation Theory, 55(8):3413–3424, July 2009.
[103] J. Freebersyser and B. Leiner. A DoD perspective on mobile ad hoc networks. In
C. E. Perkins, editor, Ad Hoc Networking, pages 29–51. Addison-Wesley, 2001.
[104] B. Friedlander and A. J. Weiss. Direction finding in the presence of mutual cou-
pling. IEEE Transactions on Antennas and Propagation, 39(3):273–284, March
1991.
[105] H. Gao, P. J. Smith, and M. V. Clark. Theoretical reliability of MMSE lin-
ear diversity combining in Rayleigh-fading additive interference channels. IEEE
Transactions on Communications, 46(5):666 –672, May 1998.
[106] W. A. Gardner. Exploitation of spectral redundancy in cyclostationary signals.
IEEE Signal Processing Magazine, 8(2):14–36, April 1991.
[107] Patrick Geddes. The Life and Work of Sir Jagadis C. Bose. Longmans, Green
and Co., London, 1920.
[108] S. I. Gel'fand and M. S. Pinsker. Coding for channel with random parameters.
Problems of Control and Information Theory, 9(1):19–31, 1980.

[109] D. Gerlach and A. Paulraj. Adaptive transmitting antenna methods for mul-
tipath environments. IEEE Global Telecommunications Conference (GLOBE-
COM), 1:425–429, Nov. 1994.
[110] D. Gesbert, H. Bolcskei, D. A. Gore, and A. J. Paulraj. Outdoor MIMO wireless
channels: models and performance prediction. IEEE Transactions on Communi-
cations, 50(12):1926–1934, Dec. 2002.
[111] D. Gesbert, T. Ekman, and N. Christophersen. Capacity limits of dense palm-
sized MIMO arrays. IEEE Global Telecommunications Conference (GLOBE-
COM), 2:1187–1191, Nov. 2002.
[112] M. Godavarti, A. O. Hero III, and T. L. Marzetta. Min-capacity of a multiple-
antenna wireless channel in a static Ricean fading environment. IEEE Transac-
tions on Wireless Communications, 4(4):1715–1723, July 2005.
[113] M. J. E. Golay. Notes on digital coding. Proceedings of the IRE, 37, 1949.
[114] G. D. Golden, G. J. Foschini, R. A. Valenzuela, and P. W. Wolniansky. V-BLAST:
A high capacity space-time architecture for the rich-scattering wireless channel.
Fifth Workshop on Smart Antennas in Wireless Mobile Communications, July
1998.
[115] A. Goldsmith. Wireless Communications. Cambridge University Press, New
York, 2005.
[116] A. Goldsmith, S. A. Jafar, N. Jindal, and S. Vishwanath. Capacity limits of
MIMO channels. IEEE Journal on Selected Areas of Communications, 21(5),
June 2003.
[117] Gene Howard Golub and Charles F. Van Loan. Matrix Computations. Johns
Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press,
Baltimore, 1996.
[118] K. Gomadam, V. R. Cadambe, and S. A. Jafar. Approaching the capacity of wire-
less networks through distributed interference alignment. IEEE Global Telecom-
munications Conference (GLOBECOM), pages 1–6, 2008.
[119] D. A. Gore and A. J. Paulraj. MIMO antenna subset selection with space-time
coding. IEEE Transactions on Signal Processing, 50(10):2580–2588, 2002.
[120] S. Govindasamy. Multiple-Antenna Systems in Ad-Hoc Wireless Networks. Ph. D.
dissertation, Massachusetts Institute of Technology, Department of Electrical En-
gineering and Computer Science, 2008.
[121] S. Govindasamy, F. Antic, D. W. Bliss, and D. Staelin. The performance of linear
multiple-antenna receivers with interferers distributed on a plane. IEEE Inter-
national Workshop on Signal Processing Advances for Wireless Communications,
2005.
[122] S. Govindasamy and D. Bliss. On the spectral efficiency of links with multi-
antenna receivers in non-homogenous wireless networks. In Proceedings of IEEE
ICC, Kyoto, pages 1–6. IEEE, 2011.
[123] S. Govindasamy, D. W. Bliss, and D. H. Staelin. Spectral efficiency in single-hop
ad-hoc wireless networks with interference using adaptive antenna arrays. IEEE
Journal on Selected Areas of Communications, 25(7):1358–1369, Sept. 2007.
[124] S. Govindasamy, D. W. Bliss, and D. H. Staelin. Asymptotic spectral efficiency
of the uplink in spatially distributed wireless networks with multi-antenna base
stations. IEEE Asilomar Conference on Signals, Systems and Computers, 2008.

[125] S. Govindasamy, D. W. Bliss, and D. H. Staelin. Asymptotic spectral efficiency
of multi-antenna links in ad-hoc wireless networks with limited Tx CSI. IEEE
Transactions on Information Theory, 58(8):5375–5387, August 2012.
[126] S. Govindasamy, D. W. Bliss, and D. H. Staelin. Asymptotic spectral efficiency
of the uplink in spatially distributed wireless networks with multi-antenna base
stations. To appear in IEEE Transactions on Communications, 2013.
[127] S. Govindasamy, D. W. Bliss, and D.H. Staelin. Spectral-efficiency of multi-
antenna links in ad-hoc wireless networks with limited Tx CSI. IEEE Asilomar
Conference on Signals, Systems and Computers, 2009.
[128] S. Govindasamy, R. Rangan, E. Koukina, and A. Lloyd. CDF of the spectral-
efficiency of a simple distributed channel assignment algorithm in spatially dis-
tributed wireless networks. In IEEE Asilomar Conference on Signals, Systems
and Computers, 2009.
[129] I. S. Gradshteyn and I. M. Ryzhik. Table of Integrals, Series, and Products.
Academic Press, New York, 1994.
[130] A. Graham. Kronecker Products and Matrix Calculus. Ellis Horwood Limited,
Chichester, England, 1981.
[131] G. Grimmet and D. R. Stirzaker. Probability and Random Processes. Oxford
University Press, 2001.
[132] P. Gupta and P. R. Kumar. The capacity of wireless networks. IEEE Transactions
on Information Theory, 46(2):388–404, March 2000.
[133] M. Haenggi, J. G. Andrews, F. Baccelli, O. Dousse, and M. Franceschetti.
Stochastic geometry and random graphs for the analysis and design of wireless
networks. IEEE Journal on Selected Areas of Communication, 27, Sept. 2009.
[134] R. W. Hamming. Notes on digital coding. Bell System Technical Journal, 29:147–
160, 1950.
[135] T. Han and K. Kobayashi. A new achievable rate region for the interference
channel. IEEE Transactions on Information Theory, 27(1):49–60, 1981.
[136] Lajos L. Hanzo, T. H. Liew, and B. L. Yeap. Turbo Coding, Turbo Equalisation
and Space-Time Coding. John Wiley & Sons, 2002.
[137] F. J. Harris. On the use of windows for harmonic analysis with the discrete
Fourier transform. Proceedings of the IEEE, 66(1):51–83, Jan. 1978.
[138] R. V. L. Hartley. Transmission of information. Bell System Technical Journal,
pages 535–563, July 1928.
[139] B. Hassibi and B. M. Hochwald. How much training is needed in multiple-antenna
wireless links? IEEE Transactions on Information Theory, 49(4):951–963, April
2003.
[140] M. Hata. Empirical formula for propagation loss in land mobile radio services.
IEEE Transactions on Vehicular Technology, 29(3):317–325, Aug. 1980.
[141] G. F. Hatke, K. W. Forsythe, A. L. McKellips, and T. T. Phuong. Space-time-
frequency adaptive processor design for ultra-sparse apertures. IEEE Asilomar
Conference on Signals, Systems and Computers, Oct. 2006.
[142] Simon Haykin. Adaptive Filter Theory. Prentice Hall, Upper Saddle River, New
Jersey, 1996.
[143] M. Herman, B. Miller, and J. Goodman. The cube coefficient subspace architec-
ture for nonlinear digital predistortion. IEEE Asilomar Conference on Signals,
Systems and Computers, pages 1857–1861, Oct. 2008.

[144] Heinrich Hertz. Untersuchungen ueber die Ausbreitung der Elektrischen Kraft.
Johann Ambrosius Barth, Leipzig, 1892.
[145] A. Hjorungnes and D. Gesbert. Complex-valued matrix differentiation:
techniques and key results. IEEE Transactions on Signal Processing, 55(6):2740–
2746, June 2007.
[146] Sungook Hong. Wireless: from Marconi’s Black-box to the Audion. Transforma-
tions. MIT Press, 2001.
[147] A. M. Hunter, J. G. Andrews, and S. Weber. Transmission capacity of ad hoc
networks with spatial diversity. IEEE Transactions on Wireless Communications,
7(12), Dec. 2008.
[148] A. M. Hunter, J. G. Andrews, and S. Weber. Transmission capacity of ad hoc
networks with spatial diversity. IEEE Transactions on Wireless Communications,
2009.
[149] A. M. Hunter, J. Andrews, and S. Weber. Transmission capacity of ad hoc
networks with spatial diversity. IEEE Transactions on Wireless Communications,
7(12):5058–5071, 2008.
[150] IEEE. IEEE standard for information technology – telecommunications and in-
formation exchange between systems – local and metropolitan area networks –
specific requirements – part 11: Wireless LAN medium access control (MAC) and
physical layer (PHY) specifications. IEEE Std 802.11-1997, 1997.
[151] IEEE. IEEE standard for information technology – telecommunications and in-
formation exchange between systems – local and metropolitan area networks
– specific requirements part 11: Wireless LAN medium access control (MAC)
and physical layer (PHY) specifications amendment 5: Enhancements for higher
throughput. IEEE Std 802.11n-2009 (Amendment to IEEE Std 802.11-2007 as
amended by IEEE Std 802.11k-2008, IEEE Std 802.11r-2008, IEEE Std 802.11y-
2008, and IEEE Std 802.11w-2009), Oct. 2009.
[152] IEEE. IEEE standard for information technology–telecommunications and in-
formation exchange between systems wireless regional area networks (WRAN)–
specific requirements part 22: Cognitive wireless RAN medium access control
(MAC) and physical layer (PHY) specifications: Policies and procedures for op-
eration in the TV bands. IEEE Std 802.22-2011, pages 1–680, 2011.
[153] Joseph Mitola III. Cognitive Radio Architecture: The Engineering Foundations
of Radio XML. John Wiley & Sons, Hoboken, New Jersey, 2006.
[154] J. D. Jackson. Classical Electrodynamics. John Wiley & Sons, Hoboken, New
Jersey, 1975.
[155] S. A. Jafar. Exploiting channel correlations – simple interference alignment
schemes with no CSIT. IEEE Global Telecommunications Conference (GLOBE-
COM), pages 1–5, 2010.
[156] S. A. Jafar. Interference Alignment – A New Look at Signal Dimensions in a
Communication Network. Now Publishing, 2011.
[157] Hamid Jafarkhani. Space-Time Coding: Theory and Practice. Cambridge Uni-
versity Press, 2005.
[158] A. K. Jagannatham and B. D. Rao. Cramer–Rao lower bound for con-
strained complex parameters. IEEE Transactions on Signal Processing Letters,
11(11):875–878, Nov. 2004.

[159] Alan T. James. Distributions of matrix variates and latent roots derived from
normal samples. The Annals of Mathematical Statistics, 35(2):475–501, 1964.
[160] Mohinder Jankiraman. Space-Time Codes and MIMO Systems. Artech House,
2004.
[161] Z. Ji and K. J. R. Liu. Dynamic spectrum sharing: a game theoretical overview.
IEEE Communications Magazine, 45(5):88–94, May 2007.
[162] Y. Jiang, J. Li, and W. W. Hager. Joint transceiver design for MIMO commu-
nications using geometric mean decomposition. IEEE Transactions on Signal
Processing, 53(10):3791–3803, Oct. 2005.
[163] N. Jindal, J. G. Andrews, and S. Weber. Bandwidth partitioning in decentralized
wireless networks. IEEE Transactions on Wireless Communications, 7(12):5408–
5419, 2008.
[164] N. Jindal, J. G. Andrews, and S. Weber. Rethinking MIMO for wireless networks:
linear throughput increases with multiple receive antennas. IEEE International
Conference on Communications (ICC), June 2009.
[165] N. Jindal, J. G. Andrews, and S. Weber. Multi-antenna communication in ad hoc networks: achieving MIMO gains with SIMO transmission. IEEE Transactions on Communications, 59(2):529–540, 2011.
[166] Y. Jing and B. Hassibi. Distributed space-time coding in wireless relay networks. IEEE Transactions on Wireless Communications, 5(12):3524–3536, 2006.
[167] J. B. Johnson. Thermal agitation of electricity in conductors. Physical Review,
32:97–109, July 1928.
[168] D. Jonsson. Some limit theorems for the eigenvalues of a sample covariance
matrix. Journal of Multivariate Analysis, 12:1–38, 1982.
[169] R. Kahn. The organization of computer resources into a packet radio network.
IEEE Transactions on Communications, 25(1):169–178, Jan. 1977.
[170] S. Karmakar and M. K. Varanasi. Capacity of the MIMO interference channel to within a constant gap. IEEE International Symposium on Information Theory (ISIT), pages 2193–2197, July–Aug. 2011.
[171] Alan F. Karr. Probability. Springer-Verlag, 1993.
[172] Steven M. Kay. Fundamentals of Statistical Signal Processing: Estimation Theory.
Prentice Hall, Upper Saddle River, NJ, 1993.
[173] E. J. Kelly and K. W. Forsythe. Adaptive Detection and Parameter Estimation for Multidimensional Signal Models. Technical Report 848, MIT Lincoln Laboratory, April 1989.
[174] Maurice Kendall and Alan Stuart. The Advanced Theory of Statistics. Macmillan
Publishing, New York, 1979.
[175] H. S. Kim and A. O. Hero. Comparison of GLR and invariant detectors un-
der structured clutter covariance. IEEE Transactions on Image Processing,
10(10):1509–1520, Oct. 2001.
[176] A. N. Kolmogorov. Stationary sequences in Hilbert space. Bulletin of Moscow
University, 2(6):1–40, 1941.
[177] G. Kramer. Outer bounds on the capacity of Gaussian interference channels.
IEEE Transactions on Information Theory, 50(3):581–586, 2004.
[178] John Daniel Kraus and Daniel A. Fleisch, editors. Electromagnetics with Appli-
cations, 5th Edition. McGraw-Hill, New York, 1999.
[179] S. Kraut, L. L. Scharf, and R. W. Butler. The adaptive coherence estimator: a uniformly most-powerful-invariant adaptive detection statistic. IEEE Transactions on Signal Processing, 53(2):427–438, February 2005.
[180] Erwin Kreyszig. Advanced Engineering Mathematics. John Wiley & Sons, 2006.
[181] A. R. Kuruc. Lower bounds on multiple-source direction finding in the presence of
direction-dependent antenna-array-calibration errors. Technical Report TR-799,
MIT Lincoln Laboratory, Oct. 1989.
[182] E. G. Larsson. MIMO detection methods: How they work [lecture notes]. IEEE
Signal Processing Magazine, 26(3):91–95, May 2009.
[183] E. G. Larsson and P. Stoica. Space-Time Block Coding for Wireless Communications. Cambridge University Press, 2008.
[184] R. E. Learned, A. S. Willsky, and D. M. Boroson. Low complexity optimal joint
detection for oversaturated multiple access communications. IEEE Transactions
on Signal Processing, 45(1):113–123, Aug. 1997.
[185] T. H. Lee and A. Hajimiri. Oscillator phase noise: a tutorial. IEEE Journal of
Solid-State Circuits, 35(3):326–336, March 2000.
[186] Tom Lewis. Empire of the Air: The Men Who Made Radio. Edward Burlingame,
1991.
[187] Yingbin Liang, A. Somekh-Baruch, H. V. Poor, S. Shamai, and S. Verdu. Ca-
pacity of cognitive interference channels with and without secrecy. IEEE Trans-
actions on Information Theory, 55(2):604–619, Feb. 2009.
[188] J. C. Liberti and T. S. Rappaport. A geometrically based model for line-of-sight
multipath radio channels. IEEE Vehicular Technology Conference, 2:844–848,
April 1996.
[189] Joseph C. Liberti and Theodore S. Rappaport. Smart Antennas for Wireless
Communications: IS-95 and Third Generation CDMA Applications. Prentice
Hall, 1999.
[190] J. Lin, J. G. Proakis, F. Ling, and H. Lev-Ari. Optimal tracking of time-varying
channels: a frequency domain approach for known and new algorithms. IEEE
Journal on Selected Areas in Communications, 13(1):141–154, Jan. 1995.
[191] Shu Lin and Daniel J. Costello. Error Control Coding. Prentice Hall, 2005.
[192] Yi-Bing Lin and Imrich Chlamtac. Wireless and Mobile Network Architectures.
John Wiley & Sons, 2008.
[193] M. Loomis. Improvement in telegraphing, 1872. U.S. Patent 129,971.
[194] R. H. Y. Louie, M. R. McKay, and I. B. Collings. Open-loop spatial multiplexing and diversity communications in ad hoc networks. IEEE Transactions on Information Theory, 57(1):317–344, 2011.
[195] D. J. Love, R. W. Heath, V. K. N. Lau, D. Gesbert, B. D. Rao, and M. Andrews.
An overview of limited feedback in wireless communication systems. IEEE Jour-
nal on Selected Areas in Communications, 26(8):1341–1365, Oct. 2008.
[196] D. J. Love and R. W. Heath, Jr. Limited feedback unitary precoding for orthog-
onal space-time block codes. IEEE Transactions on Signal Processing, 53(1):64–
73, 2005.
[197] D. J. Love and R. W. Heath, Jr. Multimode precoding for MIMO wireless sys-
tems. IEEE Transactions on Signal Processing, 53(10):3674–3687, Oct. 2005.
[198] D. J. Love, R. W. Heath, Jr., and T. Strohmer. Grassmannian beamforming for multiple-input multiple-output wireless systems. IEEE Transactions on Information Theory, 49(10):2735–2747, Oct. 2003.
[199] R. W. Lucky and H. R. Rudin. Generalized automatic equalization for commu-
nication channels. Proceedings of the IEEE, 54(3):439–440, March 1966.
[200] X. Ma, G. B. Giannakis, and S. Ohno. Optimal training for block transmissions
over doubly selective wireless fading channels. IEEE Transactions on Signal
Processing, 51(5):1351–1366, May 2003.
[201] Z. Ma, X. Wu, and W. Zhu. An ICI-free synchronization algorithm in MIMO
OFDM system. 2nd International Symposium on Wireless Pervasive Computing,
ISWPC, Feb. 2007.
[202] David J. C. MacKay. Information Theory, Inference, and Learning Algorithms.
Cambridge University Press, 2003.
[203] U. Madhow. Fundamentals of Digital Communication. Cambridge University
Press, 2008.
[204] Dimitris G. Manolakis and Vinay K. Ingle. Applied Digital Signal Processing:
Theory and Practice. Cambridge University Press, Cambridge, 2011.
[205] Dimitris G. Manolakis, Vinay K. Ingle, and Stephen M. Kogon. Statistical and Adaptive Signal Processing. Artech House, 2005.
[206] V. A. Marcenko and L. A. Pastur. Distribution of eigenvalues for some sets of
random matrices. Mathematics of the USSR-Sbornik, 1(4), 1967.
[207] A. R. Margetts, K. W. Forsythe, and D. W. Bliss. Direct space-time GF(q) LDPC
modulation. IEEE Asilomar Conference on Signals, Systems and Computers,
Oct. 2006.
[208] H. K. Markey and G. Antheil. Secret communication system, 1942. U.S. Patent
2,292,387.
[209] T. L. Marzetta and B. M. Hochwald. Capacity of a mobile multiple-antenna com-
munication link in Rayleigh fading. IEEE Transactions on Information Theory,
45:139–158, January 1999.
[210] A. M. Mathai. An Introduction to Geometrical Probability. Gordon and Breach,
1999.
[211] J. C. Maxwell. A dynamical theory of the electromagnetic field. Philosophical Transactions of the Royal Society of London, 155:459–512, 1865.
[212] Robert J. McEliece. The Theory of Information and Coding. Encyclopedia of
Mathematics and Its Applications. Cambridge University Press, 2002.
[213] Merriam-Webster. The Merriam-Webster Dictionary. Perfection Learning, 2005.
[214] R. M. Metcalfe and D. R. Boggs. Ethernet: distributed packet switching for local
computer networks. Communications of the ACM, 19:395–404, July 1976.
[215] R. Mhiri, D. Masse, and D. Schafhuber. Synchronization for a DVB-T receiver in
presence of co-channel interference. IEEE International Symposium on Personal,
Indoor and Mobile Radio Communications, pages 2307–2311, Sept. 2002.
[216] R. E. Miles. On the homogeneous planar Poisson point process. Mathematical
Biosciences, 6:85–127, 1970.
[217] Kenneth S. Miller. Some Eclectic Matrix Theory. Robert E. Krieger Publishing,
New York, 1987.
[218] Pratap Misra and Per Enge. Global Positioning System: Signals, Measurements,
and Performance. Ganga-Jamuna Press, Hoboken, NJ, 2006.
[219] J. Mitola III and G. Q. Maguire, Jr. Cognitive radio: making software radios
more personal. IEEE Personal Communications, 6(4):13–18, Aug. 1999.
[220] Sanjit Kumar Mitra. Digital Signal Processing: A Computer Based Approach.
McGraw-Hill, New York, 2006.
[221] A. N. Mody and G. L. Stuber. Synchronization for MIMO OFDM systems. IEEE
Global Telecommunications Conference (GLOBECOM), 1:509–513, 2001.
[222] A. F. Molisch, M. Z. Win, and J. H. Winters. Space-time-frequency (STF) coding for MIMO-OFDM systems. IEEE Communications Letters, 6(9):370–372, 2002.
[223] R. A. Monzingo and T. W. Miller. Introduction to Adaptive Arrays. John Wiley
& Sons, New York, 1980.
[224] A. S. Motahari and A. K. Khandani. Capacity bounds for the Gaussian interfer-
ence channel. IEEE Transactions on Information Theory, 55(2):620–643, 2009.
[225] J. C. Mundarath, P. Ramanathan, and B. D. Van Veen. A cross layer scheme
for adaptive antenna array based wireless ad hoc networks in multipath environ-
ments. Wireless Networks, 13:597–615, October 2007.
[226] A. F. Naguib, V. Tarokh, N. Seshadri, and A. R. Calderbank. A space-time coding modem for high-data-rate wireless communications. IEEE Journal on Selected Areas in Communications, 16(8):1459–1478, 1998.
[227] J. F. Nash. Equilibrium points in n-person games. Proceedings of the National
Academy of Sciences of the United States of America, 36(1):48–49, Jan. 1950.
[228] B. Nazer, S. A. Jafar, M. Gastpar, and S. Vishwanath. Ergodic interference
alignment. IEEE International Symposium on Information Theory (ISIT), pages
1769–1773, 2009.
[229] A. Nehorai and E. Paldi. Vector-sensor array processing for electromagnetic
source localization. IEEE Transactions on Signal Processing, 42(2):376–398,
February 1994.
[230] David L. Nicholson. Spread Spectrum Signal Design: LPE and AJ Systems. Com-
puter Science Press, New York, 1988.
[231] D. Niyato and E. Hossain. Market-equilibrium, competitive, and cooperative
pricing for spectrum sharing in cognitive radio networks: analysis and comparison.
IEEE Transactions on Wireless Communications, 7(11):4273–4283, Nov. 2008.
[232] A. Nuttall. Some integrals involving the Q_M function. IEEE Transactions on
Information Theory, 21(1):95–96, Jan. 1975.
[233] H. Nyquist. Thermal agitation of electric charge in conductors. Physical Review,
32:110–113, July 1928.
[234] H. Ochiai, P. Mitran, H. V. Poor, and V. Tarokh. Collaborative beamforming
for distributed wireless ad hoc sensor networks. IEEE Transactions on Signal
Processing, 53(11):4110–4124, Nov. 2005.
[235] Atsuyuki Okabe, Barry Boots, Kokichi Sugihara, and Sung Nok Chiu. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, with a foreword by D. G. Kendall. John Wiley & Sons, Hoboken, New Jersey, 2000.
[236] B. M. Oliver. Thermal and quantum noise. Proceedings of the IEEE, 53(5):436–
454, May 1965.
[237] E. Ollila, V. Koivunen, and J. Eriksson. On the Cramer–Rao bound for the constrained and unconstrained complex parameters. IEEE Sensor Array and Multichannel Signal Processing Workshop, pages 414–418, July 2008.
[238] Alan V. Oppenheim, Ronald W. Schafer, and John R. Buck. Discrete-Time Signal
Processing. Prentice Hall, Upper Saddle River, NJ, 1999.
[239] H. C. Ørsted, K. Jelved, A. D. Jackson, and O. Knudsen. Selected Scientific
Works of Hans Christian Ørsted. Princeton University Press, 1998.
[240] A. Ozgur, O. Leveque, and D. Tse. Hierarchical cooperation achieves optimal
capacity scaling in ad-hoc networks. IEEE Transactions on Information Theory,
53(10):3549–3572, Oct. 2007.
[241] Athanasios Papoulis and S. Unnikrishna Pillai. Probability, Random Variables,
and Stochastic Processes, 4th Edition. McGraw-Hill, New York, 2002.
[242] M. Park, S.-H. Choi, and S. M. Nettles. Cross-layer MAC design for wireless networks using MIMO. IEEE Global Telecommunications Conference (GLOBECOM), vol. 5, Dec. 2005.
[243] P. A. Parker and D. W. Bliss. Outer bounds for the MIMO interference channel.
IEEE Asilomar Conference on Signals, Systems and Computers, pages 1108–
1112, Oct. 2008.
[244] P. A. Parker, P. Mitran, D. W. Bliss, and V. Tarokh. On bounds and algorithms
for frequency synchronization for collaborative communication systems. IEEE
Transactions on Signal Processing, 56(8):3742–3752, Aug. 2008.
[245] A. J. Paulraj and T. Kailath. Increasing capacity in wireless broadcast systems using distributed transmission/directional reception (DTDR), 1994. U.S. Patent 5,345,599.
[246] A. J. Paulraj and C. B. Papadias. Space-time processing for wireless communi-
cations. IEEE Signal Processing Magazine, 14(6):49–83, Nov. 1997.
[247] Arogyswami Paulraj, Rohit Nabar, and Dhananjay Gore. Introduction to Space-
Time Wireless Communications. Cambridge University Press, Cambridge, 2003.
[248] S. U. Pillai and C. S. Burrus. Array Signal Processing. Springer-Verlag, 1989.
[249] P. Pirinen. Cellular topology and outage evaluation for DS-UWB system with correlated lognormal multipath fading. The 17th Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, 2006.
[250] H. V. Poor. An Introduction to Signal Detection and Estimation. Springer-Verlag,
1994.
[251] H. V. Poor and G. W. Wornell. Wireless Communications: Signal Processing
Perspectives. Prentice Hall, 1998.
[252] David M. Pozar. Microwave Engineering. John Wiley & Sons, Hoboken, New
Jersey, 2005.
[253] N. Prasad and M. K. Varanasi. Outage theorems for MIMO block-fading chan-
nels. IEEE Transactions on Information Theory, 52(12):5284–5296, Dec. 2006.
[254] R. Price and P. E. Green. A communication technique for multipath channels.
Proceedings of the Institute of Radio Engineers, 46(3):555–570, March 1958.
[255] John G. Proakis. Digital Communications. McGraw-Hill, New York, 2001.
[256] John G. Proakis and Dimitris G. Manolakis. Digital Signal Processing. Pearson
Prentice Hall, 2007.
[257] R. W. Heath, Jr. and G. B. Giannakis. Exploiting input cyclostationarity for
blind channel identification in OFDM systems. IEEE Transactions on Signal
Processing, 47(3):848–856, March 1999.
[258] C. Rao and B. Hassibi. Analysis of multiple-antenna wireless links at low SNR.
IEEE Transactions on Information Theory, 50(9):2123–2130, Sept. 2004.
[259] P. B. Rapajic and D. Popescu. Information capacity of a random signature multiple-input multiple-output channel. IEEE Transactions on Communications, 48(8):1245–1248, Aug. 2000.
[260] T. S. Rappaport. Wireless Communications: Principles & Practice. Prentice
Hall, 1996.
[261] Behzad Razavi. RF Microelectronics. Pearson Education, 2011.
[262] Reinhold Remmert. Theory of Complex Functions. Springer-Verlag, New York,
1991.
[263] C. D. Richmond. Mean squared error and threshold SNR prediction of maximum-
likelihood signal parameter estimation with estimated colored noise covariances.
IEEE Transactions on Information Theory, 52(5):2146–2164, May 2006.
[264] F. Reif. Fundamentals of Statistical and Thermal Physics. McGraw-Hill, New
York, 1965.
[265] Gordon Rottman. World War II Battlefield Communications. Osprey Publishing,
Oxford, 2010.
[266] R. Roy and T. Kailath. ESPRIT-estimation of signal parameters via rotational
invariance techniques. IEEE Transactions on Acoustics, Speech and Signal Pro-
cessing, 37(7):984–995, July 1989.
[267] M. Sadek, A. Tarighat, and A. H. Sayed. A leakage-based precoding scheme for
downlink multi-user MIMO channels. IEEE Transactions on Wireless Commu-
nications, 6(5), May 2007.
[268] H. Sampath, P. Stoica, and A. Paulraj. Generalized linear precoder and decoder
design for MIMO channels using the weighted MMSE criterion. IEEE Transac-
tions on Communications, 49(12):2198–2206, Dec. 2001.
[269] S. Sandhu, R. Heath, and A. Paulraj. Space-time block codes versus space-time trellis codes. IEEE International Conference on Communications (ICC), vol. 4, pages 1132–1136, 2001.
[270] S. Sandhu and A. Paulraj. Space-time block codes: a capacity perspective. IEEE Communications Letters, 4(12):384–386, 2000.
[271] I. Sason. On achievable rate regions for the Gaussian interference channel. IEEE
Transactions on Information Theory, 50(6):1345–1356, June 2004.
[272] H. Sato. The capacity of the Gaussian interference channel under strong inter-
ference (corresp.). IEEE Transactions on Information Theory, 27(6):786–788,
1981.
[273] Ali H. Sayed. Adaptive Filters. John Wiley & Sons, Hoboken, New Jersey, 2008.
[274] A. Scaglione, P. Stoica, S. Barbarossa, G. B. Giannakis, and H. Sampath. Optimal
designs for space-time linear precoders and decoders. IEEE Transactions on
Signal Processing, 50(5):1051–1064, May 2002.
[275] Louis Scharf and Cedric Demeure. Statistical Signal Processing: Detection, Esti-
mation, and Time Series Analysis. Addison-Wesley, Reading, MA, 1991.
[276] R. O. Schmidt. A signal subspace approach to multiple emitter location and
spectral estimation. Ph.D. dissertation, Stanford University, 1981.
[277] P. Schniter. Low-complexity equalization of OFDM in doubly selective channels.
IEEE Transactions on Signal Processing, 52(4):1002–1011, April 2004.
[278] R. A. Scholtz. The origins of spread-spectrum communications. IEEE Transac-
tions on Communications, 30(5):822–854, May 1982.
[279] M. Schwartz. Edouard Branly, the coherer, and the Branly effect [history of
communications]. IEEE Communications Magazine, 47(9):20–26, Sept. 2009.
[280] Mischa Schwartz, William R. Bennett, and Seymour Stein. Communication Sys-
tems and Techniques. McGraw-Hill, New York, 1966.
[281] X. Shang, B. Chen, G. Kramer, and H. V. Poor. Interference suppression in the
presence of quantization errors. Allerton Conference on Communication, Control,
and Computing, pages 700–707, Sept. 2008.
[282] X. Shang, B. Chen, G. Kramer, and H. V. Poor. Capacity regions and sum-rate capacities of vector Gaussian interference channels. IEEE Transactions on Information Theory, 56(10):5030–5044, Oct. 2010.
[283] Xiaohu Shang, Biao Chen, and Michael J. Gans. On the achievable sum rate for
MIMO interference channels. IEEE Transactions on Information Theory, 52(9),
September 2006.
[284] C. E. Shannon. A mathematical theory of communication. Bell System Technical
Journal, 27:379–423, July 1948.
[285] D. F. Sievenpiper, D. C. Dawson, M. M. Jacob, T. Kanar, S. Kim, J. Long,
and R. G. Quarfoth. Experimental validation of performance limits and design
guidelines for small antennas. IEEE Transactions on Antennas and Propagation,
60(1):8–19, Jan. 2012.
[286] J. W. Silverstein. Eigenvalues and eigenvectors of large dimensional sample co-
variance matrices. Contemporary Mathematics, 50:153–159, 1986.
[287] Bernard Sklar. Digital Communications: Fundamentals and Applications. Pren-
tice Hall, 1988.
[288] S. T. Smith. Statistical resolution limits and the complexified Cramer–Rao
bound. IEEE Transactions on Signal Processing, 53(5):1597–1609, May 2005.
[289] D. H. Staelin, D. W. Bliss Jr, D. A. Hinton, et al. Protocols for multi-antenna ad-
hoc wireless networking in interference environments. PhD thesis, Massachusetts
Institute of Technology, 2010.
[290] D. H. Staelin, A. W. Morgenthaler, and J. A. Kong. Electromagnetic Waves.
Prentice Hall, Englewood Cliffs, NJ, 1994.
[291] William Stallings. Data and Computer Communications. Pearson/Prentice Hall,
2007.
[292] A. Stefanov and T. M. Duman. Turbo coded modulation for wireless communi-
cations with antenna diversity. Proceedings of IEEE Vehicular Technology Con-
ference, Amsterdam, 3:1565–1569, Sept. 1999.
[293] S. Stein. Unified analysis of certain coherent and noncoherent binary communi-
cations systems. IEEE Transactions on Information Theory, 10(1):43–51, Jan.
1964.
[294] Bernard D. Steinberg. Principles of Aperture and Array System Design: Including
Random and Adaptive Arrays. John Wiley & Sons, New York, 1976.
[295] P. Stoica, E. G. Larsson, and A. B. Gershman. The stochastic CRB for array
processing: a textbook derivation. IEEE Signal Processing Letters, 8(5):148–150,
May 2001.
[296] P. Stoica and A. Nehorai. MUSIC, maximum likelihood, and Cramer–Rao bound.
IEEE Transactions on Acoustics, Speech and Signal Processing, 37(5):720–741,
May 1989.
[297] Petre Stoica and Randolph Moses. Introduction to Spectral Analysis. Prentice
Hall, 1997.
[298] D. Stoyan, W. S. Kendall, and J. Mecke. Stochastic Geometry and Its Applica-
tions. John Wiley & Sons, Hoboken, New Jersey, 1995.
[299] Dietrich Stoyan, Wilfrid S. Kendall, and Joseph Mecke. Stochastic Geometry and
Its Applications, 2nd Edition. John Wiley & Sons, 1995.
[300] P. D. Sutton, K. E. Nolan, and L. E. Doyle. Cyclostationary signatures in prac-
tical cognitive radio applications. IEEE Journal on Selected Areas in Communi-
cations, 26(1):13–24, Jan. 2008.
[301] T. Svantesson. A double-bounce channel model for multi-polarized MIMO sys-
tems. IEEE Vehicular Technology Conference, 2:691–695, Fall 2002.
[302] T. Svantesson and A. L. Swindlehurst. A performance bound for prediction of
mimo channels. IEEE Transactions on Signal Processing, 54(2):520–529, Feb.
2006.
[303] A. Taherpour, M. Nasiri-Kenari, and S. Gazor. Multiple antenna spectrum
sensing in cognitive radios. IEEE Transactions on Wireless Communications,
9(2):814–823, Feb. 2010.
[304] Tapan K. Sarkar et al. History of Wireless. John Wiley & Sons, Hoboken, New
Jersey, 2006.
[305] V. Tarokh, H. Jafarkhani, and A. R. Calderbank. Space-time block codes from
orthogonal designs. IEEE Transactions on Information Theory, 45(5):1456–1467,
July 1999.
[306] V. Tarokh, A. Naguib, N. Seshadri, and A. R. Calderbank. Combined array processing and space-time coding. IEEE Transactions on Information Theory, 45(4):1121–1128, 1999.
[307] V. Tarokh, N. Seshadri, and A. R. Calderbank. Space-time codes for high data
rate wireless communication: performance criterion and code construction. IEEE
Transactions on Information Theory, 44(2):744–765, March 1998.
[308] I. E. Telatar. Capacity of multi-antenna Gaussian channels. European Transac-
tions on Telecommunications, 10(6):585–595, Nov.–Dec. 1999.
[309] N. Tesla. System of transmission of electrical energy, 1900. U.S. Patent 645,576.
[310] N. Tesla. On light and other high frequency phenomena. Record of Franklin
Institute, 1893.
[311] S. C. Thompson, J. G. Proakis, and J. R. Zeidler. The effectiveness of signal
clipping for PAPR and total degradation reduction in OFDM systems. IEEE
Global Telecommunications Conference (GLOBECOM), vol. 5, Dec. 2005.
[312] H. L. Van Trees. Detection, Estimation, and Modulation Theory, Part I. John
Wiley & Sons, New York, 1968.
[313] D. Tse and S. Hanly. Linear multiuser receivers: effective interference, effec-
tive bandwidth and user capacity. IEEE Transactions on Information Theory,
45(2):641–657, 1999.
[314] David Tse and Pramod Viswanath. Fundamentals of Wireless Communication.
Cambridge University Press, Cambridge, 2005.
[315] Antonia M. Tulino and Sergio Verdu. Random Matrix Theory and Wireless Com-
munications. Now Publishers, 2004.
[316] G. Ungerboeck. Trellis-coded modulation with redundant signal sets Part I: introduction. IEEE Communications Magazine, 25(2):5–11, Feb. 1987.
[317] G. Ungerboeck. Trellis-coded modulation with redundant signal sets Part II: state of the art. IEEE Communications Magazine, 25(2):12–21, Feb. 1987.
[318] A. van den Bos. A Cramer–Rao lower bound for complex parameters. IEEE
Transactions on Signal Processing, 42(10), Oct. 1994.
[319] R. van Nee and R. Prasad. OFDM Wireless Multimedia Communications. Artech
House, Boston, 2000.
[320] M. K. Varanasi, C. T. Mullis, and A. Kapur. On the limitation of linear MMSE
detection. IEEE Transactions on Information Theory, 52(9):4282–4286, Sept.
2006.
[321] R. Vaze and R. W. Heath. Transmission capacity of ad-hoc networks with multiple antennas using transmit stream adaptation and interference cancellation. IEEE Transactions on Information Theory, 58(2):780–792, 2012.
[322] S. Verdu. Optimum multi-user signal detection. Ph.D. Thesis, Dept. of Electrical
and Computer Engineering, University of Illinois, Aug. 1984.
[323] S. Verdu and S. Shamai (Shitz). Spectral efficiency of CDMA with random
spreading. IEEE Transactions on Information Theory, 45(2):622–640, March
1999.
[324] Sergio Verdu. Multiuser Detection. Cambridge University Press, Cambridge,
1998.
[325] E. Visotsky and U. Madhow. Space-time transmit precoding with imperfect
feedback. IEEE Transactions on Information Theory, 47(6):2632–2639, Sept.
2001.
[326] P. Viswanath and D. Tse. Sum capacity of the vector Gaussian broadcast channel and uplink–downlink duality. IEEE Transactions on Information Theory, 49, August 2003.
[327] A. Viterbi. Error bounds for convolutional codes and an asymptotically optimum
decoding algorithm. IEEE Transactions on Information Theory, 13(2):260–269,
April 1967.
[328] A. J. Viterbi. CDMA: Principles of Spread Spectrum Communication. Addison-Wesley, 1995.
[329] M. Vu and V. Tarokh. Scaling laws of single-hop cognitive networks. IEEE
Transactions on Wireless Communications, 8(8):4089–4097, Aug. 2009.
[330] Mai Vu and A. Paulraj. MIMO wireless linear precoding. IEEE Signal Processing
Magazine, 24(5):86–105, Sept. 2007.
[331] Branka Vucetic and Jinhong Yuan. Space-Time Coding. John Wiley & Sons,
2003.
[332] J. W. Wallace, Chan Chen, and M. A. Jensen. Key generation exploiting MIMO
channel evolution: Algorithms and theoretical limits. European Conference on
Antennas and Propagation, (EuCAP), pages 1499–1503, March 2009.
[333] D. Wang and J. Zhang. Timing synchronization for MIMO OFDM WLAN sys-
tems. IEEE Wireless Communications and Networking Conference, pages 1177–
1182, March 2007.
[334] X. Wang. Volumes of generalized unit balls. Mathematics Magazine, 78(5):390–
395, Dec. 2005.
[335] X. Wang and H. V. Poor. Space-time multiuser detection in multipath CDMA
channels. IEEE Transactions on Signal Processing, 47(9):2356–2374, Sept. 1999.
[336] J. Ward and R. T. Compton, Jr. Improving the performance of a slotted ALOHA
packet radio network with an adaptive array. IEEE Transactions on Communi-
cations, 40(2):292–300, Feb. 1992.
[337] J. Ward and R. T. Compton, Jr. High throughput slotted ALOHA packet
radio networks with adaptive arrays. IEEE Transactions on Communications,
41(3):460–470, March 1993.
[338] W. W. Ward. The NOMAC and Rake systems. Lincoln Laboratory Journal,
5(3):351–366, 1992.
[339] M. Wax and T. Kailath. Detection of signals by information theoretic criteria.
IEEE Transactions on Acoustics, Speech and Signal Processing, 33(2):387–392,
April 1985.
[340] S. Weber, X. Yang, J. G. Andrews, and G. de Veciana. Transmission capac-
ity of wireless ad-hoc networks with outage constraints. IEEE Transactions on
Information Theory, pages 4091–4102, Dec. 2005.
[341] H. Weingarten, Y. Steinberg, and S. Shamai (Shitz). The capacity region of the
Gaussian multiple-input-multiple-output broadcast channel. IEEE Transactions
on Information Theory, 52(9), Sept. 2006.
[342] E. Weinstein and A. J. Weiss. A general class of lower bounds in parameter
estimation. IEEE Transactions on Information Theory, 34:338–342, March 1988.
[343] E. T. Whittaker and G. N. Watson. A Course of Modern Analysis. Cambridge
University Press, Cambridge, 1927.
[344] B. Widrow and S. S. Haykin. Least-Mean-Square Adaptive Filters. John Wiley
& Sons, Hoboken, New Jersey, 2003.
[345] B. Widrow and M. E. Hoff, Jr. Adaptive switching circuits. Convention Record
of IRE WESCON, 4:96–104, 1960.
[346] Norbert Wiener. Extrapolation, Interpolation, and Smoothing of Stationary Time
Series. John Wiley & Sons, New York, 1949.
[347] E. P. Wigner. Characteristic vectors of bordered matrices with infinite dimen-
sions. Annals of Mathematics, 62(3):548, Nov. 1955.
[348] William H. Tranter et al., editors. The Best of the Best: Fifty Years of Commu-
nications and Networking Research. John Wiley & Sons, Hoboken, New Jersey,
2007.
[349] J. Winters. On the capacity of radio communication systems with diversity in a
Rayleigh fading environment. IEEE Journal on Selected Areas in Communica-
tions, 5(5):871–878, June 1987.
[350] J. H. Winters, J. Salz, and R. D. Gitlin. The capacity of wireless communication
systems can be substantially increased by the use of antenna diversity. Proceed-
ings of the 1st International Conference on Universal Personal Communications,
pages 02.01/1–02.01/5, 1992.
[351] J. H. Winters, J. Salz, and R. D. Gitlin. The impact of antenna diversity on the
capacity of wireless communication systems. IEEE Transactions on Communi-
cations, 42(234):1740–1751, 1994.
[352] W. Wirtinger. Zur formalen Theorie der Functionen von mehr complexen Veränderlichen. Mathematische Annalen, 97(1):357–375, 1927.
[353] P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. A. Valenzuela. V-BLAST: an architecture for realizing very high data rates over the rich-scattering wireless channel. IEEE International Symposium on Signals, Systems, and Electronics (ISSSE), pages 295–300, 1998.
[354] K. A. Woyach, A. Sahai, G. Atia, and V. Saligrama. Crime and punishment for
cognitive radios. Allerton Conference on Communication, Control, and Comput-
ing, pages 236–243, Sept. 2008.
[355] John M. Wozencraft and Irwin Mark Jacobs. Principles of Communication En-
gineering. John Wiley & Sons, New York, 1965.
[356] J. Yang and A. L. Swindlehurst. The effects of array calibration errors on
DF-based signal copy performance. IEEE Transactions on Signal Processing,
43(11):2724–2732, Nov. 1995.
[357] S. F. Yau and Y. Bresler. A compact Cramer–Rao bound expression for paramet-
ric estimation of superimposed signals. IEEE Transactions on Signal Processing,
40(5):1226–1230, May 1992.
[358] T. Yucek and H. Arslan. A survey of spectrum sensing algorithms for cognitive
radio applications. IEEE Communications Surveys & Tutorials, 11(1):116–130,
2009.
[359] A. Zanella, M. Chiani, and M. Z. Win. On the marginal distribution of the eigen-
values of Wishart matrices. IEEE Transactions on Communications, 57(4):1050–1060, April 2009.
[360] Ephraim Zehavi. 8-PSK trellis codes for a Rayleigh channel. IEEE Transactions
on Communications, 40(5):873–884, May 1992.
[361] L. Zheng and D. Tse. Optimal diversity-multiplexing tradeoff in multiple antenna
fading channels. IEEE Asilomar Conference on Signals, Systems and Computers,
Nov. 2001.
[362] L. Zheng and D. N. C. Tse. Diversity and multiplexing: a fundamental trade-
off in multiple-antenna channels. IEEE Transactions on Information Theory,
49(5):1073–1096, May 2003.
[363] A. Zhu and T. J. Brazil. Behavioral modeling of RF power amplifiers based
on pruned Volterra series. IEEE Microwave and Wireless Components Letters,
14(12):563–565, Dec. 2004.
[364] J. Zhu and S. Govindasamy. Performance of multi-antenna MMSE receivers in non-homogeneous Poisson networks. IEEE International Conference on Communications (ICC), 2012.
[365] V. K. Zworykin. Television system, 1938. U.S. Patent 2,141,059.
Index

∗, 12 electrically small, 560


E b /N 0 , 165 gain, 142, 144
χ 2 distribution isotropic, 180
complex, 525 polarimetric, 560
†, 12 antenna array
∃, 12 angle estimation, 201
∀, 12 angle-of-arrival estimation, 201
∈, 12 beamscan, 205
, 17 beamsum, 207
⊗, 18 boresight, 177
i, 12 circular, 177
802.11n, 504 direction finding, 201
end fire, 177
absolutely integrable, 89 irregular sparse, 188
acquisition, 547 linear, 179
ad hoc wireless networks, 538 manifold, 201
adaptive gain control, 568 maximum-likelihood angle estimation
adaptive processing known reference, 203
space-time, 353 maximum-likelihood angle estimation
adaptive receiver, 295 unknown reference, 205
adaptive receiver minimum variance distortionless response,
capacity, 322 207
maximum SINR, 314 multiple signal classification, 208
MMSE, 314 MuSiC, 208
ADC, 564 MVDR, 207
Advanced Research Projects Agency, 7 polarization, 196
AGC, 568 regular, 186
airborne platforms, 174 sparse, 186
Alamouti, 371 vector, 235
Alamouti, Siavash M., 9 antenna arrays, 170
aliasing, 46, 138 antenna feed, 559
almost sure convergence, 82 Antheil, George, 6
ALOHA, 496 anti-aliasing filter, 138
ALOHANet, 9 AOA, 174
amplitude modulation, 5 aperture, 179
analog-to-digital converter, 564 ARC-50, 6
analytic, 34, 40 Armstrong, Edwin Howard, 5
anechoic, 170 ARPA, 7
angle estimation, 201 array calibration, 173
angle of arrival, 174 array factor, 174
antenna, 141, 559 array pattern, 174
antenna array response, 174
dipole, 142 array-signal-to-noise ratio, 277
effective area, 144, 146 ASNR, 277
associate, 18 bound
asymptotic eigenvalue densities, 270 spectral efficiency, 243
atom, 270 BPSK, 120
attenuation Branly, Edouard Eugene Desire, 3
line-of-sight, 143 Braun, Karl Ferdinand, 3
auto-correlation, 87 brick-wall filter, 138

back substitution, 567 calculus of variations, 53
base station, 119 capacity
Bayes’ theorem, 66, 67, 296 achievability, 154
beam pattern, 174, 175 ergodic, 253, 289
beam pattern estimated SNR, 289
Fourier transform formulation, 182 MIMO, 28
symmetry, 182 MIMO flat-fading channel, 243
beamformer, 174 outage, 275
beamformer real versus complex variables, 156
channel inversion, 303 capacity scaling laws, 470
decorrelating, 303 carrier-sense multiple access, 322
least squared error, 313 cat’s whisker detector, 4
matched-filter, 177 Cauchy–Riemann equations, 34, 36
maximum ratio combiner (MRC), 301 Cauchy–Schwarz inequality, 17, 100
maximum SINR, 313 CDF, 66
MMSE, 313, 315 CDMA, 6, 129
receive, 175 ceiling, 14
transmit, 175 cellular phone, 8
zero forcing, 303, 305 central limit theorem, 188, 266
beamforming central moment, 68
adaptive, 300, 315 channel
external interference, 308 delay spread, 122, 341
minimum interference, 308, 310 discretely sample issues, 342
orthogonal, 304, 307 dispersive, 348, 356
over-constrained minimum interference, Doppler spread, 123
310 doubly dispersive, 344, 356
SNR loss, 315 dynamic, 341
beamwidth estimation, 281
generalized, 261 estimation bound, 283
beamwidth separation, 261 estimation feedback, 291
Bell Laboratories, 8 fading, 123
Berrou, Claude, 8 flat-fading, 239
Bessel function, 62, 238 fractional delay, 344
Bessel function frequency-selective, 123, 298, 341,
contour integral representation, 222 348
modified, 62, 75, 77, 81 frequency-selective channel
beta function, 61 compensation, 348
beta function Gaussian, 252
incomplete, 61 Gaussian noise, 159
big-O, 57 geometric capacity interpretation,
binary hypothesis test, 550 149
binary phase-shift keying, 120 matrix, 240
binning, 162 MIMO, 239
BLAST, 387 multipath, 122
block fading, 289 narrowband, 239
Boltzmann constant, 123, 157, precoding, 245, 252
166 Rayleigh, 252
boresight, 177, 182, 188, 212 reciprocity, 246
Bose, Jagdish Chandra, 3 Rician, 252
sampled doubly dispersive, 344 covariance matrix
SISO capacity interference-plus-noise, 246, 247, 254, 258,
Shannon capacity, 149 275, 276
stale estimate, 246 space-time, 349, 350
static, 246, 341 space-time-frequency, 358
wireless, 122 whitened, 202
channel capacity CR calculus, 36
asymptotic, 270 Cramer–Rao bound, 548
channel estimation feedback, 292 Cramer–Rao estimation bound, 99
channel impulse response, 298, 342 Cramer–Rao estimation bound
channel matrix, 246 Cramer–Rao estimation bound
channel matrix angle, 211
asymptotic, 270 deterministic signal, 212
whiten, 247 random signal, 214
whitened, 314 real multivariate, 102
channel-state information, 245, 291 real multivariate with Gaussian signal,
channels 103
doubly dispersive, 341 real parameter, 99
characteristic function, 70 reduced information matrix, 105
charge density, 2 cross-correlation, 87
circulant matrix, 26 crystal oscillator, 561
close link, 275 CSI, 245, 291
CSMA, 322, 498
code-division multiple access, 6, 129
CSMA/CA, 499
cognitive radio, 520
cumulant, 70
cognitive radio
cumulative distribution function, 66
channel, 521
current density, 2
performance, 522
cut set, 472
cognitive radio spectral allocation, 522
cut-set bound, 472
complex conjugate, 13, 15
cyclostationary, 533
complex doppelgangers, 36
cyclostationary signals, 351
complex normal distribution, 72
complex numbers, 12
D-BLAST, 387
complex-real calculus, 36
DAC, 564
concave, 14, 69 Dannehl, Kurt, 6
conditional entropy, 158 DARPA, 7
conditional probability density, 67 data matrix
constellation, 121 whitened, 534
contention density, 136 decision feedback, 328, 331
contention protocols, 495 decomposition
contour integral, 222 eigenvalue, 21
convergence in distribution, 83 QR, 23
convergence in probability, 83 singular value, 22
convergence in quadratic mean subspace, 24
mean-square convergence, 82 delay estimation bound, 548
convergence of random variables, 81 delay spread, 122
convex, 14 delta function, 14
convex hull, 422 derivative
convolution, 45 Cauchy–Riemann equations, 34
Cooper, Martin, 8 complex, 34, 35
coordinates gradient, 31
cylindrical, 31 Laplacian, 31
Euclidean, 31 matrix determinant with respect to real
spherical, 32 scalar, 30
Costa precoding, 162 matrix inverse with respect to real scalar,
covariance matrix, 72 30
derivative (cont.) dot product, 16
matrix log determinant with respect to downlink, 125
real scalar, 30 DSSS, 6
matrix trace with respect to real scalar, 30 dynamic range, 563
matrix with respect to real scalar, 30
real, 29 effective area, 144
vector with respect to real scalar, 30 effective number of bits, 565
derivatives eigenvalue distributions
complex, 33 finite, 92
detector infinite, 92
chip rate, 533 eigenvalues, 21
energy, 524 eigenvalues
new energy, 527, 528 2 × 2 Hermitian matrix, 22
determinant, 19 degenerate, 21
determinant criteria, 375 duplicated, 21
determinant criterion, 374 eigenvectors, 21
DFT, 46, 182 Einstein–Wiener–Khinchin theorem, 88
Dieckmann, Max, 7 electric displacement, 2
digital-to-analog converter, 564 electric field, 2
Dirac delta function, 14 electrically small, 142
direct modulation of space time codes, electrically small antennas, 560
385 electromagnetics, 1
direct-sequence spread spectrum, 6 empirical distribution function (e.d.f.), 94
direction of arrival, 174 end fire, 177
dirty-paper coding, 162, 242, 246, 522 energy detector
discrete Fourier transform, 46 multiple antenna, 534
discrete-time Fourier transform, 138 energy per bit, 165
discretely sample channel issues, 342 entropy, 156
distribution equivalence
χ², 73 MMSE and maximum SINR beamformers,
F , 76 314
beta, 79 zero-forcing and orthogonal beamformers,
central χ², 73 zero-forcing and orthogonal beamformers,
complex χ², 74 erf, 63
complex Gaussian, 71 ergodic, 87
complex noncentral χ², 76 ergodic capacity, 253
exponential, 73 error function, 63
Gaussian, 71 estimation bound
log-normal, 79 delay, 548
logarithmically normal, 79 nuisance parameters, 105
Nakagami, 78 ETSI, 10
noncentral χ², 75 estimation bound
Poisson, 78 Euler–Lagrange differential equation, 54
Rayleigh, 72 excess kurtosis, 68
Rice, 77 excess SNR, 388
Rician, 77, 78 exposed node/terminal, 496
Wishart, 92, 338 extrema, 37
distributive, 18
DOA, 174 factorial, 58
Dolbear, Amos Emerson, 3 fading, 123
dominated convergence theorem, 86 fading
doppelgangers, 36 block, 289
Doppler spread, 123 flat, 239
Doppler taps, 341 frequency-selective, 240, 345
Doppler-domain representation, 357 Faraday, Michael, 1
Doppler-frequency spread, 343 Farnsworth, Philo Taylor, 7
fast Fourier transform, 46, 47 Gel’fand–Pinsker, 162
FCC, 10, 521 generalized beamwidths, 261
FCC geolocation, 174, 177
part 15.247, 10 Gitlin, Richard D., 8
FDMA, 241, 292 Glavieux, Alain, 8
FEC, 120 global positioning system, 562
Federal Communications Commission, 521 global system for mobiles, 129
Fessenden, Reginald Aubrey, 5 Golay, Marcel J. E., 5, 8
FFT, 46, 47 good neighbor, 538
filter GPS, 562
minimum mean-squared error (MMSE), gradient
299 complex, 38
minimum-mean-squared error, 298 gradient operator, 31
MMSE, 298 Grassmannian manifold, 292
Wiener, 298 Green, Jr., Paul Eliot, 6
filtering GSM, 129
adaptive, 298 Gupta and Kumar, 470
finite precision, 565
Fisher information matrix Hadamard inequality, 20, 21
change of variables, 104 Hadamard product, 17
complex nuisance parameters, 112, 113 Hamming, Richard Wesley, 5, 8
parameter in the mean, 113 Harry Nyquist, 123
pseudo, 109 Hartley, Ralph Vinton Lyon, 5
reduced, 105 Hell, Rudolf, 7
floor, 14 Hermitian conjugate, 15
forward error correction, 120 Hertz, Heinrich Rudolf, 2
forward substitution, 567 heterodyne receiver, 5
Foschini, Gerard Joseph, 8 hidden node, 498, 538
four-color theorem, 127 hidden node/terminal, 496
Fourier transform, 44 hidden terminal, 498, 538
Fourier transform Hilbert space, 15, 17
discrete, 182 Hilbert space
fractional delay, 344 infinite, 17, 185
frequency modulation, 5 HiperLAN, 10
frequency reuse, 127 holomorphic, 40
frequency synthesizer, 561 holomorphic function, 34
frequency synthesizers, 341 holomorphic functions, 222
frequency taps, 341 hypergeometric, 59
frequency-division multiple access, 241, hypergeometric function, 62
292 hypersphere
frequency-hopping spread spectrum, 6 hardening, 151
frequency-selective channel, 123, 298 volume, 152
Frobenius norm, 19 hypothesis test, 550
function
analytic, 34, 40 idempotent, 25, 277, 306
differentiable, 40 IEEE 802.11, 10
holomorphic, 34, 40 IEEE 802.11n, 10
pole, 40 IEEE 802.22, 521
IF, 121
gain, 144 incomplete beta function, 61
game theory, 523 incomplete gamma function, 58
gamma function, 58, 74 industrial, medical, and scientific frequency
gamma function band, 10, 241
incomplete, 74 inflection, 37
Gaussian channel, 252 informed transmitter, 245, 246
Gaussian Q-function, 63 inner product, 16
inner product least squared error beamformer, 313
Hermitian, 16 Lerch transcendent, 59
integer numbers, 12 line-of-sight, 143
integrated SNR, 214 link
integration close, 275
contour, 40 little-o, 57
Jacobian, 43 LMS, 331
line, 40 LMS
path, 40 decision feedback, 331
pole, 40 local oscillator, 561
pole on path, 42 local oscillator frequency offsets, 341
residues, 40 local oscillator phase noise, 341
volume, 42 logarithm, 13
interference, 241 logarithm
external, 241 complex value, 14
interference natural, 13
cochannel, 241 perturbative expansion, 29
external, 118, 240, 295, 297 principal value, 14
internal, 118, 241, 295 translation of base, 13
interference alignment, 487 Loomis, Mahlon, 3
intermediate frequency, 121
intersymbol interference, 163, 298, 341 Magnavox, 6
ISI, 163 magnetic field, 2
ISM, 554 magnetic flux density, 2
ISM frequency band, 10, 241 MAP, 296
Marcenko–Pastur theorem, 94
Jacobian, 43, 68, 243 Marcenko–Pastur probability distribution,
Jensen’s inequality, 69 93, 270
John Johnson, 123 Marconi, Guglielmo, 4
Johnson noise, 123 Marcum Q-function, 63, 76, 222, 238
Marcum Q-function
K-factor, 77 generalized, 63
Kailath, Thomas, 8 marginal probability density, 67
Karush–Kuhn–Tucker (KKT) theorem, 51 Markey, Hedy Kiesler, 6
Karush–Kuhn–Tucker conditions, 249 Maskelyne, Nevil, 4
KKT conditions, 249 matched-filter beamformer, 177
Kolmogorov, Andrey Nikolaevich, 6 matrix, 15
Kotelnikov, Vladimir Aleksandrovich, 137 matrix
Kotowski, Paul, 6 2 × 2 inverse, 27
Kronecker delta, 15, 304 circulant, 26
Kronecker product, 18 conditioning, 337
kurtosis, 68 determinant, 19
kurtosis determinant of inverse, 20
excess, 68 diagonal, 15
diagonal loading, 339
Lagrange multiplier, 48, 50 eigenvalue limiting, 338
Lamarr, Hedy, 6 eigenvalues, 26
Lambert W function, 61 Frobenius norm, 19
Landau notation, 57 Hadamard product, 17
Laplace transform, 48 Hermitian, 15
Laplacian, 426 identity, 16
Laplacian operator, 31 inverse of identity matrix plus a rank 1,
Laurent series, 41 28
least mean squared, 331 inverse of identity matrix plus a rank 2,
least mean squares 28
decision feedback, 331 inverse partitioned, 27
inversion, 27 multiple antennas, 333
Kronecker product, 18 multiple channels, 333
low rank, 26 mutual information, 156
minor, 20
nonsingular, 27 Nash equilibrium, 523
positive definite, 16, 21, 23 nats, 157
positive semidefinite, 16, 21, 23 natural logarithm, 13
product, 17 negative frequencies, 46
projection, 24 new-energy detector
quadratic Hermitian, 23 multiple antenna, 536
rank, 23 NLMS, 332
rank 1, 26 noise factor, 561
rank 2, 27 noise figure, 123, 561
square root, 202 non-Gaussian noise, 560
SVD of inverse, 27 noncentral χ² random variable, 221
Toeplitz, 26 noncentral moment, 69
trace, 19 noncentrality parameter, 75
unitary, 16 noncommutative delay and Doppler
Wishart, 92, 270 operations, 344
matrix decomposition, 21 nonlinear programming, 248
maximal ratio transmission, 370 nonlinearities, 564
maximum, 37 nonlinearities
maximum likelihood, 296 analog, 567
maximum ratio combiner (MRC), 301 norm, 19
maximum a posteriori, 296 norm
Maxwell’s equations, 2 Frobenius, 19
Maxwell, James Clerk, 2 normal distribution, 72
MC/CDMA, 516 notation, 12
MCMUD, 333 nuisance parameters, 105
mean, 68 NULLHOC protocol, 515
medium-access control, 495 Nyquist criteria, 46
method of intervals, 218
MIMA-MAC protocol, 513 Oersted, Hans Christian, 1
MIMO, 239 OFDM, 47, 121, 349, 354, 523, 554, 563
MIMO OFDMA, 522, 523
broadcast channel capacity, 420 open space, 538
multiple-access channel capacity region, open systems interconnection model, 118
419 optimization, 37
successive interference cancellation, 327 orthogonal frequency-division multiple
minimum, 37 access, 522, 523
ML, 296 orthogonal frequency-division multiplexing,
MMSE 523, 563
spatial networks, 429 orthogonal space-time block codes,
mobile phone, 8, 119 373
model errors, 560 orthogonal-frequency-division multiplexing,
moment 47, 121, 349, 354
central, 68 oscillator
noncentral, 69 phase noise, 562
Motorola, 8 oscillators
MUD, 8 atomic clocks, 562
multipath, 122 OCXO, 562
multipath scattering, 239 ovenized compensated, 562
multiple-input multiple-output (MIMO) TCXO, 562
communications, 239 temperature compensated, 562
multiuser detection, 8 OSI model, 118
multiuser detectors outage, 275
outage capacity, 275 projection matrix, 24
outer product, 17 pseudospectral estimator
spatial, 208
parameter estimation PSK, 121
threshold point, 218 pulse-shaping filter, 138
Pareto optimal, 523
Parseval’s theorem, 45 Q factor, 560
Paulraj, Arogyaswami J., 8 QAM, 121
peak-to-average power ratio, 355, 563 QPSK, 121
permeability, 2 QR decomposition, 566
permittivity, 2 quadrature amplitude modulation, 121
phase center, 172 quadrature phase-shift keying, 121
phase-lock loop, 561 quality factor, 560
phase-shift keying, 121 quantization, 564
pilot sequence, 295 quantization
Planck constant, 123 noise, 564
plane wave, 170 quantum mechanics, 123
Pochammer symbol, 59
Poisson point process, 134 radar, 174
Poisson process, 91 Radio Day, 4
Poisson–Voronoi tessellation, 459 RADIONET, 9
pole, 40 raised-cosine filter, 138
pole on path, 42 random process, 86
Popov, Alexander Stepanovich, 4 random variables
positive-definite matrix, 16, 21, 23 product of Gaussians, 81
positive-semidefinite matrix, 16, 21 sum, 80
posterior probability density, 67 rank criterion, 374, 375
power constraint Rayleigh channel, 252
average, 249 Rayleigh–Jeans law, 145
per element, 252 RCA, 7
power consumption, 568 real function of complex variables, 38
Poynting vector, 235 real numbers, 12
practical issues, 559 receiver
precoding, 245, 252 decision feedback, 328, 331
Price, Robert, 6 dense space-time-frequency adaptive
prior probability density, 67 processing, 361, 362
PRNet, 9 doubly dispersive, 361, 362
probability, 66 iterative, 328, 334
probability least mean squared, 331
change of variables, 67 LMS, 331
multivariate distribution, 70 maximum likelihood, 296
probability density maximum a posteriori, 296
characteristic function, 70 NLMS, 332
conditional, 67 recursive least squares, 328
cumulant, 70 RLS, 328
marginal, 67 space-time-frequency adaptive processing,
posterior, 67 361, 362
prior, 67 receiver operating characteristic, 98, 525,
a posteriori, 67 527, 532, 536, 547
a priori, 67 receiver performance, 559
probability distribution reciprocity, 143, 246, 291
Marcenko–Pastur, 93, 270 recursive least squares
product decision feedback, 328
inner, 16 reduced Fisher information matrix, 105
outer, 17 reference signal, 295
product logarithm function, 61 regularized beta function, 61
regularized gamma function, 492 killer weights, 303
residues, 40 matched filter, 301
reuse factor, 127 minimum interference, 303
Rician channel, 252 MMSE, 311
Rician distribution zero forcing, 303
K-factor, 77 spectral efficiency, 243
Rician random variable, 221 spectral efficiency
RLS good neighbor optimization, 538, 540, 542
decision feedback, 328 spectral efficiency bound, 243
ROC, 98, 525, 527, 532, 536, 547 spectral estimator, 218
spectral estimator
Salz, Jack, 8 MVDR, 207
sample matrix inversion, 328 pseudo, 207, 208
sampled signals, 137 spatial, 207
scalar, 12 spectral scavenging, 522
SCORE satellite, 7 spectrum market, 523
SDMA, 504 sphere hardening, 151
CSMA/CA, 514 spread spectrum
sectored antenna, 132 frequency hopping, 6
selection vector, 304 spread spectrum, 6
Shannon limit, 120 spurs, 568
Shannon, Claude Elwood, 5, 137 STAE, 349
sidelobe, 177 stale channel estimate, 246
sidelobes STAP, 349
importance of, 177 stationary point, 37
signal communications orbit relay equipment statistical mechanics, 150, 157
satellite, 7 statistical thermodynamics, 157
signal detection, 524 steering vector, 173, 174, 301
simple pole, 40 STI-MAC, 516
singular-value decomposition, 22 successive interference cancellation, 327
SINR, 313 superheterodyne, 121
skew, 68 superheterodyne receiver, 5
Slutsky’s theorem, 85 SURAN, 9
SMI, 328 SVD, 22, 27
SNR loss, 315 synchronization, 547
solid angle, 141
SPACE-MAC protocol, 507 tap delays, 341
space-time taps
eigenvalue distribution, 349 Doppler, 341
space-time adaptive equalization, 349 frequency, 341
space-time adaptive processing, 349, 353 Tarokh, Vahid, 9
space-time block codes, 9 television, 6
space-time codes, 8 Tesla, Nikola, 3
space-time covariance matrix, 350 test statistic, 98, 218
space-time covariance matrix thermal noise, 123
eigenvalue distribution, 349 Thitimajshima, Punya, 8
space-time trellis codes, 9 threshold point, 218
space-time trellis coding, 376 time-division multiple-access (TDMA), 128
space-time-frequency Toeplitz matrix, 26
eigenvalue distribution, 358 trace, 19
space-time-frequency covariance matrix trace criteria, 376
eigenvalue distribution, 358 training sequence, 295
spark-gap transmitter, 2 transform
spatial filtering, 175 Fourier, 44
spatial processing Laplace, 48
adaptive, 300, 315 transmission capacity, 135
transmitter waveform exploitation, 203
informed, 246 waveform optimization, 538
uninformed, 246 wavefront, 170
transmitter performance, 559 wavenumber, 213
transpose, 15 wavevector, 213
turbo codes, 8 Westinghouse Laboratories, 7
white Gaussian noise (WGN) process, 90
US Federal Communications Commission, 10 white space, 538
uniformly integrable, 85 white-noise process, 89
uninformed transmitter, 246 whitened channel matrix, 247, 314
union bound, 155 whitened covariance matrix, 202
universal space-time codes, 386 whitened data matrix, 534
uplink, 125 Whittaker, John Macnaghten, 137
wide sense stationarity, 88
variance, 68 wide sense stationary, 86
VCO, 561 Wiener, Norbert, 5
vector, 15 Wiener–Hopf equation, 300
vector WiFi, 10, 11
broadcast channel, 420 windowing, 356
multiple-access channel, 419 Winters, Jack H., 8
norm, 19 wireless channel, 122
vector operation, 16 wireless telegraphy, 3
vector sensor, 235 Wirtinger calculus, 35, 203
vector space, 14 Wirtinger calculus
Verdu, Sergio, 8 gradient, 38
Viterbi algorithms, 8 multivariate, 38
Viterbi, Andrew James, 7 Wishart matrix, 92, 270, 442
vocoder, 119 Wishart matrix
voltage controlled oscillator, 561 asymptotic eigenvalue densities, 270
Volterra series, 567 Woodbury’s formula, 28
Voronoi tessellation, 449
z-transform, 433
Walsh functions, 129 zenith, 32
Ward protocol, 509 zero forcing, 304, 307
water filling, 245 Zworykin, Vladimir K., 7