
Signal Transmission and Reception Techniques

Analog Signals & Systems


and
Probability & Statistics

A refresher

This document provides a brief refresher of basic concepts from Analog Signals and Systems and
Probability and Statistics, which should already be familiar to the student from previous courses.
Due to the extensive use that we will be making of these concepts, their correct understanding and
mastery are of paramount importance. At the end of this document a series of self-test problems
is provided; students should be able to solve them without much difficulty.
Contents

1 Analog Signals & Systems
  1.1 Analog signals
    1.1.1 Real-valued and complex-valued signals
    1.1.2 Periodic signals
    1.1.3 Symmetry properties
    1.1.4 Some important signals
    1.1.5 Basic signal operations. Convolution.
  1.2 The Fourier Transform
    1.2.1 Definition and properties
    1.2.2 Fourier Transform of periodic signals
  1.3 Linear time-invariant (LTI) systems
    1.3.1 Impulse response and transfer function
    1.3.2 Filtering
  1.4 Problems (Analog signals & systems)

2 Probability & Statistics
  2.1 Probability
    2.1.1 Definition and properties
    2.1.2 Conditional probabilities
    2.1.3 Independence
  2.2 Random variables
    2.2.1 Distribution and density functions
    2.2.2 The Gaussian distribution
    2.2.3 Characteristics of a Random Variable
  2.3 Random Vectors
    2.3.1 Definitions
    2.3.2 Conditioning of Random Variables
    2.3.3 Independence and covariance
    2.3.4 Stochastic Processes
    2.3.5 Statistics of a random process
    2.3.6 Stationarity
  2.4 Problems (Probability & Statistics)

Chapter 1

Analog Signals & Systems

1.1 Analog signals


Analog signals are functions of a continuous variable t, which we generally interpret as time.

1.1.1 Real-valued and complex-valued signals


An analog signal x(t) may take real or complex values. For example, the complex exponential
x(t) = Aej(2πf0 t+θ) , as its name suggests, is complex-valued. Assuming A > 0, its magnitude,
phase, real and imaginary parts are:

|x(t)| = A, ∠x(t) = 2πf0 t+θ, Re{x(t)} = A cos(2πf0 t+θ), Im{x(t)} = A sin(2πf0 t+θ)

In general, for any signal x(t), one has


x(t) = |x(t)| · e^{j∠x(t)} = Re{x(t)} + j Im{x(t)},        |x(t)| = √( Re²{x(t)} + Im²{x(t)} )
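As a quick numerical check, the following sketch evaluates these relations for a complex exponential; it assumes NumPy is available, and the amplitude, frequency and phase values are arbitrary illustrative choices.

    import numpy as np

    A, f0, theta = 2.0, 5.0, np.pi / 4            # arbitrary illustrative values
    t = np.linspace(0.0, 1.0, 1000)
    x = A * np.exp(1j * (2 * np.pi * f0 * t + theta))

    # Magnitude, real and imaginary parts of the complex exponential
    assert np.allclose(np.abs(x), A)                                    # |x(t)| = A
    assert np.allclose(x.real, A * np.cos(2 * np.pi * f0 * t + theta))  # Re{x(t)}
    assert np.allclose(x.imag, A * np.sin(2 * np.pi * f0 * t + theta))  # Im{x(t)}

    # General relation |x(t)| = sqrt(Re² + Im²) for any signal
    assert np.allclose(np.abs(x), np.sqrt(x.real**2 + x.imag**2))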

1.1.2 Periodic signals


Periodic signals constitute an important class. If x(t) is periodic with period T0, then its values
repeat every T0 seconds, that is, x(t) = x(t + T0) for all t. Note that in that case x(t) is also periodic
with period kT0 for any positive integer k, since x(t) = x(t + kT0) for all t. For instance,
the complex exponential x(t) = A e^{j(2πf0 t+θ)} is periodic with period T0 = 1/f0, as are its real
and imaginary parts; note that the magnitude and phase of a periodic signal are not necessarily
periodic.

1.1.3 Symmetry properties


Some signals exhibit certain symmetries. We say that a real-valued signal x(t) has even
symmetry if x(−t) = x(t) for all t, and that it has odd symmetry if x(−t) = −x(t) for all t. In
the complex case, x(t) is said to have Hermitian symmetry if x(−t) = x∗ (t), and anti-Hermitian
symmetry if x(−t) = −x∗ (t).

1.1.4 Some important signals
The following are some important non-periodic signals:

Unit step:           u(t) = 1 for t > 0,   1/2 for t = 0,   0 for t < 0
Sinc function:       sinc(t) = sin(πt) / (πt)
Rectangular pulse:   rect(t) = 1 for |t| < 1/2,   1/2 for |t| = 1/2,   0 for |t| > 1/2
Triangular pulse:    tri(t) = 1 + t for −1 ≤ t ≤ 0,   1 − t for 0 ≤ t ≤ 1,   0 elsewhere

Another important function for us is the Dirac delta function, or unit impulse. It can be thought
of as the limiting case of a rectangular pulse with duration ε and unit area as ε → 0. The most
important properties of the unit impulse are:

δ(t) = 0 for all t ≠ 0

∫_a^b δ(t − t0) dt = 1 if a < t0 < b, and 0 otherwise

x(t)δ(t − t0) = x(t0)δ(t − t0)

δ(at) = (1/|a|) δ(t) for all a ≠ 0

From these properties, it follows that


∫_{−∞}^{∞} x(t)δ(t − t0) dt = x(t0)        (1.1)

that is, multiplying the signal x(t) by a unit impulse located at t = t0 and computing the integral
of the result over the whole time axis, we precisely obtain the value of the signal x(t) at t = t0 .
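As an illustration of the sifting property (1.1), the following sketch (assuming NumPy is available) replaces the impulse by a narrow rectangular pulse of width ε and unit area and checks that the integral approaches x(t0); the test signal and the value of t0 are arbitrary.

    import numpy as np

    def sift(x, t0, eps=1e-3, dt=1e-5):
        """Approximate ∫ x(t) δ(t − t0) dt using a rect of width eps and unit area."""
        t = np.arange(t0 - 1.0, t0 + 1.0, dt)
        delta_approx = np.where(np.abs(t - t0) < eps / 2, 1.0 / eps, 0.0)
        return np.trapz(x(t) * delta_approx, t)

    x = lambda t: np.cos(2 * np.pi * 3.0 * t) + t**2    # arbitrary test signal
    print(sift(x, t0=0.4), x(0.4))                      # the two values should be close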

1.1.5 Basic signal operations. Convolution.


Starting with a signal x(t), the following operations can be defined:

1. Amplitude scaling by a factor α: y(t) = α · x(t).

2. Time shift by t0 seconds: y(t) = x(t − t0 ).

3. Time reversal: y(t) = x(−t).


4. Time axis scaling by a factor b: y(t) = x(t/b).

Time axis scaling is illustrated in Figure 1.1. If the scaling factor satisfies b ∈ (0, 1), then x(t/b)
is a "compressed" version of x(t). On the other hand, if b > 1, then x(t/b) is a "stretched" version of
x(t). In the particular case in which x(t) is a sinusoid of frequency f0, then x(t/b) is also a sinusoid,
but with frequency f0/b.

Figure 1.1: Graphical illustration of time axis scaling by a factor b > 0.

Very often we find combinations of the above operations. Perhaps the one causing the most
confusion is the combination of time shift and time scaling: y(t) = x((t − t0)/b). The right way to
obtain y(t) from x(t) is to first compute the time-scaled signal z(t) = x(t/b) and then apply the
time shift: y(t) = z(t − t0). See Figure 1.2.
It is also possible to obtain a signal y(t) in terms of two signals x1 (t), x2 (t) according to the
following operations:

1. Linear combination: y(t) = α · x1 (t) + β · x2 (t).

2. Product: y(t) = x1 (t) · x2 (t).

3. Convolution:

   y(t) = x1(t) ∗ x2(t) = ∫_{−∞}^{∞} x1(τ) x2(t − τ) dτ        (1.2)

This last operation is of great importance in the analysis of linear time-invariant (LTI) systems,
so the importance of understanding it correctly cannot be overstated. To obtain the value of the
signal y(t) at a time instant t = t0, expression (1.2) says that:

1. First we must take the first of the signals to convolve, seen as a function of the variable τ :
x1 (τ ).

2. Then we must take the second signal x2(τ), apply a time reversal to obtain x2(−τ), and
shift the resulting signal by t0. In this way we obtain x2(t0 − τ).

3. These two signals, x1 (τ ) and x2 (t0 − τ ), are multiplied to obtain another one, which we may
call st0 (τ ).

4. Integrating st0 (τ ) over −∞ < τ < ∞, the value y(t0 ) is finally obtained.

The signal y(t) is the one obtained by applying the above procedure to each value of t (a short
numerical sketch is given at the end of this subsection). For a graphical depiction of this process,
see Figure 1.3. This figure also illustrates an important property of convolution: if x1(τ) and
x2(t0 − τ) do not overlap, then their product st0(τ) is zero for all τ, and therefore its integral over
the real line will be y(t0) = 0. Therefore:

Figure 1.2: Combining time axis scaling and time shifting (0 < b < 1, t0 > 0).

• If x1(t) = 0 outside the interval [a, b], and x2(t) = 0 outside the interval [c, d], then y(t) =
  x1(t) ∗ x2(t) is zero outside the interval [a + c, b + d].

Other important properties of convolution are:

• Commutativity: x1(t) ∗ x2(t) = x2(t) ∗ x1(t).

• Linearity:
  [a x1(t) + b x2(t)] ∗ x3(t) = a [x1(t) ∗ x3(t)] + b [x2(t) ∗ x3(t)].

• When one of the two signals to convolve is an impulse, say x1(t) = δ(t − α), then applying
  property (1.1) the result is seen to be

  y(t) = δ(t − α) ∗ x2(t) = ∫_{−∞}^{∞} δ(τ − α) x2(t − τ) dτ = x2(t − α),

  or in words, convolving a signal with an impulse located at t = α amounts to applying a time
  shift of α seconds to said signal.
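The following sketch (assuming NumPy is available) approximates the convolution of two rectangular pulses by a discrete sum and checks the support property stated above; the pulse widths and grid step are arbitrary.

    import numpy as np

    dt = 0.01
    t = np.arange(-3.0, 3.0, dt)
    x1 = np.where(np.abs(t) < 0.5, 1.0, 0.0)        # rect(t),   zero outside [-1/2, 1/2]
    x2 = np.where(np.abs(t) < 1.0, 1.0, 0.0)        # rect(t/2), zero outside [-1, 1]

    # Discrete approximation of the convolution integral (1.2): sum times dt
    y = np.convolve(x1, x2, mode="same") * dt

    # Support property: y(t) must vanish outside [-1/2 - 1, 1/2 + 1] = [-1.5, 1.5]
    assert np.allclose(y[np.abs(t) > 1.6], 0.0)
    print(y[np.argmin(np.abs(t))])                  # y(0) ≈ 1 (the overlap at t = 0)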

1.2 The Fourier Transform


1.2.1 Definition and properties
The Fourier Transform (FT) of a signal x(t) is defined as
X(f) = ∫_{−∞}^{∞} x(t) e^{−j2πf t} dt        (1.3)

Figure 1.3: Graphical illustration of the convolution of two signals.

The original signal can be recovered from X(f) by means of the Inverse Fourier Transform:

x(t) = ∫_{−∞}^{∞} X(f) e^{j2πf t} df        (1.4)

Sometimes the FT is expressed as a function of the angular frequency ω (whose units are rad/s)
instead of f (whose units are Hz), in which case one has
X′(jω) = ∫_{−∞}^{∞} x(t) e^{−jωt} dt,        x(t) = (1/2π) ∫_{−∞}^{∞} X′(jω) e^{jωt} dω

Note that this amounts to a mere change of variables (ω = 2πf), so that X(f) = X′(j2πf). We
prefer to work with f in Hz, because the physical meaning of this variable is arguably clearer.
In addition, in this way we avoid dragging constant factors such as 2π or 1/(2π) through many equations.
Some observations concerning the FT are:

• X(f ) takes in general complex values, even if x(t) is real-valued. We often refer to X(f ) as
the spectrum of x(t) (plural spectra).

• Expression (1.4) indicates that x(t) can be understood as the superposition of (infinitely
  many) complex exponentials e^{j2πf t}, each of them weighted by a factor that is precisely X(f).
  That is why we refer to |X(f0)| as the spectral component at frequency f = f0: it is the
  weight that the complex exponential with frequency f0 carries when synthesizing x(t). Note that
  X(f0) e^{j2πf0 t} = |X(f0)| e^{j(2πf0 t + ∠X(f0))}.

• There exist signals whose FT is not straightforward to compute via (1.3), as well as
  signals whose FT does not exist. Next we will review the definition of the FT for periodic signals.
  In general, non-periodic signals with no FT will be of no interest to us.

A list of the most commonly encountered FTs is given in Table 1.1, whereas the most useful FT properties
are listed in Table 1.2 (a numerical check of one of these pairs is sketched right after the tables).

Time domain                              Frequency domain

δ(t)                                     1
1                                        δ(f)
δ(t − t0)                                e^{−j2πf t0}
A cos(2πf0 t + θ)                        (A/2) [e^{jθ} δ(f − f0) + e^{−jθ} δ(f + f0)]
A sin(2πf0 t + θ)                        (A/2j) [e^{jθ} δ(f − f0) − e^{−jθ} δ(f + f0)]
A e^{j(2πf0 t + θ)}                      A e^{jθ} δ(f − f0)
rect(t)                                  sinc(f)
sinc(t)                                  rect(f)
tri(t)                                   sinc²(f)
sinc²(t)                                 tri(f)
Σ_{n=−∞}^{∞} δ(t − nT0)                  (1/T0) Σ_{k=−∞}^{∞} δ(f − k/T0)

Table 1.1: Commonly used Fourier Transforms.

Property                 Expression

Symmetry                 x(t) real-valued ↔ X(f) = X∗(−f)
Linearity                a x(t) + b y(t) ↔ a X(f) + b Y(f)
Duality                  X(f) = FT{x(t)} ↔ x(f) = FT{X(−t)}
Time shift               x(t − t0) ↔ X(f) e^{−j2πf t0}
Scaling                  x(at) ↔ (1/|a|) X(f/a)
Convolution              x(t) ∗ y(t) ↔ X(f) Y(f)
Product                  x(t) y(t) ↔ X(f) ∗ Y(f)
Modulation               x(t) e^{j2πf0 t} ↔ X(f − f0)
Differentiation          y(t) = dx(t)/dt ↔ Y(f) = j2πf X(f)
Parseval's relation      ∫_{−∞}^{∞} x(t) y∗(t) dt = ∫_{−∞}^{∞} X(f) Y∗(f) df
Rayleigh's theorem       ∫_{−∞}^{∞} |x(t)|² dt = ∫_{−∞}^{∞} |X(f)|² df

Table 1.2: Fourier Transform properties.
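As a sanity check of one entry of Table 1.1, the sketch below (assuming NumPy is available) approximates the integral (1.3) for x(t) = rect(t) by a Riemann sum and compares the result with sinc(f); the grid resolution and frequency range are arbitrary.

    import numpy as np

    dt = 1e-3
    t = np.arange(-2.0, 2.0, dt)
    x = np.where(np.abs(t) < 0.5, 1.0, 0.0)         # rect(t)

    f = np.linspace(-5.0, 5.0, 201)
    # Riemann-sum approximation of X(f) = ∫ x(t) exp(-j 2π f t) dt
    X = np.array([np.sum(x * np.exp(-1j * 2 * np.pi * fk * t)) * dt for fk in f])

    expected = np.sinc(f)                           # np.sinc(f) = sin(πf)/(πf)
    print(np.max(np.abs(X - expected)))             # small, on the order of dt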

1.2.2 Fourier Transform of periodic signals
If x(t) is periodic with period T0, then its FT consists of impulses in the frequency domain. This
is because, to begin with, x(t) admits a Fourier series representation:

x(t) = x(t + kT0)  ∀t, ∀k ∈ ℤ   ⇒   x(t) = Σ_{n=−∞}^{∞} xn e^{j(2πn/T0) t}        (1.5)

where the coefficients xn are obtained in the following way (the value of α is irrelevant):

xn = (1/T0) ∫_α^{α+T0} x(t) e^{−j(2πn/T0) t} dt

Therefore, taking transforms in (1.5) we obtain

X(f) = Σ_{n=−∞}^{∞} xn δ(f − n/T0)

which shows that the FT of a periodic signal consists of a series of impulses (delta functions) located
at multiples of the fundamental frequency. The weights of these impulses are just the coefficients
of the Fourier series representation of the periodic signal.
The Parseval relation and Rayleigh's theorem from Table 1.2 have their equivalents for periodic
signals: if x(t), y(t) are both periodic with period T0, then for any α ∈ ℝ it holds that

(1/T0) ∫_α^{α+T0} x(t) y∗(t) dt = Σ_{n=−∞}^{∞} xn yn∗,        (1/T0) ∫_α^{α+T0} |x(t)|² dt = Σ_{n=−∞}^{∞} |xn|²,

where {xn} and {yn} are the corresponding coefficients of their Fourier series.
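As an illustration, the sketch below (assuming NumPy is available) computes the Fourier series coefficients of a periodic rectangular pulse train by numerical integration and checks the periodic form of Rayleigh's theorem; the period, pulse width and truncation order are arbitrary.

    import numpy as np

    T0, width = 2.0, 0.5                  # arbitrary period and pulse width
    dt = 1e-4
    t = np.arange(0.0, T0, dt)            # one period (α = 0)
    x = np.where(t < width, 1.0, 0.0)     # rectangular pulse train over one period

    def coeff(n):
        """x_n = (1/T0) ∫_0^{T0} x(t) exp(-j 2π n t / T0) dt, by a Riemann sum."""
        return np.sum(x * np.exp(-1j * 2 * np.pi * n * t / T0)) * dt / T0

    N = 200                                # truncation order for the check
    xn = np.array([coeff(n) for n in range(-N, N + 1)])

    power_time = np.sum(np.abs(x) ** 2) * dt / T0    # (1/T0) ∫ |x(t)|² dt
    power_freq = np.sum(np.abs(xn) ** 2)             # Σ |x_n|² (truncated)
    print(power_time, power_freq)                    # the two should nearly agree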

1.3 Linear time-invariant (LTI) systems


1.3.1 Impulse response and transfer function
A system is an operator or device which maps signals at its input onto signals at its output. A
system is linear if it satisfies the following property. Suppose that y1 (t) is the output of the system
when the input is x1 (t), and that y2 (t) is the output when the input is x2 (t). Then the system is
linear if for any pair of signals x1 (t), x2 (t) and any pair of scalars a1 , a2 , application of the signal
a1 x1 (t) + a2 x2 (t) at its input produces the signal a1 y1 (t) + a2 y2 (t) at its output.
A system is time invariant if, for any signal x(t) and any t0 ∈ R, its response to the input
x(t − t0 ) is y(t − t0 ), where y(t) is the response to x(t).
LTI systems are those which are simultaneously linear and time invariant. For LTI systems,
the output can be obtained as the convolution of the input with a certain function h(t) known as the
impulse response:

y(t) = x(t) ∗ h(t) = ∫_{−∞}^{∞} h(τ) x(t − τ) dτ        (1.6)
The term impulse response is due to the fact that h(t) is the output of the system when its input
is a unit impulse x(t) = δ(t), since in that case y(t) = δ(t) ∗ h(t) = h(t).
Using the convolution property of the FT, the relation between the FTs of the input and output
signals to an LTI system is found:
Y (f ) = X(f )H(f ) (1.7)

where H(f ) is the FT of the impulse response h(t), known as the transfer function of the LTI
system. We can see that the FT provides an extremely useful tool for the analysis of LTI systems,
since it allows us to picture their operation in the frequency domain in a very simple way (it is just
a product).
Causality is a necessary condition for the physical implementation of a system (whether it is
LTI or not). Causality means that the output cannot anticipate the input, or in other words, the
value of its output at time t0 can only depend on the values of the input for t ≤ t0 . An LTI system
is causal if and only if its impulse response h(t) satisfies h(t) = 0 for t < 0.
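As a numerical illustration of (1.6) and (1.7), the sketch below (assuming NumPy is available) convolves a test signal with an arbitrary causal impulse response and checks that the DFT of the output, used here as a discrete stand-in for the FT, equals the product of the DFTs of the input and of the impulse response.

    import numpy as np

    dt = 1e-3
    t = np.arange(0.0, 1.0, dt)
    x = np.cos(2 * np.pi * 5.0 * t)                 # arbitrary input signal
    h = np.exp(-t / 0.05) * dt                      # arbitrary causal impulse response,
                                                    # scaled by dt so the sum mimics the integral

    # Time domain: y(t) = x(t) ∗ h(t), approximated by a discrete convolution
    y = np.convolve(x, h, mode="full")

    # Frequency domain: Y = X · H, with all DFTs zero-padded to the same length
    n = len(y)
    Y_direct = np.fft.fft(y)
    Y_product = np.fft.fft(x, n) * np.fft.fft(h, n)
    print(np.max(np.abs(Y_direct - Y_product)))     # ~ machine precision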

1.3.2 Filtering
From (1.7), two important observations about LTI systems can be made:

1. On the one hand, an LTI system cannot create new spectral content: If X(f0 ) = 0 for some
frequency f0 , then the spectrum of the output will satisfy Y (f0 ) = 0 as well, independently
of the value of H(f0 ).

2. On the other hand, an LTI system can remove spectral content: if the transfer function is
such that H(f0 ) = 0 for some frequency f0 , then the output spectrum will satisfy Y (f0 ) = 0,
regardless of the value of X(f0 ).

It is because of this frequency domain interpretation that we often refer to certain LTI systems as
filters, since they only let certain spectral components of the input pass through, eliminating (or
at least attenuating) the remaining ones. For instance:

• Ideal lowpass filter:

  H(f) = rect(f/(2B)) = 1 for |f| ≤ B,   0 for |f| > B.
This filter only passes spectral components below the cutoff frequency B. In practice it is not
possible to implement a filter with such a sharp cutoff at B, and we must settle for something
like

             1,   |f| ≤ B1          (passband)
  |H(f)| ≈   ?,   B1 < |f| ≤ B2     (transition band)        (1.8)
             0,   |f| > B2          (stopband)

This transfer function deviates from the ideal lowpass filter in four aspects:

1. The magnitude in the passband is only approximately constant.


2. The drop from the passband to the stopband is smooth, from f = B1 to f = B2
(transition or roll-off band).
3. The spectral components of the input for |f | > B2 are not completely eliminated, but
only attenuated.
4. Expression (1.8) only provides conditions on the magnitude of H(f ), without specifying
its phase. In practice, it is important that this phase be approximately a linear function
of frequency, at least within the passband, to avoid distortion in the spectral components
of the input within the passband.

Similar considerations can be made regarding the practical implementation of the following
types of filter (a practical lowpass design is sketched numerically after the list):

• Ideal highpass filter:

  H(f) = 0 for |f| ≤ B,   1 for |f| > B.

• Ideal bandpass filter:

  H(f) = 0 for |f| ≤ W1,   1 for W1 < |f| ≤ W2,   0 for |f| > W2.

• Ideal bandstop filter:

  H(f) = 1 for |f| ≤ W1,   0 for W1 < |f| ≤ W2,   1 for |f| > W2.
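As an illustration of the practical (non-ideal) behaviour described by (1.8), the sketch below (assuming NumPy and SciPy are available) designs an analog Butterworth lowpass filter and inspects its magnitude in the passband and stopband; the filter order and cutoff frequency are arbitrary choices, not a prescription.

    import numpy as np
    from scipy import signal

    B = 1000.0                                       # arbitrary cutoff frequency in Hz
    # 5th-order analog Butterworth lowpass with cutoff at 2πB rad/s
    b, a = signal.butter(5, 2 * np.pi * B, btype="low", analog=True)

    f = np.logspace(1, 5, 500)                       # 10 Hz to 100 kHz
    w, H = signal.freqs(b, a, worN=2 * np.pi * f)    # frequency response H(j2πf)
    mag = np.abs(H)

    print(mag[f <= 0.5 * B].min())                   # ≈ 1 well inside the passband
    print(mag[f >= 10 * B].max())                    # small but nonzero in the stopband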

1.4 Problems (Analog signals & systems)


1. Let x(t) = A1 ej(2πf1 t+θ1 ) + A2 ej(2πf2 t+θ2 ) . Under which conditions is x(t) periodic? In that
case, what is its period?

2. If x(t) has Hermitian symmetry, what can be said about the symmetry properties of its real
and imaginary parts? What about its magnitude and phase?
3. Find an expression for the signal x(t) = ∫_{−∞}^{t} δ(τ) dτ.

4. Represent graphically the following signal

   x(t) = 1 − t for 0 < t ≤ 1,   0 otherwise,

   as well as the signals αx(t), x(−t), x(t − t0), x(t/2), x(2t), x((t − t0)/3), x(3t − t0).

5. Consider the signal y(t) = x1(t) ∗ x2(t), where x1(t) = rect(t), x2(t) = rect(t/T).

(a) For which values of t is it certain that y(t) = 0?


(b) Show that y(0) = min{1, T }.
(c) Sketch y(t).

6. Show that convolution is commutative: x1(t) ∗ x2(t) = x2(t) ∗ x1(t).

7. Prove the properties of the FT in Table 1.2.

8. Find the FT of the signal y(t) = x(t) cos(2πf0 t + θ).

9. Find the FT of the signal y(t) = x(t) ∗ sinc(Bt).

10. Find the signal x(t) whose FT is given by X(f ) = A [rect(T (f − f0 )) + rect(T (f + f0 ))].

11. Find the FT of the signal x(t) = sign{cos(2πf0 t)}.

12. Consider the system producing at its output a time-shifted version of its input, that is,
y(t) = x(t − t0 ). Under which conditions is this system causal? Is this an LTI system? If so,
obtain its impulse response and transfer function.

13. For the LTI system of the previous problem, obtain the magnitude and phase of its transfer
function. Conclude that this system is an all-pass filter.

14. Assume that the transfer function of a certain LTI system satisfies |H(f)| = 1 for all f. Can
    we conclude that the output of the system will always be identical to the input?

15. Consider the system whose output is the derivative of the input, that is, y(t) = dx(t)/dt. Is this
    an LTI system? If so, obtain its transfer function.

16. Given two LTI systems with impulse responses h1(t), h2(t), consider their series interconnection.
    Is the global system so obtained LTI? If so, what is its impulse response? And its
    transfer function?

17. Repeat the previous problem for the parallel interconnection of two LTI systems.

18. Show that if the input to an LTI system is periodic, then the output is periodic as well. Is
the fundamental period of the output necessarily equal to that of the input?

19. Find the output of an ideal lowpass filter with cutoff frequency B when the input is x(t) =
    sign{cos(2πf0 t)}, assuming (i) B = f0/2, (ii) B = 3f0/2.

20. Consider a system that produces at its output a time-reversed version of the input, that is,
y(t) = x(−t). Is this system linear? Is it time invariant? Is it causal?

21. The output of a system is given by y(t) = x(t) + ax(t − η), where x(t) denotes the input,
and a, η are real numbers. Show that this is an LTI system. Find its impulse response and
transfer function. Find the magnitude of the transfer function. Under which conditions on
a, η is this system causal?

22. Show that if x(t) is periodic with period T0 and real-valued, then the coefficients of its Fourier
series representation satisfy x−n = x∗n for all n.

Answers.

1. For x(t) to be periodic, f2/f1 must be a rational number. In that case, let f1/f2 = n1/n2, where the
   fraction is irreducible, i.e., n1, n2 have no common divisors other than 1. Then the period is
   T0 = n1/f1 = n2/f2.

2. Both the real part and the magnitude of x(t) have even symmetry, whereas the imaginary
part and the phase have odd symmetry.

3. x(t) = u(t), the unit step function.


8. Y(f) = (e^{jθ}/2) X(f − f0) + (e^{−jθ}/2) X(f + f0).

9. Y(f) = (1/|B|) X(f) for −B/2 < f < B/2, and zero otherwise.

10. x(t) = (2A/T) sinc(t/T) cos(2πf0 t).

11. X(f) = −Σ_{n≠0} (j^{n+1}/(πn)) (1 + (−1)^{n+1}) δ(f − nf0).

12. The system is causal if and only if t0 ≥ 0. The system is LTI for all values of t0 . The impulse
response is h(t) = δ(t − t0 ), and the transfer function is H(f ) = e−j2πf t0 .

13. |H(f )| = 1 for all f . The phase is ∠H(f ) = −2πf t0 .

14. No.

15. It is an LTI system, with transfer function H(f ) = j2πf .

16. It is an LTI system, with impulse response h(t) = h1 (t) ? h2 (t) and transfer function H(f ) =
H1 (f )H2 (f ).

17. It is an LTI system, with impulse response h(t) = h1 (t) + h2 (t) and transfer function H(f ) =
H1 (f ) + H2 (f ).

18. Not necessarily. It can be an integer multiple.


19. (i) y(t) = 0 for all t; (ii) y(t) = (4/π) cos(2πf0 t).

20. It is linear, but not time invariant. It is not causal.

21. h(t) = δ(t) + a δ(t − η), H(f) = 1 + a e^{−j2πf η}, |H(f)| = √(1 + a² + 2a cos(2πf η)). The system
    is causal for all a as long as η ≥ 0.

Chapter 2

Probability & Statistics

2.1 Probability
2.1.1 Definition and properties
Based on a random experiment, the sample space (Ω) is the set of all possible outcomes. An event
is a subset of Ω. An event happens or takes place if the outcome of the experiment belongs in the
event.
Since events and sets are analogous concepts, it makes sense to define the inclusion, union and
intersection of events, as well as the complement of an event. Two events are disjoint if they cannot
happen simultaneously (i.e., their intersection is the empty set, also known as the impossible event
in this context).
Recall the frequentist interpretation of the concept of probability: if we repeat the experiment
n times, out of which the event A has happened nA times, then the probability of A is defined as
P(A) = lim_{n→∞} nA / n

Also recall the axioms to be satisfied by any probability measure:

1. P(A) ≥ 0 for any event A.

2. P(Ω) = 1, that is, the probability of the sure event is 1.

3. The probability of the union of an arbitrary number of disjoint events is the sum of their
individual probabilities.

From these axioms a number of properties follow:

• P(∅) = 0, that is, the probability of the impossible event is zero.

• P(Ā) = 1 − P(A), where Ā is the complement of A (so that A ∪ Ā = Ω and A ∩ Ā = ∅).

• P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

• If B ⊂ A, then P(B) ≤ P(A).

• For any event A it holds that 0 ≤ P(A) ≤ 1.

It is common to use the notation P(A, B) to denote the probability of the intersection P(A ∩ B).

2.1.2 Conditional probabilities
Given two events A and B, the probability of A conditioned on B (or equivalently, the probability
of A given B) is denoted as P(A|B), and represents the probability of occurrence of event A once
we know that event B has taken place. It is given by

P(A|B) = P(A ∩ B) / P(B) = P(A, B) / P(B)

and satisfies the following:

• P(A|Ω) = P(A).

• If A and B are disjoint, then P(A|B) = 0.

• If A ⊂ B, then P(A|B) ≥ P(A).

• If B ⊂ A, then P(A|B) = 1.

A partition of Ω is a collection of mutually disjoint events, all with strictly positive probability, and
such that their union is the sure event Ω. Then one has:

• Total Probability Theorem: If {A1, A2, · · · } is a partition of Ω, then

  P(B) = Σ_i P(B|Ai) P(Ai)   for any event B.

• Bayes Theorem: for any two events A, B, it holds that

  P(A|B) = P(B|A) · P(A) / P(B).

  If {A1, A2, · · · } is a partition of Ω, this result, together with the Total Probability Theorem,
  yields the following:

  P(Ai|B) = P(B|Ai) · P(Ai) / Σ_j P(B|Aj) P(Aj).
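A minimal numerical sketch of these two results follows (assuming NumPy is available); the prior and conditional probabilities are arbitrary illustrative values.

    import numpy as np

    P_A = np.array([0.5, 0.3, 0.2])                  # P(A_i) for a partition of Ω
    P_B_given_A = np.array([0.1, 0.2, 0.4])          # P(B | A_i)

    # Total Probability Theorem: P(B) = Σ_i P(B|A_i) P(A_i)
    P_B = np.sum(P_B_given_A * P_A)

    # Bayes Theorem: P(A_i | B) = P(B|A_i) P(A_i) / P(B)
    P_A_given_B = P_B_given_A * P_A / P_B

    print(P_B)                                       # 0.19
    print(P_A_given_B)                               # posterior probabilities
    assert np.isclose(P_A_given_B.sum(), 1.0)        # they form a valid distribution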

2.1.3 Independence
Two events A and B are said to be independent if knowing whether or not one of them has
happened does not affect the probability of the other. That is to say,

A, B independent   ⇔   P(A|B) = P(A) and P(B|A) = P(B)   ⇔   P(A ∩ B) = P(A) P(B).

If A and B are independent, then their complements Ā and B̄ are independent as well.

2.2 Random variables


We say that X is a random variable (RV) when it is associated with a non-deterministic experiment
and the particular value that X will take, which depends on the outcome of that experiment, is
unknown to us (even though we may know the set, or range, of possible values that X may take).

• X is said to be a discrete RV if its range is finite, or countably infinite.

• X is said to be continuous if its range is not countable and all events of the type {X = x}
have zero probability.
An RV can take real or complex values.

2.2.1 Distribution and density functions


The Cumulative Distribution Function (CDF) of a real-valued RV X, denoted as FX (x) or just
F (x), is the accumulated probability from minus infinity up to the value x:
FX (x) = P(X ≤ x).
The Cumulative Distribution Function has the following properties:
• It is nonnegative: FX (x) ≥ 0 for all x.

• FX (−∞) = 0, whereas FX (+∞) = 1.

• FX (x) is nondecreasing: if x1 ≤ x2 , then FX (x1 ) ≤ FX (x2 ).

• If x1 < x2 , then the probability of X taking some value in between them is given by P(x1 <
X ≤ x2 ) = FX (x2 ) − FX (x1 ).
The Complementary Cumulative Distribution Function (CCDF) of an RV X is defined simply as
1 − FX (x) = P(X > x).
The probability density function (pdf) of an RV X is the derivative of the Cumulative Distribution
Function:

fX(x) = dFX(x) / dx.
The probability density function has the following properties:
• fX (x) ≥ 0, since FX (x) is nondecreasing.

• The total area under the graph is one:

  ∫_{−∞}^{∞} fX(x) dx = 1.

• The CDF can be obtained by integrating the pdf:

  FX(x) = ∫_{−∞}^{x} fX(u) du

• If x1 < x2 , then the probability of X taking some value between them is the area under the
graph in that interval:
  P(x1 < X ≤ x2) = FX(x2) − FX(x1) = ∫_{x1}^{x2} fX(u) du.

It is this property that gives the interpretation of fX (x) as a probability density.

• For a discrete RV taking values in the set {xi }, the pdf consists of a series of Dirac delta
functions located at the elements of such set, with amplitudes given by the corresponding
probabilities:
  fX(x) = Σ_i P(X = xi) · δ(x − xi)
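As a numerical illustration of these properties (assuming NumPy is available), the sketch below checks, for an exponential pdf, that the total area is one and that P(x1 < X ≤ x2) computed from the CDF equals the area under the pdf; the rate parameter and the interval are arbitrary.

    import numpy as np

    lam = 2.0                                   # arbitrary rate parameter
    x = np.arange(0.0, 20.0, 1e-4)
    pdf = lam * np.exp(-lam * x)                # f_X(x) = λ e^{-λx}, x ≥ 0

    print(np.trapz(pdf, x))                     # total area under the pdf ≈ 1

    x1, x2 = 0.5, 1.5
    mask = (x > x1) & (x <= x2)
    area = np.trapz(pdf[mask], x[mask])         # ∫_{x1}^{x2} f_X(u) du
    cdf = lambda v: 1.0 - np.exp(-lam * v)      # F_X(v) in closed form
    print(area, cdf(x2) - cdf(x1))              # the two should nearly agree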

2.2.2 The Gaussian distribution
The Gaussian (or normal) distribution constitutes a probabilistic model which is well suited to
a large number of variables found in practice; in our case, it is of particular interest because it
adequately models the noise in communication systems.
The Gaussian pdf is bell-shaped, and is characterized by two parameters µ and σ. It is symmetric
with respect to µ, that is, for all x one has fX (µ − x) = fX (µ + x). The width of the bell is
proportional to the parameter σ. It is common to use the notation X ∼ N(µ, σ) to denote that X
follows a Gaussian distribution with parameters µ and σ. The expression of the pdf is
fX(x) = (1 / (σ√(2π))) e^{−(x−µ)² / (2σ²)}.
The values of the CDF have to be numerically computed. In practice, the CDF values of a standard-
ized normal RV are tabulated, and one can resort to these tables to find the CDF for any normal
RV. This means that the normal random variable X ∼ N(µ, σ) is transformed into a standard
normal RV Z ∼ N(0, 1) by means of the following change of variables:
Z = (X − µ) / σ   ⇒   fZ(z) = (1/√(2π)) e^{−z²/2}
It is common to use the Q function, defined as the CCDF of a standardized normal RV:
Q(z) = (1/√(2π)) ∫_z^{∞} e^{−u²/2} du = P(Z ≥ z) = 1 − FZ(z)
In this way, to compute the probability P(x1 < X ≤ x2) when X ∼ N(µ, σ), we note that

P(x1 < X ≤ x2) = P( (x1 − µ)/σ < (X − µ)/σ ≤ (x2 − µ)/σ )

Therefore, setting z1 = (x1 − µ)/σ and z2 = (x2 − µ)/σ, one has

P(x1 < X ≤ x2) = P(z1 < Z ≤ z2) = (1/√(2π)) ∫_{z1}^{z2} e^{−u²/2} du
               = Q(z1) − Q(z2)
               = Q((x1 − µ)/σ) − Q((x2 − µ)/σ).
Observe that Q(0) = 1/2 and Q(−z) = 1 − Q(z).
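A minimal sketch of these computations follows (assuming NumPy and SciPy are available); Q(z) is written in terms of the complementary error function, and the parameter values are arbitrary.

    import numpy as np
    from scipy.special import erfc

    def Q(z):
        """Q(z) = P(Z >= z) for Z ~ N(0, 1), via the complementary error function."""
        return 0.5 * erfc(z / np.sqrt(2.0))

    mu, sigma = 1.0, 2.0                        # arbitrary N(µ, σ) parameters
    x1, x2 = 0.0, 3.0

    # P(x1 < X <= x2) = Q((x1 - µ)/σ) - Q((x2 - µ)/σ)
    print(Q((x1 - mu) / sigma) - Q((x2 - mu) / sigma))

    print(Q(0.0))                               # 0.5
    print(1 - 2 * Q(1.0))                       # ≈ 0.6827
    assert np.isclose(Q(-1.3), 1 - Q(1.3))      # Q(-z) = 1 - Q(z)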

2.2.3 Characteristics of a Random Variable


The expected value (or mean) of an RV X is defined in general as
E{X} = ∫_{−∞}^{∞} x · fX(x) dx,

which, in the case of a discrete RV X reduces to


E{X} = Σ_i xi · P(X = xi)

The expected value has the following properties:

• If the pdf is symmetric about some value a, that is, if it holds that fX (x + a) = fX (x − a)
for all x, then E{X} = a.

• The expected value does not necessarily belong in the range of the RV.

• If the RV Y is obtained as a transformation of X, that is, Y = g(X), then

  E{Y} = ∫_{−∞}^{∞} y · fY(y) dy = ∫_{−∞}^{∞} g(x) · fX(x) dx

  which allows us to compute the mean of Y without having to obtain its pdf fY(y) first.

• The expectation operator is linear: for a, b deterministic constants, and X, Y random vari-
ables, one has E{aX + bY } = aE{X} + bE{Y }.

The variance of an RV X with E{X} = µ is defined in general as

Var{X} = E{(X − µ)²} = ∫_{−∞}^{∞} (x − µ)² · fX(x) dx,

which, in the case of a discrete RV X, reduces to

Var{X} = Σ_i (xi − µ)² · P(X = xi)

The square root of the variance is known as the standard deviation.


The following are some properties of the variance:

• Var{X} ≥ 0.

• Var{X} = E{X 2 } − E2 {X}.

• If X is deterministic in the sense that it can only take a given value c with P(X = c) = 1,
then E{X} = c and Var{X} = 0.

• If a, b are deterministic constants, then Var{aX + b} = a2 Var{X}.

For a Gaussian RV X ∼ N(µ, σ) one has E{X} = µ and Var{X} = σ 2 , that is, µ is the mean
and σ 2 is the variance (so that σ is the standard deviation).
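The following sketch (assuming NumPy is available) estimates these characteristics by Monte Carlo simulation for a Gaussian RV, illustrating in particular that E{g(X)} can be computed directly from samples of X without deriving fY(y); the sample size and parameters are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 1.0, 2.0                          # arbitrary N(µ, σ) parameters
    X = rng.normal(mu, sigma, size=1_000_000)     # samples of X ~ N(µ, σ)

    # Sample estimates of E{X} and Var{X}: should approach µ and σ²
    print(X.mean(), X.var())

    # E{g(X)} estimated directly from samples of X, without deriving f_Y(y)
    g = lambda x: x ** 2
    print(np.mean(g(X)))                          # E{X²} = Var{X} + E²{X} = σ² + µ² = 5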

2.3 Random Vectors


2.3.1 Definitions
We say that X = [ X1 X2 · · · Xn ] is an n-dimensional random vector if for each i, Xi is a random
variable. The CDF of the random vector is defined as

FX (x) = P(X1 ≤ x1 , · · · , Xn ≤ xn ) where x = [ x1 · · · xn ],

whereas its pdf is


fX(x) = ∂ⁿFX(x) / (∂x1 · · · ∂xn),

which is non-negative: fX(x) ≥ 0 for all x ∈ ℝⁿ. The integral of the pdf is one:

∫_{ℝⁿ} fX(x) dx = 1.

For any region D ⊂ Rn , the probability of X taking values in D is given by


P(X ∈ D) = ∫_D fX(x) dx.

The pdf of a random vector is also known as the joint pdf. The marginal pdf's are those corresponding
to the 1-dimensional random variables given by the components of the random vector.
From the joint pdf the marginal pdf's can always be obtained, but in general knowing all the
marginal pdf's is not enough to find the joint pdf.
The marginal pdf of Xi is obtained by integrating the joint pdf with respect to the remaining
n − 1 components Xj, j ≠ i.
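As a small numerical illustration (assuming NumPy is available), the sketch below recovers the marginal pdf of X1 from a joint pdf by integrating out X2; the joint density used, two independent standard Gaussian components, is an arbitrary choice.

    import numpy as np

    x1 = np.linspace(-5.0, 5.0, 201)
    x2 = np.linspace(-5.0, 5.0, 2001)
    X1, X2 = np.meshgrid(x1, x2, indexing="ij")

    # Arbitrary joint pdf: two independent standard Gaussian components
    joint = np.exp(-(X1**2 + X2**2) / 2) / (2 * np.pi)

    # Marginal pdf of X1: integrate the joint pdf over x2
    marginal = np.trapz(joint, x2, axis=1)
    expected = np.exp(-x1**2 / 2) / np.sqrt(2 * np.pi)
    print(np.max(np.abs(marginal - expected)))      # should be very small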

2.3.2 Conditioning of Random Variables


Given a random variable X and an event A with nonzero probability, the conditional RV X|A is
defined in terms of its CDF and pdf:

FX|A(x|A) = P(X ≤ x, A) / P(A)   ⇒   fX|A(x|A) = ∂FX|A(x|A) / ∂x.

The conditional expectation is then defined as

E{X|A} = ∫_{−∞}^{∞} x fX|A(x|A) dx

Note that the event A may be defined in terms of X; for example, A = {X > 0}, A = {a < X ≤ b},
etc.
The Total Probability Theorem applies now: if {A1, A2, · · · } is a partition of Ω, then

FX(x) = Σ_i FX|Ai(x|Ai) P(Ai)

fX(x) = Σ_i fX|Ai(x|Ai) P(Ai)

E{X} = Σ_i E{X|Ai} P(Ai)

It is also possible to condition on the occurrence of an event of the type {Y = y}, where Y is a
continuous RV. In this way, we can define the conditional probability of an event A given that Y
has taken the value y as

P(A|Y = y) = P(A) fY(y|A) / fY(y),

which is analogous to the previously seen formulation of Bayes' Theorem.
Analogously, the conditional pdf of X given that Y = y is defined as

f(x|Y = y) = fXY(x, y) / fY(y)

where fXY(x, y) is the joint pdf of (X, Y). Note that f(x|Y = y) is a function of two variables: x
and y. Analogously, one has

f(y|X = x) = fXY(x, y) / fX(x).
This definition of conditional pdf allows us to extend the Total Probability Theorem to the case of
uncountable partitions made up of events of the type {X = x} sweeping over the whole range of
X:

P(A) = ∫_{−∞}^{∞} P(A|X = x) fX(x) dx,        fY(y) = ∫_{−∞}^{∞} f(y|X = x) fX(x) dx.

And analogously for Bayes Theorem:

f(x|A) = P(A|X = x) fX(x) / ∫_{−∞}^{∞} P(A|X = u) fX(u) du,        f(y|X = x) = f(x|Y = y) fY(y) / fX(x).

2.3.3 Independence and covariance


Two random variables X, Y are said to be statistically independent (or just independent) if the
events {X ∈ A}, {Y ∈ B} are independent for any sets A, B. This definition is equivalent to any
of the following conditions for all x, y:

fXY (x, y) = fX (x)fY (y), f (x|Y = y) = fX (x), f (y|X = x) = fY (y).

If X, Y are independent then any function of X is independent of any function of Y . That is, the
random variables U = g(X) and V = h(Y ) are independent for any functions g, h.
Let X = [ X1 X2 · · · Xn ] be an n-dimensional random vector. We say that its components
are independent if
fX (x) = fX1 (x1 )fX2 (x2 ) · · · fXn (xn ).

The covariance of two random variables X, Y is the expected value of their product after
subtracting the means µX = E{X}, µY = E{Y}:

Cov(X, Y) = E{(X − µX)(Y − µY)} = E{XY} − E{X}E{Y}.

If Cov(X, Y ) = 0 we say that X and Y are uncorrelated.


If two random variables X, Y are independent, then they are uncorrelated as well, since in that
case
E{XY} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fXY(x, y) dx dy
      = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fX(x) fY(y) dx dy
      = ( ∫_{−∞}^{∞} x fX(x) dx ) ( ∫_{−∞}^{∞} y fY(y) dy ) = E{X}E{Y}.

The converse is not true in general, with a notable exception: if the random variables X, Y are
jointly Gaussian, then being uncorrelated and being independent are equivalent.
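The following Monte Carlo sketch (assuming NumPy is available) illustrates this point: X uniform on [−1, 1] and Y = X² are clearly dependent, yet their covariance is essentially zero.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=1_000_000)
    Y = X ** 2                                  # a deterministic function of X: clearly dependent

    # Sample covariance: E{XY} - E{X}E{Y} = E{X³} ≈ 0 for this symmetric distribution
    cov = np.mean(X * Y) - np.mean(X) * np.mean(Y)
    print(cov)                                  # ≈ 0: uncorrelated, yet not independent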

2.3.4 Stochastic Processes
A random, or stochastic, process X(t) is a collection of random variables indexed by the variable t
(representing time, for example). In this way,

• When those RV’s take values, we obtain a deterministic function of t, referred to as a realiza-
tion of the stochastic process.

• By fixing t = t0 we obtain a random variable X(t0 ).

• When the RV’s take values and we fix t = t0 , we obtain a realization of the RV X(t0 ), that
is, a number.

Given a stochastic process X(t), the random variable X(t0 ) obtained by fixing t = t0 will have
a corresponding CDF. If we change the time instant we will obtain a different RV, whose CDF
could be different from that of X(t0 ). In this sense, the CDF changes with time, and therefore the
first-order CDF and pdf of the stochastic process X(t) are respectively defined as:
FX(x; t) = P(X(t) ≤ x)   ⇒   fX(x; t) = ∂FX(x; t) / ∂x.
If we fix two time instants, we will have two RV’s of the process, X(t1 ) and X(t2 ). Then the
second-order CDF and pdf are respectively defined as:
FX(x1, x2; t1, t2) = P(X(t1) ≤ x1, X(t2) ≤ x2)   ⇒   fX(x1, x2; t1, t2) = ∂²FX(x1, x2; t1, t2) / (∂x1 ∂x2)
and analogously for order n.

2.3.5 Statistics of a random process


The mean of the random process X(t) is defined as
E{X(t)} = µX(t) = ∫_{−∞}^{∞} x fX(x; t) dx.

For each value of t we have in principle different pdf’s fX (x; t), so that we will also have different
values of the mean. In other words, the mean of a random process is in general a function of time.
The autocorrelation of the process X(t) is defined as
RX(t1, t2) = E{X(t1)X(t2)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 x2 fX(x1, x2; t1, t2) dx1 dx2.

Taking t1 = t2 = t, we obtain the instantaneous power of the process at time t: RX (t, t) = E{X 2 (t)}.
Analogously, for two processes X(t), Y (t) their cross-correlation is defined as
RXY(t1, t2) = E{X(t1)Y(t2)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fXY(x, y; t1, t2) dx dy.

The processes X(t), Y (t) are said to be orthogonal if RXY (t1 , t2 ) = 0 for any t1 , t2 .
The autocovariance (or just covariance) and cross-covariance are respectively given by

CovX (t1 , t2 ) = RX (t1 , t2 ) − µX (t1 )µX (t2 ),


CovXY (t1 , t2 ) = RXY (t1 , t2 ) − µX (t1 )µY (t2 ).

Note that for t1 = t2 = t, the covariance reduces to the instantaneous variance of the process at
time t, that is, CovX (t, t) = Var{X(t)}. The processes X(t), Y (t) are said to be uncorrelated if
CovXY (t1 , t2 ) = 0 for any t1 , t2 .
Two random processes X(t), Y(t) are said to be statistically independent if their joint pdf of
any order can be factored as the product of two marginal pdf's, the first containing terms depending
only on the process X(t) and the other containing terms depending only on Y(t).

2.3.6 Stationarity
Intuitively, we could state that a stochastic process is stationary if its statistical properties do not
change after we apply a time shift. This means that the underlying physical mechanism generating
the process does not change with time.
The definition of strict-sense stationarity is based on the invariance to time shifts of the pdf of
any order. In practice it is more tractable to work with a looser concept of stationarity: wide-sense
(or second-order) stationarity.
A process is said to be wide-sense stationary (WSS) if it satisfies the following two conditions:

• E{X(t)} = µ, that is, the mean of the process is independent of time.

• RX (t1 , t2 ) = RX (τ ) with τ = t2 − t1 , that is, the autocorrelation of the process only depends
on the difference between the time instants considered.

Two processes X(t), Y (t) are said to be jointly wide-sense stationary if each of them is WSS
and in addition RXY (t1 , t2 ) = RXY (τ ) with τ = t2 − t1 , that is, their cross-correlation only depends
on the difference between the time instants considered.
The correlation functions of WSS processes have the following properties:

• The instantaneous power remains constant over time, so we may refer to it simply as the
mean power of the process. This is because E{X 2 (t)} = E{X(t)X(t)} = RX (0).

• Similarly, the instantaneous variance remains constant over time.

• The cross-correlation satisfies RXY (τ ) = RY X (−τ ).

• The autocorrelation has even symmetry: RX (τ ) = RX (−τ ).

If a process X(t) is such that the RV's X(t1), X(t2) are uncorrelated for any t1 ≠ t2, then X(t)
is said to be a white process.
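As an illustration (assuming NumPy is available), the sketch below estimates the mean and autocorrelation of a random-phase sinusoid, an arbitrary illustrative choice of process, by averaging over realizations; the estimated mean is approximately zero for all t and the autocorrelation estimate depends only on the lag, consistent with wide-sense stationarity.

    import numpy as np

    rng = np.random.default_rng(0)
    n_real = 20000                                  # number of realizations (arbitrary)
    t = np.arange(0.0, 2.0, 0.01)                   # time grid

    Theta = rng.uniform(-np.pi, np.pi, size=(n_real, 1))
    X = np.cos(2 * np.pi * t + Theta)               # one realization per row

    mean_est = X.mean(axis=0)                       # estimate of µ_X(t)
    print(np.max(np.abs(mean_est)))                 # ≈ 0 for every t

    # R_X(t1, t2) estimated at the same lag but two different absolute positions
    i1, i2, lag = 10, 60, 10                        # indices into the time grid
    R_a = np.mean(X[:, i1] * X[:, i1 + lag])
    R_b = np.mean(X[:, i2] * X[:, i2 + lag])
    print(R_a, R_b)                                 # both ≈ 0.5·cos(2π·0.1) ≈ 0.40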

2.4 Problems (Probability & Statistics)


1. Consider the experiment "throw two dice (one red, one blue) and add the results". Let S be
   the resulting sum. Assuming both dice are fair, find:

(a) The value of S with largest probability.


(b) The probability of S being equal to 8, if we know that one of the two dice (we don’t
know which) shows 1.

(c) The probability of S being equal to 8, if we know that one of the two dice (we don't know
    which) shows 3.
(d) The probability of S being equal to 8, if we know that the red die shows 3.
(e) The probability of S being equal to 8, if we know that S is not a multiple of 5.
(f) The probability of S being an even integer.
(g) The probability of S being a prime number.
(h) The probability of at least one of the dice showing 2, given that S = 6.
(i) The probability that one of the dice shows an even number and the other shows an odd
number.
(j) The probability of at least one of the dice showing an even number.

2. In Macondo, the car sales market is shared by three makers: Nord (20%), Hinault (30%) and
Masda (50%). The probability of a car needing an important repair during its first year is
0.05, 0.1 and 0.15 respectively for each of the above makers.

(a) What is the probability that a car from Macondo requires an important repair during
its first year?
(b) Col. A. Buendia’s car had to have its starter fixed ten months after he bought it. What
is the probability that the Colonel drives a Nord?

3. An information source produces 0’s and 1’s with probabilities 0.3 and 0.7 respectively, which
are then transmitted through a channel whose error probability (i.e., the probability of flipping
a 0 into a 1, or a 1 into a 0) is 0.2. A single symbol is transmitted.

(a) What is the probability of observing 1 at the channel output?


(b) A 1 is observed at the channel output. What is the probability that the transmitted
symbol is 1?

4. Show that for X ∼ N(0, σ), the probability that X is no larger in absolute value than its
standard deviation is 1 − 2Q(1) (≈ 0.6827).

5. Show that for statistically independent random variables X, Y , it holds that Var(X + Y ) =
Var(X) + Var(Y ).

6. Consider the n-dimensional random vector X = [ X1 X2 · · · Xn ]. Assume that the components
   of X are statistically independent.

(a) Show that any subset of k < n components of X are independent as well. That is, for
xi(1) , xi(2) , . . . , xi(k) with 1 ≤ i(1) < i(2) < · · · < i(k) ≤ n, show that

fXi(1) ···Xi(k) (xi(1) , · · · , xi(k) ) = fXi(1) (xi(1) ) · · · fXi(k) (xi(k) ).

(b) Show that conditioning one of the components of X on a subset of k other components
    does not change the pdf. That is,

    fXj(xj | Xi(1) = xi(1), · · · , Xi(k) = xi(k)) = fXj(xj)

    as long as j ∉ {i(1), . . . , i(k)}.

(c) What happens in the above case if j = i(ℓ) for some ℓ?

7. Consider the stochastic process X(t) = A cos(ωt + Θ), where A is a random variable uni-
formly distributed in [−b, b], Θ is a random variable uniformly distributed in [−π, π], and
ω is deterministic. Further, A and Θ are statistically independent. Determine whether this
process is wide-sense stationary.

8. Consider two random processes X(t), Y (t).

(a) Show that if X(t), Y (t) are jointly WSS, then the process Z(t) = aX(t) + bY (t) is WSS
for any a, b.
(b) Assume X(t), Y (t) are jointly WSS, and in addition each one of them is a white process.
Is it true that Z(t) = aX(t) + bY (t) is also a white process?

Answers.

1. (a) S = 7, with probability 1/6.


(b) 0.
(c) 2/11.
(d) 1/6.
(e) 5/31.
(f) 1/2.
(g) 4/9.
(h) 2/5.
(i) 1/2.
(j) 3/4.

2. (a) 11.5 %.
(b) ≈ 8.7 %.

3. (a) 0.62.
(b) 28/31.

7. It is: the mean is zero (and therefore constant), and the autocorrelation is RX(t1, t2) =
   (b²/6) cos(ω(t2 − t1)), which depends only on t2 − t1.

8. (b) Not necessarily.

