
Random Signals

Basic Probability Theory


Probability theory begins with three basic components:

1. The set of all possible outcomes (elementary events, the sample space), denoted Ω.

2. The collection of sets of outcomes (events), denoted S.

3. A probability measure P.

Specification of the triple (Ω, S, P) defines the probability space, which models a real-world measurement or experimental process.

Example (roll of a die). How many events can occur as the result of one roll?

Ω = {all outcomes of the die roll} = {1, 2, 3, 4, 5, 6},

S = {all possible sets Sk of outcomes} = {{1}, …, {6}, {1, 2}, …, {5, 6}, …, {1, 2, 3, 4, 5, 6}},

P = probability of every set/event Sk.

A single roll makes 32 of these events occur (all events containing the rolled outcome), out of all 64 possible subsets of Ω.

5
n = number of elementary events = 6, k = size of a possible subset Sk.

Number of all events with one selected element:

$$\sum_{k=0}^{n-1}\binom{n-1}{k}=\sum_{k=0}^{5}\binom{5}{k}=1+5+10+10+5+1=32 .$$

Number of all possible subsets Sk (events) of the set Ω, from ∅ up to Ω itself:

$$\sum_{k=0}^{n}\binom{n}{k}=\sum_{k=0}^{6}\binom{6}{k}=1+6+15+20+15+6+1=64 ,$$

where

$$\binom{n}{k}=\frac{n!}{k!\,(n-k)!} .$$

Blaise Pascal (1623 - 1662): Pascal's triangle. Pascal created the concept of expected value.

7
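A minimal brute-force check of these counts, assuming a standard Python interpreter (the choice of the outcome 3 is arbitrary):

```python
from itertools import combinations

outcomes = [1, 2, 3, 4, 5, 6]

# All events = all subsets of the sample space, including the empty set.
events = [set(c) for r in range(len(outcomes) + 1)
          for c in combinations(outcomes, r)]

# Events that "occur" when the roll comes up 3, i.e. subsets containing 3.
occurring = [e for e in events if 3 in e]

print(len(events))     # 64 = 2**6 possible events
print(len(occurring))  # 32 = 2**5 events containing the rolled outcome
```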
An event is a result of some experiment.

In probability theory, an elementary event (also called a simple event) is an event which contains only a single outcome of the sample space.

Probability measures: P(Sk) ≜ Pr{Sk} – the Kolmogorov axioms.

A probability measure must satisfy the following properties:

1. Pr{Sk} ≥ 0 for all Sk ∈ S, where S = {all possible sets Sk of outcomes (events)};
2. Pr{Ω} = 1, where Ω is the set of all possible elementary events;
3. for disjoint sets,

$$\Pr\Bigl\{\bigcup_{k=1}^{K} S_k\Bigr\} = \sum_{k=1}^{K} \Pr\{S_k\} .$$

8
From the Kolmogorov axioms one can deduce other useful rules (they are not themselves axioms!) for calculating probabilities:

1. The probability of the empty set: Pr{∅} = 0.

2. Monotonicity: if A ⊆ B, then Pr{A} ≤ Pr{B}.

3. The numeric bound (limit): 0 ≤ Pr{Sk} ≤ 1 for all Sk ∈ S.

Example, proof of (1): the sets ∅ and ∅ are disjoint, so by axiom 3

$$\Pr\{\varnothing \cup \varnothing\} = \Pr\{\varnothing\} + \Pr\{\varnothing\} = 2\Pr\{\varnothing\},$$

while ∅ ∪ ∅ = ∅ gives Pr{∅ ∪ ∅} = Pr{∅}; hence 2 Pr{∅} = Pr{∅}, so Pr{∅} = 0.

Remember: the converse of each of these statements is not true in general.


9
Random Variable (precisely: real-valued random variable)

A random variable is a mapping from the sample space (event space Ω) to the set of real numbers. In other words, a random variable is a function from the sample space to the real numbers, Ω → R.

For us the outcome will be denoted by ξ, and not by ω, because ω is reserved here as the symbol for frequency in rad/s.

Complex random variables can always be considered as pairs of real random variables: their real and imaginary parts.

10
Random Process and Random Signal

A random process is a collection of time functions, or signals, corresponding to the various outcomes ξ of a random experiment. For each outcome ξ there exists a deterministic function, called a sample function or a realization, x_ξ(t).

At a fixed time instant t_k, the set of values x_ξ(t_k) taken over all outcomes is a random variable, X(t_k, ξ).

Figure: an ensemble of realizations x_1(t), x_2(t), …, x_N(t) plotted versus time; the set X(t, ξ) (the process) consists of sample functions, or realizations, which we treat as deterministic functions of time, while the values at a fixed instant are real numbers.

A random signal x(t) can be any signal from the set of signals X(t, ξ) = {x_1(t), x_2(t), …, x_ξ(t), …}, which is the sample space for the random process X(t, ξ).
11
A random process is not just one signal but an ensemble of signals, as shown schematically in the figure beside, for which the outcome of the probabilistic experiment could be any of the four waveforms indicated. In our model each waveform is deterministic, but the process is probabilistic, or random, because it is not known a priori which waveform will be generated by the probabilistic experiment.

Consequently, before obtaining the result of a probabilistic experiment (a priori), there is uncertainty about which signal will be produced. After the experiment (a posteriori), the result is completely determined.

The following will be synonyms for us:
• random process, or set of functions X(t, ξ),
• random signal x(t).
12
Cumulative Distribution Function

In probability theory and statistics, the cumulative distribution function (CDF), or simply the distribution function, describes the probability that a real-valued random variable or signal x(t) takes a value less than or equal to a given value x.

The distribution function P(x) [or P_x(x)] of the random signal (process) x(t) is given by

$$P_x(x) = \Pr\{x(t) \le x\} . \tag{2.1}$$

Please note that from now on P(x) denotes a distribution function (CDF), not a probability, which is denoted by Pr{ }; x is simply a value of the signal x(t).

13
The CDF has the following properties:

the CDF is a non-decreasing function confined to the interval 0 ≤ P(x) ≤ 1. (2.2)

In the definition (2.1) and the properties (2.2), the "less than or equal to" sign "≤" is a convention; for continuous CDFs it is not universally used, but it is important for discrete distributions.

Remember that for continuous signals x(t):

14
For quantized signals, when x_i ∈ D_X = {x_1, x_2, x_3, …, x_k, …, x_K}, the appropriate probabilities are defined as

$$p_k = \Pr\{x(t) = x_k\},$$

and then

$$P_x(x) = \sum_{k:\; x_k \le x} p_k .$$

15
Probability Density Function – PDF

The PDF is defined as the derivative of the cumulative distribution function:

$$p_x(x) = \frac{d P_x(x)}{dx} .$$

The terms "probability distribution function" or "probability function" are often used (loosely) to denote the probability density function.
16
The PDF has the following properties:

$$p_x(x) \ge 0, \tag{2.6a}$$
$$\int_{-\infty}^{\infty} p_x(x)\, dx = 1, \tag{2.6b}$$
$$P_x(x) = \int_{-\infty}^{x} p_x(\xi)\, d\xi, \tag{2.6c}$$
$$\Pr\{x_1 < x(t) \le x_2\} = P_x(x_2) - P_x(x_1) = \int_{x_1}^{x_2} p_x(x)\, dx . \tag{2.6d}$$

Because of (2.6d), for continuous distributions (and continuous random signals x(t)) the probability of any single point is always zero, being an integral over a single point.
17
Expected Value – Mean Value

In probability theory, the expected value of a random variable is intuitively the long-run average value over many repetitions of the experiment it represents. For example, the expected value of a six-sided die roll is 3.5, because the average of an extremely large number of die rolls is practically always very close to 3.5.

More practically, the expected value of a discrete random variable is the probability-weighted average of all possible values.

The general definition for a continuous random signal x(t), or a random variable defined on a probability space (Ω, S, P), is

$$\mu_x = E[x(t)] = \int_{-\infty}^{\infty} x\, p_x(x)\, dx, \tag{2.7}$$

and for a quantized signal

$$\mu_x = E[x(t)] = \sum_{k} x_k\, p_k . \tag{2.8}$$

18
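A minimal check of the die example with the discrete formula (2.8), assuming only the Python standard library:

```python
from fractions import Fraction

values = range(1, 7)          # faces of a fair six-sided die
p = Fraction(1, 6)            # each face has probability 1/6

mean = sum(x * p for x in values)   # eq. (2.8): probability-weighted average
print(mean, float(mean))            # 7/2  3.5
```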
Quantized Random Signals – Probability Mass Function (PMF, p_k)

Figure: the PMF p_k plotted versus the values x_k, and the corresponding staircase CDF versus x_k.

$$\mu = \sum_i x_i\, p_i = 1.0 \cdot 0.2 + 1.5 \cdot 0.3 + 2.0 \cdot 0.3 + 3.7 \cdot 0.05 + 6.0 \cdot 0.15 = 2.335$$

19
Expected Value Properties

For statistically independent signals the joint (cross) probability density function factors as

$$p_{xy}(x, y) = p_x(x)\, p_y(y),$$

and then

$$E[x(t)\, y(t)] = E[x(t)]\, E[y(t)] = \mu_x \mu_y .$$

20
Variance

In probability theory, variance measures how far a set of numbers is spread out. A variance of zero indicates that all the values are identical. Variance is always non-negative: a small variance indicates that the data points tend to be very close to the mean (expected value) and hence to each other, while a high variance indicates that the data points are spread far from the mean and from each other.

The variance of a set of samples represented by a random variable X is its second central moment, the expected value of the squared deviation from the mean μ = E[X]:

$$V[X] = \operatorname{Var}[X] = E\bigl[(X - \mu)^2\bigr], \tag{2.9a}$$

$$\sigma_x^2 = V[X], \qquad \sigma_x = \sqrt{V[X]}, \tag{2.9b}$$

where σ_x denotes the so-called standard deviation.


21
The practical formula for computing the variance has the form

$$V[X] = E[X^2] - E^2[X] = E[X^2] - \mu_x^2 = P_x - \mu_x^2 . \tag{2.10}$$

The formula for the total average power of the signal x(t) is

$$P_x = V[X] + \mu_x^2 = \text{power of } x_{\mathrm{AC}} + \text{power of the mean value}, \tag{2.11}$$

where x_AC = x − μ_x and E[x_AC] = 0.

Property, for independent signals:

$$\operatorname{Var}[x(t) \pm y(t)] = \operatorname{Var}[x(t)] + \operatorname{Var}[y(t)] . \tag{2.12}$$
22
For random signals (processes) representing an electrical signal, for example a voltage u(t), we can identify some common terms as follows:
• the mean value μ_u is the DC component of u(t), u_DC;
• the square of the mean, μ_u² = u_DC², is the DC power;
• the mean-squared value E[u²] is the total average power P_u of u(t);
• the variance σ_u² is the AC power (the power in the time-varying part u_AC(t));
• the standard deviation σ_u is the RMS value of the time-varying part u_AC(t), but it is not the RMS value of the whole signal, u_RMS = √P_u.

Be careful not to make the common mistake of confusing "the square of the mean" with "the mean-squared value", which means "the mean of the square". In general, the mean square is greater than the square of the mean (at best they are equal).
23
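A short numeric sketch of (2.10), (2.11) and the DC/AC terminology, assuming NumPy is available (the offset and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "voltage": a DC offset plus zero-mean noise (hypothetical values).
u = 2.0 + 0.5 * rng.standard_normal(100_000)

u_dc   = u.mean()            # DC component (mean value)
p_tot  = np.mean(u**2)       # total average power E[u^2]
p_ac   = u.var()             # AC power (variance)
u_rms  = np.sqrt(p_tot)      # RMS value of the whole signal
ac_rms = np.sqrt(p_ac)       # RMS of the AC part = standard deviation

print(np.isclose(p_tot, p_ac + u_dc**2))   # P = variance + mean^2, eq. (2.11)
print(ac_rms, u_rms)                       # sigma_u differs from u_RMS when u_dc != 0
```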
For dependent (correlated) random signals

$$\operatorname{Var}[x(t) \pm y(t)] = \operatorname{Var}[x(t)] + \operatorname{Var}[y(t)] \pm 2\operatorname{Cov}[x(t), y(t)], \tag{2.13}$$

where the covariance (or cross-covariance) between two random signals x(t) and y(t) with finite second moments is defined as

$$\operatorname{Cov}[x(t), y(t)] = E\bigl[(x(t) - \mu_x)(y(t) - \mu_y)\bigr] = E[x(t)\, y(t)] - \mu_x \mu_y \tag{2.14}$$

(the term E[x(t) y(t)] is the correlation R_xy).

Because for independent signals E[x(t) y(t)] = μ_x μ_y (see the Expected Value Properties slide), for such signals

$$\operatorname{Cov}[x(t), y(t)] = \operatorname{Cov}[x_0(t), y_0(t)] = \operatorname{Cov}[x_{\mathrm{AC}}(t), y_{\mathrm{AC}}(t)] = 0 . \tag{2.15}$$

Signals satisfying relation (2.15) are called uncorrelated.

For a quantized signal the variance is defined as

$$\operatorname{Var}[x(t)] = V[x] = \sum_{k} (x_k - \mu_x)^2\, p_k .$$
25
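A small numeric sketch of (2.13) and (2.14) with synthetic correlated signals, assuming NumPy (the mixing coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Two correlated signals built from shared noise (arbitrary construction).
w1, w2 = rng.standard_normal(n), rng.standard_normal(n)
x = 1.0 + w1
y = -0.5 + 0.8 * w1 + 0.6 * w2        # shares w1 with x, hence correlated with it

cov_xy = np.mean(x * y) - x.mean() * y.mean()   # eq. (2.14)
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * cov_xy        # eq. (2.13), "+" case

print(round(cov_xy, 3))        # ≈ 0.8 for this construction
print(np.isclose(lhs, rhs))    # the identity holds (exactly, even for sample moments)
```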
Uniform Probability Density Function (continuous case of p(x))

Figure: a) PDF; b) CDF.

26
For the uniform distribution on the interval [a, b], the expected value is

$$\mu_x = E[x] = \int_a^b x\, \frac{1}{b-a}\, dx = \frac{a+b}{2},$$

the variance is

$$\sigma_x^2 = \operatorname{Var}[x] = \int_a^b (x - \mu_x)^2\, \frac{1}{b-a}\, dx,$$

and finally the variance is

$$\sigma_x^2 = \frac{(b-a)^2}{12} .$$

When a = μ − A and b = μ + A, then

$$\sigma_x^2 = \frac{(2A)^2}{12} = \frac{A^2}{3}, \qquad \sigma_x = \frac{A}{\sqrt{3}} .$$
28
For given μ and σ, the uniform PDF formula is

$$p(x) = \begin{cases} \dfrac{1}{2\sqrt{3}\,\sigma}, & \mu - \sqrt{3}\,\sigma \le x \le \mu + \sqrt{3}\,\sigma, \\ 0, & \text{otherwise.} \end{cases}$$
29
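A quick sampling check of the mean and variance formulas above, assuming NumPy (the interval endpoints are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
a, b = 1.0, 4.0                        # arbitrary interval
x = rng.uniform(a, b, 1_000_000)

print(x.mean(), (a + b) / 2)           # both ≈ 2.5
print(x.var(), (b - a) ** 2 / 12)      # both ≈ 0.75
```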
Uniform Probability Density Function – discrete case
Probability Mass Function (PMF or pmf)

Figure: a) PDF → PMF; b) CDF.

A simple example of the discrete uniform distribution is throwing a fair regular die. The possible values are 1, 2, 3, 4, 5, 6, and each time the die is thrown the probability of a given score is 1/6.

If two ideal dice are thrown and their values added, the resulting distribution is no longer uniform, since not all sums have equal probability (see the sketch below)!
32
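A brute-force computation of the distribution of the sum of two fair dice, assuming only the Python standard library:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

die = range(1, 7)

# PMF of the sum of two fair dice: a discrete convolution of two uniform PMFs.
counts = Counter(a + b for a, b in product(die, die))
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, p in pmf.items():
    print(s, p)   # 2: 1/36, 3: 2/36, ..., 7: 6/36, ..., 12: 1/36 -- triangular, not uniform
```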
Normal Probability Density Function (continuous p(x))

A random signal (or process) x(t) is said to be normally (Gaussian) distributed with mean μ and variance σ² if its probability density function is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}} . \tag{2.16}$$

Why does the factor 1/(σ√2π) normalize the density? Because

$$\int_{-\infty}^{\infty} e^{-(a x)^2}\, dx = \frac{\sqrt{\pi}}{a}, \qquad \text{where } a = \frac{1}{\sigma\sqrt{2}} .$$

Carl Friedrich Gauss (1777 - 1855). The normal distribution is sometimes informally called the bell curve.

35
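Spelling out the normalization with the integral quoted above (a standard check; substitute a = 1/(σ√2)):

$$\int_{-\infty}^{\infty} f(x)\, dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)^2} dx = \frac{1}{\sigma\sqrt{2\pi}} \cdot \sigma\sqrt{2}\,\sqrt{\pi} = 1 .$$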
N(μ, σ)

Figure: a) PDF; b) CDF.

The CDF of the normal distribution is

$$P_x(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right],
\qquad \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-\xi^2}\, d\xi .$$

The bean machine, or Galton box, is a device invented by Sir Francis Galton to demonstrate the central limit theorem, in particular that the normal distribution can approximate the binomial distribution: at every pin a ball goes left or right ("heads or tails", −1 or +1) with equal probability.

37
37
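A quick numeric check of this CDF formula using the error function from the Python standard library (the chosen arguments are arbitrary):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma) expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(normal_cdf(0.0))                        # 0.5 at the mean
print(normal_cdf(1.0) - normal_cdf(-1.0))     # ≈ 0.6827, the "one sigma" probability mass
```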
About 68% of values drawn from a normal distribution lie within one standard deviation σ of the mean; about 95% of the values lie within two standard deviations; and about 99.7% lie within three standard deviations. This fact is known as the 3-sigma rule.

The normal distribution N(μ, σ) with parameter values μ = 0 and σ = 1 is called the standard normal distribution N(0, 1). In order to obtain a signal x(t) with distribution N(μ, σ) from a standard normal signal u(t), we apply the transformation

$$x(t) = u(t)\,\sigma_x + \mu_x,$$

and, vice versa, an N(0, 1)-distributed signal u(t) is obtained as

$$u(t) = \frac{x(t) - \mu_x}{\sigma_x} .$$
38
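A sampling sketch of the transformation and of the 3-sigma rule, assuming NumPy (μ and σ are arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 3.0, 2.0                      # example parameters (arbitrary)

u = rng.standard_normal(1_000_000)        # N(0, 1) samples
x = u * sigma + mu                        # now distributed as N(mu, sigma)

for k in (1, 2, 3):
    frac = np.mean(np.abs(x - mu) <= k * sigma)
    print(k, round(frac, 4))              # ≈ 0.6827, 0.9545, 0.9973 (3-sigma rule)
```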
The marginal probability density functions:

$$p_x(x) = \int_{-\infty}^{\infty} p_{xy}(x, y)\, dy, \qquad p_y(y) = \int_{-\infty}^{\infty} p_{xy}(x, y)\, dx .$$
39
Probability density of the sum of signals

If u(t) = x(t) + y(t) and the signals are statistically independent, that is p_xy(x, y) = p_x(x) p_y(y), then the CDF of the sum is

$$P_u(u) = \Pr\{x + y \le u\} = \iint_{x + y \le u} p_x(x)\, p_y(y)\, dx\, dy .$$

The probability density function of the sum of independent signals is the linear convolution of the marginal density functions:

$$p_u(u) = \int_{-\infty}^{\infty} p_x(x)\, p_y(u - x)\, dx = p_x(u) * p_y(u) . \tag{2.17}$$
40
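A numeric illustration of (2.17): convolving two uniform densities gives a triangular density. A minimal sketch assuming NumPy (the grid and interval are arbitrary):

```python
import numpy as np

dx = 0.001
x = np.arange(-2.0, 2.0, dx)

# Density of a signal uniform on [-0.5, 0.5]: p(x) = 1 on that interval.
p = np.where(np.abs(x) <= 0.5, 1.0, 0.0)

# PDF of the sum of two such independent signals = convolution of the densities.
p_sum = np.convolve(p, p, mode="same") * dx

print(p_sum.sum() * dx)   # ≈ 1.0: still a valid PDF
print(p_sum.max())        # ≈ 1.0 at u = 0: a triangular shape, no longer uniform
```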
The Central Limit Theorem

The central limit theorem is one of the great results of mathematics. It explains the omnipresent occurrence of the normal distribution in nature. The theorem states that the average of many independent and identically distributed random variables (identical μ and σ) with finite variance tends towards a normal distribution, irrespective of the distribution followed by the original random variables.

The central limit theorem describes the relationship between the distribution of sample means and the population that the samples are taken from. Let x_i(t), i = 1, …, N, be random signals, each with mean μ and standard deviation σ; then the sample mean (1/N) Σ_i x_i(t) has

$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} .$$

This tells us that the distribution of sample means has the same center as the population, but it is not as spread out.
41
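A sampling demonstration of the theorem with a uniform population, assuming NumPy (N and the number of trials are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
N, trials = 30, 100_000

# Population: uniform on [0, 1], so mu = 0.5 and sigma = 1/sqrt(12).
samples = rng.uniform(0.0, 1.0, size=(trials, N))
means = samples.mean(axis=1)              # one sample mean per trial

mu, sigma = 0.5, 1.0 / np.sqrt(12.0)
print(means.mean())                       # ≈ 0.5: same center as the population
print(means.std(), sigma / np.sqrt(N))    # both ≈ 0.053: spread shrinks as sigma/sqrt(N)
# A histogram of `means` looks close to a normal bell curve.
```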
42
43
Correlations

A random signal x(t) can be any signal from a set of signals {x_1(t), x_2(t), …, x_k(t), …}, which is the sample space for the random signal x(t). The probability that x(t) equals x_k(t) is Pr{x(t) = x_k(t)} = p_x(t)[x_k(t)].

The expected value ("mean") of x(t) is

$$\mu_x(t) = E[x(t)] = \sum_k x_k(t)\, p_{x(t)}[x_k(t)] .$$

The autocorrelation R_xx(t_1, t_2) of x(t) is defined as

$$R_{xx}(t_1, t_2) = E[x(t_1)\, x(t_2)] = \sum_k x_k(t_1)\, x_k(t_2)\, p_{x(t)x(t)}[x_k(t_1), x_k(t_2)], \tag{2.18}$$

and the cross-correlation between two random signals y(t) and x(t), R_yx(t_1, t_2), is defined as

$$R_{yx}(t_1, t_2) = E[x(t_1)\, y(t_2)] = \sum_k x_k(t_1)\, y_k(t_2)\, p_{x(t)y(t)}[x_k(t_1), y_k(t_2)] . \tag{2.19}$$
44
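An ensemble-average estimate of the autocorrelation (2.18) for a toy process, assuming NumPy (the random-phase sinusoid model is a hypothetical example, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
n_real, n_time, dt = 5000, 100, 0.01

# Toy ensemble: unit-amplitude sinusoids with random phase.
phases = rng.uniform(0, 2 * np.pi, size=(n_real, 1))
t = np.arange(n_time) * dt
X = np.cos(2 * np.pi * 5 * t + phases)        # shape: (realizations, time)

def R(i1, i2):
    """Ensemble estimate of R_xx(t1, t2) = E[x(t1) x(t2)] at t1 = i1*dt, t2 = i2*dt."""
    return np.mean(X[:, i1] * X[:, i2])

print(R(10, 10))   # ≈ 0.5, the power of a unit-amplitude sinusoid
print(R(10, 35))   # ≈ 0.5*cos(2*pi*5*(t2 - t1)) ≈ 0: depends only on the lag t2 - t1
```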
Stationary and Ergodic Signals

Stationary random signals are those whose characteristics do not depend upon time. This implies the following:

a) the mean values are independent of time, i.e.

$$\mu_x(t) = E[x(t)] = \mu_x , \tag{2.19}$$

b) the auto- and cross-correlation functions are functions of the time difference only, called the lag τ, that is

$$R_{xx}(t_1, t_2) = E[x(t_1)\, x(t_2)] = R_{xx}(t_2 - t_1) = R_{xx}(\tau) \tag{2.20}$$

and

$$R_{yx}(t_1, t_2) = E[x(t_1)\, y(t_2)] = R_{yx}(t_2 - t_1) = R_{yx}(\tau) . \tag{2.21}$$

All signals studied in this course will be stationary.
45
Averages of signals can be computed in two ways:

a) ensemble average: the average over the sample space, i.e. the average with respect to a probability density function, for example

$$\mu_x = E[x(t)] = \sum_k x_k(t)\, p_x[x_k(t)] ;$$

b) time average: the average evaluated over a period of time,

$$\bar{x}_k = \lim_{T \to \infty} \frac{1}{T} \int_0^T x_k(t)\, dt .$$

When the ensemble averages are equal to the corresponding time averages, the signal is called ergodic. For ergodic signals, for every index k,

$$\mu_x = E[x(t)] = \bar{x}_k$$

and

$$R_{xx}(\tau, k) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x_k(t)\, x_k(t + \tau)\, dt = R_{xx}(\tau) .$$

Ergodicity implies that each signal in the sample space is representative of the whole set. For a process to be ergodic it must be stationary. The converse is not true.

All signals studied in this course will be ergodic.
46
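A small check that time and ensemble averages agree for the toy random-phase process used above, assuming NumPy (again a hypothetical example):

```python
import numpy as np

rng = np.random.default_rng(5)
n_real, n_time, dt = 2000, 20_000, 1e-3

phases = rng.uniform(0, 2 * np.pi, size=(n_real, 1))
t = np.arange(n_time) * dt
X = np.cos(2 * np.pi * 5 * t + phases)

ensemble_mean = X[:, 0].mean()   # average across realizations at one time instant
time_mean = X[0, :].mean()       # average over time for a single realization

print(ensemble_mean, time_mean)  # both ≈ 0: the two kinds of average agree (ergodicity)
```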
A random process is called strict-sense (or strong-sense) stationary (SSS) if its entire probability structure (PDF, CDF) is invariant with time; a process satisfying only the first- and second-moment conditions (2.19)-(2.21) is called wide-sense stationary (WSS). Note that for a Gaussian or a uniform process WSS implies SSS, because those processes are entirely determined by their first and second moments (see pp. 29, 33-35).

Clearly, SSS always implies WSS. The converse is not necessarily true.
47
Finally, for an ergodic process (power signals) the correlation functions are defined as

$$R_{xx}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\, x(t + \tau)\, dt , \tag{2.22}$$

$$R_{yx}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T y(t + \tau)\, x(t)\, dt , \tag{2.23a}$$

$$R_{xy}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t + \tau)\, y(t)\, dt . \tag{2.23b}$$

The autocorrelation (2.22) is an even function of τ; the cross-correlations (2.23a, b) are in general neither even nor odd, but they satisfy R_xy(−τ) = R_yx(τ).

In automatic control it is customary to compute only the correlation of the output (usually denoted y) with the input (denoted x or u); the order of the indices then does not indicate the order of the signals, it is merely alphabetical.

51
51
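A time-average estimate of (2.22) for one realization of a toy ergodic signal, assuming NumPy (the signal model and numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
n, dt = 200_000, 1e-3
t = np.arange(n) * dt

# One realization: a 5 Hz sinusoid of power 1 plus weak white noise.
x = np.sqrt(2.0) * np.cos(2 * np.pi * 5 * t + rng.uniform(0, 2 * np.pi)) \
    + 0.3 * rng.standard_normal(n)

def Rxx(lag):
    """Time-average estimate of R_xx(tau) at tau = lag*dt, eq. (2.22)."""
    return np.mean(x[: n - lag] * x[lag:])

print(Rxx(0))     # ≈ 1.09 = sinusoid power + noise power = total average power P_x
print(Rxx(100))   # lag of 0.1 s = half a 5 Hz period, so ≈ -1.0
```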
52
Figure: interpretation of the two cross-correlations; in one of them x is the result (after-effect) and y the cause, in the other y is the result (after-effect) and x the cause.

53
Remember the correlation functions (2.22), (2.23a) and (2.23b) above, and in addition the corresponding covariances:

$$\operatorname{Cov}_{yx}(\tau) = E\bigl[(y(t + \tau) - \mu_y)(x(t) - \mu_x)\bigr] = R_{yx}(\tau) - \mu_y \mu_x , \tag{2.24a}$$

$$\operatorname{Cov}_{xy}(\tau) = E\bigl[(x(t + \tau) - \mu_x)(y(t) - \mu_y)\bigr] = R_{xy}(\tau) - \mu_x \mu_y . \tag{2.24b}$$
54
Properties of correlation and covariance functions:

$$R_{xx}(\tau) = R_{xx}(-\tau) ,$$
$$R_{xx}(0) = E[x(t)^2] = \sigma_x^2 + \mu_x^2 = P_x ,$$
$$R_{xx}(0) \ge R_{xx}(\tau) ,$$
$$\lim_{\tau \to \infty} R_{xx}(\tau) = \mu_x^2 .$$

Figure: a narrow autocorrelation function versus a wide (large) one.

55


If two random signals (processes) are statistically independent, then

$$p_{xy}(x, y) = p_x(x)\, p_y(y)$$

and their covariance is zero (the implication goes in this direction only). However, if two random variables have zero covariance, that does not mean they are necessarily independent.

If the covariance Cov_xy(0) = 0, we say that the signals x(t) and y(t) are uncorrelated. Note again that the term "uncorrelated" in its common usage means that the processes have zero covariance rather than zero correlation, because

$$\operatorname{Cov}_{xy}(\tau) = R_{xy}(\tau) - \mu_x \mu_y .$$

If R_xy(0) = 0, the two random signals (processes) are said to be orthogonal.
56
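A classic numeric counterexample to "zero covariance implies independence", assuming NumPy (the construction y = x² is a textbook example, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.standard_normal(1_000_000)   # zero-mean, symmetric distribution
y = x**2                             # y is fully determined by x, hence dependent on it

cov_xy = np.mean(x * y) - x.mean() * y.mean()
print(round(cov_xy, 3))   # ≈ 0: x and y are uncorrelated, yet clearly not independent
```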
