Fourier Integrals: Part 1

Paul White

Poll: If a signal has a period $T_p$, the spacing (in frequency) between two successive harmonics is:
a) $T_p$ (6.67%)
b) $1/T_p$ (86.67%)
c) $T_p^2$ (6.67%)
d) $1/T_p^2$ (0%)
Poll: If a Fourier series has coefficients, $d_n$, that roll off at 40 dB per decade at high frequency, which of the following are true regarding the asymptotic (high frequency) behaviour of $d_n$? (Vote for up to 4 choices.)
a) $|d_n| \propto 1/n$, equivalent to $|d_n| \propto 1/f$ (0%)
b) $|d_n| \propto 1/n^2$, equivalent to $|d_n| \propto 1/f^2$ (85.71%)
c) $|d_n| \propto 1/n^4$, equivalent to $|d_n| \propto 1/f^4$ (21.43%)
d) The coefficients roll off at approximately 12 dB per octave (28.57%)
(% = percentage of voters)
Poll: If a signal has Fourier series coefficients which roll off asymptotically at 40 dB per decade, what does this say about the signal?
a) It is discontinuous. (6.67%)
b) It is continuous but is discontinuous when differentiated. (60%)
c) It is continuous, its derivative is continuous, its second derivative is discontinuous. (20%)
d) It is discontinuous but its derivative is continuous. (13.33%)
Contents
• Delta functions
• Fourier Series for non-periodic signals
• Definition
• Examples
• Properties

Dirac Delta Functions
Paul Dirac (1902-84)

• The concept of Dirac delta functions occurs frequently when considering Fourier integrals.
• Strictly they are not functions at all, but distributions.
• Usually denoted $\delta(t)$. Multiple definitions exist, but the most familiar is probably:
$$\delta(t) = \lim_{\varepsilon \to 0} R_\varepsilon(t), \qquad R_\varepsilon(t) = \begin{cases} 1/\varepsilon & |t| \le \varepsilon/2 \\ 0 & |t| > \varepsilon/2 \end{cases}$$
Note that $R_\varepsilon(t)$ always has an area of 1.
Properties of a Delta Function
• Zero everywhere, except at $t=0$, where it is infinite:
$$\delta(t) = \begin{cases} 0 & t \ne 0 \\ \infty & t = 0 \end{cases}$$
• Symmetric: $\delta(-t) = \delta(t)\ \forall t$ ($\forall$ means "for all")
• Unit area: $\int_{-a}^{b} \delta(t)\,dt = 1$ for any $a > 0,\ b > 0$
• Sifting property: $\int_{-a}^{b} \delta(t-\tau)\,x(t)\,dt = x(\tau)$ for $-a < \tau < b$

The sifting property is the key one we shall use.
Sifting Property in Pictorial Form
$$\int \delta(t-\tau)\,x(t)\,dt = x(\tau)$$
[Figure: multiplying $x(t)$ by $\delta(t-\tau)$ leaves a single impulse at $t=\tau$ whose area is $x(\tau)$.]
• The sifting property provides us with a mathematical way of finding the value of a signal at a point in time, i.e. sampling the signal.
Examples of the Sifting Property
$$\int_{t=-2}^{4} t^2\,\delta(t-3)\,dt = 3^2 = 9 \qquad\qquad \int_{t=-2}^{2} \delta(t-3)\,dt = 0$$
$$\int_{-\infty}^{\infty} \sin(2\pi t)\,\delta(t-1)\,dt = \sin(2\pi) = 0$$
$$\int_{-\infty}^{\infty} e^{-t^2}\cos(\pi t)\,\delta(t)\,dt = e^{-0^2}\cos(0) = 1$$
$$\int_{-\infty}^{\infty} \left(t^3 - 2t\cos(20\pi t)\right)\delta(t-a)\,dt = a^3 - 2a\cos(20\pi a)$$
$$\int_{t=0}^{1/2}\int_{\tau=-2}^{2} (t+\tau)^2\,\delta(\tau+1)\,d\tau\,dt = \int_{0}^{1/2} (t-1)^2\,dt = \left[\frac{(t-1)^3}{3}\right]_0^{1/2} = \frac{7}{24}$$
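The sifting property can also be seen numerically by replacing the delta with the rectangle $R_\varepsilon$ for a small $\varepsilon$. Below is a quick Python/numpy sketch (the grid sizes and $\varepsilon$ are illustrative choices, not from the slides) checking the first example above:

```python
import numpy as np

# Approximate delta(t - 3) by the rectangle R_eps(t - 3): width eps, height 1/eps.
eps = 1e-3
t = np.linspace(-2, 4, 2_000_001)          # fine grid over the integration range
dt = t[1] - t[0]
delta_approx = np.where(np.abs(t - 3) <= eps / 2, 1.0 / eps, 0.0)

# Sifting: the integral should pick out the integrand's value at t = 3, i.e. 9.
integral = np.sum(t**2 * delta_approx) * dt
print(integral)                             # ~9.0
```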
Alternative (useful) Definition
• In Fourier analysis one frequently encounters delta functions.
• Consider the following integral:
$$I(f) = \int_{-\infty}^{\infty} e^{-2\pi ift}\,dt$$
• Consider how this integral behaves as a function of $f$:
  – When $f \ne 0$, the integrand $e^{-2\pi ift}$ oscillates. When it is integrated over the whole of the $t$ axis you get 0.
  – When $f = 0$, the integrand is 1. When that is integrated over all $t$ the result is infinite.
  – So the integral is equal to 0 for all $f \ne 0$ and infinity when $f = 0$, like a delta function:
$$I(f) = \int_{-\infty}^{\infty} e^{-2\pi ift}\,dt = \delta(f)$$
This is strictly not a proof that the integral is a delta function, but it does show that it is not an unreasonable assertion.
Fourier Series Revisited
• FS of a signal with a period $T_p = 0.1$ s ($f_p = 1/T_p = 10$ Hz): the coefficients $|d_n|$ appear as components separated by $f_p$ along the frequency axis.
• FS of a signal with a period $T_p = 0.25$ s ($f_p = 1/T_p = 4$ Hz): the components are separated by only 4 Hz.
As $T_p$ increases, $f_p$ reduces, so the spacing between Fourier series components reduces.
Non-Periodic Signals
• What is the frequency domain representation of a non-periodic signal? (Answer: the Fourier integral!)
• One can consider a non-periodic signal as a signal which is periodic, but whose period ($T_p$) is $\infty$!
• Consider the definition of the (complex) FS coefficients:
$$d_n = \frac{1}{T_p}\int_{-T_p/2}^{T_p/2} x(t)\,e^{-2\pi int/T_p}\,dt$$
Note the limits are $-T_p/2$ to $T_p/2$, rather than 0 to $T_p$: this makes no difference as long as the integral covers one period of the signal.
• Now consider
$$\lim_{T_p\to\infty} d_n = \lim_{T_p\to\infty} \frac{1}{T_p}\int_{-T_p/2}^{T_p/2} x(t)\,e^{-2\pi inf_pt}\,dt$$
…. in the limit
$$\lim_{T_p\to\infty} \frac{1}{T_p}\int_{-T_p/2}^{T_p/2} x(t)\,e^{-2\pi inf_pt}\,dt = \lim_{f_p\to0}\, f_p\int_{-1/2f_p}^{1/2f_p} x(t)\,e^{-2\pi inf_pt}\,dt$$
The factor $f_p$ in front tends to zero.
• Assume $x(t)$ has finite support, say it is zero for $|t| > u$; then
$$\lim_{T_p\to\infty}\int_{-T_p/2}^{T_p/2} x(t)\,e^{-2\pi inf_pt}\,dt = \int_{-u}^{u} x(t)\,e^{-2\pi inf_pt}\,dt, \quad\text{which is finite.}$$
• So that $d_n \to 0$ as $T_p \to \infty$.
• Not much use! (In the limit all the FS coefficients tend to zero.)
Alternative Approach
• Instead of $d_n$, how about considering $d_nT_p = d_n/f_p$?
• Recall that $f_p$ is the spacing between harmonics (as well as the fundamental frequency), so we could denote it $\Delta f$.
$$\lim_{T_p\to\infty}\frac{d_n}{\Delta f} = \lim_{T_p\to\infty} d_nT_p = \lim_{T_p\to\infty}\int_{-T_p/2}^{T_p/2} x(t)\,e^{-2\pi i\,n\Delta f\,t}\,dt$$
• This limit is well behaved for many signals.
• $n\Delta f$ represents a frequency; we can just call that $f$.
• The spacing, in frequency, between $d_n$ and $d_{n+1}$ tends to zero.
• And we write:
$$X(f) = \int_{-\infty}^{\infty} x(t)\,e^{-2\pi ift}\,dt$$
This is the Fourier integral, mapping a signal in time, $t$, to a representation in frequency, $f$.
Inverse Fourier Integral
• It can be shown (using the definition of a Dirac delta function, see example proofs):
$$x(t) = \int_{-\infty}^{\infty} X(f)\,e^{2\pi ift}\,df$$
• This “undoes” the Fourier integral, creating a time series from its Fourier representation.
• Note the essential similarity between the forward and inverse transforms; both look like
$$\int_{-\infty}^{\infty} \cdot\; e^{\pm2\pi ift}\,d(\cdot)$$
The sign of the exponent is the difference between the two.
Basic Properties
• Operator notation: $X(f) = \mathcal{F}\{x(t)\}$ and $x(t) = \mathcal{F}^{-1}\{X(f)\}$
• Linearity: $\mathcal{F}\{a\,x(t) + b\,y(t)\} = a\,X(f) + b\,Y(f)$, with $a, b$ scalar constants
• Time-reversal: $\mathcal{F}\{x(-t)\} = X(f)^*$ (for real $x(t)$)
• Time shifts: $\mathcal{F}\{x(t-\tau)\} = e^{-2\pi if\tau}\,X(f)$
• Conjugate symmetry: $X(-f) = X(f)^*$, so $|X(-f)| = |X(f)|$ and $\arg\{X(-f)\} = -\arg\{X(f)\}$

Signal Symmetries
• Symmetric signals, i.e. $x(-t) = x(t)$: then $X(f)$ is real, since from the above $X(f)^* = X(f)$.
• Anti-symmetric signals, i.e. $x(-t) = -x(t)$: then $X(f)$ is purely imaginary, since $X(f)^* = -X(f)$.
In general, for a complex number $z$: $z = -z^* \Rightarrow z_r + iz_i = -z_r + iz_i \Rightarrow z_r = -z_r \Rightarrow z_r = 0$.
Proof of Conjugate Symmetry
• Show that, if $x(t)$ is real, then $X(-f) = X(f)^*$.
Start from the definition (1): $X(f) = \int_{-\infty}^{\infty} x(t)\,e^{-2\pi ift}\,dt$. Then
$$X(f)^* = \left(\int_{-\infty}^{\infty} x(t)\,e^{-2\pi ift}\,dt\right)^* = \int_{-\infty}^{\infty} x(t)^*\left(e^{-2\pi ift}\right)^*dt$$
Note that $\left(e^{-2\pi ift}\right)^* = e^{2\pi ift}$, and that because $x(t)$ is real, $x(t)^* = x(t)$. So
$$X(f)^* = \int_{-\infty}^{\infty} x(t)\,e^{2\pi ift}\,dt = \int_{-\infty}^{\infty} x(t)\,e^{-2\pi i(-f)t}\,dt$$
This is (1) with $f$ replaced by $-f$, i.e. $X(-f)$.
Proof of Time Shifts
• To show that $\mathcal{F}\{x(t-\tau)\} = e^{-2\pi if\tau}X(f)$:
$$\mathcal{F}\{x(t-\tau)\} = \int_{-\infty}^{\infty} x(t-\tau)\,e^{-2\pi ift}\,dt$$
Using the substitution $u = t - \tau$, so that $du = dt$ and $t = u + \tau$:
$$\int_{-\infty}^{\infty} x(t-\tau)\,e^{-2\pi ift}\,dt = \int_{-\infty}^{\infty} x(u)\,e^{-2\pi if(u+\tau)}\,du = e^{-2\pi if\tau}\int_{-\infty}^{\infty} x(u)\,e^{-2\pi ifu}\,du$$
The remaining integral is exactly $X(f)$ with $t$ replaced by $u$; because these are dummy variables of integration this makes no difference, so the integral is just $X(f)$.
Proof of Inverse Fourier Integral
If $X(f) = \int_{-\infty}^{\infty} x(\tau)\,e^{-2\pi if\tau}\,d\tau$, show that $x(t) = \int_{-\infty}^{\infty} X(f)\,e^{2\pi ift}\,df$.
Taking the definition of $X(f)$, multiply by $e^{2\pi ift}$ and integrate:
$$\int_{-\infty}^{\infty} X(f)\,e^{2\pi ift}\,df = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} x(\tau)\,e^{-2\pi if\tau}\,d\tau\right)e^{2\pi ift}\,df$$
$$= \int_{-\infty}^{\infty} x(\tau)\left(\int_{-\infty}^{\infty} e^{2\pi if(t-\tau)}\,df\right)d\tau \qquad\text{(order of integrals swapped)}$$
$$= \int_{-\infty}^{\infty} x(\tau)\,\delta(t-\tau)\,d\tau = x(t)$$
Example 1: Rectangular Function
• What is the Fourier transform of:
$$x(t) = \begin{cases} 1 & |t| \le T/2 \\ 0 & \text{otherwise} \end{cases}$$
$$X(f) = \int_{-\infty}^{\infty} x(t)\,e^{-2\pi ift}\,dt \qquad\text{(definition of the Fourier integral)}$$
$$X(f) = \int_{-T/2}^{T/2} 1\cdot e^{-2\pi ift}\,dt \qquad\text{(using the expression for } x(t)\text{)}$$
$$X(f) = \left[\frac{e^{-2\pi ift}}{-2\pi if}\right]_{t=-T/2}^{T/2} \qquad\text{(using the integral of } e^{at}\text{, which is } e^{at}/a\text{)}$$
$$X(f) = \frac{1}{2\pi if}\left(e^{\pi ifT} - e^{-\pi ifT}\right) = \frac{1}{\pi f}\cdot\frac{e^{\pi ifT} - e^{-\pi ifT}}{2i}$$
$$X(f) = \frac{\sin(\pi fT)}{\pi f} = T\,\frac{\sin(\pi fT)}{\pi fT} \qquad\text{(using Euler's equation } \sin\theta = \tfrac{e^{i\theta}-e^{-i\theta}}{2i}\text{)}$$
What does this function look like? It is called a sinc function.
It is helpful to consider this in terms of $U = \pi fT$:
$$X(f) = T\,\frac{\sin(U)}{U}$$
[Figure, $T=1$: at $U=0$, $\sin(U)/U$ looks to be 1; at $U = \pm\pi, \pm2\pi, \pm3\pi, \ldots$, $\sin(U)/U = 0$.]
What happens at $f=0$?
Note that $X(f) = 0$ for $fT = k$, where $k = \ldots, -2, -1, 1, 2, 3, \ldots$. That is to say, the Fourier transform is zero for all frequencies which are integer multiples of $1/T$, with one exception, namely $f=0$.
When $f=0$, $U = \pi fT = 0$, so that $X(f) = T\,\frac{\sin(U)}{U} = T\,\frac{0}{0}$.
The ratio $0/0$ is undefined (it is not zero, not infinity or anything else; it is not defined!). We can talk of the limit of $\sin(U)/U$ as $U$ approaches zero in two ways.
Formally, using L'Hôpital's rule: if $p(x), q(x) \to 0$ as $x\to0$ then
$$\lim_{x\to0}\frac{p(x)}{q(x)} = \lim_{x\to0}\frac{p'(x)}{q'(x)}, \quad\text{so}\quad \lim_{U\to0}\frac{\sin(U)}{U} = \lim_{U\to0}\frac{\cos(U)}{1} = 1$$
A little less formally, $\sin(U) \approx U$ for small $U$, so
$$\lim_{U\to0}\frac{\sin(U)}{U} = \lim_{U\to0}\frac{U}{U} = 1$$
Either way one sees that $X(f) \to T$ as $f \to 0$.
What is the influence of the duration $T$? Two things happen:
1) The peak height increases. The maximum value is $T$.
2) The rate of oscillation of the Fourier transform increases. For instance, the first zero crossing is at $U = \pi$, or $\pi fT = \pi$, which means $f = 1/T$, so the point at which $X(f) = 0$ for the first time moves closer to $f = 0$ as $T\to\infty$.
The combined effect is that $X(f)$ becomes narrower and taller as $T$ gets bigger.
In the limit as $T\to\infty$, the Fourier transform becomes more like a Dirac delta; Dirac deltas have infinite height and zero width.
Recall one definition of the Dirac delta is $\delta(f) = \int_{-\infty}^{\infty} e^{-2\pi ift}\,dt$, and that for this example we are computing $X(f) = \int_{-T/2}^{T/2} e^{-2\pi ift}\,dt$.
So it is clear that for this example $\lim_{T\to\infty} X(f) = \delta(f)$.
Or, to put it another way, the Fourier transform of $x(t) = 1$, for all $t$, is $\delta(f)$.
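This limiting behaviour is easy to check numerically. The sketch below (Python/numpy; the grids and the values of $T$ are illustrative choices) approximates the Fourier integral of the rectangle directly and compares it with $T\sin(\pi fT)/(\pi fT)$:

```python
import numpy as np

# The FT of a rectangle of width T is T*sin(pi f T)/(pi f T), growing taller
# and narrower as T increases.
f = np.linspace(-5, 5, 1001)
for T in (1.0, 4.0, 16.0):
    t = np.linspace(-T / 2, T / 2, 20001)
    dt = t[1] - t[0]
    # Riemann-sum approximation of the Fourier integral of x(t) = 1 on [-T/2, T/2]
    X_num = np.array([np.sum(np.exp(-2j * np.pi * fk * t)) * dt for fk in f])
    X_closed = T * np.sinc(f * T)          # numpy's sinc(x) = sin(pi x)/(pi x)
    print(T, np.max(np.abs(X_num - X_closed)))   # difference is small for each T
```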
Example 2: Cosine Wave
• What is the Fourier transform of:
$$x(t) = A\cos(2\pi f_0t), \quad \forall t$$
Note $f_0$ is a constant; be careful not to confuse it with $f$, the variable in the Fourier transform $X(f)$.
$$X(f) = \int_{-\infty}^{\infty} x(t)\,e^{-2\pi ift}\,dt \qquad\text{(definition of the Fourier integral)}$$
$$X(f) = \int_{-\infty}^{\infty} A\cos(2\pi f_0t)\,e^{-2\pi ift}\,dt \qquad\text{(using the expression for } x(t)\text{)}$$
Here is where you can get $f$ and $f_0$ confused! $f_0$ is the frequency of the cosine wave being analysed and $f$ is the variable in the Fourier transform.
To evaluate this integral one simplifies using $\cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}$:
$$X(f) = \frac{A}{2}\int_{-\infty}^{\infty}\left(e^{2\pi if_0t} + e^{-2\pi if_0t}\right)e^{-2\pi ift}\,dt = \frac{A}{2}\int_{-\infty}^{\infty} e^{-2\pi i(f-f_0)t}\,dt + \frac{A}{2}\int_{-\infty}^{\infty} e^{-2\pi i(f+f_0)t}\,dt$$
Both the integrals have the form $\int_{-\infty}^{\infty} e^{-2\pi i\nu t}\,dt$. Note that the integrals have infinite limits and the argument of the exponential is imaginary. Consequently the result is a Dirac delta:
$$\int_{-\infty}^{\infty} e^{-2\pi i\nu t}\,dt = \delta(\nu)$$
So the Fourier transform is:
$$X(f) = \frac{A}{2}\,\delta(f-f_0) + \frac{A}{2}\,\delta(f+f_0)$$
The way we draw a Dirac delta is with an arrow. The height of the arrow reflects the area under the Dirac delta, not its amplitude (the amplitude of a Dirac delta is always $\infty$).
Link to Example 1
Taking $A = 1$ and $f_0 = 0$, the Fourier transform is
$$X(f) = \frac{A}{2}\,\delta(f-f_0) + \frac{A}{2}\,\delta(f+f_0) = \frac12\,\delta(f) + \frac12\,\delta(f) = \delta(f)$$
And the signal in the time domain is
$$x(t) = A\cos(2\pi f_0t) = \cos(0) = 1, \quad \forall t$$
This shows, as in Example 1, that the Fourier transform of the constant 1 is a Dirac delta, $\delta(f)$.
Example 3: Exponential Decay
• What is the Fourier transform of:
$$x(t) = \begin{cases} e^{-\alpha t} & t \ge 0,\ \alpha > 0 \\ 0 & t < 0 \end{cases}$$
$$X(f) = \int_{-\infty}^{\infty} x(t)\,e^{-2\pi ift}\,dt \qquad\text{(definition of the Fourier integral)}$$
$$X(f) = \int_0^{\infty} e^{-\alpha t}\,e^{-2\pi ift}\,dt \qquad\text{(using the expression for } x(t)\text{)}$$
$$X(f) = \int_0^{\infty} e^{-(\alpha+2\pi if)t}\,dt \qquad\text{(combining the exponential terms)}$$
$$X(f) = \left[\frac{e^{-(\alpha+2\pi if)t}}{-(\alpha+2\pi if)}\right]_0^{\infty} \qquad\text{(using the standard integral } \int e^{at}dt = \tfrac1a e^{at}\text{)}$$
For large $t$, $e^{-(\alpha+2\pi if)t} = e^{-\alpha t}\,e^{-2\pi ift} \to 0$ because $e^{-\alpha t} \to 0$, so
$$X(f) = \frac{0 - 1}{-(\alpha+2\pi if)} = \frac{1}{\alpha + 2\pi if}$$
Discussion
• In this case the Fourier integral is complex valued (the previous examples were all real valued).
• To explore this we typically consider the magnitude, and sometimes the phase, of $X(f)$.
• To compute the magnitude we commonly use the fact that $|z|^2 = zz^*$, and for the phase $\arg(z) = \tan^{-1}(z_i/z_r)$.
$$|X(f)|^2 = \frac{1}{\alpha+2\pi if}\left(\frac{1}{\alpha+2\pi if}\right)^* = \frac{1}{(\alpha+2\pi if)(\alpha-2\pi if)} = \frac{1}{\alpha^2 + 4\pi^2f^2}$$
$$X(f) = \frac{1}{\alpha+2\pi if}\cdot\frac{\alpha-2\pi if}{\alpha-2\pi if} = \underbrace{\frac{\alpha}{\alpha^2+4\pi^2f^2}}_{\mathrm{Re}\{X(f)\}} + i\underbrace{\frac{-2\pi f}{\alpha^2+4\pi^2f^2}}_{\mathrm{Im}\{X(f)\}}$$
$$\arg\{X(f)\} = \tan^{-1}(-2\pi f/\alpha)$$
Representations
• Magnitude: $|X(f)| = \dfrac{1}{\sqrt{\alpha^2 + 4\pi^2f^2}}$
• Phase: $\arg\{X(f)\} = \tan^{-1}(-2\pi f/\alpha)$
The Effect of $\alpha$
Note that at $f = 0$, $|X(f)| = 1/\alpha$.
Large $\alpha$ means rapid decay in time and a small peak in $|X(f)|$; small $\alpha$ means slow decay in time and a large peak in $|X(f)|$.
Asymptotic (large frequency) behaviour
$$|X(f)| = \frac{1}{\sqrt{\alpha^2+4\pi^2f^2}} \approx \frac{1}{\sqrt{4\pi^2f^2}} = \frac{1}{2\pi f} \propto \frac1f \quad\text{as } f\to\infty$$
[Log-log plot of $|X(f)|$: a 10-fold (decade) increase in $f$ gives a 10-fold (20 dB) reduction in $|X(f)|$.]
In Fourier series we saw that if a signal was discontinuous there was a 20 dB per decade reduction in the FS coefficients at high frequency. This signal also has a discontinuity (at $t=0$), and we now observe that the Fourier integral also decays at 20 dB per decade at high frequencies.
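A short numerical check of the closed form for this example (a Python/numpy sketch; $\alpha$ and the grids are arbitrary choices):

```python
import numpy as np

# Check Example 3: X(f) = 1/(alpha + 2*pi*1j*f) for x(t) = exp(-alpha t), t >= 0.
alpha = 2.0
t = np.linspace(0, 20, 200001)            # long enough that exp(-alpha t) has decayed
dt = t[1] - t[0]
x = np.exp(-alpha * t)
for f in (0.0, 0.5, 5.0):
    X_num = np.sum(x * np.exp(-2j * np.pi * f * t)) * dt
    X_closed = 1 / (alpha + 2j * np.pi * f)
    print(f, abs(X_num - X_closed))       # small at each frequency
```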
Example 4: Double Exponential
• What is the Fourier transform of:
$$x(t) = e^{-\alpha|t|}, \quad \forall t$$
Note when $t$ is negative then $|t| = -t$, so $e^{-\alpha|t|}$ is equal to $e^{-\alpha(-t)} = e^{\alpha t}$.
We shall take a different approach to this Fourier integral, based on the following:
$$x(t) = e^{-\alpha|t|} = y(t) + y(-t)$$
where $y(t)$ is given by
$$y(t) = \begin{cases} e^{-\alpha t} & t \ge 0 \\ 0 & t < 0 \end{cases}$$
Identifying $y(t)$ as the same function as used in Example 3 (a decaying exponential), we already know that
$$Y(f) = \frac{1}{\alpha + 2\pi if}$$
Then we also know that $\mathcal{F}\{y(-t)\} = Y(f)^*$ (see properties of the Fourier integral). This means that
$$x(t) = y(t) + y(-t) \;\Rightarrow\; X(f) = Y(f) + Y(f)^* = \frac{1}{\alpha+2\pi if} + \frac{1}{\alpha-2\pi if} = \frac{2\alpha}{\alpha^2 + 4\pi^2f^2}$$
Symmetry in the Examples
• Note Examples 1, 2 and 4 all involve signals which are symmetric, and their Fourier integrals are real.
  – We just look at $X(f)$, since it is real.
• Example 3 is asymmetric and the Fourier integral is complex valued.
  – Since $X(f)$ is complex we look at its magnitude and phase separately (note this is the only example we do that for).
Laurel and Hardy
• Example 1, as $T$ varies:
  – As $T$ increases the window gets broader in the time domain, but its Fourier transform becomes narrower (the first intercept on the frequency axis is at $1/T$).
• Examples 3 and 4, as $\alpha$ varies (Example 4 is easier to deal with …):
  – For small $\alpha$ the time domain function is broad, but the Fourier transform is narrow. Specifically, for Example 4:
$$X(0) = \frac{2\alpha}{\alpha^2 + 4\pi^2\cdot0^2} = \frac{2}{\alpha}$$
$$X\!\left(\frac{\alpha}{2\pi}\right) = \frac{2\alpha}{\alpha^2 + 4\pi^2(\alpha/2\pi)^2} = \frac{2\alpha}{2\alpha^2} = \frac{1}{\alpha} = \frac{X(0)}{2}$$
i.e. at $f = \alpha/2\pi$ the Fourier transform is half its peak value, so $\alpha/\pi$ represents the 6 dB bandwidth.
Example 4 (in detail)
• Functions as $\alpha$ varies, in the time domain and frequency domain.
[Figure: o indicates the peak value ($2/\alpha$); * indicates the point which is 6 dB lower.]
General Rule
• Fourier transform of $x(at)$, assuming $a > 0$:
  – If $a > 1$ then $x(at)$ is compressed along the time axis.
  – If $a < 1$ then $x(at)$ is stretched along the time axis.
$$\mathcal{F}\{x(at)\} = \frac{1}{a}\,X(f/a), \quad a > 0$$
• Thus:
  – For $a > 1$, $x(at)$ is compressed but $X(f/a)$ is stretched out along the frequency axis.
    • Shorter duration signals have broader bandwidths.
  – For $a < 1$, $x(at)$ is stretched but $X(f/a)$ is compressed along the frequency axis.
    • Longer duration signals have narrower bandwidths.
Examples Summarised
• Continuity in the time domain of the four examples:
  – Example 1 (square pulse) is discontinuous.
  – Example 2 (cosine wave) is completely continuous.
  – Example 3 (exponential decay) is discontinuous.
  – Example 4 (double exponential) is continuous, but is discontinuous in its derivative.
• Their Fourier transforms are:
$$\text{Example 1: } AT\,\frac{\sin(\pi fT)}{\pi fT} \qquad \text{Example 3: } \frac{1}{\alpha+2\pi if}$$
$$\text{Example 2: } \frac{A}{2}\left[\delta(f-f_0) + \delta(f+f_0)\right] \qquad \text{Example 4: } \frac{2\alpha}{\alpha^2+4\pi^2f^2}$$
Asymptotic Roll-Off
• What happens to the Fourier transform for large $f$?
• Example 1: $AT\,\dfrac{\sin(\pi fT)}{\pi fT} \le \dfrac{A}{\pi f} \propto \dfrac1f$ (discontinuous)
• Example 2: for large $f$ (in fact for any $f > f_0$), $X(f) = 0$ (smooth)
• Example 3: for large $f$, $\dfrac{1}{\alpha+2\pi if} \approx \dfrac{1}{2\pi if} \propto \dfrac1f$ (discontinuous)
• Example 4: for large $f$, $\dfrac{2\alpha}{\alpha^2+4\pi^2f^2} \approx \dfrac{\alpha}{2\pi^2f^2} \propto \dfrac{1}{f^2}$ (continuous, discontinuous derivative)
The roll-off rates reflect the continuity properties in the time domain.
General Rule
• Consider a signal $x(t)$ which is continuous up to its $n$th derivative
  – i.e. it is continuous, as is its first derivative and all those up to $n-1$, but the $n$th derivative is discontinuous.
• The Fourier transform of this signal will satisfy:
$$\lim_{f\to\infty} X(f) \propto \frac{1}{f^{\,n+1}}$$
• This means, in terms of decibels ($20\log_{10}|X(f)|$), that for large $f$ the Fourier transform reduces by $20(n+1)$ dB per decade increase in frequency (or $6(n+1)$ dB per octave).
Parseval’s Theorem
• Defining a signal’s energy as $E = \int_{-\infty}^{\infty} |x(t)|^2\,dt$
• One can show
$$E = \int_{-\infty}^{\infty} |x(t)|^2\,dt = \int_{-\infty}^{\infty} |X(f)|^2\,df$$
Proof in summary: substitute $x(t) = \int_{-\infty}^{\infty} X(f_1)\,e^{2\pi if_1t}\,df_1$ into $E = \int_{-\infty}^{\infty} x(t)^2\,dt$, then use $\int_{-\infty}^{\infty} e^{2\pi i(f_1+f_2)t}\,dt = \delta(f_1+f_2)$ together with $X(-f) = X(f)^*$.
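The same energy balance holds for the DFT introduced later; below is a quick numerical sketch with numpy (the test signal is arbitrary, and the $1/N$ factor comes from numpy's unnormalised FFT convention):

```python
import numpy as np

# Discrete analogue of Parseval's theorem: sum |x[n]|^2 = (1/N) * sum |X[k]|^2.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
X = np.fft.fft(x)
print(np.sum(np.abs(x)**2), np.sum(np.abs(X)**2) / len(x))   # the two agree
```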
Examples for you to try
1) Compute the Fourier transform for:
  a) $x(t) = 1$, $\forall t$
  b) $x(t) = \delta(t)$
  c) $x(t) = \sin(2\pi f_0t)$, $\forall t$
  d) $x(t) = te^{-\alpha t}$ for $t > 0$; $= 0$ for $t < 0$
2) For each Fourier transform you found in 1), consider the continuity of $x(t)$ and show that the asymptotic rate of $X(f)$ conforms with the general rule relating these two (slide 45).
3) Prove that:
  a) $\mathcal{F}\{x(at)\} = \frac1a X\!\left(\frac fa\right)$, $a > 0$
  b) $\mathcal{F}\{x(-t)\} = X(f)^*$ (for real $x(t)$)
Fourier Integrals: Part 2
Poll: What can you say about the Fourier Integrals (FIs) of the signals in the Figure below?
1. FI of a) is real and of b) is imaginary, and the asymptotic roll-off of a) is faster than that of b) (0%)
2. FI of a) is real and of b) is imaginary, and the asymptotic roll-off of b) is faster than that of a) (0%)
3. FI of b) is real and of a) is imaginary, and the asymptotic roll-off of a) is faster than that of b) (0%)
4. FI of b) is real and of a) is imaginary, and the asymptotic roll-off of b) is faster than that of a) (0%)
Poll: In the Figure below, which time domain signal (top row) matches which Fourier integral (bottom row)?
1. a)-i), b)-ii), c)-iii) (7.14%)
2. a)-i), b)-iii), c)-ii) (0%)
3. a)-ii), b)-i), c)-iii) (14.29%)
4. a)-ii), b)-iii), c)-i) (7.14%)
5. a)-iii), b)-i), c)-ii) (64.29%)
6. a)-iii), b)-ii), c)-i) (7.14%)
Contents
• Convolution (linear systems)
– Properties of convolution
– Output of a Linear Time Invariant (LTI) system
• Data Truncation (Windowing)
– Properties of good windowing functions
Multiplication
• The Fourier transform of $x(t)+y(t)$ is $X(f)+Y(f)$, but what happens if $x(t)$ and $y(t)$ are multiplied together?
$$\mathcal{F}\{x(t)\,y(t)\} = \int_{-\infty}^{\infty} x(t)\,y(t)\,e^{-2\pi ift}\,dt$$
• It is not $X(f)\,Y(f)$.
• It is the convolution of $X(f)$ and $Y(f)$, which I (along with many others) denote $X(f)*Y(f)$:
$$X(f)*Y(f) = \int_{-\infty}^{\infty} X(f-\nu)\,Y(\nu)\,d\nu = \int_{-\infty}^{\infty} X(\nu)\,Y(f-\nu)\,d\nu$$
$*$ denotes convolution (NOT multiplication!), which is completely different; when superscripted ($^*$) it denotes conjugation.
Proof
$$\mathcal{F}\{x(t)\,y(t)\} = \int_{-\infty}^{\infty} x(t)\,y(t)\,e^{-2\pi ift}\,dt$$
with
$$x(t) = \mathcal{F}^{-1}\{X(f)\} = \int_{-\infty}^{\infty} X(f_1)\,e^{2\pi if_1t}\,df_1 \quad\text{and}\quad y(t) = \int_{-\infty}^{\infty} Y(f_2)\,e^{2\pi if_2t}\,df_2$$
$$\mathcal{F}\{x(t)\,y(t)\} = \int_{t}\left(\int_{f_1} X(f_1)\,e^{2\pi if_1t}\,df_1\right)\left(\int_{f_2} Y(f_2)\,e^{2\pi if_2t}\,df_2\right)e^{-2\pi ift}\,dt$$
$$= \int_{t}\int_{f_1}\int_{f_2} X(f_1)\,Y(f_2)\,e^{2\pi if_1t}\,e^{2\pi if_2t}\,e^{-2\pi ift}\,df_2\,df_1\,dt$$
$$= \int_{f_1}\int_{f_2} X(f_1)\,Y(f_2)\left(\int_{t} e^{-2\pi i(f-f_1-f_2)t}\,dt\right)df_2\,df_1$$
$$= \int_{f_1}\int_{f_2} X(f_1)\,Y(f_2)\,\delta(f-f_1-f_2)\,df_2\,df_1 = \int_{f_1} X(f_1)\,Y(f-f_1)\,df_1$$
… in the Frequency Domain
• Similarly in the frequency domain:
$$\mathcal{F}^{-1}\{X(f)\,Y(f)\} = \int_{-\infty}^{\infty} X(f)\,Y(f)\,e^{2\pi ift}\,df = \int_{-\infty}^{\infty} x(\tau)\,y(t-\tau)\,d\tau = x(t)*y(t)$$
• Hence if you multiply two signals in one domain (time or frequency) they are convolved in the other domain.
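A discrete illustration of this result (a Python/numpy sketch; the signals are arbitrary, and the zero padding anticipates the circular-convolution point made in the DFT lecture):

```python
import numpy as np

# Multiplying DFTs corresponds to (circular) convolution in time.  Padding to
# length >= N+M-1 makes this agree with np.convolve's linear convolution.
rng = np.random.default_rng(1)
x, y = rng.standard_normal(16), rng.standard_normal(16)
L = len(x) + len(y) - 1
via_fft = np.fft.ifft(np.fft.fft(x, L) * np.fft.fft(y, L)).real
direct = np.convolve(x, y)
print(np.allclose(via_fft, direct))   # True
```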
What is Convolution?
Example in pictures
• Consider:
$$X(f)*Y(f) = \int_{-\infty}^{\infty} X(u)\,Y(f-u)\,du$$
• $Y(f-u)$, considered as a function of $u$, is reversed and shifted by $f$.
• This is multiplied by $X(u)$ and the area under the product is computed in the integral.
• This area varies with the shift $f$, i.e. the result is a function of $f$.
Properties of Convolution
• Commutative: $x(t)*y(t) = y(t)*x(t)$
• Linear: $x(t)*\left[a\,y(t) + b\,z(t)\right] = a\,x(t)*y(t) + b\,x(t)*z(t)$
• Time shifts: $x(t-\tau)*y(t) = (x*y)(t-\tau)$
• The delta function is the identity function¹: $x(t)*\delta(t) = x(t)$
  – Combining the last two properties: $x(t)*\delta(t-\tau) = x(t-\tau)$
¹ In the same way that 1 is the identity value for multiplication (anything multiplied by 1 does not change) and zero is the identity for addition (add zero to anything and it does not change).
Linear Time Invariant Systems
• Consider a Linear Time Invariant (LTI) system:
Input, $x(t)$ → System → Output, $y(t)$
• An LTI system is characterised by 2 properties:
  – If $y_1(t)$ and $y_2(t)$ are the responses to $x_1(t)$ and $x_2(t)$ respectively, then the response to $ax_1(t) + bx_2(t)$ is $ay_1(t) + by_2(t)$.
  – If the input is delayed, i.e. the input is $x(t-\tau)$, then the output is simply delayed by the same amount, so can be written $y(t-\tau)$.
Characterising an LTI System
• An LTI system can be characterised by its impulse response $h(t)$:
Input, $\delta(t)$ → System → Output, $h(t)$
• The impulse response can be used to evaluate the output of the system in response to any input.
• Start by considering the response of the system to an input of the form
$$x(t) = a\,\delta(t-\tau_1) + b\,\delta(t-\tau_2) \;\Rightarrow\; y(t) = a\,h(t-\tau_1) + b\,h(t-\tau_2)$$
General Result
• The sifting property of a delta function can be used to express an arbitrary input:
$$x(t) = \int_{-\infty}^{\infty} x(\tau)\,\delta(t-\tau)\,d\tau$$
• We can think of this as saying that $x(t)$ consists of an infinite sum (integral) of terms like $x(\tau)\,\delta(t-\tau)$.
• The response of the system to one such term is $x(\tau)\,h(t-\tau)$.
• The response to $x(t)$ is thus
$$y(t) = \int_{-\infty}^{\infty} x(\tau)\,h(t-\tau)\,d\tau = h(t)*x(t)$$
LTI Systems in the Frequency Domain
• Since
$$y(t) = h(t)*x(t) = \int x(\tau)\,h(t-\tau)\,d\tau = \int h(\tau)\,x(t-\tau)\,d\tau$$
• Fourier transforming this gives
$$Y(f) = \mathcal{F}\{y(t)\} = \mathcal{F}\{x(t)*h(t)\} = X(f)\,H(f)$$
• where $H(f)$ is the frequency response function (FRF) and is also the Fourier transform of the impulse response, i.e.
$$H(f) = \mathcal{F}\{h(t)\} = \frac{Y(f)}{X(f)}$$
Data Truncation
• All measurements are of finite duration and can be thought of as truncated versions of some infinite duration signal.
• Consider a signal $x(t)$ whose Fourier transform is $X(f)$.
• What is the Fourier transform of a truncated form of that signal, which we will call $\tilde x(t)$, where
$$\tilde x(t) = \begin{cases} x(t) & -T/2 \le t \le T/2 \\ 0 & \text{elsewhere} \end{cases}$$
• Note this assumes:
  – the signal is zero outside the time of the measurement;
  – symmetric truncation, i.e. the signal is truncated symmetrically about $t=0$. This implies $t=0$ is in the middle of your signal, which is not the natural choice but makes the following development easier and has no substantive impact on the result.
Fourier Transform of Truncated Signals
• To relate the Fourier integral of $\tilde x(t)$ to $x(t)$ we first relate the two signals in the time domain.
• To do that one can use the following relation:
$$\tilde x(t) = x(t)\,r(t)$$
where $r(t)$ is the rectangular function (Example 1 in Fourier integrals):
$$r(t) = \begin{cases} 1 & |t| \le T/2 \\ 0 & \text{elsewhere} \end{cases}$$
….. Cont’d
• Since $\tilde x(t)$ can be related to $x(t)$ via multiplication, then
$$\tilde X(f) = R(f)*X(f)$$
• From Example 1 (Fourier integral notes):
$$R(f) = T\,\frac{\sin(\pi fT)}{\pi fT}$$
• Thus, the effect of truncating $x(t)$ in the time domain is to convolve its Fourier transform with $R(f)$ in the frequency domain.
• Note as $T\to\infty$ then $R(f)\to\delta(f)$ and $\tilde X(f) = \delta(f)*X(f) = X(f)$.
Properties of R(f)
• Recall $r(t)$ is discontinuous and consequently $R(f)$ rolls off slowly at high frequencies (20 dB per decade).
[Figure, $T=1$: first zero crossing at $1/T$ ($=1$); height of first side lobe ~ $-13$ dB; 20 dB per decade roll-off. Note the log frequency scale and the dB scale for the y-axis.]
Terminology
• Main lobe and side lobes.
• Height of the largest side lobe.
• Side lobe roll-off: the asymptotic rate of decay of the function.
• Main lobe width (can be expressed using various metrics).
Alternative Windowing Functions
• The large side lobes of the function $R(f)$ are due to the discontinuity in $r(t)$.
• We can (and often do) use another function, $w(t)$, to truncate the signal:
$$\tilde x(t) = w(t)\,x(t)$$
• The function $w(t)$ is called a windowing function, and $r(t)$ is one specific example of a windowing function.
• As before, $\tilde X(f) = W(f)*X(f)$.
• If $W(f)$ can be made more like a delta function, then $w(t)$ is a better windowing function than $r(t)$.
Desirable Properties
• The windowing function, $w(t)$, must be zero outside of the interval $[-T/2, T/2]$.
• Ideally $W(f)$ should be like a delta function (delta functions have no side lobes and a main lobe of width zero).
• So, it should have:
  a) Small side lobes
    • Low first side lobe
    • Rapid roll-off
  b) Narrow main lobe
• The conditions a) and b) turn out to conflict with each other: you cannot have small side lobes with a narrow main lobe …… one needs to compromise.
Typical Windowing Functions in the Time Domain
• There are a large number of available options for $w(t)$.
• The common choices for $w(t)$ share some basic properties:
  – $w(t)$ is positive.
  – $w(t)$ has finite support, i.e. $w(t) = 0$ for $|t| > T/2$.
  – $w(t)$ is unimodal (has one peak); this means $W(f)$ is maximum at $f=0$.
  – $w(t)$ is symmetrical, so $W(f)$ is real (has no phase).
[Figure: blue: rectangular or boxcar; green: triangular; light blue: Gaussian; red: Hanning.]
Common Choices for Windowing Functions
• Rectangular window: $w(t) = 1$ for $|t| \le T/2$; 0 elsewhere.
• Hanning window (raised cosine): $w(t) = \left[1 + \cos(2\pi t/T)\right]/2$ for $|t| \le T/2$; 0 elsewhere.
• Hamming window (raised cosine on a pedestal): $w(t) = 0.54 + 0.46\cos(2\pi t/T)$ for $|t| \le T/2$; 0 elsewhere.
• Blackman window: $w(t) = 0.42 + 0.5\cos(2\pi t/T) + 0.08\cos(4\pi t/T)$ for $|t| \le T/2$; 0 elsewhere.
[Figure: the rectangular, Hanning, Hamming and Blackman windows.]
Properties of the Common Windows

Name        | Main lobe width (1/T) | Largest side lobe (dB) | Continuity | Roll-off (dB/decade)
Rectangular | 0.85                  | -13                    | 0          | 20
Hanning     | 1.4                   | -31                    | 2          | 60
Hamming     | 1.4                   | -43                    | 0          | 20
Blackman    | 1.6                   | -58                    | 2          | 60

Notes: the main lobe width is the 3 dB measure; “continuity” is the order of the first derivative which is discontinuous.
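The side-lobe entries in the table can be reproduced numerically; below is a Python/numpy sketch (the window length, padding factor and the simple side-lobe detector are illustrative choices):

```python
import numpy as np

# Compare the peak side lobes of common windows; peak level normalised to 0 dB.
N = 256
windows = {
    "rectangular": np.ones(N),
    "hanning": np.hanning(N),
    "hamming": np.hamming(N),
    "blackman": np.blackman(N),
}
for name, w in windows.items():
    W = np.abs(np.fft.rfft(w, 32 * N))             # heavily zero-padded spectrum
    WdB = 20 * np.log10(W / W.max() + 1e-12)
    # Peak side lobe: highest level after the first local minimum (the main-lobe null)
    first_null = np.argmax(np.diff(WdB) > 0)
    print(f"{name:12s} peak side lobe ~ {WdB[first_null:].max():6.1f} dB")
```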
Window Width
• Recall narrow functions in time, i.e., short duration signals,
have broader Fourier transforms (Laurel and Hardy)
• Whilst all the windows are zero for |t|>T/2 we still think of
some being narrower than others (compare Blackman and
Rectangular windows).
• Broad windows (windows with narrow main lobes) can only
be created if they are not smooth at |t|=T/2 (the window’s
start and end).
• Hence broad windows (in time) tend to have high sidelobes
whereas narrow windows (in time) can have small side lobes.
Windows in Action
Example 1: Time Domain
Two sine waves close together in frequency ($f_1 = 22.57$ Hz, $f_2 = 23.67$ Hz):
$$x(t) = \sin(2\pi f_1t + \phi_1) + 0.8\sin(2\pi f_2t + \phi_2)$$
[Figure: unwindowed data; data windowed using a Blackman window.]

Windows in Action
Example 1: Frequency Domain
[Figure: with the rectangular window two peaks can be seen; with the Blackman window, one or two peaks? Lines show the true signal frequencies.]
Windows in Action
Example 2: Time Domain
A big sine wave and a small sine wave ($f_1 = 15.57$ Hz, $f_2 = 30.67$ Hz):
$$x(t) = \sin(2\pi f_1t + \phi_1) + 0.001\sin(2\pi f_2t + \phi_2)$$
[Figure: unwindowed data; data windowed using a Blackman window.]

Windows in Action
Example 2: Frequency Domain
[Figure: with the rectangular window the small peak is lost in the side lobes from the larger component; the lower side lobes of the Blackman window reveal the small peak at 30.67 Hz.]
Real Example
Fourier transform of a piano note, C3, fundamental frequency ~131 Hz.
Peaks from strings which are excited sympathetically are hidden in the rectangular windowing case because of the side lobes.
[Figure: with a better window, a peak split due to the multiple strings is visible.]
Fourier Transform of a Sampled Signal
Paul White
Introduction
• Sampling
• Fourier transform of a sampled signal
– Example
• Impulse train (Dirac comb)
• Poisson sum formula
• Aliasing
Sampling
[Figure: a continuous signal $x(t)$ sampled at points $x[0], x[1], x[2], x[3], x[4], \ldots$]
• Sampling interval: $\Delta$ [s]
• Sampling frequency: $f_s = 1/\Delta$ [Hz]
Digital Signals
• The sample $x[n]$ corresponds to the signal $x(t)$ evaluated at $t_n = n\Delta$, i.e.
$$x[n] = x(t)\big|_{t=t_n} = x(t_n), \qquad t_n = 0, \Delta, 2\Delta, \ldots$$
• If $x[n]$ is known, this says nothing about the value of $x(t)$ between samples, $(n-1)\Delta < t < n\Delta$.
• The digital signal $x[n]$ is a sequence of numbers (not a function).
• One cannot integrate a list of numbers, so the Fourier integral cannot be applied directly to $x[n]$.
The Rectangle Method for Numerical Integration
• Consider the general problem of approximating an integral.
  – We shall temporarily use $x$ as the independent variable, instead of $t$.
[Figure: $f(x)$ over $[a, b]$ divided into strips of width $\delta x$; the area in each rectangle is $f(x_n)\,\delta x$.]
Rectangular Rule (cont’d)
• The rectangle method uses the approximation:
$$\int_a^b f(x)\,dx \approx \sum_{n=0}^{N} f(x_n)\,\delta x, \qquad x_n = a + n\,\delta x, \quad \delta x = (b-a)/N$$
  – You might be more familiar with the trapezoidal rule, which is actually a little more accurate than the rectangle rule. (The areas of the two half strips outside of $[a, b]$ are subtracted when using the trapezoidal rule.)
Approximating the Fourier Integral
• We can use the rectangle method to approximate the Fourier integral as follows:
$$\int_{-\infty}^{\infty} x(t)\,e^{-2\pi ift}\,dt \approx \Delta\sum_{n=-\infty}^{\infty} x(t_n)\,e^{-2\pi ift_n} = \Delta\sum_{n=-\infty}^{\infty} x[n]\,e^{-2\pi ifn\Delta}$$
• This approximation becomes more accurate as $\Delta$ reduces.
• Small values of $\Delta$ ($= 1/f_s$) correspond to high sample rates.
  – It is natural to ask what is the largest value of $\Delta$ (smallest sampling rate) you can use and keep an accurate approximation of the Fourier integral.
Fourier Transform of a Sampled Signal
• The Fourier transform of a sampled signal, $X_s(f)$, is defined as:
$$X_s(f) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-2\pi ifn\Delta}$$
• Note this is exactly the approximation on the last slide, but with the constant scale factor $\Delta$ omitted.
  – The subscript “s” in the notation $X_s(f)$ is included to distinguish it from the Fourier integral $X(f)$.
• The transform $X_s(f)$ is, in general, complex valued, even if $x[n]$ is real.
(Revision) Sum of a Geometric Progression
• Finite sums:
$$S = a + ar + ar^2 + ar^3 + \ldots + ar^{N-1} = \sum_{n=0}^{N-1} ar^n = a\,\frac{1-r^N}{1-r}, \quad r \ne 1$$
$r$ is the geometric ratio and $a$ is the initial term.
• Infinite sums: based on the above expression, as $N\to\infty$, $|r^N|\to0$ as long as $|r| < 1$, in which case
$$S = a + ar + ar^2 + ar^3 + \ldots = \sum_{n=0}^{\infty} ar^n = \frac{a}{1-r}, \quad |r| < 1$$
If $|r| > 1$ the sum diverges, i.e. $S\to\infty$ as $N\to\infty$.
Periodicity of Xs(f)
• Consider $X_s(f + 1/\Delta) = X_s(f + f_s)$:
$$X_s\!\left(f + \tfrac1\Delta\right) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-2\pi i\left(f+\frac1\Delta\right)n\Delta} = \sum_{n=-\infty}^{\infty} x[n]\,e^{-2\pi ifn\Delta}\,e^{-2\pi in}$$
Since $e^{-2\pi in} = 1$ for integer $n$:
$$X_s\!\left(f + \tfrac1\Delta\right) = X_s(f + f_s) = X_s(f)$$
• Consider $X_s(1/\Delta - f) = X_s(f_s - f)$ (for real $x[n]$):
$$X_s\!\left(\tfrac1\Delta - f\right) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-2\pi i\left(\frac1\Delta - f\right)n\Delta} = \sum_{n=-\infty}^{\infty} x[n]\,e^{2\pi ifn\Delta}\,e^{-2\pi in} = \left(\sum_{n=-\infty}^{\infty} x[n]\,e^{-2\pi ifn\Delta}\right)^*$$
so $X_s(f_s - f) = X_s(f)^*$.
Implications of Periodicity
• The observations on the previous slide mean that $X_s(f)$ is periodic in frequency, regardless of the signal $x[n]$.
[Figure: $|X_s(f)|$ and $\arg\{X_s(f)\}$ repeat with period $f_s$; markers at $-f_s, -f_s/2, f_s/2, f_s$.]
Comments
• $X_s(f)$ is periodic in frequency for all signals $x[n]$.
• The transform repeats itself every $f_s$ Hz.
• Further, above $f_s/2$ there is conjugate symmetry:
$$X_s(f_s - f) = X_s(f)^*$$
• Hence if $X_s(f)$ is known in the band 0 to $f_s/2$ then one can find the transform for any $f$, using the symmetry of the transform.
Example 1
• Exponential decay:
$$x[n] = \begin{cases} \alpha^n & n \ge 0,\ |\alpha| < 1 \\ 0 & n < 0 \end{cases}$$
$$X_s(f) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-2\pi ifn\Delta} = \sum_{n=0}^{\infty} \alpha^n e^{-2\pi ifn\Delta} = \sum_{n=0}^{\infty}\left(\alpha\,e^{-2\pi if\Delta}\right)^n$$
Recall $S = \sum_{n=0}^{\infty} ar^n = \frac{a}{1-r}$ for $|r| < 1$. Here $r = \alpha\,e^{-2\pi if\Delta}$, with $|r| = |\alpha| < 1$, so
$$X_s(f) = \frac{1}{1 - \alpha\,e^{-2\pi if\Delta}}$$
Note the condition $|\alpha| < 1$ means that the time-domain signal is decaying (stable).
Example 1 (cont’d)
$$|X_s(f)|^2 = \left(\frac{1}{1-\alpha e^{-2\pi if\Delta}}\right)\left(\frac{1}{1-\alpha e^{-2\pi if\Delta}}\right)^* = \frac{1}{\left(1-\alpha e^{-2\pi if\Delta}\right)\left(1-\alpha e^{2\pi if\Delta}\right)} = \frac{1}{1 - 2\alpha\cos(2\pi f\Delta) + \alpha^2}$$
$$|X_s(f)| = \frac{1}{\sqrt{1 - 2\alpha\cos(2\pi f\Delta) + \alpha^2}}$$
$$\arg\{X_s(f)\} = -\tan^{-1}\!\left(\frac{\alpha\sin(2\pi f\Delta)}{1 - \alpha\cos(2\pi f\Delta)}\right)$$
Impulse Train (or Dirac Comb)
• Consider the function $i(t)$:
$$i(t) = \sum_{n=-\infty}^{\infty} \delta(t - n\Delta)$$
• This is a periodic signal (period $T_p = \Delta$, $f_p = 1/\Delta$), so it can be represented as a Fourier series:
$$i(t) = \sum_{n=-\infty}^{\infty} d_n\,e^{2\pi int/\Delta}, \qquad d_n = \frac1\Delta\int_{-\Delta/2}^{\Delta/2} i(t)\,e^{-2\pi int/\Delta}\,dt$$
• We shall now compute the Fourier series coefficients, $d_n$, for the Dirac comb. Recall the definition of the complex Fourier series:
$$x(t) = \sum_{n=-\infty}^{\infty} d_n\,e^{2\pi inf_pt}, \qquad d_n = \frac{1}{T_p}\int_{-T_p/2}^{T_p/2} x(t)\,e^{-2\pi int/T_p}\,dt$$
Fourier Series of a Dirac Comb
$$d_n = \frac1\Delta\int_{-\Delta/2}^{\Delta/2} i(t)\,e^{-2\pi int/\Delta}\,dt$$
Over the region $-\Delta/2$ to $\Delta/2$ (the region over which the Fourier series integral is computed), only one delta function in $i(t)$ is present, i.e. the one at $t=0$. So in this region we can write $i(t) = \delta(t)$:
$$d_n = \frac1\Delta\int_{-\Delta/2}^{\Delta/2} \delta(t)\,e^{-2\pi int/\Delta}\,dt = \frac1\Delta, \qquad\text{hence}\qquad i(t) = \frac1\Delta\sum_{n=-\infty}^{\infty} e^{2\pi int/\Delta}$$
This is an alternative representation of an impulse train.
Fourier Transform of an Impulse Train
• The Fourier transform of $i(t)$ can be computed as
$$I(f) = \mathcal{F}\{i(t)\} = \int_{-\infty}^{\infty} \frac1\Delta\sum_{n=-\infty}^{\infty} e^{2\pi int/\Delta}\,e^{-2\pi ift}\,dt = \frac1\Delta\sum_{n=-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-2\pi i(f - n/\Delta)t}\,dt$$
$$= \frac1\Delta\sum_{n=-\infty}^{\infty}\delta\!\left(f - \frac n\Delta\right) = \frac1\Delta\sum_{n=-\infty}^{\infty}\delta(f - nf_s)$$
i.e. the Fourier transform of a Dirac comb is another Dirac comb (scaled by $1/\Delta$ and with reciprocal spacing $1/\Delta$).
Alternative Definition of Xs(f)
• Consider $\mathcal{F}\{x(t)\,i(t)\}$, where $i(t)$ is an impulse train (Dirac comb) with spacing $\Delta$:
$$\mathcal{F}\{x(t)\,i(t)\} = \int_{-\infty}^{\infty} x(t)\sum_{n=-\infty}^{\infty}\delta(t-n\Delta)\,e^{-2\pi ift}\,dt = \sum_{n=-\infty}^{\infty}\int_{-\infty}^{\infty} x(t)\,\delta(t-n\Delta)\,e^{-2\pi ift}\,dt$$
$$= \sum_{n=-\infty}^{\infty} x(n\Delta)\,e^{-2\pi ifn\Delta} = \sum_{n=-\infty}^{\infty} x[n]\,e^{-2\pi ifn\Delta} = X_s(f)$$
• So the Fourier transform of a sequence can be regarded as the Fourier transform of the continuous time signal multiplied by an impulse train.
Poisson Sum Formula
• Since $X_s(f) = \mathcal{F}\{i(t)\,x(t)\}$, we have $X_s(f) = I(f)*X(f)$, and we know that
$$I(f) = \frac1\Delta\sum_{n=-\infty}^{\infty}\delta(f - nf_s)$$
Thus:
$$X_s(f) = \int_{-\infty}^{\infty} I(\nu)\,X(f-\nu)\,d\nu = \frac1\Delta\sum_{n=-\infty}^{\infty}\int_{-\infty}^{\infty}\delta(\nu - nf_s)\,X(f-\nu)\,d\nu$$
So that (the Poisson sum formula):
$$X_s(f) = \frac1\Delta\sum_{n=-\infty}^{\infty} X(f - nf_s)$$
Comments
• The Poisson sum formula relates the Fourier transform of the sampled signal to that of the signal prior to sampling.
• We would like $X_s(f) = X(f)$ so that the act of sampling does not affect the Fourier transform of the signal, i.e. $x(t)$ and $x[n]$ have the same transform. This can never be true for all $f$ because:
  – $X_s(f)$ is a periodic function of $f$ (all sampled signals have Fourier transforms which are periodic in frequency with period $f_s$);
  – $X(f)$ is not, in general at least, periodic.
• Sampling the signal fundamentally changes the Fourier transform, so does that mean we are stuck?
[Figure: the Poisson sum formula $X_s(f) = \frac1\Delta\sum_n X(f - nf_s)$ in pictures: copies of $X(f)$ repeat at multiples of $f_s$, with the band edge at $f_s/2$.]
Aliasing
• Suppose $x(t)$ is band limited, such that $X(f) = 0$ for $|f| > f_0$.
• If $f_s/2 > f_0$ then
$$X_s(f) = \frac1\Delta\,X(f) \quad\text{for } -f_s/2 < f < f_s/2$$
So, with the exception of the $1/\Delta$ scaling factor, in this band the transforms are the same.
• If $f_s/2 < f_0$ then
$$X_s(f) \ne \frac1\Delta\,X(f)$$
i.e. the Fourier transform of the sampled signal is a distorted version of the Fourier transform of the original signal; this distortion is called aliasing.
Nyquist Criterion
• In order that a signal’s Fourier transform is not distorted by aliasing, one needs to sample at a sufficiently high sample rate.
• This assumes the signal $x(t)$ is band-limited.
• In particular:
$$\underbrace{f_s}_{\text{sample rate}} \ge 2\underbrace{f_0}_{\text{highest frequency in the signal}}$$
• $f_s/2$ is the highest frequency you can accurately represent using a sampling rate of $f_s$.
• $f_s/2$ is commonly called the folding frequency.
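A minimal numerical illustration of aliasing (Python/numpy sketch; the frequencies are arbitrary choices violating the Nyquist criterion):

```python
import numpy as np

# A 9 Hz cosine sampled at fs = 10 Hz (fs/2 = 5 Hz < 9 Hz) produces exactly the
# same samples as a 1 Hz cosine: 9 Hz folds to |fs - 9| = 1 Hz.
fs, f0 = 10.0, 9.0
n = np.arange(20)
samples_9Hz = np.cos(2 * np.pi * f0 * n / fs)
samples_1Hz = np.cos(2 * np.pi * (fs - f0) * n / fs)
print(np.allclose(samples_9Hz, samples_1Hz))   # True: the two are indistinguishable
```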
Practical Method for Avoiding Aliasing
• To ensure aliasing does not occur one should remove high frequencies from $x(t)$ before sampling.
• This is achieved by applying a low-pass filter prior to sampling.
• This filter is called an anti-aliasing (AA) filter.
Analogue input $x(t)$ → anti-aliasing filter → filtered input $\tilde x(t)$ → analogue to digital converter (ADC) → digital signal $x[n]$ to computer.
• The filter cut-off frequency is commonly quoted as the frequency which is 3 dB (or sometimes 6 dB) below the pass-band.
• When selecting an AA filter you want higher attenuation at $f_s/2$, which means your 3 dB point may be lower in frequency than the folding frequency.
Guard Band
• Say we wish to sample at $f_s = 5$ kHz; where should we put the cut-off of the AA filter?
• See the last slide: if the cut-off is 2.5 kHz then there is only 3 dB of attenuation at the folding frequency ($f_s/2$).
• For this filter we need to put the cut-off at 1.4 kHz to get 40 dB attenuation at $f_s/2$.
• The guard band: the frequency range before the folding frequency where the AA filter is rolling off, but has not reached a high level of attenuation.
• There will be aliasing in the guard band, but hopefully at a low level so it does not affect overall signal quality.
• We want an AA filter with high roll-off so the guard band can be narrow.
Example 2
• Complex sinusoid $x[n] = e^{2\pi iqn\Delta}$, frequency $q$:
$$X_s(f) = \sum_{n=-\infty}^{\infty} e^{2\pi iqn\Delta}\,e^{-2\pi ifn\Delta} = \sum_{n=-\infty}^{\infty} e^{-2\pi in(f-q)\Delta}$$
Recall $i(t) = \sum_n \delta(t - n\Delta) = \frac1\Delta\sum_n e^{2\pi int/\Delta}$; using this comb identity with $t$ replaced by $(f-q)$ and spacing $1/\Delta$:
$$X_s(f) = \frac1\Delta\sum_{n=-\infty}^{\infty}\delta(f - q - nf_s)$$
[Figure: a comb of impulses of area $1/\Delta$ at $f = q, q \pm f_s, q \pm 2f_s, \ldots$; for $q < f_s$ the base-band impulse lies at $q$, whereas for $q > f_s$ it lies at $q - f_s$ (or $q - 2f_s$, etc.).]
Example 3
• Cosine wave $x[n] = \cos(2\pi qn\Delta)$.
• We break this up using $\cos(2\pi qn\Delta) = \dfrac{e^{2\pi iqn\Delta} + e^{-2\pi iqn\Delta}}{2}$.
• From the last example:
$$\mathcal{F}\left\{e^{2\pi iqn\Delta}\right\} = \frac1\Delta\sum_{n=-\infty}^{\infty}\delta(f - q - nf_s)$$
• It is a simple step to then show
$$\mathcal{F}\left\{e^{-2\pi iqn\Delta}\right\} = \frac1\Delta\sum_{n=-\infty}^{\infty}\delta(f + q - nf_s)$$
• Thus
$$X_s(f) = \mathcal{F}\{\cos(2\pi qn\Delta)\} = \frac{1}{2\Delta}\sum_{n=-\infty}^{\infty}\left[\delta(f - q - nf_s) + \delta(f + q - nf_s)\right]$$
[Figure, $q < f_s$: impulses of area $1/(2\Delta)$ at $\pm q,\ f_s \pm q,\ 2f_s \pm q,\ 3f_s \pm q, \ldots$]
Effect of Aliasing in the Time Domain
f0=1.234 Hz
Signal Reconstruction
• Can one recover $x(t)$ from the samples $x[n]$?
• IF there is no aliasing, then $X(f) = \Delta\,X_s(f)$ for $-f_s/2 < f < f_s/2$ (the $\Delta$ accounting for the $1/\Delta$ scaling in the Poisson sum formula), and we know:
$$x(t) = \mathcal{F}^{-1}\{X(f)\} = \int_{-\infty}^{\infty} X(f)\,e^{2\pi ift}\,df, \qquad X_s(f) = \sum_{n=-\infty}^{\infty} x[n]\,e^{-2\pi ifn\Delta}$$
So that
$$x(t) = \Delta\int_{-f_s/2}^{f_s/2} X_s(f)\,e^{2\pi ift}\,df = \Delta\int_{-f_s/2}^{f_s/2}\sum_{n=-\infty}^{\infty} x[n]\,e^{-2\pi ifn\Delta}\,e^{2\pi ift}\,df$$
$$= \Delta\sum_{n=-\infty}^{\infty} x[n]\int_{-f_s/2}^{f_s/2} e^{2\pi if(t-n\Delta)}\,df = \sum_{n=-\infty}^{\infty} x[n]\,\frac{\sin\!\left(\pi(t-n\Delta)/\Delta\right)}{\pi(t-n\Delta)/\Delta}$$

Shannon’s Reconstruction Formula
$$x(t) = \sum_{n=-\infty}^{\infty} x[n]\,\frac{\sin\!\left(\pi(t-n\Delta)/\Delta\right)}{\pi(t-n\Delta)/\Delta}$$
Closer Look at Reconstructed Signal
• In this case the reconstructed signal and the original signal are not exactly the same.
• This is because the original signal (one cycle of a sine wave) has energy above $f_s/2$, i.e. there is some aliasing.
Problems with Shannon’s Reconstruction
$$x(t) = \sum_{n=-\infty}^{\infty} x[n]\,\frac{\sin\!\left(\pi(t-n\Delta)/\Delta\right)}{\pi(t-n\Delta)/\Delta}$$
• To reconstruct the signal at time $t$ one needs to know:
  – Values of $x[n]$ which are a long way from $t$.
    • The sinc function decays slowly, so if you want to truncate Shannon’s reconstruction formula to operate on a finite set of $x[n]$, then you need to include many terms in that truncation.
  – Values of $x[n]$ for all points in time.
    • Including points happening after $t$, i.e. one needs future samples. This makes the method unsuitable for real-time application.
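A truncated version of the formula is easy to sketch in Python/numpy (the test tone, sample rate and number of terms are arbitrary choices; the truncation of the infinite sum limits the accuracy, as noted above):

```python
import numpy as np

# Shannon reconstruction: rebuild a band-limited signal between its samples.
fs, f0 = 10.0, 3.0                        # fs > 2*f0, so no aliasing
Delta = 1 / fs
n = np.arange(-200, 201)                  # many terms, since the sinc decays slowly
x_n = np.cos(2 * np.pi * f0 * n * Delta)  # the samples x[n]

t = 0.123                                 # an instant between sampling points
x_rec = np.sum(x_n * np.sinc((t - n * Delta) / Delta))  # np.sinc(u) = sin(pi u)/(pi u)
print(x_rec, np.cos(2 * np.pi * f0 * t))  # close agreement
```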
Practical Reconstruction
• In practice one rarely creates $x(t)$ from $x[n]$ using Shannon’s formula.
• Reconstruction is more commonly performed using two steps: a zeroth order hold followed by a low pass filter (the reconstruction filter).
Digital signal $x[n]$ from computer → zeroth order hold → stepped signal $\tilde x(t)$ → low pass filter → analogue output $x(t)$.
The Discrete Fourier Transform
Introduction
• Desirable properties of a digital Fourier transform
• Sampling of the Fourier transform of a sequence
• The Discrete Fourier Transform (DFT)
• The Inverse DFT
• Zero Padding
• Digital Convolution
• Relationship between different Fourier transforms
Desirable Properties of a Digital Fourier
Transform
• What do we want from a Fourier transform so we can
implement it on a digital computer:
1. Should take a digital input
• Like the Fourier transform of a sequence (Xs(f))
2. Should take a finite duration signal
• Covered by our discussion of windowing
3. Should create a digital output
• Covered in this lecture – the discrete Fourier transform (DFT)
4. Should be computationally efficient
• See next lecture on FFTs
Sampling the Fourier Transform of a Sequence
• Consider a digital signal $x[n]$ defined for $0 \le n < N$.
• The Fourier transform of this sequence is
$$X_s(f) = \sum_{n=0}^{N-1} x[n]\,e^{-2\pi ifn\Delta}$$
• Recall this is periodic in frequency $f$, with period $f_s$.
• To make this “fully digital” we need to sample this function in frequency.
  – i.e. to evaluate $X_s(f)$ at a discrete set of frequencies: the question is which frequencies?
The Discrete Fourier Transform (DFT)
• Because $X_s(f)$ is periodic in $f$ it is natural to only sample one period, e.g. $0 \le f < f_s$.
• Since we start with $N$ time samples, the usual option is to create $N$ samples in frequency.
• This implies we evaluate $X_s(f)$ at the frequencies
$$f_k = k\,\delta f, \qquad \delta f = \frac{f_s}{N} = \frac{1}{N\Delta}, \qquad k = 0, 1, \ldots, N-1$$
• So that the DFT, $X[k] = X_s(f_k)$, is defined as follows:
$$X[k] = X_s(f_k) = \sum_{n=0}^{N-1} x[n]\,e^{-2\pi if_kn\Delta} = \sum_{n=0}^{N-1} x[n]\,e^{-2\pi i\frac kN n} = \sum_{n=0}^{N-1} x[n]\,e^{-2\pi ink/N}$$
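The definition translates directly into code; the sketch below (Python/numpy; the test signal is arbitrary) implements it literally and checks it against numpy's FFT, which computes the same quantity:

```python
import numpy as np

# Direct implementation of the DFT definition X[k] = sum_n x[n] exp(-2j*pi*n*k/N).
def dft(x):
    N = len(x)
    n = np.arange(N)
    # Build the full N x N matrix of exponents e^{-2*pi*i*n*k/N} and apply it to x
    return np.exp(-2j * np.pi * np.outer(n, n) / N) @ x

x = np.random.default_rng(3).standard_normal(64)
print(np.allclose(dft(x), np.fft.fft(x)))   # True
```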
Pictorial Representation of the DFT
Inverse DFT (IDFT)
• One can show that:
$$\sum_{n=0}^{N-1} e^{2\pi iqn/N} = \begin{cases} 0 & 0 < q \le N-1 \\ N & q = 0 \end{cases}$$
(a complex exponential with $q$ cycles in $0 \le n \le N-1$).
• Given the above, then:
$$x[n] = \frac1N\sum_{k=0}^{N-1} X[k]\,e^{2\pi ikn/N}$$
Outline of proof: substituting $X[k] = \sum_{m=0}^{N-1} x[m]\,e^{-2\pi imk/N}$,
$$\frac1N\sum_{k=0}^{N-1} X[k]\,e^{2\pi ikn/N} = \frac1N\sum_{k=0}^{N-1}\sum_{m=0}^{N-1} x[m]\,e^{-2\pi i(m-n)k/N} = x[n]$$
since the inner sum over $k$ is $N$ when $m = n$ and 0 otherwise.
Sampling Interval in Frequency Domain
• The interval between samples of the DFT in the frequency domain, $\delta f$, is given by $1/N\Delta$.
  – $\Delta$ is the time between samples.
  – $N$ is the number of samples in the measurement.
  – Thus $N\Delta$ is the duration of the measurement ($T$), so that $\delta f = 1/T$.
• The spacing between samples in the DFT only depends on the duration of the measurement.
  – Increasing the sampling rate without increasing the measurement time does not affect the interval between samples in the DFT.
  – Such an increase in the sample rate only increases the bandwidth (i.e. increases the maximum frequency represented).
Zero Padding
• One can artificially increase the duration of the signal, and so reduce the spacing between frequency samples, by zero padding.
• Zero padding is just appending zeroes to the end of a signal:
$$\tilde x[n] = \begin{cases} x[n] & 0 \le n < N \\ 0 & N \le n < M \end{cases} \qquad (M > N)$$
• For this padded signal $\tilde X_s(f) = X_s(f)$ for all $f$.
• But it is sampled more finely, since
$$\delta\tilde f = \frac{f_s}{M} < \frac{f_s}{N} = \delta f$$
Example of Zero Padding
[Figure: a single cycle of a sine wave; with zero padding by a factor of ~3 (zeroes added) the DFT samples the same underlying transform on a finer frequency grid than without zero padding.]
Resolution
• So zero padding enhances resolution ……. NO!!!
• Fairly obviously, zero padding adds no information.
  – $x[n]$ can be recovered from $X[k]$, i.e. the DFT has the same information as $x[n]$, and transforming a signal cannot add new information to it.
• Zero padding evaluates the Fourier transform of a sequence on a more densely packed set of points.
  – The spacing between samples in the DFT is sometimes called its resolution; this is not good terminology.
  – The term “resolution” should be reserved as a measure of how close two sine waves can be in frequency before one cannot determine that there are two signals (i.e. before you see one peak instead of two).
Periodicity
$$x[n] = \frac1N\sum_{k=0}^{N-1} X[k]\,e^{2\pi ikn/N}$$
• What happens to samples beyond $n = N-1$?
• Consider $x[n+N]$; using the IDFT one has:
$$x[n+N] = \frac1N\sum_{k=0}^{N-1} X[k]\,e^{2\pi ik(n+N)/N} = \frac1N\sum_{k=0}^{N-1} X[k]\,e^{2\pi ikn/N}\,e^{2\pi ik} = x[n]$$
• Hence, according to the DFT, $x[n]$ is periodic.
  – The DFT assumes periodic extension (see Fourier series notes) when considering samples outside the measurement period.
Shifts in Finite Sampled Signals
• How does one shift a digital signal whose length is $N$ samples?
  – Let us consider a simple shift of one sample: $x[n] \to x[n-1]$.
• The problems:
  – What do we do with the space at the start of the signal?
  – What do we do with the sample beyond the end of the measurement interval?
Option 1: Circular Shifts
• If we assume $x[n]$ is periodically extended then the problems on the preceding slide have a “natural” solution: the sample shifted off the end wraps around to fill the space at the start.
Option 2: Linear Shift
• Assuming the signal is zero outside of the measurement regime, a shift is defined so that:
  – the new sample at the start is zero, and the measurement grows to become one sample longer.
Linear Digital Convolution
• The idea of convolution extends naturally from the analogue to the digital domain.
• Analogue (continuous time) convolution:
$$x(t)*y(t) = \int_{-\infty}^{\infty} x(\tau)\,y(t-\tau)\,d\tau$$
• Digital convolution:
$$x[n]*y[n] = \sum_{m=-\infty}^{\infty} x[m]\,y[n-m] = \sum_{m=-\infty}^{\infty} x[n-m]\,y[m]$$
• If $x[n]$ has $N$ samples and $y[n]$ has $M$ samples, then their convolution lasts $N+M-1$ samples.
Circular Convolution
• For two finite length digital signals of duration $N$ samples (for example), one can define a second form of convolution, namely circular convolution.
• This is defined as
$$x[n]\circledast y[n] = \sum_{m=0}^{N-1} x[m]\,y[n-m] = \sum_{m=0}^{N-1} x[n-m]\,y[m]$$
where the shifts, like $y[n-m]$, are defined as circular shifts (not linear shifts).
• The result of circularly convolving two signals of length $N$ is also of length $N$.
Convolution in the Frequency Domain
• For the Fourier transforms considered up to now, in some form we have:
$$\mathcal{F}^{-1}\{X(f)\,Y(f)\} = x(t)*y(t)$$
i.e. convolution in time relates to multiplication in frequency.
• If we ask what is $\mathrm{DFT}^{-1}\{X[k]\,Y[k]\}$,
• then it turns out that the answer is the circular (and not linear) convolution of $x[n]$ and $y[n]$.
Computing Convolution
• To compute convolutions there are two broad approaches:
  – Time domain, i.e. directly implement
$$\sum_{m=-\infty}^{\infty} x[n-m]\,y[m]$$
    For each $n$: reverse one of the signals, shift it by $n$, multiply the shifted and unshifted signals together and then add the product up. This can be computationally demanding.
  – Frequency domain: see the next slide for details, but basically Fourier transform the signals, multiply them in frequency and then apply the inverse transform. Because of the Fast Fourier Transform (see next lecture) this can be computationally efficient.
Computing Linear Convolution using the DFT
• To compute the linear convolution of $x[n]$ (length $N$ samples) and $y[n]$ (length $M$ samples) in the frequency domain one needs to:
  – Zero pad both $x[n]$ and $y[n]$ to the same length, which should be at least $M+N-1$ samples.
  – DFT the zero padded signals to generate $X[k]$ and $Y[k]$.
  – Form the product $X[k]\,Y[k]$.
  – Inverse DFT this product.
Note: because of the zero padding, the IDFT will be long enough (at least $M+N-1$ samples) to accommodate the linear convolution.
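The recipe above, as a short Python/numpy sketch (the helper name and the small test signals are illustrative):

```python
import numpy as np

# Linear convolution via zero padding and the FFT, following the recipe above.
def fft_convolve(x, y):
    L = len(x) + len(y) - 1                  # linear convolution length N + M - 1
    X = np.fft.fft(x, L)                     # fft(x, L) zero pads x to length L
    Y = np.fft.fft(y, L)
    return np.fft.ifft(X * Y).real           # real inputs -> real convolution

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, -1.0])
print(fft_convolve(x, y))                    # [ 1.  1.  1. -3.]
print(np.convolve(x, y))                     # same result, computed directly
```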
The Fourier integral
[Figure: the original continuous time signal (upper frame) and its Fourier transform (lower frame); $X(f) = \mathcal{F}\{x(t)\}$.]

Fourier transform of the truncated signal
[Figure: truncation of the continuous time signal introduces oscillations into the Fourier transform. The grey curve shows the unwindowed signal, with the black one showing the signal after windowing; $\tilde X(f) = \mathcal{F}\{w(t)\,x(t)\}$.]

Fourier transform of the sampled signal
[Figure: sampling of the time domain signal makes the Fourier transform periodic (sample rate 10 Hz). In the upper frame, the grey plot shows the original signal and the circles represent the sample values. In the lower frame the grey curve represents the unsampled windowed Fourier transform; $X_s(f) = \mathcal{F}\{i(t)\,w(t)\,x(t)\}$.]

The Discrete Fourier transform
[Figure: sampling in the Fourier domain, at an interval of $1/T$ Hz, introduces periodicity in the time domain. In both plots, open faced circles indicate periodic repetitions and close faced symbols represent points in the principal domain; $X[k] = \sum_{n=0}^{N-1} x[n]\,e^{-2\pi ink/N}$.]
x t    d kke 22ikt
ikt //TT

kk
 

T /2
1 T /2
d k   x t e 22iktikt//TTdt
T  TT //22


1 NN 11
x t    X  f e 2 iftiftdf x  n   X  k  e22inkink//NN
 N kk00



N1
X  f    x ttee 22iftiftdt
dt X  k   x  n  e  2 ink / N
 
 nn
00

ff /2
/2
1 ss
x  n   XXss  ff ee22ifnifndf
df
f ss  ffs /2/2
s

N
N11
X ss  ff   xx nn ee22ifn
ifn

nn
00
The Fast Fourier Transform
Introduction
• Computation of DFT
• FFT Algorithm
• Divide and conquer approach
• Mathematical “trick”
• Diagrammatic representation
• Speed of the FFT
Computing DFT
• A DFT can be computed directly using the following recipe (algorithm):
  – For each frequency $k$ (of which there are $N$):
    • Multiply the signal $x[n]$ by the complex exponential $e^{-2\pi ink/N}$.
    • Sum this product to form $X[k]$.
• This requires one to compute $N$ complex multiplies and adds for $N$ different frequencies, that is roughly $N^2$ operations.
  – To compute a 1024 point DFT requires roughly 1,000,000 operations.
• This is too burdensome for many applications.
General Comments on FFT
• The Fast Fourier Transform (FFT) computes the DFT in an efficient fashion.
  – i.e. the output of the FFT is exactly what one would get from a DFT.
• It performs this computation in a divide and conquer fashion.
• There are a wide range of methods and we will consider only one: a radix 2 decimation in time algorithm.
  – “Radix 2” means the problem is divided into two at each step.
  – “Decimation in time” means the problem is broken up in terms of time samples (as opposed to frequencies).
Notation
• When considering the FFT it is normal to use a bespoke notation.
• Specifically $W_N = e^{-2\pi i/N}$ (the “twiddle factor”).
• So that one can write
$$X[k] = \sum_{n=0}^{N-1} x[n]\,e^{-2\pi ink/N} = \sum_{n=0}^{N-1} x[n]\,W_N^{nk}$$
• Diagrammatic (butterfly) conventions: two inputs $a$ and $b$ produce the sum $a+b$ on one output and the difference $a-b$ on the other.
Decimation in Time
• The FFT starts by dividing a sequence into two, assuming $N$ is even, by considering the odd and even numbered samples in a signal as two different sequences:
$$X[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{nk}$$
$$x_1[n] = x[2n], \qquad x_2[n] = x[2n+1], \qquad n = 0, 1, 2, \ldots, N/2 - 1$$
Resulting Simplification
• The DFT can be written in terms of these two sequences:
$$X[k] = \sum_{n=0}^{N/2-1} x[2n]\,W_N^{2nk} + \sum_{n=0}^{N/2-1} x[2n+1]\,W_N^{(2n+1)k} = \sum_{n=0}^{N/2-1} x_1[n]\,W_N^{2nk} + W_N^k\sum_{n=0}^{N/2-1} x_2[n]\,W_N^{2nk}$$
• We can re-express this using a basic property of the twiddle factor:
$$W_N^{2nk} = e^{-2\pi i\,2nk/N} = e^{-2\pi ink/(N/2)} = W_{N/2}^{nk}$$
$$X[k] = \sum_{n=0}^{N/2-1} x_1[n]\,W_{N/2}^{nk} + W_N^k\sum_{n=0}^{N/2-1} x_2[n]\,W_{N/2}^{nk}$$
Similarly ……
• Consider the frequency $k + N/2$:
$$X[k+N/2] = \sum_{n=0}^{N/2-1} x_1[n]\,W_{N/2}^{n(k+N/2)} + W_N^{k+N/2}\sum_{n=0}^{N/2-1} x_2[n]\,W_{N/2}^{n(k+N/2)}$$
• Again looking at the twiddle factors:
$$W_N^{k+N/2} = e^{-2\pi i(k+N/2)/N} = e^{-2\pi ik/N}\,e^{-i\pi} = -W_N^k$$
$$W_{N/2}^{n(k+N/2)} = e^{-2\pi in(k+N/2)/(N/2)} = e^{-2\pi ink/(N/2)}\,e^{-2\pi in} = W_{N/2}^{nk}$$
• Hence
$$X[k+N/2] = \sum_{n=0}^{N/2-1} x_1[n]\,W_{N/2}^{nk} - W_N^k\sum_{n=0}^{N/2-1} x_2[n]\,W_{N/2}^{nk}$$
Summary so far
• Thus we have
$$X[k] = \sum_{n=0}^{N/2-1} x_1[n]\,W_{N/2}^{nk} + W_N^k\sum_{n=0}^{N/2-1} x_2[n]\,W_{N/2}^{nk}$$
$$X[k+N/2] = \sum_{n=0}^{N/2-1} x_1[n]\,W_{N/2}^{nk} - W_N^k\sum_{n=0}^{N/2-1} x_2[n]\,W_{N/2}^{nk}$$
• But
$$X_1[k] = \sum_{n=0}^{N/2-1} x_1[n]\,W_{N/2}^{nk}, \qquad X_2[k] = \sum_{n=0}^{N/2-1} x_2[n]\,W_{N/2}^{nk}$$
i.e. these are the DFTs of the two sequences (the even and odd numbered samples).
The Saving
• To compute $X[k]$ one can compute $X_1[k]$ and $X_2[k]$ and combine them as follows:
$$X[k] = X_1[k] + W_N^k\,X_2[k]$$
$$X[k+N/2] = X_1[k] - W_N^k\,X_2[k]$$
• Computing each of $X_1[k]$ and $X_2[k]$ requires $(N/2)^2 = N^2/4$ operations.
• So the total computation is $N^2/4 + N^2/4 = N^2/2$, i.e. 50% less than computing the DFT directly.
• This saving can be magnified by applying this recursively, i.e. the same idea can be applied to compute $X_1[k]$ and $X_2[k]$, so saving another 50% ……..
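The recursion just described can be written out directly; below is a minimal educational Python/numpy sketch (it assumes the length is a power of 2; practical FFTs use an iterative in-place version instead):

```python
import numpy as np

# Recursive radix-2 decimation-in-time FFT, following the slides.
def fft_radix2(x):
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    X1 = fft_radix2(x[0::2])              # DFT of the even-numbered samples
    X2 = fft_radix2(x[1::2])              # DFT of the odd-numbered samples
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)   # twiddle factors W_N^k
    return np.concatenate([X1 + W * X2,   # X[k]       = X1[k] + W_N^k X2[k]
                           X1 - W * X2])  # X[k + N/2] = X1[k] - W_N^k X2[k]

x = np.random.default_rng(4).standard_normal(256)
print(np.allclose(fft_radix2(x), np.fft.fft(x)))   # True
```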
Computing an 8 Point FFT using Two 4 Point
Transforms
Complete 8 Point FFT
Ordering of Input Samples
• The order in which the time
series data is applied to the FFT
is shuffled in a particular way.
• Specifically to find the position
of the nth sample you:
– Express n as a binary number
– Reverse the order of the binary
digits.
– Convert back to a decimal value.

For example to find the position of the 6th sample:


6 = 110 reverse the order of the bits, to get 011=3
(note one counts from zero).
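A small sketch of this bit-reversal rule (the helper function is illustrative, not part of the slides):

```python
# Bit-reversed ordering of the FFT inputs (N = 8 here, so 3 bits per index).
def bit_reverse(n, bits):
    out = 0
    for _ in range(bits):
        out = (out << 1) | (n & 1)   # peel off the lowest bit of n, append to out
        n >>= 1
    return out

print([bit_reverse(n, 3) for n in range(8)])   # [0, 4, 2, 6, 1, 5, 3, 7]
# e.g. the 6th sample: 6 = 110 -> reversed 011 = 3
```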
Computational Load of an FFT
• For an FFT of size $N = 2^p$ (for some $p$):
  – There are $\log_2(N)$ layers in the FFT.
  – Each layer requires $N/2$ complex multiplies and adds.
    • A complex multiply itself costs several real multiplications (four, naïvely).
  – Total computational load $\propto N\log_2N$.
• Modern FFT algorithms achieve a performance roughly proportional to $N\log_2N$ for any FFT size.
• The FFT continues to be most efficient when the data length is a power of 2.
Computational Time
• MATLAB implementation of the FFT for different sizes.
• Timed over 10,000 realisations.
Random Processes: Some Basics

Poll: How much statistics have you done in the past?
1. None (0%)
2. High school (A-level) (0%)
3. Some in first year (0%)
4. Some throughout my degree (0%)
5. ..... I did a stats degree! (0%)
Poll: What is the name of the distribution with the probability density function shown below?
1. Gaussian (0%)
2. Rayleigh (0%)
3. Uniform (0%)
4. Normal (0%)
5. Chi-squared (0%)
Poll: Two experiments, E1 and E2, yield the two data sets: E1 = {10, -2, -8, 4, -4}, E2 = {-0.2, 1, -0.4, -0.8, 0.4}. What can one say about these data?
1. Mean of E1 > Mean of E2 (0%)
2. Mean of E1 < Mean of E2 (0%)
3. Standard deviation of E1 > Standard deviation of E2 (0%)
4. Standard deviation of E1 < Standard deviation of E2 (0%)
Poll: The figure below shows a scatter diagram for two random variables X and Y. What can one say about these variables?
1. They have the same mean (0%)
2. X and Y are uncorrelated (0%)
3. X and Y are statistically independent (0%)
4. X and Y are positively correlated (0%)
5. X and Y are negatively correlated (0%)
Introduction
• Random processes
• Histograms
• Probability density functions (pdfs)
• Moments
• Bivariate random variables
• Correlations
Random Processes
• Consider a random variable (r.v.) $X$ representing the numerical outcome of some experiment (e.g. rolling a die, tossing a coin or a measurement error).
• Each time the experiment is conducted a value, $x_k$, is generated, which is called a realisation of the r.v.
• The values of $x_k$ are unpredictable (random!).
• We shall assume the possible values of $x_k$ lie on a continuum (possibly from $-\infty$ to $\infty$), so called continuous random variables.
  – This excludes things like dice rolling, since the outcome of rolling a die only gives one of 6 values (1, 2, 3, 4, 5 or 6).
  – But it includes things like measurement errors.
Example
• Consider testing the lifetimes of a set of batteries.
• Each battery lasts a different length of time and any positive
time might be an outcome.
• If one tests a batch of 100 batteries you get a set of data
consisting of 100 different measured lifetimes.
• If a second batch of 100 batteries are tested, then a second
set of data is obtained.
– The two results from the batches will (of course) consist of different
values.
– Assuming the two batches are tested in the same way, we do expect
some similarities
• The average for the values from 2 batches is likely to be similar
• More generally, the data will be “distributed” in the same way.
Histogram
• You are probably familiar with the concept of a histogram.
Histograms for 4 different
batches of 100 battery life times
(simulated on a computer!)

Each histogram uses bins of
width 10, so for example, the
bin centred on 50 min counts
the number of batteries which
lasted between 45 and 55
minutes.

Note the example histograms are
not identical, but share the same
basic shape.
Distributions
• From this histogram it is clear that in this example most
batteries last for a time of around 50 minutes
– Some last significantly shorter times, less than 20 mins.
– Some last significantly longer, more than 80 mins.
• The manner in which the values in such a random process are
distributed is characterised through the probability density
function (pdf).
• The pdf has a shape resembling that of the histogram
computed from a large dataset.
– Indeed the pdf can be obtained by considering a normalised histogram
and taking the limit as the sample size tends to infinity and the bin size
tends to zero.
Properties of Probability Density Functions
• For a pdf, p(X), the probability of obtaining a value of x in
the range a to b is
Pr{a < x < b} = ∫_a^b p(x) dx
(figure example: Pr{30 < x < 45} = 0.29)
• All values must lie between −∞ and ∞, i.e. Pr{−∞ < x < ∞} = 1,
hence for any pdf
∫_{−∞}^{∞} p(x) dx = 1
• Since a probability cannot be negative, then also a pdf cannot
be negative.
Expectations/Means/Averages
• The concept of the average of a data set is familiar, e.g.
x̄ = (1/N) Σ_{k=1}^{N} x_k
• As the sample size (N) increases such an average converges to
the mean value which is also called the expectation.
• The expectation, denoted E[X], can be computed directly
from the pdf using
E[X] = ∫_{−∞}^{∞} x p(x) dx = lim_{N→∞} (1/N) Σ_{k=1}^{N} x_k
• Note the integral corresponds to the centre of mass of the shape
of p(x).
General Expectations
• More generally one can compute expectations of any function
of the random variable using:
E[f(X)] = ∫_{−∞}^{∞} f(x) p(x) dx
• For example
E[cos(X)] = ∫_{−∞}^{∞} cos(x) p(x) dx = lim_{N→∞} (1/N) Σ_{k=1}^{N} cos(x_k)
Moments
• The moments, mn, of a process are defined as the expectation
of the monomials:
m_n = E[X^n] = ∫_{−∞}^{∞} x^n p(x) dx
• The central moments are defined as:
M_n = E[(X − m)^n] = ∫_{−∞}^{∞} (x − m)^n p(x) dx,   where m = E[X]
• For many engineering measurements, the mean of a process
is zero (m=0), so the moments and central moments are the
same.
– A process whose mean is zero is commonly called a “zero-mean
process”, this is an assumption we shall frequently invoke.
Interpreting Moments
• The first moment, the mean m, is a measure of the typical
value generated by the random process.
• The second central moment is called the variance, σ², (its
square root is called the standard deviation, σ, and if m=0 then
the standard deviation is the root mean squared (rms) value).
• The third moment is a measure of the skewness, i.e. how
symmetrical the pdf is.
• The fourth moment is a measure of peakiness of a
distribution, i.e. how large the extreme values are.
PDF vs Moments
• Computing the pdf directly from a data set requires a large
amount of data.
• It is much easier to estimate the moments, since
m_n = E[X^n] = ∫_{−∞}^{∞} x^n p(x) dx ≈ (1/N) Σ_{k=1}^{N} x_k^n
i.e. one just averages the data raised to different powers.
• Using a small number of moments one can understand some
of the general features of the pdf.
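For example, a short sketch (assuming NumPy; not from the slides) estimating the first four moments from data:

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1000)    # samples from p(x) = e^{-x}, x >= 0

m = x.mean()                                 # first moment; theory: 1
M2, M3, M4 = (np.mean((x - m) ** k) for k in (2, 3, 4))
print(m, M2, M3, M4)                         # theoretical values: 1, 1, 2 and 9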
Examples
p(x) = (1/(10√(2π))) e^{−(x−50)²/200}:   m=50,  σ=10,   M3=0,  M4=30,000
p(x) = (1/√(20π)) e^{−(x−10)²/20}:       m=10,  σ=3.2,  M3=0,  M4=300
p(x) = e^{−x} for x ≥ 0, = 0 for x < 0:  m=1,   σ=1,    M3=2,  M4=9
p(x) = (1/2) e^{−|x|}:                   m=0,   σ=2,    M3=0,  M4=24
Examples (cont’d)
• Consider the pdf: p(x) = e^{−x} for x ≥ 0, = 0 for x < 0
• First moment (mean):
E[X] = ∫_0^∞ x e^{−x} dx = [−x e^{−x}]_0^∞ + ∫_0^∞ e^{−x} dx = 1
• Second moment (variance):
E[(X − 1)²] = ∫_0^∞ (x − 1)² e^{−x} dx = ∫_0^∞ (x² − 2x + 1) e^{−x} dx
Since ∫_0^∞ x e^{−x} dx = 1 and ∫_0^∞ e^{−x} dx = 1, E[(X − 1)²] = ∫_0^∞ x² e^{−x} dx − 1
∫_0^∞ x² e^{−x} dx = [−x² e^{−x}]_0^∞ + 2 ∫_0^∞ x e^{−x} dx = 2  ⇒  E[(X − 1)²] = 2 − 1 = 1
• You can prove the 3rd and 4th moment results yourself ……
Example: Using Data
• Consider 1000 samples from the exponential pdf.
• We can compute the average of these samples, which
happens to be 1.0158.
• We can compute the average of (x-1.0158)2, which is 1.01.
• Similarly we can compute the 3rd central moment:
(1/N) Σ_{k=1}^{N} (x_k − 1.0158)³ = 1.973
• The fourth central moment is (1/N) Σ_{k=1}^{N} (x_k − 1.0158)⁴ = 8.17
• Note these values are quite close to the theoretical values of
1, 1, 2 and 9 seen on the previous slide, and they can be quickly and
efficiently computed from measured data.
Bivariate Data
• Consider an experiment where one measures 2 quantities, for
example height and weight of subjects.
• The data then consists of a pair of numbers, say X and Y.
• So for example X could be a subject’s height and Y their
weight.
• Commonly one wants to understand how (and whether) X and
Y depend on each other.
– For height and weight the fact there is a connection is intuitively
“obvious”.
– In other cases the existence of a relationship might be an
interesting question, for example if X is the amount of a drug taken
and Y is the length of time the patient survived after treatment.
Bivariate PDFs
• For bivariate data one can define a 2 dimensional pdf, p(X,Y).
• This has equivalent properties to the 1 dimensional pdf:
• It is positive: p(X,Y) ≥ 0
• Areas below the curve represent probabilities
Pr{a < x < b, c < y < d} = ∫_{x=a}^{b} ∫_{y=c}^{d} p(x,y) dy dx
• The total area under the curve is 1.
∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} p(x,y) dy dx = 1
Further Properties of Bivariate PDFs:
Expectations
• One can use bivariate pdfs to compute expectations, in
general
E[f(X,Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x,y) p(x,y) dx dy
• One can recover the univariate pdfs from the bivariate
distribution, specifically
p(X) = ∫_{−∞}^{∞} p(X,y) dy     p(Y) = ∫_{−∞}^{∞} p(x,Y) dx
Independence
• If Y does not depend on X then knowing the value of X
provides no information about the value of Y.
• This means that Pr{c < y < d} does not depend on X.
• Mathematically that means that p(X,Y) can be factorized as
p(X)p(Y).
• If p(X,Y) = p(X)·p(Y) then X and Y are said to be
statistically independent (or just independent).
• Statistical independence tends to be hard to prove, since it is
based on whether one can factorise the bivariate pdf.
Correlation
• Correlation is a weaker condition than independence but is
much easier to demonstrate.
– By “weaker” we mean that independent processes are always
uncorrelated, but uncorrelated processes are not necessarily
independent.
• The correlation, Rxy, between two processes is defined as
Rxy = E[(X − mx)(Y − my)],   mx = E[X],  my = E[Y]
• If Rxy = 0 the processes are said to be uncorrelated.
• If Rxy ≠ 0 the processes are said to be correlated.
Example of Correlated Data
• Consider randomly picking adults and measuring their height
and weight.
• If we measure 100 people we might get data like:
Clearly the mean weight
for people close to 160 cm
tall is less than that for people
close to 180 cm.

That does not mean everyone
who is 180 cm tall is heavier
than everyone who is 160 cm
tall.

(Figure: scatter plot of weight against height, with the mean height and mean weight indicated.)
Computing Correlation
• To calculated the correlation is conceptually simple:
– Calculate the averages of both data sets
x̄ = (1/N) Σ_{k=1}^{N} x_k      ȳ = (1/N) Σ_{k=1}^{N} y_k
– Subtract these means from the samples
x̃_k = x_k − x̄      ỹ_k = y_k − ȳ
– Then compute the mean of the product x̃_k ỹ_k
Rxy = (1/N) Σ_{k=1}^{N} x̃_k ỹ_k
• Notes:
– This is not the most computationally efficient method.
– Strictly the 1/N factor in the last step should be 1/(N-1) – don’t worry
why, it makes little difference unless N is very small.
Positive and Negative Correlations
• The height vs weight data is an example of a positive
correlation: an increase in weight is associated with an
increase in height.
– In the graph in the preceding slide the slope of a line through the data
is positive.
• Data for which an increase in one variable is associated with a
reduction in the other is called a negative correlation.
– For example if one were to plot life time against number of cigarettes
smoked one would expect to see a graph in which the average life time
reduces as the number of cigarettes smoked increases – a negative
correlation.
Examples
Effect of Scale
• The height and weight data were expressed in cm and kg.
• If height were expressed in metres, then all of the x values
would be multiplied by 1/100.
• Thus the value of Rxy would be similarly scaled by 1/100 if we
used m instead of cm.
• Clearly this does not mean that the correlation is less because
of the units used to express values!
• To avoid such dependencies on the units used one can define
the correlation coefficient.
(Pearson’s) Correlation Coefficient
• The correlation coefficient, r, is defined as:
E ( X − m x ) (Y − m y ) 
s x = E ( X − m x ) , s y = E (Y − m y ) 
  
2 2
r=
sxs y    

• sx and sy are the standard deviations (see earlier slides).


• The correlation coefficient is independent of the units used to
measure the quantities.
• It can only take on values between -1 and +1.
• If |r|=1, then Y=aX, for some constant a.
– If r=-1, then a is negative and the correlation is negative
– If r=+1, then a is positive and the correlation is positive
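A sketch of the whole calculation, following the steps on the earlier slide (assuming NumPy; the height/weight data is simulated purely for illustration):

import numpy as np

def correlation_and_r(x, y):
    xt = x - x.mean()                  # subtract the means
    yt = y - y.mean()
    Rxy = np.mean(xt * yt)             # correlation (1/N normalisation)
    r = Rxy / (xt.std() * yt.std())    # divide by the standard deviations
    return Rxy, r

rng = np.random.default_rng(1)
height = 170 + 10 * rng.standard_normal(100)            # cm
weight = 0.5 * height + 5 * rng.standard_normal(100)    # kg, positively correlated
print(correlation_and_r(height, weight))
# Rescaling height to metres scales Rxy by 1/100 but leaves r unchanged.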
Random Time Series and
Correlation Functions
Outline
• Basic principles
– Ensemble averages
– Stationary signals
– Ergodicity
• Correlation function
• Cross correlation functions
What is a Random Time Series?
• [Throughout this section we shall consider continuous time
signals – a parallel set of concepts can be applied to discrete
signals, with very little modification]
• A random time series is a signal that is random …..
• If we make multiple measurements of the same process we
obtain different signals which have the same “structure”.
Details in the signals are different, but their
underlying structure is the same, so they
sound the same.
Random Signals
• In the same way we denote a random variable as X and
realisations of that variable x, we do the same for signals, i.e.
the random process is X(t) and a realisation is denoted x(t).
– There is the potential for confusion here, capital X is used for the
random variable as well as for the Fourier transform (the context and
the function’s argument should make the intended meaning clear).
• We can talk of pdfs and moments of these processes.
• For example E[X(t)] is the mean of the signal at time t or
p(X(t)) is the pdf of the signal at that time.
Ensemble Averages
• We may be able to measure a signal many times (multiple
realisations) to generate a set of time series, xk(t).
• This allows us to use averaging to compute moments, for
example the mean (under some broad assumptions) is:
E[X(t)] = μx(t) = lim_{N→∞} (1/N) Σ_{k=1}^{N} x_k(t)
and we can use
E[X(t)] ≈ (1/N) Σ_{k=1}^{N} x_k(t)
• Averages computed over multiple realisations are called
Ensemble averages.
• In many applications making a large number of measurements
is not feasible.
Time Averages
• If only one realisation is available then one can compute an
average across time.
E_t[X(t)] = (1/T) ∫_0^T x(t) dt
• The notation E_t[·] is used here for time averages, to
distinguish them from ensemble averages E[·].
• In general time and ensemble averages are quite different for
instance:
– The time average of a signal is a single value.
– The ensemble average is defined for each point in time, so it is itself a
signal.
Stationarity
• Some signals have a mean which is constant in time, i.e.
E  X ( t )  = C
where C is a constant, so that the mean is independent of
time.
• Such signals are said to have a stationary mean.
• This extends to other moments, for instance, for the variance
σx²(t) = E[(X(t) − μx(t))²]
• If the variance is independent of t then the process is said to
have a stationary variance.
• If all the statistics are independent of time then the process is
said to be wide-sense stationary.
Ergodicity
• It is easier to compute time averages rather than ensemble
averages (since you only need one realisation to compute a
time average).
• One would like to be able to say, using the mean as an
example, that:
E_t[X(t)] = E[X(t)]
• This can only be reasonable if the ensemble average is
independent of time (i.e. stationary for that average).
• If the time and ensemble average are equal then the process
is said to be ergodic.
Correlation Functions
• The internal structure of a signal can be evaluated by
considering how two points in one signal are correlated with
each other.
• We shall assume that the signal is stationary and has zero
mean, so that
μx(t) = 0  ∀t
• The correlation function is defined as
r_xx(t1, t2) = E[X(t1) X(t2)]
• This is the correlation between the signal at two different
points in time t1 and t2.
Interpretation of Correlation
• The correlation function rxx ( t1 , t2 ) describes how well the
signal’s value at time t1 relates to its value at t2.
• In general, if t1 and t2 are close together then one expects the
correlation function to be larger than if they are further apart.
• Sometimes the correlation function is written in terms of a
time (t) and a delay (τ):
r_xx(t, τ) = E[X(t − τ) X(t)]    (t1 → t − τ, t2 → t)
• The correlation function depends on the two times (t1 and t2,
or t and τ) and hence is difficult to plot and interpret.
• One needs to compute it using ensemble averaging.
• Thus this form of correlation is not particularly practical.
Correlation for Stationary Signals
• If a signal is stationary, then the correlation function r_xx(t1, t2)
only depends on the interval between the two time points,
i.e. on the value of τ (= t2 − t1).
– So two points in a signal separated by the same interval (τ) have
the same correlation regardless of where they are in the signal, i.e.
regardless of t.
• Thus for a stationary signal we can define the correlation
function as
r_xx(τ) = E[X(t − τ) X(t)]
Properties of Correlation Functions
• These functions have some basic properties:
– They are symmetric: r_xx(−τ) = r_xx(τ)
(to see that, consider t → t + τ in the definition of r_xx(τ))
– The correlation function at τ=0 is equal to the variance
r_xx(0) = E[X(t)²] = σ²
– The correlation never exceeds the value at τ=0:
|r_xx(τ)| ≤ r_xx(0) = σ²  ∀τ
This can be thought of as defining the idea that no point can be better
correlated with X(t) than X(t) itself.
Time Averages
• If the signal is stationary and we can assume that the signal is
ergodic then
r_xx(τ) = E[X(t − τ) X(t)] = lim_{T→∞} (1/T) ∫_0^T x(t − τ) x(t) dt
• This means that one can estimate this form of correlation
function from one realisation of the time-series.
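In discrete time the integral becomes a sum over the samples; a minimal sketch (assuming NumPy, zero-mean data and a 1/N normalisation):

import numpy as np

def autocorr(x, max_lag):
    # Time-average estimate of r_xx(tau) for lags 0..max_lag.
    N = len(x)
    x = x - x.mean()     # enforce the zero-mean assumption
    return np.array([np.sum(x[lag:] * x[:N - lag]) / N
                     for lag in range(max_lag + 1)])

x = np.random.default_rng(2).standard_normal(10_000)
print(np.round(autocorr(x, 5), 3))   # r[0] ~ 1 (the variance); other lags ~ 0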
Example of Correlation Functions
(Figure panels: broad correlation function paired with a relatively slowly evolving signal;
narrow correlation function paired with a highly erratic signal.)

A broader correlation function,
as in the upper frames, implies
that the signal changes
gradually compared to signals
with narrow correlation functions.
Cross-Correlation Functions
• The idea of a correlation function can be extended to consider
two processes, say X(t) and Y(t); this is called a cross-
correlation function.
• The cross-correlation functions are defined as
r_xy(t1, t2) = E[X(t1) Y(t2)]
• If both X(t) and Y(t) are stationary then, as before, it is not the
absolute times t1 and t2 that matter but the time difference,
i.e. r_xy(τ) = E[X(t − τ) Y(t)]
• Note that if Y(t) = X(t) then r_xy(τ) = r_xx(τ)
Example – A Delay
• If Y(t) is a scaled and delayed version of the process X(t), i.e.
Y(t) = a X(t − t0)
where a is a constant and t0 is the delay.
• Then r_xy(τ) = E[X(t − τ) a X(t − t0)]
= a E[X(t − τ) X(t − t0)] = a r_xx(τ − t0)
• The cross-correlation then peaks at τ = t0, which allows one to
estimate delays, for example in sonar and radar.
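A sketch of delay estimation along these lines (assuming NumPy; the signals and the delay are simulated):

import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(5000)
t0 = 40                                   # delay, in samples
y = np.zeros_like(x)
y[t0:] = 0.8 * x[:-t0]                    # y(t) = a x(t - t0)

rxy = np.correlate(y, x, mode='full')     # cross-correlation at all lags
lags = np.arange(-len(x) + 1, len(x))
print(lags[np.argmax(rxy)])               # peaks at the delay: 40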
Notes on Cross-Correlation Functions
• In general, the cross-correlation function does not have the
symmetry inherent in the correlation function.
• The order of the subscripts in r_xy(τ) matters:
r_xy(τ) = E[X(t − τ) Y(t)]
r_yx(τ) = E[Y(t − τ) X(t)] = E[X(t + τ) Y(t)] = r_xy(−τ)
• In general, cross-correlation functions are not symmetric and
do not peak at τ = 0.
• The maximum value of the cross-correlation is limited by
|r_xy(τ)|² ≤ σx² σy²
(σx² is the variance of X(t) and σy² the variance of Y(t))
Spectra of Random Signals
Outline
• Fourier transform of random signals
• Definition of the Power Spectral Density
• Wiener-Khinchin theorem
• Examples
• Cross spectra
• Coherence functions
• Input/Output Relations
Stationarity Revisited
• A process is stationary if its statistics do not vary with time.
– For example if it has a stationary mean then μx(t) = E[X(t)] is
independent of t.
• Hence a truly stationary signal lasts for all t
– It cannot stop or its statistics will change.
• Thus a signal is never actually stationary but we can obtain
useful results by modelling it as such.
• However, the idea of a stationary signal which does not stop
causes problems when developing theory.
• This stems from the fact that for a stationary signal
Energy = ∫_{−T/2}^{T/2} x(t)² dt → ∞  as T → ∞
Fourier Transform of a Random Signal
• Consider a signal x(t) which is a realisation of a stationary
random time-series X(t).
• The Fourier transform of a signal is only defined for signals
with finite energy.
– Stationary signals have infinite energy.
• Whilst it is not helpful to consider the energy of a stationary
signal, it is helpful to consider its power, defined as
Power = (1/T) ∫_{−T/2}^{T/2} x(t)² dt → σ² = E_t[X(t)²]  as T → ∞
(σ² is the signal’s variance)
Fourier Transform in the Limit
• To cope with infinite energy one proceeds as follows:
• Define a windowed segment of the signal
x_T(t) = x(t) for −T/2 ≤ t ≤ T/2
       = 0   elsewhere
• This has a Fourier transform X_T(f) which is defined as long
as T < ∞.
– Be aware of the distinction between the notation for the Fourier
transform X_T(f) and the notation for a random time-series X(t).
Nature of the FT of a Random Signal
• If x(t) is random it should not be a surprise that the Fourier
transform of x(t) is also random.
• Parseval’s theorem states that:
(1/T) ∫_{−T/2}^{T/2} x_T(t)² dt = (1/T) ∫_{−∞}^{∞} |X_T(f)|² df
• Accordingly it is reasonable to consider the behaviour of
|X_T(f)|² / T
• Which is, as noted above, a random quantity.
Examples (on a linear scale)
Examples (on a logarithmic (dB) scale)
Average Fourier Transform
• Since |X_T(f)|² / T is random it is natural to consider its
average, i.e.
(1/T) E[|X_T(f)|²]
• The expectation here represents an ensemble average.
• The quantity depends on T.
• As T increases the resolution of the FT increases.
• We can talk of this quantity in the limit, i.e.
S_xx(f) = lim_{T→∞} E[|X_T(f)|²] / T
Power Spectral Density (PSD)
• S_xx(f) is the power spectral density (PSD) or the power
spectrum or just the spectrum of a process.
• It describes how the power in a signal is distributed with
frequency.
Power in the signal X between 2000 Hz
and 3000 Hz can be computed using the
area under the PSD in the region
2000 < f < 3000.
Note that the fact that the PSD includes
negative frequencies means we need to
compute:
∫_{−3000}^{−2000} S_xx(f) df + ∫_{2000}^{3000} S_xx(f) df = 2 ∫_{2000}^{3000} S_xx(f) df
Properties of PSDs
• PSDs are real and non-negative
– Their definition makes this clear, since |X_T(f)|² is non-negative and real.
• It is symmetrical in frequency
S_xx(−f) = S_xx(f)
• The total area under the PSD is the signal’s variance
∫_{−∞}^{∞} S_xx(f) df = σ²    (the signal’s variance or its power)
Wiener-Khinchin Theorem
• The PSD and the correlation function form a Fourier transform
pair, such that:
S_xx(f) = F{r_xx(τ)}
r_xx(τ) = F⁻¹{S_xx(f)}
• From the basic properties of Fourier transforms we note that
short correlation functions are associated with broad spectra
and narrow spectra are associated with broad correlation
functions.
Example
• Consider a signal with the correlation function:
r_xx(τ) = e^{−α|τ|}
• This has a spectrum
S_xx(f) = ∫_{−∞}^{∞} r_xx(τ) e^{−2πifτ} dτ = ∫_{−∞}^{∞} e^{−α|τ|} e^{−2πifτ} dτ
= ∫_{−∞}^{0} e^{ατ} e^{−2πifτ} dτ + ∫_{0}^{∞} e^{−ατ} e^{−2πifτ} dτ = 2α / (α² + 4π²f²)
Example: Results
(Figure: S_xx(f) plotted for α = 100, α = 10 and α = 1.)
Example: White Noise
• Consider a signal whose correlation function is a Dirac delta
function.
r_xx(τ) = δ(τ)
S_xx(f) = ∫_{−∞}^{∞} δ(τ) e^{−2πifτ} dτ = 1
• Two points in the time series, separated by any τ (≠ 0), are
uncorrelated (for all delays).
• The spectrum is constant for all frequencies.
• White noise is commonly used as an idealised noise process.
• White noise is not realisable in continuous time, since it has
infinite power.
• For digital systems white noise is simple to implement
– One just draws each new sample independently from the required pdf.
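For instance (a sketch, assuming NumPy):

import numpy as np

rng = np.random.default_rng(4)
w_gaussian = rng.standard_normal(1000)   # white noise with a Gaussian pdf
w_uniform = rng.uniform(-1, 1, 1000)     # white noise with a uniform pdf
# Each sample is drawn independently, so r_ww(tau) ~ delta(tau) and the PSD is flat.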
The Cross-Spectrum
• The PSD can be expressed as:
S_xx(f) = lim_{T→∞} E[X_T(f)* X_T(f)] / T     (* denotes complex conjugate)
• If one measures two random signals X(t) and Y(t) then one can
define a cross-spectrum as:
S_xy(f) = lim_{T→∞} E[X_T(f)* Y_T(f)] / T
• Note that the first subscript in S_xy(f) refers to the signal
whose FT is conjugated in the definition of the cross-
spectrum.
Cross-Spectrum Properties
• In general the cross-spectral density (CSD) is complex valued
– Unlike the PSD which is real.
• For negative frequencies one can show
S_xy(−f) = lim_{T→∞} E[X_T(−f)* Y_T(−f)] / T = lim_{T→∞} E[X_T(f) Y_T(f)*] / T = S_yx(f)
• Which also implies that S_yx(f) = S_xy(f)*
• Wiener-Khinchin theorem for CSDs
S_xy(f) = F{r_xy(τ)}
r_xy(τ) = F⁻¹{S_xy(f)}
Coherence Function
• For the CSD one can show that:
|S_xy(f)|² ≤ S_xx(f) S_yy(f)
• The coherence function is defined as:
γ²_xy(f) = |S_xy(f)|² / (S_xx(f) S_yy(f))
• From the above relationships it is clear 0 ≤ γ²_xy(f) ≤ 1
• If the coherence function is close to unity for some frequency,
then this suggests that in that band X(t) and Y(t) are highly
correlated.
• If the coherence is small then in that band the processes are
uncorrelated.
Input-Output Relationships
• Consider the case where the two processes (X(t) and Y(t)) are
the input and output of a linear system with FRF H(f).
Input, X(t) → [Linear System H(f)] → Output, Y(t)
• For such a situation we know that
Y_T(f) = H(f) X_T(f)
• Using the above relationship, the following can be shown:
S_yy(f) = lim_{T→∞} E[Y_T(f)* Y_T(f)] / T = |H(f)|² lim_{T→∞} E[X_T(f)* X_T(f)] / T
= |H(f)|² S_xx(f)
Input-Output Relationships (Cont’d)
• Cross-spectral density
S_xy(f) = lim_{T→∞} E[X_T(f)* Y_T(f)] / T = lim_{T→∞} E[X_T(f)* H(f) X_T(f)] / T
= H(f) lim_{T→∞} E[|X_T(f)|²] / T = H(f) S_xx(f)
• Using the last two results
γ²_xy(f) = |S_xy(f)|² / (S_xx(f) S_yy(f)) = |H(f)|² S_xx(f)² / (S_xx(f) |H(f)|² S_xx(f)) = 1
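These input–output relationships can be checked numerically; a sketch (assuming NumPy/SciPy; the filter and parameters are chosen purely for illustration):

import numpy as np
from scipy import signal

fs = 8000
rng = np.random.default_rng(5)
x = rng.standard_normal(100_000)                       # white-noise input
b, a = signal.butter(2, [800, 2400], btype='bandpass', fs=fs)
y = signal.lfilter(b, a, x)                            # noise-free output

f, Sxx = signal.welch(x, fs=fs, nperseg=1024)
f, Syy = signal.welch(y, fs=fs, nperseg=1024)
f, Cxy = signal.coherence(x, y, fs=fs, nperseg=1024)

_, H = signal.freqz(b, a, worN=f, fs=fs)               # the filter's true FRF
band = (f > 500) & (f < 3000)
print(np.median(Syy[band] / (np.abs(H[band])**2 * Sxx[band])))  # ~ 1: S_yy = |H|^2 S_xx
print(Cxy[band].mean())                                # ~ 1, since no noise is present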
Principles of Estimation Theory
Outline
• Estimation theory
– What is an estimator?
– What is a good estimator?
– Bias/Variance/Mean squared error
– Consistency
• Estimation of PSDs
– Periodograms
– Segment averaging
– Bias – Variance trade off
• Estimation of Cross Spectral Densities
General Estimation Problem
• Consider data, e.g. a digital time series x[n], from which one seeks to
estimate the value of some parameter θ.
– Example problems might be:
• From a set of data consisting of a sine wave in noise, estimate the frequency.
• For noisy data which is thought to lie on a straight line, estimate the slope of that line.
• From a transfer function, near a mode, what is the damping coefficient for that
mode?
• The goal is to use the data to generate a number, θ̂, which approximates θ.
The “hat” notation is used to identify an estimate,
so that θ̂ is an estimate of θ.
What is an Estimator?
• In general an estimator takes the dataset and from that constructs a value,
θ̂, which is intended to approximate the parameter θ.
• So that θ̂ = f(x) where x represents the data set.
• Any function f() can be regarded as an estimator …. just some are good
estimators and some bad.
• Questions include:
– What function to choose so that the estimator is as “good” as possible?
– Are there optimal estimators? i.e. are there choices for the function f() which
give the best possible results?
– What do we mean by a good estimator?
Example Problem
• Consider a data set xn, yn. The data is believed to lie on a straight line and
that straight line is known to pass through (0,0). The problem is to find a
way of estimating the slope of the straight line.
• Three possible candidate methods* (ways to construct estimators) are:
a) Take the biggest value of x and the biggest value of y and divide one by the
other; call that θ̂a.
b) Compute the mean of x and the mean of y; the slope should be the ratio
of these means; call that θ̂b.
c) One can use linear regression (fitting a straight line) and take the slope of
that line – ignoring the fact the intercept of that line might be different from
zero; call that θ̂c.

* Note there are an infinite number of methods that could be considered; nearly all of those would
have no logical basis. The three illustrated here are chosen to have some reasoning behind them.
Example on One Data Set
• Consider one data set of 10 measurements from which we can construct
an estimate.
x        y
0.4036   0.0939
0.8765   0.1837
0.6154   0.1707
0.0636   -0.0067
0.4610   0.0541
0.4201   0.0862
0.5578   0.1676
0.7780   0.1806
0.9371   0.2348
0.0692   0.0321

(* Data points. The red dotted line is the “true” curve, which is only known
because I simulated the data. The blue dotted line is the best fit straight line
to the data set.)

θ̂a: maximum of x is 0.9371, maximum of y is 0.2348, so
the estimate of the slope is 0.2505.
θ̂b: mean of x is 0.5182, mean of y is 0.1197, so the estimate for
the slope is 0.2310.
θ̂c: best fit straight line is y = 0.2456x − 0.0076, so the estimate for
the slope is 0.2456.
Comments on Results
• From the analysis of the one data set in the last slide we see each
estimator produces a different estimate for the slope:
– Estimator a) suggests 0.2505
– Estimator b) suggests 0.2310
– Estimator c) suggests 0.2456
• Recall the correct answer is 0.25, so it looks like estimator a) is the best
………
• Or is it? The previous data set was generated using one set of 10 random
numbers representing the data.
• If we generate a new set of random numbers to simulate the data and
recompute the estimators we get the values of 0.2802, 0.2430, 0.2884 for
the three estimators a), b) and c)….. so now b) looks best being closest to
the true answer 0.25.
Repeated Testing
• One can run lots of tests for different sets of 10 random numbers and for
each set calculates the values for each of the 3 estimators:
Data Set Number    Estimator a)   Estimator b)   Estimator c)
1                  0.2802         0.2430         0.2884
2                  0.2615         0.2411         0.2180
3                  0.2525         0.2507         0.2253
:                  :              :              :
10,000             0.2505         0.2310         0.2456
Performance of Estimators
• To assess the performance of an estimator we look at some basic
quantities:
– The error: ε = θ̂ − θ
– The bias: b_θ = E[ε] = E[θ̂] − θ
• Mean error, i.e. the difference between the mean value of the estimator and the true
parameter value.
• Estimators for which b_θ = 0 are called unbiased (on average they give the right answer).
– The variance: σ_θ² = E[(θ̂ − E[θ̂])²]
• Measures the variability of the estimate about its mean value.
• Note an estimator which always gives the same value, regardless of the data, will
have zero variance (but generally will be a useless estimator).
– The mean squared error (mse): mse_θ = E[ε²]
• The average squared error (!)
• The perfect estimator always gives zero error, so has a mean squared error of zero.
• The mse can be related to the bias and the variance, specifically
mse_θ = σ_θ² + b_θ²
• If an estimator is unbiased, then its variance and MSE are the same.
Bias and Variance
• Non-zero bias means that on average the result is wrong.
• Large variance means that the results are widely spread.
Values for our Example
Using 10,000 realisations of our simulated data we can look at the bias, variance
and mean squared errors for our 3 candidate estimators.

Estimator   Bias                        Variance   MSE
θ̂a          0.2603 − 0.25 = 0.0103      0.00056    0.00067
θ̂b          0.2499 − 0.25 = −0.0001     0.00028    0.00028
θ̂c          0.2504 − 0.25 = 0.0004      0.00095    0.00095

Both θ̂b and θ̂c seem to have biases very close to zero so appear to be unbiased,
whereas θ̂a has a comparatively large bias, i.e. it is biased.
It is estimator θ̂b which has the lowest mse, so it is the best estimator of the three
considered here.
Note that whilst θ̂a has the largest bias, it has a relatively low variance and its mse
is better than that of θ̂c.
In summary we would rank the estimators, best to worst, as θ̂b, θ̂a and then θ̂c.
Consistency
• An estimator is said to be consistent if the mse tends to zero as the data
length increases.
– Estimators which are not consistent are considered to be poor.
For the example of a slope we can look at the behaviours
of the bias, variance and mse for different data lengths N.
(Figure panels: bias, variance and MSE plotted against N.)
Comments on Consistency
• The results from our example show a couple of points:
– The estimators θ̂b and θ̂c are both consistent – as the data length increases
then the mse for both estimators reduces and tends towards zero.
– For all the data lengths considered then the best performing estimator is
always θ̂b.
– The estimator θ̂a is very poor.
• As the data length increases its performance decreases (mse increases)!
• It was only the fact that we originally looked at short data lengths (N=10) that
meant it seemed to do reasonably (it was our second best estimator).
Optimal Estimators
• How does one find the overall best (optimal) estimator?
• We could just consider all the estimators we can think of and measure the
mse for them all and choose the best.
• For our slope example there are lots more “sensible” estimators one might
think of, for example (…and even more nonsensical ones):
θ̂d = (1/N) Σ_{n=1}^{N} y_n/x_n;    θ̂e = median(y_n)/median(x_n);
m = arg max(x_n), θ̂f = y_m/x_m
• A better approach is to define a measure of performance and find the
estimator which maximises that metric.
Example (again)
• Begin with the generative model (i.e. how the parameter relates to the
data: how does an estimate of θ generate values of y?)
ŷ = θ̂ x
• We can then consider a mean squared error (least squares) metric of
performance, L, defined as:
L = Σ_{n=1}^{N} (y_n − ŷ_n)² = Σ_{n=1}^{N} (y_n − θ̂ x_n)²
(a good estimate of θ will make the estimates ŷ of the data close to the measured values)
• Minimising L:
dL/dθ̂ = −2 Σ_{n=1}^{N} x_n (y_n − θ̂ x_n) = 0  ⇒  θ̂* = Σ_{n=1}^{N} x_n y_n / Σ_{n=1}^{N} x_n²
(θ̂* denotes the optimal estimator)
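A sketch comparing the candidate estimators with this optimal one (assuming NumPy; the data is simulated with a true slope of 0.25):

import numpy as np

rng = np.random.default_rng(6)
theta = 0.25
x = rng.uniform(0, 1, 10)
y = theta * x + 0.02 * rng.standard_normal(10)   # noisy data on a line through (0,0)

theta_a = y.max() / x.max()                 # candidate a)
theta_b = y.mean() / x.mean()               # candidate b)
theta_opt = np.sum(x * y) / np.sum(x * x)   # the least-squares (optimal) estimator
print(theta_a, theta_b, theta_opt)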
Performance of Optimal Estimator
• We can compute the mse of the optimal estimator.
Comments
• The optimal estimator does have a mse better than any of the estimators
we first considered.
• The claim is that one can never find an estimator which has a lower mse.
• Note that the optimal estimator is
θ̂* = [(1/N) Σ_{n=1}^{N} x_n y_n] / [(1/N) Σ_{n=1}^{N} x_n²]
(numerator: the cross-correlation between x and y; denominator: the variance of x)
1/N has been included in the numerator and denominator;
this clearly does not change the value of the estimator.
Estimation of the PSD
• Given a measured time series x(t) we look to estimate the power spectral
density (PSD).
S_xx(f) = lim_{T→∞} E[|X_T(f)|²] / T
• One might initially consider the following:
|X(f)|² / T
• This is one spectral estimator, called the periodogram.


– The periodogram is just the squared magnitude of the Fourier transform of
the data, scaled by 1/T – there is no averaging.
– The Fourier transform can be computed using a windowing function
• In which case one needs to include an additional normalisation factor not shown here
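A sketch of the raw periodogram (assuming NumPy, a rectangular window and a sample rate fs; the scalings follow the definitions above):

import numpy as np

def periodogram(x, fs):
    # |X_T(f)|^2 / T with no averaging; rectangular window.
    N = len(x)
    T = N / fs
    X = np.fft.rfft(x) / fs               # approximates X_T(f)
    f = np.fft.rfftfreq(N, d=1 / fs)
    return f, np.abs(X) ** 2 / T

fs = 8000
x = np.random.default_rng(7).standard_normal(1024)
f, P = periodogram(x, fs)                 # very erratic: no averaging is applied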
Periodogram Example
Data is the output of white noise driving a 2nd order Butterworth bandpass filter (800 Hz – 2.4 kHz).
The Fourier transform is computed here using a Hanning window.

(Figure: periodograms on linear and logarithmic (dB) scales for N = 128, 256, 512 and 1024;
the green line shows the true PSD.)
Variance of Periodogram
Below shows the variance of the periodogram computed for different data lengths (N).
Computed using the previous example, looking at one frequency (3 kHz) and computing
the periodogram for 1000 realisations of the noise process.
Note the variance does not
reduce as the data length
increases from 128 to 50,000.
Recall that if an estimator (here the
periodogram) is unbiased then its
variance is equal to the mse.
Bias in the Periodogram
• The frequency resolution of the periodogram needs to be sufficient to
resolve peaks in the PSD.
• Sharp peaks in the PSD tend to be under-estimated by the periodogram if
its resolution is inadequate.
• Whereas sharp troughs tend to be over-estimated if the resolution is
inadequate (or if no window is employed).
• Greater data lengths lead to more frequency resolution and thus less bias
in the periodogram.
Example - Bias
• Consider an AR system with 2 poles close to the unit disc at a location
corresponding to ±2 kHz (assuming an 8 kHz sample rate).
Dotted line = true PSD.
Blue line = averaged periodogram
estimate.
1000 periodograms were computed
and averaged to generate the blue curves.
Note that when the data length is short
the peak in the averaged periodogram
is lower than the true PSD, i.e. the
average periodogram is different to the
true PSD – it is biased.
Example Again
• For the same example as the preceding slide we look at the bias at 2 kHz
(the frequency of the peak) as the FFT size changes.
Bias reduces as the FFT size
increases.
Bias is negative because the
peak is under-estimated.
Comments on the Periodogram
• The periodogram is not consistent.
– Its variance does not reduce as the data length increases.
– Since the variance does not reduce, the mse (which is the variance plus the
square of the bias) does not reduce.
• Recall mse → 0 as N → ∞ is the definition of consistency.
• It can be shown that the variance of the periodogram is approximately equal to the
square of the true PSD value.
– Confirming the above observation that the variance is independent of N.
– The periodogram has greater variance when the PSD is large and a smaller
variance when the PSD is small.
• This means the simple periodogram should not be used as an estimator of
the PSD.
– There are modifications to the periodogram which can be used, but in the
form here it should not be applied.
Segment Averaging
• The most common (in engineering at least) method for estimating PSDs is
the segment averaging approach, sometimes called Welch’s method or the
direct method.
• The method consists of the following basic steps:
– Divide the signal into, possibly, overlapping blocks in time.
– For each block (or segment), compute the periodogram
• Usually apply a windowing function.
• FFT the data
• Compute the squared magnitude
• Scale appropriately (e.g. divide by T if using a rectangular window).
– Average the resulting periodograms to obtain the final PSD estimate.
• The averaging reduces the variance of the periodograms.
• Each segment acts a little like a different realisation of the process.
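These steps are what scipy.signal.welch implements; a sketch (the filter and parameters are chosen for illustration):

import numpy as np
from scipy import signal

fs = 8000
rng = np.random.default_rng(8)
b, a = signal.butter(2, [800, 2400], btype='bandpass', fs=fs)
x = signal.lfilter(b, a, rng.standard_normal(10_000))

# Short segments: many averages -> low variance, coarse resolution (more bias).
f1, P1 = signal.welch(x, fs=fs, window='hann', nperseg=128, noverlap=64)
# Long segments: few averages -> high variance, fine resolution (less bias).
f2, P2 = signal.welch(x, fs=fs, window='hann', nperseg=4096, noverlap=2048)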
Illustration of Segment Averaging
Choice of Segment Length
• The choice of the segment length, Ts, is a key parameter using Welch’s
method (segment averaging).
• The number of segments in a T second recording will be proportional to
K=T/Ts.
• The variance of the averaged periodogram will be reduced by a factor
proportional to 1/K.
– So the larger K the lower the variance.
– Thus the shorter the segment length, Ts, the larger K, leading to a lower
variance for the final estimate.
• The bias reduces as the frequency resolution of the estimator increases.
– The resolution increases as Ts increases.
• Choosing Ts requires one to find a balance between bias and variance.
• In summary:
– Small Ts, large K, low variance, high bias.
– Large Ts, small K, high variance, low bias.
Example
• Consider the example of an AR model discussed earlier.
• The data consists of 10,000 samples; PSDs are computed for block lengths
of 128, 256, 1024 and 4096 samples.
(Figure: short blocks give high bias / low variance; N = 4096 gives low bias / high variance.)
Estimating Cross-Spectra
• One can use segment averaging to compute the cross-spectrum in a
manner similar to that used to estimate the PSD.
• Given two time series x(t) and y(t) one aims to estimate the cross-
spectrum S_xy(f), where
S_xy(f) = lim_{T→∞} (1/T) E[X_T(f)* Y_T(f)]
• The basic steps are:
– The two signals are segmented and a window may be applied.
– For each segment the Fourier transforms, X_k(f) and Y_k(f), are computed,
where k is the segment number.
– The product X_k(f)* Y_k(f) is formed.
– These products are averaged across the segments: (1/K) Σ_{k=1}^{K} X_k(f)* Y_k(f)
– This is then scaled by 1/T to produce an estimate of the cross-spectrum.
Illustration of Cross-Spectrum Estimation
(Figure: x(t) and y(t) segmented into aligned blocks.)
Errors in Cross-Spectrum
• Consider the two signals x(t) and y(t) which are correlated with a delay of
t0 seconds between them.
• If the window length Ts is small compared to t0 then features in one signal,
say x(t), will not appear in the corresponding segment in y(t).
• Delays between x(t) and y(t) can result in a significant reduction in the
estimated cross-correlation, i.e. can lead to strong biases.
Example
(Figure: segments of length Ts, with a delay t0 between x(t) and y(t).)
Example: Effect of Delay
• Cross-spectrum computed for the input and output from a digital filter.
• An extra delay is incorporated in the output.
• The cross-spectrum is computed using a 256 Hanning window.
Example: Effect of FFT Size
• Example as previous slide, but with fixed delay of 128 samples.
• In this case the FFT size is varied.
Estimating Frequency Response
Functions (FRFs)
Outline
• Problem definition
• H1 and H2 estimators
• Relationship to the coherence function
• Biases in H1 and H2
• Reasons for lack of coherence
Problem Definition
• A common task in engineering is to estimate the frequency response
function for a system.
• This is commonly achieved using a controlled (and measured) input, x(t),
and measuring the response, y(t).
Input, x(t) → [System H(f)] → Output, y(t)

• So the problem is to estimate H(f) from the measurements x(t) and y(t),
in this case assuming x(t) (and thus y(t)) are random signals.
Relationship between Spectra
• We have already seen that if x(t) and y(t) are the input and output of a
linear system, with a frequency response, H(f), then the PSDs and cross-
spectra are related by:
S_yy(f) = |H(f)|² S_xx(f)
S_xy(f) = H(f) S_xx(f)
S_yx(f) = H(f)* S_xx(f)     (* means conjugate)
• These can be rearranged so that H(f) is the subject of the equations, in (at
least) two different ways:
H(f) = S_xy(f) / S_xx(f)
H(f) = S_yy(f) / S_yx(f)
Estimators of FRFs
• The two formulations for H(f) are identical if the theoretical spectra are
considered.
• In practice all of Sxx(f), Syy(f) and Sxy(f) have to be estimated from the
available data – which means that the two formulations for H(f), on the
last slide, will not be the same.
• These estimated quantities are given the names H1(f) and H2(f) and are
defined as:
H1(f) = Ŝ_xy(f) / Ŝ_xx(f)
H2(f) = Ŝ_yy(f) / Ŝ_yx(f)
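A sketch of both estimators using segment-averaged spectra (assuming NumPy/SciPy; the system and the output noise are simulated):

import numpy as np
from scipy import signal

fs = 8000
rng = np.random.default_rng(9)
b, a = signal.butter(2, [800, 2400], btype='bandpass', fs=fs)
x = rng.standard_normal(100_000)
y = signal.lfilter(b, a, x) + 0.1 * rng.standard_normal(100_000)  # output noise

f, Sxy = signal.csd(x, y, fs=fs, nperseg=1024)    # estimated cross-spectrum
f, Sxx = signal.welch(x, fs=fs, nperseg=1024)
f, Syy = signal.welch(y, fs=fs, nperseg=1024)

H1 = Sxy / Sxx                  # unaffected by noise on the output
H2 = Syy / np.conj(Sxy)         # S_yy / S_yx, using S_yx = S_xy*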
Comments
• The estimator H1(f) is defined as
H1(f) = Ŝ_xy(f) / Ŝ_xx(f) = Σ_n X_n(f)* Y_n(f) / Σ_n |X_n(f)|²
where X_n(f) is the Fourier transform of the nth segment of x(t).
• Compare this to the optimal estimator for the slope of a line constrained to
pass through (0,0) – as per the example in the earlier lecture.
θ* = Σ_k x_k y_k / Σ_k x_k²
• The FRF estimation problem deals with complex valued data, whereas
the slope problem only considers real valued data. With the exception
of that difference then these two solutions are the same.
• Estimating a FRF can be viewed as finding the slope for complex valued
data.
Relationship Between Estimators
• It is pretty simple to show that:
H1(f)/H2(f) = [S_xy(f)/S_xx(f)] / [S_yy(f)/S_yx(f)] = S_xy(f) S_yx(f) / (S_xx(f) S_yy(f))
= |S_xy(f)|² / (S_xx(f) S_yy(f)) = γ²_xy(f)
• where γ²_xy(f) is the coherence function and 0 ≤ γ²_xy(f) ≤ 1.
• This means that |H2(f)| ≥ |H1(f)|
‒ The above considers theoretical spectra.
‒ When these spectra are estimated from data this inequality is still
guaranteed to hold.
Measurements with Output Noise
• Consider the problem of estimating the frequency response function
when the output signal is corrupted by additive noise, n(t).
x(t) → [System H(f)] → u(t) → (+) → y(t), with the noise n(t) added at the summing junction

• The noise is assumed to be uncorrelated with the signal x(t), so will also
be uncorrelated with u(t).
• What happens if one now uses x(t) and y(t) to compute the FRF?
Observations for this Model
• From the previous slide, we can express the Fourier transform of the
output as: Y(f) = H(f) X(f) + N(f).
• We consider simplified definitions of the PSD and cross-spectra, where
the division by T and the limit as T → ∞ are not included.
– This is just to keep the equations simpler, as the factors appear in all terms and
simply carry through all the expressions.
• The cross-spectrum S_xy(f) is thus:
S_xy(f) = E[X(f)* Y(f)] = E[X(f)* (H(f) X(f) + N(f))]
= H(f) E[X(f)* X(f)] + E[X(f)* N(f)] = H(f) S_xx(f)
• Thus the noise does not affect the cross-spectrum.
• This development uses the fact that N(f) and X(f) are uncorrelated so
that E[X(f)* N(f)] = 0
Observations for this Model (cont’d)
• The PSD S_yy(f) is thus:
S_yy(f) = E[|Y(f)|²] = E[(H(f)X(f) + N(f))* (H(f)X(f) + N(f))]
= |H(f)|² E[|X(f)|²] + H(f) E[X(f)N(f)*] + H(f)* E[X(f)*N(f)] + E[|N(f)|²]
S_yy(f) = |H(f)|² S_xx(f) + S_nn(f)
• Using the expressions for the FRF estimators one has:
H1(f) = S_xy(f) / S_xx(f) = H(f)
H2(f) = S_yy(f) / S_yx(f) = (|H(f)|² S_xx(f) + S_nn(f)) / (H(f)* S_xx(f))
      = H(f) (1 + S_nn(f)/S_uu(f))
(S_nn(f)/S_uu(f) is the inverse of the SNR at the output)
Comments
• The theoretical value for the estimator H1(f) is unaffected by the
addition of uncorrelated noise on the output.
• Such a noise will cause the estimator H2(f) to over-estimate the FRF.
– It is an over-estimation because the factor 1 + S_nn(f)/S_uu(f) > 1
– This factor is real and positive, so it does not affect the phase of H2(f).
‒ The degree of over-estimation depends on the reciprocal of the signal to
noise ratio (SNR) on the output.
o High SNR (little noise) the estimator is good.
o Low SNR (more noise) and the estimator is poor.
Measurements with Input Noise
• In this case the measurement configuration is:
v(t) → [System H(f)] → y(t); the measured input is x(t) = v(t) + n(t)
• In this case Y(f) = H(f) V(f), X(f) = V(f) + N(f) and we have:
S_xy(f) = H(f) S_vv(f)
S_xx(f) = S_vv(f) + S_nn(f)
S_yy(f) = |H(f)|² S_vv(f)
Results for Input Noise
• Like the output noise case we can obtain expressions for the estimators
H1(f) and H2(f).
• In this case one obtains
H1(f) = (1 + S_nn(f)/S_vv(f))⁻¹ H(f)
H2(f) = H(f)
• So in this case estimator H2(f) is the better estimator.
• The estimator H1(f) under-estimates the true transfer function.
‒ The degree of under-estimation depends on the SNR of the measured
input signal.
‒ The factor is again real and positive so does not affect the phase.
Input and Measurement Noise
• In the case where there is noise on the input and the output
v(t) → [System H(f)] → u(t); the measured input is x(t) = v(t) + n1(t)
and the measured output is y(t) = u(t) + n2(t)
• Then
|H1(f)| ≤ |H(f)| ≤ |H2(f)|
Arg{H1(f)} = Arg{H(f)} = Arg{H2(f)}
• So that H1(f) and H2(f) bracket the true FRF.
– Since H1(f) = γ²_xy(f) H2(f), if the coherence is close to one then the
bracket is tight.
The Coherence Function
• In practice before concerning oneself with the estimated FRF it is wise
to first consider the coherence function.
• If the coherence is small then the estimated FRF is possibly unreliable.
• There are several reasons why the coherence function may be less than
unity:
‒ Noise (as discussed in this lecture)
‒ The system may be non-linear
‒ There may be other inputs which are not being measured, that contribute
to y(t).
‒ Estimation errors, i.e. the estimate of the coherence could be at fault.
o In particular the cross-spectrum estimate can be biased in frequency band
where the transfer function’s phase varies rapidly … like at a resonance.
Example
• Coherence function estimated using segment average for different
segment (FFT) sizes.
• The system is an AR system with a resonance at 2 kHz (as used
previously).
In all these cases no noise is added –
one expects the coherence to be 1.
There are 100,000 data points used
to compute the cross-spectra and
PSDs used to compute the coherence.
There is a dip of the coherence at
resonance because at resonance the
phase changes rapidly and the cross-
spectral estimate is biased.
This is a feature commonly seen in
real-world measurements.
Introduction to Systems:
Summary of the Theory of
Continuous Time Systems
Chuang Shi
November 2023
Credit to Paul White for creating the slides
Course Overview
• Recap of continuous time systems (this lecture)
– Overview of continuous-time systems and their general properties
– In-depth elaboration on four mathematical representations for
continuous-time linear time-invariant systems
• Digital systems
– Classes of digital systems
– Z-transforms
• Digital filter design
– Finite impulse response filters
– Infinite impulse response filters

Today’s Learning Outcomes
By the end of the session the students should be able to:
1. Recall different properties of continuous-time systems and describe their
basic physical meanings.
2. Identify a linear ordinary differential equation or a continuous-time linear
time-invariant system based on the form of a given differential equation.
3. Determine the transfer function of a continuous-time linear time-invariant
system based on its linear ordinary differential equation or impulse
response.
4. Sketch the frequency response of a continuous-time linear time-invariant
system based on its pole-zero diagram.
5. Explain the stability and causality of a continuous-time linear time-
invariant system based on a mathematical representation.
What is a System?
• A “system” is anything which takes one (or more, or fewer)
inputs and creates one (or more) outputs.
• Examples:
– A loudspeaker: the input being the voltage driving
the speaker and the output being the “sound”.
– A filter: the input is the unfiltered signal and the output is the filtered
signal (!)
– A structure: the input might be a force, the output
being the displacement/acceleration/velocity
at a point.
General Picture
• Assuming the input is x(t) and the output y(t).
• We represent a general system using:
(Figure: block diagram, x(t) → system → y(t).)
• The input and output are analogue (continuous time) signals,
so the system is referred to as a continuous (or analogue)
system: in contrast to digital systems, which are the main
subject of this course.
• Sometimes there is no input, only an output is generated, such
systems are called “autonomous” systems.
What might we be interested in?
• Can one predict the output of a system for a known input?
– If we “know” the system and the input x(t), what is the output, y(t)?
• How do we characterise a system?
• If one knows the system and the output y(t), can one estimate
the input, x(t)?
– i.e. to remove the effect of the system from y(t), e.g. removing the
effect of distortions.
Linear Systems
• An important subclass of systems are the linear systems.
• Consider a system whose response to the input u(t) is w(t) and
whose response to v(t) is z(t).
• Then, for a linear system, the response, y(t), to a new
combined input, x(t):
x(t) = c1 u(t) + c2 v(t)
is given by
y(t) = c1 w(t) + c2 z(t)
where c1 and c2 are scalar constants.
• This is a form of the principle of super-position.
Time-Invariant Systems
• A system which does not evolve with time is said to be time-
invariant.
• Formally, if x(t) is an input at time t which elicits the response
y(t), then x(t−T), which is the same input occurring T seconds
later, elicits the response y(t−T), i.e. the same response delayed
by T seconds.
• Systems which are both linear and time-invariant are referred
to as “LTI* systems”.
• The majority (all?) of this course will only consider LTI
systems.
* Linear Time Invariant
Comments on LTI systems
• An LTI system is a theoretical construct – no real world
systems are LTI - since LTI implies:
– Indestructibility:
Since the system is linear its response increases in proportion with the
amplitude of the input, regardless of how large the input becomes. You
can not “break” the system by driving it too hard.
For example, consider a loudspeaker, this suggests that the input can
cover any voltage range, even ±1,000,000 V!
– Immortality:
The system is the same for all time. It never ceases to be;
indeed it never started either!!
• This emphasises that an LTI system is a model of real system
– albeit a useful and powerful model.
• “All models are false but some models are useful.”
George E.P. Box
General Properties of Real-World
Systems
• Stability
– The output of a system should not grow in an
unbounded way, if the input is bounded.
• Causality
– The output, y(t), at any time t, can only depend
on inputs that have occurred before t. i.e. the
system is NOT magic, it cannot predict what is
to come!
• When we design and build a system to operate in “real-time” it


must be causal and as stable as possible.
A Real-Time System Example
If the disturbance is periodic, it can be decomposed into a series of
harmonic components, and because the system under control is assumed
to be linear, the amplitude and phase of each harmonic can be adjusted
independently, as recognised for the control of transformer noise by
William Conover in 1956.

(Figure: Conover’s transformer noise control arrangement – a harmonic sound source,
adjusted in phase angle and amplitude and fed through an amplifier to a loudspeaker,
with a microphone and sound level analyzer/meter monitoring the transformer noise.)
Real-Time Systems
• A real-time system is one that calculates its response “on-the-
fly”, so that y(t) is known at time t (or shortly after).
• Physical systems, like loudspeakers and structures, are
examples of real-time systems.
• Some digital systems, like phones or hearing aids,
must also be real-time.
• Non-real-time systems (or off-line systems) are usually based
on a computer and act on pre-stored data. Example problems:
– measure the input to a loudspeaker and then use a computer simulation
of the loudspeaker physics to create the output – might be used in
loudspeaker design.
– one has a recording of, say, speech in noise and wants to filter it to
remove the noise and so enhance the speech
Forms of Model
• There are 4 basic representations for continuous time LTI
systems.
1) Linear Ordinary Differential Equations (ODEs). Often
derived by consideration of the underlying physics.
2) Laplace domain, i.e. transfer functions, which are good tools
for assessing stability.
3) Frequency domain, i.e. frequency response functions, which
are easy to measure.
4) Time-domain, i.e. impulse responses, good for predicting the
output of a system given an arbitrary input.
• The above are not exclusive uses for each representation; they
are just the uses to which each is (arguably) best suited.
Linear Ordinary Differential
Equations
• An ODE of the form
a_p d^p y/dt^p + ... + a_1 dy/dt + a_0 y = b_q d^q x/dt^q + ... + b_1 dx/dt + b_0 x
where the a’s and b’s are constants, represents an LTI system.
• Examples of systems that are not LTI include:
sin(2πf_0 t)·dy/dt + y = x
(linear, but time-varying equation)
(dy/dt)² + x·dy/dt + e^{xy} = x
(non-linear, time-invariant equation, with each term on the LHS being non-linear)
Transfer Functions
• The Laplace transform, L {}, has the extremely useful property
of changing a LTI ODE into an algebraic equation, because
if L{y(t)} = Y(s) then L{dy/dt} = sY(s)
• So that applying the Laplace transform to
a_p d^p y/dt^p + ... + a_1 dy/dt + a_0 y = b_q d^q x/dt^q + ... + b_1 dx/dt + b_0 x
leads to
a_p s^p Y(s) + ... + a_1 s Y(s) + a_0 Y(s) = b_q s^q X(s) + ... + b_1 s X(s) + b_0 X(s)
Transfer Functions (cont’d)
• Which we rearrange as:
(a_p s^p + ... + a_1 s + a_0) Y(s) = (b_q s^q + ... + b_1 s + b_0) X(s)
• From which the transfer function H(s) is defined:
H(s) = Y(s)/X(s) = (b_q s^q + ... + b_1 s + b_0) / (a_p s^p + ... + a_1 s + a_0)

• For the systems we shall consider the transfer function has the
form of the ratio of two polynomials in the Laplace variable, s.
• Recall s is a complex valued variable.
Poles and Zeros
• The transfer function H(s) is a complex valued function of the
complex valued variable s.
• Visualising such a function is not easy.
• It turns out that if H(s) is the ratio of 2 polynomials, i.e.
H(s) = Q(s) / P(s)
• Then the roots of the two polynomials convey all of the
important information regarding the system H(s).
• The poles are roots of P(s), i.e. the values of s such that P(s)=0
and hence the values of s where H(s)=∞.
• The zeros are roots of Q(s), i.e. the values of s such that Q(s)=0
and hence the values of s where H(s)=0.
Pole-Zero Diagram
• A transfer function is plotted as a pole-zero diagram, in which
just the positions of the poles and zeros are shown:
– Shows the complex plane of s (the s-plane) on which
• Poles are marked with an “×”
• Zeros are marked with an “o”
Stability
• The stability of a system is usually assessed using the transfer
function.
• A system is unstable if any of the poles have positive real
components, i.e. if any poles lie in the right half plane or if you
prefer to the right of the imaginary axis.
• In general, stability is not affected by the position of the zeros.
• Thus, simple inspection of the pole-zero diagram reveals
whether a system is stable or not.
Frequency Response (Analytic)
• The frequency response, H(f), of a system can be obtained
analytically if the transfer function is known.
• Specifically
H(f) = H(s)|_{s=2πif}
• That is to say you replace each occurrence of s in H(s) by 2πif
to get the frequency response.
• This is equivalent to evaluating the transfer function along the
imaginary axis in the s-plane.
(Figure: magnitude vs frequency sketches for systems which pass all frequencies (all-pass),
enhance a frequency band, or eliminate a frequency band.)
Frequency Response (Measurement)
• The frequency response of a system is something that can
commonly be measured.
• If the input to an LTI system is a sine wave at frequency f and
amplitude A, then the output will also be a sine wave at
frequency f but with amplitude A·|H(f)|. The phase change
in that sine wave will be Arg{H(f)}.
• So one way to measure a frequency response is to probe the
system with a set of sine waves at different frequencies:
– The amplitude of the output relative to the input, gives the magnitude
of the frequency response
– The phase change between the input and output gives the phase of the
frequency response.
Frequency Response
Impulse Response
• The final representation of an LTI system is its impulse
response, h(t).
• This is the output of the system when the input, x(t), is a Dirac
delta function, δ(t) – an “idealised” impulse.
– A Dirac delta function is a mathematical construct, not something that
can exist in the real world.
• One can seek to measure the impulse response of the system
using a “tap test” – applying a real impulse (as opposed to δ(t))
and measuring the response.
Properties of the Impulse Response
• For a stable system, the impulse response decays away.
(Figure: stable vs unstable impulse responses.)
• For a causal system, the impulse response is zero before t=0.
(Figure: causal, acausal and anti-causal impulse responses.)
• The Laplace transform of the impulse response is the transfer
function, H(s).
• Similarly, the frequency response, H(f), is the Fourier transform of the
impulse response.
Using the Impulse Response
• A useful property of Dirac delta functions is the “sifting
property”, which states that
x(t) = ∫_{−∞}^{∞} x(u) δ(t − u) du
which means that the signal x(t) can be thought of as being
composed of Delta functions.
• The response of the system to each δ(t-u) is an impulse
response, delayed by u seconds, i.e. h(t-u).
• So the overall response of the system, y(t), to x(t) is:

=y (t ) ∫ x ( u ) h ( t − u ) du
−∞
Convolution
• Hence, the output y(t) to any input x(t) can be computed using the integral
  y(t) = ∫_{-∞}^{∞} x(u) h(t-u) du = ∫_{-∞}^{∞} h(u) x(t-u) du
• This is the convolution integral, and we commonly write
  y(t) = h(t) * x(t) = x(t) * h(t)
• Note the asterisk "*" denotes convolution and NOT multiplication, which is what it denotes in MATLAB (and other computer languages).
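As an aside (not in the original slides), the convolution integral can be checked numerically: on a sampled time grid MATLAB's conv computes the convolution sum, and scaling by the time step approximates the integral. The impulse response and input below are illustrative assumptions - a minimal sketch:
  % Approximate y(t) = ∫ x(u) h(t-u) du on a grid with spacing dt
  dt = 0.001; t = 0:dt:5;
  h = exp(-2*t);              % example (assumed) impulse response
  x = sin(2*pi*1.5*t);        % example input
  y = conv(x, h)*dt;          % convolution sum scaled by dt ≈ integral
  y = y(1:length(t));         % keep the span where the input is defined
  plot(t, y)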
Observations about LTI Systems
• The exponential function plays a central role in the analysis of
LTI systems.
• For example, the Laplace transform is defined as:
  L{x(t)} = X(s) = ∫ x(t) e^{-st} dt
i.e. the Laplace transform is defined as the integral of the signal multiplied by an exponential function.
• Exponentials are eigen-functions for the operation of differentiation:
  d/dt {e^{at}} = a e^{at}
i.e. differentiating an exponential results in a scaled exponential, so that differentiating an exponential does not change its shape.
Summary
• Underlying physics gives the ODE:
  a_p d^p y/dt^p + ... + a_1 dy/dt + a_0 y = b_q d^q x/dt^q + ... + b_1 dx/dt + b_0 x
• Laplace transforming the ODE gives the transfer function:
  H(s) = (b_q s^q + ... + b_1 s + b_0) / (a_p s^p + ... + a_1 s + a_0)
• Setting s = 2πif (equivalently f = s/2πi) gives the FRF, which can also be measured:
  H(f) = (b_q (2πif)^q + ... + b_1 2πif + b_0) / (a_p (2πif)^p + ... + a_1 2πif + a_0)
• The poles (P(s)=0) and zeros (Q(s)=0) of the transfer function tell us whether the system is stable.
• The inverse Laplace transform of the transfer function is the impulse response, whose form tells us whether the system is causal.
• Convolution with the impulse response gives the response to any input:
  y(t) = ∫_{-∞}^{∞} x(u) h(t-u) du
(The Fourier transform itself is not discussed here.)
Objectives
• This part of the course:
1) Defines what is meant by a digital system
2) Reconstructs the edifice on the previous o/h for a digital system
• For a digital system parallels to all the concepts just discussed
exist.
• This requires one to define digital equivalents of:
– ODEs (difference equations)*
– Fourier transforms (discrete Fourier transform (DFT))
– Laplace transform (z-transform)
– Convolution (digital convolution or convolution sum)
– Dirac delta function (Kronecker delta function)
– A criterion for stability
* Terms given in brackets are the names of these digital equivalents – for later reference.
Worked Example - Physical Model
• Model of car suspension
• Damping force: c d/dt(x(t) - y(t)) = cẋ - cẏ
• Spring force: k(x(t) - y(t)) = kx - ky
Obtain an ODE by considering the physics of the problem.
• Newton's laws give: mÿ = cẋ - cẏ + kx - ky
  ⇒ ÿ + (c/m)ẏ + (k/m)y = (c/m)ẋ + (k/m)x
Note: Slides with the black block in the top right-hand
corner are “additional” material - appendices
• Compute the transfer function, by Laplace transforming the ODE:
  ÿ + (c/m)ẏ + (k/m)y = (c/m)ẋ + (k/m)x
  ⇒ (s² + (c/m)s + k/m) Y(s) = ((c/m)s + k/m) X(s)
  H(s) = Y(s)/X(s) = ((c/m)s + k/m) / (s² + (c/m)s + k/m)
• Zeros: solution of (c/m)s + k/m = 0 ⇒ s = -k/c
• Poles: solution of s² + (c/m)s + k/m = 0 ⇒ s = (-c ± √(c² - 4km)) / 2m
• Stable as long as c, k > 0.
• Frequency Response, computed from the transfer function
  H(f) = H(s)|_{s=2πif} = ((2πif)(c/m) + k/m) / (-4π²f² + (2πif)(c/m) + k/m)
  |H(f)|² = ((k/m)² + 4π²f²(c/m)²) / ((k/m - 4π²f²)² + 4π²f²(c/m)²)
• Impulse response, computed from the transfer function
  Defining: s1 = (-c + √(c² - 4km))/2m, s2 = (-c - √(c² - 4km))/2m
  h(t) = L⁻¹{ ((c/m)s + k/m) / ((s - s1)(s - s2)) }
       = ((c/m)s1 + k/m)/(s1 - s2) e^{s1 t} + ((c/m)s2 + k/m)/(s2 - s1) e^{s2 t}
which has the general form
  h(t) = A e^{-βt} sin(2πf0 t + θ)
Autonomous Systems
• Recall an autonomous system is one in which there is no input
(or you might consider that x(t)=0).
• One example of such an LTI system is:
  d²y/dt² + ω0²y = 0
• This system might be regarded as a sine wave generator, since the solution of the equation is:
  y(t) = A sin(ω0 t + θ)
• The values of A and θ depend on the initial conditions supplied
to the system.
More on Laplace Transforms
• The Laplace transform is usually defined as the unilateral transform:
  L{x(t)} = X(s) = ∫_0^∞ x(t) e^{-st} dt
• The bilateral form of the Laplace transform also exists and is used largely by mathematicians. This bilateral form is:
  X(s) = ∫_{-∞}^{∞} x(t) e^{-st} dt
• The only difference between these forms is the lower limit of
integration. If x(t)=0 for t<0, then the two forms are identical.
For a system, this is equivalent to assuming causality.
Unilateral Example
• Consider a decaying exponential:
  x1(t) = e^{-at}, t ≥ 0, a > 0
        = 0, t < 0  (strictly for a unilateral transform we don't need to specify this)
• Then
  X1(s) = ∫_0^∞ x1(t) e^{-st} dt = ∫_0^∞ e^{-at} e^{-st} dt = ∫_0^∞ e^{-(s+a)t} dt
        = [-e^{-(s+a)t}/(s+a)]_0^∞ = 0 - (-1/(s+a)) = 1/(s+a)
A familiar result - I hope! This is on the basis that e^{-(s+a)t} = 0 for t = ∞. Also note you get this result from a bilateral transform for this example.
• There are some subtleties in this result that you may, or may
not, have considered previously.
Further Consideration
• Let us temporarily consider s as a real variable, and consider:
“Does the previous Laplace transform hold for all s?”
• Consider the integral
  X1(s) = ∫_0^∞ e^{-(s+a)t} dt = [-e^{-(s+a)t}/(s+a)]_0^∞
• On the previous slide it was assumed e^{-(s+a)t} → 0 as t → ∞
• If s+a<0 then e^{-(s+a)t} is an exponential which is increasing (not decaying), so that lim_{t→∞} e^{-(s+a)t} = ∞
• In which case X1(s)=∞!
• So strictly X1(s) = 1/(s+a) only if s > -a
Region of Convergence (ROC)
• Strictly the Laplace transform of an exponential should be written as:
  X1(s) = 1/(s+a), Re{s} > -a
(this is the general form of the condition when s is complex)
• The condition on s defines the region of convergence for the
Laplace transform – the values of s for which the expression
holds.
• For values of s outside the ROC the Laplace transform does
not converge, informally it takes the value ∞.
• All Laplace transforms have such a ROC – normally they are
not considered (in Engineering at least!)
• Because this signal is causal (i.e. it is zero for t less than 0)
exactly the same arguments hold regardless of whether we
consider a unilateral or bilateral transform.
Example of Bilateral Transform
• Consider the anti-causal signal (anti-causal signals are zero for
t>0).
  x2(t) = -e^{-at}, t ≤ 0
        = 0, t > 0
• Bilateral transform:
  X2(s) = ∫_{-∞}^{∞} x2(t) e^{-st} dt = -∫_{-∞}^{0} e^{-(s+a)t} dt = [e^{-(s+a)t}/(s+a)]_{-∞}^{0}
  e^{-(s+a)t}|_{t=-∞} = 0 when Re{s} < -a, in which case
  X2(s) = 1/(s+a) for Re{s} < -a
A Tale of Two Signals
• The two signals, x1(t) and x2(t), have the same functional form for their (bilateral) Laplace transforms:
  X1(s) = X2(s) = 1/(s+a)
but with different regions of convergence, Re{s}>-a and Re{s}<-a, respectively.
• If only causal signals are considered no such ambiguity can
arise, which is one reason in favour of unilateral Laplace
transforms, because in the “continuous world” systems are
real-time, so must be causal.
• For the digital equivalent of the Laplace transform (the z-transform) we shall encounter similar ambiguities.
Difference Equations
Overview
• Difference equations
• Fibonacci sequence
• Penguin Island Example
– Solution of autonomous (unforced) first order difference equations
– The solution of forced first order difference equations
– Steady state solutions
– The solution of second order difference equations – using a trial
solution.
• Fibonacci
• Adult penguin population
Digital Signals
• A digital signal is a sequence of values defined at a discrete set
of points in time, we might write x[n]*:
x 0 = 3, x 1 = −1.5, x 2 = 1, x 3 = 0.2,

• Commonly such digital signals are obtained by sampling a


continuous time signal, x(t), so that

x  n = x ( t ) t =nD

In which case x[n] represents the nth sample of the signal and D
is the sampling interval, with fs=1/D is the sampling frequency.
* Note on notation: I shall use square brackets for a digital signals/systems x[n] and round brackets for
continuous signals x(t), this emphasises the form of the signals, but it is only a device to help clarify the
difference between continuous and digital processes. In many texts you will find x(n) used.
Difference Equations
• Analogue/continuous systems might be represented as
ordinary differential equations (ODEs).
• The idea of a derivative does not apply directly to a digital
signal, so ODEs cannot be used for digital systems.
• An alternative form of equations is necessary for digital
systems.
• For a digital system we consider difference equations (or
recurrence equations), instead of ODEs.
Listen to http://www.bbc.co.uk/programmes/b008ct2j to find out more about Fibonacci sequences.
Example: Fibonacci Sequence
• The Fibonacci sequence is:
  0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ....
Leonardo Fibonacci (1170 - 1250)
• Each term in this sequence is the sum of the previous two.
• We can write a difference equation to represent the Fibonacci sequence, x[n]*:
  x[n] = x[n-1] + x[n-2]
• To check, start with x[0]=0 and x[1]=1 (these are the initial conditions) then applying the equation:
  n = 2: x[2] = x[1] + x[0] = 1 + 0 = 1
  n = 3: x[3] = x[2] + x[1] = 1 + 1 = 2
  n = 4: x[4] = x[3] + x[2] = 2 + 1 = 3
  n = 5: x[5] = x[4] + x[3] = 3 + 2 = 5
  n = 6: x[6] = x[5] + x[4] = 5 + 3 = 8
* Note that x[n] is the "output" of the "Fibonacci system"; normally I use x as an input, so don't be confused - this is just a temporary "glitch" in the notation.
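A minimal MATLAB sketch (not part of the original notes) that generates the sequence directly from the difference equation; note MATLAB arrays start at index 1, so x(k) holds x[k-1]:
  x = zeros(1,20);
  x(1) = 0; x(2) = 1;          % initial conditions x[0]=0, x[1]=1
  for n = 3:20
      x(n) = x(n-1) + x(n-2);  % x[n] = x[n-1] + x[n-2]
  end
  disp(x)                      % 0 1 1 2 3 5 8 13 21 34 ...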
Comments on the
Fibonacci Example
• One can continue by repeating the process at the bottom of the
previous overhead and we can find any term in the Fibonacci
sequence, e.g. the 100th term.
– For a human this is simple, but tedious and prone to error.
– For a computer it is trivial and can be done very efficiently.
• Thus we can “solve” this difference equation, without using
any analytic tools.
• Compare this to ODEs, where one can not, in general, solve
the equation without understanding calculus.
• Note that the Fibonacci sequence is unstable, it grows exponentially.
• Difference equations are much EASIER than differential equations.
Simple Population Models
• Difference equations are commonly used to model
populations.
• The population in the nth year, P[n], is related to the
population in earlier years.
• The simplest form is to assume:
– that a fixed proportion of the population die each year
– that the number of births is also a fixed proportion of the population
• Obviously, this is a greatly simplified model of a population.
Example: Penguin Island
• Consider an isolated island with
a population of penguins.
• Each year 25% of the penguins die.
• Each year the number born is
20% of the previous year’s
population.
• So in terms of a difference equation:
  P[n] = P[n-1] - 0.25P[n-1] + 0.2P[n-1] = 0.95P[n-1]
  ⇒ P[n] - 0.95P[n-1] = 0
This is a first order difference equation; the Fibonacci equation was second order.
• So this year's population is 95% of the preceding year's.
How many Penguins?
• If in year 0 there are N penguins, how many will there be in
future years?
• This is easy to calculate "by hand":
  P[0] = N; P[1] = 0.95P[0] = 0.95N;
  P[2] = 0.95P[1] = 0.95 × 0.95P[0] = 0.95²N
  P[3] = 0.95P[2] = 0.95 × 0.95²N = 0.95³N
• There is a pretty clear pattern here, which we can write mathematically as, in general:
  P[n] = 0.95^n N
• This is the "solution" to our difference equation:
  P[n] - 0.95P[n-1] = 0
Notes on the Solution
• Consider the equation and its solution:
  P[n] - 0.95P[n-1] = 0
  P[n] = 0.95^n N
• The term P[n-1] according to the solution is:
  P[n-1] = 0.95^{n-1} N
• So that
  0.95P[n-1] = 0.95 × 0.95^{n-1} N = 0.95^n N = P[n]
this is indeed the solution to this equation!
• Recall that N is the value in year 0, i.e. it is an initial condition.
• The general solution is of the form:
P  n = A0.95n
with the value of the arbitrary constant A being determined by
the initial condition – like we see when solving ODEs.
A More General Model
• Consider a general version of the same population model:
– Each year the proportion that die is a
– The proportion that are born is b
• The difference equation for this model is:
  P[n] = P[n-1] - aP[n-1] + bP[n-1] = (1 - a + b)P[n-1]
• To simplify we write (1-a+b) = g and the equation becomes:
  P[n] - gP[n-1] = 0
• The solution to this equation is:
  P[n] = g^n P[0]
Forms of the Solution
1. If g<1 (more penguins die each year than are born) then the penguin population is doomed
  P[n] = g^n P[0] → 0 as n → ∞
2. If g>1 (more penguins are born each year than die) then the penguin population is destined to overrun the island (and the world!)
  P[n] = g^n P[0] → ∞ as n → ∞
3. If g=1 (same number are born and die) then the population will be constant
  P[n] = P[0] ∀n
Comments on this Population Model
• Our simple first order difference equation is not a good model
of populations.
• Populations are only maintained if the birth rate exactly
matches the death rate for all time; otherwise the population
either increases or decreases exponentially.
• The difference equation
P  n − gP n −1 = 0
is first order: it relates the data at n to the data 1 sample ago.
• It is linear and time-invariant because g is a constant (it does
not depend on n or on P[n]).
• This is also an autonomous system – there is no input.
A Forced Population Model
• Consider a scenario where Greenpeace hear of the isolated island where the doomed penguins live and every year they ship Z[n] penguins, bred in zoos, from the mainland on to the island.
• How does this affect things?
• The population model becomes:
  P[n] = gP[n-1] + Z[n]
  ⇒ P[n] - gP[n-1] = Z[n]
• For simplicity let us assume that the input (the number of penguins shipped each year) is a constant, Z[n] = M.
Solving the Forced Equation
• We can again solve this by hand, assuming P[0]=N then
  P[0] = N
  P[1] = gP[0] + M = gN + M
  P[2] = gP[1] + M = g(gN + M) + M = g²N + M(g + 1)
  P[3] = gP[2] + M = g(g²N + gM + M) + M = g³N + M(g² + g + 1)
• There is a pattern emerging again and we can write
  P[n] = g^n N + M Σ_{m=0}^{n-1} g^m = g^n N + M (1 - g^n)/(1 - g)
Nb. See slides on finite and infinite geometric progressions.
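The closed-form solution can be checked against direct iteration of the recurrence; a short MATLAB sketch, with illustrative values (g=0.95, M=10, N=100 are assumptions chosen for demonstration):
  g = 0.95; M = 10; N = 100; n = 0:50;
  P = zeros(size(n)); P(1) = N;            % P(1) holds P[0]
  for k = 2:length(n)
      P(k) = g*P(k-1) + M;                 % P[n] = g P[n-1] + M
  end
  Pclosed = N*g.^n + M*(1 - g.^n)/(1 - g); % the solution above
  max(abs(P - Pclosed))                    % ≈ 0: the two agree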
Form of the Solution
• If g<1, then the penguin population will tend to a steady state solution
  P[n] = g^n N + (1 - g^n)/(1 - g) M → M/(1 - g) as n → ∞
• If g>1, then the penguin population will grow in an uncontrolled way
  P[n] = g^n N + (g^n - 1)/(g - 1) M → ∞ as n → ∞
• If g=1, then the population will also grow:
  P[n] = N + M Σ_{m=0}^{n-1} 1 = N + nM → ∞ as n → ∞
Note we need to use a different analysis for the case g=1 to avoid division by zero.
Steady State Solutions
• Often we might be interested in knowing if a difference
equation reaches a constant value, i.e. a steady state.
• These solutions, if they exist, can be found easily.
• If it does reach a constant value, then the values of P[n] do not depend on n, and we can write them as a constant, C, i.e.
  ... = P[n-1] = P[n] = P[n+1] = P[n+2] = ... = C
• Thus to find the steady state solution, replace every occurrence of P[·], regardless of the value of n, by C.
• For example: P[n] - gP[n-1] = M
• Replace P[n] and P[n-1] by C and solve to find C:
  C - gC = M ⇒ C = M/(1 - g)
Comments
• The steady state solution identified matches that found by
solving the equation and seeing what happens as n increases
(for g<1).
• The existence of a steady state solution, does not mean that the
solution will reach that value. For example, in the case where
g>1, the solution tends to ∞, because it is unstable, albeit that a
steady state solution exists.
• This method works even when the difference equation is non-
linear.
Comments on the Solution
• The solution to this forced equation consists of 2 parts (g≠1)
  P[n] = g^n N + (1 - g^n)/(1 - g) M = (N - M/(1 - g)) g^n + M/(1 - g)
The first term is the solution to the unforced equation, the transient response*; the second term is due to the forcing, the steady state response.
• This mirrors what happens when we solve ODEs, where the solution to a forced equation consists of the sum of:
– the solution of the unforced system (the complementary function)
– the solution to the forced system (the particular integral)
* Note the general form of the solution of the unforced system was Ag^n; this term has exactly that form, just the arbitrary constant, A, is rather more complicated.
More Complex Penguin Models
• Our first order linear difference equation is a poor population
model.
• Can we make it better by making it more complicated?
• Let us try:
Consider a more realistic model based on splitting the
population into parts:
– Adult penguins, older than 1 year, who can breed
– Juvenile penguins, younger than a year, who cannot breed
• The number of adult penguins is denoted Pa[n] and the
juvenile population is Pj[n] – the total population P[n] is the
sum of these two sub-populations.
Rules for the new Model
• Juvenile and adults die at the same rate: still a constant
proportion, a, of their populations.
• The number of penguins born each year is a fixed proportion of
only the adult population.
• The surviving juveniles from the previous year become adults.
• Putting this in terms of equations:
  Pj[n] = bPa[n-1]
  Pa[n] = (1-a)Pa[n-1] + (1-a)Pj[n-1] = (1-a)P[n-1]
(the surviving adults from last year plus the surviving juveniles from last year)
• So from the first equation Pj[n-1] = bPa[n-2]; using this:
  Pa[n] = (1-a)Pa[n-1] + (1-a)bPa[n-2]
The New Model
• We now have a model of the adult penguin population based
on our rules
Pa n = (1 − a ) Pa n −1 + (1 − a ) bPa n − 2
• We shall simplify this by using a=(1-a) and b=b(1-a), so
Pa  n = aPa n −1 + bPa n − 2
• This is a second order equation, since it relates the value at
time n to that at n-1 and n-2.
• The order of the equation is determined by the difference
between the largest and smallest value of n in the equation, so
y n + 0.2 y n −1 + 0.6 y n − 3 = x n
is an example of a third order equation.
Solving the Second Order Equation
• To solve this equation "by hand" we need two initial conditions (like when solving ODEs).
• Let us say Pa[0]=N1 and Pa[1]=N2 and we can proceed:
  Pa[2] = αPa[1] + βPa[0] = αN2 + βN1
  Pa[3] = αPa[2] + βPa[1] = α(αN2 + βN1) + βN2 = α²N2 + αβN1 + βN2
  Pa[4] = αPa[3] + βPa[2] = α³N2 + α²βN1 + αβN2 + αβN2 + β²N1
• The pattern in this solution is hard (not impossible) to discern.
• If we had values for all of the parameters (α, β, N1 and N2) this would be a sequence of numbers that we could easily generate, or, if we used a computer, very easy to program.
Comments
• So for a particular set of parameters we can easily work out the
sequence of values that this system generates.
• But we can’t easily, using this approach, generate a general
solution.
• So we can’t assess stability, see how the parameter choices
affect the solution, etc.
• In the following we shall seek a method for solving such linear, time-invariant difference equations.
• This method will be a simple variation of a technique for
solving unforced LTI ODEs, which we first recap.
Solving Unforced LTI ODEs
• The method we shall consider is based on a trial solution.
• It is demonstrated via an example (third order):
  d³y/dt³ + 2d²y/dt² - dy/dt - 2y = 0
• We try the solution* y = Ae^{rt} and need to find A and r.
• Note that dy/dt = Are^{rt}
• So the ODE becomes:
  Ar³e^{rt} + 2Ar²e^{rt} - Are^{rt} - 2Ae^{rt} = 0
  ⇒ r³ + 2r² - r - 2 = 0
* This solution is “tried” because we know that all ODEs of this kind have this form of solution!
Characteristic Equation
• The polynomial:
  r³ + 2r² - r - 2 = 0
is referred to as the characteristic equation.
• It defines the general behaviour of the solution and can be obtained in a variety of ways, for example using Laplace transforms or operator theory.
• For a third order equation there are 3 roots; in this case they are:
  r = -2, -1, 1
• Each root corresponds to a possible solution of the form y = Ae^{rt}
General Solution
• The most general solution to our equation is obtained by
combining the three individual solutions.
• The three individual solutions are:
  A1e^{-2t}, A2e^{-t}, A3e^{t}
where there are now 3 arbitrary constants, A1, A2 and A3, one for each of the solutions.
• The general solution is:
  y(t) = A1e^{-2t} + A2e^{-t} + A3e^{t}
• To evaluate the arbitrary constants, initial conditions are
needed.
• The et term means the solution will be unstable.
• We can determine stability by examining the roots of the
characteristic equation, if any have positive real parts, the
equation is unstable.
Solving Difference Equations
• First recall our Fibonacci example:
  x[n] - x[n-1] - x[n-2] = 0
• To solve this, try a solution of the form x[n] = Ah^n
• We use this because it is the form of the solution of the first order difference equation.
• Substituting this into the Fibonacci equation:
  x[n] = Ah^n; x[n-1] = Ah^{n-1}; x[n-2] = Ah^{n-2}
• So that
  Ah^n - Ah^{n-1} - Ah^{n-2} = 0
• Dividing by Ah^{n-2} gives:
  h² - h - 1 = 0
Solution
• This generates a polynomial for h which defines its two possible values:
  h1 = (1 + √5)/2 ≈ 1.618,  h2 = (1 - √5)/2 ≈ -0.618
• So the general solution is:
  x[n] = A1((1+√5)/2)^n + A2((1-√5)/2)^n
• Using the initial conditions x[0]=0 and x[1]=1:
• x[0]=0 means that A1 = -A2.
• x[1]=1 means that A1 = 1/√5, A2 = -1/√5.
• Hence the Fibonacci sequence can be written as:
  x[n] = (1/√5)((1+√5)/2)^n - (1/√5)((1-√5)/2)^n
Comments
• Consider the solution
  x[n] = (1/√5)((1+√5)/2)^n - (1/√5)((1-√5)/2)^n
The first term is unstable – it increases with n; the second term decays with n because |(1-√5)/2| < 1.
• For large n, the Fibonacci numbers can be approximated by:
  x[n] ≈ (1/√5)((1+√5)/2)^n
e.g. x[10]=55; using the above approximation you get 55.004.
• Despite the √5's in the above solution, the values of x[n] must all be integers (think about how the sequence is defined).
• The value h1 (=(1+√5)/2) is a special number called the golden ratio, which has long-established aesthetic properties.
For more info on the golden ratio see: http://www.maths.surrey.ac.uk/hosted-sites/R.Knott/Fibonacci/phi.html.
Solving the Second Order
Population Model
• The model is: Pa[n] = αPa[n-1] + βPa[n-2]
• The Fibonacci equation is a special case of this model with α=β=1, which is the case if there are no deaths (a=0) and b=1, i.e. the number of births equals last year's population.
• We again assume a solution of the form Pa[n] = Ah^n
• Leading to the characteristic equation
  h² - αh - β = 0
• Providing two solutions for h:
  h1 = (α + √(α² + 4β))/2 and h2 = (α - √(α² + 4β))/2
• The general solution is:
  Pa[n] = A1h1^n + A2h2^n
Character of the Solution
• The nature of the solution depends on the values of h1 and h2.
• Since the solution is based on hn whether it grows or decays
depends on the magnitude of h (bearing in mind it could be
complex).
• We shall assume (without loss of generality) |h1| ≥ |h2|.
• So we can say:
– If |h1|<1 then the solution decays (note |h2|<1 must also be true).
– If |h1|>1 then the solution is unstable and grows exponentially.
– If |h1|=1 then the solution neither grows nor decays.
• Further:
– If h1 is complex (h1 and h2 form a conjugate pair) then the solution
oscillates, the behaviour of the amplitude of these oscillations is
defined by the above statements regarding |h1|: these could be
“damped” oscillations, unstable oscillations or constant oscillations.
Use of Second Order Systems to
Model Populations
• Increasing the order of the model has not fundamentally
improved the realism of the population model.
• The population is still going to either grow or decay exponentially, unless the birth and death rates are in precise balance.
• More realistic population models require non-linearities: the
environment imposes bounds on population growth, if there
are too many penguins, a higher proportion die.
• Also population models need to take into account the predator
and prey populations, in the case of penguins:
– How many fish are there?
– How many leopard seals are there?
A More Realistic Example
• Consider a digital signal, x[n], with the samples:
x[n]={0, 0.9477, -0.1547, -0.8323, 0.2758, 0.7081, -0.3652, -0.5811, 0.4253, 0.4564,
-0.4593, -0.3380, 0.4707, 0.2290, -0.4633, -0.1315, 0.4407, 0.0471, -0.4064, 0.0237,
0.3639, -0.0809, -0.3161, 0.1248, 0.2656, -0.1563, -0.2148, 0.1765, 0.1656, -0.1867,
-0.1194, 0.1884, 0.0772, -0.1831, -0.0400, 0.1722, 0.0081, -0.1571, 0.0183, 0.1392,
-0.0393}
[Figure: the continuous signal x(t), its sampling points x(nD), and a stem plot of x[n]]
In fact this signal was obtained by sampling a continuous time signal, x(t), which was a decaying exponential, as shown on the right. This signal is sampled with D=1/40, i.e. a sampling frequency of 40 Hz.
This form of graph is called a "stem plot" and is commonly used to represent sampled signals; the vertical lines are just for "effect".
Applying a Difference Equation
• We can compute “by hand” the output if we apply this input
to a difference equation.
• For example, consider the difference equation:
y  n = x n − 0.5 y n −1 + 0.3 y n − 2

assume the initial conditions y[-1]=y[-2]=0.

• Then for n=0 according to the equation


y 0 = x 0 − 0.5 y  −1 + 0.3 y −2
in this example, it happens that x[0]=0, and given the zero
initial conditions one has y[0]=0.
• For n=1: y 1 = x 1 − 0.5 y 0 + 0.3 y −1
y[0]=0 (as we just calculated), x[1]=0.9477, y[-1]=0, so
y 1 = x 1 = 0.9477

• For n=2: y  2 = x  2 − 0.5 y 1 + 0.3 y 0


y[1]=0.9477, x[2]=-0.1547, y[0]=0, so
y  2 = −0.1547 − 0.5  0.9477 + 0 = -0.6285

• For n=3: y 3 = x 3 − 0.5 y 2 + 0.3 y 1


y[2]=-0.6285, x[3]=-0.8323, y[1]=0.9477, so
y 3 = −0.8323 − 0.5  −0.6285 + 0.3 0.9477 = -0.2337

• And so on ….
The Full Output
• Repeating this for all of the inputs, one gets:
y[n]={0, 0.9477, -0.6285, -0.2337, 0.2041, 0.5359, -0.5719, -0.1344, 0.3209, 0.2556, -0.4908, -0.0159, 0.3314, 0.0585, -0.3931, 0.0826, 0.2815, -0.0689, -0.2876, 0.1469, 0.2042, -0.1389, -0.1853, 0.1758, 0.1221, -0.1646, -0.0959, 0.1750, 0.0493, -0.1588, -0.0251, 0.1533, -0.0070, -0.1336, 0.0247, 0.1198, -0.0444, -0.0990, 0.0545, 0.0822, -0.0641}
[Figure: stem plot of the output y[n]]
• One can easily compute the output of any difference equation given the input sequence x[n].
• Or: given a list of numbers representing the input signal, we can create an output list.
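The same output can be produced with MATLAB's built-in filter function, which iterates exactly this kind of difference equation once it is rearranged with all the y terms on the left; a minimal sketch (x is the sampled sequence from the earlier slide):
  % y[n] = x[n] - 0.5 y[n-1] + 0.3 y[n-2] rearranged as
  % y[n] + 0.5 y[n-1] - 0.3 y[n-2] = x[n]
  b = 1;                        % input-side coefficients
  a = [1 0.5 -0.3];             % output-side coefficients (note the signs)
  y = filter(b, a, x);
  stem(y)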
The Time Index in a Difference
Equation
• When defining a difference equation, we usually write:
  y[n] = x[n] - 0.5y[n-1] + 0.3y[n-2]
• The absolute value of n is unimportant: the same difference equation can be written as
  y[n+1] = x[n+1] - 0.5y[n] + 0.3y[n-1]  (adding 1 to n)
or even
  y[n-1] = x[n-1] - 0.5y[n-2] + 0.3y[n-3]  (adding -1 to n)
or even
  y[n+37] = x[n+37] - 0.5y[n+36] + 0.3y[n+35]  (adding 37 to n)
• It is only the relative indices of the terms that are important:
– This equation says: the output is equal to the current input, minus half the output one sample ago, plus 0.3 times the output before that.
A Non-Linear Population Model
• An alternative, non-linear, population model is the logistic map:
  x[n] = rx[n-1](1 - x[n-1]) = rx[n-1] - rx[n-1]²
• x[n] is the fractional population: the current population divided by the maximum sustainable population – so x[n] always has a value between 0 and 1.
• r is the model parameter – the value of which makes very big
changes to the behaviour of the system – see later slides.
• If x[n] is small the population grows linearly.
• If x[n] is close to one, then x[n+1] will be small - a population
crash!
Steady State Solutions
• To find the steady state solutions of the logistic equation
  x[n] - rx[n-1](1 - x[n-1]) = 0
we set x[n] = x[n-1] = C and solve for C.
• Leading to
  C - rC(1 - C) = C(1 - r(1 - C)) = 0
• Either
  C = 0 or 1 - r(1 - C) = 0 ⇒ C = 1 - 1/r = (r - 1)/r
• So the 2 possible steady state solutions are
  x[n] = 0 or x[n] = (r - 1)/r
Some Solutions I
• Start with r=0.5, with initial condition x[0]=0.25.
  x[1] = 0.5 × 0.25 × (1 - 0.25) = 0.0938
  x[2] = 0.5 × 0.0938 × (1 - 0.0938) = 0.0425
  x[3] = 0.5 × 0.0425 × (1 - 0.0425) = 0.0203
  x[4] = 0.010; x[5] = 0.0049; x[6] = 0.0025; ...
• For these conditions the equation tends to the steady state solution at 0.
MATLAB code to generate 100 points of the logistic equation and plot those values is (note MATLAB indexes from 1, so x(1) holds x[0]):
  x(1)=0.25; r=0.5;
  for n=2:100, x(n)=r*x(n-1)*(1-x(n-1)); end
  stem(x)
Some Solutions II
• r=1.5 and with initial condition x[0]=0.25.
  x[1] = 1.5 × 0.25 × (1 - 0.25) = 0.2813
  x[2] = 1.5 × 0.2813 × (1 - 0.2813) = 0.3032
  x[3] = 1.5 × 0.3032 × (1 - 0.3032) = 0.3169
  x[4] = 0.3247; x[5] = 0.3289; x[6] = 0.3311; ...
• For these conditions the equation tends to the steady state solution at (r-1)/r = 0.3333.
• In this case, as long as 0 < x[0] < 1, the solution always tends to 0.3333.
Some Solutions III
• r=2.9, with initial condition x[0]=0.25.
• Steady state value is (r-1)/r = 0.6552
• The first 7 values are
  {0.25, 0.5437, 0.7194, 0.5853, 0.7039, 0.6045, 0.6934}
• The solution tends towards its steady state, but now oscillates as it approaches it.
Some Solutions IV
• Changing r just a bit to 3.1, still using x[0]=0.25, with (r-1)/r = 0.6774
• The first 7 values are
  {0.25, 0.5813, 0.7545, 0.5742, 0.7580, 0.5687, 0.7604}
• In this case the solution does not reach a steady state; it alternates between 2 values (0.5580 and 0.7646).
Some Solutions V
• Changing r to 3.5, x[0]=0.25.
• In this case the solution eventually reaches a state where it repeatedly cycles round 4 values: 0.5009, 0.875, 0.3828 and 0.8269.
• This is called a period 4 solution, as opposed to the previous case, which is a period 2 solution.
• Changing r has doubled the period of the solution.
Some Solutions VI
• Changing r to 3.7, x[0]=0.25.
• The solution is now erratic, like it is random, no patterns
emerge.
• This is “chaos”.
Yet More Solutions
[Figures: short time plots (20 samples) and long time plots (200 samples) for each case below]
• r=3.841: period 3 solution
• r=3.844: period 6 solution
• r=3.86: chaos
Bifurcation Diagram
• Consider running the logistic equation for a long time for various values of r.
• Then for each run form a histogram of the points generated.
• Stacking those histograms next to each other gives you this plot.
[Figure: bifurcation diagram of the logistic map, with example values of r marked a) to e)]
Some example values of r considered here:
• 2.9 a stable solution
• 3.1 a period 2 solution
• 3.5 a period 4 solution
• 3.7 a chaotic solution
• around 3.84 where the solution's character was sensitive to r.
Comments on the Bifurcation
Diagram
• If one zooms in on a small section of the bifurcation diagram,
its structure remains the same, regardless of how much you
magnify it.
• This is called self-similarity.
• This bifurcation diagram is
an example of a fractal, an
object whose dimension is
fractional. In this case this
diagram is “between” a line
and a plane.
Discussion
• The logistic equation is a "darling" of the chaos world – you can find a lot about it in popular science books and the like.
• This very simple equation offers a rich variety of behaviours
simply by changing the one parameter r.
– Note the choice of initial condition does not change the character of the
solution.
• It is a much more suitable model for population modelling than LTI difference equations, since it can have non-zero, stable solutions (1<r<3).
• However, this course will concentrate on LTI systems, and the analysis tools we develop cannot be applied to the logistic equation (or any non-linear system).
Basics of Digital Systems
Chuang Shi
Credit to Paul White for creating the slides
Continuous Time & Discrete Time
① Waveform and Sequence
② Analogue and Digital Signals
[Figures: example waveform and sequence; analogue and digital signals]
C/D Conversion → DT System → D/C Conversion
Outline
• Order of a digital system
• Classes of digital systems
– Moving Average (MA) systems
– Auto-Regressive (AR) systems
– Auto-Regressive Moving Average (ARMA) systems
• Kronecker (digital) delta function
• Sifting property for sequences
• Digital impulse response
– Finite Impulse Response (FIR) systems
– Infinite Impulse Response (IIR) systems
• Digital convolution
Opening Comments
• Difference equations are the digital equivalents of differential
equations.
• They relate an input sequence x[n] to an output sequence y[n].
• Difference equations are not normally developed by
consideration of the physics.
• We can easily compute the output of (or "solve") a difference equation for a given input manually, or by computer, without using analytic tools.
– Contrast this with differential equations which require the use of
“University level” mathematics.
• Difference equations can be applied to any time series: the
input does not have to be expressed as a mathematical
function.
Moving Average (MA) Difference
Equations
• These are difference equations in which y[n] depends only on
values of the input (x[n] for various n) and NOT on other
values of y[n].
• For a causal system:
  y[n] = b0x[n] + b1x[n-1] + b2x[n-2] + .... + bLx[n-L]
       = Σ_{k=0}^{L} bk x[n-k]   (depends only on past inputs)
• For an acausal system:
  y[n] = b_{-L1}x[n+L1] + ... + b_{-1}x[n+1] + b0x[n] + b1x[n-1] + .... + b_{L2}x[n-L2]
       = Σ_{k=-L1}^{L2} bk x[n-k]   (depends on past and future inputs)
Comments
• MA digital systems are usually the easiest to work with.
– As we shall see, later in the course, they can be configured to have
useful properties.
• For the causal system on the previous o/h the order of the filter is L (although there are L+1 coefficients, bk).
• For the acausal system the order is L1 + L2.
• The output is formed as the weighted sum of input values.
• The system coefficients bk define the system's behaviour.
• Most of the following will consider only causal systems.
MA Systems
[Figure: block diagram realisation of an MA system]
MA Example
• Input signal x[n] = {1, -2, 2, 3, -1, 0, -3}
• Applied to the MA system
  y[n] = 0.5x[n] - 0.25x[n-1] - 0.25x[n-2]
• Assuming x[n]=0 for n<0:
  y[0] = 0.5 × 1 = 0.5
  y[1] = 0.5 × (-2) - 0.25 × 1 = -1.25
  y[2] = 0.5 × 2 - 0.25 × (-2) - 0.25 × 1 = 1.25
  y[3] = 0.5 × 3 - 0.25 × 2 - 0.25 × (-2) = 1.5
  y[4] = -1.75, y[5] = -0.5, y[6] = -1.25,
  y[7] = 0.75, y[8] = 0.75, y[9] = 0, y[10] = 0, ...
(the input stops at the n=6 sample)
  y[n] = {0.5, -1.25, 1.25, 1.5, -1.75, -0.5, -1.25, 0.75, 0.75, 0, 0, ...}
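For reference, the same MA example can be run through MATLAB's filter function; a minimal sketch (the input is padded with zeros so the tail of the output is visible):
  x = [1 -2 2 3 -1 0 -3 0 0 0];   % input padded with zeros
  b = [0.5 -0.25 -0.25];          % the MA coefficients
  y = filter(b, 1, x)
  % y = 0.5 -1.25 1.25 1.5 -1.75 -0.5 -1.25 0.75 0.75 0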
The COVID-19 chart I wish I didn't have to make - Datawrapper Blog
[Figure: seven-day moving averages of COVID-19 case counts]
The order of the MA system depends on the number of coefficients (the averaging window).
"The larger the window, the smoother the line becomes, but we're also shrinking the line from both ends."
The COVID-19 chart I wish I didn't have to make - Datawrapper Blog
An MA system of order 13, but acausal:
"A centered moving average is computed from the previous seven days, today and the six future days."
Auto-Regressive (AR) Systems
• An AR system creates an output by linear combinations of past
outputs and only the current input.
  y[n] = b0x[n] - a1y[n-1] - a2y[n-2] - ... - aLy[n-L]
  ⇒ y[n] + a1y[n-1] + a2y[n-2] + ... + aLy[n-L] = b0x[n]
• The coefficients ak and b0 define the response (b0 defines the "gain" on the input and is sometimes assumed to be unity).
• L is the AR model order.
• The convention for the sign of ak is arbitrary; the one above is used because it makes later representations (slightly) easier.
• AR systems have feedback: old values of the output are used to compute the next output - the system can be unstable.
Note our penguin population models were all AR systems.
The number of unit delays
is equal to the order of the
AR system.
AR Example
• Input signal x[n] = {1, -2, 3}
• Consider the system
  y[n] = x[n] - 0.5y[n-1] + 0.25y[n-2]
• To compute the output, assuming y[-2]=y[-1]=0, then
  y[0] = x[0] - 0.5y[-1] + 0.25y[-2] = 1
  y[1] = x[1] - 0.5y[0] + 0.25y[-1] = -2 - 0.5 × 1 = -2.5
  y[2] = x[2] - 0.5y[1] + 0.25y[0] = 3 - 0.5 × (-2.5) + 0.25 × 1 = 4.5
  y[3] = x[3] - 0.5y[2] + 0.25y[1] = 0 - 0.5 × 4.5 + 0.25 × (-2.5) = -2.875
  y[4] = x[4] - 0.5y[3] + 0.25y[2] = 0 - 0.5 × (-2.875) + 0.25 × 4.5 = 2.5625
  y[5] = -2, y[6] = 1.6406, y[7] = -1.3203, y[8] = 1.0703, y[9] = -0.8652, ...
(the input stops at n=2)
Example (Cont’d)
• The response of the AR
system gradually decays
after the input stops.
• As n→∞ y[n] →0, but
y[n] only approaches 0,
never actually reaches it.
• For other systems (choices of ak) the AR system could be
unstable, in which case as n→∞ y[n] →∞.
• Contrast this with the MA system for which, shortly after the
input reaches zero, the output is exactly zero and stays at zero.
AR Example II (Unstable)
• Input signal x[n] = {1, -2, 3}   (the 0.25 coefficient of the previous example is now 0.75)
• Consider the system
  y[n] = x[n] - 0.5y[n-1] + 0.75y[n-2]
• To compute the output, assuming y[-2]=y[-1]=0, then
  y[0] = 1, y[1] = -2.5, y[2] = 5.0, y[3] = -4.375, y[4] = 5.9375, y[5] = -6.25,
  y[6] = 7.5781, y[7] = -8.4766, y[8] = 9.9219, y[9] = -11.3184, ...
ARMA Systems
• An ARMA (Auto-Regressive Moving Average) system is a
combination of an AR system with a MA system!
• The general (causal) form is
  y[n] = b0x[n] + ... + bpx[n-p] - a1y[n-1] - ... - aqy[n-q]
  ⇒ y[n] + a1y[n-1] + ... + aqy[n-q] = b0x[n] + ... + bpx[n-p]
  ⇒ Σ_{k=0}^{q} ak y[n-k] = Σ_{k=0}^{p} bk x[n-k],  a0 ≡ 1
• The order of an ARMA system requires 2 values, p and q in this case. Sometimes a system is said to be ARMA{p,q}.
Comments
• AR and MA systems represent special cases of ARMA
systems.
– MA system of order L is also an ARMA{L,0} system.
– AR system of order L is also an ARMA{0,L} system.
• In general (assuming p>0) ARMA systems can be unstable,
since they recycle (feedback) past values of the output.
• The convention that a0≡1 is not always applied, but consider an ARMA system with coefficients ak and bk with a0 ≠ 1:
  Σ_{k=0}^{q} ak y[n-k] = Σ_{k=0}^{p} bk x[n-k]  ⇒  Σ_{k=0}^{q} (ak/a0) y[n-k] = Σ_{k=0}^{p} (bk/a0) x[n-k]
Setting a'k = ak/a0 and b'k = bk/a0 gives
  Σ_{k=0}^{q} a'k y[n-k] = Σ_{k=0}^{p} b'k x[n-k],  a'0 ≡ 1
ARMA Example
• Input signal x[n] = {1, -2, 3}
• Consider the system
  y[n] = 0.5x[n] + 0.5x[n-1] - 0.25y[n-1]
• To compute the output, assuming y[-1]=0, then
  y[0] = 0.5x[0] + 0.5x[-1] - 0.25y[-1] = 0.5
  y[1] = 0.5x[1] + 0.5x[0] - 0.25y[0] = -1 + 0.5 - 0.25 × 0.5 = -0.625
  y[2] = 0.5x[2] + 0.5x[1] - 0.25y[1] = 1.5 - 1 + 0.25 × 0.625 = 0.6563
  y[3] = 0.5x[3] + 0.5x[2] - 0.25y[2] = 0 + 1.5 - 0.25 × 0.6563 = 1.3359
  y[4] = 0.5x[4] + 0.5x[3] - 0.25y[3] = 0 + 0 - 0.25 × 1.3359 = -0.3340
  y[5] = 0.0835, y[6] = -0.0209, y[7] = 0.0052, y[8] = -0.0013, y[9] = 0.0003, ...
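Again, MATLAB's filter function reproduces this by-hand calculation; a minimal sketch:
  x = [1 -2 3 zeros(1,7)];        % input, padded to show the decay
  b = [0.5 0.5];                  % MA coefficients
  a = [1 0.25];                   % AR coefficients: y[n] + 0.25 y[n-1] = ...
  y = filter(b, a, x)
  % y = 0.5 -0.625 0.6563 1.3359 -0.3340 0.0835 -0.0209 ...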
Discussion of ARMA Output
• After the input has passed through the MA terms, such that x[n-p]=0, the system behaves like an AR system.
• In this case the system is stable and rapidly approaches zero; in this regime, for this example, y[n] = -0.25y[n-1].
Delta Functions
• Continuous delta function – the Dirac delta
– Denoted δ(t), the Dirac delta function is strictly not a function, but a distribution or a generalised function. (Paul Dirac, 1902-84)
– Most commonly expressed as:
  δ(t) = lim_{ε→0} Rε(t)
• Digital delta function – the Kronecker delta (Leopold Kronecker, 1823-91: "God made the integers; all else is the work of man")
– Denoted δ[n], the Kronecker delta is a simple sequence; there are no mathematical complications.
– It is a perfect digital impulse.
  δ[n] = 1, n = 0
       = 0, n ≠ 0
Sifting Property for Sequences
(Digital Signals)
• Any digital signal (or sequence) can be expressed as the sum of Kronecker delta functions:
  x[n] = Σ_{k=-∞}^{∞} x[k] δ[n-k]
• This is the sifting property for digital signals: compare it with the continuous time equivalent:
  x(t) = ∫_{-∞}^{∞} x(u) δ(t-u) du
Example
• Consider the signal x[n] = {1, -2, 3}
  x[n] = δ[n] - 2δ[n-1] + 3δ[n-2]
[Figure: stem plots of δ[n], -2δ[n-1] and 3δ[n-2] summing to x[n]]
Digital Impulse Response
• The digital impulse response of a system, h[n], is defined as
the response you elicit from a (digital) system when it is
excited by a Kronecker delta, δ[n].
• For an LTI (linear time-invariant) system, the response of the system to a delayed and scaled delta function aδ[n-k] is a similarly scaled and delayed impulse response ah[n-k].
Note - we can compute a digital system’s impulse response without using a transform: contrast this with a
continuous system: if you know the ODE, you would compute the transfer function (or FRF) and
then inverse Laplace (or Fourier) transform to get the impulse response.
Example MA System
• Consider the MA system
  y[n] = 0.25x[n] + 0.4x[n-1] - 0.3x[n-2] + 0.1x[n-3]
• To compute the impulse response, h[n], then just set x[n]=δ[n],
in which case h[n]=y[n].
  n = 0: y[0] = h[0] = 0.25δ[0] + 0.4δ[-1] - 0.3δ[-2] + 0.1δ[-3] = 0.25
  n = 1: y[1] = h[1] = 0.25δ[1] + 0.4δ[0] - 0.3δ[-1] + 0.1δ[-2] = 0.4
  n = 2: y[2] = h[2] = 0.25δ[2] + 0.4δ[1] - 0.3δ[0] + 0.1δ[-1] = -0.3
  n = 3: y[3] = h[3] = 0.25δ[3] + 0.4δ[2] - 0.3δ[1] + 0.1δ[0] = 0.1
  n = 4: y[4] = h[4] = 0.25δ[4] + 0.4δ[3] - 0.3δ[2] + 0.1δ[1] = 0
  h[n] = {0.25, 0.4, -0.3, 0.1, 0, 0, ...}
Recall δ[n] is zero everywhere except when n=0, e.g. δ[-2] = δ[-1] = δ[1] = δ[2] = 0.
Comments on the MA Example
• Our MA system
  y[n] = 0.25x[n] + 0.4x[n-1] - 0.3x[n-2] + 0.1x[n-3]
is characterised by the coefficients
  bk = {0.25, 0.4, -0.3, 0.1}
which is also the impulse response of the system, h[n].
• It is generally true that, for an MA system, the impulse response is equal to its coefficients:
  h[n] = bn
• This is an important result which makes MA systems easy to
design and manipulate.
• For an MA system of order L the impulse response, h[n], is
identically zero for n>L.
Example ARMA System
• Consider (again) the system:
  y[n] = 0.5x[n] + 0.5x[n-1] - 0.25y[n-1]
• The impulse response is given by:
  n = 0: y[0] = h[0] = 0.5δ[0] + 0.5δ[-1] - 0.25y[-1] = 0.5
  n = 1: y[1] = h[1] = 0.5δ[1] + 0.5δ[0] - 0.25 × 0.5 = 0.5 - 0.125 = 0.375
  n = 2: y[2] = h[2] = 0.5δ[2] + 0.5δ[1] - 0.25 × 0.375 = -0.0938
  n = 3: y[3] = h[3] = 0.5δ[3] + 0.5δ[2] - 0.25 × (-0.0938) = 0.0234
  n = 4: y[4] = h[4] = 0.5δ[4] + 0.5δ[3] - 0.25 × 0.0234 = -0.0059
  h[5] = 0.0015, h[6] = -0.0004, h[7] = 0.0001, ...
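Since a Kronecker delta is just a sequence, the impulse response can be computed numerically by filtering a delta; a minimal MATLAB sketch (not in the original slides):
  d = [1 zeros(1,9)];                 % Kronecker delta δ[n], n = 0..9
  h = filter([0.5 0.5], [1 0.25], d)  % impulse response of this ARMA system
  % h = 0.5 0.375 -0.0938 0.0234 -0.0059 0.0015 ...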
Comments on the ARMA Example
• The impulse response in this case is not related in a transparent
manner to the system coefficients (ak and bk).
• The impulse response decays towards zero, but, in theory at
least, it never reaches zero.
h [ n ] → 0 as n → ∞
• This is in contrast to MA systems, whose impulse response
becomes identically zero for sufficiently large n.
• This is generally true: in nearly all ARMA systems* the
impulse response continues ad infinitum.
– For a stable system the impulse response decays towards zero.
– For an unstable system the impulse response grows without bound.
* We shall shortly discuss the exceptions: ARMA systems whose impulse response becomes identically zero after some point n – such systems are very rare.
Classification of Digital Systems
• When talking of a “system” we generally class digital systems
according to whether they are MA, AR or ARMA.
• When considering “filters” (which are just digital systems*) we
generally use an alternative classification.
• Filters are classified according to whether their impulse
response becomes zero after a finite time (as in an MA
system), or only approaches zero (as in most ARMA systems).
• The two classes are:
– Finite Impulse Response (FIR) filters
– Infinite Impulse Response (IIR) filters

* The concepts of a filter and a system are not separate – the distinction is artificial.
Finite Impulse Response
Filters/Systems
• Any filter (or system) whose impulse response satisfies
  h[n] = 0 ∀n > M
for some M, is said to be FIR.
• All MA systems are FIR
• No AR systems are FIR
• A few (a very few) ARMA systems are also FIR
[Figure: typical FIR system response, h[n]=0 for n>6]
Infinite Impulse Response
Filters/Systems
• Any filter (or system) for which you cannot find an M such that the impulse response satisfies
  h[n] = 0 ∀n > M
is said to be IIR.
• MA systems are never IIR
• AR systems are always IIR
• ARMA systems are nearly always IIR
[Figure: typical IIR impulse response - for a stable system h[n]→0 as n→∞ but never actually gets there]
Example of an ARMA System which
is FIR
• Consider the system
  y[n] = x[n] - x[n-4] + y[n-1]
• This is an ARMA system – the difference equation contains
past values of x[n] and y[n].
• To compute its impulse response, put x[n]=δ[n], so
  y[0] = h[0] = δ[0] - δ[-4] + y[-1] = 1 - 0 + 0 = 1
  y[1] = h[1] = δ[1] - δ[-3] + y[0] = 0 - 0 + 1 = 1
  y[2] = h[2] = δ[2] - δ[-2] + y[1] = 0 - 0 + 1 = 1
  y[3] = h[3] = δ[3] - δ[-1] + y[2] = 0 - 0 + 1 = 1
  y[4] = h[4] = δ[4] - δ[0] + y[3] = 0 - 1 + 1 = 0
  y[5] = h[5] = δ[5] - δ[1] + y[4] = 0 - 0 + 0 = 0
  h[n] = {1, 1, 1, 1, 0, 0, 0, 0, ...}
A finite impulse response: after n=3, h[n]=0.
Discussion
• The previous example system can also be realised as an MA
system:
  y[n] = x[n] + x[n-1] + x[n-2] + x[n-3]
• So the same system can be obtained using two difference
equations.
• The MA representation above betrays the system’s behaviour:
the output is just the sum of the last 4 inputs (a rolling sum).
• The ARMA system is an alternative, more efficient, realisation
of this rolling sum:
– To create the sum of the last 4 inputs, take the output from the previous
sum and add the value of the most recent input and subtract the input 5
samples ago (x[n-4]).
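The equivalence of the two realisations is easy to confirm numerically; a minimal MATLAB sketch (the random test input is an assumption for illustration):
  x = randn(1, 50);                          % any test input
  yMA = filter([1 1 1 1], 1, x);             % MA form: sum of last 4 inputs
  yARMA = filter([1 0 0 0 -1], [1 -1], x);   % ARMA form: y[n] = x[n] - x[n-4] + y[n-1]
  max(abs(yMA - yARMA))                      % ≈ 0: identical outputs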
Input/Output Relationships for
Digital Systems
• Consider a digital LTI system receiving an input composed of
two Kronecker delta functions
  x[n] = aδ[n-n1] + bδ[n-n2]
where n1 and n2 are delays and a and b are scale factors.
• The output in response to this input is:
  y[n] = ah[n-n1] + bh[n-n2]
i.e. the sum of the scaled, and delayed, impulse responses.
Input/Output Relationships (Cont’d)
• The result from the previous slide can be extended.
• From the sifting property of delta functions: an arbitrary input signal can be expressed as the sum of scaled and shifted delta functions:
  x[n] = Σ_{k=-∞}^{∞} x[k] δ[n-k]
• Each term in the sum can be considered independently (because the system is linear).
• The response to the term x[k]δ[n-k] is x[k]h[n-k].
• So the output, y[n], for the input x[n] is given by
  y[n] = Σ_{k=-∞}^{∞} x[k] h[n-k]
Digital Convolution
• To compute the output of a digital system for any input, x[n], given that system's impulse response, h[n], one uses the convolution sum:
  y[n] = Σ_{k=-∞}^{∞} x[k] h[n-k] = Σ_{k=-∞}^{∞} h[k] x[n-k]
• Denoted as y[n] = x[n] * h[n] = h[n] * x[n]
• For a causal system one has:
  y[n] = Σ_{k=-∞}^{n} x[k] h[n-k] = Σ_{k=0}^{∞} h[k] x[n-k]
Note these relationships precisely mirror those for continuous systems.
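The convolution sum translates directly into a pair of loops; a minimal MATLAB sketch for causal, finite-length h and x (MATLAB's built-in conv does the same job):
  x = [1 -2 2 3 -1 0 -3];        % input (the earlier MA example)
  h = [0.5 -0.25 -0.25];         % impulse response
  N = length(x) + length(h) - 1;
  y = zeros(1, N);
  for n = 1:N                     % y[n] = Σ_k h[k] x[n-k]
      for k = 1:length(h)
          if n-k+1 >= 1 && n-k+1 <= length(x)
              y(n) = y(n) + h(k)*x(n-k+1);
          end
      end
  end
  max(abs(y - conv(h, x)))        % ≈ 0: matches conv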
MA Systems and Convolution
• We have seen that the impulse response of an MA system is equal to its coefficients, i.e. h[n] = bn.
• Consider the convolution sum
  y[n] = Σ_{k=-∞}^{∞} h[k] x[n-k]
• For an MA system this becomes:
  y[n] = Σ_{k=-∞}^{∞} bk x[n-k]
and, say, for a causal system of order L we can write
  y[n] = Σ_{k=0}^{L} bk x[n-k] = b0x[n] + b1x[n-1] + ... + bLx[n-L]
The convolution sum just reverts to the MA system equation!
Exponentials and Convolution
Summary
• Underlying physics gives the ODE for a continuous system:
  a_p d^p y/dt^p + ... + a_1 dy/dt + a_0 y = b_q d^q x/dt^q + ... + b_1 dx/dt + b_0 x
the digital equivalent is the difference equation.
• Laplace transforming the ODE gives the transfer function
  H(s) = (b_q s^q + ... + b_1 s + b_0) / (a_p s^p + ... + a_1 s + a_0)
the digital equivalent (obtained via the z-transform) is, so far, a "?".
• Setting s = 2πif (f = s/2πi) gives the FRF, which can also be measured:
  H(f) = (b_q (2πif)^q + ... + b_1 2πif + b_0) / (a_p (2πif)^p + ... + a_1 2πif + a_0)
the digital equivalent is also still a "?".
• The poles (P(s)=0) and zeros (Q(s)=0) tell us whether the system is stable; the impulse response tells us whether it is causal.
• Convolution gives the response to any input: y(t) = ∫_{-∞}^{∞} x(u) h(t-u) du for continuous systems, and the convolution sum (digital convolution) with the digital impulse response for digital systems.
(The Fourier transform and its digital equivalent are not discussed here.)
Z-Transforms
Chuang Shi
Credit to Paul White for creating the slides
Overview
• Definition of the z-transform
– Representation of the z-transform
• Properties
– Using z-transforms and difference equations
– Transfer functions
• Displaying the z-transform
• Relationship to the Fourier transform of a sequence
• Digital frequency response functions
The Z-Transform
• The z-transform is the digital equivalent of the Laplace
transform.
• It is defined as
  X(z) = Σ_{n=-∞}^{∞} x[n] z^{-n}
• The variable z (which is the equivalent to s in Laplace


transforms) is in general complex and, hence, the value of X(z)
is also, in general, complex.
• This is a bilateral (two-sided) definition, i.e. n extends from -∞
to ∞, single sided definitions are also used, when n extends
from 0 to ∞.
• For a system, single sided z-transforms assume causality.
Example: Unit Step Function
• Consider the sequence, sometimes called the unit step
function:
u  n = 1 n  0
=0 n0
 

U (z) = 
n =−
u  n z − n = 
n=0
z −n

• Using the sum of an infinite geometric progression* with a=1


and r=z-1, then


1
U (z) = z −n
= z −1  1 or z 1
n =0
1 − z −1

* see overheads at the end of the lecture notes


Example: Finite Step Function
• Consider a step function which returns to zero after L samples:
  uL[n] = 1, 0 ≤ n < L
        = 0, elsewhere
[Figure: uL[n] for L=6]
  UL(z) = Σ_{n=-∞}^{∞} uL[n] z^{-n} = Σ_{n=0}^{L-1} z^{-n}
• The sum of a finite geometric progression with a=1 and r=z^{-1} gives
  UL(z) = Σ_{n=0}^{L-1} z^{-n} = (1 - z^{-L})/(1 - z^{-1}),  z^{-1} ≠ 1
Example: Geometric Sequence
• Consider the signal
  x[n] = α^n, n ≥ 0
       = 0, n < 0
• Z-transforming we get (the sum of an infinite GP with a=1 and r=αz^{-1}):
  X(z) = Σ_{n=0}^{∞} α^n z^{-n} = Σ_{n=0}^{∞} (αz^{-1})^n = 1/(1 - αz^{-1}),  |αz^{-1}| < 1, or |z| > |α|
Note the step function is just a special case of this example with α=1.
Properties of the Z-Transform:
Linearity
• The z-transform is linear:
  Z{ax[n]} = aX(z) and Z{x[n] + y[n]} = X(z) + Y(z)
where Z{·} denotes the operation of taking the z-transform and X(z) and Y(z) are the z-transforms of x[n] and y[n] respectively.
• More generally we write
  Z{ax[n] + by[n]} = aX(z) + bY(z)
• This allows us to "break up" z-transforms of sums into smaller individual transforms.
Z-Transform Properties:
Shifting
• Consider a sequence shifted by 1 sample, y[n] = x[n-1].
• How is the z-transform of y[n] related to that of x[n]?
  Y(z) = Z{y[n]} = Σ_{n=-∞}^{∞} y[n] z^{-n} = Σ_{n=-∞}^{∞} x[n-1] z^{-n}
• If we rewrite the final summation in terms of m = n-1:
  Y(z) = Σ_{m=-∞}^{∞} x[m] z^{-(m+1)} = Σ_{m=-∞}^{∞} x[m] z^{-m} z^{-1} = z^{-1} Σ_{m=-∞}^{∞} x[m] z^{-m} = z^{-1} X(z)
Z-Transform Properties:
Shifting (cont’d)
• So the z-transform of a signal shifted by 1 sample is the original z-transform multiplied by z^{-1}.
• Repeatedly applying this we have the general result:
  Z{x[n-m]} = z^{-m} X(z)
• This is one of the key properties of the z-transform, which leads to the common pictorial representation of a delay in digital system diagrams:
[Figure: a unit delay drawn as a block labelled z^{-1}]
Example: Finite Step Function
(again)
• Note that the finite step function, uL[n], can be related to the infinite step function, u[n], as follows:
  uL[n] = u[n] - u[n-L]
[Figure: u[n] and -u[n-L]]
• Using the linearity and shifting properties of the z-transform:
  UL(z) = U(z) - z^{-L}U(z) = U(z)(1 - z^{-L})
• Previously we showed that U(z) = 1/(1 - z^{-1}), |z| > 1, so
  UL(z) = (1 - z^{-L})/(1 - z^{-1}),  |z| > 1
The same result as before, but only valid for a smaller range of z.
Z-Transform Properties:
Time Reversal
• Consider a sequence y[n] which is equal to a second sequence x[n] reversed in time, specifically y[n] = x[-n].
• We can relate their z-transforms as follows (replacing n by m, where m = -n):
  Y(z) = Σ_{n=-∞}^{∞} y[n] z^{-n} = Σ_{n=-∞}^{∞} x[-n] z^{-n} = Σ_{m=-∞}^{∞} x[m] z^{m} = Σ_{m=-∞}^{∞} x[m] (z^{-1})^{-m} = X(z^{-1})
Note that for summations Σ_{n=a}^{b} f(n) = Σ_{n=b}^{a} f(n), i.e. the order of the limits does not matter, which is in contrast to integration, where the order of limits does matter: ∫_a^b f(t) dt ≠ ∫_b^a f(t) dt.
Example: Reverse Step Function
• Consider a reversed step function u_[n] defined as:
  u_[n] = 1, n ≤ 0
        = 0, n > 0
• The z-transform of this function is:
  U_(z) = Σ_{n=-∞}^{∞} u_[n] z^{-n} = Σ_{n=-∞}^{0} z^{-n} = Σ_{m=0}^{∞} z^{m} = 1/(1 - z),  |z| < 1
(again m = -n and the limits have been swapped)
• Also note that u_[n] = u[-n], so that, from the preceding theorem,
  U_(z) = U(z^{-1})
• So that since
  U(z) = 1/(1 - z^{-1}), then U_(z) = 1/(1 - (z^{-1})^{-1}) = 1/(1 - z)  (as also shown above)
Example: Finite Step Function
(yet again)
• We can express the finite step function as the sum of two reversed step functions as follows:
  uL[n] = u_[n-L+1] - u_[n+1]
[Figure: -u_[n+1] and u_[n-L+1]; L=6 in this illustration]
• Z-transforming gives
  UL(z) = z^{-L+1}U_(z) - zU_(z) = (z^{-L+1} - z)/(1 - z)
Multiplying numerator and denominator by z^{-1}:
  UL(z) = (z^{-L} - 1)/(z^{-1} - 1) = (1 - z^{-L})/(1 - z^{-1})
This is valid for |z| < 1, since that is the condition on U_(z).
This result is valid on |z| < 1, whereas the result obtained using the step function was valid for |z| > 1; combined, this covers almost all values of z.
Displaying the Z-Transform
• The variable z is complex valued and the function X(z) is also
complex valued.
• The z-transforms we encounter have the form of a ratio of two
polynomials in z.
• Specifically: X(z) = Q(z)/P(z)
• Like the Laplace transform we only consider the points at
which
– X(z)=0, which occurs when Q(z)=0, these are called the zeros.
– X(z)=∞, which are the points for which P(z)=0, these are called the
poles.
Example: Unit Step Function
• The z-transform of the unit step function is:
  U(z) = 1/(1 - z^{-1}) = z/(z - 1),  |z| > 1
• The pole for this function is given by z - 1 = 0 => z = 1.
• The zero occurs when z = 0. However, poles and zeros at the origin (z=0) are not normally considered.
[Figure: pole-zero diagram for the unit step function - the Argand plane for z, with the pole (×) at z=1 and the zero (o) at z=0]
Example: Geometric Sequence
• The geometric sequence x[n] = α^n has the z-transform
  X(z) = 1/(1 - αz^{-1}) = z/(z - α),  |z| > |α|
[Figure: pole-zero diagrams for α=0.8 (pole inside the unit circle) and α=1.2 (pole outside)]
Z-Transforms and Difference
Equations
• Consider the general (ARMA) difference equation
  y[n] = b0x[n] + b1x[n-1] + ... + bpx[n-p] - a1y[n-1] - ... - aqy[n-q]
• Z-transforming this gives
  Z{y[n]} = Z{b0x[n] + b1x[n-1] + ... + bpx[n-p] - a1y[n-1] - ... - aqy[n-q]}
  ⇒ Z{y[n]} = b0Z{x[n]} + b1Z{x[n-1]} + .... + bpZ{x[n-p]} - a1Z{y[n-1]} - ... - aqZ{y[n-q]}
(this uses linearity and Z{x[n-m]} = z^{-m}X(z))
  ⇒ Y(z) = b0X(z) + b1z^{-1}X(z) + ... + bpz^{-p}X(z) - a1z^{-1}Y(z) - ... - aqz^{-q}Y(z)
  ⇒ Y(z)(1 + a1z^{-1} + ... + aqz^{-q}) = X(z)(b0 + b1z^{-1} + ... + bpz^{-p})
  ⇒ H(z) = Y(z)/X(z) = (b0 + b1z^{-1} + ..... + bpz^{-p})/(1 + a1z^{-1} + .... + aqz^{-q})
Digital Transfer Functions
• The digital transfer function (sometimes called the
characteristic function) is defined as the ratio of Y(z) and X(z).
• A causal (LTI) ARMA system has a transfer function of the form
  H(z) = Y(z)/X(z) = (b0 + b1z^{-1} + ..... + bpz^{-p})/(1 + a1z^{-1} + .... + aqz^{-q})
• It is the ratio of 2 polynomials in z.
• The roots of these polynomials control the behaviour of the system.
– Zeros: solutions of b0 + b1z^{-1} + ..... + bpz^{-p} = 0 ⇒ b0z^p + b1z^{p-1} + ..... + bp = 0
– Poles: solutions of 1 + a1z^{-1} + ..... + aqz^{-q} = 0 ⇒ z^q + a1z^{q-1} + ..... + aq = 0
Example
• Consider the difference equation
  y[n] = 0.5x[n] + 0.5x[n-1] - 0.25y[n-1]
• Z-transforming:
  Y(z) = 0.5X(z) + 0.5z^{-1}X(z) - 0.25z^{-1}Y(z)
  Y(z)(1 + 0.25z^{-1}) = 0.5(1 + z^{-1})X(z)
  ⇒ H(z) = 0.5 (1 + z^{-1})/(1 + 0.25z^{-1})
one zero at z = -1, one pole at z = -0.25
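The poles and zeros can be found numerically from the coefficient vectors; a minimal MATLAB sketch:
  b = [0.5 0.5];          % numerator coefficients of H(z)
  a = [1 0.25];           % denominator coefficients
  roots(b)                % zero at z = -1
  roots(a)                % pole at z = -0.25
(with the Signal Processing Toolbox, zplane(b,a) draws the pole-zero diagram)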
Z-Transform of Digital Convolution
• The z-transform of the convolution of two signals is easily expressed in terms of the signals' individual z-transforms.
• Recall
  x[n] * y[n] = Σ_{m=-∞}^{∞} x[m] y[n-m]
• Z-transforming, it can be shown (for proof see the "extra" overheads) that
  Z{x[n] * y[n]} = X(z)Y(z)
• This is analogous to Laplace transforms, where
  L{x(t) * y(t)} = X(s)Y(s)
Minor note: the above uses "*" for both digital and continuous time convolution; strictly they are different, but the context should always make it clear which operation is implied, and so no confusion should follow.
Input Output Relationships
• For a digital system the input/output relationship based on the impulse response is:
  y[n] = h[n] * x[n]
• Z-transforming gives:
  Y(z) = H(z)X(z)
where H(z) is the z-transform of the impulse response h[n].
• Manipulating the above, the transfer function, H(z), can be equated to the ratio Y(z)/X(z), hence
  H(z) = Y(z)/X(z) = Z{h[n]} = (b0 + b1z^{-1} + ..... + bpz^{-p})/(1 + a1z^{-1} + .... + aqz^{-q})
From previous slides we know this ratio is equal to H(z).
Combining Digital Systems
• The basic rules for combing digital systems mirror those for
continuous time systems:
• For two systems in series their transfer functions are
multiplied.
• For two systems in parallel their transfer functions are added.
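Both rules can be sketched with polynomial arithmetic on the coefficient vectors (the two first-order systems below are arbitrary examples): series connection multiplies the transfer functions; parallel connection adds them over a common denominator.

import numpy as np

B1, A1 = [1.0, 0.5], [1.0, -0.3]    # H1(z) = B1/A1
B2, A2 = [2.0], [1.0, 0.4]          # H2(z) = B2/A2

# Series: H = H1*H2
B_series, A_series = np.polymul(B1, B2), np.polymul(A1, A2)

# Parallel: H = H1 + H2 = (B1*A2 + B2*A1)/(A1*A2)
B_par = np.polyadd(np.polymul(B1, A2), np.polymul(B2, A1))
A_par = np.polymul(A1, A2)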
Stability
• Recall the Geometric Sequence example:
– For |α|>1 the sequence is unstable, i.e. x[n] increases without bound as
n→∞; the pole in the system lies outside the circle |z|=1.
– For |α|<1 the sequence is stable, i.e. x[n]→0 as n→∞; the pole in the
system lies inside the circle |z|=1.
• This extends to general sequences, specifically:
If there is a pole of X(z) outside the circle |z|=1* then the
sequence x[n] is unstable.
(The signal has been assumed to be causal or right-sided.)
• Note that stability does not depend on the locations of the zeros of
X(z); it is only the positions of the poles that affect stability.
* The circle |z|=1 is commonly called “the unit disc”.
Diagrammatic Representation of
Stable Region of the Z-Plane
• Poles inside the unit disc are stable, poles outside the unit disc
are unstable.
[Figure: the z-plane, with the stable region inside the circle |z|=1 and the unstable region outside it.]
Nature of
Poles
[Figure: example pole positions relative to the circle |z|=1 and the corresponding mode behaviours.]
General Properties
• The distance from the unit disc controls the rate of growth or
decay of the signal/system.
• Angle around the circle controls the rate of oscillation.
[Figure - Effect of Pole Radius: modes range from rapidly decaying, through slowly decaying, oscillations that neither decay nor grow (on |z|=1), slowly growing, to rapidly growing. Effect of Pole Angle: small angles give low-frequency oscillation, large angles give high-frequency oscillation.]
General Properties (Again)
• To formalize the previous slide (slightly).
• Consider a pole located at the complex location zp.
• This pole corresponds to a mode (in a system) or a component
(in a signal) which oscillates with the following
characteristics:
– The angle Arg{zp} = tan^-1(Im(zp)/Re(zp)) defines the rate of oscillation.
– The magnitude, |zp|, controls the rate of decay/growth:
• If |zp|<1 the pole is stable and the mode decays
• If |zp|>1 the pole is unstable and the mode grows
• The further |zp| is from 1, the more rapid the rate of decay or growth.
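This can be illustrated directly (a minimal sketch): a pole at z_p contributes a mode proportional to z_p^n, whose magnitude behaves like |z_p|^n while its real part oscillates at a rate set by Arg{z_p}.

import numpy as np

def mode(radius, angle, N=50):
    # Samples of the mode z_p^n for a pole at z_p = radius*exp(i*angle)
    zp = radius * np.exp(1j * angle)
    return np.real(zp ** np.arange(N))

stable = mode(0.9, np.pi / 8)     # |z_p| < 1: oscillation decays
unstable = mode(1.1, np.pi / 8)   # |z_p| > 1: oscillation grows
print(abs(stable[-1]) < 1 < abs(unstable[-1]))   # True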
Transfer Functions for MA Systems
• A general MA system has the difference equation:
y  n = b0 x  n + b1 x  n − 1 + b2 x n − 2 + .... + bq x n − q 

• Transfer function of which is:


H ( z ) = b0 + b1 z −1 + b2 z −2 + b3 z −3 + ... + bq z − q
= b0 z q + b1 z q −1 + b2 z q − 2 + b3 z q −3 + ... + bq

• Such a transfer function has q zeros. The only poles are at z=0
(such poles are not significant).
• So that MA systems have transfer functions that contain only
zeros.
Transfer Functions for AR Systems
• A general AR system has the difference equation:
y  n = b0 x  n − a1 y  n − 1 − a2 y  n − 2 − .... − a p y n − p 

• The transfer function of which is:


b0 b0 z p
H ( z) = −1 −2 −p
= p
1 + a1 z + a2 z + .... + a p z z + a1 z p −1 + a2 z p −2 + .... + a p
• Such systems have p poles. There are p zeros at the origin
(z=0).
• AR systems have transfer functions which only consist of
poles, so are sometimes called all-pole systems.
Transfer Functions for ARMA
Systems
• The general form for an ARMA system is:
y  n  = b0 x  n  + b1 x  n − 1 + ... + bq x  n − q  − a1 y  n − 1
− a2 y  n − 2 − .... − a p y  n − p 

• Which has the transfer function:


b0 z q + b1 z q −1 + b2 z q − 2 + .... + bq
H ( z ) = z q− p
z p + a1 z p −1 + a2 z p − 2 + .... + a p

• Thus an ARMA{p,q} system has a transfer function with p


poles and q zeros.
Alternative Representations
• If rk are the roots of the denominator, i.e. the poles, then
z^p + a1 z^(p−1) + a2 z^(p−2) + ... + ap = (z − r1)(z − r2)(z − r3) ... (z − rp)
                                         = Π_{k=1}^{p} (z − rk)

• If zk are the roots of the numerator, i.e. the zeros, then

b0 z^q + b1 z^(q−1) + b2 z^(q−2) + ... + bq = b0 (z − z1)(z − z2)(z − z3) ... (z − zq)
                                            = b0 Π_{k=1}^{q} (z − zk)
Alternative Representations
(Cont’d)
• Combining this one can write:
H(z) = b0 (z − z1)(z − z2) ... (z − zq) / ((z − r1)(z − r2) ... (z − rp)) = b0 Π_{k=1}^{max(p,q)} (z − zk)/(z − rk)

where it is assumed that zk = 0 for k > q and rk = 0 for k > p.
• Hence knowing just the poles and zeros one can compute the
transfer function, with the exception of the value of b0.
• Based on this, any ARMA system can be expressed as a
sequence of ARMA{1,1} systems in series.
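SciPy's zpk2tf implements exactly this reconstruction (a minimal sketch): given the zeros, the poles and the gain b0, it multiplies out the factored form to recover the polynomial coefficients.

from scipy.signal import zpk2tf

zeros, poles, b0 = [-1.0], [-0.25], 0.5   # the earlier example system
b, a = zpk2tf(zeros, poles, b0)
print(b)   # [0.5, 0.5]
print(a)   # [1., 0.25]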
Z-Transform of Real Signals
• If a signal is real-valued, then the poles and zeros correspond
to the roots of polynomials with real coefficients.
• Such roots are either:
– Real valued
– Complex valued and occur in conjugate pairs
• So if zp is, say, a complex pole then so is zp*
• This results in the poles and zeros of a z-transform occurring
symmetrically about the real axis in the z-plane.

[Figure: example pole-zero plot - note the symmetry of the pole and zero locations about the real axis.]
Example: Repeated Roots
• Some systems have more than one pole or zero at a given
location, these are called repeated roots.
• For example, consider the sequence
x  n = na n n0 Repeated root
indicated by “2”
=0 n0
• This has the z-transform (see extra o/hs)
az
X (z) = | z || a |
( z − a)
2
Example: Pole-Zero Cancellation
• A pole and zero at the same location will cancel each other.
• Consider
y  n = x  n − 3x  n − 1 + 2 x  n − 2 + 1.5 y n − 1 − 0.5 y n − 2
1 − 3z −1 + 2 z −2 z 2 − 3z + 2 ( z − 1)( z − 2 ) z − 2
H ( z) = = 2 = =
−1
1 − 1.5 z + 0.5 z −2
z − 1.5 z + 0.5 ( z − 1)( z − 0.5) z − 0.5

Pole and zero


cancel to leave

• The new transfer function refers to the simpler, but equivalent


difference equation: y  n = x  n − 2 x  n − 1 + 0.5 y  n − 1
Note that cancelling a pole-zero pair in this way reduces both orders of the ARMA system by 1,
so that an ARMA{p,q} system becomes an ARMA{p-1,q-1} system.
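The equivalence is easy to confirm numerically (a small sketch): the two difference equations produce identical impulse responses.

import numpy as np
from scipy.signal import lfilter

impulse = np.zeros(20)
impulse[0] = 1.0

h1 = lfilter([1, -3, 2], [1, -1.5, 0.5], impulse)   # original ARMA{2,2}
h2 = lfilter([1, -2], [1, -0.5], impulse)           # reduced ARMA{1,1}
print(np.allclose(h1, h2))   # True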
Example: Rolling Sum (Again)
• Recall we saw that the ARMA system y[n] = x[n] − x[n−4] + y[n−1]
is equivalent to the MA system
y[n] = x[n] + x[n−1] + x[n−2] + x[n−3]
• Considering the ARMA system’s transfer function
H(z) = (1 − z^-4) / (1 − z^-1) = z^-3 (z^4 − 1)/(z − 1)

• There are 4 roots of z^4 − 1 = 0, specifically z = 1, i, −1, −i, so that
z^4 − 1 = (z − 1)(z − i)(z + i)(z + 1)
• Hence

H(z) = z^-3 (z − 1)(z − i)(z + i)(z + 1) / (z − 1)
     = z^-3 (z − i)(z + i)(z + 1) = 1 + z^-1 + z^-2 + z^-3

which is the transfer function of the MA system.
Example: Rolling Sum (Cont’d)
• The ARMA system’s transfer
function is:

• The pole and zero at z=1 cancel.


This the reduces both orders of the
ARMA system by 1. So the system
goes from an ARMA{4,1} system
to a ARMA{3,0}, i.e. an MA system
of order 3.

• The resulting MA system’s transfer


function is:
Relationship to the Fourier
Transform
• The Fourier transform of a sequence is defined as
X_s(f) = Σ_{n=−∞}^{∞} x[n] e^{−2πifnΔ} = Σ_{n=−∞}^{∞} x[n] e^{−2πin(f/fs)}

where Δ (=1/fs) is the sampling interval and fs is the sampling
rate - see Signal Processing notes.
• The FT of a sequence is equal to the z-transform X(z)
evaluated at z = e^{2πifΔ}, or formally

X_s(f) = X(z) |_{z = e^{2πifΔ}}

• Hence the FT is the z-transform evaluated on the circle |z|=1,
which is the unit disc.
Digital Frequency Response
Functions (FRFs)
• The FRF of a digital system can be evaluated from the transfer
function, H(z).
• Specifically, the FRF, H(f), is obtained by evaluating the
transfer function on the unit disc.
• So that

H(f) = H(z) |_{z = e^{2πifΔ}}

• Further, the FRF is the FT of the impulse response:

H(f) = F{h[n]} = Σ_{n=−∞}^{∞} h[n] e^{−2πifnΔ}     (defn of the Fourier transform of a sequence)

Note that the notation for the FRF is ambiguous: no distinction is made between the digital and
continuous FRFs, both are referred to as H(f). The context should make it clear which is appropriate.
[Figure: pole-zero plots with the unit disc marked and the corresponding FRF magnitudes against frequency; note that frequency f on this slide has been normalized by the sample rate fs.]
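In SciPy this evaluation of H(z) around the unit circle is provided by freqz (a minimal sketch, reusing the earlier example system; the sample rate is illustrative):

from scipy.signal import freqz

fs = 1000.0   # sample rate in Hz (assumed for illustration)
f, H = freqz([0.5, 0.5], [1.0, 0.25], worN=512, fs=fs)

# H[k] is H(z) at z = exp(2j*pi*f[k]/fs), i.e. on the unit circle
print(f[0], abs(H[0]))   # DC gain: (0.5+0.5)/(1+0.25) = 0.8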
Periodicity in the FT of a Sequence
• The FT of a sequence is inherently periodic.
X_s(f) = X_s(f + fs) = X_s(fs − f)*

• In terms of the z-transform, increasing the frequency by fs
corresponds to one full circuit around the unit disc in the z-plane.
Comparison with Laplace
Transform
• If a continuous signal has a pole in the right half plane (i.e.
poles for which Re{s}>0) then the signal is unstable.
• If a sequence has a pole outside the unit disc it is unstable.
[Figure: stability regions compared. Continuous signals (s-plane): stable region Re{s} < 0, unstable region Re{s} > 0, frequency axis s = iω. Digital signals (z-plane): stable region inside |z| = 1, unstable region outside, frequency axis on |z| = 1.]
Frequency Axes
• In each case the boundary between the stable and unstable
regions corresponds to the frequency axes.
• This is because sine/cosine waves are signals that are neither
stable nor unstable – they do not grow or decay.
• Consider the Laplace transform of a cosine wave
L cos ( 2pf 0t ) = 2
s s
=
s + ( 2pf 0 )
2
( s − 2pif0 )( s + 2pif 0 )
with two poles s=±2pif0 on the frequency axis

• Now consider the sampled (digital) cosine wave.


Z-Transform of a Cosine Sequence
• Consider the z-transform of a sampled cosine wave
x  n  = cos ( 2pf 0 nD ) =
2
(
1 2 pif nD −2 pif nD
e +e
0 0
n0 ) Recall D is the sampling
interval and D=1/fs, where
fs is the sample rate.
1 
  

X (z) = 
n =−
x n z = 
−n
 
2  n =0
e 2 pif nD − n
z +0

n=0
e 
−2 pif nD − n
z 0



1 n
 

 ( )  ( )
2 pif D −1
n
−2 pif D −1 1 1
=  e z + e z  = +
( ) ( )
0 0


2  n =0 − 2 pif D −1
− −2 pif D −1
 2 1 e z 2 1 e z
0 0
n =0

1 − cos ( 2pf 0 D ) z −1 z 2 − z cos ( 2pf 0 D )


= =
1 − 2 z cos ( 2pf 0 D ) + z
−1 −2
z 2 − 2 z cos ( 2pf 0 D ) + 1
• This z-transform has two poles on the unit disc at z=e±2pif0D, i.e.
the two poles of the cosine’s z-transform lie on the unit disc.
Example: Rolling Sum (Once More)
• The transfer function is:
H(z) = 1 + z^-1 + z^-2 + z^-3
H(Ω) = 1 + e^{−iΩ} + e^{−2iΩ} + e^{−3iΩ}

where Ω = 2πfΔ is the normalised angular frequency.
• Compute the squared magnitude of the FRF:

|H(Ω)|^2 = H(Ω) H(Ω)* = (1 + e^{−iΩ} + e^{−2iΩ} + e^{−3iΩ})(1 + e^{iΩ} + e^{2iΩ} + e^{3iΩ})
         = [e^{−i3Ω/2}(e^{i3Ω/2} + e^{iΩ/2} + e^{−iΩ/2} + e^{−i3Ω/2})] [e^{i3Ω/2}(e^{−i3Ω/2} + e^{−iΩ/2} + e^{iΩ/2} + e^{i3Ω/2})]
         = (2cos(3Ω/2) + 2cos(Ω/2))^2 = 4 (cos(3Ω/2) + cos(Ω/2))^2

Low frequencies are subject to a gain of 4; higher
frequencies are attenuated. Ω = π corresponds to the
Nyquist frequency, above which the FRF is a
copy of that below Nyquist. This system is a form
of low-pass filter.
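A quick numerical check of this result (a minimal sketch): evaluate the FRF on the unit circle and compare it with the closed-form expression.

import numpy as np
from scipy.signal import freqz

w, H = freqz([1, 1, 1, 1], [1], worN=256)   # w is Omega in rad/sample

closed_form = 4 * (np.cos(3 * w / 2) + np.cos(w / 2)) ** 2
print(np.allclose(np.abs(H) ** 2, closed_form))   # True
print(abs(H[0]))                                  # 4.0: the low-frequency gain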
Geometric Progressions
• A geometric progression (GP) is a sequence in which the next
number is obtained by multiplying the preceding number by a
constant.
• A sequence x[n] is a finite geometric progression of N
terms if it can be written as:

x[n] = {a, ar, ar^2, ar^3, ar^4, ..., ar^(N−1)}

• A sequence x[n] is an infinite geometric progression if it can
be written as:

x[n] = {a, ar, ar^2, ar^3, ar^4, ...}
• In both cases:
– a is the first term in the GP
– r is the geometric ratio (the ratio of the nth to the (n-1)th term)
Sum of a Finite GP
• Commonly we will sum GPs.
• For a finite GP we need to compute:
S_L = Σ_{n=0}^{L−1} a r^n = a + ar + ar^2 + ar^3 + ... + ar^(L−1)

• Multiplying both sides by (1 − r):

(1 − r) S_L = (1 − r)(a + ar + ar^2 + ar^3 + ... + ar^(L−1))
            = (a + ar + ar^2 + ... + ar^(L−1)) − r(a + ar + ... + ar^(L−2) + ar^(L−1))
            = a − ar^L

⇒ S_L = a(1 − r^L)/(1 − r),   r ≠ 1
Sum of an Infinite GP
• To obtain the sum of an infinite length GP, take the previous
result and let L→∞.
lim_{L→∞} S_L = lim_{L→∞} a(1 − r^L)/(1 − r) = a/(1 − r)   if |r| < 1

• The sum only exists when |r| < 1.
• For |r| > 1 each term in the GP is bigger than the preceding
one, so as you add more and more terms together the sum
continues to grow and never approaches a finite limit:

lim_{L→∞} S_L = lim_{L→∞} a(1 − r^L)/(1 − r) = ∞   if |r| > 1
Summary of Sums of GPs
• For a finite GP, i.e. a sum which has L terms and L<∞.

S_L = a(1 − r^L)/(1 − r),   r ≠ 1

(You can easily deal with the special case of r = 1 separately. Try it!)

• This has no conditions on r (except that r cannot be 1).
• For an infinite GP, i.e. a sum of an infinite number of terms,
S_∞ = a/(1 − r),   |r| < 1

which is only true if r has a magnitude less than 1.
• These relationships hold for complex valued sequences as
well as real valued sequences.
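Both formulas are simple to sanity-check numerically (a small sketch with an arbitrary complex ratio, since the results hold for complex sequences too):

import numpy as np

a, r, L = 2.0, 0.5 + 0.3j, 40   # |r| < 1, values chosen arbitrarily

S_L = (a * r ** np.arange(L)).sum()
print(np.isclose(S_L, a * (1 - r**L) / (1 - r)))   # finite GP sum: True
print(np.isclose(S_L, a / (1 - r), atol=1e-6))     # near the infinite-GP limit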
A Related Sum
• Consider the sequence
x[n] = {ar, 2ar^2, 3ar^3, 4ar^4, ...} = n a r^n

• The sum of this infinite sequence is:
S = 0 + ar + 2ar^2 + 3ar^3 + ...

• Consider the sum of an infinite GP:
a + ar + ar^2 + ar^3 + ar^4 + ... = a/(1 − r),   |r| < 1

• Differentiating with respect to r:
0 + a + 2ar + 3ar^2 + 4ar^3 + ... = a/(1 − r)^2,   |r| < 1

• Multiplying through by r:
ar + 2ar^2 + 3ar^3 + 4ar^4 + ... = ar/(1 − r)^2 = S,   |r| < 1
Proof for Z-Transform of
Convolution
Z{x[n] * y[n]} = Z{ Σ_{k=−∞}^{∞} x[k] y[n − k] } = Σ_{n=−∞}^{∞} ( Σ_{k=−∞}^{∞} x[k] y[n − k] ) z^-n

= Σ_{n=−∞}^{∞} Σ_{k=−∞}^{∞} x[k] y[n − k] z^-n

Swapping the order of the summations:

= Σ_{k=−∞}^{∞} x[k] Σ_{n=−∞}^{∞} y[n − k] z^-n

Using the definition of the z-transform (and the shift property):

= Σ_{k=−∞}^{∞} x[k] Z{y[n − k]} = Σ_{k=−∞}^{∞} x[k] Y(z) z^-k

= Y(z) Σ_{k=−∞}^{∞} x[k] z^-k = X(z) Y(z)
Z-Transform of a Sine Wave
• Express the sampled sine wave as a sum of complex exponentials:

sin(2πf0 nΔ) = sin(Ω0 n) = (e^{iΩ0 n} − e^{−iΩ0 n}) / (2i),   n ≥ 0
             = 0,   n < 0

where Ω0 = 2πf0 Δ is the normalised angular frequency (introduced for compactness).

• Z-transforming:

Z{sin(Ω0 n)} = Σ_{n=0}^{∞} sin(Ω0 n) z^-n = (1/2i) Σ_{n=0}^{∞} (e^{iΩ0 n} − e^{−iΩ0 n}) z^-n

= (1/2i) [ Σ_{n=0}^{∞} (z^-1 e^{iΩ0})^n − Σ_{n=0}^{∞} (z^-1 e^{−iΩ0})^n ]

= (1/2i) [ 1/(1 − z^-1 e^{iΩ0}) − 1/(1 − z^-1 e^{−iΩ0}) ]

= z^-1 sin(Ω0) / (1 − 2cos(Ω0) z^-1 + z^-2) = z sin(Ω0) / (z^2 − 2z cos(Ω0) + 1)
Example of Repeat Roots
• Z-transform of x[n] = n a^n, n ≥ 0 (x[n] = 0, n < 0):

X(z) = Σ_{n=1}^{∞} n a^n z^-n = az^-1 + 2a^2 z^-2 + 3a^3 z^-3 + 4a^4 z^-4 + ...

     = (az^-1 + a^2 z^-2 + a^3 z^-3 + a^4 z^-4 + ...) + (a^2 z^-2 + 2a^3 z^-3 + 3a^4 z^-4 + ...)

The first bracket is the infinite sum of a GP, az^-1/(1 − az^-1); the second bracket is
az^-1 (az^-1 + 2a^2 z^-2 + 3a^3 z^-3 + ...) = az^-1 X(z). Hence

X(z) = az^-1/(1 − az^-1) + az^-1 X(z),   |z| > |a|

(The condition is necessary because we have used the sum of an infinite GP.)

⇒ X(z) = az^-1/(1 − az^-1)^2 = az/(z − a)^2,   |z| > |a|

This can also be shown using the result on the slide entitled "A Related Sum".
Z-Transforms (Again)
Chuang Shi
Credit to Paul White for creating the slides
Contents
• Regions of convergence for causal, anti-causal systems and
doubly infinite systems.
– Left-sided, right-sided and two-sided sequences.
• Stability (again).
• Inferring general sequence properties from the ROC.
• Inverse z-transforms
– Series
– Partial fractions etc
– Contour integration (not covered here)
• IZT, taking into account the ROC.
Region of Convergence (ROC)
• Each z-transform is valid for a range of values of z.
• This range of values is called the Region of Convergence
(ROC).
• For the z-transforms we encounter, the ROC is always an
annular (doughnut-shaped) region of the general form

Rmin < |z| < Rmax

defined by the two radii Rmin and Rmax.

Note it is possible that Rmin = 0 and/or that Rmax = ∞.
Example: Causal Geometric
Sequence
• Consider the sequence:
x  n = a n n0
=0 n0
• This has z-transform: Defines the ROC, i.e. the
1
X (z) =
values of z over which the
z a
1 − az −1 algebraic expression for X(z)
is valid.

• In this case Rmin=|a| and Rmax=∞.


• The ROC is the exterior of the circle
|z|=|a|.
Example: Anti-causal Geometric
Sequence
• Consider the anti-causal geometric sequence:
x  n = 0 n0
= an n0
• This has a z-transform given by:
0 

 
1
X ( z) = an z −n = a−m z m = a −1 z  1 or z  a
n =− m=0
1 − a −1 z
where m=-n

• In this case the ROC is defined by Rmin=0


and Rmax=|a| and represents the interior of
the circle |z|=|a|.
Example: Acausal Geometric
Sequence
• Consider the acausal geometric sequence:
x  n = a n
n

• Z-transforming
  −1  

X (z) = a  a z + a  ( az ) +  ( az )
n
−n n −n −n −n −1 m
= =
n
z z
n =− n=0 n =− n=0 m =1

1 az 1 − a2 −1
= + = a z a
1 − az −1
1 − az 1 + a 2 − a z + z −1 ( ) Assuming |a|<1
Requires az −1  1 Requires az  1

• The ROC is defined by Rmin=|a| and


Rmax=|a|-1.
Left-Side and Right-Sided Sequences
• A sequence is right-sided if it is zero for all values of n<M for
some M and has non-zero values as n→∞.
– Causal sequences are examples of right-sided sequences.
– Right-sided sequences can be thought of as generalisations of causal
signals.
• A sequence is left-sided if it is zero for all values of n>M for
some M and has non-zero values as n→ -∞.
– Anti-causal sequences are examples of left-sided sequences.
– Left-sided sequences can be thought of as generalisations of anti-causal
signals.
• Two-sided sequences extend to infinity in both directions (n
increasing and decreasing).
Typical Sequences
[Figure: example sequences - causal (right-sided); acausal but right-sided; left-sided (but not anti-causal); two-sided acausal.]

• Note that right- or left-sided sequences can be converted to
causal or anti-causal sequences, respectively, by applying
appropriate delays.
Form of the ROC
• For a right-sided sequence, e.g. a causal signal, its z-transform
has a ROC which is the exterior of a circle. That is, Rmin is
non-zero and Rmax=∞.
• For a left-sided sequence, e.g. an anti-causal signal, its z-
transform has a ROC that is the interior of a circle. That is,
Rmin is zero and Rmax is finite.
• For a two-sided signal, its z-transform has a ROC which is
annular. That is, both Rmin and Rmax are finite and non-zero.
• For a finite length sequence, the ROC is the entire z-plane, that
is Rmin is zero and Rmax=∞.
Stability (Revisited)
• Our previous discussion of stability centred on causal
sequences (or more strictly on right-sided sequences).
• The more general rule for stability is: “A signal is stable if the
ROC includes the unit disc”.
– For this to be the case both Rmin<1 and Rmax>1.
• “Stability” here is taken to mean that the signal remains
bounded for all n (normally as n→±∞).
Example
• Consider the left-sided geometric sequence (note it is not anti-
causal):
x  n  = 2n n  10
= 0 n  10
• This sequence is stable, since x[n]<1000 for all n.
• Its z-transform is
 
512 z −9
9

X (z) =   (2 z)  (2 z )
m m
−n −1 −9 −1
n
2 z = =2 z
9
= z 2
n =− m =−9 m=0
1− z / 2

ROC
Example: Acausal Geometric
Sequence (Again)
• Recall the example
x  n = a n
n

• This is stable as long as |a|<1.


1 − a2
X ( z) =
−1
• The z-transform is a z a
(
1 + a 2 − a z + z −1 )
• If |a|<1 then |a|-1>1 and the ROC
includes the unit disc, as it must
if the signal is stable.
Example: Anti-causal Geometric
Sequence
• Consider the sequence:
x  n  = −a n n0
=0 n0
• Z-transforming: m=-n
−1 
a −1 z −1
 
1
X (z) = − n −n
a z =− −m m
a z = = = a −1 z  1  z a
n =− m =1
1 − a −1 z az −1 − 1 1 − az −1

• Recall the z-transform of the geometric sequence y[n]=an has


exactly the same functional form, but a different ROC, i.e.
1
Y (z) = −1
z a
1 − az
• Note that when stating the z-transform one MUST specify
the ROC.
More Properties of a ROC
• The ROC cannot contain a pole.
• The boundaries of the ROC always have a pole on them.
• This means that for a given pole distribution there is a finite
number of plausible ROCs.
[Figure: an example pole distribution and the plausible ROCs.]
Properties of sequences in each of the ROCs:
– ROC1: Causal sequence
– ROC2: Stable (acausal) sequence
– ROC3: Acausal sequence
– ROC4: Anti-causal sequence
Inverse Z-Transform
• We seek to estimate the sequence x[n] from knowledge of its
z-transform, X(z), including the ROC.
• There are two approaches:
– Series expansion
– Contour integration (not to be covered here)
• The series approach is more widely applicable (in fact the
contour integration method is universally applicable too, but
an extra layer of complexity is introduced to allow that).
• In general, computing the inverse z-transform requires more
“art” than computing the forward transform.
Series Approach to the Inverse Z-
Transform (IZT)
• The z-transform is defined as:
X(z) = Σ_{n=−∞}^{∞} x[n] z^-n = ... + x[−2] z^2 + x[−1] z + x[0] + x[1] z^-1 + x[2] z^-2 + ...

• Hence if one can express X(z) in the form of a power series in
z, then the coefficients of that series are equal to the signal.
So if we can write:

X(z) = ... + c_{−2} z^2 + c_{−1} z + c_0 + c_1 z^-1 + c_2 z^-2 + ...

• Then one can equate coefficients for powers of z to obtain
x[n], i.e. x[n] = c_n.
Example: Geometric Sequence
• Consider the sequence x[n] which has the z-transform X(z):

x[n] = a^n   n ≥ 0          X(z) = 1/(1 − az^-1) = (1 − az^-1)^-1,   |z| > |a|
     = 0     n < 0

• The following series expansion should be familiar:

(1 − x)^-1 = 1 + x + x^2 + x^3 + ...   if |x| < 1

• So that we can write

X(z) = (1 − az^-1)^-1 = 1 + az^-1 + a^2 z^-2 + a^3 z^-3 + ...   since |az^-1| < 1

• Comparing this with

X(z) = Σ_{n=−∞}^{∞} x[n] z^-n = ... + x[−2] z^2 + x[−1] z + x[0] + x[1] z^-1 + x[2] z^-2 + ...

we see that x[n] = a^n for n ≥ 0 and x[n] = 0 for n < 0. (The terms for
n < 0 must be zero, since there are no positive powers of z in the series
expansion of X(z).)
Example: Geometric Sequence
(Anti-causal case)
• Now consider the same example with an alternative ROC:

X(z) = (1 − az^-1)^-1,   |z| < |a|

• One cannot directly apply the expansion of (1 − x)^-1, which
requires that |x| < 1, since |z| < |a| implies 1 < |az^-1|.
• However, to expand (1 − x)^-1 for the case |x| > 1 we can use the
following "trick":

1/(1 − x) = −x^-1 · 1/(1 − x^-1) = −x^-1 (1 − x^-1)^-1
          = −x^-1 (1 + x^-1 + x^-2 + ...) = −x^-1 − x^-2 − x^-3 − ...,   |x^-1| < 1 ⇔ |x| > 1
Example: Geometric Sequence
(Anti-causal case) (Cont’d)
• So for |z| < |a|:

X(z) = (1 − az^-1)^-1 = −a^-1 z − a^-2 z^2 − a^-3 z^3 − a^-4 z^4 − ...

• Comparing to

X(z) = Σ_{n=−∞}^{∞} x[n] z^-n = ... + x[−3] z^3 + x[−2] z^2 + x[−1] z + x[0] + x[1] z^-1 + ...

• So that

x[−1] = −a^-1;  x[−2] = −a^-2;  x[−3] = −a^-3; ...

x[n] = −a^n   n < 0
     = 0      n ≥ 0

(x[n] = 0 for n ≥ 0 since there are no powers z^-k, k ≥ 0, in the series expansion.)
Inverse Transform of More
Complicated Cases
• In general to invert more complicated functions X(z) one can
express X(z) in terms of simpler (first order) functions and
invert each of those.
• This simplification is usually achieved via partial fractions.
X(z) = Q(z)/P(z) = Q(z) / ((z − r1)(z − r2) ... (z − rp))

     = c1/(z − r1) + c2/(z − r2) + ... + cp/(z − rp) + d(z)

where the rk are the roots of P(z), i.e. the poles of X(z), and d(z) is a
polynomial that will occur if the order of Q(z) is greater than or equal to
that of P(z), i.e. q ≥ p. Each term can then easily be inverted individually.
Example: Part 1
• Consider

X(z) = (1 + 5z^-1) / (1 − (5/3)z^-1 − (2/3)z^-2)

(The ROC is unimportant for the following, so it is not stated.)

• This can be expressed as:

X(z) = (1 + 5z^-1) / ((1 − 2z^-1)(1 + z^-1/3))

• Using partial fractions one can write this as:

X(z) = 3/(1 − 2z^-1) − 2/(1 + z^-1/3)

• This is now easy to invert: it can be done term by term, taking
into account the ROC when it is supplied.
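SciPy's residuez performs this partial-fraction expansion of a ratio of polynomials in z^-1 directly (a minimal sketch; note the ordering of the returned poles may differ):

from scipy.signal import residuez

b = [1, 5]              # 1 + 5z^-1
a = [1, -5/3, -2/3]     # 1 - (5/3)z^-1 - (2/3)z^-2

r, p, k = residuez(b, a)
print(r)   # residues, approx [3., -2.]
print(p)   # poles, approx [2., -1/3]
print(k)   # direct polynomial term; empty here since q < p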
Example: Part 2
• The function X(z) = (1 + 5z^-1) / (1 − (5/3)z^-1 − (2/3)z^-2)
has 2 poles, at z = 2 and z = −1/3.
• There are 3 possible ROCs which could apply to this X(z).
a) |z|<1/3
b) 1/3<|z|<2
c) |z|>2
• The form of the ROCs defines the form of the sequence that
they correspond to.
a) |z|<1/3 corresponds to a left-sided sequence, since it is the interior of a circle,
and will be unstable since the ROC does not include the unit disc.
b) 1/3<|z|<2 corresponds to a stable, two-sided sequence, since this region
includes the unit disc and is annular.
c) |z|>2 corresponds to a right-sided sequence, since it is the exterior of a circle,
and will be unstable because the ROC does not include the unit disc.
Example: Part 3
• Find the right-sided sequence with the z-transform
X(z) = (1 + 5z^-1) / (1 − (5/3)z^-1 − (2/3)z^-2) = 3/(1 − 2z^-1) − 2/(1 + z^-1/3)

• For a right-sided sequence the ROC is the exterior of a circle;
of the possible ROCs, only c) (|z|>2) is of this form.

X(z) = 3(1 − 2z^-1)^-1 − 2(1 + z^-1/3)^-1
     = 3(1 + 2z^-1 + 2^2 z^-2 + 2^3 z^-3 + ...) − 2(1 − z^-1/3 + z^-2/3^2 − z^-3/3^3 + ...)
     = 1 + z^-1 (3·2 + 2/3) + z^-2 (3·2^2 − 2/3^2) + z^-3 (3·2^3 + 2/3^3) + ...

x[n] = 3(2)^n − 2(−1/3)^n   n ≥ 0
     = 0                    n < 0

Notice this is unstable: 2^n → ∞ as n → ∞.
Example: Part 4
• Find the stable sequence with the z-transform

X(z) = (1 + 5z^-1) / (1 − (5/3)z^-1 − (2/3)z^-2) = 3/(1 − 2z^-1) − 2/(1 + z^-1/3)

• The ROC which corresponds to a stable sequence is 1/3 < |z| < 2.
• The first term cannot be expanded in powers of z^-1, since inside the ROC
|2z^-1| > 1; instead write 3(1 − 2z^-1)^-1 = −(3/2)z(1 − z/2)^-1, which can be
expanded since inside the ROC |z/2| < 1. The second term can be expanded
as before, since |z^-1/3| < 1.

X(z) = −(3/2) z (1 + z/2 + z^2/2^2 + z^3/2^3 + ...) − 2(1 − z^-1/3 + z^-2/3^2 − z^-3/3^3 + ...)
     = −3 (z/2 + z^2/2^2 + z^3/2^3 + z^4/2^4 + ...) − 2(1 − z^-1/3 + z^-2/3^2 − z^-3/3^3 + ...)

x[n] = −3 · 2^n       n < 0
     = −2 (−1/3)^n    n ≥ 0

Stable: note that 2^n → 0 as n → −∞ and (1/3)^n → 0 as n → ∞.
Example: Part 5
• Find the left-sided sequence with the z-transform

X(z) = (1 + 5z^-1) / (1 − (5/3)z^-1 − (2/3)z^-2) = 3/(1 − 2z^-1) − 2/(1 + z^-1/3)

• For a left-sided sequence the ROC is the interior of a circle; in
this case this has to be |z| < 1/3.
• Neither term can be expanded in powers of z^-1 inside this ROC, so both are
rewritten in powers of z: 3(1 − 2z^-1)^-1 = −(3/2)z(1 − z/2)^-1 and
−2(1 + z^-1/3)^-1 = −6z(1 + 3z)^-1.

X(z) = −(3/2) z (1 + z/2 + z^2/2^2 + z^3/2^3 + ...) − 6z (1 − 3z + 3^2 z^2 − 3^3 z^3 + ...)
     = −3 (z/2 + z^2/2^2 + z^3/2^3 + ...) − 2 (3z − 3^2 z^2 + 3^3 z^3 − 3^4 z^4 + ...)

x[n] = −3 · 2^n + 2 (−1/3)^n   n < 0
     = 0                       n ≥ 0

Left-sided (in this case anti-causal) and unstable, since 2^n → 0 as n → −∞ BUT
(−1/3)^n → ∞ as n → −∞.
Computing the Impulse Response of
a Difference Equation
• One can compute the impulse response of a difference
equation using the IZT.
• Specifically, one can compute the transfer function, H(z), from
the difference equation.
• The impulse response, h[n], is then obtained by inverse z-
transforming, H(z).
• We normally assume that the system is causal, so that the ROC
used is the exterior of the circle containing the outer-most
pole.
Example
• Consider the ARMA difference equation (which we have seen
several times before).
y  n  = 0.5 x  n  + 0.5 x  n − 1 − 0.25 y  n − 1
(
0.5 1 + z −1 ) = 0.5 1 + z 1 + 0.25z
( )( )
−1
H (z) = −1 −1

(1 + 0.25z −1
) Assuming causality, the ROC is |z|>1/4
 z −1 z −2 z −3 z −4 
( −1
= 0.5 1 + z 1 − ) + 2 − 3 + 4 − ... 
 4 4 4 4 
1 z −1  1  z −2   −1   −1  
2

= + 1 −  +    +    + .....
2 2  4  2  4   4  
h  n  = 0 n  0, h  0 = 1/ 2, Compare with impulse response calculated
in “Basics of Digital Systems” notes
1   −1   −1   −3  −1 
n n−1 n

h  n =    +    =  
2   4   4   2  4 
1 3
h  0 = , h 1 = = 0.375, h  2 = −0.09375, h 3 = 0.0234, h  4  = −0.00586,...
2 8
Algebraic Long Division
• An alternative method for computing the series expansion of a
ratio of the form
Q( z)
f ( z) =
P( z)
is based on long division.
• The polynomial P(z) is divided into Q(z) using the standard
rules of long division.
• The resulting series expansion can be used to identify the
coefficients of z-n corresponding to x[n].
• The result is usually not a general expression for x[n] but
allows one to calculate the first few terms.
Example (again)
X(z) = (1 + 5z^-1) / (1 − (5/3)z^-1 − (2/3)z^-2)

Dividing the denominator into the numerator using the standard rules of long
division gives the series expansion of X(z):

1 + 5z^-1 = (1 − (5/3)z^-1 − (2/3)z^-2)(1 + (20/3)z^-1 + (106/9)z^-2 + ...)

Step by step: subtracting 1·(1 − (5/3)z^-1 − (2/3)z^-2) from 1 + 5z^-1 leaves
(20/3)z^-1 + (2/3)z^-2; subtracting (20/3)z^-1·(1 − (5/3)z^-1 − (2/3)z^-2) leaves
(106/9)z^-2 + (40/9)z^-3; and so on.

Equating powers of z^-n we get x[0] = 1, x[1] = 20/3, x[2] = 106/9, ...

Recall from Part 3 that the IZT was:
x[n] = 3(2)^n − 2(−1/3)^n, n ≥ 0;  = 0, n < 0

n = 0:  x[0] = 1
n = 1:  x[1] = 3·2 − 2(−1/3) = 20/3
n = 2:  x[2] = 3·2^2 − 2(−1/3)^2 = 106/9
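The same series coefficients can be generated numerically (a minimal sketch): the power-series coefficients of X(z) = B(z)/A(z) are the impulse response of the filter with those coefficients, so applying lfilter to a unit impulse performs the long division.

import numpy as np
from scipy.signal import lfilter

impulse = np.zeros(4)
impulse[0] = 1.0
x = lfilter([1, 5], [1, -5/3, -2/3], impulse)
print(x)   # [1.0, 6.667, 11.778, 24.074] i.e. 1, 20/3, 106/9, 650/27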
Summary: Continuous Systems
• From the underlying physics one obtains the ODE:
  a_p d^p y/dt^p + ... + a1 dy/dt + a0 y = b_q d^q x/dt^q + ... + b1 dx/dt + b0 x
• The Laplace transform gives the transfer function:
  H(s) = (b_q s^q + ... + b1 s + b0) / (a_p s^p + ... + a1 s + a0)
• Substituting s = 2πif (and back, f = s/(2πi)) gives the FRF, which can be measured:
  H(f) = (b_q (2πif)^q + ... + b1 (2πif) + b0) / (a_p (2πif)^p + ... + a1 (2πif) + a0)
• The poles (P(s) = 0) and zeros (Q(s) = 0) answer: is the system stable? is it causal?
• Inverse transformation gives the impulse response, and convolution gives the
  response to any input:
  y(t) = ∫_{−∞}^{∞} x(u) h(t − u) du
(The Fourier transform route is not discussed here.)
Summary: Digital Systems
• The difference equation (obtained by design):
  y[n] = b0 x[n] + b1 x[n−1] + ... + bq x[n−q] − a1 y[n−1] − ... − ap y[n−p]
• The z-transform gives the transfer function:
  H(z) = (b0 + b1 z^-1 + ... + bq z^-q) / (1 + a1 z^-1 + ... + ap z^-p)
• Substituting z = e^{2πif/fs} (and back, f = fs log(z)/(2πi)) gives the FRF:
  H(f) = (b0 + b1 e^{−2πif/fs} + ... + bq e^{−2πiqf/fs}) / (1 + a1 e^{−2πif/fs} + ... + ap e^{−2πipf/fs})
• The poles (P(z) = 0) and zeros (Q(z) = 0) answer: is the system stable? is it causal?
• The inverse z-transform gives the impulse response, and digital convolution
  gives the response to any input:
  y[n] = Σ_{m=−∞}^{∞} h[m] x[n − m]
IZT Example 1
• The z-transform to invert, with its ROC:

X(z) = (2z + 1)/(3z − 1),   |z| > 1/3  (equivalently |z^-1/3| < 1)

• Writing X(z) = (2z + 1)(3z − 1)^-1 cannot be expanded directly, because the
ROC implies |3z| > 1. Instead, divide top and bottom by 3z:

X(z) = (2z + 1)/(3z − 1) = (2/3 + z^-1/3)/(1 − z^-1/3) = (1/3)(2 + z^-1)(1 − z^-1/3)^-1

which can be expanded since, according to the ROC, |1/(3z)| < 1:

(1 − z^-1/3)^-1 = 1 + z^-1/3 + z^-2/3^2 + z^-3/3^3 + z^-4/3^4 + ...

X(z) = (1/3)(2 + z^-1)(1 + z^-1/3 + z^-2/3^2 + z^-3/3^3 + ...)
     = 2/3 + (1/3)(2/3 + 1) z^-1 + (1/3)(2/3^2 + 1/3) z^-2 + (1/3)(2/3^3 + 1/3^2) z^-3 + ...

• Comparing with X(z) = x[0] + x[1] z^-1 + x[2] z^-2 + x[3] z^-3 + ... and
equating powers of z^-1:

x[0] = 2/3;  x[1] = 5/9;  x[2] = (5/9)(1/3);  x[3] = (5/9)(1/3)^2; ...

x[n] = 0, n < 0;   x[0] = 2/3;   x[n] = (5/3)(1/3)^n, n > 0
IZT Example 2
• The z-transform to invert, with its ROC:

X(z) = z^-1/(1 − z^-1)^2,   |z| > 1  (equivalently |z^-1| < 1)

X(z) = z^-1 (1 − z^-1)^-1 (1 − z^-1)^-1

(1 − z^-1)^-1 = 1 + z^-1 + z^-2 + z^-3 + z^-4 + z^-5 + ...   (valid since |z^-1| < 1)

(1 − z^-1)^-1 (1 − z^-1)^-1 = (1 + z^-1 + z^-2 + z^-3 + ...)(1 + z^-1 + z^-2 + z^-3 + ...)
= (1 + z^-1 + z^-2 + ...) + z^-1 (1 + z^-1 + z^-2 + ...) + z^-2 (1 + z^-1 + z^-2 + ...) + ...
= 1 + 2z^-1 + 3z^-2 + 4z^-3 + 5z^-4 + ...

X(z) = z^-1 (1 − z^-1)^-1 (1 − z^-1)^-1 = z^-1 + 2z^-2 + 3z^-3 + 4z^-4 + 5z^-5 + ...

x[n] = 0, n ≤ 0;   x[n] = n, n > 0
IZT Example 3
• Use partial fractions to break X(z) into smaller, easy-to-invert parts:

X(z) = (6z^2 + 3z − 1)/(6z^2 + 5z + 1),   |z| > 1/2

X(z) = 1 + 2/(2z + 1) − 4/(3z + 1) = 1 + z^-1/(1 + z^-1/2) − (4/3)z^-1/(1 + z^-1/3)

     = 1 + z^-1 (1 + z^-1/2)^-1 − (4/3) z^-1 (1 + z^-1/3)^-1

(Both expansions are valid since, in the ROC, |z^-1|/2 < 1 and |z^-1|/3 < 1.)

     = 1 + z^-1 (1 − z^-1/2 + z^-2/2^2 − z^-3/2^3 + ...) − (4/3) z^-1 (1 − z^-1/3 + z^-2/3^2 − z^-3/3^3 + ...)

     = 1 + z^-1 (1 − 4/3) + z^-2 (−1/2 + 4/3^2) + z^-3 (1/2^2 − 4/3^3) + ...

x[n] = 0, n < 0;   x[0] = 1;

x[n] = (−1/2)^(n−1) − (4/3)(−1/3)^(n−1) = 4(−1/3)^n − 2(−1/2)^n,   n > 0
General Principles of Filter Design
Chuang Shi
Credit to Paul White for creating the slides
Outline of what is to come …
• Background to filters
• Filter design principles
• Methods for FIR filter design
– Windowing design
• Analogue filter designs
• Methods for IIR filter design
– Method of mapping differentials
– Impulse invariance
– Bilinear transform
Filters
• Recall the term “filter” is not really distinct from that of a
“digital system”.
• Filters aim to modify their inputs such that their outputs
possess some specified properties.
• The problem we shall consider is the design of the filters, i.e.
how can one select the system coefficients (e.g. the aks and bks
of an ARMA system) so that a particular property is realised?
• All such designs require compromise – ideal filters are rarely
implementable (realisable).
Types of Filters
• Frequency selective filters (the only ones we shall consider here)
– Filters that pass some frequencies and stop others
• Differentiators/integrators
• Interpolators
• Prediction
– Forward or backward prediction
• Hilbert transformers
• Optimal filters
– For example, estimating one sequence from another.
• Tracking/state estimations filters
Types of Frequency Selective Filter
• Low-pass
– Frequencies above a cut-off frequency are rejected
• High-pass
– Frequencies below a cut-on frequency are rejected
• Band-pass
– Frequencies between two specified frequencies are passed
• Band-stop
– Frequencies between two specified frequencies are rejected
• Notch filter
– A narrow form of band-stop filter
• Comb filter
– A filter consisting of a series of notches
Examples of Frequency Selective Filters
Forms of Digital Filter
• When designing a filter, the first decision to be made is
whether the filter is to be Finite Impulse Response (FIR) or
Infinite Impulse Response (IIR).
• FIR filters are (always ?) implemented as a moving average
(MA) system.
• IIR filters are implemented as ARMA systems.
• There are very different design methodologies for FIR and IIR
filters.
• There are advantages and disadvantages to both (these will be
examined later).
Steps in Filter Design
• Designing a filter consists of the following general stages:
1. Specifying the required filter response, e.g. cut-on/-off frequencies.
2. Defining the type of filter you need, i.e. FIR or IIR
3. Deciding upon the number of coefficients
4. Designing a filter
5. Comparing the response with the specified response
• If the filter fulfils the specification either:
– Consider reducing the model order and redesigning (can you meet the
specification with a shorter filter?)
– Stop
• If the filter fails to meet the specification either:
– Consider increasing the model order, return to step 3.
– Reconsider your choice of filter type, return to step 2.
– Modify the specification (!), return to step 1.
The Effect of the Number of
Coefficients
• Choosing the number of coefficients in a filter is usually a
compromise.
• Filters normally have a better response if longer filters (ones
with more coefficients) are used.
• Longer filters require greater computational loads – which
may be an issue in real-time systems.
• Longer filters may introduce greater delays to the system (or
increase “phase distortion” – see later notes).
• Also, longer filters are more affected by rounding errors in
their coefficients (also see next slide).
– Because of finite precision, the filter coefficients are rounded before
they are implemented. The impact of these rounding errors tends to be
greater in longer filters than in shorter ones.
An Example of the Effects of
Rounding Coefficients
[Figure: filter frequency responses computed with exact and with rounded coefficients.]
• Coefficients have been rounded to 3 d.p., which is rather more dramatic
than normal (it is roughly equivalent to 10-bit computation) - so the effect
is magnified here.
• Note that the 4th order filter is almost unaffected by rounding the
coefficients; the two responses largely overlay each other.
• We shall later discuss Butterworth filters: they are one form of IIR filter.
• In general, the effect of such errors is less significant in FIR filters.
An Ideal Filter
• Consider the problem of designing a low pass filter
– The same issues apply to all filter types.
• An ideal low-pass filter with cut-off at f_off has a frequency
response function of the following form:

H(f) = 1,  |f| < f_off      (in which case note it is also true that
     = 0,  |f| > f_off       |H(f)|^2 = 1 for |f| < f_off and 0 for |f| > f_off)

• The output from such an ideal filter has no energy above f_off,
but energy below f_off is preserved by the filter.
• Note the filter's FRF is discontinuous at f_off and, as such, no
realisable filter can be designed that exactly has this FRF.
Practical Designs
• In practice we must accept that a filter will only approximate
the ideal FRF.
• Loosely (for a low-pass filter) we would like a filter to have an
FRF whose magnitude is:
– Close to 1 for frequencies below the cut-off.
– Close to 0 for frequencies above the cut-off.
– Near the cut-off frequency we expect the response to rapidly change
from 1 to 0.
• In practice, the response tends to:
– Oscillate around 1 in the pass-band (a phenomenon called pass-band ripple).
– Be small (but generally not zero) in the stop-band.
– Take a finite band of frequencies to transit between the two bands - this
region, where the filter response changes from close to 1 to close to 0, is
called the transition zone.
Examples of Practical Designs
[Figure: the same filter response plotted on a linear scale and on a dB scale, with the transition zone marked.]
A linear scale usually shows the pass-band ripple more effectively
than a dB scale does. The stop-band behaviour is more clearly
assessed using a dB representation.
Filter Specifications
• A filter specification should be realisable.
• The ideal filter is not a specification, since no practical filter
can have such a response.
• A practical filter specification (for a low-pass filter) should
define:
– The end of the pass-band
– The width of the transition zone (which along with the above defines
the start of the stop-band).
– The permissible level of rippling in the pass-band.
– The maximum gain in the stop-band, sometimes called the stop-band
ripple.
See: Introduction to Filter Designer - MATLAB & Simulink Example (mathworks.com)
MATLAB’s Filter Specification
• MATLAB’s method of specifying a filter response mimics
most methods.
• The specification looks like:

[Figure: a low-pass filter specification mask - frequencies defining the edges of the pass- and stop-bands, a value defining the pass-band ripple, and a value defining the minimum attenuation in the stop-band.]

• A filter meeting the specification should have an FRF that lies
completely within the hashed region (such as the one shown).
Compromises in Filter Design
• Filter design inherently involves a compromise.
• Basically, for a given filter order (number of coefficients)
making the transition zone narrower, will lead to more rippling
in the pass- and stop-bands.
• The filter design can be regarded as trying to minimise
oscillations in the pass- and stop-bands for a given width of
transition zone.
Methods of FIR Filter Design
• The design methods for FIR and IIR filters are very different,
so we consider them separately.
• The design of FIR filters tends to be more straightforward and
we shall begin with those techniques
• Three methods are commonly considered:
– Windowing method
– Frequency sampling
– Optimal designs – the equi-ripple principle
• We will illustrate some of the principles by considering the
windowing method.
See: Practical Introduction to Digital Filter Design - MATLAB & Simulink Example (mathworks.com)
The Key to the Design of FIR Filters
• To design an FIR filter, we are looking to select the
coefficients bk in the following MA difference equation.
y  n  = b0 x  n  + b1 x  n − 1 + b2 x  n − 2 + .... + bL x  n − L 
L

=  b x n − k 
k =0
k

• Recall:
– These are all-zero filters and that there is no issue with stability.
– The impulse response of the system is equal to the filter coefficients
h[n]=bn.
– From the preceding statement one can see that the filter's FRF is equal
to the FT of the coefficients, i.e. H(f) = F{h[n]} = F{b_n}.

This last point, in particular, makes the design of FIR filters relatively straightforward.
Windowing Design Method
• The windowing design method is based on the idealised filter.
• Since we have
• Since we have

H(f) = F{b_n}  ⇔  b_n = h[n] = F^-1{H(f)}

• Hence the "ideal" filter coefficients are the inverse FT of the
ideal FRF. (In this notation a "~" is used to indicate quantities
that relate to the ideal filter.)
• We can say

H̃(f) = 1,  |f| < f_c
     = 0,  |f| > f_c

• Note that for an ideal filter we actually only require that its
magnitude is 1 in the pass-band; in the above we consider the
special case where it actually equals one.
Ideal Impulse Response
• The inversion of the ideal filter can be performed analytically:

h̃[n] = b̃_n = ∫_{−fs/2}^{fs/2} H̃(f) e^{2πifn} df = ∫_{−fc}^{fc} e^{2πifn} df

      = [e^{2πifn}/(2πin)]_{f=−fc}^{fc} = sin(2πfc n)/(πn)

i.e. the ideal impulse response is a sampled sinc function.
Issues with the “Ideal” Filter
• The previous filter is impractical for two reasons:
1. It is infinite in extent: it exists for all −∞ < n < ∞.
• Obviously, a finite impulse response filter cannot have an infinitely
long response.
2. It is acausal: the impulse response contains coefficients for n < 0.
• In some applications, e.g. off-line applications, the lack of causality
might not be a problem.
• The windowing design method offers solutions to both problems.
Truncation (Windowing)
• Firstly, the ideal response is windowed in order to stop it
extending to infinity.
• Specifically

b_n = w[n] b̃_n

where w[n] is a windowing function, such as those used in
spectral analysis (Hanning, Hamming, etc).
• The window is assumed to be finite duration and defined
symmetrically about n=0, i.e.
1. Symmetry w[-n]= w[n].
2. Finite duration w[n]=0 for |n|>K.
• The above implies that the window has 2K+1 coefficients, i.e.
that there are an odd number of coefficients.
– In fact this constraint can be relaxed, but, for simplicity, we shall
continue to assume that there are an odd number of coefficients.
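A minimal sketch of the whole procedure (the cut-off, length and window choice are illustrative): sample the ideal sinc response, truncate to 2K+1 taps, and taper with a symmetric window. This is broadly what library routines such as scipy.signal.firwin do.

import numpy as np

def fir_lowpass(fc, K, window=np.hamming):
    # Windowed-sinc low-pass design; fc is the cut-off as a fraction
    # of the sample rate (0 < fc < 0.5); the filter has 2K+1 taps.
    n = np.arange(-K, K + 1)
    ideal = 2 * fc * np.sinc(2 * fc * n)   # sin(2*pi*fc*n)/(pi*n)
    return ideal * window(2 * K + 1)       # truncate and taper

b = fir_lowpass(fc=0.1, K=20)              # 41-tap filter, cut-off 0.1*fs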
Symmetry
• Since the response H̃(f) is real and symmetric, the
sequence b̃_n is also real and symmetric.
• So that the ideal filter's coefficients are:
– Symmetric: b̃_{−n} = b̃_n
– Real valued
• This is because, in general, the Fourier transform of a real
symmetric function is itself real and symmetric.
• If the windowing is performed symmetrically then the
windowed coefficients b_n maintain symmetry, ensuring that
H(f) is also real and symmetric.
Effect of Truncation in the
Frequency Domain
• Since the windowing process consists of multiplication in the
time domain, its effect in the frequency domain can be
expressed as a convolution.
• Specifically, since

b_n = w[n] b̃_n   then   H(f) = W(f) * H̃(f)

where W(f) is the FT of the sequence w[n].

[Figure: the ideal filter's FRF H̃(f), convolved (*) with the window's FT W(f), equals the designed filter's FRF H(f).]
Window Features which Affect the Filter Design
[Figure: window spectrum annotated - the window's main-lobe width maps to the width of the transition zone; the window's side-lobes map to the ripple in the pass- and stop-bands.]
• The width of the window's main lobe defines the width of the
filter's transition zone.
• The window's side-lobes define the filter's pass-band and
stop-band behaviour.
Various Choices of Windowing
Functions (Linear Plots)
[Figure: linear plots of various windowing functions.]
See: Window function - Wikipedia
Various Choices of Windowing
Functions (dB Plots)
[Figure: dB plots of the window spectra. A narrow transition zone comes with poor attenuation in the stop-band; a wide transition zone gives good attenuation in the stop-band.]
Truncated Filter Coefficients
• As described so far, the FIR filter coefficients are
b_{−K}, b_{−K+1}, ..., b_{−1}, b_0, b_1, ..., b_K, corresponding to the difference
equation:

y[n] = b_{−K} x[n+K] + b_{−K+1} x[n+K−1] + ... + b_{−1} x[n+1] + b_0 x[n] + ... + b_K x[n−K]

• This filter is acausal, as y[n] depends on future values of the
input, x[n].
• This can be rectified by waiting K samples before computing
the output, i.e. not computing y until x[n+K] has occurred.
• This is equivalent to the difference equation

y[n+K] = b_{−K} x[n+K] + b_{−K+1} x[n+K−1] + ... + b_{−1} x[n+1] + b_0 x[n] + ... + b_K x[n−K]
Shifting
• The difference equation

y[n+K] = b_{−K} x[n+K] + b_{−K+1} x[n+K−1] + ... + b_{−1} x[n+1] + b_0 x[n] + ... + b_K x[n−K]

is equivalent to

y[n] = b_{−K} x[n] + b_{−K+1} x[n−1] + ... + b_{−1} x[n−K+1] + b_0 x[n−K] + ... + b_K x[n−2K]

(replacing n+K by n throughout); clearly this is causal.
• In this form the coefficients b are now numbered strangely; it
is sensible to renumber them using b̂_k = b_{k−K}, leading to

y[n] = b̂_0 x[n] + b̂_1 x[n−1] + ... + b̂_{K−1} x[n−K+1] + b̂_K x[n−K] + ... + b̂_{2K} x[n−2K]
Summary
Design Principles for IIR Filters
Chuang Shi
Credit to Paul White for creating the slides
Sketch of IIR Filter Design
• The design of IIR filters consists of two basic steps:

1. Design of an equivalent analogue filter, which itself consists of two
steps:
• Design of a unit low-pass filter using one of a set of standard designs.
• Conversion of the resulting analogue unit low-pass filter to an analogue
filter of the appropriate form (high-pass, band-pass, band-stop or a low-
pass filter with a cut-off which is not unity).
2. Mapping of the analogue filter to a digital equivalent.


Standard Analogue Designs
• There are 5 standard unit low-pass filter designs we shall
consider:
– Bessel filters
– Butterworth filters
– Chebychev filters
• Type I
• Type II
– Elliptic filters
• A unit low-pass filter is a filter with a cut-off frequency such
that ω_off = 2πf_off = 1.
• Each design has at least one parameter (its order), generally
controlling the sharpness of the filter's cut-off.
Bessel Filters
• Bessel filters have good phase responses.   [Friedrich Bessel, 1784-1846]
• The transfer function of these filters is given by:

H(s) = q_n(0) / q_n(s)

where q_n(x) is a "reverse Bessel polynomial" (don't worry, you
don't need to know what these are!).
• The first 3 filter orders are:

H(s) = 1/(s + 1),                      n = 1
     = 3/(s^2 + 3s + 3),               n = 2
     = 15/(s^3 + 6s^2 + 15s + 15),     n = 3
Bessel Filters (Cont’d)
• Bessel filters have very good phase responses, they introduce
minimal levels of phase distortion (see later notes).
• This means they have frequently been used for cross-over
systems.
Butterworth Filters   [Stephen Butterworth, 1885-1958]
• Butterworth filters are a widely used class of analogue filters.
• These filters have a squared magnitude response of the form:

|H(ω)|^2 = 1/(1 + ω^(2n))

where n is the filter order.
• The transfer functions of the first three orders are:

H(s) = 1/(s + 1),                       n = 1
     = 1/(s^2 + √2 s + 1),              n = 2
     = 1/((s + 1)(s^2 + s + 1)),        n = 3
Butterworth Filter (Cont’d)
• Butterworth filters are characterised by having a gain of −3 dB
at ω = 1 for all orders n.
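These prototypes can be generated with SciPy's analogue Butterworth design (a minimal sketch); with analog=True, butter returns the H(s) polynomial coefficients.

from scipy.signal import butter

b, a = butter(3, 1.0, btype='low', analog=True)   # 3rd order unit low-pass
print(b)   # [1.]
print(a)   # [1., 2., 2., 1.] i.e. (s+1)(s^2+s+1)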
Chebyshev Filters (Type I)
• There are two forms of Chebyshev filters: type I &
type II filters.
• Type I filters are characterised by equi-ripple in the pass-band
and monotonic decay in the stop-band.   [Pafnuty Chebyshev, 1821-1894]
• The frequency response of a type I filter is:

|H(ω)|^2 = 1/(1 + ε^2 T_n(ω)^2)

where ε is a scalar which controls the degree of rippling in the
pass-band, n is the order and T_n(x) are the Chebyshev
polynomials.
Chebyshev Polynomials
• The Chebyshev polynomials, Tn(x), are a widely used set of
functions, defined by an order n.
• They are polynomials of order n. In the region |x|<1 all the
turning points of Tn(x) occur at Tn(x) =±1.
T_1(x) = x
T_2(x) = 2x^2 − 1
T_3(x) = 4x^3 − 3x
T_4(x) = 8x^4 − 8x^2 + 1
T_5(x) = 16x^5 − 20x^3 + 5x
Chebyshev Filters (Type I) (Again)
[Figures: the effect of changing ε, altering the degree of rippling in the pass-band; Chebyshev Type I filters of different orders.]
Chebyshev Filters (Type II)
• Type II Chebyshev filters have rippling in the stop-band and
are monotonically decreasing in the pass-band.
• The form of the frequency response of Chebyshev type II filters is:

|H(ω)|^2 = 1/(1 + 1/(ε^2 T_n(ω)^2))

where, again, ε controls the ripple (in this case in the stop-band).
Chebyshev Filters (Type II) (Cont’d)
[Figures: changing ε so that one achieves different attenuations in the stop-band, all for a 3rd order filter; different model orders realising 40 dB attenuation in the stop-band.]
Elliptic Filters
• Sometimes called Cauer filters.   [Wilhelm Cauer, 1900-1945]
• Elliptic filters have ripples of constant magnitude
in the pass- and stop-bands.
• They have a frequency response of the form:

|H(ω)|^2 = 1/(1 + ε^2 R_n(ξ, ω)^2)

where R_n(·) are elliptic (or Chebyshev) rational functions
(again, you don't need to know what these are!); the parameter
ξ is called the selectivity parameter.
• Together ε and ξ control the rippling in the pass- and stop-bands.
Examples of Elliptic Filters
[Figures: elliptic filter responses with 1 dB ripple in the pass-band and 40 dB attenuation in the stop-band; 3rd order elliptic filters with 40 dB stop-band attenuation and different levels of pass-band ripple.]
General Comments
• The methods can be ranked according to the sharpness of their
transition zones (starting with the sharpest):
– Elliptic
– Chebyshev (Type I and II)
– Butterworth
– Bessel
• The reverse order is true if one ranks the filters in terms of
their phase responses.

See: Filter Design Tool | Filter Wizard | Analog Devices
Mapping Low-Pass Filters
• The final step in designing an analogue filter is to convert the
unit low-pass filter into the required form.
• This is achieved by a simple transformation applied to the
variable s.
• This converts the unit low-pass filter to:
– A general low-pass filter
– A high-pass filter
– A band-pass filter
– A band-stop filter
Mappings
• Low-pass filter with a cut-off frequency ω_off:  s → s/ω_off
• High-pass filter with a cut-on frequency ω_on:  s → ω_on/s
• Band-pass filter described by the frequencies ω_on and ω_off:

s → (s^2 + ω_on ω_off) / (s (ω_off − ω_on))

• Band-stop filter described by the frequencies ω_on and ω_off:

s → (s (ω_on − ω_off)) / (s^2 + ω_on ω_off)
Example
• Consider a 3rd order high-pass Butterworth filter with cut-on
frequency ω_on = 100 rad/s.
• Starting with the standard 3rd order Butterworth unit low-pass
filter

H(s) = 1/((s + 1)(s^2 + s + 1))

• Map to a high-pass filter at 100, using the mapping s → ω_on/s:

H_hp(s) = 1/((100/s + 1)((100/s)^2 + 100/s + 1))
        = s^3 / ((100 + s)(10000 + 100s + s^2))
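SciPy can apply this mapping directly (a small sketch): lp2hp substitutes s → ω_on/s into the prototype's coefficients.

from scipy.signal import butter, lp2hp

b_lp, a_lp = butter(3, 1.0, analog=True)    # unit low-pass prototype
b_hp, a_hp = lp2hp(b_lp, a_lp, wo=100.0)    # high-pass, cut-on 100 rad/s
print(b_hp)   # numerator ~ s^3
print(a_hp)   # ~ [1., 200., 20000., 1e6], matching (100+s)(10000+100s+s^2)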
Example 3rd Order Butterworth
Filters
Summary
• So far we have described methods for designing analogue
filters: these filters have a transfer function H(s).
• We now consider how to convert that analogue filter to a
digital version, with a transfer function H(z).
• This can be regarded in several different ways:
– Mapping the variable s to the variable z.
– Creating an equivalent difference equation from a differential equation.
• In fact from this stand-point this process shares much with the problem of
the numerical solution of differential equations, such as Runge-Kutta
methods.
What is Required of such a
Mapping?
1. Stability should be maintained, i.e. if H(s) is stable, then after
the mapping the digital system H(z) should also be stable.
2. For an analogue filter, the frequency response H(f) has been
"carefully" selected; one wants to preserve the character of this
response in the digital domain.
• To do this the mapping should take the points of the analogue
frequency axis (s = iω) to the digital frequency axis (z = e^{iωΔ}).
3. The mapping should be one-to-one.
• This means that there can be no aliasing, since each point in the
analogue (s) domain is mapped to a different point in the digital (z)
domain.
Method of Mapping Differentials
• This is equivalent to Euler’s method for solving differential
equations.
• It is based on a finite difference approximation for derivatives:

dx/dt ≈ (x(t) − x(t − h)) / h

for small h.
• If h is selected to be the sampling interval, Δ, then

dx/dt |_{t=nΔ} ≈ (x(nΔ) − x(nΔ − Δ)) / Δ = (x[n] − x[n−1]) / Δ
Method of Mapping Differentials in the Transform Domain
• Since in the Laplace domain

L{dx/dt} = sX(s)

and in the z-domain

Z{(x[n] − x[n−1])/Δ} = ((1 − z^-1)/Δ) X(z)

• Loosely one can say that

sX(s) ↔ ((1 − z^-1)/Δ) X(z),  i.e.  s → (1 − z^-1)/Δ

This substitution constitutes the method of mapping differentials.
Example
• Consider the simple system dy/dt + y = x:

H(s) = 1/(s + 1)

• Using the method of mapping differentials:

H(z) = 1/(s + 1) |_{s = (1 − z^-1)/Δ} = 1/((1 − z^-1)/Δ + 1) = Δ/(1 + Δ − z^-1)

• Consider the poles of these two systems:
– H(s) has one pole at s = −1
– H(z) has one pole at z = (1 + Δ)^-1

In difference equation form: (1 + Δ) y[n] − y[n−1] = Δ x[n], i.e.

y[n] = y[n−1]/(1 + Δ) + Δ x[n]/(1 + Δ)
General Properties
• Using

s = (1 − z^-1)/Δ  ⇔  z = 1/(1 − Δs)

where do points in the s-plane map to in the z-plane?
• Note that:
– s = 0 ⇒ z = 1
– |s| → ∞ ⇒ z → 0

s = iω  ⇒  z = 1/(1 − iωΔ) = (1/2)(1 + (1 + iωΔ)/(1 − iωΔ)) = (1 + e^{iθ})/2

where θ = 2 tan^-1(ωΔ).

The frequency axis in the s-plane maps to a circle in the z-plane, but NOT the unit disc, i.e.
not the frequency axis in the z-plane.
Graphical Representation of Mapping Differentials
[Figure: the s-plane frequency axis maps to the circle traced by (1 + e^{iθ})/2, with the left half-plane mapping to its interior.]
Summary of Mapping Differentials
1. The method of mapping differentials is a one-to-one mapping
• It does not introduce aliasing
2. It preserves stability.
• The left half of the s-plane is mapped to the interior of the circle
defined by (1 + e^{iθ})/2 (see grey regions in the previous plot).
3. Points on the frequency axis in the s-plane do NOT map to
the frequency axis in the z-plane.
• This means that the frequency response of the digital system will not
be equivalent to that of the analogue system/filter.
• At low frequencies this mapping approximately preserves the FRF.
4. This method is not well suited to filter design
• Conceivably it might be used in the case of non-linear systems
• It could be used to design low-pass (or band pass filters) if the cut-off
frequency is very much smaller than fs/2.
Example
• Designing a 3rd order high-pass filter with cut-on at 100 Hz,
with a sample rate of 1 kHz, using the method of mapping
differentials.
Impulse Invariance
• This is the second approach to computing a digital system
from an analogue one.
• It consists of 3 steps:
1. Compute the inverse Laplace transform of H(s), i.e. compute the
impulse response h(t).
2. Sample this impulse response to create h[n] = h(nΔ).
3. Compute the transfer function, H(z), of the digital system with impulse
response h[n] (using the z-transform).
• Note that step 2 is a sampling process that possibly introduces
aliasing.
Example
• Again consider the system

H(s) = 1/(s + 1)

h(t) = L^-1{1/(s + 1)} = e^{−t},   t ≥ 0

• Sample the impulse response:

h[n] = h(t)|_{t=nΔ} = e^{−nΔ},   n ≥ 0

• Z-transform to compute H(z):

H(z) = Σ_{n=−∞}^{∞} h[n] z^-n = Σ_{n=0}^{∞} e^{−nΔ} z^-n = Σ_{n=0}^{∞} (e^{−Δ} z^-1)^n = 1/(1 − e^{−Δ} z^-1)

⇒ y[n] = x[n] + e^{−Δ} y[n−1]
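A quick numerical check (a minimal sketch, with an arbitrary sampling interval): the impulse response of the resulting difference equation reproduces the sampled analogue impulse response exactly.

import numpy as np
from scipy.signal import lfilter

dt = 0.1                          # sampling interval (illustrative)
n = np.arange(20)

impulse = np.zeros(20)
impulse[0] = 1.0
h_digital = lfilter([1.0], [1.0, -np.exp(-dt)], impulse)
print(np.allclose(h_digital, np.exp(-n * dt)))   # True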
Impulse Invariance as a Mapping
• It can be shown that, using impulse invariance, the transfer
functions in the s- and z-planes are related via

H(z) |_{z=e^{sΔ}} = (1/Δ) Σ_{k=−∞}^{∞} H_a(s + 2πik f_s)

where H_a(s) is the analogue system's transfer function and
H(z) is the digital (impulse invariant) equivalent.
• This is a generalisation of the Poisson sum formula relating
the Fourier transform of an analogue signal to the Fourier
transform of that signal after sampling (see signal processing
notes).
Graphical Representation of
Impulse Invariance
[Figure: the s-plane divided into horizontal strips of width 2πfs (boundaries at ±πfs, ±3πfs, ±5πfs); each strip maps onto the whole z-plane, illustrating that the impulse-invariance mapping is many-to-one.]
Summary of Impulse Invariance
1. Impulse invariance preserves stability.
• All the points in the left-half of the s-plane are mapped to the interior
of the unit disc in the z-plane.
2. The frequency axis in the s-plane is mapped to the frequency
axis in the z-plane.
3. The mapping is not one-to-one.
• The process of sampling can introduce aliasing.
• The analogue frequency response must be zero for frequencies above fs/2.
• This is feasible as long as the filter has a specific final cut-off frequency,
meaning that impulse invariance can be successfully used to design low-
pass or band-pass filters, but not high-pass or band-stop.
The Bilinear Transform
• The bilinear transform is a mapping method in which H(z) is
obtained from H(s) by making the substitution:
s → (2/Δ) (1 − z^-1)/(1 + z^-1)

where Δ is the sampling interval.
• This is the most widely used method for obtaining a digital
system from an analogue one.
• What follows is not strictly a “proof” for the utility of the
bilinear transform: it is more a verification.
Verification of the Bilinear
Transform (I)
• We shall consider a first order linear system*:

a dy/dt + by = cx  ⇒  dy/dt = (c/a) x − (b/a) y

• This system has a transfer function:

H(s) = c/(as + b)

• Also consider the following integral:

∫_{(n−1)Δ}^{nΔ} (dy/dt) dt = y(nΔ) − y((n−1)Δ) = y[n] − y[n−1]

*Note that higher order linear systems can be constructed by putting first order systems in series,
so this assumption is not as restrictive as one might, at first, expect.
Verification of the Bilinear
Transform (II)
• Recall the trapezoidal rule for approximating an integral:

∫_{x0}^{x1} f(x) dx ≈ ((x1 − x0)/2) (f(x0) + f(x1))

[Figure: trapezoidal approximation to the area under f(x) between x0 and x1.]
Verification of the Bilinear
Transform (III)
• Using the trapezoidal approximation one has:

∫_{(n−1)Δ}^{nΔ} (dy/dt) dt = y[n] − y[n−1] ≈ (Δ/2) (y′[n−1] + y′[n])

where y′[n] = dy/dt |_{t=nΔ}.

• Using the linear system we also have that dy/dt = (c/a)x − (b/a)y, so

y[n] − y[n−1] ≈ (Δ/2) ((c/a) x[n] − (b/a) y[n] + (c/a) x[n−1] − (b/a) y[n−1])
Verification of the Bilinear
Transform (IV)
• Z-transforming gives:
$$\frac{2}{D}\, Y(z)\left(1 - z^{-1}\right) \approx \frac{c}{a}\, X(z)\left(1 + z^{-1}\right) - \frac{b}{a}\, Y(z)\left(1 + z^{-1}\right)$$
$$\Rightarrow\quad H(z) = \frac{Y(z)}{X(z)} = \frac{c}{a \cdot \dfrac{2}{D}\dfrac{1 - z^{-1}}{1 + z^{-1}} + b}$$
• Compare this to the analogue transfer function
$$H(s) = \frac{c}{as + b}$$
• Obviously these are the same if $s = \dfrac{2}{D}\cdot\dfrac{1 - z^{-1}}{1 + z^{-1}}$, i.e. if the bilinear transform is used.
Example
• Considering the previous example:
$$H(s) = \frac{1}{s+1}$$
• Using the bilinear transform one has
$$H(z) = \frac{1}{\dfrac{2}{D}\dfrac{1 - z^{-1}}{1 + z^{-1}} + 1} = \frac{D\left(1 + z^{-1}\right)}{2\left(1 - z^{-1}\right) + D\left(1 + z^{-1}\right)} = \frac{D + D z^{-1}}{2 + D - (2 - D)\, z^{-1}}$$
this is equivalent to the difference equation
$$y[n] = \frac{D}{2+D}\, x[n] + \frac{D}{2+D}\, x[n-1] + \frac{2-D}{2+D}\, y[n-1]$$
Properties of Bilinear Transform
• Considering the mapping
$$s = \frac{2}{D}\cdot\frac{1 - z^{-1}}{1 + z^{-1}} \quad\Leftrightarrow\quad z = \frac{1 + Ds/2}{1 - Ds/2}$$
• This means that:
– s = 0 ⇒ z = 1
– |s| → ∞ ⇒ z → −1
– z = e^{iωD} (points on the digital frequency axis) ⇒ s = 2i tan(ωD/2)/D (imaginary values, i.e. points on the analogue frequency axis)
Graphical Representation of Bilinear Transform
[Figure: the left half of the s-plane maps to the interior of the unit disc, and the imaginary axis maps onto the unit circle.]
Mapping between Frequencies
• We shall consider an analogue angular frequency ωa and examine the digital angular frequency ωd that it gets mapped to:
$$\omega_a = \frac{2}{D}\tan\left(\omega_d D / 2\right) \quad\Leftrightarrow\quad \omega_d = \frac{2}{D}\tan^{-1}\left(\omega_a D / 2\right)$$
• Note that there is a distortion of the analogue frequency axis that occurs in order that it can be represented on a circle in the z-plane.
• This distortion takes the form of a tan function.
Pre-warping
• When using the bilinear transform one has to take care when
selecting the cut-off frequency for the analogue filter.
• The filter will be implemented in the digital domain, so it is there that one needs to specify the cut-off frequency.
• Hence one needs to select an analogue cut-off frequency,
which, after the bilinear transform has been applied, will be in
the correct location on the digital frequency axis.
• This process is called pre-warping.
Example of Pre-warping
• To design a digital filter with a cut-off frequency of 16 kHz
using a system which is implemented (digitally) at a sample
rate of 44.1 kHz.
• To compute the appropriate frequency to use for the analogue
design
$$\omega_a = \frac{2}{D}\tan\left(\omega_d D/2\right) = 2 f_s \tan\left(\pi f_d / f_s\right) = 1.92\times 10^5\ \text{rad/s} \quad\Rightarrow\quad f_a = 30.5\ \text{kHz}$$
• So during the analogue design phase one needs to generate a
filter with a cut-off frequency at 30.5 kHz. After applying the
bilinear transform the cut-off frequency will move to 16 kHz,
as required.
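A short sketch (not part of the original notes) reproducing this pre-warping calculation numerically:

```python
import numpy as np

fs = 44.1e3                              # sample rate (Hz)
fd = 16e3                                # desired digital cut-off (Hz)

wa = 2 * fs * np.tan(np.pi * fd / fs)    # pre-warped analogue frequency (rad/s)
fa = wa / (2 * np.pi)

print(f"wa = {wa:.3g} rad/s, fa = {fa/1e3:.1f} kHz")  # ~1.92e5 rad/s, ~30.5 kHz
```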
Example of Pre-warping
[Figure: the warped frequency axis, showing the analogue cut-off at 30.5 kHz mapping to 16 kHz after the bilinear transform.]
Phase Response of Filters
Chuang Shi
Credit to Paul White for creating the slides
Outline
• Magnitude and phase responses
• Group delay and phase delay
• Phase distortion
• Linear phase
• Zero phase
• All-pass filters
• Merits of FIR and IIR Filters
Phase Response
• The frequency response function (FRF), H(f ), is a complex
valued function.
i arg( H ( f ) ) i( f )
H(f )= H(f )e = H(f )e

• So far the design of filters has considered only the magnitude


of the FRF, |H(f )|.
• The phase of the FRF, arg(H(f )), has not been considered.
• This phase defines the delays that frequencies undergo as they
pass through the filter.
• Potentially a poor phase response can lead to phase distortion.
Group Delay and Phase Delay
• There are two forms of delay that are considered:
– Phase delay
– Group delay
• These are analogous to the group and phase velocities in
dispersive media.
• Consider a sinusoidal input x(t) to a system, whose output,
y(t), is also sinusoidal.
x ( t ) = A1 sin ( 2f 0t + 1 )
y ( t ) = A1 H ( f 0 ) sin ( 2f 0t + 1 +  ( f 0 ) )
Phase Delay
• The phase delay, tp, for a system is defined via the argument
of the FRF, specifically
$$t_p = -\frac{\Phi(f)}{2\pi f}$$
• The minus sign reflects the fact we choose to define the phase
delay (not the “phase advance”).
• This follows from:
y ( t ) = A1 H ( f 0 ) sin ( 2f 0t + 1 +  ( f 0 ) )
    ( f0 )   
= A1 H ( f 0 ) sin  2f 0  t −  −   + 1 
   
   2f 0   
(
= A1 H ( f 0 ) sin 2f 0 ( t − t p ) + 1 )
Group Delay
• The group delay, tg, is based on the slope of the argument of
the FRF, specifically:
1 d( f )
tg = −
2 df
• The group delay defines the delay that packets of energy undergo, whereas the phase delay defines the delay that a sinusoid experiences.
• Essentially the group delay defines the delay for the envelope
of the signal and the phase delay defines the delay of the
carrier signal.
x ( t ) = A ( t ) sin ( 2f 0t +  )

(
y ( t )  H ( f 0 ) A ( t − t g ) sin 2f 0 ( t − t p ) +  )
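A minimal sketch (not part of the original notes) of how these two delays can be computed for a digital filter; the 6th-order Butterworth design is an assumed example:

```python
import numpy as np
from scipy.signal import butter, freqz, group_delay

b, a = butter(6, 0.25)               # assumed example: 6th-order low-pass
w, H = freqz(b, a)                   # FRF on the digital frequency axis (rad/sample)
phi = np.unwrap(np.angle(H))         # unwrapped phase response

tp = -phi[1:] / w[1:]                # phase delay in samples (skip w = 0)
w_g, tg = group_delay((b, a))        # group delay in samples
```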
Group Delay and Phase Delay
[Figure: an input pulse and the corresponding output pulse; the envelope of the output is shifted by the group delay.]
Example Phase Response
• When computing the phase response for a system one has to
take some care.
• There is an ambiguity in the phase of a complex value.
z = rei+2 k  for any k
• For each frequency arg(H(f )) takes a value in the range (-,].
• This means that if the phase function strays outside of the
region (-,] then it is folded back into that region.
• This had to be undone in a process called phase unwrapping.
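A minimal sketch (not part of the original notes) of phase unwrapping: numpy's np.unwrap removes the artificial 2π jumps left by restricting the phase to (−π, π]. The elliptic design is an assumed example:

```python
import numpy as np
from scipy.signal import ellip, freqz

b, a = ellip(6, 1, 40, 0.125)    # assumed: 6th-order elliptic low-pass at 0.125
w, H = freqz(b, a)

wrapped = np.angle(H)            # phase folded into (-pi, pi]
unwrapped = np.unwrap(wrapped)   # continuous phase response
```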
Example of Phase Unwrapping
Butterworth Filter
• 6th-order Butterworth low-pass filter, cut-off at 0.125 (normalised frequency).
Elliptic Filter
• 6th-order elliptic low-pass filter, cut-off at 0.125 (normalised frequency).
[Figure: unwrapped phase responses; note the errors where the unwrapping fails.]
Phase Distortion
• Phase distortion occurs when different frequencies are delayed
by different amounts, i.e. if the phase/group delay are
frequency dependent.
• To avoid phase distortion we require that both delays are constant:
$$\text{Group delay:}\quad t_g = -\frac{1}{2\pi}\frac{d\Phi(f)}{df} = \tau \quad\Rightarrow\quad \Phi(f) = -2\pi\tau f + \beta$$
$$\text{Phase delay:}\quad t_p = -\frac{\Phi(f)}{2\pi f} = \tau \quad\Rightarrow\quad \Phi(f) = -2\pi\tau f$$
Comments
• Recall the group delays for the elliptic and Butterworth filters.
• The elliptic filter's phase response shows a much stronger frequency dependence than the Butterworth filter's.
• So elliptic filters introduce a greater degree of phase distortion.
• They do, generally, have a better magnitude response.
• This is usually the case: filters with rapid transitions normally introduce larger phase distortion.
Linear Phase
• In order to avoid phase distortion one requires that the FRF's phase is linear, i.e.
$$\Phi(f) = \beta - 2\pi\tau f$$
• For a real system the FRF at f = 0 is real, i.e. Φ(0) = 0 or π, thus β = 0 or π (we shall largely assume β = 0).
• Note the slope of the phase response defines the delay τ (both the group and phase delay) of the system, and this delay is constant for all frequencies, i.e. no phase distortion occurs.
• Linear phase systems are desirable; further, we would generally also like to minimise the delay τ.
Linear Phase (Cont’d)
• Recall that the Fourier transform of a delayed signal satisfies
$$X(f) = \mathcal{F}\{x(t)\} \quad\Rightarrow\quad \mathcal{F}\{x(t-\tau)\} = e^{-2\pi i f \tau} X(f)$$
• So, since a linear phase system has an FRF of the form
$$H(f) = |H(f)|\, e^{-2\pi i f \tau}$$
the impulse response, h(t), is a delayed copy of the inverse transform of the magnitude:
$$h(t) = g(t - \tau), \quad g(t) = \mathcal{F}^{-1}\{|H(f)|\}$$
• The function |H(f )| is real and symmetric, therefore g(t) is also real and symmetric, i.e. h(t) is symmetric about t = τ.
Impulse Response of Linear Phase
Systems
• The delay is equal to τ = N/2.
• Odd number of coefficients (N even): the delay is an integer number of samples.
• Even number of coefficients (N odd): the delay is non-integer; it contains a half-sample delay.
[Figure: symmetric impulse responses for odd and even numbers of coefficients, centred on the delay τ.]
Conditions
• The impulse response is symmetric about the point N/2, where
N+1 is the length of the impulse response.
• This means that the length of the impulse response (N) MUST
be finite: i.e. it is impossible to have a (causal) IIR filter which
is exactly linear phase.
• An FIR filter which is linear phase has an impulse response
which is symmetric
• ….. and since, for an FIR filter, the impulse response is equal
to its coefficients, then the coefficients for a linear phase FIR
filter are also symmetric.
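A minimal sketch (not part of the original notes) illustrating this: window-method FIR designs such as scipy.signal.firwin produce symmetric coefficients, and hence exactly linear phase. The design parameters are assumed examples.

```python
import numpy as np
from scipy.signal import firwin

h = firwin(17, 0.25)             # assumed: 17 taps, cut-off 0.25 (normalised)
assert np.allclose(h, h[::-1])   # coefficients symmetric, so the filter is linear phase
```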
Zero Phase
• The "ultimate" linear phase filter is one for which the phase is linear and there is no delay, τ = 0: a zero phase filter.
• Such filters have FRFs which are real valued.
• For a causal FIR filter, this requires use of a filter of length 1 (N = 0), which is a trivial (and rather useless) filter.
• For non-causal filters one can readily make a filter which is
zero phase, based on any standard filter.
• Consider
$$|H(f)|^2 = H(f)\, H(f)^*$$
since |H(f )|² is real, such a filter must be zero phase.
Zero Phase Filter (Cont’d)
• To implement a zero phase filter with FRF |H(f )|², one needs to apply two filters in series: H(f ) followed by H(f )*.
• Recall that, for real x(t),
$$\mathcal{F}\{x(-t)\} = X(f)^*$$
• So H(f )* is a filter with impulse response h(−t).
• Further, since y(t) = x(t) * h(−t) can be written as
$$y(t) = w(-t), \quad \text{where } w(t) = x(-t) * h(t)$$
a filter with FRF H(f )* can be realised by applying the filter H(f ) to a time-reversed version of the input, and time-reversing the output.
Don't be confused by the two uses of "*": X(f )* is the conjugate of X(f ), while x(t)*h(t) is the convolution of x and h.
Implementing a Zero Phase Filter
• A "recipe" to implement a zero phase filter, |H(f )|², is:
i. Apply the filter to the input x(t) to create an output z(t).
ii. Then time-reverse z(t) to form z(-t).
iii. Apply the filter to the time-reversed signal z(-t).
iv. Time-reverse the output.
• Delays introduced in step i are undone in step iii, so that
overall such a filter introduces no delays.
• Such filters are only suitable for off-line applications, since
they require the entire input x(t) – but can prove useful in
such instances.
• Note the FRF implemented is the squared magnitude |H(f )|², not H(f ) itself.
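This recipe is essentially what scipy.signal.filtfilt implements (plus some handling of end effects). A minimal sketch, not part of the original notes, with an assumed filter and test signal:

```python
import numpy as np
from scipy.signal import butter, lfilter, filtfilt

b, a = butter(4, 0.2)                              # assumed low-pass design
t = np.arange(500)
x = np.sin(0.05 * t) + 0.3 * np.random.randn(500)  # assumed noisy test signal

y1 = lfilter(b, a, x)    # ordinary (causal) filtering: output is delayed
y2 = filtfilt(b, a, x)   # forward-backward filtering: zero phase, FRF |H(f)|^2
```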
Simple Linear Phase System
• Consider a system with two zeros, at z = ζ and z = 1/ζ* (where in general ζ is complex).
• These zeros have polar forms:
$$\zeta = r e^{i\theta} \quad\text{and}\quad \frac{1}{\zeta^*} = \frac{1}{r}\, e^{i\theta}$$
[Figure: pole-zero diagram showing ζ at radius r and 1/ζ* at radius 1/r, both at angle θ.]
FRF of 2 Zero System
• This system has a transfer function (assuming causality)
$$H(z) = \left(1 - \zeta z^{-1}\right)\left(1 - z^{-1}/\zeta^*\right) = \left(1 - r e^{i\theta} z^{-1}\right)\left(1 - \frac{1}{r} e^{i\theta} z^{-1}\right) = 1 - z^{-1} e^{i\theta}\left(r + \frac{1}{r}\right) + z^{-2} e^{2i\theta}$$
• With an FRF given by (setting z = e^{iω})
$$H(\omega) = 1 - e^{i(\theta-\omega)}\left(r + \frac{1}{r}\right) + e^{2i(\theta-\omega)} = e^{i(\theta-\omega)}\left(e^{-i(\theta-\omega)} - \left(r + \frac{1}{r}\right) + e^{i(\theta-\omega)}\right) = \left[2\cos(\theta-\omega) - \left(r + \frac{1}{r}\right)\right] e^{i(\theta-\omega)}$$
The real factor in square brackets is the magnitude of H(f ); the factor e^{i(θ−ω)} gives the phase of H(f ), which is linear.
Comments
• The previous system is linear phase and has two zeros at
reciprocal positions relative to the unit circle.
• In fact it is generally true that a linear phase system has zeros
arranged in reciprocal pairs.
Example:
Pole-zero diagram of an FIR band-pass filter, N = 16.
[Figure] Note that zeros which are not on the unit circle occur in sets of four, ζ, ζ*, 1/ζ and 1/ζ* (there are two such sets of four in this example).
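A minimal sketch (not part of the original notes) checking this numerically: for a linear-phase (symmetric, real) FIR filter, every zero ζ is accompanied by its conjugate reciprocal 1/ζ*. The band-pass design parameters are assumed examples.

```python
import numpy as np
from scipy.signal import firwin

h = firwin(17, [0.2, 0.4], pass_zero=False)  # assumed linear-phase band-pass FIR
zeros = np.roots(h)

# Conjugate-reciprocal partners 1/conj(zeta); as a set these should equal the zeros.
partners = 1.0 / np.conj(zeros)
for z0 in zeros:
    assert np.any(np.isclose(partners, z0, atol=1e-6))
```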
All-Pass Filters
• Consider a related ARMA system, with a pole at z = ζ and a zero at 1/ζ*, where |ζ| < 1.
• The transfer function is
$$H(z) = \frac{\zeta^* - z^{-1}}{1 - \zeta z^{-1}}$$
• The FRF of this system is (writing ζ = re^{iθ})
$$H(\omega) = \frac{r e^{-i\theta} - e^{-i\omega}}{1 - r e^{i\theta} e^{-i\omega}} = \frac{e^{-i\theta}\left(r - e^{i(\theta-\omega)}\right)}{1 - r e^{i(\theta-\omega)}}$$
$$|H(\omega)|^2 = \frac{\left(r - e^{i(\theta-\omega)}\right)\left(r - e^{-i(\theta-\omega)}\right)}{\left(1 - r e^{i(\theta-\omega)}\right)\left(1 - r e^{-i(\theta-\omega)}\right)} = \frac{r^2 + 1 - 2r\cos(\theta-\omega)}{1 + r^2 - 2r\cos(\theta-\omega)} = 1$$
All-Pass Filters (Cont’d)
• This system has a FRF which has a gain of unity for all
frequencies.
• This filter only affects the phase of the input: it delays frequency components but does not attenuate or amplify any of them.
• Effectively such filters just introduce phase distortion.
• All-pass filters can be applied in series, without losing their
all-pass character.
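A minimal sketch (not part of the original notes) verifying the unit gain numerically, with an assumed pole position ζ:

```python
import numpy as np
from scipy.signal import freqz

zeta = 0.7 * np.exp(1j * np.pi / 3)   # assumed pole, |zeta| < 1
b = [np.conj(zeta), -1.0]             # numerator: zeta* - z^{-1}
a = [1.0, -zeta]                      # denominator: 1 - zeta z^{-1}

w, H = freqz(b, a, worN=512)
assert np.allclose(np.abs(H), 1.0)    # unit gain at every frequency
```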
FIR vs IIR Filters
• The choice of whether to use an FIR or an IIR filter is critical.
• The advantages and disadvantages of the two types can be
summarised as:
– IIR filters are generally more efficient: they require fewer coefficients
to achieve a given level of performance.
– FIR filters can be designed to avoid phase distortion: in most instances
this is the main reason for selecting FIR filters.
– IIR filters can suffer from the effects of rounding errors on the
coefficients: there is a maximum length of filter which can effectively
be designed.
– Stability needs to be considered with IIR filters (FIR filters are always stable).
Complex Numbers and their
Reciprocals
• Consider the complex number z=rei.
• It conjugate is z*=re-i
• Its reciprocal is 1 = 1 e − i
z r
1 1 i
• The conjugate of the reciprocal is * = e
z r

1/z*

r
 z
z*
1/r 1/z
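A minimal numeric check (not part of the original notes) of these relations, with assumed values of r and θ:

```python
import numpy as np

r, theta = 2.0, 0.6                   # assumed example values
z = r * np.exp(1j * theta)

assert np.isclose(np.conj(z), r * np.exp(-1j * theta))           # z*
assert np.isclose(1 / z, (1 / r) * np.exp(-1j * theta))          # 1/z
assert np.isclose(1 / np.conj(z), (1 / r) * np.exp(1j * theta))  # 1/z*
```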