
Chapter 10

Basic Video Compression Techniques

10.1 Introduction to Video Compression
10.2 Video Compression with Motion Compensation
10.3 Video Compression Standard H.261
10.4 Video Compression Standard MPEG-1

Fundamentals of Multimedia, Chapter 10

The need for video compression


Introduction to Video Compression

• A video consists of a time-ordered sequence of frames,
i.e., images.

• An obvious solution is to compress video by applying an
image compression algorithm to each frame, for instance
compressing each frame as a JPEG image.
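As an illustration of this per-frame approach, the sketch below compresses each frame independently, using zlib as a stand-in for a real image codec such as JPEG (the frame data and sizes are purely illustrative):

```python
import zlib

# Per-frame ("Motion JPEG"-style) compression: every frame is coded
# independently; no information is shared between frames.
def compress_per_frame(frames):
    return [zlib.compress(bytes(f)) for f in frames]

# Three identical 8x8 grayscale "frames" (a completely static scene).
frame = [128] * 64
compressed = compress_per_frame([frame, frame, frame])

# Each frame costs the same number of bytes even though nothing changed:
# the temporal redundancy between frames is not exploited at all.
sizes = [len(c) for c in compressed]
print(sizes)
```

Every frame costs the same number of bits here, which is exactly the waste that the techniques in the rest of this chapter remove.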


Introduction to Video Compression

• Consecutive frames in a video are similar — temporal
redundancy exists.
• Significantly higher compression rates can be achieved by
exploiting temporal redundancy.


Introduction to Video Compression

Video compression utilizes two basic techniques:


• Intraframe compression
• Occurs within individual frames
• Designed to minimize the duplication of data in each picture
(Spatial Redundancy)
• Interframe compression
• Compression between frames
• Designed to minimize data redundancy in successive pictures
(Temporal redundancy)


Introduction to Video Compression

• Temporal redundancy arises when successive frames of
video display images of the same scene.
• It is common for the content of the scene to remain fixed
or to change only slightly between successive frames.

• Spatial redundancy occurs because parts of the picture
are often replicated (with minor changes) within a single
frame of video.


Temporal Redundancy

• Temporal redundancy is exploited so that not every
frame of the video needs to be coded independently as a
new image.
• It makes more sense to code only the changed
information from frame to frame, rather than coding the
whole frame.

With difference coding, only the first image (I-frame) is coded in its entirety. In the two
following images (P-frames), references are made to the first picture for the static elements, i.e.,
the house. Only the moving parts, i.e., the running man, are coded using motion vectors, thus
reducing the amount of information that is sent and stored.

Temporal Redundancy

• Temporal redundancy can be better exploited by
predictive coding based on previous frames.
• The motion of pixels and regions is predicted from frame to
frame, rather than predicting the frame as a whole.

• Compression proceeds by subtracting images:
the difference between the current frame and other
frame(s) in the sequence will be coded → small values and
low entropy, good for compression.
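The effect of subtracting frames can be sketched as follows; the frames and values are a toy example, but they show how the residual of a nearly static scene is mostly zeros, i.e., small values with low entropy:

```python
# Two consecutive "frames" of a nearly static scene: frame2 differs from
# frame1 in a single pixel (a small moving object).
frame1 = [[(3 * r + 5 * c) % 256 for c in range(8)] for r in range(8)]
frame2 = [row[:] for row in frame1]
frame2[3][3] = 200  # the only change between the frames

# Difference (residual) image: this is what gets coded instead of frame2.
diff = [[frame2[r][c] - frame1[r][c] for c in range(8)] for r in range(8)]

# Almost every residual value is zero -- low entropy, easy to compress.
nonzero = sum(1 for row in diff for v in row if v != 0)
print(nonzero)  # prints 1
```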


Temporal Redundancy


Pixel motion prediction

• It can be done even better by searching
for just the right parts of the image to
subtract from the previous frame.

• Practically, the coherency from frame to
frame is better exploited by observing that
groups of contiguous pixels, rather than
individual pixels, move together with the
same motion vector.

• Therefore, it makes more sense to predict
frame n+1 in the form of regions or blocks
rather than individual pixels.
Fundamentals of Multimedia, Chapter 10

Pixel motion prediction

• The pixel Cn(x, y) shown in frame n has moved to a new location
in frame n+1. Consequently, Cn+1(x, y) in frame n+1 is not the same as
Cn(x, y) but is offset by the motion vector (dx, dy).
• The small prediction error is e(x, y) = Cn+1(x, y) − Cn(x+dx, y+dy).
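The relation above can be checked numerically. In this sketch (toy 8x8 frames, and pure translation by a hypothetical motion vector is assumed), frame n+1 is built so that each pixel Cn+1(x, y) equals Cn(x+dx, y+dy); the error e(x, y) is then zero everywhere:

```python
H = W = 8
dx, dy = 1, 2  # hypothetical motion vector

# Frame n: an arbitrary gradient image, indexed as C[y][x].
Cn = [[10 * y + x for x in range(W)] for y in range(H)]

# Frame n+1 under pure translation: Cn1(x, y) = Cn(x+dx, y+dy)
# (wrapping at the borders to keep the toy example simple).
Cn1 = [[Cn[(y + dy) % H][(x + dx) % W] for x in range(W)] for y in range(H)]

# Prediction error e(x, y) = Cn1(x, y) - Cn(x+dx, y+dy): exactly zero under
# pure translation; real video gives small but nonzero residuals.
e = [[Cn1[y][x] - Cn[(y + dy) % H][(x + dx) % W] for x in range(W)]
     for y in range(H)]
print(all(v == 0 for row in e for v in row))  # prints True
```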

Video Compression with Motion Compensation

Steps of video compression based on Motion
Compensation (MC):
• MC-based prediction.
• Motion estimation (motion vector search).
• Derivation of the prediction error, i.e., the difference.


Motion Compensation

• It is an algorithmic technique used to predict a frame in a
video, given the previous and/or future frames.

• Motion compensation describes a picture in terms of the
transformation of a reference picture to the current
picture.

• The reference picture may be previous in time or even
from the future.


Motion Compensation

How it works
• It exploits the fact that, often, for many frames of a
movie, the only difference between one frame and
another is the result of either the camera moving or an
object in the frame moving.
• In reference to a video file, this means much of the
information that represents one frame will be the same
as the information used in the next frame.
• Using motion compensation, a video stream will contain
some full (reference) frames; then the only information
stored for the frames in between would be the
information needed to transform the previous frame into
the next frame.

Motion Compensation

• Each image is divided into macroblocks of size N x N.
• By default, N = 16 for luminance images. For
chrominance images, N = 8 if 4:2:0 chroma
subsampling is adopted.


Motion Compensation

Motion compensation is performed at the macroblock level.
• The current image frame is referred to as the Target Frame.
• A match is sought between the macroblock in the Target
Frame and the most similar macroblock in previous and/or
future frame(s) (referred to as Reference frame(s)).
• The displacement of the reference macroblock to the target
macroblock is called a motion vector MV.
• Figure 10.1 shows the case of forward prediction, in which
the Reference frame is taken to be a previous frame.

Fig. 10.1: Macroblocks and Motion Vector in Video Compression.

• MV search is usually limited to a small immediate
neighborhood — both horizontal and vertical displacements
in the range [−p, p].
This makes a search window of size (2p + 1) x (2p + 1).
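The exhaustive search over this (2p + 1) x (2p + 1) window can be sketched as below, using the sum of absolute differences (SAD) as the match criterion. The frame contents, macroblock size N = 4, and p = 3 are toy values (H.261 uses N = 16 for luminance):

```python
# Sum of absolute differences between an N x N target block at (ty, tx)
# and a candidate reference block at (ry, rx).
def sad(target, ref, ty, tx, ry, rx, N):
    return sum(abs(target[ty + i][tx + j] - ref[ry + i][rx + j])
               for i in range(N) for j in range(N))

# Exhaustive ("full") search over all displacements in [-p, p] x [-p, p].
def full_search(target, ref, ty, tx, N, p):
    H, W = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            ry, rx = ty + dy, tx + dx
            if 0 <= ry and ry + N <= H and 0 <= rx and rx + N <= W:
                cost = sad(target, ref, ty, tx, ry, rx, N)
                if cost < best_cost:
                    best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost

# Reference frame with a distinctive pattern; target = pattern moved by (2, 1).
ref = [[(3 * y + 7 * x) % 251 for x in range(16)] for y in range(16)]
target = [[ref[(y + 2) % 16][(x + 1) % 16] for x in range(16)] for y in range(16)]

mv, cost = full_search(target, ref, 4, 4, 4, 3)
print(mv, cost)  # (2, 1) 0
```

The search visits all (2p + 1)² = 49 candidate positions here; real encoders usually replace this brute-force scan with faster suboptimal searches.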

Size of Macroblocks

• Smaller macroblocks increase the number of blocks in the
target frame:
 – a larger number of motion vectors to predict the target frame;
 – this requires more bits to compress motion vectors,
 – but smaller macroblocks tend to decrease the prediction error.

• Larger macroblocks:
 – fewer motion vectors to compress,
 – but also tend to increase the prediction error, because larger
areas could possibly cover more than one moving region
within a single macroblock.
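A quick calculation shows how the motion-vector count grows as the macroblock shrinks; a CIF-sized 352 x 288 frame is used here purely as an example:

```python
# Number of macroblocks -- and hence motion vectors -- per frame for a
# 352 x 288 (CIF) frame at several macroblock sizes N.
W, H = 352, 288
counts = {N: (W // N) * (H // N) for N in (8, 16, 32)}
print(counts)  # {8: 1584, 16: 396, 32: 99}
```

Halving N quadruples the number of motion vectors to code, which is the bit-rate side of the trade-off described above.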

Video compression standard H.261

H.261: an early digital video compression standard; its
principle of MC-based compression is retained in all later
video compression standards.
• The standard was designed for videophone, video
conferencing and other audiovisual services over ISDN.
• The video codec supports bit-rates of p x 64 kbps,
where p ranges from 1 to 30 (hence also known as p * 64).
• The standard requires that the delay of the video encoder be
less than 150 msec so that the video can be used for real-
time video conferencing.

H.261 Frame Sequence

Two types of image frames are defined:
• Intra-frames (I-frames) and Inter-frames (P-frames):


H.261 Frame Sequence

• I-frames:
• These are intra-coded frames, where only spatial redundancy is used to
compress the frame.
• They are treated as independent images (can be reconstructed without any
reference to other frames).
• A transform coding method similar to JPEG is applied within each I-frame.

• I-frames require more bits for compression than predicted
frames → their compression ratio is not that high.


H.261 Frame Sequence

• P-frames:
• P-frames are predictive coded (forward predictive coding method),
exploiting temporal redundancy by comparing them with a
preceding reference frame.
• They are not independent (it is impossible to reconstruct them without the
data of another frame (I or P)).
• They contain the motion vectors and error signals.
• P-frames need less space than I-frames, because only the differences
are stored. However, they are expensive to compute, but are
necessary for compression.
• An important problem the encoder faces is when to stop predicting
using P-frames and instead insert an I-frame.
• An I-frame needs to be inserted where P-frames cannot give much
compression.
• This happens during scene transitions or scene changes, where the error
images are high.

Fig. 10.4: H.261 Frame Sequence.

• We typically have a group of pictures (GOP) — one I-frame followed by
several P-frames.
• The number of P-frames following each I-frame determines the size of
the GOP; it can be fixed or dynamic. Why can't this be too large?

H.261 Frame Sequence

• Temporal redundancy removal is included in P-frame
coding, whereas I-frame coding performs only spatial
redundancy removal.

• A lost P-frame usually results in artifacts that are folded
into subsequent frames. If an artifact persists over time,
then the likely cause is a lost P-frame.

• To avoid propagation of coding errors, an I-frame is
usually sent a couple of times in each second of the
video.

Intra-frame (I-frame) Coding

• Uses various lossless and lossy compression techniques,
like JPEG.
• Compression is contained only within the current frame.
• Simpler coding, but not enough by itself for high
compression; we can't rely on intra-frame coding alone.
• However, we also can't rely on inter-frame differences across a
large number of frames.
• So when errors get too large: start a new I-frame.

Intra-frame (I-frame) Coding

• Macroblocks are of size 16 x 16 pixels for the Y frame, and 8 x
8 for the Cb and Cr frames, since 4:2:0 chroma subsampling is
employed. A macroblock consists of four Y, one Cb, and one Cr
8 x 8 blocks.
• For each 8 x 8 block a DCT transform is applied; the DCT
coefficients then go through quantization, zigzag scan, and
entropy coding.
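As one concrete piece of that pipeline, the zigzag scan can be sketched as follows (the DCT and quantization steps are omitted). It reorders an 8 x 8 coefficient block from the DC term toward the high frequencies, so that the typically zero high-frequency coefficients cluster at the end of the sequence:

```python
# Zigzag scan of an 8 x 8 block: walk the anti-diagonals s = row + col,
# alternating direction, starting at the DC coefficient (0, 0).
def zigzag(block):
    out = []
    for s in range(15):
        diag = [(s - c, c) for c in range(s + 1) if s - c < 8 and c < 8]
        if s % 2 == 1:
            diag.reverse()  # odd diagonals run top-right -> bottom-left
        out.extend(block[r][c] for r, c in diag)
    return out

# Label each position by its row-major index to make the scan order visible.
block = [[8 * r + c for c in range(8)] for r in range(8)]
order = zigzag(block)
print(order[:10])  # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```

The resulting run of trailing zeros in a quantized block is exactly what the subsequent run-length/entropy coding exploits.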

Block Transform Encoding (I-frame)


Inter-frame (P-frame) Predictive Coding

The H.261 P-frame coding scheme is based on motion
compensation:
• For each macroblock in the Target frame, a motion
vector is allocated.
• After the prediction, a difference macroblock is
derived to measure the prediction error.
• Each of these 8 x 8 blocks goes through DCT,
quantization, zigzag scan and entropy coding
procedures.
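The second step above, deriving the difference macroblock once a motion vector has been found, can be sketched as follows (toy 4 x 4 blocks and frame contents; H.261 uses 16 x 16 luminance macroblocks):

```python
N = 4  # toy macroblock size

# Difference (residual) macroblock for a target block at (ty, tx), given a
# motion vector mv = (dy, dx) into the reference frame.
def residual_block(target, ref, ty, tx, mv):
    dy, dx = mv
    return [[target[ty + i][tx + j] - ref[ty + dy + i][tx + dx + j]
             for j in range(N)] for i in range(N)]

# Reference frame; target frame = reference shifted by (1, 2), so the block
# at (2, 2) in the target matches the block at (3, 4) in the reference.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
target = [[ref[(y + 1) % 8][(x + 2) % 8] for x in range(8)] for y in range(8)]

res = residual_block(target, ref, 2, 2, (1, 2))
# With a perfect match the residual is all zeros; it is this (near-zero)
# block, not the target block itself, that goes on to DCT and quantization.
print(all(v == 0 for row in res for v in row))  # prints True
```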


Inter-frame (P-frame) Predictive Coding

• The P-frame coding encodes the difference macroblock
(not the Target macroblock itself).

• Sometimes, a good match cannot be found, i.e., the
prediction error exceeds a certain acceptable level. The
MB itself is then encoded (treated as an Intra MB), and in
this case it is termed a non-motion-compensated MB.
• (To minimize the number of expensive motion estimation
calculations, they are only performed when the difference between two
blocks at the same position is higher than a threshold; otherwise the
whole block is transmitted.)
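The decision logic described above can be sketched as follows; the threshold values are hypothetical and illustrative, not taken from the H.261 specification:

```python
# Hedged sketch of the P-frame macroblock mode decision: skip the expensive
# motion search when the co-located difference is already small, and fall
# back to intra coding when no acceptable match exists.
ZERO_MV_THRESHOLD = 64   # illustrative, not normative
INTRA_THRESHOLD = 512    # illustrative, not normative

def choose_mode(colocated_cost, best_mc_cost):
    if colocated_cost <= ZERO_MV_THRESHOLD:
        return "inter, no motion search"   # block barely changed
    if best_mc_cost <= INTRA_THRESHOLD:
        return "inter, motion compensated" # code residual + motion vector
    return "intra"  # non-motion-compensated MB: code the MB itself

print(choose_mode(10, 0), choose_mode(200, 100), choose_mode(900, 800))
```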

• For a motion vector, the difference MVD is sent for
entropy coding.

Fig. 10.6: H.261 P-frame Coding Based on Motion Compensation.



Video compression standard MPEG

• MPEG: Moving Pictures Experts Group, established in
1988 for the development of digital video.

• MPEG compression is essentially an attempt to overcome
some shortcomings of H.261:
• H.261 only encodes video. MPEG-1 encodes video and
audio.
• H.261 only allows forward prediction. MPEG-1 has
forward and backward prediction (B-pictures).
• MPEG-1 was designed to allow fast forward and backward
search and synchronization of audio and video.

Motion Compensation in MPEG-1

As mentioned before, Motion Compensation (MC) based
video encoding in H.261 works as follows:
• In Motion Estimation (ME), each macroblock (MB) of
the Target P-frame is assigned a best matching MB
from the previously coded I- or P-frame: the prediction.
• Prediction error: the difference between the MB and
its matching MB, sent to DCT and its subsequent
encoding steps.
• The prediction is from a previous frame — forward
prediction.

Motion Compensation in MPEG-1

• Sometimes, areas of the current frame can be better
predicted from the next (future) frame. This might happen
because objects or the camera move, exposing areas not
seen in past frames.

The MB containing part of a ball in the Target frame cannot find a good matching MB in
the previous frame because half of the ball was occluded by another object. A match,
however, can readily be obtained from the next frame.

The Need for a Bidirectional Search

The problem here is that many macroblocks need information that is
not in the reference frame:
• Occlusion by objects affects differencing.
• It is difficult to track occluded objects, etc.
• MPEG uses forward/backward interpolated prediction.

• Using both frames increases the prediction accuracy during
motion compensation.
• The past and future reference frames can themselves be coded
as an I- or a P-frame.


MPEG B-Frames

• The MPEG solution is to add a third frame type, the
bidirectional frame, or B-frame.


MPEG B-Frames

• B-frames, also known as bidirectionally coded frames, are
intercoded and also exploit temporal redundancy.
• To predict a B-frame, both the previous (past) frame and the
next (future) frame are used.
• The coding of B-frames is more complex compared with I-
or P-frames, with the encoder having to make more decisions.


MPEG B-Frames

• To compute a matching macroblock, the encoder needs to
search for the best motion vector in the past reference
frame and also for the best motion vector in the future
reference frame. Two motion vectors are computed for
each macroblock.
• The macroblock gets coded in one of three modes:
• Forward predicted, using only the past frame
• Backward predicted, using only the future frame
• Interpolated, using both, by averaging the two predicted
blocks
• The case corresponding to the best macroblock match
and yielding the least entropy in the difference is chosen.
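The three-way choice can be sketched as follows. The 1-D "macroblocks" and values are a toy example, and the sum of absolute residuals stands in for the entropy measure used above:

```python
# Cost of coding the residual between a target block and a prediction.
def residual_cost(target, pred):
    return sum(abs(t - p) for t, p in zip(target, pred))

def best_b_mode(target, fwd_pred, bwd_pred):
    interp = [(f + b) // 2 for f, b in zip(fwd_pred, bwd_pred)]  # averaging
    costs = {
        "forward": residual_cost(target, fwd_pred),
        "backward": residual_cost(target, bwd_pred),
        "interpolated": residual_cost(target, interp),
    }
    return min(costs, key=costs.get)

# Toy 1-D "macroblocks": the target sits halfway between the two reference
# predictions, so the interpolated (averaged) mode gives the smallest residual.
fwd = [100, 100, 100, 100]
bwd = [120, 120, 120, 120]
target = [110, 110, 110, 110]
print(best_b_mode(target, fwd, bwd))  # interpolated
```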

Backward Prediction Implications

B-frames also necessitate reordering frames during
transmission, which causes delays:
• The order in which frames arrive at the encoder is known
as the display order. This is also the order in which the
frames need to be displayed at the decoder after decoding.
• B-frames induce a forward and backward dependency. The
encoder has to encode and send to the decoder both the
future and past reference frames before coding and
transmitting the current B-frame.
• Because of the change in the order, all potential B-frames
need to be buffered while the encoder codes the future
reference frame, requiring the encoder to deal with buffering
and also causing a delay during transmission.

Backward Prediction Implications

Ex: here, backward prediction requires that the future frames
that are to be used for backward prediction be encoded and
transmitted first, i.e., out of order.
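The reordering can be sketched as follows: B-frames are buffered until the future reference they depend on has been sent. The GOP pattern and frame labels are illustrative:

```python
# Convert display order to transmission (coding) order: each reference
# frame (I or P) is sent before the B-frames that depend on it.
def transmission_order(display):
    out, pending_b = [], []
    for f in display:
        if f.startswith("B"):
            pending_b.append(f)    # buffer until the future reference is sent
        else:
            out.append(f)          # send the reference frame first...
            out.extend(pending_b)  # ...then the buffered B-frames
            pending_b = []
    return out + pending_b

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(transmission_order(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```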

Fig 10.9: MPEG Frame Sequence.

Example encoding patterns:

Pattern 1: IPPPPPPPPPP
Dependency: I <---- P <---- P <---- P ...
• I-frame compressed independently
• First P-frame compressed using I-frame
• Second P-frame compressed using first P-frame
• And so on...


Example encoding patterns:

Pattern 2: I BB P BB P BB P BB P BB P BB P
Dependency: I <---- B B ----> P <---- B B ----> P ...
• I-frame compressed independently
• First P-frame compressed using I-frame
• B-frames between I-frame and first P-frame compressed
using I-frame and first P-frame
• Second P-frame compressed using first P-frame
• B-frames between first P-frame and second P-frame
compressed using first P-frame and second P-frame
• And so on...

Example encoding patterns:

Pattern 3: I BBB P BBB P BBB P BBB P
Dependency: I <---- B B B ----> P <---- B B B ----> P ...
• I-frame compressed independently
• First P-frame compressed using I-frame
• B-frames between I-frame and first P-frame compressed
using I-frame and first P-frame
• Second P-frame compressed using first P-frame
• B-frames between first P-frame and second P-frame
compressed using first P-frame and second P-frame
• And so on...

The quality of an MPEG-video

• The usage of the particular frame types defines the quality and the
compression ratio of the compressed video.
• I-frames increase the quality (and size), whereas the usage of B-
frames compresses better but also produces poorer quality.
• The distance between two I-frames can be seen as a measure of the
quality of an MPEG video.
• There is no defined limit to the number of consecutive B-frames that may
be used in a group of pictures; the optimal number is application dependent.
• Most broadcast-quality applications, however, have tended to use 2
consecutive B-frames (I, B, B, P, B, B, P, ...) as the ideal trade-off between
compression efficiency and video quality.

MC-based B-frame coding idea (summary)

The MC-based B-frame coding idea is illustrated in Fig. 10.8:
• Each MB from a B-frame will have up to two motion vectors
(MVs) (one from the forward and one from the backward
prediction).
• If matching in both directions is successful, then two MVs
will be sent and the two corresponding matching MBs are
averaged (indicated by ‘%’ in the figure) before comparing
to the Target MB for generating the prediction error.
• If an acceptable match can be found in only one of the
reference frames, then only one MV and its corresponding
MB will be used, from either the forward or backward
prediction.

• Fig 10.8: B-frame Coding Based on Bidirectional Motion
Compensation.
