
Chapter 10

Basic Video Compression Techniques

10.1 Introduction to Video Compression
10.2 Video Compression with Motion Compensation
10.3 Video Compression Standard H.261
10.4 Video Compression Standard MPEG-1

Fundamentals of Multimedia, Chapter 10

The need for video compression


Introduction to Video Compression

• A video consists of a time-ordered sequence of frames,
i.e., images.

• An obvious solution is to compress video by applying an
image compression algorithm to each frame, for instance
compressing each frame as a JPEG image.
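As an illustration of this per-frame approach, the sketch below compresses each frame independently, using zlib as a stand-in for a real image codec such as JPEG (the frame data and sizes are purely illustrative):

```python
import zlib

# Per-frame ("Motion JPEG"-style) compression: every frame is coded
# independently; no information is shared between frames.
def compress_per_frame(frames):
    return [zlib.compress(bytes(f)) for f in frames]

# Three identical 8x8 grayscale "frames" (a completely static scene).
frame = [128] * 64
compressed = compress_per_frame([frame, frame, frame])

# Each frame costs the same number of bytes even though nothing changed:
# the temporal redundancy between frames is not exploited at all.
sizes = [len(c) for c in compressed]
print(sizes)
```

Every frame costs the same number of bits here, which is exactly the waste that the techniques in the rest of this chapter remove.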


Introduction to Video Compression

• Consecutive frames in a video are similar — temporal
redundancy exists.
• Significantly higher compression rates can be achieved by
exploiting temporal redundancy.


Introduction to Video Compression

Video compression utilizes two basic techniques:


• Intraframe compression
• Occurs within individual frames
• Designed to minimize the duplication of data in each picture
(Spatial Redundancy)
• Interframe compression
• Compression between frames
• Designed to minimize data redundancy in successive pictures
(Temporal redundancy)


Introduction to Video Compression

• Temporal redundancy arises when successive frames of
video display images of the same scene.
• It is common for the content of the scene to remain fixed
or to change only slightly between successive frames.

• Spatial redundancy occurs because parts of the picture
are often replicated (with minor changes) within a single
frame of video.


Temporal Redundancy

• Temporal redundancy is exploited so that not every
frame of the video needs to be coded independently as a
new image.
• It makes more sense to code only the changed
information from frame to frame, rather than coding the
whole frame.

With difference coding, only the first image (I-frame) is coded in its entirety. In the two
following images (P-frames), references are made to the first picture for the static elements, i.e.,
the house. Only the moving parts, i.e., the running man, are coded using motion vectors, thus
reducing the amount of information that is sent and stored.

Temporal Redundancy

• Temporal redundancy can be better exploited by
predictive coding based on previous frames.
• The motion of pixels and regions is predicted from frame to
frame, rather than predicting the frame as a whole.

• Compression proceeds by subtracting images:
the difference between the current frame and other
frame(s) in the sequence will be coded → small values and
low entropy, good for compression.
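The effect of subtracting frames can be sketched as follows; the frames and values are a toy example, but they show how the residual of a nearly static scene is mostly zeros, i.e., small values with low entropy:

```python
# Two consecutive "frames" of a nearly static scene: frame2 differs from
# frame1 in a single pixel (a small moving object).
frame1 = [[(3 * r + 5 * c) % 256 for c in range(8)] for r in range(8)]
frame2 = [row[:] for row in frame1]
frame2[3][3] = 200  # the only change between the frames

# Difference (residual) image: this is what gets coded instead of frame2.
diff = [[frame2[r][c] - frame1[r][c] for c in range(8)] for r in range(8)]

# Almost every residual value is zero -- low entropy, easy to compress.
nonzero = sum(1 for row in diff for v in row if v != 0)
print(nonzero)  # prints 1
```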


Temporal Redundancy


Pixel motion prediction

• It can be done even better by searching
for just the right parts of the image to
subtract from the previous frame.

• Practically, the coherency from frame to
frame is better exploited by observing that
groups of contiguous pixels, rather than
individual pixels, move together with the
same motion vector.

• Therefore, it makes more sense to predict
frame n+1 in the form of regions or blocks
rather than individual pixels.
Fundamentals of Multimedia, Chapter 10

Pixel motion prediction

• The pixel Cn(x, y) shown in frame n has moved to a new location
in frame n+1. Consequently, Cn+1(x, y) in frame n+1 is not the same as
Cn(x, y) but is offset by the motion vector (dx, dy).
• The small prediction error is e(x, y) = Cn+1(x, y) − Cn(x+dx, y+dy).
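The relation above can be checked numerically. In this sketch (toy 8x8 frames, and pure translation by a hypothetical motion vector is assumed), frame n+1 is built so that each pixel Cn+1(x, y) equals Cn(x+dx, y+dy); the error e(x, y) is then zero everywhere:

```python
H = W = 8
dx, dy = 1, 2  # hypothetical motion vector

# Frame n: an arbitrary gradient image, indexed as C[y][x].
Cn = [[10 * y + x for x in range(W)] for y in range(H)]

# Frame n+1 under pure translation: Cn1(x, y) = Cn(x+dx, y+dy)
# (wrapping at the borders to keep the toy example simple).
Cn1 = [[Cn[(y + dy) % H][(x + dx) % W] for x in range(W)] for y in range(H)]

# Prediction error e(x, y) = Cn1(x, y) - Cn(x+dx, y+dy): exactly zero under
# pure translation; real video gives small but nonzero residuals.
e = [[Cn1[y][x] - Cn[(y + dy) % H][(x + dx) % W] for x in range(W)]
     for y in range(H)]
print(all(v == 0 for row in e for v in row))  # prints True
```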

Video Compression with Motion Compensation

Steps of video compression based on Motion
Compensation (MC):
• MC-based prediction.
• Motion estimation (motion vector search).
• Derivation of the prediction error, i.e., the difference.


Motion Compensation

• It is an algorithmic technique used to predict a frame in a
video, given the previous and/or future frames.

• Motion compensation describes a picture in terms of the
transformation of a reference picture to the current
picture.

• The reference picture may be previous in time or even
from the future.


Motion Compensation

How it works
• It exploits the fact that, often, for many frames of a
movie, the only difference between one frame and
another is the result of either the camera moving or an
object in the frame moving.
• In reference to a video file, this means much of the
information that represents one frame will be the same
as the information used in the next frame.
• Using motion compensation, a video stream will contain
some full (reference) frames; then the only information
stored for the frames in between would be the
information needed to transform the previous frame into
the next frame.

Motion Compensation

• Each image is divided into macroblocks of size N x N.
• By default, N = 16 for luminance images. For
chrominance images, N = 8 if 4:2:0 chroma
subsampling is adopted.


Motion Compensation

Motion compensation is performed at the macroblock level.
• The current image frame is referred to as the Target Frame.
• A match is sought between the macroblock in the Target
Frame and the most similar macroblock in previous and/or
future frame(s) (referred to as Reference frame(s)).
• The displacement of the reference macroblock to the target
macroblock is called a motion vector MV.
• Figure 10.1 shows the case of forward prediction, in which
the Reference frame is taken to be a previous frame.

Fig. 10.1: Macroblocks and Motion Vector in Video Compression.

• MV search is usually limited to a small immediate
neighborhood — both horizontal and vertical displacements
in the range [−p, p].
This makes a search window of size (2p + 1) x (2p + 1).
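The exhaustive search over this (2p + 1) x (2p + 1) window can be sketched as below, using the sum of absolute differences (SAD) as the match criterion. The frame contents, macroblock size N = 4, and p = 3 are toy values (H.261 uses N = 16 for luminance):

```python
# Sum of absolute differences between an N x N target block at (ty, tx)
# and a candidate reference block at (ry, rx).
def sad(target, ref, ty, tx, ry, rx, N):
    return sum(abs(target[ty + i][tx + j] - ref[ry + i][rx + j])
               for i in range(N) for j in range(N))

# Exhaustive ("full") search over all displacements in [-p, p] x [-p, p].
def full_search(target, ref, ty, tx, N, p):
    H, W = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            ry, rx = ty + dy, tx + dx
            if 0 <= ry and ry + N <= H and 0 <= rx and rx + N <= W:
                cost = sad(target, ref, ty, tx, ry, rx, N)
                if cost < best_cost:
                    best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost

# Reference frame with a distinctive pattern; target = pattern moved by (2, 1).
ref = [[(3 * y + 7 * x) % 251 for x in range(16)] for y in range(16)]
target = [[ref[(y + 2) % 16][(x + 1) % 16] for x in range(16)] for y in range(16)]

mv, cost = full_search(target, ref, 4, 4, 4, 3)
print(mv, cost)  # (2, 1) 0
```

The search visits all (2p + 1)² = 49 candidate positions here; real encoders usually replace this brute-force scan with faster suboptimal searches.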

Size of Macroblocks

• Smaller macroblocks increase the number of blocks in the
target frame:
 – a larger number of motion vectors to predict the target frame;
 – this requires more bits to compress motion vectors,
 – but smaller macroblocks tend to decrease the prediction error.

• Larger macroblocks:
 – fewer motion vectors to compress,
 – but also tend to increase the prediction error, because larger
areas could possibly cover more than one moving region
within a single macroblock.
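A quick calculation shows how the motion-vector count grows as the macroblock shrinks; a CIF-sized 352 x 288 frame is used here purely as an example:

```python
# Number of macroblocks -- and hence motion vectors -- per frame for a
# 352 x 288 (CIF) frame at several macroblock sizes N.
W, H = 352, 288
counts = {N: (W // N) * (H // N) for N in (8, 16, 32)}
print(counts)  # {8: 1584, 16: 396, 32: 99}
```

Halving N quadruples the number of motion vectors to code, which is the bit-rate side of the trade-off described above.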

Video compression standard H.261

H.261: an early digital video compression standard; its
principle of MC-based compression is retained in all later
video compression standards.
• The standard was designed for videophone, video
conferencing and other audiovisual services over ISDN.
• The video codec supports bit-rates of p x 64 kbps,
where p ranges from 1 to 30 (hence also known as p * 64).
• The standard requires that the delay of the video encoder be
less than 150 msec so that the video can be used for real-
time video conferencing.

H.261 Frame Sequence

Two types of image frames are defined:
• Intra-frames (I-frames) and Inter-frames (P-frames):


H.261 Frame Sequence

• I-frames:
• These are intra-coded frames, where only spatial redundancy is used to
compress the frame.
• They are treated as independent images (can be reconstructed without any
reference to other frames).
• A transform coding method similar to JPEG is applied within each I-frame.

• I-frames require more bits for compression than predicted
frames → their compression ratio is not that high.


H.261 Frame Sequence

• P-frames:
• P-frames are predictive coded (forward predictive coding method),
exploiting temporal redundancy by comparing them with a
preceding reference frame.
• They are not independent (it is impossible to reconstruct them without the
data of another frame (I or P)).
• They contain the motion vectors and error signals.
• P-frames need less space than I-frames, because only the differences
are stored. However, they are expensive to compute, but are
necessary for compression.
• An important problem the encoder faces is when to stop predicting
using P-frames and instead insert an I-frame.
• An I-frame needs to be inserted where P-frames cannot give much
compression.
• This happens during scene transitions or scene changes, where the error
images are high.

Fig. 10.4: H.261 Frame Sequence.

• We typically have a group of pictures (GOP) — one I-frame followed by
several P-frames.
• The number of P-frames following each I-frame determines the size of
the GOP; it can be fixed or dynamic. Why can't this be too large?

H.261 Frame Sequence

• Temporal redundancy removal is included in P-frame
coding, whereas I-frame coding performs only spatial
redundancy removal.

• A lost P-frame usually results in artifacts that are folded
into subsequent frames. If an artifact persists over time,
then the likely cause is a lost P-frame.

• To avoid propagation of coding errors, an I-frame is
usually sent a couple of times in each second of the
video.

Intra-frame (I-frame) Coding

• Uses various lossless and lossy compression techniques,
like JPEG.
• Compression is contained only within the current frame.
• Simpler coding, but not enough by itself for high
compression; we can't rely on intra-frame coding alone.
• However, we also can't rely on inter-frame differences across a
large number of frames.
• So when errors get too large: start a new I-frame.

Intra-frame (I-frame) Coding

• Macroblocks are of size 16 x 16 pixels for the Y frame, and 8 x
8 for the Cb and Cr frames, since 4:2:0 chroma subsampling is
employed. A macroblock consists of four Y, one Cb, and one Cr
8 x 8 blocks.
• For each 8 x 8 block a DCT transform is applied; the DCT
coefficients then go through quantization, zigzag scan, and
entropy coding.
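As one concrete piece of that pipeline, the zigzag scan can be sketched as follows (the DCT and quantization steps are omitted). It reorders an 8 x 8 coefficient block from the DC term toward the high frequencies, so that the typically zero high-frequency coefficients cluster at the end of the sequence:

```python
# Zigzag scan of an 8 x 8 block: walk the anti-diagonals s = row + col,
# alternating direction, starting at the DC coefficient (0, 0).
def zigzag(block):
    out = []
    for s in range(15):
        diag = [(s - c, c) for c in range(s + 1) if s - c < 8 and c < 8]
        if s % 2 == 1:
            diag.reverse()  # odd diagonals run top-right -> bottom-left
        out.extend(block[r][c] for r, c in diag)
    return out

# Label each position by its row-major index to make the scan order visible.
block = [[8 * r + c for c in range(8)] for r in range(8)]
order = zigzag(block)
print(order[:10])  # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```

The resulting run of trailing zeros in a quantized block is exactly what the subsequent run-length/entropy coding exploits.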

Block Transform Encoding (I-frame)


Inter-frame (P-frame) Predictive Coding

The H.261 P-frame coding scheme is based on motion
compensation:
• For each macroblock in the Target frame, a motion
vector is allocated.
• After the prediction, a difference macroblock is
derived to measure the prediction error.
• Each of these 8 x 8 blocks goes through DCT,
quantization, zigzag scan and entropy coding
procedures.
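The second step above, deriving the difference macroblock once a motion vector has been found, can be sketched as follows (toy 4 x 4 blocks and frame contents; H.261 uses 16 x 16 luminance macroblocks):

```python
N = 4  # toy macroblock size

# Difference (residual) macroblock for a target block at (ty, tx), given a
# motion vector mv = (dy, dx) into the reference frame.
def residual_block(target, ref, ty, tx, mv):
    dy, dx = mv
    return [[target[ty + i][tx + j] - ref[ty + dy + i][tx + dx + j]
             for j in range(N)] for i in range(N)]

# Reference frame; target frame = reference shifted by (1, 2), so the block
# at (2, 2) in the target matches the block at (3, 4) in the reference.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
target = [[ref[(y + 1) % 8][(x + 2) % 8] for x in range(8)] for y in range(8)]

res = residual_block(target, ref, 2, 2, (1, 2))
# With a perfect match the residual is all zeros; it is this (near-zero)
# block, not the target block itself, that goes on to DCT and quantization.
print(all(v == 0 for row in res for v in row))  # prints True
```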


Inter-frame (P-frame) Predictive Coding

• The P-frame coding encodes the difference macroblock
(not the Target macroblock itself).

• Sometimes, a good match cannot be found, i.e., the
prediction error exceeds a certain acceptable level. The
MB itself is then encoded (treated as an Intra MB), and in
this case it is termed a non-motion-compensated MB.
• (To minimize the number of expensive motion estimation
calculations, they are only performed when the difference between two
blocks at the same position is higher than a threshold; otherwise the
whole block is transmitted.)
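The decision logic described above can be sketched as follows; the threshold values are hypothetical and illustrative, not taken from the H.261 specification:

```python
# Hedged sketch of the P-frame macroblock mode decision: skip the expensive
# motion search when the co-located difference is already small, and fall
# back to intra coding when no acceptable match exists.
ZERO_MV_THRESHOLD = 64   # illustrative, not normative
INTRA_THRESHOLD = 512    # illustrative, not normative

def choose_mode(colocated_cost, best_mc_cost):
    if colocated_cost <= ZERO_MV_THRESHOLD:
        return "inter, no motion search"   # block barely changed
    if best_mc_cost <= INTRA_THRESHOLD:
        return "inter, motion compensated" # code residual + motion vector
    return "intra"  # non-motion-compensated MB: code the MB itself

print(choose_mode(10, 0), choose_mode(200, 100), choose_mode(900, 800))
```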

• For a motion vector, the difference MVD is sent for
entropy coding.

Fig. 10.6: H.261 P-frame Coding Based on Motion Compensation.



Video compression standard MPEG

• MPEG: Moving Pictures Experts Group, established in
1988 for the development of digital video.

• MPEG compression is essentially an attempt to overcome
some shortcomings of H.261:
• H.261 only encodes video. MPEG-1 encodes video and
audio.
• H.261 only allows forward prediction. MPEG-1 has
forward and backward prediction (B-pictures).
• MPEG-1 was designed to allow fast forward and backward
search and synchronization of audio and video.

Motion Compensation in MPEG-1

As mentioned before, Motion Compensation (MC) based
video encoding in H.261 works as follows:
• In Motion Estimation (ME), each macroblock (MB) of
the Target P-frame is assigned a best matching MB
from the previously coded I- or P-frame: the prediction.
• Prediction error: the difference between the MB and
its matching MB, sent to DCT and its subsequent
encoding steps.
• The prediction is from a previous frame — forward
prediction.

Motion Compensation in MPEG-1

• Sometimes, areas of the current frame can be better
predicted from the next (future) frame. This might happen
because objects or the camera move, exposing areas not
seen in past frames.

The MB containing part of a ball in the Target frame cannot find a good matching MB in
the previous frame because half of the ball was occluded by another object. A match,
however, can readily be obtained from the next frame.

The Need for a Bidirectional Search

The problem here is that many macroblocks need information that is
not in the reference frame:
• Occlusion by objects affects differencing.
• It is difficult to track occluded objects, etc.
• MPEG uses forward/backward interpolated prediction.

• Using both frames increases the prediction accuracy during
motion compensation.
• The past and future reference frames can themselves be coded
as an I- or a P-frame.


MPEG B-Frames

• The MPEG solution is to add a third frame type, the
bidirectional frame, or B-frame.


MPEG B-Frames

• B-frames, also known as bidirectionally coded frames, are
intercoded and also exploit temporal redundancy.
• To predict a B-frame, both the previous (past) frame and the
next (future) frame are used.
• The coding of B-frames is more complex compared with I-
or P-frames, with the encoder having to make more decisions.


MPEG B-Frames

• To compute a matching macroblock, the encoder needs to
search for the best motion vector in the past reference
frame and also for the best motion vector in the future
reference frame. Two motion vectors are computed for
each macroblock.
• The macroblock gets coded in one of three modes:
• Forward predicted, using only the past frame
• Backward predicted, using only the future frame
• Interpolated, using both, by averaging the two predicted
blocks
• The case corresponding to the best macroblock match
and yielding the least entropy in the difference is chosen.
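The three-way choice can be sketched as follows. The 1-D "macroblocks" and values are a toy example, and the sum of absolute residuals stands in for the entropy measure used above:

```python
# Cost of coding the residual between a target block and a prediction.
def residual_cost(target, pred):
    return sum(abs(t - p) for t, p in zip(target, pred))

def best_b_mode(target, fwd_pred, bwd_pred):
    interp = [(f + b) // 2 for f, b in zip(fwd_pred, bwd_pred)]  # averaging
    costs = {
        "forward": residual_cost(target, fwd_pred),
        "backward": residual_cost(target, bwd_pred),
        "interpolated": residual_cost(target, interp),
    }
    return min(costs, key=costs.get)

# Toy 1-D "macroblocks": the target sits halfway between the two reference
# predictions, so the interpolated (averaged) mode gives the smallest residual.
fwd = [100, 100, 100, 100]
bwd = [120, 120, 120, 120]
target = [110, 110, 110, 110]
print(best_b_mode(target, fwd, bwd))  # interpolated
```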

Backward Prediction Implications

B-frames also necessitate reordering frames during
transmission, which causes delays:
• The order in which frames arrive at the encoder is known
as the display order. This is also the order in which the
frames need to be displayed at the decoder after decoding.
• B-frames induce a forward and backward dependency. The
encoder has to encode and send to the decoder both the
future and past reference frames before coding and
transmitting the current B-frame.
• Because of the change in the order, all potential B-frames
need to be buffered while the encoder codes the future
reference frame, requiring the encoder to deal with buffering
and also causing a delay during transmission.

Backward Prediction Implications

Ex: here, backward prediction requires that the future frames
that are to be used for backward prediction be encoded and
transmitted first, i.e., out of order.
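The reordering can be sketched as follows: B-frames are buffered until the future reference they depend on has been sent. The GOP pattern and frame labels are illustrative:

```python
# Convert display order to transmission (coding) order: each reference
# frame (I or P) is sent before the B-frames that depend on it.
def transmission_order(display):
    out, pending_b = [], []
    for f in display:
        if f.startswith("B"):
            pending_b.append(f)    # buffer until the future reference is sent
        else:
            out.append(f)          # send the reference frame first...
            out.extend(pending_b)  # ...then the buffered B-frames
            pending_b = []
    return out + pending_b

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(transmission_order(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```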

Fig 10.9: MPEG Frame Sequence.

Example encoding patterns:

Pattern 1: IPPPPPPPPPP
Dependency: I <---- P <---- P <---- P ...
• I-frame compressed independently
• First P-frame compressed using I-frame
• Second P-frame compressed using first P-frame
• And so on...


Example encoding patterns:

Pattern 2: I BB P BB P BB P BB P BB P BB P
Dependency: I <---- B B ----> P <---- B B ----> P ...
• I-frame compressed independently
• First P-frame compressed using I-frame
• B-frames between I-frame and first P-frame compressed
using I-frame and first P-frame
• Second P-frame compressed using first P-frame
• B-frames between first P-frame and second P-frame
compressed using first P-frame and second P-frame
• And so on...

Example encoding patterns:

Pattern 3: I BBB P BBB P BBB P BBB P
Dependency: I <---- B B B ----> P <---- B B B ----> P ...
• I-frame compressed independently
• First P-frame compressed using I-frame
• B-frames between I-frame and first P-frame compressed
using I-frame and first P-frame
• Second P-frame compressed using first P-frame
• B-frames between first P-frame and second P-frame
compressed using first P-frame and second P-frame
• And so on...

The quality of an MPEG-video

• The usage of the particular frame types defines the quality and the
compression ratio of the compressed video.
• I-frames increase the quality (and size), whereas the usage of B-
frames compresses better but also produces poorer quality.
• The distance between two I-frames can be seen as a measure of the
quality of an MPEG video.
• There is no defined limit to the number of consecutive B-frames that may
be used in a group of pictures; the optimal number is application dependent.
• Most broadcast-quality applications, however, have tended to use 2
consecutive B-frames (I, B, B, P, B, B, P, ...) as the ideal trade-off between
compression efficiency and video quality.

MC-based B-frame coding idea (summary)

The MC-based B-frame coding idea is illustrated in Fig. 10.8:
• Each MB from a B-frame will have up to two motion vectors
(MVs) (one from the forward and one from the backward
prediction).
• If matching in both directions is successful, then two MVs
will be sent and the two corresponding matching MBs are
averaged (indicated by ‘%’ in the figure) before comparing
to the Target MB for generating the prediction error.
• If an acceptable match can be found in only one of the
reference frames, then only one MV and its corresponding
MB will be used, from either the forward or backward
prediction.

• Fig 10.8: B-frame Coding Based on Bidirectional Motion
Compensation.
