Video Compression Fundamentals: Pamela C. Cosman

You might also like

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 57

VIDEO COMPRESSION

FUNDAMENTALS

Pamela C. Cosman

1
Compressing Digital Video
 Exploit spatial redundancy within frames (like JPEG:
transforming, quantizing, variable length coding)
 Exploit temporal redundancy between frames
 Only the sun has changed position between these 2 frames

Previous Frame Current Frame

2
Simplest Temporal Coding - DPCM
 Frame 0 (still image)
 Difference frame 1 = Frame 1
– Frame 0
 Difference frame 2 = Frame 2
– Frame 1
 If no movement in the scene,
all difference frames are 0.
Can be greatly compressed! 0 1 2 3
 If movement, can see it in the
difference images

3
Difference Frames

 Differences between two frames can be


caused by
 Camera motion: the outlines of background or
stationary objects can be seen in the Diff Image
 Object motion: the outlines of moving objects can
be seen in the Diff Image
 Illumination changes (sun rising, headlights, etc.)
 Scene Cuts: Lots of stuff in the Diff Image
 Noise

4
Difference Frames

 If the only difference between two frames is


noise (nothing moved), then you won’t
recognize anything in the Difference Image
 But, if you can see something in the Diff
Image and recognize it, there’s still
correlation in the difference image
 Goal: remove the correlation by
compensating for the motion

5
6
7
8
Types of Motion
Frame n

 Translation: simple
movement of typically
rigid objects Frame n+1 Frame n+2
(Rotation) (Zoom)
 Camera pans vs.
movement of objects  Rotation: spinning
about an axis
 Camera versus object
rotation
 Zooms –in/out
 Camera zoom vs. object
Frame n Frame n+1
zoom (movement in/out)

9
Describing Motion

 Translational
 Move (object) from (x,y) to (x+dx,y+dy)
 Rotational
 Rotate (object) by (r rads) (counter/clockwise)
 Zoom
 Move (in/out) from (object) to increase its size by
(t times)

Which is easiest? Which are we most likely to


encounter?
10
Motion Estimation

 Determining parameters for the motion


descriptions
 For some portion of the frame, estimate its
movement between 2 frames- the current
frame and the reference frame
 What is some portion?
 Individual pixels (all of them)?
 Lines/edges (have to find them first)
 Objects (must define them)
 Uniform regions (just chop up the frame)
11
General Idea

 For a region PC in the current frame, find a


region PR in the search window in reference
frame so that Error(PR,PC) is minimized
 Issues: Error measures, search techniques,
choice of search window, choice of reference
frame, choice of region PC
Current
Portion
Reference of
Search Frame
Frame interest
window
PC

12
Block-based Motion Estimation

 PC is a block of pixels (in the current frame)


 The search window is a rectangular segment
(in the reference frame)

T=1 (reference) T=2 (current)


13
Motion Vectors
 A motion vector (MV) describes the offset between
the location of the block being coded (in the current
frame) and the location of the best-match block in
the reference frame

T=1 (reference) T=2 (current)


14
Motion Compensation

The blocks being predicted are on a grid

1 3 1 2 3 4
2 4
5 6 7 8
6 7 8
5
9 10 11 12
10
9 12
11
14 13 14 15 16

13 15 16

The blocks used for prediction are NOT


15
Motion Vector Search

 1. Mean squared error  Given error measure,


 Select a block in the how to efficiently
reference frame to determine best-match
minimize block in search window?
Σ(b(Bref)-b(Bcurr))2  Full search: best results,
 2. Mean abs. error most computation
 Logarithmic search –
 Select block to
heuristic, faster
minimize
 Hierarchical motion
Σ|b(Bref)-b(Bcurr)|
estimation

16
Motion Vector Search
 Logarithmic Search: First examine
 Full search: Evaluate positions marked 1.
every position in the  Choose best of these (lowest error
search window measure) and examine positions
marked 2 surrounding it
 Choose the best of these, and
examine the positions marked 3
 Final result = best of these

17
Hierarchical Motion Estimation

 Use an averaging filter on the image, then


downsample by a factor of 2
 Conduct a search on the downsampled
image (only ¼ of the size)
 Given the results of the search on the
downsampled image, return to the full
resolution image and refine the search there

18
Motion Compensation

 The standards do not specify HOW the


encoder will find the motion vectors (MVs)
 The encoder can use exhaustive/fast search,
MSE /MAE/other error metric, etc.
 The standard DOES specify
 The allowable syntax for specifying the MVs
 What the decoder will do with them
 What the decoder does is to grab the
indicated block from reference frame, and
glue it in place
19
Standard specifies bit stream
 The video compression  This allows future encoders of
standards define syntax and better performance to remain
semantics for the bit stream compatible with existing
between encoder and decoder decoders.
bit stream
 Also allows for commercially
ENCODER DECODER
secret encoders to be
compatible with standard
decoders
not this Standard defines not this
this Today’s Ho-Hum Today’s
Encoder Decoder
 Encoder is not specified by
MPEG except that it produces Tomorrow’s Nifty
a compliant bit stream Encoder
 Compliant decoder must Today’s decoder
interpret all legal MPEG bit Very secret still works!
streams Encoder

20
Motion Compensation Example

Frame n-1 Frame n

(0,0) (-16,0) (5,0) (0,0)

(0,0) (16,7) (5,2) (0,0)

(20,-24) (0,0) (-20,-18) (0,0) MOTION COMPENSATED


Frame n

21
Objects versus Macroblocks
 Real moving objects will not coincide with
boundaries of macroblocks

background Prediction error


Background well
encoded (no
moving object Moving object well encoded motion vector) Prediction error
with motion vector

 If encoder sends MV=(MotX,MotY), object well


coded, but background poorly coded
 If encoder sends MV=(0,0), background well coded,
but moving object poorly coded
 Either approach is valid
22
Motion Compensation

 This glued together frame is called


the motion compensated frame
 The encoder can also form the difference between
the motion compensated frame and the actual
frame.
 This is called the motion compensated difference
frame
 This difference frame formed using MC should have
less correlation between pixels than the difference
frame formed without using MC

23
Motion Compensated Difference
Frames
 Suppose we are doing lossless coding
 Encoder has sequence of frames: …, F(n-2), F(n-1)
 Next: encode F(n)
 Past frames have been losslessly encoded, so the
decoder knows F(n-1) perfectly already
 Encoder sends the motion vectors for frame F(n)
relative to frame F(n-1), to form motion
compensated frame M(n)
 Encoder knows M(n), Decoder knows M(n)

24
Motion Compensation Example

F(n-1) F(n)
M(n)

(0,0) (-16,0) (5,0) (0,0)

(0,0) (16,7) (5,2) (0,0)

(20,-24) (0,0) (-20,-18) (0,0)


MOTION COMPENSATED
Frame

25
Encoding Difference Frames
 Encoder forms motion  With no motion compensation
compensated diff frame: encoder could do frame diff:
MCD(n) = F(n) – M(n) FD(n) = F(n) – F(n-1)
 Encoder losslessly  Encoder losslessly
encodes MCD(n) encodes FD(n)
 Decoder can then do  Decoder can then do
F(n) = MCD(n) + M(n) F(n) = FD(n) + F(n-1)
→ knows F(n) exactly → knows F(n) exactly

If successive frames are very similar:


fewer bits to send Motion Vectors + MCD(n) instead of FD(n)
fewer bits to send FD(n) instead of F(n)
26
Motion compensated difference frames
 Decoder knows F(n-1) and, once you send the
motion vectors, it knows M(n)
Send FD(n)

Reference Frame Original Frame Difference Image


F(n-1) F(n) FD(n)=F(n)-F(n-1)

Send Motion
Vectors Send MCD(n)
Motion compensated Motion compensated
frame M(n) difference image
MCD(n) =F(n) – M(n)

27
Motion Compensated Difference
Frames
 But we are NOT doing lossless coding
 Encoder has sequence of frames: …, F(n-2), F(n-1)
 Next: encode F(n)
 Past frames have been lossy encoded, so the
decoder has versions …, G(n-2), G(n-1)
 Encoder knows …, G(n-2), G(n-1) also
 Encoder sends the motion vectors for frame F(n)
relative to frame G(n-1), to form motion
compensated frame M(n)
28
Encoding Difference Frames
 Encoder forms motion compensated
difference frame: MCD(n) = F(n) – M(n)
 Encoder lossy encodes MCD(n)
 Call the decoder version MCD*(n)
 If the decoder received MCD(n) exactly,
could do: F(n) = MCD(n) + M(n)
 But with MCD*(n), decoder can do
G(n) = MCD*(n) + M(n)
→ knows F(n) approximately
29
Motion estimation philosophy

 Goal of motion estimation is NOT to provide a


careful analysis of the actual motion
 Goal is to achieve a given quality of representation
of the video while globally minimizing the bit rate
required to send
 The motion information
 The prediction error information
 Most of the time, for a given representation quality
 fewer bits to send MV+MCD(n) instead of sending FD(n)
 fewer bits to send FD(n) instead of sending F(n) itself.

30
Motion Compensation for Chrominance

 Luminance is highly correlated, more so than


chrominance
 The “best” motion vectors are available by
searching in the luminance plane
 Motion vectors for chrominance are not
computed separately, simply scaled as
needed

31
Motion Estimation/Compensation
Summary
 At the encoder:
 For each block in the frame being coded, examine
the search window(s) in the reference frame to
find the best match block (do this for luminance
only)
 Form the MC difference image = original image
minus motion compensated image
 Scale the motion vectors for the chrominance,
form the motion compensated chrominance
frames, and form chrominance difference image

32
Motion Estimation/Compensation
Summary
 At the decoder:
 Decode the reference frames (Y,Cr,Cb)
 For each block in a temporally coded Y frame, use
the motion vector to select a block from the
reference frame and glue it in place
 Add the Y difference image
 For each block in temporally coded Cr,Cb frames,
first scale the motion vector, then do the previous
2 steps with Cr and Cb data

33
Progress of Video Compression

34
Progress of Video Compression

35
Progress of Video Compression

36
Temporal Location of Reference

 The reference frame need not occur before


the temporally coded frames which use it

 Why? Scene changes, allow better matches

37
Flavors of Motion Estimation

 1. Forward predicted blocks: the best-match block


occurs in the reference frame before the block’s
frame
 2. Backward predicted blocks: the best-match block
occurs in the reference frame after the block’s frame
 3. Interpolatively predicted blocks: the best-match
block is the average of the best-match blocks from
reference frames before & after
 The motion compensation direction can be selected
independently for each block in a frame.

38
MPEG Frame Types

 Intra (I) pictures: coded by themselves, as


still images. No temporal coding. No motion
vectors.

39
MPEG Frame Types

 Forward Motion Compensated predicted (P)


pictures – forward motion compensated from
the previous I or P frame

40
MPEG Frame Types
 Motion Compensated interpolated (B) pictures –
forward, backward, and interpolatively motion
compensated from previous/next I/P frames

41
Motion Vector Coding

 How are the motion vectors actually encoded


for transmission to the decoder?
 Start by taking the difference between the current
motion vector and the most recent previous one of
the same type (forward/backward/interpolative)
 Encode the difference using variable length
coding
 Horizontal and vertical components coded
separately

42
MPEG Frame Structure Terminology

 A block contains 8x8 pixels


 The DCT unit
 A macroblock (MB) contains 4 blocks from
the luminance, plus the corresponding
chrominance blocks
 4 blocks from each of Cr/Cb if 4:4:4 format
 2 blocks from each of Cr/Cb if 4:2:2 format
 1 block from each of Cr/Cb if 4:1:1 or 4:2:0 format
 The motion compensation unit

43
MPEG Frame Structure Terminology

 A slice is a collection of macroblocks, tracing


in a raster scan from upper left to lower right
 The resynchronization unit
 A picture is a frame, either progressive (non-
interlaced) or interlaced
 The primary coding unit
 A Group of Pictures (GOP) contains ≥ 1
frame.
 The unit for random access into the sequence

44
MPEG GOP Structure

 A Group of Pictures (GOP) may contain


 All I pictures
 I & P pictures only
 I, P, & B Pictures
 A common GOP format for 30 frames/sec:
 I-picture spacing 15 frames (1/2 second)
 P-picture spacing 3 frames (1/10 second)
I B B P B B P B B P B B P B B I
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

45
Frame Ordering

 Display order (encoder input order):

B B I B B P B B P B B P B B P B B I
-1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

 But consider coding dependencies:


 Frame 2 (B) needs frame 4 (P) to be decoded first, etc.
 So better transmit frame 4 before frame 2

I B B P B B P B B P B B P B B I B B
1 -1 0 4 2 3 7 5 6 10 8 9 13 11 12 16 14 15

46
Types of Coding Modes

 What if the best-match block in the reference


frame is a great match?
 Then the motion vector is all you need to send
 What if it is a terrible match?
 Then don’t use the motion vector at all, just code
the block by itself, with something like JPEG
(called intra mode coding)
 What if it is a so-so match?
 Then you can send the MV, and also send the
frame difference information for that macroblock

47
Coding Mode I (Inter-Coding)

 Inter coding refers to coding with motion vectors

Macro
Block

Previous Current
Frame Frame
Motion Vector

48
Coding Mode II (Intra-Coding)

 INTRA coding refers to coding without motion vectors


 The MB is coded all by itself, in a manner similar to JPEG

Macro
Block

Previous Current
Frame Frame

49
I-Picture Coding

 Two possible coding modes for macroblocks


in I-frames
 Intra- code the 4 blocks with the current
quantization parameters
 Intra with modified quantization: scale the
quantization matrix before coding this MB
 All macroblocks in intra pictures are coded
 Quantized DC coefficients are losslessly
DPCM coded, then Huffman as in JPEG

50
I-Picture Coding

 Use of macroblocks modifies block-scan order:


8 8

 Quantized coefficients are zig-zag scanned and run-


length/Huffman coded as in JPEG
 Very similar to JPEG except (1) scaling Q matrix
separately for each MB, and (2) order of blocks

51
P-Picture Coding: many coding modes

 Motion compensated  MV = (0,0), just send


coding: Motion Vector only difference block
 Motion compensated  MV=(0,0), just send diff
coding: MV plus difference block, with modified
macroblock quantization scaling
 Intra: the MB is coded
Motion DCT with DCTs (no
Vector
difference is computed)
 Motion compensation: MV  Intra with modified
& difference MB with quantization scaling
modified quant. scaling

52
How to choose a coding mode?
 MPEG does not specify how to choose mode
 Full search = try everything…the different
possibilities will lead to different rate/distortion
outcomes for that macroblock

distortion

• MV, no difference


• MV, plus difference
• • Intra

rate

53
How to choose a coding mode?

 Tree search: use a decision tree


 For example:
 First find the best-match block in the search window. If
it’s a very good match, then use motion compensation.
Otherwise, don’t.
 If you decided to use motion compensation, then need
to decide whether or not to send the difference block as
well. Make decision based on how good a match it is.
 If you decided to send the difference block, then have to
decide whether or not to scale the quantization
parameter… check the current rate usage…

54
B-Picture Coding

 B pictures have even more possible modes:


 Forward prediction MV, no difference block
 Forward prediction MV, plus difference block
 Backward prediction MV, no difference block
 Backward prediction MV, plus difference block
 Interpolative prediction MV, no difference block
 Interpolative prediction MV, plus difference block
 Intra coding
 Some of above with modified Quant parameter

55
Group of Pictures

 IIIII…: Every picture is intra-coded.


 Fully decodable without reference to any other picture
 Editing is straightforward
 Requires about 2.5 more bit rate than bidirectional
 IBBPBBPB…: Forward and bidirectional
 Best compression factor
 Needs large decoder memory
 Hard to edit
 Most useful for final delivery of post-produced material
(e.g., broadcast) because no editing requirement

56
Group of Pictures

 IPPPPIPP…: Forward predicted only.


 Needs less decoder memory
 IBIBIB…: bidirectional compromise
 Some of the bit rate advantage of bidirectional coding
 Not nearly the full latency penalty of bidirectional
 Editable with moderate processing.
For example, if the video after a B picture needs to be
deleted, the B frame would not be decodable.
Solution is to decode the B frame first, re-encode it using
forward prediction only. Some quality loss.

57

You might also like