Professional Documents
Culture Documents
Video-Coding Standards: - Overview - H.261 - H.263 - MPEG-1 - MPEG-2
Video-Coding Standards: - Overview - H.261 - H.263 - MPEG-1 - MPEG-2
• Overview
• H.261
• H.263
• MPEG-1
• MPEG-2
Fundamentals of Multimedia
2
Fundamentals of Multimedia
3
Fundamentals of Multimedia
4
Fundamentals of Multimedia
1 H.261
• H.261: An earlier digital video compression standard, its
principle of MC-based compression is retained in all later
video compression standards.
- The standard was designed for videophone, video conferencing and
other audiovisual services over ISDN.
- Require that the delay of the video encoder be less than 150 msec
so that the video can be used for real-time bidirectional video
conferencing.
5
Fundamentals of Multimedia
6
Fundamentals of Multimedia
7
Fundamentals of Multimedia
8
Fundamentals of Multimedia
• Two types of image frames are defined: Intra-frames (I-frames) and Inter-
frames (P-frames):
- I-frames are treated as independent images. Transform coding method similar
to JPEG is applied within each I-frame, hence “Intra”.
- P-frames are not independent: coded by a forward predictive coding method
(prediction from a previous P-frame is allowed — not just from a previous I-
frame).
- Temporal redundancy removal is included in P-frame coding, whereas I-frame
coding performs only spatial redundancy removal.
- To avoid propagation of coding errors, an I-frame is usually sent a couple of
times in each second of the video.
• Motion vectors in H.261 are always measured in units of full pixel and
they have a limited range of ± 15 pixels, i.e., p = 15.
9
Fundamentals of Multimedia
10
Fundamentals of Multimedia
11
Fundamentals of Multimedia
12
Fundamentals of Multimedia
• For a motion vector, the difference MVD between the motion vectors
of the preceding macroblock and current macroblock is sent for
entropy coding:
MVD = MVPreceding − MVCurrent
13
Fundamentals of Multimedia
Quantization in H.261
• The quantization in H.261 uses a constant step_size, for all DCT coefficients
within a macroblock.
• If we use DCT and QDCT to denote the DCT coefficients before and after the
quantization, then for DC coefficients in Intra mode:
QDCT round DCT round DCT
step _ size 8
for all other coefficients:
QDCT DCT DCT
step _ size 2* scale
scale — an integer in the range of [1, 31].
14
Fundamentals of Multimedia
15
Fundamentals of Multimedia
16
Fundamentals of Multimedia
17
Fundamentals of Multimedia
18
Fundamentals of Multimedia
19
Fundamentals of Multimedia
20
Fundamentals of Multimedia
21
Fundamentals of Multimedia
22
Fundamentals of Multimedia
2 H.263
• H.263 is an improved video coding standard for video
conferencing and other audiovisual services transmitted on
Public Switched Telephone Networks (PSTN).
• Aims at low bit-rate communications at bit-rates of less
than 64 kbps.
• Uses predictive coding for inter-frames to reduce temporal
redundancy and transform coding for the remaining signal
to reduce spatial redundancy (for both Intra-frames and
inter-frame prediction).
23
Fundamentals of Multimedia
24
Fundamentals of Multimedia
• The difference is that GOBs in H.263 do not have a fixed size, and they
always start and end at the left and right borders of the picture.
• Each QCIF luminance image consists of 9 GOBs and each GOB has 11 x 1
MBs (176 x 16 pixels), whereas each 4CIF luminance image consists of 18
GOBs and each GOB has 44 x 2 MBs (704 x 32 pixels).
25
Fundamentals of Multimedia
26
Fundamentals of Multimedia
27
Fundamentals of Multimedia
28
Fundamentals of Multimedia
Half-Pixel Precision
- The default range for both the horizontal and vertical components u and
v of MV(u, v) are now [−16, 15.5].
29
Fundamentals of Multimedia
30
Fundamentals of Multimedia
- When the motion vector points outside the image boundary, the value of
the boundary pixel that is geometrically closest to the referenced pixel is
used.
31
Fundamentals of Multimedia
- Four motion vectors (from each of the 8 x 8 blocks) are generated for each
macroblock in the luminance image.
32
Fundamentals of Multimedia
4. PB-frames mode:
- In H.263, a PB-frame consists of two pictures being coded as one unit.
- The use of the PB-frames mode is indicated in PTYPE.
- The PB-frames mode yields satisfactory results for videos with moderate
motions.
- Under large motions, PB-frames do not compress as well as B-frames and
an improved new mode has been developed in Version 2 of H.263.
H.263+
• The aim of H.263+: broaden the potential applications and offer additional
flexibility in terms of custom source formats, different pixel aspect ratio
and clock frequencies.
• H.263+ provides 12 new negotiable modes in addition to the four optional
modes in H.263.
- It uses Reversible Variable Length Coding (RVLC) to encode the difference
motion vectors.
- A slice structure is used to replace GOB to offer additional flexibility.
• H.263+ implements Temporal, SNR, and Spatial scalabilities.
- Support of Improved PB-frames mode in which the two motion vectors of the
B-frame do not have to be derived from the forward motion vector of the P-
frame as in Version 1.
- H.263+ includes deblocking filters in the coding loop to reduce blocking effects.
34
Fundamentals of Multimedia
H.263++
• H.263++ includes the baseline coding methods of H.263 and
additional recommendations for Enhanced Reference Picture
Selection (ERPS), Data Partition Slice (DPS), and Additional
Supplemental Enhancement Information.
35
Fundamentals of Multimedia
MPEG
• MPEG: Moving Pictures Experts Group, established in
1988 for the development of digital video.
36
Fundamentals of Multimedia
MPEG-1
• MPEG-1 adopts the CCIR601 digital TV format also known as
SIF (Source Input Format).
• MPEG-1 supports only non-interlaced video. Normally, its
picture resolution is:
– 352 × 240 for NTSC video at 30 fps
– 352 × 288 for PAL video at 25 fps
– It uses 4:2:0 chroma subsampling
• The MPEG-1 standard is also referred to as ISO/IEC 11172. It
has five parts:
• 11172-1 Systems
• 11172-2 Video
• 11172-3 Audio
• 11172-4 Conformance
• 11172-5 Software.
37
Fundamentals of Multimedia
39
Fundamentals of Multimedia
– Each MB from a B-frame will have up to two motion vectors (MVs) (one from
the forward and one from the backward prediction).
– If matching in both directions is successful, then two MVs will be sent and the
two corresponding matching MBs are averaged (indicated by ‘%’ in the figure)
before comparing to the Target MB for generating the prediction error.
– If an acceptable match can be found in only one of the reference frames, then
only one MV and its corresponding MB will be used from either the forward or
backward prediction.
40
Fundamentals of Multimedia
41
Fundamentals of Multimedia
43
Fundamentals of Multimedia
Fig : Slices in an
MPEG-1 Picture.
44
Fundamentals of Multimedia
– MPEG-1 quantization uses different quantization tables for its Intra and Inter
coding .
45
Fundamentals of Multimedia
46
Fundamentals of Multimedia
47
Fundamentals of Multimedia
MPEG-2
• MPEG-2: For higher quality video at a bit-rate of more than
4 Mbps.
50
Fundamentals of Multimedia
51
Fundamentals of Multimedia
52
Fundamentals of Multimedia
53
Fundamentals of Multimedia
54
Fundamentals of Multimedia
56
Fundamentals of Multimedia
This is the only mode that can be used for either Frame-
pictures or Field-pictures.
57
Fundamentals of Multimedia
– Due to the nature of interlaced video the consecutive rows in the 8×8
blocks are from different fields, there exists less correlation between
them than between the alternate rows.
– Alternate scan recognizes the fact that in interlaced video the vertically
higher spatial frequency components may have larger magnitudes
and thus allows them to be scanned earlier in the sequence.
58
Fundamentals of Multimedia
MPEG-2 Scalabilities
• The MPEG-2 scalable coding: A base layer and one or more enhancement
layers can be defined — also known as layered coding.
– The encoding and decoding of the enhancement layer is dependent on the base
layer or the previous enhancement layer.
60
Fundamentals of Multimedia
61
Fundamentals of Multimedia
SNR Scalability
• SNR scalability: Refers to the enhencement/refinement over the
base layer to improve the Signal-Noise-Ratio (SNR).
62
Fundamentals of Multimedia
Spatial Scalability
• The base layer is designed to generate bitstream of
reduced resolution pictures. When combined with the
enhancement layer, pictures at the original resolution
are produced.
Fig. : Encoder for MPEG-2 Spatial Scalability. (a) Block Diagram. (b) Combining
Temporal and Spatial Predictions for Encoding at Enhancement Layer.
66
Fundamentals of Multimedia
Temporal Scalability
• The input video is temporally demultiplexed into two pieces,
each carrying half of the original frame rate.
67
Fundamentals of Multimedia
Hybrid Scalability
• Any two of the above three scalabilities can be
combined to form hybrid scalability:
70
Fundamentals of Multimedia
Data Partitioning
• The Base partition contains lower-frequency DCT
coefficients, enhancement partition contains high-
frequency DCT coefficients.
• More restricted slice structure: MPEG-2 slices must start and end in
the same macroblock row. In other words, the left edge of a picture
always starts a new slice and the longest slice in MPEG-2 can have
only one row of macroblocks.
72
Fundamentals of Multimedia
73
Fundamentals of Multimedia
Exercises
4. B-frames provide obvious coding advantages, such as increase
in SNR at low bitrates and bandwidth savings. What are some
of the disadvantages of B-frames?
Answer:
The obvious one is the overhead — especially the time it needs to do
bidirectional search at the encoder side.
Also, it is hard to make a decision as how to use the results from both the
forward and backward predictions: when there are fast
motions/transitions, it can often be the case that only one of the motion
vectors is accurate. Therefore, a simple average of the difference MBs may
not yield good results.
74
Fundamentals of Multimedia
Answer:
As is, the B-frame is at the lowest level of the I-P-B hierarchy.
Any error made on P-frame motion compensation (MC) will
be carried over to B-frame MC. Therefore, B-frames are not
used as reference frames.
In principle, this can be done (and in fact it is done in later
standards.)
75