
Audio Compression

1. Answer : Differential pulse code modulation (DPCM) : Differential Pulse Code Modulation
(DPCM) is a signal encoder that converts analog signals into digital signals. It encodes an audio
signal by encoding the difference between consecutive samples instead of the samples themselves.
It is a derivative of standard PCM and exploits the fact that, for most audio signals, the range of
the differences in amplitude between successive samples of the audio waveform is smaller than
the range of the actual sample amplitudes. Hence, if only the digitized difference signal is used to
encode the waveform, fewer bits are required than for a comparable PCM signal with the same
sampling rate. A DPCM encoder and decoder are shown in Figure 1.

Figure 1 : DPCM principle : encoder/decoder schematic


Operation of DPCM:
Encoder :
 The previously digitized sample is held in the register (R).
 The DPCM signal is computed by subtracting the current register contents (R₀) from the new
output of the ADC (PCM).
 The register value is then updated before transmission.
 DPCM = PCM - R₀

Decoder :
 The decoder simply adds the previously computed PCM value held in its register (R₀) to the
received DPCM signal (see the sketch below).
 R₁ = R₀ + DPCM
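
A minimal Python sketch of the encode/decode loop described above (illustrative only: it assumes
integer PCM samples and, unlike a real DPCM coder, does not quantize the difference signal):

def dpcm_encode(pcm_samples):
    # Register R0 holds the previously digitized sample.
    r0 = 0
    diffs = []
    for pcm in pcm_samples:
        diffs.append(pcm - r0)   # DPCM = PCM - R0
        r0 = pcm                 # update the register before transmission
    return diffs

def dpcm_decode(diffs):
    # The decoder adds each received difference to its register contents.
    r0 = 0
    pcm_samples = []
    for dpcm in diffs:
        r0 = r0 + dpcm           # R1 = R0 + DPCM
        pcm_samples.append(r0)
    return pcm_samples

samples = [100, 104, 103, 98, 101]
assert dpcm_decode(dpcm_encode(samples)) == samples

Because the difference signal is not quantized here, decoding is exact; in a real coder the quantized
differences are smaller than the raw samples, which is where the bit saving comes from.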

Limitation of DPCM:
 The ADC operation introduces a quantization error each time, and these errors accumulate in
the value stored in the register (R).

2. Answer : Third-order predictive DPCM : Third-order predictive DPCM (Differential Pulse Code
Modulation) is a technique used in multimedia to compress audio and video signals by exploiting the
redundancy between neighbouring samples. It works by predicting the current sample from a few
previous samples and then transmitting only the difference (prediction error) between the predicted
and actual values. This can significantly reduce the amount of data needed to represent the signal
while still maintaining an acceptable level of quality. It reduces the bit-rate requirement from 64 kbps
to 32 kbps. A third-order predictive DPCM signal encoder and decoder are shown in Figure 2.
Figure 2 : Third-order predictive DPCM signal encoder and decoder schematic

Operation of Third-Order Predictive DPCM :


 Proportions of the values held in registers R₁, R₂ and R₃ are subtracted from each new PCM
sample.
 The value in register R₁ is then transferred to R₂, R₂ to R₃, and the new predicted value goes
into R₁.
 The decoder operates in a similar way, adding the same proportions of the last three computed
PCM values to the received DPCM signal (a sketch follows).
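
The following Python sketch illustrates the register shifting and prediction described above. The
weights C1-C3 are illustrative assumptions (a real coder derives them from the statistics of typical
audio), and quantization of the difference signal is again omitted:

C1, C2, C3 = 0.5, 0.3, 0.2      # assumed prediction weights for R1, R2, R3

def predictive_encode(pcm_samples):
    r1 = r2 = r3 = 0.0
    diffs = []
    for pcm in pcm_samples:
        prediction = C1 * r1 + C2 * r2 + C3 * r3
        diffs.append(pcm - prediction)    # transmit only the prediction error
        r1, r2, r3 = pcm, r1, r2          # shift: R1 -> R2 -> R3, new value into R1
    return diffs

def predictive_decode(diffs):
    r1 = r2 = r3 = 0.0
    out = []
    for dpcm in diffs:
        pcm = dpcm + (C1 * r1 + C2 * r2 + C3 * r3)   # add the same proportions back
        out.append(pcm)
        r1, r2, r3 = pcm, r1, r2
    return out

The decoder recovers the input to within floating-point rounding because it maintains exactly the
same three registers as the encoder.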
3. Answer : Perceptual coding : Perceptual coding is a compression technique that exploits the
characteristics of the human senses, primarily hearing and vision, to prioritize information based on
what we are likely to perceive. The goal is to achieve high compression ratios while maintaining
perceptual transparency, meaning the compressed content appears nearly identical to the original to
the human user. Perceptual encoders have been designed for the compression of general audio, such
as that associated with a digital television broadcast. They also use a model but, in this case, it is
known as a psychoacoustic model, since its role is to exploit a number of the limitations of the human ear.

Sensitivity of the ear:


 The dynamic range of the ear is the ratio of the loudest sound it can hear to the quietest.
 The sensitivity of the ear varies with the frequency of the signal, as shown in Figure 3.
 The ear is most sensitive to signals in the range 2-5 kHz; signals in this band are therefore the
quietest the ear can detect.
 The vertical axis gives all other signal amplitudes relative to this (2-5 kHz) signal.
 In Figure 3, although signals A and B have the same relative amplitude, only signal A would
be heard, because it is above the hearing threshold while B is below it.

Figure 3 : Sensitivity of the ear as a function of frequency


Frequency Masking : Frequency masking is based on a phenomenon called auditory masking, in
which a louder sound at one frequency makes it harder to hear a quieter sound at a nearby frequency.
When an audio signal consisting of multiple frequency components is present, the sensitivity of the
ear changes, varying with the relative amplitudes of the components.

Figure 4 : Frequency masking


Conclusions from Figure 4 :
 Signal B is larger than signal A. This causes the basic sensitivity curve of the ear to be distorted
in the region of signal B.
 Signal A will no longer be heard, as it falls within the distortion band.

Variation of frequency masking effect with frequency:


 The masking effects at various frequencies, e.g. 1, 4 and 8 kHz, are shown.
 The width of the masking curve (the range of frequencies affected) increases with increasing
frequency.
 The width of each curve at a particular signal level is known as the critical bandwidth for that
frequency.
 For frequencies greater than 500 Hz, the critical bandwidth increases approximately linearly in
multiples of 100 Hz (a rough sketch follows).
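
A rough Python sketch of these two ideas, critical bandwidth and masking. The 0.2 × f slope above
500 Hz is an assumption consistent with the "multiples of 100 Hz" rule of thumb, and the masking
test is deliberately crude; real masking curves are level-dependent:

def critical_bandwidth(f_hz):
    # ~100 Hz below 500 Hz; above that it grows roughly linearly
    return 100.0 if f_hz < 500 else 0.2 * f_hz

def is_masked(masker_hz, masker_db, tone_hz, tone_db):
    # A quieter tone lying within the masker's critical band is treated
    # as inaudible, like signal A next to the louder signal B in Figure 4.
    within_band = abs(tone_hz - masker_hz) <= critical_bandwidth(masker_hz) / 2
    return within_band and tone_db < masker_db

print(is_masked(1000, 60, 1050, 40))   # True: nearby and quieter
print(is_masked(1000, 60, 3000, 40))   # False: outside the critical band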
4. Answer : MPEG Audio Coder : The Moving Picture Experts Group (MPEG) was formed by the
ISO to formulate a set of standards relating to a range of multimedia applications that involve the use
of video with sound. The coders associated with the audio compression part of these standards are
known as MPEG audio coders.

Figure 5 : MPEG perceptual coder schematic with encoder/decoder implementation


Brief description of the MPEG encoder/decoder :
 The audio input signal is first sampled and quantized using PCM.
 The bandwidth available for transmission is divided into a number of frequency subbands using
a bank of analysis filters.
 Analysis filter bank:
 Maps each set of 32 (time-related) PCM samples into an equivalent set of 32 frequency
samples.
 Determines the peak amplitude in each subband (consisting of 12 frequency components);
this is known as the scaling factor.
 Processing associated with both frequency and temporal masking is carried out by the
psychoacoustic model.
 In the basic encoder, the time duration of each sampled segment of the audio input signal is
equal to the time needed to accumulate 12 successive sets of 32 PCM samples.
 The 12 sets of 32 PCM time samples are converted into frequency components using the DFT.
 The output of the psychoacoustic model is a set of what are known as signal-to-mask ratios
(SMRs), which indicate those frequency components whose amplitude is below the audible
threshold.
 This is done so that more bits are allocated to the regions of highest sensitivity than to the less
sensitive regions (see the sketch after this list).
 In the encoder, all the frequency components are carried in a frame.
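
An illustrative Python sketch of the per-subband processing just described: computing a scaling
factor for each subband and allocating bits according to the signal-to-mask ratios. The 32 × 12
layout matches the text, but the SMR handling and the proportional allocation rule are placeholders,
not the real MPEG psychoacoustic model:

def scale_factors(subbands):
    # subbands: 32 lists of 12 subband samples each; the scaling factor
    # is the peak amplitude found in each subband
    return [max(abs(s) for s in band) for band in subbands]

def allocate_bits(smrs, bit_pool=256):
    # Subbands with SMR <= 0 are below the audible (masking) threshold
    # and get no bits; the rest share the pool in proportion to SMR,
    # so the most sensitive regions receive the most bits.
    audible = [max(smr, 0.0) for smr in smrs]
    total = sum(audible)
    if total == 0:
        return [0] * len(smrs)
    return [int(bit_pool * a / total) for a in audible]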
Video Compression
1. Answer : Different types of frames :
I-Frames (Intra-Frames):
 I-Frames are standalone frames that do not depend on any other frames for decoding. They
serve as reference points for the video compression process and contain complete information
about a particular moment in the video.
 They are larger in size compared to P-Frames and B-Frames.
 They are crucial for random access and error recovery since they provide a starting point for
decoding a sequence of frames.
 I-Frames are typically encoded independently, making them more resilient to transmission
errors.
P-Frames (Predictive Frames):
 P-Frames are predicted frames that depend on previous I-Frames or P-Frames for
reconstruction. They store motion vectors together with the differences between the current
frame and the reference frames.
 They are more efficient in terms of compression compared to I-Frames since they only contain
the changes from the previous frames.
 P-Frames are suitable for representing motion in the video, making them essential for achieving
good compression ratios.

B-Frames (Bidirectional Frames):


 B-Frames use both previous and future frames as references, allowing for more efficient
compression by representing the differences between the current frame and frames on both
sides.
 B-Frames offer the highest compression efficiency as they take advantage of temporal
redundancy in both directions.
 They are computationally more intensive during encoding but contribute to reducing the overall
bit rate of the video.
PB-Frames (Bi-Directional Predictive Frames):
 A PB-frame combines the characteristics of both P-Frames and B-Frames: it encodes two
pictures, a P-frame and the B-frame that immediately precedes it, as a single unit, using motion
compensation with both past and future frames.
 PB-Frames offer a compromise between the compression efficiency of B-Frames and the
simplicity of P-Frames.
 They are used in video coding standards such as H.263 to strike a balance between compression
performance and decoding complexity.
Figure 6 : Example frame sequences with: (a) I- and P-frames only; (b) I-, P- and B-frames;
(c) PB-frames.
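
To make the frame dependencies concrete, here is a small Python sketch of a hypothetical
display-order sequence and the corresponding transmission/decode order. Because a B-frame needs
the following I- or P-frame as a reference, the encoder sends that reference first:

display_order = ["I1", "B1", "B2", "P1", "B3", "B4", "P2"]

def decode_order(frames):
    out, held_b = [], []
    for f in frames:
        if f.startswith("B"):
            held_b.append(f)       # hold until the future reference arrives
        else:
            out.append(f)          # I- or P-frame reference
            out.extend(held_b)     # held B-frames are now decodable
            held_b = []
    return out + held_b

print(decode_order(display_order))
# ['I1', 'P1', 'B1', 'B2', 'P2', 'B3', 'B4']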

2. Answer :
Short note on H.261 :
Released in 1988, H.261 was the first widely used standard for video coding. It was created for use
with Integrated Services Digital Network (ISDN) connections for video conferencing.
Key Features:
Resolution and Frame Rate: The common video-conferencing resolutions QCIF (Quarter Common
Intermediate Format) and CIF (Common Intermediate Format) are supported by H.261, at frame
rates of up to 30 frames per second.
Encoding Techniques: The discrete cosine transform (DCT) is used for spatial compression and
motion compensation is used for temporal prediction in H.261 video coding.
Bit Rate: The standard operates at bit rates of p × 64 kbit/s (p = 1 to 30), matching the channel
capacities of ISDN.
Intra-Frame and Inter-Frame Coding: Like numerous other video compression standards, H.261
uses both intra- and inter-frame coding to exploit spatial and temporal redundancy.

Short note on H.263 :


A development of H.261, H.263 was first released in 1996 with the purpose of facilitating video
conferencing over low-bitrate communication channels, such as the internet and mobile networks.
Key Features:
Improved Compression Efficiency: H.263 offers better compression efficiency than H.261, making
it more suitable for low-bitrate applications.
Enhanced Video Coding Techniques: H.263 introduces advanced techniques such as half-pixel
motion compensation, which refines the accuracy of motion prediction.
Error Resilience: H.263 includes features for error resilience, making it more robust in the presence
of packet loss or transmission errors.
Rate Control: The standard supports rate-control mechanisms to regulate the bit rate, improving
adaptability to varying network conditions.
Wider Range of Picture Formats: H.263 supports picture formats from sub-QCIF up to 16CIF,
offering flexibility in adapting to different video resolutions.

Short note on MPEG-1:


These days, MPEG-1 video is commonly used for transferring video files over the Internet, playing
back and storing videos on PCs, and storing videos on CDs. For example, MPEG-1 is utilized in Video
CDs.
Key Features:
 Designed for low-bitrate video compression (around 1.5 Mbit/s, the data rate of a CD).
 Supported video resolutions such as SIF (Source Input Format: 352 × 288 at 25 fps or
352 × 240 at 30 fps).
 Included support for both constant and variable bit rate encoding.
 Used discrete cosine transform (DCT) for spatial compression.
 Included audio compression (Layer 3 audio), leading to the popular MP3 audio format.
Usage: MPEG-1 was widely adopted for Video CDs, early digital video applications, and multimedia
content on the internet.
Short note on MPEG-2:
Digital television signals transmitted by cable, direct-broadcast satellite TV systems, and terrestrial
(over-the-air) broadcasters are typically in MPEG-2 format. It also defines the format for films and
other content distributed on DVDs and similar discs.
Key Features:
 Designed for higher bitrates and improved video quality compared to MPEG-1.
 Supported a broader range of video resolutions and frame rates, making it suitable for broadcast
television and DVD.
 Introduced interlaced video support for broadcast applications.
 Included support for multiple audio channels and various audio coding options.
Usage: MPEG-2 became the standard for digital television broadcasting, DVD video, and digital
storage of video content.
Short note on MPEG-4:
MPEG-4 is a group of standards that define how to compress digital audio and video (AV) data.
Introduced in late 1998, it was designated as a standard for a collection of related technologies as
well as audio and video coding formats.
Key Features:
 Designed for a wide range of applications, including streaming media, video conferencing, and
interactive multimedia.
 Introduced object-based coding, enabling more efficient coding of video objects.
 Supported a variety of multimedia content types, including 2D and 3D graphics, text, and
audio.
 Advanced video coding techniques, including efficient compression algorithms and error
resilience features.
Usage: MPEG-4 is widely used for online streaming, video conferencing, multimedia applications,
and mobile video. It serves as a foundation for various codecs and formats, including H.264/AVC
(Advanced Video Coding) and AAC (Advanced Audio Coding).

Short note on MP@ML:


MP@ML is the acronym for "Main Profile at Main Level". Among the profiles and levels specified
in MPEG-2, Main Profile at Main Level denotes the standard configuration used for a number of
applications, such as digital television broadcasting, DVD video, and streaming video.
Here's a brief explanation:
 Profile: A profile in video compression standards like MPEG-2 specifies the set of coding tools
and features that can be used while encoding. The "main profile" includes a typical set of tools
appropriate for a broad range of applications.
 Level: The level specifies constraints on parameters such as maximum bit rate, resolution, and
processing capability. The "main level" represents a baseline set of constraints
(standard-definition resolution at up to 15 Mbit/s) that is commonly supported across a broad
range of devices and applications.
