
Multimedia refers to a number of different integrated media, such as text, images, audio, and video, that are generated, stored, and transmitted digitally and can be accessed interactively.

Media types can be divided into two categories depending on their behavior with respect to time.
Two media types: Temporal and Non-Temporal
Temporal Media
The media has an associated time dimension: its presentation changes with respect to time.
Examples: audio, video, animation, music, etc.
Non-Temporal Media
Also known as static media, it has the same representation regardless of time.
Examples: text, graphics, paintings, books, etc.

The common data encoding schemes for text are:


• Plain text (ASCII) is text in an electronic format that can be read and interpreted by humans.

• Rich text is similar, but it also embeds special control characters into the text to provide
additional features.

• Hypertext is an advance on rich text that allows the reader to jump to different sections within
the document or even to a new document. Hypertext is inherently non-linear.

Texts can have two structures: linear and non-linear.

Linear
• imposes a strict linear progression on the reader
[Figure: a linear sequence of pages (page 1 → page 2 → page 3 → page 4), read one after another.]

Non-Linear
• non-linear structure
– blocks of text (pages)
– links between pages create a mesh or network
– users follow their own path through information
[Figure: a non-linear mesh of pages connected by links, with navigation aids such as home, bookmark, back, and external links.]

Hypermedia
• Hypermedia is the generalization of hypertext to include other kinds of media: images,
audio clips and video clips are typically supported in addition to text.

• Individual chunks of information are usually referred to as documents or nodes, and the
connections between them as links or hyperlinks; this is the so-called node-link hypermedia
model.
• The entire set of nodes and links forms a graph network. A distinct set of nodes and links
which constitutes a logical entity or work is called a hyperdocument.
• The WWW is an example of hypermedia.

Synchronization
From the definition of the term multimedia system we can derive that there is a synchronization
relationship between the information encoded in the various media. This relationship has to be
specified and maintained, because it is a prerequisite for the integrated processing, storage,
representation, communication, generation, and manipulation of multimedia information.

Synchronization creates a relationship between independent objects (pieces of information,
media, processes, data streams, LDUs). Synchronization between media objects includes
relationships between time-dependent and time-independent objects. An example of the
synchronization of continuous media from everyday life is the synchronization of the visual and
acoustic information in television broadcasting. A multimedia system has to produce a similar
synchronization of audio and motion-picture information.

An example of a temporal relationship between time-dependent and time-independent media is a
slide show. The presentation of the slides is synchronized with the accompanying audio
commentary. To implement a slide show in a multimedia system, we have to synchronize the
display of the pictures with the relevant sections of the audio stream.
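As a toy sketch of this kind of synchronization (the time stamps and slide names below are invented for illustration), a presentation could map points in the audio stream to the slide that must be visible from that point onward:

# Hypothetical synchronization points: (seconds into the audio stream, slide to show)
sync_points = [(0, "slide-1"), (12, "slide-2"), (35, "slide-3"), (58, "slide-4")]

def slide_at(audio_time: float) -> str:
    """Return the slide that should be visible at a given audio playback time."""
    current = sync_points[0][1]
    for start, slide in sync_points:
        if audio_time >= start:
            current = slide
    return current

print(slide_at(20.0))   # slide-2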
Time-dependent Presentation Units (Logical Data Units (LDUs))
Time-dependent media normally consist of a sequence of information units called logical data
units (LDUs). The LDUs of a media object can often be considered at several granularities.
For example, a symphony (see the figure below) can be composed of several movements. Each of
these movements is an independent part of the composition. It consists, in turn, of a sequence of
notes for various instruments. In a digital system, each note represents a sequence of audio
samples.
The granularity implies a hierarchical division of media objects. Often, there are two types of
hierarchies: The first hierarchy is a content hierarchy, based on the contents of the media objects.
In our symphony example, this would be the hierarchy of the symphony, the movements, and the
notes. The second hierarchy can be thought of as an encoding hierarchy, based on the data
encoding of the media object. In our symphony example, this could be a media object
representing a movement and divided into blocks and samples. The samples represent the lowest
level of the encoding hierarchy.

Event-based Synchronization
In event-based synchronization, presentation actions are initiated by synchronization events.
Typical presentation actions are:
• Start a presentation.
• Stop a presentation.
The events that initiate presentation actions can be external (e.g., generated by a timer) or
internal with regard to the presentation, in which case events are triggered when a continuous
media object reaches a specific logical data unit (LDU). This type of specification expands
easily to new synchronization types. Its main drawback is that it is difficult to handle in realistic
scenarios. The user may easily get confused by the kind of state transitions defined in the
synchronization specification, so that both the creation and the maintenance of the specification
are difficult.

Scripts
A script in this context is a textual description of a synchronization scenario. The elements of a
script are activities and subscripts. Scripts often grow into complete programming languages
implementing time operations. Scripts can relate to or make use of various specification
methods. A typical example is a script based on a fundamental hierarchical method, supporting
three main operations: the serial presentation, the parallel presentation, and the repeated
presentation of media objects.
Scripts offer many options, because they represent a complete programming environment. One
drawback is that these methods are procedural rather than declarative, while the declarative
approach appears to be easier to understand for non-expert users.
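As a rough illustration of such a script (the operation names serial, parallel, and repeat and the media object names are assumptions made for this sketch, not an actual scripting language):

# A tiny "script": serial, parallel, and repeated presentation of media objects.
script = (
    "serial",
    ("parallel", ("show", "intro_slide"), ("play", "intro_audio")),
    ("repeat", 2, ("play", "jingle")),
    ("show", "closing_slide"),
)

def run(node, depth=0):
    """Walk the script tree and print the presentation actions in order."""
    kind = node[0]
    if kind == "serial":                        # serial presentation: one after another
        for child in node[1:]:
            run(child, depth + 1)
    elif kind == "parallel":                    # parallel presentation: start together
        print("  " * depth + "start together:")
        for child in node[1:]:
            run(child, depth + 1)
    elif kind == "repeat":                      # repeated presentation
        for _ in range(node[1]):
            run(node[2], depth + 1)
    else:                                       # a primitive action such as show or play
        print("  " * depth + f"{kind} {node[1]}")

run(script)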
COMPRESSION

Reducing the volume of data to be exchanged is called compression. Compression is divided into
two broad categories:
1. Lossless compression
2. Lossy compression

Lossless Compression
In lossless compression, the integrity of the data is preserved because the compression and
decompression algorithms are exact inverses of each other. No part of the data is lost in the
process.
Lossless compression methods are normally used when we cannot afford to lose any data. For
example, we must not lose data when we compress a text file or an application program.

Lossless compression methods include:

• Entropy encoding
• Run-length coding
• Statistical encoding
• Dictionary coding
• Huffman coding
• Arithmetic coding

Run-length Coding
Run-length coding, sometimes referred to as run-length encoding (RLE), is the simplest
method of removing redundancy. It can be used to compress data made of any combination of
symbols. The method replaces a repeated sequence (a run) of the same symbol with two entities: a
count and the symbol itself. For example, the following shows how we can compress a string of
17 characters into a string of 10 characters.

AAABBBBCDDDDDDEEE → 3A4B1C6D3E

A modified version of this method can be used if there are only two symbols in the data, such as
a binary pattern made of 0s and 1s.

The compressed data (the run lengths) can then be encoded in binary using a fixed number of bits
for each count. For example, using 4 bits per count, the compressed data of this example can be
represented as 1100 0011 0000 1000, giving a compression ratio of 26/16, or about 1.62, for the
original 26-bit pattern.
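The character-based variant can be sketched in a few lines of Python (a minimal illustration; it reproduces the 3A4B1C6D3E example above):

def rle_encode(message: str) -> str:
    """Run-length encode a string as <count><symbol> pairs."""
    if not message:
        return ""
    encoded = []
    run_symbol, run_length = message[0], 1
    for symbol in message[1:]:
        if symbol == run_symbol:
            run_length += 1                               # the run continues
        else:
            encoded.append(f"{run_length}{run_symbol}")   # emit count + symbol
            run_symbol, run_length = symbol, 1
    encoded.append(f"{run_length}{run_symbol}")
    return "".join(encoded)

print(rle_encode("AAABBBBCDDDDDDEEE"))   # 3A4B1C6D3E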
Dictionary Coding
There is a group of compression methods based on creation of a dictionary (array) of strings in
the text. The idea is to encode common sequences of characters instead of encoding each
character separately.
The dictionary is created as the message is scanned, and if a sequence of characters that is an
entry in the dictionary is found in the message, the code (index) of that entry is sent instead of
the sequence.

Lempel-Ziv-Welch (LZW) was invented by Lempel and Ziv and refined by Welch.
The interesting point about this encoding technique is that the creation of the dictionary is
dynamic; it is created by the sender and the receiver during the encoding and decoding
processes; it is not sent from the sender to the receiver.

Encoding
The following shows the process:
1. The dictionary is initialized with one entry for each possible character in the message
(alphabet). At the same time, a buffer, which we call the string (S), is initialized to the first
character in the message. The string holds the largest encodable sequence found so far. In the
initialization step, only the first character in the message is encodable.
2. The process scans the message and gets the next character in the message.
a. If the concatenation of the string and the scanned character is in the dictionary, the
string is not the largest encodable sequence. The process updates the string by
concatenating the character at the end of it and waits for the next iteration.
b. If the concatenation of the string and the scanned character is not in the dictionary, the
largest encodable sequence is the string, not the concatenation of the two.
Three actions are taken.
First, the process adds the concatenation of the two as the new entry to the dictionary.
Second, the process encodes the string.
Third, the process reinitializes the string with the scanned character for the next iteration.
3. The process repeats step 2 while there are more characters in the message.

Let us show an example of LZW encoding using a text message in which the alphabet is made of
two characters: A and B. The figure shows how the text "BAABABBBAABBBBAA" is encoded
as 1002163670.

Note that the buffer PreS holds the string from the previous iteration before it is updated.
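The steps above can be sketched in Python (a minimal illustration, not an optimized implementation); with the dictionary initialized as A=0 and B=1 it reproduces the codeword sequence 1, 0, 0, 2, 1, 6, 3, 6, 7, 0 for the message above:

def lzw_encode(message: str, alphabet: str) -> list:
    """LZW encoding: the dictionary is built dynamically while scanning."""
    dictionary = {ch: i for i, ch in enumerate(alphabet)}   # step 1: initialize
    codes = []
    s = message[0]                       # the string S: largest encodable sequence so far
    for ch in message[1:]:               # step 2: scan the remaining characters
        if s + ch in dictionary:         # 2a: S + character is encodable, keep growing
            s += ch
        else:                            # 2b: emit code for S, add new entry, restart
            dictionary[s + ch] = len(dictionary)
            codes.append(dictionary[s])
            s = ch
    codes.append(dictionary[s])          # encode the final string
    return codes

print(lzw_encode("BAABABBBAABBBBAA", "AB"))   # [1, 0, 0, 2, 1, 6, 3, 6, 7, 0]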
Decoding
The following shows the process of decoding.
1. The dictionary is initialized as we explain in the encoding process. The first codeword is
scanned and, using the dictionary, the first character in the message is output.
2. The process then creates a string and sets it to the previous scanned codeword. Now it scans a
new codeword.
a. If the codeword is in the dictionary, the process adds a new entry to the dictionary,
which is the string concatenated with the first character from the entry related to the new
codeword. It also outputs the entry related to the new codeword.
b. If the codeword is not in the dictionary (which may happen occasionally), the process
concatenates the string with the first character from the string and stores it in the
dictionary. It also outputs the result of the concatenation.
3. The process repeats step 2 while there are more codewords in the code.
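The decoding steps can be sketched in the same way (again a minimal illustration); feeding it the codewords produced above recovers the original message:

def lzw_decode(codes: list, alphabet: str) -> str:
    """LZW decoding: rebuilds the same dictionary the encoder built."""
    dictionary = {i: ch for i, ch in enumerate(alphabet)}   # step 1: initialize
    output = dictionary[codes[0]]        # the first codeword is always in the dictionary
    s = output                           # the string from the previous iteration
    for code in codes[1:]:               # step 2: scan the remaining codewords
        if code in dictionary:           # 2a: known codeword
            entry = dictionary[code]
        else:                            # 2b: codeword not yet in the dictionary
            entry = s + s[0]
        dictionary[len(dictionary)] = s + entry[0]   # add the new entry
        output += entry
        s = entry
    return output

print(lzw_decode([1, 0, 0, 2, 1, 6, 3, 6, 7, 0], "AB"))   # BAABABBBAABBBBAA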
Huffman Coding
To compress data, we can consider the frequency of symbols and the probability of their
occurrence in the message. Huffman coding assigns shorter codes to symbols that occur more
frequently and longer codes to those that occur less frequently.
For example, imagine we have a text file that uses only five characters (A, B, C, D and E) with
frequencies of occurrence (20, 10, 10, 30, 30), i.e., probabilities (0.2, 0.1, 0.1, 0.3, 0.3).
Huffman Tree
To use Huffman coding, we first need to create the Huffman tree. The Huffman tree is a tree in
which the leaves of the tree are the symbols. It is made so that the most frequent symbol is the
closest to the root of the tree (with the minimum number of nodes to the root) and the least
frequent symbol is the farthest from the root.
1. We put the entire character set in a row. Each character is now a node at the lowest level of the
tree.
2. We select the two nodes with the smallest frequencies and join them to form a new node,
resulting in a simple two-level tree. The frequency of the new node is the combined frequencies
of the original two nodes. This node, one level up from the leaves, is eligible for combination
with other nodes.
3. We repeat step 2 until all of the nodes, on every level, are combined into a single tree.
4. After the tree is made, we assign bit values to each branch. Since the Huffman tree is a binary
tree, each node has a maximum of two children.
Coding Table
After the tree has been made, we can create a table that shows how each character can be
encoded and decoded.
The code for each character can be found by starting at the root and following the branches that
lead to that character. The code itself is the bit value of each branch on the path, taken in
sequence.
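As a small sketch of the whole procedure (tree building plus the coding table), the following Python code uses a heap to repeatedly merge the two least frequent nodes; for the five characters above it yields codes of lengths 2, 3, 3, 2, 2, and therefore the 2.2-bit average mentioned below. The exact bit patterns depend on how ties are broken, so they may differ from a particular figure:

import heapq

def huffman_codes(frequencies: dict) -> dict:
    """Build a Huffman tree and return the bit string assigned to each symbol."""
    # Each heap item: (frequency, tie_breaker, {symbol: code_so_far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # the two nodes with the smallest frequencies
        f2, i, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}          # 0 branch
        merged.update({s: "1" + c for s, c in right.items()})   # 1 branch
        heapq.heappush(heap, (f1 + f2, i, merged))              # combined node
    return heap[0][2]

freq = {"A": 0.2, "B": 0.1, "C": 0.1, "D": 0.3, "E": 0.3}
codes = huffman_codes(freq)
print(codes)
print(round(sum(freq[s] * len(c) for s, c in codes.items()), 2))   # 2.2 bits on average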

Encoding and Decoding


The figure shows how we can encode and decode using Huffman coding.

• Average number of bits per codeword = 2.2
• Entropy of the source = 2.17
• To represent the 5 characters with a fixed-length code we need the next power of 2, i.e., 8
combinations: log2 8 = 3 bits per character

Advantages:
• No code is a prefix of another.
• Shorter codes are assigned to symbols that occur more frequently and longer codes to those
that occur less frequently.
Disadvantage: Both encoder and decoder need the same encoding table; the tree cannot be created
dynamically.
Entropy of the source
The minimum average number of bits required to transmit a particular source stream is known as
the entropy of the source:
H(x) = −∑_{i=1}^{n} p_i log2(p_i)

Average number of bits per codeword = ∑_{i=1}^{n} N_i p_i

where p_i is the probability of symbol i and N_i is the length of its codeword.

Numerical

Given the probabilities of occurrence 0.25, 0.25, 0.125, 0.125, 0.125 and 0.125 of 6 different
characters M, N, O, P, Q and R respectively. The encoding algorithm under consideration uses the
following set of codes: M=10, N=11, O=010, P=011, Q=000 and R=001. Compute the following:

• Average number of bits per codeword

= 2 × 0.25 + 2 × 0.25 + 3 × 0.125 + 3 × 0.125 + 3 × 0.125 + 3 × 0.125
= 2 × (2 × 0.25) + 4 × (3 × 0.125) = 2.5 bits

• Entropy of the source

H(x) = −(0.25 log2 0.25 + 0.25 log2 0.25 + 0.125 log2 0.125 + 0.125 log2 0.125 + 0.125 log2 0.125 + 0.125 log2 0.125)

= −(2 × (0.25 log2 0.25) + 4 × (0.125 log2 0.125))

= 2.5 bits

• The minimum number of bits required per character assuming a fixed-length code

= log2 6 = 2.58, i.e., approximately 3 bits
(with 3 bits we get 2^3 = 8 combinations, enough for the 6 characters)
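A quick Python check of the figures above (probabilities and code lengths taken directly from the problem statement):

import math

probs   = [0.25, 0.25, 0.125, 0.125, 0.125, 0.125]   # M, N, O, P, Q, R
lengths = [2, 2, 3, 3, 3, 3]                          # codes 10, 11, 010, 011, 000, 001

average = sum(n * p for n, p in zip(lengths, probs))  # average bits per codeword
entropy = -sum(p * math.log2(p) for p in probs)       # entropy of the source
fixed   = math.ceil(math.log2(len(probs)))            # minimum fixed-length code size

print(average, entropy, fixed)                        # 2.5 2.5 3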
Lossy Compression
Lossless compression has limits on the amount of compression. However, in some situations, we
can sacrifice some accuracy to increase the compression rate. Although we cannot afford to lose
information in text compression, we can afford it when we are compressing images, video, and
audio.

Pulse Code Modulation (PCM)


The most common technique used to change an analog signal to digital data (digitization) is called
pulse code modulation (PCM). A PCM encoder has three processes, as shown in the figure below.

Sampling
The first step in PCM is sampling. The analog signal is sampled every Ts, where Ts is the sample
interval or period. The inverse of the sampling interval is called the sampling rate or sampling
frequency and denoted by fs, where fs = 1/Ts.
Quantization
The result of sampling is a series of pulses with amplitude values between the maximum and
minimum amplitudes of the signal. The set of amplitudes can be infinite with non-integral values
between the two limits. These values cannot be used in the encoding process. The following are
the steps in quantization:
1. We assume that the original analog signal has instantaneous amplitudes between Vmin and
Vmax.
2. We divide the range into L zones, each of height Δ (delta).

3. We assign quantized values of 0 to L − 1 to the midpoint of each zone.


4. We approximate the value of the sample amplitude to the quantized values.

As a simple example, assume that we have a sampled signal and the sample amplitudes are
between −20 and +20 V. We decide to have eight levels (L = 8). This means that Δ = 5 V. Figure
below shows this example.

Encoding: quantized values are encoded into bits. If the number of quantization levels is L, the
number of bits is nb = log2 L. In our example L is 8 and nb is therefore 3.
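A minimal Python sketch of the three PCM steps for the example above (Vmin = −20 V, Vmax = +20 V, L = 8, so Δ = 5 V; the sample values themselves are made up for illustration):

import math

def pcm_encode(samples, v_min=-20.0, v_max=20.0, levels=8):
    """Quantize each sample into one of `levels` zones and encode the zone in binary."""
    delta = (v_max - v_min) / levels             # height of each zone: 5 V here
    n_bits = int(math.log2(levels))              # bits per sample: log2(L) = 3
    codes = []
    for x in samples:                            # the samples come from the sampling step
        zone = int((x - v_min) / delta)          # which zone the amplitude falls into
        zone = min(max(zone, 0), levels - 1)     # clamp to the range 0 .. L-1
        codes.append(format(zone, f"0{n_bits}b"))
    return codes

print(pcm_encode([-6.1, 7.5, 16.2, -11.3]))      # ['010', '101', '111', '001']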
Predictive Coding
Predictive coding is used when we digitize an analog signal. Pulse code modulation (PCM) is a
technique that converts an analog signal to a digital signal using sampling. In PCM, samples are
quantized separately. Neighboring quantized samples, however, are closely related and have
similar values. Predictive coding uses this similarity: instead of quantizing each sample
separately, the differences are quantized. The differences are smaller than the actual samples and
thus require fewer bits. Many algorithms are based on this principle. Different types of
predictive coding are Delta Modulation (DM), Adaptive DM (ADM), Differential PCM (DPCM),
and Adaptive DPCM (ADPCM).

Delta Modulation (DM)


The simplest method in predictive coding is called delta modulation. Let xn represent the value
of the original function at sampling interval n and yn be the reconstructed value of xn. Figure
below shows the encoding and decoding processes in delta modulation.

In PCM the sender quantizes the samples (xn) and transmits them to the receiver. In delta
modulation, the sender quantizes en, the difference between each sample (xn) and the preceding
reconstructed value (yn–1). The sender then transmits Cn. The receiver reconstructs sample yn from
the received Cn.
The modulator, at each sampling interval, compares the value of the analog signal with the last
value of the staircase signal. If the amplitude of the analog signal is larger, the next bit in the
digital data is 1; otherwise, it is 0. The output of the comparator, however, also makes the
staircase itself. If the next bit is 1, the staircase maker moves the last point of the staircase signal
δ up; if the next bit is 0, it moves it δ down.

PCM needs to transmit 3 bits for each sample if the maximum quantized value is 7; DM reduces
the number of transmitted bits because it transmits a single bit (1 or 0) for each sample.
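The staircase behaviour can be sketched in a few lines of Python (a simplified illustration; the step size δ and the input samples are made-up values):

def delta_modulate(samples, delta=1.0, start=0.0):
    """Return the transmitted bits and the reconstructed staircase signal."""
    bits, staircase = [], []
    y = start                           # last reconstructed value (y_{n-1})
    for x in samples:
        bit = 1 if x > y else 0         # comparator: is the signal above the staircase?
        y += delta if bit else -delta   # staircase maker moves delta up or down
        bits.append(bit)
        staircase.append(y)
    return bits, staircase

bits, staircase = delta_modulate([0.4, 1.2, 2.1, 2.4, 1.8, 0.9])
print(bits)        # [1, 1, 1, 0, 0, 0]  -> one bit per sample
print(staircase)   # [1.0, 2.0, 3.0, 2.0, 1.0, 0.0]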

Adaptive DM (ADM)
The figure below shows the role of the step size Δ in delta modulation. In the region where Δ is
relatively small compared to the slope of the original function, the reconstructed staircase cannot
catch up with the original function; the result is an error known as slope overload distortion. On
the other hand, in the region where Δ is relatively large compared to the slope of the original
function, the reconstructed staircase oscillates widely around the original function
and causes an error known as granular noise.

Since most functions have regions with both large and small slopes, selecting a large value or a
small value for Δ decreases one type of error but increases the other type. Adaptive DM (ADM)
is used to solve the problem. In ADM, the value of Δ changes from one step to the next and is
calculated as shown below.
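The equation itself appears to be missing from this copy; in the usual ADM formulation, which the surrounding text seems to describe, the step size is scaled at every sampling interval:

Δn = Mn × Δn−1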
where Mn is called step-size multiplier and is calculated from the values of qn from a few
previous bits. There are many different algorithms for evaluating Mn; one simple algorithm is to
increase Mn by a certain percentage if qn remains the same and decrease it by a certain
percentage if qn changes. The adaptation can be further improved by delaying the coding process
to include knowledge about a few future samples in the evaluation of Mn.

Differential PCM (DPCM)


The differential PCM (DPCM) is the generalization of delta modulation. In delta modulation a
previously reconstructed sample yn–1 is called the predictor because it is used to predict the
current value. In DPCM, more than one previously reconstructed sample is used for prediction.
In this case the difference is evaluated as shown below.
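The equation appears to be missing from this copy; in the standard DPCM formulation that the following text describes, the difference is

en = xn − ∑_{i=1}^{N} ai yn−i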

where the summation is the predictor, ai is the predictor's coefficient (or weight), and N is the
order of the predictor. For DM, the order of the predictor is 1 and a1 = 1. The difference is
quantized as in DM and sent to the receiver. The receiver reconstructs the current value as shown
below.
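This equation also appears to be missing; under the same assumptions, the receiver forms

yn = ên + ∑_{i=1}^{N} ai yn−i

where ên is the dequantized difference recovered from the received code.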
Transform Coding
In transform coding, a mathematical transformation is applied to the input signal to produce the
output signal. The transformation needs to be invertible, to allow the original signal to be
recovered. The transformation changes the signal representation from one domain to another for
example time domain to frequency domain, which results in reducing the number of bits in
encoding.
The basic principle of transform coding is to map the pixel values into a set of linear transform
coefficients, which are subsequently quantized and encoded. By applying an inverse
transformation on the decoded transform coefficients, it is possible to reconstruct the image with
some loss. It must be noted that the loss is not due to the process of transformation and inverse
transformation, but due to quantization alone. Since the details of an image, and hence its spatial
frequency content, vary from one local region to another, better coding efficiency is obtained if
we apply the transformation to local areas of the image rather than applying a global
transformation to the entire image. Such local transformations require hardware of manageable
size, which can be replicated for parallel processing.
For transform coding, the first and foremost step is to subdivide the image into non-overlapping
blocks of fixed size.
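A small sketch of this first step in Python (the 8×8 block size is simply a common choice used here for illustration):

import numpy as np

def split_into_blocks(image, block_size=8):
    """Split a 2-D image into non-overlapping block_size x block_size blocks."""
    h, w = image.shape
    blocks = []
    for r in range(0, h - h % block_size, block_size):
        for c in range(0, w - w % block_size, block_size):
            blocks.append(image[r:r + block_size, c:c + block_size])
    return blocks

image = np.arange(64 * 64).reshape(64, 64)    # a dummy 64 x 64 "image"
print(len(split_into_blocks(image)))          # 64 non-overlapping 8 x 8 blocks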

Discrete Cosine Transform (DCT)


One of the popular transformations used in multimedia is called discrete cosine transform
(DCT).
One-Dimensional DCT
In one-dimensional DCT, the transformation is the matrix multiplication
of a column matrix p (source data) by a square matrix T (the DCT coefficients). The result is a
column matrix M (transformed data). Since the square matrix that represents the DCT coefficients
is an orthogonal matrix (its inverse and transpose are the same), the inverse transformation can be
obtained by multiplying the transformed data matrix by the transpose of the DCT coefficient
matrix. The figure below shows the transformation in matrix form, in which N is the size of matrix T
and T^T is the transpose of T.
Example
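The worked example itself appears to have been a figure; as a stand-in, here is a small numerical sketch in Python (the source values are made up). It builds the N × N DCT matrix T, transforms a source column p, and recovers p with the transpose:

import numpy as np

def dct_matrix(n):
    """Orthogonal N x N DCT matrix T (its inverse equals its transpose)."""
    t = np.zeros((n, n))
    for i in range(n):
        scale = np.sqrt(1 / n) if i == 0 else np.sqrt(2 / n)
        for j in range(n):
            t[i, j] = scale * np.cos((2 * j + 1) * i * np.pi / (2 * n))
    return t

T = dct_matrix(4)
p = np.array([20.0, 30.0, 40.0, 50.0])    # source data (made-up values)
M = T @ p                                  # transformed data: M = T p
print(np.allclose(T @ T.T, np.eye(4)))     # True: T is orthogonal
print(np.allclose(T.T @ M, p))             # True: the inverse transform recovers p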

Two-Dimensional DCT
Two-dimensional DCT is what we need for compressing images, audio, and video. The principle is
the same, except that the source data and transformed data are two-dimensional square matrices.
To achieve the transformation with the same properties as mentioned for the one-dimensional
DCT, we need to use the T matrix twice (T and TT). The inverse transformation also uses the T
matrix twice, but in the reverse order.
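Continuing the sketch above (again with made-up values), the 2-D DCT of an N × N block uses T twice, and the inverse applies the two matrices in the reverse order:

import numpy as np

def dct_matrix(n):
    """The same orthogonal DCT matrix as in the one-dimensional sketch."""
    t = np.zeros((n, n))
    for i in range(n):
        scale = np.sqrt(1 / n) if i == 0 else np.sqrt(2 / n)
        for j in range(n):
            t[i, j] = scale * np.cos((2 * j + 1) * i * np.pi / (2 * n))
    return t

T = dct_matrix(4)
P = np.arange(16, dtype=float).reshape(4, 4)   # a made-up 4 x 4 block of "pixels"
M = T @ P @ T.T                                # forward 2-D DCT: T applied twice
P_back = T.T @ M @ T                           # inverse 2-D DCT: reverse order
print(np.allclose(P_back, P))                  # True: the block is recovered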
