
1. Multimedia Basics and Text

and software engineers.

ii) What is noise in relation to multimedia data? For each modality (audio, photographic image and digital video), give one example of a possible source of noise and how this noise might manifest itself in the data.

Noise is unwanted, meaningless variation in the data, often introduced by data corruption in the recording or sampling process, or picked up from extraneous sources during recording.

Audio (one of each):

Source: analog-to-digital or digital-to-analog conversion, aliasing, interference, background noise, analog hiss/hum, transmission artifacts, compression artifacts.
Manifestation: hiss, hum, stuttering, jitter, ringing of high frequencies, muffled sound.

Photographic image (one of each):

Source: analog-to-digital or digital-to-analog conversion, aliasing, interference, transmission artifacts, compression artifacts.
Manifestation: salt-and-pepper noise, graininess, blocky pictures, low image contrast.

Digital video (one of each):

Source: analog-to-digital or digital-to-analog conversion, aliasing, interference, transmission artifacts, compression artifacts.
Manifestation: per frame, salt-and-pepper noise, graininess, blocky pictures, low image contrast; across frames, video jitter and missing frames.
iii) Explain basics of Multimedia and differentiate between discrete and
continuous media.

Discrete Media: refers to media involving the space dimension only (e.g. still images, text and graphics). Discrete media is also referred to as static media, space-based media or non-temporal media.

Continuous Media: refers to time-based media (e.g. sound, movies and animation). Continuous media is also referred to as dynamic media, time-based media or temporal media.

iv) Discuss the typographic etiquette and typographic goals

Typographic Etiquette
1. Select a font.
2. Modify the font size.
3. Scale your headings.
4. Set line spacing.
5. Add tracking and kerning to give the text more room.
6. Add white space between headers and the body text.
7. Use a line length of 45-50 characters.
Typographic goals
I. To remain invisible to the reader.
II. To increase clarity and readability.
III. To subtly indicate the voice and tone of the speaker.

v) Differentiate between Hypertext and Hypermedia with respective examples.


Hypertext vs. Hypermedia:
1. Hypertext refers to a system of managing information related to plain text. Hypermedia refers to connecting hypertext with other media such as graphics, sounds, and animations.
2. Hypertext simply allows users to move from one document to another with a single click on a hypertext ("go to") link. Hypermedia extends the ability of hypertext and allows the user to click text or any other multimedia element to move to another page.
3. Hypertext invites the user to move around a document as well as from one page of a document to another. Hypermedia attracts the user more than hypertext, as it gives more flexibility of movement.
4. Hypertext provides a poorer user experience than hypermedia. Hypermedia provides a better user experience than hypertext.
5. Hypertext example: reading a blog on a website and clicking "go to" links to move to the next part. Hypermedia example: reading an article on a website, where clicking on an image takes the user to its associated page.

vi) Define
a) Multimedia System
b) Typography
c) Typefaces

a) Multimedia System

Multimedia means that computer information can be represented through audio, video, and animation in addition to traditional media (i.e. text, graphics, drawings, images). A multimedia system is a computer system capable of processing, storing, and presenting these media types in an integrated way.

b) Typography

Typography is the art and technique of arranging type to make written language legible, readable and appealing. Although font and typeface are often used interchangeably, font refers to the physical embodiment (whether it's a case of metal pieces or a computer file) while typeface refers to the design (the way it looks).

c) Typefaces
Typeface refers to a group of characters, letters and numbers that share the same design; Garamond, Times, and Arial are typefaces. For example, Arial is a typeface, while 16 pt Arial Bold is a font. So the typeface is the creative part and the font is the structure.

2. Color and Image

i) Different color models are often used in different applications, discuss these
different color models.

There are two basic kinds of color models: additive and subtractive. The most common additive model is Red, Green, Blue (RGB); the most common subtractive model is Cyan, Magenta, Yellow, Black (CMYK).

The RGB Color Model


This color model uses light to create color, and it's used for digital media. When you play a game
on your smart phone or watch a movie on your TV, you're seeing color in an RGB color space.
RGB is called an additive color model because when the three colors of light are shown in the
same intensity at the same time, they produce white. If all the lights are out, they create black.

The CMYK Color Model

When printing color images you can't use colored light, which means images cannot be printed in RGB. That's where the other kind of color model comes in. A subtractive color model adds pigment, in the form of ink or dye, that absorbs light and so causes an absence of white. The most common subtractive color model is Cyan/Magenta/Yellow/Black, usually referred to as CMYK.
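As a concrete illustration of the two models, here is a minimal Python sketch of the idealized RGB-to-CMYK conversion (real printing workflows use device-specific ICC profiles; the function name is mine):

def rgb_to_cmyk(r, g, b):
    """Convert 8-bit RGB values (0-255) to CMYK fractions (0.0-1.0)."""
    if (r, g, b) == (0, 0, 0):
        return 0.0, 0.0, 0.0, 1.0          # pure black: only the K channel
    r_, g_, b_ = r / 255, g / 255, b / 255
    k = 1 - max(r_, g_, b_)                # black = absence of the brightest light
    c = (1 - r_ - k) / (1 - k)
    m = (1 - g_ - k) / (1 - k)
    y = (1 - b_ - k) / (1 - k)
    return c, m, y, k

print(rgb_to_cmyk(255, 0, 0))      # red   -> (0.0, 1.0, 1.0, 0.0)
print(rgb_to_cmyk(255, 255, 255))  # white -> (0.0, 0.0, 0.0, 0.0): no ink at all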

ii) What is the YIQ color model? Give one application in which this color model
is most commonly used and explain the reason.

The YIQ color space model is used in U.S. commercial color television broadcasting (NTSC). It is a rotation of the RGB color space such that the Y axis contains the luminance information, allowing backwards compatibility with black-and-white TV sets, which display only this axis of the color space.

Application
The YIQ color model is used for US TV broadcast (NTSC).
This model was designed to separate chrominance (I and Q) from luminance (Y). This was a requirement in the early days of color television, when black-and-white sets were expected to pick up and display what were originally color pictures. The Y channel contains the luminance information (sufficient for black-and-white television sets), while the I and Q channels carry the color information.
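Since YIQ is just a rotation (linear transform) of RGB, the conversion is a single matrix multiply. A minimal Python sketch using the standard NTSC coefficients (the function name is mine):

def rgb_to_yiq(r, g, b):
    """r, g, b in [0, 1]; returns (luminance Y, chrominance I, Q)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    return y, i, q

# White has full luminance and zero chrominance:
print(rgb_to_yiq(1.0, 1.0, 1.0))   # ~(1.0, 0.0, 0.0)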

iii) Explain why JPEG compression is not always suitable for compression of
images that contain sharp edges or abrupt changes of intensity (such as black
text on a white background).

 Low-pass filtering leads to blurring of edges: around sharp edges the high-frequency components are not small, as JPEG's quantization assumes.
 Ringing artifacts occur due to the Gibbs phenomenon: Fourier sums overshoot at a jump discontinuity, and this overshoot does not die out as the frequency increases.

iv) Shown below is a JPEG quantization table. Explain why the values in the top
left corner are smaller than the values in the bottom-right corner. Why are some
values not symmetrical with respect to the
main diagonal? What would happen to the quality of the picture if all values in
the table were halved?

16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

• Eye is most sensitive to low frequencies (upper-left corner), less sensitive to high
frequencies (bottom-right corner). Hence we quantise more aggressively in the high-
frequency range.

• This is the result of perceptual experiments, which determined perception thresholds in these frequency bands. (See e.g. Lohscheller, H., "A Subjectively Adapted Image Communication System", IEEE Trans. on Comm., 1984.)

• The quality would improve: halving the values makes quantization finer, so less information is discarded, at the cost of a larger compressed file. (Doubling the values, by contrast, would make quantization more aggressive and reduce quality.)
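A minimal Python sketch of the quantization step itself, showing why halving the table values preserves more detail (function names are mine; a real JPEG codec applies this to the DCT coefficients of each 8x8 block):

def quantize(dct_block, table):
    # Divide each coefficient by its step size and round to an integer.
    return [[round(c / q) for c, q in zip(row, qrow)]
            for row, qrow in zip(dct_block, table)]

def dequantize(quant_block, table):
    # The decoder can only recover multiples of the step size.
    return [[c * q for c, q in zip(row, qrow)]
            for row, qrow in zip(quant_block, table)]

# With a step of 16, a coefficient of 23 is reconstructed as 16 (error 7);
# with the halved step of 8 it comes back as 24 (error 1):
print(round(23 / 16) * 16, round(23 / 8) * 8)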

v) Explain different coding techniques, such as, entropy coding/Huffman coding


with an example.

Entropy Coding
In information theory, an entropy coding (or entropy encoding) is a lossless data compression scheme that is independent of the specific characteristics of the medium. Two of the most common entropy coding techniques are Huffman coding and arithmetic coding.

Entropy example

The entropy of symbols e1, ..., en is maximized when p1 = p2 = ... = pn = 1/n, giving
H(e1, ..., en) = log2 n.
In that case no symbol is better than another or contains more information, and 2^k equiprobable symbols can be represented by k bits.
The entropy is minimized when p1 = 1 and p2 = ... = pn = 0, giving
H(e1, ..., en) = 0.

Entropy calculation for a two-symbol alphabet:
Example 1: A with pA = 0.5, B with pB = 0.5
H(A, B) = -pA log2 pA - pB log2 pB = -0.5 log2 0.5 - 0.5 log2 0.5 = 1 bit
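The calculations above can be checked with a small Python sketch of the entropy formula H = -sum(p_i * log2 p_i) (the function name is mine):

from math import log2

def entropy(probs):
    # Terms with p = 0 contribute nothing, so they are skipped.
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))        # 1.0 bit: the two-symbol example above
print(entropy([0.25] * 4))        # 2.0 bits: 4 equiprobable symbols need 2 bits
print(entropy([1.0, 0.0, 0.0]))   # 0.0: a certain symbol carries no information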

Huffman coding
 Huffman Coding is a classic greedy algorithm.

 It is used for the lossless compression of data.

 It uses variable length encoding.

 It assigns variable length code to all the characters.

 The code length of a character depends on how frequently it occurs in the given text.

 The character which occurs most frequently gets the smallest code.

 The character which occurs least frequently gets the largest code.

 It is also known as Huffman Encoding.

Huffman code

Letter  Frequency  Code    Bits

E         120      0        1
D          42      101      3
I          42      110      3
U          37      100      3
C          32      1110     4
M          24      11111    5
K           7      111101   6
Z           2      111100   6

The Huffman tree (for the above example) is given below

306
├─ 0 → e (120)
└─ 1 → 186
    ├─ 0 → 79
    │   ├─ 0 → u (37)
    │   └─ 1 → d (42)
    └─ 1 → 107
        ├─ 0 → I (42)
        └─ 1 → 65
            ├─ 0 → c (32)
            └─ 1 → 33
                ├─ 0 → 9
                │   ├─ 0 → z (2)
                │   └─ 1 → k (7)
                └─ 1 → m (24)
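A minimal Python sketch of the greedy construction for the frequency table above. Tie-breaking differs between implementations, so the exact bit patterns may differ from the table, but the code lengths come out the same:

import heapq

def huffman_codes(freqs):
    # Each heap entry: (frequency, tiebreak id, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)   # least frequent subtree -> bit 0
        f1, _, c1 = heapq.heappop(heap)   # next least frequent    -> bit 1
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (f0 + f1, count, merged))
        count += 1
    return heap[0][2]

freqs = {"E": 120, "D": 42, "I": 42, "U": 37, "C": 32, "M": 24, "K": 7, "Z": 2}
for sym, code in sorted(huffman_codes(freqs).items(), key=lambda kv: len(kv[1])):
    print(sym, code)   # E gets the 1-bit code; Z and K get 6-bit codes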

vi) Define
a) Dithering
b) GIF File
c) Human visual acuity
d) Color Harmony Schemes

a) Dithering
Dithering is an image processing operation used to create the illusion of color depth in
images with a limited color palette. Colors not available in the palette are
approximated by a diffusion of colored pixels from within the available palette.
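One common dithering algorithm (not named above) is Floyd-Steinberg error diffusion; here is a minimal Python sketch for the extreme case of reducing a grayscale image to pure black and white:

def floyd_steinberg(img):
    """img: list of rows of floats in [0, 255]; quantized in place to 0/255."""
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = 255.0 if old >= 128 else 0.0
            img[y][x] = new
            err = old - new          # push the error onto unprocessed neighbours
            if x + 1 < w:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1][x + 1] += err * 1 / 16
    return img

# A flat mid-grey block dithers to a checker-like mix of black and white,
# which the eye averages back to grey:
gray = [[128.0] * 8 for _ in range(8)]
for row in floyd_steinberg(gray):
    print("".join("#" if v else "." for v in row))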

b) GIF File
GIF stands for "Graphics Interchange Format". It is a bitmap image format which was created by CompuServe in 1987. GIF images are compressed with a lossless compression, yet the files are significantly small. It is one of the most widely used image formats on the Web.

c) Human visual acuity

Human visual acuity refers to the sharpness of one's vision; it is a measure of how well a person sees. The Snellen chart test is commonly used to measure acuity. This test determines how much detail a person can distinguish on a standardized chart of letters when standing 20 feet away.

d) Color Harmony Schemes

Color harmony is the term for colors that are thought to match; in other words, colors that look aesthetically pleasing side by side. This is more an art than a science, as color perception is influenced by cognitive factors, emotion and culture.
Color schemes that are in harmony are said to match, while those that seem out of harmony are said to clash.

3. Digital Audio
i) Describe the difference between reverb and echo.
Reverberation

. The phenomenon of persistence or prolongation of audible sound after the source has stopped emitting sound is called reverberation.
. Reverberation is usually experienced in closed spaces, where there are multiple objects off which sound is reflected.
. Reverberation can be heard when the sound is reflected by a nearby wall.

Echo

. An echo is the phenomenon of repetition of the sound of a source by reflection from an obstacle.
. An echo can be heard both in open and in closed spaces.
. An echo can be heard only when the distance between the source of the sound and the reflecting body is at least 17.2 m.

ii) Audio signals are often sampled at different rates. CD quality audio is
sampled at 44.1kHz rate while telephone quality audio sampled at 8kHz. What
are the maximum frequencies in the input signal that can be fully recovered for
these two sampling rates? Briefly describe the theory you use to obtain
the results.
• CD quality audio, the maximum frequency: 44,100Hz / 2 = 22,050Hz.
• Telephone quality audio, the maximum frequency: 8kHz / 2 = 4kHz.
• This is based on the Nyquist theorem: the sampling frequency for a signal must be at least twice the highest frequency component in the signal.
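A small Python sketch illustrating the theorem: a 5 kHz tone sampled at the 8 kHz telephone rate produces exactly the same samples as a (phase-flipped) 3 kHz tone, i.e. it aliases to 8 - 5 = 3 kHz because it exceeds the 4 kHz Nyquist limit:

from math import sin, pi

fs = 8000                                  # telephone-quality sampling rate (Hz)
for n in range(5):
    t = n / fs
    s_high = sin(2 * pi * 5000 * t)        # 5 kHz tone: above the Nyquist limit
    s_low = -sin(2 * pi * 3000 * t)        # 3 kHz alias (sign-flipped phase)
    print(f"n={n}: {s_high:+.4f} {s_low:+.4f}")   # identical columns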

iii) Explain the MP3 audio compression algorithm.

A typical uncompressed wave file might be as big as 30 MB for a typical 3-minute song. After being run through the MP3 compression algorithms, that might drop down to 3 MB without any serious loss of quality.
The people who designed these compression algorithms used our knowledge of psychoacoustics to manage the data bandwidth. Psychoacoustics refers to how our brain interprets sounds.
The brain uses certain tricks, like auditory masking, to allocate resources and attention to the most important sound happening at any given time. Using this information, we know what we can get rid of, data-wise.
The first and easiest saving is to cut out a certain frequency range entirely, if the music allows for it.

DE-EMPHASIZE THE QUIET

This refers to something our ears and brains do called simultaneous masking. Basically, if a loud sound is blaring out over the top of a lot of low-volume sounds, you're naturally going to focus on the loud sound. What this means is that we can spend a lot less data on the quiet sounds; they don't need as much detail encoded in them during those times.
TEMPORAL MASKING
In the same fashion above, if two sound events occur within milliseconds of each
other, we're only going to be able to focus on the loudest one. It's how we've been
evolutionarily primed to react. Our ears and minds can't separate events that close in
time.
So what the encoder algorithm does is ignore, or at least allocate much less data to, the quieter sound, since we won't perceive it anyway.
BIT RATE, BIT DEPTH, & SAMPLE RATE MANAGEMENT
And finally this is where the real work is done.
First and foremost, MP3 is a lossy data compression technique by definition, because we immediately drop the bit depth of the audio from 24 bit or above down to 16 bit. Lossy refers to this drop in resolution, but it doesn't have to mean a loss in audio quality.
16 bit is a depth with plenty of headroom to provide a high signal-to-noise ratio: every sample is encoded with 16 bits (each a 0 or a 1 in binary). By dropping from 24 bit to 16 bit we've already made a 33% saving in size with no discernible quality difference.
The basic idea is that a lower sample rate captures fewer "snapshots" of each moment of music. You can think of it like a movie or a video game at 60 frames per second versus the typical 24 fps: 24 is more than good enough, but 60 looks great during fast action scenes. It works the same way for music and sample rates.
And finally, we set a limit on the data throughput. This takes into account everything mentioned above and then sets a ceiling on how much data can be sent at once. Most MP3 streaming and selling services use CBR, a constant bit rate, usually of 128 kilobits per second.
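A quick back-of-the-envelope check in Python of the sizes quoted at the start of this answer (the "3-minute song" figures are the document's own round numbers):

seconds = 3 * 60

# Uncompressed CD audio: 44,100 samples/s x 16 bits (2 bytes) x 2 channels.
wav_bytes = seconds * 44_100 * 2 * 2
print(f"WAV: {wav_bytes / 1e6:.1f} MB")    # ~31.8 MB

# 128 kbit/s constant-bit-rate MP3.
mp3_bytes = seconds * 128_000 / 8
print(f"MP3: {mp3_bytes / 1e6:.1f} MB")    # ~2.9 MB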

iv) Define:
a) decibel
b) Critical Band
c) Masking
d) Midi

a) Decibel

A decibel (dB) is a logarithmic unit that expresses the ratio of two power or amplitude values; for sound, it indicates how loud a sound is relative to a reference level. Continuous exposure to sound above 80 decibels can be harmful.
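Behind the unit is a logarithmic ratio; for sound pressure level, dB SPL = 20 * log10(p / p_ref), with the conventional reference pressure p_ref = 20 micropascals. A tiny Python sketch (the function name is mine):

from math import log10

def spl_db(pressure_pa, p_ref=20e-6):
    # 20x (not 10x) because pressure is an amplitude, not a power.
    return 20 * log10(pressure_pa / p_ref)

print(f"{spl_db(20e-6):.0f} dB")   # the reference itself: 0 dB
print(f"{spl_db(0.2):.0f} dB")     # 0.2 Pa ~ 80 dB, the harm threshold above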
b) Critical Band

The band of frequencies in a masking noise that are effective in masking a tone of a given frequency (see auditory masking). The width of this band, in hertz (Hz), is the critical bandwidth.

c) Masking

In audio, masking occurs when the perception of one sound is affected by the presence of another. A loud sound can render a quieter sound at a nearby frequency inaudible (simultaneous or frequency masking), and sounds occurring just before or after a louder sound can also be masked (temporal masking). MP3 and similar codecs exploit masking to discard audio detail the listener would not perceive.
d) Midi
MIDI (Musical Instrument Digital Interface) is a protocol designed for recording and
playing back music on digital synthesizers that is supported by many makes of
personal computer sound cards. Originally intended to control one keyboard from
another, it was quickly adopted for the personal computer.

4. Video and Animation

i) A simple approach to video compression is to compress each frame of the video using the JPEG pipeline; in fact, this is done in the Motion JPEG (M-JPEG) video format. Discuss the disadvantages of this approach.

Compressing each frame independently does not exploit the temporal redundancy in videos, yielding a poor compression ratio.

ii) Explain the key differences between I-frames, P-frames and B-frames in
MPEG-2 video compression. Describe the advantages and disadvantages of using
B-frames.
I-frame compression removes the spatial redundancy of the image, while P- and B-frames remove temporal redundancy.
An I-frame (also called a key frame or intra-frame) consists ONLY of macroblocks that use intra-prediction. It can only exploit "spatial redundancies" within the frame for compression. Spatial redundancy refers to similarities between the pixels of a single frame.
A P-frame (Predicted frame), also known as an inter-frame, allows macroblocks to be compressed using temporal prediction in addition to spatial prediction.
A B-frame is a frame that can refer to frames that occur both before and after it; the B stands for Bi-directional for this reason.

Advantages of B-frames:
• Coding efficiency.
• Most B frames use fewer bits.
• Quality can also be improved in the case of moving objects that reveal hidden areas
within a video sequence.
Disadvantage of B-frames:
• Frame reconstruction memory buffers within the encoder and decoder must be
doubled in size to accommodate the 2 anchor frames.
• More delays in real-time applications

iii) Explain in detail various video compression techniques.

Spatial Compression
Spatial compression techniques are based on still image compression. The most
popular technique, which is adopted by many standards, is the transform technique. In
this technique, the image is split into blocks and the transform is applied to each
block. The result of the transform is scaled and quantized. The quantized data is
compressed by a lossless entropy encoder and the output bitstream is formed from the
result. The most popular transform algorithm is the Discrete Cosine Transform (DCT)
or its modifications. There are many other algorithms for spatial compression such as
wavelet transform, vector coding, fractal compression, etc.
Temporal Compression
Temporal compression can be a very powerful method. It works by comparing
different frames in the video to each other. If the video contains areas without motion,
the system can issue a short command that copies that part of the previous frame, bit-for-bit, into the next one. If some of the pixels have changed (moved, rotated, changed in brightness, etc.) with respect to the reference frame or frames, then a prediction
technique can be applied. For each area in the current frame, the algorithm searches
for a similar area in the previous frame or frames. If a similar area is found, it’s
subtracted from the current area and the difference is encoded by the transform coder.
The reference for the current frame area may also be obtained as a weighted sum of
corresponding areas from previous and consecutive frames. If consecutive frames are
used, then the current frame must be delayed by some number of frame periods.
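A minimal Python sketch of the search step described above: exhaustive block matching with a sum-of-absolute-differences (SAD) cost. All names and the toy frames are mine; real encoders use much faster search strategies:

def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between a block and a shifted candidate."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(bs) for i in range(bs))

def best_motion_vector(cur, ref, bx, by, bs=4, search=2):
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Only consider candidates fully inside the reference frame.
            if 0 <= bx + dx and bx + dx + bs <= w and 0 <= by + dy and by + dy + bs <= h:
                cost = sad(cur, ref, bx, by, dx, dy, bs)
                if best is None or cost < best[0]:
                    best = (cost, (dx, dy))
    return best   # (residual cost, motion vector): only these need to be coded

# Toy frames: the current frame is the previous one shifted right by one pixel.
ref = [[(x + 3 * y) % 9 for x in range(8)] for y in range(8)]
cur = [[row[max(x - 1, 0)] for x in range(8)] for row in ref]
print(best_motion_vector(cur, ref, bx=2, by=2))   # -> (0, (-1, 0)): perfect match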

iv)Describe various video formats, codecs and containers.

A container and a codec are the two components of any video file. The video format is the container, which stores the audio, video, subtitles and any other metadata. A codec encodes and decodes the multimedia data itself, such as the audio and video streams.

v) Define
a) PAL
b) Progressive Scan
c) Animation Techniques
d) Motion Compensation

a) PAL

PAL is an abbreviation for Phase Alternating Line. This is the video format standard used in many European countries. A PAL picture is made up of 625 interlaced lines and is displayed at a rate of 25 frames per second. (SECAM, a related standard, is an abbreviation for Sequential Color with Memory.)

b) Progressive Scan
Progressive scanning (alternatively referred to as non-interlaced scanning) is a format
of displaying, storing, or transmitting moving images in which all the lines of each
frame are drawn in sequence.

c) Animation Techniques

This can refer to two things: traditional animation made on paper or a similar medium, and vector-based animation made on a computer. In the latter, you make all the drawings digitally on a computer and play those images back to give an animation effect, so it is comparatively easier and quicker than the traditional technique.

d) Motion Compensation

Motion compensation is an algorithmic technique used to predict a frame in a video, given the previous and/or future frames, by accounting for motion of the camera and/or objects in the video. It is employed in the encoding of video data for video compression, for example in the generation of MPEG-2 files.

5. Miscellaneous
i) Discuss Immersive Reality and differentiate between Virtual Reality and
Augmented Reality.

Immersive Reality
It represents the next step beyond augmented reality and is built around immersion in a specific room: the "cave" (cave automatic virtual environment), creating an immersive virtual reality in which projectors are directed toward three or more display walls. A high-definition projection system with special 3D glasses is used, aiming for multi-sensorial effects (e.g. 3D sound).
Virtual Reality

. Virtual reality replaces the real world with an artificial one.
. The user enters an entirely immersive world and is cut off from the real world.
. Everything around the user is fabricated by the system. This may be displayed inside a blank room, a headset or another device that allows the user to feel present in the virtual environment.
. VR might work better for video games and social networking in a virtual environment, such as Second Life or PlayStation Home.

Augmented Reality

. Augmented reality enhances real life with artificial images, adding graphics, sound and smell to the natural world as it exists.
. The user can interact with the real world and at the same time see both the real and the virtual world.
. The user is not cut off from reality.
. AR uses devices such as smartphones or wearable devices, which contain software, sensors, a compass and small digital projectors that display images onto real-world objects.

ii) Explain the following Multimedia Skills:


a) Project Manager
b) Multimedia Programmer
c) Video and Audio Specialist

a) Project Manager

. Center of the action
. Responsible for overall development and implementation of a project as well as day-to-day operations:
. Budgets
. Schedules
. Creative sessions
. Time sheets
. Illness
. Invoices
. Team dynamics
. Technical and operational expert

b) Multimedia Programmer

. Software engineer
. Integrates all the multimedia elements of a project into a seamless whole using an authoring system or programming language
. JavaScript, OpenScript, Lingo, Authorware, Java, C++, etc.

c) Video and Audio Specialist


Video Specialist

. The video specialist is responsible for an entire team of videographers, sound technicians, lighting designers, set designers, script supervisors, gaffers, grips, production assistants and actors.
. However, for many modest projects, a video specialist may shoot and edit all of the footage without outside help.

Audio Specialist

. The wizard who makes a multimedia program come alive, designing and producing music, voice-over narrations and sound effects.
. Responsible for selecting suitable music and talent, scheduling recording sessions, and digitizing and editing recorded material into computer files.

iii) Briefly describe Multimedia Authoring paradigms and discuss Multimedia Authoring tools

Multimedia Authoring paradigms
One paradigm is that of a programming (scripting) language, which specifies (by filename) multimedia elements, sequencing, hotspots, synchronization, etc. A powerful, object-oriented scripting language is usually the centerpiece of such a system, together with in-program editing of elements (still graphics, video, audio, etc.).

Multimedia Authoring tools

Multimedia authoring tools provide an integrated environment for joining together the different elements of a multimedia production. They give the framework for organizing and editing the components of a multimedia project.

iv) Discuss Content Distribution Networks and Quality of Service (QoS) parameters (delay, jitter, etc.) important for streaming multimedia.

"Quality of Service" (QoS) refers to certain characteristics of a data link connection as observed between the connection endpoints. QoS describes the specific aspects of a data link connection that are attributable to the DLS provider, and is defined in terms of QoS parameters.

QoS Parameters
To provide and sustain QoS, resource management must be QoS-driven. To allocate resources, the resource management system must consider different parameters: resource availability; resource control policies, including Service Level Agreements (SLAs); and the QoS requirements of applications, which are quantified by QoS parameters (e.g. jitter, delay, packet loss).

Jitter: Jitter is the delay variation introduced by the variable transmission delay of packets over the network. This can occur because of the behavior of routers' internal queues in certain circumstances (e.g. flow congestion), routing changes, etc. This parameter can seriously affect the quality of streaming audio and/or video.

Delay: This parameter is intrinsic to communications, since the endpoints are distant and the information takes some time to reach the other side. Delay is also referred to as latency. Delay increases if packets face long queues in the network (congestion), or cross a less direct route to avoid congestion.

Packet Loss: This happens when one or more packets of data being transported across the internet or a computer network fail to reach their destination. Wireless and IP networks cannot guarantee that packets will be delivered at all, and will fail to deliver (drop) some packets if they arrive when their buffers are already full. Packet loss can also be caused by other factors such as signal degradation, high loads on network links, corrupted packets being discarded, or defects in network elements.
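A sketch of how a receiver might measure these parameters in Python; the smoothed jitter estimator follows the general shape used by RTP (RFC 3550), but the function names and packet timings here are made up:

def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate from matched send/receive timestamps (seconds)."""
    jitter = 0.0
    transits = [r - s for s, r in zip(send_times, recv_times)]
    for prev, cur in zip(transits, transits[1:]):
        d = abs(cur - prev)            # variation in transit time
        jitter += (d - jitter) / 16    # exponential smoothing, as in RTP
    return jitter

send = [0.00, 0.02, 0.04, 0.06, 0.08]
recv = [0.050, 0.071, 0.089, 0.112, 0.130]     # variable network delay
print(f"jitter ~ {interarrival_jitter(send, recv) * 1000:.2f} ms")

# Packet loss is simply the fraction of sent packets never received:
sent, received = 1000, 986
print(f"loss = {(sent - received) / sent:.1%}")   # 1.4%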
v) Discuss Multimedia Hardware (Processor/GPUs, Input Devices, Output Devices)

Processor/GPUs

A graphics card (also called a video card, display card, graphics adapter, or display
adapter) is an expansion card which generates a feed of output images to a display
device (such as a computer monitor).

Input Devices

Keyboard - The most common and most popular input device; the keyboard is used to input data to the computer.

Mouse - The mouse is the most popular pointing device and a very common cursor-control device.

Joystick - The joystick is also a pointing device, used to move the cursor position on a monitor screen.

Track Ball - The track ball is an input device mostly used in notebook or laptop computers instead of a mouse.

Scanner - The scanner is an input device which works much like a photocopy machine.

Digitizer - The digitizer is an input device which converts analog information into digital form.

Magnetic Ink Card Reader (MICR) - The MICR is an input device generally used in banks because of the large number of cheques to be processed every day.

Optical Character Reader (OCR) - The OCR is an input device used to read printed text.

Bar Code Reader - The bar code reader is a device used for reading bar-coded data (data in the form of light and dark lines).

Optical Mark Reader (OMR) - The OMR is a special type of optical scanner used to recognize marks made by pen or pencil.

Microphone - The microphone is an input device for sound, which is then stored in digital form.

Digital Camera - The digital camera is an input device for images, which are then stored in digital form.

Digital Video Camera - The digital video camera is an input device for images/video, which are then stored in digital form.

Output Devices

Monitors - The monitor, commonly called the Visual Display Unit (VDU), is the main output device of a computer.

Printers - The printer is the most important output device, used to print information on paper.

Screen Image Projector - A screen image projector, or simply projector, is an output device used to project information from a computer onto a large screen so that a group of people can see it simultaneously.

Speakers and Sound Card - Computers need both a sound card and speakers to play audio such as music, speech and sound effects.
vi) Define:
a) Mixed Reality
b) Net neutrality
c) Priority Scheduling
d) RSVP

a) Mixed Reality

Mixed Reality is a blend of physical and digital worlds, unlocking natural and
intuitive 3D human, computer, and environment interactions. This new reality is
based on advancements in computer vision, graphical processing, display
technologies, input systems, and cloud computing.

b) Net neutrality

Net neutrality is the principle that an internet service provider (ISP) has to provide
access to all sites, content and applications at the same speed, under the same
conditions without blocking or giving preference to any content.

c) Priority Scheduling

Priority scheduling is a non-preemptive algorithm and one of the most common scheduling algorithms in batch systems. Each process is assigned a priority; the process with the highest priority is executed first, and so on. Processes with the same priority are executed on a first-come, first-served basis.
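A minimal Python sketch of non-preemptive priority scheduling as described (the names and sample processes are mine; here a lower number means a higher priority):

import heapq

def priority_schedule(processes):
    """processes: list of (name, arrival, burst, priority); returns run order."""
    pending = sorted(processes, key=lambda p: p[1])    # sort by arrival time
    ready, order, time, i = [], [], 0, 0
    while i < len(pending) or ready:
        # Move every process that has arrived into the ready queue.
        while i < len(pending) and pending[i][1] <= time:
            name, arrival, burst, prio = pending[i]
            heapq.heappush(ready, (prio, arrival, name, burst))
            i += 1
        if not ready:                  # CPU idle until the next arrival
            time = pending[i][1]
            continue
        prio, arrival, name, burst = heapq.heappop(ready)
        order.append(name)
        time += burst                  # non-preemptive: run to completion
    return order

procs = [("P1", 0, 5, 2), ("P2", 1, 3, 1), ("P3", 2, 4, 1)]
# P1 runs first (it is alone at t=0); then P2 and P3 tie on priority
# and fall back to arrival order (FCFS):
print(priority_schedule(procs))   # ['P1', 'P2', 'P3']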

d) RSVP

The Resource Reservation Protocol (RSVP) is a transport layer protocol designed to reserve resources across a network using the integrated services model. RSVP can be used by hosts and routers to request or deliver specific levels of quality of service (QoS) for application data streams.
The End
