
Q1.

Digital media is digitized content that can be transmitted over the internet
or computer networks. This can include text, audio, video, and graphics, so news
from a TV network, newspaper, magazine, etc. that is presented on a Web site or blog
falls into this category. Most digital media are produced by translating analog data
into digital data. The Internet began to grow when text was put online instead of being
stored on paper as it was previously; images soon followed, and then audio and video.
Digital media has come a long way in a few short years to become what we know today,
and it continues to grow.

An analog signal is any continuous signal whose time-varying feature (variable)
is a representation of some other time-varying quantity, i.e., it is analogous to
another time-varying signal. It differs from a digital signal in that small
fluctuations in the signal are meaningful.

Properties of Digital vs Analog signals


Digital information has certain properties that distinguish it from analog
communication methods. These include:

 Synchronization – digital communication uses specific synchronization sequences to determine synchronization.
 Language – digital communication requires a language that both sender and receiver possess and that specifies the meaning of symbol sequences.
 Errors – disturbances in analog communication cause errors in the actual intended communication, whereas in digital communication errors take the form of substituted, inserted, or deleted symbols, which can be detected and corrected, enabling error-free communication.
 Copying – copies of analog communication are not as good, quality-wise, as their originals, while thanks to error-free digital communication, copies can be made indefinitely without loss.
 Granularity – when a continuously variable analog value is represented in digital form, a quantization error occurs, i.e., a difference between the actual analog value and its digital representation; this property of digital communication is known as granularity.

Advantages of Digital System:

 Less expensive, easy to implement.
 More reliable; you can save your data and retrieve it when needed.
 Easy to manipulate.
 Flexibility with system changes; use of standardized receivers and transmitters.
 Compatibility with other digital systems.
 Integrated networks.

Disadvantages of Digital System:


 Sampling error; sampling error is common in many cases.
 Digital communication requires greater bandwidth than analog transmission.
 Dependence on ambient weather conditions is high.
 The detection of digital signals requires the communications system to be synchronized.
 Lower life span due to greater dependence on external sources.

Advantages of Analog System:

 Uses less bandwidth.
 More accurate.
 Flexibility with bandwidth.
 Long life span, and dependence on ambient weather is low.
 Easy to handle and not expensive; no over-sensitive routing needed.

Disadvantages of Analog System:

 High cost of signal conversion inside the display.
 Upgrading to a digital interface is not possible.
 No security for transmitted data.

 We can convert an analog image into a digital image using sampling and
quantization. The process of manipulating digital images with a computer is
called digital image processing.

 Pixel: In a digital image, the coordinates of the 2-D function and the
corresponding values are finite; each such coordinate together with its value is a pixel.

 Gray Level: Each pixel has an intensity value, which is called the gray level or
gray value. These values are usually represented as 8-bit integers, so they range
from 0 to 255. Values near 0 indicate darker regions and values near 255 represent
brighter regions.

 Types of Images: Binary, Grayscale, Color

 Binary Image: A binary image has only two possible gray values or
intensities, 0 and 255; there are no intermediate values. Binary images are used
as masks for indicating the pixels of interest in many image processing tasks.
Below is an example of a binary image.

 Binary Image (only 0 and 255)

 Grayscale Image: A grayscale image has a range of values from 0 to 255, i.e.,
each pixel location can have any value between 0 and 255. If you watch old
films from around the 1950s, you are watching grayscale images (films are nothing
but videos, which are collections of individual images in proper sequence). Here
is an example below.

 Grayscale image (0–255 range)
 Color Image: Both binary and grayscale images are 2-dimensional arrays,
where at every location you have one value to represent the pixel. To represent a
color image, we need more than one value for each pixel. But how many values do we
need to represent a color? Typically you need 3 values per pixel, based on the idea
that any color can be formed by combining the 3 basic colors Red, Green and Blue.
For example, you get yellow by mixing red and green, and violet can be formed by
combining red and blue. This is called the RGB color space. There are many other
ways to create color images, which we will discuss in future discussions. Below is
an example of a color image.
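As a minimal sketch of these three representations (with hypothetical pixel values, not taken from the text), the following NumPy snippet builds a binary mask, a grayscale image, and an RGB color image as arrays:

import numpy as np

# Binary image: only two gray values, 0 and 255 (often used as a mask)
binary = np.zeros((4, 4), dtype=np.uint8)
binary[1:3, 1:3] = 255               # pixels of interest

# Grayscale image: each pixel is a single 8-bit intensity in [0, 255]
gray = np.array([[  0,  64, 128, 255],
                 [ 32,  96, 160, 224],
                 [ 16,  80, 144, 208],
                 [  8,  72, 136, 200]], dtype=np.uint8)

# Color image: three values (R, G, B) per pixel
color = np.zeros((4, 4, 3), dtype=np.uint8)
color[..., 0] = 255                  # a pure red image
yellow_pixel = np.array([255, 255, 0], dtype=np.uint8)   # red + green = yellow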

Video: A video is just a collection of images in a proper sequence.

JPEG uses lossy compression, which means that information is lost in the
compression process.

Color Space Transform and Subsampling

Since monitors have three color components for every pixel, one for red, one

for green, and one for blue, they use the RGB color space. We will assume we

have an image in the RGB color space and we will save it as a JPEG image.
Human eyes are more sensitive to brightness than color. For this reason, JPEG

allocates more space for brightness information than color information. But to do so,

we first have to separate these two components, known

as luminance and chrominance.

JPEG uses the YCbCr color space. It consists of three components: luma

(Y), blue-difference chroma (Cb), and red-difference chroma (Cr). The

luma component represents brightness while the other two components (Cb and Cr)

represent the color information.

Converting from RGB to YCbCr:


Y = 0.299*R + 0.587*G + 0.114*B
Cb = - 0.1687*R - 0.3313*G + 0.5*B + 128
Cr = 0.5*R - 0.4187*G - 0.0813*B + 128

Converting from YCbCr to RGB:


R = Y + 1.402*(Cr-128)
G = Y - 0.34414*(Cb-128) - 0.71414*(Cr-128)
B = Y + 1.772*(Cb-128)
Note: These values must be truncated to the range 0–255.
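A minimal sketch of these two conversions in Python (NumPy, assuming 8-bit-per-channel images stored as H×W×3 arrays):

import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) uint8 RGB image to YCbCr using the JPEG formulas above."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.4187 * g - 0.0813 * b + 128
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_to_rgb(ycbcr):
    """Convert back to RGB and clamp the result to the 0-255 range."""
    y, cb, cr = ycbcr[..., 0], ycbcr[..., 1], ycbcr[..., 2]
    r = y + 1.402 * (cr - 128)
    g = y - 0.34414 * (cb - 128) - 0.71414 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    rgb = np.stack([r, g, b], axis=-1)
    return np.clip(rgb, 0, 255).astype(np.uint8)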

How MPEG removes temporal redundancy:

 Temporal redundancy: pixels in two video frames have the same values in the same
locations (some objects are repeated again and again in every frame).
 It is removed with the help of the motion compensation technique.
Macroblock:
 Each macroblock is composed of four 8×8 luminance (Y) blocks and two 8×8
chrominance (Cb and Cr) blocks.
 This set of six blocks is called a macroblock.
 It is the basic hierarchical component used for achieving a high level of
compression.
MPEG constructs three types of pictures, namely:
 Intra pictures (I-pictures)
 Predicted pictures (P-pictures)
 Bidirectional predicted pictures (B-pictures)
The MPEG algorithm employs the following steps:
Intra-frame DCT coding (I-pictures):
The I-pictures are compressed as if they were JPEG images.
 First, the image is converted from the RGB color model to the YUV color model.
 In general, each pixel in a picture consists of three components: R (Red), G
(Green), and B (Blue).
 Apply DCT
 DCT is performed on small blocks of 8×8 pixels to produce blocks of DCT
coefficients.

The N×N two-dimensional DCT is defined as:

$$ F(u,v) = \frac{2}{N} C(u) C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N} $$

$$ C(u), C(v) = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases} $$

The inverse DCT (IDCT) is defined as:

$$ f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u) C(v) F(u,v) \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N} $$

where x, y are spatial coordinates in the image block and u, v are coordinates in the
coefficient block.
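As a minimal sketch (assuming an 8×8 block of pixel values), this direct NumPy implementation of the formula above computes the DCT coefficients; real encoders use fast factorized versions instead of this naive quadruple loop:

import numpy as np

def dct2(block):
    """Naive N x N 2-D DCT of a square pixel block, straight from the formula."""
    n = block.shape[0]
    coeffs = np.zeros((n, n))
    def c(k):
        return 1.0 / np.sqrt(2) if k == 0 else 1.0
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x, y]
                              * np.cos((2 * x + 1) * u * np.pi / (2 * n))
                              * np.cos((2 * y + 1) * v * np.pi / (2 * n)))
            coeffs[u, v] = (2.0 / n) * c(u) * c(v) * total
    return coeffs

# Example: the DCT of a flat 8x8 block puts all energy into F(0, 0), the DC coefficient.
flat = np.full((8, 8), 128.0)
print(np.round(dct2(flat), 1))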
Apply Quantization
 Quantization is a process that attempts to determine what information can be
safely discarded without significant loss in visual fidelity.
 MPEG uses a matrix called the quantizer (Q[i,j]) to define the quantization step.
Each coefficient matrix X[i,j] of the same size as Q[i,j] is divided element-wise by
Q[i,j] to obtain the quantized value matrix Xq[i,j].
 Quantization equation: Xq[i,j] = Round(X[i,j]/Q[i,j])
 After quantization, zig-zag scanning is performed to gather the low-frequency
coefficients first, producing long runs of zeros; a sketch of these two steps is
shown after this list.
 In Huffman coding, we give shorter codewords to the more frequently occurring
coefficients and longer codewords to the least frequently occurring coefficients.
 Hence, the final level of compression is achieved.
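A minimal sketch of quantization followed by zig-zag scanning, assuming an 8×8 block of DCT coefficients and a hypothetical uniform quantization matrix Q (not the one defined by the MPEG standard):

import numpy as np

def quantize(X, Q):
    """Xq[i,j] = Round(X[i,j] / Q[i,j]) -- element-wise division and rounding."""
    return np.round(X / Q).astype(int)

def zigzag(block):
    """Read an N x N block in zig-zag order (low frequencies first)."""
    n = block.shape[0]
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[1] if (p[0] + p[1]) % 2 == 0 else p[0]))
    return [block[i, j] for i, j in order]

# Hypothetical example: a coarse quantizer wipes out most high-frequency coefficients,
# so the zig-zag sequence ends in long runs of zeros that entropy coding exploits.
X = np.random.randn(8, 8) * 50        # stand-in DCT coefficients
Q = np.full((8, 8), 16)               # hypothetical uniform quantization matrix
print(zigzag(quantize(X, Q)))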
 GIFs work by compressing a set of frames or images into an image
sequence that then loops for a set amount of time (or forever). Each image can
contain up to 8 bits per pixel, allowing for a limit of 256 colors. GIFs are
compressed using Lempel–Ziv–Welch (LZW) lossless data compression, which
reduces file size without degrading visual quality. Take what I said about
reducing file size with a grain of salt - a video of the same length takes up much
less space than a GIF. GIF quality also typically isn't that great due to the
color limitation.
 Fun Side Note: The creators of GIF pronounced it JIF (same sound as J in
jog). This is pretty disappointing for those of us who say it with a hard G
sound, but you can't always get what you want.
 The value of a GIF is that it's counted as an image. You don't have to do any
special storage or processing for GIFs; you just add them to your website, they
count as an image, and they play. So in this way they are cost effective,
because you are likely not paying to host the GIF somewhere (though that's
possible; we have sites like GIPHY, for example).
 GIFs are especially useful if you're doing something with limited colors and
well-defined edges and lines, and plan to use them sparingly. Because they use
lossless compression, they're very big, so putting a bunch of GIFs on your
mobile app or website might seem like a good idea, but it will slow performance.

How does JPEG compression work?

The algorithm can be separated into different steps. We only show the steps for
compression; decompression works in the opposite order. We show only the
most common variant: the lossy compression of 8-bit RGB data. “Lossy”
means that the compression also discards some of the image content (as
opposed to lossless compression).
Color Conversion and Subsampling

Starting with the RGB data, the data is divided into the luminance and the color
components; it is converted to YUV. This step does not reduce the amount
of data, as it just changes the representation of the same information. But as the
human observer is much more sensitive to the intensity information than to the
color information, the color information can be subsampled without a significant
loss of visible image information, while the amount of data is reduced
significantly.
Block-Processing and DCT

The YUV image data with the subsampled color components is then divided into
8x8-pixel blocks. From now on, the complete algorithm is performed on these pixel
blocks. Each block is transformed using the Discrete Cosine Transform
(DCT). What does this mean? In the spatial domain (i.e., before the transform)
the data is described by a digital value for each pixel, so we represent
the image content by a list of pixel positions and pixel values. After the transformation,
the image content is described by the coefficients of the spatial frequencies for
vertical and horizontal orientation. So in the spatial domain we need to store 64
pixel values, and in the frequency domain we have to store 64 frequency coefficients:
no data reduction so far.
 

Quantization

To reduce the amount of data needed to store the 64 coefficients, they are
quantized. Depending on the size of the quantization steps, more or less
information is lost in this step. Most of the time, the user can define the strength of
the JPEG compression. Quantization is the step where this user setting
influences the result (remaining image quality and file size).
In the MPEG compression algorithm, first a reduction of the resolution is done,
which is followed by motion compensation in order to reduce temporal redundancy. The
next steps are the Discrete Cosine Transform (DCT) and a quantization as used for
JPEG compression; this reduces the spatial redundancy (referring to human visual
perception). The final step is entropy coding using the Run Length Encoding and the
Huffman coding algorithms.
Step 1: Reduction of the Resolution. The human eye has a lower sensitivity to colour
information than to dark-bright contrasts. A conversion from the RGB colour space into YUV
colour components helps to use this effect for compression. The chrominance components U
and V can be reduced (subsampled) to half of the pixels in the horizontal direction (4:2:2), or to
half of the pixels in both the horizontal and vertical directions (4:2:0), as sketched below.
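A minimal sketch of 4:2:0 chroma subsampling in NumPy (my own simplification, assuming the chroma plane has even dimensions): each 2×2 block of chroma samples is averaged into a single value.

import numpy as np

def subsample_420(chroma):
    """Reduce a chroma plane (U or V) to half resolution horizontally and vertically."""
    h, w = chroma.shape
    c = chroma[:h - h % 2, :w - w % 2].astype(float)   # trim to even dimensions
    # Average each 2x2 block of samples into one value.
    return (c[0::2, 0::2] + c[0::2, 1::2] + c[1::2, 0::2] + c[1::2, 1::2]) / 4.0

# Example: a 480x640 chroma plane becomes 240x320, a 4x reduction in chroma samples.
u_plane = np.random.randint(0, 256, size=(480, 640))
print(subsample_420(u_plane).shape)      # (240, 320)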
Step 2: Motion Estimation. An MPEG video can be understood as a sequence of frames. Because
two successive frames of a video sequence often have only small differences (except at scene
changes), the MPEG standard offers a way of reducing this temporal redundancy. It uses
three types of frames: I-frames (intra), P-frames (predicted) and B-frames (bidirectional).
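As an illustration, here is a minimal full-search block-matching sketch (a simplification of my own, not the MPEG reference method): for one 16×16 macroblock of the current frame it finds the displacement in the previous frame that minimizes the sum of absolute differences (SAD) within a small search window.

import numpy as np

def motion_estimate(prev, curr, bx, by, block=16, search=8):
    """Find the motion vector for the block of `curr` at (by, bx) by full search in `prev`."""
    target = curr[by:by + block, bx:bx + block].astype(int)
    best, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                continue                                   # candidate block out of bounds
            candidate = prev[y:y + block, x:x + block].astype(int)
            sad = np.abs(target - candidate).sum()         # sum of absolute differences
            if best is None or sad < best:
                best, best_vec = sad, (dy, dx)
    return best_vec, best

# Hypothetical usage with two synthetic luminance frames:
prev = np.random.randint(0, 256, size=(64, 64))
curr = np.roll(prev, shift=(2, 3), axis=(0, 1))            # simulate a shift of (2, 3)
print(motion_estimate(prev, curr, bx=16, by=16))            # best displacement is (-2, -3), SAD 0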
Step 3: Discrete Cosine Transform (DCT). The DCT allows, similar to the Fast Fourier Transform
(FFT), a representation of image data in terms of frequency components. So the frame
blocks (8x8 or 16x16 pixels) can be represented as frequency components. The
transformation into the frequency domain is described by the DCT formula given earlier.

GIF compression algorithm

The GIF file format uses the LZW compression algorithm developed by
Abraham Lempel, Jacob Ziv and Terry Welch. The format uses a color table for
an image, in which each pixel is matched to a color value. Complex images are
those without areas of flat color, and they generally contain many colors. The LZW
GIF compression algorithm cannot reduce the file sizes of such images well: the
complex example image is 17,848 bytes.
Horizontal pixel change
One more point on the LZW compression algorithm – it tracks pixel changes
horizontally, scanning row by row. Therefore, images that involve horizontal color
changes will be larger than those that have vertical color changes. Take a look at
the two example images below:

An image with little horizontal color change: 324 bytes

An image with a significant amount of horizontal color change (it is the two
differently colored vertical bands that cause so much horizontal color change): 1109 bytes

626 bytes
1286 bytes

You'll notice that the image that had less horizontal color change (the one with
horizontal color bands) almost doubled in file size, while the one with vertical
color bands increased by only about 15%.
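For illustration, here is a minimal sketch of the core LZW encoding loop applied to a row of palette indices (a generic simplification, not the exact GIF variant, which also uses clear/end codes and variable-width output):

def lzw_encode(indices, palette_size=256):
    """Encode a sequence of palette indices with a basic LZW dictionary coder."""
    # Start with one dictionary entry per possible single index.
    table = {(i,): i for i in range(palette_size)}
    next_code = palette_size
    current, output = (), []
    for idx in indices:
        candidate = current + (idx,)
        if candidate in table:
            current = candidate              # keep extending the current phrase
        else:
            output.append(table[current])    # emit code for the longest known phrase
            table[candidate] = next_code     # learn the new, longer phrase
            next_code += 1
            current = (idx,)
    if current:
        output.append(table[current])
    return output

# A row with long horizontal runs of a single color compresses into very few codes:
row = [5] * 20 + [9] * 20
print(len(lzw_encode(row)), "codes for", len(row), "pixels")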
Q.3 JPEG compression is used in a number of image file formats. JPEG/Exif is the
most common image format used by digital cameras and other photographic image
capture devices; along with JPEG/JFIF, it is the most common format for storing and
transmitting photographic images on the World Wide Web.[9] These format variations
are often not distinguished and are simply called JPEG.
The MIME media type for JPEG is "image/jpeg," except in older Internet
Explorer versions, which provide a MIME type of "image/pjpeg" when uploading
JPEG images.[10] JPEG files usually have a filename extension of "jpg" or "jpeg."
JPEG/JFIF supports a maximum image size of 65,535×65,535 pixels,[11] hence up to
4 gigapixels for an aspect ratio of 1:1. In 2000, the JPEG group introduced a format
intended to be its successor, JPEG 2000, but it was unable to replace the original
JPEG as the dominant image standard.

Because of its compressed nature, JPEG is ideal for uploading images to the web. It is also
one of the most universal file formats, as it can be used by just about any software
and can be printed by just about anyone. Unless you re-compress your photo with
editing software, it should look good.

Most of the time, if you're using pictures from your phone or images shared across
the Internet, JPEGs will be pretty much the same as one another, ending in either a
.JPEG or .JPG extension. However, images taken with a higher-end digital camera
might have Exchangeable image file (Exif) metadata that holds information such as
what kind of camera was used and the color settings when the picture was taken. These
are typically meant for more professional printing facilities.

Q.4 Why Computer Vision Is Difficult

These are the seven most important drivers of complexity that make
computer vision difficult:

 Collecting input data specific to the problem


 Expertise with the popular Deep Learning frameworks like
TensorFlow, PyTorch, Keras, Caffe, and MXNet for training and evaluating
Deep Learning models
 Selecting the appropriate hardware (e.g., Intel, NVIDIA, ARM) and
software platforms (e.g., Linux, Windows, Docker, Kubernetes) and
optimizing Deep Learning models for the deployment environment
 Managing deployments to thousands of distributed Edge devices
from the Cloud (Device Cloud)
 Organizing and rolling out updates across the fleet of Edge endpoints
that may be offline or experience connectivity issues.
 Monitoring metrics from all endpoints and analyzing data in real time.
Regular inspection is needed to make sure the system is running as
intended.
 Knowledge about data privacy and security best practices. Data
encryption at rest and in transit and secure access management are
an absolute necessity in computer vision.
