Q1. Digital Media
Digital media is digitized content that can be transmitted over the internet
or computer networks. This can include text, audio, video, and graphics. This means
that news from a TV network, newspaper, magazine, etc. that is presented on a Web
site or blog can fall into this category. Most digital media are based on translating
analog data into digital data. The Internet began to grow when text was put online instead of being stored on paper as it had been previously. Soon after text came images, and then audio and video followed onto the Internet. Digital media has come a long way in a few short years to become what we know today, and it continues to grow.
An analog signal is any continuous signal for which the time-varying feature (variable) of the signal is a representation of some other time-varying quantity, i.e., analogous to another time-varying signal. It differs from a digital signal in that even small fluctuations in the signal are meaningful.
Pixel: A digital image can be viewed as a 2-D function; each coordinate on that function is a pixel, the smallest element of the image.
Gray Level: Each pixel has some intensity value, which is called the gray level or gray value. These values are usually represented as 8-bit integers, so the range of values is 0 to 255. Values near 0 indicate darker regions and values near 255 indicate brighter regions.
Binary Image: A binary image has only two possible gray values or intensities, 0 and 255; there are no intermediate values. Binary images are used as masks for indicating the pixels of interest in many image processing tasks.
Grayscale Image: Here each pixel location can have any value between 0 and 255, i.e., the full grayscale range. If you watch old films from around the 1950s, you are watching grayscale images (films are nothing but videos, which are collections of individual images in proper sequence).
Color Image: Both binary images and grayscale images are 2-dimensional arrays, where at every location you have one value to represent the pixel. To represent a color image, we need more than one value for each pixel. But how many values do we need to represent a color? Typically you need 3 values for each pixel to represent any color. This comes from the idea that any color can be formed by combining 3 basic colors: red, green, and blue. For example, you get yellow by mixing red and green, and violet can be formed by combining red and blue. This is called the RGB color space. There are many other ways to represent color, but RGB is the most common.
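To make these representations concrete, here is a minimal sketch (using numpy; the 4x4 sizes are arbitrary) of how binary, grayscale, and color images can be stored as arrays:

    import numpy as np

    # Binary image: only two gray values, 0 (black) and 255 (white).
    binary = np.zeros((4, 4), dtype=np.uint8)
    binary[1:3, 1:3] = 255        # a white square, usable as a mask

    # Grayscale image: one 8-bit value per pixel, anywhere in 0..255.
    gray = np.arange(16, dtype=np.uint8).reshape(4, 4) * 17

    # Color image: three values (R, G, B) per pixel, so a 3-D array.
    color = np.zeros((4, 4, 3), dtype=np.uint8)
    color[..., 0] = 255           # full red...
    color[..., 1] = 255           # ...plus full green gives yellow pixels

    print(binary.shape, gray.shape, color.shape)   # (4, 4) (4, 4) (4, 4, 3)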
Q2. JPEG Compression
JPEG uses lossy compression, which means that information is lost in the compression process.
Since monitors have three color components for every pixel, one for red, one
for green, and one for blue, they use the RGB color space. We will assume we
have an image in the RGB color space and we will save it as a JPEG image.
Human eyes are more sensitive to brightness than to color. For this reason, JPEG allocates more space for brightness information than for color information. But to do so, the image must first be separated into its brightness and color parts, known as luminance and chrominance: the image is converted from RGB to the YCbCr color space, where the luma component (Y) represents brightness, while the other two components (Cb and Cr) carry the color information.
The algorithm can be separated into different steps. We only show the steps for the compression; the decompression works in the opposite order. We show only the most common compression: the lossy compression of 8-bit RGB data. "Lossy" means that the compression will also discard some of the image content (in contrast to lossless compression).
Color Conversion and Subsampling
Starting with the RGB data, this data is divided into the luminance and the color components: the data is converted to YUV (YCbCr). This step does not reduce the amount of data, as it just changes the representation of the same information. But as the human observer is much more sensitive to the intensity information than to the color information, the color information can be subsampled without a significant loss of visible image information. And of course the amount of data is reduced significantly.
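A minimal sketch of this step, assuming the image is an 8-bit RGB numpy array (the conversion constants are the standard BT.601 ones used by JFIF):

    import numpy as np

    def rgb_to_ycbcr(img):
        # Split channels and compute luma (Y) and chroma (Cb, Cr), BT.601.
        r = img[..., 0].astype(float)
        g = img[..., 1].astype(float)
        b = img[..., 2].astype(float)
        y  =  0.299    * r + 0.587    * g + 0.114    * b
        cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
        cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
        return y, cb, cr

    def subsample_420(chroma):
        # 4:2:0: keep one chroma sample per 2x2 block of pixels.
        return chroma[::2, ::2]

    # The luma plane keeps full resolution; each chroma plane shrinks to a
    # quarter of its samples, so total data drops from 3 to 1.5 values/pixel.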
Block-Processing and DCT
The YUV image data with the subsampled color components is then divided into 8x8-pixel blocks. From now on, the complete algorithm is performed on these pixel blocks. Each block is transformed using the Discrete Cosine Transform (DCT). What does this mean? In the spatial domain (so before the transform), the data is described by a digital value for each pixel: we represent the image content by a list of pixel positions and pixel values. After the transformation, the image content is described by the coefficients of the spatial frequencies in the vertical and horizontal orientation. So in the spatial domain we need to store 64 pixel values, and in the frequency domain we have to store 64 frequency coefficients.
No data reduction so far.
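A minimal sketch of the transform on a single 8x8 block, using scipy's DCT-II (the variant JPEG uses); the block content here is random, just for illustration:

    import numpy as np
    from scipy.fftpack import dct

    def dct2(block):
        # 2-D DCT: apply the 1-D DCT-II along rows, then along columns.
        return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

    block = np.random.randint(0, 256, (8, 8)).astype(float)
    coeffs = dct2(block - 128)   # JPEG first shifts samples to center on 0
    # coeffs[0, 0] is the DC term (average brightness); the rest are AC
    # terms for increasing horizontal/vertical frequencies. Still 64 numbers.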
Quantization
To reduce the amount of data needed to store the 64 coefficients, these are quantized. Depending on the size of the quantization steps, more or less information is lost in this step. Most of the time, the user can define the strength of the JPEG compression, and quantization is the step where this user setting influences the result (remaining image quality and file size).
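A minimal sketch of the quantization step; the table below is the example luminance quantization table from Annex K of the JPEG standard, and scaling this table is one common way a user quality setting is applied:

    import numpy as np

    # Example luminance quantization table from the JPEG standard (Annex K).
    Q50 = np.array([
        [16, 11, 10, 16,  24,  40,  51,  61],
        [12, 12, 14, 19,  26,  58,  60,  55],
        [14, 13, 16, 24,  40,  57,  69,  56],
        [14, 17, 22, 29,  51,  87,  80,  62],
        [18, 22, 37, 56,  68, 109, 103,  77],
        [24, 35, 55, 64,  81, 104, 113,  92],
        [49, 64, 78, 87, 103, 121, 120, 101],
        [72, 92, 95, 98, 112, 100, 103,  99],
    ])

    def quantize(coeffs, q=Q50):
        # Larger quantization steps discard more information (the lossy step).
        return np.round(coeffs / q).astype(int)

    def dequantize(qcoeffs, q=Q50):
        # Decompression multiplies back; the rounding error is the lost detail.
        return qcoeffs * q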
The MPEG compression algorithm encodes video in several steps. First a reduction of the resolution is done, which is followed by a motion compensation in order to reduce temporal redundancy. The next steps are the Discrete Cosine Transform (DCT) and a quantization, as used for JPEG compression; this reduces the spatial redundancy (referring to human visual perception). The final step is an entropy coding using the Run-Length Encoding and Huffman coding algorithms.
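The run-length idea in that final step can be sketched as follows (a simplified illustration on a made-up coefficient sequence; the real coder works on zigzag-ordered DCT coefficients and then Huffman-codes the result):

    def run_length_encode(seq):
        # Collapse runs of repeated values into (value, count) pairs.
        out = []
        for v in seq:
            if out and out[-1][0] == v:
                out[-1][1] += 1
            else:
                out.append([v, 1])
        return [tuple(p) for p in out]

    # Quantization leaves long runs of zeros, which RLE stores compactly:
    print(run_length_encode([57, 45, 0, 0, 0, 0, 23, 0, -30, -16, 0, 0, 1, 0, 0, 0]))
    # [(57, 1), (45, 1), (0, 4), (23, 1), (0, 1), (-30, 1), (-16, 1), (0, 2), (1, 1), (0, 3)]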
Step 1: Reduction of the Resolution. The human eye has a lower sensitivity to color information than to dark-bright contrasts. A conversion from the RGB color space into YUV color components helps to use this effect for compression. The chrominance components U and V can be reduced (subsampled) to half of the pixels in the horizontal direction (4:2:2), or to half of the pixels in both the horizontal and the vertical direction (4:2:0).
Step 2: Motion Estimation. An MPEG video can be understood as a sequence of frames. Because two successive frames of a video sequence often have small differences (except in scene changes), the MPEG standard offers a way of reducing this temporal redundancy. It uses three types of frames: I-frames (intra), P-frames (predicted) and B-frames (bidirectional).
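A minimal sketch of the block-matching search behind motion estimation, using an exhaustive search with a sum-of-absolute-differences cost (the frame arrays, block size, and search range are illustrative assumptions, not the MPEG-mandated method):

    import numpy as np

    def best_match(ref, block, top, left, search=8):
        # Find the motion vector for one current-frame block by searching
        # a +/-`search` pixel window in the previous (reference) frame.
        h, w = block.shape
        best, best_mv = np.inf, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = top + dy, left + dx
                if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                    continue
                # Sum of absolute differences as the matching cost.
                sad = np.abs(ref[y:y+h, x:x+w].astype(int) - block.astype(int)).sum()
                if sad < best:
                    best, best_mv = sad, (dy, dx)
        # Only this vector plus the (small) residual block need to be coded.
        return best_mv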
Step 3: Discrete Cosine Transform (DCT). The DCT allows, similar to the Fast Fourier Transform (FFT), a representation of image data in terms of frequency components. So the frame blocks (8x8 or 16x16 pixels) can be represented as frequency components. The transformation into the frequency domain is described by the following formula:
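For an 8x8 block this is the standard forward DCT:

    F(u,v) = (1/4) C(u) C(v) sum_{x=0..7} sum_{y=0..7} f(x,y)
             * cos((2x+1) u pi / 16) * cos((2y+1) v pi / 16)

where C(k) = 1/sqrt(2) for k = 0 and C(k) = 1 otherwise.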
[Figure: two test images saved as JPEG, 626 bytes and 1286 bytes.]
You'll notice that the image with less horizontal color change (the one with horizontal color bands) almost doubled in file size, while the one with vertical color bands increased by only about 15%.
Q3. JPEG compression is used in a number of image file formats. JPEG/Exif is the
most common image format used by digital cameras and other photographic image
capture devices; along with JPEG/JFIF, it is the most common format for storing and
transmitting photographic images on the World Wide Web.[9] These format variations
are often not distinguished and are simply called JPEG.
The MIME media type for JPEG is "image/jpeg," except in older Internet
Explorer versions, which provide a MIME type of "image/pjpeg" when uploading
JPEG images.[10] JPEG files usually have a filename extension of "jpg" or "jpeg."
JPEG/JFIF supports a maximum image size of 65,535×65,535 pixels,[11] hence up to 4 gigapixels for an aspect ratio of 1:1. In 2000, the JPEG group introduced a format intended to be its successor, JPEG 2000, but it has been unable to replace the original JPEG as the dominant image standard.
Because of its compressed nature, the JPEG format is ideal for uploading images to the web. It is also one of the most universal file formats, as it can be opened by just about any software and printed by just about anyone. Unless you repeatedly recompress your photo with editing software, the quality should remain good.
Most of the time, if you're using pictures from your phone or images shared across the Internet, JPEGs will be pretty much the same from one to another, ending in either a .JPEG or .JPG extension. However, images taken with a higher-end digital camera might have Exchangeable image file format (Exif) metadata, which stores information such as what kind of camera was used and the color settings when the picture was taken. These are typically meant for more professional printing facilities.
Those are the seven most important drivers of complexity that make
computer vision difficult: