MULTIMEDIA: MEDIA AND DATA STREAMS

This chapter introduces terminology and gives a sense of the commonality of the elements of
multimedia. It begins with a clarification of the notion of multimedia, followed by a
description of media and the important properties of multimedia systems. The characteristics
of data streams in such systems and the notion of a Logical Data Unit (LDU) follow.

One way of defining multimedia can be found in the meaning of the composed word:
 Multi- [lat.: much]: many; multiple.
 Medium [lat.: middle]: an intervening substance through which something is transmitted or
carried on; a means of mass communication.

This description is derived from the common forms of human interaction. It is not very exact and
has to be adapted to computer processing. Therefore, we discuss in the next section the notion
medium in more detail with respect to computer processing.

Medium

In general, one describes a medium as a means for the distribution and presentation of
information. Examples of a medium are text, graphics, speech and music. In the same way, one
can also add water and the atmosphere (according to the above medium description from the
American Heritage Dictionary). Media can be classified with respect to different criteria
[ISO93a]. We classify media according to perception, representation, presentation, storage,
transmission and information exchange.

The Perception Medium

Perception media help humans to sense their environment. The perception of information
occurs mostly through seeing or hearing it. For the perception of information through seeing,
visual media such as text, image and video are used. For the perception of information
through hearing, auditory media such as music, noise and speech are relevant.

The Representation Medium

Representation media are characterized by internal computer representations of information.
The central question is: How is the computer information coded? Various formats are used to
represent media information in a computer:

 A text character is coded in ASCII or EBCDIC code.
 Graphics are coded according to the CEPT or CAPTAIN videotext standard. The graphics
standard GKS can also serve as a basis for coding.
 An audio stream can be represented using simple PCM (Pulse Code Modulation) with a
linear quantization of 16 bits per sample.
 An image can be coded as a facsimile (Group 3 according to the ISO standard
specification) or in JPEG format.
 A combined audio/video sequence can be coded in different TV standard formats (e.g.,
PAL, SECAM, NTSC) and stored in the computer using an MPEG format.

The Presentation Medium

Presentation media refer to the tools and devices for the input and output of information.
The central question is: Through which medium is information delivered by the computer, or
introduced into the computer? Media such as paper, screen and speaker are used to deliver
information by the computer (output media); keyboard, mouse, camera and microphone are the
input media.

The Storage Medium

Storage media refer to a data carrier which enables storage of information. However, the
storage of data is not limited to the available components of a computer. Therefore, paper
is also a storage medium. The central question is: Where will the information be stored?
Microfilm, floppy disk, hard disk and CD-ROM are examples of storage media.

The Transmission Medium

The transmission medium characterizes different information carriers that enable continuous
data transmission. Information is transmitted over networks, which use wire and cable
transmission, such as coaxial cable and fiber optics, as well as free air space.

The Information Exchange Medium

The information exchange medium includes all information carriers for transmission, that is,
all storage and transmission media. Information can be exchanged by transporting a storage
medium outside of computer networks to the destination, through direct transmission using
computer networks, or through combined usage of storage and transmission media (e.g., an
electronic mailing system).

Representation Values and Representation Spaces

The above classification of media can be used as a basis for characterizing the notion of
a medium in the context of information processing. Here, the description of the perception
medium comes closest to our notion of a medium: the media appeal to the human senses. Each
medium defines representation values and representation spaces [HD90, HS91], which involve
the five senses. Stereophony and quadrophony, for example, determine acoustic representation
spaces.

Components of Multimedia
Multimedia has five components: text, sound, images, animation and video.

Text
Text or written language is the most common way of communicating information. It is
one of the basic components of multimedia. It was originally defined by printed media such as
books and newspapers that used various typefaces to display the alphabet, numbers, and special
characters. Although multimedia products include pictures, audio and video, text may be the
most common data type found in multimedia applications. Besides this, text also provides
opportunities to extend the traditional power of text by linking it to other media, thus making it
an interactive medium.

(i) Static Text


In static text, the words are laid out to fit in well with the graphical surroundings.
The words are built into the graphics, much like the illustrations and explanations on the
pages of a book, so that the information is well laid out and easy to read. Learners can
look at the pictures and read the accompanying textual information.

(ii) Hypertext
A hypertext system consists of nodes, which contain the text, and links between the nodes,
which define the paths the user can follow to access the text in non-sequential ways. The
links represent associations of meaning and can be thought of as cross-references. This
structure is created by the author of the system, although in more sophisticated hypertext
systems users are able to define their own paths. Hypertext gives the user the flexibility
and choice to navigate through the material.

Text should be used to convey essential information and should be positioned at an
appropriate place in a multimedia product. Well-formatted sentences and paragraphs are vital,
and spacing and punctuation also affect the readability of the text. Fonts and styles should
be chosen to communicate the message more effectively.
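
The node-and-link structure of a hypertext system can be sketched in a few lines of Python.
This is only an illustration: the node names, their texts and the follow helper are
hypothetical and are not taken from any particular hypertext system.

```python
# Hypothetical hypertext: each node holds some text and the names of the
# nodes it links to. Names and contents are invented for illustration.
nodes = {
    "intro": {"text": "What is multimedia?", "links": ["text", "sound"]},
    "text": {"text": "Text as a medium.", "links": ["intro"]},
    "sound": {"text": "Sound as a medium.", "links": ["video"]},
    "video": {"text": "Video as a medium.", "links": []},
}


def follow(nodes, start, choices):
    """Return the path visited when the user picks each link in `choices`."""
    path = [start]
    current = start
    for target in choices:
        # A jump is only allowed along a link defined by the author.
        if target not in nodes[current]["links"]:
            raise ValueError(f"no link from {current!r} to {target!r}")
        current = target
        path.append(current)
    return path


# A reader who jumps from the introduction straight to sound, then video.
print(follow(nodes, "intro", ["sound", "video"]))
```

The reader's route is non-sequential: it depends on which links are chosen, not on the
order in which the nodes were written.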

Image
Images are an important component of multimedia. These are generated by the computer
in two ways, as bitmap or raster images and as vector images.

(i) Raster or Bitmap Images


The most common and comprehensive form of storage for images on a computer is a raster or
bitmap image. A bitmap is a simple matrix of tiny dots, called pixels, that together form
the image (Vaughan, 2008). Each pixel takes one of two or more colours. The colour depth is
the amount of data, in bits, used for each pixel, and it determines the number of available
colours: one bit gives two colours, four bits give sixteen colours, eight bits give 256
colours, 16 bits give 65,536 colours, and so on. Depending on the hardware capabilities,
each point can display from two to millions of colours. A comprehensive image is one that
looks as much as possible like the real world or the original product, which means that
proportion, size, colour and texture must be as accurate as possible. Bitmap formats include
Windows Bitmap (BMP), Device Independent Bitmap (DIB) and Windows Run Length Encoded (RLE)
(Hillman, 1998).
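
The relationship between colour depth and the number of colours is simply a power of two.
The short sketch below reproduces the figures quoted above; the function name is chosen
for this example only.

```python
def colours_for_depth(bits_per_pixel: int) -> int:
    """Number of distinct colours representable with the given bit depth."""
    if bits_per_pixel < 1:
        raise ValueError("bit depth must be at least 1")
    return 2 ** bits_per_pixel


for bits in (1, 4, 8, 16, 24):
    print(f"{bits:2d} bits -> {colours_for_depth(bits):,} colours")
```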
(ii) Vector Images
Vector images are based on drawing elements or objects such as lines, rectangles, circles
and so forth. The advantage of a vector image is the relatively small amount of data
required to represent it; it therefore does not require much memory to store. The image
consists of a set of commands that are drawn when needed. Whereas a bitmap image requires
enough pixels to cover its height and width at the chosen colour depth, a vector image is
based on a relatively limited number of drawing commands. The drawback of vector images is
the limited level of detail that can be presented in an image (Hillman, 1998). The most
widely used vector format on the Windows operating system is the Windows Metafile.

Compression techniques are used to reduce the file size of images, which is useful for
storing large numbers of images and for speeding up transmission in networked applications.
Formats that support compression for this purpose include GIF, TIFF and JPEG.

Animation
Animation consists of still images displayed so quickly that they give the impression of
continuous movement. In animation, the screen object is a vector image, and its movement
along a path is calculated using numerical transformations applied to its defining
coordinates. To give the impression of smoothness the frame rate has to be at least 16
frames per second, and for natural-looking motion it should be at least 25 frames per second.
Animations may be two- or three-dimensional. In two-dimensional animation the visual changes
that bring an image alive occur on the flat X and Y axes of the screen, while in
three-dimensional animation they occur along all three axes X, Y and Z, showing the image
from all angles. Such animations are typically rendered frame by frame by high-end
three-dimensional animation software. Animation tools are very powerful and effective.
There are two basic types of animation: path animation and frame animation.
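
As a rough sketch of how a movement can be calculated from transformations of an object's
defining coordinates, the example below translates the vertices of a vector object along a
straight line, producing one set of coordinates per frame. The triangle, the function name
and the choice of linear interpolation are assumptions made for illustration; real animation
tools offer many other kinds of path.

```python
def animate_along_line(points, offset, seconds, fps=25):
    """Return one list of translated points per frame.

    The object moves in a straight line from its starting position to
    `offset` (dx, dy) using linear interpolation.
    """
    steps = max(1, round(seconds * fps))
    dx, dy = offset
    frames = []
    for i in range(steps + 1):
        t = i / steps  # 0.0 at the start, 1.0 at the end of the path
        frames.append([(x + t * dx, y + t * dy) for x, y in points])
    return frames


triangle = [(0, 0), (10, 0), (5, 10)]  # hypothetical vector object
frames = animate_along_line(triangle, offset=(100, 50), seconds=2)
print(len(frames), frames[0], frames[-1])
```

Only the few vertex coordinates change from frame to frame, which is why this kind of
animation needs far less data than storing every frame as a bitmap.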

(i) Path Animation


Path animations involve moving an object on a screen that has a constant background, e.g.
a cartoon character may move across the screen without any change in the background or in
the character itself.

(ii) Frame Animation


In frame animations, several objects are allowed to move simultaneously and the objects
or the background can also change. The moving objects are one of the most appropriate tools to
enhance understanding, as they allow the learner to see the demonstration of changes, processes
and procedures (Earnshaw & Vince, 1995). Animation uses very little memory in comparison to
digital video as it consists of drawing and moving instructions. Animation is very useful for such
multimedia applications where moving visuals are required, but where digital video may be
unsuitable, unnecessary, or too expensive in terms of disc space or memory.

Sound
Sound is probably the most sensuous element of multimedia. It is meaningful speech in
any language, from a whisper to a scream. It can provide the listening pleasure of music, the
startling accent of special effects, or the ambience of a mood-setting background. It can
promote an artist, add interest to a text site by humanizing the author, or teach the
pronunciation of words in another language. Sound pressure level (volume) is measured in
decibels, a logarithmic measure of the ratio between the level that is actually experienced
and a chosen reference level.
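
The decibel calculation can be sketched as follows. The formula 20 * log10(p / p0) is the
standard definition of sound pressure level; the reference pressure of 20 micropascals is
the conventional reference for sound in air and is an assumption here, since the text does
not name a reference point.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # assumed reference: 20 micropascals (sound in air)


def sound_pressure_level_db(pressure_pa, reference_pa=REFERENCE_PRESSURE_PA):
    """Sound pressure level in decibels relative to the reference pressure."""
    if pressure_pa <= 0 or reference_pa <= 0:
        raise ValueError("pressures must be positive")
    return 20 * math.log10(pressure_pa / reference_pa)


# Each tenfold increase in sound pressure adds 20 dB.
for pressure in (20e-6, 200e-6, 2e-3):
    print(f"{pressure:.6f} Pa -> {sound_pressure_level_db(pressure):.1f} dB")
```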

(i) Musical Instrument Digital Interface (MIDI)


Musical Instrument Digital Interface (MIDI) is a communication standard developed in the
early 1980s for electronic musical instruments and computers. It is a shorthand
representation of music stored in numeric form. MIDI is the quickest, easiest and most
flexible tool for composing an original score in a multimedia project. To make MIDI scores,
sequencer software and a sound synthesizer are needed. A MIDI keyboard is also useful for
simplifying the creation of musical scores. The playback quality depends upon the quality
of the musical instruments and the capabilities of the sound system; that is, MIDI is device
dependent (Vaughan, 2008).

(ii) Digital Audio


Digitized sound is sampled sound. Every nth fraction of a second, a sample of sound
is taken and stored as digital information in bits and bytes. The quality of this digital recording
depends upon how often the samples are taken (sampling rate) and how many numbers are used
to represent the value of each sample (bit depth, sample size, resolution). The more often the
sample is taken and the more data is stored about that sample, the finer the resolution and quality
of the captured sound when it is played back (Vaughan, 2008). The quality of digital audio also
relies on the quality of the original audio source, capture devices, supporting software and the
capability of playback environment.

The main benefit of audio is that it provides a channel that is separate from that of the
display (Nielson, 1995). Sound plays a major role in multimedia applications, but there is a very
fine balance between getting it right and overdoing it (Philips, 1997). Multimedia products
benefit from digital audio as informational content such as a speech or voice-over and as special
effects to indicate that a program is executing various actions such as jumping to new screens.
The three sampling frequencies used in multimedia are CD-quality 44.1 kHz, 22.05 kHz and
11.025 kHz. Digital audio plays a key role in digital video.
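
To see how sampling rate and bit depth translate into storage, the sketch below computes
the size of uncompressed digital audio for the three sampling frequencies mentioned above.
The 16-bit depth and the two (stereo) channels are assumptions chosen for the example; the
function name is likewise illustrative.

```python
def audio_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size in bytes of uncompressed sampled audio."""
    return sample_rate_hz * bit_depth * channels * seconds / 8


# One second of 16-bit stereo audio at each of the three sampling rates.
for rate in (44_100, 22_050, 11_025):
    size = audio_bytes(rate, bit_depth=16, channels=2, seconds=1)
    print(f"{rate / 1000:.3f} kHz -> {size:,.0f} bytes per second")
```

Halving the sampling rate halves the storage needed, at the cost of a lower quality
recording.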

Video
Video is defined as the display of recorded real events on a television type screen. The
embedding of video in multimedia applications is a powerful way to convey information. It can
incorporate a personal element, which other media lack. The personality of the presenter can be
displayed in a video (Philips, 1997). The video may be categorized in two types, analog video
and digital video.

(i) Analog Video


Analog video is video data that is stored on any non-computer medium such as videotape,
laserdisc or film. It is further divided into two types, composite and component analog
video. Composite analog video has all the video components, including brightness, colour and
synchronization, combined into one signal. Because the components are combined, composite
video suffers from colour bleeding, low clarity and high generational loss (Hillman, 1998).
Generational loss is the loss of quality that occurs when the master is copied for editing
or for other purposes. This recording format was used for consumer analog video tape formats
(such as Betamax and VHS) and was never adequate for most multimedia presentations
(Vaughan, 2008).

Component analog video is considered more advanced than composite video. It takes the
different components of video, such as colour, brightness and synchronization, and breaks
them into separate signals (Hillman, 1998). S-VHS and Hi-8 are examples of this type of
analog video, in which colour and brightness information are stored on two separate tracks.
In the early 1980s, Sony launched a new portable, professional video format, Betacam, in
which signals are stored on three separate tracks (Vaughan, 2008).

Several analog broadcast video standards are commonly used around the globe: National
Television System Committee (NTSC), Phase Alternating Line (PAL) and Sequential Colour with
Memory (SECAM). The NTSC standard is used in the United States, Canada and Japan, while PAL
is used in the United Kingdom, China and South Africa. SECAM is used in France. A newer
standard, High Definition Television (HDTV), offers better image and colour quality than
these standards.

(ii) Digital Video


Digital video is the most engaging of multimedia venues, and it is a powerful tool for
bringing computer users closer to the real world (Vaughan, 2008). Digital video is storage
intensive. A high-quality colour still image on a computer screen requires one megabyte or
more of storage. To provide the appearance of motion, the picture should be replaced at least
thirty times per second, so at least thirty megabytes of storage are required for one second
of uncompressed video. The more often the picture is replaced, the better the quality of
the video.
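
The storage estimate is a simple multiplication of frame size by frame rate. The sketch
below works it out for an assumed 640 x 480 frame with 24-bit colour; these dimensions are
chosen for illustration and give a frame slightly under one megabyte, so the result is of
the same order as the thirty megabytes per second quoted above.

```python
def uncompressed_video_bytes(width, height, bits_per_pixel, fps, seconds):
    """Storage in bytes for uncompressed video (picture data only, no audio)."""
    frame_bytes = width * height * bits_per_pixel // 8
    return frame_bytes * fps * seconds


one_second = uncompressed_video_bytes(640, 480, 24, fps=30, seconds=1)
print(f"{one_second:,} bytes for one second of video")
print(f"{one_second * 60 / 1_000_000:,.0f} MB for one minute")
```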

Video requires high bandwidth to deliver data in a networked environment. This
technological bottleneck is overcome using digital video compression schemes. Compression
standards and codecs used for video include MPEG, JPEG, Cinepak and Sorenson. In addition to
compressing video data, streaming technologies such as Adobe Flash, Microsoft Windows Media,
QuickTime and RealPlayer are used to provide reasonable-quality, low-bandwidth video on the
web. QuickTime and RealVideo are the most commonly used for widespread distribution.

Digital video formats can be divided into two categories, composite video and component
video. Composite digital recording formats encode the information in binary (0s and 1s)
digital code. They retain some of the weaknesses of analog composite video, such as limited
colour and image resolution and generation loss when copies are made.

Component digital is an uncompressed format with very high image quality, and it is highly
expensive. Some popular formats in this category are Digital Betacam and D-5 (developed in
1994) and DVCAM (developed in 1996). There are also standards for digital television:
Advanced Television Systems Committee (ATSC), Digital Video Broadcasting (DVB) and
Integrated Services Digital Broadcasting (ISDB). ATSC is the digital television standard for
the United States, Canada and South Korea, DVB is commonly used in Europe, and ISDB is used
in Japan to allow radio and television stations to convert to digital format (Molina &
Villamil, 1998).

Video can be used in many applications. Motion pictures enhance comprehension only if they
match the explanation. For example, to show the dance steps used in different cultures,
video is easier and more effective than graphics or animation (Thibodeau, 1997).

Multimedia System and properties

A multimedia system is characterized by the computer-controlled, integrated production,
manipulation, presentation, storage and communication of independent information, which is
encoded at least through a continuous (time-dependent) and a discrete (time-independent)
medium.

Properties of multimedia systems

1. Combination of media.
2. Independence.
3. Computer supported integration (computer control)
4. Communication system.

1. Combination of media

According to the definition above, a multimedia system must combine different media and
devices; only when they function together do they form a multimedia system.

2. Independence

In a multimedia system the different media should be independent of each other, yet there
should also be an inherently tight connection between them so that they can work together.

3. Computer supported integration

The different independent media are combined in arbitrary forms to work together as a
system with the support of computers. Computer-supported integration is also called control
through the computer in multimedia systems.

4. Communication systems

A multimedia system should be capable of communication. Multimedia information should
not only be created, processed and stored but also be distributed beyond the boundary of a
single computer, which makes multimedia applications far more popular and useful in
distributed environments.

Images and Graphics


An image consists of a rectangular array of dots called pixels. The size of the image is
specified in terms of width x height, in number of pixels. The physical size of the image, in
inches or centimeters, depends on the resolution of the device on which the image is displayed.
The resolution is usually measured in DPI (Dots Per Inch). An image will appear smaller on a
device with a higher resolution than on one with a lower resolution. For color images, one needs
enough bits per pixel to represent all the colors in the image. The number of the bits per pixel is
called the depth of the image.
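
The dependence of physical size on device resolution can be sketched as a division of the
pixel dimensions by the DPI value. The function name and the two example resolutions
(72 and 300 DPI) are assumptions for illustration.

```python
def physical_size_inches(width_px, height_px, dpi):
    """Physical (width, height) in inches of an image shown at the given DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return width_px / dpi, height_px / dpi


# The same 640 x 480 image on a low-resolution and a high-resolution device.
for dpi in (72, 300):
    w, h = physical_size_inches(640, 480, dpi)
    print(f"{dpi:3d} DPI -> {w:.2f} in x {h:.2f} in")
```

The higher the resolution of the device, the smaller the same image appears.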

Image data types

Images can be represented using different data types, such as monochrome and color images.
A monochrome image is created using a single color, whereas a color image is created
using multiple colors. Some important image data types are the following:
 1-bit images- An image is a set of pixels, where a pixel is a picture element in a digital
image. In 1-bit images, each pixel is stored as a single bit (0 or 1). A bit has only two
states: on or off, white or black, true or false. Such an image is therefore also referred
to as a binary image. A 1-bit image is also known as a 1-bit monochrome image because it
contains only one color: black for the off state and white for the on state.
A 1-bit image with resolution 640 x 480 needs a storage space of
640 x 480 bits = (640 x 480) / 8 bytes = (640 x 480) / (8 x 1024) KB = 37.5 KB.
The clarity or quality of a 1-bit image is very low.
 8-bit gray-level images- Each pixel of an 8-bit gray-level image is represented by a single
byte (8 bits). Each pixel can therefore hold one of 2^8 = 256 values between 0 and 255,
giving a brightness value on a scale from black (0 for no brightness or intensity) to white
(255 for full brightness or intensity). For example, a dark pixel might have a value of 15
and a bright one might be 240.

A grayscale digital image is an image in which the value of each pixel is a single sample, which
carries intensity information. Images are composed exclusively of gray shades, which vary from
black being at the weakest intensity to white being at the strongest. Grayscale images carry many
shades of gray from black to white. Grayscale images are also called monochromatic, denoting
the presence of only one (mono) color (chrome). An image is represented by bitmap. A bitmap is
a simple matrix of the tiny dots (pixels) that form an image and are displayed on a computer
screen or printed.

An 8-bit image with resolution 640 x 480 needs a storage space of 640 x 480 bytes = (640 x
480) / 1024 KB = 300 KB. Therefore an 8-bit image needs 8 times more storage space than a
1-bit image.
 24-bit color images - In a 24-bit color image, each pixel is represented by three bytes,
usually representing RGB (Red, Green and Blue). True color is usually defined to mean 256
shades each of red, green and blue, for a total of 256 x 256 x 256 = 16,777,216 color
variations. Storing graphical image information in an RGB color space in this way allows a
very large number of colors, shades and hues to be displayed in an image, as in high-quality
photographic images or complex graphics.
Many 24-bit color images are stored as 32-bit images, with the extra byte for each pixel used
to store an alpha value representing special-effect information.
A 24-bit color image with resolution 640 x 480 needs a storage space of 640 x 480 x 3 bytes =
(640 x 480 x 3) / 1024 KB = 900 KB without any compression. Similarly, a 32-bit color image
with resolution 640 x 480 needs a storage space of 640 x 480 x 4 bytes = 1200 KB without any
compression.
Disadvantages
o Require large storage space
o Many monitors can display only 256 different colors at any one time. Therefore,
in this case it is wasteful to store more than 256 different colors in an image.
 8-bit color images - 8-bit color graphics is a method of storing image information in a
computer's memory or in an image file, where one byte (8 bits) represents each pixel. The
maximum number of colors that can be displayed at once is 256. 8-bit color graphics come in
two forms. In the first form, the image stores for each pixel not the full 24-bit color
value but an 8-bit index into a color map. Such 8-bit image formats therefore consist of two
parts: a color map describing which colors are present in the image, and the array of index
values for the pixels in the image. In most color maps each color is chosen from a palette
of 16,777,216 colors (24 bits: 8 red, 8 green, 8 blue).
In the other form, the 8 bits are split into 3 bits for red, 3 bits for green and 2 bits for
blue. This second form is often called 8-bit true color, as it does not use a palette at all.
When a 24-bit full color image is turned into an 8-bit image, some of the colors have to be
eliminated; this is known as the color quantization process.
An 8-bit color image with resolution 640 x 480 needs a storage space of 640 x 480 bytes =
(640 x 480) / 1024 KB = 300 KB without any compression.
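
The storage figures given for the image data types above all follow from one formula:
width x height x bits per pixel, divided by 8 to get bytes and by 1024 to get KB. A minimal
sketch, with a function name chosen for this example:

```python
def image_storage_kb(width, height, bits_per_pixel):
    """Uncompressed storage in KB (1 KB = 1024 bytes) for a raster image."""
    return width * height * bits_per_pixel / 8 / 1024


# The 640 x 480 examples from the text, one line per data type.
for label, bits in (("1-bit", 1), ("8-bit", 8), ("24-bit", 24), ("32-bit", 32)):
    print(f"{label:>6}: {image_storage_kb(640, 480, bits):g} KB")
```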

Color lookup tables

A color look-up table (LUT) is a mechanism used to transform a range of input colors
into another range of colors. The look-up table converts the logical color numbers stored in
each pixel of video memory into physical colors, represented as RGB triplets, which can be
displayed on a computer monitor. Each pixel of the image stores only an index value, or
logical color number. For example, if a pixel stores the value 30, the meaning is to go to
row 30 in the color look-up table. The LUT is often called a palette.
The characteristics of a LUT are the following:
 The number of entries in the palette determines the maximum number of colors which
can appear on screen simultaneously.
 The width of each entry in the palette determines the number of colors which the wider
full palette can represent.
A common example is a palette of 256 colors: the number of entries is 256, and thus each
entry is addressed by an 8-bit pixel value. Each color can be chosen from a full palette of
about 16.7 million colors: each entry is 24 bits wide, with 8 bits per channel, which gives
256 levels for each of the red, green and blue components, or 256 x 256 x 256 = 16,777,216
colors.
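
A look-up table can be sketched as a list of RGB triplets indexed by the pixel value. The
gray-ramp palette below is a hypothetical example (real palettes are chosen to suit the
image), and the function names are illustrative only.

```python
def gray_ramp_palette(entries=256):
    """Hypothetical palette: entry i is the gray level (i, i, i)."""
    return [(i, i, i) for i in range(entries)]


def apply_lut(indexed_pixels, palette):
    """Replace each logical color number by its RGB triplet from the palette."""
    return [[palette[index] for index in row] for row in indexed_pixels]


palette = gray_ramp_palette()
indexed_image = [[0, 30], [255, 30]]  # 2 x 2 image of logical color numbers

print(apply_lut(indexed_image, palette))
print(len(palette), "entries, each chosen from", 256 ** 3, "possible colors")
```

The image itself stores one byte per pixel; the 24-bit colors live only in the palette.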

Image file formats


 GIF- Graphics Interchange Format- The GIF format was created by CompuServe. It
supports 256 colors. The GIF format is very popular on the Internet because of its
compact size. It is ideal for small icons used for navigational purposes and for simple
diagrams. GIF creates a table of up to 256 colors from a pool of 16 million. If the image
has 256 colors or fewer, GIF can render the image without any loss of quality.
When the image contains more colors, GIF uses algorithms to match the colors of the
image to a palette of at most 256 colors. Better algorithms search the image to find the
optimum set of 256 colors.

Thus the GIF format is lossless only for images with 256 colors or fewer. In the case of a
rich, true color image, GIF may lose 99.998% of the colors, since only 256 of the 16,777,216
possible colors can be kept. This makes it a poor format for photographic images.
GIFs can be animated, which is another reason they became so successful. Most animated banner
ads are GIFs. GIFs allow single-bit transparency: when you are creating your image, you
can specify which color is to be transparent. This provision allows the background colors of
the web page to show through the image.
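
Whether an image fits a 256-color palette can be checked by counting its distinct colors.
The sketch below also reproduces the 99.998% figure as the share of the 24-bit color space
that a 256-entry palette cannot hold. The function name and the pixel lists are
hypothetical, and the code does not read or write real GIF files.

```python
def fits_gif_palette(pixels, max_colors=256):
    """True if the image uses no more distinct colors than the palette holds."""
    return len(set(pixels)) <= max_colors


# Hypothetical pixel data as (R, G, B) tuples.
icon = [(0, 0, 0), (255, 255, 255), (0, 0, 0)]
gradient = [(i, 0, 0) for i in range(256)] + [(0, 1, 0)]  # 257 distinct colors

print(fits_gif_palette(icon), fits_gif_palette(gradient))

# Share of the 16,777,216 possible colors that cannot be kept in the palette.
fraction_lost = 1 - 256 / 2 ** 24
print(f"{fraction_lost:.3%}")
```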

 JPEG- Joint Photographic Experts Group- The JPEG format was developed by the Joint
Photographic Experts Group. JPEG files are bitmapped images that store information as
24-bit color. This is the format of choice for nearly all photographic images on the
Internet. Digital cameras save images in JPEG format by default. It has become the main
graphics file format for the World Wide Web, and any browser can support it without
plug-ins. In order to make the file small, JPEG uses lossy compression. It works well on
photographs, artwork and similar material but not so well on lettering, simple cartoons
or line drawings. For photographs, JPEG images work much better than GIFs. Although JPEG
can be interlaced, the format lacks many of the other special abilities of GIF, such as
animation and transparency; JPEGs really are only for photos.
 PNG- Portable Network Graphics- PNG is a lossless format that web browsers support.
PNG supports 8-bit, 24-bit, 32-bit and 48-bit data types. One version of the format,
PNG-8, is similar to the GIF format, but PNG is superior to GIF: it produces smaller files
with more options for colors, and it also supports partial transparency. PNG-24 is another
flavor of PNG, with 24-bit color support, allowing ranges of color akin to a high color
JPEG. PNG-24 is in no way a replacement format for JPEG, because it is a lossless
compression format, which means that the file size can be rather big compared with a
comparable JPEG.
 TIFF- Tagged Image File Format- The TIFF format was developed by the Aldus
Corporation in the 1980s and was later supported by Microsoft. TIFF is a widely used
bitmapped file format. It is supported by many image editing applications, scanner
software and photo retouching programs.

TIFF can store many different types of image: 1-bit images, grayscale images, 8-bit
color images, 24-bit RGB images and so on. TIFF files originally used lossless compression.
Today TIFF files can also use lossy compression where required, which makes it a very
flexible format. This file format is suitable when the output is to be printed. Multi-page
documents can be stored as a single TIFF file, which is one reason the format is so popular.
The TIFF format is now controlled by Adobe.
 BMP- Bitmap- The bitmap file format (BMP) is a very basic format supported by most
Windows applications. BMP can store many different types of image: 1-bit images,
grayscale images, 8-bit color images, 24-bit RGB images and so on. BMP files are usually
uncompressed and therefore large, which makes them unsuitable for the Internet, although
they can be compressed using lossless data compression algorithms.
 EPS- Encapsulated PostScript- EPS is a vector-based graphics format. EPS is popular
for saving image files because it can be imported into nearly any kind of application. This
file format is suitable for printed documents. The main disadvantage of this format is that
it requires more storage than other formats.
 PDF- Portable Document Format- The PDF format combines vector graphics with embedded
pixel graphics and offers many compression options. It is the format to use when your
document is ready to be shared with others or published, because it is platform
independent. If you have Adobe Acrobat you can print from any document to a PDF file, and
from Illustrator you can save as .PDF.
 EXIF- Exchangeable Image File- Exif is an image format for digital cameras. A variety of
tags are available to facilitate higher quality printing, since information about the camera
and the picture-taking conditions can be stored and used by printers for possible color
correction algorithms. It also includes a specification of the file format for audio that
accompanies digital images.
 WMF- Windows MetaFile- WMF is the vector file format for the MS-Windows operating
environment. It consists of a collection of graphics device interface function calls to the
MS-Windows graphics drawing library. Metafiles are both small and flexible, but these
images can be displayed properly only by their proprietary software.
 PICT- PICT images are useful in Macintosh software development, but you should avoid
them in desktop publishing and electronic publishing, because PICT images are prone to
corruption.
 Photoshop- This is the native Photoshop file format created by Adobe. You can import
this format directly into most desktop publishing applications.

Computer Image Processing

Image processing has two main areas:


 Image synthesis and
 Image analysis
Image synthesis has already been covered at an introductory level in CSA2120 and will not
be discussed further in this lecture series. Image analysis is concerned with recovering
from graphics the information which is necessary for scene analysis, e.g., automatically
discerning what is in a scene, and being able to reason about topographical relationships
between objects in a scene. Some application areas are object identification and tracking
(tracking, obviously, in an environment where there is a sequence of images, such as video),
image enhancement (to improve the quality of a digitized image), pattern detection and
recognition (e.g., optical character recognition), and scene analysis and computer vision
(e.g., visual planning systems, and reconstructing 3D scenes from stereoscopic images). Of
obvious importance is the ability to accurately reconstruct 3D environments for virtual
realities.
Image Recognition
Figure shows the variety of steps required to transform iconic information into
recognition information.

Image recognition is usually performed on digital images which are represented by a pixel
matrix. The only information available to an image recognition system is the light intensity of
each pixel and the location of a pixel in relation to its neighbors. From this information, image
recognition systems must recover information which enables objects to be located and
recognized, and, in the case of stereoscopic images, depth information which informs us of the
spatial relationship between objects in a scene.

Image Formatting
Image formatting means capturing an image by bringing it into a digital form, as already
covered in the section on digitizing images.
Conditioning
In an image, there are usually features which are uninteresting, either because they were
introduced into the image during the digitization process as noise, or because they form part of a
background. An observed image is composed of informative patterns modified by uninteresting
random variations. Conditioning suppresses, or normalizes, the uninteresting variations in the
image, effectively highlighting the interesting parts of the image.
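As an illustration of conditioning, the following minimal sketch suppresses isolated noise with a 3x3 median filter. The median filter is only one possible conditioning technique and is chosen here as an assumption; the function name median_filter and the sample data are hypothetical.

```python
from statistics import median_low

def median_filter(image):
    """Return a copy of `image` (a list of rows of intensities) in which each
    pixel is replaced by the median of its 3x3 neighborhood.
    The neighborhood is clipped at the image borders."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbors = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            # median_low always returns one of the original intensities
            row.append(median_low(neighbors))
        result.append(row)
    return result

# Hypothetical sample: a flat background with one noisy pixel (255)
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
]
smoothed = median_filter(noisy)
```

In this sample the single outlier is replaced by the background intensity, which is the kind of uninteresting variation that conditioning is meant to suppress.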
Labeling
Informative patterns in an image have structure. Patterns are usually composed of
adjacent pixels which share some property such that it can be inferred that they are part of the
same structure (e.g., an edge). Edge detection techniques focus on identifying
adjacent pixels which differ greatly in intensity or colour, because these are likely to mark
boundaries between objects, or between an object and the background, and hence form an edge.
After the edge detection process is complete, many edges will have been identified. However, not
all of the edges are significant. Thresholding filters out insignificant edges. The remaining edges
are labeled. More complex labeling operations may involve identifying and labeling shape primitives
and finding corners.
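Edge detection followed by thresholding can be sketched as below. This is a simplified illustration that compares each pixel only with its right and lower neighbors; practical systems typically use more elaborate operators. The name edge_map, the threshold value, and the sample image are assumptions made for the example.

```python
def edge_map(image, threshold):
    """Label a pixel 1 if its intensity differs from its right or lower
    neighbor by more than `threshold`, otherwise 0."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Intensity differences to the right and lower neighbors
            dx = abs(image[y][x + 1] - image[y][x]) if x + 1 < width else 0
            dy = abs(image[y + 1][x] - image[y][x]) if y + 1 < height else 0
            # Thresholding: keep only significant differences
            if max(dx, dy) > threshold:
                edges[y][x] = 1
    return edges

# Hypothetical sample: a dark area on the left, a bright area on the right
image = [
    [10, 12, 200, 201],
    [11, 10, 199, 200],
    [10, 11, 201, 202],
]
edges = edge_map(image, threshold=50)
```

Small differences inside the dark and bright areas fall below the threshold and are discarded, so only the boundary column is labeled.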
Grouping
Labeling finds primitive objects, such as edges. Grouping can turn edges into lines by
determining that different edges belong to the same spatial event. The first three operations
represent the image as a digital image data structure (pixel information). From the grouping
operation onward, however, the data structure also needs to record the spatial events to which
each pixel belongs. This information is stored in a logical data structure.
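One way to record the spatial event to which each pixel belongs is to assign a group identifier to every labeled pixel. The sketch below assumes that 8-connected labeled pixels belong to the same group, which is a simplification of real grouping criteria; group_pixels and the sample data are hypothetical.

```python
from collections import deque

def group_pixels(labels):
    """Give every labeled pixel (value 1) a group id so that 8-connected
    labeled pixels share the same id. Unlabeled pixels keep the id 0.
    Returns the matrix of group ids and the number of groups found."""
    height, width = len(labels), len(labels[0])
    groups = [[0] * width for _ in range(height)]
    next_id = 0
    for y in range(height):
        for x in range(width):
            if labels[y][x] != 1 or groups[y][x] != 0:
                continue
            # Start a new group and spread its id to all connected pixels
            next_id += 1
            groups[y][x] = next_id
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny in range(max(0, cy - 1), min(height, cy + 2)):
                    for nx in range(max(0, cx - 1), min(width, cx + 2)):
                        if labels[ny][nx] == 1 and groups[ny][nx] == 0:
                            groups[ny][nx] = next_id
                            queue.append((ny, nx))
    return groups, next_id

# Hypothetical edge labels containing two separate lines
edge_labels = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
]
groups, count = group_pixels(edge_labels)
```

The returned matrix plays the role of the logical data structure: it has the same shape as the pixel matrix but stores group membership instead of intensity.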
Extracting
Grouping only records the spatial event(s) to which pixels belong. Feature extraction
involves generating a list of properties for each set of pixels in a spatial event. These may include
a set's centroid, area, orientation, spatial moments, grey tone moments, spatial-grey tone
moments, circumscribing circle, inscribing circle, etc. Additional properties depend on whether
the group is considered a region or an arc. If it is a region, then the number of holes might be
useful. In the case of an arc, the average curvature of the arc might be useful to know. Feature
extraction can also describe the topographical relationships between different groups. Do they
touch? Does one occlude another? Where are they in relation to each other? Etc.
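A few of the simpler properties can be computed directly from the pixel coordinates of a group. The following sketch covers only area, centroid, and bounding box. The bounding box is an addition of this example rather than a property listed in the text, and region_features is a hypothetical name.

```python
def region_features(pixels):
    """Compute simple properties of a group of pixels.
    `pixels` is a non-empty list of (row, column) coordinates."""
    area = len(pixels)
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return {
        "area": area,
        # Centroid: the mean row and mean column of the group
        "centroid": (sum(rows) / area, sum(cols) / area),
        # Bounding box as (top, left, bottom, right)
        "bounding_box": (min(rows), min(cols), max(rows), max(cols)),
    }

# Hypothetical group: a 2x2 square of pixels
square = [(1, 1), (1, 2), (2, 1), (2, 2)]
features = region_features(square)
```

Relationships between groups, such as whether two groups touch, could be derived from the same coordinate lists.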
Matching
Finally, once the pixels in the image have been grouped into objects and the relationship
between the different objects has been determined, the final step is to recognize the objects in the
image. Matching involves comparing each object in the image with previously stored models and
determining the best match (template matching).
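Template matching can be illustrated by comparing an observed object with each stored model and choosing the model with the smallest difference. The sketch below uses the sum of absolute differences as the comparison measure, which is an assumption; the model names and data are hypothetical.

```python
def match_score(patch, template):
    """Sum of absolute differences between two equally sized pixel matrices.
    A lower score means a closer match."""
    return sum(
        abs(p - t)
        for patch_row, template_row in zip(patch, template)
        for p, t in zip(patch_row, template_row)
    )

def best_match(patch, models):
    """Return the name of the stored model that differs least from `patch`."""
    return min(models, key=lambda name: match_score(patch, models[name]))

# Hypothetical stored models
models = {
    "vertical_bar": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "horizontal_bar": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
# Observed object: a vertical bar with one extra pixel
observed = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
recognized = best_match(observed, models)
```

The observed object differs from the vertical bar in one pixel and from the horizontal bar in five, so the vertical bar is selected.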

Stored Image Format

A digital image is stored as a 2-dimensional array of values, where each value represents the data
associated with a pixel in the image. In the case of bitmaps, each value is 0 or 1, which represents
a monochrome image. In the case of a colour image, the value can be:
 3 numbers representing the intensities of the red, green, and blue components of the
colour at that pixel;
 An indirect address to tables of red, green and blue intensities;
 An indirect address to a table of colour triples;
 An indirect address to any table capable of representing colour codes;
 4 or 5 spectral samples for each colour.
The storage space required for an image is the resolution of the image (in pixels) multiplied by
the colour depth (in bits per pixel). For example, a 640x480 resolution image in millions of
colours (24 bits per pixel) requires 640x480x24 = 7,372,800 bits, or 900K bytes. Smaller space
requirements can be obtained by compressing the image.
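The storage calculation can be checked with a short sketch. The function name image_storage_bits is hypothetical, and 1K is assumed to mean 1024 bytes.

```python
def image_storage_bits(width, height, bits_per_pixel):
    """Uncompressed storage requirement in bits: resolution x colour depth."""
    return width * height * bits_per_pixel

# 640x480 image with 24 bits per pixel (millions of colours)
bits = image_storage_bits(640, 480, 24)
kilobytes = bits / 8 / 1024  # assumes 1K = 1024 bytes

# The same resolution as a monochrome bitmap (1 bit per pixel)
mono_kilobytes = image_storage_bits(640, 480, 1) / 8 / 1024

print(bits, kilobytes, mono_kilobytes)
```

The monochrome bitmap needs one twenty-fourth of the space of the 24-bit image at the same resolution.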

Computer-generated graphics
Graphics can also be created from scratch using a graphics editor, e.g. xphigs. In this
case, a graphic is specified through graphics primitives and their attributes, rather than by a pixel
matrix. This gives the advantage that components of the image can be manipulated through the
primitives (e.g., line, square, ellipse), whereas with a digitized image it is only possible to
manipulate the image at the pixel level. These graphics occupy less space than a corresponding
digitized image of the same resolution and colour-depth. However, before the graphic can be
rendered on the screen it needs to be converted into a pixel matrix. Some graphics packages also
allow objects to be labeled (e.g., if you draw a chair you can label that object as a chair). This is
of particular interest to content-based image retrieval.
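The difference between a primitive-based graphic and a pixel matrix can be sketched as follows. The dictionary layout used for primitives and the rasterize function are assumptions made for illustration and do not reflect the format of any particular graphics package; only rectangles are handled.

```python
def rasterize(primitives, width, height):
    """Convert a list of rectangle primitives into a pixel matrix."""
    pixels = [[0] * width for _ in range(height)]
    for shape in primitives:
        if shape["type"] != "rectangle":
            continue  # other primitive types are not handled in this sketch
        for y in range(shape["top"], shape["top"] + shape["height"]):
            for x in range(shape["left"], shape["left"] + shape["width"]):
                if 0 <= y < height and 0 <= x < width:
                    pixels[y][x] = shape["value"]
    return pixels

# Hypothetical graphic: one labeled rectangle described by its attributes
scene = [
    {"type": "rectangle", "label": "chair",
     "left": 1, "top": 1, "width": 2, "height": 2, "value": 1},
]
pixels = rasterize(scene, width=5, height=4)

# Manipulating the primitive: move the object one pixel to the right
scene[0]["left"] += 1
moved = rasterize(scene, width=5, height=4)
```

The graphic is stored as a handful of attributes, can be moved by changing a single attribute, and keeps its label, while the rendered pixel matrix carries none of that information.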
