ABSTRACT
As ever more audiovisual information becomes available from sources around the
world, people want to use this information for a variety of purposes. This
challenging situation created the need for a solution that quickly and
efficiently searches for and/or filters the types of multimedia material that
interest the user.
INTRODUCTION
The Moving Picture Experts Group, abbreviated MPEG, is part of the
International Organization for Standardization (ISO) and defines standards for
digital video and digital audio. The group's original task was to develop a
format for playing back video and audio in real time from a CD. Since then the
demands have grown: besides the CD, the DVD must be supported, as well as
transmission channels such as satellite links and networks. All of these use
cases are covered by a broad family of standards, the best known being MPEG-1,
MPEG-2, MPEG-4 and MPEG-7. Each standard provides levels and profiles to
support particular applications in an optimized way.
It's clearly much more fun to develop multimedia content than to index it.
The amount of multimedia content available -- in digital archives, on the World
Wide Web, in broadcast data streams and in personal and professional databases
-- is growing out of control. This growth has led to increasing difficulty
in accessing, identifying and managing such resources, owing to their volume
and complexity and a lack of adequate indexing standards. The large number of
recently funded DLI-2 projects related to the resource discovery of different
media types, including music, speech, video and images, indicates an
acknowledgement of this problem and the importance of this field of research for
digital libraries.
Avid PC users will almost certainly remember the first time they were
able to view a video clip on their computer. The clips were about the size of a
postage stamp and were generously referred to as "multimedia". Later, the first
acceptable video clips were used in the opening scenes of computer games. In
some cases, there were even digital 3D animations that couldn't be created in
real-time with the hardware and software that was available in those days. As the
video clips demanded extensive storage space (despite their short length), they
were only available on CD-ROM drives that had recently become popular.
Because of this, many PCs became multimedia-compatible, in a restricted sense,
through the integration of a CD-ROM drive and a soundcard. However, their
limitations soon became apparent: it wasn't possible to run the video clip
smoothly in fullscreen mode even with the most powerful hardware available.
With the development of high performance graphic chips, faster processors and
corresponding software interfaces, today's users are now able to run video clips
in all the usual formats (including fullscreen mode) without problems. We'll
continue with a look at the most common video formats and then provide an
overview of their specific applications.
One of the oldest formats in the x86 computer world is AVI. The
abbreviation 'AVI' stands for 'Audio Video Interleaved'. This video format was
created by Microsoft and introduced along with Windows 3.1. AVI, the
proprietary format of Microsoft's "Video for Windows" application, merely
provides a framework for various compression algorithms such as Cinepak, Intel
Indeo, Microsoft Video 1, Clear Video or IVI. In its first version, AVI supported
a maximum resolution of 160 x 120 pixels with a refresh rate of 15 frames per
second. The format attained widespread popularity, as the first video editing
systems and software appeared that used AVI by default. Examples of such
editing boards included Fast's AV Master and Miro/Pinnacle's DC10 to DC50.
However, there were a number of restrictions: for example, an AVI video that
had been processed using an AV Master could not be directly processed using an
interface board from Miro/Pinnacle. The manufacturers adapted the open AVI
format according to their own requirements. AVI is subject to additional
restrictions under Windows 98, which make professional work at higher
resolutions more difficult. For example, the maximum file size under the FAT16
file system is 2 GB. The FAT32 file system (introduced with Windows 95 OSR2
and included in Windows 98) brought an improvement: in conjunction with the
DirectX 6 module 'DirectShow', files of up to 8 GB can (at least in theory) be created. In
practice however, many interface cards lack the corresponding driver support so
that Windows NT 4.0 and NTFS are strongly recommended. Despite its age and
numerous problems, the AVI format is still used in semi-professional video
editing cards. Many TV cards and graphic boards with a video input also use the
AVI format. These are able to grab video clips at low resolutions (mostly 320 x
240 pixels).
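The FAT16 file-size ceiling mentioned above translates directly into a maximum clip length. The short calculation below sketches this; the 30 MBit/s capture rate is an illustrative figure for a semi-professional capture card, not one quoted in the text.

```python
# How long can an AVI capture run before hitting FAT16's 2 GB file limit?
# The capture data rate is an illustrative assumption.

FAT16_LIMIT_BYTES = 2 * 1024**3        # 2 GiB hard cap per file
capture_rate_mbit_s = 30               # assumed capture data rate

bytes_per_second = capture_rate_mbit_s * 1_000_000 / 8
max_seconds = FAT16_LIMIT_BYTES / bytes_per_second
print(f"Max clip length: {max_seconds / 60:.1f} minutes")  # about 9.5 minutes
```

At rates like this, a single file holds well under ten minutes of video, which is why the limit made professional work at higher resolutions so difficult.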
Apple's Format
The MOV format, which originated in the Macintosh world, was also
ported to x86-based PCs. It is the proprietary standard of Apple's QuickTime
application and stores audio and video data simultaneously. Between 1993 and
1995, QuickTime was superior to Microsoft's AVI format in both functionality
and quality. The functionality of the latest generation (QuickTime 4.0) also
includes the streaming of Internet videos (the realtime transmission of videos
without the need to first download the entire file to the computer). Despite this,
Apple's proprietary format is continually losing popularity with the increasing
use of MPEG. Video clips coded with Apple's format are still found on some
CDs because of QuickTime's ability to run on both Macintosh and x86
computers.
MPEG Formats
The MPEG formats are by far the most popular standard. MPEG stands
for "Moving Picture Experts Group", an international organization that develops
standards for the encoding of moving images. In order to attain widespread use,
the MPEG standard only specifies a data model for the compression of moving
pictures and for audio signals. In this way, MPEG remains platform independent.
One can currently differentiate between four standards: MPEG-1, MPEG-2,
MPEG-4 and MPEG-7. Let's take a brief look at each format separately.
MPEG-1, finalized in 1992, was designed for data rates of about 1.5 MBit/s,
the rate of a single-speed CD-ROM drive, and is best known as the format of
the Video CD; its Audio Layer 3 became famous as MP3.
MPEG-2 has existed since 1995. Its basic structure is the same as that of
MPEG-1, but it allows data rates of up to 100 MBit/s and is used for digital
TV, video films on DVD-ROM and in professional video studios. MPEG-2 allows
the resolution and the data rate to be scaled over a wide range. Due to its
high data rate compared with MPEG-1 and the correspondingly higher storage
requirements, MPEG-2 is currently suitable only for playback in the home-user
field. At data rates of approximately 4 MBit/s, the attainable video quality
is noticeably better than with MPEG-1.
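To get a feel for what a 4 MBit/s data rate means in practice, the sketch below computes the storage a feature film would need at that rate; the two-hour runtime is an illustrative assumption.

```python
# Storage needed for a film at the MPEG-2 data rate quoted above.
# The runtime is an illustrative assumption.

video_rate_mbit_s = 4          # data rate from the text
runtime_minutes = 120          # assumed two-hour film

total_bits = video_rate_mbit_s * 1_000_000 * runtime_minutes * 60
total_gb = total_bits / 8 / 1024**3
print(f"{total_gb:.2f} GiB")   # roughly 3.35 GiB
```

A result in this range explains why the format suits DVD-ROM: a two-hour film at 4 MBit/s fits comfortably on a single-layer disc.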
MPEG-4 is one of the latest video formats; its objective is to achieve the
highest possible video quality at extremely low data rates, in the range
between 10 KBit/s and 1 MBit/s. Furthermore, the need for data integrity and loss-free
data transmission is paramount as these play an important role in mobile
communications. Something completely new in MPEG-4 is the organization of
the image contents into independent objects in order to be able to address or
process them individually. MPEG-4 is used for video transmission over the
Internet for example. Some manufacturers plan to transmit moving images to
mobile phones in the future. MPEG-4 is intended to form a platform for this type
of data transfer.
MPEG-1, -2, and -4 make content available. MPEG-7 lets you find the
content you need.
DEFINING MPEG-7
Qualifying MPEG-7
Over 100 MPEG-7 Description Tools are currently being developed and
refined. The relationships between the MPEG-7 Description Tools are outlined in
Figure 2. The basic elements, at the lower level, deal with basic data types,
mathematical structures, schema tools, linking and media localization tools, as
well as basic DSs, which are elementary components of more complex DSs. The
Schema tools section specifies elements for creating valid MPEG-7 schema
instance documents and description fragments.
In addition, this section specifies tools for managing and organizing the
elements and datatypes of the schema. Based on this lower level, content
description and management elements can be defined. These elements describe
the content from several viewpoints. Currently five viewpoints are defined:
creation and production, media, usage, structural aspects, and conceptual aspects.
The first three elements primarily address information that’s related to the
management of the content (content management), whereas the last two are
mainly devoted to the description of perceivable information (content
description).
• Creation & Production: Contains meta information that describes the creation
and production of the content; typical features include title, creator,
classification, and purpose of the creation. Most of the time this information
is author-generated, since it can't be extracted from the content.
• Usage: Contains meta information that’s related to the usage of the content;
typical features involve rights holders, access rights, publication, and financial
information. This information may be subject to change during the lifetime of the
AV content.
• Media: Contains the description of the storage media; typical features include
the storage format, the encoding of the AV content, and elements for the
identification of the media. Note: Several instances of storage media for the same
AV content can be described.
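As a rough sketch of how these viewpoints combine in a single description, the fragment below builds a simplified, MPEG-7-flavoured XML document with Python's standard library. The element names only mirror the structure described above; they are not the normative MPEG-7 schema.

```python
# Illustrative content description combining the Creation & Production,
# Media, and Usage viewpoints. Element names are simplified stand-ins,
# NOT the normative MPEG-7 schema.
import xml.etree.ElementTree as ET

desc = ET.Element("ContentDescription")

creation = ET.SubElement(desc, "CreationInformation")
ET.SubElement(creation, "Title").text = "Evening News"
ET.SubElement(creation, "Creator").text = "Example Broadcaster"

media = ET.SubElement(desc, "MediaInformation")
ET.SubElement(media, "Format").text = "MPEG-2"   # storage/encoding viewpoint

usage = ET.SubElement(desc, "UsageInformation")
ET.SubElement(usage, "RightsHolder").text = "Example Broadcaster"

print(ET.tostring(desc, encoding="unicode"))
```

The point of the sketch is structural: author-generated creation data, media data, and mutable usage data live side by side in one description of the same AV content.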
The five sets of Description Tools are presented here as separate entities,
however, they are interrelated and may be partially included in each other. For
example, Media, Usage or Creation & Production elements can be attached to
individual segments involved in the structural description of the content. Tools
are also defined for navigation and access and there is another set of tools for
Content organization which addresses the organization of content by
classification, by the definition of collections and by modeling. Finally, the last
set of tools is User Interaction which describes user’s preferences for the
consumption of multimedia content and usage history.
The MPEG-7 Audio group is developing a range of Description Tools, from generic
audio descriptors (e.g., waveform and spectrum envelopes, fundamental
frequency) to more sophisticated description tools such as Spoken Content and
Timbre. The generic audio description tools will allow searching for similar
voices by matching the envelope and fundamental frequency of a voice sample
against a database of voices. The Spoken Content Description Scheme (DS) is
designed to represent the output of a wide range of state-of-the-art Automatic
Speech Recognition systems, containing both word and phoneme representations
together with their most likely transitions. This alleviates the problem of
out-of-vocabulary words, allowing retrieval even when the original word was
wrongly decoded. The Timbre descriptors (Ds) describe the perceptual features
of instrument sounds that make two sounds with the same pitch and loudness
appear different to the human ear. These descriptors allow searching for
melodies independently of the instruments playing them.
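The "similar envelope" idea can be illustrated with a toy nearest-neighbour search: represent each voice as a short vector of band energies and rank database entries by Euclidean distance to the query. The envelope values and database names below are made up; real MPEG-7 audio descriptors are considerably more elaborate.

```python
# Toy version of matching a voice sample against a database by comparing
# spectrum envelopes. The band energies and names are invented for the
# example; they are not real MPEG-7 descriptor values.
import math

def envelope_distance(a, b):
    """Euclidean distance between two equal-length spectral envelopes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [0.9, 0.7, 0.4, 0.2]                  # hypothetical band energies
database = {
    "voice_A": [0.88, 0.72, 0.41, 0.19],      # close to the query
    "voice_B": [0.10, 0.30, 0.80, 0.95],      # very different
}

best = min(database, key=lambda name: envelope_distance(query, database[name]))
print(best)  # voice_A
```

The same ranking pattern carries over to the more sophisticated descriptors: only the feature vector and the distance function change.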
The MPEG-7 Visual group is developing description tools for visual features
such as color, shape and motion; motion descriptors, for example, allow
searching for video segments with similar motion patterns, which is applicable
to sports (e.g., similar movements in a soccer or football game) or to
surveillance applications (e.g., detecting an intrusion as movement towards a
safe zone).
The MPEG-7 Systems group is developing the DDL and the binary format
(known as BiM), besides working on the definition of the terminal architecture
and access units.
• Audio: I want to search for songs by humming or whistling a tune or, using an
excerpt of Pavarotti’s voice, get a list of Pavarotti’s records and video clips in
which Pavarotti sings or simply makes an appearance. Or, play a few notes on a
keyboard and retrieve a list of musical pieces similar to the required tune, or
images matching the notes in a certain way, e.g. in terms of emotions.
• Graphics: Sketch a few lines on a screen and get a set of images containing
similar graphics, logos, and ideograms.
• Image: Define objects, including color patches or textures, and get examples
from which you select items to compose your image. Or check if your company
logo was advertised on a TV channel as contracted.
• Visual: Allow mobile phone access to video clips of goals scored in a soccer
game, or automatically search and retrieve any unusual movements from
surveillance videos.
The following applications are examples of the types of problems that MPEG-7
can help solve. These application examples represent development work in progress.
There are many more applications being developed around the world.
Figure 3 shows possible ways to search for visual content using the inherent
structural features of an image. In this example, four image features are
detailed. The color histogram feature (1) of an image allows you to search for
images that contain the same colors. Note that the position of the colors is
not important, only the amount of each color in the image. The next feature,
spatial color distribution (2), allows you to search for images where the
location of the colors matters. You can see that the added object in the
bottom-right flag does not affect this type of search. You can additionally
search for images that have a similar edge or contour profile, as in the
spatial edge distribution (3) search technique. Note that color makes no
difference to this type of search. Finally, you can see an example of
searching by object shape (4). Here, the color and edge profiles are not
important.
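The color-histogram search (1) can be sketched in a few lines: a histogram records only how much of each color an image contains, not where the color sits, so two images with the same colors rearranged compare as identical. The "images" below are made-up lists of pixel color labels rather than real bitmaps.

```python
# Minimal color-histogram comparison (feature 1 above): color positions are
# ignored, only the relative amounts matter. "Images" are invented lists of
# pixel color labels, not real bitmaps.
from collections import Counter

def normalized_histogram(pixels):
    """Fraction of the image occupied by each color."""
    counts = Counter(pixels)
    total = len(pixels)
    return {color: n / total for color, n in counts.items()}

def histogram_intersection(h1, h2):
    """1.0 = identical color distribution, 0.0 = no colors in common."""
    return sum(min(h1.get(c, 0.0), h2.get(c, 0.0)) for c in set(h1) | set(h2))

flag_a = ["red"] * 50 + ["white"] * 30 + ["blue"] * 20
flag_b = ["blue"] * 20 + ["red"] * 50 + ["white"] * 30   # same colors, rearranged
forest = ["green"] * 90 + ["brown"] * 10

sim_same = histogram_intersection(normalized_histogram(flag_a),
                                  normalized_histogram(flag_b))
sim_diff = histogram_intersection(normalized_histogram(flag_a),
                                  normalized_histogram(forest))
print(round(sim_same, 6), round(sim_diff, 6))  # 1.0 0.0
```

A spatial color distribution search (2) would additionally bin the pixels by region before comparing, which is exactly why the rearranged flag would then no longer count as a perfect match.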
MPEG-7 is about the future of media in the 21st century. This is not an
overstatement. MPEG-7 provides a comprehensive and flexible framework for
describing the content of multimedia. To describe content implies knowledge of
elements it consists of, as well as, knowledge of interrelations between those
elements. The most straightforward application is multimedia management,
where such knowledge is prerequisite for efficiency and accuracy. However,
there are other serious implications. Knowledge of the structural features of
multimedia information as well as its semantic features will help generate
solutions that will provide more comprehensive and accurate indexing and search
applications (leading to a greater ability for content manipulation and
content reuse, and thus new content creation). Many issues, it is true,
remain, including copyright issues and interoperability between applications
and systems that wish to adhere to the MPEG-7 standard. But such issues are
balanced by the incredible opportunities the standard opens up.
The contributors to MPEG-7 include experts in every portion of the content value
chain: production, post-production, delivery, and consumption. Through this
process MPEG-7 has standardized description schemes for content description,
management, and organization, as well as navigation, access, user preferences
and usage history.
MPEG-7 description tools, then, are a key enabler of the following application
domains:
Media archives will become vast and interconnected pools of content, too
large to be managed manually. Customization of content within programs, e.g.,
the substitution of structural elements (characters, music, voices) according
to viewer desires, or content scaling for PDAs and cell phones, will be not
only possible but easy and pleasant. MPEG-7 will enable the creation of tools
(through its structured combination of low-level features and high-level
metadata) for coping with this "outbreak" of generic content.
MPEG-7 will address both retrieval from digital archives (pull applications)
and the filtering of streamed audiovisual broadcasts on the Internet (push
applications). It will operate in both real-time and non-real-time
environments. A "real-time environment" in this context means that the
description is generated while the content is being captured (e.g., by smart
cameras and scanners).
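In the push case, filtering reduces to matching incoming content descriptions against a user's stored preferences. The sketch below shows that matching at its most bare-bones; all field names and values are hypothetical, not actual MPEG-7 description elements.

```python
# Bare-bones "push" filtering: keep only the items in a broadcast stream
# whose descriptions match the user's stored preferences. Field names and
# values are hypothetical, not real MPEG-7 elements.

user_preferences = {"genre": {"soccer", "news"}, "language": {"en"}}

stream = [
    {"title": "Morning News", "genre": "news",    "language": "en"},
    {"title": "Cooking Show", "genre": "cooking", "language": "en"},
    {"title": "Cup Final",    "genre": "soccer",  "language": "en"},
]

def matches(description, prefs):
    """True if every preference field accepts the description's value."""
    return all(description.get(field) in wanted
               for field, wanted in prefs.items())

selected = [d["title"] for d in stream if matches(d, user_preferences)]
print(selected)  # ['Morning News', 'Cup Final']
```

A pull application would run the same match in the other direction, using the preferences as a query against an archive instead of as a filter over a stream.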
Many applications and application domains will potentially benefit from the
MPEG-7 standard. Examples of applications include:
• Education;
• Journalism (e.g., searching for speeches by a given politician using their
name, voice, or face);
• Cultural services (museums, art galleries);