
Module 1

INTRODUCTION TO MULTIMEDIA DATABASES


Prof. Dr. Naomie Salim
Faculty of Computer Science & Information Systems
Universiti Teknologi Malaysia

The Explosion of Digital Multimedia Information


We interact with multimedia every day
Large amounts of text, images, speech & video are converted to digital form
Advantages of digitized data over analog:
Easy storage
Easy processing
Easy sharing

Give examples of multimedia applications that deal with storing, retrieving, processing and sharing multimedia data

Eg 1. Journalism
A journalist writes an article about the influence of alcohol on driving
Investigation involves:
Collecting news articles about accidents, scientific reports, television commercials, police interviews, interviews with medical experts

Illustration:
Search photo archives and stock footage companies for good photos (shocking, funny, etc.)

Other examples
Searching movies
Based on taste in movies already seen
Based on movies a friend favors

Searching on web
Eg. searching the Australian Open website (http://www.ausopen.org)
Integrate conceptual terms + interesting events, eg. give info about video segments showing female American tennis players going to the net

Retrieval problems
EMPLOYEE (Name: char(20), City: Char(20), Photo: Image)
How do you select employees in Skudai?
How do you select employees who wear a tudung, wear glasses, are fair-skinned and have a mole under the lips?
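The contrast between the two questions can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table contents, names and photo bytes below are made up. The city question is an ordinary WHERE clause, while the photo column is an opaque byte string that SQL cannot look inside.

```python
import sqlite3

# In-memory table mirroring the EMPLOYEE schema; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, city TEXT, photo BLOB)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Aminah", "Skudai", b"fake-image-bytes-1"),
        ("Farid", "Johor Bahru", b"fake-image-bytes-2"),
        ("Siti", "Skudai", b"fake-image-bytes-3"),
    ],
)

def employees_in(city):
    """Alphanumeric attribute: an exact-match WHERE clause is enough."""
    rows = conn.execute(
        "SELECT name FROM employee WHERE city = ? ORDER BY name", (city,)
    )
    return [name for (name,) in rows]

skudai = employees_in("Skudai")
print(skudai)

# There is no comparable predicate for "wears glasses": answering that
# needs metadata describing the photo content, stored alongside the photo.
```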

Characteristics of Media Data


Medium - Information representation
Alphanumeric
Representation of audio, video and image

Static vs dynamic
Static: do not have time dimensions (alphanumeric data, images, graphics)
Dynamic: have time dimensions (video, animation, audio)

Multimedia
Collection of media types used together
At least one media type must be non-alphanumeric

Digital representation of text


OCR techniques convert analog (printed) text to digital text
Eg. of digital representation: ASCII
7-bit code, usually stored in 8 bits (1 byte) per character
Chinese characters require more space per character
Storage requirements depend on number of characters
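A small sketch of how storage depends on the number of characters and on the encoding. UTF-8 and UTF-16 are not named on the slide; they are used here only as common examples of encodings that can represent Chinese characters.

```python
def storage_bytes(text, encoding):
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))

english = "multimedia"   # 10 characters
chinese = "多媒体"        # "multimedia" in Chinese, 3 characters

ascii_size = storage_bytes(english, "ascii")       # 1 byte per character
utf8_size = storage_bytes(chinese, "utf-8")        # 3 bytes per character here
utf16_size = storage_bytes(chinese, "utf-16-le")   # 2 bytes per character here

print(ascii_size, utf8_size, utf16_size)
```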

Structured documents becoming more popular


Docs consist of titles, chapters, sections, paragraphs, etc.
Standards like HTML and XML used to encode structured information

Compression of text
Huffman coding, arithmetic coding
Since storage requirements for text are not too high, compression is less important than for other multimedia data
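A minimal sketch of Huffman coding, assuming the text is coded character by character. Frequent characters get short bit strings and rare ones get long bit strings. The function names are illustrative, and a real compressor would also have to store the code table.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a dict mapping each character to its Huffman bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {char: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Decode by reading bits until they match a code (codes are prefix-free)."""
    inverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

codes = huffman_codes("abracadabra")
bits = encode("abracadabra", codes)
print(len(bits), "bits instead of", 8 * len("abracadabra"))
```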

Digital representation of audio


Audio
Air pressure waves characterized by frequency and amplitude
Humans hear 20-20,000 Hertz
Low amplitude = soft sound

Digitizing pressure waveforms


Transform into electrical signal (by microphone)
Convert into discrete values:
Sampling: continuous time axis divided into small, fixed intervals
Quantization: determination of amplitude of audio signal at beginning of each time interval
Humans cannot notice the difference between analog & digital if the sampling rate is high enough and the quantization precise enough
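The two digitizing steps can be sketched as follows. The 440 Hz test tone, the 8,000 Hz sampling rate and the bit depths are arbitrary assumptions chosen for illustration, and the signal is assumed to stay within [-1, 1].

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz, bits):
    """Sample signal(t) at fixed intervals and quantize each sample to an integer level."""
    levels = 2 ** bits
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for k in range(n_samples):
        t = k / sample_rate_hz                  # sampling: fixed time step
        amplitude = signal(t)
        # quantization: map [-1, 1] onto integer levels 0 .. levels - 1
        samples.append(round((amplitude + 1.0) / 2.0 * (levels - 1)))
    return samples

def reconstruct(level, bits):
    """Map an integer level back to an amplitude in [-1, 1]."""
    return level / (2 ** bits - 1) * 2.0 - 1.0

def tone(t):
    return math.sin(2 * math.pi * 440 * t)      # hypothetical test signal

coarse = sample_and_quantize(tone, 0.01, 8000, 4)    # 16 levels
fine = sample_and_quantize(tone, 0.01, 8000, 16)     # 65,536 levels
```

More bits per sample give a smaller quantization error, which is why precise quantization makes the digital version hard to distinguish from the analog one.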

Audio storage requirements


Example of a CD audio
16 bits per sample
44,100 samples per second
Two (stereo) channels
Requirements = 16 * 44,100 * 2 bits = 1,411,200 bits ≈ 1.4 Mbit per second
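The arithmetic as a short sketch, assuming the standard CD sampling rate of 44,100 samples per second. The function name is illustrative.

```python
def audio_bit_rate(bits_per_sample, samples_per_second, channels):
    """Uncompressed audio data rate in bits per second."""
    return bits_per_sample * samples_per_second * channels

cd_bits_per_second = audio_bit_rate(16, 44_100, 2)
cd_bytes_per_minute = cd_bits_per_second * 60 // 8

print(cd_bits_per_second)    # 1411200, about 1.4 Mbit per second
print(cd_bytes_per_minute)   # 10584000, about 10.6 Mbytes per minute
```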

Compression (examples)
Masking: discard soft sounds that are made inaudible by louder sounds
Speech: coding of lower frequency sounds only
MPEG: audio compression standards

Digital representation of image


Scan analog photos & pictures using scanner
Analog image approximated by rectangle of small dots
In a digital camera, the ADC (analog-to-digital converter) is built in

Image consists of many small dots or picture elements (pixels)


Gray scale: 1 byte (8 bits) per pixel
Color: 3 colors (RGB) of one byte each
Data required for 1 rectangular screen:
A = xyb
A: number of bytes needed, x: # pixels per horizontal line, y: # horizontal lines, b: # bytes per pixel

Image compression
Exploit redundancy in image & properties of human perception
Spatial redundancy: pixels in certain area often appear similar (golden sand, blue sky)
Human tolerance: error still allows effective communication

Eg. of image compression


Transform coding
Fractal image coding

Digital representation of video


Sequence of frames or images presented at fixed rate
Digital video obtained by digitizing analog video or directly from digital cameras
Playing 25 frames per second gives illusion of continuous view

Amount of data to represent video


1 second of video; image: 512 lines, 512 pixels per line, 24 bits (3 bytes) per pixel; 25 frames per second
512 * 512 * 3 * 25 = 19,660,800 bytes ≈ 19.7 Mbytes
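The same calculation as a sketch, reusing the image formula A = xyb per frame and multiplying by the frame rate and duration. The function names are illustrative.

```python
def image_bytes(x, y, b):
    """A = x * y * b: pixels per line, number of lines, bytes per pixel."""
    return x * y * b

def video_bytes(x, y, b, frames_per_second, seconds):
    """Uncompressed video size: one full image per frame."""
    return image_bytes(x, y, b) * frames_per_second * seconds

one_frame = image_bytes(512, 512, 3)
one_second = video_bytes(512, 512, 3, 25, 1)
one_minute = video_bytes(512, 512, 3, 25, 60)

print(one_frame)    # 786432 bytes per frame
print(one_second)   # 19660800 bytes, about 19.7 Mbytes
print(one_minute)   # 1179648000 bytes, more than 1 Gbyte
```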

Compression of video
Compressing frames of videos: similar to image
Reduce redundancy & exploit human perception properties

Temporal redundancy: neighboring frames normally similar, remove by applying motion estimation & compression
Each image divided into fixed-sized blocks
For each block in image, the most similar block in previous image is determined & pixel difference computed
Together with displacement between the two blocks, this difference stored or transmitted
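A minimal sketch of the block-matching step for a single block. It assumes gray-scale frames stored as lists of rows, an exhaustive search in a small window, and the sum of absolute differences as the similarity measure. These are illustrative choices; the slide does not specify them.

```python
def block(frame, top, left, size):
    """Cut a size x size block out of a frame."""
    return [row[left:left + size] for row in frame[top:top + size]]

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def best_match(prev, cur, top, left, size, search):
    """Return the displacement (dy, dx) into prev that best predicts the
    block of cur at (top, left), plus the remaining pixel difference."""
    target = block(cur, top, left, size)
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(target, block(prev, y, x, size))
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    _, dy, dx = best
    candidate = block(prev, top + dy, left + dx, size)
    residual = [[p - q for p, q in zip(rt, rc)]
                for rt, rc in zip(target, candidate)]
    return (dy, dx), residual

# Hypothetical frames: the second is the first shifted right by one pixel.
prev_frame = [[y * 8 + x for x in range(8)] for y in range(8)]
cur_frame = [[0] + row[:-1] for row in prev_frame]

displacement, residual = best_match(prev_frame, cur_frame,
                                    top=2, left=3, size=2, search=2)
print(displacement, residual)
```

When the match is good the residual is all zeros or close to it, so storing the displacement plus the residual takes far less space than storing the block itself.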

MPEG-1 (VHS quality, pixel based coding): coding of video data at up to 1.5 Mbits per second
MPEG-2 (pixel based coding): coding of video data at up to 10 Mbits per second
MPEG-4 (multimedia data, object based coding): coding of video data at up to 40 Mbits per second, tools for decoding & representing video objects, support for content-based indexing & retrieval

How to search for images or multimedia data?


Analyze objects one by one? No! Takes too long!
Have to use metadata: instead of searching the objects directly, search the metadata that has been added to them
Metadata requirements to be valuable for searching:
Description of multimedia object should be as complete as possible
Storage of metadata must not take too much overhead
Comparison of two metadata values must be fast

Metadata of Multimedia Objects


Descriptive data
Give format or factual info about multimedia object
Eg.: author name, creation date, length of multimedia object, representation technique
Eg. standard for descriptive data: Dublin Core
Can use SQL (metadata condition in WHERE clause)

Metadata of Multimedia Objects (cont.)


Annotations
Textual description of contents of objects
Eg.: photo description in Facebook
Either free format or sequence of keywords
Manual text annotations allow Information Retrieval techniques to be used but:
Time consuming, expensive
Subjective, incomplete

Structured concepts (eg semantic web, ER-like schema) can be used to describe content through concepts, their relationships to each other & MM object but
Also slow and expensive

Metadata of Multimedia Objects (cont.)


Features
Derive characteristics from MM object itself
Need language to describe features, eg. MPEG-7
Process to capture features from MM object is called feature extraction:
Performed automatically, sometimes with human support

Two feature classes


Low-level features
High-level features

Low-level Features
Grasp data patterns & statistics of MM object
Depend strongly on medium
Extraction performed automatically
Eg. for text:
List of keywords with frequency indicators

Eg. for audio


Representation
Amplitude-time sequence: quantification of air pressure at each sample
Silence: 0; above silence: positive amplitude; below silence: negative amplitude

Eg. Low-level features derived


Energy: loudness of signal
ZCR (zero crossing rate): frequency of sign change; high indicates speech
Silence ratio: low indicates music
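The three audio features can be sketched on a list of signed amplitude samples. The definitions below are simplified assumptions: energy is the mean squared amplitude, and the silence ratio counts individual samples below a threshold rather than silent periods.

```python
def energy(samples):
    """Mean squared amplitude, a simple loudness measure."""
    return sum(s * s for s in samples) / len(samples)

def zero_crossing_rate(samples):
    """Fraction of neighboring sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)

def silence_ratio(samples, threshold):
    """Fraction of samples whose amplitude is at most threshold."""
    silent = sum(1 for s in samples if abs(s) <= threshold)
    return silent / len(samples)

clip = [0, 3, -3, 3, -3, 0, 0, 0]   # hypothetical amplitude-time sequence
print(energy(clip), zero_crossing_rate(clip), silence_ratio(clip, 1))
```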

Low-level features (cont.)


Eg. for images
Color histograms: # pixels having color of certain range
Spatial relationships: eg. blue patterns appear above yellow (beach photo)
Contrast: # dark spots neighboring light spots
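A color histogram can be sketched as a count of pixels per quantized color range. The choice of 4 ranges per channel is an arbitrary assumption, and the pixel values are made up.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Count pixels per quantized (R, G, B) range.

    pixels: iterable of (r, g, b) triples with values 0-255.
    bins_per_channel is assumed to divide 256 evenly.
    """
    step = 256 // bins_per_channel
    hist = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        hist[key] = hist.get(key, 0) + 1
    return hist

photo = [(250, 10, 10), (255, 0, 0), (0, 0, 255)]   # two reds, one blue
print(color_histogram(photo))
```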

Eg. for video


Use low-level features for image
Eg. of temporal dimension: shot change, detected when pixel difference between two images is higher than certain threshold
Shot: sequence of images taken with same camera position
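Shot change detection by thresholding the pixel difference can be sketched as follows. The frames are assumed to be gray-scale lists of rows; the mean absolute difference and the threshold value are illustrative choices.

```python
def frame_difference(a, b):
    """Mean absolute pixel difference between two gray-scale frames."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))

def shot_boundaries(frames, threshold):
    """Indices i at which frame i starts a new shot."""
    return [i for i in range(1, len(frames))
            if frame_difference(frames[i - 1], frames[i]) > threshold]

# Hypothetical 2 x 2 frames: two dark frames, then two bright ones.
dark_a = [[10, 10], [10, 10]]
dark_b = [[12, 10], [10, 11]]
bright = [[200, 210], [205, 200]]
frames = [dark_a, dark_b, bright, bright]

print(shot_boundaries(frames, threshold=30))
```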

High-level features
Features which are meaningful to end user, such as golf course, forest
How can we bridge the semantic gap between low level and high level features?
High level feature extraction from low level features:
Eg. text containing words football, referee -> football match text
Eg. Speech to text translators (low level audio features to text)
Eg. Video, domain specific: loud sound from crowd, round object passing white line, followed by sharp whistle -> goal

Multimedia Information Retrieval System (MIRS)

Component of MIRS - Archiving


MM data stored separately from its metadata
Voluminous
Visible or audible delays in playback unacceptable

MM data managed separately in MM content server


Objects get identification at storage time, to be used by other parts of MIRS
Have to deal with compression and protection

Component of MIRS - Feature Extraction (Indexing)


Extraction of metadata (annotations, descriptions, features) from incoming multimedia object
Algorithms have to consider extraction dependencies. Eg.:
Video object segmented, choose key frame for each segment
Extract low-level features from key frame
Based on low-level features, classify into shots of audience, fields, close-ups
For field shots, detect positions of players
Extract body related features of players
Determine where net playing begins and ends

Have to consider incremental maintenance (modification of MM objects, extractors, extraction dependencies)

Incremental Maintenance in ACOI Feature Extraction Architecture

Component of MIRS - Searching


Multimedia queries are diverse, can be specified in many different ways
No exact match, many ways to describe MM objects
Specifying information need:
Direct: user specifies information need herself
Indirect: user relies on other users

Possible Querying Scenarios

Possible Querying Scenarios (cont.)


Queries based on Profile
Users expose preferences in one way or another
Preferences stored in user profile in MIRS
If not sure of own preferences, can use the profile of a trusted friend

Queries based on Descriptive Data


Based on format and facts about MM object
Eg. all movies with Director = Steven Spielberg

Possible Querying Scenarios (cont.)


Queries based on Annotations
Text-based: keywords or natural language
Eg. Show me video in which Barack Obama shakes hands with Mahathir Mohamad
Set of keywords derived from query & compared with keywords in annotations of movies
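A minimal sketch of keyword matching against annotations. The stop-word list, object identifiers and annotation texts are hypothetical, and real Information Retrieval systems use more refined weighting than a plain count of shared keywords.

```python
STOP_WORDS = {"show", "me", "a", "the", "in", "which", "with", "video", "of"}

def keywords(text):
    """Lower-case words of the text, minus punctuation and stop words."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}

def search_annotations(query, annotations):
    """Return (object_id, matched_keyword_count) pairs, best match first."""
    q = keywords(query)
    scored = [(obj_id, len(q & keywords(text)))
              for obj_id, text in annotations.items()]
    hits = [s for s in scored if s[1] > 0]
    return sorted(hits, key=lambda s: (-s[1], s[0]))

annotations = {
    "v1": "Barack Obama shakes hands with Mahathir Mohamad in Putrajaya",
    "v2": "Barack Obama gives a speech",
    "v3": "Tennis final highlights",
}
query = "Show me video in which Barack Obama shakes hands with Mahathir Mohamad"
results = search_annotations(query, annotations)
print(results)
```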

Queries based on Features


Content-based queries
Features derived (semi-)automatically from content of MM object
Low & high level features used
Eg. Find all photos with color distribution like this photo
Eg. Give me all football videos in which a goal is scored within the last ten minutes
"goal" is a high-level feature that must be known to MIRS

Possible Querying Scenarios (cont.)


Query by example
Give example MM object
MIRS extracts all kinds of features from the MM object
Resulting query based on these features

Similarity
Degree to which query & MM object of MIRS are similar
Similarity calculated by MIRS based on metadata of MM object & query
Try to estimate value of relevance of MM object to user
Output is list of MM objects in descending order of similarity value
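Similarity ranking can be sketched with feature vectors and cosine similarity. Cosine similarity is only one possible measure, chosen here for illustration; the slide does not name a specific one. The object names and vectors are made up.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query_vector, collection):
    """Return (object_id, similarity) pairs in descending order of similarity."""
    scored = [(cosine_similarity(query_vector, vec), obj_id)
              for obj_id, vec in collection.items()]
    ordered = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [(obj_id, score) for score, obj_id in ordered]

# Hypothetical 3-bin color distributions (eg. blue, yellow, green share).
collection = {
    "beach": [0.5, 0.4, 0.1],
    "forest": [0.1, 0.1, 0.8],
    "desert": [0.7, 0.3, 0.0],
}
ranking = rank([0.5, 0.4, 0.1], collection)
print([obj_id for obj_id, _ in ranking])
```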

General Retrieval Model

Relevance Feedback

Helps when user doesn't know exactly what he is looking for, causing problems in query formulation
Interactive approach:
User issues starting query, MIRS composes result set, user judges output (relevant/not), MIRS uses feedback to improve retrieval process
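One way the feedback step could work is a Rocchio-style query update: move the query vector toward objects judged relevant and away from those judged not relevant. This technique is brought in here as an example and is not named on the slide; the weights alpha, beta and gamma are conventional but arbitrary defaults.

```python
def refine_query(query, relevant, non_relevant,
                 alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query vector based on the user's judgments."""
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    pos = centroid(relevant)
    neg = centroid(non_relevant)
    # Negative weights are clipped to zero, a common simplification.
    return [max(0.0, alpha * q + beta * p - gamma * n)
            for q, p, n in zip(query, pos, neg)]

start = [1.0, 0.0]                      # hypothetical 2-feature query
updated = refine_query(start,
                       relevant=[[0.0, 1.0]],
                       non_relevant=[[1.0, 0.0]])
print(updated)
```

The updated query is then run again, and the cycle repeats until the user is satisfied.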

Component of MIRS - Browsing


Users sometimes cannot precisely specify what they want, but can recognize it when they see it
Browsing lets user scan through objects:
Exploits hyperlinks which lead user from one object to another
When object shown, user judges its relevance & proceeds accordingly
If objects are huge, icons are used

Starting point
Query that describes info need, or system provides starting point
User can ask for another starting point if not satisfied
Can classify objects based on topics & subtopics

Component of MIRS - Output Presentation (Play)


When MIRS returns list of objects, system has to decide whether user has right to see them
User interface should be able to show all kinds of MM data
What if objects are huge and result set large?
Give user perception of content of object
Extract & present essential info for user to browse & select objects:
Text: title, summary, places where keywords occur
Audio: tune, start of song
Images: thumbnails as summary of images
Video: cut into scenes and choose a prime image for each scene

Component of MIRS - Output Presentation (cont.)


Streaming
Content sent to client at specific rate and, except for buffering, played directly
Audio & video are delivered as continuous stream of packets
When resources become scarce:
Use switched Ethernet instead of shared Ethernet
Use disk striping
Skip frames during play-back
Fragment content over several content servers (need logical component between client & servers to direct client request to corresponding server)

Quality of MIRS
Recall
r/R

Precision
r/n

r: # of relevant objects returned by system, n: # objects retrieved, R: # relevant objects in collection

Relevance judged by humans, refer to TREC (Text Retrieval Conference)
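Both measures as a short sketch, assuming the retrieved objects and the human relevance judgments are available as lists of object identifiers. The identifiers below are made up.

```python
def recall(retrieved, relevant):
    """r / R: share of all relevant objects that the system returned."""
    r = len(set(retrieved) & set(relevant))
    return r / len(relevant) if relevant else 0.0

def precision(retrieved, relevant):
    """r / n: share of the returned objects that are relevant."""
    r = len(set(retrieved) & set(relevant))
    return r / len(retrieved) if retrieved else 0.0

retrieved = ["a", "b", "c", "d"]   # n = 4 objects returned
relevant = ["a", "c", "e"]         # R = 3 relevant objects in the collection

print(recall(retrieved, relevant))      # r = 2, so 2/3
print(precision(retrieved, relevant))   # r = 2, so 2/4
```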

Exercise
Discuss the role of DBMS in storing MM objects
Discuss the role of Information Retrieval systems in storing MM objects

End of Module 1
