
Multimedia Systems

Multimedia Databases
Outline
• Image Processing Basics
– Image Features
– Image Segmentation
• Additional Reference: Wasfi Al-Khatib, Y.
Francis Day, Arif Ghafoor, and P. Bruce Berra.
Semantic modeling and knowledge representation
in multimedia databases. IEEE Transactions on
Knowledge and Data Engineering, 11(1):64-80,
1999.
Image Processing
• Image processing involves the analysis of
scenes or the reconstruction of models from
images representing 2D or 3D objects.
– Image Analysis
• Identifying Image Properties (Image Features)
• Image Segmentation
• Image Recognition
• We will look at image processing from a
database perspective.
– Objective: Design of robust image processing and
recognition techniques to support semantic modeling,
knowledge representation, and querying of images.
Semantic Modeling and Knowledge
Representation in Image Databases
• Feature Extraction
• Salient Object Identification
• Content-Based Indexing and Retrieval
• Query Formulation and Processing

Multi-Level Abstraction
[Figure: three-layer architecture. Top: Semantic Modeling and Knowledge Representation Layer (semantic specification, knowledge base, semantic identification process). Middle: Object Recognition Layer (object models, object recognition process). Bottom: Feature Extraction Layer (feature specification, feature extraction process). Input: multimedia data, i.e. image data and still video frames.]

Feature Extraction Layer
• Image features: colors, textures, shapes, edges, etc.
• Features are mapped into a multidimensional
feature space, allowing similarity-based retrieval.
• Features can be classified into two types:
global and local.

Global Features
• Generally emphasize coarse-grained pattern
matching techniques.
• Transform the whole image into a functional
representation.
• Finer details within individual parts of the image are
ignored.
• Examples: Color histograms and coherence vectors,
Texture, Fast Fourier Transform, Hough Transform,
and Eigenvalues.

• What are some of the example queries?

Color Histogram
• Records how many pixels of the image take a
specific color
– In order to control the number of colors, the
color domain is discretized
• E.g., consider only the value of the two leftmost bits in
each color channel (RGB)
• In this case, the number of different colors is
4^3 = 64
• How can we determine whether two images are
similar using the color histogram?
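To make this concrete, here is a minimal sketch in Python with NumPy, assuming an H x W x 3 uint8 RGB image; the function names and the L1 comparison at the end are illustrative assumptions, not part of the slides.

```python
import numpy as np

def coarse_color_histogram(img):
    """64-bin color histogram: keep only the two most significant
    bits of each RGB channel (4 levels per channel, 4**3 = 64 colors).

    img: uint8 array of shape (H, W, 3).
    Returns a length-64 array of pixel counts.
    """
    quant = img >> 6                       # keep top 2 bits: values 0..3
    # Pack the three 2-bit codes into a single 6-bit bin index.
    bins = (quant[..., 0] << 4) | (quant[..., 1] << 2) | quant[..., 2]
    return np.bincount(bins.ravel(), minlength=64)

# One simple answer to the slide's question: compare the normalized
# histograms of the two images, e.g., with the L1 distance.
def histogram_l1_distance(img_a, img_b):
    ha = coarse_color_histogram(img_a).astype(float)
    hb = coarse_color_histogram(img_b).astype(float)
    return np.abs(ha / ha.sum() - hb / hb.sum()).sum()
```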
Color Coherence Vector
• Based on the color histogram
• Each pixel is checked as to whether it lies within a
sufficiently large one-color environment or not
– i.e., in a connected region reachable by a path of pixels of the same color
• If so, the pixel is called coherent; otherwise it is incoherent
• For each color j, compute the number of coherent and
incoherent pixels (αj , βj), j = 1, ..., J
• When comparing two images with color coherence
vectors (αj, βj) and (γj, µj), j = 1, ..., J, we may use the
expression

$$\sum_{j=1}^{J} \left( \frac{|\alpha_j - \gamma_j|}{\alpha_j + \gamma_j + 1} + \frac{|\beta_j - \mu_j|}{\beta_j + \mu_j + 1} \right)$$
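The sketch below computes a color coherence vector and the distance above, assuming per-pixel quantized color indices (e.g., the 64-color quantization from the previous slide) and SciPy's connected-component labeling; the coherence threshold tau and the function names are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def color_coherence_vector(color_idx, num_colors=64, tau=25):
    """Compute (alpha_j, beta_j) for each color j.

    color_idx: 2D int array of per-pixel quantized color indices.
    tau: minimum size of a connected one-color region for its
         pixels to count as coherent.
    """
    alpha = np.zeros(num_colors, dtype=np.int64)  # coherent counts
    beta = np.zeros(num_colors, dtype=np.int64)   # incoherent counts
    # 8-connected neighborhood: "a path of pixels of the same color".
    structure = np.ones((3, 3), dtype=bool)
    for j in range(num_colors):
        labels, n = ndimage.label(color_idx == j, structure=structure)
        if n == 0:
            continue
        sizes = np.bincount(labels.ravel())[1:]   # skip background label 0
        alpha[j] = sizes[sizes >= tau].sum()
        beta[j] = sizes[sizes < tau].sum()
    return alpha, beta

def ccv_distance(alpha, beta, gamma, mu):
    """Distance between CCVs (alpha, beta) and (gamma, mu), per the slide."""
    a, b, c, m = (x.astype(float) for x in (alpha, beta, gamma, mu))
    return (np.abs(a - c) / (a + c + 1) + np.abs(b - m) / (b + m + 1)).sum()
```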
Texture
• Texture is a small surface structure
– Natural or artificial
– Regular or irregular
• Examples include
– Wood bark
– Knitting patterns
– The surface of a sponge

Texture Examples
– Artificial/periodic
– Artificial/non-periodic
– Photographic/pseudo-periodic
– Photographic/random
– Photographic/structured
– Inhomogeneous (non-texture)

Texture
• Two basic approaches to study texture
– Structural analysis searches for small basic
components and an arrangement rule
– Statistical analysis describes the texture as a
whole based on specific attributes (local gray-level
variance, regularity, coarseness, orientation, and
contrast)
• Either can be done in the spatial domain or the spatial
frequency domain
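As an illustration of the statistical approach, the sketch below computes two of the listed attributes (local gray-level variance and a crude global contrast measure) with NumPy/SciPy; the window size and function name are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def texture_statistics(gray, size=9):
    """Simple statistical texture attributes for a grayscale image.

    gray: 2D array; size: side length of the local analysis window.
    Returns a per-pixel local-variance map and a global contrast value.
    """
    g = gray.astype(float)
    local_mean = ndimage.uniform_filter(g, size=size)
    local_sq_mean = ndimage.uniform_filter(g * g, size=size)
    # Local gray-level variance; clip tiny negatives from round-off.
    local_var = np.maximum(local_sq_mean - local_mean ** 2, 0.0)
    contrast = g.std()   # crude contrast: global standard deviation
    return local_var, contrast
```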

Global Features
• Advantages:
– Simple.
– Low computational complexity.
• Disadvantages:
– Low accuracy.

Local Features
• Images are segmented into a collection of smaller
regions, with each region representing a potential
object of interest (fine-grained).
• An object of interest may represent a simple
semantic object (e.g. a round object).
• Choice of features is domain specific:
– X-ray imaging, GIS, etc. require spatial features (e.g.,
shapes [which may be calculated through edges] and
dimensions)
– Paintings, MMR imaging, etc. may use color features in
specific regions of the image

Edge Detection
• A given input image E is used to gradually
compute a (zero-initialized) output image A.
– A convolution mask M runs across E pixel by pixel,
linking the entries in the mask at each position
that M occupies in E with the gray values of the
underlying image pixels.
– The result of the linkage (the sum of all
products of a mask entry and the gray value
of the underlying image pixel) is
written to the output image A.
Convolution
• Convolution is a simple mathematical operation which is
fundamental to many common image processing operators.
• Convolution provides a way of "multiplying together" two
arrays of numbers, generally of different sizes but of the
same dimensionality, to produce a third array of numbers
of the same dimensionality.
• This can be used in image processing to implement
operators whose output pixel values are simple linear
combinations of certain input pixel values.
• The convolution is performed by sliding the kernel over
the image, generally starting at the top left corner, so as to
move the kernel through all the positions where the kernel
fits entirely within the boundaries of the image.

Convolution Computation
• If the image E has M rows and N columns, and the
kernel K has m rows and n columns, then the output
image A will have M - m + 1 rows and N - n + 1
columns, and is given by:

$$A(i, j) = \sum_{k=1}^{m} \sum_{l=1}^{n} E(i + k - 1,\; j + l - 1)\, K(k, l)$$
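A direct NumPy transcription of this formula might look as follows (a minimal sketch; note that, like the formula, it does not flip the kernel, so strictly speaking it computes a correlation):

```python
import numpy as np

def convolve_valid(E, K):
    """Apply kernel K to image E exactly as in the formula above.

    E: (M, N) array; K: (m, n) array.
    Returns A of shape (M - m + 1, N - n + 1).
    """
    M, N = E.shape
    m, n = K.shape
    A = np.zeros((M - m + 1, N - n + 1))
    for i in range(M - m + 1):
        for j in range(N - n + 1):
            # Sum of products of kernel entries and the image patch under them.
            A[i, j] = (E[i:i + m, j:j + n] * K).sum()
    return A
```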

Similarity Metrics
• Minkowski Distance

$$d(x, y) = \left( \sum_{i=1}^{F} |x[i] - y[i]|^{r} \right)^{1/r}$$

• Weighted Distance
– Average Distance
• Color Histogram Intersection
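The two closed-form metrics on this slide are easy to state in code; the sketch below assumes feature vectors and histograms given as NumPy arrays (function names are illustrative):

```python
import numpy as np

def minkowski_distance(x, y, r=2):
    """Minkowski distance of order r (r=1: city block, r=2: Euclidean)."""
    return (np.abs(x - y) ** r).sum() ** (1.0 / r)

def histogram_intersection(h1, h2):
    """Color histogram intersection: total overlap of two histograms.

    Yields a similarity in [0, 1] when both histograms are normalized.
    """
    return np.minimum(h1, h2).sum()
```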


Prototype Systems
• QBIC (http://www.hermitagemuseum.org)
– Uses color, shape, and texture features
– Allows queries by sketching features and providing
color information
• Chabot (Cypress)
– Uses color and textual annotation
– Improved performance due to textual annotation
(Concept Query)
• KMeD
– Uses shapes and contours as features.
– Features are extracted automatically in some cases and
manually in other cases.

Image Segmentation
• Assigning a unique number to “object” pixels
based on different intensities or colors in the
foreground and the background regions of an
image
– Can be used in the object recognition process, but
it is not object recognition on its own
• Segmentation Methods
– Pixel-oriented methods
– Edge-oriented methods
– Region-oriented methods
– ...
Pixel-Oriented Segmentation
• Gray values of pixels are studied in isolation
• Looks at the gray-level histogram of an image and
finds one or more thresholds in the histogram
– Ideally, the histogram has a pixel-free region (a valley
between two peaks); the threshold is set there, and the
image is divided into a foreground and a background
accordingly (bimodal distribution)
• The major drawback of this approach is that object and
background histograms overlap
– A bimodal distribution rarely occurs in nature
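Since a clean valley is rare, a common way to pick the threshold automatically is Otsu's method, which the slides do not name; the sketch below (Python/NumPy, assuming a uint8 grayscale image) is illustrative, not the slides' prescribed technique.

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the gray level maximizing between-class variance
    over the 256-bin histogram (Otsu's method). gray: uint8 array."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                  # gray-level probabilities
    omega = np.cumsum(p)                   # class-0 probability per threshold
    mu = np.cumsum(p * np.arange(256))     # cumulative mean
    mu_total = mu[-1]
    # Between-class variance for every candidate threshold.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b)))

def segment(gray):
    """Foreground/background mask from the histogram threshold."""
    return gray > otsu_threshold(gray)
```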

Edge-Oriented Segmentation
• Segmentation is carried out as follows
– Edges of an image are extracted (e.g., using the Canny
operator)
– Edges are connected to form closed contours
around the objects
• Hough Transform
– Usually very expensive
– Works well with regular curves (with applications to
manufactured parts)
– May work in the presence of noise
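As a quick illustration of the first step, scikit-image provides a ready-made Canny detector (the use of scikit-image here is an assumption for illustration; connecting the edges into closed contours would still be a separate step):

```python
from skimage import feature

def extract_edges(gray, sigma=2.0):
    """Binary edge map via the Canny operator.

    gray: 2D grayscale array; sigma controls the Gaussian smoothing,
    trading noise suppression against edge localization.
    """
    return feature.canny(gray, sigma=sigma)
```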

Region-Oriented Segmentation
• A major disadvantage of the previous
approaches is that they do not consider "spatial"
relationships among pixels.
– Neighboring pixels normally have similar properties
• The segmentation (region growing) is carried out
as follows (see the sketch after the next slide)
– Start with a "seed" pixel.
– A pixel's neighbors are included if they have some
similarity to the seed pixel; otherwise they are not.
• Homogeneity condition
• Uses an eight-neighborhood (8-nbd) model

Region-Oriented Segmentation
• Homogeneity criterion: the gray-level mean value of a
region is usually used

$$m_k = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} P(i, j)$$

• with standard deviation

$$\sigma_k = \sqrt{\frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \bigl(P(i, j) - m_k\bigr)^2}$$

• Drawback: computationally expensive
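A minimal region-growing sketch under the gray-level-mean homogeneity criterion above (the seed position, tolerance, and function name are illustrative assumptions):

```python
import numpy as np
from collections import deque

def region_grow(gray, seed, tol=10.0):
    """Grow a region from `seed` using an 8-neighborhood.

    A neighbor is accepted while its gray value stays within `tol`
    of the running mean of the region (homogeneity criterion).
    Returns a boolean mask of the grown region.
    """
    g = gray.astype(float)
    H, W = g.shape
    mask = np.zeros((H, W), dtype=bool)
    mask[seed] = True
    total, count = g[seed], 1
    frontier = deque([seed])
    while frontier:
        i, j = frontier.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):       # the eight neighbors (self skipped)
                ni, nj = i + di, j + dj
                if (di or dj) and 0 <= ni < H and 0 <= nj < W \
                        and not mask[ni, nj] \
                        and abs(g[ni, nj] - total / count) <= tol:
                    mask[ni, nj] = True
                    total += g[ni, nj]
                    count += 1
                    frontier.append((ni, nj))
    return mask
```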

Water Inflow Segmentation
• Fill a gray-level image gradually with water
– Gray levels of pixels are taken as heights
– The higher the water rises, the more
pixels are flooded
• Hence, you have lands and waters
• Lands correspond to "objects"
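A direct simulation of this idea at one water level (a sketch; the level parameter and the use of SciPy's connected-component labeling are illustrative choices):

```python
from scipy import ndimage

def water_inflow_segment(gray, level):
    """Flood all pixels whose height (gray value) is below `level`;
    the remaining connected "lands" are labeled as objects.

    Returns (labels, n): an integer label image and the number of lands.
    """
    land = gray >= level                 # pixels still above the water
    labels, n = ndimage.label(land)      # each connected land = one object
    return labels, n

# Raising `level` step by step floods more pixels, shrinking and
# splitting the lands, which is how the object regions emerge.
```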

Object Recognition Layer
• Features are analyzed to recognize objects and faces
in an image database.
– Features are matched with object models stored in a
knowledge base.
– Each template is inspected to find the closest match.
– Exact matches are usually impossible, and matching is
generally computationally expensive.
– Occlusion of objects and the existence of spurious
features in the image can further diminish the success of
matching strategies.

Template Matching Techniques
• Fixed Template Matching
– Useful if object shapes do not change with respect
to the viewing angle of the camera.
• Deformable Template Matching
– More suitable for cases where objects in the
database may vary due to rigid and non-rigid
deformations.

Fixed Template Matching
• Image Subtraction:
– The difference in intensity levels between the image and the
template is used in object recognition.
– Performs well in restricted environments where imaging
conditions (such as image intensity) between the image
and the template are the same.
• Matching by Correlation:
– Utilizes the position of the normalized cross-correlation
peak between a template and image.
– Generally immune to noise and illumination effects in
the image.
– Suffers from high computational complexity caused by
summations over the entire template.
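A minimal normalized cross-correlation sketch (deliberately brute force, which makes the complexity point visible; function name and array shapes are illustrative):

```python
import numpy as np

def match_by_correlation(image, template):
    """Return the (row, col) position and score where the normalized
    cross-correlation between `template` and `image` peaks.

    At every position, the zero-mean template is correlated with the
    zero-mean image patch, normalized by the product of their norms.
    """
    I = image.astype(float)
    T = template.astype(float) - template.mean()
    t_norm = np.linalg.norm(T)
    m, n = T.shape
    H, W = I.shape
    best, best_pos = -np.inf, (0, 0)
    for i in range(H - m + 1):
        for j in range(W - n + 1):
            patch = I[i:i + m, j:j + n]
            patch = patch - patch.mean()
            denom = np.linalg.norm(patch) * t_norm
            score = (patch * T).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, best_pos = score, (i, j)
    return best_pos, best
```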
Deformable Template Matching
• The template is represented as a bitmap describing the
characteristic contour/edges of an object shape.
• An objective function with transformation
parameters that alter the shape of the template is
formulated, reflecting the cost of such
transformations.
• The objective function is minimized by iteratively
updating the transformation parameters to best
match the object.
• Applications include handwritten character
recognition and motion detection of objects in video
frames.
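One common way to realize this scheme is sketched below, under assumptions not in the slides: an affine deformation model, a chamfer-style data term via a distance transform, and SciPy's general-purpose optimizer.

```python
import numpy as np
from scipy import ndimage, optimize

def fit_deformable_template(edge_map, template_points, reg=1.0):
    """Fit template contour points to an image edge map.

    edge_map: boolean edge image; template_points: (K, 2) array of
    (row, col) contour coordinates. The objective is the mean distance
    of the transformed points to the nearest edge, plus a quadratic
    penalty on the deviation from the identity transform.
    """
    # Distance from every pixel to the nearest edge pixel.
    dist = ndimage.distance_transform_edt(~edge_map)

    def objective(params):
        # params: [a, b, c, d, ty, tx] of an affine transform.
        A = np.array([[params[0], params[1]], [params[2], params[3]]])
        t = params[4:6]
        pts = template_points @ A.T + t
        # Sample the distance map at the transformed points (clipped).
        r = np.clip(pts[:, 0], 0, dist.shape[0] - 1).astype(int)
        c = np.clip(pts[:, 1], 0, dist.shape[1] - 1).astype(int)
        data_term = dist[r, c].mean()
        # Deformation cost: distance from the identity transform.
        deform = ((A - np.eye(2)) ** 2).sum()
        return data_term + reg * deform

    x0 = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])  # start at identity
    res = optimize.minimize(objective, x0, method="Powell")
    return res.x, res.fun
```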

Prototype System: KMeD
• Medical objects are identified automatically in
KMeD only for patients in a small age group.
– Such objects have high contrast with respect to
their background and have relatively simple
shapes, large sizes, and little or no overlap with
other objects.
• KMeD resorts to a human-assisted object
recognition process otherwise.
Spatial Modeling and Knowledge
Representation Layer (1)
• Maintain the domain knowledge for representing
spatial semantics associated with image databases.
• At this level, queries are generally descriptive in
nature, and focus mostly on semantics and concepts
present in image databases.
• Semantics at this level are based on "spatial events"
describing the relative locations of multiple objects.
– An example involving such semantics is a range query
using spatial concepts such as "close by", "in the
vicinity", or "larger than" (e.g., retrieve all images that
contain a large tumor in the brain).

Spatial Modeling and Knowledge
Representation Layer (2)
• Identify spatial relationships among objects, once
they are recognized and marked by the lower layer
using bounding boxes or volumes.
• Several techniques have been proposed to formally
represent spatial knowledge at this layer:
– Semantic networks
– Mathematical logic
– Constraints
– Inclusion hierarchies
– Frames
Semantic Networks
• First introduced to represent the meanings of English
sentences in terms of words and relationships between
them.
• Semantic networks are graphs of nodes representing
concepts that are linked together by arcs representing
relationships between these concepts.

• Efficiency in semantic networks is gained by
representing each concept or object once and using
pointers for cross references, rather than naming an
object explicitly every time it is involved in a relation.
• Example: Type Abstraction Hierarchies (KMeD)
[Figure slides: Brain Lesions Representation; TAH Example]
Constraints-based Methodology
• Domain knowledge is represented using a
set of constraints in conjunction with formal
expressions such as predicate calculus or
graphs.
• A constraint is a relationship between two or
more objects that needs to be satisfied.

Example: PICTION System
• Its architecture consists of a natural language
processing module (NLP), an image understanding
module (IU), and a control module.
• A set of constraints is derived by the NLP module
from the picture captions. These constraints (called
Visual Semantics by the author) are used with the faces
recognized in the picture by the IU module to identify
the spatial relationships among people.
• The control module maintains the constraints
generated by the NLP module and acts as a
knowledge-base for the IU module to perform face
recognition functions.
Mathematical Logic
• Iconic Indexing by 2D Strings: Uses projections of
salient objects in a coordinate system.
• These projections are expressed in the form of 2D
strings to form a partial ordering of object
projections in 2D.
• For query processing, 2D subsequence matching is
performed to allow similarity-based retrieval.
• Binary Spatial Relations: Uses Allen's 13 temporal
relations to represent spatial relationships.
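To make the 2D-string idea concrete, here is a toy sketch (the object names and the use of centroid projections are illustrative assumptions; real 2D strings also encode operators such as "=" for equal projections):

```python
def two_d_string(objects):
    """Build a (u, v) 2D string from salient objects.

    objects: list of (name, x, y) tuples with centroid coordinates.
    u orders object names by their projection on the x-axis,
    v by their projection on the y-axis; "<" means "comes before".
    """
    u = " < ".join(name for name, x, y in sorted(objects, key=lambda o: o[1]))
    v = " < ".join(name for name, x, y in sorted(objects, key=lambda o: o[2]))
    return u, v

# Example: a tree left of a house, with a sun above (y grows downward).
print(two_d_string([("tree", 10, 50), ("house", 40, 50), ("sun", 45, 10)]))
# -> ('tree < house < sun', 'sun < tree < house')
```

Query processing then reduces to 2D subsequence matching against these strings, which is what enables the similarity-based retrieval mentioned above.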
Inclusion Hierarchies
• The approach is object-oriented and uses
concept classes and attributes to represent
domain knowledge.
• These concepts may represent image
features, high-level semantics, semantic
operators and conditions.
Frames
• A frame usually consists of a name and a list
of attribute-value pairs.
• A frame can be associated with a class of
objects or with a class of concepts.
• Frame abstractions allow encapsulation of
file names, features, and relevant attributes
of image objects.
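A frame for an image object might be sketched as follows (the attribute names and values are illustrative assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A frame: a name plus attribute-value pairs, encapsulating the
    file name, extracted features, and other attributes of an image object."""
    name: str
    attributes: dict = field(default_factory=dict)

# A frame instance for a hypothetical salient object.
tumor = Frame("brain_tumor", {
    "file_name": "mri_0042.img",       # illustrative file name
    "shape": "round",
    "area_mm2": 310.0,
    "location": "left frontal lobe",
})
```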
