Dictionary of Computer Vision and Image Processing

R.B. Fisher
University of Edinburgh

K. Dawson-Howe
Trinity College Dublin

A. Fitzgibbon
Oxford University

C. Robertson
Heriot-Watt University

E. Trucco
Heriot-Watt University
Copyright © 2005 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester,
West Sussex PO19 8SQ, England

Telephone (+44) 1243 779777

Email (for orders and customer service enquiries): cs-books@wiley.co.uk

Visit our Home Page on www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval
system or transmitted in any form or by any means, electronic, mechanical,
photocopying, recording, scanning or otherwise, except under the terms of
the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by
the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK,
without the permission in writing of the Publisher. Requests to the Publisher should be
addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern
Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to permreq@wiley.co.uk,
or faxed to +44 1243 770620.

This publication is designed to provide accurate and authoritative information in regard
to the subject matter covered. It is sold on the understanding that the Publisher is not
engaged in rendering professional services. If professional advice or other expert
assistance is required, the services of a competent professional should be sought.

Other Wiley Editorial Offices

John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA

Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA

Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany

John Wiley & Sons Australia Ltd, 42 McDougall Street, Milton, Queensland 4064, Australia

John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

Wiley also publishes its books in a variety of electronic formats. Some content that
appears in print may not be available in electronic books.

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN-13 978-0-470-01526-1 (PB)
ISBN-10 0-470-01526-8 (PB)

Typeset in 9/9.5pt Garamond by Integra Software Services Pvt. Ltd, Pondicherry, India
Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, Wiltshire
This book is printed on acid-free paper responsibly manufactured from sustainable
forestry in which at least two trees are planted for each one used for paper production.
From Bob to Rosemary, Mies, Hannah, Phoebe and Lars

From AWF to Liz, to my parents, and again to D.

To Karen and Aidan. Thanks pips!

From Ken to Jane, William and Susie

From Manuel to Emily, Francesca and Alistair
Contents

Preface
References
Dictionary entries: 0–9, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z
Preface

This dictionary arose out of a continuing interest in the resources needed by beginning students and researchers in the fields of image processing, computer vision and machine vision (however you choose to define these overlapping fields). As instructors and mentors, we often found confusion about what various terms and concepts mean for the beginner. To support these learners, we have tried to define the key concepts that a competent generalist should know about these fields. The results are definitions for more than 2500 terms.

This is a dictionary, not an encyclopedia, so the definitions are necessarily brief and are not intended to replace a proper textbook explanation of the term. We have tried to capture the essentials of the terms, with short examples or mathematical precision where feasible or necessary for clarity. Further information about many of the terms can be found in the references that follow this preface. These are mostly general textbooks, each providing a broad view of a portion of the field. Some of the concepts are also quite recent and, although commonly used in research publications, have not yet appeared in mainstream textbooks. Thus this book is also a useful source for recent terminology and concepts.

Certainly some concepts are missing, but we have scanned both textbooks and the research literature to find the central and commonly used terms. Many additional terms also arose as part of the definition process itself.

Although the dictionary was intended for beginning and intermediate students and researchers, as we developed the dictionary it was clear that we also had some confusions and vague understandings of the concepts. It also surprised us that some terms had multiple usages. To improve quality and coverage, each definition was reviewed during development by at least two people besides its author. We hope that this has caught any errors and vagueness, as well as reproduced the alternative meanings. Each of the co-authors is quite experienced in the topics covered here, but it was still educational to learn more about our field in the process of compiling the dictionary. We hope that you find using the dictionary equally valuable.

While we have strived for perfection, we recognize that we might have made some errors or been insufficiently precise. Hence, there is a web site where errata and other materials can be found: http://homepages.inf.ed.ac.uk/rbf/CVDICT/ If you spot an error, please email us: rbf@inf.ed.ac.uk

To help the reader, terms appearing elsewhere in the dictionary are underlined. We have tried to be reasonably thorough about this, but some terms, such as 2D, 3D, light, camera, image, pixel and color, were so commonly used that we decided to not cross-reference these.

We have tried to be consistent with the mathematical notation: italics for scalars (s), arrowed italics for points and vectors (v), and bold font letters for matrices (M).

The contents of the dictionary have been much improved with the advice and suggestions from the dictionary's international panel of experts:

Andrew Blake
Aaron Bobick
Chris Brown
Stefan Carlsson
Henrik Christensen
Roberto Cipolla
James L. Crowley
Patrick Flynn
Vaclav Hlavac
Anil Jain
Avinash Kak
Ales Leonardis
Song-De Ma
Gerard Medioni
Joe L. Mundy
Shmuel Peleg
Maria Petrou
Keith Price
Azriel Rosenfeld
Amnon Shashua
Yoshiaki Shirai
Milan Sonka
Chris Taylor
Demetri Terzopoulos
John Tsotsos
Shimon Ullman
Andrew Zisserman
Steven Zucker
References

1. D. Ballard, C. Brown. Computer Vision. Prentice Hall, 1982. A classic textbook presenting an overview of techniques in the early days of computer vision. Still a source of very useful information.

2. D. Forsyth, J. Ponce. Computer Vision – A Modern Approach. Prentice Hall, 2003. A recent, comprehensive book covering both 2D (image processing) and 3D material.

3. R. M. Haralick, L. G. Shapiro. Computer and Robot Vision. Addison-Wesley Longman Publishing, 1992. A well-known, extensive collection of algorithms and techniques, with mathematics worked out in detail. Mostly image processing, but some 3D vision as well.

4. E. Hecht. Optics. Addison-Wesley, 1987. A key resource for information on light, geometrical optics, distortion, polarization, Fourier optics, etc.

5. B. K. P. Horn. Robot Vision. MIT Press, 1986. A classic textbook in computer vision. Especially famous for the treatment of optic flow. Dated nowadays, but still very interesting and useful.

6. A. Jain. Fundamentals of Digital Image Processing. Prentice Hall Intl, 1989. A little dated, but still a thorough introduction to the key topics in 2D image processing and analysis. Particularly useful is the information on various whole image transforms, such as the 2D Fourier transform.

7. R. Jain, R. Kasturi, B. Schunck. Machine Vision. McGraw-Hill, 1995. A good balance of image processing and 3D vision, including typically 3D topics like model-based matching. Reader-friendly presentation, also graphically.

8. V. S. Nalwa. A Guided Tour of Computer Vision. Addison-Wesley, 1993. A discursive, compact presentation of computer vision at the beginning of the 90s. Good to get an overview of the field as it was, and quickly.

9. M. Petrou and P. Bosdogianni. Image Processing: The Fundamentals. Wiley Interscience, 1999. A student-oriented textbook on image processing, focusing on enhancement, compression, restoration and pre-processing for image understanding.

10. M. Sonka, V. Hlavac, R. Boyle. Image Processing, Analysis, and Machine Vision. Chapman & Hall, 1993. A well-known, exhaustive textbook covering much image processing and a good amount of 3D vision alike, so that algorithms are sometimes only sketched. A very good reference book.

11. G. Stockman, L. Shapiro. Computer Vision. Prentice Hall, 2001. A thorough and broad 2D and 3D computer vision book, suitable for use as a course textbook and for reference.

12. E. Trucco, A. Verri. Introductory Techniques for 3-D Computer Vision. Prentice Hall, 1998. This book gives algorithms and theory for many central 3D algorithms and topics, and includes supporting detail from 2D and image processing where appropriate.

13. S. E. Umbaugh. Computer Vision and Image Processing. Prentice Hall, 1998. A compact book on image processing, coming with its own development kit and examples on a CD.
Plate section: color versions of figures; see main text for full definitions. The plate captions are:

additive color: The way in which multiple wavelengths of light can be combined to allow other colors to be perceived.

chamfer matching: A matching technique based on the comparison of contours, using the chamfer distance to assess the similarity of two sets of points. This can be used for matching edge images using the distance transform.

chromaticity diagram: A 2D slice of a 3D color space. The CIE 1931 chromaticity diagram is the slice through the CIE xyz color space where x + y + z = 1.

CMYK: Cyan, magenta, yellow and black color model.

color image segmentation: Segmenting a color image into homogeneous regions based on some similarity criteria. The boundaries around typical regions are shown in the plate.

color quantization: The process of reducing the number of colors in an image by selecting a subset of colors, then representing the original image using only them. The plates show the same image with 16,777,216, 256, 16 and 4 colors.

color re-mapping: An image transformation where each original color is replaced by another color from a colormap.

contrast enhancement: Contrast enhancement (also known as contrast stretching) expands the distribution of intensity values in an image so that a larger range of sensitivity in the output device can be used.

HSL: Hue-Saturation-Luminance color image format; the plate shows a color image and its hue, saturation and luminance components.

multi-spectral thresholding: A segmentation technique for multi-spectral image data.
0

1D: One dimensional, usually in reference to some structure. Examples include: 1) a signal x(t) that is a function of time t, 2) the dimensionality of a single property value or 3) one degree of freedom in shape variation or motion.

2D: Two dimensional. A space describable using any pair of orthogonal basis vectors consisting of two elements.

2D coordinate system: A system associating uniquely 2 real numbers to any point of a plane. First, two intersecting lines (axes) are chosen on the plane, usually perpendicular to each other. The point of intersection is the origin of the system. Second, metric units are established on each axis (often the same for both axes) to associate numbers to points. The coordinates P_x and P_y of a point, P, are obtained by projecting P onto each axis in a direction parallel to the other axis, and reading the numbers at the intersections. [Figure: a point P and its coordinates P_x and P_y on the x and y axes.]

2D Fourier transform: A special case of the general Fourier transform often used to find structures in images.

2D image: A matrix of data representing samples taken at discrete intervals. The data may be from a variety of sources and sampled in a variety of ways. In computer vision applications the image values are often encoded color or monochrome intensity samples taken by digital cameras, but may also be range data. [Figure: a grid of typical intensity values.]

2D input device: A device for sampling light intensity from the real world into a 2D matrix of measurements. The most popular two dimensional imaging device is the charge-coupled device (CCD) camera. Other common devices are flatbed scanners and X-ray scanners.

2D point: A point in a 2D space, that is, characterized by two coordinates; most often, a point on a plane, for instance an image point in pixel coordinates. Notice, however, that two coordinates do not necessarily imply a plane: a point on a 3D surface can be expressed either in 3D coordinates or by two coordinates given a surface parameterization (see surface patch).

2D point feature: Localized structures in a 2D image, such as interest points, corners and line meeting points (X, Y and T shaped, for example). One detector for these features is the SUSAN corner finder.

2D pose estimation: A fundamental open problem in computer vision where the correspondence between two sets of 2D points is found. The problem is defined as follows: Given two sets of points {x_j} and {y_k}, find the Euclidean transformation {R, t} (the pose) and the match matrix {M_jk} (the correspondences) that best relates them. A large number of techniques has been used to address this problem, for example tree-pruning methods, the Hough transform and geometric hashing. A special case of 3D pose estimation.
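Once the match matrix is fixed (i.e., correspondences are known), the remaining least-squares pose has a closed-form solution via the singular value decomposition. The following NumPy sketch is our illustration of that step only, not the dictionary's algorithm; the function name pose_2d is hypothetical.

    import numpy as np

    def pose_2d(x, y):
        """Least-squares R, t with y_j ~ R @ x_j + t, for Nx2 arrays."""
        cx, cy = x.mean(axis=0), y.mean(axis=0)        # centroids
        H = (x - cx).T @ (y - cy)                      # 2x2 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflection
        R = Vt.T @ D @ U.T
        t = cy - R @ cx
        return R, t

    # Toy check: rotate a point set by 30 degrees and shift it.
    theta = np.radians(30)
    R_true = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])
    x = np.random.rand(10, 2)
    y = x @ R_true.T + np.array([2.0, -1.0])
    R, t = pose_2d(x, y)
    print(np.allclose(R, R_true), t)                   # True [ 2. -1.]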
2D projection: A transformation mapping a higher dimensional space onto a two dimensional space. The simplest method is to simply discard the higher dimensional coordinates, although generally a viewing position is used and the projection is performed relative to it. [Figure: points of a 3D solid projected onto a 2D space from a viewpoint.] For example, the main steps for a computer graphics projection are as follows: apply a normalizing transform to 3D point world coordinates; clip against the canonical view volume; project onto the projection plane; transform into the viewport in 2D device coordinates for display. Commonly used projection functions are parallel projection and perspective projection.
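As a small worked example of the perspective case (our sketch, not the dictionary's): points in camera coordinates are mapped onto the plane z = f by similar triangles, x' = fX/Z and y' = fY/Z. The clipping and viewport steps listed above are omitted; the function name is hypothetical.

    import numpy as np

    def perspective_project(points, f=1.0):
        """Project Nx3 points (camera coordinates, Z > 0) onto a 2D image plane."""
        X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
        return np.column_stack((f * X / Z, f * Y / Z))

    cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (4, 5)],
                    dtype=float)
    print(perspective_project(cube, f=2.0))   # eight 2D points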
2.5D image: A range image obtained by scanning from a single viewpoint. This allows the data to be represented in a single image array, where each pixel value encodes the distance to the observed scene. The reason this is not called a 3D image is to make explicit the fact that the back sides of the scene objects are not represented.

2.5D sketch: Central structure of Marr's theory of vision. An intermediate description of a scene indicating the visible surfaces and their arrangement with respect to the viewer. It is built from several different elements: the contour, texture and shading information coming from the primal sketch, stereo information and motion. The description is theorized to be a kind of buffer where partial resolution of the objects takes place. The name 2½D sketch stems from the fact that, although local changes in depth and discontinuities are well resolved, the absolute distance to all scene points may remain unknown.

3D: Three dimensional. A space describable using any triple of mutually orthogonal basis vectors consisting of three elements.

3D coordinate system: Same as 2D coordinate system, but in three dimensions. [Figure: +X, +Y and +Z axes.]

3D data: Data described in all three spatial dimensions. See also range data, CAT and NMR. [Figure: an example of a 3D data set.]

3D data acquisition: Sampling data in all three spatial dimensions. There are a variety of ways to perform this sampling, for example using structured light triangulation.

3D image: See range image.

3D interpretation: A 3D model, e.g., a solid object, that explains an image or a set of image data. For instance, a certain configuration of image lines can be explained as the perspective projection of a polyhedron; in simpler words, the image lines are the images of some of the polyhedron's lines. See also image interpretation.

3D model: A description of a 3D object that primarily describes its shape. Models of this sort are regularly used as exemplars in model based recognition and 3D computer graphics.

3D moments: A special case of moment where the data comes from a set of 3D points.

3D object: A subset of R³. In computer vision, often taken to mean a volume in R³ that is bounded by a surface. Any solid object around you is an example: table, chairs, books, cups, and you yourself.

3D point: An infinitesimal volume of 3D space.

3D point feature: A point feature on a 3D object or in a 3D environment. For instance, a corner in 3D space.

3D pose estimation: The process of determining the transformation (translation and rotation) of an object in one coordinate frame with respect to another coordinate frame. Generally, only rigid objects are considered, models of those objects exist a priori, and we wish to determine the position of that object in an image on the basis of matched features. This is a fundamental open problem in computer vision where the correspondence between two sets of 3D points is found. The problem is defined as follows: Given two sets of points {x_j} and {y_k}, find the parameters of a Euclidean transformation {R, t} (the pose) and the match matrix {M_jk} (the correspondences) that best relates them. Assuming the points correspond, they should match exactly under this transformation.

3D reconstruction: A general term referring to the computation of a 3D model from 2D images.

3D skeleton: See skeleton.

3D stratigraphy: A modeling and visualization tool used to display different underground layers. Often used for visualizations of archaeological sites or for detecting different rock and soil structures in geological surveying.

3D structure recovery: See 3D reconstruction.

3D texture: The appearance of texture on a 3D surface when imaged, for instance, the fact that the density of texels varies with distance due to perspective effects. 3D surface properties (e.g., shape, distances, orientation) can be estimated from such effects. See also shape from texture, texture orientation.

3D vision: A branch of computer vision dealing with characterizing data composed of 3D measurements. For example, this may involve segmentation of the data into individual surfaces that are then used to identify the data as one of several models. Reverse engineering is a specialism inside 3D vision.
4 connectedness: A type of image connectedness in which each rectangular pixel is considered to be connected to the four neighboring pixels that share a common crack edge. See also 8 connectedness. [Figure: the four pixels connected to a central pixel (*), and an image whose object pixels form four groups under 4 connectedness.]

8 connectedness: A type of image connectedness in which each rectangular pixel is considered to be connected to all eight neighboring pixels. See also 4 connectedness. [Figure: the eight pixels connected to a central pixel (*), and the same image whose object pixels form only two groups under 8 connectedness.]
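The two neighborhoods are easy to state in code. The following Python sketch (our illustration; the name connected_group is hypothetical) collects the object pixels reachable from a seed under either definition, showing how a diagonal chain of pixels is one group under 8 connectedness but three groups under 4 connectedness.

    import numpy as np

    N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # share a crack edge
    N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # add the diagonals

    def connected_group(img, seed, neighbors):
        """Return the set of object (non-zero) pixels reachable from seed."""
        group, stack = set(), [seed]
        while stack:
            r, c = stack.pop()
            if (r, c) in group:
                continue
            if not (0 <= r < img.shape[0] and 0 <= c < img.shape[1]):
                continue
            if img[r, c] == 0:
                continue
            group.add((r, c))
            stack.extend((r + dr, c + dc) for dr, dc in neighbors)
        return group

    img = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1]])
    print(len(connected_group(img, (0, 0), N4)))   # 1: diagonals not connected
    print(len(connected_group(img, (0, 0), N8)))   # 3: one 8-connected group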

A

A*: A search technique that performs best-first searching based on an evaluation function that combines the cost so far and the estimated cost to the goal.

a posteriori probability: Literally, "after" probability. It is the probability p(s|e) that some situation s holds after some evidence e has been observed. This contrasts with the a priori probability p(s), the probability of s before any evidence is observed. Bayes' rule is often used to compute the a posteriori probability from the a priori probability and the evidence.

a priori probability: Suppose that there is a set Q of equally likely outcomes for a given action. If a particular event E could occur on any one of a subset S of these outcomes, then the a priori or theoretical probability of E is defined by

   P(E) = size(S) / size(Q)
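As a numeric illustration of the two probability entries above (our own example, not the dictionary's): suppose a defect detector fires with likelihood 0.9 on defective parts and 0.05 on good ones, and the a priori probability of a defect is 0.01. Bayes' rule then gives the a posteriori probability.

    p_s = 0.01                    # a priori probability of situation s
    p_e_given_s = 0.90            # likelihood of evidence e when s holds
    p_e_given_not_s = 0.05        # false alarm rate

    p_e = p_e_given_s * p_s + p_e_given_not_s * (1 - p_s)   # total probability
    p_s_given_e = p_e_given_s * p_s / p_e                   # Bayes' rule
    print(round(p_s_given_e, 3))  # ~0.154: the evidence raises 1% to ~15%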
aberration: Problem exhibited by a lens or a mirror whereby unexpected results are obtained. There are two types of aberration commonly encountered: chromatic aberration, where different frequencies of light focus at different positions [Figure: blue and red rays brought to focus at different points], and spherical aberration, where light passing through the edges of a lens (or mirror) focuses at slightly different positions.

absolute conic: The conic in 3D projective space that is the intersection of the unit (or any) sphere with the plane at infinity. It consists only of complex points. Its importance in computer vision is due to its role in the problem of autocalibration: the image of the absolute conic (IAC), a 2D conic, is represented by a 3 × 3 matrix ω that is the inverse of the matrix KKᵀ, where K is the matrix of the internal camera calibration parameters. Thus, identifying ω allows the camera calibration to be computed.

absolute coordinates: Generally used in contrast to local or relative coordinates. A coordinate system that is referenced to some external datum. For example, a pixel in a satellite image might be at (100,200) in image coordinates, but at (51:48:05N, 8:17:54W) in georeferenced absolute coordinates.

absolute orientation: In photogrammetry, the problem of registering two corresponding sets of 3D points. Used to register a photogrammetric reconstruction to some absolute coordinate system. Often expressed as the problem of determining the rotation R, translation t and scale s that best transforms a set of model points {m_1, ..., m_n} to corresponding data points {d_1, ..., d_n} by minimizing the least-squares error

   E(R, t, s) = Σ_{i=1}^{n} || d_i − (s R m_i + t) ||²

to which a solution may be found in terms of the singular value decomposition.
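A minimal NumPy sketch of that SVD solution, assuming the model and data points are already in corresponding order (our illustration; the function name and the particular least-squares scale estimate are our choices, not taken from this dictionary):

    import numpy as np

    def absolute_orientation(m, d):
        """Find s, R, t minimizing sum ||d_i - (s R m_i + t)||^2 for Nx3 arrays."""
        mc, dc = m - m.mean(axis=0), d - d.mean(axis=0)
        U, _, Vt = np.linalg.svd(mc.T @ dc)              # 3x3 cross-covariance
        D = np.diag([1, 1, np.linalg.det(Vt.T @ U.T)])   # avoid reflections
        R = Vt.T @ D @ U.T
        s = np.trace(R @ mc.T @ dc) / np.trace(mc.T @ mc)  # least-squares scale
        t = d.mean(axis=0) - s * R @ m.mean(axis=0)
        return s, R, t

    m = np.random.rand(8, 3)
    d = 2.0 * m + np.array([1.0, 0.0, -3.0])   # s = 2, R = I, known t
    print(absolute_orientation(m, d))          # recovers (2.0, identity, t)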
absolute point: A 3D point defining the origin of a coordinate system.

absolute quadric: The symmetric 4 × 4 rank 3 matrix

   Ω = [ I₃   0₃ ]
       [ 0₃ᵀ  0  ]

Like the absolute conic, it is defined to be invariant under Euclidean transformations, is rescaled under similarities, takes the form

   Ω = [ AAᵀ  0₃ ]
       [ 0₃ᵀ  0  ]

under affine transforms and becomes an arbitrary 4 × 4 rank 3 matrix under projective transforms.

absorption: Attenuation of light caused by passing through an optical system or being incident on an object surface.

accumulation method: A method of accumulating evidence in histogram form, then searching for peaks, which correspond to hypotheses. See also Hough transform, generalized Hough transform.

accumulative difference: A means of detecting motion in image sequences. Each frame in the sequence is compared to a reference frame (after registration if necessary) to produce a difference image. Thresholding the difference image gives a binary motion mask. A counter for each pixel location in the accumulative image is incremented every time the difference between the reference image and the current image exceeds some threshold. Used for change detection.
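A short sketch of an accumulative difference image as just described (our illustration; the frames here are random stand-ins for a real sequence, and the threshold 25 is arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    reference = rng.integers(0, 256, (240, 320)).astype(np.int16)
    accumulator = np.zeros(reference.shape, dtype=np.int32)

    for _ in range(10):                              # frames of a sequence
        frame = rng.integers(0, 256, (240, 320)).astype(np.int16)
        motion_mask = np.abs(frame - reference) > 25  # binary motion mask
        accumulator += motion_mask                    # count changes per pixel

    print(accumulator.max())                          # most frequently changed pixel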
accuracy: The error of a value away from the true value. Contrast this with precision.

acoustic sonar: Sound Navigation And Ranging. A device that is used primarily for the detection and location of objects (e.g., underwater or in air, as in mobile robotics, or internal to a human body, as in medical ultrasound) by reflecting and intercepting acoustic waves. It operates with acoustic waves in an analogous way to that of radar, using both time of flight and Doppler effects, giving the radial component of relative position and velocity.
ACRONYM: A vision system developed by Brooks that attempted to recognize three dimensional objects from two dimensional images, using generalized cylinder primitives to represent both stored models and objects extracted from the image.

active appearance model: A generalization of the widely used active shape model approach that includes all of the information in the image region covered by the target object, rather than just that near modeled edges. The active appearance model has a statistical model of the shape and gray-level appearance of the object of interest. This statistical model generalizes to cover most valid examples. Matching to an image involves finding model parameters that minimize the difference between the image and a synthesized model example, projected into the image.

active blob: A region based approach to the tracking of non-rigid motion in which an active shape model is used. The model is based on an initial region that is divided using Delaunay triangulation and then each patch is tracked from frame to frame (note that the patches can deform).

active contour models: A technique used in model based vision where object boundaries are detected using a deformable curve representation such as a snake. The term active refers to the ability of the snake to deform shape to better match the image data. See also active shape model.

active contour tracking: A technique used in model based vision where object boundaries are tracked in a video sequence using active contour models.

active illumination: A system of lighting where intensity, orientation, or pattern may be continuously controlled and altered. This kind of system may be used to generate structured light.

active learning: Learning about the environment through interaction (e.g., looking at an object from a new viewpoint).

active net: An active shape model that parameterizes a triangulated mesh.

active sensing: 1) A sensing activity carried out in an active or purposive way, for instance where a camera is moved in space to acquire multiple or optimal views of an object. (See also active vision, purposive vision, sensor planning.) 2) A sensing activity implying the projection of a pattern of energy, for instance a laser line, onto the scene. See also laser stripe triangulation, structured light triangulation.
active shape model: Statistical models of the shapes of objects that can deform to fit a new example of the object. The shapes are constrained by a statistical shape model so that they may vary only in ways seen in a training set. The models are usually formed by using principal component analysis to identify the dominant modes of shape variation in observed examples of the shape. Model shapes are formed by linear combinations of the dominant modes.

active stereo: An alternative approach to traditional binocular stereo. One of the cameras is replaced with a structured light projector, which projects light onto the object of interest. If the camera calibration is known, the triangulation for computing the 3D coordinates of object points simply involves finding the intersection of a ray and known structures in the light field.

active surface: 1) A surface determined using a range sensor; 2) an active shape model that deforms to fit a surface.

active triangulation: Determination of surface depth by triangulation between a light source at a known position and a camera that observes the effects of the illuminant on the scene. Light stripe ranging is one form of active triangulation. A variant is to use a single scanning laser beam to illuminate the scene and use a stereo pair of cameras to compute depth.

active vision: An approach to computer vision in which the camera or sensor is moved in a controlled manner, so as to simplify the nature of a problem. For example, rotating a camera with constant angular velocity while maintaining fixation at a point allows absolute calculation of scene point depth, instead of only relative depth that depends on the camera speed. (See also kinetic depth.)

active volume: The volume of interest in a machine vision application.

activity analysis: Analyzing the behavior of people or objects in a video sequence, for the purpose of identifying the immediate actions occurring or the long term sequence of actions. For example, detecting potential intruders in a restricted area.

acuity: The ability of a vision system to discriminate (or resolve) between closely arranged visual stimuli. This can be measured using a grating, i.e., a pattern of parallel black and white stripes of equal widths. Once the bars become too close, the grating becomes indistinguishable from a uniform image of the same average intensity as the bars. Under optimal lighting, the minimum spacing that a person can resolve is 0.5 min of arc.
adaptive: The property of an algorithm to adjust its parameters to the data at hand in order to optimize performance. Examples include adaptive contrast enhancement, adaptive filtering and adaptive smoothing.

adaptive coding: A scheme for the transmission of signals over unreliable channels, for example a wireless link. Adaptive coding varies the parameters of the encoding to respond to changes in the channel, for example "fading", where the signal-to-noise ratio degrades.

adaptive contrast enhancement: An image processing operation that applies histogram equalization locally across an image.

adaptive edge detection: Edge detection with adaptive thresholding of the gradient magnitude image.

adaptive filtering: In signal processing, any filtering process in which the parameters of the filter change over time, or where the parameters are different at different parts of the signal or image.

adaptive histogram equalization: A localized method of improving image contrast. A histogram is constructed of the gray levels present. These gray levels are re-mapped so that the histogram is approximately flat. It can be made perfectly flat by dithering. [Figure: an image before and after adaptive histogram equalization.]

adaptive Hough transform: A Hough transform method that iteratively increases the resolution of the parameter space quantization. It is particularly useful for dealing with high dimensional parameter spaces. Its disadvantage is that sharp peaks in the histogram can be missed.

adaptive meshing: Methods for creating simplified meshes where elements are made smaller in regions of high detail (rapid changes in surface orientation) and larger in regions of low detail, such as planes.

adaptive pyramid: A method of multi-scale processing where small areas of image having some feature in common (say color) are first extracted into a graph representation. This graph is then manipulated, for example by pruning or merging, until the level of desired scale is reached.

adaptive reconstruction: Data driven methods for creating statistically significant data in areas of a 3D data cloud where data may be missing due to sampling problems.
adaptive smoothing: An iterative smoothing algorithm that avoids smoothing over edges. Given an image I(x, y), one iteration of adaptive smoothing proceeds as follows:

1. Compute the gradient magnitude image G(x, y) = ||∇I(x, y)||.
2. Make a weights image W(x, y) = e^(−G(x, y)).
3. Smooth the image:

   S(x, y) = ( Σ_{i=−1}^{1} Σ_{j=−1}^{1} A(x, y, i, j) ) / ( Σ_{i=−1}^{1} Σ_{j=−1}^{1} B(x, y, i, j) )

   where A(x, y, i, j) = I(x + i, y + j) W(x + i, y + j) and B(x, y, i, j) = W(x + i, y + j).
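A direct NumPy transcription of the three steps (our sketch: the gradient is approximated with np.gradient, and the simplest possible border handling, edge padding, is assumed):

    import numpy as np

    def adaptive_smooth(I):
        h, w = I.shape
        gy, gx = np.gradient(I.astype(float))
        G = np.hypot(gx, gy)                   # step 1: gradient magnitude
        W = np.exp(-G)                         # step 2: weights image
        Ip = np.pad(I.astype(float), 1, mode='edge')
        Wp = np.pad(W, 1, mode='edge')
        num = np.zeros((h, w))
        den = np.zeros((h, w))
        for i in range(3):                     # step 3: weighted 3x3 average
            for j in range(3):
                num += Ip[i:i+h, j:j+w] * Wp[i:i+h, j:j+w]
                den += Wp[i:i+h, j:j+w]
        return num / den

    img = np.outer(np.linspace(0, 255, 32), np.ones(32))
    print(adaptive_smooth(img).shape)          # (32, 32)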


adaptive thresholding: An improved image thresholding technique where the threshold value is varied at each pixel. A common technique is to use the average intensity in a neighbourhood to set the threshold. [Figure: an image I, its smoothed version S, and the thresholded result I > S − 6.]
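A sketch of the neighbourhood-mean variant illustrated in the figure (our code, with the window size and the offset 6 as assumptions): each pixel is compared against the local box average computed with an integral image.

    import numpy as np

    def adaptive_threshold(I, size=15, offset=6):
        h, w = I.shape
        P = np.pad(I.astype(float), size // 2, mode='edge')
        c = np.cumsum(np.cumsum(P, axis=0), axis=1)   # integral image
        c = np.pad(c, ((1, 0), (1, 0)))               # zero row/column in front
        S = (c[size:size+h, size:size+w] - c[:h, size:size+w]
             - c[size:size+h, :w] + c[:h, :w]) / size**2
        return I > S - offset                         # binary output image

    img = np.outer(np.linspace(0, 255, 64), np.ones(64))
    print(adaptive_threshold(img).mean())             # fraction above threshold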
adaptive triangulation: See adaptive meshing.

adaptive visual servoing: See visual servoing.
additive color: The way in which multiple wavelengths of light can be combined to allow other colors to be perceived (e.g., if equal amounts of green and red light are shone on a sheet of white paper, the paper will appear to be illuminated with a yellow light source). Contrast this with subtractive color. [Figure: overlapping green and red lights producing yellow; see plate section for a color version.]

additive noise: Generally image independent noise that is added to the image by some external process. The recorded image I at pixel (i, j) is then the sum of the true signal S and the noise N:

   I(i, j) = S(i, j) + N(i, j)

The noise added at each pixel (i, j) could be different.

adjacent: Commonly meaning "next to each other", whether in a physical sense of being connected pixels in an image, regions sharing some common boundary, nodes in a graph connected by an arc, or components in a geometric model sharing some common bounding component, etc. Formally defining "adjacent" can be somewhat heuristic because you may need a way to specify closeness (e.g., on a quantized grid of pixels) or consider how much shared "boundary" is required before two structures are adjacent.

adjacency: See adjacent.

adjacency graph: A graph that shows the adjacency between structures, such as segmented image regions. The nodes of the graph are the structures and an arc implies adjacency of the two structures connected by the arc. [Figure: a segmented image and the adjacency graph of its regions.]

affine: A term first used by Euler. Affine geometry is a study of properties of geometric objects that remain invariant under affine transformations (mappings). These include: parallelness, cross ratio, adjacency.

affine arc length: For a parametric equation of a curve f(u) = (x(u), y(u)), arc length is not preserved under an affine transformation. The affine length

   τ(u) = ∫_0^u (ẋÿ − ẍẏ)^(1/3) du

is invariant under affine transformations.

affine camera: A special case of the projective camera that is obtained by constraining the 3 × 4 camera parameter matrix T such that T₃₁ = T₃₂ = T₃₃ = 0 and reducing the camera parameter vector from 11 degrees of freedom to 8.

affine curvature: A measure of curvature based on the affine arc length τ. For a parametric equation of a curve f(u) = (x(u), y(u)), its affine curvature μ is

   μ(τ) = x″(τ) y‴(τ) − x‴(τ) y″(τ)

where primes denote differentiation with respect to τ.

affine flow: A method of finding the movement of a surface patch by estimating the affine transformation parameters required to transform the patch from its position in one view to another.

affine fundamental matrix: The fundamental matrix obtained from a pair of cameras under affine viewing conditions. It is a 3 × 3 matrix whose upper left 2 × 2 submatrix is all zero.

affine invariant: An object or shape property that is not changed (i.e., is invariant) by the application of an affine transformation. See also invariant.

affine length: See affine arc length.
affine moment: Four shape measures derived from second- and third-order moments that remain invariant under affine transformations. They are given by

   I₁ = (μ₂₀ μ₀₂ − μ₁₁²) / μ₀₀⁴

   I₂ = (μ₃₀² μ₀₃² − 6 μ₃₀ μ₂₁ μ₁₂ μ₀₃ + 4 μ₃₀ μ₁₂³ + 4 μ₂₁³ μ₀₃ − 3 μ₂₁² μ₁₂²) / μ₀₀¹⁰

   I₃ = (μ₂₀ (μ₂₁ μ₀₃ − μ₁₂²) − μ₁₁ (μ₃₀ μ₀₃ − μ₂₁ μ₁₂) + μ₀₂ (μ₃₀ μ₁₂ − μ₂₁²)) / μ₀₀⁷

   I₄ = (μ₂₀³ μ₀₃² − 6 μ₂₀² μ₁₁ μ₁₂ μ₀₃ − 6 μ₂₀² μ₀₂ μ₂₁ μ₀₃ + 9 μ₂₀² μ₀₂ μ₁₂²
        + 12 μ₂₀ μ₁₁² μ₂₁ μ₀₃ + 6 μ₂₀ μ₁₁ μ₀₂ μ₃₀ μ₀₃ − 18 μ₂₀ μ₁₁ μ₀₂ μ₂₁ μ₁₂
        − 8 μ₁₁³ μ₃₀ μ₀₃ − 6 μ₂₀ μ₀₂² μ₃₀ μ₁₂ + 9 μ₂₀ μ₀₂² μ₂₁²
        + 12 μ₁₁² μ₀₂ μ₃₀ μ₁₂ − 6 μ₁₁ μ₀₂² μ₃₀ μ₂₁ + μ₀₂³ μ₃₀²) / μ₀₀¹¹

where each μ_pq is the associated central moment.

affine quadrifocal tensor: The form taken by the quadrifocal tensor when specialized to the viewing conditions modeled by the affine camera.

affine reconstruction: A three dimensional reconstruction where the ambiguity in the choice of basis is affine only. Planes that are parallel in the Euclidean basis are parallel in the affine reconstruction. A projective reconstruction can be upgraded to affine by identification of the plane at infinity, often by locating the absolute conic in the reconstruction.

affine stereo: A method of scene reconstruction using two calibrated views of a scene from known view points. It is a simple but very robust approximation to the geometry of stereo vision, used to estimate positions, shapes and surface orientations. It can be calibrated very easily by observing just four reference points. Any two views of the same planar surface will be related by an affine transformation that maps one image to the other. This consists of a translation and a tensor, known as the disparity gradient tensor, representing the distortion in image shape. If the standard unit vectors X and Y in one image are the projections of some vectors on the object surface and the linear mapping between images is represented by a 2 × 3 matrix A, then the first two columns of A will be the corresponding vectors in the other image. Since the centroid of the plane will map to both image centroids, it can be used to find the surface orientation.
affine transformation: A special set of transformations in Euclidean geometry that preserve some properties of the construct being transformed. [Figure: a shape and its image under an affine transformation.] Affine transformations preserve:

• Collinearity of points: if three points belong to the same straight line, their images under affine transformations also belong to the same line and the middle point remains between the other two points.
• Parallel lines remain parallel, concurrent lines remain concurrent (images of intersecting lines intersect).
• The ratio of lengths of line segments of a given line remains constant.
• The ratio of areas of two triangles remains constant.
• Ellipses remain ellipses and the same is true for parabolas and hyperbolas.
• Barycenters of triangles (and other shapes) map into the corresponding barycenters.

Analytically, affine transformations are represented in the matrix form

   f(x) = Ax + b

where the determinant det(A) of the square matrix A is not 0. In 2D the matrix is 2 × 2; in 3D it is 3 × 3. (See the sketch below for a small demonstration of the midpoint property.)
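A tiny demonstration (our own example) that an affine map f(x) = Ax + b with det(A) ≠ 0 keeps the midpoint of a segment in the middle, one of the properties listed above:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.5, 1.5]])                 # det = 2.5, so invertible
    b = np.array([3.0, -1.0])
    f = lambda x: A @ x + b

    p, q = np.array([0.0, 0.0]), np.array([4.0, 2.0])
    mid = (p + q) / 2
    print(np.allclose(f(mid), (f(p) + f(q)) / 2))   # True: midpoint preserved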

algebraic distance: A linear distance metric commonly used in computer vision applications because of its simple form and standard matrix based least mean square estimation operations. If a curve or surface is defined implicitly by f(x, a) = 0 (e.g., x · a = 0 for a hyperplane), the algebraic distance of a point x_i to the surface is simply f(x_i, a).
aliasing: The erroneous replacement of high spatial frequency (HF) components by low-frequency ones when a signal is sampled. The affected HF components are those that are higher than the Nyquist frequency, or half the sampling frequency. Examples include the slowing of periodic signals by strobe lighting, and corruption of areas of detail in image resizing. If the source signal has no HF components, the effects of aliasing are avoided, so the low pass filtering of a signal to remove HF components prior to sampling is one form of anti-aliasing. [Figure: two perspective projections of a checkerboard, sampled at integer locations. In the first, the spatial frequency increases as the plane recedes, producing aliasing artifacts (jagged lines in the foreground, moiré patterns in the background); in the second, removing high-frequency components (i.e., smoothing) before downsampling mitigates the effect.]
alignment: An approach to geometric model matching by registering a geometric model to the image data.

ALVINN: Autonomous Land Vehicle In a Neural Network. An early attempt, at Carnegie-Mellon University, to learn complex behaviour (maneuvering a vehicle) by observing humans.

ambient light: Illumination by diffuse reflections from all surfaces within a scene (including the sky, which acts as an external distant surface). In other words, light that comes from all directions, such as the sky on a cloudy day. Ambient light ensures that all surfaces are illuminated, including those not directly facing light sources.

AMBLER: An autonomous active vision system using both structured light and sonar, developed by NASA and Carnegie-Mellon University. It is supported by a 12-legged robot and is intended for planetary exploration.

amplifier noise: Spurious additive noise signal generated by the electronics in a sampling device. The standard model for this type of noise is Gaussian. It is independent of the signal. In color cameras, where more amplification is used in the blue color channel than in the green or red channel, there tends to be more noise in the blue channel. In well-designed electronics amplifier noise is generally negligible.

analytic curve finding: A method of detecting parametric curves by first transforming data into a feature space that is then searched for the hypothesized curve parameters. An example might be line finding using the Hough transform.

anamorphic lens: A lens having one or more cylindrical surfaces. Anamorphic lenses are used in photography to produce images that are compressed in one dimension. Images can later be restored to true form using another reversing anamorphic lens set. This form of lens is used in wide-screen movie photography.

anatomical map: A biological model usable for alignment with or region labeling of a corresponding image dataset. For example, one could use a model of the brain's functional regions to assist in the identification of brain structures in an NMR dataset.

AND operator: A boolean logic operator that combines two input binary images, applying the AND logic

   p  q  p AND q
   0  0  0
   0  1  0
   1  0  0
   1  1  1

at each pair of corresponding pixels. This approach is used to select image regions. [Figure: two binary images and the result of ANDing them; see also the sketch below.]
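The AND operator's truth table reduces to a single elementwise operation on boolean image arrays; the following sketch (our own, with a mask chosen arbitrarily) shows the region-selection use just mentioned:

    import numpy as np

    a = np.array([[1, 1, 0],
                  [0, 1, 0]], dtype=bool)
    mask = np.array([[1, 0, 0],
                     [0, 1, 1]], dtype=bool)   # region of interest
    print((a & mask).astype(int))              # [[1 0 0]
                                               #  [0 1 0]]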
angiography: A method for imaging blood vessels by introducing a dye that is opaque when photographed by X-ray. Also the study of images obtained in this way.
angularity ratio: Given two figures, X and Y, θ_i(X) and θ_j(Y) are angles subtending convex parts of the contour of the figure X (respectively Y), and θ_k(X) are angles subtending plane parts of the contour of figure X. The angularity ratios are then:

   Σ_i θ_i(X) / 360°    and    Σ_j θ_j(X) / Σ_k θ_k(X)

anisotropic filtering: Any filtering technique where the filter parameters vary over the image or signal being filtered.

anomalous behavior detection: Special case of surveillance where human movement is analyzed. Used in particular to detect intruders or behavior likely to precede or indicate crime.

antimode: The minimum between two maxima. For example, one method of threshold selection is done by determining the antimode in a bimodal histogram. [Figure: a bimodal function f(x) with the antimode at the dip between the two peaks.]

aperture: Opening in the lens diaphragm of a camera through which light is admitted. This device is often arranged so that the amount of light can be controlled accurately. A small aperture reduces the amount of light available, but increases the depth of field. [Figure: nearly closed (left) and nearly open (right) aperture positions.]

aperture control: Mechanism for varying the size of a camera's aperture.

aperture problem: If a motion sensor has a finite receptive field, it perceives the world through something resembling an aperture, making the motion of a homogeneous contour seem locally ambiguous. Within that aperture, different physical motions are therefore indistinguishable. [Figure: two alternative motions of a square, before and after, that are identical within the circled receptive fields.]
apparent contour: The apparent contour of a surface S in 3D is the set of critical values of the projection of S on a plane, in other words, the silhouette. If the surface is transparent, the apparent contour can be decomposed into a collection of closed curves with double points and cusps. The convex envelope of an apparent contour is also the boundary of its convex hull.

apparent motion: The 3D motion suggested by the image motion field, but not necessarily matching the real 3D motion. The reason for this mismatch is that motion fields may be ambiguous, that is, may be generated by different 3D motions or by light source movement. Mathematically, there may be multiple solutions to the problem of reconstructing 3D motion from the image motion field. See also visual illusion, motion estimation.

appearance: The way an object looks from a particular viewpoint under particular lighting conditions.

appearance based recognition: Object recognition where the object model encodes the possible appearances of the object (as contrasted with a geometric model that encodes the shape, as used in model based recognition). In principle, it is impossible to encode all appearances when occlusions are considered; however, small numbers of appearances can often be adequate, especially if there are not many models in the model base. There are many approaches to appearance based recognition, such as using a principal component model to encode all appearances in a compressed framework, using color histograms to summarize the appearance, or using a set of local appearance descriptors such as Gabor filters extracted at interest points. A common feature of these approaches is learning the models from examples.

appearance based tracking: Methods for object or target recognition in real time, based on image pixel values in each frame rather than derived features. Temporal filtering, such as the Kalman filter, is often used.

appearance change: Changes in an image that are not easily accounted for by motion, such as an object actually changing form.

appearance enhancement transform: Generic term for operations applied to images to change, or enhance, some aspect of them. Examples include brightness adjustment, contrast adjustment, edge sharpening, histogram equalization, saturation adjustment or magnification.

appearance flow: Robust methods for real time object recognition from a sequence of images depicting a moving object. Changes in the images are used rather than the images themselves. It is analogous to processing using optical flow.
appearance model: A representation used for interpreting images that is based on the appearance of the object. These models are usually learned by using multiple views of the objects. See also active appearance model and appearance based recognition.

appearance prediction: Part of the science of appearance engineering, where an object texture is changed so that the viewer experience is predictable.

appearance singularity: An image position where a small change in viewer position can cause a dramatic change in the appearance of the observed scene, such as the appearance or disappearance of image features. This is contrasted with changes occurring at a generic viewpoint. For example, when viewing the corner of a cube from a distance, a small change in viewpoint still leaves the three surfaces at the corner visible. However, when the viewpoint moves into the infinite plane containing one of the cube faces (a singularity), one or more of the planes disappears.

arc length: If f is a function such that its derivative f′ is continuous on some closed interval [a, b], then the arc length of f from x = a to x = b is the integral

   ∫_a^b √(1 + f′(x)²) dx

[Figure: a curve f(x) and the arc length measured between x = a and x = b.]
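A numerical version of this integral (our example, using f(x) = x² on [0, 1]) approximates the arc length by summing the lengths of small chords:

    import numpy as np

    f = lambda x: x**2
    x = np.linspace(0.0, 1.0, 10001)
    y = f(x)
    length = np.sum(np.hypot(np.diff(x), np.diff(y)))
    print(length)   # ~1.4789, matching the closed-form value for x^2 on [0, 1]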
arc of graph: Two nodes in a graph can be connected by an arc. [Figure: a graph with nodes A, B and C; the dashed lines are arcs.]

architectural model reconstruction: A generic term for reverse engineering buildings based on collected 3D data as well as libraries of building constraints.

area: The measure of a region or surface's extension in some given units. The units could be image units, such as square pixels, or scene units, such as square centimeters.

area based: An image operation that is applied to a region of an image, as opposed to pixel based.
array processor: A group of time-synchronized processing elements that perform computations on data distributed across them. Some array processors have elements that communicate only with their immediate neighbors, as in a grid topology. See also single instruction multiple data. [Figure: a grid-connected array processor topology.]

arterial tree segmentation: Generic term for methods used in finding internal pipe-like structures in medical images. Example image types are NMR images, angiograms and X-rays. Example trees are bronchial systems and veins.

articulated object: An object composed of a number of (usually) rigid subparts or components connected by joints, which can be arranged in a number of different configurations. The human body is a typical example.

articulated object model: A representation of an articulated object that includes both its separate parts and their range of movement (typically joint angles) relative to each other.

articulated object segmentation: Methods for acquiring an articulated object from 2D or 3D data.

articulated object tracking: Tracking an articulated object in an image sequence. This includes both the pose of the object and also its shape parameters, such as joint angles.

aspect graph: A graph of the set of views (aspects) of an object, where the arcs of the graph are transitions between two neighboring views (the nodes) and a change between aspects is called a visual event. See also characteristic view. [Figure: a hippopotamus model (the object) and some of its aspects.]

aspect ratio: 1) The ratio of the sides of the bounding box of an object, where the orientation of the box is chosen to maximize this ratio. Since this measure is scale invariant it is a useful metric for object recognition. 2) In a camera, the ratio of the horizontal to vertical pixel sizes. 3) In an image, the ratio of the image width to height. For example, an image of 640 by 480 pixels has an aspect ratio of 4:3.
aspects: See characteristic view and aspect graph.

association graph: A graph used in structure matching, such as matching a geometric model to a data description. In this graph, each node corresponds to a pairing between a model and a data feature (with the implicit assumption that they are compatible). Arcs in the graph mean that the two connected nodes are pairwise compatible. Finding maximal cliques is one technique for finding good matches. [Figure: pairings of model features A, B and C with image features a, b, c and d; the maximal clique consisting of A:a, B:b and C:c is one match hypothesis.]

astigmatism: A refractive error where the light is focused within an optical system. [Figure: rays through an astigmatic lens.] It occurs when a lens has irregular curvature, causing light rays to focus at an area rather than at a point. It may be corrected with a toric lens, which has a greater refractive index on one axis than the others. In human eyes, astigmatism often occurs with nearsightedness and farsightedness.

atlas based segmentation: A segmentation technique used in medical image processing, especially with brain images. Automatic tissue segmentation is achieved using a model of the brain structure and imagery (see atlas registration) compiled with the assistance of human experts. See also image segmentation.

atlas registration: An image registration technique used in medical image processing, especially to register brain images. An atlas is a model (perhaps statistical) of the characteristics of multiple brains, providing examples of normal and pathological structures. This makes it possible to take into account anomalies that single-image registration could not. See also medical image registration.

ATR: See automatic target recognition.

attention: See visual attention.

attenuation: The reduction of a particular phenomenon, for instance, noise attenuation as the reduction of image noise.
attributed graph: A graph useful for representing different properties of an image. Its nodes are attributed pairs of image segments, their color or shape for example. The relations between them, such as relative texture or brightness, are encoded as arcs.

augmented reality: Primarily a projection method that adds graphics or sound, etc., as an overlay to an original image or audio. For example, a firefighter's helmet display could show exit routes registered to his/her view of the building.

autocalibration: The recovery of a camera's calibration using only point (or other feature) correspondences from multiple uncalibrated images and geometric consistency constraints (e.g., that the camera settings are the same for all images in a sequence).

autocorrelation: The extent to which a signal is similar to shifted copies of itself. For an infinitely long 1D signal f(t): R → R, the autocorrelation at a shift Δt is

   R_f(Δt) = ∫_{−∞}^{∞} f(t) f(t + Δt) dt

The autocorrelation function R_f always has a maximum at 0. A peaked autocorrelation function decays quickly away from Δt = 0. The sample autocorrelation function of a finite set of values f_1, ..., f_n is r_f(d), d = 1, ..., n − 1, where

   r_f(d) = ( Σ_{i=1}^{n−d} (f_i − f̄)(f_{i+d} − f̄) ) / ( Σ_{i=1}^{n} (f_i − f̄)² )

and f̄ = (1/n) Σ_{i=1}^{n} f_i is the sample mean.
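The sample autocorrelation formula transcribes directly into NumPy; the noisy sine wave below is our own test signal:

    import numpy as np

    def sample_autocorrelation(f, d):
        fbar = f.mean()
        num = np.sum((f[:len(f) - d] - fbar) * (f[d:] - fbar))
        den = np.sum((f - fbar) ** 2)
        return num / den

    t = np.linspace(0, 8 * np.pi, 400)
    f = np.sin(t) + 0.1 * np.random.default_rng(1).standard_normal(t.size)
    print(sample_autocorrelation(f, 1))    # close to 1: smooth signal
    print(sample_autocorrelation(f, 25))   # much smaller: quarter period away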
autofocus: Automatic determination and control of image sharpness in an optical or vision system. There are two major variations in this control system: active focusing and passive focusing. Active autofocus is performed using a sonar or infrared signal to determine the object distance. Passive autofocus is performed by analyzing the image itself to optimize differences between adjacent pixels in the CCD array.

automatic: Performed by a machine without human intervention. The opposite of "manual".

automatic target recognition (ATR): Sensors and algorithms used for detecting hostile objects in a scene. Sensors are of many different types, sampling in infrared and visible light and using sonar and radar.

autonomous vehicle: A mobile robot controlled by computer, with human input operating only at a very high level, stating the ultimate destination or task for example. Autonomous navigation requires the visual tasks of route detection, self-localization, landmark location and obstacle detection, as well as robotics tasks such as route planning and motor control.

autoregressive model: A model that uses statistical properties of the past behavior of some variable to predict future behavior of that variable. A signal x_t at time t satisfies an autoregressive model if

   x_t = Σ_{n=1}^{p} α_n x_{t−n} + ε_t

where ε_t is noise.

autostereogram: An image similar to a random dot stereogram in which the corresponding features are combined into a single image. Stereo fusion allows the perception of a 3D shape in the 2D image.

average smoothing: See mean smoothing.

AVI: Microsoft format for audio and video files ("audio video interleaved"). Unlike MPEG, it is not a standard, so compatibility of AVI video files and AVI players is not always guaranteed.

axial representation: A region representation that uses a curve to describe the image region. The axis may be a skeleton derived from the region by a thinning process.

axis of elongation: 1) The line that minimizes the second moment of the data points. If {x_i} are the data points, and d(x, L) is the distance from point x to line L, then the axis of elongation A minimizes Σ_i d(x_i, A)². Let μ be the mean of {x_i}. Define the scatter matrix S = Σ_i (x_i − μ)(x_i − μ)ᵀ. Then the axis of elongation is the eigenvector of S with the largest eigenvalue. See also principal component analysis. 2) The longer midline of the bounding box with largest length-to-width ratio. [Figure: a point set and a possible axis of elongation.]
24
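Returning to the axis of elongation entry above, the scatter-matrix construction translates directly into code; a minimal sketch in Python/NumPy (names are illustrative):

```python
import numpy as np

def axis_of_elongation(points):
    """Return the mean and unit direction of the axis of elongation.

    points: (N, 2) or (N, 3) array of data points.
    """
    x = np.asarray(points, dtype=float)
    mu = x.mean(axis=0)
    centered = x - mu
    # Scatter matrix S = sum_i (x_i - mu)(x_i - mu)^T
    s = centered.T @ centered
    # The eigenvector with the largest eigenvalue gives the axis direction.
    eigvals, eigvecs = np.linalg.eigh(s)
    return mu, eigvecs[:, np.argmax(eigvals)]

# Points scattered along a slanted line.
rng = np.random.default_rng(0)
pts = np.c_[np.linspace(0, 10, 50), np.linspace(0, 5, 50)]
pts += rng.normal(scale=0.2, size=pts.shape)
mean, direction = axis_of_elongation(pts)
print(mean, direction)  # direction is roughly proportional to (2, 1)
```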
B

B-rep: See surface boundary representation.

b-spline: A curve approximation spline represented as a combination of basis functions

$c(t) = \sum_{i=0}^{m} \vec{a}_i B_i(t)$

where $B_i$ are the basis functions and $\vec{a}_i$ are the control points. B-splines do not necessarily pass through any of the control points; however, if b-splines are calculated for adjacent sets of control points the curve segments will join up and produce a continuous curve.

b-spline fitting: Fitting a b-spline to a set of data points. This is useful for noise reduction or for producing a more compact model of the observed curve.

b-spline snake: A snake made from b-splines.

back projection: 1) A form of display where a translucent screen is illuminated from the side not facing the viewer. 2) The computation of a 3D quantity from its 2D projection. For example, a 2D homogeneous point $\vec{x}$ is the projection of a 3D point $\vec{X}$ by a perspective projection matrix $P$, so $\vec{x} = P\vec{X}$. The backprojection of $\vec{x}$ is the 3D line $\mathrm{null}(P) + P^{+}\vec{x}$, where $P^{+}$ is the pseudoinverse of $P$. 3) Sometimes used interchangeably with triangulation. 4) Technique to compute the attenuation coefficients from intensity profiles covering a total cross section under various angles. It is used in CT and MRI to recover 3D from essentially 2D images. 5) Projection of the estimated 3D position of a shape back into the 2D image from which the shape's pose was estimated.

background: In computer vision, generally used in the context of object recognition. The background is either (1) the area of the scene behind an object or objects of interest or (2) the part of the image whose
pixels sample from the background in the scene. As opposed to foreground. See also figure/ground separation.
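Returning to the b-spline fitting entry above: a minimal sketch using SciPy's smoothing-spline routines (the routine choice and the smoothing parameter value are assumptions, not prescribed by the dictionary):

```python
import numpy as np
from scipy import interpolate

# Noisy samples of a closed curve.
t = np.linspace(0, 2 * np.pi, 40)
x = np.cos(t) + np.random.normal(scale=0.05, size=t.size)
y = np.sin(t) + np.random.normal(scale=0.05, size=t.size)

# Fit a smoothing b-spline; s controls the noise-reduction trade-off.
tck, u = interpolate.splprep([x, y], s=0.5)

# Evaluate the fitted spline densely to get a compact, smooth curve model.
u_fine = np.linspace(0, 1, 200)
x_fit, y_fit = interpolate.splev(u_fine, tck)
```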
background labeling: Methods for differentiating the foreground objects in an image, or the objects of interest, from those in the background.
background modeling: Segmentation or change detection method where the scene behind the objects of interest is modeled as a fixed or slowly changing background, with possible foreground occlusions. Each pixel is modeled as a distribution, which is then used to decide whether a given observation belongs to the background or to an occluding object.
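A minimal sketch of this per-pixel idea, using a running Gaussian model in Python/NumPy (the class name, update rate and threshold are illustrative assumptions):

```python
import numpy as np

class RunningGaussianBackground:
    """Model each pixel as a Gaussian that is slowly updated over time."""

    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mean = first_frame.astype(float)
        self.var = np.full_like(self.mean, 25.0)  # initial variance guess
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        frame = frame.astype(float)
        # A pixel is foreground if it lies k standard deviations from the mean.
        foreground = np.abs(frame - self.mean) > self.k * np.sqrt(self.var)
        # Slowly adapt the background model toward the new observation.
        self.mean = (1 - self.alpha) * self.mean + self.alpha * frame
        diff = frame - self.mean
        self.var = (1 - self.alpha) * self.var + self.alpha * diff ** 2
        return foreground
```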
background normalization: Removal of the background by some image processing technique to estimate the background image and then dividing or subtracting the background from an original image. The technique is useful when the background is non-uniform. [Figure: the input image, the background estimate obtained by dilation with a ball(9, 9) structuring element, and the (normalized) division of the input image by the background image.]

backlighting: A method of illuminating a scene where the background receives more illumination than the foreground. Commonly this is used to produce silhouettes of opaque objects against a lit background, for easier object detection.

bandpass filter: A signal processing filtering technique that allows signals between two specified frequencies to pass but cuts out signals at all other frequencies.
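A sketch of the dilation-and-divide scheme described in the background normalization entry above, using SciPy's gray-scale morphology (the structuring-element size is an assumption):

```python
import numpy as np
from scipy import ndimage

def normalize_background(image, size=9):
    """Estimate a slowly varying background by gray-scale dilation,
    then divide it out of the input image."""
    background = ndimage.grey_dilation(image, size=(size, size))
    # Avoid division by zero; rescale the ratio into a displayable range.
    normalized = image.astype(float) / np.maximum(background, 1)
    return (255 * normalized / normalized.max()).astype(np.uint8)
```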
back-propagation: One of the best-studied neural network training algorithms for supervised learning. The name arises from using the propagation of the discrepancies between the computed and desired responses at the network output back to the network inputs. The discrepancies are one of the inputs into the network weight recomputation process.

back-tracking: A basic technique for graph searching: if a terminal but non-solution node is reached, search does not terminate with failure, but continues with still unexplored children of a previously visited non-terminal node. Classic back-tracking algorithms are breadth-first, depth-first, and A*. See also graph, graph searching, search tree.

bar: A raw primal sketch primitive that represents a dark line segment against a lighter background (or its inverse). Bars are also one of the primitives in Marr's theory of vision. [Figure: a small dark bar observed inside a receptive field.]

bar detector: 1) Method or algorithm that produces maximum excitation when a bar is in its receptive field. 2) Device used by thirsty undergraduates.

bar-code reading: Methods and algorithms used for the detection, imaging and interpretation of black parallel lines of different widths arranged to give details on products or other objects. Bar codes themselves have many different coding standards and arrangements. [Figure: an example bar code.]

barycentrum: See center of mass.

barrel distortion: Geometric lens distortion in an optical system that causes the outlines of an object to curve outward, forming a barrel shape. See also pincushion distortion.

bas-relief ambiguity: The ambiguity in reconstructing a 3D object with Lambertian reflectance using shading from an image under orthographic projection. If the true surface is $z(x, y)$, then the family of surfaces $a z(x, y) + bx + cy$ generates identical images under these viewing conditions, so any reconstruction, for any values of $(a, b, c)$, is equally valid. The ambiguity is thus up to a three-parameter family.

baseline: Distance between two cameras used in a binocular stereo system. [Figure: an object point, the epipolar plane through it, the left and right image planes, and the left and right cameras joined by the stereo baseline.]

basis function representation: A method of representing a function as a sum of simple (usually orthonormal) ones. For example the Fourier transform represents functions as a weighted sum of sines and cosines.

Bayes' rule: The relationship between the conditional probability of event $A$ given $B$ and the conditional probability of event $B$ given event $A$. This is expressed as

$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$

provided that $P(B) \neq 0$.

Bayesian classifier: A mathematical approach to classifying a set of data, by selecting the class most likely to have generated that data. If $\vec{x}$ is the data and $c$ is a class, then the probability of that class is $p(c \mid \vec{x})$. This probability can be hard to compute, so Bayes' rule can be used here, which says that $p(c \mid \vec{x}) = \frac{p(\vec{x} \mid c) \, p(c)}{p(\vec{x})}$. Then we can compute the probability of the class $p(c \mid \vec{x})$ in terms of the probability of having observed the given data $\vec{x}$ with, $p(\vec{x} \mid c)$, and without, $p(\vec{x})$, assuming the class $c$, plus the a priori likelihood, $p(c)$, of observing the class. The Bayesian classifier is the most common statistical classifier currently used in computer vision processes.

Bayesian filtering: A probabilistic data fusion technique. It uses a formulation of probabilities to represent the system state and likelihood functions to represent their relationships. In this form, Bayes' rule can be applied and further related probabilities deduced.

Bayesian model: A statistical modeling technique based on two input models:

1. a likelihood model $p(y \mid x, h)$, describing the density of observing $y$ given $x$ and $h$. Regarded as a function of $h$, for a fixed $y$ and $x$, the density is also known as the likelihood of $h$.
2. a prior model, $p(h \mid D_0)$, which specifies the a priori density of $h$ given some known information denoted by $D_0$ before any new data are taken into account.

The aim of the Bayesian model is to predict the density for outcomes $y$ in test situations $x$ given data $D = (D_T, D_0)$ with both pre-known and training data.

Bayesian model learning: See probabilistic model learning.

Bayesian network: A belief modeling approach using a graph structure. Nodes are variables and arcs are implied causal dependencies and are given probabilities. These networks are useful for fusing multiple data (possibly of different types) in a uniform and rigorous manner.
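A worked instance of the Bayes' rule and Bayesian classifier entries above, with made-up numbers for illustration:

```python
def bayes_posterior(likelihoods, priors):
    """p(c|x) for each class c, computed from p(x|c) and p(c)."""
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Two classes, e.g., "road" and "vegetation", with assumed densities
# p(x|road) = 0.2, p(x|vegetation) = 0.6 and priors 0.7, 0.3.
posteriors = bayes_posterior([0.2, 0.6], [0.7, 0.3])
print(posteriors)  # [0.4375, 0.5625] -> choose "vegetation"
```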
BDRF: See bidirectional reflectance distribution function.

beam splitter: An optical system that divides unpolarized light into two orthogonally polarized beams, each at 90° to the other. [Figure: an incoming beam divided into two orthogonal beams.]

behavior analysis: Model based vision techniques for identifying and tracking behavior in humans. Often used for threat analysis.

behavior learning: Generation of goal-driven behavior models by some learning algorithm, for example reinforcement learning.

Beltrami flow: A noise suppression technique where images are treated as surfaces and the surface area is minimized in such a way as to preserve edges. See also diffusion smoothing.

bending energy: 1) A metaphor borrowed from the mechanics of thin metal plates. If a set of landmarks is distributed on two infinite flat metal plates and the differences in the coordinates between the two sets are vertical displacements of the plate, one Cartesian coordinate at a time, then the bending energy is the energy required to bend the metal plate so that the landmarks are coincident. When applied to images, the sets of landmarks may be sets of features. 2) Denotes the amount of energy that is stored due to an object's shape.

best next view: See next view planning.

Bhattacharyya distance: A measure of the (dis)similarity of two probability distributions. Given two arbitrary distributions $\{p_i(\vec{x})\}_{i=1,2}$, the Bhattacharyya distance between them is

$d^2 = -\log \int \sqrt{p_1(\vec{x}) \, p_2(\vec{x})} \, d\vec{x}$

bicubic spline interpolation: A special case of surface interpolation that uses cubic spline functions in two dimensions. This is like bilinear surface interpolation except that the interpolating surface is curved, instead of flat.

bidirectional reflectance distribution function (BRDF): If the energy arriving at a surface patch is denoted $E(\theta_i, \phi_i)$ and the energy radiated in a particular direction is denoted $L(\theta_e, \phi_e)$ in polar coordinates, then the BRDF is defined as the ratio of the energy radiated from a patch of a surface in some direction to the amount of energy arriving there. The radiance is determined from the irradiance by

$L(\theta_e, \phi_e) = f(\theta_i, \phi_i, \theta_e, \phi_e) \, E(\theta_i, \phi_i)$

where the function $f$ is the bidirectional reflectance distribution function. This function often only depends on the difference between the incident angle $\phi_i$ of the ray falling on the surface and the angle $\phi_e$ of the reflected ray. [Figure: the BRDF geometry, with the surface normal $\vec{n}$, incident ray at angles $(\theta_i, \phi_i)$ and emitted ray at angles $(\theta_e, \phi_e)$.]
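A sketch of the Bhattacharyya distance entry above for discrete (histogram) distributions in Python/NumPy; a discrete sum replaces the integral:

```python
import numpy as np

def bhattacharyya_distance(p1, p2):
    """d^2 = -log sum_x sqrt(p1(x) p2(x)) for normalized histograms."""
    p1 = np.asarray(p1, dtype=float) / np.sum(p1)
    p2 = np.asarray(p2, dtype=float) / np.sum(p2)
    bc = np.sum(np.sqrt(p1 * p2))  # Bhattacharyya coefficient, in (0, 1]
    return -np.log(bc)

# Identical histograms give a distance close to 0; dissimilar ones do not.
print(bhattacharyya_distance([1, 2, 3], [1, 2, 3]))  # ~0.0
print(bhattacharyya_distance([4, 1, 0], [0, 1, 4]))  # ~1.61
```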
bilateral filtering: A non-iterative alternative to anisotropic filtering where images can be smoothed but edges present in them are preserved.

bilateral smoothing: See bilateral filtering.

bilinear surface interpolation: To determine the value of a function $f(x, y)$ at an arbitrary location $(x, y)$, of which only discrete samples $f_{ij} = \{f(x_i, y_j)\}_{i=1,j=1}^{n,m}$ are available. The samples are arranged on a 2D grid, so the value at point $(x, y)$ is interpolated from the values at the four surrounding points $f_{11}, f_{21}, f_{12}, f_{22}$. Let $d_1$ and $d_1'$ be the horizontal distances from $(x, y)$ to the grid columns through $f_{11}, f_{12}$ and $f_{21}, f_{22}$ respectively, and let $d_2$ and $d_2'$ be the vertical distances to the grid rows through $f_{11}, f_{21}$ and $f_{12}, f_{22}$ respectively. Then

$f_{\mathrm{bilinear}}(x, y) = \frac{A + B}{(d_1 + d_1')(d_2 + d_2')}$

where

$A = d_1' d_2' f_{11} + d_1 d_2' f_{21}, \quad B = d_1' d_2 f_{12} + d_1 d_2 f_{22}$

so that the nearest sample receives the largest weight. [Figure: the four grid samples surrounding $(x, y)$ and the distances $d_1, d_1', d_2, d_2'$.]

bilinearity: A function of two variables $x$ and $y$ is bilinear in $x$ and $y$ if it is linear in $y$ for fixed $x$ and linear in $x$ for fixed $y$. For example, if $\vec{x}$ and $\vec{y}$ are vectors and $A$ is a matrix such that $\vec{x}^T A \vec{y}$ is defined, then the function $f(\vec{x}, \vec{y}) = \vec{x}^T A \vec{y}$ is bilinear in $\vec{x}$ and $\vec{y}$.

bimodal histogram: A histogram with two pronounced peaks, or modes. This is a convenient intensity histogram for determining a binarizing threshold. [Figure: an example intensity histogram with two clear peaks.]
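A sketch of the bilinear surface interpolation entry above, for a single unit grid cell (Python; names illustrative):

```python
def bilinear(f11, f21, f12, f22, dx, dy):
    """Interpolate inside one grid cell.

    f11, f21 are the top-left/top-right samples, f12, f22 the bottom pair;
    (dx, dy) in [0, 1] is the query point's offset from the top-left corner.
    """
    top = (1 - dx) * f11 + dx * f21
    bottom = (1 - dx) * f12 + dx * f22
    return (1 - dy) * top + dy * bottom

# Querying a corner returns that sample; the center averages all four.
print(bilinear(10, 20, 30, 40, 0.0, 0.0))  # 10.0
print(bilinear(10, 20, 30, 40, 0.5, 0.5))  # 25.0
```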
bin-picking: The problem of getting a robot manipulator equipped with vision sensors to pick parts, for instance screws, bolts, components of a given assembly, from a random pile. A classic challenge for hand–eye robotic systems, involving at least segmentation, object recognition in clutter and pose estimation.

binarization: See thresholding.

binary image: An image whose pixels can either be in an "on" or "off" state, represented by the integers 1 and 0 respectively. [Figure: an example binary image.]

binary mathematical morphology: A group of shape-based operations that can be applied to binary images, based around a few simple mathematical concepts from set theory. Common usages include noise reduction, image enhancement and image segmentation. The two most basic operations are dilation and erosion. These operators take two pieces of data as input: the input binary image and a structuring element (also known as a kernel). Virtually all other mathematical morphology operators can be defined in terms of combinations of erosion and dilation along with set operators such as intersection and union. Some of the more important are opening, closing and skeletonization. Binary morphology is a special case of gray scale mathematical morphology. See also mathematical morphology.

binary moment: Given a binary image $B(i, j)$, there is an infinite family of moments indexed by the integer values $p$ and $q$. The $pq^{\mathrm{th}}$ moment is given by

$m_{pq} = \sum_i \sum_j i^p j^q B(i, j)$

binary noise reduction: A method of removing salt-and-pepper noise from binary images. For example, a point could have its value set to the median value of its eight neighbors.

binary object recognition: Model based techniques and algorithms used to recognize objects from their binary images.

binary operation: An operation that takes two images as inputs, such as image subtraction.

binary region skeleton: See skeleton.
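The binary moment sum above translates directly into NumPy; a small sketch:

```python
import numpy as np

def binary_moment(image, p, q):
    """m_pq = sum_i sum_j i^p j^q B(i, j) for a 0/1 image."""
    i, j = np.nonzero(image)               # coordinates of the 'on' pixels
    return np.sum((i ** p) * (j ** q))

b = np.zeros((5, 5), dtype=int)
b[1:4, 2] = 1                              # a short vertical bar
area = binary_moment(b, 0, 0)              # m_00 is the pixel count: 3
centroid_i = binary_moment(b, 1, 0) / area  # 2.0
print(area, centroid_i)
```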
binocular: A system that has two cameras looking at the same scene simultaneously, usually from a similar viewpoint. See also stereo vision.

binocular stereo: A method of deriving depth information from a pair of calibrated cameras set at some distance apart and pointing in approximately the same direction. Depth information comes from the parallax between the two images and relies on being able to derive the same feature in both images.

binocular tracking: A method that tracks objects or features in 3D using binocular stereo.

biometrics: The science of discriminating individuals from accurate measurement of their physical features. Example biometric measurements are retinal lines, finger lengths, fingerprints, voice characteristics and facial features.

bipartite matching: Graph matching technique often applied in model based vision to match observations with models, or in stereo to solve the correspondence problem. Assume a set $V$ of nodes partitioned into two non-intersecting subsets $V_1$ and $V_2$. In other words, $V = V_1 \cup V_2$ and $V_1 \cap V_2 = \emptyset$. The only arcs $E$ in the graph lie between the two subsets, i.e., $E \subset (V_1 \times V_2) \cup (V_2 \times V_1)$. This is the bipartite graph. The bipartite matching problem is to find a maximal matching in the bipartite graph, in other words, a maximal set of nodes from the two subsets connected by arcs such that each node is connected by exactly one arc. One maximal matching in a graph with sets $V_1 = \{A, B, C\}$ and $V_2 = \{X, Y\}$ pairs $(A, Y)$ and $(C, X)$. [Figure: this bipartite graph, with the selected arcs solid and the other arcs dashed.]

bit map: An image with one bit per pixel.

bit-plane encoding: An image compression technique where the image is broken into bit planes and run length coding is applied to each plane. To get the bit planes of an 8-bit gray scale image, the picture has a boolean AND operator applied with the binary value corresponding to the desired plane. For example, ANDing the image with 00010000 gives the fifth bit plane.

bitangent: See curve bitangent.

bitshift operator: The bitshift operator shifts the binary representation of each pixel to the left or right by a set number of bit positions. Shifting 01010110 right by 2 bits gives 00010101. The bitshift operator is a computationally cheap method of dividing or multiplying an image by a power of 2. A shift of $n$ positions is a multiplication or division by $2^n$.
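A sketch of the bit-plane extraction and bitshift operations described above, in Python/NumPy:

```python
import numpy as np

def bit_plane(image, plane):
    """Extract bit plane `plane` (1 = least significant) of an 8-bit image."""
    mask = 1 << (plane - 1)                # e.g., plane 5 -> 00010000
    return (image & mask) >> (plane - 1)

img = np.array([[86, 255], [16, 3]], dtype=np.uint8)
print(bit_plane(img, 5))                   # 1 wherever bit 5 is set
print(img >> 2)                            # cheap division by 2^2 = 4
```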
blanking: Clearing a CRT or video device. The vertical blanking interval (VBI) in television transmission is used to carry data other than audio and video.

blending operator: An image processing operator that creates a third image $C$ by a weighted combination of the input images $A$ and $B$. In other words, $C(i, j) = \alpha A(i, j) + \beta B(i, j)$ for two scalar weights $\alpha$ and $\beta$. Usually, $\alpha + \beta = 1$. The results of some process can be illustrated by blending the original and result images. [Figure: an example of blending that adds a detected boundary to the original image.]

blob analysis: Blob analysis is a group of algorithms used in medical image analysis. There are four steps in the process: derive an optimum foreground/background threshold to segment objects from their background; binarize the images by applying a thresholding operation; perform region growing and assign a label to each discrete group (blob) of connected pixels; extract physical measurements from the blobs.

blob extraction: A part of blob analysis. See connected component labeling.

block coding: A class of signal coding techniques. The input signal is partitioned into fixed-size blocks, and each block is transmitted after translation to a smaller (for compression) or larger (for error-correction) block size.

blocks world: The blocks world is the simplified problem domain in which much early artificial intelligence and computer vision research was done. The essential feature of the blocks world is the restriction of analysis to simplified geometric objects such as polyhedra and the assumption that geometric descriptions such as image edges can be easily recovered from the image. [Figure: an example blocks world scene.]

blooming: Blooming occurs when too much light enters a digital optical system. The light saturates CCD pixels, causing charge to overspill into surrounding elements, giving either vertical or horizontal streaking in the image (depending on the orientation of the CCD).
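A sketch of the four blob analysis steps above, using SciPy's connected component labeling (the threshold value is an illustrative assumption):

```python
import numpy as np
from scipy import ndimage

def blob_analysis(image, threshold=128):
    """Threshold, binarize, label connected blobs, then measure them."""
    binary = image > threshold                     # steps 1 and 2
    labels, count = ndimage.label(binary)          # step 3: one label per blob
    # Step 4: physical measurements, here area and centroid per blob.
    index = range(1, count + 1)
    areas = ndimage.sum(binary, labels, index)
    centroids = ndimage.center_of_mass(binary, labels, index)
    return labels, areas, centroids
```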
Blum's medial axis: See medial axis transform.

blur: A measure of sharpness in an image. Blurring can arise from the sensor being out of focus, noise in the environment or image capture process, target or sensor motion, as a side effect of an image processing operation, etc. [Figure: a blurred image.]

border detection: See boundary detection.

border tracing: Given a pre-labeled (or segmented) image, the border is the inner layer of each region's connected pixel set. It can be traced using a simple 8-connective or 4-connective stepping procedure in a 3 × 3 neighborhood.

boundary: A general term for the lower dimensional structure that separates two objects, such as the curve between neighboring surfaces, or the surface between neighboring volumes.

boundary description: Functional, geometry based or set-theoretic description of a region boundary. For an example, see chain code.

boundary detection: An image processing algorithm that finds and labels the edge pixels between two neighboring image segments after segmentation. The boundary represents physical discontinuities in the scene, for example changes in color, depth, shape or texture.

boundary grouping: An image processing algorithm that attempts to complete a fully connected image-segment boundary from many broken pieces. A boundary might be broken because it is commonplace for sharp transitions in property values to appear in the image as slow transitions, or sometimes disappear due to noise, blurring, digitization artifacts, poor lighting or surface irregularities, etc.

boundary length: The length of the boundary of an object. See also perimeter.

boundary matching: See curve matching.

boundary property: Characteristics of a boundary, such as arc length, curvature, etc.

boundary representation: See boundary description and B-rep.

boundary segmentation: See curve segmentation.

boundary-region fusion: Region growing segmentation approach where two adjacent regions are merged when their characteristics are close enough to pass some similarity test. The candidate neighborhood for testing similarity can be the pixels lying near the shared region boundary.
bounding box: The smallest rectangular prism that completely encloses either an object or a set of points. The ratio of the lengths of the box sides is often used as a classification metric in model based recognition.

bottom-up: Reasoning that proceeds from the data to the conclusions. In computer vision, describes algorithms that use the data to generate hypotheses at a low level, that are refined as the algorithm proceeds. Compare top-down.

BRDF: See bidirectional reflectance distribution function.

break point detection: See curve segmentation.

breast scan analysis: See mammogram analysis.

Brewster's angle: When light reflects from a dielectric surface it will be polarized perpendicularly to the surface normal. The degree of polarization depends on the incident angle and the refractive indices of the air and reflective medium. The angle of maximum polarization is called Brewster's angle and is given by

$\theta_B = \tan^{-1}\left(\frac{n_1}{n_2}\right)$

where $n_1$ and $n_2$ are the refractive indices of the two materials.

brightness: The quantity of radiation reaching a detector after incidence on a surface. Often measured in lux or ANSI lumens. When translated into an image, the values are scaled to fit the bit patterns available. For example, if an 8-bit byte is used, the maximum value is 255. See also luminance.

brightness adjustment: Increase or decrease in the luminance of an image. To decrease, one can linearly interpolate between the image and a pure black image. To increase, one can linearly extrapolate from a black image and the target. The extrapolation function is

$v = (1 - \alpha) \, i_0 + \alpha \, i_1$

where $\alpha$ is the blending factor (often between 0 and 1), $v$ is the output pixel value and $i_0$ and $i_1$ are the corresponding image and black pixels. See also gamma correction and contrast enhancement.

Brodatz texture: A well-known set of texture images often used for testing texture-related algorithms.

building detection: A general term for a specific, model-based set of algorithms for finding buildings in data. The range of data used is large, encompassing stereo images, range images, and aerial and ground-level photographs.

bundle adjustment: An algorithm used to optimally determine the three dimensional coordinates of points and camera positions from two dimensional image measurements. This is done by minimizing some cost function that includes the model fitting error and the camera variations. The bundles are the light rays between detected 3D features and each camera center. It is these bundles that are iteratively adjusted (with respect to both camera centers and feature positions).

burn-in: 1) A phenomenon of early tube-based cameras and monitors where, if the same image was presented for long periods of time, it became permanently burnt into the phosphorescent layer. Since the advent of modern monitors (1980s) this no longer happens. 2) The practice of shipping only electronic components that have been tested for long periods, in the hope that any defects will manifest themselves early in the component's life (e.g., 72 hours of typical use). 3) The practice of discarding the first several samples of an MCMC process in the hope that a very low-probability starting point will converge to a high-probability point before beginning to output samples.

butterfly filter: A linear filter designed to respond to "butterfly" patterns in images. A small butterfly filter convolution kernel is

$\begin{pmatrix} 0 & -2 & 0 \\ 1 & 2 & 1 \\ 0 & -2 & 0 \end{pmatrix}$

It is often used in conjunction with the Hough transform for finding peaks in the Hough feature space, particularly when searching for lines. The line parameter values of $(\theta, p)$ will generally give a butterfly shape with a peak at the approximate correct values.
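A sketch of the interpolate/extrapolate scheme in the brightness adjustment entry above (Python/NumPy; the clip to the 8-bit range is an added practical detail):

```python
import numpy as np

def adjust_brightness(image, alpha):
    """Blend toward black (alpha in (0, 1]) or extrapolate past the
    image (alpha < 0) using v = (1 - alpha) * image + alpha * black."""
    black = np.zeros_like(image, dtype=float)
    v = (1 - alpha) * image.astype(float) + alpha * black
    return np.clip(v, 0, 255).astype(np.uint8)

img = np.full((2, 2), 100, dtype=np.uint8)
print(adjust_brightness(img, 0.5))   # darker: 50
print(adjust_brightness(img, -0.5))  # brighter: 150
```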
C

CAD: See computer aided design.

calculus of variations: See variational approach.

calibration object: An object or small scene with easily locatable features used for camera calibration. [Figure: an example calibration object.]

camera: 1) The physical device used to acquire images. 2) The mathematical representation of the physical device and its characteristics, such as position and calibration. 3) A class of mathematical models of the projection from 3D to 2D, such as the affine, orthographic or pinhole camera.

camera calibration: Methods for determining the position and orientation of cameras and range sensors in a scene and relating them to scene coordinates. There are essentially four problems in calibration:

1. Interior orientation. Determining the internal camera geometry, including its principal point, focal length and lens distortion.
2. Exterior orientation. Determining the orientation and position of the camera with respect to some absolute coordinate system.
3. Absolute orientation. Determining the transformation between two coordinate systems, and the position and orientation of the sensor in the absolute coordinate system from the calibration points.
4. Relative orientation. Determining the relative position and orientation between two cameras from projections of calibration points in the scene.

These are classic problems in the field of photogrammetry.
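In practice, interior orientation is often estimated from several views of a known calibration object; a minimal sketch using OpenCV's calibration routine (the synthetic grid, camera values and view poses are illustrative assumptions, not from the dictionary):

```python
import numpy as np
import cv2

# A 6x9 planar calibration grid (like a chessboard), 25 mm squares.
grid = np.zeros((6 * 9, 3), np.float32)
grid[:, :2] = 25 * np.mgrid[0:9, 0:6].T.reshape(-1, 2)

# Simulate detected image features by projecting the grid with a known
# camera (in real use these come from, e.g., corner detection).
K_true = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
objpoints, imgpoints = [], []
for ry, tz in [(0.1, 900.0), (-0.2, 1000.0), (0.3, 800.0)]:
    rvec = np.array([0.2, ry, 0.0])
    tvec = np.array([-100.0, -60.0, tz])
    img, _ = cv2.projectPoints(grid, rvec, tvec, K_true, None)
    objpoints.append(grid)
    imgpoints.append(img.astype(np.float32))

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, (640, 480), None, None)
# K holds the interior orientation (focal length, principal point);
# rvecs/tvecs hold the exterior orientation for each view.
print(rms, K)  # recovered K should approximate K_true
```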
camera coordinates: 1) A viewer-centered representation relative to the camera. The camera coordinate system is positioned and oriented relative to the scene coordinate system and this relationship is determined by camera calibration. 2) An image coordinate system that places the camera's principal point at the origin $(0, 0)$, with unit aspect ratio and zero skew. The focal length in camera coordinates may or may not equal 1. If image coordinates are such that the $3 \times 4$ projection matrix is of the form

$\begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix} [\, R \mid \vec{t} \,]$

then the image and camera coordinate systems are identical.

camera geometry: The physical geometry of a camera system. See also camera model.

camera model: A mathematical model of the projection from 3D (real world) space to the camera image plane. For example see pinhole camera model.

camera motion compensation: See sensor motion compensation.

camera motion estimation: See sensor motion estimation.

camera position estimation: Estimation of the optical position of the camera relative to the scene or observed structure. This generally consists of six degrees of freedom (three for rotation, three for translation). It is often a component of camera calibration. Camera position is sometimes called the extrinsic parameters of the camera. Multiple camera positions may be estimated simultaneously with the reconstruction of 3D scene structure in structure-and-motion algorithms.

Canny edge detector: The first of the modern edge detectors. It took account of the trade-off between sensitivity of edge detection versus the accuracy of edge localization. The edge detector consists of four stages: 1) Gaussian smoothing to reduce noise and remove small details, 2) gradient magnitude and direction calculation, 3) non-maximal suppression of smaller gradients by larger ones to focus edge localization and 4) gradient magnitude thresholding and linking that uses hysteresis so as to start linking at strong edge positions, but then also track weaker edges. [Figure: an example of the edge detection results.]

canonical configuration: A stereo camera configuration in which the optical axes of the cameras are parallel, the baselines are parallel to the image planes and the horizontal axes of the image planes are parallel. This results in epipolar lines that are parallel to the horizontal axes, hence simplifying the search for correspondences. [Figure: two image planes with parallel optical axes and corresponding epipolar lines.]

cardiac image analysis: Techniques involving the development of 3D vision algorithms for tracking the motion of the heart from NMR and echocardiographic images.

Cartesian coordinates: A position description system where an $n$-dimensional point, $\vec{P}$, is described by exactly $n$ coordinates with respect to $n$ linearly independent and often orthonormal vectors, known as axes. [Figure: a 3D point $P = (x_c, y_c, z_c)$ relative to the axes $X$, $Y$, $Z$.]

cartography: The study of maps and map-building. Automated cartography is the development of algorithms that reduce the manual effort in map building.

cascaded Hough transform: An application of several successive Hough transforms, with the output of one transform used as input to the next.

cascading Gaussians: A term referring to the fact that the convolution of a Gaussian with itself is another Gaussian.

CAT: See X-ray CAT.

catadioptric optics: The general approach of using mirrors in combination with conventional imaging systems to get wide viewing angles (e.g., 180°). It is desirable that a catadioptric system has a single viewpoint because it permits the generation of geometrically correct perspective images from the captured images.

categorization: The subdivision of a set of elements into clearly distinct groups, or categories, defined by specific properties. Also the assignment of an element to a category or recognition of its category.

category: A group or class used in a classification system. For example, in mean and Gaussian curvature shape classification, the local shape of a surface is classified into four main categories: planar, ellipsoidal, hyperbolic, and cylindrical. Another example is the classification of observed grazing animals into one of
{sheep, cow, horse}. See also categorization.

CBIR: See content based image retrieval.

CCD: Charge-Coupled Device. A solid state device that can record the number of photons falling on it. A 2D matrix of CCD elements is used, together with a lens system, in digital cameras, where each pixel value in the final images corresponds to the output of one or more of the elements. [Figure: a CCD sensor.]

CCIR camera: Camera fulfilling color conversion and pixel formation criteria laid out by the Comité Consultatif International des Radio.

cell microscopic analysis: Automated image processing procedures for finding and analyzing different cell types from images taken by a microscope vision system. Common examples are the analysis of pre-cancerous cells and blood cell analysis.

cellular array: A massively parallel computing architecture, composed of a high number of processing elements. Particularly useful in machine vision applications when a simple 1:N mapping is possible between image pixels and processing elements. See also systolic array and SIMD.

center line: See medial line.

center of curvature: The center of the circle of curvature (or osculating circle) at a point $\vec{P}$ of a plane curve at which the curvature is nonzero. The circle of curvature is tangent to the curve at $\vec{P}$, has the same curvature as the curve at $\vec{P}$, and lies towards the concave (inner) side of the curve. [Figure: the circle and center of curvature, $C$, of a curve at point $\vec{P}$.]

center of mass: The point within an object at which the force of gravity appears to act. If the object can be described by a multi-dimensional point set $\{\vec{x}_i\}$ containing $N$ points, the center of mass is $\frac{1}{N} \sum_{i=1}^{N} \vec{x}_i f(\vec{x}_i)$, where $f(\vec{x}_i)$ is the value of the image (e.g., binary or gray scale) at point $\vec{x}_i$.

center of projection: The origin of the camera reference frame in the pinhole camera model. In such a camera, the projection of a point in space is determined by the line passing through the point itself and the center of projection. [Figure: the center of projection behind the image plane, with the optical axis running through the lens to the scene object.]
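For an image, the center of mass entry above is commonly computed as an intensity-weighted mean of pixel coordinates (a variant that normalizes by total intensity rather than by the point count); a sketch in Python/NumPy:

```python
import numpy as np

def center_of_mass(image):
    """Intensity-weighted mean position of an image's pixels."""
    rows, cols = np.indices(image.shape)
    total = image.sum()
    return (np.sum(rows * image) / total,
            np.sum(cols * image) / total)

b = np.zeros((5, 5))
b[1, 1] = b[3, 3] = 1.0
print(center_of_mass(b))  # (2.0, 2.0), midway between the two pixels
```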
center-surround operator: An operator that is particularly sensitive to spot-like image features that have higher (or lower) pixel values in the center than the surrounding areas. A simple convolution mask that can be used as an orientation independent spot detector is:

$\begin{pmatrix} -\frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} \\ -\frac{1}{8} & 1 & -\frac{1}{8} \\ -\frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} \end{pmatrix}$

central moments: A family of image moments that are invariant to translation because the center of mass has been subtracted during the calculation. If $f(c, r)$ is the input image pixel value (binary or gray scale) at row $r$ and column $c$, then the $pq^{\mathrm{th}}$ central moment is

$\mu_{pq} = \sum_{(r, c)} (c - \hat{c})^p (r - \hat{r})^q f(c, r)$

where $(\hat{c}, \hat{r})$ is the center of mass of the image.

central projection: It is defined by projection of an image on the surface of a sphere onto a tangential plane by rays from the center of the sphere. A great circle is the intersection of a plane with the sphere. The image of the great circle under central projection will be a line. Also known as the gnomonic projection.

centroid: See center of mass.

certainty representation: Any of a set of techniques for encoding the belief in a hypothesis, conclusion, calculation, etc. Example representation methods are probability and fuzzy logic.

chain code: An efficient method for contour coding where an arbitrary curve is represented by a sequence of small vectors of unit length in a limited set of possible directions. Depending on whether the 4 connected or the 8 connected grid is employed, the chain code is defined as the digits from 0 to 3 or 0 to 7, assigned to the 4 or 8 neighboring grid points in a counter-clockwise sense. For example, the string 222233000011 describes a small curve using a 4 connected coding scheme, starting from the upper right. [Figure: the coded curve and the four direction codes 0–3.]

chamfer matching: A matching technique based on the comparison of contours, and based on the concept of chamfer distance assessing the similarity of two sets of points. This can be used for matching edge images using the distance transform. See also Hausdorff distance. To find the parameters (for example, translation and scale) that register a library image and a test image, the binary edge map of the test image is compared to the distance transform. Edges are detected on image 1, and the distance transform of the edge pixels is computed. The edges from image 2 are then matched. [Figures: image 1 and image 2; the distance transform of image 1's edges together with the edges of image 2; and the best match. See plate section for a color version of these figures.]

chamfering: See distance transform.

change detection: See motion detection.

character recognition: See optical character recognition.

character verification: A process used to confirm that printed or displayed characters are within some tolerance that guarantees that they are readable by humans. It is used in applications such as labeling.

characteristic view: An approach to object representation in which an object is encoded by a set of views of the object. The views are chosen so that small changes in viewpoint do not cause large changes in appearance (e.g., a singularity event). Real objects have an unrealistic number of singularities, so practical approaches to creating characteristic views require approximations, such as only using views on a tessellated viewsphere, or only representing the viewpoints that are reasonably stable over large ranges on the viewsphere. See also aspect graph and appearance based recognition.

chess board distance metric: See Manhattan metric.

chi-squared distribution: The chi-squared ($\chi^2$) probability distribution describes the distribution of squared lengths of vectors drawn from a normal distribution. Specifically, let the cumulative distribution function of the $\chi^2$ distribution with $d$ degrees of freedom be denoted $\chi^2(d, u)$. Then the probability that a point $\vec{x}$ drawn from a $d$-dimensional Gaussian distribution will have squared norm $\|\vec{x}\|^2$ less than a value $\tau$ is given by $\chi^2(d, \tau)$. [Figure: empirical and computed plots of the $\chi^2$ probability density function with five degrees of freedom.]
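Stepping back to the chamfer matching entry above: a sketch of chamfer-style edge matching using SciPy's Euclidean distance transform, where the score of a candidate alignment is the mean distance from the test edges to the nearest reference edge (function names are illustrative):

```python
import numpy as np
from scipy import ndimage

def chamfer_score(reference_edges, test_edges):
    """Mean distance from test edge pixels to the nearest reference edge.

    Both inputs are boolean images; lower scores mean better alignment.
    """
    # Distance transform: distance of each pixel to the nearest edge pixel.
    dist = ndimage.distance_transform_edt(~reference_edges)
    return dist[test_edges].mean()

ref = np.zeros((50, 50), dtype=bool)
ref[25, 10:40] = True                   # a horizontal edge
test = np.roll(ref, 2, axis=0)          # the same edge, shifted 2 pixels
print(chamfer_score(ref, ref))          # 0.0 (perfect registration)
print(chamfer_score(ref, test))         # 2.0
```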
chi-squared test: A statistical test of the hypothesis that a set of sampled values has been drawn from a given distribution. See also chi-squared distribution.

chip sensor: A CCD or other semiconductor based light sensitive imaging device.

chord distribution: A 2D shape description technique based on all chords in the shape (that is, all pairwise segments between points on the boundary). Histograms of their lengths and orientations are computed. The values in the length histogram are invariant to rotations and scale linearly with the size of the object. The orientation histogram values are invariant to scale and shifts.

chroma: The color portion of a video signal that includes hue and saturation, requiring luminance to make it visible. It is also referred to as chrominance.

chromatic aberration: A focusing problem where light of different wavelengths (color) is refracted by different amounts and consequently images at different places. As blue light is refracted more than red light, objects may be imaged with color fringes at places where there are strong changes in lightness.

chromaticity diagram: A 2D slice of a 3D color space. The CIE 1931 chromaticity diagram is the slice through the xyz color space of the CIE where $x + y + z = 1$. The color gamut of standard 0–1 RGB values in this model is the bright triangle in the center of the horseshoe-like shape. Points outside the triangle have had their saturations truncated. See also CIE chromaticity coordinates. [Figure: the horseshoe-shaped CIE 1931 diagram, with spectral wavelengths from 380 nm to 780 nm around its boundary; see plate section for a colour version.]

chrominance: 1) The part of a video signal that carries color. 2) One or both of the color axes in a 3D color space that distinguishes intensity and color. See also chroma.
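The chi-squared distribution and chi-squared test entries above are both available in SciPy; a brief sketch:

```python
import numpy as np
from scipy import stats

# Probability that a 5D standard-normal vector has squared norm < 11.07.
print(stats.chi2.cdf(11.07, df=5))     # about 0.95

# Chi-squared goodness-of-fit test: observed counts vs. expected counts.
observed = np.array([18, 22, 30, 30])
expected = np.array([25, 25, 25, 25])
statistic, p_value = stats.chisquare(observed, expected)
print(statistic, p_value)
```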
chromosome analysis: Vision technique used for the diagnosis of some genetic disorders from microscope images. This usually includes sorting the chromosomes into the 23 pairs and displaying them in a standard chart.

CID: Charge Injection Device. A type of semiconductor imaging device with a matrix of light-sensitive cells. Every pixel in a CID array can be individually addressed via electrical indexing of row and column electrodes. It is unlike a CCD because it transfers collected charge out of the pixel during readout, thus erasing the image.

CIE chromaticity coordinates: Coordinates in the CIE color space with reference to three ideal standard colors $X$, $Y$ and $Z$. Any visible color can be expressed as a weighted sum of these three ideal colors, for example, for a color $\vec{p} = w_1 X + w_2 Y + w_3 Z$. The normalized values are given by

$x = \frac{w_1}{w_1 + w_2 + w_3}, \quad y = \frac{w_2}{w_1 + w_2 + w_3}, \quad z = \frac{w_3}{w_1 + w_2 + w_3}$

Since $x + y + z = 1$, we only need to know two of these values, say $(x, y)$. These are the chromaticity coordinates.

CIE L*A*B* model: A color representation model based on that proposed by the Commission Internationale d'Eclairage (CIE) as an international standard for color measurement. It is designed to be device-independent and perceptually uniform (i.e., the separation between two points in this space corresponds to the perceptual difference between the colors). L*A*B* color consists of a luminance component, L*, and two chromatic components: the A* component, from green to red, and the B* component, from blue to yellow. See also CIE L*U*V* model.

CIE L*U*V* model: A color representation system where colors are represented by luminance (L*) and two chrominance components (U*, V*). A given change in value in any component corresponds approximately to the same perceptual difference. See also CIE L*A*B* model.

circle: A curve consisting of all points on a plane lying a fixed radius $r$ from the center point $C$. The arc defining the entire circle is known as the circumference and is of length
$2\pi r$. The area contained inside the curve is given by $A = \pi r^2$. A circle centered at the point $(h, k)$ has equation $(x - h)^2 + (y - k)^2 = r^2$. The circle is a special case of the ellipse. [Figure: a circle of radius $r$ about center $C$.]

circle detection: A class of algorithms, for example the Hough transform, that locate the centers and radii of circles in digital images. In general images, scene circles usually appear as ellipses. [Figure: an example of detected circles appearing as ellipses.]

circle fitting: Techniques for deriving circle parameters from either 2D or 3D observations. As with all fitting problems, one can either search the parameter space using a good metric (using, for example, a Hough transform), or can solve a well-posed least-squares problem.

circular convolution: The circular convolution $(c_k)$ of two vectors $(x_i)$ and $(y_i)$ that are of length $n$ is defined as $c_k = \sum_{i=0}^{n-1} x_i y_j$, where $0 \leq k < n$ and $j = (i - k) \bmod n$.

circularity: One measure $C$ of the degree to which a 2D shape is similar to a circle is given by

$C = \frac{4 \pi A}{P^2}$

where $C$ varies from 0 (non-circular) to 1 (perfectly circular). $A$ is the object area and $P$ is the object perimeter.

city block distance: See Manhattan metric.

classification: A general term for the assignment of a label (or class) to structures (e.g., pixels, regions, lines, etc.). Example classification problems include: a) labelling pixels as road, vegetation or sky, b) deciding whether cells are cancerous based on cell shapes or c) deciding whether the person with the observed face is an allowed system user.

classifier: An algorithm assigning a class among several possible to an input pattern or data. See also classification, unsupervised classification, clustering, supervised classification and rule-based classification.

clipping: Removal or non-rendering of objects that do not coincide with the display area.

clique: A clique of a graph $G$ is a fully connected subgraph of $G$. In a fully connected graph, every vertex is a neighbor of all others. The graph in the figure has a clique with five nodes. (There are other cliques in the graph with fewer nodes, e.g., ABac with four nodes, etc.) [Figure: a graph whose nodes A, B, a, b, c contain a five-node clique.]
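A direct transcription of the circular convolution definition above into Python/NumPy:

```python
import numpy as np

def circular_convolution(x, y):
    """c_k = sum_i x_i * y_{(i - k) mod n}, for 0 <= k < n."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    return np.array([
        sum(x[i] * y[(i - k) % n] for i in range(n))
        for k in range(n)
    ])

print(circular_convolution([1, 2, 3, 4], [1, 0, 0, 0]))
# [1. 2. 3. 4.]: convolving with a unit impulse returns the signal
```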
close operator: The application of two binary morphology operators, dilation followed by erosion, which has the effect of filling small holes in an image. [Figure: the result of closing with a mask 22 pixels in diameter.]

clustering: 1) Grouping together image regions or pixels into larger, homogeneous regions sharing some property. 2) Identifying the subsets of a set of data points $\{\vec{x}_i\}$ based on some property such as proximity.

clutter: A generic term for unmodeled or uninteresting elements in an image. For example, a face detector generally has a model for faces, and not for other objects, which are regarded as clutter. The background of an image is often expected to include "clutter". Loosely speaking, clutter is more structured than "noise".

CMOS: Complementary metal-oxide semiconductor. A technology used in making image sensors and other computer chips.

CMY: See CMYK.

CMYB: See CMYK.

CMYK: Cyan, magenta, yellow and black color model. It is a subtractive model where colors are absorbed by a medium, for example pigments in paints. Where the RGB color model adds hues to black to generate a particular color, the CMYK model subtracts from white. Red, green and blue are secondary colors in this model. [Figure: the CMYK color model; see plate section for a colour version.]

coarse-to-fine processing: Multi-scale algorithm application that begins by processing at a large or coarse level and then proceeds, iteratively, to a small or fine level. Importantly, results from each level must be propagated to ensure a good final result. It is used for computing, for example, optical flow.
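A sketch of the close operator above using SciPy's binary morphology (the structuring-element size is an illustrative assumption):

```python
import numpy as np
from scipy import ndimage

image = np.ones((9, 9), dtype=bool)
image[4, 4] = False                    # a small hole in a solid region

# Closing = dilation followed by erosion with the same structuring element.
structure = np.ones((3, 3), dtype=bool)
closed = ndimage.binary_closing(image, structure=structure)
print(closed[4, 4])                    # True: the hole has been filled
```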
coaxial illumination: Front lighting with the illumination path running along the imaging optical axis. Advantages of this technique are no visible shadows or direct specularities from the camera's viewpoint. [Figure: a light source reflected by a half-silvered mirror along the optical axis toward the target area.]

cognitive vision: A part of computer vision focusing on techniques for recognition and categorization of objects, structures and events, learning and knowledge representation, control and visual attention.

coherence detection: Stereo vision technique where maximal patch correlations are searched for across two images to generate features. It relies on having a good correlation measure and a suitably chosen patch size.

coherent fiber optics: Many fiber optic elements bound into a single cable component with the individual fiber spatial positions aligned, so that it can be used to transmit images.

coherent light: Light, for example generated by a laser, in which the emitted light waves have the same wavelength and are in phase. Such light waves can remain focused over long distances.

coincidental alignment: When two structures seem to be related, but in fact the structures are independent or the alignment is just a consequence of being in some special viewpoint. Examples are random edges being collinear or surfaces coplanar, or object corners being nearby. See also non-accidentalness.

collimate: To align the optics of a vision system, especially those in a telescopic system.

collimated lighting: Collimated lighting (e.g., directional backlighting) is a special form of structured light. A collimator produces light in which all the rays are parallel. It is used to produce well defined shadows that can be cast directly onto either a sensor or an object. [Figure: a lamp and optical system producing parallel rays past an object toward a camera.]
collinearity: The property of lying along the same straight line.

collineation: See projective transformation.

color: Color is both a physical and psychological phenomenon. Physically, color refers to the nature of an object texture that allows it to reflect or absorb particular parts of the light incident on it. (See also reflectance.) The psychological aspect is characterized by the visual sensation experienced when light of a particular frequency or wavelength is incident on the retina. The key paradox here concerns why light of slightly different wavelengths should be so perceptually different (e.g., red versus blue).

color based database indexing: See color based image retrieval.

color based image retrieval: An example of the more general image database indexing process, where one of the main indices into the image database comes from either color samples, the color distribution from a sample image, or a set of text color terms (e.g., "red"), etc.

color clustering: See color image segmentation.

color constancy: The ability of a vision system to assign a color description to an object that is independent of the lighting environment. This will allow the system to recognize objects under many different lighting conditions. The human vision system does this automatically, but most machine vision systems cannot. For example, humans observing a red object in a cluttered scene under a blue light will still see the object as red. A machine vision system might see it as a very dark blue.

color co-occurrence matrix: A matrix (actually a histogram) whose elements represent the sum of color values existing, in a given image in a sequence, at a certain pixel position relative to another color existing at a different position in the image. See also co-occurrence matrix.

color correction: 1) Adjustment of colors to achieve color constancy. 2) Any change to the colors of an image. See also gamma correction.

color differential invariant: A type of differential invariant based on color information, such as $\frac{\nabla R \cdot \nabla G}{\|\nabla R\| \, \|\nabla G\|}$, that has the same value invariant to translation, rotation and variations in uniform illumination.

color doppler: A method for noninvasively imaging blood flow through the heart or other body parts by displaying flow data on the two dimensional echocardiographic image. Blood flow in different directions will be displayed in different colors.
color edge detection: The process of edge detection in color images. A simple approach is to combine (e.g., by addition) the edge strengths of the individual RGB color planes.

color efficiency: A tradeoff that is made with lighting systems, where conflicting design constraints require energy efficient production of light while simultaneously producing sufficiently broad spectrum illumination that the colors look natural. An obvious example of a skewed tradeoff is with low pressure sodium street lighting. This is energy efficient but has poor color appearance.

color gamut: The subset of all possible colors that a particular display device (CRT, LCD, printer) can display. Because of physical differences in how various devices produce colors, each scanner, display, and printer has a different gamut, or range of colors, that it can represent. The RGB color gamut can only display approximately 70% of the colors that can be perceived. The CMYK color gamut is much smaller, reproducing about 20% of perceivable colors. The color gamut achieved with premixed inks (like the Pantone Matching System) is also smaller than the RGB gamut.

color halftoning: See dithering.

color histogram matching: Used in color image indexing where the similarity measure is the distance between color histograms of two images, e.g., by using the Kullback–Leibler divergence or Bhattacharyya distance.

color image: An image where each element (pixel) is a tuple of values from a set of color bases.

color image restoration: See image restoration.

color image segmentation: Segmenting a color image into homogeneous regions based on some similarity criteria. [Figure: the boundaries around typical regions; see plate section for a colour version.]

color indexing: Using color information, e.g., color histograms, for image database indexing. A key issue is varying illumination. It is possible to use ratios of colors from neighboring locations to obtain illumination invariance.

color matching: Due to the phenomenon of trichromacy, any color stimulus can be matched by a mixture of the
three primary stimuli. Color matching is expressed as

$C = R\,\mathbf{R} + G\,\mathbf{G} + B\,\mathbf{B}$

where a color stimulus $C$ is matched by $R$ units of primary stimulus $\mathbf{R}$ mixed with $G$ units of primary stimulus $\mathbf{G}$ and $B$ units of primary stimulus $\mathbf{B}$.

color mixture model: A mixture model based on distributions in some color representation system that specifies both the color groups in a model as well as their relationships to each other. The conditional probability of an observed pixel $\vec{x}_i$ belonging to an object $O_w$ is modeled as a mixture with $K$ components.

color models: See color representation system.

color moment: A color image description based on moments of each color channel's histogram, e.g., the mean, variance and skewness of the histograms.

color normalization: Techniques for normalizing the distribution of color values in a color image, so that the image description is invariant to illumination. One simple method for producing invariance to lightness is to use vectors of unit length for color entries, rather than coordinates in the color representation system.

color quantization: The process of reducing the number of colors in an image by selecting a subset of colors, then representing the original image using only them. This has the side-effect of allowing image compression with fewer bits. [Figure: a color image encoded with progressively fewer colors: 16,777,216, 256, 16 and 4 colors; see plate section for a colour version.]

color re-mapping: An image transformation where each original color is replaced by another color from a colormap. If the image has indexed colors, this can be a very fast operation and can provide special graphical effects for very low processing overhead. [Figure: an original image and its color re-mapped version; see
plate section for a colour version.]

color representation system: A 2D or 3D space used to represent a set of absolute color coordinates. RGB and CIE are examples of such spaces.

color spaces: See color representation system.

color temperature: A scalar measure of colour. 1) The colour temperature of a given colour $C$ is the temperature in kelvins at which a heated black body would emit light that is dominated by colour $C$. It is relevant to computer vision in that the illumination color changes the appearance of the observed objects. The color temperature of incandescent lights is about 3200 kelvins and sunlight is about 5500 kelvins. 2) Photographic color temperature is the ratio of blue to red intensity.

color texture: Variations (texture) in the appearance of a surface (or region illumination, etc.) arising because of spatial variations in either the color, reflectance or lightness of a surface.

colorimetry: The measurement of color intensity relative to some standard.

combinatorial explosion: When used correctly, this term refers to how the computational requirements of an algorithm increase very quickly relative to the increase in the number of elements to be processed, as a consequence of having to consider all combinations of elements. For example, consider matching $M$ model features to $D$ data features with $D \geq M$, where each data feature can be used at most once and all model features must be matched. Then the number of possible matchings that need to be considered is $D \times (D-1) \times (D-2) \times \cdots \times (D-M+1)$. Here, if $M$ increases by only one, approximately $D$ times as much matching effort is needed. Combinatorial explosion is also loosely used for other non-combination algorithms whose effort grows rapidly with even small increases in input data sizes.

compactness: A scale, translation and rotation invariant descriptor based on the ratio $\frac{\mathrm{perimeter}^2}{\mathrm{area}}$.

compass edge detector: A class of edge detectors based on combining the response of separate edge operators applied at several orientations. The edge response at a pixel is commonly the maximum of the responses over the several orientations.

composite filter: Hardware or software image processing method based on a mixture of components such as noise reduction, feature detection, grouping, etc.

composite video: A television video transmission method created as a backward-compatible solution for the
transition from black-and-white to color television. The black-and-white TV sets ignore the color component while color TV sets separate out the color information and display it with the black-and-white intensity.

compression: See image compression.

computational theory: An approach to computer vision algorithm description promoted by Marr. A process can be described at three levels: implementation (e.g., as a program), algorithm (e.g., as a sequence of activities) and computational theory. This third level is characterized by the assumptions behind the process, the mathematical relationship between the input and output process and the description of the properties of the input data (e.g., assumptions of statistical distributions). The claimed advantage of this approach is that the computational theory level makes explicit the essentials of the process, which can then be compared to the essentials of other processes solving the same problem. By this method, the implementation details that can confuse comparisons can be ignored.

computational vision: See computer vision.

computer aided design: 1) A general term for object design processes where a computer assists the designer, e.g., in the specification and layout of components. For example, most current mechanical parts are designed by a computer aided design (CAD) process. 2) A term used for distinguishing objects designed with the assistance of a computer.

computer vision: A broad term for the processing of image data. Every professional will have a different definition that distinguishes computer vision from machine vision, image processing or pattern recognition. The boundary is not clear, but the main issues that lead to this term being used are more emphasis on 1) underlying theories of optics, light and surfaces, 2) underlying statistical, property and shape models, 3) theory-based algorithms, as contrasted to commercially exploitable algorithms and 4) issues related to what humans broadly relate to "understanding" as contrasted with "automation".

computed axial tomography: Also known as CAT. An X-ray procedure used in conjunction with vision techniques to build a 3D volumetric image from multiple X-ray images taken from different viewpoints. The procedure can be used to produce a series of cross sections of a selected part of the human body, that can be used for medical diagnosis.

concave mirror: The type of mirror used for imaging, in which a concave surface is used to reflect light to a focus.
The reflecting surface usually is rotationally symmetric about the optical or principal axis and the mirror surface can be part of a sphere, paraboloid, ellipsoid, hyperboloid or other surfaces. It is also known as a converging mirror because it brings light to a focus. In the case of the spherical mirror, half way between the vertex and the sphere center, C, is the mirror focal point, F. [Figure: a concave mirror reflecting an object to an image, with the principal axis through C and F.]

concave residue: The set difference between a shape and its convex hull. For a convex shape, the concave residue is empty. [Figure: some shapes in black and their concave residues in gray.]

concavity: Loosely, a depression, dent, hollow or hole in a shape or surface. More precisely, a connected component of a shape's concave residue.

concavity tree: An hierarchical description of an object in the form of a tree. The concavity tree of a shape has the convex hull of its shape as the parent node and the concavity trees of its concavities as the child nodes. These are subtracted from the parent shape to give the original object. The concavity tree of a convex shape is the shape itself. [Figure: a gray shape S with concavities S1 to S4 and sub-concavities S31, S32, S311 and S41, together with its concavity tree.]

concurrence matrix: See co-occurrence matrix.

condensation tracking: Conditional density propagation tracking. The particle filter technique applied by Blake and Isard to edge tracking. A framework for object tracking with multiple simultaneous hypotheses that switches between multiple continuous autoregressive process motion models according to a discrete transition matrix. Using importance sampling it is possible to keep only the $N$ strongest hypotheses.
to keep only the N strongest parabola and hyperbola. The
hypotheses. general form for a conic in 2D
condenser lens: An optical is ax 2 + bxy + cy 2 + dx + ey +
device used to collect light over f = 0. Some example conics are:
a wide angle and produce a
collimated output beam.
conditional dilation: A binary
image operation that is a com-
bination of the dilation opera-
tor and a logical AND operation circle ellipse parabola hyperbola
with a mask, that only allows
dilation into pixels that belong conic fitting: The fitting of a geo-
to the mask. This process can metric model of a conic section
be described by the formula: ax 2 +bxy +cy 2 +dx +ey +f = 0
dilate X J  ∩ M , where X is the to a set of data points xi  yi .
original image, M is the mask Special cases include fitting cir-
and J is the structuring element. cles and ellipses.
conditional distribution: A dis-
conic invariant: An invariant of a
tribution of one variable given
the values of one or more other conic section. If the conic is in
variables. canonical form
conditional replenishment: A ax 2 +bxy +cy 2 +dx +ey +f = 0
method for coding of video
signals, where only the portion
of a video image that has with a2 + b 2 + c 2 + d 2 + e 2 +
changed since the previous f 2 = 1, then the two invariants
frame is transmitted. Effective to rotation and translation are
for sequences with largely sta- functions of the eigenvalues
tionary backgrounds, but more
quadratic form
of the leading
complex sequences require ab
matrix A = b c . For example,
more sophisticated algorithms
that perform motion compen- the trace and determinant are
sation. invariants that are convenient
conformal mapping: A function to compute. For an ellipse, the
from the complex plane to eigenvalues are functions of
itself, f
 → , that preserves the radii. The only invariant to
local angles. For example, the affine transformation is the
complex function y = sinz = class of the conic (hyperbola,
− 12 ie iz − e −iz  is conformal. ellipse, parabola, etc.). The
conic: Curves arising from the invariant to projective trans-
intersection of a cone with a formation is the set of signs
plane (also called conic sec- of the eigenvalues of the 3 × 3
tions). This is a family of curves matrix representing the conic
including the circle, ellipse, in homogeneous coordinates.

54
conical mirror: A mirror in the problem. Given a graph con-
shape of (possibly part of) a sisting of nodes and arcs, the
cone. It is particularly useful problem is to identify nodes
for robot navigation since a cam- forming a connected set. A
era placed facing the apex of node is in a set if it has an arc
the cone aligning the cone’s connecting it to another node
axis and the optical axis and ori- in the set. 2) Connected compo-
ented towards its base can have nent labeling is used in binary
a full 360 view. Conical mirrors and gray scale image processing
were used in antiquity to pro- to join together neighboring
duce cipher images known as pixels into regions. There are
anamorphoses. several efficient sequential
conjugate direction: Optimiza- algorithms for this proce-
tion scheme where a set of dure. In this image, the pixels
independent directions are in each connected compo-
identified on the search space. nent have a different gray
A pair of vectors u  and v are shade:
conjugate with respect to

matrix A if u A v = 0. A con-
jugate direction optimization
method is one in which a series
of optimization directions are
devised that are conjugate with
respect to the normal matrix
but do not require the normal
matrix in order for them to be
determined.
conjugate gradient: A basic
technique of numerical opti-
mization in which the minimum connectivity: See pixel connect-
of a numerical target function is ivity.
found by iteratively descending conservative smoothing: A
along non-interfering (conju- noise filtering technique whose
gate) directions. The conjugate name derives from the fact
gradient method does not that it employs a fast filtering
require second derivatives algorithm that sacrifices noise
and can find the optima of an suppression power to preserve
N dimensional quadric form in the image detail. A simple form
N iterations. By comparison, a of conservative smoothing
Newton method requires one replaces a pixel that is larger
iteration and gradient descent (smaller) than its 8 connected
can require an arbitrarily large neighbors by the largest (small-
number of iterations. est) value amongst those
connected component label- neighbors. This process works
ing: 1) A standard graph well with impulse noise but is
55
not as effective with Gaussian arguments, and g and h may
noise. also be vector-valued, encoding
multiple constraints to be sat-
constrained least squares: It is
isfied. Optimization subject to
sometimes useful to minimize
 2 over some subset equality constraints is achieved
A x − b
by the method of Lagrange
of possible solutions x that are multipliers. Optimization of
predetermined. For example, a quadratic form subject to
one may already know the func- equality constraints results
tion values at certain points on in a generalized eigensystem.
the parameterized curve. This Optimization of a general f
leads to an equality constrained subject to general g and h
version of the least squares may be achieved by iterative
problem, stated as: minimize methods, most notably sequen-
 2 subject to B
A x − b x = c. tial quadratic programming.
There are several approaches constraint satisfaction: An ap-
to the solution of this problem proach to problem solving that
such as QR factorization and the consists of three components:
SVD. As an example, this regres- 1) a list of what “variables”
sion technique can be useful in need values, 2) a set of allow-
least squares surface fitting able values for each “variable”
where the plane described and 3) a set of relationships
by x is constrained to be that must hold between the val-
perpendicular to some other ues for each “variable” (i.e., the
plane. constraints). For example, in
constrained matching: A gen- computer vision, this approach
eric term for recognition ap- has been used for different
structure labelling (e.g., line
proaches where two objects are
labelling, region labelling) and
compared under a constraint on
geometric model recovery tasks
either or both. One example of
(e.g., reverse engineering of 3D
this would be a search for mov- parts or buildings from range
ing vehicles under 20 feet in data).
length.
constructive solid geometry
constrained optimization: (CSG): A method for defin-
Optimization of a function f ing 3D shapes in terms of a
subject to constraints on the mathematically defined set of
parameters of the function. The primitive shapes. Boolean set
general problem is to find the x theoretic operations of inter-
that minimizes (or maximizes) section, union and difference
fx subject to gx = 0 and are used to combine shapes to
hx >= 0, where the functions make more complex shapes.
f g h may all take vector-valued For example:

56
contextual image classifica-
tion: Algorithms that take into
– = account the source or setting
of images in their search for
features and relationships in
the image. Often this context is
content-based image retrieval: composed of region identifiers,
Image database searching color, topology and spatial rela-
methods that produce matches tionships as well as task-specific
based on the contents of the knowledge.
images in the database, as
contrasted with using text contextual method: Algorithms
descriptors to do the index- that take into account the spatial
ing. For example, one can arrangement of found features
use descriptors based on in their search for new ones.
color moments to select images continuous convolution: The
with similar invariants. convolution of two continuous
context: In vision, the elements, signals. In 2D image processing
information, or knowledge terms the convolution of two
occurring together with or images f and h is:
accompanying some data, con-
tributing to the data’s full gx y = fx y ⊗ hx y
meaning. For example, in a
video sequence one can speak = fu  v 
of spatial context of a pixel, − −

indicating the intensities at ×hx − u  y − v du dv


surrounding location in a
given frame (image), or of continuous Fourier transform:
temporal context, indicating See Fourier transform.
the intensities at that pixel
location (same coordinates) continuous learning: A gen-
but in previous and following eral term describing how a
frames. Information deprived system continually updates
of appropriate context can be its model of a process based
ambiguous: for instance, differ- on current data. For example,
ential optical flow methods can updating a background model
only estimate the normal flow; (for change detection) as the
the full flow can be estimated illumination changes during
considering the spatial context the day.
of each pixel. At the level of contour analysis: Analysis of
scene understanding, knowing outlines of image regions.
that the image data comes from
contour following: See con-
a theater performance provides
tour linking.
context information that can
help distinguish between a real contour grouping: See con-
fight and a stage act. tour linking.

57
contour length: The length of image more obvious by increas-
a contour in appropriate units ing the displayed contrast
of measurements. For instance, between image brightness
the length of an image contour levels. Histogram equalization
in pixels. See also arc length. is one method of contrast
enhancement. An example of
contour linking: Edge detection contrast enhancement is here
or boundary detection pro- (see plate section for a colour
cesses typically identify pixels version of these figures):
on the boundary of a region.
Connecting these pixels to form
a curve is the goal of contour
linking.
contour matching: See curve
matching.
contour partitioning: See curve
segmentation.
contour representation: See
boundary representation.
contour tracing: See contour
linking. Input image

contour tracking: See contour


linking.
contours: See object contour.
contrast: 1) The difference in
brightness values between two
structures, such as regions or
pixels. 2) A texture measure. In
a gray scale image, contrast, C ,
is defined as

C= i − j2 P i j After contrast enhancement
i j

where P is the gray-level co- contrast stretching: See con-


occurrence matrix. trast enhancement.
contrast enhancement: Con- control strategy: The guide-
trast enhancement (also known lines behind the sequence of
as contrast stretching) expands processes performed by an
the distribution of intensity automatic image analysis or
values in an image so that a scene understanding system.
larger range of sensitivity in the For instance, control can be
output device can be used. This top-down (searching for image
can make subtle changes in an data that verifies an expected
58

target) or bottom-up (progres- yr c = i j wi jxr − i c − j.
sively acting on image data or Similar forms using integrals
results to derive hypotheses). exist for continuous signals
The control strategy may allow and images. By the appro-
selection of alternative hypoth- priate choice of the weight
eses, processes or parameter values, convolution can com-
values, etc. pute low pass/smoothing,
convex hull: Given a set of high pass/differentiation fil-
points, S, the convex hull is the tering or template match-
smallest convex set that con- ing/matched filtering, as well
tains S. a 2D example is shown as many other linear functions.
here: The right image below is the
result of convolving (and then
inverting) the left image with a
+1 −1 mask:

Object

Convex hull

co-occurrence matrix: A repre-


sentation commonly used in
texture analysis algorithms. It
convexity ratio: Also known as records the likelihood (usu-
solidity. A measure that char- ally empirical) of two features
acterizes deviations from con- or properties being at a given
vexity. The ratio for shape X position relative to each other.
areaX 
is defined as areaC , where CX For example, if the center of
X
is the convex hull of X . A con- the matrix M is position a b
vex figure has convexity factor 1, then the likelihood that the
while all other figures have con- given property is observed at
vexity less than 1. an offset i j from the current
convolution operator: A widely pixel is given by matrix value
used general image and Ma + i b + j.
signal processing operator that cooperative algorithm: An
computes  the weighted sum algorithm that solves a prob-
y j = i wix j − i where lem by a series of local inter-
wi are the weights, xi is the actions between adjacent
input signal and y j is the structures, rather than some
result. Similarly, convolutions global process that has access
of image data take the form to all data. The value at a
59
structure changes iteratively in relationship between two co-
response to changing values ordinate systems. Typical trans-
at the adjacent structures, formations include translation
such as pixels, lines, regions, and rotation. See also Euclidean
etc. The expectation is that transformation.
the process will converge to a coplanarity: The property of
good solution. The algorithms lying in the same plane. For
are well suited for massive example, three vectors a  b and
local parallelism (e.g., SIMD), c are coplanar if their scalar
and are sometimes proposed  · c = 0 is
a × b
triple product 
as models for human image zero.
processing. An early algo-
rithm to solve the stereo coplanarity invariant: A pro-
correspondence problem used jective invariant that allows
cooperative processing be- one to determine when five
tween elements representing corresponding points observed
the disparity at a given picture in two (or more) views are
element. coplanar in the 3D space. The
five points allow the construc-
coordinate system: A spanning tion of a set of four collinear
set of linearly independent vec- points whose cross ratio value
tors defining a vector space. can be computed. If the five
One example is the set gener- points are coplanar, then
ally referred to as the X, Y and the cross ratio value must be the
Z axes. There are, of course, same in the two views. Here,
an infinite number of sets of point A is selected and the lines
three linearly independent vec- AB, AC, AD and AE are used to
tors describing 3D space. The define an invariant cross ratio
right-handed version of this is for any line L that intersects
shown in the figure. them:

Y
A

L
B D

Z X E

coordinate system transform- C


ation: A geometric transform-
ation that maps points, vectors
or other structures from one core line: See medial line.
coordinate system to another. corner detection: See curve seg-
It is also used to express the mentation.

60
corner feature detectors: See basis of cosine functions. For
interest point feature detectors an even 1D function fx, the
and curve segmentation. cosine transform is

coronary angiography: A class
of image processing techniques Fu = 2 fx cos2uxdx
0
(usually based on X-ray data)
for visualizing and inspecting For a sampled signal f0 n−1 , the
the blood vessels surrounding discrete cosine transform is the
the heart (coronaries). See also vector b0 n−1 where, for k ≥ 1:
angiography.  n−1
1
correlation: See cross correl- b0 = f
ation. n i=0 i
 n−1  
correlation based optical flow 2
estimation: Optical flow esti- bk = fi cos 2i + 1k
mated by correlating local n i=0 2n
image texture at each point in For a 2D signal fx y the cosine
two or more images and noting transform Fu v is
their relative movement.

correlation based stereo: 4 fx y cos2ux
Dense stereo reconstruction 0 0
(i.e., at every pixel) com- cos2vydxdy
puted by cross correlating
local image neighborhoods
in the two images to find cost function: The function or
corresponding points, from metric quantifying the cost of a
which depth can be computed certain action, move or config-
by stereo triangulation. uration, that is to be minimized
over a given parameter space.
correspondence constraint: A key concept of optimization.
See stereo correspondence See also Newton’s optimization
constraint. method and functional optimi-
correspondence problem: See zation.
stereo correspondence prob- covariance: The covariance, de-
lem. noted  2 , of a random variable
cosine diffuser: Optical correc- X is the expected value of the
tion mechanism for correct- square of the deviation of the
ing spatial responsivity to light. variable from the mean. If  is
Since off-angle light is treated the mean, then  2 = E X −2 .
with the same response as nor- For a d-dimensional data set
mal light, a cosine transfer is represented as a set of n column
used to decrease the relative 1 n , the sample mean
vectors x
responsivity to it.  = n1 ni=1 xi , and the sample
is 
covariance is the d × d matrix
1 n
cosine transform: Representa-
tion of an signal in terms of a  = n−1 xi − 
i=1   .
 xi − 

61
covariance propagation: A to represent where two aligned
method of statistical error ana- blocks meet. Here, neither a
lysis, in which the covariance step edge nor fold edge is seen:
of a derived variable can be
estimated from the covari- crack following: Edge tracking
ances of the variables from on the dual lattice or “cracks”
which it is derived. For exam- between pixels based on the
ple, assume that independent continuous segments of line
variables x and y are sampled from a crack code.
from multi-variate normal Crimmins smoothing operator:
distributions with associated An iterative algorithm for
covariance matrices Cx and Cy . speckle (salt-and-pepper noise)
Then, the covariance of the reduction. It uses a nonlinear
derived variable z = ax + by is
noise reduction technique that
Cz = a2 Cx + b 2 Cy .
compares the intensity of each
crack code: A contour descrip- image pixel with its eight neigh-
tion method that codes not bors and either increments or
the pixels themselves but the decrements the value to try and
cracks between them. This make it more representative of
is done as a four-directional its surroundings. The algorithm
scheme as shown below. It raises the intensity of pixels
can be viewed as a chain code
that are darker relative to their
with four directions rather than
eight. neighbors and lowers pixels
that are relatively brighter.
More iterations produce more
reduction in noise but at the
0 cost of increased blurring of
detail.
3 1 critical motion: In the problem
of self-calibration of a mov-
2
ing camera, there are certain
motions for which calibration
Crack code = { 2, 2, 1, 2, 3, 2 } algorithms fail to give unique
solutions. Sequences for which
self-calibration is not possible
crack edge: A type of edge are known as critical motion
used in line labeling research sequences.
cross correlation: Standard
method of estimating the
degree to which two series
CRACK EDGE are correlated. Given two
series xi  and  yi , where
i = 0 1 2   N − 1 the cross
62
correlation, rd , at a delay d is based representation of an
defined as object. The representation
 defines the volume by a curved
i xi − mx  · yi−d − my 
  axis, a cross section and a
i xi − mx  i yi−d − my 
2 2 cross section function at each
point on that axis. The cross
where mx and my are the section function defines how
means of the corresponding the size or shape of the cross
sequences. section varies as a function of its
position along the axis. See also
cross correlation matching: generalized cone. This example
Matching based on the cross shows how the size of the
correlation of two sets. The square cross section varies
closer the correlation is to 1, along a straight line to create a
the better the match is. For
truncated pyramid:
example, in correlation based
stereo, for each pixel in the
first image, the corresponding
pixel in the second image
is the one with the highest AXIS
correlation score, where the CROSS SECTION CROSS SECTION
sets being matched are the FUNCTION
local neighborhoods of each
pixel.
TRUNCATED PYRAMID
cross ratio: The simplest projec-
tive invariant. It generates a
scalar from four points of any cross-validation: A test of how
1D projective space (e.g., a pro- well a model generalizes to
jective line). The cross ratio for other data (i.e., using samples
the four points ABCD below is: other than those that were
r + ss + t used to create the model).
This approach can be used
sr + s + t to determine when to stop
training/learning, before over-
generalization occurs. See also
t leave-one-out test.
s
r D crossing number: The crossing
C number of a graph is the min-
B imum number of arc inter-
A
sections in any drawing of
that graph. A planar graph
cross section function: Part of has crossing number zero. This
the generalized cylinder repre- graph has a crossing number
sentation that gives a volumetric of one:

63
1) the increased amount of
computational effort required,
A 2) the exponentially increasing
B
amount of data required to
populate the data space in
order that training works and
b c a 3) how all data points tend to
become equidistant from each
other, thus causing problems
for clustering and machine
CSG: See constructive solid learning algorithms.
geometry cursive script recognition:
CT: See X-ray CAT. Methods of optical character
cumulative histogram: A histo- recognition whereby hand-
gram where the bin contains written cursive (also called
not only the count of all joined-up) characters are
instances having that value but automatically classified.
also the count of all bins having curvature: Usually meant to
a lower index value. This is the refer to the change in shape of
discrete equivalent of the cumu- a curve or surface. Mathemat-
lative probability distribution. ically, the curvature  of a curve
The right figure is the cumula- is the length of the second
tive histogram corresponding
derivative   sx2s  of the curve
2

to the normal histogram on the


left: x s parameterized as a func-
tion of arc length s. A related
definition holds for surfaces,
6 12
only here there are two dis-
4 8 tinct principal curvatures at
2 4 each point on a sufficiently
1 2 3 4 5 1 2 3 4 5
smooth surface.
NORMAL HISTOGRAM CUMULATIVE HISTOGRAM
curvature primal sketch: A
multi-scale representation of
currency verification: Algo- the significant changes in
rithms for checking that curvature along a planar curve.
printed money and coinage are curvature scale space: A
genuine. A specialist field multi-scale representation of
involving optical character rec- the curvature zero-crossing
ognition. points of a planar contour as
curse of dimensionality: The it evolves during smoothing.
exponential growth of pos- It is found by parameterizing
sibilities as a function of the contour using arc length,
dimensionality. This might which is then convolved with
manifest as several effects as a Gaussian filter of increasing
the dimensionality increases: standard deviation. Curvature
64
zero-crossing points are then different points, as illustrated
recovered and mapped to the here:
scale-space image with the
horizontal axis representing INFLECTION POINTS

the arc length parameter on


the original contour and the
vertical axis representing the
standard deviation of the
Gaussian filter.
BITANGENT
curvature sign patch classi- BITANGENT POINTS LINE
fication: A method of local
surface classification based curve evolution: A curve abstrac-
on its mean and Gaussian tion method whereby a curve
curvature signs, or principal can be iteratively simplified, as
curvature sign class. See also in this example:
mean and Gaussian curvature
shape classification.
curve: A set of connected
points in 2D or 3D, where
each point has at most two
neighbors. The curve could be
defined by a set of connected
points, by an implicit function
(e.g., y + x 2 = 0), by an explicit Evolution
form (e.g., t −t 2  for all t), stage
or by the intersection of two
surfaces (e.g., by intersecting
the planes X = 0 and Y = 0),
etc.
curve binormal: The vector
perpendicular to both the
tangent and normal vectors to a For example, a relevance mea-
curve at any given point: sure is assigned to every vertex
in the curve. The least import-
BINORMAL ant can be removed at each
TANGENT
iteration by directly connect-
ing its neighbors. This elim-
ination is repeated until the
desired stage of abstraction is
reached. Another method of
NORMAL curve evolution is to progres-
sively smooth the curve with
curve bitangent: A line tangent Gaussian weighting of increas-
to a curve or surface at two ing standard deviation.
65
INFLECTION POINTS
curve fitting: Methods for find-
ing the parameters of a best-
fit curve through a set of 2D
(or 3D) data points. This is often
posed as a minimization of the
least- squares error between
some hypothesized curve and BITANGENT
BITANGENT POINTS LINE
the data points. If the curve, yx,
can be thought of as the sum of
a set of m arbitrary basis func- curve invariant: Measures taken
tions, Xk and written over a curve that remain invari-
ant under certain transform-

k=m
ations, e.g., arc length and
yx = ak Xk x curvature are invariant under
k=1 Euclidean transformations.
then the unknown parameters curve invariant point: A point
are the weights ak . The curve on a curve that has a geo-
fitting process can then be metric property that is invariant
considered as the minimiza- to changes in projective trans-
tion of some log-likelihood formation. Thus, the point can
function giving the best fit to be identified and used for cor-
N points whose Gaussian respondence in multiple views
error has standard deviation of the same scene. Two well
i . This function may be known planar curve invariant
defined as points are curvature inflection

points and bitangent points, as
yi − yxi  2
i=N
shown here:
2 =
i=1 i
INFLECTION POINTS
The weights that minimize this
can be found from the design
matrix D
Xj xi 
Dij =
i BITANGENT
BITANGENT POINTS
LINE
by finding the solution to the
linear equation
curve matching: The compari-
Da = r son of data sets to previously
where the vector ri = yi
. modeled curves or other curve
i data sets. If a modeled curve
curve inflection: A point on a closely corresponds to a data
curve where the curvature is set then an interpretation of
zero as it changes sign from similarity can be made. Curve
positive to negative, as in the matching differs from curve
two examples below: fitting in that curve fitting

66
involves minimizing the param- lines that make up a square.
eters of theoretical models Methods include: corner
rather than actual examples. detection, Lowe’s method and
curve normal: The vector per- recursive splitting.
pendicular to the tangent vec- curve smoothing: Methods for
tor to a curve at any given point rounding polygon approxima-
and that also lies in the plane tions or vertex-based approxi-
that locally contains the curve at mations of surface boundaries.
that point: Examples include Bezier curves
in 2D and NURBS in 3D.
BINORMAL See also curve evolution. An
TANGENT example of a polygonal data
curve smoothed by a Bezier
curve is:

NORMAL

curve representation system:


Methods of representing or
modeling curves param-
etrically. Examples include:
b-splines, crack codes, cross
section functions, Fourier data curve
descriptors, intrinsic equa- smoothed curve
tions, polycurves, polygonal (Bezier)
approximations, radius vector
functions, snakes, splines, etc.
curve saliency: A voting method curve tangent vector: The vec-
for the detection of curves in a tor that is instantaneously par-
2D or 3D image. Each pixel is allel to a curve at any given
convolved with a curve mask to point:
build a saliency map. This map
will hold high values for loca- BINORMAL
tions in space where likely can-
TANGENT
didates for curves exist.
curve segmentation: Methods
of identifying and splitting
curves into different primitive
NORMAL
types. The location of changes
between one primitive type
and another is particularly cut detection: The identifica-
important. For example, a tion of the frames in film
good curve segmentation algo- or video where the camera
rithm should detect the four viewpoint suddenly changes,

67
either to a new viewpoint cylindrical surface region:
within the current scene or to a A region of a surface that is
new scene. locally cylindrical. A region in
cyclopean view: A term used in which all points have zero
Gaussian curvature, and non-
stereo image analysis, based on
zero mean curvature.
the mythical one-eyed Cyclops.
When stereo reconstruction of
a scene occurs based on two
cameras, one has to con-
sider what coordinate system
to use to base the recon-
structed 3D coordinates, or
what viewpoint to use when
presenting the reconstruction.
The cyclopean viewpoint is
located at the midpoint of
the baseline between the two
cameras.
cylinder extraction: Methods
of identifying the cylinders and
the constituent data points
from 2.5D and 3D images that
are samples from 3D cylinders.
cylinder patch extraction:
Given a range image or a set
of 3D data points, cylinder
patch extraction finds (usually
connected) sets of points that
lie on the surface of a cylinder,
and usually also the equation
of that cylinder. This process
is useful for detecting and
modelling pipework in range
images of industrial scenes.
cylindrical mosaic: A photo-
mosaicing approach where
individual 2D images are pro-
jected onto a cylinder. This is
possible only when the camera
rotates about a single axis or
the camera center of projection
remains approximately fixed
with respect to the distance to
the nearest scene points.
68
D

darkfield illumination: A spe- decimation, or 2) reduce the


cialized illumination technique number of dimensions in each
that uses oblique illumination data point, e.g., by projection
to enhance contrast in sub- or principal component ana-
jects that are not imaged well lysis (PCA).
under normal illumination con- data structure: A fundamental
ditions. concept in programming: a
data fusion: See sensor fusion. collection of computer data
organized in a precise struc-
data integration: See sensor ture, for instance a tree (see
fusion. for instance quadtree), a
data parallelism: Reference to queue, or a stack. Data struc-
the parallel structuring of either tures are accompanied by sets
the input to programs, the of procedures, or libraries,
organization of programs them- implementing various types
selves or the programming lan- of data manipulation, for
guage used. Data parallelism is instance storage and indexing.
a useful model for much image DCT: See discrete cosine trans-
processing because the same form.
operation can be applied inde-
pendently and in parallel at all deblur: To remove the effect
pixels in the image. of a known blurring function
on an image. If an observed
data reduction: A general term image I is the convolution
for processes that 1) reduce of an unknown image I 
the number of data points, and a known blurring kernel
e.g., by subsampling or by B, so that I = I  ∗ B, then
using cluster centers of mass deblurring is the process of
as representative points or by computing I  given I and B.

Dictionary of Computer Vision and Image Processing R.B. Fisher, K. Dawson-Howe, A. Fitzgibbon,
C. Robertson and E. Trucco © 2005 John Wiley & Sons, Ltd. ISBN: 0-470-01526-8

69
See deconvolution, image decoding: Converting a signal
restoration, Wiener filtering. that has been encoded back
decentering distortion (lens): into its original form (lossless
Lens decentering is a common coding) or into a form close to
cause of tangential distortion. the original (lossy coding). See
It arises when the lens elem- also image compression.
ents are not perfectly aligned decomposable filters: A com-
and creates an asymmetric plex filter that can be applied
component to the distortion. as a number of simpler fil-
decimation: 1) In digital signal ters applied one after the
processing, a filter that keeps other. For example the 2D
one sample out of every N , Laplacian of Gaussian filter can
where N is a fixed number. See be decomposed into four sim-
also subsampling. 2) “Mesh” pler filters.
decimation: merging of simi- deconvolution: The inverse
lar adjacent surface patches process of convolution. Decon-
or mesh vertices in order to volution is used to remove
reduce the size of a model. certain signals (for example
Often used as a processing step blurring) from images by
when deriving a surface model inverse filtering (see deblur).
from a range image. For a convolution producing
decision tree: Tools for help- image h = f ∗ g +  given f and
ing to choose between sev- g, the image and convolution
eral courses of action. They are mask,  is the noise and ∗ is
an effective structure within the convolution, deconvolu-
which an agent can search tion attempts to estimate f .
options and investigate the Deconvolution is often an
possible outcomes. They also ill-posed problem and may not
help to balance the risks and have a unique solution. See
rewards associated with each also image restoration.
possible course of action. defocus: Blurring of an image,
either accidental or deliberate,
by incorrect focus or viewpoint
?
parameters use or estimation.
Rule See also shape from focus,
? ? ? shape from defocus.
defocus blur: Deformation of
? ? ? ? ? an image due to the pre-
dictable behavior of optics
when incorrectly adjusted. The
? ? ? ? ? ? blurring is the result of light
rays that, after entering the
Decisions made optical system, misconverge on
? Decisions the imaging plane. If the cam-
Results era parameters are known in

70
advance, the blurring can be by unwanted processes. For
partially corrected. instance, MPEG compression–
deformable model: Object des- decompression can alter some
criptors that model a specific intensities, so that the image
class of deformable objects is degraded. (See also JPEG
(e.g., eyes, hands) where the image compression), image
shapes vary according to the noise.
values of the parameters. If degree of freedom: A free vari-
the general, but not specific, able in a given function. For
characteristics of an object type instance, rotations in 3D space
are known then a deformable depend on three angles, so
model can be constructed and that a rotation matrix has nine
used as a matching template entries but only three degrees
for new data. The degree of of freedom.
deformation needed to match Delaunay triangulation: The
the shape can be used as Delaunay graph of the point
matching score. See also modal set can be constructed from
deformable model, geometric its Voronoi diagram by con-
deformable model. necting the points in adjacent
deformable shape: See deform- polygons. The connections
able model. form the Delaunay triangula-
tion. The triangulation has the
deformable superquadric: A
property that the circumcircle
type of superquadric volu-
of every triangle contains no
metric model that can be
other points. The approach
deformed by bending, twist-
can be used to construct a
ing, etc. in order to fit to the polyhedral surface approxi-
data being modeled. mation from a set of 3D
deformable template model: sample points. The solid lines
See deformable model. connecting the points below
deformation energy: The met- are the Delaunay triangulation
ric that must be minimized and the dashed lines are the
when determining an active boundaries of the Voronoi
shape model. Comprised of diagram.
terms for both internal energy
(or force) arising from the
model shape deformation and
external energy (or force) aris-
ing from the discrepancy
between the model shape and
the data.
degradation: A loss of quality
suffered by an image, the con-
tent of which gets corrupted
71
demon: A program that runs in In a range image, the intensity
the background, for instance value in the image is a measure
performing checks or guaran- of depth.
teeing the correct functioning
of a module of a complex depth estimation: The process
system. of estimating the distance
between a sensor (e.g., a
demosaicing: The process of stereo pair) and a part of the
converting a single color per scene being imaged. Stereo
pixel image (as captured by vision and range sensing are
most digital cameras) into a two well-known ways to esti-
three color per pixel image. mate depth.
Dempster–Shafer: A belief depth from defocus: The depth
modeling approach for testing from defocus method uses the
a hypothesis that allows infor- direct relationships among the
mation, in the form of beliefs, depth, camera parameters and
to be combined into a plausi- the amount of blurring in
bility measure for that hypoth- images to derive the depths
esis. from parameters that can be
directly measured.
dense reconstruction: A class
of techniques estimating depth depth from focus: A method
at each pixel of an input image to determine distance to one
or sequence, thus generating point by taking many images
a dense sampling of the 3D in better and better focus. This
surfaces imaged. This can be is also called autofocus or soft-
achieved, for instance, by range ware focus.
sensing, or stereo vision.
depth image: See range image.
dense stereo matching: A class
of methods establishing the depth image edge detector:
correspondence (see stereo See range image edge detector.
correspondence problem) be- depth map: See range image.
tween all pixels in a stereo
pair of images. The generated depth of field: The distance
disparity map can then be between the nearest and the
used for depth estimation. farthest point in focus for a
given camera:
densitometry: A class of tech-
niques that estimate the dens-
ity of a material from images, Nearest point Furthest point
for instance bone density in in focus in focus
the medical domain (bone Camera

densitometry).
depth: Distance of scene points
from either the camera center
or the camera imaging plane. Depth of field

72
depth perception: The ability to
perceive distances from visual
stimuli, for instance motion or
stereo vision. Start

3D model

Conjugate gradient search

DFT: See discrete Fourier trans-


form.
View 1 View 2 diagram analysis: Syntactic ana-
lysis of images of line drawings,
possibly with text in a report
depth sensor: See range sensor. or other document. This field is
closely related to the analysis of
Deriche edge detector: Con- visual languages.
volution filter for edge finding
similar to the Canny edge dichroic filter: A dichroic filter
detector. Deriche uses a differ- selectively transmits light of a
ent optimal operator where the given wavelength.
filter is assumed to have infinite dichromatic model: The di-
extent. The resulting convolu- chromatic model states that the
tion filter is sharper than the light reflected from a surface is
derivative of the Gaussian that the sum of two components,
Canny uses body and interface reflectance.
Body reflectance follows
x
fx = Axe −  Lambert’s law. Interface reflect-
ance models highlights. The
See also edge detection. model has been applied to
several computer vision tasks
derivative based search: Nu- including color constancy,
merical optimization methods shape recovery and color image
assuming that the gradient can segmentation. See also color.
be estimated. An example is the difference image: An image
quasi-Newton approach, that computed as pixelwise differ-
attemptstogenerateanestimate ence of two other images, that
of the inverse Hessian matrix. is, each pixel in the difference
This is then used to determine image is the difference between
thenextiterationpoint. the pixels at the same location

73
in the two input images. For invariants (typically based on
example, in the figure below derivatives of the image func-
the right image is the difference tion). The image function is
of the left and middle images always assumed to be continu-
(after adding 128 for display ous and differentiable.
purposes).
differential pulse code modu-
lation: A technique for con-
verting an analogue signal to
binary by sampling it, express-
ing the value of the sampled
data modulation in binary and
then reducing the bit rate by
diffeomorphism: A differenti- taking account of the fact that
able one-to-one map between consecutive samples do not
manifolds. The map has a change much.
differentiable inverse. differentiation filtering: See
difference-of-Gaussians oper- gradient filter.
ator: A convolution operator
diffraction: The bending of light
used to locate edges in a
gray-scale image using an rays at the edge of an object or
approximation to the Laplacian through a transparent medium.
The amount by which a ray
of Gaussian operator. In 2D
is bent is dependent on wave-
the convolution mask is:
length.
   
 x 2 +y 2   x 2 +y 2 
− − diffraction grating: An array of
12 22
c1 e − c2 e diffracting elements that has
where the constants c1 and c2 the effect of producing periodic
control the height of the individ- alterations in a wave’s phase,
ual Gaussians and 1  2 are the amplitude or both. The simplest
standard deviations. arrangement is an array of slits
(see moiré interferometry).
differential geometry: A field
of mathematics studying the m=2
local derivative-based proper- m=1
ties of curves and surfaces, m=0
θ
for instance tangent plane and
curvature.
differential invariant: Image Light Light banding
descriptors that are invariant source Diffraction
under geometric transform- grating
ations as well as illumination
changes. Invariant descriptors diffuse illumination: Light en-
are generally classified as global ergy that comes from a multi-
invariants (corresponding to tude of directions, hence not
object primitives) and local causing significant shading or
74
shadow effects. The opposite of
diffuse illumination is directed
illumination.
diffuse reflection: Scattering of
light by a surface in many direc-
tions. Ideal Lambertian diffu-
sion results in the same energy
being reflected in every direc-
tion regardless of the direction
of the incoming light energy. digital geometry: Geometry
(points, lines, angles, surfaces,
etc.) in a sampled and quantized
Light
domain.
Reflected Light digital image: Any sampled and
quantized image.

41 43 45 51 56 49 45 40
56 48 65 85 55 52 44 46
59 77 99 81 127 83 46 56
52 116 44 54 55 186 163 163
diffusion smoothing: A tech- 51
50
129
85
46
192
48
140
71
167
164
99
86
51
97
44
nique achieving Gaussian 57 63 91 126 102 56 54 49
146 169 213 246 243 139 180 163
smoothing as the solution of 41 44 54 56 47 45 36 54
a diffusion equation with the
image to be filtered as the
initial boundary condition. digital image processing:
The advantage is that, unlike Image processing restricted to
repeated averaging, diffusion the domain of digital images.
smoothing allows the con-
digital signal processor: A class
struction of a continuous scale
of co-processors designed
space.
to execute processing oper-
digital camera: A camera in ations on digitized signals
which the image sensing sur- efficiently. A common char-
face is made up of individual acteristic is the provision of a
semiconductor sampling elem- fast multiply and accumulate
ents (typically one per pixel function, e.g., a ← a + b × c.
of the image), and quantized digital subtraction angiog-
versions of the sensed values raphy: A basic technique
are recorded when an image is used in medical image pro-
captured. cessing to detect, visualize and
digital elevation map: A sam- inspect blood vessels, based on
pled and quantized map where the subtraction of a background
every point represents a height image from the target image,
above a reference ground plane usually where the blood vessels
(i.e., the elevation). are made more visible by using

75
an X-ray contrast medium. See of filling in any small holes in the
also medical image registration. object(s) and joining any object
digital terrain map: See digital regions that are close together.
elevation map. Most frequently described as a
morphological transformation,
digital topology: Topology (i.e.,
how things are connected/ and is the dual of the erode
arranged) in a digital domain operator.
(e.g., in a digital image). See
also connectivity.
digital watermarking: The pro-
cess of embedding a signa-
ture/watermark into digital
data. In the domain of digital
images this is most normally
done for copyright protection.
The digital watermark may be dimensionality: The number of
invisible or visible (as shown). dimensions that need to be con-
sidered. For example 3D object
location is often considered as
a seven dimensional problem
(three dimensions for position,
three for orientation and one
for the object scale).
digitization: The process of mak- direct least square fitting: Dir-
ing a sampled digital version of ect fitting of a model to some
some analog signal (such as an data by a method that has a
image). closed form or globally conver-
dihedral edge: The edge made gent solution.
by two planar surfaces. A “fold” directed illumination: Light
in a surface: energy that comes from a par-
ticular direction hence causing
relatively sharp shadows. The
opposite of this form of illumin-
ation is diffuse illumination.
directional derivative: A deriva-
tive taken in a specific direction,
for instance, the component of
the gradient along one coordi-
dilate operator: The operation nate axis. The images on the
of expanding a binary or gray- right are the vertical and hori-
scale object with respect to the zontal directional derivatives of
background. This has the effect the image on the left.
76
discrete Fourier transform
(DFT): A version of the Fourier
transform for sampled data.
discrete relaxation: A technique
for labeling objects in which the
possible type of each object is
discontinuity detection: See iteratively constrained based on
edge detection. relationships with other objects
discontinuity preserving regu- in the scene. The aim is to obtain
larization: A method for pre- a globally consistent interpret-
serving edges (discontinuities) ation (if possible) from locally
from being blurred as a result of consistent relationships.
some regularization operation discrimination function: A bi-
(such as the recovery of a dense nary function separating data
disparity map from a sparse into two classes. See classifier.
set of disparities computed at disparity: The image distance
matching feature points). shifted between corresponding
discontinuous event tracking: points in stereo image pairs.
Tracking of events (such as
a moving person) through a Left image features Right image features
sequence of images. The dis-
continuous nature of the track-
ing is caused by the distance that
a person (or hand, arm, etc.)
can travel between frames and
also be the possibility of occlu- Disparity
sion (or self-occlusion).

disparity gradient: The gradient


of a disparity map for a stereo
pair, that estimates the surface
slope at each image point. See
also binocular stereo.
discrete cosine transform
(DCT): A transformation that disparity gradient limit: The
converts digital images into maximum allowed disparity
the frequency domain in terms gradient in a potential stereo
of the coefficients of discrete feature match.
cosine functions. Used, for disparity limit: The maximum
example, within JPEG image allowed disparity in a potential
compression. stereo feature match. The

77
notion of a disparity limit is pincushion distortion, barrel
supported by evidence from distortion.
the human visual system. distortion polynomial: A poly-
dispersion: Scattering of light by nomial model of radial lens
the medium through which it is distortion. A common example
traveling. is x = xd 1 + k1 r 2 + k2 r 4 , y =
yd 1+k1 r 2 +k2 r 4 . Here, x y are
distance function: See distance the undistorted image coordi-
metric. nates, xd  yd are the distorted
distance map: See range image. image coordinates, r 2 = xd2 +
distance metric: A measure of yd2 , and k1  k2 are the distortion
coefficient. Usually k2 is signifi-
how far apart two things are in
cantly smaller than k1 , and can
terms of physical distance or be set to 0 in cases where high
similarity. A metric can be other accuracy is not required.
functions besides the standard
Euclidean distance, such as the distortion suppression: Cor-
algebraic or Mahalanobis dist- rection of image distortions
ances. A true metric must satisfy: (such as non-linearities
1) dx y + dy z ≥ dx z, introduced by a lens). See
2) dx y = dy x, 3) dx x = 0 geometric distortion and geo-
and 4) dx y = 0 implies x = y, metric transformation.
but computer vision processes dithering: A technique simulat-
often use functions that do not ing the appearance of different
satisfy all of these criteria. shades or colors by varying the
distance transform: An image pattern of black and white (or
different color) dots. This is a
processing operation normally
common task for inkjet printers.
applied to binary images in
which every object point is
transformed into a value repre-
senting the distance from the
point to the nearest object
boundary. This operation is also
referred to as chamfering (see
chamfer matching ).

4
4
3
3
2
2
2
1
2
1
2
1
1
1
1
0
1
0
1
0
divide and conquer: A tech-
4
4
3
3
2
2
1
1
0
0
0
1
0
1
0
1
1
1
0
0
nique for solving problems
4
4
3
3
2
2
1
1
0
0
1
1
2
1
1
1
0
0
0
1 efficiently by subdividing the
4
4
3
3
2
2
1
1
0
1
0
1
0
1
0
1
0
1
1
1 problem into smaller sub-
4 3 2 2 2 2 2 2 2 2
problems, and then recursively
solving these subproblems in
distortion coefficient: A coeffi- the expectation that the smaller
cient in a given image distortion problems will be easier to
model, for instance k1  k2 in the solve. An example is an algo-
distortion polynomial. See also rithms for deriving a polygonal
78
approximation of a contour dominant plane: A degenerate
in which a straight line esti- case encountered in uncali-
mate is recursively split in the brated structure and motion
middle (into two segments recovery where most or all of
with the midpoint put exactly the tracked image features are
on the contour) until the co-planar in the scene.
distance between the polyg- Doppler: A physics phenom-
onal representation and the enon whereby an instrument
actual contour is below some
receiving acoustic or electro-
tolerance.
magnetic waves from a source
in relative motion measures
Curve
Final Estimate an increasing frequency if the
source is approaching, and
decreasing if receding. The
acoustic Doppler effect is
Initial Estim
ate employed in sonar sensors to
estimate target velocity as well
as position.
divisive clustering: Clustering/
cluster analysis in which all downhill simplex: A method for
items are initially considered finding a local minimum using
as a single set (cluster) and a simplex (a geometrical figure
subsequently divided into specified by N + 1 vertices) to
component subsets (clusters). bound the optimal position in
an N -dimensional space. See
DIVX: An MPEG 4 based video
also optimization.
compression technology aim-
ing to achieve sufficiently high DSP: See digitalsignalprocessor.
compression to enable transfer dual of the image of the abso-
of digital video contents over lute conic (DIAC): If  is the
the Internet, while maintaining matrix representing the image
high visual quality. of the absolute conic, then −1
document analysis: A general represents its dual (DIAC). Cali-
term describing operations that bration constraints are some-
attempt to derive information times more readily expressed in
from documents (including for terms of the DIAC than the IAC.
example character recognition duality: The property of two con-
and document mosaicing ). cepts or theories having similar
document mosaicing: Image properties that can be applied
mosaicing of documents. to the one or to the other.
document retrieval: Identifica- For instance, several relations
tion of a document in a database linking points in a projective
of scanned documents based on space are formally the same as
some criteria. those linking lines in a projec-
tive space; such relations are
DoG: See difference of Gaussians. dual.

79
dynamic appearance model: A per time sample) to a model
model describing the changing sequence of features, where the
appearance of an object/scene hope is for a one-to-one match
over time. of observations to features. But,
dynamic programming: An ap- because of variations in rate
proach to numerical optimiza- at which observations are pro-
tion in which an optimal duced, some features may get
solution is searched by keeping skipped or others matched to
several competing partial paths more than one observation. The
throughout and pruning alter- usual goal is to minimize the
native paths that reach the same amount of skipping or mul-
point with a suboptimal value. tiple samples matched (time
warping). Efficient algorithms
dynamic range: The ratio of the to solve this problem exist
brightest and darkest values in based on the linear ordering of
an image. Most digital images the sequences. See also hidden
have a dynamic range of around Markov models (HMM).
100:1 but humans can perceive
detail in dark regions when
the range is even 10,000:1. To
allow for this we can create high
dynamic range images.

dynamic scene: A scene in which


some objects move, in contrast
to the common assumption in
shape from motion that the
scene is rigid and only the cam-
era is moving.
dynamic stereo: Stereo vision
for a moving observer. This
allows shape from motion tech-
niques to be used in addition to
the stereo techniques.
dynamic time warping: A tech-
nique for matching a sequence
of observations (usually one

80
E

early vision: A general term maximum chord length of any


referring to the initial stages of orthogonal chord.
computer vision (i.e., image
capture and image processing ). Maxim
Also known as low level vision.
earth mover’s distance: A met-
um o

ric for comparing two distri- hord


um C
Maxim
rthog

butions by evaluating the


minimum cost of transforming
onal

one distribution into the other


chor

(e.g., can be applied to color


histogram matching ).
d

echocardiography: Cardiac
Distribution 1 Distribution 2 Transformation
ultrasonography (echocardiog-
raphy) is a non-invasive tech-
nique for imaging the heart
and surrounding structures.
Generally used to evaluate car-
diac chamber size, wall thick-
eccentricity: A shape represen- ness, wall motion, valve con-
tation that measures how non- figuration and motion and the
circular a shape is. One way of proximal great vessels.
computing this is to take the edge: A sharp variation of the
ratio of the maximum chord intensity function. Represented
length of the shape to the by its position, the magnitude

Dictionary of Computer Vision and Image Processing R.B. Fisher, K. Dawson-Howe, A. Fitzgibbon,
C. Robertson and E. Trucco © 2005 John Wiley & Sons, Ltd. ISBN: 0-470-01526-8

81
edge based segmentation: Segmentation of an image based on the edges detected.

edge based stereo: A type of feature based stereo where the features used are edges.

edge detection: An image processing operation that computes edge vectors (gradient and orientation) for every point in an image. The first stage of edge based segmentation.

edge direction: The direction perpendicular to the normal to an edge, that is, the direction along the edge, parallel to the lines of constant intensity. Alternatively, the normal direction to the edge, i.e., the direction of maximum intensity change (gradient). See also edge detection, edge point.

edge enhancement: An image enhancement operation that makes the gradient of edges steeper. This can be achieved, for example, by adding some multiple of a Laplacian convolved version L(i,j) of the image g(i,j) to the image: f(i,j) = g(i,j) + λL(i,j), where f(i,j) is the enhanced image and λ is some constant.

edge finding: See edge detection.

edge following: See edge tracking.

edge gradient image: See edge image.

edge grouping: See edge tracking.

edge image: An image where every pixel represents an edge or the edge magnitude.

edge linking: See edge tracking.

edge magnitude: A measure of the contrast at an edge, typically the magnitude of the intensity gradient at the edge point. See also edge detection, edge point.

edge matching: See curve matching.

edge motion: The motion of edges through a sequence of images. See also shape from motion and the aperture problem.

edge orientation: See edge direction.

edge point: 1) A location in an image where some quantity (e.g., intensity) changes rapidly. 2) A location where the gradient is greater than some threshold.
edge preserving smoothing: A smoothing filter that is designed to preserve the edges in the image while reducing image noise. For example see median filter.

edge sharpening: See edge enhancement.

edge tracking: 1) The grouping of edges into chains of significant edges. The second stage of edge based segmentation. Also known as edge following, edge grouping and edge linking. 2) Tracking how the edge moves in a video sequence.

edge type labeling: Classification of edge points or edges into a limited number of types (e.g., fold edge, shadow edge, occluding edge, etc.).

EGI: See extended Gaussian image.

egomotion: The motion of the observer with respect to the observed scene.

egomotion estimation: Determination of the motion of a camera. Generally based on image features corresponding to static objects in the scene. See also structure and motion. A typical image pair where the camera position is to be estimated is:

[Figure: images taken from positions A and B as the observer moves.]

eigenface: An eigenvector determined from a matrix A in which the columns of A are images of faces. These vectors can be used for face recognition.

eigenspace based recognition: Recognition based on an eigenspace representation.

eigenspace representation: See principal component representation.

eigenvalue: A scalar λ that for a matrix A satisfies Ax = λx, where x is a nonzero vector (the eigenvector).

eigenvector: A non-zero vector x that for a matrix A satisfies Ax = λx, where λ is a scalar (the eigenvalue).

eigenvector projection: Projection onto the PCA basis vectors.

electromagnetic spectrum: The entire range of frequencies of electromagnetic waves including X-rays, ultraviolet, visible light, infrared, microwave and radio waves.

[Figure: the spectrum by wavelength in meters, from 10^-12 (X rays) through ultraviolet, visible and infrared to microwave and radio at 10^4.]
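As a quick numerical check of the eigenvalue and eigenvector entries above, the following sketch of ours (using numpy; the example matrix is arbitrary) verifies the defining property Ax = λx:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
    vals, vecs = np.linalg.eig(A)           # columns of vecs are eigenvectors
    for lam, x in zip(vals, vecs.T):
        assert np.allclose(A @ x, lam * x)  # the defining property Ax = lambda x
    print(vals)                             # [3. 1.]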
ellipse fitting: Fitting of an ellipse model to the boundary of some shape, data points, etc.

[Figure: an ellipse fitted to scattered boundary points.]

ellipsoid: A 3D volume in which all plane cross sections are ellipses or circles. An ellipsoid is the set of points (x, y, z) satisfying x^2/a^2 + y^2/b^2 + z^2/c^2 = 1. Ellipsoids are used in computer vision as a basic shape primitive and can be combined with other primitives in order to describe a complex shape.

elliptic snake: An active contour model of an ellipse whose parameters are estimated through energy minimization from an initial position.

elongatedness: A shape representation that measures how long a shape is with respect to its width (i.e., the ratio of the length of the bounding box to its width), as illustrated below. See also eccentricity.

[Figure: a shape's bounding box with its length and width marked.]

EM: See expectation maximization.

empirical evaluation: Evaluation of computer vision algorithms in order to characterize their performance by comparing the results of several algorithms on standardized test problems. Careful evaluation is a difficult research problem in its own right.

encoding: Converting a digital signal, represented as a set of values, from one form to another, often to compress the signal. In lossy encoding, information is lost in the process and the decoding algorithm cannot recover it. See also MPEG and JPEG image compression.

endoscope: An instrument for visually examining the interior of various bodily organs. See also fiberscope.

energy minimization: The problem of determining the absolute minimum of a multivariate function representing (by a potential energy-like penalty) the distance of a potential solution from the optimal solution. It is a specialization of the optimization problem. Two popular minimization algorithms in computer vision are the Levenberg–Marquardt and Newton optimization methods.

entropy: 1. Colloquially, the amount of disorder in a system. 2. A measure of the information content of a random variable X.
Given that X has a set of possible values or outcomes Ω, with probabilities {P(x), x ∈ Ω}, the entropy H(X) of X is defined as

    H(X) = −Σ_{x ∈ Ω} P(x) log P(x)

with the understanding that 0 log 0 = 0. For a multivariate distribution, the joint entropy H(X,Y) of (X,Y) is

    H(X,Y) = −Σ_{(x,y)} P(x,y) log P(x,y)

where the sum ranges over all pairs of outcomes. For a set of values represented as a histogram, the entropy of the set may be defined as the entropy of the probability distribution function represented by the histogram.

[Figure. Left: −p log p as a function of p; the contribution to the entropy is small for probabilities near 0 and 1 and largest for intermediate probabilities. Right: the entropy of the gray scale histograms in some windows on an image.]
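The histogram form of the definition is easy to compute. A small sketch of ours follows (numpy; the function name and the use of base-2 logarithms, giving entropy in bits, are our choices):

    import numpy as np

    def histogram_entropy(values, bins=256):
        counts, _ = np.histogram(values, bins=bins)
        p = counts / counts.sum()
        p = p[p > 0]                      # convention: 0 log 0 = 0
        return -np.sum(p * np.log2(p))    # H = -sum p log p, in bits

    flat = np.full(1000, 128)             # a single gray level: entropy 0
    noisy = np.random.randint(0, 256, 1000)
    print(histogram_entropy(flat), histogram_entropy(noisy))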
epipolar constraint: A geometric constraint reducing the dimensionality of the stereo correspondence problem. For any point in one image, the possible matching points in the other image are constrained to lie on a line known as the epipolar line. This constraint may be described mathematically using the fundamental matrix. See also epipolar geometry.

epipolar correspondence matching: Stereo matching using the epipolar constraint.

epipolar geometry: The geometric relationship between two perspective cameras.

[Figure: a real world point, the optical centers and image planes of two cameras, the image points and the epipolar lines.]

epipolar line: The intersection of the epipolar plane with the image plane. See also epipolar constraint.

epipolar plane: The plane defined by any real world scene point together with the optical centers of two cameras.

epipolar plane image (EPI): An image that shows how a particular line from a camera changes as the camera position is changed such that the image line remains on the same epipolar plane. Each line in the EPI is a copy of the relevant line from the camera at a different time. Features that are distant from the camera will remain in the same position in each line, and features that are close to the camera will move from line to line (the closer the feature the further it will move).
[Figure: images 1 through 8 and the EPI built from the highlighted line of each.]

epipolar plane image analysis: An approach to determining shape from motion in which epipolar plane images (EPIs) are analyzed. The slope of lines in an EPI is proportional to the distance of the object from the camera, with vertical lines corresponding to features at infinity.

epipolar plane motion: See epipolar plane image analysis.

epipolar rectification: The image rectification of stereo images so that the epipolar lines are aligned with the image rows (or columns).

epipolar transfer: The transfer of corresponding epipolar lines in a stereo pair of images, defined by a homography. See also stereo and stereo vision.

epipole: The point through which all epipolar lines from a camera appear to pass. See also epipolar geometry.

[Figure: epipolar lines in an image converging at the epipole.]

epipole location: The operation of locating the epipoles.

equalization: See histogram equalization.

erode operator: The operation of reducing a binary or gray scale object with respect to the background. This has the effect of removing any isolated object regions and separating any object regions that are only connected by a thin section. Most frequently described as a morphological transformation and is the dual of the dilate operator.

[Figure: a binary shape before and after erosion.]
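A minimal sketch of ours for binary erosion with a square structuring element, implemented directly in numpy; a pixel survives only if every neighbor under the element is also set:

    import numpy as np

    def binary_erode(img, size=3):
        r = size // 2
        padded = np.pad(img, r, constant_values=0)
        out = np.ones_like(img)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                # AND together every shifted copy of the image
                out &= padded[r + dy : r + dy + img.shape[0],
                              r + dx : r + dx + img.shape[1]]
        return out

    img = np.zeros((7, 7), dtype=int)
    img[1:6, 1:6] = 1
    print(binary_erode(img))    # only the 3x3 core of the 5x5 square survives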
error propagation: 1) The propagation of errors resulting from one computation to the next computation. 2) The estimation of the error (e.g., variance) of a process based on the estimates of the error in the input data and intermediate computations.

essential matrix: In binocular stereo, a matrix E expressing a bilinear constraint between corresponding image points u, u′ in camera coordinates: u′ᵀEu = 0. This constraint is the basis for several reconstruction algorithms. E is a function of the translation and rotation of the camera in the world reference frame. See also the fundamental matrix.
Euclidean distance: The geometric distance between two points (x1, y1) and (x2, y2), i.e., √((x1 − x2)^2 + (y1 − y2)^2). For n-dimensional vectors x⃗1 and x⃗2, the distance is (Σ_{i=1}^{n} (x1i − x2i)^2)^(1/2).
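In code the n-dimensional form is a one-liner; this fragment of ours checks the explicit formula against numpy's built-in norm:

    import numpy as np

    x1 = np.array([1.0, 2.0, 3.0])
    x2 = np.array([4.0, 6.0, 3.0])
    d = np.sqrt(np.sum((x1 - x2) ** 2))    # the formula above
    print(d, np.linalg.norm(x1 - x2))      # both print 5.0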
Euclidean reconstruction: 3D reconstruction of a scene using a Euclidean frame of reference, as opposed to an affine reconstruction or projective reconstruction. The most complete reconstruction achievable. For example, using stereo vision.

Euclidean space: A representation of the space of all n-tuples (where n is the dimensionality). For example the three dimensional Euclidean space (X, Y, Z) is typically used to describe the real world. Also known as Cartesian space (see Cartesian coordinates).

Euclidean transformation: A transformation that operates in Euclidean space (i.e., maintaining the Euclidean spatial arrangements). Examples include rotation and translation. Often applied to homogeneous coordinates.

Euler angle: The Euler angles are a particular set of three angles describing rotations in three dimensional space.

Euler–Lagrange: The Euler–Lagrange equations are the basic equations in the calculus of variations, a branch of calculus concerned with maxima and minima of definite integrals. They occur, for instance, in Lagrangian mechanics and have been used in computer vision for a variety of optimizations, including surface interpolation. See also variational approach and variational problem.

Euler number: The number of contiguous parts (regions) less the number of holes. Also known as the genus.

even field: The first of the two fields in an interlaced video signal.

even function: A function where f(x) = f(−x) for all x.

event analysis: See event understanding.

event detection: Analysis of a sequence of images to detect activities in the scene.

[Figure: an image from a sequence and the movement detected in the image.]

event understanding: Recognition of an event (such as a person walking) in a sequence of images. Based on the data provided by event detection.

exhaustive matching: Matching where all possibilities are considered. As an alternative see hypothesize and verify.

expectation maximization (EM): A method of finding a maximum likelihood estimate of some parameters based on a sample data set. This method works well even when there are missing values.
expectation value: The mean value of a function (i.e., the average expected value). If p(x) is the probability density function of a random variable x, the expectation of x is x̄ = ∫ p(x) x dx.

expert system: A system that uses available knowledge and heuristics to solve problems. See also knowledge based vision.

exponential smoothing: A method for predicting a data value (P_{t+1}) based on the previous observed value (D_t) and the previous prediction (P_t): P_{t+1} = αD_t + (1 − α)P_t, where α is a weighting value between 0 and 1.

[Figure: observed values D_t and predictions P_t over time, for α = 1.0 and α = 0.5.]
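The recurrence takes one line per step. A sketch of ours follows; seeding the first prediction with the first observation is an assumption, and other initializations are possible:

    def exponential_smoothing(data, alpha):
        pred = [data[0]]                       # seed P[0] with D[0]
        for d in data[:-1]:
            # P[t+1] = alpha * D[t] + (1 - alpha) * P[t]
            pred.append(alpha * d + (1 - alpha) * pred[-1])
        return pred

    print(exponential_smoothing([10, 12, 11, 15, 14], alpha=0.5))

With alpha = 1.0 each prediction is simply the previous observation, which is why the α = 1.0 curve in the figure tracks the data with a one-step lag.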
exponential transformation: See pixel exponential operator.

expression understanding: See facial expression analysis.

extended Gaussian image (EGI): Use of a Gaussian sphere for histogramming surface normals. Each surface normal is considered from the center of the sphere and the value associated with the surface patch with which it intersects is incremented.

[Figure: surface normals mapped onto the Gaussian sphere.]

extended light source: A light source that has a significant size relative to the scene, i.e., is not approximated well by a point light source. In other words this type of light source has a diameter and hence can produce fuzzy shadows. Contrast with: point light sources.

[Figure: an extended light source casting regions of no shadow, fuzzy shadow and complete shadow.]

exterior orientation: The position of a camera in a global coordinate system. That which is determined by an absolute orientation calculation.

external energy (or force): A measure of fit between the image data and an active shape model that is part of the model's deformation energy. This measure is used to deform the model to the image data.

extremal point: Points that lie on the boundary of the smallest convex region enclosing a set of points (i.e., that lie on the convex hull).

extrinsic parameters: See exterior orientation.
eye location: The task of finding eyes in images of faces. Approaches include blink detection, face feature detection, etc.

eye tracking: Tracking the position of the eyes in a face image sequence. Also, tracking the gaze direction.
F

face analysis: A general term covering the analysis of face images and models. Often used to refer to facial expression analysis.

face authentication: Verification that (the image of) a face corresponds to a particular individual. This differs from face recognition in that here only the model of a single person is considered.

[Figure: a face image compared against a single stored model.]

face detection: Identification of faces within an image or series of images. This often involves a combination of human motion analysis and skin color analysis.

face feature detection: The location of features (such as eyes, nose, mouth) from a human face. Normally performed after face detection although it can be used as part of face detection.

face identification: See face recognition.

face indexing: Indexing from a database of known faces as a precursor to face recognition.

face modeling: Representing a face using some type of model typically derived from an image (or images). These models are used in face authentication, face recognition, etc.
face recognition: The task of recognizing a face from an image as an instance of a person recorded in a database of faces.

[Figure: a face image compared against a database of faces.]

face tracking: Tracking of a face in a sequence of images. Often used as part of a human–computer interface.

face verification: See face authentication.

facet model based extraction: The extraction of a model based on facets (small simple surfaces; e.g., see planar facet model) from range data. See also planar patch extraction.

facial animation: The way in which facial expressions change. See also face feature detection.

facial expression analysis: Study or identification of the facial expression(s) of a person from an image or sequence of images.

[Figure: happy, perplexed and surprised expressions.]

factorization: See motion factorization.

false alarm: See false positive.

false negative: A binary classifier c(x) returns + or − for examples x. A false negative occurs when the classifier returns − for an example that is in reality +.

false positive: A binary classifier c(x) returns + or − for examples x. A false positive occurs when the classifier returns + for an example that is in reality −.

fast Fourier transform (FFT): A version of the Fourier transform for discrete samples that is significantly more efficient (order N log2 N) than the standard discrete Fourier transform (which is order N^2) on data sets with N points.
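The speed claim is easy to check against the O(N^2) definition. This sketch of ours (numpy) confirms that the FFT computes exactly the discrete Fourier transform:

    import numpy as np

    N = 1024
    x = np.random.rand(N)
    X = np.fft.fft(x)                                # O(N log N)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)     # explicit DFT matrix
    assert np.allclose(X, W @ x)                     # same result as the O(N^2) sum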
fast marching method: A type of level set method in which the search can move in only one direction (hence making it faster).

feature: 1) A distinctive part of something (e.g., the nose and eyes are distinctive features of the face), or an attribute derived from an object/shape (e.g., circularity). See also image feature. 2) A numerical property (possibly combined with others to form a feature vector) and generally used in a classifier.

feature based optical flow estimation: Calculation of optical flow in a sequence of images from image features.
feature based stereo: A solution to the stereo correspondence problem in which image features are compared from the two images. The main alternative approach is correlation based stereo.

feature based tracking: Tracking the motion of image features through a sequence.

feature contrast: The difference between two features. This can be measured in many domains (e.g., intensity, orientation, etc.).

feature detection: Identification of given features in an image (or model). For example see corner detection.

feature extraction: See feature detection.

feature location: See feature detection.

feature matching: Matching of image features in several images of the same object (for instance, feature based stereo), or of features from an unknown object with features from known objects (feature based recognition).

feature orientation: The orientation of an image feature with respect to the image frame of reference.

feature point: The image location at which a particular feature is found.

feature point correspondence: Matching feature points in two or more images. The assumption is that the feature points are the image of the same scene point. Having the correspondence allows the estimation of the depth from binocular stereo, fundamental matrix, homography or trifocal tensor in the case of 3D scene structure recovery, or of the 3D target motion in the case of target tracking.

feature point tracking: Tracking of individual image features in a sequence of images.

feature selection: Selection of suitable features (properties) for a specific task, for example, classification. Typically features should be independent, detectable, discriminatory and reliable.

feature similarity: How much two features resemble each other. Measures of feature similarity are required for feature based stereo, feature based tracking, feature matching, etc.

feature space: The dimensions of a feature space are the feature (property) values of a given problem. An object or shape is mapped to feature space by computing the values of the set of features defining the space, typically for recognition and classification.
[Figure: several shapes plotted in a feature space whose axes are area and rectangularity.]

In the example above, different shapes are mapped to a 2D feature space defined by area and rectangularity.

feature stabilization: A technique for stabilizing the position of an image feature in an image sequence so that it remains in a particular position on a display (allowing/causing the rest of the image to move relative to that feature).

[Figure: an original sequence and the sequence stabilized about one feature.]

feature tracking: See feature based tracking.

feature vector: A vector formed by the values of a number of image features (properties), typically all associated with the same object or image.

feedback: The use of outputs from a system to control the system's actions.

Feret's diameter: The distance between two parallel lines at the extremities of some shape that are tangential to the boundary of the shape. Maximum, minimum and mean values of Feret's diameter are often used (where every possible pair of parallel tangent lines is considered).

[Figure: a shape with its maximum Feret's diameter marked.]

FERET: A standard database of face images with a defined experimental protocol for the testing and comparison of face recognition algorithms.

FFT: See fast Fourier transform.

fiber optics: A medium for transmitting light that consists of very thin glass or plastic fibers. It can be used to provide much higher bandwidth for signals encoded as patterns of light pulses. Alternately, it can be used to transmit images directly through rigidly connected bundles of fibers, so as to see around corners, past obstacles, etc.

fiberscope: A flexible fiber optic instrument allowing parts of an object to be viewed that would normally be inaccessible. Most often used in medical examinations.

fiducial point: A reference point for a given algorithm, e.g., a fixed, known, easily detectable pattern for a calibration algorithm.

figure–ground separation: The segmentation of the area of the image representing the object of interest (the figure) from the remainder of the image (the background).

[Figure: an image separated into figure and ground.]
figure of merit: Any scalar that is used to characterize the performance of an algorithm.

filter: In general, any algorithm that transforms a signal into another. For instance, bandpass filters remove/reduce the parts of an input signal outside a given frequency interval; gradient filters allow only image gradients to pass through; smoothing filters attenuate high frequencies.

filter ringing: A type of distortion caused by the application of a steep recursive filter. Normally this term applies to electronic filters in which certain components (e.g., capacitors and inductors) can store energy and later release it, but there are also digital equivalents to this effect.

filtering: Application of a filter.

fingerprint database indexing: Indexing into a database of fingerprints using a number of features derived from the fingerprints. This allows a smaller number of fingerprints to be considered when attempting fingerprint identification within the database.

fingerprint identification: Identification of an individual through comparison of an unknown fingerprint (or fingerprints) with previously known fingerprints.

fingerprint indexing: See fingerprint database indexing.

finite element model: A class of numerical methods for solving differential problems. Another relevant class is finite difference methods.

finite impulse response filter (FIR): A filter that produces an output value (y_n) based on the current and past p input values (x_i): y_n = Σ_{i=0}^{p} a_i x_{n−i}, where the a_i are weights. See also infinite impulse response filters.
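The defining sum is a convolution, so numpy's convolve implements it directly. A sketch of ours; the three-tap smoothing weights are an arbitrary choice:

    import numpy as np

    a = np.array([0.25, 0.5, 0.25])           # weights a_i
    x = np.array([0, 0, 1, 0, 0, 2, 2, 2], dtype=float)
    # y[n] = sum_i a[i] * x[n-i], with zero initial conditions
    y = np.convolve(x, a)[: len(x)]
    print(y)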
FIR: See finite impulse response filter.

Firewire (IEEE 1394): A serial digital bus system supporting 400 Mbits per second. Power, control and data signals are carried in a single cable. The bus system makes it possible to address up to 64 cameras from a single interface card, and multiple computers can acquire images from the same camera simultaneously.

first derivative filter: See gradient filter.

first fundamental form: See surface curvature.

Fisher linear discriminant (FLD): A classification method that maps high dimensional data into a single dimension in such a way as to maximize class separability.

fisheye lens: See wide angle lens.

flat field: 1) An object of uniform color, used for photometric calibration of optical systems. 2) A camera system is flat field correct if the gray scale output at each pixel is the same for a given light input.
flexible template: A model of a shape in which the relative position of points is not fixed (e.g., defined in probabilistic form). This approach allows for variations in the appearance of the shape.

FLIR: Forward Looking Infrared. An infrared system mounted on a vehicle looking ahead along the direction of travel.

[Figure: an infrared sensor mounted on a vehicle.]

flow field: See optical flow field.

flow histogram: A histogram of the optical flow in an image sequence. This can be used, for example, to provide a qualitative description of the motion of the observer.

flow vector field: Optical flow is described by a vector (magnitude and orientation) for each image point. Hence a flow vector field is the same as an optical flow field.

fluorescence: The emission of visible light by a substance caused by the absorption of some other (possibly invisible) electromagnetic wavelength. This property is sometimes used in industrial machine vision.

fMRI: Functional Magnetic Resonance Imaging, or fMRI, is a technique for identifying which parts of the brain are activated by different types of physical stimulation, e.g., visual or acoustic stimuli. An MRI scanner is set up to register the increased blood flow to the activated areas of the brain on functional MRI scans. See also nuclear magnetic resonance.

FOA: See focus of attention.

FOC: See focus of contraction.

focal length: 1) The distance between the camera lens and the focal plane. 2) The distance from a lens at which an object viewed at infinity would be in focus.

[Figure: light from infinity converging over the focal length.]

focal point: The point on the optical axis of a lens where light rays from an object at infinity (also placed on the optical axis) converge.

[Figure: the focal point on the optical axis of a lens.]

focal plane: The plane on which an image is focused by a lens system. Generally this consists of an array of photosensitive elements. See also image plane.
focal surface: A term most frequently used when a concave mirror is used to focus an image (e.g., in a reflector telescope). The focal surface in this case is the surface of the mirror.

[Figure: a concave focal surface, its optical axis and focal point.]

focus: To focus a camera is to arrange for the focal points of various image features to converge on the focal plane. An image is considered to be in focus if the main subject of interest is in focus. Note that focus (or lack of focus) can be used to derive useful information (e.g., see depth from focus).

[Figure: an in-focus and an out-of-focus image.]

focus control: The control of the focus of a lens system usually by moving the lens along the optical axis or by adjusting the focal length. See also autofocus.

focus following: A technique for slowly changing the focus of a camera as an object of interest moves. See also depth from focus.

focus invariant imaging: Imaging systems that are designed to be invariant to focus. Such systems have large depths of field.

focus of attention (FOA): The feature or object or area to which the attention of a visual system is directed.

focus of contraction (FOC): The point of convergence of the optical flow vectors for a translating camera. The component of the translation along the optical axis must be non-zero. Compare focus of expansion.

focus of expansion (FOE): The point from which all optical flow vectors appear to emanate in a static scene where the observer is moving. For example if a camera system was moving directly forwards along the optical axis then the optical flow vectors would all emanate from the principal point (usually near the center of the image).

[Figure: two images from a moving observer and the blended image, with flow emanating from the FOE.]

FOE: See focus of expansion.

fold edge: A surface orientation discontinuity. An edge where two locally planar surfaces meet. The figure below shows a fold edge.
[Figure: a fold edge between two planar surfaces.]

foreground: In computer vision, generally used in the context of object recognition. The area of the scene or image in which the object of interest lies. See figure–ground separation.

foreshortening: A typical perspective effect whereby distant objects appear smaller than closer ones.

form factor: The physical size or arrangement of an object. This term is frequently used with reference to computer boards.

Förstner operator: A feature detector used for corner detection as well as other edge features.

forward looking radar: A radar system mounted on a vehicle looking ahead along the direction of travel. See also side looking radar.

Fourier–Bessel transform: See Hankel transform.

Fourier domain convolution: Convolution in the Fourier domain involves simply multiplication of the Fourier transformed image by the Fourier transformed filter. For very large filters this operation is much more efficient than convolution in the original domain.
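The equivalence is easy to demonstrate with numpy (a sketch of ours; note that multiplying 2D FFTs yields circular convolution, i.e., the filter wraps around the image borders):

    import numpy as np

    img = np.random.rand(64, 64)
    kernel = np.zeros((64, 64))
    kernel[:3, :3] = 1.0 / 9.0                 # a zero-padded 3x3 mean filter
    # pointwise product in the Fourier domain == (circular) convolution in space
    blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel)))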

Fourier domain inspection: Identification of defects based on features in the Fourier transform of an image.

Fourier image processing: Image processing in the Fourier domain (i.e., processing images that have been transformed using the Fourier transform).

Fourier matched filter object recognition: Object recognition in which correlation is determined using a matched filter that is the conjugate of the Fourier transform of the object being located.

Fourier shape descriptor: A boundary representation of a shape in terms of the coefficients of a Fourier transformation.

Fourier slice theorem: A slice at angle θ of a 2D Fourier transform of an object is equal to a 1D Fourier transform of a parallel projection of the object taken at the same angle. See also slice based reconstruction.

Fourier space: The frequency domain space in which an image (or other signal) is represented after application of the Fourier transform.

Fourier space smoothing: Application of a smoothing filter (e.g., to remove high-frequency noise) in a Fourier transformed image.

Fourier transform: A transformation that allows a signal to be considered in the frequency domain as a sum of sine and cosine waves or equivalently as a sum of exponentials. For a two dimensional image, F(u,v) = ∫∫ f(x,y) e^(−2πi(xu + yv)) dx dy, with both integrals taken over (−∞, ∞). See also fast Fourier transform, discrete Fourier transform and inverse Fourier transform.
fovea: The high-resolution central region of the human retina. The analogous region in an artificial sensor that emulates the retinal arrangement of photoreceptors, for example a log-polar sensor.

foveal image: An image in which the sampled pattern is inspired by the arrangement of the human fovea, i.e., sampling is most dense in the image center and gets progressively sparser towards the periphery of the image.

[Figure: a foveally sampled image.]

foveation: 1) The process of creating a foveal image. 2) Directing the camera optical axis to a given direction.

fractal image compression: An image compression method based on exploiting self-similarity at different scales.

fractal measure/dimension: A measure of the roughness of a shape. Consider a curve whose length (L1 and L2) is measured at two scales (S1 and S2). If the curve is rough the length will grow as the scale is increased. The fractal dimension is D = log(L1 − L2) / log(S2 − S1).

fractal representation: A representation based on self-similarity. For example a fractal representation of an image could be based on similarity of blocks of pixels.

fractal surface: A surface model that is defined progressively using fractals (i.e., the surface displays self-similarity at different scales).

fractal texture: A texture representation based on self-similarity between scales.

frame: 1) A complete standard television video image consisting of both the even and odd video fields. 2) A knowledge representation technique suitable for recording a related set of facts, rules of inference, preconditions, etc.

frame buffer: A device that stores a video frame for access, display and processing by a computer. For example such devices are used to store the frame from which a video display is refreshed. See also frame store.

frame grabber: See frame store.
frame of reference: A coordinate system defined with respect to some object, the camera or with respect to the real world.

[Figure: world, cube and cylinder coordinate systems.]

frame store: An electronic device for recording a frame from an imaging system. Typically such devices are used as interfaces between CCIR cameras and computers.

freeform surface: A surface that does not follow any particular mathematical form; for example, the folds of a piece of fabric, as shown below.

[Figure: a draped fabric surface.]

Freeman code: A type of chain code in which a contour is represented by coordinates for the first point followed by a series of direction codes (typically 0 through 7). In the following figure we show the Freeman codes relative to the center point on the left and an example of the codes derived from a chain of points on the right.

[Figure. Left: direction codes arranged around the center point (5 6 7 across the top, 4 and 0 at the sides, 3 2 1 across the bottom). Right: a chain of points encoded as 0, 0, 2, 3, 1, 0, 7, 7, 6, 0, 1, 2, 2, 4.]
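A sketch of ours for chain-coding a contour; the direction table transcribes the figure above, assuming x grows rightward and y grows downward (image coordinates):

    # (dx, dy) offset for each of the eight direction codes
    OFFSETS = {0: (1, 0), 1: (1, 1), 2: (0, 1), 3: (-1, 1),
               4: (-1, 0), 5: (-1, -1), 6: (0, -1), 7: (1, -1)}
    CODE = {v: k for k, v in OFFSETS.items()}

    def freeman_code(points):
        # first point, then one code per step between successive points
        codes = [CODE[(x2 - x1, y2 - y1)]
                 for (x1, y1), (x2, y2) in zip(points, points[1:])]
        return points[0], codes

    print(freeman_code([(0, 0), (1, 0), (2, 0), (2, 1)]))  # ((0, 0), [0, 0, 2])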
Frenet frame: A triplet of mutually orthogonal unit vectors (the normal, the tangent and the binormal/bitangent) describing a point on a curve.

[Figure: the tangent, normal and binormal at a point on a curve.]

frequency domain filter: A filter defined by its action in the Fourier space. See high pass filter and low pass filter.

frequency spectrum: The range of (electromagnetic) frequencies.

front lighting: A general term covering methods of lighting a scene where the lights are on the same side of the object as the camera. As an alternative consider backlighting. For example:

[Figure: light sources and camera on the same side of the objects being imaged.]
frontal: Frontal presentation of a planar surface is one in which the plane is parallel to the image plane.

full primal sketch: A representation, described as part of Marr's theory of vision, that is made up of the raw primal sketch primitives together with grouping information. The sketch contains described image structures that could correspond with scene structures (e.g., image regions with scene surfaces).

function based model: An object representation based on the object functionality (e.g., an object's purpose or the way in which an object moves and interacts with other objects) rather than its geometric properties.

function based recognition: Object recognition based on object functionality rather than geometric properties. See also function based model.

functional optimization: An analytical technique for optimizing (maximizing or minimizing) complex functions of continuous variables.

functional representation: See function based model.

fundamental form: A metric useful in determining local properties of surfaces. See also first fundamental form and second fundamental form.

fundamental matrix: A bilinear relationship between corresponding points u, u′ in binocular stereo images. The fundamental matrix, F, incorporates the two sets of camera parameters (K, K′) and the relative position t and orientation R of the cameras. Matching points u from one image and u′ from the other image satisfy u′ᵀFu = 0, where F = (K⁻¹)ᵀ S(t) R⁻¹ (K′)⁻¹ and S(t) is the skew symmetric matrix of t. See also the essential matrix.

fusion: Integration of data from multiple sources into a single representation.

fuzzy logic: A form of logic that allows a range of possibilities between true and false (i.e., a degree of truth).

fuzzy morphology: A type of mathematical morphology that is based on fuzzy logic rather than the more conventional Boolean logic.

fuzzy set: A grouping of data (into a set) where each item in the set has an associated grade/likelihood of membership in the set.

fuzzy reasoning: See fuzzy logic.
G

Gabor filter: A filter formed by multiplying a complex oscillation by an elliptical Gaussian distribution (specified by two standard deviations and an orientation). This creates filters that are local, selective for orientation, have different scales and are tuned for intensity patterns (e.g., edges, bars and other patterns observed to trigger responses in the simple cells of the mammalian visual cortex) according to the frequency chosen for the complex oscillation. The filter can be applied in the frequency domain as well as the spatial domain.

Gabor transform: A transformation that allows a 1D or 2D signal (such as an image) to be represented as a weighted sum of Gabor functions.

Gabor wavelets: A type of wavelet formed by a sinusoidal function that is restricted by a Gaussian envelope function.

gaging: Measuring or testing. A standard requirement of industrial machine vision systems.

gait analysis: Analysis of the way in which human subjects move. Frequently used for biometric or medical purposes.

[Figure: successive frames of a walking subject.]

gait classification: 1) Classification of different types of human motion (such as walking, running, etc.). 2) Biometric identification of people based on their gait parameters.

Galerkin approximation: A method for determining the coefficients of a power series solution for a differential equation.

gamma: Devices such as cameras and displays that convert between analogue (denoted a) and digital (d) images generally have a nonlinear relationship between a and d. A common model for this nonlinearity is that the signals are related by a gamma curve of the form a = c × d^γ, for some constant c. For CRT displays, common values of γ are in the range 1.0–2.5.
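Returning to the Gabor filter entry above: the construction (a complex oscillation times an oriented elliptical Gaussian) fits in a few lines. A sketch of ours, with parameter names of our own choosing:

    import numpy as np

    def gabor_kernel(size, sigma_x, sigma_y, theta, freq):
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        # rotate coordinates to the filter orientation
        xr = x * np.cos(theta) + y * np.sin(theta)
        yr = -x * np.sin(theta) + y * np.cos(theta)
        envelope = np.exp(-0.5 * ((xr / sigma_x) ** 2 + (yr / sigma_y) ** 2))
        carrier = np.exp(2j * np.pi * freq * xr)   # complex oscillation
        return envelope * carrier

    k = gabor_kernel(size=21, sigma_x=3.0, sigma_y=5.0, theta=0.0, freq=0.1)
    # convolve an image with k.real and k.imag for even/odd responses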
gamma correction: The correction of brightness and color ratios so that an image has the correct dynamic range when displayed on a monitor.

gauge coordinates: A coordinate system local to the image surface itself. Gauge coordinates provide a convenient frame of reference for operators such as the gradient operator.

Gaussian convolution: See Gaussian smoothing.

Gaussian curvature: A measure of the surface curvature at a point. It is the product of the maximum and minimum of the normal curvatures in all directions through the point. See also mean curvature.

Gaussian derivative: The combination of Gaussian smoothing and a gradient filter. This results in a gradient filter that is less sensitive to noise.

[Figure: an original image, its normal first derivative and its Gaussian first derivative.]

Gaussian distribution: A probability density function with the distribution

    P(x) = (1 / (σ √(2π))) e^(−(x − μ)^2 / (2σ^2))

where μ is the mean and σ is the standard deviation. If x⃗ ∈ R^d, then the multivariate probability density function is

    p(x⃗) = det(2πΣ)^(−1/2) exp(−(1/2) (x⃗ − μ⃗)ᵀ Σ⁻¹ (x⃗ − μ⃗))

where μ⃗ is the distribution mean and Σ is its covariance.

Gaussian mixture model: A representation for a distribution based on a combination of Gaussians. For instance, used to represent color histograms with multiple peaks. See expectation maximization.

Gaussian noise: Noise whose distribution is Gaussian in nature. Gaussian noise is specified by its standard deviation about a zero mean, and is often modeled as a form of additive noise.

Gaussian pyramid: A multi-resolution representation of an image formed by several images, each one a Gaussian smoothed and subsampled version of the original one at increasing standard deviation.
Gaussian smoothing: An image processing operation aimed to attenuate image noise, computed by convolution with a mask sampling a Gaussian distribution.

[Figure: an original image and Gaussian smoothed versions with sigma = 1.0 and sigma = 3.0.]
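A sketch of ours exploiting the separability of the Gaussian (filter rows, then columns); the 3-sigma mask radius is a common truncation choice, not a requirement:

    import numpy as np

    def gaussian_smooth(img, sigma):
        r = int(3 * sigma)
        mask = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
        mask /= mask.sum()                          # normalize to preserve brightness
        smooth = lambda v: np.convolve(v, mask, "same")
        tmp = np.apply_along_axis(smooth, 1, img)   # filter rows
        return np.apply_along_axis(smooth, 0, tmp)  # then columns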
Gaussian speckle: Speckle that has a Gaussian distribution.

Gaussian sphere: A sampled representation of a unit sphere where the surface of the sphere is defined by a number of triangular patches (often computed by dividing a dodecahedron). See also extended Gaussian image.

gaze control: The ability of a human subject or a robot head to control their gaze direction.

gaze direction estimation: Estimation of the direction in which a human subject is looking. Used for human–computer interaction.

gaze direction tracking: Continuous gaze direction estimation (e.g., in a video sequence or a live camera feed).

gaze location: See gaze direction estimation.

generalized cone: A generalized cylinder in which the swept curve changes along the axis.

generalized curve finding: A general term referring to methods that locate arbitrary curves. For example, see generalized Hough transform.

generalized cylinder: A volumetric representation where the volume is defined by sweeping a closed curve along an axis. The axis does not need to be straight and the closed curve may vary in shape as it is moved along the axis. For example a cylinder may be defined by moving a circle along a straight axis, and a cone may be defined by moving a circle of changing diameter along a straight axis.

[Figure: a closed curve swept along an axis.]
generalized Hough transform: A version of the Hough transform capable of detecting the presence of arbitrary shapes.

generalized order statistics filter: A filter in which the values within the filter mask are considered in increasing order and then combined in some fashion. The most common such filter is the median filter that selects the middle value.

generate and test: See hypothesize and verify.

generic viewpoint: A viewpoint such that small motions may cause small changes in the size or relative positions of features, but no features appear or disappear. This contrasts with a privileged viewpoint.

genetic algorithm: An optimization algorithm seeking solutions by refining iteratively a small set of candidates with a process mimicking genetic evolution. The suitability (fitness) of a set of possible solutions (population) is used to generate a new population until some conditions are satisfied (e.g., the best solution has not changed for a given number of iterations).

genetic programming: Application of genetic algorithms in some programming language to evolve programs that satisfy some evaluation criteria.

genus: In the study of topology, the number of "holes" in a surface. In computer vision, sometimes used as a discriminating feature for simple object recognition.

Gestalt: German for "shape". The Gestalt school of psychology, led by the German psychologists Wertheimer, Köhler and Koffka in the first half of the twentieth century, had a profound influence on perception theories, and subsequently on computer vision. Its basic tenet was that a perceptual pattern has properties as a whole, which cannot be explained in terms of its individual components. In other words, the whole is more than the sum of its parts. This concept was captured in some basic laws (proximity, similarity, closure, "common destiny" or good form, saliency), that would apply to all mental phenomena, not just perception. Much work on low-level computer vision, most notably on perceptual grouping and perceptual organization, has exploited these ideas. See also visual illusion.

geodesic: The shortest line between two points (on a mathematically defined surface).

geodesic active contour: An active contour model similar to the snake model in that it attempts to minimize an energy function between the model and the data, but which also incorporates a geometrical model.

[Figure: an initial contour evolving to a final contour on an object boundary.]

geodesic active region: A technique for region based segmentation that builds on geodesic active contours by adding a force that takes into account information within regions. Typically a geodesic active region will be bounded by a single geodesic active contour.

geodesic distance: The length of the shortest path between two points along some surface. This is different from the Euclidean distance that takes no account of the surface. The following example shows the geodesic distance between Calgary and London (following the curvature of the Earth).

[Figure: the geodesic between Calgary and London on the globe.]

geodesic transform: Assigns to each point the geodesic distance to some feature or class of feature.

geographic information system (GIS): A computer system that stores and manipulates geographically referenced data (such as images of portions of the Earth taken by satellite).

geometric compression: The compression of geometric structures such as polygons.

geometric constraint: A limitation on the possible physical arrangement/appearance of objects based on geometry. These types of constraints are used extensively in stereo vision (e.g., the epipolar constraint), motion analysis (e.g., rigid motion constraint) and object recognition (e.g., focusing on specific classes of objects or relations between features).

geometric correction: In remote sensing, an algorithm or technique for correction of geometric distortion.

geometric deformable model: A deformable model in which the deformation of curves is based on the level set method and stops at object boundaries. A typical example is a geodesic active contour model.

geometric distance: In curve and surface fitting, the shortest distance from a given point to a given surface. In many fitting problems, the geometric distance is expensive to compute but yields more accurate solutions. Compare algebraic distance.

geometric distortion: Deviations from the idealized image formation model (for example, pinhole camera) of an imaging system. Examples include radial lens distortion in standard cameras.

geometric feature: A general term describing a shape characteristic of some data, that encompasses features such as edges, corners, geons, etc.

geometric feature learning: Learning geometric features from examples of the feature.

geometric feature proximity: A measure of the distance between geometric features, e.g., as by using the distance between data and overlaid model features in hypothesis verification.

geometric hashing: A technique for matching models in which some geometric invariant features are mapped into a hash table, and this hash table is used to perform the recognition.

geometric invariant: A quantity describing some geometric configuration that remains unchanged under certain transformations (e.g., the cross-ratio under perspective projection).

geometric model: A model that describes the geometric shape of some object or scene. A model can be 2D (e.g., polycurve) or 3D (e.g., surface based models), etc.

geometric model matching: Comparison of two geometric models, or of a model and a set of image data shapes, for the purposes of recognition.

geometric optics: A general term referring to the description of optics from a geometrical point of view. Includes concepts such as the simple pinhole camera model, magnification, lenses, etc.

geometric reasoning: Reasoning with geometric shapes in order to address such tasks as robot motion planning, shape similarity, spatial position estimation, etc.

geometric representation: See geometric model.

geometric shape: A shape that takes a relatively simple geometric form (such as a square, ellipse, cube, sphere, generalized cylinder, etc.) or that can be described as a combination of such geometric primitives.

geometric transformation: A class of image processing operations that transform the spatial relationships in an image. They are used for the correction of geometric distortions and general image manipulation. A geometric transformation requires the definition of a pixel coordinate transformation together with an interpolation scheme. For example, a rotation does:

[Figure: an image and its rotated version.]
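As a concrete instance of the entry above, here is a sketch of ours that rotates an image about its center: the pixel coordinate transformation is the inverse rotation (each output pixel is mapped back into the source image) and the interpolation scheme is nearest neighbor, the simplest choice:

    import numpy as np

    def rotate_image(img, theta):
        h, w = img.shape
        out = np.zeros_like(img)
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
        c, s = np.cos(theta), np.sin(theta)
        for y in range(h):
            for x in range(w):
                # inverse mapping: where did this output pixel come from?
                xs = c * (x - cx) + s * (y - cy) + cx
                ys = -s * (x - cx) + c * (y - cy) + cy
                xi, yi = int(round(xs)), int(round(ys))
                if 0 <= xi < w and 0 <= yi < h:
                    out[y, x] = img[yi, xi]   # nearest-neighbor interpolation
        return out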
geon: GEometrical iON. A basic volumetric primitive proposed by Biederman and used in recognition by components. Some example geons are:

[Figure: example geons.]

gesture analysis: Basic analysis of video data representing human gestures, preceding the task of gesture recognition.

gesture recognition: The recognition of human gestures, generally for the purpose of human–computer interaction. See also hand sign recognition.

[Figure: "HI" and "STOP" gestures.]

Gibbs sampling: A method for probabilistic inference based on transition probabilities (between states).

GIF: Graphics Interchange Format. A common compressed image format based on the Lempel–Ziv–Welch algorithm.

GIS: See geographic information system.

glint: A specular reflection visible on a mirror-like surface.

[Figure: a glint on a curved surface.]

global: A global property of a mathematical object is one that depends on all components of the object. For example, the average intensity of an image is a global property, as it depends on all the image pixels.

global positioning system (GPS): A system of satellites that allow the position of a GPS receiver to be determined in absolute Earth-referenced coordinates. Accuracy of standard civilian GPS is of the order of meters. Greater accuracy is obtainable using differential GPS.

global structure extraction: Identification of high level structures/relationships in an image (e.g., symmetry detection).

global transform: A general term describing an operator that transforms an image into some other space. Sample global transforms include the discrete cosine transform, the Fourier transform, the Haar transform, the Hadamard transform, the Hartley transform, histograms, the Hough transform, the Karhunen–Loeve transform, the Radon transform, and the wavelet transform.
golden template: An image of an unflawed object/scene that is used within template matching to identify any deviations from the ideal object/scene.

gradient: Rate of change. This is frequently associated with edge detection. See also gray scale gradient.

[Figure: intensity and its gradient along an image row.]

gradient based flow estimation: Estimation of the optical flow based on gradient images. This computation can be done directly through the computation of a time derivative as long as the movement between frames is quite small. See also the aperture problem.

gradient descent: An iterative method for finding the (local) minimum of a function.

gradient edge detection: Edge detection based on image gradients.

gradient filter: A filter that is convolved with an image to create an image in which every point represents the gradient in the original image in an orientation defined by the filter. Normally two orthogonal filters are used, and by combining these a gradient vector can be determined for every point. Common filters include the Roberts cross gradient operator, the Prewitt gradient operator and the Sobel gradient operator. The Sobel horizontal gradient operator gives:

[Figure: an image convolved with the Sobel kernel

    -1 0 1
    -2 0 2
    -1 0 1

producing a vertical-edge image.]

gradient image: See edge image.

gradient magnitude thresholding: Thresholding of a gradient image in order to identify 'strong' edge points.

gradient matching stereo: An approach to stereo matching in which the image gradients (or features derived from the image gradients) are matched.

gradient operator: An image processing operator that produces a gradient image from a gray scale input image I. Depending on the usage of the term, the output could be 1) the vectors ∇I of the x and y derivatives at each point or 2) the magnitudes of these gradient vectors. The usual role of the gradient operator is to locate regions of strong gradients that signal the position of an edge. The figure below shows a gray scale image and its gradient magnitude image, where darker lines indicate stronger magnitudes. The gradient was calculated using the Sobel operator.

[Figure: a gray scale image and its gradient magnitude image.]
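The gradient filter and gradient operator entries above can be made concrete in a few lines (a sketch of ours; image borders are left at zero for brevity):

    import numpy as np

    SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    SOBEL_Y = SOBEL_X.T

    def convolve3x3(img, k):
        out = np.zeros(img.shape)
        kf = k[::-1, ::-1]                 # flip: convolution, not correlation
        for y in range(1, img.shape[0] - 1):
            for x in range(1, img.shape[1] - 1):
                out[y, x] = np.sum(img[y-1:y+2, x-1:x+2] * kf)
        return out

    def sobel_magnitude(img):
        gx = convolve3x3(img, SOBEL_X)     # horizontal gradient
        gy = convolve3x3(img, SOBEL_Y)     # vertical gradient
        return np.sqrt(gx ** 2 + gy ** 2)  # gradient magnitude at each pixel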
gradient space: A representation of surface orientations in which each orientation is represented by a pair (p, q), where p = ∂z/∂x and q = ∂z/∂y (and the z axis is aligned with the optical axis of the viewing device).

[Figure: vectors representing various surface orientations and the corresponding points in gradient space.]

gradient vector: A vector describing the magnitude and direction of maximal change on an N-dimensional surface.

graduated non-convexity: An algorithm for finding a global minimum in a function that has many sharp local minima (a non-convex function). This is achieved by approximating the function by a convex function with just one minimum (near the global minimum of the non-convex function) and then gradually improving the approximation.

grammar: A system of rules constraining the way in which primitives (such as words) can be combined. Used in computer vision to represent objects where the primitives are simple shapes, textures or features.

grammatical representation: A representation that describes shapes using a number of primitives that can be combined using a particular set of rules (the grammar).

granulometric spectrum: The resultant distribution from a granulometry.

granulometry: The study of the size characteristics of a set (e.g., the size of a set of regions). Most normally this is achieved by applying a series of morphological openings (with structured elements of increasing size) and then studying the resultant size distributions.

graph: A graph is formed by a set of vertices V and a set of edges E ⊆ V × V linking pairs of vertices. Vertices u and v are neighbors if (u, v) ∈ E or (v, u) ∈ E. See graph isomorphism, subgraph isomorphism.
This is a graph with five nodes:

[Figure: a graph with five nodes.]

graph cut: A partition of the vertices of a directed graph V into two disjoint sets S and T. The cost of the cut is the sum of the costs of all the edges that go from a vertex in S to a vertex in T.

graph isomorphism: Two graphs are isomorphic if there exists a mapping (bijection) between their vertices that makes the edge sets identical. Determining whether two graphs are isomorphic is the graph isomorphism problem and is believed to be NP-complete. These small graphs are isomorphic with A:b, C:a, B:c:

[Figure: two isomorphic three-node graphs, one with vertices A, B, C and one with vertices a, b, c.]

graph matching: A general term describing techniques for comparing two graph models. These techniques may attempt to find graph isomorphisms, subgraph isomorphisms, or may just try to establish similarity between graphs.

[Figure: two graph models being compared.]

graph model: A model of data in terms of a graph. Typical uses in computer vision include object representation (see graph matching) and edge gradients (see graph searching).

graph partitioning: The operation of splitting a graph into subgraphs satisfying some criteria. For example we might want to partition a graph of all polygonal edge segments in an image into subgraphs corresponding to objects in the scene.

graph representation: See graph model.

graph searching: Search for a specific node or path through a graph. Used, among other things, for border detection (e.g., in an edge gradient image) and object identification (e.g., decision trees).

graph similarity: The degree to which two graph representations are similar. Typically (in computer vision) these representations will not be exactly the same and hence a double subgraph isomorphism may need to be found to evaluate similarity.

graph theoretic clustering: Clustering algorithms that use concepts from graph theory, in particular leveraging efficient graph-theoretic algorithms such as maximum flow.
grassfire algorithm: A technique for finding a region skeleton based on wave propagation. A virtual fire is lit on all region boundaries and the skeleton is defined by the intersection of the wave fronts.

[Figure: fire fronts propagating inwards from an object's boundary over time, meeting at the skeleton.]

grating: See diffraction grating.

gray level …: See gray scale ….

gray scale: A monochromatic representation of the value of a pixel. Typically this represents image brightness and ranges from 0 (black) to 255 (white).

[Figure: a gray scale ramp from 0 to 255.]

gray scale co-occurrence: The occurrence of two particular gray levels some particular distance and orientation apart. Used in co-occurrence matrices.

gray scale correlation: The cross correlation of gray scale values in image windows or full images.

gray scale distribution model: A model of how gray scales are distributed in some image region. See also intensity histogram.

gray scale gradient: The rate of change of the gray levels in a gray scale image. See also edge, gradient image and first derivative filter.

gray scale image: A monochrome image in which pixels typically represent brightness values ranging from 0 to 255. See also gray scale.

gray scale mathematical morphology: The application of mathematical morphology to gray scale images. Each quantization level is treated as a distinct set where pixels are members of the set if they have a value greater than or equal to particular quantization levels.

gray scale moment: A moment that is based on image or region gray scales. See also binary moment.

gray scale morphology: See gray scale mathematical morphology.

gray scale similarity: See gray scale correlation.
gray scale texture moment: A moment that describes texture in a gray scale image (e.g., the Haralick texture operator describes image homogeneity).

gray scale transformation: A general term describing a class of image processing operations that apply to gray scale images, and simply manipulate the gray scale of pixels. Example operations include contrast stretching and histogram equalization.

gray value …: See gray scale ….

greedy search: A search algorithm seeking to maximize a local criterion instead of a global one. Greedy algorithms sacrifice generality for speed. For instance, the stable configuration of a snake is typically found by an iterative energy minimization. The snake configuration at each step of the optimization can be found globally, by searching the space of all allowed configurations of all pixels simultaneously (a large space), or locally (greedy algorithm), by searching the space of all allowed configurations of each pixel individually (a much smaller space).

grey …: See gray ….

grid filter: An approach to noise reduction where a nonlinear function of features (pixels or averages of a number of pixels) from the local neighborhood is used. Grid filters require a training phase where noisy data and corresponding ideal data are presented.

ground following: See ground tracking.

ground plane: The horizontal plane that corresponds to the ground (the surface on which objects stand). This concept is only really useful when the ground is roughly flat. The ground plane is highlighted here:

[Figure: a scene with the ground plane highlighted.]

ground tracking: A loosely defined term describing the robot navigation problem of sensing the ground plane and following some path.

ground truth: In performance analysis, the true value, or the most accurate value achievable, of the output of a specific instrument under analysis, for instance a vision system measuring the diameter of circular holes. Ground truth values may be known theoretically, e.g., from formulae, or obtained through an instrument more accurate than the one being evaluated.

grouping: 1) In human perception, the tendency to perceive certain patterns or clusters of stimuli as a coherent, distinct entity as opposed to a set of independent elements. 2) A whole class of segmentation algorithms is based on this idea. Much of this work was inspired by the Gestalt school of psychology. See also segmentation, image segmentation, supervised classification, and clustering.
independent elements. 2) A whole class of segmentation algorithms is based on this idea. Much of this work was inspired by the Gestalt school of psychology. See also segmentation, image segmentation, supervised classification, and clustering.

grouping transform: An image analysis technique for grouping image features together (e.g., based on collinearity, etc.).
H

Haar transform: A wavelet transform that is used in image compression. The basis functions used are similar to those used by first derivative edge detectors, resulting in images that are decomposed into horizontal, diagonal and vertical edges at different scales.

Hadamard transform: A transformation that can be used to transform an image to its constituent Hadamard components. A fast version of the algorithm exists that is similar to the fast Fourier transform, but all values in the basis functions are either +1 or −1. It requires significantly less computation and as such is often used for image compression.

halftoning: See dithering.

Hamming distance: The number of different bits in corresponding positions in two bit strings. For instance, the Hamming distance of 01110 and 01100 is 1, that of 10100 and 10001 is 2. A very important concept in digital communications.
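A small Python illustration using the two examples above (a hypothetical helper; equal-length bit strings assumed):

    def hamming(a, b):
        # count positions where corresponding bits differ
        assert len(a) == len(b)
        return sum(ca != cb for ca, cb in zip(a, b))

    assert hamming("01110", "01100") == 1
    assert hamming("10100", "10001") == 2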


hand sign recognition: The recognition of hand gestures such as those used in sign language. [Figure: hand signs for the letters H and I.]

hand tracking: The tracking of a person's hand in a video sequence, often for use in human–computer interaction.

hand–eye calibration: The calibration of a manipulator (such as a robot arm) together with a visual system (such as a number of cameras). The main issue here is ensuring that both systems use the same frame of reference. See also camera calibration.

hand–eye coordination: The use of visual feedback to direct the movement of a manipulator. See also hand–eye calibration.

handwriting verification: Verification that the style of handwriting corresponds to that of some particular individual.

handwritten character recognition: The automatic recognition of characters that have been written by hand.

Hankel transform: A simplification of the Fourier transform for radially symmetric functions.

hat transform: See Laplacian of Gaussian (also known as Mexican hat operator) and/or top hat operator.

Harris corner detector: A corner detector where a corner is detected if the eigenvalues of the matrix M are large and locally maximum (f(i, j) is the intensity at point (i, j)):

M = \begin{pmatrix} \langle f_i f_i \rangle & \langle f_i f_j \rangle \\ \langle f_i f_j \rangle & \langle f_j f_j \rangle \end{pmatrix}

To avoid explicit computation of the eigenvalues, the local maxima of det(M) − 0.04 × trace(M)² can be used. This is also known as the Plessey corner finder.
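A compact NumPy/SciPy sketch of the corner response (one plausible rendering of the definition above, not the original implementation; the 5 × 5 averaging window and k = 0.04 are conventional choices):

    import numpy as np
    from scipy.ndimage import sobel, uniform_filter

    def harris_response(img, k=0.04):
        f = img.astype(float)
        fi, fj = sobel(f, axis=0), sobel(f, axis=1)   # image derivatives
        # windowed averages give the entries <fi fi>, <fi fj>, <fj fj> of M
        a = uniform_filter(fi * fi, size=5)
        b = uniform_filter(fi * fj, size=5)
        c = uniform_filter(fj * fj, size=5)
        det, trace = a * c - b * b, a + c
        return det - k * trace ** 2   # corners are large local maxima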
Hartley transform: Similar transform to the Fourier transform, but the coefficients used are real (whereas those used in the Fourier transform are complex).

Hausdorff distance: A measure of the distance between two sets of (image) points. For every point in both sets determine the minimum distance to any point in the other set. The Hausdorff distance is the maximum of these minimum values.
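A direct (quadratic-time) NumPy sketch of this definition:

    import numpy as np

    def hausdorff(A, B):
        """Hausdorff distance between point sets given as rows of A and B."""
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
        return max(d.min(axis=1).max(),   # worst minimum distance, A to B
                   d.min(axis=0).max())   # worst minimum distance, B to A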
HDTV: High Definition TeleVision.

height image: See range image.

Helmholtz reciprocity: An observation by Helmholtz about the bidirectional reflectance distribution function f_r(i, e) of a local surface patch, where i and e are the incoming and outgoing light rays respectively. The observation is that the reflectance is symmetric about the incoming and outgoing directions, i.e., f_r(i, e) = f_r(e, i).

Hessian: The matrix of second derivatives of a scalar function of several variables. It can be used to design an orientation-dependent second derivative edge detector:

H = \begin{pmatrix} \frac{\partial^2 f(i,j)}{\partial i^2} & \frac{\partial^2 f(i,j)}{\partial i \partial j} \\ \frac{\partial^2 f(i,j)}{\partial j \partial i} & \frac{\partial^2 f(i,j)}{\partial j^2} \end{pmatrix}

heterarchical/mixed control: An approach to system control where control is shared amongst several systems.
heuristic search: A search process that employs common-sense rules (heuristics) to speed up search.

hexagonal image representation: An image representation where the pixels are hexagonal rather than rectangular. This representation might be used because 1) it is similar to the human retina or 2) the distances to all adjacent pixels are equal, unlike diagonally connected pixels in rectangular grids. [Figure: hexagonal sampling grid.]

hidden Markov model (HMM): A model for predicting the probability of system state on the basis of the previous state together with some observations. HMMs have been used extensively in handwritten character recognition.

hierarchical: A general term referring to the approach of considering data at a low level of detail initially and then gradually increasing the level of detail. This approach often results in better performance.

hierarchical clustering: An approach to grouping in which each item is initially put in a separate cluster, the two most similar clusters are merged and this merging is repeated until some condition is satisfied (e.g., no clusters of less than a particular size remain).

hierarchical coding: Coding of (image) data at multiple layers starting with the lowest level of detail and gradually increasing the resolution. See also hierarchical image compression.

hierarchical Hough transform: A technique for improving the efficiency of the standard Hough transform. Commonly used to describe any Hough-based technique that solves a sequence of problems beginning with a low-resolution Hough space and proceeding to high-resolution space, or using low-resolution images, or operating on subimages of the input image before combining the results.

hierarchical image compression: Image compression using hierarchical coding. This leads to the concept of progressive image transmission.

hierarchical matching: Matching at increasingly greater levels of detail. This approach can be used when matching images or more abstract representations.

hierarchical model: A model formed by smaller submodels, each of which may have further smaller submodels. The model may contain multiple instances of the subcomponent models. The subcomponents may be placed relative to the model
by using a coordinate system transformation or may just be listed in a set structure. This is a three-level hierarchical model with multiple usage of the subcomponents: [Figure: a three-level hierarchical model.]

hierarchical recognition: See hierarchical matching.

hierarchical texture: A way of considering texture elements at multiple levels (e.g., basic texture elements may themselves be grouped together to form a texture element at another scale, and so on).

hierarchical thresholding: A thresholding technique where an image is considered at different levels of detail in a pyramid data structure, and thresholds are identified at different levels in the pyramid starting at the highest level.

high level vision: A general term referring to image analysis and understanding tasks (i.e., those tasks that address reasoning about what is seen, as opposed to basic processing of images).

high pass filter: A frequency domain filter that removes or suppresses all low-frequency components.

highlight: See specular reflection.

histogram: A representation of the frequency distribution of some values. See intensity histogram, an example of which is shown below. [Figure: a histogram with frequency from 0 to 600 over gray scale values 0 to 255.]

histogram analysis: A general term describing a group of techniques that abstract information from histograms (e.g., determining the anti-mode/trough in a bi-modal histogram for use in thresholding).

histogram equalization: An image enhancement operation that processes a single image and results in an image with a uniform distribution of intensity levels (i.e., whose intensity histogram is flat). When this technique is applied to a digital image, however, the resulting histogram will often have large values interspersed with zeros.
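A minimal NumPy sketch for an 8-bit image (the cumulative-histogram mapping is the standard construction; normalization details vary between implementations):

    import numpy as np

    def equalize(img):
        """Remap gray levels so the cumulative histogram is roughly linear."""
        hist = np.bincount(img.ravel(), minlength=256)
        cdf = hist.cumsum()
        lut = np.round(255.0 * (cdf - cdf.min()) / (cdf.max() - cdf.min()))
        return lut.astype(np.uint8)[img]   # look up the new value of each pixel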
histogram modeling: A class of techniques, such as histogram equalization, modifying the dynamic range and contrast of an image by changing its intensity histogram into one with desired properties.
histogram modification: See histogram modeling.

histogram moment: A moment derived from a histogram.

histogram smoothing: The application of a smoothing filter (e.g., Gaussian smoothing) to a histogram. This is often required before histogram analysis operations can be applied. [Figure: a histogram before and after smoothing.]

hit and miss/hit or miss operator: A morphological operation where a new image is formed by ANDing (logical AND) together corresponding bits for every pixel of an input image and a structuring element. This operator is most appropriate for binary images but may also be applied to gray scale images.

HK: See mean and Gaussian curvature shape classification.

HK segmentation: See mean and Gaussian curvature shape classification.

HMM: See hidden Markov model.

holography: The process of creating a three dimensional image (a hologram) by recording the interference pattern produced by coherent laser light that has been passed through a diffraction grating.

homogeneous, homogeneity: 1. (Homogeneous coordinates:) In projective n-dimensional geometry, a point is represented by an n + 1 element vector, with the Cartesian representation being found by dividing the first n components by the last one. Homogeneous quantities such as points are equal if they are scalar multiples of each other. For example a 2D point is represented as (x, y) in Cartesian coordinates and in homogeneous coordinates by the point (x, y, 1) and any multiple thereof. 2. (Homogeneous texture:) A two (or higher) dimensional pattern, defined on a space S ⊂ R², for which some functions (e.g., mean, standard deviation) applied to a window on S have values that are independent of the position of the window.

homogeneous coordinates: Points described in projective space. For example an (x, y, z) point in Euclidean space would be described as (λx, λy, λz, λ) for any λ in homogeneous coordinates.
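A tiny NumPy illustration of the conversion and the scale-equivalence property:

    import numpy as np

    def to_homogeneous(p):
        return np.append(p, 1.0)        # (x, y, z) -> (x, y, z, 1)

    def from_homogeneous(q):
        return q[:-1] / q[-1]           # divide the first n components by the last

    p = np.array([2.0, 3.0, 4.0])
    q = 5.0 * to_homogeneous(p)         # any nonzero multiple is the same point
    assert np.allclose(from_homogeneous(q), p)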
homogeneous representation: A representation defined in projective space.

homography: The relationship described by a homography transformation.

homography transformation: Any invertible linear transformation between projective spaces. It is commonly used for image transfer, which maps one planar image or region to another. The transformation can be estimated using four non-collinear point pairs.

homomorphic filtering: An image enhancement technique that simultaneously normalizes brightness and enhances contrast. It works by applying a high pass filter to the original image in the frequency domain, hence reducing intensity variation (which changes slowly) and highlighting reflection detail (which changes rapidly).

homotopic transformation: A continuous deformation that preserves the connectivity of object features (e.g., skeletonization). Two objects are homotopic if they can be made the same by some series of homotopic transformations.

Hopfield network: A type of neural network mainly used in optimization problems, which has been used in object recognition.

horizon line: The line defined by all vanishing points from the same plane. The most commonly used horizon line is that associated with the ground plane. [Figure: vanishing points and the horizon line.]

Hough transform: A technique for transforming image features directly into the likelihood of occurrence of some shape. For example see Hough transform line finder and generalized Hough transform.

Hough transform line finder: A version of the Hough transform based on the parametric equation of a line (s = i cos θ + j sin θ) in which a set of edge points (i, j) is transformed into the likelihood of a line being present as represented in a (s, θ) space. The likelihood is quantified, in practice, by a histogram of the (s, θ) values observed in the images. [Figure: an edge image and the significant lines found.]
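A bare-bones NumPy sketch of the voting scheme (an illustrative accumulator only; practical versions add edge-direction information and peak detection):

    import numpy as np

    def hough_lines(edge_points, image_shape, n_theta=180):
        """Accumulate (s, theta) votes for edge points (i, j)."""
        diag = int(np.ceil(np.hypot(*image_shape)))
        acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        for i, j in edge_points:
            s = np.round(i * np.cos(thetas) + j * np.sin(thetas)).astype(int)
            acc[s + diag, np.arange(n_theta)] += 1   # one vote per angle
        return acc, thetas   # peaks in acc indicate likely lines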
HSI: Hue-Saturation-Intensity color image format.

HSL: Hue-Saturation-Luminance color image format (see plate section for a colour version of these figures).
[Figure: a color image decomposed into hue, saturation and luminance components.]

HSV: Hue Saturation Value color image format.

hue: Describes color using the dominant wavelength of the light. Hue is a common component of color image formats (see HSI, HSL, HSV).

Hueckel edge detector: A parametric edge detector that models an edge using a parameterized model within a circular window (the parameters are edge contrast, edge orientation, distance and background mean intensity).

Huffman encoding: An optimal, variable-length encoding of values (e.g., pixel values) based on the relative probability of each value. The code lengths may change dynamically if the relative probabilities of the data source change. This technique is commonly used in image compression.
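A compact heap-based sketch in Python (illustrative only; real image coders operate on symbol statistics and emit bit streams):

    import heapq
    from collections import Counter

    def huffman_code(values):
        """Map each value to a prefix-free bit string; frequent values get short codes."""
        heap = [(n, i, {v: ""}) for i, (v, n) in enumerate(Counter(values).items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            n0, _, c0 = heapq.heappop(heap)
            n1, _, c1 = heapq.heappop(heap)
            merged = {v: "0" + c for v, c in c0.items()}
            merged.update({v: "1" + c for v, c in c1.items()})
            heapq.heappush(heap, (n0 + n1, count, merged))
            count += 1
        return heap[0][2]

    print(huffman_code([0, 0, 0, 0, 255, 255, 17]))  # e.g. {17: '00', 255: '01', 0: '1'}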
human motion analysis: A general term describing the application of motion analysis to human subjects. Such analysis is used to track moving people, to recognize the pose of a person and to derive 3D properties.

HYPER: HYpothesis Predicted and Evaluated Recursively. A well known vision system developed by Nicholas Ayache and Olivier Faugeras, in which geometric relations derived from polygonal models are used for recognition.

hyperbolic surface region: A region of a 3D surface that is locally saddle-shaped. A point on a surface at which the Gaussian curvature is negative (so the signs of the principal curvatures are opposite). [Figure: a saddle-shaped surface patch.]

hyperfocal distance: The distance D at which a camera should be focused in order that the depth of field extends from D/2 to infinity. Equivalently, if a camera is focused at a point at distance D, points at D/2 and infinity are equally blurred.
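A numerical illustration using the common optics approximation H ≈ f²/(Nc) + f (f = focal length, N = f-number, c = acceptable blur circle; this formula is standard background knowledge, not taken from the entry itself):

    def hyperfocal_mm(f_mm, f_number, c_mm=0.03):
        # focus here and the depth of field runs from about H/2 to infinity
        return f_mm ** 2 / (f_number * c_mm) + f_mm

    print(hyperfocal_mm(50, 8))   # ~10467 mm, i.e. about 10.5 m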
hyperquadric: A class of volumetric shape representations that include superquadrics. Hyperquadric models can describe arbitrary convex polyhedra.

hyperspectral image: An image with a large number (perhaps hundreds) of spectral bands. An image with a lower number of spectral bands is referred to as a multi-spectral image.

hyperspectral sensor: A sensor capable of collecting many (perhaps hundreds of) spectral bands simultaneously. Produces a hyperspectral image.
hypothesize and test: See hypothesize and verify.

hypothesize and verify: A common approach to object recognition in which possibilities (of object type and pose) are hypothesized and then evaluated against evidence from the images. This is done either until all possibilities are considered or until a hypothesis with a sufficiently high degree of fit is found. [Figure: a jigsaw example showing possible hypotheses for "What piece goes here?" and hypotheses that need not be considered in a 3 by 3 jigsaw.]

hysteresis tracking: See thresholding with hysteresis.
I

ICA: See independent component analysis.

iconic: Having the characteristics of an image. See iconic model.

iconic model: A representation having the characteristics of an image. For example the template used in template matching.

iconic recognition: Object recognition using iconic models.

ICP: See iterative closest point.

ideal line: A line described in the continuous domain as opposed to one in a digital image, which will suffer from rasterization.

ideal point: A point described in the continuous domain as opposed to one in a digital image, which will suffer from rasterization. May also be used to refer to a vanishing point.

IDECS: Image Discrimination Enhancement Combination System. A well-known vision system developed by Haralick and Currier.

identification: The process of associating some observations with a particular instance or class of object that is already known.

identity verification: Confirmation of the identity of a person based on some biometrics (e.g., face authentication). This differs from the recognition of an unknown person in that only one model has to be compared with the information that is observed.

IGS: Interpretation Guided Segmentation. A vision technique for grouping image elements into regions based on semantic interpretations in addition to raw image values. Developed by Tenenbaum and Barrow.

IHS: Intensity Hue Saturation color image format.

IIR: See infinite impulse response filter.


ill-posed problem: A mathematical problem that infringes at least one of the conditions in the definition of a well-posed problem. Informally, these are that the solution must (a) exist, (b) be unique, and (c) depend continuously on the data. Ill-posed problems in computer vision have been approached using regularization theory. See regularization.

illuminance: The total amount of visible light incident upon a point on a surface. Measured in lux (lumens per meter squared) or footcandles (lumens per foot squared). Illuminance decreases as the distance between the viewer and the source increases.

illuminant direction: The direction from which illuminance originates. See also light source geometry.

illumination: See illuminance.

illumination constancy: The phenomenon that allows humans to perceive the lightness/brightness of surfaces as approximately constant regardless of the illuminance.

illumination field calibration: Determination of the illuminance falling on a scene. Typically this is done by taking an image of a white object of known brightness.

illusory contour: A perceived border where there is no edge present in the image data. See also subjective contour. For example the following diagram shows the Kanizsa triangles. [Figure: Kanizsa triangles.]

image: A function describing some quantity (such as brightness) in terms of spatial layout (see image representation). Most frequently computer vision is concerned with two dimensional digital images.

image addition: See pixel addition operator.

image analysis: A general term covering all forms of analysis of image data. Generally image analysis operations result in a symbolic description of the image contents.

image acquisition: See image capture.

image arithmetic: A general term covering image processing operations that are based on the application of an arithmetic or logical operator to two images. Such operations include addition, subtraction, multiplication, division, blending, AND, NAND, OR, XOR, and XNOR.

image based: A general term describing operations or representations that are based on images.

image based rendering: The production of a new image of
a scene from an arbitrary viewpoint based on a number of images of the scene together with associated range images.

image blending: An arithmetic operation similar to image addition where a new image is formed by blending the values of corresponding pixels from two input images. Each input image is given a weight for the blending so that the total weight is 1.0. [Figure: two images blended with weights 0.7 and 0.3.]

image capture: The acquisition of an image by a recording device, e.g., a camera.

image coding: The mapping or algorithm required to encode or decode an image representation (such as a compressed image).

image compression: A method of representing an image in order to reduce the amount of storage space that it occupies. Techniques can be lossless (which allows all image data to be recorded perfectly) or lossy (where some loss of quality is allowed, typically resulting in significantly better compression rates).

image connectedness: See pixel connectivity.

image coordinates: See image plane coordinates and pixel coordinates.

image database indexing: The technique of associating indices (for example, keywords) with images that allows the images to be indexed efficiently within a database.

image difference: See image subtraction.

image digitization: The process of sampling and quantizing an analogue image function to create a digital image.

image distortion: Any effect that alters an image from the ideal image. Most typically this term refers to geometric distortions, although it can also refer to other types of distortion such as image noise and effects of sampling and quantization. [Figure: a correct image and a distorted image.]

image encoding: The process of converting an image into a different representation. For example see image compression.

image enhancement: A general term covering a number of image processing operations that alter an image in order to make it easier for humans to perceive. Example operations include contrast stretching and histogram equalization. For example, the following shows a histogram equalization operation:
[Figure: an image before and after histogram equalization.]

image feature: A general term for an interesting image structure that could arise from a corresponding interesting scene structure. Features can be single points such as interest points, curve vertices, image edges, lines or curves or surfaces, etc.

image feature extraction: A group of image processing techniques concerned with the identification of particular features in an image. Examples include edge detection and corner detection.

image flow: See optic flow.

image formation: A general term covering issues relating to the manner in which an image is formed. For example in the case of a digital camera this term would include the camera geometry as well as the process of sampling and quantization.

image grid: A geometric map describing the image sampling in which every image point is represented by a vertex (or hole) in the map/grid.

image indexing: See image database indexing.

image intensifier: A device for amplifying an image, so that the resultant sensed luminous flux is significantly higher.

image interleaving: Describes the way in which image pixels are organized. Different possibilities include pixel interleaving (where the image data is ordered by pixel position), and band interleaving (where the image data is ordered by band, and is then ordered by pixel position within each band).

image interpolation: A method for computing a value for a pixel in an output image based on non-integer coordinates in some input image. The computation is based on the values of nearby pixels in the input image. This type of operation is required for most geometric transformations and computations requiring sub-pixel resolution. Types of interpolation scheme include nearest-neighbor interpolation, bilinear interpolation, bicubic interpolation, etc. [Figure: the result of interpolation in image enlargement, using bicubic interpolation.]
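A minimal bilinear example in Python (one of the interpolation schemes mentioned above; it assumes the query point lies inside the image):

    import numpy as np

    def bilinear(img, y, x):
        """Weighted average of the four pixels surrounding non-integer (y, x)."""
        y0, x0 = int(np.floor(y)), int(np.floor(x))
        dy, dx = y - y0, x - x0
        p = img[y0:y0 + 2, x0:x0 + 2].astype(float)
        return ((1 - dy) * (1 - dx) * p[0, 0] + (1 - dy) * dx * p[0, 1]
                + dy * (1 - dx) * p[1, 0] + dy * dx * p[1, 1])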
image interpretation: A general term for computer vision processes that extract descriptions from images (as opposed to processes that produce output images for human viewing).
128
There is often the assump- Magnified image (x4)
tion that the descriptions
are very high-level, e.g., “the
boy is walking to the store
carrying a book” or “these
cells are cancerous”. A broader
definition would also allow
processes that extract informa-
tion needed by a subsequent
(usually non-image processing) image matching: The compari-
activity, e.g., the position of a son of two images, often
bright spot in an image. evaluated using cross cor-
image invariant: An image fea- relation. See also template
ture or measurement image matching.
that is invariant to some prop-
erties. For example invariant
Image 1
color features are often used in
image database indexing.
image irradiance equation:
Usually expressed as Ex y =
R p q, this equality (up to a
constant scale factor to account
for illumination strength, sur- Image 2
face color and optical effi-
ciency) says that the observed
brightness E at pixel x y is
equal to the reflectance R of
the surface for surface normal
 p q −1. Usually there is a Locations where Image 2 matches Image 1
one-degree-of-freedom family
of surface normals with the
same reflectance value so the
observed brightness only par-
tially constrains local surface
orientation and thus shape. image memory: Seeframe store.
image magnification: The ex- image morphing: A gradual
tent to which an image is ex- transformation from one image
panded for viewing. If the to another image.
image size is actually changed
then image interpolation must
be used. Normally quoted
relative to the original size
(e.g., ×2 ×10, etc.).
129
image morphology: An approach to image processing that considers all operations in terms of set operations. See mathematical morphology.

image mosaic: A composition of several images, to provide a single larger image covering a wider field of view. For example, the following is a mosaic of three images: [Figure: a mosaic of three images.]

image motion estimation: Computation of optical flow for all pixels/features in an image.

image multiplication: See pixel multiplication operator.

image noise: Degradation of an image where pixels have values which are different from the ideal values. Often noise is modeled as having a Gaussian distribution with a zero mean, although it can take on different forms such as salt-and-pepper noise depending upon the cause of the noise (e.g., the environment, electrical interference, etc.). Noise is measured in terms of the signal-to-noise ratio. [Figure: an original image, the image with Gaussian noise, and the image with salt-and-pepper noise.]

image modality: A general term for the sensing technique used to capture an image, e.g., a visible light, infrared or X-ray image.

image normalization: The purpose of image normalization is to reduce or eliminate the effects of different illumination on the same or similar scenes. A typical approach is to subtract the mean of the image and divide by the standard deviation, which produces a zero mean, unit variance image. Since images are not Gaussian random samples, this approach does not completely solve the problem. Further, light source placement can also cause variations in shading that are not corrected by this approach. [Figure: an original image (left) and its normalization (right).]

image of absolute conic: See absolute conic.
image pair rectification: See image rectification.

image plane: The mathematical plane behind the lens onto which an image is focused. In practice, the physical sensing surface aims to be placed here, but its position will vary slightly due to minor variations in sensor shape and placement. The term is also used to describe the geometry of the image recorded at this location. [Figure: lens, optical axis and image plane.]

image plane coordinates: The position of points in the physical image sensing plane. These have physically meaningful values, such as centimeters. These can be converted to pixel coordinates, which are in pixels. The two meanings are sometimes used interchangeably.

image processing: A general term covering all forms of processing of captured image data. It can also mean processing that starts from an image and results in an image, as contrasted to ending with symbolic descriptions of the image contents or scene.

image processing operator: A function that may be applied to an image in order to transform it in some way. See also image processing.

image pyramid: A hierarchical image representation in which each level contains a smaller version of the image at the previous level. Often pixel values are obtained by a smoothing process. Usually the reduction is by a power of two (i.e., 2 or 4). The figure below shows four levels of a pyramid in which each level is formed by averaging together two pixels from the previous layer. The levels are enlarged to the original image size for inspection of the effect of the compression. [Figure: four levels of an image pyramid.]
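A short NumPy sketch that builds such a pyramid by 2 × 2 block averaging (simple averaging is one smoothing choice among several; Gaussian smoothing is also common):

    import numpy as np

    def pyramid(img, levels):
        out = [img.astype(float)]
        for _ in range(levels - 1):
            a = out[-1]
            h, w = (a.shape[0] // 2) * 2, (a.shape[1] // 2) * 2
            a = a[:h, :w]   # trim odd rows/columns
            out.append((a[0::2, 0::2] + a[0::2, 1::2] +
                        a[1::2, 0::2] + a[1::2, 1::2]) / 4.0)
        return out   # out[i+1] is half the resolution of out[i]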
image quality: A general term, usually referring to the extent to which the image data records the observed scene faithfully. The specific issues that are important to image quality are problem specific, but may include low image noise, high image contrast, good image focus, low motion blur, etc.

image querying: A shorthand term for indexing into image databases. This is often done based on color, texture or shape indices. The database keys could be based on global or local measures.
image reconstruction: A term used in image compression to describe the process of recreating a digital image from some compressed form.

image rectification: A warping of a stereo pair of images such that conjugate epipolar lines (defined by the two cameras' epipoles and any 3D scene point) are collinear. Usually the lines are transformed to be parallel to the horizontal axis so that corresponding image features can be found on the same raster line. This reduces the computational complexity of the stereo correspondence problem.

image registration: See registration.

image representation: A general term for how the image data is represented. Image data can be one, two, three or more dimensional. Image data is often stored in arrays where the spatial layout of the array reflects the spatial layout of the data. The figure below shows a small 10 × 10 pixel image patch with the gray scale values for the corresponding pixels:

    123 123 123 123 123 123 123 123  96  96
    123 123 112  96  96 123 123 123 123  96
    123 123  96  96 112 123 137 123 123  96
    123 123  96  96 123 214 234 178 123  96
    123 100  72 109 178 230 230 137 123  96
    125  78  51 142 218 178  96  76  96  96
     92 100  92  92  81  76  76  96 123 123
     81 109 129 129 100  81  92 123 123 123
     51 109 142 137 123 123 123 123 123 123
     33  76 123 123 137 137 123 123 123 123

image resolution: Usually used to record the number of pixels in the horizontal and vertical directions in the image, but may also refer to the separation between pixels (e.g., 1 µm) or the angular separation between the lines of sight corresponding to adjacent pixels.

image restoration: The process of removing some known (and modelled) distortion from an image, such as blur in an out-of-focus image. The process may not produce a perfect image, but may remove an undesired distortion (e.g., motion blur) at the cost of another ignorable distortion (e.g., phase distortion).

image sampling: The process of measuring some pixel values from the physical image focused onto the image plane. The sampling could be monochrome, color or multi-spectral, such as RGB. The sampling usually results in a rectangular array of pixels sampled at nearly equal spacing, but other sampling could be used such as space variant sensing.

image scaling: The operation of increasing or reducing the size of an image by some scale factor. This operation may require the use of some type of image interpolation method. See also image magnification.

image segmentation: The grouping of image pixels into meaningful, usually connected, structures such as curves and regions. The term is applied to a variety of image modalities, such as intensity data or range data, and properties, such
as similar feature orientation, feature motion, surface shape or texture.

image sequence: A series of images generally taken at regular intervals in time. Typically the camera and/or objects in the scene will be moving. [Figure: frames from an image sequence.]

image sequence fusion: The integration of information from the many images in an image sequence. Different types of fusion include 3D structure recovery, production of a mosaic of the scanned scene, tracking of a moving object, improved scene imaging due to image averaging, etc.

image sequence matching: Computing the correspondence between pixels or image features in frames of the image sequence. With the correspondences, one can construct image mosaics, stabilize image jitter or recover scene structure.

image sequence stabilization: Normal hand-held video camera recordings contain some image motion due to the jitter of the human operator. Image stabilization attempts to estimate the random portion of the camera motion jitter and translate the images in the sequence to reduce or remove the jitter. A similar application would be to remove systematic camera motions to produce a motionless image. See also feature stabilization.

image sharpening operator: An image enhancement operator that increases the high spatial frequency component of the image, so as to make the edges of objects appear sharper or less blurred. See also edge enhancement. [Figure: a raw image (left) and an image sharpened with the unsharp operator (right).]

image size: The number of pixels in an image, for example, 768 horizontally by 494 vertically.

image smoothing: See noise reduction.

image stabilization: See image sequence stabilization.

image storage devices: See frame store.

image subtraction operator: See pixel subtraction operator.

image transfer: 1) See novel view synthesis. 2) Alternatively, a general term describing the
movement of an image from one device to another, or alternatively from one representation to another.

image understanding: A general term referring to the derivation of high-level (abstract) information from an image or series of images. This term is often used to refer to the emulation of human visual capabilities.

Image Understanding Environment (IUE): A C++ based collection of data-types (classes) and standard computer vision algorithms. The motivation behind the development of the IUE was to reduce the independent re-invention of basic computer vision code in government funded computer vision research.

image warping: A general term for transforming the positions of pixels in an image, usually while maintaining image topology (i.e., neighboring original pixels remain neighbors in the warped image). This results in an image with a new shape. This operation might be done, for example, to correct some geometric distortion, align two images (see image rectification), or transform shapes into a more easily processed form (e.g., circles into straight lines).

imaging geometry: A general term referring to the relative placement of sensors, structured light sources, point light sources, etc.

imaging spectroscopy: The acquisition and analysis of surface composition by using image data from multiple spectral channels. A typical sensor (AVIRIS) records 224 measurements at 10 nm increments from 400 to 2500 nm. The term might refer to the raw multi-dimensional signal or to the classification of that signal into surface types (e.g., vegetation or mineral types).

imaging surface: The surface within a camera on which the image is projected by the lens. This surface in a digital camera is comprised of photosensitive elements that record the incident illumination. See also image plane.

implicit curve: A curve that is defined by an equation of the form f(x) = 0. The curve is the set of points S = {x : f(x) = 0}.

implicit surface: The representation of a surface as the set of points that makes a function have the value zero. For example, the sphere x² + y² + z² = r² of radius r at the origin could be represented by the function f(x, y, z) = x² + y² + z² − r². The set of points where f(x, y, z) = 0 is the implicit surface.
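The sphere example in a few lines of Python (the sign of f also classifies points against the surface):

    def sphere(x, y, z, r=1.0):
        # f(x, y, z) = x^2 + y^2 + z^2 - r^2; the zero set is the surface
        return x ** 2 + y ** 2 + z ** 2 - r ** 2

    print(sphere(1.0, 0.0, 0.0))   # 0.0: on the unit sphere
    print(sphere(0.5, 0.0, 0.0))   # negative: inside the sphere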
impossible object: An object that cannot physically exist, such as: [Figure: an impossible object.]
impulse noise: A form of image corruption where image pixels have their value replaced by the maximum value (e.g., 255). See also salt-and-pepper noise. [Figure: impulse noise on an image.]

incandescent lamp: A light source whose light arises from the glowing of a very hot structure, such as a tungsten filament in the common light bulb.

incident light: A general term referring to the light that strikes or illuminates a surface.

incremental learning: Learning that is incremental in nature. See continuous learning.

independent component analysis: A multi-variate data analysis method. It finds a linear transformation that makes each component of the transformed data vectors independent of each other. Unlike principal component analysis, which considers only second order properties (covariances) and transforms onto basis vectors that are orthogonal to each other, ICA considers properties of the whole distribution and transforms onto basis vectors that need not be orthogonal.

index of refraction: The absolute index of refraction in a material is the ratio of the speed of an electromagnetic wave in a vacuum to the speed in the material. More commonly used is the relative index of refraction of two media, which is the ratio of their absolute indices of refraction. This ratio is used in lens design and explains the bending of light rays as the light passes into a new material (Snell's Law).

indexing: The process of retrieving an element from a data structure using a key. A powerful concept imported into computer vision from programming. For example, the problem of establishing the identity of an object given an image and a set of candidate models is typically approached by locating some characterizing elements in the image, or features, then using the features' properties to index a data base of models. See also model base indexing.

industrial vision: A general term covering uses of machine vision technology in industrial
processes. Applications include product inspection, process feedback, part or tool alignment. A large range of lighting and sensing techniques are used. A common feature of industrial vision systems is fast processing rates (e.g., several times a second), which may require limiting the rate at which targets are analyzed or limiting the types of processing.

infinite impulse response filter (IIR): A filter that produces an output value (y_n) based on the current and past input values (x_i) together with past output values (y_j):

y_n = \sum_{i=0}^{p} a_i x_{n-i} + \sum_{j=1}^{q} b_j y_{n-j}

where the a_i and b_j are weights.
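A direct transcription of the formula in Python (an illustrative direct-form filter, with the output assumed zero before the first sample):

    def iir(x, a, b):
        """y[n] = sum_i a[i] x[n-i] + sum_j b[j] y[n-j]."""
        y = []
        for n in range(len(x)):
            yn = sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
            yn += sum(b[j - 1] * y[n - j] for j in range(1, len(b) + 1) if n - j >= 0)
            y.append(yn)
        return y

    print(iir([1, 0, 0, 0], a=[1.0], b=[0.5]))   # impulse response: 1.0, 0.5, 0.25, 0.125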
visible by humans or 2) it is a
inflection point: A point at which the second derivative of a curve changes its sign, corresponding to a change in concavity. See also curve inflection. [Figure: a curve with its inflection point marked.]

influence function: A function describing the effect of an individual observation on a statistical model. This allows us to evaluate whether the observation is having an undue influence on the model.

information fusion: Fusion of information from multiple sources. See sensor fusion.

infrared: See infrared light.

infrared imaging: Production of an image through use of an infrared sensor.

infrared light: Electromagnetic energy with wavelengths approximately in the range 700 nm to 1 mm. Immediately shorter wavelengths are visible light and immediately longer wavelengths are microwave radio. Infrared light is often used in machine vision systems because: 1) it is easily observed by most semiconductor image sensors yet is not visible to humans or 2) it is a measure of the heat emitted by the observed scene.

infrared sensor: A sensor capable of observing or measuring infrared light.

inlier: A sample that falls within an assumed probability distribution (e.g., within the 95 percentile). See also outlier.

inspection: A general term for visually examining a target to detect defects. Common practical inspection examples include printed circuit boards for breaks or solder joint failures, paper production for holes or discolorations, and food for irregularities.

integer lifting: A method used to construct wavelet representations.
integer wavelet transform: An integer version of the discrete wavelet transform.

integral invariant: An integral (of some function) that is invariant under a set of transformations. For example, local integrals along a curve of curvature or arc length are invariant to rotation and translation. Integral invariants potentially have greater stability to noise than, e.g., differential invariants, such as curvature itself.

integration time: The length of time that a light-sensitive sensor medium is exposed to the incident light (or other stimulus). Shorter times reduce the signal strength and possible motion blur (if the sensor or objects in the scene are moving).

intensity: 1) The brightness of a light source. 2) Image data that records the brightness of the light that comes from the observed scene.

intensity based database indexing: This is a form of image database indexing that uses intensity descriptors such as histograms of pixel (monochrome or color) values or vectors of local derivative values.

intensity cross correlation: Cross correlation using intensity data.

intensity data: Image data that represents the brightness of the measured light. There is not usually a linear mapping between the brightness of the measured light and the stored values. The term can refer to the intensity of observed visible light as well.

intensity gradient: The mathematical gradient operation ∇ applied to an intensity image I gives the intensity gradient ∇I at each image point. The intensity gradient direction shows the local image direction in which the maximum change in intensity occurs. The intensity gradient magnitude gives the magnitude of the local rate of change in image intensity. These terms are illustrated below. At each of the two designated points, the length of the vector shows the magnitude of the change in intensity and the direction of the vector shows the direction of greatest change. [Figure: two gradient vectors overlaid on an image.]
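A compact NumPy sketch producing the gradient magnitude and direction at every pixel:

    import numpy as np

    def intensity_gradient(img):
        gy, gx = np.gradient(img.astype(float))       # finite-difference derivatives
        return np.hypot(gx, gy), np.arctan2(gy, gx)   # magnitude, direction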
intensity gradient direction: The local image direction in which the maximum change in intensity occurs. See also intensity gradient.

intensity gradient magnitude: The magnitude of the local rate of change in image intensity. See also intensity gradient. The image that follows shows a raw image and its intensity gradient magnitude (contrast enhanced for clarity).
[Figure: a raw image and its intensity gradient magnitude.]

intensity histogram: A data structure that records the number of pixels of each intensity value. A typical gray scale image will have pixels with values in [0,255]. Thus the histogram will have 256 entries recording the number of pixels that had value 0, the number having value 1, etc. A dark object against a lighter background and its histogram are shown here. [Figure: an image and its pixel value histogram.]

intensity image: An image that records the measured intensity data.

intensity level slicing: An image processing operation in which pixels with values other than the selected value (or range of values) are set to zero. If the image is viewed as a landscape, with height proportional to brightness, then the slicing operator takes a cross section through the height surface. The right image below shows (in black) the intensity level 80 of the left image. [Figure: an image and its intensity level 80 slice.]

intensity matching: This approach finds corresponding points in a pair of images by matching the gray scale intensity patterns. The goal is to find image neighborhoods that have nearly identical pixel intensities. All image points could be considered for matching or only feature or interest points. An algorithm where intensity matching is used is correlation based stereo matching.
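Normalized cross correlation is one standard score for comparing two such neighborhoods (a sketch; the two windows must be the same size):

    import numpy as np

    def ncc(a, b):
        """Near 1.0 for identical patterns (up to gain/offset), near 0 for unrelated ones."""
        a = a.astype(float) - a.mean()
        b = b.astype(float) - b.mean()
        return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())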
intensity sensor: A sensor that measures intensity data.

interest point: A general term for pixels that have some interesting property. Interest points are often used for making feature point correspondences between images. Thus, the points usually have some identifiable property. Further, because of the need to limit the combinatorial explosion that matching can produce, interest points are often expected to be infrequent in an image. Interest points are often points of high variation in pixel values. See also point feature. Example interest points from the Harris corner detector (courtesy of Marc Pollefeys) are seen here: [Figure: interest points marked on an image.]

interest point feature detector: An operator applied to an image to locate interest points. Well-known examples are the Moravec and the Plessey interest point operators.

interference: When 1) ordinary light interacts with matter that has dimensions similar to the wavelength of the light or 2) coherent light interacts with itself, then interference occurs. The most notable effect from a computer vision perspective is the production of interference fringes and the speckle of laser illumination. May alternatively refer to electrical interference which can affect an image when it is being transmitted on an electrical medium.

interference fringe: When optical interference occurs, the most noticeable effect it has is the production of interference fringes where the light illuminates a surface. These are parallel, roughly equally spaced, lighter and darker bands of brightness. One important consequence of these bands is blurring of the edge positions.

interferometric SAR: An enhancement of synthetic aperture radar (SAR) sensing to incorporate phase information from the reflected signal, increasing accuracy.

interior orientation: A photogrammetry term for the calibration of the intrinsic parameters of a camera, including its focal length, principal point, lens distortion, etc. This allows transformation of measured image coordinates into camera coordinates.

interlaced scanning: A technique arising from television engineering, whereby alternate rows of an image are scanned or transmitted instead of consecutive rows. Thus, one television frame is transmitted by sending first the odd rows, forming the odd field, and then the even rows, forming the even field.

intermediate representation: A representation that is created as a stage in the derivation of some other representation from some input representation. For example the raw primal sketch, full primal sketch, and 2.5D sketch were intermediate representations between input images and a
3D model in Marr's theory. In the following example a binary image of the notice board is an intermediate representation between the input image and the textual output. [Figure: input image, binary intermediate representation, and the textual output "08:32 1 Malahide".]

internal energy (or force): A measure of the stability of a shape (such as smoothness) of an active shape or deformable contour model which is part of the deformation energy. This measure is used to constrain the appearance of the model.

internal parameters (of camera): See intrinsic parameters.

inter-reflection: The reflection caused by light reflected off a surface and bouncing off another surface of the same object. See also mutual illumination.

interval tree: An efficient structure for searching in which every node in the tree is a parent to nodes in a particular interval of values.

interpolation: A mathematical process whereby a value is inferred from other nearby values or from a mathematical function linking nearby values. For example, dense values along a curve can be linearly interpolated between two known curve points by fitting a line connecting the two curve points. Image, surface and volume values can be interpolated, as well as higher dimensional structures. Interpolating functions can be curved as well as linear.

interpretation tree search: An algorithm for matching between members of two discrete sets. For each feature from the first set, it builds a depth-first search tree considering all possible matching features from the second set. After a match is found for one feature (by satisfying a set of consistency tests), then it tries to match the remaining features. The algorithm can cope when no match is possible for a given feature by allowing a given number of skipped features. Here we see an example of a partial interpretation tree that is matching model features to data features: [Figure: a partial interpretation tree matching data features DATA 1 to 4 against model features M1 to M3, where X marks a consistency failure and * a wildcard.]
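A skeletal recursive version in Python (an illustrative sketch: `consistent` stands for the application-specific pairwise tests, and skipped features play the role of the wildcard above):

    def it_search(data, models, consistent, max_skips=0):
        """Depth-first search assigning each data feature a model feature (or skipping it)."""
        def extend(pairs, i, skips):
            if i == len(data):
                return pairs                       # every data feature handled
            for m in models:
                if all(consistent(d2, m2, data[i], m) for d2, m2 in pairs):
                    result = extend(pairs + [(data[i], m)], i + 1, skips)
                    if result is not None:
                        return result
            if skips < max_skips:                  # wildcard: skip this feature
                return extend(pairs, i + 1, skips + 1)
            return None                            # backtrack
        return extend([], 0, 0)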
intrinsic camera parameters: Parameters such as focal length, coefficients of radial lens distortion, and
the position of the principal point, that describe the mapping from image pixels to world rays in a camera. Determining the parameters of this mapping is the task of camera calibration. For a pinhole camera, world rays r are mapped to homogeneous image coordinates x by x = Kr where K is the upper triangular 3 × 3 matrix

K = \begin{pmatrix} \alpha_u f & s & u_0 \\ 0 & \alpha_v f & v_0 \\ 0 & 0 & 1 \end{pmatrix}

In this form, f represents the focal length, s is the skew angle between the image coordinate axes, (u_0, v_0) is the principal point, and α_u and α_v are the aspect ratios (e.g., pixels/mm) in the u and v image directions.
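A small NumPy illustration of x = Kr (the numbers are hypothetical: unit focal length folded into α = 800 pixels, zero skew, principal point (320, 240)):

    import numpy as np

    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])

    r = np.array([0.1, -0.05, 1.0])   # a world ray in camera coordinates
    x = K @ r
    u, v = x[:2] / x[2]               # pixel coordinates: (400.0, 200.0)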
intrinsic dimensionality: The number of dimensions (degrees of freedom) inherent in a data set, independent of the dimensionality of the space in which it is represented. For example, a curve in 3D is intrinsically 1D although its points are represented in 3D.

intrinsic image: A term describing one of a set of images registered with the input intensity image that describe properties intrinsic to the scene, instead of properties of the input image. Example intrinsic images include: distance to scene points, scene surface orientations, surface reflectance, etc. The image below shows a depth image registered with the intensity image on the left. [Figure: an intensity image and its registered depth image.]

intruder detection: An application of machine vision, usually analyzing a video sequence to detect the appearance of an unwanted person in a scene.

invariant: Something that does not change under specified operations (e.g., translation invariant).

invariant contour function: The contour function characterizes the shape of a planar figure based on the external boundary. Values invariant to position, scale or orientation can be computed from the contour functions. These invariants can be used for recognition of instances of the planar figure.

inverse convolution: See deconvolution.

inverse Fourier transform: A transformation that allows a signal to be recreated from its Fourier coefficients. See Fourier transform.

inverse square law: A physical law that says the illumination power received at distance d from a point light source is
inversely proportional to the square of d, i.e., is proportional to 1/d².

invert operator: A low-level image processing operation where a new image is formed by replacing each pixel by an inverted value. For binary images, this is 1 if the input pixel is 0, or 0 if the input pixel is 1. For gray level images, this depends on the maximum range of intensity values. If the range of intensity values is [0,255] then the inverse of a pixel with value x is 255 − x. The result is like a photographic negative. [Figure: a gray level image and its inverted image.]

IR: See infrared.

irradiance: The amount of energy received at a point on a surface from the corresponding scene point.

isometry: A transformation that preserves distances. Thus the transformation T: x → u is an isometry if, for all pairs x, y, we have ‖x − y‖ = ‖T(x) − T(y)‖.

isophote curvature: Isophotes are curves of constant image intensity. Isophote curvature is defined at any given pixel as κ = −L_vv / L_w, where L_w is the magnitude of the gradient perpendicular to the isophote and L_vv is the curvature of the intensity surface along the isophote at that point.

iso-surface: A surface in a 3D space where the value of some function is constant, i.e., f(x, y, z) = C where C is some constant.

isotropic gradient operator: A gradient operator that computes the scalar magnitude of the gradient, i.e., a value that is independent of edge direction.

isotropic operator: An operator that produces the same output irrespective of the local orientation of the pixel neighborhood where the operator is applied. For example, a mean smoothing operator produces the same output value even if the image data is rotated at the point where the operator is being applied. On the other hand, a directional derivative operator would produce different values if the image were rotated. This concept is particularly relevant to feature detectors, some of which are sensitive to the local orientation of the image pixel values and some of which are not (isotropic).

iterated closest point: See iterative closest point.

iterative closest point (ICP): A shape alignment algorithm that works by iterating its two-stage process until some termination point: step 1) given an estimated transformation of the first shape onto the second, find the closest feature from
the second shape for each feature of the first shape, and step 2) given the new set of closest features, re-estimate the transformation that maps the first feature set onto the second. Most variations of the algorithm need a good initial estimate of the alignment.
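A bare-bones NumPy version of the two-step loop for rigid point-set alignment (a sketch: brute-force closest points and an SVD/Procrustes transform estimate; practical versions add a k-d tree, outlier rejection and convergence tests):

    import numpy as np

    def icp(A, B, iterations=20):
        """Align point set A (rows) onto B; returns rotation R and translation t."""
        d_dim = A.shape[1]
        R, t = np.eye(d_dim), np.zeros(d_dim)
        for _ in range(iterations):
            P = A @ R.T + t
            # step 1: closest point in B for each transformed point of A
            dists = np.linalg.norm(P[:, None, :] - B[None, :, :], axis=2)
            Q = B[dists.argmin(axis=1)]
            # step 2: re-estimate the rigid transform A -> Q (reflection check omitted)
            ca, cq = A.mean(axis=0), Q.mean(axis=0)
            U, _, Vt = np.linalg.svd((A - ca).T @ (Q - cq))
            R = (U @ Vt).T
            t = cq - ca @ R.T
        return R, t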
IUE: See Image Understanding Environment.
J

Jacobian: The matrix of derivatives of a vector function. Typically, if the function f(x) is written in component form as

\vec{f}(\vec{x}) = \vec{f}(x_1, x_2, \ldots, x_p) = \begin{pmatrix} f_1(x_1, x_2, \ldots, x_p) \\ f_2(x_1, x_2, \ldots, x_p) \\ \vdots \\ f_n(x_1, x_2, \ldots, x_p) \end{pmatrix}

then the Jacobian J is the n × p matrix

J = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_p} \\ \vdots & & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_p} \end{pmatrix}
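A numerical (central-difference) approximation in Python, useful when the derivatives are awkward to write down:

    import numpy as np

    def jacobian(f, x, eps=1e-6):
        """n x p matrix of partial derivatives of f: R^p -> R^n at x."""
        x = np.asarray(x, dtype=float)
        f0 = np.asarray(f(x), dtype=float)
        J = np.zeros((f0.size, x.size))
        for k in range(x.size):
            step = np.zeros_like(x)
            step[k] = eps
            J[:, k] = (np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2 * eps)
        return J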

joint entropy registration: Registration of data using joint entropy (a measure of the degree of uncertainty) as a criterion.

JPEG: A common format for compressed image representation designed by the Joint Photographic Experts Group (JPEG).

junction label: A symbolic label for the pattern of edges meeting at the junction. This approach is mainly used in blocks world scenes where all objects are polyhedra, and thus all lines are straight and meet in only a limited number of configurations. Example "Y" (i.e., corner of a block seen front on) and "arrow" (i.e., corner of a block seen from the side) junctions are shown here. See also line label. [Figure: an arrow junction and a Y junction.]

K

k-means: An iterative squared error clustering algorithm. Input is a set of points \{\vec{x}_i\}_{i=1}^n and an initial guess at the locations \vec{c}_1, \ldots, \vec{c}_k of k cluster centers. The algorithm alternates two steps: points are assigned to the cluster center closest to them, and then the cluster centers are recomputed as the mean of the associated points. Iterating yields an estimate of the k cluster centers that is likely to minimize \sum_{\vec{x}} \min_{\vec{c}} \|\vec{x} - \vec{c}\|^2.

k-means clustering: See k-means.
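The two alternating steps in a few lines of NumPy (a sketch: random initial centers, a fixed iteration count, and no guard against empty clusters):

    import numpy as np

    def kmeans(X, k, iterations=50, rng=np.random.default_rng(0)):
        C = X[rng.choice(len(X), size=k, replace=False)]   # initial centers
        for _ in range(iterations):
            # step 1: assign each point to its closest center
            labels = np.linalg.norm(X[:, None] - C[None, :], axis=2).argmin(axis=1)
            # step 2: recompute each center as the mean of its points
            C = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        return C, labels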
k-medians (also k-medoids): A variant of k-means clustering in which multi-dimensional medians are computed instead of means. The definition of multi-dimensional median varies, but options for the median \vec{m} of a set of points \{\vec{x}_i\}_{i=1}^n = \{(x_{1i}, \ldots, x_{di})\}_{i=1}^n include the component-wise definition \vec{m} = (\mathrm{median}\{x_{1i}\}_{i=1}^n, \ldots, \mathrm{median}\{x_{di}\}_{i=1}^n) and the analogue of the one dimensional definition \vec{m} = \arg\min_{\vec{m} \in \mathbb{R}^d} \sum_{i=1}^n \|\vec{m} - \vec{x}_i\|.

k-nearest-neighbor algorithm: A nearest neighbor algorithm that uses the classifications of the nearest k neighbors when making a decision.

Kalman filter: A recursive linear estimator of a varying state vector and associated covariance from observations, their associated covariances and a dynamic model of the state evolution. Improved estimates are calculated as new data is obtained.
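One predict/update cycle in NumPy (the standard linear equations; F, H, Q and R are the assumed dynamic model, observation model and noise covariances):

    import numpy as np

    def kalman_step(x, P, z, F, H, Q, R):
        x = F @ x                        # predict the state forward in time
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R              # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
        x = x + K @ (z - H @ x)          # correct using the observation z
        P = (np.eye(len(x)) - K @ H) @ P
        return x, P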
Karhunen–Loève transformation: The projection of a vector (or image when treated as a vector) onto an orthogonal space that has uncorrelated components constructed from the autocorrelation (scatter) matrix of a set of example vectors. An advantage is that the orthogonal components have a natural ordering (by the largest eigenvalues of the covariance of the original vector space) so that one can select the most significant variation in the dataset. The transformation can be used as a basis for image compression,


for estimating linear models in high dimensional datasets and estimating the dominant modes of variation in a dataset, etc. It is also known as the principal component transformation. The following image shows a dataset before and after the KL transform was applied. [Figure: a 2D point set before and after the KL transform, with the principal eigenvector shown.]

kernel: 1) A small matrix of numbers that is used in image convolutions. 2) The structuring element used in mathematical morphology. 3) The mathematical transformation used in kernel discriminant analysis.

kernel discriminant analysis: A classification approach based on three key observations: 1) some problems need curved classification boundaries, 2) the classification boundaries should be defined locally by the classes rather than globally and 3) a high dimensional classification space can be avoided by using the kernel method. The method provides a transformation via a kernel so that linear discriminant analysis can be done in the input space instead of the transformed space.

kernel function: 1) A function in an integral transformation (e.g., the exponential term in the Fourier transform); 2) a function applied at every point in an image (see convolution).

kernel principal component analysis: An extension of the principal component analysis (PCA) method that allows classification with curved region boundaries. The kernel method is equivalent to a nonlinear mapping of the data into a high dimensional space from which the global axes of maximum variation are extracted. The method provides a transformation via a kernel so that PCA can be done in the input space instead of the transformed space.

key frames: Primarily a computer graphics animation technique, where key frames in a sequence are drawn by more experienced animators and then intermediate interpolating frames are drawn by less experienced animators. In computer vision motion sequence analysis, key frames are the analogous video frames, typically displaying motion discontinuities between which the scene motion can be smoothly interpolated.

KHOROS: An image processing development environment with a large set of operators. The system comes with a pull-down interactive development workspace
where operators can be instan- knowledge-based vision: A
tiated and connected by click style of image interpretation
and drag operations. that relies on multiple pro-
kinetic depth: A technique for cessing components capable
estimating the depth at image of different image analysis
feature points (usually edges) processes, some of which
by exploiting a controlled sen- may solve the same task in
sor motion. This technique different ways. Linking the
generally does not work at all components together is a rea-
points of the image because of soning algorithm that knows
insufficient image structure or about the capabilities of the
sensor precision in smoothly different components, when
varying regions, such as walls. they might be usable or might
See also shape from motion. fail. An additional common
A typical motion case is for the component is some form of
camera to rotate on a circular task dependent knowledge
trajectory while fixating on a encoded in a knowledge
point in front of the camera, as representation that is used
seen here: to help guide the reasoning
algorithm. Also common is
some uncertainty mechanism
FIXATION that records the confidence
POINT TARGET
that the system has about the
outcomes of its processing. For
example, a knowledge-based
vision system might be used
for aerial analysis of road net-
SWEPT works, containing specialized
TRAJECTORY detection modules for straight
roads, road junctions, forest
roads as well as survey maps,
Kirsch compass edge de- terrain type classifiers, curve
tector: A first derivative edge linking, etc.
detector that computes the
gradient in different directions knowledge representation: A
according to which calcu- general term for methods
lation mask is used. Edges of computer encoding know-
have high gradient values, ledge. In computer vision sys-
so thresholding the intensity tems, this is usually knowledge
gradient magnitude is one about recognizable objects and
approach to edge detection. visual processing methods.
A Kirsch mask that detects A common knowledge rep-
edges at 45 is: resentation scheme is the
 
−3 5 5 geometric model that records
−3 0 5 the 2D or 3D shape of
−3 −3 −3 objects. Other commonly used
149
vision knowledge representa- independently and is also inde-
tion schemes are graph models pendent of integration time.
and frames. Kullback–Leibler distance/ di-
Koenderink’s surface shape vergence: A measure of the
classification: An alternative relative entropy or distance
to the more common mean between two probability dens-
curvature and Gaussian curva- ities p1 
x  and p2 
x , defined as
ture 3D surface shape classi-  p x
fication labels. Koenderink’s D p1 p2  = p1  x  log 1 d x
scheme decouples the two p2 
x
intrinsic shape parameters into
one parameter (S) that repre- kurtosis: A measure of the flat-
sents the local surface shape ness of a distribution of gray-
(including cylindrical, hyper- scale values. If ng is the num-
bolic, spherical and planar) ber of pixels out of N with
and a second parameter (C ) gray scale value g, then the
that encodes the magnitude of fourth histogram moment is
the curvedness of the shape. 4 = N1 g ng  g − 1 4 , where 1
The shape classes represented is the mean pixel value. The
in Koenderink’s classification kurtosis is 4 − 3.
scheme are illustrated: Kuwahara: An edge-preserving
noise reduction filter. The fil-
ter uses four regions surround-
ing the pixel being smoothed.
S: –1 –1/2 0 +1/2 +1 The smoothed value for that
pixel is the mean value of the
Kohonen network: A multi- region with smallest variance.
variate data clustering and an-
alysis method that produces a
topological organization of the
input data. The response of the
whole network to a given data
vector can be used as a lower
dimensional signature of the
data vector.
KTC noise: A type of noise asso-
ciated with Field Effect Transis-
tor (FET) image sensors. The
“KTC” term is used because √ the
noise is proportional to kTC
where T is the temperature, C
is the capacitance of the image
sensor and k is Boltzmann’s
constant. This noise arises dur-
ing image capture at each pixel

150
L

label: A description associated their mean curvature (white:


with something for the pur- negative, light gray: zero,
poses of identification. For dark gray: positive, black:
example see region labeling. missing data).
labeling problem: Given a set lacunarity: A scale dependent
S of image structures (which measure of translational invari-
may be pixels as well as more ance based on the size distri-
structured objects like edges) bution of holes within a set.
and a set of labels L, the label- High lacunarity indicates that
ing problem is the question the set is heterogeneous and
of how to assign a label l ∈ L low lacunarity indicates homo-
for each image structure s ∈ S. geneity.
This process is usually depend-
ent on both the image data LADAR: LAser Detection And
and neighboring labels. A typ- Ranging or Light Amplification
ical remote sensing application for Detection and Ranging. See
is to label image pixels by their laser radar.
land type, such as water, snow, Lagrange multiplier technique:
sand, wheat field, forest, etc. A method of constrained opti-
A range image (below left) has mization to find a solution
its pixels labeled by the sign of to a numerical problem that
includes one or more con-
straints. The classical form
of the Lagrange multiplier
technique finds the param-
eter vector v minimizing (or
maximizing) the function fv  =
gv  + hv , where g is the
function being minimized and
h is a constraint function
that has value zero when its

Dictionary of Computer Vision and Image Processing R.B. Fisher, K. Dawson-Howe, A. Fitzgibbon,
C. Robertson and E. Trucco © 2005 John Wiley & Sons, Ltd. ISBN: 0-470-01526-8

151
argument satisfies the con- two images, etc. Landmarks
straint. The Lagrange multi- might be task specific, such as
plier is . components on an electronic
Laguerre formula: A formula circuit card or an anatomi-
cal feature such as the tip of
for computing the directed
the nose, or might be a more
angle between two 3D lines
general image feature such as
based on the cross ratio of four
interest points.
points. Two points arise where
the two image lines intersect LANDSAT: A series of satellites
the ideal line (i.e., the line launched by the United States
through the vanishing points ) of America that are a common
and the other two points are source of satellite images of the
the ideal line’s absolute points Earth. LANDSAT 7 for example
(intersection of the ideal line was launched in April 1999 and
and the absolute conic ). provides complete coverage of
the Earth every 16 days.
Lambert’s law: The observed
shading on ideal diffuse reflec- Laplacian: Loosely, the Lapla-
tors is independent of observer cian of a function is the sum of
position and varies with the its second order partial deriva-
angle  between the surface tives. For example the Lapla-
normal and source direction: cian of fx y z  3 →  is
2 f 2 f 2 f
 2 fx y z = x 2 + y 2 + z 2 . In
LIGHT SOURCE
SURFACE NORMAL computer vision, the Lapla-
CAMERA
cian operator may be applied
θ
to an image, by convolu-
tion with the Laplacian ker-
nel, one definition of which is
Lambertian surface: A sur- given by the sum of second
derivative kernels −1 2 −1

face whose reflectance obeys


and −1 2 −1
 , with zero
Lambert’s law, more commonly
padding to make the result
known as a matte surface.
3 × 3:
These surfaces have equally
 
bright appearance from all 0 −1 0
viewpoints. Thus, the shading −1 4 −1
of the surface depends only 0 −1 0
on the relative direction of the
incident illumination.
Laplacian of Gaussian oper-
landmark detection: A gen- ator: A low-level image oper-
eral term for detecting an ator that applies the second
image feature that is com- derivative Laplacian operator
monly used for registration. ( 2 ) after a Gaussian smooth-
The registration might be ing operation everywhere in
between a model and the an image. It is an isotropic
image or it might be between operator. It is often used as

152
part of a zero crossing edge image and then adding the
detection operator because Laplacian.
the locations where the value laser: Light Amplification by
changes sign (positive to nega- Stimulated Emission of Radi-
tive or vice versa) of the out- ation. A very bright light source
put image are located near often used for machine vision
the edges in the input image, applications because of its
and the detail of the detected properties: most light is at a
edges can be controlled by single spectral frequency, the
use of the scale parameter of light is coherent, so various
interference effects can be
the Gaussian smoothing. An ex-
exploited and the light beam
ample mask that implements
can be processed so that diver-
the Laplacian of Gaussian oper- gence is slight. Two common
ator with smoothing parameter applications are for structured
= 1 4 is: light triangulation and range
sensing.
laser illumination: A very bright
light source useful because of
its limited spectrum, bright
power and coherence. See also
laser.
laser radar: (LADAR) A LIDAR
range sensor that uses laser
light. See also laser range
sensor.
laser range sensor: A laser-
based range sensor records the
Laplacian pyramid: A com- distance from the sensor to a
pressed image representation target or target scene by de-
in which a pyramid of Laplacian tecting the image of a laser
images is created. At each spot or stripe projected onto
the scene. These sensors are
level of the scheme, the cur-
commonly based on struc-
rent gray scale image has the
tured light triangulation, time
Laplacian applied to it. The of flight or phase difference
next level gray scale image is technologies.
formed by Gaussian smoothing
and subsampling. At the final laser speckle: A time-varying
level, the smoothed and sub- light pattern produced by
sampled image is kept. The interference of the light
original image can be approxi- reflected from a surface
mately reconstructed level by illuminated by a laser.
level through expanding and laser stripe triangulation: A
smoothing the current level structured light triangulation

153
system that uses laser light. contrast and rotation invariant
For example, a projected plane measures.
of light that would normally least mean square estimation:
result in a straight line in Also known as least square
the camera image is distorted estimation or mean square esti-
by any objects in the scene mation. Let v be the parameter
where the distortion is pro- vector that we are searching
portional to the height of the for and ei v  be the error
object. A typical triangulation measure associated with the
geometry is illustrated here: i th of N data items. The
error measure often used is
LASER STRIPE the Euclidean, algebraic or
PROJECTOR
Mahalanobis distance between
the i th data item and a curve or
surface being fit, that is param-
eterized by v . Then the mean
square error is:
1 
LASER STRIPE N
e v 2
SCENE OBJECT
N i=1 i
CAMERA /SENSOR The desired parameter vector v
minimizes this sum.
lateral inhibition: A process least median of squares
whereby a given feature weak- estimation: Let v be the
ens or eliminates nearby fea- parameter vector that we
tures. An example of this are searching for and ei v 
appears in the Canny edge de- be the error associated with
tector where locally maximal the i th of N data items. The
intensity gradient magnitudes error measure often used is
cause adjacent gradient values the Euclidean, algebraic or
that lie across (as contrasted Mahalanobis distance between
with along) the edge to be set the i th data item and a curve
to zero. or surface being fit that is
parameterized by v . Then the
Laws’ texture energy measure: median square error is the
A measure of the amount of median or middle value of the
image intensity variation at a sorted set ei v 2 . The desired
pixel. The measure is based on parameter vector v minimizes
5 one dimensional finite differ- this median value. This esti-
ence masks convolved orthog- mator usually requires more
onally to give 25 2D masks. computation for the iterative
The 25 masks are then con- and sorting algorithms but can
volved with the image. The out- be more robust to outliers
puts are smoothed nonlinearly than the least mean square
and then combined to give 14 estimator.

154
least square curve fitting: A data. Fitting often uses the
least mean square estimation Euclidean, algebraic or Maha-
process that fits a parametric lanobis distance to evalu-
curve model or a line to a col- ate the goodness of fit. The
lection of data points, usually range image (above left) has
2D or 3D. Fitting often uses planar and cylindrical surfaces
the Euclidean, algebraic or fitted to the data (above right).
Mahalanobis distance to evalu-
leave-one-out test: A method
ate the goodness of fit. Here
for testing a solution in which
is an example of least square
one sample is left out of the
ellipse fitting:
training set and used instead
for testing. This can be done
for every sample.
LED: Light Emitting semicon-
ductor Diode. Often used as
detectable point light source
markers or controllable illu-
mination.
least square estimation: See left-handed coordinate system:
least mean square estimation. A 3D coordinate system with
the XYZ axes arranged as
least squares fitting: A general shown below. The alternative
term for a least mean square is a right-handed coordinate
estimation process that fits system.
some parametric shape, such
as a curve or surface, to a
collection of data. Fitting often +Y
uses the Euclidean, alge- +Z (INTO PAGE)
braic or Mahalanobis dis-
tance to evaluate the goodness
of fit.
least square surface fitting: A +X
least mean square estimation
process that fits a parametric
surface model to a collection
of data points, usually range

Legendre moment: The


Legendre moment of a piece-
wise continuous function
fx y with order  +1 m
 +1n is
1
4
2m + 12n + 1 −1 −1 Pm x
Pn  y fx ydxdy where Pm x
is the mth order Legendre
polynomial. These moments

155
can be used for characterizing convex or half-cylindrical, con-
image data and images can be verging, magnifying, etc.
reconstructed from the infinite level set: The set of data points
set of moments. x that satisfy a given equation
Lempel–Ziv–Welch (LZW): A of the form: f x  = c. Varying
form of file compression based the value of c gives different
on encoding commonly occur- sets of usually closely related
ring byte sequences. This form points. A visual analogy is of
of compression is used in the a geographic surface and the
common GIF image file format. ocean rising. If the function
lens: A physical optical device f is the sea level, then the
for focusing incident light level sets are the shore lines
onto an imaging surface, for different sea levels c. The
such as photographic film or figure below shows an intensity
an electronic sensor. Lenses image and the pixels at level
can also be used to change (brightness) 80.
magnification, enhance or
modify a field of view.
lens distortion: Unexpected
variation in the light field
passing through a lens. Exam-
ples are radial lens distortion
or chromatic aberration and
usually arise from how the
lens differs from the ideal lens.
lens equation: The simplest
case of a convex converging Levenberg–Marquardt opti-
lens with focal length f per- mization: A numerical multi-
fectly focused on a target at dis- variate optimization method
tance D has distance d between that switches smoothly
the lens and the image plane between gradient descent
as related by the lens equa- when far from a (local) opti-
tion 1f = D1 + d1 and illustrated mum and a second-order
here: inverse Hessian (quadratic)
method when nearer.
SCENE license plate recognition: A
OBJECT
computer vision application
d H OPTICAL that aims to identify a vehicle’s
h
D
AXIS license plate from image data.
IMAGE Image data is often acquired
PLANE LENS from automatic cameras at
places where vehicles slow
lens type: A general term for lens down such as bridges and toll
shapes and functions, such as barriers.

156
LIDAR: LIght Detection And light source geometry: A gen-
Ranging. A range sensor using eral term referring to the shape
(usually) laser light. It can be and placement of the light
based on the time of flight of sources in a scene.
a pulse of laser light or the light source placement: A
phase shift of a waveform. general term for the positions
The measurement could be of the light sources in a scene.
of a single point or an array It may also refer to the care
of measurements if the light that machine vision applica-
beam is swept across the tions engineers take when
scene/object. placing the light sources so as
Lie groups: A group that can be to minimize unwanted lighting
represented as a continuous effects, such as shadows
and differentiable manifold of and specular reflections, and
a space, such that group oper- to enhance the visibility
ations are also continuous. An of desired scene struc-
example of a Lie group is tures, e.g., by back lighting or
the orthogonal group SO3 = oblique lighting.
R ∈ 3×3  R  R = I detR = 1 light stripe ranging: See struc-
of rigid 3D rotations. tured light triangulation.
light: A general term for the elec- lightfield: A function that
tromagnetic radiation used in encodes the radiance on
many computer vision applica- an empty point in space as
tions. The term could refer to a function of the point’s
the illumination in the scene position and the direction of
or the irradiance coming from the illumination. A lightfield
the scene onto the sensor. allows image based rendering
Most computer vision applica- of new (unoccluded) scene
tions use light that is visible, views from arbitrary positions
infrared or ultraviolet. within the lightfield.
light source: A general term for lighting: A general term for the
the source of illumination in illumination in a scene, whether
a scene, whether deliberate or deliberate or accidental.
accidental. The light source lightness: The estimated or per-
might be a point light source ceived reflectance of a surface,
or an extended light source. when viewed in monochrome.
light source detection: The lightpen: A user-interface device
process of detecting the pos- that allows people to indicate
ition of or direction to the places on a computer screen
light sources in the scene, even by touching the screen at the
if not observable. The light desired place with the pen. The
sources are usually assumed to computer can then draw items,
be point light sources for this select actions, etc. It is effect-
process. ively a type of mouse that acts

157
on the display screen instead line detection operator: A fea-
of on a mat. ture detection process that
likelihood ratio: The ratio of detects lines. Depending on
probabilities of observing data the specific operator, locally
D with and without condition linear line segments may be
PDC 
C  PD¬C . detected or straight lines might

be globally detected. Note that
limb extraction: A process this detects lines as contrasted
of image interpretation that with edges.
extracts 1) the arms or legs
of people or animals, e.g., for line drawing analysis: 1) Analy-
tracking or 2) the barely visible sis of hand-made or CAD draw-
edge of a curved surface as it ings to extract a symbolic
curves away from an observer description or shape descrip-
(derived from an astronomical tion. For example, research
term). See figure below. See has investigated extracting 3D
also occluding contour. building models from CAD
drawings. Another application
is the analysis of hand-drawn
circuit sketches to form a
circuit description. 2) Analy-
sis of the line junctions in a
LIMB polyhedral blocks world scene,
in order to understand the 3D
structure of the scene.
line fitting: A curve fitting prob-
lem where the objective is to
estimate the parameters of a
line: Usually refers to a straight straight line that best interpol-
ideal line that passes through ates given point data.
two points, but may also refer
line following: See line group-
to a general curve marking,
ing.
e.g., on paper.
line cotermination: When two line grouping: Generally refers
lines have endpoints in exactly to the process of creating
or nearly the same location. a longer curve by group-
See examples: ing together shorter fragments
found by line detection. These
might be short connecting
locally detected line fragments,
LINE or might be longer straight
COTERMINATIONS line segments separated by a
gap. May also refer to the
grouping of line segments on
the basis of grouping prin-
ciples such as parallelism.
158
See also edge tracking, per- line matching: The process of
ceptual organization, Gestalt. making a correspondence
line intersection: Where two or between the lines in two sets.
more lines intersect at a point. One set might be a geometric
The lines cross or meet at a model such as used in model
line junction. See: based recognition or model
registration or alignment. Alter-
natively, the lines may have
been extracted from different
images, as when doing feature
based stereo or estimating the
epipolar geometry between
LINE INTERSECTIONS the two lines.
line moment: A line moment
line junction: The point at is similar to the traditional
which two or more lines meet. area moment but is calculated
See junction labeling. only at points xs ys along
line label: In an ideal polyhedral 
the object contour. The pq th
blocks world scene, lines arise moment is: xs p ysq ds. The
from only a limited set of infinite set of line moments
physical situations such as con- uniquely determine the con-
vex or concave surface shape tour.
discontinuities (fold edges),
occluding edges where a fold line moment invariant: A set
edge is seen against the back- of invariant values computable
ground (blade edge), crack from the line moments. These
edges where two polyhedra may be invariant to translation,
have aligned edges or shadow scaling and rotation.
edges. Line labels identify the line of sight: A straight line from
type of line (i.e., one of these the observer or camera into the
types). Assigning labels is one scene, usually to some target.
step in scene understanding See:
that helps deduce the 3D
structure of the scene. See LINE OF SIGHT
also junction label. Here is an
example of the usual line labels
for convex(+), concave(−) and
occluding (>) edges.

+ line scan camera: A camera that


+ uses a solid-state or semi-
+
– conductor (e.g., CMOS) linear
– array sensor, in which all of the
photosensitive elements are in
line linking: See line grouping. a single 1D line. Typical line

159
scan cameras have between 32 an extra term with value 1.)
and 8192 elements. These sen- A linear discriminant function
sors are used for a variety is a basic classification process
of machine vision applications that determines which of two
such as scanning, flow process classes or cases the structure
control and position sensing. belongs to based on the sign
of the  linear function l = a ·
line segmentation: See curve
segmentation. x = ai xi , for a given coef-
ficient vector a  . For example,
line spread function: The line to discriminate between unit
spread function describes side squares and unit diam-
how an ideal infinitely thin eter circles based on the area A,
line would be distorted after the feature vector is x = A 1
passing through an optical and the coefficient vector a =
system. Normally, this can 1 −0 89 . If l > 0, then the
be computed by integrating structure is a square, otherwise
the point spread functions of a circle.
an infinite number of points linear features: A general term
along the line. for features that are locally or
line thinning: See thinning. globally straight, such as lines
linear: 1) Having a line-like or straight edges.
form. 2) A mathematical linear filter: A filter whose out-
description for a process put is a weighted sum of its
in which the relationship inputs, i.e., all terms in the
between some input variables filter are either constants or
x and some output variables y variables. If xi  are the inputs
is given by y = A x where A is (which may be pixel values
a matrix. from a local neighborhood or
linear array sensor: A solid- pixel values from the same
state or semiconductor position in different images of
the same scene, etc.), then the
(e.g., CMOS) sensor in which
linear
 filter output would be
all of the photosensitive ele-
ai xi + a0 , for some constants
ments are in a single 1D line.
ai .
Typical linear array sensors
have between 32 and 8192 linear regression: Estimation of
elements and are used in line the parameters of a linear rela-
scan cameras. tionship between two random
variables X and Y given sets of
linear discriminant analysis: samples xi and yi . The objec-
See linear discriminant func- tive is to estimate the matrix
tion. A and vector a  that minimize

linear discriminant function: the residual rA a   = i yi −
Assume a feature vector x A xi − a   . In this form, the
2

based on observations of some xi are assumed to be noise-


structure. (Assume that the fea- free quantities. When both vari-
ture vector is augmented with ables are subject to error,

160
orthogonal regression is pre- is invariant to lightness and
ferred. contrast transformations, that
linear transformation: A math- can be used to create local
ematical transformation of a set texture primitives.
of values by addition and mul- local contrast adjustment: A
tiplication by constants. If the form of contrast enhancement
set of values is a vector x , the that adjusts pixel intensities
general linear transformation based on the values of nearby
produces another vector y = pixels instead of the values
A x , where y need not have the of all pixels in the image.
same dimension as x and A is The right image has the eye
a constant matrix (i.e., is not a area’s brightness (from ori-
function of x ). ginal image at the left) en-
hanced while maintaining the
lip shape analysis: An appli- background’s contrast:
cation of computer vision to
understanding the position
and shape of human lips as
part of face analysis. The goal
might be face recognition or
expression understanding.
lip tracking: An application of
computer vision to follow-
ing the position and shape
of human lips in a video local curvature estimation: A
sequence. The goal might be part of surface or curve shape
for lip reading, augmenta- estimation that estimates the
tion of deaf sign analysis or curvature at a given point
focusing of resolution during based on the position of nearby
image compression. parts of the curve or surface.
For example, the curve y =
local: A local property of a math- sinx has zero local curvature
ematical object is one that is at the point x = 0 (i.e., the
defined in terms only of a small curve is locally uncurved or
neighborhood of the object, for straight), although the curve
instance, curvature. In image has nonzero local curvature at
processing, a local operator other other points (e.g., at 4 ).
operates on a small number of See also differential geometry.
nearby pixels at a time.
Local Feature Focus (LFF)
local binary pattern: Given a method: A 2D part identi-
local neighborhood about a fication and pose estimation
point, use the value of the algorithm that can cope with
central pixel to threshold large amounts of occlusion
the neighborhood. This cre- of the parts. The algorithm
ates a local descriptor of uses a mixture of property-
the gray scale structure that based classifiers, graph models

161
and geometric models. The is parameterized by a polar co-
key identification process is ordinate  and a radial coord-
based around local configura- inate r . However, unlike polar
tions of image features that is coordinates, the radial distance
more robust to occlusion. increases exponentially as r
grows. The mapping from pos-
local invariant: See local point
ition  r to Cartesian coord-
invariant.
inates is r cos r sin,
local operator: An image pro- where  is some design param-
cessing operator that com- eter. Further, the amount of
putes its output at each pixel area of the image plane rep-
from the values of the nearby resented by each pixel grows
pixels instead of using all or exponentially with r , although
most of the pixels in the image. the precise pixel size depends
local point invariant: A prop- on factors like amount of pixel
erty of local shape or intensity overlap, etc. See also foveal
that is invariant to, e.g., transla- image. The receptive fields of
tion, rotation, scaling, contrast a log-polar image (courtesy of
or brightness changes, etc. Herman Gomes) can be seen in
For example, a surface’s the outer rings of:
Gaussian curvature is invariant
to change in position.
local surface shape: The shape
of a surface in a “small” region
around a point, often classified
into one of a small number of
surface shape classifications.
Computed as a function of the
surface curvatures.
local variance contrast: The
variance of the pixel values
computed in a neighborhood
about each pixel. Contrast is
the difference between the log-polar stereo: A form of
larger and smaller values of this stereo vision in which the input
variance. Large values of this images come from log-polar
property occurs in highly tex- sensors instead of the standard
tured or varying areas. Cartesian layout.
log-polar image: An image rep- logarithmic transformation:
resentation in which the pix- See pixel logarithm operator.
els are not in the standard logical object representation:
Cartesian layout but instead An object representation based
have a space varying layout. In on some logical formalism
the log-polar case, the image such as the predicate calculus.
162
For example, a square can be can be exactly recon-
defined as: structed from the compressed
image. This contrasts with
squares ⇐⇒ polygons lossy compression.
& number _of _sidess 4
& ∀e1 ∀e2 e1 = e2 & lossy compression: A category
side_ofs e1 & side_ofs e2  of image compression in which
& lengthe1  = lengthe2  the original image cannot be
& parallele1  e2  exactly reconstructed from the
 perpendiculare1  e2  compressed image. The goal
is to lose insignificant image
long baseline stereo: See wide details (e.g., noise) while limit-
baseline stereo. ing perception of changes
to the image appearance.
long motion sequence: A video Lossy algorithms generally pro-
sequence of more than just duce greater compression than
a few frames in which there lossless compression.
is significant camera or scene
motion. The essential idea is low angle illumination: A
that the 3D scene structure machine vision technique,
can be inferred by effectively a often used for industrial vision,
stereo vision process. Here the where a light source (usu-
ally a point light source )
matched image features can be
is placed so that a ray of
tracked through the sequence,
light from the source to the
instead of having to solve
inspection point is almost
the stereo correspondence
perpendicular to the surface
problem. If a long sequence normal at that point. The
is not available, then analysis situation can also arise nat-
could use optical flow or short urally, e.g., from the sun
baseline stereo. position at dawn or dusk. One
look-up table: Given a finite set consequence of this low angle
of input values xi  and a is that shallow surface shape
function on these values, fx, defects and cracks cast strong
a look-up table records the shadows that may simplify the
values xi  fxi  so that the inspection process. See:
value of the function f can
be looked up directly rather
than recomputed each time. CAMERA
Look-up tables can be easily
used for color remapping or LIGHT
standard functions of integer SOURCE
pixel values (e.g., the logarithm
of a pixel’s value). θ
lossless compression: A cat-
egory of image compression
in which the original image TARGET POINT

163
low frequency: Usually refer- would say that low-level vision
ring to low spatial frequency in ends and middle-level vision
the context of computer vision. starts.
The low-frequency com- low pass filter: This term is
ponents of an image are the
imported from 1D signal pro-
slowly changing intensity com-
cessing theory into image
ponents of the image, such
processing. The term “low”
as large regions of bright and
dark pixels. If low temporal is a shorthand for “low fre-
frequency is the intended quency”, that, in the context
meaning, then low frequency of a single image, means low
refers to slowly changing pat- spatial frequency, i.e., intensity
terns of brightness or darkness patterns that change over many
at the same pixel in a video pixels. Thus a low pass filter
sequence. This image shows applied to an image leaves the
the low-frequency components low spatial frequency patterns,
of an image. or large, slowly changing pat-
terns, and removes the high
spatial frequency components
(sharp edges, noise ). Low pass
filters are a kind of smoothing
or noise reduction filter. Alter-
natively, filtering is applied to
the changing values of a given
pixel over an image sequence.
In this case the pixel values can
be treated as a sampled time
sequence and the original sig-
nal processing definition of
“low pass filter” is appropri-
low level vision: A general ate. Filtering this way removes
and somewhat imprecisely rapid temporal changes. See
(i.e., contentiously) defined also high pass filter. Here is an
term for the initial stages image and a low-pass filtered
of image analysis in a vision version:
system. It can also be used
for the initial stages of pro-
cessing in biological vision
systems. Roughly, low level
vision refers to the first few
stages of processing applied to
intensity images. Some authors
use this term only for oper-
ations that result in other
images. So, edge detection
is about where most authors

164
Lowe’s curve segmentation
method: An algorithm that
tries to split a curve into
a sequence of straight line
segments. The algorithm has
three main stages: 1) a recur-
sive splitting of segments into
two shorter, but more line-
like segments, until all remain-
ing segments are very short.
This forms a tree of seg-
ments. 2) Merging segments in
the tree in a bottom-up fash-
ion according to a straight-
ness measure. 3) Extracting the
remaining unmerged segments
from the tree as the segmenta-
tion result.
luma: The luminance compon-
ent of light. Color can be
divided into luma and chroma.
luminance: The measured in-
tensity from a portion of a
scene.
luminance efficiency: The
sensor specific function V
that determines how the
observed light Ix y  at
sensor position x y of
wavelength  contributes

to the measured luminance
lx y = IVd at that
point.
luminous flux: The amount of
light at all wavelengths that
passes through a given region
in space. Proportional to per-
ceived brightness.
luminosity coefficient: A com-
ponent of tristimulus color
theory. The luminosity coef-
ficient is the amount of
luminance contributed by a
given primary color to the total
perceived luminance.

165
M

M-estimation: A robust gene- macrotexture: The intensity


ralization of least square pattern formed by spatially or-
estimation and maximum ganized texture primitives on a
likelihood estimation. surface, such as a tiling. This
contrasts with microtexture.
Mach band effect: An effect in
the human visual system in magnetic resonance imaging
which a human observer per- (MRI): See NMR.
ceives a variation in brightness magnification: The process of
at the edges of a region of con- enlargement (e.g., of an image).
stant brightness. This vari- The amount of enlargement
ation makes the region appear applied.
slightly darker when it is beside magnitude-retrieval problem:
a brighter region and appear The reconstruction of a signal
slightly brighter when it is be- based on only the phase (not
side a darker region. the magnitude) of the Four-
machine vision: A general term ier transform.
for processing image data by a Mahalanobis distance: The
computer and often synonym- distance between two N -
ous with computer vision. dimensional points scaled by
There is a slight tendency to the statistical variation in each
use “machine vision” for prac- component of the point. For
tical vision systems, such as example, if x and y are two
for industrial vision, and “com- points from the same distri-
puter vision” for more explora- bution that has covariance
tory vision systems or for matrix C then the Mahalanobis
systems that aim at some of distance is given by
the competences of the human 1
vision system. x − y  C−1 
 x − y  2

Dictionary of Computer Vision and Image Processing R.B. Fisher, K. Dawson-Howe, A. Fitzgibbon,
C. Robertson and E. Trucco © 2005 John Wiley & Sons, Ltd. ISBN: 0-470-01526-8

167
The Mahalanobis distance is road model (black) overlaying
the same as the Euclidean an aerial image.
distance if the covariance
matrix is the identity matrix.
A common usage in computer
vision systems is for com-
paring feature vectors whose
elements are quantities hav-
ing different ranges and amo-
unts of variation, such as a
2-vector recording the proper-
ties of area and perimeter.
mammogram analysis: A mam- marching cubes: An algorithm
mogram is an X-ray of the for locating surfaces in volu-
human female breast. The metric datasets. Given a func-
main purpose of analysis is the tion f  on the voxels, the
detection of potential signs of algorithm estimates the pos-
cancerous growths. ition of the surface f  x = c
Manhattan distance: Also for some c. This requires esti-
called the Manhattan metric. mating where the surface inter-
Motivated by the problem of sects each of the twelve edges
only being able to walk along of a voxel. Many implementa-
city blocks in dense urban tions propagate from one voxel
environments, the distance to its neighbors, hence the
between points x1  y1  and “marching” term.
x2  y2  is  x1 − x2  +  y1 − y2 . marginal distribution: A prob-
many view stereo: See multi- ability distribution of a random
view stereo. variable X derived from the
joint probability distribution of
MAP: See maximum a posteriori
a number of random variables
probability.
integrated over all variables
map analysis: Analyzing an except X .
image of a map (e.g., obtained
Markov Chain Monte Carlo:
with a flat-bed scanner) in Markov Chain Monte Carlo
order to extract a symbolic (MCMC) is a statistical inference
description of the terrain method useful for estimating
described by the map. This is the parameters of complex
now a largely obsolete process distributions. The method
given digital map databases. generates samples from the
map registration: The registra- distribution by running the
tion of a symbolic map to (usu- Markov Chain that models
ally) aerial or satellite image the problem for a long time
data. This may require identi- (hopefully to equilibrium)
fying roads, buildings or land and then uses the ensemble
features. This image shows a of samples to estimate the
168
distribution. The states of the a strong result in the output
Markov Chain are the possible image when it processes a por-
configurations of the problem. tion of the input image con-
Markov random field (MRF): taining a pattern for which it
An image model in which the is “matched”. For example, the
value at a pixel can be ex- filter could be tuned for the
pressed as a linear weighted letter “e” in a given font size
sum of the values of pixels in a and type style, or a particular
finite neighborhood about the face viewed at the right scale. It
original pixel plus an additive is similar to template matching
random noise value. except the matched filter can
be tuned for spatially separ-
Marr’s theory: A shortened term ated patterns. This is a signal
for “Marr’s theory of the processing term imported into
human vision system”. Some image processing.
of the key stages in this inte-
grated but incomplete theory matching function: See similar-
are the raw primal sketch, full ity metric.
primal sketch, 2.5D sketch and matching method: A general
3D object recognition. term for finding the corres-
Marr–Hildreth edge detector: pondences between two struc-
An edge detector based on tures (e.g., surface matching)
multi-scale analysis of the zero- or sets of features (e.g., stereo
crossings of the Laplacian of correspondence ).
Gaussian operator. mathematical morphology
mask: A term for an m × n array operation: A class of math-
of numbers or symbolic labels. ematically defined image
A mask can be the smoothing processing operations in
mask used in a convolution, which the result is based on
the target in a template the spatial pattern of the input
matching or the kernel used data values rather than values
in a mathematical morphology themselves. For example, a
operation, etc. Here is a sim- morphological line thinning
ple mask for computing an algorithm would identify
approximation to the Laplacian places in an image where a line
operator: description was represented
by data more than 1 pixel wide
0 1 0 (i.e., the pattern to match). As
this is redundant, the thinning
1 –4 1 algorithm would chose one
of the redundant pixels to
0 1 0 be set to 0. Mathematical
morphology operations can
apply to both binary and gray
matched filter: A matched fil- scale images. This figure shows
ter is an operator that produces a small image patch image
169
before and after a thinning algorithms to represent max-
operation. imally matched structures. The
graph below has two maximal
cliques: BCDE and ABD.
A

B D

matrix: A mathematical struc-


ture of a given number of rows
and columns with each entry C E
usually containing a number. A
matrix can be used to repre-
sent a transformation between maximum a posteriori prob-
two coordinate systems, record ability: The highest proba-
the covariance of a set of vec- bility after some event or
tors, etc. A matrix for rotating observations. This term is
a 2D vector by 6 radians is: often used in the context
of parameter estimation, pose
  estimation or object recogni-
cos 6  sin 6 
−sin 6  cos 6 
  tion problems, in which case
we wish to estimate the param-
 
0866 0500 eters, position or identity
= −0500 0866 respectively that have highest
probability given the observed
image data.
matrix array camera: A 2D maximum entropy: A method
solid state imaging sensor, for extracting the maximum
such as those found in typ- amount of information
ical current video, webcam and (entropy) from a measurement
machine vision cameras. (such as an image) in the pres-
matte surface: A surface ence of noise. This method will
whose reflectance follows the always give a conservative re-
Lambertian model. sult; only presenting structure
where there is evidence for it.
maximal clique: A clique (all
nodes are connected to all maximum entropy restor-
other nodes in the clique) ation: An image restoration
where no further nodes exist technique based on
that are connected to all nodes maximum entropy.
in the clique. Maximal cliques maximum likelihood estima-
may have different sizes – tion: Estimating the param-
the issue is maximality, not eters of a problem that has the
size. Maximal cliques are used highest likelihood or prob-
in association graph matching ability, i.e., given the observed

170
data. For example, the max- mean curvature: A mathemat-
imum likelihood estimate ical characterization for a com-
of the mean of a Gaussian ponent of local surface shape
distribution is the average of at a point on a smooth
the observed samples drawn surface. Each point can be
from that distribution. uniquely described by a pair
of principal curvatures. The
MCMC: See Markov Chain
mean curvature is the average
Monte Carlo.
of the principal curvatures.
MDL: See minimum description
mean filter: See mean smooth-
length.
ing operator.
mean and Gaussian curvature
mean shift: An adaptive gradi-
shape classification: A clas-
ent ascent technique that oper-
sification of a local (i.e., very
ates by iteratively moving the
small) surface patch (often center of a search window to
at single pixels from a the average of certain points
range image) into one of a set within the window.
of simple surface shape classes
based on the signs of the mean mean smoothing operator: A
and Gaussian curvatures. The noise reduction operator that
standard set of shape classes is: can be applied to a gray scale
{plane, concave cylinder, con- image or to separate compon-
vex cylinder, concave ellipsoid, ents of a multispectral image.
convex ellipsoid, saddle val- The output value at each pixel
ley, saddle ridge, minimal}. is the average of the values of
Sometimes the classes {saddle all pixels in a neighborhood
valley, saddle ridge, minimal} of the input pixel. The size
are conflated into the single of the neighborhood deter-
class “hyperbolic”. This table mines how much smoothing
summarizes the classifications (or noise reduction) is done,
based on the curvature signs: but also how much blurring of
fine detail also occurs. A image
with Gaussian noise with  =
MEAN CURVATURE 13 and its mean smoothing are:
– 0 +


GAUSSIAN CURVATURE

IMPOSSIBLE measurement resolution: The


degree to which two differing
+
quantities can be distinguished

171
by measurement. This may be medial surface: The medial sur-
the minimum spatial distance face of a volume is the 3D gen-
that two adjacent pixels rep- eralization of the medial axis of
resent (spatial resolution) or a planar region. It is the locus
the minimum time difference of centers of spheres that touch
between visual observations the surface of the volume at
(temporal resolution), etc. three or more points.
medial axis skeletonization: median filter: See median
See medial axis transform. smoothing.
medial axis transform: An median flow filtering: A noise
operation on a binary image reduction operation on vec-
that transforms regions into sets tor data that generalizes the
of pixels that are the centers of median filter on image data.
circles that are bitangent to the
The assumption is that the
boundary and that fit entirely
vectors in a spatial neighbor-
within the region. The value
of each point on the axis is the hood about the current vector
radius of the bitangent circle. should be similar. Dissimilar
This can be used to represent vectors are rejected. The term
the region by a simpler axis-like “flow” arose through the filter’s
structure and is most effective development in the context of
on elongated regions. A region image motion.
and its medial axis are below. median smoothing: An image
noise reduction operator that
replaces a pixel’s value by
the median (middle) of the
sorted pixel values in its
neighborhood. An image with
medial line: A curve going salt-and-pepper noise and the
through the middle of an result of applying median
elongated structure. See also smoothing are:
medial axis transform. This
figure shows a region and its
medial line.

medical image registration: A


general term for registration
of two or more medical
image types or an atlas with
some image data. A typical

172
registration would align X-ray smaller cells. For example see
CAT and NMR images. Delaunay triangulation.
membrane model: A surface fit- metameric colors: Colors that
ting model that minimizes a are defined by a limited num-
combination of the smooth- ber of channels each of which
integrates a range of the
ness of the fit surface and the
spectrum. Hence the same
closeness of the fit surface to
metameric color can be caused
the original data. The surface
by a variety of spectral distribu-
class must have C 0 continu- tions.
ity and thus it differs from
the smoother thin plate model metric determinant: The met-
that has C 1 continuity. ric determinant is a measure of
curvature. For surfaces, it is the
mesh model: A tessellation of square root of the determinant
an image or surface into poly- of the first fundamental form
gonal patches, much used in matrix of the surface.
computer aided design (CAD).
The vertices of the mesh are metric property: A visual prop-
called nodes, or nodal points. erty that is a measurable
A popular class of meshes is quantity, such as a distance
based on triangles, for instance or area. This contrasts with
the Delaunay triangulation. logical properties such as
Meshes can be uniform, i.e., all image connectedness.
polygons are the same, or metric reconstruction: Recon-
non-uniform. Uniform meshes struction of the 3D structure
can be represented by small of a scene with correct spa-
sets of parameters. Surface tial dimensions and angles.
meshes have been used for This contrasts with project-
modeling free-form surfaces ive reconstruction. Two views
(e.g., faces, landscapes). See of a metrical and projective
also surface fitting. This icosa-
hedron is a mesh model of a
nearly spherical object:

OBSERVED RECONSTRUCTED
VIEW VIEW
METRICAL RECONSTRUCTION

OBSERVED RECONSTRUCTED
mesh subdivision: Methods for VIEW VIEW
subdividing cells in a mesh
model into progressively PERSPECTIVE RECONSTRUCTION

173
reconstruction of a cube are micron: One millionth of a
below. The metrical projection meter; a micrometer.
looks “correct” from all views, microscope: An optical device
but the perspective projection
observing small structures such
may look “correct” only from
as organic cells, plant fibers or
the views where the data was
integrated circuits.
acquired.
metric stratum: These are microtexture: See statistical
the set of similarity texture.
transformations (i.e., rigid mid-sagittal plane: The plane
transformations with a scal- that separates the body (and
ing). This is what can be brain) into left and right halves.
recovered from image data In medical imaging (e.g., NMR),
without external information it usually refers to a view of
such as some known length. the brain sliced down the mid-
metrical calibration: Calibra- dle between the two hemi-
tion of intrinsic and extrinsic spheres.
camera parameters to enable middle level vision: A general
metric reconstruction of a term referring to the stages of
scene. visual data processing between
Mexican hat operator: A con- low level and high level vision.
volution operator that im- There are many variations of
plements either a Laplacian the definition of this term
of Gaussian or difference of but a usable rule of thumb
Gaussians operator (which is that middle level vision
produce very similar results). starts with descriptions of the
The mask that can be used to contents of an image and
implement this convolution results in descriptions of the
has a shape similar to a Mexican features of the scene. Thus,
hat (sombrero), as seen here: binocular stereo would be a
middle level vision process
because it acts on image edge
x 10
–3 fragments to produce 3D scene
fragments.
1
MIMD: See multiple instruction
0
multiple data.
–1
minimal point: A point on a
–2 hyperbolic surface where the
–3 two principal curvatures are
–4
equal in magnitude but oppos-
–4
–2 2
ite in sign, i.e., 1 = −2 .
0 0
2 –2 minimal spanning tree: Con-
–4
Y
X sider a graph G and a subset
174
T of the arcs in G such that all for which the distance  x −
nodes in G are still connected m c  is smallest.
in T and there is exactly one
minimum spanning tree: See
path joining any two nodes. minimal spanning tree.
T is a spanning tree. If each
arc has a weight (possibly con- MIPS: millions of instructions
stant), the minimal spanning per second.
tree is the tree T with small- mirror: A specularly reflecting
est total weight. This is a graph surface for which incident light
and its minimal spanning tree: is reflected only at the same
angle and in the same plane as
the surface normal.
miss-one-out test: See leave-
GRAPH MINIMAL SPANNING one-out test.
TREE
missing data: Data that is
unavailable, hence requiring it
minimum bounding rectangle: to be estimated. For example
The rectangle of smallest area a moving person may become
that surrounds a set of image occluded resulting in missing
data. position data for a number of
minimum description length frames.
(MDL): A criterion for compar- missing pixel: A pixel for which
ing descriptions usually based no value is available (e.g., if
on the implicit assumption that there was a problem with a
the best description is the sensing element in the image
one that is shortest (i.e., takes sensor).
the fewest number of bits mixed pixel: A pixel whose mea-
to encode). The minimum surement arises from more
description usually requires than one scene phenomena.
several components: 1) the For example, a pixel that
models observed (e.g., whether observes the edge between
lines or circular arcs), 2) two regions. This pixel has a
the parameters of the models gray level that lies between the
(e.g., the line endpoints), 3) different gray levels of the two
how the image data varies from regions.
the models (e.g., explicit devi-
ations or noise model param- mixed reality: Image data that
eters) and 4) the remainder of contains both original image
data and overlaid com-
the image that is not explained
puter graphics. See also
by the models.
augmented reality. This image
minimum distance classifier: shows an example of mixed
Given an unknown sample reality, where the butterfly is a
with feature vector x , select the graphical object added to the
class c with model vector m c image of the small robot:
175
model acquisition: The process
of learning a model, usually
based on observed instances
or examples of the structure
being modeled. This may be
simply learning the param-
eters of a distribution from
examples. For example, one
might learn the image tex-
mixture model: A probabilistic ture properties that distinguish
representation in which more tumorous cells from normal
than one distribution is com- cells. Alternatively, the struc-
bined, modeling a situation ture of the object might be
where the data may arise from learned as well, such as con-
different sources or have differ- structing a model of a build-
ent behaviors, each with differ- ing from a video sequence.
ent probability distributions. Another type of model acqui-
sition is learning the proper-
MLE: See maximum likelihood
ties of an object, such as what
estimation.
properties and relations define
modal deformable model: A a square as compared to other
deformable model based on geometric shapes.
modal analysis (i.e., study of
the different shapes that an model base: A database of
object can assume). models usually used as part of
an identification process.
mode filter: A noise reduction
filter that, for each pixel, out- model base indexing: Select-
puts the mode (most common) ing one or more candidate
value in its local neighbor- models from a model database
hood. The figure below shows of structures known by the sys-
a raw image with salt-and-pep- tem. This is usually to eliminate
per noise and the filtered ver- exhaustive testing with every
sion at the right. member of the model base.
model based coding: A method
of encoding the contents of
an image (or video sequence )
using a pre-defined or learned
set of models. This could
be for producing a more
compact description of the
image data (see model based
compression) or for produ-
model: An abstract representa- cing a symbolic description.
tion of some object or class of For example, a Mondrian style
objects. image could be encoded by
176
model based compression: An application of model based coding for the purpose of reducing the amount of memory required to describe the image while still allowing reconstruction of the original image.

model based feature detection: Using a parametric model of a feature to locate instances of the feature in an image. For example, a parametric edge detector uses a parameterized model of a step edge that encodes edge direction and edge magnitude.

model based recognition: Identification of the structures in an image by using some internally represented model of the objects known to the computer system. The models are usually geometric models. The recognition process finds image features that match the model features with the right shape and position. The advantage of model based recognition is that the model encodes the object shape, thus allowing predictions of image data and less chance of coincidental features being falsely recognized.

model based segmentation: An image segmentation process that uses geometric models to partition the image into different regions. For example, aerial images could have the visible roads segmented by using a geographic information system model of the road network.

model based tracking: An image tracking process that uses models to locate the position of moving targets in an image sequence. For example, the estimated position, orientation and velocity of a modeled vehicle in one image allows a strong prediction of its location in the next image in the sequence.

model based vision: A general term for using models of the objects expected to be seen in the image data to help with the image analysis. The model allows, among other things, prediction of additional model feature positions, verification that a set of features could be part of the model and understanding of the appearance of the model in the image data.

model building: See also model acquisition. The process of constructing a geometric model, usually based on observed instances or examples of the structure being modeled, such as from a video sequence.

model fitting: See model registration.

model invocation: See model base indexing.

model reconstruction: See model acquisition.

model registration: A general term for aligning a geometric model to a set of image data. The process may require estimating the rotation, translation and scale that maps a model onto the image data. There may also be shape parameters, such as model length, that need to be estimated. The fitting may need to account for perspective distortion. This figure shows a 2D model registered on an intensity image of the same part.
model selection: See model base indexing.

modulation transfer function (MTF): Informally, the MTF is a measure of how well spatially varying patterns are observed by an optical system. More formally, in a 2D image, let $X(f_h, f_v)$ and $Y(f_h, f_v)$ be the Fourier transforms of the input $x(h, v)$ and output $y(h, v)$ images. Then, the MTF at a horizontal and vertical spatial frequency pair $(f_h, f_v)$ is $|H(f_h, f_v)| / |H(0, 0)|$, where $H(f_h, f_v) = Y(f_h, f_v) / X(f_h, f_v)$. This is also the magnitude of the optical transfer function.

Moiré fringe: An interference pattern that is observed when spatially sampling, at a given spatial frequency, a signal that has a slightly different spatial frequency. The result is a set of light and dark bands in the observed image. As well as causing image degradation, this effect can also be used in range sensors, where the fringe positions give an indication of surface depth. An example of typical observed fringe patterns is:

Moiré interferometry: A technique for contouring surfaces that works by projecting a fringe pattern (e.g., of straight lines) and observing this pattern through another grating. This effect can be achieved in other ways as well. The technique is useful for measuring extremely small stress and distortion movements.

Moiré pattern: See moiré fringe.

Moiré topography: A method for measuring the local shape of a surface by analyzing the spacing of moiré fringes on the target surface.

moment: A method for summarizing the distribution of pixel positions or values. Moments are a parameterized family of values. For example, if $I(x, y)$ is a binary image, then $\sum_{x,y} I(x, y)\, x^p y^q$ computes its $pq$-th moment $m_{pq}$. (See also gray level moments and moments of intensity.)
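
The MTF definition above translates directly into a few lines of numpy; this sketch assumes the input image has no zero Fourier coefficients:

```python
import numpy as np

def mtf(input_image, output_image):
    """Estimate the system MTF as |H(f)| / |H(0,0)|, where
    H = FFT(output) / FFT(input)."""
    X = np.fft.fft2(input_image)
    Y = np.fft.fft2(output_image)
    H = Y / X
    return np.abs(H) / np.abs(H[0, 0])
```
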
moment characteristic: See moment invariant.

moment invariant: A function of image moment values that keeps the same value even if the image is transformed in some manner. For example, the value $\frac{1}{A^2}(\mu_{20} + \mu_{02})$ is invariant, where the $\mu_{pq}$ are central moments of a binary image region and $A$ is the area of the region. This value is a constant even if the image data is translated, rotated or scaled.

moments of intensity: An image moment value that takes account of the gray scales of the image pixels as well as their positions. For example, if $G(x, y)$ is a gray scale image, then $\sum_{x,y} G(x, y)\, x^p y^q$ computes its $pq$-th moment of intensity $g_{pq}$. See also gray level moment.

Mondrian: A famous visual artist from the Netherlands, whose later paintings were composed of adjacent rectangular blocks of constant (i.e., without shading) color. This style of image has been used for much color vision research and, in particular, color constancy, because of its simplified image structure without shading, specularities, shadows or light sources.

monochrome: Containing only different shades of a single color. This color is usually different shades of gray, going from pure black to pure white.

monocular: Using a single camera, sensor or eye. This contrasts with binocular and multi-ocular stereo, where more than one sensor is used. Sometimes there is also the implication that the image data is acquired from only a single viewpoint, as a single camera taking images over time is mathematically equivalent to multiple cameras.

monocular depth cue: Image evidence that indicates that one surface may be closer to the viewer than another. For example, motion parallax or occlusion relationships give evidence of relative depths.

monocular visual space: The visual space behind the lens in an optical system. This space is commonly assumed to be without structure, but scene depth can be recovered from the defocus blurring that occurs in this space.

monotonicity: A sequence of values or function that is either continuously increasing (monotone increasing) or continuously decreasing (monotone decreasing).

Moravec interest point operator: An operator that locates interest points at pixels where neighboring intensity values change greatly in at least one direction. These points can be used for stereo matching or feature point tracking. The operator computes the sum of the squares of pixel differences in a line vertically, horizontally and in both diagonal directions in a 5 × 5 window about the given pixel. The minimum of these four values is selected and then all values that are not local maxima or are below a given threshold are suppressed. This image shows the interest points found by the Moravec operator as white dots on the original image.
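
The moment definitions above can be written compactly with numpy index grids; a sketch with illustrative helper names:

```python
import numpy as np

def moment(image, p, q):
    """m_pq = sum over (x, y) of I(x, y) * x^p * y^q. Applies to
    binary images and, unchanged, to gray scale images (giving
    the moments of intensity g_pq)."""
    y, x = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    return np.sum(image * (x ** p) * (y ** q))

def central_moment(image, p, q):
    """mu_pq, the moment taken about the region centroid."""
    m00 = moment(image, 0, 0)
    cx = moment(image, 1, 0) / m00
    cy = moment(image, 0, 1) / m00
    y, x = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    return np.sum(image * (x - cx) ** p * (y - cy) ** q)

def moment_invariant(image):
    """(mu20 + mu02) / A^2 for a binary region, constant under
    translation, rotation and scaling (A = area = m00)."""
    A = moment(image, 0, 0)
    return (central_moment(image, 2, 0)
            + central_moment(image, 0, 2)) / A ** 2
```
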
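
The Moravec entry above describes a complete procedure; the following sketch implements the directional sum-of-squared-differences measure, with the non-maxima suppression step reduced to simple thresholding for brevity:

```python
import numpy as np

def moravec(image, threshold=100.0):
    """Minimum over four directions (horizontal, vertical, two
    diagonals) of the sum of squared pixel differences in a 5 x 5
    window; weak responses are suppressed by a threshold."""
    img = image.astype(float)
    rows, cols = img.shape
    response = np.zeros_like(img)
    shifts = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for r in range(3, rows - 3):
        for c in range(3, cols - 3):
            win = img[r - 2:r + 3, c - 2:c + 3]
            ssds = [np.sum((win - img[r - 2 + dr:r + 3 + dr,
                                      c - 2 + dc:c + 3 + dc]) ** 2)
                    for dr, dc in shifts]
            response[r, c] = min(ssds)
    response[response < threshold] = 0.0
    return response
```
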
morphological gradient: A gray scale mathematical morphology operation applied to gray scale images that results in an output image similar to the standard intensity gradient. The gradient is calculated by $\frac{1}{2}(D_G(A, B) - E_G(A, B))$, where $D_G$ and $E_G$ are the gray scale dilate and erode respectively of image $A$ by kernel $B$.

morphological segmentation: Using mathematical morphology operations applied to binary images to extract isolated regions of the desired shape. The desired shape is specified by the morphological kernel. The process could also be used to separate touching objects.

morphological smoothing: A gray scale mathematical morphology operation applied to gray scale images that results in an output image similar to that produced by standard noise reduction. The smoothing is calculated by $C_G(O_G(A, B), B)$, where $C_G$ and $O_G$ are the gray scale close and open operations respectively of image $A$ by kernel $B$.

morphological transformation: One of a large class of binary and gray scale image transformations whose primary characteristic is that they react to the pattern of the pixel values rather than to the values themselves. Examples include dilation, erosion, skeletonizing, thinning, etc. The right figure below is the opening of the left figure, when using a disk shaped structuring element 11 pixels in diameter.

morphology: The shape of a structure. See also mathematical morphology.

morphometry: Techniques for the measurement of shape.
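
The two gray scale morphology formulas above map directly onto scipy.ndimage's routines (an assumed library choice):

```python
from scipy import ndimage

def morphological_gradient(image, size=3):
    """0.5 * (D_G(A, B) - E_G(A, B)) with a size x size kernel B."""
    d = ndimage.grey_dilation(image.astype(float), size=(size, size))
    e = ndimage.grey_erosion(image.astype(float), size=(size, size))
    return 0.5 * (d - e)

def morphological_smoothing(image, size=3):
    """C_G(O_G(A, B), B): gray scale open, then close."""
    opened = ndimage.grey_opening(image, size=(size, size))
    return ndimage.grey_closing(opened, size=(size, size))
```
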
mosaic: The construction of a larger image from a collection of partially overlapping images taken from different viewpoints. The reconstructed image could have different geometries, e.g., as if seen from a single perspective viewpoint, or as if seen from an orthographic viewpoint. See also image mosaic.

motion: A general language term, but, in the context of computer vision, refers to analysis of an image sequence where the camera position or scene structure changes over time.

motion analysis: Analysis of an image sequence in order to extract useful information. Examples of information routinely extracted include: shape of the observed scene, figure–ground separation, egomotion estimation, and estimates of a target's position and motion.

motion blur: The blurring of an image that arises when either the camera or something in the scene moves while the image is being acquired. The image below shows the blurring that occurs when an object moves during image capture.

motion coding: 1) A component of video sequence compression in which efficient methods are used for representing the movement of image regions between video frames. 2) A term for neural cells tuned to respond to particular directions and speeds of image motion.

motion detection: Analysis of an image sequence to determine if or when something in the observed scene moves. See also change detection.

motion discontinuity: When the smooth motion of either the camera or something in the scene changes, such as the speed or direction of motion. Another form of motion discontinuity is between two groups of adjacent pixels that have different motions.

motion estimation: Estimating the motion direction and speed of the camera or something in the scene.

motion factorization: Given a set of tracked feature points through an image sequence, a measurement matrix can be constructed. This matrix can be factored into component matrices that represent the shape and 3D motion of the structure up to a 3D affine transform (which is removable using knowledge of the intrinsic camera parameters).

motion field: The projection of the relative motion vector for each scene point onto the image plane. In many circumstances this is closely related to the optical flow, but may differ as image intensities can also change due to illumination changes. Similarly, motion of a uniformly shaded region is not observable locally because there are no changes in image intensity values.
motion layer segmentation: The segmentation of an image into different regions where the motion is locally consistent. The layering effect is most noticeable when the observer is moving through a scene with objects at different depths (causing different amounts of parallax), some of which might also be moving. See also motion segmentation.

motion model: A mathematical model of the types of motion allowable for the target object or camera, such as only linear motion along the optical axis with constant velocity. Another example might allow velocities and accelerations in any direction, but occasional discontinuities, such as for a bouncing ball.

motion representation: See motion model.

motion segmentation: See motion layer segmentation.

motion sequence analysis: The class of computer vision algorithms that process sequences of images captured close together in space and time, typically by a moving camera. These analyses are often characterized by assumptions on temporal coherence that simplify computation.

motion smoothness constraint: The assumption that nearby points in the image have similar motion directions and speeds, or similar optical flow. This constraint is based on the fact that adjacent pixels generally record data from the projection of adjacent surface patches in the scene. These scene components will have similar motion relative to the observer. This assumption can help reduce motion estimation errors or constrain the ambiguity in optical flow estimates arising from the aperture problem.

motion tracking: Identification of the same target feature points through an image sequence. This could also refer to tracking complete objects as well as feature points, including estimating the trajectory or motion parameters of the target.

movement analysis: A general term for analyzing an image sequence of a scene where objects are moving. It is often used for analysis of human motion, such as for people walking or using sign language.

moving average smoothing: A form of image noise reduction that occurs over time by averaging the most recent images together. It is based on the assumption that variations in time of the observed intensity at a pixel are random. Thus, averaging the values will produce intensity estimates closer to the true (mean) value.
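
A minimal sketch of the moving average smoothing entry above, for a stationary camera (assuming the frames are equal-size numpy arrays):

```python
import numpy as np

def moving_average(frames):
    """Average the most recent frames; for zero-mean random noise
    the noise standard deviation shrinks as 1/sqrt(len(frames))."""
    stack = np.stack([f.astype(float) for f in frames])
    return stack.mean(axis=0)
```
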
moving light display: An image sequence of a darkened scene containing objects with attached point light sources. The light sources are observed as a set of moving bright spots. This sort of image sequence was used in the early research on structure from motion.

moving object detection: Analyzing an image sequence, usually with a stationary camera, to detect whether any objects in the scene move.

moving observer: A camera or other sensor that is moving. Moving observers have been extensively used in recent research on structure from motion.

MPEG: Moving Picture Experts Group. A group developing standards for coding digital audio and video, as used in video CD, DVD and digital television. This term is often used to refer to media that is stored in the MPEG 1 format.

MPEG 2: A standard formulated by the ISO Motion Pictures Expert Group (MPEG), a subset of ISO Recommendation 13818, meant for transmission of studio-quality audio and video. It covers four levels of video resolution.

MPEG 4: A standard formulated by the ISO Motion Pictures Expert Group (MPEG), originally concerned with similar applications as H.263 (very low bit rate channels, up to 64 kbps). Subsequently extended to encompass a large set of multimedia applications, including over the Internet.

MPEG 7: A standard formulated by the ISO Motion Pictures Expert Group (MPEG). Unlike MPEG 2 and MPEG 4, which deal with compressing multimedia contents within specific applications, it specifies the structure and features of the compressed multimedia content produced by the different standards, for instance to be used in search engines.

MRF: See Markov random field.

MRI: Magnetic Resonance Imaging. See nuclear magnetic resonance.

MSRE: Mean Squared Reconstruction Error.

MTF: See modulation transfer function.

multi-dimensional edge detection: A variation on standard edge detection of gray scale images in which the input image is multi-spectral (e.g., an RGB color image). The edge detection operator may detect edges in each dimension independently and then combine the edges, or may use all the information at each pixel directly. The following image shows edges detected from the red, green and blue components of an RGB image.
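
One simple reading of the per-channel approach above, sketched with scipy's Sobel operator (the pixelwise maximum is just one possible combination rule):

```python
import numpy as np
from scipy import ndimage

def multispectral_edges(image):
    """Gradient magnitude per channel, combined across channels
    by taking the pixelwise maximum."""
    magnitudes = []
    for c in range(image.shape[2]):
        channel = image[:, :, c].astype(float)
        gx = ndimage.sobel(channel, axis=1)
        gy = ndimage.sobel(channel, axis=0)
        magnitudes.append(np.hypot(gx, gy))
    return np.max(np.stack(magnitudes), axis=0)
```
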
multi-dimensional histogram: A histogram with more than one dimension. For example, consider measurements as vectors, e.g., from a multi-spectral image, with N dimensions in the vector. Then one could create a histogram represented by an array with dimension N. The N components in each vector are used to index into the array. Accumulating counts or other evidence values in the array makes it a histogram.

multi-grid method: An efficient algorithm for solving systems of discretized differential (or other) equations. The term "multi-grid" is used because the system is first solved at a coarse sampling level, which is then used to initialize a higher-resolution solution.

multi-image registration: A general term for the geometric alignment of two or more image datasets. Alignment allows pixels from the different source images to lie on top of each other or to be combined. (See also sensor fusion.) For example, two overlapping intensity images could be registered to help create a mosaic. Alternatively, the images need not be from the same type of sensor. (See multi-modal fusion.) For example, NMR and CAT images of the same body part could be registered to provide richer information, e.g., for a doctor. This image shows two unregistered range images on the left and the registered datasets on the right.

multi-level: See multi-scale method.

multi-modal analysis: A general term for image analysis using image data from more than one sensor type. There is often the assumption that the data is registered so that each pixel records data of two or more types from the same portion of the observed scene.

multi-modal fusion: See sensor fusion.

multi-modal neighborhood signature: A description of a feature point based on the image data in its neighborhood. The data comes from several registered sensors, such as for X-ray and NMR.
multi-ocular stereo: A stereo triangulation process that uses more than one camera to infer 3D information. The terms binocular stereo and trinocular stereo are commonly used when there are only two or three cameras respectively.

multi-resolution method: See multi-scale method.

multi-scale description: See multi-scale method.

multi-scale integration: 1) Combining information extracted by using operators with different scales. 2) Combining information extracted from registered images with different scales. These two definitions could just be two ways of considering the same process if the difference in operator scale is only a matter of the amount of smoothing. An example of multi-scale integration occurs when combining edges extracted from images with different amounts of smoothing to produce more reliable edges.

multi-scale method: A general term for a process that uses information obtained from more than one scale of image. The different scales might be obtained by reducing the image size or by Gaussian smoothing of the image. Both methods reduce the spatial frequency of the information. The main reasons for multi-scale methods are: 1) some structures have different natural scales (e.g., a thick bar could also be considered to be two back-to-back edges) and 2) coarse scale information is generally more reliable in the presence of image noise, but the spatial accuracy is better in finer scale information (e.g., an edge detector might use a coarse scale to reliably detect the edges and a finer scale to locate them more accurately). Below is an image with two scales of blurring.

multi-scale representation: A representation having image features or descriptions that belong to two or more scales. An example might be zero crossings detected from intensity images that have received increasing amounts of Gaussian smoothing. A multi-scale model representation might represent an arm as a single generalized cylinder at a coarse scale, two generalized cylinders at an intermediate scale and with a surface triangulation at a fine scale. The representation might have results from several discrete scales or from a more continuous range of scales, as in a scale space. Below are zero crossings found at two scales of Gaussian blurring.
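
A common way to build the scale stack used by multi-scale methods is a Gaussian pyramid; a minimal sketch using scipy's Gaussian filter:

```python
import numpy as np
from scipy import ndimage

def gaussian_pyramid(image, levels=4, sigma=1.0):
    """Repeatedly smooth with a Gaussian and subsample by two,
    producing one image per scale."""
    pyramid = [image.astype(float)]
    for _ in range(levels - 1):
        smoothed = ndimage.gaussian_filter(pyramid[-1], sigma)
        pyramid.append(smoothed[::2, ::2])
    return pyramid
```
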
multi-sensor geometry: The relative placement of a set of sensors, or multiple views from a single sensor but from different positions. One key consequence of the different placements is the ability to deduce the 3D structure of the scene. The sensors need not be the same type, but usually are for convenience.

multi-spectral analysis: Using the observed image brightness at different wavelengths to aid in the understanding of the observed pixels. A simple version uses RGB image data. Seven or more bands, including several infrared wavelengths, are often used for satellite remote sensing analysis. Recent hyperspectral sensors can give measurements at 100–200 different wavelengths.

multi-spectral image: An image containing data measured at more than one wavelength. The number of wavelengths may be as low as two (e.g., some medical scanners), three (e.g., RGB image data), or seven or more bands, including several infrared wavelengths (e.g., satellite remote sensing). Recent hyperspectral sensors can give measurements at 100–200 different wavelengths. The typical image representation uses a vector to record the different spectral measurements at each pixel of an image array. The following image shows the red, green and blue components of an RGB image.

multi-spectral segmentation: Segmentation of a multi-spectral image. This can be addressed by segmenting the image channels individually and then combining the results, or alternatively the segmentation can be based on some combination of the information from the channels.

multi-spectral thresholding: A segmentation technique for multi-spectral image data. A common approach is to threshold each spectral channel independently and then logically AND together the resulting images. An alternative is to cluster pixels in a multi-spectral space and choose thresholds that select the desired clusters. The images below show a colored image first thresholded in the blue channel (0–100 accepted) and then ANDed with the thresholded green channel (0–100 accepted). (See plate section for a colour version of these figures.)
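
A sketch of the threshold-and-AND scheme just described, with per-channel acceptance ranges supplied by the caller:

```python
import numpy as np

def multispectral_threshold(image, lows, highs):
    """A pixel is accepted only if, in every channel c, its value
    lies in [lows[c], highs[c]]; the per-channel masks are ANDed."""
    mask = np.ones(image.shape[:2], dtype=bool)
    for c in range(image.shape[2]):
        channel = image[:, :, c]
        mask &= (channel >= lows[c]) & (channel <= highs[c])
    return mask
```
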
multi-tap camera: A camera that provides multiple outputs.

multi-thresholding: Thresholding using a number of thresholds, giving a result that has a number of gray scales or colors. In the following example the image has been thresholded with two thresholds (113 and 200).

multi-variate normal distribution: A Gaussian distribution for a variable that is a vector rather than a scalar. Let $\vec{x}$ be the vector variable with dimension $N$. Assume that this variable has mean value $\bar{x}$ and covariance matrix $C$. Then the probability of observing the particular value $\vec{x}$ is given by:
$$\frac{1}{(2\pi)^{N/2}\, |C|^{1/2}}\; e^{-\frac{1}{2}(\vec{x} - \bar{x})^{\top} C^{-1} (\vec{x} - \bar{x})}$$

multi-view geometry: See multi-sensor geometry.

multi-view image registration: See multi-image registration.

multi-view stereo: See multi-sensor geometry.

multiple instruction multiple data (MIMD): A form of parallelism in which, at any given time, each processor might be executing a different instruction or program on a different dataset or pixel. This contrasts with single instruction multiple data parallelism, where all processors execute the same instruction simultaneously, although on different pixels.

multiple motion segmentation: See motion segmentation.

multiple target tracking: A general term for tracking multiple objects simultaneously in an image sequence. Example applications include tracking football players and automobiles on a road.

multiple view interpolation: A technique for creating (or recognizing) new unobserved views of a scene from example images captured from other viewpoints.
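
The multi-variate normal density above, evaluated directly in numpy (using a linear solve rather than an explicit matrix inverse):

```python
import numpy as np

def mvn_pdf(x, mean, cov):
    """Density of an N-dimensional Gaussian with the given mean
    vector and covariance matrix, evaluated at x."""
    n = len(mean)
    diff = x - mean
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(cov, diff))
```
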
multiplicative noise: A model for the corruption of a signal where the noise is proportional to the signal strength: $f(x, y) = g(x, y) + g(x, y)\, v(x, y)$, where $f(x, y)$ is the observed signal, $g(x, y)$ is the ideal (original) signal and $v(x, y)$ is the noise.

Munsell color notation system: A system for precisely specifying colors and their relationships, based on hue, value (brightness) and chroma (saturation). The "Munsell Book of Color" contains colored chips indexed by these three attributes. The color of any unknown surface can be identified by comparison with the colors in the book under specified lighting and viewing conditions.

mutual illumination: When light reflecting from one surface illuminates another surface and vice versa. The consequence of this is that the light observed coming from a surface is a function of not only the light source spectrum and the reflectance of the target surface, but also the reflectance of the nearby surface (through the spectrum of the light reflecting from the nearby surface onto the first surface). The following diagram shows how mutual illumination can occur.

[Figure: a light source and camera viewing adjacent white, green and red surfaces that reflect light onto one another.]

mutual information: The amount of information two pieces of data (such as images) have in common. In other words, given a data item A and an unknown data item B, the mutual information is $MI(A, B) = H(B) - H(B|A)$, where $H(\cdot)$ is the entropy.

mutual interreflection: See mutual illumination.
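
The entropy form of mutual information above can be estimated from a joint histogram of two gray scale images; a sketch using the equivalent identity MI = H(A) + H(B) - H(A, B):

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Histogram estimate of MI(A, B) for two equal-size images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    return (entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0))
            - entropy(pxy.ravel()))
```
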
N

NAND operator: An arithmetic operation where a new image is formed by NANDing (logical AND followed by NOT) together the corresponding bits for every pixel of the two input images. This operator is most appropriate for binary images but may also be applied to gray scale images. For example, the following shows the NAND operator applied to two binary images:

narrow baseline stereo: A form of stereo triangulation in which the sensor positions are close together. The baseline is the distance between the sensor positions. Narrow baseline stereo often occurs when the image data is from a video sequence taken by a moving camera.

near infrared: Light wavelengths approximately in the range 750–5000 nm.

nearest neighbor: A classification, labeling or grouping principle in which a data item is associated with, or takes the same label as, the previously classified data item that is nearest to the first data item. This distance might be based on spatial distance or a distance in a property space. In this figure the unknown square is classified with the label of the nearest point, namely a circle.

Necker cube: A line drawing of a cube drawn under orthographic projection, which as a result can be interpreted in two ways.
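
Two short numpy sketches for the NAND operator and the nearest neighbor rule above (illustrative helper names):

```python
import numpy as np

def nand_images(a, b):
    """Pixelwise NAND of two binary images: NOT(a AND b)."""
    return ~(a.astype(bool) & b.astype(bool))

def nearest_neighbor_label(x, points, labels):
    """Give x the label of the closest previously classified point."""
    distances = np.linalg.norm(points - x, axis=1)
    return labels[int(np.argmin(distances))]
```
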
Necker reversal: An ambiguity in the recovery of 3D structure from multiple images. Under affine viewing conditions, the sequence of 2D images of a set of rotating 3D points is the same as the sequence produced by the rotation in the opposite direction of a different set of points, so that two solutions to the structure and motion problem are possible. The different set of points is the reflection of the first set about any plane perpendicular to the optical axis of the camera.

needle map: An image representation used for displaying 2D and 3D vector fields, such as surface normals. Each pixel has a vector. Diagrams showing these use short lines giving the magnitude and direction of the 3D vector projected onto the image plane. To avoid overcrowding the image, the pixels where the lines are drawn are a subset of the full image. This image shows a needle map of the surface normals on the block sides.

negate operator: See invert operator.

neighborhood: 1) The neighborhood of a vertex $v$ in a graph is the set of vertices that are connected to $v$ by an arc. 2) The neighborhood of a point (or pixel) $\vec{x}$ is a set of points "near" $\vec{x}$. A common definition is the set of points within a certain distance of $\vec{x}$, where the distance metric may be Manhattan distance or Euclidean distance. 3) The 4-connected neighborhood of a 2D location $(x, y)$ is the set of image locations $(x+1, y)$, $(x-1, y)$, $(x, y+1)$, $(x, y-1)$. The 8-connected neighborhood is the set of pixels $(x+i, y+j)$, $-1 \le i, j \le 1$. The 26-connected neighborhood of a 3D point $(x, y, z)$ is defined analogously.

[Figure: the 4-connected and 8-connected neighborhoods.]

neural network: A classifier that maps input data $\vec{x}$ of dimension $n$ to a space of outputs $\vec{y}$ of dimension $m$. As a black box, the network is a function $f: \mathbb{R}^n \to [0, 1]^m$. The most commonly used form of neural network is the multi-layer perceptron (MLP). An MLP is characterized by an $m \times n$ matrix of weights $W$, and a transfer function $\sigma$ that maps the reals to $[0, 1]$. The output of the single-layer network is $f(\vec{x}) = \sigma(W\vec{x})$, where $\sigma$ is applied elementwise to vector arguments. A multi-layer network is a cascade of single-layer networks, with different weights matrices at each layer. For example, a two-layer network with $k$ hidden nodes is defined by weights matrices $W_1 \in \mathbb{R}^{k \times n}$ and $W_2 \in \mathbb{R}^{m \times k}$, and written $f(\vec{x}) = \sigma(W_2\, \sigma(W_1 \vec{x}))$. A common choice for $\sigma$ is the sigmoid function $\sigma(t) = (1 + e^{-st})^{-1}$ for some value of $s$. When we make it explicit that $f$ is a function of the weights as well as the input vector, it is written $f(W; \vec{x})$. Typically, a neural network is trained to predict the relationship between the $\vec{x}$'s and $\vec{y}$'s of a given collection of training examples. Training means setting the weights matrices to minimize the training error $e(W) = \sum_i d(\vec{y}_i, f(W; \vec{x}_i))$, where $d$ measures the distance between the network output and a training example. Common choices for $d(\vec{y}, \vec{y}\,')$ include the 2-norm $\|\vec{y} - \vec{y}\,'\|^2$.
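
The two-layer MLP formula above in a few lines of numpy (forward pass only; training would adjust W1 and W2 to minimize the error e(W)):

```python
import numpy as np

def sigmoid(t, s=1.0):
    """The transfer function sigma(t) = 1 / (1 + exp(-s t))."""
    return 1.0 / (1.0 + np.exp(-s * t))

def mlp_forward(x, W1, W2):
    """f(x) = sigma(W2 sigma(W1 x)); W1 is k x n, W2 is m x k."""
    return sigmoid(W2 @ sigmoid(W1 @ x))
```
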
Newton's optimization method: To find a local minimum of a function $f: \mathbb{R}^n \to \mathbb{R}$ from starting position $\vec{x}_0$. Given the function's gradient $\nabla f$ and Hessian $H$ evaluated at $\vec{x}_k$, the Newton update is $\vec{x}_{k+1} = \vec{x}_k - H^{-1} \nabla f$. If $f$ is a quadratic form, then a single Newton step will directly yield the global minimum. For general $f$, repeated Newton steps will generally converge to a local optimum.

next view planning: When inspecting an object or obtaining a geometric or appearance-based model, it may be necessary to observe the object from several places. Next view planning determines where to next place the camera (by moving either the object or the camera) based on either what was observed (in the case of unknown objects) or a geometric model (in the case of known objects).

next view prediction: See next view planning.

NMR: See nuclear magnetic resonance.

node of graph: A symbolic representation of some entity or feature. It is connected to other nodes in a graph by arcs, which represent relationships between the different entities.
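
A sketch of the Newton update above; the quadratic example reaches its minimum in one step, as the entry notes:

```python
import numpy as np

def newton_step(x, grad, hess):
    """One update x' = x - H^{-1} grad f, via a linear solve."""
    return x - np.linalg.solve(hess(x), grad(x))

# Example: f(x) = x0^2 + 2 x1^2, a quadratic form.
grad = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])
print(newton_step(np.array([3.0, -2.0]), grad, hess))  # -> [0. 0.]
```
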
noise: A general term for the deviation of a signal away from its "true" value. In the case of images, this leads to pixel values (or other measurements) that are different from their expected values. The causes of noise can be random factors, such as thermal noise in the sensor, or minor scene events, such as dust or smoke. Noise can also represent systematic, but unmodeled, events such as short term lighting variations or quantization. Noise might be reduced or removed using a noise reduction method. Above are images without and with salt-and-pepper noise.

noise model: A way to model the statistical properties of noise without having to model the causes of the noise. One general assumption about noise is that it has some underlying, but perhaps unknown, distribution. A Gaussian noise model is commonly used for random factors and a uniform distribution is often used for unmodeled scene effects. Noise could also be modeled with a mixture model. The noise model typically has one or more parameters that control the magnitude of the noise. The noise model can also specify how the noise affects the signal, such as additive noise (which offsets the true value) or multiplicative noise (which rescales the true value). The type of noise model can constrain the type of noise reduction method.

noise reduction: An image processing method that tries to reduce the distortion of an image that has been caused by noise. For example, the images from a video sequence taken with a stationary camera and scene can be averaged together to reduce the effect of Gaussian noise, because the average value of a signal corrupted with this type of noise converges to the true value. Noise reduction methods often introduce other distortions, but these may be less significant to the application than the original noise. An image with salt-and-pepper noise and its noise reduced by median smoothing are shown in the figure.

noise removal: See noise reduction.

noise source: A general term for phenomena that corrupt image data. This could be systematic unmodeled processes (e.g., 60 Hz electromagnetic noise) or random processes (e.g., electronic shot noise). The sources could be in the scene (e.g., chaff), in the medium (e.g., dust), in the lens (e.g., imperfections) or in the sensor (e.g., sensitivity variations).

noise suppression: See noise reduction.

noise-whitening filter: A noise modifying filter that outputs images whose pixels have noise that is independent of 1) other pixels' noise (spatial noise) or 2) other values of that pixel at other times (temporal noise). The resulting image's noise is white noise.
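
The additive and multiplicative noise models mentioned above, as tiny numpy generators (the sigma parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_additive_noise(image, sigma=10.0):
    """Additive model: observed = true + v, with v ~ N(0, sigma^2)."""
    return image + rng.normal(0.0, sigma, image.shape)

def add_multiplicative_noise(image, sigma=0.1):
    """Multiplicative model: observed = true * (1 + v)."""
    return image * (1.0 + rng.normal(0.0, sigma, image.shape))
```
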
non-accidentalness: A general principle that can be used to improve image interpretation, based on the concept that when regularities appear in an image, they are most likely to result from regularities in the scene. For example, if two straight lines end near to each other, then this could have arisen from a coincidental alignment of the line ends and the observer. However, it is much more probable that the two lines end at the same point in the observed scene. This figure shows line terminations and orientations that are unlikely to be coincidental.

[Figure: examples of non-accidental termination and non-accidental parallelism.]

non-hierarchical control: A way of structuring the sequence of actions in an image interpretation system. Non-hierarchical control is when there is no master process that orders the sequence of actions or operators applied. Instead, typically, each operator can observe the current results and decide if it is capable of executing and if it is desirable to do so.

nonlinear filter: A process where the outputs are a nonlinear function of the inputs. This covers a large range of algorithms. Examples of nonlinearity might be: 1) doubling the values of all input data does not double the values of the output results (e.g., a filter that reports the position at which a given value appears); 2) applying an operator to the sum of two images gives different results from adding the results of the operator applied to the two original images (e.g., thresholding).

non-maximal suppression: A technique for suppressing multiple responses (e.g., high values of gradient magnitude) representing a single edge or other feature. The resulting edges should be a single pixel wide.

non-parametric clustering: A data clustering process, such as k-nearest neighbor, that does not assume an underlying probability distribution.

non-parametric method: A probabilistic method used when the form of the underlying probability distribution is unknown or multi-modal. Typical applications are to estimate the a posteriori probability of a classification given an observation. Parzen windows or k-nearest neighbor classifiers are often used.

non-rigid model representation: A model representation where the shape of the model can change, perhaps under the control of a few parameters. These models are useful for representing objects whose shape can change, such as moving humans or biological specimens. The differences in shape may occur over time or be between different instances. Changes in apparent shape due to perspective projection and observer viewpoint are not relevant here. By contrast, a rigid model would have the same actual shape irrespective of the viewpoint of the observer.
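
A much simplified sketch of non-maximal suppression: keep a pixel only if it is a local maximum of gradient magnitude along its row or column (a full edge detector would compare along the gradient direction instead):

```python
import numpy as np

def non_max_suppress(magnitude):
    """Zero out pixels that are not row-wise or column-wise local
    maxima of the gradient magnitude image."""
    m = magnitude
    keep_h = (m >= np.roll(m, 1, axis=1)) & (m >= np.roll(m, -1, axis=1))
    keep_v = (m >= np.roll(m, 1, axis=0)) & (m >= np.roll(m, -1, axis=0))
    return np.where(keep_h | keep_v, m, 0.0)
```
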
non-rigid motion: A motion of an object in the scene in which the shape of the object also changes. Examples include: 1) the position of a walking person's limbs and 2) the shape of a beating heart. Changes in apparent shape due to perspective projection and viewpoint are not relevant here.

non-rigid registration: The problem of registering, or aligning, two shapes that can take on a variety of configurations (unlike rigid shapes). For instance, a walking person, a fish, and facial features like the mouth and eyes are all non-rigid objects, the shape of which changes in time. This type of registration is frequently needed in medical imaging, as many human body parts deform. Non-rigid registration is considerably more complex than rigid registration. See also alignment, registration, rigid registration.

non-rigid tracking: A tracking process that is designed to track non-rigid objects. This means that it can cope with changes in actual object shape as well as apparent shape due to perspective projection and observer viewpoint.

non-symbolic representation: A model representation in which the appearance is described by a numerical or image-based description rather than a symbolic or mathematical description. For example, non-symbolic models of a line would be a list of the coordinates of the points in the line or an image of the line. Symbolic object representations include the equation of the line or the endpoints of the line.

normal curvature: A plane that contains the surface normal $\vec{n}$ at point $\vec{p}$ of a surface intersects that surface to form a planar curve $\Gamma$ that passes through $\vec{p}$. The normal curvature is the curvature of $\Gamma$ at $\vec{p}$. The intersecting plane can be at any specified orientation about the surface normal. See:

[Figure: a surface with normal n at point p and the intersection curve Γ.]
normal distribution: See Gaussian distribution.

normal flow: The component of optical flow in the direction of the intensity gradient. The orthogonal component is not locally observable, because small orthogonal motions do not change the appearance of local neighborhoods.

normalized correlation: 1) An image or signal similarity measure that scales the differences between the signals by a measure of the average signal strength:
$$\frac{\sum_i (x_i - y_i)^2}{\sqrt{\sum_i x_i^2}\, \sqrt{\sum_i y_i^2}}$$
This scales the difference so that it is less significant if the inputs are larger. The similarities lie in the range [0, 1], where 0 is most similar. 2) A statistical cross correlation process where the correlation coefficient is normalized to lie in the range [-1, 1], where 1 is most similar. In the case of two scalar variables, this means dividing by the standard deviations of the two variables.

NOT operator: See invert operator.

novel view synthesis: A process whereby a new view of an object is synthesized by combining information from several images of the object from different viewpoints. One method is by 3D reconstruction, e.g., from binocular stereo, and then rendering the reconstruction using computer graphics. However, the main approaches to novel view synthesis use epipolar geometry and the pixels of two or more images of the object to directly synthesize a new image without creating a 3D reconstruction.

NP-complete: A concept in computational complexity covering a special set of problems. All of these problems currently can be solved, in the worst case, in time exponential $O(e^N)$ in the number or size $N$ of their input data. For the subset of exponential problems called NP-complete, if an algorithm for one could be found that executes in polynomial time $O(N^p)$ for some $p$, then a related algorithm could be found for any other NP-complete algorithm.

NTSC: National Television System Committee. A television signal recording system used for encoding video data at approximately 60 video fields per second. Used in the USA, Japan and other countries.

nuclear magnetic resonance (NMR): An imaging technique based on magnetic properties of atomic nuclei. Protons and neutrons within atomic nuclei generate a magnetic dipole that can respond to an external magnetic field. Several properties related to the relaxation of that magnetic dipole give rise to values that depend on the tissue type, thus allowing identification, or at least visualization, of the different soft tissue types. The measurement of the signal is a way of measuring the density of certain types of atoms, such as hydrogen in the case of biological NMR scanners. This technology is used for medical body scanning, where a detailed 3D volumetric image can be produced. Signal levels are highly correlated with different biological structures, so one can easily observe different tissues and their positions. Also called MRI/magnetic resonance imaging.
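
Both senses of normalized correlation above, sketched in numpy for flattened image patches:

```python
import numpy as np

def normalized_difference(x, y):
    """Sense 1: squared difference scaled by signal strength;
    0 means the signals are identical."""
    return np.sum((x - y) ** 2) / (
        np.sqrt(np.sum(x ** 2)) * np.sqrt(np.sum(y ** 2)))

def correlation_coefficient(x, y):
    """Sense 2: cross correlation normalized to [-1, 1];
    1 means most similar."""
    xm, ym = x - x.mean(), y - y.mean()
    return np.sum(xm * ym) / (
        np.sqrt(np.sum(xm ** 2)) * np.sqrt(np.sum(ym ** 2)))
```
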
NURBS: Non-Uniform Rational B-Splines: a type of shape modeling primitive based on ratios of b-splines, capable of accurately representing a wide range of geometric shapes including freeform surfaces.

Nyquist frequency: The minimum sampling frequency for which the underlying true image (or signal) can be reconstructed from the samples. If sampling at a lower frequency, then aliasing will occur, creating apparent image structure that does not exist in the original image.

Nyquist sampling rate: See Nyquist frequency.
O

object: 1) A general term referring to a group of features in a scene that humans consider to compose a larger structure. In vision it is generally thought of as that to which attention is directed. 2) A general system theory term, where the object is what is of interest (unlike the background). Resolution or scale may determine what is considered the object.

object centered representation: A model representation in which the positions of the features and components of the model are described relative to the position of the object itself. This might be a relative description (the nose is 4 cm from the mouth) or might use a local coordinate system (e.g., the right eye is at position (0,25,10), where (0,0,0) is the nose). This contrasts with, for example, a viewer centered representation. Here is a rectangular solid defined in its local coordinate system:

[Figure: a rectangular solid with side lengths L, H, W and far corner at (L,H,W).]

object contour: See occluding contour.

object grouping: A general term meaning the clustering of all of the image data associated with a distinct observed object. For example, when observing a person, object grouping could cluster all of the pixels from the image of the person.

object plane: In the case of the convex simple lenses typically used in laboratory TV cameras, the object plane is the 3D scene plane where all points are exactly in focus on the image plane (assuming a perfect lens and that the optical axis is perpendicular to the image plane). The object plane is illustrated here:

[Figure: a lens with its optical axis, image plane and object plane.]
object recognition: A general term for identifying which of several (or many) possible objects is observed in an image. The process may also include computing the object's image or scene position, or labeling the image pixels or image features that belong to the object.

object representation: An encoding of an object into a form suitable for computer manipulation. The models could be geometric models, graph models or appearance models, as well as other forms.

object verification: A component of an object recognition process that attempts to verify a hypothesized object identity by examining evidence. Commonly, geometric object models are used to verify that object features are observed in the correct image positions.

objective function: 1) The cost function used in an optimization process. 2) A measure of the misfit between the data and the model.

oblique illumination: See low angle illumination.

observer: The individual (or camera) making observations. Most frequently this refers to the camera system from which images are being supplied. See also observer motion estimation.

observer motion estimation: When an observer is moving, image data of the scene provides optical flow or trackable scene feature points. These allow an estimate of how the observer is moving relative to the scene, which is useful for navigation control and position estimation.

obstacle detection: Using visual data to detect objects in front of the observer, usually for mobile robotics applications.

Occam's razor: An argument attributed to William of Occam (Ockham), an English nominalist philosopher of the early fourteenth century, stating that assumptions must not be needlessly multiplied when explaining something (entia non sunt multiplicanda praeter necessitatem). Often used simply to suggest that, other conditions being equal, the simplest solution must be preferred. Note the variant spelling Ockham. See also minimum description length.

occluding contour: The visible edge of a smooth curved surface as it bends away from an observer. The occluding contour defines a 3D space curve on the surface, such that a line of sight from the observer to a point on the space curve is perpendicular to the surface normal at that point. The 2D image of this curve may also be called the occluding contour. The contour can often be found by an edge detection process. The cylinder boundaries on both the left and right are occluding contours from our viewpoint:

[Figure: a cylinder whose left and right boundaries are marked as occluding contours.]
occluding contour analysis: A general term that includes 1) detection of the occluding contour, 2) inference of the shape of the 3D surface at the occluding contour and 3) determining the relative depth of the surfaces on both sides of the occluding contour.

occluding contour detection: Determining which of the image edges arise from occluding contours.

occlusion: Occlusion occurs when one object lies between an observer and another object. The closer object occludes the more distant one in the acquired image. The occluded surface is the portion of the more distant object hidden by the closer object. Here, the cylinder occludes the more distant brick:

occlusion recovery: The process of attempting to infer the shape and appearance of a surface hidden by occlusion. This recovery helps improve completeness when reconstructing scenes and objects for virtual reality. This image shows two occluded pipes and an estimated recovery:

occlusion understanding: A general term for analyzing scene occlusions. This may include occluding contour detection, determining the relative depths of the surfaces on both sides of an occluding contour, searching for tee junctions as a cue for occlusion and depth order, etc.

occupancy grid: A map construction technique used mainly for autonomous vehicle navigation. The grid is a set of squares or cubes representing the scene, which are marked according to whether the observer believes the corresponding scene region is empty (hence navigable) or full. A probabilistic measure could also be used. Visual evidence from range, binocular stereo or sonar sensors is typically used to construct and update the grid as the observer moves.

OCR: See optical character recognition.

octree: A volumetric representation in which 3D space is recursively divided into eight (hence "oct") smaller volumes by planes parallel to the XY, YZ and XZ coordinate system planes. A tree is formed by linking the eight subvolumes to each parent volume. Additional subdivision need not occur when a volume contains only object or only empty space. Thus, this representation can be more efficient than a pure voxel representation. Here are three levels of a pictorial representation of an octree, where one octant of the largest (leftmost) level is expanded to give the middle figure, and similarly an octant of the middle:
odd field: Standard interlaced video transmits all of the even scan lines in an image frame first and then all of the odd lines. The set of odd lines is the odd field.

O'Gorman edge detector: A parametric edge detector. A decomposition of the image and model by orthogonal Walsh function masks was used to compute the step edge parameters (contrast and orientation). One advantage of the parametric model was a goodness of model fit, as well as the edge contrast, that increased the reliability of the detected edges.

omnidirectional sensing: Literally, sensing all directions simultaneously. In practice, this means using mirrors and lenses to project most of the lines of sight at a point onto a single camera image. The space behind the mirrors and camera(s) is typically not visible. See also catadioptric optics. Here a camera using a spherical mirror achieves a very wide field of view:

opaque: When light cannot pass through a structure. This causes shadows and occlusion.

open operator: A mathematical morphology operator applied to a binary image. The operator is a sequence of N erodes followed by N dilates, both using a specified structuring element. The operator is useful for separating touching objects and removing small regions. The right image was created by opening the left image with an 11-pixel disk kernel:

operator: A general term for a function that is applied to some data in order to transform it in some way. For example, see image processing operator.
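
The open operator above via scipy.ndimage; iterations plays the role of N, and the square structuring element below is a crude stand-in for a disk:

```python
import numpy as np
from scipy import ndimage

def open_binary(image, size=11, iterations=1):
    """N erosions followed by N dilations of a binary image with
    a size x size structuring element."""
    structure = np.ones((size, size), dtype=bool)
    return ndimage.binary_opening(image.astype(bool),
                                  structure=structure,
                                  iterations=iterations)
```
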
opponent color: A color representation system originally developed by Hering in which an image is represented by three channels with contrasting colors: Red–Green, Yellow–Blue and Black–White.

optical: A process that uses light and lenses is an optical process.

optical axis: The ray, perpendicular to the lens and through the optical center, around which the lens is symmetrical.

[Figure: a lens with its focal point and optical axis.]

optical center: See focal point.

optical character recognition (OCR): A general term for extracting an alphabetic text description from an image of the text. Common specialisms include bank numerals, handwritten digits, handwritten characters, cursive text, Chinese characters, Arabic characters, etc.

optical flow: An instantaneous velocity measurement for the direction and speed of the image data across the visual field. This can be observed at every pixel, creating a field of velocity vectors: the set of apparent motions of the image pixel brightness values.

optical flow boundary: The boundary between two regions where the optical flow is different in direction or magnitude. The regions can arise from objects moving in different directions or surfaces at different depths. See also optical flow field segmentation. The dashed line in this image is the boundary between optical flow moving left and right:

optical flow constraint equation: The equation $I_t + \nabla I \cdot \vec{u}(\vec{x}) = 0$ that links the observed change $I_t$ in image $I$'s intensities over time at image position $\vec{x}$ to the spatial change $\nabla I$ in pixel intensities at that position and the velocity $\vec{u}(\vec{x})$ of the image data at that pixel. The constraint does not completely determine the image motion, as this has two degrees of freedom. The equation provides only one constraint, thus leading to an aperture problem.

optical flow field: The field composed of the optical flow vector at each pixel in an image.

optical flow field segmentation: The segmentation of an optical flow image into regions where the optical flow has a similar direction or magnitude. The regions can arise from objects moving in different directions or surfaces at different depths. See also optical flow boundary.
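
The constraint equation above only determines the flow component along the intensity gradient (the normal flow); a finite-difference sketch:

```python
import numpy as np

def normal_flow(prev_frame, curr_frame):
    """u_n = -I_t * grad(I) / |grad(I)|^2, the flow component in
    the intensity gradient direction, per pixel."""
    I = curr_frame.astype(float)
    It = I - prev_frame.astype(float)     # temporal difference
    Iy, Ix = np.gradient(I)               # spatial gradient
    mag2 = Ix ** 2 + Iy ** 2 + 1e-9       # avoid division by zero
    return -It * Ix / mag2, -It * Iy / mag2
```
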
optical flow region: A region where the optical flow has a similar direction or magnitude. Regions can arise from objects moving in different directions, or surfaces at different depths. See also optical flow boundary.

optical flow smoothness constraint: The constraint that nearby pixels in an image usually have similar optical flow, because they usually arise from the projection of adjacent surface patches having similar motions relative to the observer. The constraint can be relaxed at optical flow boundaries.

optical image processing: An image processing technique in which the processing occurs by use of lenses and coherent light instead of by a computer. The key principle is that a coherent light beam that passes through a transparency of the target image and is then focused produces the Fourier transform of the image at the focal point, where frequency domain filtering can occur. A typical processing arrangement is:

[Figure: a light source, transparency, focal plane filter and imaging sensor in sequence.]

optical transfer function (OTF): Informally, the OTF is a measure of how well spatially varying patterns are observed by an optical system. More formally, in a 2D image, let $X(f_h, f_v)$ and $Y(f_h, f_v)$ be the Fourier transforms of the input $x(h, v)$ and output $y(h, v)$ images. Then, the OTF at a horizontal and vertical spatial frequency pair $(f_h, f_v)$ is $H(f_h, f_v) / H(0, 0)$, where $H(f_h, f_v) = Y(f_h, f_v) / X(f_h, f_v)$. The optical transfer function is usually a complex number, encoding both the reduction in signal strength at each spatial frequency and the phase shift.

optics: A general term for the manipulation and transformation of light and images using lenses and mirrors.

optimal basis encoding: A general technique for encoding image or other data by projecting onto some basis functions of a linear space and then using the projection coefficients instead of the original data. Optimal basis functions produce projection coefficients that allow the best discrimination between different classes of objects or members in a class (such as for face recognition).

optimization: A general term for finding the values of the parameters that maximize or minimize some quantity.

optimization parameter estimation: See optimization.

OR operator: A pixelwise logic operator defined on binary variables. It takes as input two binary images, $I_1$ and $I_2$, and returns an image $I_3$ in which the value of each pixel is 0 if both $I_1$ and $I_2$ are 0, and 1 otherwise. The rightmost image below shows the result of ORing the left and middle figures (note that the white pixels have value 1):
order statistic filter: A filter based on order statistics, a technique that sorts the pixels of a neighborhood by intensity value and assigns a rank (the position in the sorted sequence) to each. An order statistic filter replaces the central value of the filtering neighborhood with the value at a given rank in the sorted list. A popular example is the median filter. As this filter is less sensitive to outliers, it is often used in robust statistics processes. See also rank order filter.

ordered texture: See macrotexture.

ordering: Sorting a collection of objects by a given property, for instance, intensity values in an order statistic filter.

orientation: The property of being directed towards or facing a particular region of space, or of a line; also, the pose or attitude of a body in space. For instance, the orientation of a vector (where the vector points to), specified by its unit vector; the orientation of an ellipsoid, specified by its principal directions; the orientation of a wire-frame model, specified by its own reference frame with respect to a world reference frame.

orientation error: The amount of error associated with an orientation value.

orientation representation: See pose representation.

oriented texture: A texture in which a preferential direction can be detected. For instance, the direction of the bricks in a regular brick wall. See also texture direction, texture orientation.

orthogonal image transform: Orthogonal transform coding is a well-known class of techniques for image compression. The key process is the projection of the image data onto a set of orthogonal basis functions. See, for instance, the discrete cosine, Fourier or Haar transforms. This is a special case of the linear integral transform.

orthogonal regression: Also known as total least squares. Traditionally seen as the generalization of linear regression to the case where both $x$ and $y$ are measured quantities and subject to error. Given samples $x_i$ and $y_i$, the objective is to find estimates of the "true" points $(\tilde{x}_i, \tilde{y}_i)$ and line parameters $(a, b, c)$ such that $a\tilde{x}_i + b\tilde{y}_i + c = 0$ for all $i$, and such that the error $\sum_i (x_i - \tilde{x}_i)^2 + (y_i - \tilde{y}_i)^2$ is minimized. This estimate is easily obtained as the line (or plane, etc., in higher dimensions) passing through the centroid of the data, with normal in the direction of the eigenvector of the data scatter matrix that has the smallest eigenvalue.
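
The order statistic filter above is available directly in scipy.ndimage; choosing the middle rank reproduces the median filter:

```python
from scipy import ndimage

def order_statistic_filter(image, rank, size=3):
    """Replace each pixel by the value of the given rank in its
    sorted size x size neighborhood."""
    return ndimage.rank_filter(image, rank=rank, size=size)

# Median filter: the middle rank of a size x size neighborhood,
# e.g. order_statistic_filter(image, rank=(3 * 3) // 2, size=3)
```
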
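
And the eigenvector recipe for orthogonal regression above, sketched in numpy for the 2D line case:

```python
import numpy as np

def fit_line_tls(points):
    """Total least squares line a x + b y + c = 0: the line passes
    through the centroid and its normal (a, b) is the eigenvector
    of the scatter matrix with the smallest eigenvalue."""
    centroid = points.mean(axis=0)
    d = points - centroid
    eigvals, eigvecs = np.linalg.eigh(d.T @ d)  # ascending eigenvalues
    a, b = eigvecs[:, 0]
    c = -(a * centroid[0] + b * centroid[1])
    return a, b, c
```
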

orthographic: The characteristic property of orthographic (or perpendicular) projection onto the image plane. See orthographic projection.

orthographic camera: A camera in which the image is formed according to an orthographic projection.

orthographic projection: Rendering of a 3D scene as a 2D image by a set of rays orthogonal to the image plane. The size of the objects imaged does not depend on their distance from the viewer. As a consequence, parallel lines in the scene remain parallel in the image. The equations of orthographic projection are $x = X$, $y = Y$, where $(x, y)$ are the image coordinates of an image point in the camera reference frame (that is, in millimeters, not pixels), and $(X, Y, Z)$ are the coordinates of the corresponding scene point. An example is seen here:

[Figure: parallel rays projecting a scene orthogonally onto the image plane.]

orthoimage: In photogrammetry, the warp of an aerial photograph to an approximation of the image that would have been taken had the camera pointed directly downwards. See also orthographic projection.

orthonormal: A property of a set of basis functions or vectors. If $\langle \cdot\,, \cdot \rangle$ is the inner product function and $a$ and $b$ are any two different members of the set, then $\langle a, a \rangle = \langle b, b \rangle = 1$ and $\langle a, b \rangle = 0$.

OTF: See optical transfer function.

outlier: If a set of data mostly conforms to some regular process, or is well represented by a model, with the exception of a few data points, then these exception points are outliers. Classifying points as outliers depends on both the models used and the statistics of the data. This figure shows a line fit to some points and an outlying point.

[Figure: inlier points along a fitted line, with one marked outlier.]

outlier rejection: Identifying outliers and removing them from the current process. Identification is often a difficult process.
204
over-segmented: Describing the output of a segmentation algorithm. Given an image where a desired segmentation result is known, the algorithm over-segments if the desired regions are represented by too many output regions. (figure: an image that should be segmented into three regions but was over-segmented into five)
P

paired boundaries: See paired contours.

paired contours: A pair of contours occurring together in images and related by a spatial relationship, for instance the contours generated by river banks in aerial images, or the contours of a human limb (arm, leg). Co-occurrence can be exploited to make contour detection more robust. See also feature extraction. (figure: an example pair of contours)

pairwise geometric histogram: A line- or edge-based shape representation used for object recognition, especially in 2D. Histograms are built by computing, for each line segment, the relative angle and perpendicular distance to all other segments. The representation is invariant to rotation and translation. PGHs can be compared using the Bhattacharyya metric.

PAL camera: A camera conforming to the European PAL standard (Phase Alternation by Line). See also NTSC, RS-170, CCIR camera.

palette: The range of colors available.

pan: Rotation of a camera about a single axis through the camera center and (approximately) parallel to the image vertical. (figure: a camera rotating about its vertical axis)
panchromatic: Sensitive to light of all visible wavelengths. Panchromatic images are gray scale images where each pixel averages light equally over the visible range.

panoramic: Associated with a wide field of view, often created or observed by a panned camera.

panoramic image mosaic: A class of techniques for collating a set of partially overlapping images into a single panoramic image, for example built from the frames of a hand-held camera sequence. Typically, the mosaic yields both very high resolution and a large field of view, which cannot be simultaneously achieved by a physical camera. There are several ways to build panoramic mosaics but, in general, three necessary steps: first, determining correspondences (see stereo correspondence problem) between adjacent images; second, using the correspondences to find a warping transformation between the two images (or between the current mosaic and a new image); third, blending the new image into the current mosaic. (figure: a mosaic built from the frames of a hand-held camera sequence)

panoramic image stereo: A stereo system working with a very large field of view, say 360° in azimuth and 120° in elevation. Disparity maps and depths are recovered for the whole field of view simultaneously. A normal stereo system would have to be moved and the results registered to achieve the same result. See also binocular stereo, multi-view stereo, omnidirectional sensing.

Pantone matching system (PMS): A color matching system used by the printing industry to print spot colors. Colors are specified by the Pantone name or number. PMS works well for spot colors but not for process colors, usually specified by the CMYK color model.

Panum's fusional area: The region of space within which single vision is possible (that is, you do not perceive double images of objects) when the eyes fixate a given point.

parabolic point: A point on a smooth surface where the Gaussian curvature is zero, as where one of the two principal curvatures is zero. See also HK segmentation.

parallax: The angle between the two straight lines that join a point (possibly a moving one) to two viewpoints. In motion analysis, motion parallax occurs when two scene points that project to the same image point at one viewpoint later project to different points as the camera moves. The vector between the two new points is the parallax. (figure: initial and final camera positions observing the two points)
parallel processing: An algorithm is executed in parallel, or through parallel processing, when it can be divided into a number of computations that are performed simultaneously on separate hardware. See also single instruction multiple data, multiple instruction multiple data, pipeline parallelism, task parallelism.

parallel projection: A generalization of orthographic projection in which a scene is projected onto the image plane by a set of parallel rays not necessarily perpendicular to the image plane. This is a good approximation of perspective projection, up to a uniform scale factor, when the scene is small in comparison to its distance from the center of projection. Parallel projection is a subset of weak perspective viewing, where the weak perspective projection matrix is subject not only to orthogonality of the rows of the left 2 × 3 submatrix, but also to the constraint that the rows have equal norm. In orthographic projection, both rows have unit norm.

parameter estimation: A class of techniques aimed at estimating the parameters of a given parametric model. For instance, assuming that a set of image points lie on an ellipse, and considering the implicit ellipse model $ax^2 + bxy + cy^2 + dx + ey + f = 0$, the parameter vector $(a, b, c, d, e, f)$ can be estimated, for instance, by least square surface fitting.
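A minimal sketch of such an estimate (illustrative only; the function name `fit_conic` and the sample data are invented): the conic parameters are taken as the singular vector of the design matrix with the smallest singular value, a standard linear least squares formulation.

```python
import numpy as np

def fit_conic(x, y):
    """Estimate conic parameters (a, b, c, d, e, f) such that
    a x^2 + b x y + c y^2 + d x + e y + f = 0 in the least squares
    sense, under the normalization ||p|| = 1."""
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    _, _, vt = np.linalg.svd(D)
    return vt[-1]   # right singular vector with smallest singular value

# Points on the ellipse x^2/4 + y^2 = 1, slightly perturbed.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
x = 2 * np.cos(t) + 0.01 * rng.standard_normal(t.size)
y = np.sin(t) + 0.01 * rng.standard_normal(t.size)
p = fit_conic(x, y)
print("conic parameters (scaled):", np.round(p / p[0], 3))
# expected roughly (1, 0, 4, 0, 0, -4), i.e. x^2 + 4y^2 - 4 = 0
```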
parametric edge detector: An edge detection technique that seeks to match image data using a parametric model of edge points and thus detects edges when the image data fits the edge model well. See Hueckel edge detector.

parametric mesh: A type of surface modeling primitive for 3D models in which the surface is defined by a mesh of points. A typical example is NURBS (non-uniform rational B-splines).

parametric model: A mathematical model expressed as a function of a set of parameters, for instance, the parametric equation of a curve or surface (as opposed to its implicit form), or a parametric edge model (see parametric edge detector).

paraperspective: An approximation of perspective projection, whereby a scene is divided into parts that are imaged separately by parallel projection with different parameters.
part recognition: A class of techniques for recognizing assemblies or articulated objects from their subcomponents (parts), e.g., a human body from head, trunk, and limbs. Parts have been represented by 3D models like generalized cones, superquadrics, and others. In industrial contexts, part recognition indicates the recognition of specific items (parts) in a production line, typically for classification and quality control.

part segmentation: A class of techniques for partitioning a set of data into components (parts) with an identity of their own, for instance a human body into limbs, head, and trunk. Part segmentation methods exist for both 2D and 3D data, that is, intensity images and range images, respectively. Various geometric models have been adopted for the parts, e.g., generalized cylinders, superellipses, and superquadrics. See also articulated object segmentation.

partially constrained pose: A situation whereby an object is subject to a number of constraints restricting the number of admissible orientations or positions, but not fixing them uniquely. For instance, cars on a road are constrained to rotate around an axis perpendicular to the road.

particle counting: An application of particle segmentation to counting the instances of small objects (particles) like pebbles, cells, or water droplets, in images or sequences. (figure: an image containing many small particles)

particle filter: A tracking strategy where the probability density of the model parameters is represented as a set of particles. A particle is a single sample of the model parameters, with an associated weight. The probability density represented by the particles is typically a set of delta functions or a set of Gaussians with means at the particle centers. At each tracking iteration, the current set of particles represents a prior on the model parameters, which is updated via a dynamical model and observation model to produce the new set representing the posterior distribution. See also condensation tracking.
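A minimal one dimensional sketch (invented for illustration; the state model, noise levels and measurements are all assumptions, not from the dictionary):

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, measurement,
                         process_noise=0.5, meas_noise=1.0):
    """One iteration of a minimal particle filter for a 1D state."""
    # 1. Predict: propagate each particle through a random-walk dynamical model.
    particles = particles + rng.normal(0.0, process_noise, particles.size)
    # 2. Correct: reweight particles by a Gaussian observation likelihood.
    weights = weights * np.exp(-0.5 * ((measurement - particles) / meas_noise) ** 2)
    weights /= weights.sum()
    # 3. Resample: draw a new particle set representing the posterior.
    idx = rng.choice(particles.size, size=particles.size, p=weights)
    return particles[idx], np.full(particles.size, 1.0 / particles.size)

particles = rng.uniform(-10, 10, 500)        # samples from a broad prior
weights = np.full(500, 1.0 / 500)
for z in [1.0, 1.5, 2.2, 2.8]:               # measurements drifting right
    particles, weights = particle_filter_step(particles, weights, z)
print("state estimate:", particles.mean())
```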
particle segmentation: A class of techniques for detecting individual instances of small objects (particles) like pebbles, cells, or water droplets, in images or sequences. A typical problem is severe occlusion caused by overlapping particles. This problem has been approached successfully with the watershed transform.
particle tracking: See condensation tracking.

Parzen: A Parzen window is a linearly increasing and decreasing (triangle-shaped) weighting window used to limit leakage to spurious frequencies when computing the power spectrum of a signal. (figure: a triangular window function) See also windowing, Fourier transform.

passive sensing: A sensing process that does not emit any stimulus, or where the sensor does not move, is passive. A normal stationary camera is passive. Structured light triangulation or a moving video camera are active.

passive stereo: A passive stereo algorithm uses only the information obtainable using a stationary set of cameras and ambient illumination. This contrasts with the active vision paradigm in stereo, where the camera(s) might move or some projected stimulus might be used to help solve the stereo correspondence problem.

patch classification: The problem of attributing a surface patch to a particular class in a shape catalogue, typically computed from dense range data using curvature estimates or shading. See also curvature sign patch classification, mean and Gaussian curvature shape classification.

path coherence: A property used in tracking objects in an image sequence. The assumption is that the object motion is mostly smooth in the scene and thus the observed motion in a projected image of the scene is also smooth.

path finding: The problem of determining a path with given properties in a graph, for example, the shortest path connecting two given nodes, or two nodes with given properties. A path is defined as a linear subgraph. Path finding is a characteristic problem of state-space methods, inherited from symbolic artificial intelligence. See also graph searching. This term is also used in the context of dynamic programming search, for instance applied to the stereo correspondence problem.

pattern grammar: See shape grammar.

pattern recognition: A large research area concerned with the recognition and classification of structures, relations or patterns in data. Classic techniques include syntactic, structural and statistical pattern recognition.
PCA: See principal component analysis.

PDM: See point distribution model.

peak: A general term for when a signal value is greater than the neighboring signal values. An example of a signal peak measured in one dimension occurs when crossing a bright line lying on a dark surface along a scanline: the cross-section might observe the pixel values 7, 45, 105, 54, 7, with the peak at 105. A two dimensional example is the image of a bright spot on a darker background.

pedestrian surveillance: See person surveillance.

pel: See pixel.

pencil of lines: A bundle of lines passing through the same point. For example, if $\vec p$ is a generic bundle point and $\vec p_0$ the point through which all lines pass, the bundle is $\vec p = \vec p_0 + \lambda \vec v$, where $\lambda$ is a real number and $\vec v$ the direction of the individual line (both are parameters). (figure: several lines through a common point)

percentile method: A specialized thresholding technique used for selecting the threshold. The method assumes that the percentage of the scene that belongs to the desired object (e.g., a darker object against a lighter background) is known. The threshold that selects that percentage of pixels is used.
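A minimal sketch (the function name and toy image are invented; `np.percentile` interpolates between values, so the result is approximate):

```python
import numpy as np

def percentile_threshold(image, object_fraction, dark_object=True):
    """Percentile thresholding: choose the threshold so that the known
    fraction of pixels belonging to the (darker) object is selected."""
    if dark_object:
        return np.percentile(image, 100.0 * object_fraction)
    return np.percentile(image, 100.0 * (1.0 - object_fraction))

img = np.array([[200, 210, 30], [190, 35, 220], [25, 215, 205]])
t = percentile_threshold(img, object_fraction=1 / 3)  # a third is dark object
print("threshold:", t)
print(img <= t)   # mask selecting the three dark object pixels
```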
perception: The process of understanding the world through the analysis of sensory input (such as images).

perceptron: A computational element $\phi(\vec w \cdot \vec x)$ that acts on a data vector $\vec x$, where $\vec w$ is a vector of weights and $\phi$ is the activation function. Perceptrons are often used for classifying data into one of two sets (i.e., according to whether $\vec w \cdot \vec x \geq 0$ or $\vec w \cdot \vec x < 0$). See also classification, supervised classification, pattern recognition.
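A minimal sketch of perceptron training with a step activation (the toy data and learning rate are assumptions for illustration):

```python
import numpy as np

def perceptron_train(X, labels, epochs=20, lr=0.1):
    """Train a single perceptron; labels are in {-1, +1}.
    A bias is handled by appending a constant 1 to each vector."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, t in zip(Xb, labels):
            y = 1 if w @ x >= 0 else -1      # activation phi = sign
            if y != t:                        # mistake-driven update
                w += lr * t * x
    return w

# Linearly separable toy data: class +1 lies above the line y = x.
X = np.array([[0, 1.0], [1, 2.5], [2, 3.0], [1, 0.0], [2, 1.0], [3, 2.0]])
t = np.array([1, 1, 1, -1, -1, -1])
w = perceptron_train(X, t)
print("weights:", w)
print("predictions:", np.sign(np.hstack([X, np.ones((6, 1))]) @ w))
```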
perceptron network: A multi-layer arrangement of perceptrons, closely related to the well-known back-propagation networks.

perceptual grouping: See perceptual organization.

perceptual organization: A theory based on Gestalt psychology, centered on the tenet that certain organizations (or interpretations) of visual stimuli are preferred over others by the human visual system. A famous example is that a drawing of a wire-frame cube is immediately interpreted as a 3D object, instead of a 2D collection of lines. This concept has been used in several low-level vision systems, typically to find groups of low-level features most probably generated by interesting objects. See also grouping and Lowe's curve segmentation. (figure: a line of feature endings suggesting a virtual horizontal line)
performance characterization: A class of techniques aimed at assessing the performance of computer vision systems in terms of, for instance, accuracy, precision, robustness to noise, repeatability, and reliability.

perimeter: 1) The perimeter of a binary image is the set of foreground pixels that touch the background. 2) The length of the path through those pixels.

periodicity estimation: The problem of estimating the period of a periodic phenomenon, e.g., given a texture created by the repetition of a fixed pattern, determine the pattern's size.

person surveillance: A class of techniques aimed at detecting, tracking, counting, and recognizing people or their behavior in CCTV videos, for security purposes. For example, systems have been reported for the automated surveillance of car parks, banks, airports and the like. A typical system must detect the presence of a person, track the person's movement over time, possibly identify the person using a database of known faces, and classify the person's behavior according to a small class of pre-defined behaviors (e.g., normal or anomalous). See also anomalous behavior detection, face recognition, and face tracking.

perspective: The rendering of a 3D scene as a 2D image according to perspective projection, the key characteristic of which is, intuitively, that the size of the imaged objects depends on their distance from the viewer. As a consequence, the image of a bundle of parallel lines is a bundle of lines converging to a point, the vanishing point. The geometry of perspective was formalized by the master painters of the Italian Quattrocento and Renaissance.

perspective camera: A camera in which the image is formed according to perspective projection. The corresponding mathematical model is commonly known as the pinhole camera model. (figure: rays from a scene object passing through the center of projection of the lens onto the image plane, with the optical axis marked)

perspective distortion: A type of distortion in which lines that are parallel in the real world appear to converge in a perspective image, as when train tracks appear to converge in the distance. (figure: a photograph of converging train tracks)
perspective inversion: The problem of determining the position of a 3D object from its image, i.e., solving the perspective projection equations for the 3D coordinates. See also absolute orientation.

perspective projection: Imaging a scene with foreshortening. The projection equations of perspective are

$x = f\,\frac{X}{Z}, \quad y = f\,\frac{Y}{Z}$

where $(x, y)$ are the image coordinates of an image point in the camera reference frame (e.g., in millimeters, not pixels), f is the focal length, and $(X, Y, Z)$ are the coordinates of the corresponding scene point.
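A direct transcription of these equations as code (the focal length and points are invented for the example):

```python
import numpy as np

def perspective_project(points, f):
    """Project 3D camera-frame points (X, Y, Z) to image coordinates
    (x, y) = (f X / Z, f Y / Z). Points must have Z > 0."""
    points = np.asarray(points, dtype=float)
    return f * points[:, :2] / points[:, 2:3]

# Two points at different depths: the farther one projects closer
# to the principal point (foreshortening).
P = np.array([[1.0, 1.0, 2.0],
              [1.0, 1.0, 4.0]])
print(perspective_project(P, f=0.035))   # f in meters (a 35 mm lens)
```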
PET: See positron emission tomography.

phase congruency: The property whereby components of the Fourier transform of an image are maximally in phase at feature points like step edges or lines. Phase congruency is invariant to image brightness and contrast and has therefore been used as an absolute measure of the significance of feature points. See also image feature.

phase correlation: A motion estimation method that uses the translation–phase duality property of the Fourier transform, that is, a shift in the spatial domain is equivalent to a phase shift in the frequency domain. When using log-polar coordinates, and the rotation and scale properties of the Fourier transform, spatial rotation and scale can be estimated from the frequency shift, independent of spatial translation. See also planar motion estimation.
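A minimal sketch of translation estimation by phase correlation (the random test images are assumptions for illustration):

```python
import numpy as np

def phase_correlation(img1, img2):
    """Estimate the integer translation between two images from the
    normalized cross-power spectrum: a spatial shift appears as a
    phase shift in the frequency domain."""
    F1 = np.fft.fft2(img1)
    F2 = np.fft.fft2(img2)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12           # keep only the phase
    corr = np.fft.ifft2(cross).real          # impulse at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peaks beyond the half-size to negative shifts.
    if dy > img1.shape[0] // 2:
        dy -= img1.shape[0]
    if dx > img1.shape[1] // 2:
        dx -= img1.shape[1]
    return dy, dx

rng = np.random.default_rng(1)
a = rng.random((64, 64))
b = np.roll(a, shift=(5, -3), axis=(0, 1))   # translate by (5, -3)
print(phase_correlation(b, a))               # recovers (5, -3)
```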
phase unwrapping technique: The process of reconstructing the true phase shift from phase estimates "wrapped" into $(-\pi, \pi]$. The true phase shift values may not fall in this interval but instead be mapped into the interval by addition or subtraction of multiples of $2\pi$. The technique maximizes the smoothness of the phase image by adding or subtracting multiples of $2\pi$ at various image locations. See also Fourier transform.

phi–s curve ($\phi$–s): A technique for representing planar contours. Each point P on the contour is represented by the angle $\phi$ formed by the line through P and the shape's center (e.g., the barycenter or center of mass) with a fixed direction, and the distance s from the center to P. See also shape representation. (figure: a contour point at distance s and angle φ from the shape's center)

photo consistency: See shape from photo consistency.

photodiode: The basic element, or pixel, of a CCD or other solid state sensor, converting light to an electric signal.

photogrammetry: A research area concerned with obtaining reliable and accurate measurements from noncontact imaging, e.g., a digital height map from a pair of overlapping satellite images. Consequently, accurate camera calibration is a primary concern. The techniques used overlap many of those typical of image processing and pattern recognition.

photometric invariant: A feature or characteristic of an image that is insensitive to changes in illumination. See also invariant.

photometric decalibration: The correction of intensities in an image so that the same surface (at the same orientation) will give the same response regardless of the position in which it appears in the image.

photometric stereo: A technique recovering surface shape (more precisely, the surface normal at each surface point) using multiple images acquired from a single viewpoint but under different illumination conditions. These lead to different reflectance maps that together constrain the surface normal at each point.
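A minimal sketch under a Lambertian assumption (the light directions, albedo and one-pixel example are invented for illustration): each pixel's intensities under k known lights give a linear system for the albedo-scaled normal.

```python
import numpy as np

def photometric_stereo(intensities, light_dirs):
    """Recover surface normals from >= 3 images taken from one
    viewpoint under different known light directions, assuming a
    Lambertian surface: I = L g with g = albedo * normal, solved
    per pixel by least squares."""
    L = np.asarray(light_dirs)                # k x 3 light directions
    I = intensities.reshape(L.shape[0], -1)   # k x (number of pixels)
    G, *_ = np.linalg.lstsq(L, I, rcond=None)
    albedo = np.linalg.norm(G, axis=0)
    normals = G / (albedo + 1e-12)
    return normals.T, albedo                  # one unit normal per pixel

# One-pixel example: a surface tilted toward +x, albedo 0.8.
n_true = np.array([0.6, 0.0, 0.8])
L = np.array([[0, 0, 1.0], [0.5, 0, 0.866], [-0.5, 0, 0.866], [0, 0.5, 0.866]])
I = 0.8 * np.clip(L @ n_true, 0, None).reshape(4, 1, 1)
n, rho = photometric_stereo(I, L)
print("normal:", np.round(n[0], 3), "albedo:", round(float(rho[0]), 3))
```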
photometry: A branch of optics concerned with the measurement of the amount or the spectrum of light. In computer vision, one frequently uses photometric models expressing the amount of light emerging from a surface, be it fictitious, or the surface of a radiating source, or from an illuminated object. A well-known photometric model is Lambert's law.

photon noise: Noise generated by the statistical fluctuations associated with photon counting over a finite time interval in the CCD or other solid state sensor of a digital camera. Photon noise is not independent of the signal, and is not additive. See also image noise, digital camera.
photopic response: The sensitivity–wavelength curve modeling the response of the human eye to normal lighting conditions. In such conditions, the cones are the photoreceptors on the retina that best respond to light. Their response curve peaks at 555 nm, indicating that the eye is maximally sensitive to green-yellow colors in normal lighting conditions. When light intensity is very low, the rods determine the eye's response, modeled by the scotopic curve, which peaks near 510 nm.

photosensor spectral response: The spectral response of a photosensor, characterizing the sensor's output as a function of the input light's spectral frequency. See also Fourier transform, frequency spectrum, spectral frequency.

physics based vision: An area of computer vision seeking to apply physics laws or methods (of optics, surfaces, illumination, etc.) to the analysis of images and videos. Examples include polarization based methods, in which physical properties of the scene surfaces are estimated via estimates of the state of polarization of the incoming light, and the use of detailed radiometric models of image formation.

picture element: A pixel. It is an indivisible image measurement. This is the smallest directly measured image feature.

picture tree: A recursive image and 2D shape representation in which a tree data structure is used. Each node in the tree represents a region that is then decomposed into subregions. These are represented by child nodes. (figure: a segmented image with four regions A, B, C, D and the corresponding picture tree)

piecewise rigidity: The property of an object or scene that some of its parts, but not the object or scene as a whole, are rigid. Piecewise rigidity can be a convenient assumption, e.g., in motion analysis.

pincushion distortion: A form of radial lens distortion where image points are displaced away from the center of distortion by an amount that increases with the distance to the center. A straight line that would have been parallel to an image side is bowed towards the center of the image. This is the opposite of barrel distortion. (figure: a square grid with its sides bowed inwards)
pinhole camera model: The mathematical model for an ideal perspective camera formed by an image plane and a point aperture, through which all incoming rays must pass. For equations, see perspective projection. This is a good model for a simple convex lens camera, where all rays pass through the virtual pinhole at the focal point. (figure: pinhole camera geometry showing the pinhole, principal point, optical axis, image plane and scene object)

pink noise: Noise that is not white, i.e., when there is a correlation between the noise at two pixels or at two times.

pipeline parallelism: Parallelism achieved with two or more, possibly dissimilar, computation devices. The non-parallel process comprises steps A and B, and operates on a sequence of items $x_i$, $i > 0$, producing outputs $y_i$. The result of B depends on the result of A, so a sequential computer will compute $a_i = A(x_i)$, $y_i = B(a_i)$ for each i. A parallel computer cannot compute $a_i$ and $y_i$ simultaneously as they are dependent, so the computation requires the following steps:

$a_1 = A(x_1)$; $y_1 = B(a_1)$; $a_2 = A(x_2)$; $y_2 = B(a_2)$; $a_3 = A(x_3)$; ...; $a_i = A(x_i)$; $y_i = B(a_i)$; ...

However, notice that $y_i$ is computed just after $y_{i-1}$, so the computation can be arranged as

$a_1 = A(x_1)$
$a_2 = A(x_2)$, $y_1 = B(a_1)$
$a_3 = A(x_3)$, $y_2 = B(a_2)$
...
$a_{i+1} = A(x_{i+1})$, $y_i = B(a_i)$
...

where steps on the same line may be computed concurrently as they are independent. The output values $y_i$ therefore arrive at a rate of one every cycle, rather than one every two cycles without pipelining. The pipeline process can be visualized as $x_{i+1} \rightarrow [A] \rightarrow a_i \rightarrow [B] \rightarrow y_{i-1}$.
pit: 1) A general term for when a signal value is lower than the neighboring signal values. Unlike signal peaks, pits usually refer to two dimensional images. For example, a pit occurs when observing the image of a dark spot on a lighter background. 2) A local point-like concave shape defect in a surface.

pitch: A 3D rotation representation (along with yaw and roll) often used for cameras or moving observers. The pitch component specifies a rotation about a horizontal axis to give an up–down change in orientation. (figure: the pitch rotation direction relative to the observation direction)

pixel: The intensity values of a digital image are specified at the locations of a discrete rectangular grid; each location is a pixel. A pixel is characterized by its coordinates (position in the image) and intensity value (see intensity and intensity image). Values can express physical quantities other than intensity for different kinds of images, as in, e.g., infrared imaging. In physical terms, a pixel is the photosensitive cell on the CCD or other solid state sensor of a digital camera. The CCD pixel has a precise size, specified by the manufacturer and determining the CCD's aspect ratio. See also intensity sensor and photosensor spectral response.

pixel addition operator: A low-level image processing operator taking as input two gray scale images, $I_1$ and $I_2$, and returning an image $I_3$ in which the value of each pixel is $I_3 = I_1 + I_2$. (figure: two input images and their sum, divided by 2 to rescale to the original intensity level)

pixel classification: The problem of assigning the pixels of an image to certain classes. See also image segmentation, supervised classification, and clustering. (figure: an input image and its pixels classified into four classes denoted by four different shades of grey)

pixel connectivity: The pattern specifying which pixels are considered neighbors of a given one (X) for the purposes of computation. Common connectivity schemes are 4 connectedness and 8 connectedness. (figure: the 4-connected and 8-connected neighborhoods of a pixel X)
pixel coordinates: The coordinates of a pixel in an image. Normally these are the row and column position.

pixel coordinate transformation: The mathematical transformation linking two image reference frames, specifying how the coordinates of a pixel in one reference frame are obtained from the coordinates of that pixel in the other reference frame. One linear transformation can be specified by

$i_1 = a i_2 + b j_2 + e$
$j_1 = c i_2 + d j_2 + f$

where the coordinates of $p_2 = (i_2, j_2)$ are transformed into $p_1 = (i_1, j_1)$. In matrix form, $\vec p_1 = A \vec p_2 + \vec t$, with $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ a rotation matrix and $\vec t = (e, f)^\top$ a translation vector. See also Euclidean, affine and homography transforms.

pixel counting: A simple algorithm to determine the area of an image region by counting the number of pixels composing the region. See also region.

pixel division operator: An operator taking as input two gray scale images, $I_1$ and $I_2$, and returning an image $I_3$ in which the value of each pixel is $I_3 = I_1 / I_2$.

pixel exponential operator: A low-level image processing operator taking as input one gray scale image, $I_1$, and returning an image $I_2$ in which the value of each pixel is $I_2 = c\,b^{I_1}$. This operator is used to change the dynamic range of an image. The value of the base b depends on the desired degree of compression of the dynamic range; c is a scaling factor. See also logarithmic transformation, pixel logarithm operator. (figure: an input image and the result of raising 1.005 to its pixel values)

pixel gray scale resolution: The number of different gray levels that can be represented in a pixel, depending on the number of bits associated with each pixel. For instance, an 8-bit pixel (or image) can represent $2^8 = 256$ different intensity values. See also intensity, intensity image, and intensity sensor.
pixel interpolation: See image interpolation.

pixel jitter: A frame grabber must estimate the pixel sampling clock of a digital camera, i.e., the clock used to read out the pixel values, which is not included in the output signal of the camera. Pixel jitter is a form of image noise generated by time variations in the frame grabber's estimate of the camera's clock.

pixel logarithm operator: An image processing operator taking as input one gray scale image, $I_1$, and returning an image $I_2$ in which the value of each pixel is $I_2 = c\,\log_b(|I_1| + 1)$. This operator is used to change the dynamic range of an image (see also contrast enhancement), such as for the enhancement of the magnitude of the Fourier transform. The base b of the logarithm is often e, but it does not actually matter because the relationship between logarithms of any two bases is only one of scaling. See also pixel exponential operator. (figure: an input image and its scaled logarithm)

pixel multiplication operator: An image processing operator taking as input two gray scale images, $I_1$ and $I_2$, and returning an image $I_3$ in which the value of each pixel is $I_3 = I_1 \cdot I_2$. (figure: two input images and their product, scaled for contrast)

pixel subsampling: The process of producing a smaller image from a given one by including only one pixel out of every N. Subsampling is rarely applied this literally, however, as severe aliasing is introduced; scale space filtering is applied instead.

pixel subtraction operator: A low-level image processing operator taking as input two gray scale images, $I_1$ and $I_2$, and returning an image $I_3$ in which the value of each pixel is $I_3 = I_1 - I_2$. This operator implements the simplest possible change detection algorithm. (figure: two input images and their difference, with 128 added for display)
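The pixel arithmetic operators above map directly to array operations; a compact sketch (the toy images, scaling constants and clipping convention are assumptions for illustration):

```python
import numpy as np

I1 = np.array([[10, 50], [100, 200]], dtype=np.float64)
I2 = np.array([[40, 25], [50, 55]], dtype=np.float64)

I_add = I1 + I2                  # pixel addition operator
I_sub = I1 - I2                  # pixel subtraction (simple change detection)
I_mul = I1 * I2                  # pixel multiplication operator
I_div = I1 / (I2 + 1e-12)        # pixel division operator (guard against 0)
I_log = 255 / np.log(256) * np.log(np.abs(I1) + 1)  # logarithm, c rescales to [0, 255]
I_exp = 1.005 ** I1              # exponential operator with base b = 1.005

# For display, results are typically rescaled or clipped to the 8-bit range:
print(np.clip(I_add / 2, 0, 255).astype(np.uint8))
```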
planar facet model: See surface mesh.

planar mosaic: A panoramic image mosaic of a planar scene. If the scene is planar, the transformation linking different views is a homography.

planar motion estimation: A class of techniques aiming to estimate the motion parameters of bodies moving on planes in space. See also motion estimation.

planar patch extraction: The problem of finding planar regions, or patches, most commonly in range images. Plane extraction can be useful, for instance, in 3D pose estimation, as several model-based matching techniques yield higher accuracy with planar than non-planar surfaces.

planar patches: See surface triangulation.

planar projective transformation: See homography.

planar rectification: A class of rectification algorithms projecting the original images onto a plane parallel to the baseline of the cameras. See also stereo and stereo vision.

planar scene: 1) When the depth of a scene is small with respect to its distance from the camera, the scene can be considered planar, and useful approximations can be adopted; for instance, the transformation between two views taken by a perspective camera is a homography. See also planar mosaic. 2) When all of the surfaces in a scene are planar, e.g., a blocks world scene.

plane: The locus of all points $\vec x$ such that the surface normal $\vec n$ of the plane and a point $\vec p$ in the plane satisfy the relation $(\vec x - \vec p) \cdot \vec n = 0$. In 3D space, for instance, a plane is defined by two vectors and a point lying on the plane, so that the plane's parametric equation is $\vec p = a\vec u + b\vec v + \vec p_0$, where $\vec p$ is the generic plane point and $\vec u$, $\vec v$, $\vec p_0$ are the two vectors and the point defining the plane, respectively. The implicit equation of a plane is $ax + by + cz + d = 0$, where $(x, y, z)$ are the coordinates of the generic plane point. In vector form, $\vec p \cdot \vec n = d$, where $\vec p = (x, y, z)$, $\vec n = (a, b, c)$ is a vector perpendicular to the plane, and $\frac{d}{\|\vec n\|}$ is the distance of the plane from the origin. All of these definitions are equivalent.

plane conic: Any of the curves defined by the intersection of a plane with a 3D double cone, namely the ellipse, hyperbola and parabola. Two intersecting lines and a single point are degenerate conics, defined by special configurations of the cone and plane. The implicit equation of a conic is $ax^2 + bxy + cy^2 + dx + ey + f = 0$. See also conic fitting. (figure: an ellipse formed by the intersection of a plane with a cone)
plane projective transfer: An algorithm based on projective invariants that, given two images of a planar object, $I_1$ and $I_2$, and four feature correspondences, determines the position of any other point of $I_1$ in $I_2$. Interestingly, no knowledge of the scene or of the imaging system's parameters is necessary.

plane projective transformation: The linear transformation between the coordinates of two projective planes, also known as homography. See also projective geometry, projective plane, and projective transformation.

plenoptic function representation: A parameterized function for describing everything that is visible from a given point in space, a fundamental representation in image based rendering.

Plessey corner finder: A well-known corner detector also known as the Harris corner detector, based on the local autocorrelation of first-order image derivatives. See also feature extraction.

Plücker line coordinates: A representation of lines in projective 3D space. A line is represented by six numbers $(l_{12}, l_{13}, l_{14}, l_{23}, l_{24}, l_{34})$ that must satisfy the quadratic constraint $l_{12}l_{34} - l_{13}l_{24} + l_{14}l_{23} = 0$. The numbers are the entries of the Plücker matrix, L, for the line. For any two points A, B on the line, L is given by $l_{ij} = A_iB_j - B_iA_j$. The pencil of planes containing the line is the nullspace of L. The six numbers may also be seen as a pair of 3-vectors, one a point $\vec a$ on the line, one the direction $\vec n$, with $\vec a \cdot \vec n = 0$.
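A short sketch of the construction (the two homogeneous points are invented for the example; the final line verifies the quadratic constraint):

```python
import numpy as np

def plucker_matrix(A, B):
    """Plucker matrix L of the line through two homogeneous 3D points
    A, B (4-vectors): l_ij = A_i B_j - B_i A_j."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    return np.outer(A, B) - np.outer(B, A)

A = np.array([1.0, 0.0, 0.0, 1.0])
B = np.array([0.0, 1.0, 1.0, 1.0])
L = plucker_matrix(A, B)
l12, l13, l14 = L[0, 1], L[0, 2], L[0, 3]
l23, l24, l34 = L[1, 2], L[1, 3], L[2, 3]
print(l12 * l34 - l13 * l24 + l14 * l23)   # 0.0: the constraint holds
```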
PMS: See Pantone matching system.

point: A primitive concept of Euclidean geometry, representing an infinitely small entity. In computer vision, pixels are regarded as image points, and one speaks of "points in the scene" as positions in the 3D space observed by the cameras.

point distribution model (PDM): A shape representation for flexible 2D contours. It is a type of deformable template model and its parameters can be learned by supervised learning. It is suitable for 2D shapes that undergo general but correlated deformations or variations, such as component motion or shape variation: for instance, fronto-parallel images of leaves, fish or human hands, resistors on a board, people walking in surveillance videos, and the like. The shape variations of the contour in a series of examples are captured by principal component analysis.
point feature: An image feature that occupies a very small portion of an image, ideally one pixel, and is therefore local in nature. Examples are corners (see corner detection) or edge pixels. Notice that, although point features occupy only one pixel, they require a neighborhood to be defined; for instance, an edge pixel is characterized by a sharp variation of image values in a small neighborhood of the pixel.

point invariant: A property that 1) can be measured at a point in an image and 2) is invariant to some transformation. For instance, the ratio of a pixel's observed intensity to that of its brightest neighbor is invariant to changes in illumination. Another example: the magnitude of the gradient of intensity at a point is invariant to translation and rotation. (Both of these examples assume ideal images and observation.)

point light source: A point-like light source, typically radiating energy radially, whose intensity decreases as $\frac{1}{r^2}$, where r is the distance to the source.

point matching: A class of algorithms solving the matching or correspondence problem for point features.

point of extreme curvature: A point where the curvature achieves an extremum, that is, a maximum or a minimum. (figure: a contour with one curvature minimum and one curvature maximum circled)

point sampling: Selection of discrete points of data from a continuous signal. For example, a digital camera samples a continuous image function into a digital image.

point similarity measure: A function measuring the similarity of image points (actually small neighborhoods, to include sufficient information to characterize the image location), for instance cross correlation, SAD (sum of absolute differences), or SSD (sum of squared differences).

point source: A point light source. An ideal illumination source in which all light comes from a single spatial point. The alternative is an extended light source. The assumption of a point source allows easier interpretation of shading and shadows, etc.

point spread function: The response of a 2D system or filter to an input Dirac impulse. The response is typically spread over a region surrounding the point of application of the impulse, hence the name. Analogous to the impulse response of a 1D system. See also filter, linear filter.
polar coordinates: A system of coordinates specifying the position of a point P in terms of the direction of the line through P and the origin, and the distance from P to the origin along that line. For example, the transformation between polar $(r, \theta)$ and Cartesian coordinates $(x, y)$ in the plane is given by $x = r\cos\theta$ and $y = r\sin\theta$, or $r = \sqrt{x^2 + y^2}$ and $\theta = \arctan\frac{y}{x}$.

polar rectification: A rectification algorithm designed to cope with any camera geometry in the context of uncalibrated vision, re-parameterizing the images in polar coordinates around the epipoles.

polarization: The characterizing property of polarized light.

polarized light: Unpolarized light results from the nondeterministic superposition of the x and y components of the electric field. Otherwise, the light is said to be polarized, and the tip of the electric field evolves on an ellipse (elliptically polarized light). Light is often partially polarized, that is, it can be regarded as the sum of completely polarized and completely unpolarized light. In computer vision, polarization analysis is an area of physics based vision, and has been used for metal–dielectric discrimination, surface reconstruction, fish classification, defect detection, and in structured light triangulation.

polarizer: A device changing the state of polarization of light to a specific polarized state, for example, producing linearly polarized light in a given plane.

polycurve: A simple curve C that is smooth everywhere but at a finite set of points, and such that, given any point P on C, the tangent to C converges to a limit approaching P from each direction. Computer vision shape models often describe boundary shapes using polycurve models consisting of a sequence of curved or straight segments. (figure: a closed boundary built from four circular arcs) See also polyline.

polygon: A closed, piecewise linear, 2D contour. Squares, rectangles and pentagons are examples of regular polygons, where all sides have equal length and all angles formed by contiguous sides are equal. This does not hold for a general polygon.
polygon matching: A class of techniques for matching polygonal shapes. See polygon.

polygonal approximation: A polyline approximating a curve. (figure: a circular arc badly approximated by a polyline)
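One common way to build such a polyline is recursive splitting (Douglas–Peucker); a minimal sketch with an invented tolerance and test arc:

```python
import numpy as np

def douglas_peucker(points, tol):
    """Polygonal approximation by recursive splitting: keep the point
    farthest from the chord; if it is farther than tol, split there,
    otherwise replace the whole run of points by the chord."""
    points = np.asarray(points, float)
    a, b = points[0], points[-1]
    if len(points) < 3:
        return [tuple(a), tuple(b)]
    chord = b - a
    rel = points[1:-1] - a
    # Perpendicular distance of every interior point to the chord.
    d = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0]) / np.linalg.norm(chord)
    i = int(np.argmax(d)) + 1
    if d[i - 1] <= tol:
        return [tuple(a), tuple(b)]
    left = douglas_peucker(points[:i + 1], tol)
    return left[:-1] + douglas_peucker(points[i:], tol)

# Approximate a quarter circular arc with a polyline.
t = np.linspace(0, np.pi / 2, 50)
arc = np.column_stack([np.cos(t), np.sin(t)])
print(douglas_peucker(arc, tol=0.02))
```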
polyhedron: A 3D object with planar faces, a "3D polygon": a subset of $\mathbb{R}^3$ whose boundary is a subset of finitely many planes. The basic primitive of many 3D modeling schemes, as many hardware accelerators process polygons particularly quickly. (figure: a tetrahedron, the simplest polyhedron)

polyline: A piecewise linear contour. If closed, it becomes a polygon. See also polycurve, contour analysis and contour representation.

pose: The location and orientation of an object in a given reference frame, especially a world or camera reference frame. A classic problem of computer vision is pose estimation.

pose clustering: A class of algorithms solving the pose estimation problem using clustering techniques (see clustering/cluster analysis). See also pose, k-means clustering.

pose consistency: An algorithm seeking to establish whether two shapes are equivalent. Given two sets of points $G_1$ and $G_2$, for example, the algorithm finds a sufficient number of point correspondences to determine a transformation T between the two sets, then applies T to all other points of $G_1$. If the transformed points are close to points in $G_2$, consistency is satisfied. Also known as viewpoint consistency. See also feature point correspondence.

pose determination: See pose estimation.

pose estimation: The problem of determining the orientation and translation of an object, especially a 3D one, from one or more images thereof. Often the term means finding the transformation that aligns a geometric model with the image data. Several techniques exist for this purpose. See also alignment, model registration, orientation estimation, and rotation representation.

pose representation: The problem of representing the angular position, or pose, of an object (especially 3D) in a given reference frame. A common representation is the rotation matrix, which can be parameterized in different ways, e.g., Euler angles, pitch, yaw and roll angles, rotation angles around the coordinate axes, axis–angle, and quaternions. See also orientation estimation and rotation representation.
position: Location in space (either 2D or 3D).

position dependent brightness correction: A technique seeking to counteract the brightness variation caused by a real imaging system, typically the fact that brightness decreases as one moves away from the optical axis in a lens system with finite aperture. This effect may be noticeable only in the periphery of the image. See also lens.

position invariant: Any property that does not vary with position. For instance, the length of a 3D line segment is invariant to the line's position in 3D space, but the length of the line's projection on the image plane is not. See also invariant.

positron emission tomography (PET): A medical imaging method that can measure the concentration and movement of a positron-emitting isotope in living tissue.

postal code analysis: A set of image analysis techniques concerned with understanding written or printed postal codes. See handwritten and optical character recognition.

posture analysis: A class of techniques aiming to estimate the posture of an articulated body, for instance a human body (e.g., pointing, sitting, standing, crouching, etc.).

potential field: A mathematical function that assigns some (usually scalar) value at every point in some space. In computer vision and robotics, this is usually a measure of some scalar property at each point of a 2D or 3D space or image, such as the distance from a structure. The representation is used in path planning, such that the potential at every point indicates, for example, the ease or difficulty of getting to some destination.

power spectrum: In the context of computer vision, normally the amount of energy at each spatial frequency. The term could also refer to the amount of energy at each light frequency. Also called the power spectrum density function or spectral density function.

precision: 1) The repeatability of the accuracy of a vision system (in general, of an instrument) over many measurements carried out in the same conditions, typically measured by the standard deviation of a target error measure. For instance, the precision of a vision system measuring linear size would be assessed by taking thousands of measurements of a perfectly known object and computing the standard deviation of the measurements. See also accuracy. 2) The number of significant bits in a floating point or double precision number that lie to the right of the decimal point.
predictive compression method: A class of image compression algorithms using redundancy information, mostly correlation, to build an estimate of a pixel value from the values of neighboring pixels.

pre-processing: Operations on an image that, for example, suppress some distortion(s) or enhance some feature(s). Examples include geometric transformations, edge detection, image restoration, etc. There is no clear distinction between image pre-processing and image processing.

Prewitt gradient operator: An edge detection operator based on template matching. It applies a set of convolution masks, or kernels (see Prewitt kernel), implementing matched filters for edges at various (generally eight) orientations. The magnitude (or strength) of the edge at a given pixel is the maximum of the responses to the masks. Alternatively, some implementations use the sum of the absolute values of the responses from the horizontal and vertical masks.

Prewitt kernel: The masks used by the Prewitt gradient operator. The horizontal and vertical masks are

       -1  0 +1            +1 +1 +1
  Gx = -1  0 +1       Gy =  0  0  0
       -1  0 +1            -1 -1 -1
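A minimal sketch applying the two masks by sliding-window filtering and combining responses as $|G_x| + |G_y|$ (the test image is invented for illustration):

```python
import numpy as np

def prewitt(image):
    """Apply the horizontal and vertical Prewitt kernels and combine
    the responses as the sum of their absolute values."""
    Kx = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], float)
    Ky = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]], float)
    rows, cols = image.shape
    gx = np.zeros((rows - 2, cols - 2))
    gy = np.zeros((rows - 2, cols - 2))
    for i in range(rows - 2):
        for j in range(cols - 2):
            patch = image[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(Kx * patch)
            gy[i, j] = np.sum(Ky * patch)
    return np.abs(gx) + np.abs(gy)

# A vertical step edge produces a strong response in the Gx direction.
img = np.zeros((5, 6))
img[:, 3:] = 100.0
print(prewitt(img))
```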
primal sketch: A representation for early vision introduced by Marr, focusing on low-level features like edges. The full primal sketch groups the information computed in the raw primal sketch (consisting largely of edge, bar, end and blob feature information extracted from the images), for instance by forming subjective contours. See also Marr–Hildreth edge detection and raw primal sketch.

primary color: A color coding scheme whereby a range of perceivable colors can be made by a weighted combination of primary colors. For example, color television and computer screens use red, green and blue light-emitting chemicals to produce these three primary colors. The ability to use only three colors to generate all others arises from the trichromacy of the human eye, which has cones that respond to three different color spectral ranges. See also additive and subtractive color.
principal component analysis (PCA): A statistical technique useful for reducing the dimensionality of data, at the basis of many computer vision techniques (e.g., point distribution models and eigenspace based recognition). In essence, the deviation of a random vector $\vec x$ from the population mean $\vec\mu$ can be expressed as the product of A, the matrix of eigenvectors of the covariance matrix of the population, and a vector $\vec y$ of projection weights:

$\vec y = A(\vec x - \vec\mu)$

so that

$\vec x = A^{-1}\vec y + \vec\mu$.

Usually only a subset of the components of $\vec y$ is sufficient to approximate $\vec x$. The elements of this subset correspond to the largest eigenvalues of the covariance matrix. See also Karhunen–Loève transformation.
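A minimal sketch (illustrative only; here A holds eigenvectors as columns, so $A^{-1} = A^\top$ and the weights are $\vec y = A^\top(\vec x - \vec\mu)$; the synthetic data is an assumption):

```python
import numpy as np

def pca(X, k):
    """PCA via the eigendecomposition of the sample covariance
    matrix. Rows of X are observations. Returns the mean, the top-k
    eigenvectors (as columns) and the projection weights."""
    mu = X.mean(axis=0)
    C = np.cov(X - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)    # ascending eigenvalues
    A = eigvecs[:, ::-1][:, :k]             # largest eigenvalues first
    Y = (X - mu) @ A
    return mu, A, Y

rng = np.random.default_rng(0)
# 3D points that actually lie in a 2D subspace: two components suffice.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3)) + 5.0
mu, A, Y = pca(X, k=2)
X_rec = Y @ A.T + mu                        # reconstruction from the weights
print("max reconstruction error:", np.abs(X - X_rec).max())
```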
principal component basis space: In principal component analysis, the space generated by the basis formed by the eigenvectors, or eigendirections, of the covariance matrix.

principal component representation: See principal component analysis.

principal curvature: The maximum or minimum normal curvature at a surface point, achieved along a principal direction. The two principal curvatures and directions together completely specify the local surface shape. The principal curvatures in the two directions at a point X on a cylinder of radius r are 0 (along the axis) and $\frac{1}{r}$ (across the axis). (figure: a cylinder with the two principal directions marked at a point X)

principal curvature sign class: See mean and Gaussian curvature shape classification.

principal direction: The direction in which the normal curvature achieves an extremum, that is, a principal curvature. The two principal curvatures and directions, together, specify completely the local surface shape. The principal directions at a point X on a cylinder are parallel to the axis and around the cylinder. (figure: a cylinder with the principal directions marked at X)

principal point: The point at which the optical axis of a pinhole camera model intersects the image plane. (figure: pinhole camera geometry with the principal point on the image plane)

principal texture direction: An algorithm identifying the direction of a texture. A directional or oriented texture in a small image patch generates a peak in the Fourier transform. To determine the direction, the Fourier amplitude plot is regarded as a distribution of physical mass, and the minimum-inertia axis identified.
privileged viewpoint: A viewpoint where small motions cause image features to appear or disappear. This contrasts with a generic viewpoint.

probabilistic causal model: A representation used in artificial intelligence for causal models. The simplest causal model is a causal graph, in essence an acyclic graph in which nodes represent variables and directed arcs represent cause and effect. A probabilistic causal model is a causal graph with the probability distribution of each variable conditional on its causes.

probabilistic Hough transform: The probabilistic Hough transform computes an approximation to the Hough transform by using only a percentage of the image data. The goal is to reduce the computational cost of the standard Hough transform. A threshold effect has been observed, so that if the percentage sampled is above the threshold level then few false positives are detected.

probabilistic model learning: A class of Bayesian learning algorithms based on probabilistic networks, which allow you to input information at any node (unlike neural networks) and associate uncertainty coefficients with classification answers. See also Bayes' rule, Bayesian model, Bayesian network.

probabilistic principal component analysis: A technique defining a probability model for principal component analysis (PCA). The model can be extended to mixture models, trained using the expectation maximization (EM) algorithm. The original data is modeled as being generated by the reduced-dimensionality subset typical of PCA plus Gaussian noise (called a latent variable model).

probabilistic relaxation: A method of data interpretation in which local inconsistencies act as inhibitors and local consistencies act as excitors. The hope is that the combination of these two influences constrains the probabilities.

probabilistic relaxation labeling: An extension of relaxation labeling in which each entity to be labeled, for instance each image feature, is not simply assigned to a label, but to a set of probabilities, each giving the likelihood that the feature could be assigned a specific label.

probability: A measure of the confidence one may have in the occurrence of an event, on a scale from 0 (impossible) to 1 (certain), and defined as the proportion of favorable outcomes to the total number of possibilities. For instance, the probability of getting any given number from a die in a single throw is $\frac{1}{6}$. Probability theory, an important part of statistics, is the basis of several vision techniques.

probability density estimation: A class of techniques for estimating the density function or its parameters given a sample from a population. A related problem is testing whether a particular sample has been generated by a process characterized by a particular probability distribution. Two common tests are the goodness-of-fit and the Kolmogorov–Smirnov tests. The former is a parametric test best used with large samples; the latter gives good results with smaller samples but is a non-parametric test and, as such, does not produce estimates of the population parameters. See also non-parametric method.

procedural representation: A class of representations used in artificial intelligence to encode how to perform a task (procedural knowledge). A classic example is the production system. In contrast, declarative representations encode how an entity is structured.

Procrustes analysis: A method for comparing two data sets through the minimization of squared errors, by translation, rotation and scaling.
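A minimal sketch of the classical (SVD-based) alignment step, with an invented test configuration:

```python
import numpy as np

def procrustes(X, Y):
    """Align point set Y to point set X by translation, rotation and
    scaling, minimizing the squared error (classical Procrustes
    analysis via the SVD)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    X0, Y0 = X - mx, Y - my
    U, S, Vt = np.linalg.svd(Y0.T @ X0)
    d = np.sign(np.linalg.det(U @ Vt))           # guard against reflections
    D = np.diag([1.0] * (len(S) - 1) + [d])
    R = U @ D @ Vt                                # optimal rotation
    s = (S * np.diag(D)).sum() / (Y0 ** 2).sum()  # optimal scale
    Y_aligned = s * (Y0 @ R) + mx
    return Y_aligned, np.sum((X - Y_aligned) ** 2)

# Y is a rotated, scaled and shifted copy of X; alignment recovers it.
X = np.array([[0.0, 0], [1, 0], [1, 1], [0, 1]])
c, s_ = np.cos(np.pi / 6), np.sin(np.pi / 6)
Y = 2.0 * X @ np.array([[c, -s_], [s_, c]]).T + np.array([3.0, -1.0])
Ya, err = procrustes(X, Y)
print("residual squared error:", round(float(err), 10))   # ~ 0
```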
production system: 1) An approach to computerized logical reasoning, whereby the logic is represented as a set of "production rules". A rule is of the form "LHS → RHS". This states that if the pattern or set of conditions encoded in the left-hand side (LHS) is true or holds, then do the actions specified in the right-hand side (RHS), which may simply be the assertion of some conclusion. A sample rule might be "If the number of detected edge fragments is less than 10, then decrease the threshold by 10%". 2) An industrial system that manufactures some product. 3) A system that is to be actually used, as compared to a demonstration system.

profiles: A shape signature for image regions, specifying the number of pixels in each column (vertical profile) or row (horizontal profile). Used in pattern recognition. See also shape, shape representation.

progressive image transmission: A method of transmitting an image in which a low-resolution version is first transmitted, followed by details that allow progressively higher resolution versions to be recreated. (figure: the same image at first, better and best quality)

progressive scan camera: A camera that transfers an entire image in the order of left-to-right, top-to-bottom, without the alternate line interlacing used in television standards. This is much more convenient for machine vision and other computer-based applications.
projection: 1) The transformation of a geometric structure from one space to another, e.g., the projection of a 3D point onto the nearest point in a given plane. The projection may be specified by a linear function, i.e., for all points $\vec p$ in the initial structure, the points $\vec p\,'$ in the projected structure are given by $\vec p\,' = M\vec p$ for some matrix M. Alternatively, the projection need not be linear, e.g., $\vec p\,' = f(\vec p\,)$. 2) The specific case of projection of a scene that creates an image on a plane by use of, for example, a perspective camera, according to the rules of perspective.

projection matrix: The matrix transforming the homogeneous projective coordinates $(x, y, z, 1)$ of a 3D scene point into the pixel coordinates $(u, v, 1)$ of the point's image in a pinhole camera. It can be factored as the product of the two matrices of the intrinsic camera parameters and extrinsic camera parameters. See also camera coordinates, image coordinates, scene coordinates.
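A short sketch of this factorization (all numeric values are invented for illustration):

```python
import numpy as np

# P = K [R | t]: intrinsic parameters K, extrinsic parameters (R, t).
K = np.array([[800.0, 0.0, 320.0],    # focal lengths (pixels), principal point
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                         # camera aligned with the world frame
t = np.array([[0.0], [0.0], [5.0]])   # world origin 5 units ahead of the camera
P = K @ np.hstack([R, t])             # the 3 x 4 projection matrix

Xw = np.array([0.5, -0.2, 0.0, 1.0])  # homogeneous scene point (x, y, z, 1)
u, v, w = P @ Xw
print("pixel coordinates:", (u / w, v / w))   # (u, v, 1) after division by w
```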
projective geometry: A field of geometry dealing with projective spaces and their properties. A projective geometry is one where only properties preserved by projective transformations are defined. Projective geometry provides a convenient and elegant theory to model the geometry of the common perspective camera. Most notably, the perspective projection equations become linear.

projective invariant: A property, say I, that is not affected by a projective transformation. More specifically, assume an invariant $I(\vec P)$ of a geometric structure described by a parameter vector $\vec P$. When the structure is subject to a projective transformation M, this gives a structure with parameter vector $\vec p\,'$, and $I(\vec P) = I(\vec p\,')$. The most fundamental projective invariant is the cross ratio. In some applications, invariants of weight w occur, which transform as $I(\vec p\,') = I(\vec P)\,(\det M)^w$.

projective plane: A plane, usually denoted by $P^2$, on which a projective geometry is defined.

projective reconstruction: The problem of reconstructing the geometry of a scene from a set or sequence of images in a projective space. The transformation from projective to Euclidean coordinates is easy if the Euclidean coordinates of the five points in a projective basis are known. See also projective geometry and projective stereo vision.

projective space: A space of (n+1)-dimensional vectors, usually denoted by $P^n$, on which a projective geometry is defined.

projective stereo vision: A class of stereo algorithms based on projective geometry. Key concepts expressed elegantly by the projective framework are epipolar geometry, the fundamental matrix, and projective reconstruction.

projective stratum: A layer in the stratification of 3D geometries. Moving from the simplest to the most complex, we have the projective, affine, metric and Euclidean strata. See also projective geometry, projective reconstruction.

projective transformation: Also known as a projectivity, from one projective plane to another. It can be represented by a non-singular 3 × 3 matrix acting on homogeneous coordinates. The transformation has eight degrees of freedom, as only the ratio of projective coordinates is significant.

property based matching: The process of comparing two entities (e.g., image features or patterns) using their properties, e.g., the moments of a region. See also classification, boundary property, metric property.

property learning: A class of algorithms aiming at learning and characterizing attributes of spatio-temporal patterns. For example, learning the color and texture distributions that differentiate between normal and cancerous cells. See also boundary property, metric property, unsupervised learning and supervised learning.

prototype: An object or model serving as a representative example for a class, capturing the defining characteristics of the class.

proximity matrix: A matrix M occurring in cluster analysis. $M(i, j)$ denotes the distance (e.g., the Hamming distance) between clusters i and j.

pseudocolor: A way of assigning a color to pixels that is based on an interpretation of the data rather than the original scene color. The usual purpose of pseudocoloring is to label image pixels in a useful manner. For example, one common pseudocoloring assigns different colors according to the local surface shape class. A pseudocoloring scheme for aerial or satellite images of the earth assigns colors according to the land type, such as water, forest, wheat field, etc.

PSF: See point spread function.

purposive vision: An area of computer vision linking perception with purposive action; that is, modifying the position or parameters of an imaging system purposively, so that a visual task is facilitated or made possible. Examples include changing the lens parameters so as to obtain information about depth, as in depth from defocus, or moving around an object to achieve full shape information.
pyramid: A representation of an image including information at several spatial scales. The pyramid is constructed from the original image (maximum resolution) and a scale operator that reduces the content of the image (e.g., a Gaussian filter) by discarding details at coarser scales:

[Figure: a three-level image pyramid with 256 x 256, 128 x 128 and 64 x 64 levels.]

Applying the operator and subsampling the resulting image leads to the next (lower-resolution) level of the pyramid. See also scale space, image pyramid, Gaussian pyramid, Laplacian pyramid, pyramid transform.

pyramid architecture: A computer architecture supporting pyramid-based processing, typically occurring in the context of multi-scale processing. See also scale space, pyramid, image pyramid, Laplacian pyramid, Gaussian pyramid.

pyramid transform: An operator for building a pyramid from an image. See pyramid, image pyramid, Laplacian pyramid, Gaussian pyramid.
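A minimal sketch of such a construction, assuming scipy is available (the level count, sigma and the random test image are arbitrary choices):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def gaussian_pyramid(image, levels=3, sigma=1.0):
        # Repeatedly smooth (the scale operator) and subsample by two.
        pyramid = [image]
        for _ in range(levels - 1):
            smoothed = gaussian_filter(pyramid[-1], sigma)
            pyramid.append(smoothed[::2, ::2])
        return pyramid

    levels = gaussian_pyramid(np.random.rand(256, 256))
    print([p.shape for p in levels])   # (256, 256), (128, 128), (64, 64)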
Q

QBIC: See query by image content.

quadratic variation: 1) Any function (here, expressing a variation of some variables) that can be modeled by a quadratic polynomial. 2) The specific measure of surface shape deformation f_xx^2 + 2 f_xy^2 + f_yy^2 of a surface f(x, y). This measure has been used to constrain the smoothness of reconstructed surfaces.

quadrature mirror filter: A class of filters occurring in wavelet and image compression filtering theory. The filter splits a signal into a high pass component and a low pass component, with the low pass component's transfer function a mirror image of that of the high pass component.

quadric: A surface defined by a second-order polynomial. See also conic.

quadric patch: A quadric surface defined over a finite region of the independent variables or parameters; for instance, in range image analysis, a part of a range surface that is well approximated by a quadric (e.g., an elliptical patch).

quadric patch extraction: A class of algorithms aiming to identify the portions of a surface that are well approximated by quadric patches. Techniques are similar to those applied for conic fitting. See also surface fitting, least square surface fitting.

quadrifocal tensor: An algebraic constraint imposed on quadruples of corresponding points by the geometry of four simultaneous views, analogous to the epipolar constraint for the two-camera case and to the trifocal tensor for the three-camera case. See also stereo correspondence, epipolar geometry.

quadrilinear constraint: The geometric constraint on four views of a point (i.e., the intersection of four epipolar lines). See also epipolar constraint and trilinear constraint.
quadtree: A hierarchical structure representing 2D image regions, in which each node represents a region, and the whole image is the root of the tree. Each non-leaf node, representing a region R, has four children that represent the four subregions into which R is divided, as illustrated below. Hierarchical subdivision continues until the remaining regions have constant properties. Quadtrees can be used to create a compressed image structure. The 3D extension of a quadtree is the octree.

[Figure: an image recursively divided into quadrants, and the corresponding quadtree.]

qualitative vision: A paradigm based on the idea that many perceptual tasks could be better accomplished by computing only qualitative descriptions of objects and scenes from images, as opposed to quantitative information like accurate measurements. Suggested in the framework of computational theories of human vision.

quantization: See spatial quantization.

quantization error: The approximation error created by the quantization of a continuous variable, typically using a regularly spaced scale of values. The figure below shows a continuous function (dashed) and its quantized version (solid line) using six values only. The quantization error is the vertical distance between the two curves. For instance, the intensity values in a digital image can only take on a certain number (often 256) of discrete values. See also sampling theorem and Nyquist sampling rate.

[Figure: a continuous function (dashed) and its quantized version (solid) over the six values 0-5.]
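A few-line numpy illustration of the idea (the signal and step size are arbitrary):

    import numpy as np

    x = np.linspace(0.0, 5.0, 200)   # a continuous-valued signal
    q = np.round(x)                  # quantized to the six values 0..5
    error = x - q                    # quantization error at each sample
    print(np.abs(error).max())       # at most half the step size (0.5)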
quantization noise: See quantization error.

quasi-invariant: An approximation of an invariant. For instance, quasi-invariant parameterizations of image curves have been built by approximating the invariant arc length with lower spatial derivatives.

quaternion: A forerunner of the modern vector concept, invented by Hamilton, used in vision to represent rotations. Any rotation matrix, R, can be parameterized by a vector of four numbers, q = (q0, q1, q2, q3), such that sum_{k=0}^{3} q_k^2 = 1, that define the rotation uniquely. A rotation has two representations, q and -q. See rotation matrix for alternative representations of rotations.
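A minimal sketch of the standard conversion from a unit quaternion (scalar part first) to a rotation matrix:

    import numpy as np

    def quaternion_to_rotation(q):
        # q = (q0, q1, q2, q3), assumed unit norm; q and -q give the same matrix.
        q0, q1, q2, q3 = q
        return np.array([
            [1 - 2*(q2*q2 + q3*q3), 2*(q1*q2 - q0*q3),     2*(q1*q3 + q0*q2)],
            [2*(q1*q2 + q0*q3),     1 - 2*(q1*q1 + q3*q3), 2*(q2*q3 - q0*q1)],
            [2*(q1*q3 - q0*q2),     2*(q2*q3 + q0*q1),     1 - 2*(q1*q1 + q2*q2)]])

    # Rotation by 0.5 rad about the z axis: q = (cos(t/2), 0, 0, sin(t/2)).
    R = quaternion_to_rotation(np.array([np.cos(0.25), 0, 0, np.sin(0.25)]))
    print(np.allclose(R @ R.T, np.eye(3)))   # True: rotation matrices are orthogonal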
query by image content (QBIC): A class of techniques for selecting members from a database of images by using examples of the desired image content (as opposed to textual search). Examples of contents include color, shape, and texture. See also image database indexing.
R

R–S curve: A contour representation giving the distance, r, of each point of the contour from an origin chosen arbitrarily, as a function of the arc length, s. Allows rotation-invariant comparison of contours. See also contour, shape representation.

[Figure: a closed contour with arc length s measured from a start point s = 0, and distance r(s) from the chosen origin.]

radar: An active sensor detecting the presence of distant objects. A narrow beam of very high-frequency radio pulses is transmitted and reflected by a target back to the transmitter. The direction of the reflected beam and the time of flight of the pulse determine the target's position. See also time-of-flight range sensor.

radial lens distortion: A type of geometric distortion introduced by a real lens. The effect is to shift the position of each image point, p, away from its true position, along the line through the image center and p. See also lens, lens distortion, barrel distortion, tangential distortion, pin cushion distortion, distortion coefficient. This figure shows the typical deformations of a square (exaggerated):

[Figure: barrel and pin cushion deformations of a square under radial distortion.]
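One common polynomial model of this shift, as a hedged numpy sketch (the coefficients k1, k2 and the use of normalized, center-origin coordinates are assumptions, not part of the entry):

    import numpy as np

    def distort(points, center, k1, k2=0.0):
        # Each point moves along the ray through the image center by a
        # factor depending on its squared radius r^2.
        d = points - center
        r2 = (d ** 2).sum(axis=1, keepdims=True)
        return center + d * (1 + k1 * r2 + k2 * r2 ** 2)

    pts = np.array([[0.3, 0.0], [0.3, 0.3]])
    print(distort(pts, np.zeros(2), k1=-0.2))   # barrel distortion for k1 < 0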
radiance: The amount of light (radiating energy) leaving a surface. The light can be generated by the surface itself, as in a light source, or reflected by it. The surface can be real (e.g., a wall) or imaginary (e.g., an infinite plane). See also irradiance, radiometry.

radiance map: A map of radiance for a scene. Sometimes used to refer to a high dynamic range image.
radiant flux: The radiant energy per time unit, that is, the amount of energy transmitted or absorbed per time unit. See also radiance, irradiance, radiometry.

radiant intensity: See radiant flux.

radiometric calibration: A process seeking to estimate radiance from pixel values. The rationale for radiometric calibration is that the light entering a real camera (the radiance) is, in general, altered by the camera itself. A simple calibration model is E(i, j) = g(i, j) I + o(i, j), where, for each pixel (i, j), E is the radiance to estimate, I the measured intensity, and g and o a pixel-specific gain and offset to be calibrated. Ground truth values for E can be measured using photometers.
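Once g and o have been fitted, applying the model is direct; a sketch with placeholder values:

    import numpy as np

    # E(i,j) = g(i,j) * I(i,j) + o(i,j): per-pixel gain and offset recover
    # radiance from measured intensity (values below are assumed, not real).
    I = np.random.rand(4, 4)        # measured intensities
    g = np.full((4, 4), 1.05)       # calibrated per-pixel gain
    o = np.full((4, 4), -0.02)      # calibrated per-pixel offset
    E = g * I + o                   # estimated radiance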
radiometry: The measurement of optical radiation, i.e., electromagnetic radiation between 3 x 10^11 and 3 x 10^16 Hz (wavelengths between 0.01 and 1000 micrometers). This includes ultraviolet, visible and infrared. Common units encountered are watts/m^2 and photons/(sec steradian). Compare with photometry, which is the measurement of visible light.

radius vector function: A contour or boundary representation based about a point c in the center of the figure (usually the center of gravity or a physically meaningful point). The representation then records the distance r from c to points on the boundary, as a function of theta, which is the angle between the direction and some reference direction. The representation has problems when the vector at angle theta intersects the boundary more than one time. See:

[Figure: a closed boundary with center c and radius vector r(theta) at angle theta.]

radon transform: A transformation mapping an image into a parameter space highlighting the presence of lines. It can be regarded as an extension of the Hough transform. One definition is

    g(rho, theta) = integral of I(x, y) delta(rho - x cos(theta) - y sin(theta)) dx dy

where I(x, y) is the image (gray values) and rho = x cos(theta) + y sin(theta) is a parametric line in the image. Lines are identified by peaks in the (rho, theta) space. See also Hough transform line finder.

RAG: See region adjacency graph.

random access camera: A random access camera is characterized by the possibility of accessing any image location directly. The name was introduced to distinguish such cameras from sequential scan cameras, where image values are transmitted in a standard order.
random dot stereogram: A stereo pair formed by one random dot image (that is, a binary image in which each pixel is assigned to black or white at random), and a second image that is derived from the first. This figure shows an example, in which a central square is shifted horizontally. Looking cross-eyed at close distance, you should perceive a strong 3D effect. See also stereo and stereo vision.

[Figure: a random dot stereogram pair with a horizontally shifted central square.]

random sample consensus: See RANSAC.

random variable: A scalar or a vector variable that takes on a random value. The set of possible values may be describable by a standard distribution, such as the Gaussian, mixture of Gaussians, uniform, or Poisson distributions.

randomized Hough transform: A variation of the standard Hough transform designed to produce higher accuracy with less computational effort. The line-finding variant of the algorithm selects pairs of image edge points randomly and increments the accumulator cell corresponding to the line through these two points. The selection process is repeated a fixed number of times.
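A compact sketch of the line-finding variant (the bin sizes, iteration count and dictionary accumulator are arbitrary implementation choices):

    import numpy as np

    def randomized_hough_lines(edge_points, iterations=1000,
                               rho_step=1.0, theta_step=np.pi / 180):
        # edge_points: N x 2 numpy array of distinct edge coordinates.
        acc = {}
        rng = np.random.default_rng(0)
        for _ in range(iterations):
            # Pick two edge points at random and form the line through them.
            (x1, y1), (x2, y2) = edge_points[
                rng.choice(len(edge_points), 2, replace=False)]
            theta = np.arctan2(x1 - x2, y2 - y1)   # normal direction of the line
            rho = x1 * np.cos(theta) + y1 * np.sin(theta)
            cell = (round(rho / rho_step), round(theta / theta_step))
            acc[cell] = acc.get(cell, 0) + 1       # increment the accumulator
        return acc                                  # peaks indicate lines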
range compression: Reducing the dynamic range of an image to enhance the appearance of the image. This is often needed for images resulting from the magnitude of the Fourier transform, which might have pixels with both large and very low values. Without range compression it will be hard to see the structure in the pixels with the low values. The left image shows the magnitude of a 2D Fourier transform with a single bright spot in the middle. The right image shows the logarithm of the left image, revealing more details.

[Figure: a Fourier magnitude image (left) and its log-compressed version (right).]
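The usual logarithmic compression in a few lines of numpy (the test image is arbitrary):

    import numpy as np

    F = np.abs(np.fft.fftshift(np.fft.fft2(np.random.rand(128, 128))))
    compressed = np.log1p(F)                        # log(1 + |F|) shrinks the range
    display = 255 * compressed / compressed.max()   # rescale for an 8-bit display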
range data: A representation of the spatial distribution of a set of 3D points. The data is often acquired by stereo vision or by a range sensor. In computer vision, range data are often represented as a cloud of points, i.e., a set of triplets representing the (X, Y, Z) coordinates of each point, or as range images, also known as Moiré patches. The figure below shows a range image of an industrial part, where brighter pixels are closer:

[Figure: a range image of an industrial part; brighter pixels are closer.]

range data fusion: The merging of multiple sets of range data, especially for the purpose of 1) extending the portion of an object's surface described by the range data, or 2) increasing the accuracy of measurements by exploiting the redundancy of multiple measures available for each point of surface area. See also information fusion, fusion, sensor fusion.

range data integration: See range data fusion.

range data registration: See registration.

range data segmentation: A class of techniques partitioning range data into a set of regions. For instance, a well-known method for segmenting range images is HK segmentation, which produces a set of surface patches covering the initial surface. The right image shows the plane, cylinder and spherical patches extracted from the left range image. See also surface segmentation.

[Figure: a range image (left) and its plane, cylinder and sphere patches (right).]

range edge: See surface shape discontinuity.

range flow: A class of algorithms for the measurement of motion in time-varying range data, made possible by the evolution of fast range sensors. See also optical flow.

range image: A representation of range data as an image. The pixel coordinates are related to the spatial position of each point on the range surface, and the pixel value represents the distance of the surface point from the sensor (or from an arbitrary, fixed background). The figure below shows a range image of a face, where darker pixels are closer:

[Figure: a range image of a face; darker pixels are closer.]

range image edge detector: An edge detector working on range images. Typically, edges occur where depths or surface normal directions (fold edges) change rapidly. See also edge detection, range images. The right image shows the depth and fold edges extracted from the left range image:

[Figure: a range image (left) and its depth and fold edges (right).]
range sensor: Any sensor acquiring range data. The most popular range sensors in computer vision are based on optical and acoustic technologies. A laser range sensor often uses structured light triangulation. A time-of-flight range sensor measures the round-trip time of an acoustic or optical pulse. See also depth estimation. An example of a triangulation range sensor is shown below.

[Figure: a triangulation range sensor; a laser stripe projector illuminates the object to scan, and a camera observes the laser stripe image.]

rank order filtering: A class of filters the output of which depends on an ordering (ranking) of the pixels within the region of support. The classic example is the median filter, which selects the middle value of the set of input values. More generally, the filter selects the kth largest value in the input set.
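A direct (unoptimized) sketch of a kth-largest rank filter; with k = (size*size + 1) // 2 it behaves like a median filter:

    import numpy as np

    def rank_filter(image, k, size=3):
        # Select the k-th largest value in each size x size region of support.
        h, w = image.shape
        pad = size // 2
        padded = np.pad(image, pad, mode='edge')
        out = np.empty_like(image)
        for i in range(h):
            for j in range(w):
                window = padded[i:i + size, j:j + size].ravel()
                out[i, j] = np.sort(window)[::-1][k - 1]   # k-th largest
        return out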
RANSAC: Acronym for random sample consensus, a robust estimator seeking to counter the effect of outliers in data used, for example, in a least square estimation problem. In essence, RANSAC considers a number of data subsets of the minimum size necessary to solve the problem (say a parametric surface fit), then looks for statistical agreement of the results. See also least median square estimation, M-estimation, outlier rejection.
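A hedged sketch of the idea for 2D line fitting (the trial count and inlier tolerance are arbitrary; a real implementation would refit to the final inliers):

    import numpy as np

    def ransac_line(points, trials=200, tol=1.0):
        # Repeatedly fit to a minimal subset (two points) and keep the model
        # with the largest set of consenting inliers.
        rng = np.random.default_rng(0)
        best_inliers = None
        for _ in range(trials):
            p, q = points[rng.choice(len(points), 2, replace=False)]
            n = np.array([q[1] - p[1], p[0] - q[0]])   # line normal
            norm = np.linalg.norm(n)
            if norm == 0:
                continue
            dist = np.abs((points - p) @ (n / norm))   # point-line distances
            inliers = dist < tol
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best_inliers = inliers
        return points[best_inliers]   # refit, e.g., by least squares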
raster scan: "Raster" refers to the region of a monitor, e.g., a cathode ray tube (CRT) or a liquid crystal display (LCD), capable of rendering images. In a CRT, the raster is a sequence of horizontal lines that are scanned rapidly with an electron beam from left to right and top to bottom, largely in the same way as a TV picture tube is scanned. In an LCD, the raster (usually called a "grid") covers the whole device area and is scanned differently, in that image elements are displayed individually.

rate-distortion: A statistical method useful in analog-to-digital conversion. It determines the minimum number of bits required to encode data while tolerating a given level of distortion, or vice versa.

rate-distortion function: The number of bits per sample (the rate R_d) to encode an analog image (or other signal) value given the allowable distortion D (or mean square of the error). Also needed is the variance sigma^2 of the input value (assuming it is a Gaussian random variable). Then R_d = max(0, (1/2) log2(sigma^2 / D)).

raw primal sketch: The first representation built in the perception process according to Marr's theory of vision, heavily based on detection of local edge features. It represents the location, orientation, contrast and scale of center–surround, edge, bar and truncated bar features. See also primal sketch.

RBC: See recognition by components.

real time processing: Any computation performed within the time limits imposed by a given process. For example, in visual servoing a tracking system feeds positional data to a control algorithm generating control signals; if the control signals are generated too slowly, the whole system may become unstable. Different processes can impose very different constraints for real time processing. When processing video-stream data, real time means complete processing of one frame of data in the time before the next frame is acquired (possibly with several frames of lag time, as in a pipeline parallel process).

receiver operating curves and performance analysis for vision: A receiver operating curve (ROC) is a diagram showing the performance of a classifier. It plots the number or percentage of true positives against the number or percentage of true negatives. Performance analysis is a substantial topic in computer vision and the object of an ongoing debate. See also performance characterization, test, classification.

receptive field: 1) The retinal area generating the response to a photostimulus. The main cells responsible for visual perception in the retina are the rods and the cones, active in high- and low-intensity situations respectively. See also photopic response. 2) The region of visual space giving rise to that response. 3) The region of an image that is input to the calculation of each output value. (See region of support.)

recognition: See identification.

recognition by components (RBC): 1) A theory of human image understanding devised by Biederman. The foundation is a set of 3D shape primitives called geons, reminiscent of Marr's generalized cones. Different combinations of geons yield a large variety of 3D shapes, including articulated objects. 2) The recognition of a complex object by recognizing subcomponents and then combining these to recognize more complex objects. See also hierarchical matching, shape representation, model based recognition, object recognition.

recognition by parts: See recognition by components.
recognition by structural decomposition: See recognition by components.

reconstruction: The problem of computing the shape of a 3D object or surface from one or more intensity or range images. Typical techniques include model acquisition and the many shape from X methods reported (see shape from contour and following entries).

reconstruction error: Inaccuracies in a model when compared to reality. These can be caused by inaccurate sensing or compression. (See lossy compression.)

rectification: A technique warping two images into some form of geometric alignment, e.g., so that the vertical pixel coordinates of corresponding points are equal. See also stereo image rectification. This figure shows a stereo pair (top row) and its rectified version (bottom row), highlighting some of the corresponding scanlines, where corresponding image features lie:

[Figure: a stereo pair (top) and its rectified version (bottom), with corresponding scanlines marked.]

recursive region growing: A class of recursive algorithms for region growing. An initial pixel is chosen. Given an adjacency rule to determine the neighbors of a pixel (e.g., 8-adjacency), the neighboring pixels are explored. If any meets the criteria for addition to the region, the growing procedure is called recursively on that pixel. The process continues until all connected image pixels have been examined. See also adjacent, image connectedness, neighborhood, recursive splitting.

recursive splitting: A class of recursive algorithms for region segmentation, dividing an image into a region set. The region set is initialized to the whole image. A homogeneity criterion is then applied; if not satisfied, the image is split according to a given scheme (e.g., into four sub-images, as in a quadtree), leading to a new region set. The procedure is applied recursively to all regions in the new region set, until all remaining regions are homogeneous. See also region segmentation, region based segmentation, recursive region growing.

reference frame transformation: See coordinate system transformation.

reference image: An image of a known scene or of a scene at a particular time used for comparison with a current image. See, for example, change detection.
reference views: In iconic recognition, the views chosen as most representative for a 3D object. See also eigenspace based recognition, characteristic view.

reference white: A sample image value which corresponds to a known white object. The knowledge of such a value facilitates white balance corrections.

reflectance: The ratio of reflected to incident flux, in other words the ratio of reflected to incident (light) power. See also bidirectional reflectance distribution function.

reflectance estimation: A class of techniques for estimating the bidirectional reflectance distribution function (BRDF). Used notably within the techniques for shape from shading and image based rendering, which seeks to render arbitrary images of scenes from video material only. All information about geometry and photometry (e.g., the BRDF) is derived from video. See also physics based vision.

reflectance map: The reflectance map expresses the reflectance of a material in terms of a viewer-centered representation of local surface orientation. The most commonly used is the Lambertian reflectance map, based on Lambert's law. See also shape from shading, photometric stereo.

reflectance ratio: A photometric invariant used for segmentation and recognition. It is based on the observation that the illumination on both sides of a reflectance or color edge is nearly the same. So, although we cannot factor out the reflectance and illumination from only the observed lightness, the ratio of the lightnesses on both sides of the edge equals the ratio of the reflectances, independent of illumination. Thus the ratio is invariant to illumination and local surface geometry for a significant class of reflectance maps. See also invariant, physics based vision.

reflection: 1) A mathematical transformation where the output image is the input image flipped over about a given transformation line in the image plane. See reflection operator. 2) An optics phenomenon whereby all light incident on a surface is deflected away, without absorption, diffusion or scattering. An ideal mirror is the perfect reflecting surface. Given a single ray of light incident on a reflecting surface, the angle of incidence equals the angle of reflection, as shown below. See also specular reflection.

[Figure: an incident ray and a reflected ray making equal angles alpha with the surface.]
reflection operator: A linear transformation intuitively changing each vector or point of a given space to its mirror image, as shown below. The corresponding transformation matrix, say H, has the property HH = I, i.e., H^(-1) = H: a reflection matrix is its own inverse. See also rotation.

[Figure: points and their mirror images under a reflection about a line.]

refraction: An optical phenomenon whereby a ray of light is deflected while passing through different optical mediums, e.g., from air to water. The amount of deflection is governed by the difference between the refraction indices of the two mediums, according to Snell's law:

    n1 sin(theta1) = n2 sin(theta2)

where n1 and n2 are the refraction indices of the two media, and theta1, theta2 the respective refraction angles:

[Figure: an incident ray meeting the interface between medium 1 and medium 2 at angle alpha and refracting at angle beta.]

region: A connected part of an image, usually homogeneous with respect to a given criterion.

region adjacency graph (RAG): A graph expressing the adjacency relations among image regions, for instance generated by a segmentation algorithm. See also region segmentation and region based segmentation. The adjacency relations of the regions in the left figure are encoded in the RAG at the right:

[Figure: an image partitioned into regions A, B, C, D (left) and the corresponding adjacency graph (right).]
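A small sketch of how such a graph can be built from a label image (4-adjacency and the set-of-edges representation are implementation assumptions):

    import numpy as np

    def region_adjacency_graph(labels):
        # Regions are nodes; two regions are linked when they share a
        # horizontal or vertical pixel boundary.
        edges = set()
        h, w = labels.shape
        for i in range(h):
            for j in range(w):
                for n in ((i + 1, j), (i, j + 1)):
                    if n[0] < h and n[1] < w and labels[n] != labels[i, j]:
                        edges.add(frozenset((int(labels[i, j]), int(labels[n]))))
        return edges

    L = np.array([[0, 0, 1],
                  [2, 2, 1]])
    print(region_adjacency_graph(L))   # edges 0-1, 0-2 and 1-2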
region based segmentation: A class of segmentation techniques producing a number of image regions, typically on the basis of a given homogeneity criterion. For instance, intensity image regions can be homogeneous by color (see color image segmentation) or texture properties (see texture field segmentation); range image regions can be homogeneous by shape or curvature properties (see HK segmentation).

region boundary extraction: The problem of computing the boundary of a region, for example, the contour of a region in an intensity image after color based segmentation.

region decomposition: A class of algorithms aiming to partition an image or region thereof into regions. See also region based segmentation.
region descriptor: 1) One or more properties of a region, such as compactness or moments. 2) The data structure containing all data pertaining to a region. For instance, for image regions this could include the region's position in the image (e.g., the coordinates of the center of mass), the region's contour (e.g., a list of 2D coordinates), some indicator of the region shape (e.g., compactness or perimeter squared over area), and the value of the region's homogeneity index.

region detection: A vast class of algorithms seeking to partition an image into regions with particular properties. See for details region identification, region labeling, region matching, region based segmentation.

region filling: A class of algorithms assigning a given value to all the pixels in the interior of a closed contour identifying a region. For instance, one may want to fill the interior of a closed contour in a binary image with zeros or ones. See also morphology, mathematical morphology, binary mathematical morphology.

region growing: A class of algorithms that construct a connected region by incrementally expanding the region, usually at the boundary. New data are merged into the region when the data are consistent with the previous region. The region is often redescribed after each new set of data is added to it. Many region growing algorithms have the form: 1) Describe the region based on the current pixels that belong to the region (e.g., fit a linear model to the intensity distribution). 2) Find all pixels adjacent to the current region. 3) Add an adjacent pixel to the region if the region description also describes this pixel (e.g., it has a similar intensity). 4) Return to step 1 as long as new pixels continue to be added; a sketch of these steps follows this entry. A similar algorithm exists for region growing with 3D points, giving a surface fitting. The data points could come from a regular grid (pixel or voxel) or from an unstructured list. In the latter case, it is harder to determine adjacency.
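A minimal sketch of steps 1)-4) above with the simplest description, a running mean intensity (the threshold value and 4-adjacency are arbitrary choices):

    import numpy as np

    def grow_region(image, seed, threshold=10.0):
        region = {seed}
        frontier = [seed]
        while frontier:
            mean = np.mean([image[p] for p in region])   # 1) describe the region
            i, j = frontier.pop()
            for n in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):  # 2) neighbors
                if (0 <= n[0] < image.shape[0] and 0 <= n[1] < image.shape[1]
                        and n not in region
                        and abs(image[n] - mean) < threshold):  # 3) consistent pixel
                    region.add(n)
                    frontier.append(n)                   # 4) repeat while growing
        return region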
region identification: A class of algorithms seeking to identify regions with special properties, for instance, a human figure in a surveillance video, or road vehicles in an aerial sequence. Region identification covers a very wide area of techniques spanning many applications, including remote sensing, visual surveillance, surveillance, and agricultural and forestry surveying. See also target recognition, automatic target recognition (ATR), binary object recognition, object recognition, pattern recognition.

region invariant: 1) A property of a region that is invariant (does not change) after some transformation is applied to the region, such as translation, rotation or perspective projection. 2) A property or function which is invariant over a region.

region labeling: A class of algorithms which are used to assign a label or meaning to each image region in a given image segmentation to achieve an appropriate image interpretation. Representative techniques are relaxation labeling, probabilistic relaxation labeling, and interpretation trees (see interpretation tree search). See also labeling problem.

region matching: 1) Establishing the correspondences between matching members of two sets of regions. 2) The degree of similarity between two regions, i.e., solving the matching problem for regions. See, for instance, template matching, color matching, color histogram matching.

region merging: A class of algorithms fusing two image regions into one if a given homogeneity criterion is satisfied. See also region, region based segmentation, region splitting.

region of interest: A subregion of an image where processing is to occur. Regions of interest may be used to: 1) reduce the amount of computation that is required or 2) focus processing so that image data outside the region do not distract from or distort results. As an example, when tracking a target through an image sequence, most algorithms for locating the target in the next video frame only consider image data from a region of interest surrounding the predicted target position. The figure shows a boxed region of interest:

[Figure: an image with a boxed region of interest around a target.]

region of support: The subregion of an image that is used in a particular computation. For example, an edge detector usually only uses a subregion of pixels neighboring the pixel currently being considered for being an edge.

region neighborhood graph: See region adjacency graph.

region propagation: The problem of tracking moving image regions.
region representation: A class of methods to represent the defining characteristics of an image region. For encoding the shapes, see axial representation, convex hull, graph model, quadtree, run-length coding, skeletonization. For encoding a region by its properties, see moments, curvature scale space, Fourier shape descriptor, wavelet descriptor, shape representation.

region segmentation: See region based segmentation.

region snake: A snake representing the boundary of some region. The operation of computing the snake may be used as a region segmentation technique.

region splitting: A class of algorithms dividing an image, or a region thereof, into parts (subregions) if a given homogeneity criterion is not satisfied over the region. See also region, region based segmentation, region merging.

registration: A class of techniques aiming to align, superimpose, or match two objects of the same kind (e.g., images, curves, models); more specifically, to compute the geometric transformation superimposing one to the other. For instance, image registration determines the region common to two images, thereby finding the planar transformation (rotation and translation) aligning them; similarly, curve registration determines the transformation aligning the similar (or same) part of two curves. This figure shows the registration (right) of the solid (left) and dashed (middle) curves. The transformation need not be rigid; non-rigid registration is common in medical imaging, for instance in digital subtraction angiography. Notice also that most often there is no exact solution, as the two objects are not exactly the same, and the best approximate solution must be found by least squares or more complex methods. See also Euclidean transformation, medical image registration, model registration, multi-image registration.

[Figure: two curves (solid and dashed) and their registered superposition.]

regression: 1) In statistics, the relationship between one variable and another, as in linear regression. A particular case of curve and surface fitting. 2) Regression testing verifies that changes to the implementation of a system have not caused a loss of functionality, or regression to the state where that functionality did not exist.
regularization: A class of mathematical techniques to solve an ill-posed problem. In essence, to determine a single solution, one introduces the constraint that the solution must be smooth, in the intuitive sense that similar inputs must correspond to similar outputs. The problem is then cast as a variational problem, in which the variational integral depends both on the data and on the smoothness constraint. For instance, a regularization approach to the problem of estimating a function f from a set of values y1, y2, ..., yn at the data points x1, ..., xn leads to the minimization of the functional

    H(f) = sum_{i=1}^{N} (f(x_i) - y_i)^2 + lambda * Phi(f)

where Phi(f) is the smoothness functional, and lambda a positive parameter called the regularization number.
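A discrete numpy sketch of this functional in one dimension, taking the sum of squared first differences as an assumed smoothness term Phi; the minimizer then solves a linear system:

    import numpy as np

    def regularize_1d(y, lam=5.0):
        # Minimize sum (f_i - y_i)^2 + lam * sum (f_{i+1} - f_i)^2,
        # i.e., solve (I + lam * D^T D) f = y.
        n = len(y)
        D = np.diff(np.eye(n), axis=0)   # first-difference operator
        return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

    y = np.sin(np.linspace(0, 3, 50)) + 0.2 * np.random.randn(50)
    f = regularize_1d(y)                 # smoothed estimate of the function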
relational graph: A graph in which the arcs express relations between the properties of image entities (e.g., regions or other features), which are the nodes in the graph. For regions, for instance, commonly used properties are adjacency, inclusion, connectedness, and relative area size. See also region adjacency graph (RAG), shape representation. The adjacency relations of the regions in the left figure are encoded in the RAG at the right:

[Figure: an image partitioned into regions A, B, C, D (left) and the corresponding relational graph (right).]

relational matching: A class of matching algorithms based on relational descriptors. See also relational graph.

relational model: See relational graph.

relational shape description: A class of shape representation techniques based on relations between the properties of image entities (e.g., regions or other features). For regions, for instance, commonly used properties are adjacency, inclusion, connectedness, and relative area size. See also relational graph, region adjacency graph.

relative depth: The difference in depth values (distance from some observer) for two points. In certain situations, while it may not be possible to compute actual or absolute depth, it may be possible to compute relative depth.

relative motion: The motion of an object with respect to some other, possibly also moving, frame of reference (typically the observer's).

relative orientation: The problem of computing the orientation of an object with respect to another coordinate system, such as that of the sensor. More specifically, the rotation matrix aligning the reference frames attached to the object and second object. See also pose and pose estimation.
relaxation: A technique for assigning values from a continuous or discrete set to the nodes of a network or graph by propagating the effects of local constraints. The network can be an image grid, in which case the pixels are nodes, or features, for instance edges or regions. At each iteration, each node interacts with its neighbors, altering its value according to the local constraints. As the number of iterations increases, the effects of local constraints are propagated to farther and farther parts of the network. Convergence is achieved when no more changes occur, or changes become insignificant. See also discrete relaxation, relaxation labeling, probabilistic relaxation labeling.

relaxation labeling: A relaxation technique for assigning a label from a discrete set to each node of a network or graph. A well-known example, a classic in artificial intelligence, is Waltz's line labeling algorithm (see also line drawing analysis).

relaxation matching: A relaxation labeling technique for model matching, the purpose of which is to label (match) each model primitive with a scene primitive. Starting from an initial labeling, the algorithm harmonizes iteratively neighboring labels using a coherence measure for the set of matches. See also discrete relaxation, relaxation labeling, probabilistic relaxation labeling.

relaxation segmentation: A class of segmentation techniques based on relaxation. See also image segmentation.

remote sensing: The acquisition, analysis and understanding of imagery, mainly of the Earth's surface, acquired by airplanes or satellites. Used frequently in agriculture, forestry, meteorological and military applications. See also multi-spectral analysis, multi-spectral image, geographic information system (GIS).

representation: A description or model specifying the properties defining an object or class of objects. A classic example is shape representation, a group of techniques for describing the geometric shape of 2D and 3D objects. See also Koenderink's surface shape classification. Representations can be symbolic or non-symbolic (see symbolic object representation and non-symbolic representation), a distinction inherited from artificial intelligence.

resection: The computation of the position of a camera given the images of some known 3D points. Also known as camera calibration, or pose estimation.

resolution: The number of pixels per unit area, length, visual angle, etc.

restoration: Given a noisy sample of some true data, the goal of restoration is to recover the best possible estimate of the original true data, using only the noisy sample.

reticle: The network of fine wires or receptors placed in the focal plane of an optical instrument for measuring the size or position of the objects under observation.
retinal image: The image which is formed on the retina of the human eye.

retinex: An image enhancement algorithm based on retinex theory, aimed to compute an illuminant-independent quantity called lightness at each image pixel. The key observation is that normal illumination on a surface changes slowly, leading to slow changes in the observed brightness of a surface. This contrasts with strong changes in brightness at reflectance and fold edges. The retinex algorithm removes the slowly varying components by exploiting the fact that the observed brightness B = L x I is the product of the surface lightness (or reflectance) L and the illumination I. By taking the logarithm of B at each pixel, the product of L and I becomes a sum of logarithms. Slow changes can be detected by differentiation and then removed by thresholding. Re-integration of the result produces the lightness image (up to an arbitrary scale factor).
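A one-dimensional, per-row sketch of that log, differentiate, threshold, reintegrate pipeline; the threshold value and row-wise processing are simplifying assumptions:

    import numpy as np

    def retinex_lightness(B, threshold=0.05):
        # log B = log L + log I; suppress small derivatives (slow illumination
        # changes), keep large ones (reflectance edges), then re-integrate.
        logB = np.log(B + 1e-6)
        dx = np.diff(logB, axis=1)
        dx[np.abs(dx) < threshold] = 0.0                   # remove slow variation
        logL = np.cumsum(np.hstack([logB[:, :1], dx]), axis=1)
        return np.exp(logL)                                # lightness, up to scale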
reverse engineering: The problem of generating a model of a 3D object from a set of views, for instance a VRML or a triangulated model. The model can be purely geometric, that is, describing just the object's shape, or combine shape and textural properties. Techniques exist for reverse engineering from both range images and intensity images. See also geometric model, model acquisition.

RGB: A format for color images, encoding the Red, Green, and Blue component of each pixel in separate channels. See also YUV, color image.

ribbon: A shape representation for pipe-like planar objects whose contours are approximately parallel, e.g., roads in aerial imagery. See also generalized cones, shape representation.

ridge: A particular type of discontinuity of the intensity function, giving rise to thick edges and lines. This figure shows a characteristic dark-to-light-to-dark intensity ridge profile along a scanline. See also step edge, roof edge, edge detection.

[Figure: intensity versus pixel position, showing a dark-to-light-to-dark ridge profile.]

ridge detection: A class of algorithms, especially edge and line detectors, for detecting ridges in images.
right-handed coordinate system: A 3D coordinate system with the XYZ axes arranged as shown below. The alternative is a left-handed coordinate system.

[Figure: a right-handed coordinate system with +Y up, +X to the right, and +Z out of the page.]

rigid body segmentation: The problem of partitioning automatically the image of an articulated or deformable body into a number of rigid subcomponents. See also part segmentation, recognition by components (RBC).

rigid motion estimation: A class of techniques aiming to estimate the 3D motion of a rigid body or scene in space from a sequence of images by assuming that there are no changes in shape. Rigidity simplifies the problem significantly so that changes in appearance arise solely from changes in relative position and projection. Techniques exist for using known 3D models, or estimating the motion of a general cloud of 3D points, or from image feature points, or estimating motion from optical flow. See also motion estimation, egomotion.

rigid registration: Registration where neither the model nor data is allowed to deform. This reduces registration to estimating the Euclidean transformation that aligns the model with the data. See also non-rigid registration.

rigidity constraint: The assumption that a scene or object under analysis is rigid, implying that all 3D points remain in the same relative position in space. This constraint can simplify significantly many algorithms, for instance shape reconstruction (see shape and following "shape from" entries) and motion estimation.

road structure analysis: A class of techniques which are used to derive information about roads from images. These can be close-up images (e.g., images of the tarmac as acquired from a moving vehicle, to map defects automatically over extended distances) or remotely sensed images (e.g., to analyze the geographical structure of road networks).

Roberts cross gradient operator: An operator used for edge detection, computing an estimate of perpendicular components of the image gradient at each pixel. The image is convolved with the two Roberts kernels, yielding two components, Gx and Gy, for each pixel. The gradient magnitude sqrt(Gx^2 + Gy^2) and orientation arctan(Gy/Gx) can then be estimated as for any 2D vector. See also edge detection, Canny edge detector, Sobel gradient operator, Sobel kernel, Deriche edge detector, Hueckel edge detector, Kirsch edge detector, Marr–Hildreth edge detector, O'Gorman edge detector, Robinson edge detector.
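A short sketch, using one common sign convention for the two kernels (conventions differ across texts, so treat the signs as an assumption):

    import numpy as np
    from scipy.ndimage import convolve

    k1 = np.array([[1.0, 0.0], [0.0, -1.0]])   # one Roberts kernel
    k2 = np.array([[0.0, 1.0], [-1.0, 0.0]])   # its perpendicular partner

    image = np.random.rand(64, 64)
    Gx = convolve(image, k1)
    Gy = convolve(image, k2)
    magnitude = np.hypot(Gx, Gy)       # sqrt(Gx^2 + Gy^2)
    orientation = np.arctan2(Gy, Gx)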
Roberts kernel: A pair of kernels, or masks, used to estimate perpendicular components of the image gradient within the Roberts cross gradient operator. The masks respond maximally to edges oriented at plus or minus 45 degrees from the vertical axis of the image.

[Figure: the two 2 x 2 Roberts masks.]

Robinson edge detector: An operator for edge detection, computing an estimate of the directional first derivatives of the image in eight directions. The image is convolved with the eight kernels, three of which are shown here. Two of these, typically those responding maximally to differences along the coordinate axes, can be taken as estimates of the two components of the gradient, Gx and Gy. The gradient magnitude sqrt(Gx^2 + Gy^2) and orientation arctan(Gy/Gx) can then be estimated as for any 2D vector. See also edge detection, Roberts cross gradient operator, Sobel gradient operator, Sobel kernel, Canny edge detector, Deriche edge detector, Hueckel edge detector, Kirsch edge detector, Marr–Hildreth edge detector, O'Gorman edge detector.

[Figure: three of the eight Robinson masks.]

robust: A general term referring to a technique which is insensitive to noise or other perturbations.

robust estimator: A statistical estimator which, unlike normal least squares estimators, is not distracted by even significant percentages of outliers in the data. Popular robust estimators in computer vision include RANSAC, least median of squares, and M-estimators. See also outlier rejection.

robust regression: A form of regression that does not use outlier values in computing the fitting parameters. For example, if doing a least square straight line fit to a set of data, normal regression methods use all data points, which can give distorted results if even one point is very far away from the "true" line. Robust processes either eliminate these outlying points or reduce their contribution to the results. The figure below shows a rejected outlying point:

[Figure: a line fit through the inliers, with a rejected outlier far from the line.]
robust statistics: A general term describing statistical methods which are not significantly influenced by outliers.

robust technique: See robust estimator.

ROC: See receiver operating characteristic.

roll: A 3D rotation representation component (along with pitch and yaw) often used for cameras or moving observers. The roll component specifies a rotation about the optical axis or line of sight. This figure shows the roll rotation direction:

[Figure: a camera with the roll rotation direction about its optical axis.]

roof edge: 1) An image edge where the values increase continuously to a maximum and then decrease continuously, such as the brightness values on a Lambertian cylinder when lit by a point light source, or an orientation discontinuity (or fold edge) in a range image. 2) A scene edge where an orientation discontinuity occurs. The figure shows a horizontal roof edge in a range image:

[Figure: a range image containing a horizontal roof edge.]

rotating mask: A mask which is considered in a number of orientations relative to some pixel. See, for example, the masks used in the Robinson edge detector. Most commonly used as a type of average smoothing in which the most homogeneous mask is used to compute the smoothed value for every pixel. In the example, notice how although image detail has been reduced the major boundaries have not been smoothed.

[Figure: an image before and after rotating-mask smoothing; detail is reduced but major boundaries are preserved.]
rotation: A circular motion of a set of points or an object around a given point (2D) or line (3D, called the axis of rotation).

rotation estimation: The problem of estimating rotation from raw or processed image, video or range data, typically from two sets of corresponding points (or lines, planes, etc.) taken from rotated versions of a pattern. The problem usually appears in one of three forms: 1) estimating the 3D rotation from 3D data (three points are needed), 2) estimating the 3D rotation from 2D data (three points are needed but lead to multiple solutions), or 3) estimating the 2D rotation from 2D data (two points are needed). A second issue to consider is the effect of noise: typically more than the minimum number of points are needed to counteract the effects of noise, which leads to least square algorithms.

rotation invariant: A property that keeps the same value even if the data values, the camera, the image or the scene from which the data comes is rotated. One needs to distinguish between 2D (i.e., in the image) and 3D (i.e., in the scene) rotation invariance. For example, the angle between two image lines is invariant to image rotation, but not to rotation of the lines in the scene.

rotation matrix: A linear operator rotating a vector in a given space. The inverse of a rotation matrix equals its transpose. A rotation matrix has only three degrees of freedom in 3D and one in 2D. In 3D space, there are three eigenvalues, namely 1, cos(theta) + i sin(theta), and cos(theta) - i sin(theta), where i is the imaginary unit. A rotation matrix in 3D has nine entries but only three degrees of freedom, as it must satisfy six orthogonality constraints. It can be parameterized in various ways, usually through Euler angles; yaw, pitch, roll; rotation angles around the coordinate axes; axis-angle; etc. See also orientation estimation, rotation representation, quaternions.

rotation operator: A linear operator expressed by a rotation matrix.

rotation representation: A formalism describing rotations and their algebra. The most frequent is definitely the rotation matrix, but quaternions, Euler angles, yaw-pitch-roll, rotation angles around the coordinate axes, axis-angle, etc. have also been used.

rotational symmetry: The property of a set of points or an object to remain unchanged after a given rotation. For instance, a cube has several rotational symmetries, with respect to any 90 degree rotation around any axis passing through the centers of opposite faces. See also rotation, rotation matrix.
RS-170: The standard black-and-white video format in the United States. The EIA (Electronic Industry Association) is the standards body that originally defined the 525-line, 30 frame per second TV standard for North America, Japan, and a few other parts of the world. The EIA standard, also defined under US standard RS-170A, defines only the monochrome picture component but is mainly used with the NTSC color encoding standard. A version exists for PAL cameras.

rubber sheet model: See membrane model.

rule-based classification: A method of object recognition drawn from artificial intelligence in which logical rules are used to infer object type.

run code: See run length coding.

run length coding: A lossless compression technique used to reduce the size of a repeating string of characters, called a "run", also applicable to images. The algorithm encodes a run of symbols into two bytes, a count and a symbol. For instance, the 6-byte string "xxxxxx" would become "6x", occupying 2 bytes only. It can compress any type of information content, but the content itself affects, obviously, the compression ratio. Compression ratios are not high compared to other methods, but the algorithm is easy to implement and quick to execute. Run-length coding is supported by bitmap file formats such as TIFF, BMP and PCX. See also image compression, video compression, JPEG.

run length compression: See run length coding.
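A toy encoder/decoder pair showing the (count, symbol) scheme described above:

    def run_length_encode(s):
        # Encode each run of identical symbols as a [count, symbol] pair.
        runs = []
        for ch in s:
            if runs and runs[-1][1] == ch:
                runs[-1][0] += 1
            else:
                runs.append([1, ch])
        return runs

    def run_length_decode(runs):
        return ''.join(count * ch for count, ch in runs)

    print(run_length_encode('xxxxxx'))                  # [[6, 'x']]
    print(run_length_decode(run_length_encode('aab')))  # 'aab'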
S

saccade: A movement of the eye or camera, changing the direction of fixation sharply.

saliency map: A representation encoding the saliency of given image elements, typically features or groups thereof. See also salient feature, Gestalt, perceptual grouping, perceptual organization.

salient feature: A feature associated with a high value of a saliency measure, quantifying feature suggestiveness for perception (from the Latin salire, to leap). For instance, inflection points have been indicated as salient features for representing contours. Saliency is a concept originated from Gestalt psychology. See also perceptual grouping, perceptual organization.

salt-and-pepper noise: A type of impulsive noise. Let x, y in [0, 1] be two uniform random variables, I the true image value at a given pixel, and In the corrupted (noisy) version of I. We can define the effect of salt-and-pepper noise as In = i_min + y (i_max - i_min) iff x >= l, where l is a parameter controlling how much of the image is corrupted, and i_min, i_max the range of the noise. See also image noise, Gaussian noise. This image was corrupted with 1% noise:

[Figure: an image corrupted with 1% salt-and-pepper noise.]
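The definition above translates directly into numpy (the parameter values are arbitrary):

    import numpy as np

    def salt_and_pepper(I, l=0.99, i_min=0.0, i_max=255.0):
        # x, y uniform in [0, 1]; a pixel is replaced by
        # i_min + y * (i_max - i_min) iff x >= l.
        x = np.random.rand(*I.shape)
        y = np.random.rand(*I.shape)
        In = I.astype(float).copy()
        corrupt = x >= l                  # about (1 - l) of the pixels
        In[corrupt] = i_min + y[corrupt] * (i_max - i_min)
        return In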
sampling: The transformation of a continuous signal into a discrete one by recording its values at discrete instants or locations. Most digital images are sampled in space, time and intensity, as intensity values are defined only on a regular spatial grid, and can only take integer values. This shows an example of a continuous signal and its samples:

[Figure: a continuous signal and its discrete samples.]

sampling density: The density of a sampling grid, that is, the number of samples collected per unit interval. See also sampling.

sampling theorem: If an image is sampled at a rate higher than its Nyquist frequency, then an analog image can be reconstructed from the sampled image whose mean square error with the original image converges to zero as the number of samples goes to infinity.

Sampson approximation: An approximation to the geometric distance in the fitting of implicit curves or surfaces that are defined by a parameterized function of the form f(a, x) = 0 for x on the surface S(a) defined by parameter vector a. Fitting the surface to the set of points x1, ..., xn consists in minimizing a function of the form e(a) = sum_{i=1}^{n} d(x_i, S(a)). Simple solutions are often available if the distance function d(x, S(a)) is the algebraic distance d(x, S(a)) = f(a, x)^2, but under certain common assumptions, the optimal solution arises when d is the more complicated geometric distance d(x, S(a)) = min_{y in S(a)} ||x - y||^2. The Sampson approximation defines

    d(x, S(a)) = f(a, x)^2 / ||grad f(a, x)||^2

which is a first-order approximation to the geometric distance. If an efficient algorithm for minimizing weighted algebraic distance is available, then the Sampson iterations are a further approximation, where the kth iterate a_k is the solution to

    a_k = argmin_a sum_{i=1}^{n} w_i f(a, x_i)^2

with weights computed using the previous estimate, so w_i = 1 / ||grad f(a_{k-1}, x_i)||^2.

SAR: See synthetic aperture radar.

SAT: See symmetric axis transform.

satellite image: An image of a section of the Earth acquired using a camera mounted on an orbiting satellite.

saturation: Reaching the upper limit of a dynamic range. For instance, intensity saturation occurs for an 8-bit monochromatic image when intensities greater than 255 are recorded: any such value is encoded as 255, the largest possible value in the range.
Savitzky–Golay filtering: A class of filters achieving least squares fitting of a polynomial to a moving window of a signal. Used for fitting and data smoothing. See also linear filter, curve fitting.

scalar: A one dimensional entity; a real number.

scale: 1) The ratio between the size of an object, image, or feature and that of a reference or model. 2) The property that some image features are apparent only when viewed at a given size, such as a line being enlarged so much that it appears as a pair of parallel edge features. 3) A measure of the degree to which fine features have been removed or reduced in an image. One can analyze images at multiple spatial scales, whereby only features in certain size ranges appear at each scale (see scale space and pyramid).

scale invariant: A property that keeps the same value even if the data, the image or the scene from which the data comes is shrunk or enlarged. The ratio perimeter^2 / area is invariant to image scaling.

scale operator: An operator suppressing details (high-frequency contents) in an image, e.g., Gaussian smoothing. Details at small scales are discarded. The resulting content can be represented in a smaller-size image. See also scale space, image pyramid, Gaussian pyramid, Laplacian pyramid, pyramid transform.

scale reduction: The result of the application of a scale operator.

scale space: A theory for early vision developed to account properly for the multi-scale nature of images. The rationale is that, in the absence of a priori information on the optimal spatial scale at which a specific problem should be treated (e.g., edge detection), images should be analyzed at all possible scales, the coarser ones representing simplifications of the finer ones. The finest scale is the input image itself. See scale space representation for details.

scale space filtering: The filtering operation that transforms one resolution level into another in a scale space, for instance Gaussian filtering.

scale space matching: A class of matching techniques that compare shape at various scales. See also scale space and image matching.

scale space representation: A representation of an image, and more generally of a signal, making explicit the information contained at multiple spatial scales, and establishing a causal relationship between adjacent scale levels. The scale level is identified by a scalar parameter, called the scale parameter. A crucial requirement is that coarser levels, obtained by successive applications of a scale operator, should constitute simplifications of previous (finer) levels, i.e., introduce no spurious details. A popular scale space representation is the Gaussian scale space, in which the next coarser image is obtained by convolving the current image with a Gaussian kernel. The variance of this kernel is the scale parameter. See also scale space, image pyramid, Gaussian smoothing.
scaling: 1) The process of zooming or shrinking an image. 2) Enlarging or shrinking a model to fit a set of data. 3) The process of transforming a set of values so that they lie inside a standard range (e.g., [-1, 1]), often to improve numerical stability.

scanline: A single (horizontal) line of an image. Originally this term was used for cameras in which the image is acquired line by line by a sensing element that generally scans each pixel on a line and then moves onto the next line.

scanline slice: The cross section of a structure along an image scanline. For instance, the scanline slice of a convex polygon in a binary image is:

[Figure: a binary profile along a scanline, 0 outside the polygon and 1 across it.]

scanline stereo matching: The stereo matching problem with rectified images, whereby corresponding points lie on scanlines with the same index. See also rectification, stereo correspondence.

scanning electron microscope (SEM): A scientific microscope introduced in 1942. It uses a beam of highly energetic electrons to examine objects on a very fine scale. The imaging process is essentially the same as for a light microscope apart from the type of radiation used. Magnification is much higher than what can be achieved with light. The images are rendered in gray shades. This technique is particularly useful for investigating microscopic details of surfaces.

scatter matrix: For a set of d-dimensional points represented as column vectors x1, ..., xn, with mean mu = (1/n) sum_{i=1}^{n} x_i, the scatter matrix is the d x d matrix

    S = sum_{i=1}^{n} (x_i - mu)(x_i - mu)^T

It is n - 1 times the sample covariance matrix.
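In numpy, with points stored as rows, this is two lines (including a quick check of the covariance relation):

    import numpy as np

    X = np.random.rand(100, 3)        # n = 100 points in d = 3 dimensions
    mu = X.mean(axis=0)
    S = (X - mu).T @ (X - mu)         # d x d scatter matrix
    print(np.allclose(S, (len(X) - 1) * np.cov(X, rowvar=False)))   # True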
scattergram: See scatterplot.

scatterplot: A data display technique in which each data item is plotted as a single point in an appropriate coordinate system, that might help a person to better understand the data. For example, if a set of estimated surface normals is plotted in a 3D scatterplot, then planar surfaces should produce tight clusters of points. The figure shows a set of data points plotted according to their values of features 1 and 2:

[Figure: a 2D scatterplot with axes labeled FEATURE 1 and FEATURE 2.]

scene: The part of 3D space captured by an imaging sensor, and every visible object therein.

scene analysis: The process of examining an image or video, for the purpose of inferring information about the scene in view, such as the shape of the visible surfaces, the identity of the objects in the scene, and their spatial or dynamic relationships. See also shape from contour and the following "shape from" entries, object recognition, and symbolic object representation.

scene constraint: Any constraint imposed on the image data by the nature of the scene, for instance, rigid motion, or the orthogonality of walls and floors, etc.

scene coordinates: A 3D coordinate system that describes the position of scene objects relative to a given coordinate system origin. Alternative coordinate systems are camera coordinates, viewer centered coordinates or object centered coordinates.

scene labeling: The problem of identifying scene elements from image data, associating them to labels representing their nature and roles. See also labeling problem, region labeling, relaxation labeling, image interpretation, scene understanding.

scene reconstruction: The problem of estimating the 3D geometry of a scene, for example the shape of visible surfaces or contours, from image data. See also reconstruction, shape from contour and the following "shape from" entries, or architectural model, volumetric, surface and slice based reconstruction.

scene understanding: The problem of constructing a semantic interpretation of a scene from image data, that is, describing the scene in terms of object identities and relationships among objects. See also image interpretation, object recognition, symbolic object representation, semantic net, graph model, relational graph.

SCERPO: Spatial Correspondence, Evidential Reasoning and Perceptual Organization. A well known vision system developed by David Lowe that demonstrated recognition of complex polyhedral objects (e.g., razors) in a complex scene.
screw motion: A 3D transformation comprising a rotation about an axis a and translation along a. The general Euclidean transformation x → Rx + t is a screw transformation if Rt = t.

search tree: A data structure that records the choices that could be made in a problem-solving activity, while searching through a space of alternative choices for the next action or decision. The tree could be explicitly created or be implicit in the sequence of actions. For example, a tree that records alternative model-to-data feature matching is a specialized search tree called an interpretation tree. If each non-leaf node has two children, we have a binary search tree. See also decision tree, tree classifier.

SECAM: SECAM (Sequential Couleur avec Mémoire) is the television broadcast standard in France, the Middle East, and most of Eastern Europe. SECAM broadcasts 819 lines per second. It is one of three main television standards throughout the world, the other two being PAL (see PAL camera) and NTSC.

second derivative operator: A linear filter estimating the second derivative from an image at a given point and in a given direction. Numerically, a simple approximation of the second derivative of a 1D function f is the central (finite) difference, derived from the Taylor approximation of f:

f″ᵢ = (fᵢ₊₁ − 2fᵢ + fᵢ₋₁)/h² + O(h)

where h is the sampling step (assumed constant), and O(h) indicates that the truncation error vanishes as h. A similar but more complicated approximation exists for estimating the second derivative in a given direction in an image. See also first derivative filter.
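A small sketch of the central difference above (ours, assuming NumPy; h is the sampling step):

```python
import numpy as np

def second_derivative(f, h=1.0):
    """Central difference (f[i+1] - 2 f[i] + f[i-1]) / h**2 for a
    uniformly sampled 1D signal; valid at interior samples only."""
    f = np.asarray(f, dtype=float)
    return (f[2:] - 2.0 * f[1:-1] + f[:-2]) / h**2

x = np.linspace(0.0, 1.0, 11)
print(second_derivative(x**2, h=x[1] - x[0]))   # approximately all 2s
```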
second fundamental form: See surface curvature.

seed region: The initial region used in a region growing process such as surface fitting in range data or intensity region finding in an intensity image. The patch on the surface here is a potential seed region for region growing the full cylindrical patch:

[Figure: a small patch on a cylindrical surface marked as the seed region.]

segmentation: The problem of dividing a data set into parts according to a given set of rules. The assumption is that the different segments correspond to different structures in the original input domain observed in the image. See for instance image segmentation, color image segmentation, curve segmentation, motion
segmentation, part segmentation, range data segmentation, texture segmentation.

self-calibration: The problem of estimating the calibration parameters using only information extracted from a sequence or set of images (typically feature point correspondences in subsequent frames of a sequence or in several simultaneous views), as opposed to traditional calibration in photogrammetry, which adopts specially built calibration objects. Self-calibration is intimately related with basic concepts of multi-view geometry. See also camera calibration, autocalibration, stratification, projective geometry.

self-localization: The problem of estimating the sensor's position within an environment from image or video data. The problem can be cast as geometric model matching if models of sufficiently complex objects are available, i.e., containing enough points to allow a full solution of the pose estimation problem. In some situations it is possible to identify a sufficient number of landmark points (see landmark detection). If no information at all is available about the scene, one can still apply tracking or optical flow techniques to get corresponding points over time, or stereo correspondences in multiple simultaneous frames. See also motion estimation, egomotion.

self-occlusion: Occlusion in which part of an object is occluded by another part of the same object. In the following example the left leg of the person is occluding their right leg.

[Figure: a walking figure whose left leg occludes the right leg.]

SEM: See scanning electron microscope.

semantic net: A graph representation in which nodes represent the objects of a given domain, and arcs properties and relations between objects. See also symbolic object representation, graph model, relational graph. A simple example: an arch and its semantic net representation:

[Figure: an arch made of two posts supporting a top, and its semantic net with nodes ARCH, TOP, POST and POST linked by PART_OF and SUPPORTS arcs.]

semantic region growing: A region merging scheme incorporating a priori knowledge about adjacent regions; for instance, in aerial imagery of countryside areas, the fact that roads are usually surrounded by fields. Constraint propagation can then
be applied to achieve a globally optimal region segmentation. See also constraint satisfaction, relaxation labeling, region segmentation, region based segmentation, recursive region growing.

sensor: A general word for a mechanism that records information from the "outside world", generally for processing by a computer. The sensor might obtain raw measurements, e.g., a video camera, or partially processed information, e.g., depth from a stereo triangulation process.

sensor fusion: A vast class of techniques aiming to combine the different information contained in data from different sensors, in order to achieve a richer or more accurate description of a scene or action. Among the many paradigms for fusing sensory information are the Kalman filter, Bayesian models, fuzzy logic, Dempster–Shafer evidential reasoning, production systems and neural networks.

sensor motion compensation: A class of techniques aiming to suppress the motion of a sensor (or its effects) in a video sequence, or in data extracted from the sequence. A typical example is image sequence stabilization, in which a target moving across the image in the original sequence appears stationary in the output sequence. Another example is keeping a robot stationary in front of a target using only visual data (station keeping). Suppression of jitter in hand-held video recorders is now commercially available. Basic ingredients are tracking and motion estimation. See also egomotion.

sensor motion estimation: See egomotion.

sensor path planning: See sensor planning.

sensor placement determination: See camera calibration and sensor planning.

sensor planning: A class of techniques aimed to determine optimal sensing strategies for a reconfigurable sensor system, normally given a task and a geometric model of the target object (that may be partially acquired in previous views). For example, given a geometric feature on an object for which a CAD-like model is known, and the task to verify the feature's size, a sensor planning system would determine the best position and orientation of, say, a single camera and associated illumination for estimating the size of each feature. The two basic approaches have been generate-and-test, in which sensor configurations are generated and then evaluated with respect to the task constraints, and synthetic methods, in which task constraints are characterized analytically and the resulting equations solved to yield the optimal
sensor configuration. See also active vision, purposive vision.

sensor position estimation: See pose estimation.

sensor response: The output of a sensor, or a characterization of some key output quantities, given a set of inputs. Typically expressed in the frequency domain, as a function linking the magnitude and phase of the Fourier transform of the output signal with the known frequency of the input. See also phase spectrum, power spectrum, spectral response.

sensor sensitivity: In general, the weakest input signal that a sensor can detect. It can be inferred from the sensor response curve. For the common CCD sensor of video cameras, sensitivity depends on various parameters, mainly the fill factor (the percentage of the sensor's area actually sensitive to light) and well capacity (the amount of charge that a photosensitive element can hold). The larger the values of the above parameters, the more sensitive the camera. See also sensor spectral sensitivity.

sensor spectral sensitivity: A characterization of a sensor's response in frequency. For example, the figure

[Figure: spectral response curve of a typical CCD sensor against wavelength.]

shows the spectral sensitivity of a typical CCD sensor (actually its spectral response, from which the spectral sensitivity can be inferred). Notice that the high sensitivity of silicon in the infrared means that IR blocking filters should be considered for fine measurements depending on camera intensities. We also notice that a CCD camera makes a very good sensor for the near-infrared range (750–1000 nm).

separability: A term used in classification problems referring to whether the data is capable of being split into distinct subclasses by some automatic decision process. If property values of two classes overlap, then the classes are not separable. The circle class is linearly separable in the figure below, but the × and box classes are not:

[Figure: a 2D plot with axes FEATURE 1 and FEATURE 2, in which the circle class can be separated from the × and box classes by a straight line.]

separable filter: A 2D (in image processing) filter that can be expressed as the product of two filters, each of which acts independently on rows and columns. The classic example is the linear Gaussian filter (see Gaussian convolution).
Separability implies a significant reduction in computational complexity, typically reducing processing costs from O(N²) to O(2N), where N is the filter size. See also linear filter, separable template.
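A sketch of the two-pass implementation (ours; SciPy's convolve1d is assumed, and the binomial kernel is one common stand-in for a Gaussian):

```python
import numpy as np
from scipy.ndimage import convolve1d

def separable_smooth(image, k=np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0):
    """Apply a separable 2D filter as two 1D passes: O(2N) work per
    pixel instead of O(N^2) for the equivalent 2D kernel."""
    tmp = convolve1d(image.astype(float), k, axis=0)   # filter columns
    return convolve1d(tmp, k, axis=1)                  # then rows
```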
separable template: A template or structuring element in a filter, for instance a morphological filter (see morphology), that can be decomposed into a sequence of smaller templates, similarly to separable kernels for linear filters. The main advantage is a reduction in the computational complexity of the associated filter. See also separable filter.

set theoretic modeling: See constructive solid geometry.

shading: The pattern formed by the graded areas of an intensity image, suggesting light and dark. Variations in the lightness of surfaces in the scene may be due to variations in illumination, surface orientation and surface reflectance. See also illumination, shadow.

shading correction: A class of techniques for changing undesirable shading effects, for instance strongly uneven brightness distribution caused by nonuniform illumination. All techniques assume a shading model, i.e., a photometric model of image formation, formalizing the dependency of measured image brightness on camera parameters (typically gain and offset), illumination
and object reflectance. See also shadow, photometry.

shading from shape: A technique recovering the reflectance of isolated objects given a single image and a geometric model, but not exactly the inverse of the classic shape from shading problem. See also photometric stereo.

shadow: A part of a scene that direct illumination does not reach because of self-occlusion (attached shadow or self-shadow) or occlusion caused by other objects (cast shadow). Therefore, this region appears darker than its surroundings. See also shape from shading, shading from shape, photometric stereo. See:

[Figure: an object with an attached shadow on its unlit side and a cast shadow on the ground.]

shadow, attached: A shadow caused by an object on itself by self-occlusion. See also shadow, cast.

shadow, cast: A shadow thrown by an object on another object. See also shadow, attached.

shadow detection: The problem of identifying image regions corresponding to shadows in the scene, using photometric properties. Useful for true color estimation and region analysis. See also color, color image segmentation, color matching, photometry, region segmentation.

shadow type labeling: A problem similar to shadow detection, but requiring classification of different types of shadows.

shadow understanding: Estimating various properties of a 3D scene based on the appearance or size of shadows, e.g., building height. See also shadow type labeling.

shape: Informally, the form of an image or scene object. Typically described in computer vision through geometric representations (see shape representation), e.g., modeling image contours with polynomials or b-splines, or range data patches with quadric surfaces. More formally, definitions are: 1. (adj) The quality of an object that is invariant to changes of the coordinate system in which it is expressed. If the coordinate system is Euclidean, this corresponds to the conventional idea of shape. In an affine coordinate system, the change of coordinates may be affine, so that, for example, an ellipse and a circle have the same shape. 2. (n) A family of point sets, any pair being related by a coordinate system transformation. 3. (n) A specific set of n-dimensional points, e.g., the set of squares. For example a curve in ℝ² defined parametrically as c(t) = (x(t), y(t)) comprises the point set or shape {c(t), −∞ < t < ∞}. The volume inside the unit sphere in 3D is the shape {x : ‖x‖ < 1, x ∈ ℝ³}.

shape class: One in a set of classes representing different types of shape in a given classification, for instance, "locally convex" or "hyperbolic" in HK segmentation of a range image.

shape decomposition: See segmentation and hierarchical modeling.

shape from contours: A class of algorithms for estimating the shape of a 3D object from the contour it generates in an image. A well-known technique, shape from silhouettes, consists in extracting the object's silhouette from a number of views, and intersecting the 3D cones generated by the silhouettes' contours and the centers of projection. The intersection volume is known as the visual hull. Work also exists on understanding shape from the differential properties of apparent contours.

shape from defocus: A class of algorithms for estimating scene depth at each image pixel, and therefore surface shape, from multiple images acquired at different, controlled focus settings. A closed-form model of the relation between depth and image focus is assumed, containing a number of parameters (e.g., the optics parameters) that must be calibrated. Depth is estimated using this model once image readings (pixel values) are available. Notice that
the camera uses a large aperture, so that the points in the scene are in focus over the smallest possible depth interval. See also shape from focus.

shape from focus: A class of algorithms for estimating scene depth at each image pixel, and therefore surface shape, by varying the focus setting of a camera until the image achieves optimal focus (minimum blur) in a neighborhood of the pixel under examination. Obviously, pixels corresponding to different depths would achieve optimal focus for different settings. A model of the relation between depth and image focus is assumed, containing a number of parameters (e.g., the optics parameters) that must be calibrated. Notice that the camera uses a large aperture, so that the smallest possible depth interval generates in-focus image points. See also shape from defocus.

shape from line drawings: A class of symbolic algorithms inferring 3D properties of scene objects (as opposed to exact shape measurements, as in other "shape from" methods) from line drawings. First, assumptions are made about the type of line drawings admissible, e.g., polyhedral objects only, no surface markings or shadows, maximum three lines forming an image junction. Then, a dictionary of line junctions is formed, assigning a symbolic label to every possible appearance of the line junctions in space under the given assumptions. This figure shows part of a simple dictionary of junctions and a labeled shape:

[Figure: a dictionary of two- and three-line junction labelings, and a block with its edges labeled + and −.]

where + means planes intersecting in a convex shape, − in a concave shape, and the arrows a discontinuity (occlusion) between surfaces. Each image junction is then assigned the set of all possible labels that its shape admits locally (e.g., all possible two-line junction labels for a two-line junction). Finally, a constraint satisfaction algorithm is used to prune labels inconsistent with the context. See also Waltz's line labeling, relaxation labeling.

shape from monocular depth cues: A class of algorithms estimating shape from information related to depth detected in a single image, i.e., from monocular cues. See shape from contours, shape from line drawings, shape from perspective,
shape from shading, shape from specularity, shape from structured light, shape from texture.

shape from motion: A vast class of algorithms for estimating 3D shape (structure), and often depth, from the motion information contained in an image sequence. Methods exist that rely on tracking sparse sets of image features (for instance, the Tomasi–Kanade factorization) as well as dense motion fields, i.e., optical flow, seeking to reconstruct dense surfaces. See also motion factorization.

shape from multiple sensors: A class of algorithms recovering shape from information collected from a number of sensors of the same type, or of different types. For the former class, see multi-view stereo. For the second class, see sensor fusion.

shape from optical flow: See optical flow.

shape from orthogonal views: See shape from contours.

shape from perspective: A class of techniques estimating depth for various features from perspective cues, for instance the fact that a translation along the optical axis of a perspective camera changes the size of the imaged objects. See also pinhole camera model.

shape from photo consistency: A technique based on space carving for recovering shape from multiple views (photos). The basic constraint is that the underlying shape must be "photo-consistent" with all the input photos, i.e., roughly speaking, give rise to compatible intensity values in all cameras.

shape from photometric stereo: See photometric stereo.

shape from polarization: A technique recovering local shape from the polarization properties of a surface under observation. The basic idea is to illuminate a surface with known polarized light, estimate the polarization state of the reflected light, then use this estimate in a closed-form model linking the surface normals with the measured polarization parameters. In practice, polarization estimates can be noisy. This method can be useful wherever intensity images do not provide information, e.g., featureless specular surfaces. See also polarization based methods.

shape from shading: The problem of estimating shape, here in the sense of a field of normals from which a surface can be recovered up to a scale factor, from the shading pattern (light and shadows) of an image. The key idea is that, assuming a reflectance map for the scene (typically Lambertian), an image irradiance equation can be written linking the surface normals to the illumination direction and the image intensity. The constraint
can be used to recover the normals assuming local surface smoothness.

shape from shadows: A technique for recovering geometry from a number of images of an outdoor scene acquired at different times, i.e., with the sun at different angles. Geometric information can be recovered under various assumptions and knowledge of the sun's position. Also called "shape from darkness". See also shape from shading and photometric stereo.

shape from silhouettes: See shape from contours.

shape from specularity: A class of algorithms for estimating local shape from surface specularities. A specularity constrains the surface normal as the incident and reflection angles must coincide. The detection of specularities in images is, in itself, a non-trivial problem.

shape from structured light: See structured light triangulation.

shape from texture: The problem of estimating shape, here in the sense of a field of normals from which a surface can be recovered up to a scale factor, from the image texture. The deformation of a planar texture recorded in an image (the texture gradient) depends on the shape of the surface to which the texture is applied. Techniques exist for shape estimation from statistical texture and regular texture patterns.

shape from X: A generic term for a method that generates 3D shape or position estimates from one of a variety of possible techniques, such as stereo, shading, focus, etc.

shape from zoom: The problem of computing shape (in the sense of the distance of each scene point from the sensor) from two or more images acquired at different zoom settings, achieved through a zoom lens. The basic idea is to differentiate the projection equations with respect to the focal length, f, achieving an expression linking the variations of f and pixel displacement with depth.

shape grammar: A grammar specifying a class of shapes, whose rules specify patterns for combining more primitive shapes. Rules are composed of two parts, 1) describing a specific shape and 2) how to replace or transform it. Used also in design, CAD, and architecture. See also production system, expert system.

shape index: A measure, usually indicated by S, of the type of shape of a surface patch in terms of its principal curvatures. Formally

S = −(2/π) arctan((κ_M + κ_m)/(κ_M − κ_m))

where κ_m and κ_M are the principal curvatures. S is
undetermined for planar patches. A related parameter, R, called curvedness, measures the amount of curvedness of the patch:

R = √((κ_M² + κ_m²)/2)

All curvature-based shape classes map to the unit circle in the R–S plane, with planar patches at the origin. See also mean and Gaussian curvature shape classification, shape representation.
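The two quantities are straightforward to compute from the principal curvatures; this sketch is ours (NumPy assumed, and arctan2 is used as a convenience so the planar case does not divide by zero):

```python
import numpy as np

def shape_index_and_curvedness(k_min, k_max):
    """Shape index S and curvedness R from principal curvatures;
    S is meaningless where both curvatures are zero (planar)."""
    k_min = np.asarray(k_min, dtype=float)
    k_max = np.asarray(k_max, dtype=float)
    S = -(2.0 / np.pi) * np.arctan2(k_max + k_min, k_max - k_min)
    R = np.sqrt((k_max**2 + k_min**2) / 2.0)
    return S, R
```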
operator can either add a
shape magnitude class: Part weighted amount of a gradient
of a local surface curvature or high-pass filter of the
representation scheme in image or subtract a weighted
which each point has a
amount of a smoothing or
curvature class, and a mag-
low pass filter of the image.
nitude of curvature (shape
magnitude). This representa- The image on the right is an
tion is an alternative to the unsharp masked version of the
more common shape classifi- one on the left:
cation based on either the two
principal curvatures or the
mean and Gaussian curvature.
shape representation: A large
class of techniques seeking to
capture the salient properties
of shapes, both 2D and 3D, for
analysis and comparison pur-
poses. Many representations
have been proposed in the lit-
erature, including skeletons for
2D and 3D shapes (see medial shear transformation: An affine
axis skeletonization and dis- image transformation changing
tance transform ), curvature- one coordinate only. The
based representations (for corresponding transformation
instance, the curvature primal matrix, S, is equal to the
sketch, the curvature scale identity apart from s12 =
space, the extended Gaussian sx , which changes the first
image ), generalized cones for image coordinate. Shear on
articulated objects, invariants, the second image coordinate is
and flexible objects models (for obtained similarly by s21 = sy .
273
An example of the result of a shear transformation is:

[Figure: a shape before and after a shear transformation.]
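A numerical sketch (ours, NumPy assumed): shearing the corners of a unit square along the first coordinate.

```python
import numpy as np

sx = 0.5
S = np.array([[1.0, sx],      # identity except s12 = sx
              [0.0, 1.0]])

square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(square @ S.T)           # each corner mapped by x' = S x
```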
shock tree: A 2D shape representation technique based on the singularities (see singularity event) of the radius function along the medial axis (MA). The MA is represented by a tree with the same structure, and is divided into continuous segments of uniform behavior (local maximum, local minimum, constant, monotonic). See also medial axis skeletonization, distance transform.

short baseline stereo: See narrow baseline stereo.

shot noise: See impulse noise and salt-and-pepper noise.

shutter: A device allowing light into a camera for enough time to form an image on a photosensitive film or chip. Shutters can be mechanical, as in traditional photographic cameras, or electronic, as in a digital camera. In the former case, a window-like mechanism is opened to allow the light to be recorded by a photosensitive film. In the latter case, a CCD or other type of sensor is triggered electronically to record the amount of incident light at each pixel.

shutter control: The device controlling the length of time that the shutter is open.

side looking radar: A radar projecting a fan-shaped beam illuminating a strip of the scene at the side of the instrument, typically used for mapping a large area. The map is produced as the instrument is carried along by a vehicle sweeping the surface to the side. See also sonar.

signal coding system: A system for encoding a signal into another, typically for compression or security purposes. See image compression, digital watermarking.

signal processing: The collection of mathematical and computational tools for the analysis of typically 1D (but also 2D, 3D, etc.) signals such as audio recordings or other intensity versus time or position measurements. Digital signal processing is the subset of signal processing which pertains to signals that are represented as streams of binary digits.

signal-to-noise ratio (SNR): A measure of the relative strength of the interesting and uninteresting (noise) part of a signal. In signal processing, SNR is usually expressed in decibels as the ratio of the power of signal and noise, i.e., 10 log₁₀(P_s/P_n). With statistical noise, the SNR can be defined as 10 times the log of the ratio of the standard deviations of signal and noise.
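A small sketch of the decibel formula (ours, with power estimated as the mean squared value; NumPy assumed):

```python
import numpy as np

def snr_db(signal, noise):
    """SNR in decibels: 10 log10 of the power ratio Ps / Pn."""
    p_s = np.mean(np.asarray(signal, dtype=float) ** 2)
    p_n = np.mean(np.asarray(noise, dtype=float) ** 2)
    return 10.0 * np.log10(p_s / p_n)
```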
signature identification: A class of techniques for verifying a written signature. Also known as Dynamic Signature Verification. An area of biometrics. See also handwriting verification, handwritten character recognition, fingerprint identification, face identification.

signature verification: The problem of authenticating a signature automatically with image processing techniques; in practice, deciding whether a signature matches a specimen sufficiently well. See also handwriting verification and handwritten character recognition.

silhouette: See object contour.

SIMD: See single instruction multiple data.

similarity: The property that makes two entities (images, models, objects, features, shapes, intensity values, etc.) or sets thereof similar, that is, resembling each other. A similarity transformation creates perfectly similar structures and a similarity metric quantifies the degree of similarity of two possibly non-identical structures. Examples of similar structures are 1) two polygons identical except for a change in size, and 2) two image neighborhoods whose intensity values are identical except for scaling by a multiplicative factor. The concept of similarity lies at the heart of several classic vision problems, including stereo correspondence, image matching, and geometric model matching.

similarity metric: A metric quantifying the similarity of two entities. For instance, cross correlation is a common similarity metric for image regions. For similarity metrics on specific objects encountered in vision, see feature similarity, graph similarity, gray scale similarity. See also point similarity measure, matching.

similarity transformation: A transformation changing an object into a similar-looking one; formally, a conformal mapping preserving the ratio of distances (the magnification ratio). The transformation matrix, T, can be written as T = B⁻¹AB, where A and B are similar matrices, that is, representing the same transformation after a change of basis. Examples include rotation, translation, expansion and contraction (scaling).

simple lens: A lens composed of a single piece of refracting material, shaped in such a way as to achieve the desired lens behavior. For example, a convex focusing lens.

simulated annealing: A coarse-to-fine, iterative optimization algorithm. At each iteration, a smoothed version of the energy landscape is searched and a global minimum located by a statistical (e.g., random) process. The search is then performed at a finer level of smoothing, and so on. The idea
is to locate the basin of the absolute minimum at coarse scales, so that fine-resolution search starts from an approximate solution close enough to the absolute minimum to avoid falling into surrounding local minima. The name derives from the homonymous procedure for tempering metal, in which temperature is lowered in stages, each time allowing the material to reach thermal equilibrium. See also coarse-to-fine processing.
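One common concrete form is the Metropolis-style loop sketched below (ours; here the cooling temperature schedule plays the role of the progressively finer search levels described above):

```python
import numpy as np

def anneal(energy, x0, step=0.5, t=1.0, cooling=0.99, iters=5000, seed=0):
    """Minimal simulated annealing for a 1D energy function."""
    rng = np.random.default_rng(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(iters):
        x_new = x + rng.normal(scale=step)          # random perturbation
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-dE / t), so early (hot) iterations roam widely.
        if e_new < e or rng.random() < np.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling                                # lower the temperature
    return best_x

print(anneal(lambda x: x**2 + 2.0 * np.sin(5.0 * x), x0=4.0))
```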
single instruction multiple data (SIMD): A computer architecture allowing the same instruction to be simultaneously executed on multiple processors and thus different portions of the data set (e.g., different pixels or image neighborhoods). Useful for a variety of low-level image processing operations. See also MIMD, pipeline parallelism, data parallelism, parallel processing.

single photon emission computed tomography (SPECT): A medical imaging technique that involves the rotation of a photon detector array around the body in order to detect photons emitted by the decay of previously injected radionuclides. This technique is particularly useful for creating a volumetric image showing metabolic activity. Resolution is lower than PET but imaging is cheaper and some SPECT radiopharmaceuticals may be used where PET nuclides cannot.

singular value decomposition (SVD): A factorization of any m × n matrix A into A = UDVᵀ. The columns of the m × m matrix U are mutually orthogonal unit vectors, as are the columns of the n × n matrix V. The m × n matrix D is diagonal, and its nonzero elements, the singular values σᵢ, satisfy σ₁ ≥ σ₂ ≥ … ≥ σₙ ≥ 0. The SVD has extremely useful properties. For example:

• A is nonsingular if and only if all its singular values are nonzero, and the number of nonzero singular values gives the rank of A;
• the columns of U corresponding to the nonzero singular values span the range of A; the columns of V corresponding to the zero singular values span the null space of A;
• the squares of the nonzero singular values are the nonzero eigenvalues of both AAᵀ and AᵀA, and the columns of U are eigenvectors of AAᵀ, those of V of AᵀA.

Moreover, the pseudoinverse of a matrix, occurring in the solution of rectangular linear systems, can be easily computed from the SVD definition.
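The properties above are easy to check numerically; this sketch (ours) uses NumPy's SVD routine:

```python
import numpy as np

A = np.random.rand(5, 3) @ np.random.rand(3, 4)   # a 5 x 4 matrix of rank 3
U, d, Vt = np.linalg.svd(A)                       # A = U diag(d) Vt

tol = d.max() * max(A.shape) * np.finfo(float).eps
rank = int(np.sum(d > tol))        # number of nonzero singular values
range_basis = U[:, :rank]          # spans the range of A
null_basis = Vt[rank:].T           # spans the null space of A

assert rank == 3
assert np.allclose(A @ null_basis, 0.0)
```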
singularity event: A point in the domain of the map of a geometric curve or surface where the first derivatives vanish.

sinusoidal projection: A family of linear image transforms, C, the rows of which are the eigenvectors of a special
symmetric tridiagonal matrix. This includes the discrete cosine transform (DCT).

skeleton: A curve, or tree-like set of curves, capturing the basic structure of an object. This figure shows an example of a linear skeleton for a puppet-like 2D shape:

[Figure: a puppet-like 2D shape with its linear skeleton drawn inside.]

The curves forming the skeleton are typically central to the shape. Several algorithms exist for computing skeletons, for instance, the medial axis transform (see medial axis skeletonization) and the distance transform, for which the grassfire algorithm can be applied.

skeleton by influence zones (SKIZ): Commonly known as the Voronoi diagram.

skeletonization: A class of techniques that try to reduce a 2D (or 3D) binary image to a "skeleton" form in which every remaining pixel is a skeleton pixel, but the essential shape of the input image is captured. Definitions of the skeleton include the set of centers of circles bitangent to the object boundary, and smoothed local symmetries.

skew: An error introduced in the imaging geometry by a non-orthogonal pixel grid, in which rows and columns of pixels do not form an angle of exactly 90°. This is usually considered only in high-accuracy photogrammetry applications.

skew correction: A transformation compensating for the skew error.

skew symmetry: A skew symmetric contour is a planar contour such that every straight line oriented at an angle θ with respect to a particular axis, called the skew symmetry axis of the contour, intersects the contour at two points equidistant from the axis. An example:

[Figure: a contour with its skew symmetry axis; a line at angle θ to the axis meets the contour at two points, each at distance d from the axis.]

skin color analysis: A set of techniques for color analysis applied to images containing skin, for instance for retrieving images from a database (see color based image retrieval). See also color, color image, color image segmentation, color matching, and colorimetry.
SKIZ: See skeleton by influence zones.

slant: The angle between a surface normal in the scene and the viewing direction:

[Figure: the slant angle between the surface normal and the direction of view.]

See also tilt, shape from texture.

slant normalization: A class of algorithms used in handwritten character recognition, transforming slanted cursive characters into vertical ones. See handwritten character recognition, optical character recognition.

slice based reconstruction: The reconstruction of a 3D object from a number of planar slices, or sections, taken across the object. The slice plane is typically advanced at regular spatial intervals to sweep the working volume. See also tomography, computerized tomography, single photon emission computed tomography, and nuclear magnetic resonance.

slope density function: This is the histogram of the tangential orientations (slopes) of a curve or region boundary. It can be used to represent the curve shape in a manner invariant to translation and rotation (up to a shift of the density function).

small motion model: A class of mathematical models representing very small (ideally, infinitesimal) camera-scene motion between frames. Used typically in shape from motion. See also optical flow.

smart camera: A hardware device incorporating a camera and an on-board computer in a single, small container, thus achieving a programmable vision system within the size of a normal video camera.

smooth motion curve: The curve defined by a motion that can be expressed by smooth (that is, differentiable: derivatives of all orders exist) parametric functions of the image coordinates. Notice that "smooth" is often used in an intuitive sense, not in the strict mathematical sense above (clearly, an exacting constraint), as, for example, in image smoothing. See also motion, motion analysis.

smoothed local symmetries: A class of skeletonization algorithms, associated with Asada and Brady. Given a 2D curve that bounds a closed region in the plane, the skeleton as computed by smoothed local symmetries is the locus of chord midpoints of bitangent circles. Compare the symmetric axis transform. Two skeleton points as defined by smoothed local symmetries are shown:
[Figure: two bitangent circles inside a closed contour, with the chord midpoints marking skeleton points.]

smoothing: Generally, any modification of a signal intended to remove the effects of noise. Often used to mean the attenuation of high spatial frequency components of a signal. As many models of noise have a flat power spectral density (PSD), while natural images have a PSD that decays toward zero at high spatial frequencies, suppressing the high frequencies increases the overall signal-to-noise ratio of the image. See also discontinuity preserving smoothing, anisotropic diffusion and adaptive smoothing.

smoothing filter: Smoothing is often achieved by convolution of the image with a smoothing filter to reduce noise or high spatial frequency detail. Such filters include discrete approximations to symmetric probability densities such as the Gaussian, binomial and uniform distributions. For example, in 1D, the discrete signal x₁ … xₙ is convolved with the kernel (1/6, 4/6, 1/6) to produce the smoothed signal y₁ … yₙ₊₂ in which yᵢ = (1/6)xᵢ₋₁ + (4/6)xᵢ + (1/6)xᵢ₊₁.
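The 1D example above in code (ours; note that NumPy's full convolution returns the n + 2 samples mentioned in the definition):

```python
import numpy as np

kernel = np.array([1.0, 4.0, 1.0]) / 6.0
x = np.array([0.0, 0.0, 1.0, 0.0, 0.0])    # an impulse
y = np.convolve(x, kernel)                 # length len(x) + 2
print(y)                                   # the impulse spread over 3 samples
```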
smoothness constraint: An additional constraint used in data interpretation problems. The general principle is that results derived from nearby data must themselves have similar values. Traditional examples of where the smoothness constraint can be applied are in shape from shading and optical flow. The underlying observation that supports this computational constraint is that the observed real world surfaces and motions are smooth almost everywhere.

snake: A snake is the combination of a deformable model and an algorithm for fitting that model to image data. In one common embodiment, the model is a parameterized 2D curve, for example a b-spline parameterized by its control points. Image data, which might be a gradient image or 2D points, induces forces on points on the snake that are translated to forces on the control points or parameters. An iterative algorithm adjusts the control points according to these forces and recomputes the forces. Stopping criteria, step lengths, and other issues of optimization are all issues that must be dealt with in an effective snake.

SNR: See signal-to-noise ratio.

Sobel edge detector: An edge detector based on the Sobel kernels. The edge magnitude image E is the square root of the sum of squares of the convolution of the image
with horizontal and vertical Sobel kernels, given by E = √((K_x ∗ I)² + (K_y ∗ I)²). The Sobel operator applied to the left image gives the right image:

[Figure: an intensity image and its Sobel edge magnitude image.]

Sobel gradient operator: See Sobel kernel.

Sobel kernel: A gradient estimation kernel used for edge detection. The horizontal kernel is the convolution of a smoothing filter, s = (1 2 1), in the horizontal direction and a gradient operator, d = (−1 0 1)ᵀ, in the vertical direction. The kernel

K_y = s ∗ d =
  [ −1 −2 −1 ]
  [  0  0  0 ]
  [  1  2  1 ]

highlights horizontal edges. The vertical kernel K_x is the transpose of K_y.
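A sketch of the construction and its use in the Sobel edge detector (ours, with SciPy assumed for the 2D convolution):

```python
import numpy as np
from scipy.ndimage import convolve

s = np.array([1.0, 2.0, 1.0])      # smoothing component
d = np.array([-1.0, 0.0, 1.0])     # differencing component

Ky = np.outer(d, s)                # [[-1,-2,-1],[0,0,0],[1,2,1]]
Kx = Ky.T                          # its transpose

def sobel_magnitude(image):
    """Edge magnitude E = sqrt((Kx * I)^2 + (Ky * I)^2)."""
    gx = convolve(image.astype(float), Kx)
    gy = convolve(image.astype(float), Ky)
    return np.hypot(gx, gy)
```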
soft mathematical morphology: An extension of gray scale morphology in which the min/max operations are replaced by other rank operations, e.g., replace each pixel in an image by the 90th percentile value in a 5 × 5 window centered at the pixel. Weighted ranks may be computed. See also fuzzy morphology.

soft morphology: See soft mathematical morphology.

soft vertex: A point on a polyline whose connecting line segments are almost collinear. Soft vertices may arise from segmentation of a smooth curve into line segments. They are called 'soft' because they may be removed if the segments of the polyline are replaced by curve segments.

solid angle: Solid angle is a property of a 3D object: the amount of the unit sphere's surface that the object's projection onto the unit sphere occupies. The unit sphere's surface area is 4π, so the maximum value of a solid angle is 4π steradians:

[Figure: an object and the solid angle its projection subtends on the unit sphere.]

source: An emitter of energy that illuminates the vision system's sensors.

source geometry: See light source geometry.

source image: The image on which an image processing or an image analysis operation is based.
[Figure: a source image and the target image produced from it.]

source placement: See light source placement.

space carving: A method for creating a 3D volumetric model from 2D images. Starting from a voxel representation in which a 3D cube is marked "occupied", voxels are removed if they fail to be photo-consistent in the set of 2D images in which they appear. The order in which the voxels are processed is a key aspect of space carving, as it allows otherwise intractable visibility computations to be avoided.

space curve: A curve that may follow a path in 3D space (i.e., it is not restricted to lying in a plane).

space variant sensor: A sensor in which the pixels are not uniformly sampling the projected image data. For example, a log-polar sensor has rings of pixels of exponentially increasing size as one moves radially from the central point.

spatial angle: The area on a unit sphere that is bounded by a cone with its apex in the center of the sphere. Measured in steradians. This is frequently used when analyzing luminance.

[Figure: a cone from the center of a unit sphere bounding a spatial angle on its surface.]

spatial averaging: The pixels in the output image are weighted averages of their neighboring pixels in the input image. Mean and Gaussian smoothing are examples of spatial averaging.

spatial domain smoothing: An implementation of smoothing in which each pixel is replaced by a value that is directly computed from other pixels in the image. In contrast, frequency domain smoothing first processes all pixels to create a linear transformation of the image, such as a Fourier transform, and expresses the smoothing operation in terms of the transformed image.
spatial frequency: The rate of repetition of intensities across an image. In a 2D image the space to which spatial refers is the image's X–Y plane.

[Figure: an image of vertical stripes.]

This image has significant repetition at a spatial frequency of 1/10 pixel⁻¹ in the horizontal direction. The 2D Fourier transform represents spatial frequency contributions in all directions, at all frequencies. A discrete approximation is efficiently computed using the fast Fourier transform (FFT).

spatial hashing: See spatial indexing.

spatial indexing: 1) Conversion of a shape to a number, so that it may be quickly compared to other shapes. Intimately linked with the computation of invariants to spatial transformations and imaging distortions of the shape. For example, a shape represented as a collection of 2D boundary points might be indexed by its compactness. 2) The design of efficient data structures for search and storage of geometric quantities. For example, closest-point queries are made more efficient by the computation of spatial indices such as the Voronoi diagram, distance transform, k-D trees, or Binary Space Partitioning (BSP) trees.

spatial matched filter: See matched filter.

spatial occupancy: A form of object or scene representation in which a 3D space is divided into a grid of voxels. Voxels containing a part of the object are marked as being occupied and other voxels are marked as free space. This representation is particularly useful for tasks where properties of the object are less important than simply the presence and position of the object, as in robot navigation.

spatial proximity: The distance between two structures in real space (as contrasted with proximity in a feature or property space).

spatial quantization: The conversion of a signal defined on an infinite domain to a finite set of limited-precision samples. For example the function f(x, y): ℝ² → ℝ might be quantized to the image g, of width w and height h, defined as g(i, j): [1, w] × [1, h] → ℝ. The value of a particular sample g(i, j) is determined by the point-spread function p(x, y), and is given by g(i, j) = ∫∫ p(x − i, y − j) f(x, y) dx dy.

spatial reasoning: Inference from geometric rather than
symbolic or linguistic information. See also geometric reasoning.

spatial relation: An association of two or more spatial entities, expressing the way in which such entities are connected or related. Examples include perpendicularity or parallelism of lines or planes, and inclusion of one image region in another.

spatial resolution: The smallest separation between distinct signal features that can be measured by a sensor. For a CCD camera, this is dictated by the distance between adjacent pixel centers. It is often specified as an angle: the angle between the 3D rays corresponding to adjacent pixels. The inverse of the highest spatial frequency that a sensor can represent without aliasing.

spatio-temporal analysis: The analysis of moving images by processing that operates on the 3D volume formed by the stack of 2D images in a sequence. Examples include kinetic occlusion, the epipolar plane image (EPI) and spatio-temporal autoregressive models (STAR).

special case motion: A subproblem of the general structure from motion problem, where the camera motion is known to be constrained a priori. Examples include planar motion, turntable motion or single-axis rotation, and pure translation. In each case, the constrained motion simplifies the general problem, yielding one or more of: closed-form solutions, greater efficiency, increased accuracy. Similar benefits can be obtained from approximations such as the affine camera and weak perspective.

speckle: A pattern of light and dark spots superimposed on the image of a scene that is illuminated by coherent light such as from a laser. Rough surfaces in the scene change the path lengths and thus the interference effects of different rays, so a fixed scene, laser and imager configuration results in a fixed speckle pattern on the imaging surface.

[Figure: a laser source illuminating a rough surface; beam interference gives light/dark spots on the imaging surface (e.g., a CCD array).]

speckle reduction: Restoration of images corrupted with speckle noise, such as laser or ultrasound images.

SPECT: See single-photon emission computed tomography.

spectral analysis: 1) Analysis performed in either the spatial, temporal or electromagnetic frequency domain. 2) Generally, any analysis that involves the examination of eigenvalues. This is a nebulous concept, and consequently the number of "spectral techniques" is large. Often equivalent to PCA.
spectral decomposition method: See spectral analysis.

spectral density function: See power spectrum.

spectral distribution: The spatial power spectrum or electromagnetic spectrum distribution.

spectral factorization: A method for designing linear filters based on difference equations that have a given spectral density function when applied to white noise.

spectral filtering: Modifying the light before it enters the sensor by using a filter tuned to different spectral frequencies. A common use is with laser sensing, in which the filter is chosen to pass only light at the laser's frequency. Another usage is to eliminate ambient infrared light in order to increase the sharpness of an image (as most silicon-based sensors are also sensitive to infrared light).

spectral frequency: Electromagnetic or spatial frequency.

spectral reflectance: See reflectance.

spectral response: The response R of an imaging sensor illuminated by monochromatic light of wavelength λ is the product of the input light intensity I and the spectral response at that wavelength s(λ), so R = I s(λ).

spectrum: A range of values such as the electromagnetic spectrum.

specular reflection: Mirror-like reflection or highlight. Formed when a light source at 3D location L, surface point P, surface normal N at that point and camera center C are all coplanar, and the angles LPN and NPC are equal.

[Figure: light source L, surface point P, surface normal N and camera center C in the specular reflection geometry.]

specularity: See specular reflection.

sphere: 1. A surface in any dimension defined by the x such that ‖x − c‖ = r for a center c and radius r. 2. The volume of space bounded by the above, i.e., x such that ‖x − c‖ ≤ r.

spherical: Having the shape of, characteristics of, or associations with, a sphere.

spherical harmonic: A function defined on the unit sphere of the form

Y_l^m(θ, φ) = λ_lm P_l^m(cos θ) e^{imφ}

is a spherical harmonic, where λ_lm is a normalizing factor, and P_l^m is a Legendre polynomial. Any real function defined on the sphere f(θ, φ) has an expansion in terms of the spherical harmonics of the form

f(θ, φ) = Σ_{l=0}^∞ Σ_{m=−l}^{l} α_l^m Y_l^m(θ, φ)
that is analogous to the Fourier expansion of a function defined on the plane, with the α_l^m analogous to the Fourier coefficients.

[Figure: polar plots of the first ten spherical harmonics, for m = 0 … 2, l = 0 … m; the plots show r = 1 + Y_l^m(θ, φ) in polar coordinates.]
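For a numerical check (our sketch, assuming SciPy's sph_harm, which takes the azimuthal angle before the polar angle, the reverse of the (θ, φ) order used above):

```python
import numpy as np
from scipy.special import sph_harm

theta, phi = 0.7, 1.3              # polar, azimuthal
y10 = sph_harm(0, 1, phi, theta)   # Y_1^0(theta, phi)
# Y_1^0 = sqrt(3 / 4 pi) cos(theta) under the usual normalization:
assert np.isclose(y10.real, np.sqrt(3.0 / (4.0 * np.pi)) * np.cos(theta))
```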
spherical mirror: Sometimes used in catadioptric cameras. A mirror whose shape is a portion of a sphere.

spin image: A local surface representation of Johnson and Hebert. At selected points p with surface normal n, all other surface points x can be represented in a 2D basis as (α, β) = (√(‖x − p‖² − (n · (x − p))²), n · (x − p)). The spin image is the histogram of all of the (α, β) values for the surface. Each selected point p leads to a different spin image. Matching points compares their spin images by correlation. Key advantages of the representation are 1) it is independent of pose and 2) it avoids ambiguities of representation that can occur with nearly flat surfaces.
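A sketch of the computation (ours; points are stored as rows, and the bin counts and ranges are arbitrary choices):

```python
import numpy as np

def spin_image(p, n, points, bins=16, radius=1.0):
    """2D histogram of the (alpha, beta) coordinates of `points`
    about the oriented point (p, n) defined above."""
    n = n / np.linalg.norm(n)
    v = points - p
    beta = v @ n                                        # height along n
    alpha = np.sqrt(np.maximum((v * v).sum(axis=1) - beta**2, 0.0))
    hist, _, _ = np.histogram2d(alpha, beta, bins=bins,
                                range=[[0.0, radius], [-radius, radius]])
    return hist
```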
splash: An invariant representation of the region about a 3D point. It gives a local shape representation useful for position invariant object recognition.

spline: 1) A curve c(t) defined as a weighted sum of control points: c(t) = Σᵢ₌₀ⁿ wᵢ(t) pᵢ, where the control points are p₀ … pₙ and one weighting (or "blending") function wᵢ is defined for each control point. The curve may interpolate the control points or approximate them. The construction of the spline offers guarantees of continuity and smoothness. With uniform splines the weighting functions for each point are translated copies of each other, so wᵢ(t) = w₀(t − i). The form of w₀ determines the type of spline: for B-splines and Bezier curves, w₀(t) is a polynomial (typically cubic) in t. Nonuniform splines reparameterize the t axis, c(t) = c(u(t)), where u(t) maps the integers k = 0 … n to knot points t₀ … tₙ, with linear interpolation for non-integer values of t. Rational splines with n-D control points are perspective projections of normal splines with (n + 1)-D control points. 2) Tensor-product splines define a 3D surface x(u, v) as a product of splines in u and v.
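A sketch of case 1) for the uniform cubic B-spline (ours): the four nonzero blending weights of one cubic segment, evaluated directly.

```python
import numpy as np

def cubic_bspline_point(control, t):
    """Point on a uniform cubic B-spline: a weighted sum of the four
    control points active at parameter t."""
    i = int(np.floor(t))                        # segment index
    u = t - i                                   # local parameter in [0, 1)
    w = np.array([(1 - u)**3,
                  3*u**3 - 6*u**2 + 4,
                  -3*u**3 + 3*u**2 + 3*u + 1,
                  u**3]) / 6.0                  # the blending functions
    return w @ np.asarray(control, dtype=float)[i:i + 4]

ctrl = [(0, 0), (1, 2), (3, 2), (4, 0)]         # one cubic segment
print(cubic_bspline_point(ctrl, 0.5))
```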
spline smoothing: Smoothing of a discretely sampled signal x(t) by replacing the value at tᵢ
by the value predicted at that point by a spline x̂(t) fitted to neighboring values.

split and merge: A two-stage procedure for segmentation or clustering. The data is divided into subsets, with the initial division being a single set containing all the data. In the split stage, subsets are repeatedly subdivided depending on the extent to which they fail to satisfy a coherence criterion (for example, similarity of pixel colors). In the merge stage, pairs of adjacent sets are found that, when merged, will again satisfy a coherence criterion. Even if the coherence criteria are the same for both stages, the merge stage may still find subsets to merge.

SPOT: Système Probatoire de l'Observation de la Terre. A series of satellites launched by France that are a common source of satellite images of the earth. SPOT-5 for example was launched in May 2002 and provides complete coverage of the earth every 26 days.

spot detection: An image processing operation for locating small bright or dark locations against contrasting backgrounds. The issues here are what size of spot and amount of contrast.

spur: A short segment attached to a more significant line or edge. Spurs often arise when linear structures are tracked through noisy data, such as by an edge detector. This figure shows some spurs:

[Figure: a thick line with several short spurs branching off it.]

squared error clustering: A class of clustering algorithms that attempt to find cluster centers c₁ … cₙ that minimize the squared error Σ_{x ∈ D} min_{i ∈ 1…n} ‖x − cᵢ‖², where D is the set of points to be clustered.
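In code (ours, NumPy assumed): the criterion, and one classic way of locally minimizing it, Lloyd-style k-means (empty clusters are ignored for brevity):

```python
import numpy as np

def squared_error(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def kmeans(points, k, iters=20, seed=0):
    """Alternate assignment and re-centering to reduce the squared error."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        centers = np.array([points[labels == i].mean(axis=0)
                            for i in range(k)])
    return centers
```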
stadimetry: The computation of distance to an object of known size based on its apparent size in the camera's field of view.

stationary camera: A camera whose optical center does not move. The camera may pan, tilt and rotate about its optical center, but not translate. Images taken by a stationary camera are always related by a planar homography. Also known as a rotating camera or non-translating camera. The term may also refer to a camera that does not move at all.

statistical classifier: A function mapping from a space of input data to a set of labels. Input data are points x ∈ ℝⁿ and labels are scalars. The classifier c(x) = l assigns the label l to point x. The classifier is typically a parameterized function,
such as a neural network (with weights as parameters) or a support vector machine. The classifier parameters could be set by optimizing performance on a training set of known (x, l) pairs or by a self-organizing learning algorithm.

statistical pattern recognition: Pattern recognition that depends on classification rules learned from examples rather than constructed by designers. Compare structural pattern recognition.

statistical shape model: A parameterized shape model where the parameters are assumed to be random variables drawn from a known probability distribution. The distribution is learned from training examples. Examples include point distribution models.

statistical texture: A texture whose description is in terms of the statistics of image neighborhoods. General examples are co-occurrence statistics of pairs of neighboring pixels, Fourier texture descriptors, autocorrelation and autoregressive models. A specific example is the statistics of the distribution of entries in 5 × 5 neighborhoods. These statistics may be learned from a set of training images or automatically discovered via clustering.

steerable filter: A filter applied to a 2D image, whose response is dependent on a scalar "orientation" parameter θ, but for which the response at any arbitrary value of θ may be computed as a function of a small number of basis responses, thus saving computation. For example, the directional derivative at orientation θ may be computed in terms of the x and y derivatives I_x and I_y as

dI/dn = cos(θ) I_x + sin(θ) I_y

For non-steerable filters such as Gabor filters, the response must be computed at each orientation, leading to higher computational complexity.
where the parameters are computational complexity.
assumed to be random vari- steganography: Concealing of
ables drawn from a known information in non-suspect
probability distribution. The “carrier” data. For example,
distribution is learned from encoding information in the
training examples. Examples low-order bits of a digital
include point distribution image.
models. step edge: 1) A discontinuity
statistical texture: A texture in image intensity (compare
whose description is in terms with fold edge). 2) An ideal-
of the statistics of image neigh- ized model of a step-change in
borhoods. General examples intensity. This plot of intensity
are co-occurrence statistics of I versus X position shows an
pairs of neighboring pixels, intensity step edge discontinu-
Fourier texture descriptors, ity in intensity I at a:
autocorrelation and autore-
gressive models. A specific I
example is the statistics of
the distribution of entries in
5 × 5 neighborhoods. These
statistics may be learned from
a set of training images or
automatically discovered via
clustering. X
steerable filter: A filter applied a
to a 2D image, whose response
is dependent on an scalar steradian: The unit of solid
“orientation” parameter , but angle.

287
stereo: General term for a class of problems in which multiple images of the same scene are used to recover a 3D property such as surface shape, orientation or curvature. In binocular stereo, two images are taken from different viewpoints allowing the computation of 3D structure. In trifocal, trinocular and multiple-view stereo, three or more images are available. In photometric stereo, the viewpoint is the same, but lighting conditions are varied in order to compute surface orientation.

stereo camera calibration: The computation of intrinsic and extrinsic camera parameters for a pair of cameras. Important extrinsic variables are relative orientation: the rotation and translation relating the two cameras. Achieved in several ways: 1) conventional calibration of each camera independently; 2) computation of the essential matrix or fundamental matrix relating the pair, from which relative orientation may be computed along with one or two intrinsic parameters; 3) for a rigid stereo rig, moving the rig and capturing multiple image pairs.

stereo correspondence problem: The key to recovering depth from stereo is to identify 2D image points that are projections of the same 3D scene point. Pairs of such image points are called correspondences. The correspondence problem is to determine which pairs of image points are correspondences. Unfortunately, matching features or image neighborhoods is usually ambiguous, leading to both massive amounts of computation and many alternative solutions. To reduce the space of matches, corresponding points are usually required to satisfy some constraints, such as having similar orientation and contrast, local smoothness, uniqueness of match. A powerful constraint is the epipolar constraint: from a single view, an image point is constrained to lie on a 3D ray, whose projection onto the second image is an epipolar curve. For pinhole cameras, the epipolar curve is a line. This greatly reduces the space of potential matches.

stereo convergence: The angle α between the optical axes of two sensors in a stereo configuration:

[Figure: two converging sensors whose optical axes meet at angle α.]

stereo fusion: The ability of the human vision system, when presented with a pair of stereo images, one to each eye independently, to form a consistent
3D interpretation of the scene, essentially solving the stereo correspondence problem. The fact that humans can perform fusion even on random dot stereograms means that high-level recognition is not required to solve all stereo correspondence problems.

stereo image rectification: For a pair of images taken by pinhole cameras, points in stereo correspondence lie on corresponding epipolar lines. Stereo image rectification resamples the 2D images to create two new images, with the same number of rows, so that points on corresponding epipolar lines lie on corresponding rows. This reduces computation for some stereo algorithms, although certain relative orientations (e.g., translation along the optical axis) make rectification difficult to achieve.

stereo matching: See stereo correspondence problem.

stereo triangulation: Determining the 3D position of a point given its 2D positions in each of two images taken by cameras in known positions. In the noise-free case, each 2D point defines a 3D ray by back projection, and the 3D point is at the intersection of the two rays. With noisy data, the optimal triangulation is computed by finding the 3D point that maximizes the probability that the two imaged points are noisy projections thereof. Also used for the analogous problem in multiple views.

stereo vision: The ability to determine three dimensional structure using two eyes. See also stereo.

stimulus: 1) Any object or event that a computer vision system may detect. 2) The perceived radiant energy itself.

stochastic gradient: An optimization algorithm for minimizing a convex cost function.

stochastic completion field: A strategy for algorithmic discovery of illusory contours.

stochastic process: A process whose next state depends probabilistically on its current state.

stratification: A class of solutions to self-calibration in which a projective reconstruction is first converted to an affine reconstruction (by computing the plane at infinity) and then to a Euclidean reconstruction.

streaming video: Video presented as a sequence of images or frames. An algorithm processing such video cannot easily select a particular frame.

stripe ranging: See structured light triangulation.

strobe duration: The time for which a strobe light is illuminated.

strobed light: A light that is illuminated for a very short period, generally at high intensity.
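A minimal sketch of the midpoint variant of stereo triangulation (a common approximation under noise; the entry's statistically optimal method maximizes the likelihood of the imaged points instead):

```python
import numpy as np

def midpoint_triangulation(c1, d1, c2, d2):
    """Each back-projected ray is x(s) = c + s*d, with camera center c
    and direction d. Noisy rays are skew, so return the midpoint of
    the shortest segment joining the two rays."""
    A = np.stack([d1, -d2], axis=1)                 # 3x2 system in (s, t)
    (s, t), *_ = np.linalg.lstsq(A, c2 - c1, rcond=None)
    return 0.5 * ((c1 + s * d1) + (c2 + t * d2))
```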
structural pattern recognition: Pattern recognition where classification is achieved using high-level rules or patterns, often specified by a human designer. See also syntactic pattern recognition.

structural texture: A texture that is formed by the regular repetition of a primitive structure, for example an image of bricks or windows.

structure and motion recovery: The simultaneous computation of 3D scene structure and 3D camera positions from a sequence of images of a scene. Common strategies depend on tracking of 2D image entities (e.g., interest points or edges) through multiple views and thus obtaining constraints on the 3D entities (e.g., points and lines) and camera motion. Constraints are embodied in entities such as the fundamental matrix and trifocal tensor that may be estimated from image data alone, and then allow computation of the 3D camera positions. Recovery is up to certain equivalence classes of scenes, where any member of the class may generate the observed data, such as projective or affine reconstructions.

structure factorization: See motion factorization.

structure from motion: Recovery of the 3D shape of a set of scene points from their motion. For a more modern treatment, see structure and motion recovery.

structure from optical flow: Recovery of camera motion by computing optical flow constrained by the infinitesimal motion fundamental matrix. The small motion approximation replaces the rotation matrix R by I − Ω×, where Ω is the axis of rotation, the unique vector such that RΩ = Ω.

structure matching: See recognition by components.

structured light: A class of techniques where carefully engineered illumination is employed to simplify computation of scene properties. Common examples include structured light triangulation and moiré fringe sensing.

structured light source calibration: The special case of calibration in a structured light system where the position of the light source is determined.

structured light triangulation: Recovery of 3D structure by computing the intersection of a ray (or plane or other shape) of light with the ray determined by the image of the illuminated scene surface:

[Figure: a left sensor and a right laser separated by baseline D, their rays at angles α and β meeting on the illuminated surface at depth z.]
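In the notation of the figure (baseline D between sensor and laser, ray angles α and β measured from the baseline), elementary triangle geometry gives the recovered depth as z = D / (cot α + cot β). This formula is a sketch added here for concreteness, assuming both rays lie in the plane of the figure; it is not quoted from the entry.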
structured model: See hierarchical model.

structuring element: The basic neighborhood structure of morphological image processing. The structuring element is an image, typically small, that defines a shape pattern. Morphological operations on a source image combine the structuring element with the source image in various ways.

subband coding: A means of coding a discrete signal for transmission. The signal is passed through a set of bandpass filters, and each channel is quantized separately. The sampling rate of the individual channels is set such that, before quantization, the sum of the number of per-channel samples is the same as the number of samples of the original system. By varying the quantization for different bands, the number of samples may be reduced with small losses in signal quality.

subcomponent: An object part used in a hierarchical model.

subcomponent decomposition: Representation of a complete object part by a collection of smaller objects in a hierarchical model.

subgraph isomorphism: Equivalence of a pair of subgraphs of two given graphs. Given graphs A and B, the subgraph isomorphism problem is to enumerate all pairs of subgraphs (a, b) where: a ⊂ A; b ⊂ B; a is isomorphic to b; and some given predicate p(a, b) is true. Appropriate modifications of the problem allow the solution of many graph problems including determining shortest paths and finding maximal cliques. A given graph has a number of subgraphs exponential in the number of vertices and the general problem is NP-complete. This example shows subgraph isomorphism with the matching graph being A:b-C:a-B:c:

[Figure: a four-vertex graph A-B-C-D matched against a three-vertex graph a-b-c.]

subjective contour: An edge perceived by humans in an image due to Gestalt completion, particularly when no image evidence is present.

[Figure: the Kanizsa triangle illusion.]

In this example, the triangle that appears to float above the black discs is bounded partially by a subjective contour.

subpixel edge detection: Estimation of the location of an image edge by subpixel interpolation of the gradient operator response, to give a position more accurate than an integer pixel position.
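For the restricted but common case of finding copies of a whole pattern graph inside a larger graph, the VF2 matcher in NetworkX can be used; a small sketch of the subgraph isomorphism entry above (the graphs are illustrative):

```python
import networkx as nx
from networkx.algorithms import isomorphism

A = nx.Graph([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
b = nx.Graph([(1, 2), (2, 3), (3, 1)])  # triangle pattern

# Enumerate node-induced subgraphs of A that are isomorphic to b.
matcher = isomorphism.GraphMatcher(A, b)
for mapping in matcher.subgraph_isomorphisms_iter():
    print(mapping)  # e.g. {'A': 1, 'B': 2, 'C': 3}
```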
subpixel interpolation: A class of techniques that essentially interpolate the position of local maxima in images to positions at a resolution smaller than integer pixel coordinates. Examples include subpixel edge detection and interest point detection. A rule of thumb is that 0.1 pixel accuracy is often possible. If the input is an image z(x, y) containing the response of some kernel to a source image, a typical approach might be as follows.

1. Identify a local maximum where z(x, y) ≥ z(a, b) for all (a, b) ∈ neighborhood(x, y).
2. Fit the quadratic surface z = ai² + bij + cj² + di + ej + f to the set of samples (i, j, z(x + i, y + j)) in a neighborhood about (x, y).
3. Compute the position of the local maximum of the quadratic surface:

[i; j] = −[[2a, b], [b, 2c]]⁻¹ [d; e]

4. If −1/2 < i, j < 1/2 then report a maximum at subpixel location (x + i, y + j).

Similar strategies apply when computing the subpixel location of edges.

subsampling: Reducing the size of an image by producing a new image whose pixel values sample the original image more widely (e.g., every third pixel). Interpolation can produce more accurate samples. To avoid aliasing, spatial frequencies higher than the Nyquist limit of the coarse grid should be removed by low pass filtering the image. Also known as downsampling.

subspace learning: A subspace method where the subspace is learned from a number of observations.

subspace method: A general term describing methods that convert a vector space into a lower dimensional subspace, e.g., projecting a set of N dimensional vectors onto their first two principal components to produce a 2D subspace.

subtractive color: The way in which color appears due to the attenuation/absorption of frequencies of light by materials (e.g., we perceive that something is red because it is attenuating/absorbing all wavelengths other than those corresponding to red). See also additive color.

superellipse: A class of 2D curves, including the ellipses and Lamé curves as special cases. The general form of the superellipse is

|x/a|^α + |y/b|^β = 1

although several alternative forms exist. Fitting superellipses to data is difficult due to the strongly nonlinear dependence of the shape on the parameters α and β.

[Figure: two superellipses.]

The above shows examples of two superellipses. The convex superellipse has α = β = 3, the concave example has α = β = 1/2.
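Steps 1–3 of the subpixel interpolation entry above translate directly into a short NumPy routine; the 3 × 3 neighborhood is an assumption:

```python
import numpy as np

def subpixel_peak(z, x, y):
    """Refine an integer local maximum of a response image z to
    subpixel accuracy with a quadratic-surface fit; offsets beyond
    +/- 0.5 pixel are rejected."""
    # Least-squares fit of z = a*i^2 + b*i*j + c*j^2 + d*i + e*j + f
    # over the 3x3 neighborhood (i, j in {-1, 0, 1}).
    I, J = np.meshgrid([-1, 0, 1], [-1, 0, 1], indexing="ij")
    i, j = I.ravel(), J.ravel()
    A = np.stack([i * i, i * j, j * j, i, j, np.ones_like(i)], axis=1)
    samples = z[x - 1:x + 2, y - 1:y + 2].ravel()
    a, b, c, d, e, f = np.linalg.lstsq(A, samples, rcond=None)[0]
    # Stationary point: [i, j] = -[[2a, b], [b, 2c]]^-1 [d, e]
    di, dj = -np.linalg.solve([[2 * a, b], [b, 2 * c]], [d, e])
    if abs(di) < 0.5 and abs(dj) < 0.5:
        return x + di, y + dj
    return None
```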
supergrid: An image representation that is larger than the original image and represents explicitly both the image points and the crack edges between them.

[Figure: a grid of pixels with the crack edges between them marked.]

superpixel: A superpixel is a pixel in a high resolution image. An anti-aliasing computer graphics technique produces lower resolution image data by a weighted sum of the superpixels.

superquadric: A 3D generalization of the superellipse, the solution set of

|x/a|^α + |y/b|^β + |z/c|^γ = 1

As with superellipses, fitting to 3D data is non-trivial, although some success has been achieved. Two examples of superquadrics, both with γ = 2:

[Figure: two superquadric surfaces.]

The convex superquadric has α = β = 3, the concave example has α = β = 1/2.

superresolution: Generation of a high-resolution image from a collection of low-resolution images of the same object taken from different viewpoints. The key to successful superresolution is in the accurate estimation of the registration between viewpoints.

supervised classification: See classification.

supervised learning: A method for training a neural network where the network is presented (in a training phase) with a series of patterns and their correct classification. See also unsupervised learning.

support vector machine: A statistical classifier assigning labels l to points x in ℝⁿ. The support vector machine has two defining characteristics. Firstly, the classifier places the decision surface that separates points xᵢ and xⱼ that have different labels lᵢ ≠ lⱼ in such a way as to maximize the margin between them. Roughly speaking, the decision surface is as far as possible from any x. Secondly, the classifier operates not on the raw feature vectors x, but on high dimensional projections f(x): ℝⁿ → ℝᴺ, N > n. However, because the classifier only ever requires dot products such as f(x) · f(y), we never form f explicitly, but specify instead the kernel function K(x, y) = f(x) · f(y). Wherever the dot product between high-dimensional vectors is required, the kernel function is used instead.
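A minimal usage sketch of the support vector machine with scikit-learn's SVC (the toy data and the RBF kernel choice are assumptions), showing that only the kernel K, never the mapping f, is specified:

```python
from sklearn import svm

X = [[0, 0], [1, 1], [1, 0], [0, 1]]   # feature vectors x in R^2
y = [0, 1, 1, 0]                       # class labels l

# The kernel plays the role of K(x, y) = f(x).f(y); f is never formed.
clf = svm.SVC(kernel="rbf", C=1.0)
clf.fit(X, y)
print(clf.predict([[0.9, 0.9]]))       # classify a new point
```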
support vector regression: A range of techniques for function estimation that attempts to determine a function to model data while ensuring that the function does not deviate from any data sample by more than a certain amount. See also support vector machine.

surface: A surface in general parlance is a 2D shape that is located in 3D. Mathematically, it is a 2D subset of ℝ³ that is almost everywhere locally topologically equivalent to the open unit ball in ℝ². This means that a cloud of points is not a surface, but the surface may have cusps or boundaries. A parameterization of the surface is a function from ℝ² to ℝ³ that defines the 3D surface point x(u, v) as a function of 2D parameters (u, v). Restricting (u, v) to subsets of ℝ² yields a subset of the surface. The surface is the set S of points on it, defined over a domain D:

S = { x(u, v) : (u, v) ∈ D ⊂ ℝ² }

surface area: Given a parametric surface S = { x(u, v) : (u, v) ∈ D ⊂ ℝ² }, with tangent vectors x_u(u, v) and x_v(u, v), the area of the surface is

∫∫_S |x_u(u, v) × x_v(u, v)| du dv

surface boundary representation: A method of defining surface models in computer graphics. It defines a 3D object as a collection of surfaces with boundaries. The model topology states which surfaces are connected, and which boundaries are shared between patches.

[Figure: three faces A, B, C bounded by edges 1–9 and vertices a–g.]

The B-rep model of these three faces comprises: 1) the faces A, B, C along with the parameters of their 3D surfaces; the edges 1–9 with 3D curve descriptions; and vertices a–g; 2) connectivities of these entities, for example face B is bounded by curves 1–4, curve 1 is bounded by vertices b and c.
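The surface-area integral above can be approximated numerically; a sketch (the grid resolution and the unit-sphere example are assumptions):

```python
import numpy as np

def surface_area(x, u, v):
    """Approximate the area integral for a parametric surface
    x(u, v) -> R^3 sampled on coordinate grids u and v."""
    U, V = np.meshgrid(u, v, indexing="ij")
    P = x(U, V)                              # stacked coordinates, (3, nu, nv)
    xu = np.gradient(P, u, axis=1)           # tangent vector x_u
    xv = np.gradient(P, v, axis=2)           # tangent vector x_v
    dA = np.linalg.norm(np.cross(xu, xv, axis=0), axis=0)
    return dA.sum() * (u[1] - u[0]) * (v[1] - v[0])

# Unit sphere via spherical angles: area should be about 4*pi.
sphere = lambda U, V: np.stack([np.sin(U) * np.cos(V),
                                np.sin(U) * np.sin(V),
                                np.cos(U)])
u = np.linspace(0, np.pi, 200)
v = np.linspace(0, 2 * np.pi, 400)
print(surface_area(sphere, u, v))            # approx. 12.57
```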
surface class: Koenderink's classification of local surface shape into classes based on two functions of the principal curvatures:

• The shape index S = (2/π) tan⁻¹((κ₁ + κ₂)/(κ₁ − κ₂))
• The curvedness C = √((κ₁² + κ₂²)/2)

where κ₁ and κ₂ are the principal curvatures. The surface classes are planar (C = 0), hyperboloid (|S| < 3/8) or ellipsoid (|S| > 5/8) and cylinder (3/8 < |S| < 5/8), subdivided into concave (S < 0) and convex (S > 0). Alternative classification systems exist based on the mean and Gaussian curvature or the principal curvatures. The former distinguishes more classes of hyperboloid surfaces.

surface continuity: Mathematically, surface continuity is defined at a single point parameterized by (u, v) on the surface { x(u, v) : (u, v) ∈ D ⊂ ℝ² }. The surface is continuous at that point if infinitesimal motions in any direction away from (u, v) can never cause a sudden change in the value of x. The surface is everywhere continuous, or just continuous, if it is continuous at all points in D.

surface curvature: Surface curvature measures the shape of a 3D surface (the characteristics of the surface that are constant if the surface is rotated or translated in 3D space). The shape is specified by the surface's principal curvatures at each point. To compute the principal curvatures, we need two pieces of machinery, called the first and second fundamental forms. In the differential geometry of surfaces, the first fundamental form encapsulates arc-length of curves in a surface. If the surface is defined in parametric form by a smooth function x(u, v), the surface's tangent vectors at (u, v) are given by the partial derivatives x_u(u, v) and x_v(u, v). From these, we define the dot products E(u, v) = x_u · x_u, F(u, v) = x_u · x_v, G(u, v) = x_v · x_v. Then arclength along a curve in the surface is given by the first fundamental form ds² = E du² + 2F du dv + G dv². The matrix of the first fundamental form is the 2 × 2 matrix

I = [[E, F], [F, G]]

The second fundamental form encapsulates the curvature information. The second partial derivatives are x_uu(u, v) etc. The surface normal at (u, v) is the unit vector n(u, v) along x_u × x_v. Then the matrix of the second fundamental form at (u, v) is the 2 × 2 matrix

II = [[x_uu · n, x_uv · n], [x_vu · n, x_vv · n]]

If d = (du, dv) is a direction in the tangent space (so its 3D direction is t(d) = du x_u + dv x_v), then the normal curvature in the direction d is given by κ(d) = (dᵀ II d)/(dᵀ I d). The minima and maxima of κ as d varies at a point (u, v) are the principal curvatures at the point, given by the generalized eigenvalues of II z = κ I z, i.e., the solutions to the quadratic equation in κ given by det(II − κI) = 0.
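The machinery of the surface curvature entry above fits in a few lines of NumPy, evaluated at one point whose derivative vectors are supplied (the unit-sphere test values are an assumption):

```python
import numpy as np

def principal_curvatures(xu, xv, xuu, xuv, xvv):
    """Principal curvatures at one surface point from the first and
    second fundamental forms; inputs are partial-derivative 3-vectors."""
    n = np.cross(xu, xv)
    n = n / np.linalg.norm(n)                  # unit surface normal
    I = np.array([[xu @ xu, xu @ xv],
                  [xu @ xv, xv @ xv]])         # first fundamental form
    II = np.array([[xuu @ n, xuv @ n],
                   [xuv @ n, xvv @ n]])        # second fundamental form
    return np.linalg.eigvals(np.linalg.solve(I, II)).real  # kappa_1, kappa_2

# Check on a unit sphere patch x(u,v) = (cos u cos v, cos u sin v, sin u):
u, v = 0.3, 0.3
xu  = np.array([-np.sin(u)*np.cos(v), -np.sin(u)*np.sin(v), np.cos(u)])
xv  = np.array([-np.cos(u)*np.sin(v),  np.cos(u)*np.cos(v), 0.0])
xuu = np.array([-np.cos(u)*np.cos(v), -np.cos(u)*np.sin(v), -np.sin(u)])
xuv = np.array([ np.sin(u)*np.sin(v), -np.sin(u)*np.cos(v), 0.0])
xvv = np.array([-np.cos(u)*np.cos(v), -np.cos(u)*np.sin(v), 0.0])
print(principal_curvatures(xu, xv, xuu, xuv, xvv))  # both 1 for radius 1
```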
surface discontinuity: A discontinuity is a point at which the surface, or its normal vector, is not continuous. These are often fold edges, where the surface normal has a large change in direction. See also surface continuity.

surface fitting: A family of parametric surfaces x(u, v) is parameterized by a vector of parameters θ. For example, the family of 3D spheres is parameterized by four parameters: three for the center, one for the radius. Given a set of n sampled data points p₁, …, pₙ, the task of surface fitting is to find the parameters θ of the surface that best fits the given data. Common interpretations of "best fit" include finding the surface for which the sum of Euclidean distances from the points to the surface is smallest, or that maximizes the probability that the data points could be noisy samples from the surface. General techniques include least squares fitting or nonlinear optimization over the surface parameters.

surface interpolation: Generating a continuous surface from sparse data such as 3D points. For example, given a set of sampled data points S = {p₁, …, pₙ}, one might wish to generate other points in ℝ³ that lie on a smooth surface that passes through all the points in S. Techniques include radial basis functions, splines, and natural neighbor interpolation.

surface matching: Identifying corresponding points on two 3D surfaces, often as a precursor to surface registration.

surface mesh: A surface boundary representation in which the faces are typically planar and the edges are straight lines. Such representations are often associated with efficient data structures (e.g., winged edge, quad edge) that allow fast computation of various geometric and topological properties. Hardware acceleration of polygon rendering is a feature of many computers.

surface normal: The direction perpendicular to a surface. For a parametric surface x(u, v), the normal is the unit vector parallel to ∂x/∂u × ∂x/∂v. For an implicit surface F(x) = 0, the normal is the unit vector parallel to ∇F = (∂F/∂x, ∂F/∂y, ∂F/∂z). The figure shows the surface normal as defined by the small neighborhood at the point X:

[Figure: a surface patch with the normal vector drawn at point X.]
surface orientation: The convention that decides whether the surface normal or its negation points outside the space bounded by the surface.

surface patch: A surface whose domain is finite.

surface reconstruction: The problem of building a surface mesh or B-rep model from unorganized point data.

surface reflectance: A description of the manner in which a surface interacts with light. See reflectance.

surface roughness characterization: An inspection application where estimates of the roughness of a surface are made, e.g., when inspecting spray-painted surfaces.

surface segmentation: Division of a surface into simpler patches. Given a surface defined over a domain D, determine a partition D = D₁ ∪ … ∪ Dₙ on which some goodness criteria are well satisfied. For example, it might be required that the maximal distance of a point of each Dᵢ from the best-fit quadric surface is below a threshold. See also range data segmentation.

surface shape classification: The use of curvature information of a surface to classify each point on the surface as locally ellipsoidal, hyperbolic, cylindrical or planar. See also surface class. For example, given a parametric surface x(u, v), the classification function c(u, v) is a mapping from the domain of (u, v) to a set of discrete class labels.

surface shape discontinuity: A discontinuity in the value of a surface shape classification over a surface. For example, a discontinuity in the classification function c(u, v). Another example occurs at the fold edge at point X:

[Figure: a folded surface with the fold edge at point X.]

surface tracking: Identification of the same surface through the frames of a video sequence.

surface triangulation: See surface mesh.

surveillance: An application area of vision concerned with the monitoring of activities in a scene. Typically this will involve at least background modeling and human motion analysis.

SUSAN corner finder: A popular interest point detector developed by Smith and Brady. Combines the smoothing and central difference stages of a derivative-based operator into a single center–surround comparison.

SVD: See singular value decomposition.

SVM: See support vector machine.
swept object representation: A volumetric representation scheme in which 3D objects are formed by sweeping a 2D cross section along an axis or trajectory. A brick can be formed by sweeping a rectangle. Some schemes, like the geon or generalized cylinder representation, allow changes to the size of the cross section and curved trajectories. A cone is defined here by sweeping a circle along a straight axis with a linearly decreasing radius:

[Figure: a cone swept from a circle of linearly decreasing radius.]

symbolic: Inference or computation expressed in terms of a set of symbols rather than a signal. Where a digital signal is a discrete representation of a continuous function, symbols are inherently discrete. For example, an image (signal) is converted to a list of the names of people who appear in it (symbols).

symbolic object representation: Representation of an object by lists of symbolic terms like "plane", "quadric", "corner", or "face", etc., rather than the points or pixels of the shape itself. The representation may include the shape and position of the objects, too.

symmetric axis transform (SAT): A transformation that locates all points on the skeleton of a region by identifying those points that are the locus of centers of bitangent circles. See also medial axis skeletonization. In the following example the medial axis is derived from a binary segmentation of a moving subject.

[Figure: a binary silhouette of a walking person and its medial axis.]

symmetry: A shape that remains invariant under at least one non-identity transformation from some pre-specified transformation group is symmetric. For example, the set of points comprising an ellipse is the same after the ellipse is subjected to the Euclidean transformation of rotation by 180° about its center. The image of the outline of a surface of revolution under perspective projection is invariant under a certain homography, so the silhouette exhibits a projective symmetry. Affine symmetries are sometimes known as skew symmetries, and symmetries induced by reflection about a line are called bilateral symmetries.

symmetry detection: A class of algorithms that search for symmetry in imaged curves, surfaces and point sets.
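In practice the symmetric axis transform is usually computed with a morphological skeletonization routine; a usage sketch with scikit-image (the toy rectangle stands in for a real segmentation mask):

```python
import numpy as np
from skimage.morphology import medial_axis

# A toy binary region (a filled rectangle) standing in for a real
# segmentation of a subject.
binary = np.zeros((40, 60), dtype=bool)
binary[10:30, 5:55] = True

skeleton, distance = medial_axis(binary, return_distance=True)
# 'skeleton' marks the locus of centers of bitangent (maximal) circles;
# 'distance' gives each such circle's radius.
```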
symmetry line: The axis of a bilateral symmetry. The solid line rectangle has two dashed line symmetry lines:

[Figure: a rectangle with its two symmetry lines dashed.]

symmetry plane: The axis of a bilateral symmetry in 3D. The dashed lines show three symmetry planes of this cube:

[Figure: a cube with three symmetry planes indicated by dashed lines.]

sync pulse: Abbreviation of "synchronization pulse". Any electrical signal that allows two or more electronic devices to share a common time frame. Commonly used to synchronize the capture instants of two cameras in a stereo image capture system.

syntactic pattern recognition: Object identification by converting an image of the object into a sequence or array of symbols and using grammar parsing techniques to match the sequence of symbols to grammar rules in a database.

syntactic texture description: Description of texture in terms of grammars of local shapes or image patches and transformation rules. Good for modeling synthetic artificial textures.

synthetic aperture radar (SAR): An imaging device that transmits long-wavelength (in comparison to visible light) radio waves from airborne or space platforms and builds a 2D image of the intensities of the returned reflections. Clouds are transparent at these (centimeter) wavelengths, and the active transmission means that images may be taken at night. The images are captured as a sequence of low-resolution ("small aperture") 1D slices as the platform translates across the target area, with a final high-resolution ("synthetic [large] aperture") image recoverable via a Fourier transform after all slices have been captured. The time-of-flight of the returned signal determines the distance from the transmitter and therefore, assuming a planar (or known geometry) surface, the pixel location in the cross-path direction.

systolic array: A class of parallel computer in which processors are arranged in a directed graph. The processors synchronously receive data from one set of neighbors (e.g., North and West in a rectangular array), perform a computation, and transmit the computed quantity to another set of neighbors (e.g., South and East).
T

tabu search: A heuristic search technique that seeks to avoid cycles by forbidding or penalizing moves taking the search to previously visited solution spaces (hence "tabu").

tangent angle function: Given a curve (x(t), y(t)), the function φ(t) = tan⁻¹(ẏ(t)/ẋ(t)).

tangent plane: The plane passing through a point on a surface that is perpendicular to the surface normal.

tangential distortion (lens): A particular lens aberration created, among others, by lens decentering, usually modeled only in high-accuracy calibration systems.

target image: The image resulting from an image processing operation.

[Figure: a source image beside the corresponding target image.]

target recognition: See automatic target recognition.

task parallelism: Parallel processing achieved by the concurrent execution of relatively large subsets of a computer program. A large subset might be defined as one whose run time is of the order of tens of milliseconds. The parallel tasks need not be identical, e.g., from a binary image, one task may compute a moment while another computes the perimeter.

tee junction: An intersection between line segments (possibly representing edges) where a straight line meets and terminates somewhere along another line segment. See also blocks world. Tee junctions

[Figure: surfaces A and B partially occluded by surface C, with a tee junction at point p.]
can give useful depth-ordering cues. Here we can hypothesize that surface C lies in front of the surfaces A and B, given the tee junction at p.

telecentric optics: A lens system arranged such that moving the image plane along the optical axis does not change the magnification or image position of imaged world points. One embodiment is to place an aperture in front of the lens so that when an object is imaged off the focal plane of the lens, the center of the (blurred) object is the ray through the center of the aperture, rather than the center of the lens. Placing the aperture at the lens's front focal plane will ensure these rays are parallel after the lens.

[Figure: two ray diagrams comparing non-telecentric optics, where the image point moves as the image plane moves, with telecentric optics, where an aperture at the front focal plane keeps the image point stationary on the image plane.]

telepresence: Interaction with objects at a location remote from the user via vision or robotic devices. Examples include slaving of remote cameras to the motion of a head-mounted display worn by the user, transmission of audio from the remote location, use of local controls to operate remote machinery, and haptic (i.e., touch) feedback from the remote to the local environment.

template matching: A strategy for location of an object in an image. The template, a 2D image of the object, is compared with all windows of the same size as the template in the image to be searched. Windows where the difference with the template (as computed by, e.g., normalized cross-correlation or sum of squared differences (SSD)) is within a threshold are reported as instances of the object. Interesting as a brute-force matching strategy. To obtain invariance to scale, rotation or other transformations, the template must be subjected explicitly to the transformations.

temporal averaging: Any procedure for noise reduction in which a signal that is known to be static over time is sampled at different times and the results averaged.

temporal representation: A model representation that encodes the dynamics of how an object's shape or position can vary over time.

temporal resolution: The frequency of observations with respect to time (e.g., one per second) as opposed to the spatial resolution.

temporal stereo: 1) Stereo achieved through movement of the camera rather than use of two separate cameras. 2) Integration of multiple stereo views of a dynamic scene to produce a better estimate of each view.
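A brute-force SSD sketch of the template matching entry above (loop-based for clarity; practical implementations use FFT-based correlation):

```python
import numpy as np

def ssd_match(image, template, threshold):
    """Slide the template over every window of the image and report
    the top-left corners whose sum of squared differences (SSD)
    falls below 'threshold'."""
    H, W = image.shape
    h, w = template.shape
    hits = []
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            window = image[r:r + h, c:c + w].astype(float)
            ssd = np.sum((window - template) ** 2)
            if ssd < threshold:
                hits.append((r, c, ssd))
    return hits
```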
temporal tracking: See tracking.

tensor product surface: A parametric representation for a curved surface commonly used in computer modeling and graphics applications. The surface shape is defined by the product of two polynomial (usually cubic) curves in the independent surface coordinates.

terrain analysis: Analysis and interpretation of data representing the shape of the planet surface. Typical data structures are digital elevation maps or triangulated irregular networks (TINs).

tessellated viewsphere: A division of the viewsphere into distinct subsets of (approximately) equal areas. Often used as a data structure for representation of functions of the form f(n) where n is a unit normal vector in ℝ³. Typically constructed by subdivision of the viewsphere into a polygon mesh such as an icosahedron:

[Figure: a viewsphere tessellated into a polygon mesh.]

test set: The set used to verify a classifier or other algorithm. The test set contains only examples not included in the training set.

tetrahedral spatial decomposition: A method of decomposing 3D space into packed tetrahedrons instead of the more commonly used rectangular voxel decomposition. A tetrahedral decomposition allows a recursive subdivision of a tetrahedron into eight smaller tetrahedra. This figure illustrates the decomposition with one of the eight smaller volumes shaded:

[Figure: a tetrahedron subdivided into eight smaller tetrahedra, one shaded.]

texel: See texture element.

texon: See texture element.

textel: See texture element.

texton: Julesz' 1981 definition of the units in which texture might be perceived. In the texton-based view, a texture is a regular assembly of textons.

texture: The phenomenon by which uniformity is perceived in regular (etymologically, "woven") patterns of (possibly irregular) elements. In computer vision, texture usually
refers to the patterns in the appearance or reflectance on a surface. The texture may be regular, i.e., satisfy some texture grammar, or may be statistical texture, i.e., the distribution of pixel values may vary over the image. Texture could also refer to variations in the local shape on a surface, e.g., its degree of roughness. See also shape texture.

texture-based image retrieval: Content-based image retrieval that uses texture as its classification criterion.

texture boundary: The boundary between adjacent regions in texture segmentation. The boundary perceived by humans between two regions of different textures. This figure shows the boundary between three regions of different color and shape texture:

[Figure: three adjoining regions of different texture with their boundaries.]

texture classification: Assignment of an image (or a window of an image) to one of a set of texture classes. The texture classes are typically defined by presentation of training images representing each class by a human. The basis of texture segmentation.

texture descriptor: A vector valued function computed on an image subwindow that is designed to produce similar outputs when applied to different subwindows of the same texture. The size of the image subwindow controls the scale of the detector. If the response at a pixel position (x, y) is computed as the maximum over several scales, an additional scale output s(x, y) is available. See also texture primitive.

texture direction: The texture gradient or a 90° rotation thereof.

texture element (texel): A small geometric pattern that is repeated frequently on some surface resulting in a texture.

texture energy measure: A single-valued texture descriptor with strong response in textured regions. A texture descriptor may be formed by combining the results of several texture energy measures into a vector.

texture enhancement: A procedure analogous to edge-preserving smoothing in which texture boundaries rather than edges are to be preserved.

texture field grouping: See texture segmentation.

texture field segmentation: See texture segmentation.

texture grammar: Grammar used to describe textures as instances of simpler patterns with a given spatial relationship (including other textures defined previously in this way). A sentence from this grammar would be a syntactic texture description.
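A sketch of one simple statistical texture descriptor, the gray-level co-occurrence matrix for a single pixel offset (the quantization level and offset are assumptions; normalization conventions vary between authors):

```python
import numpy as np

def cooccurrence(image, dr=0, dc=1, levels=8):
    """Normalized co-occurrence matrix C, where C[p, q] is the
    frequency with which gray level p is followed by gray level q
    at offset (dr, dc)."""
    q = (image.astype(float) * levels / (image.max() + 1)).astype(int)
    C = np.zeros((levels, levels))
    H, W = q.shape
    for r in range(max(0, -dr), min(H, H - dr)):
        for c in range(max(0, -dc), min(W, W - dc)):
            C[q[r, c], q[r + dr, c + dc]] += 1
    return C / C.sum()
```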
texture gradient: The gradient of a single scalar output s(x, y) of a texture descriptor. A common example is the scale output, for homogeneous texture, whose texture gradient can be used to compute the foreshortening direction.

texture mapping: In computer graphics, rendering of a polygonal surface where the surface color at each output screen pixel is obtained by interpolating values from an image, called the texture map. The source image pixel location is computed using correspondences between the polygon's vertex coordinates and texture coordinates on the texture map.

texture matching: Matching of regions based on texture descriptions.

texture model: The theoretical basis for a class of texture descriptors. For example, autocorrelation of linear filter responses, statistical texture descriptions, or syntactic texture descriptions.

texture orientation: See texture gradient.

texture primitive: A basic unit of texture (e.g., a small pattern that is repeated) as used in syntactic texture descriptions.

texture recognition: See texture classification.

texture region extraction: See texture field segmentation.

texture representation: See texture model.

texture segmentation: Segmentation of an image into patches of coherent texture. This figure shows a region segmented into three regions based on color and shape texture:

[Figure: the three-texture image segmented into three regions.]

texture synthesis: The generation of synthetic images of textured scenes. More particularly, the generation of images that appear perceptually to share the texture of a set of training examples of a texture.

Theil–Sen estimator: A method for robust estimation of curve fits. A family of curves is parameterized by parameters a₁, …, a_p, and is to be fit to data x₁, …, xₙ. If q is the smallest number of points that uniquely define a₁, …, a_p, then the Theil–Sen estimate of the optimal parameters â₁, …, â_p is the set of parameters that has the median error measure of all the q-point estimates. For example, for line fitting, the number of parameters (slope and intercept, say) is p = 2, and the number of points required to give a fit is also q = 2. Thus the Theil–Sen estimate of the slope gives the median error of the (n choose q) two-point slope estimates. The Theil–Sen estimator is not statistically efficient, nor does it have a particularly high breakdown point, in contrast to such estimators as RANSAC and least median of squares.
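For line fitting, the classical median-of-slopes form of the Theil–Sen estimator is only a few lines; an O(n²) sketch:

```python
import numpy as np
from itertools import combinations

def theil_sen_line(x, y):
    """Slope = median of all two-point slope estimates; intercept =
    median of the residual intercepts given that slope."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[i] != x[j]]
    m = np.median(slopes)
    return m, np.median(y - m * x)
```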
thermal noise: In CCD cameras, additional electrons released by thermal vibration in the substrate that are counted with those released by incident photons. Thus, the gray scale values are corrupted by an additive Poisson noise process.

thickening operator: Thickening is a morphological operation that is used to grow selected regions of foreground pixels in binary images, somewhat like dilation or closing. It has several applications, including determining the approximate convex hull of a shape, and determining the skeleton by zone of influence. Thickening is normally only applied to binary images, and it produces another binary image as output. This is an example of thickening six times in the horizontal direction:

[Figure: a binary shape before and after six horizontal thickening steps.]

thin plate model: A model of surface smoothness used in the variational approach. The internal energy (or bending energy) of a thin plate represented as a parametric surface (x, y, f(x, y)) is given by f_xx² + 2 f_xy² + f_yy².

thinning operator: Thinning is a morphological operation that is used to remove selected foreground pixels from binary images, somewhat like erosion or opening. It can be used for several applications, but is particularly useful for skeletonization and to tidy up the output of edge detectors by reducing all lines to single pixel thickness. Thinning is normally only applied to binary images and produces another binary image as output. This is a diagram illustrating the thinning of a region:

[Figure: a binary region and its thinned, single-pixel-wide version.]

three view geometry: See trinocular stereo.
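A usage sketch of the thinning operator above with scikit-image (the toy blob is an assumption):

```python
import numpy as np
from skimage.morphology import thin

binary = np.zeros((32, 32), dtype=bool)   # toy binary input region
binary[8:24, 6:26] = True

thinned = thin(binary)   # iterates until lines are one pixel wide
```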
three dimensional imaging: Any of a class of techniques that obtain three dimensional information using imaging equipment. 1) 3D volumetric imaging: obtaining measurements of scene properties at all points in a 3D space, including the insides of objects. This is used for inspection, but more commonly for medical imaging. Techniques include nuclear magnetic resonance, computerized tomography, positron emission tomography and single photon emission computed tomography. 2) 3D surface imaging: obtaining surface information embedded in a 3D space. Active techniques generally include a source of structured light (or other electromagnetic or sonar radiation), and a sensor such as a camera or microphone. Either triangulation or time of flight computations allow the distance from the sensor system to be computed. Common technologies include laser scanning, texture projection systems, and moiré fringe methods. Passive 3D imaging depends only on external (and hence unstructured) illumination sources. Examples of such systems are stereo reconstruction and shape from focus techniques.

threshold selection: The automatic choice of threshold values for conversion of a scalar signal (such as a gray scale image) to binary. Often (e.g., Otsu's 1979 method) proceeds by analysis of the histogram of the sample values. Different assumptions about the underlying distributions yield different strategies.

thresholding: Quantization into two values. For example, conversion of a scalar signal (such as a gray scale image) to binary. This figure shows an input image and its thresholded output:

[Figure: a gray scale image and the binary image produced by thresholding it.]

thresholding with hysteresis: Thresholding of a time-varying scalar signal where the threshold value is a function of previous signal and threshold values. For example, a thermostat control based on temperature receives a signal s(t), and generates an output signal b(t) of the form

b(t) = { [s(t) > cold]  if b(t − 1) = 0
       { [s(t) < hot]   if b(t − 1) = 1

where the value at time t depends on the previous decision b(t − 1). In computer vision, often associated with the edge following stage of the Canny edge detector.

TIFF: Tagged Image File Format.

tilt: The tilt direction of a 3D surface patch as observed in a 2D image is parallel to the projection of the 3D surface normal into the image. If the 3D surface is represented as a depth map z(x, y) in image coordinates, then the tilt direction at (x, y) is the unit vector parallel to (∂z/∂x, ∂z/∂y). The tilt angle may be defined as tan⁻¹((∂z/∂y)/(∂z/∂x)).
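A compact sketch of Otsu's histogram-based threshold selection (see the threshold selection entry above) for 8-bit images, choosing the threshold that maximizes between-class variance:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level maximizing between-class variance for an
    integer-valued image in [0, 255]."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                    # class-0 probability
    mu = np.cumsum(p * np.arange(256))      # cumulative mean
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.nanargmax(sigma_b))

# binary = gray > otsu_threshold(gray)
```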
time derivative: A technique for computing how an image sequence changes over time. Typically used as part of shape from motion.

time to collision: See time to contact.

time to contact: From a sequence of images I(t), computation of the value of t at which, assuming constant motion, an image object will intersect the plane parallel to the image plane that contains the camera center. It can be computed even in the absence of metric information about the imaging system, i.e., in an uncalibrated setting.

time to impact: See time to contact.

time-of-flight range sensor: A sensor that computes distance to target points by emitting electromagnetic (or other) radiation and measuring the time between emitting the pulse and observing the reflection of the pulse.

tolerance band algorithm: An algorithm for incremental segmentation of a curve into straight line elements. Assume that the current straight line segment defines two parallel boundaries of a tolerance zone at a pre-selected distance from the line segment. When a new curve point leaves the tolerance zone, the current line segment is ended and a new segment is started. A tolerance band is illustrated:

[Figure: a curve inside a tolerance zone, with the exit point marked.]

tolerance interval: An interval within which a stated proportion of some population will lie.

Tomasi–Kanade factorization: A maximum-likelihood solution to structure and motion recovery in the situation where points in a static scene are observed by affine cameras and the observed (x, y) positions are corrupted by Gaussian noise. The method depends on the observation that if m points are observed over n views, the 2n × m measurement matrix containing all the observations (after certain transformations have been performed) is of rank 3. The closest rank-3 approximation of the matrix is reliably obtained via singular value decomposition, after which the 3D points and camera positions are easily extracted, up to an affine ambiguity.

tomography: A technique for the reconstruction of a 3D volumetric dataset based on a number of 2D slices. The most common examples occur in medical imaging (e.g., nuclear magnetic resonance, positron emission tomography).
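The rank-3 step of the Tomasi–Kanade factorization above is a one-line SVD; this sketch assumes the measurement matrix is already centered and omits the metric-upgrade stage:

```python
import numpy as np

def factorize(W):
    """Rank-3 factorization of a 2n x m measurement matrix W into an
    affine motion matrix (2n x 3) and a shape matrix (3 x m), up to an
    affine ambiguity."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])           # camera (motion) matrix
    S = np.sqrt(s[:3])[:, None] * Vt[:3]    # 3D shape matrix
    return M, S
```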
top-down: A reasoning approach that searches for evidence for high-level hypotheses in the data. For example, a hypothesize-and-test algorithm might have a strategy for making good guesses as to the position of circles in an image and then compare the hypothesized circles to edges in the image, choosing those that have good support. Another example is a human body recognizer that employs body part recognizers (e.g., heads, legs, torso) that, in turn, either directly use image data or recognize even smaller subparts.

top hat operator: A morphological operator used to remove structures from images. The top-hat filtering of image I by structuring element S is the difference I − open(I, S), where open(I, S) is the morphological opening of I by S.

topological representation: Any representation that encodes connectedness of elements. For example, in a surface boundary representation comprising faces, edges and vertices, the topology of the representation is the list of face–edge and edge–vertex connections, which is independent of the geometry (or spatial positions and sizes) of the representation. In this case, the fundamental relation is "bounded by", so a face is bounded-by one or more edges, and an edge is bounded-by zero or more vertices.

topology: 1) Properties of point sets (such as surfaces) that are unchanged by continuous reparameterizations (homeomorphisms) of space. 2) The connectedness of objects in discrete geometry (see topological representation). One speaks of the topology of a network, meaning the set of connections within the network, or equivalently the set of neighborhood relationships that describe the network.

torsion: A concept in the differential geometry of curves formally representing the intuitive notion of the local twisting of a 3D curve as you move along the curve. The torsion τ(t) of a 3D space curve c(t) is the scalar

τ(t) = −n(t) · (d/dt) b(t) = ⟨ċ(t), c̈(t), c⃛(t)⟩ / ‖c̈(t)‖²

where n(t) is the curve normal and b(t) the binormal. The notation ⟨x, y, z⟩ denotes the scalar triple product x · (y × z).

torus: 1) The volume swept by moving a sphere along a circle in 3D. 2) The surface of such a volume.

total variation: A class of regularizer in the variational approach. The total variation regularizer of a function f: ℝⁿ → ℝ is of the form R(f) = ∫_Ω ‖∇f(x)‖ dx, where Ω is (a subset of) the domain of f.

tracking: A means of estimating the parameters of a dynamic system. A dynamic system is characterized by a set of parameters (e.g., feature point positions, target object positions, human joint angles) evolving over time, of which we have measurements (e.g., photographs of the human) obtained at successive time instants. The task of tracking is to maintain an estimate of the probability distribution over the model parameters, given the measurements, as well as a priori models of how the parameters change over time. Common algorithms for tracking include the Kalman filter and particle filters. Tracking may be viewed as a class of algorithms that operate on sequences of inputs, using assumptions about the coherence of successive inputs to improve performance of the algorithm. Often the task of the algorithm may be cast as estimation of a state vector (a set of parameters such as the joint angles of a human body) at successive time instants t. The state vector x(t) is to be estimated using a set of sensors that yield observations, z(t), such as the 2D positions of bright spots attached to a human. In the absence of temporal coherence assumptions, x must be estimated at each time step solely using the information in z(t). With coherence assumptions, the system uses the set of all observations so far {z(τ), τ < t} to compute the estimate at time t. In practice, the estimate of the state is represented as a probability density over all possible values, and the current estimate uses only the previous state estimate x(t − 1) and the current measurements z(t) to estimate x(t).
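A minimal constant-velocity Kalman filter for a scalar track, as one concrete instance of the recursive estimate described in the tracking entry above (the noise levels are illustrative assumptions):

```python
import numpy as np

def kalman_1d(zs, dt=1.0, q=1e-3, r=0.1):
    """Track a scalar position from noisy measurements zs with state
    x = (position, velocity); predict with F, update with gain K."""
    F = np.array([[1, dt], [0, 1]])     # state transition
    H = np.array([[1.0, 0.0]])          # measurement model
    Q = q * np.eye(2)                   # process noise covariance
    R = np.array([[r]])                 # measurement noise covariance
    x = np.array([[zs[0]], [0.0]])
    P = np.eye(2)
    track = []
    for z in zs:
        x = F @ x                       # predict state
        P = F @ P @ F.T + Q
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
        x = x + K @ (np.array([[z]]) - H @ x)          # update
        P = (np.eye(2) - K @ H) @ P
        track.append(float(x[0, 0]))
    return track
```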
traffic analysis: Analysis of video data of automobile traffic, e.g., to identify number plates, detect accidents, detect congestion, compute throughput, etc.

training set: The set of labeled examples used to learn the parameters of a classifier. In order to build an effective classifier, the training set should be representative of the examples that will be encountered in the eventual domain of application.

trajectory: The path that a moving point makes over time. It could also be the path that a whole object takes if less precision of usage is desired.

trajectory estimation: Determination of the 3D trajectory of an object observed in a set of 2D images.

transformation: A mapping of data in one space (such as an image) into another space (e.g., all image processing operations are transformations).

translation: A transformation of Euclidean space that can be represented in the form x → T(x) ≡ x → x + t. In projective space, a transformation that leaves the plane at infinity pointwise invariant.

translation invariant: A property that keeps the same value even if the data, scene or the image from which the data
comes is translated. The distance between two points is translation invariant.

translucency: The transmission of light through a diffusing interface such as frosted glass. Light entering a translucent material has multiple possible exit directions.

transmittance: Transmittance is the ratio of the ("outgoing") power transmitted by a transparent object to the incident ("incoming") power.

transparency: The property of a surface to be traversed by radiation (e.g., by visible light), so that objects on the other side can be seen. A non-transparent surface is called opaque.

tree classifiers: A classifier that applies a sequence of binary tests to input points x in order to determine the label l of the class to which it belongs.

tree search method: A class of algorithms to optimize a function defined on tuples of values taken from a finite set. The tree describes the set of all such tuples, and the order in which tuples are explored is defined by the particular search algorithm. Examples are depth-first, breadth-first, A* and best-first search. Applications include the interpretation tree.

triangulated models: See surface mesh.

triangulation: See Delaunay triangulation, surface triangulation, stereo triangulation, structured light triangulation.

trifocal tensor: The geometric entity that relates the images of 3D points observed in three perspective 2D views. Algebraically represented as a 3 × 3 × 3 array of values T_i^{jk}. If a single 3D point projects to x, x′, x″ in the first, second, and third views respectively, it must obey the nine equations

x^i x′^j x″^k ε_{jpr} ε_{kqs} T_i^{pq} = 0_{rs}

for r and s varying from 1 to 3. In the above, ε is the epsilon-tensor for which

ε_{ijk} = +1 if ijk is an even permutation of (123); 0 if two of i, j, k are equal; −1 if ijk is an odd permutation of (123).

As this equation is linear in the elements of T, it can be used to estimate them given enough 2D point correspondences (x, x′, x″). As not all 3 × 3 × 3 arrays represent realizable camera configurations, estimation must also incorporate several nonlinear constraints on the tensor elements.

trilinear constraint: The geometric constraint on three views of a point (i.e., the intersection of three epipolar lines). This is similar to the epipolar constraint which is applied in the two view scenario.

trilinear tensor: Another name for the trifocal tensor.
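The epsilon-tensor of the trifocal tensor entry above, as a small constructive sketch (0-based indices):

```python
import numpy as np

def epsilon_tensor():
    """3 x 3 x 3 Levi-Civita tensor: +1 on even permutations of
    (0, 1, 2), -1 on odd permutations, 0 elsewhere."""
    eps = np.zeros((3, 3, 3))
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        eps[i, j, k] = 1.0    # even permutation
        eps[i, k, j] = -1.0   # odd permutation
    return eps
```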
trilinearity: An equation in a set of three variables in which holding two of the variables fixed yields a linear equation in the remaining one. For example xyz = 0 is trilinear in x, y and z, while x² = y is not, as holding y fixed yields a quadratic in x.

trinocular stereo: A multiview stereo process that uses three cameras.

tristimulus theory of color perception: The human visual system has three types of cones, with three different spectral response curves, so that the perception of any incident light is represented as three intensities, roughly corresponding to long (maximum about 558–580 nm), medium (531–545 nm) and short (410–450 nm) wavelengths.

tristimulus values: The relative amounts of the three primary colors that need to be combined to match a given color.

true negative: A hypothesis which is false that has been correctly rejected.

true positive: A hypothesis which is true that has been correctly accepted.

truncated median filter: An approximation to mode filtering when image neighborhoods are small. The filter sharpens blurred image edges as well as reducing noise. The algorithm truncates the local distribution on the mean side of the median and then recomputes the median of the new distribution. The algorithm can iterate and, under normal circumstances, converges approximately to the mode even if the observed distribution has very few samples with no obvious peak.

tube camera: See tube sensor.

tube sensor: A tube sensor converts light to a video signal using a vacuum tube with a photoconductive window. Once the only type of light sensor, the tube camera is now largely superseded by the CCD, but remains useful for some high dynamic range imaging. The image orthicon tube or "immy" is remembered in the name of the US Academy of Television Arts and Sciences Emmy awards.

twist: A 3D rotation representation component that specifies a rotation about the vector defined by the azimuth and elevation. This figure shows the twist rotation direction:

[Figure: coordinate axes showing azimuth, elevation and the twist rotation.]

twisted cubic: The curve (1, t, t², t³) in projective 3-space, or any projective transformation thereof. The general form is thus

(x₁, x₂, x₃, x₄)ᵀ = A (1, t, t², t³)ᵀ

where A is a 4 × 4 matrix of coefficients a₁₁ … a₄₄.
The projection of a twisted cubic into a 2D image is a rational cubic spline.

two view geometry: See binocular stereo.

type I error: A hypothesis which is true that has been rejected.

type II error: A hypothesis which is false that has been accepted.
U

ultrasonic imaging: Creation of images by the transmission and recording of reflected ultrasonic pulses. A phased array of transmitters emits a set of pulses, and then records the returning pulse intensities. By varying the relative timings of the pulses, the returned intensities can be made to correspond to locations in space, allowing measurements to be taken from within ultrasonic-transparent materials (including the human body, excluding air and bone).

ultrasound sequence registration: Registration of overlapping ultrasound images.

ultraviolet: Description of electromagnetic radiation with wavelengths between about 300–420 nm (near ultraviolet) and 40–300 nm (far ultraviolet). The short wavelengths make it useful for fine-scale examination of surfaces. Ordinary glass is opaque to UV radiation, quartz glass is transparent. Often used to excite fluorescent materials.

umbilic: A point on a surface where the curvature is the same in every direction. Every point on a sphere is an umbilic point.

umbra: The completely dark area of a shadow caused by a particular light source (i.e., where no light falls from the light source).

[Figure: a light source and an occluder, with the umbra (complete shadow) at the center, fuzzy shadow around it, and no shadow beyond.]

uncalibrated approach: See uncalibrated vision.

uncalibrated stereo: Stereo reconstruction performed without precalibration of the cameras. Particularly, given a pair of images taken by unknown cameras, the fundamental matrix is computed from point correspondences,
after which the images may be rectified and conventional calibrated stereo may proceed. The results of uncalibrated stereo are 3D points in a projective coordinate system, rather than the Euclidean coordinate system that a calibrated setup admits.

uncalibrated vision: The class of vision techniques that require no quantitative information about the camera used in capturing the images on which they operate. For example, techniques that can be applied to archive footage. In particular, applied to geometric problems such as stereo reconstruction that traditionally required that the images be from a camera system upon which calibration measurements had been previously made. Uncalibrated approaches include those, such as uncalibrated stereo, where the traditional calibration step is replaced by procedures that can use image features directly, and others, such as time-to-contact computations, that can be expressed in ways that factor out the calibration parameters. In general, uncalibrated systems will have degrees of freedom that cannot be measured, such as overall scale, or projective ambiguity.

uncertainty representation: A strategy for representation of the probability density of a variable as used in a vision algorithm. In a similar manner, an interval can be used to represent a range of possible values.

under-segmented: Describing the output of a segmentation algorithm. Given an image where a desired segmentation result is known, the algorithm under-segments if regions output by the algorithm are generally the union of many desired regions. This image should be segmented into three regions but it was under-segmented into two regions:

[Figure: an image of three regions segmented into only two.]

uniform distribution: A probability distribution in which a variable can take any value in the given range with equal probability.

uniform illumination: An idealized configuration in which the arrangement of lighting within a scene is such that each point receives the same amount of light energy. In computer vision, sometimes uniform illumination has a different meaning: that each point in an image of the scene (or a part thereof such as the background) has similar imaged intensity.

uniform noise: Additive corruption of a sampled signal. If the signal's samples are sᵢ then the corrupted signal is s̃ᵢ = sᵢ + nᵢ, where the nᵢ are uniformly randomly drawn from a specified range.
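A sketch of adding uniform noise (the symmetric range [−η, η] is an assumption; the entry leaves the range unspecified):

```python
import numpy as np

def add_uniform_noise(signal, eta, rng=None):
    """Corrupt samples with additive noise drawn uniformly from
    the interval [-eta, eta]."""
    rng = np.random.default_rng() if rng is None else rng
    return signal + rng.uniform(-eta, eta, size=np.shape(signal))
```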
uniqueness stereo constraint: When performing stereo matching or stereo reconstruction, matching can be simplified by assuming that points in one image correspond to only one point in other images. This is generally true, except at object boundaries and other places where pixels are not completely opaque.

unit ball: An N dimensional sphere of radius one.

unit quaternion: A quaternion is a 4-vector q ∈ ℝ⁴. Quaternions of unit length can be used to parameterize 3D rotation matrices. Given a quaternion with components (q₀, q₁, q₂, q₃), the corresponding rotation matrix R is (letting S = q₀² − q₁² − q₂² − q₃²):

[ S + 2q₁²          2q₁q₂ + 2q₀q₃    2q₃q₁ − 2q₀q₂ ]
[ 2q₁q₂ − 2q₀q₃     S + 2q₂²         2q₂q₃ + 2q₀q₁ ]
[ 2q₃q₁ + 2q₀q₂     2q₂q₃ − 2q₀q₁    S + 2q₃²      ]

The identity rotation is given by the quaternion (1, 0, 0, 0). The rotation axis is the unit vector parallel to (q₁, q₂, q₃).

unit vector: A vector of length one.

unitary transform: A reversible transformation (e.g., the discrete Fourier transform). U is a unitary matrix where U*U = I, U* is the adjoint matrix and I is the identity matrix.

unrectified: When a stereo camera pair has not been rectified.

unsharp operator: An image enhancement operator that sharpens edges by adding a high pass filtered version of an image to itself. The high pass filter is implemented by subtracting a smoothed version of the image, yielding I_unsharp = I + (I − I_smooth). This shows an input image and its unsharped output:

[Figure: an image before and after unsharp masking.]

unsupervised classification: See clustering.

unsupervised learning: A method for training a neural network or other classifier where the network learns to recognize patterns (in a training set) automatically. See also supervised learning.

updating eigenspace: Algorithms for the incremental updating of eigenspace representations. These algorithms facilitate approaches such as active learning.

USB camera: A camera conforming to the USB (Universal Serial Bus) standard.
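The matrix of the unit quaternion entry above transcribes directly into code (normalization added for safety):

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a quaternion (q0, q1, q2, q3)."""
    q0, q1, q2, q3 = np.asarray(q, float) / np.linalg.norm(q)
    S = q0*q0 - q1*q1 - q2*q2 - q3*q3
    return np.array([
        [S + 2*q1*q1,       2*q1*q2 + 2*q0*q3, 2*q3*q1 - 2*q0*q2],
        [2*q1*q2 - 2*q0*q3, S + 2*q2*q2,       2*q2*q3 + 2*q0*q1],
        [2*q3*q1 + 2*q0*q2, 2*q2*q3 - 2*q0*q1, S + 2*q3*q3],
    ])

print(quat_to_rot([1, 0, 0, 0]))  # identity rotation
```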
V

validation: Testing whether or not some hypothesis is true. See also hypothesize and verify.

valley: A dark elongated object in a gray scale image, so called because it corresponds to a valley in the image viewed as a 3D surface or elevation map of intensity versus image position.

valley detection: An image processing operator (see also bar detector) that enhances linear features rather than light-to-dark edges. See also valley.

value quantization: When a continuous number is encoded as a finite number of integer values. A common example of this occurs when a voltage or current is encoded as integers in the range 0–255.

vanishing line: The 2D line that is the image of the intersection of a 3D plane with the plane at infinity. The horizon line in an image is the image of the intersection of the ground plane with the plane at infinity, just as a pair of railway lines meeting in a vanishing point is the image of the intersection of two parallel lines and the plane at infinity. This sketch shows the vanishing line for the ground plane with a road and railroad: [Figure: a road and a railroad converging to vanishing points on the vanishing line]

vanishing point: The image of the point at infinity where two parallel 3D lines meet. A pair of parallel 3D lines can be represented as $\vec{a} + \lambda\vec{n}$ and $\vec{b} + \lambda\vec{n}$. The vanishing point is the image of the 3D direction $\vec{n}$, i.e., of the point at infinity $(\vec{n}, 0)$. This sketch shows the vanishing points for a road and railroad: [Figure: the vanishing points of a road and a railroad, lying on a common vanishing line]
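A small sketch in homogeneous coordinates, assuming NumPy: the vanishing point of two imaged parallel lines can be computed as the intersection of the image lines via cross products. The example point pairs are invented for illustration:

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two homogeneous image points."""
    return np.cross(p, q)

# Two converging image lines (e.g., the two rails), as point pairs.
l1 = line_through([0.0, 0.0, 1.0], [1.0, 2.0, 1.0])
l2 = line_through([2.0, 0.0, 1.0], [2.5, 2.0, 1.0])

v = np.cross(l1, l2)          # homogeneous vanishing point
print(v[:2] / v[2])           # -> [4. 8.], the image intersection
```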

variable focus: 1) A camera system with a lens system that allows the zoom to be changed under user or program control. 2) An image sequence in which the focal length varies through the sequence.

variational approach: Signal processing expressed as a problem of variational calculus. The input signal is a function $I(t)$ on the interval $t \in [-1, 1]$. The processed signal is a function $P$ defined on the same interval, that minimizes an energy functional $E(P)$ of the form

$E(P) = \int_{-1}^{1} f(P(t), \dot{P}(t), I(t))\,dt$

The calculus of variations shows that the minimizing $P$ is the solution to the associated Euler–Lagrange equation

$\frac{\partial f}{\partial P} = \frac{d}{dt}\frac{\partial f}{\partial \dot{P}}$

In computer vision, the functional is often of the form

$E = \text{truth}(P, I) + \lambda\,\text{beauty}(P)$

where the "truth" term measures fidelity to the data and the "beauty" term is a regularizer. These can be seen in a specific example: smoothing. In the conventional approach, smoothing might be considered the result of an algorithm: convolve the image with a Gaussian kernel. In the variational approach, the smoothed signal $P$ is the signal that best trades off smoothness, measured as the square of the second derivative $\int \ddot{P}(t)^2\,dt$, and fidelity to the data, measured as the squared difference between the input and the output $\int (P(t) - I(t))^2\,dt$, with the balance chosen by a parameter $\lambda$:

$E(P) = \int (P(t) - I(t))^2 + \lambda \ddot{P}(t)^2\,dt$

variational method: See variational approach.

variational problems: See variational approach.
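A discrete sketch of the smoothing example above, assuming NumPy: after replacing the integrals by sums and the second derivative by second differences, the minimizer of $\sum_i (P_i - I_i)^2 + \lambda \sum_i (P_{i-1} - 2P_i + P_{i+1})^2$ is the solution of a linear system (written densely here for clarity):

```python
import numpy as np

def variational_smooth(I, lam=10.0):
    """Minimize sum (P - I)^2 + lam * sum (second difference of P)^2."""
    n = len(I)
    D = np.zeros((n - 2, n))          # second-difference operator
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    # Setting the gradient 2(P - I) + 2*lam*D^T D P to zero gives:
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, np.asarray(I, float))
```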
vector field: A multi-valued function $f: \mathbb{R}^n \to \mathbb{R}^m$. For example, the 2D-to-2D function $f(x, y) = (y, \sin \pi x)$, illustrated below. An RGB image $I(x, y) = (r(x, y), g(x, y), b(x, y))$ is an example of a 2D-to-3D vector field. [Figure: the vector field $(y, \sin \pi x)$ plotted as arrows]

vector quantization: Representation of a set of vectors by associating each possible vector with one of a small set of "codebook" vectors. For example, each pixel in an RGB image has $256^3$ possible values, but one might expect that a particular image uses only a small subset of these values. If a 256-element colormap is computed, and each RGB value is represented by the nearest RGB vector in the colormap, the RGB space has been quantized into 256 elements.
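A minimal sketch of vector quantization against a given codebook, assuming NumPy; computing the codebook itself (e.g., by clustering) is a separate step:

```python
import numpy as np

def quantize(pixels, codebook):
    """Replace each RGB pixel (N, 3) by its nearest codebook vector (K, 3)."""
    d = np.linalg.norm(pixels[:, None, :] - codebook[None, :, :], axis=2)
    return codebook[np.argmin(d, axis=1)]
```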
vehicle detection: An example of the object recognition problem where the task is to identify vehicles in video imagery.

vehicle license/number plate analysis: When a visual system locates the license plate in a video image and then recognizes the characters.

vehicle tracking: An example of the tracking problem applied to images of vehicles.

velocity: Rate of change of position. Generally, for a curve $\vec{x}(t) \in \mathbb{R}^n$ the velocity is the $n$-vector $\frac{\partial \vec{x}(t)}{\partial t}$.

velocity field: The image velocity of each point in an image. See also optical flow field.

velocity moment: A moment that integrates information about region velocity as well as position and shape distribution. Let $m^i_{pq}$ be the $pq^{\text{th}}$ central moment of a binary region in the $i^{\text{th}}$ image. Then the Cartesian velocity moments are defined as $v_{pqrs} = \sum_i (\bar{x}_i - \bar{x}_{i-1})^r (\bar{y}_i - \bar{y}_{i-1})^s\, m^i_{pq}$, where $(\bar{x}_i, \bar{y}_i)$ is the center of mass in the $i^{\text{th}}$ image.

velocity smoothness constraint: Changes in the magnitude or direction of an image's velocity field occur smoothly.

vergence: 1) The angle between the optical axes in a stereo system, when the two cameras fixate on the same scene point. 2) The difference between the pan angle settings of the two cameras.

vergence maintenance: The action of a control loop which ensures that the optical centers of two cameras (whose positions are under program control) are looking at the same scene point.

verification: In the context of object recognition, a class of algorithms aiming to test the validity of various hypotheses (models) explaining the data. Back projection is such a technique, typically used with geometric models. See also object verification.

vertex: A point at the end of a line (edge) segment. Often vertices are common to two or more line segments.

video: 1) Generic term for a set of images taken at successive instants with small time intervals between them. 2) The

analogue signal emitted by a video camera. Each frame of video corresponds to about 40 ms of electrical signal that encodes the start of each scan line, the image encoding of each video scan line, and synchronization information. 3) A video recording.

video annotation: The association of symbolic objects, such as text descriptions or index terms, with frames of video.

video camera: A camera that records a sequence of images over time.

video coding: The conversion of video to a digital bitstream. The source may be analogue or digital. Generally, coding also compresses or reduces the bitrate of the video data.

video compression: Video coding with the specific aim of reducing the number of bits required to represent a video sequence. Examples include MPEG, H.263, and DivX.

video indexing: Video annotation with the aim of allowing queries of the form "At what frame did event x occur?" or "Does object x appear?".

video rate system: A real time system that operates at the frame rate of the ambient video standard: typically 25 or 30 frames per second, 50 or 60 fields per second.

video restoration: Application of image restoration to video, often making use of the temporal coherence of video, or correcting for video-specific degradations.

video segmentation: Application of segmentation to video, 1) with the requirement that the segmentation exhibit the temporal coherence of the original footage, and 2) to split the video sequence into different groups of consecutive frames, e.g., when there is a change of scene.

video sequence: See video.

video transmission format: A description of the precise form of the analog video signal coding conventions in terms of the duration of components such as the number of lines, number of pixels, front porch, sync and blanking.

vidicon: A type of tube camera, successor of the image orthicon tube.

view based object recognition: Recognition of 3D objects using multiple 2D images of the objects rather than a 3D model.

view combination: A class of techniques combining prototype views linearly to form appearance models. See also appearance model, eigenspace based recognition, prototype, representation.

view volume: The infinite volume of 3D space bounded by the camera's center of projection and the edges of the viewable area on the image plane. The volume might also be bounded near and far
by other planes because of focusing and depth of field constraints. This figure illustrates the view volume: [Figure: the view volume, a pyramid between the center of projection and the viewable area on the image plane]

viewer centered representation: A representation of the 3D world that an observer (e.g., robot or human) maintains. In the viewer centered version, the global coordinate system is maintained on the observer, and the representation of the world changes as the observer moves. Compare object centered representation.

viewing space: The set of all possible locations from which an object or scene could be viewed. Typically these locations are grouped to give a set of typical or characteristic views of the object. If orthographic projection is used, then the full 3D space of views can be simplified to a viewsphere.

viewpoint: The position and orientation of the camera when an image was captured. The viewpoint may be expressed in absolute coordinates or relative to some arbitrary coordinate system, in which case the relative position of the camera and the scene (or other cameras) is the relevant quantity.

viewpoint consistency constraint: Lowe's term for the concept that a 3D model matched to a set of 2D line segments must admit at least one 3D camera position that projects the 3D model to those lines. Essentially, the 3D and 2D data must allow pose estimation.

viewpoint dependent representations: See viewer centered representation.

viewpoint planning: Deciding where an active vision system will look next, in order to maximize the likelihood of achieving some preset goal. A common example is computing the location of a range sensor in several successive positions in order to gain a complete 3D model of a target object. After n pictures have been captured, the viewpoint planning problem is to choose the position of picture n + 1 in order to maximize the amount of new data acquired, while ensuring that the new position will allow the new data to be registered to the n existing images.

viewsphere: The set of camera positions from which an object can be observed. If the camera is orthographic, the viewsphere is parameterized by the 2D set of points on the 3D unit sphere. At the camera position corresponding to a particular point on the viewsphere, all images of the object due to camera rotation are related by a 2D-to-2D image transformation,
i.e., no parallax effects occur. See aspect graph. The placement of a camera on the viewsphere is illustrated here: [Figure: a camera placed on the viewsphere surrounding an object]

vignetting: Darkening of the corners of an image relative to the image center, which is related to the degree to which the points are off the optical axis.

virtual bronchoscopy: Creation of virtual views of the pulmonary system based on, e.g., magnetic resonance imaging as a replacement for endoscope imaging.

virtual endoscopy: Simulation of a traditional endoscopy procedure using a virtual reality representation of physiological data such as that obtained by an X-ray CAT-scan or magnetic resonance imaging.

virtual reality: The use of computer graphics and other interaction tools to confer on a user the sensation of being in, and interacting with, an alternative environment. This includes simulation of visual, aural, and haptic cues. Common ways in which the visual environment is displayed are: rendering a 3D model of the world into a head-mounted display whose viewpoint is tracked in 3D so that the user's head movements generate images corresponding to their viewpoint; and placing the user in a computer augmented virtual environment (CAVE), where as much as possible of the user's field of view can be manipulated by the controlling computer.

virtual view: Visualization of a model from a particular viewpoint.

viscous model: A deformable model based on the concept of a viscous fluid (i.e., a fluid with a relatively high resistance to flow).

visible light: Description of electromagnetic radiation with wavelengths between about 400 nm (blue) and 700 nm (red), corresponding to the range to which the rods and cones of the human eye are sensitive.

visibility: Whether or not a particular feature is visible from a camera position.

visibility class: The set of points where exactly the same portion of an object or scene is visible. For example, when viewing the corner of a cube, an observer can move about in about one-eighth of the full viewing space before entering a new visibility class.

visibility locus: All camera positions from which a particular feature is visible.

VISIONS: The early scene understanding system of Hanson and Riseman.

visual attention: The process by which low level feature detection directs high level scene analysis and object recognition strategies. In humans, the results of the process are evident in the pattern of fixations and saccades in normal observation of the world.

visual cortex: A part of the brain dedicated to the processing of visual information.

visual hull: A space carving method for approximating shape from multiple images. The method finds the silhouette contours of a given object in each image. The region of space defined by each camera and the associated image contour imposes a constraint on the shape of the target object. The visual hull is the intersection of all such constraints. As more views are taken, the approximation becomes better. See the shaded areas in this figure: [Figure: the intersection of silhouette cones, shaded, approximating the object]

visual illusion: The perception of a scene, object or motion not corresponding to the world actually causing the image or sequence. Illusions are caused, in general, by the combination of special arrangements of the visual stimuli, viewing conditions, and responses of the human vision system. Well-known examples include the Ames room (two persons are seen as having very different heights in a seemingly normal room) and the Ponzo illusion: [Figure: the Ponzo illusion] Here two equal segments seem to be of different lengths (as interpreted as 3D projections). The well-known ambiguous figure–background drawings of Gestalt psychology (see Gestalt), like the famous chalice–faces pattern, are a related subject.

visual industrial inspection: The use of computer vision techniques in order to effect quality control or to control processes in an industrial setting.

visual inspection: A general term for analyzing a visual image to inspect some item, such as might be used for quality control on a production line.

visual learning: The problem of learning visual models from sets of images (examples), or in general knowledge that can be used to carry out vision tasks. An area of the vast field of automated learning. Important applications employing visual learning include face recognition and image database indexing. See also unsupervised learning, supervised learning.

visual localization: The problem of estimating the location of a target in space given one or more images of it. Solutions differ according to several factors including the number of input images (one, as in model based pose estimation; multiple discrete images, as in stereo vision; or video sequences, as in motion analysis) and the a priori knowledge assumed (i.e., camera calibration available or not, full perspective or simplified projection model, geometric model of target available or not).

visual navigation: The problem of navigating (steering) a robot through an environment using visual data, typically video sequences. It is possible, under diverse assumptions, to determine the distance from obstacles, the time-to-contact, and the shape and identity of the objects in view. Both video and range sensors have been used, including acoustic sensors (see sonar). See also visual servoing, visual localization.

visual routine: Ullman's 1984 term for a subcomponent of a visual system that performs a specific task, analogous to a behavior in robotics.

visual salience: A (numerical) assessment of the degree to which pixels or areas of a scene attract visual attention. See also the principles of Gestalt organization.

visual search: The task of searching an image for a particular prespecified object. Often used as an experimental tool in psychophysics.

visual servoing: Robot control via motions that make the image of, e.g., the robot end effector coincide with the image of the target position. Typically, the system has little or no a priori knowledge of the camera locations, their relation to the robot, or the robot kinematics. These parameters are learned as the robot moves. Visual servoing allows the calibration to change during robot operation. Such systems can adapt well to anomalous conditions, such as an arm bending under a load or motor slippage, or where calibration may not provide sufficient precision to allow the desired actions to be reliably produced purely from the modeled robot kinematics and dynamics. Because

only image measurements are available, the inverse kinematic problem may be harder than in conventional servoing.

visual surveillance: Surveillance dependent only on the use of electromagnetic sensors.

volume: 1) A region of 3D space. A subset of $\mathbb{R}^3$. A (possibly infinite) 3D point set. 2) The space bounded by a closed surface.

volume detection: The detection of volume-shaped entities in 3D data sets, such as might be produced by a nuclear magnetic resonance scanner.

volume matching: Identification of correspondence between objects or subsets of objects defined using a volumetric representation.

volume skeletons: The skeletons of 3D point sets, by extension of the definitions for 2D curves or regions.

volumetric image: A voxmap or 3D array of points where each entry typically represents some measure of material density or other property in 3D space. Common examples include computerized tomography and nuclear magnetic resonance data.

volumetric reconstruction: Any of several techniques that derive a volumetric representation from image data. Examples include X-ray tomography, space carving and visual hull computation.

volumetric representation: A data structure by means of which a subset of 3D space is represented digitally. Examples include voxmap, octree and the space bounded by surface representations.

Voronoi cell: See Voronoi diagram.

Voronoi diagram: Given $n$ points $\vec{x}_{1..n}$, the Voronoi diagram of the point set is a partition of space into $n$ regions or cells $R_{1..n}$. Every point $\vec{p}$ in cell $R_i$ is closer to point $\vec{x}_i$ than to any other $\vec{x}_j$. The hyperplanes separating the Voronoi regions are the perpendicular bisectors of the edges in the Delaunay triangulation of the point set. The Voronoi diagram of these four points consists of the four cells surrounding them: [Figure: four points and their Voronoi cells]

voxel: From "volume element", by analogy with "pixel": a region of 3D space. Usually voxels are axis-aligned rectangular solids or cubes. A component of the voxmap representation for 3D volumes. A voxel, like a pixel, may have associated attributes such as

color, occupancy, or the density of some measurement at that point.
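A small sketch of the Voronoi diagram defined above, assuming SciPy's computational geometry module; the four input points echo the four-cell example:

```python
import numpy as np
from scipy.spatial import Voronoi

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vor = Voronoi(points)
print(vor.vertices)       # Voronoi vertices (cell corners)
print(vor.ridge_points)   # pairs of input points separated by each ridge
```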
voxmap: A volumetric representation that describes a 3D volume by dividing space into a regular grid of voxels, arranged as a 3D array $v(i, j, k)$. For a boolean voxmap, cell $(i, j, k)$ intersects the volume iff $v(i, j, k) = 1$. The advantages of the representation are that it can represent arbitrarily complex topologies and is fast to look up. The major disadvantage is the large memory usage, addressed by the octree representation.

VRML: Virtual Reality Markup Language. A means of defining 3D geometric models intended for Internet delivery.
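A minimal sketch of the boolean voxmap described above, assuming NumPy; the grid size and the test shape (a ball) are arbitrary:

```python
import numpy as np

# A boolean voxmap: cell (i, j, k) intersects the volume iff v[i, j, k].
v = np.zeros((64, 64, 64), dtype=bool)

# Mark a solid ball of radius 20 voxels centered in the grid.
i, j, k = np.indices(v.shape)
v[(i - 32)**2 + (j - 32)**2 + (k - 32)**2 <= 20**2] = True

# Constant-time lookup is the representation's main advantage.
print(v[32, 32, 32], v[0, 0, 0])   # True False
```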

W

walkthrough: A classification of the infinite number of paths between two points into one of nine equivalence classes: the eight relative directions between the points, plus a ninth having no movement. Point B is in equivalence class 2 relative to A: [Figure: the eight direction classes 1–8 arranged around A, with B in direction 2]

Walsh function: The Walsh functions of order $n$ are a particular set of square waves $W_{n,k}: [0, 2^n] \to \{-1, 1\}$ for $k$ from 1 to $2^n$. They are orthogonal, and the product of Walsh functions is a Walsh function. The square waves transition only at integer lattice points, so each function can be specified by the vector of values it takes on the points $\{\frac{1}{2}, 1\frac{1}{2}, \ldots, 2^n - \frac{1}{2}\}$. The collection of these values for a given order $n$ is the Hadamard matrix $H_{2^n}$ of order $2^n$. The two functions of order 1 are the rows of

$H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$

and the four of order 2 (depicted below) are the rows of

$H_4 = \begin{pmatrix} H_2 & H_2 \\ H_2 & -H_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$

In general, the functions of order $n + 1$ are generated by the relation

$H_{2^{n+1}} = \begin{pmatrix} H_{2^n} & H_{2^n} \\ H_{2^n} & -H_{2^n} \end{pmatrix}$

and this recurrence is the basis of the fast Walsh transform. The four Walsh functions of order 2 are: [Figure: the four order-2 Walsh functions plotted as square waves on $[0, 4]$]
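A minimal sketch of the Hadamard recurrence above, assuming NumPy:

```python
import numpy as np

def hadamard(n):
    """Return the 2^n x 2^n Hadamard matrix of Walsh-function values."""
    H = np.array([[1]])
    for _ in range(n):
        H = np.block([[H, H], [H, -H]])   # H_{2^{n+1}} from H_{2^n}
    return H

print(hadamard(2))   # rows are the four order-2 Walsh functions
```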

Walsh transform: Expression of a $2^n$-element vector $\vec{v}$ in terms of a basis of order-$n$ Walsh functions; the multiplication by the corresponding Hadamard matrix. The Walsh transform has applications in image coding, logic design and the study of genetic algorithms.

Waltz line labeling: A scheme for the interpretation of line images of polyhedra in blocks-world images. Each image line is labeled to indicate what class of scene edge gave rise to it: concave, convex, occluding, crack or shadow. By including the constraints supplied by junction labeling in a constraint satisfaction problem, Waltz demonstrated that collections of lines whose labels were locally ambiguous could be globally disambiguated. This is a simple example of Waltz line labeling, showing concave edges (−), convex edges (+) and occluding edges (>): [Figure: two labeled block drawings, (a) and (b)]

warping: Transformation of an image by reparameterization of the 2D plane. Given an image $I(\vec{x})$ and a 2D-to-2D mapping $w: \vec{x} \to \vec{x}'$, the warped image $W(\vec{x})$ is $I(w(\vec{x}))$. Warping functions $w$ are often designed so that certain control points $\vec{p}_{1..n}$ in the source image are mapped to specified locations $\vec{p}'_{1..n}$ in the destination image. See also image morphing. [Figure: the original image $I(\vec{x})$; the warping function represented by arrows joining points $\vec{x}$ to $w^{-1}(\vec{x})$; the warped image $W(\vec{x})$]

watermark: See digital watermarking.

watershed segmentation: Image segmentation by means of the watershed transform. A typical implementation proceeds thus: 1. Detect edges; 2. Compute the distance transform D of the edges; 3. Compute watershed regions in −D. [Figure: (a) Original image; (b) Canny edges; (c) Distance transform; (d) Region boundaries of the watershed transform of (c); (e) Mean color in watershed regions; (f) Regions overlaid on image]
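A rough sketch of the three-step recipe above, assuming SciPy and scikit-image are available; the marker-finding step (local maxima of D) is one common heuristic, not part of the definition:

```python
import numpy as np
from scipy import ndimage
from skimage import feature, segmentation

def watershed_segment(gray):
    # 1. Detect edges (Canny is one common choice).
    edges = feature.canny(gray)
    # 2. Distance transform D: distance of each pixel from the edges.
    D = ndimage.distance_transform_edt(~edges)
    # 3. Watershed regions in -D; minima of -D are points farthest
    #    from any edge, used here as region markers.
    markers, _ = ndimage.label(D == ndimage.maximum_filter(D, size=5))
    return segmentation.watershed(-D, markers)
```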
watershed transform: A tool for morphological image segmentation. The watershed transform views the image as an elevation map, with each local minimum in the map given a unique integer label. The watershed transform of the image assigns to each non-minimum pixel, p, the label of the minimum to which a drop of water would fall if placed at p. Points on "ridges" or watersheds of the elevation map, which could fall into one of two minima, are called watershed points, and the set of pixels surrounding each minimum that share its label are called watershed regions. Efficient algorithms exist for the computation of the watershed transform. [Figure: an image with minima superimposed; the same image viewed as a 3D elevation map; and the watershed transform of the image, where different minima have differently colored regions, watershed pixels are shown in white, and one particular watershed is indicated by arrows]

wavelength: The wavelength of a wave is the distance between successive peaks. Denoted $\lambda$, it is the wave's speed divided by the frequency. Electromagnetic waves, particularly visible light, are often important in computer vision, with wavelengths of the order of 400–700 nm.

wavelet: A function $\psi(x)$ that has certain properties that mean it can be used to derive a set of basis functions in terms of which other functions can be approximated. Comparing to the Fourier transform basis functions, note that they can be viewed as a set of scalings and translations of $f(x) = \sin \pi x$; for example $\cos 3\pi x = \sin(3\pi x + \frac{\pi}{2}) = f(\frac{6x + 1}{2})$. Similarly, a wavelet basis is made from a mother wavelet $\psi(x)$ by translating and scaling: each basis function $\psi_{jk}(x)$ is of the form $\psi_{jk}(x) = \text{const} \cdot \psi(2^{-j}x - k)$. The conditions on $\psi$ ensure that different basis functions (i.e., with different $j$ and $k$) are orthonormal. There are several popular choices (e.g., by Haar and Daubechies) for $\psi$,
that trade off various desirable properties, such as compactness in space and time, and ability to approximate certain classes of functions. [Figure: the mother Haar wavelet and some of the derived wavelets $\psi_{jk}$]

wavelet descriptor: Description of a shape in terms of the coefficients of a wavelet decomposition of the original signal, in a manner similar to Fourier shape descriptors for 2D curves. See also wavelet transform.

wavelet transform: Representation of a signal in terms of a basis of wavelets. Similar to the Fourier transform, but as the wavelet basis is a two-parameter family of functions $\psi_{jk}$, the wavelet transform of a $d$-D signal is a $(d+1)$-D function. However, the number of distinct values needed to represent the transform of a discrete signal of length $n$ is just $O(n)$. The wavelet transform has similar applications to the Fourier transform, but the wavelet basis offers advantages when representing natural signals such as images.

weak perspective: An approximation of viewing geometry between the pinhole or full perspective camera and the orthographic imaging model. The projection of a homogeneous 3D point $X = (X, Y, Z, 1)$ is given by the formula

$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \end{pmatrix} X$

as for the affine camera, but with the additional constraint that the vectors $(p_{11}, p_{12}, p_{13})$ and $(p_{21}, p_{22}, p_{23})$ are scaled rows of a rotation matrix, i.e., $p_{11}p_{21} + p_{12}p_{22} + p_{13}p_{23} = 0$.

weakly calibrated stereo: Any two-view stereo algorithm for which the only calibration information needed is the fundamental matrix between the cameras is said to be weakly calibrated. In the general, multi-view, case, it means the camera calibration is known up to a projective ambiguity. Weakly calibrated systems cannot determine Euclidean properties such as absolute scale but will return results that are projectively equivalent to the Euclidean reconstructions.

Weber's Law: If a difference can be just perceived between two stimuli of values $I$ and $I + \Delta I$, then it should be possible to perceive a difference between two stimuli with different values $J$ and $J + \Delta J$ where $\frac{\Delta I}{I} \leq \frac{\Delta J}{J}$.

weighted least squares: A least square error estimation process in which the data elements

also have a weight associated. The weights might specify the confidence or quality of the data item. The use of weights can help make the estimation more robust.

weighted walkthrough: A discrete measure of the relative position of two regions. The measure is a histogram of the walkthrough relative positions of every pair of points selected from the two regions.

weld seam tracking: Using visual feedback to control a robot welding device, so it maintains the weld along the desired seam.

white balance: A system of color correction to deal with differing light conditions, in order for white objects to appear white.

white noise: A noise process in which the noise power at all frequencies is equal (as compared to pink noise). When considering spatially distributed noise, white noise means that there is distortion at all spatial frequencies (i.e., large distortions as well as small).

whitening filter: See noise-whitening filter.

wide angle lens: A lens with a field of view greater than about 45°. Wide angle lenses allow more information to be collected in a single image, but often suffer a loss of resolution, particularly at the periphery of the image. Wide angle lenses are also more likely to require correction for nonlinear lens distortion.

wide baseline stereo: The stereo correspondence problem (sense 1) in the particular case when the two images for which correspondence is to be determined are significantly different because the cameras are separated by a long baseline. In particular, a 2D window around a point in one image is expected to look significantly different in the second image due to foreshortening, occlusion, and lighting effects.

wide field-of-view: Where the optics is designed to capture light rays forming large angles (say 60° or more) with the optical axis. See also wide angle lens, panoramic image mosaic, panoramic image stereo, plenoptic function representation.

width function: Given a 2D shape (a closed subset of the plane) $S \subset \mathbb{R}^2$, the width function $w(\theta)$ is the width of the shape as a function of orientation. Specifically, with the projection $P_\theta = \{x \cos\theta + y \sin\theta \mid (x, y) \in S\}$, the width is $w(\theta) = \max P_\theta - \min P_\theta$.

Wiener filter: A regularized inverse convolution filter. Given a signal $g$ that is known to be the convolution of an unknown signal $f$ and a known corrupting signal $k$, it is desired to undo the effect of $k$ and recover $f$. If $F, G, K$ are the respective Fourier transforms of $f, g, k$, then $G = F \cdot K$, so the inverse filter can recover $F = G \div K$. In practice, however, $G$

is corrupted by noise, so that when an element of $K$ is less than the average noise level, the noise is amplified. Wiener's filter combats this tendency by adding an estimate of the noise to the divisor. Because the divisor is complex, a real formulation is as follows:

$F = \frac{G}{K} = \frac{GK^*}{KK^*} = \frac{GK^*}{|K|^2}$

and adding the frequency domain noise estimate $N$, we obtain the Wiener reconstruction of $F$ given $G$ and $K$:

$F = \frac{GK^*}{|K|^2 + N}$

windowing: Looking at a small portion of a signal or image through a "window". For example, given the vector $x = (x_1, \ldots, x_{100})$, one might look at the window of 11 values centered around 50, $x_{45..55}$. Often used in order to restrict some computation such as the Fourier transform to a small part of the image. In general, windowing is described by a windowing function, which is multiplied by the signal to give the windowed signal. For example, a signal $f(\vec{x}): \mathbb{R}^n \to \mathbb{R}$ and windowing function $w_\sigma(\vec{x})$ are given, where $\sigma$ controls the scale or width of $w$. Then the windowed signal is

$f_w(\vec{x}) = f(\vec{x})\, w_\sigma(\vec{x} - \vec{c})$

where $\vec{c}$ is the center of the window. The Bartlett ($1 - |x|$), Hanning ($\frac{1}{2} + \frac{1}{2}\cos \pi x$), and Gaussian ($\exp(-\frac{x^2}{2})$) windowing functions in 2D are shown here: [Figure: the Bartlett, Hanning and Gaussian windowing functions in 2D]

winged edge representation: A graph representation for polyhedra in which the nodes represent vertices, edges and faces. Faces point to bounding edge nodes, which point to vertices, which point back to connecting edges, which point to adjacent faces. The winged edge term comes from the fact that edges have four links that connect to the previous and successor edges around each of the two faces that contain the given edge, as seen here: [Figure: the current edge with its four linked edges around faces 1 and 2]

winner-takes-all: A strategy whereby only the best candidate (e.g., algorithm, solution) is chosen, and any other is abandoned. Commonly found in the neural network and learning literature.

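A minimal sketch of the Wiener reconstruction defined above, assuming NumPy's FFT, a known corrupting kernel k, and a scalar frequency-domain noise estimate N:

```python
import numpy as np

def wiener_deconvolve(g, k, N=0.01):
    """Recover F = G K* / (|K|^2 + N) and invert the transform."""
    G = np.fft.fft2(g)
    K = np.fft.fft2(k, s=g.shape)   # zero-pad the kernel to image size
    F = G * np.conj(K) / (np.abs(K) ** 2 + N)
    return np.real(np.fft.ifft2(F))
```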
wire frame representation: A representation of 3D geometry in terms of vertices and edges linking the vertices. It does not include descriptions of the surface between the edges, and in particular, does not include information for hidden line removal. This is a wire frame model of a cube: [Figure: a wire frame cube]

world coordinates: A coordinate system useful for placing objects in a scene. Usually this is a 3D coordinate system with some arbitrarily placed origin (e.g., at a corner of a room). This contrasts with object centered representations, viewer centered representations or camera coordinates. [Figure: a scene with world coordinate axes at a room corner]

X

X-ray: Electromagnetic radiation of shorter wavelength than ultraviolet light, i.e., less than about 4–40 nm. Very short X-rays are called gamma rays. Useful for medical imaging because of their power to penetrate most materials, and for other areas such as lithography because of the short wavelength.

X-ray CAT/CT: Computed axial tomography or computer-assisted tomography. A technique for dense 3D imaging of the interior of a material, particularly the human body. Characterized by use of an X-ray source and imaging system that rotate round the object being scanned.

xnor operator: A combination of two binary images A, B where each pixel (i, j) in A xnor B is 0 if exactly one of A(i, j) and B(i, j) is 1. The output is the complement of the xor operator. The rightmost image is the xnor of the two left images: [Figure: two binary input images and their xnor]

xor operator: A combination of two binary images A, B where each pixel (i, j) in A xor B is 1 if exactly one of A(i, j) and B(i, j) is 1. The output is the complement of the xnor operator. The rightmost image is the xor of the two left images: [Figure: two binary input images and their xor]
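A minimal sketch of both operators on boolean arrays, assuming NumPy; the tiny test images are invented:

```python
import numpy as np

A = np.array([[1, 0], [1, 1]], dtype=bool)
B = np.array([[0, 0], [1, 0]], dtype=bool)

xor = np.logical_xor(A, B)    # 1 where exactly one input is 1
xnor = ~xor                   # complement of xor
print(xor.astype(int), xnor.astype(int), sep="\n")
```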

Y

YARF: Yet Another Road Follower. A Carnegie-Mellon University autonomous driving system.

yaw: A 3D rotation representation component (along with pitch and roll) often used for cameras or moving observers. The yaw component specifies a rotation about a vertical axis to give a side-to-side change in orientation. This figure shows the yaw rotation direction: [Figure: the yaw direction, a rotation about the vertical axis]

YCrCb: See YUV, where U = Cr and V = Cb.

YIQ: Color space used in NTSC television. Separates Luminance (Y) and two color signals: In-phase (roughly orange/blue), and Quadrature (roughly purple/green). Conversion to YIQ from RGB is by $(Y, I, Q) = (R, G, B)\,M$ where

$M = \begin{pmatrix} 0.299 & 0.596 & 0.212 \\ 0.587 & -0.275 & -0.523 \\ 0.114 & -0.321 & 0.311 \end{pmatrix}$

YUV: A color representation system in which each point is represented by luminance (Y) and two chrominance channels (U, which is Red minus Y, and V, which is Blue minus Y).
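A minimal sketch of the RGB-to-YIQ conversion above, assuming NumPy and treating (R, G, B) as a row vector multiplied by M:

```python
import numpy as np

M = np.array([[0.299,  0.596,  0.212],
              [0.587, -0.275, -0.523],
              [0.114, -0.321,  0.311]])

rgb = np.array([1.0, 1.0, 1.0])   # white
print(rgb @ M)                    # -> Y = 1, I = 0, Q = 0
```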

Z

Zernike moment: The dot product of an image with one of the Zernike polynomials. The Zernike polynomial

$U_{nm}(\rho, \phi) = R_{nm}(\rho)\, e^{im\phi}$

is defined in polar coordinates $(\rho, \phi)$ on the plane, only within the unit disk. When projecting an image, data outside the unit disk are generally ignored. The real and imaginary parts are called the even and odd polynomials respectively. The radial function $R_{nm}(t)$ is given by

$R_{nm}(t) = \sum_{l=0}^{(n-m)/2} (-1)^l\, \frac{(n-l)!}{l!\,\left(\frac{n+m}{2} - l\right)!\,\left(\frac{n-m}{2} - l\right)!}\, t^{n-2l}$

The Zernike polynomials have a history in optics, as basis functions for modeling nonlinear lens distortion. Below, the leftmost column shows the real and imaginary parts of $e^{im\phi}$ for $m = 1$. Columns 2–4 show the real and imaginary parts of Zernike polynomials $U_{11}$, $U_{31}$, and $U_{22}$: [Figure: real and imaginary parts of $e^{i\phi}$ and of $U_{11}$, $U_{31}$, $U_{22}$]

zero crossing operator: A class of feature detector that, rather than detecting maxima in the first derivative, detects zero crossings in the second derivative. An advantage of finding zero crossings rather than maxima is that the edges always form closed curves, so that regions are clearly delineated. A disadvantage is that noise is enhanced, so the image must be carefully smoothed before the second derivative is computed. A common kernel that combines smoothing and second derivative computation is the Laplacian of Gaussian.

zero crossings of the Laplacian of a Gaussian: See zero crossing operator.

zipcode analysis: See postal code analysis.

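A minimal sketch of the zero crossing operator described above, assuming SciPy's Laplacian-of-Gaussian filter; sign changes are detected between horizontal and vertical neighbors:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def zero_crossings(image, sigma=2.0):
    """Mark pixels where the Laplacian of Gaussian changes sign."""
    log = gaussian_laplace(image.astype(float), sigma)
    zc = np.zeros(image.shape, dtype=bool)
    zc[:, :-1] |= (log[:, :-1] * log[:, 1:]) < 0   # along rows
    zc[:-1, :] |= (log[:-1, :] * log[1:, :]) < 0   # along columns
    return zc
```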
zoom: 1) To change the effective focal length of a camera in order to increase magnification of the center of the field of view. 2) Used in referring to the current focal-length setting of a zoom lens.

zoom lens: A lens that allows the effective focal length (or "zoom") to be varied after manufacture. Zoom lenses may be manipulated manually or electrically.
Zucker–Hummel operator: A convolution kernel for surface detection in volumetric images. There is one $3 \times 3 \times 3$ kernel for each of the three derivatives. For example, if $v(x, y, z)$ is the volume image, $\frac{\partial v}{\partial z}$ is computed as the convolution with the kernel $c = (-S, 0, S)$, where $S$ is the 2D smoothing kernel

$S = \begin{pmatrix} a & b & a \\ b & 1 & b \\ a & b & a \end{pmatrix}$

and $a = 1/\sqrt{3}$ and $b = 1/\sqrt{2}$. Specifically, the kernel $D_z(i, j, k) = S(i, j)\,c(k)$, and the kernels for $\frac{\partial v}{\partial x}$ and $\frac{\partial v}{\partial y}$ are permutations of $D_z$ given by $D_x(i, j, k) = D_z(j, k, i)$ and $D_y(i, j, k) = D_z(k, i, j)$.
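A small sketch building the three kernels from the definitions above, assuming NumPy; the axis permutations implement the index relations given:

```python
import numpy as np

a, b = 1 / np.sqrt(3), 1 / np.sqrt(2)
S = np.array([[a, b, a], [b, 1, b], [a, b, a]])   # 2D smoothing kernel
c = np.array([-1.0, 0.0, 1.0])                    # (-S, 0, S) weights

Dz = S[:, :, None] * c[None, None, :]   # Dz(i,j,k) = S(i,j) c(k)
Dx = np.transpose(Dz, (2, 0, 1))        # Dx(i,j,k) = Dz(j,k,i)
Dy = np.transpose(Dz, (1, 2, 0))        # Dy(i,j,k) = Dz(k,i,j)
```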
Zuniga–Haralick operator: A corner detection operator that is based on the coefficients of a cubic polynomial approximating the local neighborhood.
