
LECTURE 1

Introduction, image representation and image analysis
Computer Vision
• Vision – allows us to perceive and understand the world around us

• What is computer vision?


• Duplicate human vision by a machine (computer)
• Electronic perception and understanding of an image

• Giving computers the ability to see - Computer Vision


• Computer vision – simulating human vision or mimicking human
vision?
Major challenge in Computer Vision
• Our world is 3D
• What about visual sensors (e.g., cameras)?
• Usually give 2D images
• Projection of 3D world to 2D images - loss of information
• Advanced 3D visual sensors available
• e.g., terahertz scans
• Analyzing 3D images is more complicated than 2D
• Motion makes CV more complex
• Moving object or moving camera
Model development
• Model has to accommodate multiple variations

• Multiple models to learn variations

• Typical operations in model development


• Image capture
• Pre-processing
• Segmentation
• Model fitting
• Motion prediction
• Qualitative/quantitative conclusion
• Different algorithms available for each operation
Why is CV difficult?
• 3D → 2D: loss of information
• Guessing the actual size of an object requires an additional yardstick
• Interpretation – use of previous knowledge/experience to understand
an image
Image interpretation: image data → model
• Noise – inherently present in real world measurements
• Noise adds uncertainty
• Tools required to deal with uncertainty – probabilistic tools
• Too much data – Images/video data is big
• Processor and memory requirements
• Real time performance is a challenge
Why is CV difficult? (…contd)
• Brightness – complex image formation physics
• Reconstructing surface orientation from intensity variations is ill-posed
• Local window vs need for global view
• Local elements (like pixel) analyzed by image analysis algorithm
• Global understanding is difficult with local observations

[Figure: local observations vs. the global view]
Image representation and image analysis tasks
• Image understanding
Input image → Model of the image
• Image model
• Reduces information in the image
• Retains only “relevant” information based on application
• Raw image → … intermediate representations … → image interpretation
• Intermediate representations – designed as part of the CV system
• Establish and maintain relations between entities within and between layers
Image representation
• Four levels of image representation
• From raw image to image with features
• Understanding objects from features
• Input image – little or no abstraction
• Image with features – highly abstract description
Image representation
• Broad categories of image representation
• Low level image processing
• High level image understanding
Low level processing
• Knowledge of image content is not used
• Low level processing involves procedures like –
• Image compression
• Noise filtering
• Edge extraction
• Image sharpening etc.
• Image representation on a computer
• Digitized representation in the form of a matrix
• RGB channels for color representation
• A set of matrices (images) for video data
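
As a small illustration of this representation (a sketch assuming NumPy and 8-bit grayscale data; the pixel values are made up):

import numpy as np

# A tiny 4 x 4 grayscale "image": each element is a quantized
# brightness level (0 = black, 255 = white for 8-bit data).
img = np.array([[  0,  64, 128, 255],
                [ 32,  96, 160, 224],
                [ 64, 128, 192, 255],
                [  0,  32,  64,  96]], dtype=np.uint8)

print(img.shape)   # (4, 4) -> (rows, columns)
print(img[1, 2])   # brightness at row 1, column 2 -> 160

# A color image adds a third axis for the R, G, B channels;
# video is then a sequence of such matrices (frames).
rgb = np.zeros((4, 4, 3), dtype=np.uint8)        # one RGB frame
video = np.zeros((10, 4, 4, 3), dtype=np.uint8)  # 10 RGB frames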
High level processing
• AI methods come under HLP
• For understanding image content
• Make decisions based on understanding of image content

[Diagram: Input image → Low level processing → Image with features /
Image Model → HLP → Perceived reality, which is compared with a World
model; if differences appear, feedback updates the image model]
Image representation - example

[Figure: actual image and the brightness representation of the image –
the machine only sees an array of numbers]
Image representation - example
• Machine only sees an array of numbers
• To understand these arrays of numbers, the following are required:
• General knowledge
• Domain specific knowledge
• Information extracted from the image, etc.

[Figure: brightness representation of the image]
LLP vs HLP
Low Level Processing:
• Image content is not used
• Low level data from the original image is used
• Overlaps with image processing techniques – digitization, noise
filtering, edge extraction, segmentation etc.

High Level Processing:
• Image content is used – object size, shape, mutual relations between
objects etc.
• Features relevant to the end goal are extracted and used
• Focuses on image understanding and decision making
Summary
• CV is the science of mimicking biological vision systems
• What routine tasks performed by biological vision systems would be
good to accomplish by machines?
• Image representation, LLP and HLP algorithms used depend on the
application (end goal) – e.g., autonomous vehicle navigation, object
tracking, medical diagnosis etc.
• Modeling human visual system requires understanding our own brain.
CV tasks

[Figure: CV tasks and their algorithmic components]
Lecture 2
Image representation, digitization and properties
Image representation – some basics
• Mathematical models are used to describe signals – e.g., images
• Signal – ’function’ of one or more variables with physical meaning
• Signals can be 1D, 2D, 3D or higher dimensional
• Functions can be continuous, discrete or digital
• Continuous – domain and range are both continuous
• Discrete – domain is discrete, range is continuous
• Digital - domain and range are both discrete
Image representation – some basics (contd.)
• Monochromatic and color images – usually in the visible spectrum
• Images acquired outside the visible spectrum (but within EM
spectrum)
• Infra-red images
• Microwave imaging

• Images acquired outside EM spectrum (medical domain in particular)


• Magnetic Resonance images
• X-ray computed tomography (CT) images
• Ultrasound images
• Frequently, these images are of 3 or more dimensions
Continuous image function
• Image function – maps brightness at image points
• Why brightness?
• Integrates different optical quantities
• Avoids complicated image formation processes
• Image based on brightness points is called ‘intensity image’
• 2D representation of 3D scene – loss of information
• Can a 3D scene be reconstructed from a 2D image? – ill-posed problem
• Attempt to find depth of points in an image
• Brightness depends on multiple factors – material, reflectance,
illumination, orientation of light source etc.
• Reconstruction of 3D geometry of an object from an ‘intensity image’ is also ill-posed
problem
Continuous image function
• Monochromatic static image represented as continuous image
function f(x,y). (x,y) are coordinates in a plane
• For computerized image processing, x, y ∈ N (natural numbers)
• The image function is defined on a region R in the plane
R = {(x, y) : 1 ≤ x ≤ x_max, 1 ≤ y ≤ y_max}
• Orientation – Cartesian fashion (horizontal x-axis, vertical y-axis,
origin at bottom-left)
• Orientation in matrix form also used (row, column, origin top-left)
• Range of values for f(x,y) is also limited – lowest brightness value is
black, highest is white, intermediate values represent gray-scale
Image quality
• Image quality is proportional to four types of resolutions
• Spatial resolution – no. of image samples in image plane (e.g., no. of
pixels per unit area)
• Spectral resolution – bandwidth of light frequencies captured by sensor
• Radiometric resolution – no. of distinguishable gray levels/color levels
• Time resolution – interval between time samples (samples per second)
• Time resolution important for dynamic images – time sequences of
images are processed
• For mathematical transforms, f(x,y) is considered ‘well behaved’
• f(x,y) is integrable
• f(x,y) has an invertible Fourier transform, etc.
Image digitization
• Captured image is a continuous function f(x,y) of two coordinates in a
plane
• For computer processing of image, appropriate representation is
required
• Digitization involves two steps performed on the continuous function
f(x,y)
• Sampling
• Quantization
Image digitization - Sampling
• f(x,y) is sampled into a matrix with M rows and N columns
• Finer sampling achieves better approximation of the continuous
image function f(x,y)
• Larger the value of M and N, better the approximation
• Standard TV uses 512 x 512 grid
• HDTV 1920 x 1080 pixels
• What should be the sampling period (sampling interval)?
• What should be the geometric arrangement of sampling points
(sampling grid)?
Sampling period
• f(x,y) is continuous image function; x and y are continuous
• f(x,y) is sampled at points x = jΔx, y = kΔy for j = 1, 2, …, M and
k = 1, 2, …, N
• Δ𝑥 and Δ𝑦 are distances between neighbouring sample points along x
and y axes respectively; Δ𝑥 and Δ𝑦 are called sampling intervals or
sampling periods
• Discrete (Sampled) image is 𝑓(𝑗Δ𝑥, 𝑘Δ𝑦)
• How to choose the sampling periods Δx and Δy?
• Choice of Δx and Δy depends on the maximum frequency components
along the x and y axes
Shannon’s sampling theorem
• Δ𝑥 and Δ𝑦 chosen based on Shannon’s sampling theorem
Δx < 1/(2U) and Δy < 1/(2V), where U and V are the maximal frequencies
along the x and y dimensions respectively
• Sampling period should be chosen such that it is less than half of the
smallest interesting detail in the image.
• In practice, sampling period about 10 times smaller than that indicated
by Shannon theorem is used
! !
• What happens if this condition is violated? i.e. if Δ𝑥 > and Δ𝑦 >
"# "$
Shannon’s sampling theorem
• What happens if this condition is violated? i.e. if Δx > 1/(2U) and
Δy > 1/(2V)
• Aliasing occurs - causes distortion of image
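
A minimal 1-D sketch of the same effect (assuming NumPy; the frequency and sampling intervals are hypothetical):

import numpy as np

f = 10.0                        # hypothetical signal frequency, 10 Hz
dt_max = 1 / (2 * f)            # Shannon: dt must be < 0.05 s

t_ok = np.arange(0, 1, dt_max / 10)   # ~10x finer, as used in practice
t_bad = np.arange(0, 1, 0.08)         # dt = 0.08 s violates the theorem

x_ok = np.sin(2 * np.pi * f * t_ok)
x_bad = np.sin(2 * np.pi * f * t_bad)

# With dt = 0.08 s (fs = 12.5 Hz) the 10 Hz tone aliases to
# |f - fs| = 2.5 Hz: the samples trace a slow, distorted wave.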
Geometric arrangement of sampling points
• Sampling points arranged in a grid
• Square or hexagonal grids are used in practice

• Pixel – infinitely small sampling point in the grid


• All pixels together constitute the entire image
Quantization
• Each sample 𝑓(𝑗Δ𝑥, 𝑘Δ𝑦) of the image can have a brightness value
from a continuous range of brightness levels
• Brightness range is divided into K quantization levels
• Each sample 𝑓(𝑗Δ𝑥, 𝑘Δ𝑦) brightness is approximated to the nearest
quantization level – brightness range is discretized
• Quantization introduces error – should be kept minimum so as not to
be perceptible
• Number of quantization levels should be sufficiently high – shading
details of image are lost if K is too low
Quantization (contd.)
• If ‘b’ bits are used to represent each quantized brightness level, then
the total number of quantization levels is
K = 2^b or b = log₂(K)
• If K is increased, b also increases
• Note the appearance of false contours when K is reduced in the figure
• Note: quantized brightness levels are natural numbers
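
A minimal sketch of requantizing to K = 2^b levels (assuming NumPy and 8-bit input; mapping each pixel to its bin midpoint is one common choice, not prescribed by the slides):

import numpy as np

def quantize(img, b):
    # Requantize an 8-bit image to K = 2**b brightness levels,
    # mapping each pixel to the midpoint of its quantization bin.
    K = 2 ** b
    step = 256 / K
    levels = np.floor(img / step)          # bin index 0 .. K-1
    return (levels * step + step / 2).astype(np.uint8)

ramp = np.linspace(0, 255, 256).astype(np.uint8).reshape(16, 16)
for b in (8, 4, 2, 1):                      # K = 256, 16, 4, 2
    print(b, np.unique(quantize(ramp, b)).size)

With b = 1 only two output levels survive; on a smooth ramp this is exactly where the false contours become visible.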
Lecture 3
Digital image properties
Introduction
• Digital image consists of pixels of finite size
• Pixels contain brightness of a particular location in an image
• Pixels usually arranged in a rectangular grid – represented by 2D
matrix
• Elements of the matrix are natural numbers corresponding to
quantized brightness levels
Metric properties - Distance
• Any function D is a ‘distance’ if it satisfies 3 conditions
D(p, q) ≥ 0; D(p, q) = 0 if and only if p = q (identity)
D(p, q) = D(q, p) (symmetry)
D(p, r) ≤ D(p, q) + D(q, r) (triangle inequality)
Distance - definitions
• Euclidean distance D_E between two points (i, j) and (h, k):
D_E[(i, j), (h, k)] = √((i − h)² + (j − k)²)
• D_E is intuitive but involves costly calculation due to the square root
• Manhattan (or city block) distance D_4 → only horizontal and vertical
moves are allowed:
D_4[(i, j), (h, k)] = |i − h| + |j − k|
• Chessboard distance D_8 → diagonal moves are allowed (minimal no. of
king moves on a chessboard to go from one position to another):
D_8[(i, j), (h, k)] = max(|i − h|, |j − k|)
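
These three definitions translate directly into code (a sketch in plain Python):

import math

def d_euclidean(p, q):
    # D_E: intuitive but costs a square root.
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def d_city_block(p, q):
    # D_4 (Manhattan): only horizontal/vertical unit moves.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d_chessboard(p, q):
    # D_8: diagonal moves allowed (king moves on a chessboard).
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (0, 0), (3, 4)
print(d_euclidean(p, q))    # 5.0
print(d_city_block(p, q))   # 7
print(d_chessboard(p, q))   # 4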
Distance definitions
Pixel adjacency
• Two pixels are adjacent if the distance between them is 1
• 4-Neighbors
• Two pixels (p, q) are 4-neighbors if they have D4(p,q) = 1

• 8-Neighbors
• Two pixels (p, q) are 8-neighbors if they have D8(p,q) = 1
Distance transform
• Distance transform computes distance of pixels from some image
subset (such as object in an image)
• Used in several fast transforms in image processing
• Compute distance between object pixels (1’s) and background pixels
(0’s)
[Figure: distance transform of a binary image – object pixels have zero
distance from themselves, nearby background pixels have low values, and
pixels further away have higher values]
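
A hedged sketch using SciPy (the slides do not prescribe an implementation; scipy.ndimage.distance_transform_edt measures each nonzero pixel's distance to the nearest zero, so the background mask is passed):

import numpy as np
from scipy import ndimage

obj = np.zeros((7, 7), dtype=np.uint8)
obj[3, 2:5] = 1                  # a small horizontal object (1s)

# distance_transform_edt gives each nonzero pixel its distance to
# the nearest zero, so the *background* mask (obj == 0) is passed:
# object pixels get 0 (zero distance from themselves), background
# pixels get larger values the further they lie from the object.
dist = ndimage.distance_transform_edt(obj == 0)
print(np.round(dist, 2))

# City-block (D_4) and chessboard (D_8) variants also exist:
d4 = ndimage.distance_transform_cdt(obj == 0, metric='taxicab')
d8 = ndimage.distance_transform_cdt(obj == 0, metric='chessboard')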
Image analysis concepts
• Edge
• Local property of a pixel and its immediate neighborhood
• Vector quantity having magnitude and direction
• Gradient of image function f(x,y) is used to compute edges
• Edge direction is perpendicular to gradient direction
• Crack edge
• Creates a structure between pixels
• Each pixel has 4 crack edges – defined by its relation to its 4 neighbors
• Direction is that of increasing brightness and is a multiple of 90°
• Magnitude is the absolute difference between the brightness of the relevant
pair of pixels
Image analysis concepts
• Border
• Set of pixels in a region R that have one or more neighbors outside R
• Border is a global concept of a region; edge is a local concept
• Inner border – border description corresponds to inner border
• Outer border – set of pixels surrounding the inner border
• Some inner border elements coincide in the discrete representation;
they may be distinct in the continuous case
Convex and non-convex regions

• Convex region – any 2 points within the region are connected by a
straight line which lies entirely within the region
• Non-convex region – a straight line connecting 2 points within the
region is not completely within the region
Objects, holes and background

• Example: an image of the word VIDEO
• Letters are the OBJECTS
• White spaces surrounded by letters, e.g., inside D and O, are HOLES
• The remainder of the white space in the image is the BACKGROUND
Topological properties - Convex Hull

• Convex Hull – smallest convex region containing the given input region
• Deficit of convexity – region inside convex hull that does not belong to
the object
• LAKES are akin to HOLES – lakes can be thought of as holes within a
convex hull
• BAYS are contiguous with the border
Topological properties
• Topological properties are invariant to “rubber sheet transforms”
• e.g., if a balloon is stretched
• Contiguity of regions is not changed
• Number of holes in the regions is also not changed
• Topological properties are qualitative properties
Brightness Histogram hf(z)
• hf(z) provides the frequency of brightness level ‘z’ in the image
• hf(z) gives an indication of the probability p(z; x, y) that a pixel (x, y)
has brightness level ‘z’
• hf(z) is an estimate of p(z)
• Position of pixel is not relevant for histogram
• One histogram may correspond to multiple images
• If position of ‘object’ is changed on a constant background, histogram will
remain unchanged
Brightness Histogram hf(z)

• Histogram information is used for image segmentation, object
separation from background etc.
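
A minimal sketch of computing h_f(z) and the estimate of p(z) (assuming NumPy and an 8-bit image; the test image is random):

import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# h_f(z): how many pixels have brightness z, for z = 0 .. 255.
h = np.bincount(img.ravel(), minlength=256)

# Normalizing gives the estimate of p(z). Pixel position plays no
# role, so moving an object over a constant background leaves the
# histogram unchanged.
p = h / img.size
print(p.sum())    # 1.0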
Entropy H
• Entropy is a measure of information
• Suppose there are ‘n’ brightness levels x_1, x_2, x_3, …, x_n in an image
• Let p(x_k) be the probability that the brightness level of a pixel is x_k,
for k = 1, 2, 3, …, n
• Entropy is defined as
H = Σ_(k=1)^n p(x_k) log₂(1 / p(x_k)) = − Σ_(k=1)^n p(x_k) log₂ p(x_k)
• p(x_k) can be estimated from the image histogram h_f(z)
• The higher the probability, the lower the information content
• Entropy is useful in image compression techniques
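
Continuing the histogram sketch, H can be estimated directly from the image (assuming NumPy; the test images are made up):

import numpy as np

def entropy(img):
    # H = -sum_k p(x_k) log2 p(x_k), with p estimated from h_f(z).
    h = np.bincount(img.ravel(), minlength=256)
    p = h / img.size
    p = p[p > 0]                 # empty levels contribute 0 to H
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
flat = np.full((64, 64), 128, dtype=np.uint8)
noisy = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
print(entropy(flat))     # 0.0 -- a single level carries no information
print(entropy(noisy))    # close to 8 bits for near-uniform levels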
Reading Exercise
• Sections 2.3.4 Visual Perception of the Image and
• 2.3.5 Image Quality
in
• Image Processing, Analysis and Machine Vision by Sonka, Hlavac,
Boyle
Lecture 4
Noise in images, color images, Data Structures for Image Analysis
Noise in images
• Images are corrupted by noise
• Noise introduced during capture, transmission or processing
• Noise is typically modeled as White Noise
• G_v(f) is the power spectral density of white noise – constant across
all frequencies f
• White noise models the worst case scenario of noise corruption
• Real world noise has decreasing intensity with increasing frequency
• White noise simplifies calculations
Noise in images
• Gaussian Noise – has a Gaussian probability density function
1 " $!% !
!
𝑝 𝑥 = 𝑒 # &
𝜎 2𝜋
Where 𝜇 is the mean and 𝜎 is the standard deviation of the random
variable 𝑥
• Gaussian noise is a good approximation of noise in practical scenarios
• Noise is additive
𝑓 𝑥, 𝑦 = 𝑔 𝑥, 𝑦 + 𝑣(𝑥, 𝑦)
where 𝑔 is the input image and 𝑣 represents noise; 𝑔 𝑎𝑛𝑑 𝑣 are
independent of each other
Noise in images
• When White Gaussian Noise gets ADDED to the image (signal), the
image is said to be corrupted by AWGN
• Often the noise is considered to be ZERO mean i.e. 𝜇 = 0. Then the
noise pdf becomes
1 " $ !
!
𝑝 𝑥 = 𝑒 #&
𝜎 2𝜋
'()*+, -./01 4 ∑(#,%) 7 ! ($,:)
• Signal to noise ratio 𝑆𝑁𝑅 = = =∑ !
2.(30 -./01 5 (#,%) < ($,:)
• SNR is a measure of image quality – higher SNR means better quality
• SNR in dB is 𝑆𝑁𝑅=> = 10𝑙𝑜𝑔"? (𝑆𝑁𝑅)
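
A sketch of corrupting an image with zero-mean AWGN and measuring SNR as defined above (assuming NumPy; the image values and σ are hypothetical):

import numpy as np

rng = np.random.default_rng(0)
g = rng.integers(50, 200, size=(64, 64)).astype(float)    # input image g

sigma = 10.0                               # assumed noise std deviation
v = rng.normal(0.0, sigma, size=g.shape)   # zero-mean Gaussian noise v
f = g + v                                  # additive corruption: f = g + v

snr = np.sum(g ** 2) / np.sum(v ** 2)      # signal power / noise power
snr_db = 10 * np.log10(snr)
print(round(snr_db, 1))                    # higher dB -> better quality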
Noise in images
• If noise 𝑣 is dependent on the image function 𝑔, then the noise is
multiplicative
𝑓 = 𝑔𝑣
• Quantization noise – discussed in lecture 2
• Occurs when the number of quantization levels K is too low
• Shading details are lost (false contours appear) in the image
• Impulse noise
• Affects individual pixels
• Brightness level of an individual pixel varies significantly from its neighborhood
• Salt and pepper noise – saturated impulse noise
• Affected pixel is either BLACK or WHITE
Color images
• Color perception in humans is subjective, hence not precise in
absolute terms
• White light is a composition of multiple wavelength (frequency)
components
• Only a narrow section of the EM spectrum is visible to humans

[Figure: the EM spectrum, with the narrow visible spectrum highlighted]
Color images – primary colors
• Most colors can be represented as combinations of primary colors
• RED 𝜆 = 700 𝑛𝑚, GREEN 𝜆 = 546.1 𝑛𝑚 and BLUE 𝜆 = 435.8 𝑛𝑚
• Color formed by Surface reflection and Body reflection
• Surface reflection
• Spectrum of reflected light is same as illuminant
• Gives ‘shiny’ effect and does not have color
• Body reflection
• Random reflection from internal pigments of the material
• Some wavelengths are absorbed and some are reflected (with varying
intensities) – CAUSES COLOR FORMATION
Multispectral images
• Intensity of each narrow band of wavelength is measured for each
pixel
• A separate image for each wavelength component is formed by
independent digitization
• All the wavelength component images together constitute a multi-
spectral image
Color perceived by humans
• Light receptors (sensors) on retina are:
• Cones – color sensitive receptors
• Rods – monochromatic receptors for low ambient light sensing
• Three categories of cones (based on sensed wavelength)
• Short (S) ≈ 430 𝑛𝑚
• Medium (M) ≈ 560 𝑛𝑚
• Long (L) ≈ 610 𝑛𝑚
• Spectral response q_i of each sensor is given as
q_i = ∫ I(λ) S(λ) R_i(λ) dλ, for i = S, M, L
• where I(λ) is the spectral density of illumination (light source)
• S(λ) represents surface reflection characteristics
• R_i(λ) is the spectral sensitivity of the i-th sensor
Data Structures for image analysis
• Two essential parts of a computer program
• Data
• Algorithms
• Choice and simplicity/complexity of algorithms depends on data
organization
• Aim of Computer Vision
• Find a relation between input image and models of real world
• From raw image to real world model
• Semantic information gets added to support interpretation of image data
• CV comprises 2 major tasks
• Design of intermediate representations of images (data structures)
• Design of algorithms – to create the data structures and define relations between them
Levels of image data representation
Bidirectional informational flow between the representation levels

• Iconic images – lowest representation level
• Raw image data; integer matrices of pixel brightness
• Some pre-processing may have been applied
• Segmented images – second representation level
• Grouping of image parts into objects
• Knowledge of the application domain is useful for segmentation
• Geometric representations – third representation level
• Info of 2D and 3D shapes; quantification of shapes
• Useful to model/simulate effects of illumination and motion
• Relational models – fourth representation level
• Efficient data treatment; high level of abstraction
• Only data relevant to the end use is retained
• A priori knowledge of the application is used
• AI used to build relational models
Traditional image data structures
• Image data structures provide
• Direct representation of image information
• Basis for more complex image representation methods
• Matrices – most common data structure
• Global information derived from matrices can speed up algorithms
• Global information – histogram, co-occurrence matrix, integral image
• Chains – used to describe object borders
• Graph – weighted graph, region adjacency graph
• Relational structures
• Hierarchical data structures
• Pyramids
• Quadtrees
Lecture 5
Traditional image data structures
Traditional image data structures - Matrices
• Matrices – most common data structure – low level representation
• Matrix elements - integers representing brightness or some other
property of the pixel
• Matrix can represent rectangular as well as hexagonal sampling grids
• (row, col) indices provide image information at the pixel coordinates
• Matrix contains spatial relations among semantically important parts
of the image – e.g., the neighborhood relation
• Programming languages represent a matrix using an array data
structure
Matrices – special images
• Binary image
• Only 2 brightness levels
• Matrix elements are zeros or ones
• Multispectral image
• A separate matrix to represent each spectral band
• Multiple such matrices together represent a single multispectral image
• Hierarchical image data structures
• Multiple matrices of different resolutions
• Different level of detailing represented in each matrix
• Useful for parallel computing architectures
Global image information in image matrix
• Image matrix has huge image data
• Global information is concise – occupies less memory
• Global information can speed up algorithms
• Global information – derived from original image matrix
Global information

• Three examples: histogram, co-occurrence matrix and integral image
• Histogram – probability estimate that a pixel has a certain brightness
Co-occurrence matrix (Cr)
• Computes how often pairs of pixels with a ‘specific value’ and ‘offset’
occur in the image
• Offset (∆x, ∆y) is a position operator that defines the spatial relation
(r) between pixels – e.g., offset (1, 2) could indicate 1 right, 2 down
• Cr is L x L matrix, where L is the no. of different pixel values
(brightness levels)
• (i,j)th value of Cr gives the no. of times in the image that the ith and jth
pixel values occur in the spatial relation given by offset
• Cr is useful to analyze texture in an image
Co-occurrence matrix (Cr)
• Consider an image I with 5 different pixel levels
• Offset = (∆x, ∆y) = (1, 2) → 1 right, 2 down
• Cr will be a 5 x 5 matrix
• Cr(1,4) gives the number of times in the image that pixel values 1 and 4
occur at the defined offset
• Diagonal elements of Cr, i.e. Cr(1,1), Cr(2,2), Cr(3,3), Cr(4,4) and
Cr(5,5), give the number of times pixel levels 1 and 1, 2 and 2, 3 and 3,
4 and 4, and 5 and 5 occur at the defined offset

[Figure: image matrix]
Co-occurrence matrix (Cr)
• 4 pixel levels in image – 0, 1, 2, 3
• Offset = (∆x, ∆y) = (1, 0) → 1 right, 0 down
• Pixel levels 1 and 0 occur at the defined offset 2 times in the image
• How many times do pixel levels 0 and 1 occur at the defined offset?
• Observe the diagonal elements of Cr and verify in the image

[Figure: image matrix and its 4 x 4 co-occurrence matrix]
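
A direct sketch of the definition (assuming NumPy; the example image is hypothetical, not the one in the figure, and the offset convention follows the slides: ∆x right, ∆y down):

import numpy as np

def cooccurrence(img, dx, dy, levels):
    # C_r[i, j]: number of times value i at (row, col) co-occurs with
    # value j at (row + dy, col + dx) -- dx right, dy down.
    C = np.zeros((levels, levels), dtype=int)
    rows, cols = img.shape
    for r in range(rows - dy):
        for c in range(cols - dx):
            C[img[r, c], img[r + dy, c + dx]] += 1
    return C

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
print(cooccurrence(img, dx=1, dy=0, levels=4))   # a 4 x 4 matrix C_r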
Integral image
• Matrix representation containing global image information
• Integral image value at a location (i, j) is denoted ii(i, j)
• ii(i, j) represents the sum of all the original image pixel values left of
and above (i, j):
ii(i, j) = Σ_(k ≤ i, l ≤ j) f(k, l)
• Integral image is padded to the left and the top to allow for calculation
Integral image
• Integral images allow rapid pixel summations of image subregions
• Pixel summation can be done in constant time regardless of
neighborhood size
• Example: pixel summation for the shaded region (4 pixels) in the input
image is 3 + 2 + 5 + 4 = 14
• This can be computed using 4 reference values from the corresponding
shaded region in the integral image as: 46 – 22 – 20 + 10 = 14
Integral image
• Pixel summation can be done in constant time regardless of
neighborhood size
• Consider pixel summation over the red shaded region containing 6 pixels
• Pixel summation from the input image is 5 + 4 + 6 + 2 + 1 + 3 = 21
• From the integral image this can be computed using 4 reference values
as 74 – 39 – 29 + 15 = 21
• Summations in rectangular regions can be done rapidly (in constant
time) irrespective of the region size
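
A sketch of both steps – building the (padded) integral image and the 4-reference-value summation (assuming NumPy; the 3 x 3 image is made up):

import numpy as np

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Integral image with a zero row/column padded on the top/left so
# the 4-value lookup also works along the image borders.
ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=int)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0..r1, c0..c1] from 4 lookups -- O(1) in region size.
    return ii[r1 + 1, c1 + 1] - ii[r0, c1 + 1] - ii[r1 + 1, c0] + ii[r0, c0]

print(box_sum(ii, 1, 1, 2, 2))   # 5 + 6 + 8 + 9 = 28
print(img[1:3, 1:3].sum())       # 28, the same result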
Traditional image data structures - Chains
• Used for describing object borders
• Basic element of chain is called symbol
• Chains are appropriate for data that can be arranged as a
sequence of symbols
• Chain codes (Freeman chain codes)
• Useful to describe borders or ‘one pixel wide’ lines in images
• Border is defined by coordinates of a reference pixel and
the sequence of symbols corresponding to a unit length
line in several predefined orientations
• 0,1,2…7 denote the symbols in the 8 orientations
Chain codes (Freeman chain codes)

• The chain code from the reference pixel starting the chain (marked by
an arrow) begins 0 0 0 7 7 …
• The full chain code is
00077665555556600000006444444442221111112234445652211
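
A small sketch decoding a Freeman chain code back into border coordinates (plain Python; the direction numbering below is one common convention and may differ from the slides' figure):

# Direction vectors (dx, dy) for symbols 0..7: 0 points right and
# the numbering proceeds counter-clockwise in 45-degree steps
# (one common convention; the slides' figure may label differently).
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1),
        (-1, 0), (-1, -1), (0, -1), (1, -1)]

def decode_chain(start, code):
    # Reference pixel plus symbol sequence -> list of border points.
    x, y = start
    points = [(x, y)]
    for symbol in code:
        dx, dy = DIRS[int(symbol)]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

print(decode_chain((0, 0), "00077"))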
Run length coding
• To record only areas belonging to objects in an image
• Selected area is represented as a ‘list of lists’
• Each row of image is described by a sublist
• First element is row number
• Subsequent numbers are in pairs – first element of pair is beginning of a run
and second element is the end of a run (beginning and end are described by
column coordinates)
• There can be multiple sequences in a row
Run length coding
• Each sublist begins with the row number; subsequent numbers come in
pairs giving the start and end columns of each run
• (1 1 1 4 4) → row 1, a run from column 1 to 1 and a run from column 4 to 4
• (2 1 4) → row 2, a run from column 1 to 4
• (5 2 3 5 5) → row 5, a run from column 2 to 3 and a run from column 5 to 5
• Run length code for this binary image is ((1 1 1 4 4) (2 1 4) (5 2 3 5 5))
• Chain code and run length code are types of chain data structures
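
A minimal sketch of this encoding (plain Python; 1-based coordinates as in the slides, and the test image is made up):

def run_length_encode(img):
    # Binary image -> list of tuples (row, start, end, start, end, ...)
    # with 1-based coordinates, as in the slides.
    code = []
    for r, row in enumerate(img, start=1):
        entry, c, n = [r], 0, len(row)
        while c < n:
            if row[c] == 1:
                start = c
                while c < n and row[c] == 1:
                    c += 1
                entry += [start + 1, c]   # 1-based start/end columns
            else:
                c += 1
        if len(entry) > 1:                # keep only rows containing runs
            code.append(tuple(entry))
    return code

img = [[1, 1, 1, 1, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0]]
print(run_length_encode(img))   # [(1, 1, 4), (2, 2, 4)]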
Topological data structures - Graphs
• Image described as a set of elements and their relations
• A graph G is defined as G = (V,E) where
• V = (v1, v2, v3, …,vn) are a set of nodes and E = (e1, e2, e3, …,em) are a set of arcs
• Each arc ek is incident to a pair of nodes (vi, vj)
• Degree of a node is number of arcs incident on the node.

[Figure: example graph with nodes v1 … v6 and arcs e1 … e5]
Topological data structures – Weighted Graphs

[Figure: example weighted graph with weights w1, w3, w4 assigned to
nodes and arcs]

• In a weighted graph, values (weights) are assigned to arcs, nodes or both
Topological data structures – Region adjacency graphs
• Nodes correspond to regions
• Neighboring regions are connected by an arc
• Regions divided based on similar properties (brightness, texture etc.)

• Region 0 denotes pixels outside the image
• Nodes of degree 1 denote simple ‘holes’ – e.g., region (node) 5 in the
figure
Relational structures or relational databases
• Used to represent information from an image in the form of a table
Lecture 6
Hierarchical data structures

Prof. Mohammed Usman


Hierarchical data structures
• CV is computationally intensive
• Large amount of data to be processed
• Real time response is desired
• Parallel processing is desired to speed up response
• How to divide data for parallel processing?
• Hierarchical data structures provide a mechanism to divide the data
for parallel processing
• Highest resolution is used only for those parts of image for which it is
essential
• Other parts of image are processed at lower resolution
Matrix Pyramids (M-pyramids)
• M-pyramid is a sequence of images M_L, M_(L−1), M_(L−2), …, M_0
• M_(k−1) is derived from M_k by reducing the resolution by one half
• In practice, square matrices are used, i.e. M_L has a dimension that is
a power of 2 – so that M_0 corresponds to a single pixel
• M_L is the original image with the highest resolution
• M-pyramids are used to work with an image at different resolutions
simultaneously
• One degree smaller resolution contains 4 times less data – processing
becomes 4 times faster

[Figure: pyramid levels M0: 1 x 1 (single pixel), M1: 2 x 2, M2: 4 x 4,
M3: 8 x 8]
M-pyramids
• Number of image pixels used by M-pyramid:
• For an original resolution image (level L) of dimensions N x N, no. of
pixels = N²
• At level L−1, no. of pixels = (1/4)·N²
• At level L−2, no. of pixels = (1/16)·N², and so on
• No. of pixels used by the M-pyramid to store all matrices is
N² + (1/4)N² + (1/16)N² + ⋯ = N²(1 + 1/4 + 1/16 + ⋯) ≈ 1.33 N²
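
A sketch of building an M-pyramid by repeated 2 x 2 averaging (assuming NumPy and a square power-of-2 image; the averaging rule is one common choice of reduction):

import numpy as np

def m_pyramid(img):
    # Build M_L ... M_0 by halving the resolution at each step via
    # 2 x 2 block averaging (one common choice of reduction rule).
    levels = [img]
    while levels[-1].shape[0] > 1:
        a = levels[-1]
        half = (a[0::2, 0::2] + a[0::2, 1::2]
                + a[1::2, 0::2] + a[1::2, 1::2]) / 4.0
        levels.append(half)
    return levels       # levels[0] = M_L (full res) ... levels[-1] = M_0

img = np.arange(64, dtype=float).reshape(8, 8)
sizes = [m.size for m in m_pyramid(img)]
print(sizes)            # [64, 16, 4, 1]
print(sum(sizes) / 64)  # 85/64 ~ 1.33, matching the estimate above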
Tree pyramids (T-pyramids)
• Often it is advantageous to use several resolutions simultaneously
rather than choose just one image from the M-pyramid
• T-pyramid is useful in such cases
• Image is divided into 4 quadrants at each hierarchical level
• Image is divided into child nodes until leaf node is a single pixel

[Figure: original image is Level 0 (the root); at Level 1 it has 4 child
nodes; at Level 2 each Level-1 child has 4 child nodes of its own, and
so on]
T-pyramids
• Every node of a T-pyramid has 4 child nodes
• Nodes at the last level (Level L) are called ‘leaf nodes’
• Leaf nodes represent individual pixels and have no child nodes
• Every node has a parent except the root node (Level 0 node)
Quad trees
• Modification of T-pyramids
• Like T-pyramids, image is divided into 4 quadrants
• Subsequent divisions in the hierarchy are applied only to those
quadrants that have non-homogeneous regions (regions with varying
brightness levels)
• Quad tree representation is efficient for representing images with
large homogeneous regions
Quad trees
• Quadrants 1 and 4 are homogeneous – hence not divided further
• Quadrant 2 is divided further into 4 quadrants

• Only quadrant 3 is non-homogeneous at this level – hence it is
subdivided further
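
A minimal sketch of quad tree construction (assuming NumPy and a square power-of-2 image; treating "homogeneous" as all pixels equal is a simplifying assumption):

import numpy as np

def quadtree(img):
    # Leaf if the region is homogeneous (all pixels equal); otherwise
    # split into 4 quadrants and recurse on each.
    if img.min() == img.max():
        return int(img[0, 0])
    r, c = img.shape[0] // 2, img.shape[1] // 2
    return [quadtree(img[:r, :c]), quadtree(img[:r, c:]),
            quadtree(img[r:, :c]), quadtree(img[r:, c:])]

img = np.zeros((4, 4), dtype=int)
img[0, 2] = 1            # only the top-right quadrant is non-homogeneous
print(quadtree(img))     # [0, [1, 0, 0, 0], 0, 0]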
