Image Features Using Wavelets and Applications To Document Image Processing

Dr. S A Angadi
Professor
Department of Computer Science and Engineering
PG Centre,
Visvesvaraya Technological University, Belgaum
Contents
 Document Image Processing : An Introduction
 Wavelets : An Overview
 Discrete Wavelet Transform and Images
 Image Features using Wavelets
 Applications
Document Image Processing: An Introduction
 DIP/DIA is the theory and practice of recovering the symbol
structures of digital images scanned from paper or produced by
computer
 DIA is a subfield of Digital Image processing
 Digital images of natural objects: X-rays, fingerprints, faces, scenery,
etc. are NOT part of DIA
 Digital images of symbolic objects: Postal addresses, printed articles,
forms, music sheets, engineering drawings, topographic maps belong to
DIA
 Source: Scanners, printers, fax machines, hand!
 Incidental text: license plates, billboards, subtitles, in photos and video
 WWW ??
 DIA’s grand goal is to take us to the land of the paperless office
Document Image Processing
[Figure: 200 dpi images and 400 dpi images]
Document Image Examples - Postal documents
[Figure: annotated postal envelope with labelled regions: Meter Mark, Digital, Sender’s Address, Endorsement, Post Mark, "In Case of Undeliverable as Addressed Return to Sender", Linear Code, Delivery Address]
Document Image Examples - Forms
Document Image Examples - Unconstrained Text
Document Image Examples - Graphics
Document Image Analysis
 Textual Processing
  Optical Character Recognition: text
  Page Layout Analysis: skew, blocks, paragraphs
 Graphical Processing
  Line Processing: lines, curves, corners
  Region and Symbol Processing: filled regions
 The ultimate solution of Doc. Image Processing would be for computers
to deal with paper documents as they deal with other forms of computer
media.
 A file of picture elements/ pixels is the raw input data to Doc. IP/ Doc.
IA systems
 The first step in document analysis is to perform processing on this
image to prepare it for further analysis. Such processing includes:
 thresholding to reduce a gray-scale or color image to a binary image,
 reduction of noise to reduce extraneous data, and
 thinning and region detection to enable easier subsequent detection of
pertinent features and objects of interest.
 This is also called pixel-level processing (also called preprocessing and
low-level processing in literature)
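The pixel-level steps above can be sketched in a few lines. This is only an illustrative sketch: the fixed threshold of 128, the "isolated pixel" noise rule and the function names are assumptions made here, not methods prescribed by the slides.

```python
def binarize(gray, threshold=128):
    """Thresholding: 1 marks a dark (ink) pixel, 0 marks background."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def remove_isolated_pixels(binary):
    """Very simple noise reduction: clear ink pixels that have no
    ink pixel among their 8 neighbours."""
    rows, cols = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 1:
                # Sum the 3x3 neighbourhood, then subtract the pixel itself.
                neighbours = sum(
                    binary[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))
                ) - 1
                if neighbours == 0:
                    cleaned[r][c] = 0
    return cleaned


gray = [
    [250, 250, 250, 250, 250],
    [250,  20,  30, 250, 250],
    [250,  25, 250, 250, 250],
    [250, 250, 250, 250,  10],
]
binary = binarize(gray)
cleaned = remove_isolated_pixels(binary)
```

The lone dark pixel in the bottom-right corner is treated as noise and removed, while the connected stroke in the middle is kept.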
[Figure: processing pipeline: Document -> Data Capture -> Pixel Level Processing -> Feature Level Analysis -> Text Analysis and Recognition / Graphics Analysis and Recognition]
 The next major step is extraction of intermediate
features that aid in the final recognition
 The features include features from individual pixels or from
collections of pixels, or features obtained by transforming the
information in a collection of pixels (e.g. RLC features, normalization and
feature extraction, shape descriptors, compactness,
asymmetry, topology, contour smoothness, Hough
transform features, DCT features, DWT features, etc.)
 The features extracted from the image are subjected to
processing, for deriving the information from text/
graphics
Document Image Processing
 There are two main types of analysis that are
applied to text in documents (Text processing)
 One is optical character recognition (OCR) to
derive the meaning of the characters and words
from their bit-mapped images, and the other is page
layout analysis to discover formatting of the text,
and from that to derive meaning associated with the
positional and functional blocks in which the text is
located
 Graphics recognition and interpretation is an
important topic in document image analysis since
graphics elements pervade textual material, with
diagrams illustrating concepts in the text, company
logos heading business letters, and lines separating
fields in tables and sections of text (graphics
processing)
 The text and graphics regions may be intermixed
 For different types of processing, different features are
essential
Document Image Features
 For Document Image Analysis, the features employed are categorized
into
 Image Features
 Textual Features
 Structural Features
 Image features are either extracted directly from the image (e.g. the
density of black pixels in a region) or extracted from a segmented
image (e.g. the number of horizontal lines in a segmented block)
 Image features extracted at the level of a whole image are called
global image features;
 Image features extracted from the regions of an image are called
local image features
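As a small illustration of the global/local distinction, the sketch below computes the density of black pixels for a whole binary image and for one region of it. The function name and the region parameters are hypothetical and chosen only for this example.

```python
def black_pixel_density(binary, top=0, left=0, height=None, width=None):
    """Fraction of black (1) pixels in a rectangular region.
    With no region given, the whole image is used (a global feature)."""
    rows, cols = len(binary), len(binary[0])
    height = rows - top if height is None else height
    width = cols - left if width is None else width
    region = [row[left:left + width] for row in binary[top:top + height]]
    total = sum(len(row) for row in region)
    return sum(sum(row) for row in region) / total


page = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
global_density = black_pixel_density(page)                       # whole image
local_density = black_pixel_density(page, 0, 0, height=2, width=2)  # top-left block
```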
 Structural features (e.g. relationships between
objects in the page) are obtained from physical or
logical layout analysis
 Textual features (e.g. presence of keywords) may
be computed from OCR output or directly from
document images
 Some classifiers use only image features, only
structural features, or only textual features; others
use a combination of features from several groups
Image Features: various levels of density; attributes of connected components; column/row gaps; text histogram; location and size of cells; line and text features; physical layout features
Structural Features: physical layout; logical structures; results of functional labeling; spatial relations
Textual Features: textual features obtained before layout analysis; textual features from OCR results
All these features can be subjected to transforms to obtain more meaningful information
Wavelet Transform: An Overview
 What are Transforms
 Transforms are applied to raw signal/data to obtain
further information from signal/ data which is not
readily available from raw data.
 Fourier Transform gives frequency content of time
domain signal.
 It will also give point value representation of a
polynomial
 Why do we need the frequency content of a signal (e.g. an ECG)?
 FT provides a signal which contains only the frequency
domain information
 It does not give any information of the signal in the
time domain
 When time localization of the frequency components is
needed we require a transform giving time frequency
representation
 STFT, Wigner Distributions and Wavelet Transforms
provide time frequency representation
 Both FT and WT are reversible transforms- one can go
from raw data to processed data and back
 “The wavelet transform is a tool that cuts up data,
functions or operators into different frequency
components, and then studies each component with
a resolution matched to its scale”
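$$ CWT_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt $$
 $\tau$ and $s$ are the translation and scale parameters
 $\psi(t)$ is the transforming function and is called the
mother wavelet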
 there are two main differences between the STFT and
the CWT
 1. The Fourier transforms of the windowed signals are not taken,
and therefore single peak will be seen corresponding to a
sinusoid, i.e., negative frequencies are not computed
 2. The width of the window is changed as the transform is
computed for every single spectral component, which is probably
the most significant characteristic of the wavelet transform
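 The CWT can be thought of as the inner product of
the test signal with the basis functions $\psi_{\tau,s}(t)$:
$$ CWT_x^{\psi}(\tau, s) = \int x(t)\, \psi_{\tau,s}^{*}(t)\, dt $$
where
$$ \psi_{\tau,s}(t) = \frac{1}{\sqrt{|s|}}\, \psi\!\left(\frac{t - \tau}{s}\right) $$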
 Wavelet transform is referred to as “Fourier Transform of 20th Century”
 Wavelets are wavelike oscillatory signals of finite bandwidth both in Time
and in Frequency
 Wavelets are basis functions of spaces with certain properties
 Wavelets provide time scale(frequency) representation of non stationary
signals
 Based on multiresolution approximation (MRA)
• Approximate a function at various resolutions using a scaling function, $\phi(t)$
• Keep track of details lost using wavelet functions, $\psi(t)$
• Reconstruct the original signal by adding approximation and detail coefficients
 Implemented by using a series of lowpass and highpass filters
• Lowpass filters are associated with the scaling function and provide approximation
• Highpass filters are associated with the wavelet function and provide detail lost in approximating the signal
Discrete Wavelet Transform
 The foundations of the DWT go back to 1976, when Croisier, Esteban,
and Galand devised a technique to decompose discrete time signals.
 Crochiere, Weber, and Flanagan did similar work on coding of speech
signals in the same year; they named their analysis scheme subband
coding
 A Discrete Wavelet Transform is any Transform for which
wavelets are discretely sampled
 It transforms a discrete time signal to a discrete wavelet
representation
 It converts an input series x0, x1, ..., xn-1 of length n into one high-pass
wavelet coefficient series and one low-pass wavelet coefficient
series (of length n/2 each)
[Figure: the signal is passed through lowpass and highpass filters, giving the approximation (a) and the details (d)]
 The procedure starts with passing this signal
(sequence) through a half band digital lowpass
filter with impulse response h[n]
 Filtering a signal corresponds to the mathematical
operation of convolution of the signal with the
impulse response of the filter.
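Filtering as convolution can be written directly. The sketch below is a plain, unoptimized convolution; the two-tap averaging filter is only an example of a lowpass impulse response and is not taken from the slides.

```python
def convolve(x, h):
    """Full discrete convolution y[n] = sum_k x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


x = [1.0, 2.0, 3.0, 4.0]
averager = [0.5, 0.5]          # example lowpass impulse response
smoothed = convolve(x, averager)
```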
 The DWT analyzes the signal at different frequency
bands with different resolutions by decomposing the
signal into a coarse approximation and detail information
 DWT employs two sets of functions, called scaling
functions and wavelet functions, which are associated
with low pass and highpass filters, respectively.
 The decomposition of the signal into different frequency
bands is simply obtained by successive highpass and
lowpass filtering of the time domain signal
 The original signal x[n] is first passed through a
halfband highpass filter g[n] and a lowpass filter h[n]
 After the filtering, half of the samples can be
eliminated according to Nyquist’s rule, since
the signal now has a highest frequency of $\pi/2$
radians instead of $\pi$
 The signal can therefore be subsampled by 2,
simply by discarding every other sample. This
constitutes one level of decomposition and can
mathematically be expressed as follows:
$$ y_{high}[k] = \sum_n x[n]\, g[2k - n] $$
$$ y_{low}[k] = \sum_n x[n]\, h[2k - n] $$
 This decomposition halves the time resolution since
only half the number of samples now characterizes the
entire signal. However, this operation doubles the
frequency resolution, since the frequency band of the
signal now spans only half the previous frequency
band, effectively reducing the uncertainty in the
frequency by half
 At every level, the filtering and subsampling will
result in half the number of samples (and hence half
the time resolution) and half the frequency band
spanned (and hence double the frequency resolution)
 In practice, such transformation will be applied
recursively on the low-pass series until the desired
number of iterations is reached
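The recursive decomposition can be sketched with the Haar wavelet, which is chosen here only because its filters are two taps long; the slides do not prescribe a particular wavelet, and the function names are hypothetical. Each step filters and subsamples in one pass by working on pairs of samples, and the signal length is assumed to be divisible by 2 at every level.

```python
import math


def haar_step(x):
    """One decomposition level: lowpass (approximation) and highpass
    (detail) outputs, each half as long as the input."""
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_dwt(x, levels):
    """Apply haar_step recursively to the lowpass series only."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_dwt(signal, levels=2)
```

After two levels the 8-sample signal is represented by 2 approximation coefficients, 2 level-2 detail coefficients and 4 level-1 detail coefficients, so the total number of coefficients is unchanged.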
 The frequencies that are most prominent in the original signal
will appear as high amplitudes in that region of the DWT
signal that includes those particular frequencies
 The difference of this transform from the Fourier transform is
that the time localization of these frequencies will not be lost
 However, the time localization will have a resolution that
depends on which level they appear
 This procedure in effect offers a good time resolution at high
frequencies, and good frequency resolution at low frequencies
 Most practical signals encountered are of this type
 One important property of the discrete wavelet
transform is the relationship between the impulse
responses of the highpass and lowpass filters. The
highpass and lowpass filters are not independent of
each other, and they are related by
$$ g[L - 1 - n] = (-1)^n \, h[n] $$
where g[n] is the highpass, h[n] is the lowpass filter, and L is the
filter length (in number of points).
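A short sketch of deriving the highpass filter from the lowpass one, assuming the relation takes the commonly quoted form g[L-1-n] = (-1)^n h[n]. The four-tap Daubechies lowpass coefficients are used as example input; the function name is hypothetical.

```python
import math


def highpass_from_lowpass(h):
    """Build g from h using g[L - 1 - n] = (-1)^n * h[n]."""
    L = len(h)
    g = [0.0] * L
    for n in range(L):
        g[L - 1 - n] = ((-1) ** n) * h[n]
    return g


sqrt3 = math.sqrt(3)
norm = 4 * math.sqrt(2)
# Four-tap Daubechies lowpass filter, used here only as example input.
h_d4 = [(1 + sqrt3) / norm, (3 + sqrt3) / norm,
        (3 - sqrt3) / norm, (1 - sqrt3) / norm]
g_d4 = highpass_from_lowpass(h_d4)
```

The resulting highpass filter sums to zero (it rejects a constant signal) and is orthogonal to the lowpass filter.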
 Note that the two filters are odd-index alternated
reversed versions of each other. Lowpass to
highpass conversion is provided by the $(-1)^n$ term.
Filters satisfying this condition are commonly used
in signal processing, and they are known as
Quadrature Mirror Filters (QMF). The two filtering
and subsampling operations can be expressed by
$$ y_{high}[k] = \sum_n x[n]\, g[2k - n] $$
$$ y_{low}[k] = \sum_n x[n]\, h[2k - n] $$
 The reconstruction in this case is very easy since half band
filters form orthonormal bases
 The procedure is followed in reverse order for the
reconstruction
 The signals at every level are upsampled by two, passed
through the synthesis filters g’[n], and h’[n] (highpass and
lowpass, respectively), and then added
 The interesting point here is that the analysis and synthesis
filters are identical to each other, except for a time reversal.
Therefore, the reconstruction formula becomes (for each layer)
$$ x[n] = \sum_k \left( y_{high}[k]\, g[2k - n] + y_{low}[k]\, h[2k - n] \right) $$
 However, if the filters are not ideal halfband, then
perfect reconstruction cannot be achieved.
Although it is not possible to realize ideal filters,
under certain conditions it is possible to find filters
that provide perfect reconstruction
 The most famous ones are the ones developed by
Ingrid Daubechies, and they are known as
Daubechies’ wavelets
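Perfect reconstruction can be checked numerically. The sketch below uses the Haar filters (the shortest member of the Daubechies family) as an assumed example; for Haar, upsampling by two, filtering with the synthesis filters and adding collapses to the pairwise formula in `synthesize`. The function names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)


def analyze(x):
    """One analysis level: filter and subsample by 2."""
    approx = [(x[i] + x[i + 1]) * S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * S for i in range(0, len(x), 2)]
    return approx, detail


def synthesize(approx, detail):
    """One synthesis level: rebuild each pair of samples from one
    approximation and one detail coefficient."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) * S)
        x.append((a - d) * S)
    return x


signal = [3.0, 1.0, 0.0, 4.0, 8.0, 6.0]
a, d = analyze(signal)
restored = synthesize(a, d)
```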
2D- Discrete Wavelet Transform
 Significant lossy data reduction is possible using DWT
 How do we generalize these concepts to 2D?
 2D functions -> images: f(x,y) -> I[m,n], the
intensity function
 Reasons to take 2D-DWT of an image
 Compression
 Denoising
 Feature extraction
Discrete Wavelet Transforms
 2D-DWT of an image
 We start by defining a two-dimensional scaling and
wavelet functions
s ( x, y )   ( x)   ( y ) s ( x, y )   ( x)  ( y )
 “Subset” of scale and position based on power of two,
rather than every “possible” set of scale and position as in the
continuous wavelet transform
 Behaves like a filter bank: signal in, coefficients out
 2D DWT for images has applications in image
compression and image recognition
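DWT on Images
[Figure: one level of 2D DWT analysis. The input image is filtered along its rows with the lowpass filter H~ and the highpass filter G~, and the columns are downsampled (2↓1). Each result is then filtered along its columns with H~ and G~, and the rows are downsampled (1↓2). The four outputs are LL (approximation A), LH (horizontal detail D(h)), HL (vertical detail D(v)) and HH (diagonal detail D(d)).]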
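[Figure: subband layout of the input image after successive decomposition levels; at each level the LL band is split again (into LLL, LLH, LHL and LHH at the second level), while the LH, HL and HH bands are kept.]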
DWT on Images
2↓1  Downsample columns along the rows: for each row, keep the even-indexed columns and discard the odd-indexed columns
1↓2  Downsample rows along the columns: for each column, keep the even-indexed rows and discard the odd-indexed rows
2↑1  Upsample columns along the rows: for each row, insert zeros between every other sample (column)
1↑2  Upsample rows along the columns: for each column, insert zeros between every other sample (row)
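DWT on Images
[Figure: one level of 2D DWT synthesis. LL (A) and LH (D(h)) are upsampled (1↑2), filtered with H and G respectively and added; HL (D(v)) and HH (D(d)) are treated in the same way. The two sums are upsampled (2↑1), filtered with H and G respectively, and added to give the original image.]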
 We perform the 2-D wavelet transform by applying 1-D wavelet
transform first on rows and then on columns.
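[Figure: f(m, n) is filtered along its rows with H and G, each followed by downsampling by 2; each row output is then filtered along its columns with H and G and downsampled by 2, giving the LL, LH, HL and HH subbands.]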
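The rows-then-columns procedure can be sketched as follows, again assuming Haar filters purely for brevity. The subband naming follows the diagram (row lowpass then column highpass gives LH, and so on); the function names are hypothetical, and the image is assumed to have even dimensions.

```python
import math

S = 1 / math.sqrt(2)


def haar_rows(image):
    """Filter every row with the Haar lowpass/highpass pair and
    keep one output sample per input pair (downsample by 2)."""
    low = [[(r[i] + r[i + 1]) * S for i in range(0, len(r), 2)] for r in image]
    high = [[(r[i] - r[i + 1]) * S for i in range(0, len(r), 2)] for r in image]
    return low, high


def transpose(m):
    return [list(col) for col in zip(*m)]


def haar_dwt2(image):
    """One level of 2D DWT: 1D transform on rows, then on columns."""
    low, high = haar_rows(image)
    # Column filtering is done as row filtering on the transposed data.
    ll_t, lh_t = haar_rows(transpose(low))
    hl_t, hh_t = haar_rows(transpose(high))
    return transpose(ll_t), transpose(lh_t), transpose(hl_t), transpose(hh_t)


# Vertical stripes: intensity changes along each row, not along columns.
stripes = [[1.0, 0.0, 1.0, 0.0] for _ in range(4)]
ll, lh, hl, hh = haar_dwt2(stripes)
```

For this striped image all the detail energy lands in the HL band, while LH and HH are zero.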
Integer based Wavelets
 By using a so-called lifting scheme, integer-based
wavelets can be created.
 Using an integer-based wavelet, one can
simplify the computation.
 Integer-based wavelets are also easier to
implement on a VLSI chip than non-integer
wavelets.
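One well-known integer wavelet built with lifting is the integer Haar transform (often called the S-transform). The sketch below is offered as an assumed example of the idea, not as the specific wavelet the slides have in mind; it uses only integer additions, subtractions and shifts, and it is exactly invertible.

```python
def lifting_forward(x):
    """Integer Haar transform via lifting: a predict step gives the
    detail, an update step gives the (rounded) average."""
    approx, detail = [], []
    for i in range(0, len(x), 2):
        d = x[i + 1] - x[i]      # predict: difference of the pair
        s = x[i] + (d >> 1)      # update: x[i] + floor(d / 2)
        approx.append(s)
        detail.append(d)
    return approx, detail


def lifting_inverse(approx, detail):
    """Undo the lifting steps in reverse order."""
    x = []
    for s, d in zip(approx, detail):
        first = s - (d >> 1)
        x.append(first)
        x.append(first + d)
    return x


pixels = [12, 15, 200, 198, 7, 7, 0, 255]
approx, detail = lifting_forward(pixels)
restored = lifting_inverse(approx, detail)
```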
Image Features using Wavelets
 The wavelet transform provides an appropriate basis for image
handling because of its beneficial features
 The characteristics of the wavelet transform are:
 The ability to compact most of the signal’s energy into a few
transformation coefficients, which is called energy compaction
 The ability to capture and represent effectively low frequency
components (such as image backgrounds) as well as high frequency
transients (such as image edges)
 The variable resolution decomposition with almost uncorrelated
coefficients
 The ability of a progressive transmission, which facilitates the
reception of an image at different qualities
 The wavelet transform of an image leads to the
computation of four different types of coefficients,
namely,
 Approximation coefficients
 Horizontal coefficients
 Vertical coefficients
 Diagonal coefficients
 Energy values of the transformed images can be computed at various
levels
 Different types of texture features can be extracted
at various levels of decomposition from wavelets;
some of the wavelets employed are,
Applications of Wavelets to Image Processing
 Wavelets for Image Compression
 The discrete wavelet transform decomposes an image into a set of
successively smaller orthogonal images. Often it is possible to coarsely
quantize or eliminate low-valued coefficients without sacrificing the integrity
of the image
 For a given image, you can compute the DWT of, say, each row, and discard
all values in the DWT that are less than a certain threshold
 We then save only those DWT coefficients that are above the threshold for
each row, and when we need to reconstruct the original image, we simply pad
each row with as many zeros as the number of discarded coefficients, and use
the inverse DWT to reconstruct each row of the original image
 We can also analyze the image at different frequency bands, and reconstruct
the original image by using only the coefficients that are of a particular band
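The row-wise thresholding idea can be sketched as below. A one-level Haar transform stands in for the DWT, "less than a threshold" is interpreted as magnitude below the threshold, and discarded coefficients are zeroed in place rather than stored separately and padded back; all three are simplifying assumptions made for this example.

```python
import math

S = 1 / math.sqrt(2)


def haar_forward(row):
    """One-level Haar DWT of a row: approximations followed by details."""
    approx = [(row[i] + row[i + 1]) * S for i in range(0, len(row), 2)]
    detail = [(row[i] - row[i + 1]) * S for i in range(0, len(row), 2)]
    return approx + detail


def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    half = len(coeffs) // 2
    row = []
    for a, d in zip(coeffs[:half], coeffs[half:]):
        row.extend([(a + d) * S, (a - d) * S])
    return row


def compress_row(row, threshold):
    """Keep only coefficients whose magnitude reaches the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in haar_forward(row)]


row = [100.0, 102.0, 101.0, 99.0, 20.0, 22.0, 21.0, 19.0]
kept = compress_row(row, threshold=5.0)
restored = haar_inverse(kept)
```

Here half of the coefficients are discarded, and the reconstructed row differs from the original by at most one grey level per pixel.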
Applications of Wavelets to Image Processing
 Wavelets for Image Enhancement
 Since the DWT decomposes the image into components of different
size, position and orientation, you can alter the coefficients before
reconstruction so that you can attenuate certain characteristics of the
image
 Image Fusion
 Image fusion combines two or more registered images of the same
object into a single image that in most cases is better than the
originals. This is helpful in the medical imaging field, where multiple
images from different machines are employed
 The images are combined in the transform domain by taking the
highest-amplitude coefficient at each position, then performing the
inverse transform on the fused coefficients
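The maximum-amplitude fusion rule can be sketched on a single image row. The one-level Haar transform and the 1D setting are assumptions made to keep the example short; real use would apply the same rule to the 2D subbands of registered images.

```python
import math

S = 1 / math.sqrt(2)


def haar_forward(row):
    approx = [(row[i] + row[i + 1]) * S for i in range(0, len(row), 2)]
    detail = [(row[i] - row[i + 1]) * S for i in range(0, len(row), 2)]
    return approx + detail


def haar_inverse(coeffs):
    half = len(coeffs) // 2
    row = []
    for a, d in zip(coeffs[:half], coeffs[half:]):
        row.extend([(a + d) * S, (a - d) * S])
    return row


def fuse_max_abs(coeffs_a, coeffs_b):
    """At each position keep the coefficient with the larger magnitude."""
    return [a if abs(a) >= abs(b) else b for a, b in zip(coeffs_a, coeffs_b)]


row_a = [10.0, 10.0, 10.0, 50.0]   # bright feature visible only in image A
row_b = [10.0, 40.0, 10.0, 10.0]   # bright feature visible only in image B
fused = haar_inverse(fuse_max_abs(haar_forward(row_a), haar_forward(row_b)))
```

The fused row keeps the bright feature from each input.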
Word Level Script Identification of Text in Low Resolution Images of
Display Boards using Wavelet Features
 Wavelet features of an image for recognition applications
 Coefficients
 Transforms of coefficients
 Energy coefficients, etc.
Word Level Script Identification of Text in Low Resolution Images of
Display Boards using Wavelet Features
 The method classifies an input word into one of five scripts, namely
Devanagari, Kannada, English, Malayalam and Tamil.
 The method investigates the use of
 zone wise wavelet energy features,
 wavelet log mean deviation features, and
 newly obtained properties of wavelet coefficients
for script identification of text in low resolution images of
display boards.
City Block Distance Measure for Script Identification using Wavelet Features
Preprocessing and Feature Extraction
 The preprocessing binarizes the image and generates a
bounding box around it.
Zone Wise Wavelet Energy Features
 The detailed coefficient Dj1 (horizontal energy band) is
divided into three horizontal and four vertical zones at each
level j, as shown in the figure on the next slide.
 The three horizontal zones, namely the top, middle and
bottom zones, cover 30%, 40% and 30% of the band respectively,
and the vertical zones are all of equal size.
Feature Extraction
Zone Wise Wavelet Energy Features
[Figure: three horizontal and four vertical zones of detailed coefficient Dj1]
• Further, 3j horizontal and 4j vertical zone energy features are obtained at
each level j.
• The method also computes 2 relational features as the difference values
between the top and middle zones, and between the middle and bottom zones, again at each level j.
Feature Extraction
 Then, the wavelet energy feature is computed from the detailed coefficient
Dj3 (diagonal energy band) at every level j.
 Hence, a total of 10 wavelet energy features are obtained at each level j.
These features (10 at each level) are stored in the feature vector X.
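$$ E_{j1h1} = \left( \sum_{m=1}^{u_1} \sum_{n=1}^{N} D_{j1}(m, n) \right) / (M \times N) $$
$$ E_{j1h2} = \left( \sum_{m=u_1+1}^{u_2} \sum_{n=1}^{N} D_{j1}(m, n) \right) / (M \times N) $$
$$ E_{j1h3} = \left( \sum_{m=u_2+1}^{M} \sum_{n=1}^{N} D_{j1}(m, n) \right) / (M \times N) $$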
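A hedged sketch of the zone-wise features for one band follows. It assumes that "energy" means squared coefficient values accumulated over a zone and divided by M x N (the squaring is an assumption; the slide formulas do not show an exponent), that the 30%/40%/30% split is taken by integer division of the row count, and that the four vertical zones split the columns evenly. The function name is hypothetical.

```python
def zone_energy_features(band):
    """Return [top, middle, bottom, top - middle, middle - bottom,
    v1, v2, v3, v4] for one detail band given as a list of rows."""
    M, N = len(band), len(band[0])
    u1, u2 = (3 * M) // 10, (7 * M) // 10   # 30% / 40% / 30% row split

    def energy(r0, r1, c0, c1):
        # Assumed definition: sum of squared coefficients over the zone,
        # normalised by the size of the whole band.
        return sum(band[m][n] ** 2
                   for m in range(r0, r1)
                   for n in range(c0, c1)) / (M * N)

    top = energy(0, u1, 0, N)
    middle = energy(u1, u2, 0, N)
    bottom = energy(u2, M, 0, N)
    horizontal = [top, middle, bottom, top - middle, middle - bottom]

    step = N // 4
    vertical = [energy(0, M, k * step, (k + 1) * step if k < 3 else N)
                for k in range(4)]
    return horizontal + vertical


band = [[1.0] * 8 for _ in range(10)]   # toy 10 x 8 band of ones
features = zone_energy_features(band)
```

Together with the single energy value of the diagonal band Dj3, these nine values would give the ten energy features per level mentioned above.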
Feature Extraction
Wavelet Log Mean Deviation Features
• The method computes a total of 3j wavelet log mean deviation features at every
resolution level j, which are stored in the feature vector X.
$$ LMD_{jp} = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} \log\!\left( \frac{|D_{jp}(m, n)|}{S_j\, \partial} + 1 \right) $$
• In the current work, the value ∂ = 0.001 is used in the experiments.
• During the experiments it is observed that the values obtained give a better representation of
texture.
• Further, 2 additional features that model the relation between the detailed energy bands are
determined as stated in the equations below.
• Hence, this step records 10 features (5 at each level) in the feature vector X.
LMDj4 = LMDj1 - LMDj2;
LMDj5 = LMDj2 - LMDj3;
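The log mean deviation features can be sketched as follows. The slides do not define S_j, so it is passed in as a plain parameter `s_j` (an assumption), and the natural logarithm is assumed; the function names and the toy bands are hypothetical.

```python
import math


def log_mean_deviation(band, s_j, delta=0.001):
    """Mean over the band of log(|D(m, n)| / (s_j * delta) + 1)."""
    M, N = len(band), len(band[0])
    total = sum(math.log(abs(v) / (s_j * delta) + 1)
                for row in band for v in row)
    return total / (M * N)


def lmd_features(d1, d2, d3, s_j, delta=0.001):
    """Five features for one level: LMDj1..LMDj3 and the two differences."""
    lmd1 = log_mean_deviation(d1, s_j, delta)
    lmd2 = log_mean_deviation(d2, s_j, delta)
    lmd3 = log_mean_deviation(d3, s_j, delta)
    return [lmd1, lmd2, lmd3, lmd1 - lmd2, lmd2 - lmd3]


# Toy 2 x 2 detail bands (horizontal, vertical, diagonal).
d1 = [[0.0, 0.002], [0.004, 0.0]]
d2 = [[0.001, 0.0], [0.0, 0.001]]
d3 = [[0.0, 0.0], [0.0, 0.0]]
features = lmd_features(d1, d2, d3, s_j=1.0)
```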
Feature Extraction
Wavelet Vertical Run Features
• A wavelet vertical run R(Ø,d) is defined as a sequence of consecutive wavelet
coefficients that runs for a distance greater than or equal to a specified value
d in a given direction Ø = 90 degrees (the value 90 is fixed for the vertical
direction).
• The wavelet vertical run feature WRFj2z is the number of occurrences of
wavelet vertical runs in a given area or region.
• These statistical features are obtained from the vertical detailed coefficient Dj2,
divided into four equal-sized vertical regions/zones, giving
8 features over both decomposition levels (4 features for every level j), which
are recorded in the feature vector X.
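A possible reading of the vertical run feature is sketched below. The slides do not say which coefficients take part in a run, so this sketch assumes a run is a maximal vertical sequence of coefficients whose magnitude exceeds a small value `eps`; that criterion, the parameter names and the function name are all assumptions.

```python
def vertical_run_features(band, d, eps=0.0, zones=4):
    """Count, for each of `zones` equal-width vertical zones, the
    vertical runs of at least `d` consecutive significant coefficients."""
    M, N = len(band), len(band[0])
    zone_size = N // zones
    counts = [0] * zones
    for n in range(zone_size * zones):
        run = 0
        # The extra iteration (m == M) closes a run that reaches the last row.
        for m in range(M + 1):
            if m < M and abs(band[m][n]) > eps:
                run += 1
            else:
                if run >= d:
                    counts[n // zone_size] += 1
                run = 0
    return counts


band = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
runs_d2 = vertical_run_features(band, d=2)
runs_d3 = vertical_run_features(band, d=3)
```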
Feature Extraction
Wavelet Vertical Run Features
[Figure: four vertical zones of detailed coefficient Dj2, labelled WRFj21, WRFj22, WRFj23 and WRFj24]
$$ WRF_{j2z} = \sum_{n = 1 + (k-1) \cdot zone\_size}^{k \cdot zone\_size} R(Ø, d) $$
X = [Ej1h1 Ej1h2 Ej1h3 Ej1h4 Ej1h5 Ej1v1 Ej1v2 Ej1v3 Ej1v4 Ej3 LMDj1 LMDj2 LMDj3 LMDj4 LMDj5
WRFj1 WRFj2 WRFj3 WRFj4, j=1,2 ]
Knowledge Base Construction
• For the purpose of knowledge base construction,
the images were captured from display boards of
government offices in India. The image database
consists of 1450 Kannada, 1200 English, 900
Malayalam, 900 Tamil and 900 Devanagari script
word images of varying resolutions.
• The images in the database are characterized by a
variable number of characters, variable font size and
style, uneven thickness and spacing between
characters, minimal information context, small skew,
noise and other degradations.
Knowledge Bases Construction
• Then, 70% of the samples from each
script are chosen to train the system. The
information stored in the knowledge base sufficiently
characterizes all variations in input and script class
separation.
• It is also noticed that training the system with more
samples improves the performance of the system.
At the end of training, four knowledge bases,
WD_IMKB_KAN, WD_IMKB_ENG,
WD_IMKB_MAL, and WD_IMKB_TAM, for the
Kannada, English, Malayalam and Tamil scripts are
generated.
Knowledge Bases Construction
• Testing is carried out for all word images of the
database, containing 70% trained and 30% test
samples.
• The experimentation is also done to identify the
script of 14081 word images (13271 Kannada and
810 English words) of experimental data set 1 and
14252 word images (13442 Kannada and 810
English words) of data set 2.
Script Class Identification
• Computational Strategy for Devanagari Script Identification
• In this stage, horizontal run statistics of the test word image are
used to determine whether the word written in the display board
image belongs to Hindi or to other scripts.
• Initially, the horizontal runs of length greater than 6 are
computed for every row of the word image and are stored in a run
feature vector HRV. The vector records the row number and run
length count of all runs for all rows.
• Then, the model uses a linear discriminant function D1 to classify
the word image into two classes, w1 and w2, based on the run vector,
where w1 corresponds to Hindi script and w2 corresponds to
the other scripts category.
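The run-collection step can be sketched as below for a binarized word image in which 1 marks an ink pixel. The list-of-tuples layout of the run vector and the function name are assumptions; the discriminant function D1 is not specified in the slides and is not sketched here.

```python
def horizontal_runs(binary, min_length=7):
    """Collect (row number, run length) for every horizontal run of ink
    pixels longer than 6, i.e. of length at least `min_length`."""
    runs = []
    for row_index, row in enumerate(binary):
        length = 0
        # The appended 0 closes a run that reaches the end of the row.
        for pixel in list(row) + [0]:
            if pixel == 1:
                length += 1
            else:
                if length >= min_length:
                    runs.append((row_index, length))
                length = 0
    return runs


word = [
    [1] * 10,                          # long headline-like stroke
    [1] * 6 + [0] + [1] * 3,           # runs of 6 and 3: both too short
    [0, 1, 1, 1, 1, 1, 1, 1, 0, 0],    # one run of exactly 7
]
hrv = horizontal_runs(word)
```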
City Block Distance Measure for Script Identification
• In this stage, the test data instance is processed to obtain wavelet
features, and a feature vector Xt is constructed.
• Xt = [tEj1h1 tEj1h2 tEj1h3 tEj1h4 tEj1h5 tEj1v1 tEj1v2
tEj1v3 tEj1v4 tEj3 tLMDj1 tLMDj2 tLMDj3 tLMDj4 tLMDj5
tWRFj1 tWRFj2 tWRFj3 tWRFj4, j=1,2 ]
• Then, the smallest city block distance between the test data
instance Xt and the data set of each knowledge base is determined
to obtain distances d1, d2, d3, and d4.
The smallest distance between the test word image and a knowledge base is
used to identify the script class.
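The nearest-knowledge-base decision can be sketched as follows. The knowledge base names come from the slides, but the three-element feature vectors are made-up toy values (real vectors would hold the wavelet features listed above), and the function names are hypothetical.

```python
def city_block(a, b):
    """City block (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def identify_script(x_t, knowledge_bases):
    """Return the knowledge base with the smallest city block distance
    to x_t, plus the per-knowledge-base minimum distances."""
    distances = {
        name: min(city_block(x_t, reference) for reference in references)
        for name, references in knowledge_bases.items()
    }
    return min(distances, key=distances.get), distances


# Toy knowledge bases with hypothetical 3-element feature vectors.
knowledge_bases = {
    "WD_IMKB_KAN": [[0.9, 0.1, 0.4], [0.8, 0.2, 0.5]],
    "WD_IMKB_ENG": [[0.1, 0.9, 0.3]],
    "WD_IMKB_MAL": [[0.5, 0.5, 0.9]],
    "WD_IMKB_TAM": [[0.2, 0.3, 0.1]],
}
x_t = [0.15, 0.85, 0.3]
script, distances = identify_script(x_t, knowledge_bases)
```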
Results and Analysis
 The effectiveness of the proposed methodology for
script identification using wavelet features has
been evaluated on 33683 low resolution word
images. The images were captured from display
boards of government offices in India.
 The proposed methodology has produced good
results for low resolution word images
containing text of different size, font, and
alignment with varying background.
Results and Analysis
 The approach also identifies the script of text regions with small skew.
Hence, the proposed method is robust and achieves an identification
accuracy of 92% for Kannada script, 97.65% for English, 82.5%
for Malayalam and 87% for Tamil script.
 A closer examination of the results revealed that misclassifications
arise due to minimal information content, noise and larger skew,
which affect the texture of the text region and the performance of the
texture based approach.
 It is also found that if the knowledge bases are trained for all
variations and degradations, better performance can be obtained.
THANK YOU
Useful Links
 Matlab Wavelet Toolbox user's guide
 http://www.wavelet.org
 http://www.multires.caltech.edu/teaching/
 http://www-dsp.rice.edu/software/RWT/
 www.multires.caltech.edu/teaching/courses/waveletcourse/sig95.course.pdf
 http://www.amara.com/current/wavelet.html
