Online Handwritten Script Recognition


ONLINE HANDWRITTEN SCRIPT RECOGNITION

Presentation By:
Priya Ahuja
CSE 6C
10-CSU-110

CONTENTS
Online Recognition
Why Handwriting Recognition?
Why is Handwriting Recognition difficult?
Properties of Scripts
Features of Handwritten Script
Steps in Handwritten Script Recognition
Future Scope
References

Online Recognition
On-line handwriting recognition involves the automatic conversion of
text as it is written on a special digitizer or PDA, where a sensor picks
up the pen-tip movements as well as pen-up/pen-down switching. That
kind of data is known as digital ink and can be regarded as a dynamic
representation of handwriting. The obtained signal is converted into
letter codes which are usable within computer and text-processing
applications.
The elements of an on-line handwriting recognition interface typically
include:
1) a pen or stylus for the user to write with.
2) a touch sensitive surface, which may be integrated with, or adjacent
to, an output display.
3) a software application which interprets the movements of the stylus
across the writing surface, translating the resulting strokes into digital
text.

Devices that accept on-line handwritten data: From the top left, Pocket
PC, CrossPad, Ink Link, Cell Phone, Smart Board, Tablet with display,
Anoto pen, Wacom Tablet, Tablet PC

Why Handwriting Recognition?


Online documents may be written in different languages and scripts. A
single document page may itself contain text written in multiple scripts.
A script is defined as a graphic form of a writing system. Different scripts
may follow the same writing system. For example, the alphabetic system is
adopted by scripts like Roman and Greek, and the phonetic-alphabetic
system is adopted by most Indian scripts, including Devnagari. A specific
script like Roman may be used by multiple languages such as English,
German, and French.
The general class of Han-based scripts includes Chinese, Japanese, and
Korean (we do not consider Kana or Han-Gul).
Devnagari script is used by many Indian languages, including Hindi,
Sanskrit, Marathi, and Rajasthani.
Arabic script is used by Arabic, Farsi, Urdu, etc.
Roman script is used by many European languages like English, German,
French, and Italian.

The most important characteristic of online documents is that they
capture the temporal sequence of strokes while writing the document.
We use stroke properties as well as the spatial and temporal information
of a collection of strokes to identify the script used in the document.

Why is Handwriting Recognition Difficult?
High variability of individual characters:
Writing style
Stroke width and quality
Size of the writing
Variation even for a single writer!
Reliable segmentation of cursive script is extremely problematic due to
merging of adjacent characters.

Properties of scripts
Arabic: Arabic is written from right to left within a line and the lines
are written from top to bottom. A typical Arabic character contains a
relatively long main stroke which is drawn from right to left, along with
one to three dots. The character set contains three long vowels.
Short markings (diacritics) may be added to the main character to
indicate short vowels. Due to these diacritical marks and the dots in
the script, the length of the strokes varies considerably.
Cyrillic: Cyrillic script looks very similar to cursive Roman script. The
most distinctive features of Cyrillic script, compared to Roman script, are:
1) individual characters, connected together in a word, form one long
stroke, and
2) the absence of delayed strokes. Delayed strokes cause movement of the
pen in the direction opposite to the regular writing direction.

The word trait contains three delayed strokes, shown as bold dotted
lines here.

Devnagari: The most important characteristic of Devnagari script is the
horizontal line present at the top of each word, called the Shirorekha.
These lines are usually drawn after the word is written and hence are
similar to delayed strokes in Roman script. The words are written from
left to right in a line.
The word devnagari written in Devnagari script. The Shirorekha is shown
in bold.

Han: Characters of Han script are composed of multiple short strokes. The
strokes are usually drawn from top to bottom and left to right within a character.
The direction of writing of words in a line is either left to right or top to bottom.

Hebrew: Words in a line of Hebrew script are written from right to left and,
hence, the script is temporally similar to Arabic. The most distinguishing factor
of Hebrew from Arabic is that the strokes are more uniform in length in the
former.

Roman: Roman script has the same writing direction as Cyrillic, Devnagari,
and Han scripts. In addition, the length of the strokes tends to fall between that
of Devnagari and Cyrillic scripts.

Features of Handwritten Script

Horizontal Interstroke Direction (HID): This is the sum of the
horizontal directions between the starting points of consecutive strokes
in the pattern. The feature essentially captures the writing direction
within a line:

HID = Σ_{i=r+1}^{n} sgn( Xstart(s_i) - Xstart(s_{i-r}) )

where Xstart(.) denotes the x-coordinate of the pen-down position of the
stroke, n is the number of strokes in the pattern, and r is set to 3 to
reduce errors due to abrupt changes in direction between successive
strokes.
The value of HID falls in the range [r - n, n - r].
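A minimal sketch of HID, assuming strokes are lists of (x, y) sample points and sgn(.) is the sign function:

```python
def hid(strokes, r=3):
    """Horizontal Interstroke Direction: sum of signed horizontal
    directions between the start points of strokes i and i - r.
    `strokes` is a list of point lists [(x, y), ...]."""
    def sgn(v):
        return (v > 0) - (v < 0)
    n = len(strokes)
    x_start = [s[0][0] for s in strokes]   # pen-down x of each stroke
    return sum(sgn(x_start[i] - x_start[i - r]) for i in range(r, n))
```

For n strokes this sums n - r terms, each in {-1, 0, +1}, which gives the stated range [r - n, n - r].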

Average Stroke Length (ASL): Each stroke is resampled during
preprocessing so that the sample points are equidistant. Hence, the
number of sample points in a stroke is used as a measure of its length.
The Average Stroke Length is defined as the average length of the
individual strokes in the pattern:

ASL = (1/n) Σ_{i=1}^{n} length(s_i)

where n is the number of strokes in the pattern.
The value of ASL is a real number which falls in the range [1.0, R0],
where the value of R0 depends on the resampling distance used during
preprocessing.
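With the same stroke representation, ASL is just the mean sample count per stroke:

```python
def asl(strokes):
    """Average Stroke Length: mean number of (equidistant) sample
    points per stroke in the pattern."""
    return sum(len(s) for s in strokes) / len(strokes)
```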
Shirorekha Strength: This feature measures the strength of the
horizontal line component in the pattern using the Hough transform. The
value of this feature is computed as:

SS = Σ_{θ=-10°}^{10°} Σ_r H(r, θ) / Σ_θ Σ_r H(r, θ)

where H(r, θ) denotes the number of votes in the (r, θ)-th bin in the
two-dimensional Hough transform space. The Hough transform can be
computed efficiently for dynamic data by considering only the sample
points. The numerator is the sum of the bins corresponding to line
orientations between -10° and 10°, and the denominator is the sum of
all the bins in the Hough transform space. The value of Shirorekha
Strength is a real number which falls in the range [0.0, 1.0].
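A rough sketch of a point-based Hough accumulator for this feature. One implementation assumption: since each sample point votes once in every orientation column, summing every (r, θ) bin would make the ratio a constant, so this sketch uses the peak vote count per orientation instead; the bin resolutions are also made-up parameters, not values from the source.

```python
import math
from collections import Counter

def shirorekha_strength(points, r_res=2.0, theta_bins=36):
    """Strength of the horizontal-line component via a point Hough
    transform. Orientation 0 is a horizontal line; a line with
    orientation phi through (x, y) has offset r = -x*sin(phi) + y*cos(phi)."""
    votes = Counter()
    for x, y in points:
        for b in range(theta_bins):
            phi = math.pi * (b / theta_bins - 0.5)   # orientation in [-90, 90)
            r = -x * math.sin(phi) + y * math.cos(phi)
            votes[(round(r / r_res), b)] += 1
    peak = {}                                        # peak votes per orientation
    for (rb, b), v in votes.items():
        peak[b] = max(peak.get(b, 0), v)
    horiz = [b for b in range(theta_bins)
             if abs(math.degrees(math.pi * (b / theta_bins - 0.5))) <= 10]
    return sum(peak[b] for b in horiz) / sum(peak.values())
```

A long horizontal run of points should score noticeably higher than a diagonal one.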

Shirorekha Confidence: We compute a confidence measure for a stroke
being a Shirorekha. Each stroke in the pattern is inspected for three
different properties of a Shirorekha: Shirorekhas span the width of a
word, always occur at the top of the word, and are horizontal. The
confidence C(s) of a stroke s is computed from these three properties.

Stroke Density: This is the number of strokes, n, per unit length (along
the x-axis) of the pattern. Note that the Han script is written using
short strokes, while Roman and Cyrillic are written using longer strokes.
The value of Stroke Density is a real number and can vary within the
range (0.0, R1), where R1 is a positive real number.
Aspect Ratio: This is the ratio of the width to the height of a pattern.
The value of Aspect Ratio is a real number and can vary within the range
(0.0, R2), where R2 is a positive real number.
Reverse Distance: This is the distance by which the pen moves in the
direction opposite to the normal writing direction. The normal writing
direction is different for different scripts. The value of Reverse
Distance is a nonnegative integer and its observed values were in the
range [0, 1200].
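The three features above can be sketched directly from the stroke points (strokes as lists of (x, y) samples; left-to-right is taken as the normal writing direction by default):

```python
def stroke_density(strokes):
    """Strokes per unit length along the x-axis of the pattern."""
    xs = [x for s in strokes for x, _ in s]
    return len(strokes) / (max(xs) - min(xs))

def aspect_ratio(strokes):
    """Width-to-height ratio of the pattern's bounding box."""
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    return (max(xs) - min(xs)) / (max(ys) - min(ys))

def reverse_distance(strokes, writing_dir=+1):
    """Total distance moved opposite to the normal writing direction
    (+1 = left-to-right), summed over consecutive sample points."""
    d = 0
    for s in strokes:
        for (x0, _), (x1, _) in zip(s, s[1:]):
            step = (x1 - x0) * writing_dir
            if step < 0:
                d += -step
    return d
```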

Average Horizontal Stroke Direction: The Horizontal Stroke Direction
(HD) of a stroke, s, can be understood as the horizontal direction from
the start of the stroke to its end. Formally, we define HD(s) as:

HD(s) = sgn( Xpen-up(s) - Xpen-down(s) )

where Xpen-down(.) and Xpen-up(.) are the x-coordinates of the pen-down
and pen-up positions, respectively.
For an n-stroke pattern, the Average Horizontal Stroke Direction is
computed as the average of the HD values of its component strokes. The
value of Average Horizontal Stroke Direction falls in the range [-1.0, 1.0].
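A sketch of the average horizontal direction; the vertical analogue defined next simply swaps the y-coordinate for the x-coordinate:

```python
def avg_horizontal_direction(strokes):
    """Average of sgn(x_pen_up - x_pen_down) over all strokes,
    a value in [-1, 1]; strokes are lists of (x, y) samples."""
    def sgn(v):
        return (v > 0) - (v < 0)
    return sum(sgn(s[-1][0] - s[0][0]) for s in strokes) / len(strokes)
```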

Average Vertical Stroke Direction: It is defined similarly to the
Average Horizontal Stroke Direction. The Vertical Direction (VD) of a
single stroke s is defined as:

VD(s) = sgn( Ypen-up(s) - Ypen-down(s) )

where Ypen-down(.) and Ypen-up(.) are the y-coordinates of the pen-down
and pen-up positions, respectively. For an n-stroke pattern, the Average
Vertical Stroke Direction is computed as the average of the VD values of
its component strokes. The value of Average Vertical Stroke Direction
falls in the range [-1.0, 1.0].

Vertical Interstroke Direction (VID): The Vertical Interstroke Direction
is defined as:

VID = Σ_{i=2}^{n} sgn( Ȳ(s_i) - Ȳ(s_{i-1}) )

where Ȳ(s) is the average of the y-coordinates of the stroke points and
n is the number of strokes in the pattern. The value of VID is an
integer and falls in the range [1 - n, n - 1].
Variance of Stroke Length: This is the variance in sample lengths of
individual strokes within a pattern. The value of Variance of Stroke
Length is nonnegative.
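Both remaining features can be sketched from the same stroke representation (lists of (x, y) samples):

```python
def vid(strokes):
    """Vertical Interstroke Direction: sum of sgn(Ybar(s_i) - Ybar(s_{i-1})),
    where Ybar is the mean y-coordinate of a stroke's sample points."""
    def sgn(v):
        return (v > 0) - (v < 0)
    ybar = [sum(y for _, y in s) / len(s) for s in strokes]
    return sum(sgn(b - a) for a, b in zip(ybar, ybar[1:]))

def stroke_length_variance(strokes):
    """Variance of the per-stroke sample counts within a pattern."""
    lens = [len(s) for s in strokes]
    m = sum(lens) / len(lens)
    return sum((l - m) ** 2 for l in lens) / len(lens)
```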

STEPS IN HANDWRITTEN SCRIPT RECOGNITION

1. Preprocessing: Goal is to remove unwanted variation.
Common methods - skew / slant / size normalization:
Trajectory data mapped to a 2D representation
Baselines / core area estimated as in the offline case
Special online methods:
Outlier elimination: remove position measurements caused by interference
Resampling and smoothing of the trajectory
Elimination of delayed strokes

Resampling and smoothing of the trajectory:
Goal: Normalize variations in writing speed (no identification!)
Equidistant resampling & interpolation.
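The equidistant resampling step can be sketched with linear interpolation along the trajectory; the sampling distance is a free parameter:

```python
import math

def resample_equidistant(points, step=2.0):
    """Resample a stroke so consecutive samples are `step` apart
    along the trajectory (linear interpolation); this normalizes
    away variations in writing speed."""
    out = [points[0]]
    residual = 0.0                 # arc length already covered toward next sample
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0:
            continue
        d = step - residual
        while d <= seg:
            t = d / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += step
        residual = seg - (d - step)
    return out
```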

Elimination of delayed strokes:
Handling of delayed strokes is problematic: additional time variability!
Removed by heuristic rules.

2. Feature Extraction
Basic idea: Describe the shape of the pen trajectory locally.
Typical features:
Slope angle of the local trajectory (represented as sin and cos for
continuous variation)
Binary pen-up vs. pen-down feature
Hat feature for describing delayed strokes (strokes that spatially
correspond to removed delayed strokes are marked)
Feature dynamics: In all applications of HMMs, dynamic features greatly
enhance performance.
Discrete time derivative of features
Here: differences between successive slope angles

3. CLASSIFICATION
The last big step is classification. In this step, various models are
used to map the extracted features to different classes and thus
identify the characters or words the features represent.

Andrei Andreyevich Markov
Born: 14 June 1856 in Ryazan, Russia
Died: 20 July 1922 in Petrograd (now St. Petersburg), Russia
Markov is particularly remembered for his study of Markov chains:
sequences of random variables in which the future variable is determined
by the present variable but is independent of the way in which the
present state arose from its predecessors. This work launched the theory
of stochastic processes.

Markov random processes


A random sequence has the Markov property
if its distribution is determined solely by its
current state. Any random process having this
property is called a Markov random process.
For observable state sequences (state is
known from data), this leads to a Markov
chain model.
For non-observable states, this leads to a
Hidden Markov Model (HMM).

Chain Rule & Markov Property

Bayes rule:
P(qt, qt-1, ..., q1) = P(qt | qt-1, ..., q1) · P(qt-1, ..., q1)
                     = P(qt | qt-1, ..., q1) · P(qt-1 | qt-2, ..., q1) · P(qt-2, ..., q1)
and, applying the chain rule repeatedly:
P(qt, qt-1, ..., q1) = P(q1) · Π_{i=2}^{t} P(qi | qi-1, ..., q1)

Markov property:
P(qi | qi-1, ..., q1) = P(qi | qi-1) for i > 1
so that
P(qt, qt-1, ..., q1) = P(q1) · Π_{i=2}^{t} P(qi | qi-1)
                     = P(q1) · P(q2 | q1) · ... · P(qt | qt-1)
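The Markov factorization can be checked numerically on a made-up 2-state chain: defining the joint by the product of one-step conditionals, the probabilities of all length-3 sequences must sum to 1.

```python
# Made-up 2-state chain: initial distribution and transition matrix.
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.2, 0.8]]

def joint(seq):
    """P(q1, ..., qt) via the Markov factorization P(q1) * prod P(qi|qi-1)."""
    p = pi[seq[0]]
    for a, b in zip(seq, seq[1:]):
        p *= A[a][b]
    return p

# Total probability over all length-3 state sequences.
total = sum(joint((i, j, k)) for i in (0, 1) for j in (0, 1) for k in (0, 1))
```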

A Markov System

Has N states, called s1, s2, ..., sN.
There are discrete timesteps, t = 0, 1, 2, ...
On the t-th timestep the system is in exactly one of the available
states; call it qt, the current state.
Note: qt ∈ {s1, s2, ..., sN}.
Between each timestep, the next state is chosen randomly; the current
state determines the probability distribution for the next state.

Example (N = 3, t = 1, qt = q1 = s2):
P(qt+1=s1 | qt=s1) = 0    P(qt+1=s1 | qt=s2) = 1/2    P(qt+1=s1 | qt=s3) = 1/3
P(qt+1=s2 | qt=s1) = 0    P(qt+1=s2 | qt=s2) = 1/2    P(qt+1=s2 | qt=s3) = 2/3
P(qt+1=s3 | qt=s1) = 1    P(qt+1=s3 | qt=s2) = 0      P(qt+1=s3 | qt=s3) = 0
These transition probabilities are often notated with arcs between the
states in a directed graph.

Markov Property

qt+1 is conditionally independent of {qt-1, qt-2, ..., q1, q0} given qt.
In other words:
P(qt+1 = sj | qt = si) = P(qt+1 = sj | qt = si, any earlier history)
Notation:
aij = P(qt+1 = sj | qt = si)
πi = P(q1 = si)
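A minimal simulation of the example system, using the transition probabilities above (rows index the current state s1..s3 as 0..2):

```python
import random

# Transition matrix of the 3-state example; rows are the current state.
A = [[0.0, 0.0, 1.0],
     [0.5, 0.5, 0.0],
     [1 / 3, 2 / 3, 0.0]]

def simulate(q0, steps, seed=0):
    """Run the chain for `steps` transitions from state q0 and
    return the state sequence (length steps + 1)."""
    rng = random.Random(seed)
    seq = [q0]
    for _ in range(steps):
        seq.append(rng.choices((0, 1, 2), weights=A[seq[-1]])[0])
    return seq
```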

Example: A Simple Markov Model for Weather Prediction
On any given day, the weather can be described as being in one of three
states:
State 1: precipitation (rain, snow, hail, etc.)
State 2: cloudy
State 3: sunny
Transitions between states are described by the transition matrix, and
the model can then be described by a directed graph with the states as
nodes and the transition probabilities as arc labels.

Basic Calculations - 1
Example: What is the probability that the weather for eight consecutive
days is sun-sun-sun-rain-rain-sun-cloudy-sun?
Solution:
O = (sun, sun, sun, rain, rain, sun, cloudy, sun) = (3, 3, 3, 1, 1, 3, 2, 3)
P(O | model) = P(q1 = 3) · a33 · a33 · a31 · a11 · a13 · a32 · a23
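The transition matrix itself is not reproduced here; as a hedged illustration, the sketch below uses the values from Rabiner's well-known weather example (an assumption about the intended matrix), under which the stated eight-day sequence has probability 1.536e-4.

```python
# Rows/cols: 0 = rain, 1 = cloudy, 2 = sunny. These values come from
# Rabiner's classic weather example; the original slide's matrix is
# assumed, not reproduced.
A = [[0.4, 0.3, 0.3],
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]

def sequence_prob(states, A):
    """P(O | model), conditioning on the first observed state."""
    p = 1.0
    for a, b in zip(states, states[1:]):
        p *= A[a][b]
    return p

O = [2, 2, 2, 0, 0, 2, 1, 2]   # sun sun sun rain rain sun cloudy sun
```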

From Markov to Hidden Markov
The previous model assumes that each state can be uniquely associated
with an observable event:
Once an observation is made, the state of the system is trivially
retrieved.
This model, however, is too restrictive to be of practical use for most
realistic problems.
To make the model more flexible, we will assume that the outcomes or
observations of the model are a probabilistic function of each state:
Each state can produce a number of outputs according to a unique
probability distribution, and each distinct output can potentially be
generated at any state.
These are known as Hidden Markov Models (HMMs), because the state
sequence is not directly observable; it can only be approximated from
the sequence of observations produced by the system.

HMM Formal Definition
An HMM, λ, is a 5-tuple consisting of:
N, the number of states.
M, the number of possible observations.
{π1, π2, ..., πN}, the starting state probabilities: P(q0 = Si) = πi.
A = {aij}, the state transition probabilities: P(qt+1 = Sj | qt = Si) = aij.
B = {bi(k)}, the observation probabilities: P(Ot = k | qt = Si) = bi(k).

The Coin-Toss Problem

To illustrate the concept of an HMM, consider the following scenario:
Assume that you are placed in a room with a curtain.
Behind the curtain there is a person performing a coin-toss experiment.
This person selects one of several coins and tosses it: heads (H) or
tails (T).
The person tells you the outcome (H, T), but not which coin was used
each time.
Your goal is to build a probabilistic model that best explains a
sequence of observations O = {o1, o2, o3, o4, ...} = {H, T, T, H, ...}.
The coins represent the states; these are hidden because you do not know
which coin was tossed each time.
The outcome of each toss represents an observation.
A likely sequence of coins may be inferred from the observations, but
this state sequence will not be unique.
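The scenario can be sketched as a tiny generative simulation of a two-coin HMM; the switching probabilities and coin biases below are made up for illustration.

```python
import random

def generate(n, seed=0):
    """Sample n tosses from a hypothetical 2-coin HMM: which coin is in
    hand follows a (hidden) Markov chain, and each coin has its own
    probability of heads. All parameter values are illustrative."""
    switch = [[0.8, 0.2], [0.3, 0.7]]   # P(next coin | current coin)
    p_heads = [0.9, 0.2]                # per-coin bias P(H)
    rng = random.Random(seed)
    coin, obs, states = 0, [], []
    for _ in range(n):
        states.append(coin)
        obs.append("H" if rng.random() < p_heads[coin] else "T")
        coin = rng.choices((0, 1), weights=switch[coin])[0]
    return obs, states
```

Only `obs` would be visible to the observer; `states` is the hidden coin sequence the model must infer.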

The Coin Toss Example - 1 coin
With a single coin, the Markov model is observable: the states are the
actual observations themselves (see figure).
The model parameter P(H) may be found from the ratio of heads and tails.
O = H H H T T H
S = 1 1 1 2 2 1

The Coin Toss Example - 2 coins
From Markov to Hidden Markov Model: with two coins and a hidden rule for
switching between them, the state (which coin is in hand) can no longer
be read directly off the observation.

The Coin Toss Example - 3 coins

1, 2, or 3 coins?
Which of these models is best?
Since the states are not observable, the best we can do is select the
model that best explains the data (e.g., by the Maximum Likelihood
criterion).
Whether the observation sequence is long and rich enough to warrant a
more complex model is a different story, though.

Future Scope
Over the past three decades, many different methods have been explored
by a large number of researchers to recognize handwritten characters. A
variety of approaches have been proposed and tested in different parts
of the world to improve recognition accuracy and usability.

References
ieeexplore.ieee.org/iel5/34/28182/01261096.pdf
F. Coulmas, Writing Systems: An Introduction to Their Linguistic
Analysis. Cambridge University Press, 2003.
R. Plamondon and S. N. Srihari, "On-line and off-line handwriting
recognition: A comprehensive survey," IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 22, pp. 63-84, 2000.
A. L. Spitz, "Determination of the script and language content of
document images," IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 19, pp. 235-245, 1997.
G. X. Tan, C. Viard-Gaudin, and A. Kot, "Automatic Writer Identification
Framework for Online Handwritten Documents Using Character Prototypes,"
Pattern Recognition, 2009.

THANK YOU!
