Optical Character Recognition

Chuan-kai Yang

Outline

OCR Systems
Historical Perspective
Commercial Applications

OCR Systems
Image Scanner
OCR software/hardware
Output interface

Image Scanner

Detector
Illumination source
Scan lens
Document transport

OCR Software/Hardware

Document Analysis
Character Recognition
Contextual Processing

Document Analysis

Character segmentation/isolation
Compensating for poor scanning quality
  Image enhancement
  Underline removal
  Noise removal
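
Character segmentation/isolation can be illustrated with a minimal sketch. The slides do not say which method is used; the example below assumes a vertical projection profile, a simple technique that cuts a text line wherever a column contains no ink. The function name `segment_columns` and the tiny test image are made up for illustration.

```python
def segment_columns(image):
    """Split a binary text-line image into character column ranges.

    image: list of rows, each a list of 0 (background) / 1 (ink).
    Returns a list of (start, end) column indices, end exclusive.
    Assumption: characters are separated by at least one blank column.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection: amount of ink in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    segments = []
    start = None
    for x, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = x                    # a character begins
        elif ink == 0 and start is not None:
            segments.append((start, x))  # a blank column ends the character
            start = None
    if start is not None:
        segments.append((start, width))  # character runs to the right edge
    return segments


line = [
    [1, 1, 0, 0, 1, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1, 1],
]
print(segment_columns(line))  # [(0, 2), (4, 5), (6, 9)]
```

Touching or overlapping characters defeat this simple cut, which is one reason segmentation is treated as a separate document-analysis step.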

Character Recognition

Feature extractor
  Determines the descriptors, or feature set
  The derived feature set is fed into the classifier

Classifier
  Template matching (matrix matching)
  Structural classification
  Bayesian classifier
  Artificial neural networks

Template Matching 1/2


One of the most common methods
Individual image pixels are used as features
Classification is performed by comparing an input character image with a set of templates (prototypes)
Each comparison is made pixel by pixel and yields a similarity measure between the input character and the template
The input character is assigned the identity of the most similar template
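
The comparison described on this slide can be sketched as follows. This is a minimal illustration, not a production matcher: it assumes binary images of identical size, uses the fraction of agreeing pixels as the similarity measure (the slides do not specify one), and the 3x3 templates are invented for the example.

```python
def similarity(image, template):
    """Fraction of pixels that agree between two equal-sized binary images."""
    total = 0
    matches = 0
    for img_row, tpl_row in zip(image, template):
        for a, b in zip(img_row, tpl_row):
            total += 1
            matches += (a == b)
    return matches / total


def classify(image, templates):
    """Return (label, score) of the most similar template."""
    best_label = max(templates, key=lambda label: similarity(image, templates[label]))
    return best_label, similarity(image, templates[best_label])


# Hypothetical 3x3 templates (prototypes), one per character.
TEMPLATES = {
    "I": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
    "T": [[1, 1, 1],
          [0, 1, 0],
          [0, 1, 0]],
}

# A noisy "T": the bottom pixel of the stem is missing.
noisy_t = [[1, 1, 1],
           [0, 1, 0],
           [0, 0, 0]]

label, score = classify(noisy_t, TEMPLATES)
print(label, round(score, 2))  # T 0.89
```

Because every pixel is a feature, the method is sensitive to shifts, scaling, and font changes, which is why templates are usually tied to a single font.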

Template Matching 2/2

Template matching is a trainable process because template characters may be changed
In many commercial systems, PROMs (programmable read-only memory) store templates containing single fonts
If a suitable PROM exists for a font, then template matching can be trained to recognize that font

Structural Classification 1/2

It uses structural features and decision rules to classify characters
Features may be defined in terms of character strokes, character holes, or other character attributes such as concavities
For instance, the letter P may be described as a vertical stroke with a hole attached on the upper right side

Structural Classification 2/2

For an input character image, the structural features are extracted and a rule-based system is applied to classify the character
Structural methods are also trainable
Constructing a good feature set and a good rule base can be time-consuming
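
A minimal sketch of the idea: extract one structural feature (holes) and apply a small hand-written rule base. Everything here is an assumption made for illustration, including the use of flood fill to find holes, the 4-connectivity, and the three toy rules; a real rule base would combine many more features such as strokes and concavities.

```python
def find_holes(image):
    """Return the holes of a binary image as lists of (row, col) pixels.

    A hole is a connected region of background (0) pixels that does not
    touch the image border. 4-connectivity is used.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    holes = []
    for r in range(height):
        for c in range(width):
            if image[r][c] == 1 or seen[r][c]:
                continue
            # Flood-fill one background region starting at (r, c).
            region, touches_border = [], False
            stack = [(r, c)]
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                if y in (0, height - 1) or x in (0, width - 1):
                    touches_border = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if not touches_border:
                holes.append(region)
    return holes


def classify_by_structure(image):
    """Apply a tiny, hypothetical rule base to the hole features."""
    holes = find_holes(image)
    height = len(image)
    if len(holes) == 2:
        return "B"
    if len(holes) == 1:
        # Rule: a single hole confined to the upper half suggests "P".
        lowest_row = max(y for y, _ in holes[0])
        return "P" if lowest_row < height / 2 else "O"
    return "no hole (e.g. L, T, C)"


P = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
O = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
print(classify_by_structure(P), classify_by_structure(O))  # P O
```

Unlike template matching, these rules do not depend on exact pixel positions, but each new character class needs new features and rules, which is where the construction effort goes.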

Other methods

Discriminant function classifiers use hypersurfaces to separate the feature descriptions of different characters
Bayesian methods use probability theory to minimize the expected loss associated with misclassification
Artificial neural networks (ANNs), which are closer to human perception, employ mathematical minimization techniques
All of these techniques are used in commercial OCR systems
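
The Bayesian idea of minimizing the loss associated with misclassification can be sketched as a minimum-expected-loss decision. The posterior probabilities and loss tables below are hypothetical numbers chosen for illustration, and the "reject" decision is an added assumption showing how an unequal loss can change the outcome.

```python
def bayes_decide(posteriors, loss):
    """Pick the decision with the smallest expected loss.

    posteriors: {true_class: P(true_class | features)}
    loss: {decision: {true_class: cost of choosing `decision` when the
           character is really `true_class`}}
    """
    def expected_loss(decision):
        return sum(loss[decision][c] * p for c, p in posteriors.items())

    best = min(loss, key=expected_loss)
    return best, expected_loss(best)


# Hypothetical posteriors for a glyph that could be a letter O or a digit 0.
posteriors = {"O": 0.6, "0": 0.4}

# Zero-one loss: every error costs the same.
zero_one = {"O": {"O": 0, "0": 1},
            "0": {"O": 1, "0": 0}}

# Costly errors plus a cheap "reject" option (send the glyph to an operator).
with_reject = {"O": {"O": 0, "0": 10},
               "0": {"O": 10, "0": 0},
               "reject": {"O": 1, "0": 1}}

print(bayes_decide(posteriors, zero_one)[0])     # O
print(bayes_decide(posteriors, with_reject)[0])  # reject
```

With zero-one loss the rule reduces to picking the most probable class; once errors are expensive relative to rejection, the same posteriors lead to a rejection instead.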

Recognition Rate

For machine-printed characters, the rate can reach over 99%
For handwritten characters, the rate is typically lower

Contextual Processing

The number of word choices for a given field can be limited by the content of another field
  Knowing the ZIP code helps narrow down the address

Post-processing to correct recognition errors
  Spelling checker
  Interactive verification by the user
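
Both ideas on this slide can be combined in one small sketch: a recognized ZIP code restricts the list of valid city names, and a spelling-checker step snaps a misread word to the closest entry. The use of Levenshtein edit distance, the `max_distance` threshold, and the two-entry `CITIES_BY_ZIP` table are assumptions for illustration only.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert/delete/substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                   # delete ch_a
                current[j - 1] + 1,                # insert ch_b
                previous[j - 1] + (ch_a != ch_b),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


def correct(word, lexicon, max_distance=1):
    """Return the closest lexicon entry, or the word unchanged if none is close."""
    if word in lexicon:
        return word
    best = min(lexicon, key=lambda entry: edit_distance(word, entry))
    return best if edit_distance(word, best) <= max_distance else word


# Hypothetical lookup: city names that are valid for a recognized ZIP code.
CITIES_BY_ZIP = {"10001": ["NEW YORK"], "60601": ["CHICAGO"]}

print(correct("CH1CAGO", CITIES_BY_ZIP["60601"]))  # CHICAGO
print(correct("BOSTON", CITIES_BY_ZIP["60601"]))   # BOSTON
```

A word that stays uncorrected, like the second example, is a natural candidate for interactive verification by the user.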


Non-Roman Character Recognition

Output Interface

The output interface allows character recognition results to be electronically transferred into the domain that uses the results:

Spreadsheets
Databases
Word processors

Historical Perspective

OCR was born in 1951 with GISMO, a robot reader-writer by M. Sheppard
In 1954, J. Rainbow developed a prototype machine that was able to read uppercase typewritten output at the "fantastic" speed of one character per minute
Systems that cost one million dollars were not uncommon

Some ANSI Standard Fonts


[Figure: font samples labeled machine, machine, and handwritten]

Today's OCR Systems

It is not uncommon to find PC-based OCR systems for under $800 that can recognize several hundred characters per minute
Some systems advertise themselves as omnifont-capable

Commercial Applications

Task-Specific Readers

Assigning ZIP codes to letter mail
Reading data entered in forms, e.g. tax forms
Automatic accounting procedures used in processing utility bills
Verification of account numbers and courtesy amounts on bank checks
Automatic accounting of airline passenger tickets
Automatic validation of passports

Address Readers

Can recognize up to 400 fonts and process up to 45,000 mail pieces per hour

Form Readers

Trained with a blank form
Scan the regions that should be filled with data
Some systems can process forms at a rate of 5,800 forms per hour
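
One way to picture the blank-form idea is to compare each data region of a scanned form with the same region of the blank form and keep only the regions that gained ink. This is a rough sketch under stated assumptions: the region boxes in `fields` are hypothetical and given by hand, the images are already aligned, and "more ink than the blank" stands in for whatever detection a real form reader uses.

```python
def crop(image, box):
    """Cut a (top, left, bottom, right) box out of an image; bottom/right exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def ink(image):
    """Number of ink (1) pixels in a binary image."""
    return sum(sum(row) for row in image)


def filled_fields(blank, filled, fields):
    """Return {field name: cropped region} for fields holding more ink than the blank form."""
    result = {}
    for name, box in fields.items():
        region = crop(filled, box)
        if ink(region) > ink(crop(blank, box)):
            result[name] = region  # this region would be passed on to recognition
    return result


# Hypothetical 3x6 form with two data regions side by side.
fields = {"name": (0, 0, 2, 3), "date": (0, 3, 2, 6)}
blank = [[0] * 6 for _ in range(3)]
filled = [
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

print(sorted(filled_fields(blank, filled, fields)))  # ['name']
```

Restricting recognition to known regions is also what keeps per-form processing time low.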

Check Readers

Capture the check image
Cross-reference the amounts specified in both places on the check (the numeric courtesy amount and the written-out amount)
An operator can correct misclassified characters by cross-validating the recognition results

Bill Processing Systems

Focus on the regions of a document where the expected information is located
  Account number
  Payment value

Airline Ticket Readers

Scan/Match
  Reservation record
  Travel agent record
  Passenger ticket

Some systems can scan up to 260,000 tickets per day (17 tickets per second)
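
The throughput figures quoted in these slides use different time units, so a short calculation puts them on a common per-second scale. The only inputs are the numbers from the slides; the daily operating time derived at the end is an inference from those numbers, not something the slides state.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

mail_per_second = 45_000 / SECONDS_PER_HOUR          # address readers
forms_per_second = 5_800 / SECONDS_PER_HOUR          # form readers
tickets_per_second_24h = 260_000 / SECONDS_PER_DAY   # ticket readers, if run around the clock
hours_at_17_per_second = 260_000 / 17 / SECONDS_PER_HOUR

print(f"mail pieces/s: {mail_per_second:.1f}")                       # 12.5
print(f"forms/s: {forms_per_second:.1f}")                            # 1.6
print(f"tickets/s over 24 h: {tickets_per_second_24h:.1f}")          # 3.0
print(f"hours needed at 17 tickets/s: {hours_at_17_per_second:.2f}") # 4.25
```

Spread over a full 24 hours, 260,000 tickets is about 3 per second, so the 17 tickets per second figure matches the daily total only if the reader runs for roughly four and a quarter hours a day.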

Passport Readers

Reads the traveler's
  Name
  Date of birth
  Passport number

Matches them against database records containing information on fugitive felons and smugglers

General Purpose Page Readers

High-end: higher data throughput and more advanced capabilities
  Can adapt the recognition engine to customer data to improve accuracy
  Can even detect typeface attributes (boldface and italic)

Low-end:
  Mostly used in offices with desktop workstations
  Can handle a broad range of documents, but at a lower rate and accuracy
