Optical Character Recognition: Chuan-Kai Yang
Outline
OCR Systems
Image scanner
OCR software/hardware
Output interface
Image Scanner
OCR Software/Hardware
Document Analysis
Character segmentation/isolation
Compensate for poor scanning quality
Image enhancement
Underline removal
Noise removal
Character Recognition
Feature extractor
Determine the descriptors or feature set.
The derived feature set is fed into the classifier.
Classifier
Template matching (matrix matching)
Structural classification
Bayesian classifier
Artificial neural networks
Template Matching
Template matching is one of the most common methods.
Individual image pixels are used as features.
Classification is performed by comparing an input character image with a set of templates (prototypes).
Each comparison results in a similarity measure between the input character and the template; the comparison is made pixel by pixel.
The character is assigned the identity of the most similar template.
Template matching is a trainable process because template characters may be changed.
In many commercial systems, PROMs (programmable read-only memory) store templates containing single fonts.
If a suitable PROM exists for a font, then template matching can be trained to recognize that font.
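The pixel-by-pixel comparison described above can be sketched in a few lines. This is a minimal illustration, not any commercial system's method: the 3x3 binary templates, the agreement-fraction similarity measure, and the names `TEMPLATES`, `similarity`, and `classify` are all assumptions made for the example.

```python
# Hypothetical 3x3 binary templates (1 = ink, 0 = background). Real systems
# would use larger, size-normalised character images.
TEMPLATES = {
    "I": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
    "T": [[1, 1, 1],
          [0, 1, 0],
          [0, 1, 0]],
}


def similarity(image, template):
    """Fraction of pixel positions where image and template agree."""
    total = 0
    matches = 0
    for image_row, template_row in zip(image, template):
        for image_pixel, template_pixel in zip(image_row, template_row):
            total += 1
            matches += image_pixel == template_pixel
    return matches / total


def classify(image, templates=TEMPLATES):
    """Return (label, score) for the most similar template."""
    scores = {label: similarity(image, t) for label, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# A degraded "T": the bottom pixel of the stem is missing.
noisy_t = [[1, 1, 1],
           [0, 1, 0],
           [0, 0, 0]]
label, score = classify(noisy_t)
print(label, round(score, 3))
```

In this sketch, "training" for a new font amounts to adding or replacing entries in the template dictionary, which mirrors the role the slides give to font-specific PROMs.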
Structural Classification
Structural classification utilizes structural features and decision rules to classify characters.
Features may be defined in terms of character strokes, character holes, or other character attributes such as concavities.
For instance, the letter P may be described as a vertical stroke with a hole attached on the upper right side.
For a character image input, the structural features are extracted and a rule-based system is applied to classify the character.
Structural methods are also trainable.
The construction of a good feature set and a good rule base can be time-consuming.
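As a rough sketch of the structural approach, the example below extracts two features (enclosed holes and a full-height left stroke) and applies a tiny rule base. The glyphs, the feature definitions, the 0.4 threshold, and the three rules are hypothetical and chosen only to make the P example concrete; the sketch checks the vertical position of the hole but not its horizontal position.

```python
# Hypothetical 5x7 glyphs: '#' is ink, '.' is background.
GLYPH_P = ["####.", "#...#", "#...#", "####.", "#....", "#....", "#...."]
GLYPH_O = [".###.", "#...#", "#...#", "#...#", "#...#", "#...#", ".###."]
GLYPH_B = ["####.", "#...#", "#...#", "####.", "#...#", "#...#", "####."]


def find_holes(grid):
    """Return enclosed background regions as lists of (row, col) pixels."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    holes = []
    for r0 in range(rows):
        for c0 in range(cols):
            if grid[r0][c0] != "." or seen[r0][c0]:
                continue
            # Flood-fill the 4-connected background region from (r0, c0).
            stack, region = [(r0, c0)], []
            seen[r0][c0] = True
            while stack:
                r, c = stack.pop()
                region.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and not seen[nr][nc] and grid[nr][nc] == ".":
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            # A region that reaches the image border is not a hole.
            touches_border = any(
                r in (0, rows - 1) or c in (0, cols - 1) for r, c in region
            )
            if not touches_border:
                holes.append(region)
    return holes


def extract_features(grid):
    """Compute the structural features used by the rule base."""
    holes = find_holes(grid)
    features = {
        "hole_count": len(holes),
        # A full-height run of ink in the first column counts as a stroke.
        "left_stroke": all(row[0] == "#" for row in grid),
        "hole_in_upper_part": False,
    }
    if len(holes) == 1:
        mean_row = sum(r for r, _ in holes[0]) / len(holes[0])
        features["hole_in_upper_part"] = mean_row < 0.4 * (len(grid) - 1)
    return features


def classify(grid):
    """Apply the (hypothetical) decision rules to the extracted features."""
    f = extract_features(grid)
    if f["hole_count"] == 2 and f["left_stroke"]:
        return "B"
    if f["hole_count"] == 1 and f["left_stroke"] and f["hole_in_upper_part"]:
        return "P"
    if f["hole_count"] == 1 and not f["left_stroke"]:
        return "O"
    return "?"  # no rule fired


print(classify(GLYPH_P), classify(GLYPH_O), classify(GLYPH_B))
```

Extending the sketch to more characters means adding features and rules by hand, which illustrates why building a good rule base can be time-consuming.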
Other methods
Discriminant function classifiers use hypersurfaces to separate the feature descriptions of characters.
Bayesian methods seek to minimize the loss function associated with misclassification through the use of probability theory.
Artificial neural networks (ANNs), which are closer to human perception, employ mathematical minimization techniques.
These techniques are used in commercial OCR systems.
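To make the Bayesian idea concrete, the sketch below picks the decision with the smallest expected loss given class posterior probabilities. The posteriors, the 0/1 loss values, the extra "reject" decision (hand the character to an operator), and its cost of 0.3 are all invented for illustration.

```python
def bayes_decision(posteriors, loss):
    """Return (decision, expected_loss) minimising the expected loss.

    posteriors: dict mapping true class -> P(class | features)
    loss: dict mapping (decision, true class) -> cost of that outcome
    """
    decisions = sorted({decision for decision, _ in loss})

    def expected_loss(decision):
        return sum(loss[(decision, cls)] * p for cls, p in posteriors.items())

    best = min(decisions, key=expected_loss)
    return best, expected_loss(best)


# Hypothetical two-class problem: letter "O" versus digit "0".
CLASSES = ("O", "0")
LOSS = {}
for decision in CLASSES:
    for true_class in CLASSES:
        LOSS[(decision, true_class)] = 0.0 if decision == true_class else 1.0
for true_class in CLASSES:
    LOSS[("reject", true_class)] = 0.3  # assumed cost of operator review

ambiguous = {"O": 0.55, "0": 0.45}  # made-up posteriors
confident = {"O": 0.90, "0": 0.10}

ambiguous_result = bayes_decision(ambiguous, LOSS)
confident_result = bayes_decision(confident, LOSS)
print(ambiguous_result[0], confident_result[0])
```

With these assumed numbers, the ambiguous character is rejected because guessing either class costs more on average than operator review, while the confident one is classified directly.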
Recognition Rate
For machine-printed characters, the rate can reach over 99%.
For handwritten characters, the rate is typically lower.
Contextual Processing
The number of word choices for a given field can be limited by the content of another field.
For example, knowing the ZIP code can help determine the address.
Spelling checker
Results can be verified interactively by the user.
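The ZIP code example can be sketched as a lookup that narrows the candidate words for the city field, followed by a closest-match search over those candidates. The table `CITIES_BY_ZIP` holds made-up entries rather than real postal data, and the function name and the 0.6 cutoff are assumptions; `difflib.get_close_matches` is from the Python standard library.

```python
import difflib

# Hypothetical lookup table (not real postal data): ZIP code -> valid cities.
CITIES_BY_ZIP = {
    "00001": ["SPRINGFIELD", "SHELBYVILLE"],
    "00002": ["CAPITAL CITY"],
}


def correct_city(zip_code, ocr_city, cutoff=0.6):
    """Use the ZIP field to limit and correct the recognised city field.

    Returns (city, corrected), where corrected is False when no contextual
    information applied and the raw OCR text is passed through unchanged.
    """
    candidates = CITIES_BY_ZIP.get(zip_code)
    if not candidates:
        return ocr_city, False
    matches = difflib.get_close_matches(
        ocr_city.upper(), candidates, n=1, cutoff=cutoff
    )
    if matches:
        return matches[0], True
    return ocr_city, False


# The recogniser confused the letter I with the digit 1.
print(correct_city("00001", "SPR1NGF1ELD"))
print(correct_city("99999", "SPR1NGF1ELD"))
```

A result that comes back uncorrected is a natural candidate for the interactive verification step mentioned above.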
Output Interface
The output interface allows character recognition results to be electronically transferred into the domain that uses the results.
Historical Perspective
Born in 1951: GISMO by M. Sheppard, a robot reader-writer.
1954: J. Rainbow developed a prototype machine that was able to read uppercase typewritten output at the fantastic speed of one character per minute.
Systems that cost one million dollars were not uncommon.
It is not uncommon to find PC-based OCR systems for under $800 capable of recognizing several hundred characters per minute.
Some systems advertise themselves as omnifont, that is, able to read any machine-printed font.
Commercial Applications
Task-Specific Readers
Assigning ZIP codes to letter mail
Reading data entered in forms, e.g. tax forms
Automatic accounting procedures used in processing utility bills
Verification of account numbers and courtesy amounts on bank checks
Automatic accounting of airline passenger tickets
Automatic validation of passports
Address Readers
Form Readers
Trained with a blank form.
Scan regions that should be filled with data.
Some systems can process forms at a rate of 5800 forms per hour.
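The two steps above (train on a blank form, then read only the regions meant to hold data) can be illustrated with a text-grid analogue. Here a form is a list of text lines and underscores mark the fillable regions; this representation and the names `learn_regions` and `read_form` are assumptions for the example, whereas a real form reader works on scanned images.

```python
import re

# Hypothetical blank form: runs of underscores mark the fields to be filled.
BLANK_FORM = [
    "Name: ______________",
    "ZIP:  _____",
]


def learn_regions(blank_form):
    """'Train' on a blank form: record where each fillable region sits.

    Returns a dict mapping the field label to (row, start_col, end_col).
    """
    regions = {}
    for row, line in enumerate(blank_form):
        for match in re.finditer(r"_+", line):
            label = line[:match.start()].strip().rstrip(":")
            regions[label] = (row, match.start(), match.end())
    return regions


def read_form(filled_form, regions):
    """Read only the learned regions of a filled-in form."""
    return {
        label: filled_form[row][start:end].strip("_ ")
        for label, (row, start, end) in regions.items()
    }


regions = learn_regions(BLANK_FORM)

filled = [
    "Name: Ada Lovelace",
    "ZIP:  12345",
]
fields = read_form(filled, regions)
print(fields)
```

Restricting recognition to the learned regions is what lets the rest of the page, such as printed labels and instructions, be ignored.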
Check Readers
Capture the check image.
Cross-reference the amounts specified in both places.
An operator can correct misclassified characters by cross-validating the recognition results.
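The cross-referencing step can be sketched as follows, assuming that "both places" means the amount written in digits and the amount written in words, and that both have already been recognised as text. The word parser handles only whole-dollar amounts below one million, and every name here is hypothetical.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def words_to_number(text):
    """Parse a written amount such as 'one hundred twenty-five'."""
    total, current = 0, 0
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word == "thousand":
            total += current * 1000
            current = 0
        else:
            raise ValueError(f"unrecognised word: {word!r}")
    return total + current


def cross_check(digits_text, words_text):
    """Return (amount, needs_operator) after comparing the two readings."""
    try:
        from_digits = int(digits_text)
        from_words = words_to_number(words_text)
    except ValueError:
        return None, True  # unreadable field: route to an operator
    if from_digits == from_words:
        return from_digits, False
    return None, True  # the two readings disagree


print(cross_check("125", "one hundred twenty-five"))
print(cross_check("725", "one hundred twenty-five"))
```

Any check flagged with `needs_operator` set would go to a person for correction, as described above.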
Focus on certain regions of a document where the expected information is located.
Scan/Match
Some systems can scan up to 260,000 tickets per day (17 tickets per second).
Passport Readers
Match against the database records containing information on fugitive felons and smugglers
High-end:
Higher data throughput and more advanced capabilities.
Can adapt the recognition engine to customer data to improve accuracy.
Can even detect typeface (boldface and italic).
Low-end:
Mostly used in an office with desktop workstations.
Can handle a broad range of documents at a lower rate and accuracy.