


UNICODE OCR

By

N. Manasa (07T81A1220), A. Sruthi (07T81A1249), B. Venkatalaxmi (07T81A1256), N. Shanthi Kishore (07T81A1239)


Under the Guidance of R. Ravinder Reddy

Abstract, Introduction, System Analysis, System Requirements, System Design

Abstract
Character degradation is a major problem for machine-printed character recognition. The two main causes of degradation are extrinsic image degradation, such as blurring and low image dimension, and intrinsic degradation caused by font variation. A recognition method that combines two complementary classifiers is proposed in this paper. The local feature based classifier extracts local contour direction changes, which is effective for character patterns with little structural deterioration. The global feature based classifier extracts the texture distribution of the character image, which is effective when the character structure is hard to discriminate. The two complementary classifiers are combined by candidate fusion in a coarse-to-fine style. Experiments were carried out on degraded Chinese character recognition, and the results demonstrate the effectiveness of our method.

Introduction
Optical character recognition (OCR) is the process of converting scanned images of machine-printed or handwritten text (numerals, letters, and symbols) into a computer-processable format. A typical OCR system contains three logical components: an image scanner, OCR software and hardware, and an output interface. The image scanner optically captures the text images to be recognized, and the text images are then processed with the OCR software and hardware. The process involves three operations: document analysis (extracting individual character images), recognition of these images (based on shape), and contextual processing (either to correct misclassifications made by the recognition algorithm or to limit recognition choices). The output interface communicates the OCR system's results to the outside world.
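The three operations above can be sketched as a toy pipeline. Everything here is an illustrative stand-in: the lookup-table "classifier", the lexicon, and all function names are invented for this sketch, not part of any real OCR library.

```python
# Toy sketch of the three OCR operations: document analysis,
# shape-based recognition, and contextual processing.
MODEL = {"|-|": "H", "|": "I"}   # "shape" -> character (stub classifier)
LEXICON = {"HI", "HIT"}          # contextual constraint: allowed words

def document_analysis(page):
    # extract individual "character images" (here: split on spaces)
    return page.split(" ")

def recognize(char_image):
    # shape-based classification (here: a table lookup)
    return MODEL.get(char_image, "?")

def contextual_processing(text):
    # limit recognition choices to words in the lexicon
    return text if text in LEXICON else "?"

def ocr(page):
    chars = [recognize(c) for c in document_analysis(page)]
    return contextual_processing("".join(chars))

print(ocr("|-| |"))  # HI
```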

OCR is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text.

There are many different approaches to the optical character recognition problem. One of the most common and popular is based on neural networks, which can be applied to different tasks such as pattern recognition, time series prediction, function approximation, and clustering. As more and more convenient document capture devices come onto the market, the demand for degraded character recognition is increasing dramatically, and many research results on the topic have been published in recent years.
Intrinsic shape degradation can be effectively addressed by the nonlinear normalization and block-based local feature extraction used in handprint character recognition.

As for extrinsic image degradation, a comprehensive study of the image degradation model for the Latin character set has been presented in the literature.

There are basically two approaches to extrinsic degradation: local grayscale feature extraction and global texture feature extraction. While most methods focus on solving one of these problems, few are robust against the combination of both.
The recognition confidence and the estimated image blurring level are used to combine a local feature based classifier with a global feature based classifier. However, a hierarchical recognition structure cannot efficiently handle the mixed cases that occur in real environments.

System Analysis:
o Existing System
o Disadvantage
o Proposed System
o Advantage

Existing System
In the existing system, character degradation is a major problem for machine-printed character recognition. The two main causes of degradation are intrinsic degradation caused by character shape variation and extrinsic image degradation such as blurring and low image dimension. A mixture of these factors makes degraded character recognition a difficult task.

Disadvantage
In this system, nonlinear normalization was used, which does not provide exact pixel identification, and character recognition speed is low. The intrinsic and extrinsic degradation problems are solved, but separately; this wastes time and cannot give correct results.

Proposed System
In the proposed system, the intrinsic and extrinsic problems are solved by a complementary-classifier method consisting of local and global features; the two classification processes are executed in parallel under a coarse-to-fine recognition structure. The proposed Unicode OCR method uses a hybrid recognition algorithm to solve the problems in the existing system. It is used to find a character's font, size, width, and height. It mainly employs an approach called neural networks.
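One way to picture the candidate fusion of the two classifiers is the minimal sketch below. The scoring, the candidate lists, and the simple max-merge rule are all assumptions made for illustration; the paper's actual fusion and fine re-ranking steps are not reproduced here.

```python
# Illustrative candidate fusion: merge the candidate lists of the local
# (contour-direction) and global (texture) classifiers, then keep the
# top-k candidates for the fine recognition stage.
def fuse_candidates(local_cands, global_cands, top_k=3):
    pool = {}
    for label, score in local_cands + global_cands:
        # keep each label's best score from either classifier
        pool[label] = max(pool.get(label, 0.0), score)
    # the fine classifier would re-rank this pool; here we just sort
    return sorted(pool, key=pool.get, reverse=True)[:top_k]

local_cands = [("A", 0.80), ("R", 0.60)]   # local feature based classifier
global_cands = [("A", 0.70), ("B", 0.65)]  # global feature based classifier
print(fuse_candidates(local_cands, global_cands))  # ['A', 'B', 'R']
```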

Neural networks
Neural networks, usually called artificial neural networks, are mathematical or computational models inspired by the structure or functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. An artificial neuron receives a number of inputs, either from the original data or from the outputs of other neurons in the network. Each input comes via a connection that has a strength, or weight. These weights correspond to synaptic efficacy in a biological neuron.
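A single artificial neuron as described above can be written in a few lines. The sigmoid activation and the example weights are conventional choices for illustration, not values taken from the system itself.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias,
    squashed by a sigmoid activation into the range (0, 1)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

out = neuron([1.0, 0.5], [0.4, -0.2], 0.1)  # z = 0.4, sigmoid(0.4) ~ 0.6
```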

The system consists of three phases: training, loading, and recognition.
1. Training set: In the training step we teach the network to respond with the desired output for a specified input. For this purpose, each training sample is represented by two components: a possible input and the desired network output for that input.
2. Loading: After the training step is done, we can give an arbitrary input to the network and the network will form an output, from which we can resolve the pattern type presented to the network.
3. Recognition: Recognition is done only when the error is small enough. On each learning epoch, all samples from the training set are presented to the network and the summed squared error is calculated. When the error becomes less than the specified error limit, training is complete and the network can be used for recognition. An accuracy of 92 percent in recognition demonstrates the good performance of the proposed system.
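The epoch loop with the summed-squared-error stopping rule can be sketched for a single sigmoid neuron. The learning rate, error limit, and the toy logical-AND training set are assumptions for the sketch; the real system trains a full network, not one neuron.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, n_inputs, lr=0.5, error_limit=0.01, max_epochs=20000):
    """Train one sigmoid neuron by gradient descent; stop when the
    summed squared error over an epoch drops below error_limit."""
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(max_epochs):
        sse = 0.0
        for x, target in samples:
            y = sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)
            err = target - y
            sse += err * err
            grad = err * y * (1.0 - y)       # gradient of the squared error
            w = [wi + lr * grad * xi for wi, xi in zip(w, x)]
            b += lr * grad
        if sse < error_limit:                # the stopping rule above
            break
    return w, b

def predict(x, w, b):
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)

# tiny "training set": teach the neuron logical AND
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(samples, 2)
```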

Proposed System (cont.)
How do we recognize? We feed the input to the trained network and obtain its output, then find the element in the output vector with the maximum value. That element's index points us to the recognized pattern.
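Picking the recognized pattern is just an argmax over the output vector; the scores below are made-up example values.

```python
# The index of the largest element in the output vector names the
# recognized pattern (one score per trained pattern).
output = [0.02, 0.91, 0.05, 0.10]
recognized = max(range(len(output)), key=output.__getitem__)
print(recognized)  # 1 -> the second trained pattern
```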


Advantage

In our system, classifier combination was implemented using fusion to solve the degradation problem. The results show that the proposed method is much more robust than either individual classifier.
A normalization technique was used which provides exact pixel identification and high speed in character recognition.
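The slide does not say which normalization is used, so the sketch below shows one simple possibility: linear size normalization that maps every glyph onto a fixed pixel grid by nearest-neighbour sampling. The grid size and the sampling rule are assumptions for illustration.

```python
def normalize(glyph, out_h=16, out_w=16):
    """Map a binary glyph (list of 0/1 rows) onto a fixed out_h x out_w
    pixel grid by nearest-neighbour sampling, so every character is
    compared on the same pixel layout."""
    in_h, in_w = len(glyph), len(glyph[0])
    return [[glyph[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

glyph = [[1, 0],
         [0, 1]]                  # 2x2 input pattern
norm = normalize(glyph, 4, 4)     # scaled up to a 4x4 grid
```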

REQUIREMENTS, DESIGN, SCREENSHOTS

System Requirements

Hardware Requirements
Processor        : Intel Pentium IV
RAM              : 512 MB
Processor Speed  : 600 MHz
Display Type     : VGA
Mouse            : Logitech

Software Requirements
Operating System : Windows XP
Software         : Microsoft Visual Studio .NET 2005
Platform         : .NET

DFD DIAGRAM

Source image → character segmentation (paragraphs, lines, words, characters) → feature extraction (character height, width, horizontal lines, vertical lines, slope lines) → classification (processing using an artificial neural network) → Unicode mapping → recognized text
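The feature-extraction stage of the DFD can be illustrated with a toy function over a binary glyph. It covers only a subset of the features named in the diagram (height, width, full horizontal and vertical lines; slope-line detection is omitted), and the representation is an assumption for the sketch.

```python
def extract_features(glyph):
    """Simple structural features of a binary glyph (list of 0/1 rows):
    height, width, and counts of fully-set horizontal and vertical lines.
    A toy stand-in for the feature set named in the DFD."""
    height = len(glyph)
    width = len(glyph[0])
    horz = sum(1 for row in glyph if all(row))
    vert = sum(1 for c in range(width) if all(row[c] for row in glyph))
    return {"height": height, "width": width,
            "horz_lines": horz, "vert_lines": vert}

# glyph resembling "L": a full left column and a full bottom row
glyph = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
print(extract_features(glyph))
```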


Flow chart


Use-case diagram

Class diagram

SEQUENCE DIAGRAM

Screen shots

Load train set

Saving network

Loading network and the image

Future Enhancements
Future research directions include better coarse classification and robustness evaluation under other degradation types. Experiments were carried out on degraded Chinese character recognition; the system could also be extended to detect degraded characters in other languages and convert them into the required language.

Conclusion
A recognition method is proposed in this paper that integrates a local feature based classifier with a global feature based classifier. The local feature based classifier performs well in good-quality environments, while the global feature based classifier is very robust for bad-quality characters. Our method uses a coarse-to-fine recognition structure, with a candidate fusion step linking the coarse classification to the fine classification. The proposed method can efficiently handle two typical degradation types: image degradation caused by blurring and low image dimension, and character shape changes caused by font variation. Future research directions include better coarse classification and robustness evaluation under other degradation types.

THANK YOU
