


UNICODE OCR

By

N. Manasa (07T81A1220), A. Sruthi (07T81A1249), B. Venkatalaxmi (07T81A1256), N. Shanthi Kishore (07T81A1239)


Under the Guidance of R. Ravinder Reddy

Abstract, Introduction, System Analysis, System Requirements, System Design

Abstract
Character degradation is a major problem for machine-printed character recognition. The two main causes of degradation are extrinsic image degradation, such as blurring and low image dimension, and intrinsic degradation caused by font variation. A recognition method that combines two complementary classifiers is proposed in this paper. The local feature based classifier extracts local contour direction changes, which is effective for character patterns with little structural deterioration. The global feature based classifier extracts the texture distribution of the character image, which is effective when the character structure is hard to discriminate. The two complementary classifiers are combined by candidate fusion in a coarse-to-fine style. Experiments were carried out on degraded Chinese character recognition, and the results demonstrate the effectiveness of our method.

Introduction
Optical character recognition (OCR) is the process of converting scanned images of machine-printed or handwritten text (numerals, letters, and symbols) into a computer-processable format. A typical OCR system contains three logical components: an image scanner, OCR software and hardware, and an output interface. The image scanner optically captures the text images to be recognized, and the text images are then processed with the OCR software and hardware. The process involves three operations: document analysis (extracting individual character images), recognition of these images (based on shape), and contextual processing (either to correct misclassifications made by the recognition algorithm or to limit recognition choices). The output interface communicates the OCR system's results to the outside world.
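The three operations above can be sketched as a toy pipeline. Everything here is an illustrative stand-in: the lookup-table "classifier", the lexicon, and all function names are invented for this sketch, not part of any real OCR library.

```python
# Toy sketch of the three OCR operations: document analysis,
# shape-based recognition, and contextual processing.
MODEL = {"|-|": "H", "|": "I"}   # "shape" -> character (stub classifier)
LEXICON = {"HI", "HIT"}          # contextual constraint: allowed words

def document_analysis(page):
    # extract individual "character images" (here: split on spaces)
    return page.split(" ")

def recognize(char_image):
    # shape-based classification (here: a table lookup)
    return MODEL.get(char_image, "?")

def contextual_processing(text):
    # limit recognition choices to words in the lexicon
    return text if text in LEXICON else "?"

def ocr(page):
    chars = [recognize(c) for c in document_analysis(page)]
    return contextual_processing("".join(chars))

print(ocr("|-| |"))  # HI
```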

OCR is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text.

There are many different approaches to the optical character recognition problem. One of the most common and popular is based on neural networks, which can be applied to different tasks such as pattern recognition, time series prediction, function approximation, and clustering. As more and more convenient document capture devices come onto the market, the demand for degraded character recognition is increasing dramatically, and many research results on the topic have been published in recent years.
Intrinsic shape degradation can be effectively addressed by the nonlinear normalization and block-based local feature extraction used in handprint character recognition.

As for extrinsic image degradation, a comprehensive study of the image degradation model for the Latin character set has been presented in the literature.

There are basically two approaches to extrinsic degradation: local grayscale feature extraction and global texture feature extraction. While most methods focus on solving one of these problems, few are robust against the combination of both.
The recognition confidence and the estimated image blurring level are used to combine a local feature based classifier with a global feature based classifier. However, a hierarchical recognition structure cannot efficiently handle the mixed cases that occur in real environments.

System Analysis:
o Existing System
o Disadvantage
o Proposed System
o Advantage

Existing System
In the existing system, character degradation is a major problem for machine-printed character recognition. The two main causes of degradation are intrinsic degradation caused by character shape variation and extrinsic image degradation such as blurring and low image dimension. A mixture of these factors makes degraded character recognition a difficult task.

Disadvantage
In this system, nonlinear normalization was used, which does not provide exact pixel identification, and character recognition speed is low. The intrinsic and extrinsic degradation problems are solved, but separately; this wastes time and cannot give correct results.

Proposed System
In the proposed system, the intrinsic and extrinsic problems are solved by a complementary-classifier method consisting of local and global features; the two classification processes are executed in parallel under a coarse-to-fine recognition structure. The proposed Unicode OCR method uses a hybrid recognition algorithm to solve the problems in the existing system. It is used to find a character's font, size, width, and height. It mainly employs an approach called neural networks.
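One way to picture the candidate fusion of the two classifiers is the minimal sketch below. The scoring, the candidate lists, and the simple max-merge rule are all assumptions made for illustration; the paper's actual fusion and fine re-ranking steps are not reproduced here.

```python
# Illustrative candidate fusion: merge the candidate lists of the local
# (contour-direction) and global (texture) classifiers, then keep the
# top-k candidates for the fine recognition stage.
def fuse_candidates(local_cands, global_cands, top_k=3):
    pool = {}
    for label, score in local_cands + global_cands:
        # keep each label's best score from either classifier
        pool[label] = max(pool.get(label, 0.0), score)
    # the fine classifier would re-rank this pool; here we just sort
    return sorted(pool, key=pool.get, reverse=True)[:top_k]

local_cands = [("A", 0.80), ("R", 0.60)]   # local feature based classifier
global_cands = [("A", 0.70), ("B", 0.65)]  # global feature based classifier
print(fuse_candidates(local_cands, global_cands))  # ['A', 'B', 'R']
```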

Neural networks
Neural networks, usually called artificial neural networks, are mathematical or computational models inspired by the structure or functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. An artificial neuron receives a number of inputs, either from the original data or from the outputs of other neurons in the network. Each input comes via a connection that has a strength, or weight. These weights correspond to synaptic efficacy in a biological neuron.
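A single artificial neuron as described above can be written in a few lines. The sigmoid activation and the example weights are conventional choices for illustration, not values taken from the system itself.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias,
    squashed by a sigmoid activation into the range (0, 1)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

out = neuron([1.0, 0.5], [0.4, -0.2], 0.1)  # z = 0.4, sigmoid(0.4) ~ 0.6
```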

The system consists of three phases: training, loading, and recognition.
1. Training set: In the training step we teach the network to respond with the desired output for a specified input. For this purpose, each training sample is represented by two components: a possible input and the desired network output for that input.
2. Loading: After the training step is done, we can give an arbitrary input to the network and the network will form an output, from which we can resolve the pattern type presented to the network.
3. Recognition: Recognition is done only when the error is small enough. On each learning epoch, all samples from the training set are presented to the network and the summed squared error is calculated. When the error becomes less than the specified error limit, training is complete and the network can be used for recognition. An accuracy of 92 percent in recognition demonstrates the good performance of the proposed system.
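The epoch loop with the summed-squared-error stopping rule can be sketched for a single sigmoid neuron. The learning rate, error limit, and the toy logical-AND training set are assumptions for the sketch; the real system trains a full network, not one neuron.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, n_inputs, lr=0.5, error_limit=0.01, max_epochs=20000):
    """Train one sigmoid neuron by gradient descent; stop when the
    summed squared error over an epoch drops below error_limit."""
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(max_epochs):
        sse = 0.0
        for x, target in samples:
            y = sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)
            err = target - y
            sse += err * err
            grad = err * y * (1.0 - y)       # gradient of the squared error
            w = [wi + lr * grad * xi for wi, xi in zip(w, x)]
            b += lr * grad
        if sse < error_limit:                # the stopping rule above
            break
    return w, b

def predict(x, w, b):
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)

# tiny "training set": teach the neuron logical AND
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(samples, 2)
```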

Proposed System (cont.)
How do we recognize? We feed the input to the trained network and obtain its output, then find the element in the output vector with the maximum value. That element's index points us to the recognized pattern.
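Picking the recognized pattern is just an argmax over the output vector; the scores below are made-up example values.

```python
# The index of the largest element in the output vector names the
# recognized pattern (one score per trained pattern).
output = [0.02, 0.91, 0.05, 0.10]
recognized = max(range(len(output)), key=output.__getitem__)
print(recognized)  # 1 -> the second trained pattern
```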


Advantage

In our system, classifier combination was implemented using fusion to solve the degradation problem. The results show that the proposed method is much more robust than either individual classifier.
A normalization technique was used which provides exact pixel identification and high speed in character recognition.
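The slide does not say which normalization is used, so the sketch below shows one simple possibility: linear size normalization that maps every glyph onto a fixed pixel grid by nearest-neighbour sampling. The grid size and the sampling rule are assumptions for illustration.

```python
def normalize(glyph, out_h=16, out_w=16):
    """Map a binary glyph (list of 0/1 rows) onto a fixed out_h x out_w
    pixel grid by nearest-neighbour sampling, so every character is
    compared on the same pixel layout."""
    in_h, in_w = len(glyph), len(glyph[0])
    return [[glyph[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

glyph = [[1, 0],
         [0, 1]]                  # 2x2 input pattern
norm = normalize(glyph, 4, 4)     # scaled up to a 4x4 grid
```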

REQUIREMENTS, DESIGN, SCREENSHOTS

System Requirements

Hardware Requirements
Processor        : Intel Pentium IV
RAM              : 512 MB
Processor Speed  : 600 MHz
Display Type     : VGA
Mouse            : Logitech

Software Requirements
Operating System : Windows XP
Software         : Microsoft Visual Studio .NET 2005
Platform         : .NET

DFD DIAGRAM

Source image → character segmentation (paragraphs, lines, words, characters) → feature extraction (character height, width, horizontal lines, vertical lines, slope lines) → classification (processing using an artificial neural network) → Unicode mapping → recognized text
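The feature-extraction stage of the DFD can be illustrated with a toy function over a binary glyph. It covers only a subset of the features named in the diagram (height, width, full horizontal and vertical lines; slope-line detection is omitted), and the representation is an assumption for the sketch.

```python
def extract_features(glyph):
    """Simple structural features of a binary glyph (list of 0/1 rows):
    height, width, and counts of fully-set horizontal and vertical lines.
    A toy stand-in for the feature set named in the DFD."""
    height = len(glyph)
    width = len(glyph[0])
    horz = sum(1 for row in glyph if all(row))
    vert = sum(1 for c in range(width) if all(row[c] for row in glyph))
    return {"height": height, "width": width,
            "horz_lines": horz, "vert_lines": vert}

# glyph resembling "L": a full left column and a full bottom row
glyph = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
print(extract_features(glyph))
```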


Flow chart


Use-case diagram

Class diagram

SEQUENCE DIAGRAM

Screen shots

Load train set

Saving network

Loading network and the image

Future Enhancements
Future research directions include better coarse classification and robustness evaluation under other degradation types. Experiments were carried out on degraded Chinese character recognition; the system could also be extended to detect degraded characters in other languages and convert them into the required language.

Conclusion
A recognition method is proposed in this paper that integrates a local feature based classifier with a global feature based classifier. The local feature based classifier performs well in good-quality environments, while the global feature based classifier is very robust for bad-quality characters. Our method uses a coarse-to-fine recognition structure, with a candidate fusion step linking the coarse classification to the fine classification. The proposed method can efficiently handle two typical degradation types: image degradation caused by blurring and low image dimension, and character shape changes caused by font variation. Future research directions include better coarse classification and robustness evaluation under other degradation types.

THANK YOU
