Assignment No 2 - OCR CNN


Honours* in Data Science, Fourth Year of Engineering (Semester VII), 410502:

Machine Learning and Data Science Laboratory


Dr. Girija Gireesh Chiddarwar
Assignment No 2 - Recognize optical characters using ANN

Optical Character Recognition (OCR)


OCR is a widespread technology for recognizing text inside images, such as scanned
documents and photos.
It is used to convert virtually any kind of image containing written text (typed,
handwritten, or printed) into machine-readable text data.
An OCR system takes an optical image of a character as input and produces the
corresponding character as output.
It has a wide range of applications, including traffic surveillance, robotics,
digitization of printed articles, etc.

OCR can be implemented using a Convolutional Neural Network (CNN), a popular deep
neural network architecture.
Traditional CNN classifiers are capable of learning the important 2D features
present in images and classifying them; the classification itself is performed
using a softmax layer.

Convolutional Neural Networks (CNN)

Convolution
A convolution sweeps a window (the filter) across the image and, at each position,
calculates the dot product between the filter and the underlying pixel values. This
allows the convolution to emphasize the relevant features.
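
A minimal NumPy sketch of this sweep, using an illustrative 5x5 image patch and a
3x3 filter (all values are made up for the example):

import numpy as np

# Hypothetical 5x5 grayscale patch and a 3x3 vertical-edge-style filter.
image = np.array([
    [1, 2, 0, 1, 3],
    [4, 1, 1, 0, 2],
    [2, 0, 1, 3, 1],
    [1, 2, 2, 1, 0],
    [0, 1, 3, 2, 1],
], dtype=float)
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
], dtype=float)

# Sweep the 3x3 window over the image and take the dot product at each position.
out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        window = image[i:i + 3, j:j + 3]
        feature_map[i, j] = np.sum(window * kernel)  # element-wise product, then sum

print(feature_map)  # 3x3 feature map emphasizing vertical edges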

Max Pooling
A CNN uses max pooling to replace regions of its output with a maximum summary,
which reduces data size and processing time.
This keeps the features that produce the highest response and reduces the risk of
overfitting.
Max pooling takes two hyperparameters: stride and size.
The stride determines how far the pooling window moves at each step, while the size
determines the dimensions of the window over which the maximum is taken.
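
A minimal NumPy sketch of 2x2 max pooling with stride 2 on an illustrative 4x4
feature map:

import numpy as np

feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [3, 4, 1, 8],
], dtype=float)

size, stride = 2, 2  # the two pooling hyperparameters
out = np.zeros((feature_map.shape[0] // stride, feature_map.shape[1] // stride))
for i in range(0, feature_map.shape[0], stride):
    for j in range(0, feature_map.shape[1], stride):
        # Keep only the maximum value inside each size x size window.
        out[i // stride, j // stride] = feature_map[i:i + size, j:j + size].max()

print(out)  # [[6. 4.]
            #  [7. 9.]]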

Activation Function (ReLU and Sigmoid)


After each convolutional and max pooling operation, we can apply the Rectified
Linear Unit (ReLU).
The ReLU function mimics a neuron firing on a "big enough stimulus": it introduces
nonlinearity by passing values x > 0 through unchanged and returning 0 otherwise.
This has been effective in mitigating the vanishing-gradient problem.
Activations that are negative remain 0 after the ReLU activation function.
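
A minimal NumPy sketch of ReLU (and, for comparison, sigmoid) applied element-wise
to illustrative values:

import numpy as np

def relu(x):
    return np.maximum(0, x)  # keeps positive values, maps negatives to 0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes values into (0, 1)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))     # [0. 0. 0. 1. 3.]
print(sigmoid(x))  # values strictly between 0 and 1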

Finally, we feed the convolutional and max pooling feature-map outputs into a Fully
Connected Layer (FCL).
We flatten the feature maps into a column vector and feed it forward to the FCL.
We wrap the output with a softmax activation function, which assigns a decimal
probability to each possible label; the probabilities add up to 1.0.
Every node in the last layer is connected to every node in the previous layer and
represents one distinct output label.
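
A minimal NumPy sketch of flattening a pooled feature map and applying a softmax
over 26 labels; the shapes and weights here are illustrative, not the assignment's
actual model:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()       # probabilities sum to 1.0

feature_map = np.random.rand(2, 2, 4)  # hypothetical pooled output (H, W, channels)
flat = feature_map.reshape(-1)         # flatten to a column vector of length 16

num_classes = 26                       # A-Z labels in this assignment
W = np.random.randn(num_classes, flat.size) * 0.01  # FCL weights (illustrative)
b = np.zeros(num_classes)

logits = W @ flat + b
probs = softmax(logits)
print(probs.sum())                     # 1.0: one probability per label
print(probs.argmax())                  # index of the predicted label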

2D convolution layer, Conv2D (e.g. spatial convolution over images).


This layer creates a convolution kernel that is convolved with the layer input to
produce a tensor of outputs.
If use_bias is True, a bias vector is created and added to the outputs. Finally, if
activation is not None, it is applied to the outputs as well.
When using this layer as the first layer in a model, provide the keyword argument
input_shape (tuple of integers or None, does not include the sample axis), e.g.
input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format="channels_last".
You can use None when a dimension has variable size.
Arguments
filters: Integer, the dimensionality of the output space (i.e. the number of output
filters in the convolution).
kernel_size: An integer or tuple/list of 2 integers, specifying the height and
width of the 2D convolution window. Can be a single integer to specify the same
value for all spatial dimensions.
strides: An integer or tuple/list of 2 integers, specifying the strides of the
convolution along the height and width. Can be a single integer to specify the same
value for all spatial dimensions. Specifying any stride value != 1 is incompatible
with specifying any dilation_rate value != 1.
padding: one of "valid" or "same" (case-insensitive). "valid" means no padding.
"same" results in padding with zeros evenly to the left/right or up/down of the
input such that output has the same height/width dimension as the input.
data_format: A string, one of channels_last (default) or channels_first. The
ordering of the dimensions in the inputs. channels_last corresponds to inputs with
shape (batch_size, height, width, channels) while channels_first corresponds to
inputs with shape (batch_size, channels, height, width). It defaults to the
image_data_format value found in your Keras config file at ~/.keras/keras.json. If
you never set it, then it will be channels_last.
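
A minimal sketch of these arguments in use, assuming the TensorFlow/Keras API and
an illustrative 28x28 grayscale input:

from tensorflow.keras import layers, models

model = models.Sequential([
    # First layer: supply input_shape (height, width, channels); sample axis excluded.
    layers.Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1),
                  padding="same", activation="relu",
                  input_shape=(28, 28, 1)),   # 28x28 grayscale characters
    layers.MaxPooling2D(pool_size=(2, 2)),    # downsample the feature maps
])
model.summary()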

Algorithm
Install the required packages (matplotlib, pandas, numpy, keras, sklearn) and
download the A_Z Handwritten .csv dataset from Kaggle
Open the dataset
Convert the data into image format (drop a few columns and reshape)
Build a dictionary for mapping index values to characters
Reshape the training & test datasets so they can be fed to the model
Convert the labels to categorical values
Implement the CNN model using input, convolution, and output layers
Make model predictions
Predict on an external (user-written) image
Display some of the test images & their predicted labels
Display accuracy, precision, recall, and F1 score using matplotlib
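
A minimal end-to-end sketch of these steps, assuming the Kaggle file is named
"A_Z Handwritten Data.csv" with the first column as the label (0-25) and the
remaining 784 columns as 28x28 pixel values; the file name, split, and model layout
here are illustrative:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models, utils

# Open the dataset and split it into pixel values and labels.
data = pd.read_csv("A_Z Handwritten Data.csv").astype("float32")
y = data.iloc[:, 0].values                               # label column
X = data.iloc[:, 1:].values.reshape(-1, 28, 28, 1) / 255.0  # image format

# Dictionary for getting characters from index values.
index_to_char = {i: chr(ord("A") + i) for i in range(26)}

# Reshape/split the training & test sets and convert labels to categorical values.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
y_train_cat = utils.to_categorical(y_train, 26)
y_test_cat = utils.to_categorical(y_test, 26)

# CNN model: convolution + max pooling layers, flattened into a softmax output layer.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(26, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train_cat, epochs=1, batch_size=128, validation_split=0.1)

# Make predictions and report accuracy on the test split.
preds = model.predict(X_test).argmax(axis=1)
print("Test accuracy:", (preds == y_test).mean())
print("Sample prediction:", index_to_char[int(preds[0])])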
