Bangla Handwritten Digit Recognition Report
Abstract
Handwritten digit recognition has always been difficult because of the wide range of shapes,
sizes, and writing styles. All of the recognition algorithms built in this thesis use the
rewarded method. Bangla-digit is the largest dataset, a collection of Bengali handwritten
digits with over 3000 images. The dataset is, however, difficult to work with because of its
complexity. The aim of this project is to preprocess the images so that they can be used to
train deep learning models with high accuracy. Various preprocessing techniques are developed
for image processing, with a deep convolutional neural network (CNN) serving as the
classification algorithm. The performance of this process is systematically evaluated on the
bangla-digit image dataset, where an accuracy of 93% is obtained.
1. Introduction
Bangla handwritten digit recognition is a classical problem in the field of computer vision.
It has many practical applications, such as OCR, postal code recognition, license plate
recognition, and bank check recognition, and recognizing Bangla digits in documents is
becoming more important. There are 10 unique Bangla digits, so the recognition task is a
10-class classification problem. The critical difficulty is that every person has their own
writing style, so the same digit appears in many different forms. Our contribution addresses
a more challenging task: achieving robust performance and high accuracy on the large,
unbiased, unprocessed, and highly augmented “bangla-digit” dataset. The dataset combines ten
class datasets that were gathered from different sources at different times, and it contains
blurring, noise, rotation, translation, shear, zooming, height/width shift, brightness,
contrast, occlusions, and superimposition. We have not handled every kind of augmentation in
this dataset; we have mainly processed blurred and noisy images. The processed images are
then classified by a deep convolutional neural network (CNN).
2. Literature Review
Researchers have used the local binary pattern to identify Bangla digits, reporting higher
accuracy. The convolutional neural network, a deep learning approach, has been used to
improve supervised learning and performance. Handwritten digit recognition was proposed by
EMNIST for both balanced and imbalanced datasets. Following the recent success of deep
learning, especially the convolutional neural network (CNN), in computer vision, many
researchers have been encouraged to use CNNs to recognize handwritten characters and digits.
Handwritten Bangla digit recognition using classifier combination by the Dempster–Shafer (DS)
technique has also been proposed; the authors used the DS technique and an MLP classifier on
a training dataset of 6000 handwritten samples, with threefold cross-validation. This concept
was later used in a genetic algorithm-based handwritten digit recognition technique. Islam
and his colleagues suggested Bayanno-Net, in which they used a CNN to recognize Bangla
handwritten numerals. A comparative analysis of existing classification algorithms for
recognizing Bengali handwritten numerals has also been offered, and a model based on
handwritten digits was proposed by Shawon et al.
Several related works apply deep learning beyond digit recognition. Gulia et al. presented an
efficient automated design to generate UML diagrams from natural language specifications;
the main aim of their paper is the production of activity and sequence diagrams from natural
language specifications. The paper "Intelligent human computing: A Deep Learning-Based
Approach" applies a conceptual architecture for creating a human computing environment
involving deep learning algorithms. Saifur Rahman, Partha Chakraborty et al. classified
Bangla documents using a deep recurrent neural network with BiLSTM; the collected Bangla
text documents were processed, the model architecture was designed, and the training data
were fitted to the model. Md Mahbubur Rahman and colleagues classified Bangla documents using
transformer-based deep learning models, applying the BERT and ELECTRA models for Bangla text
classification. The main goal of another paper is to identify fake profiles and eliminate
fake accounts on online social networks.
3. Dataset Description
For recognition and generation, the “bangla-digit” dataset has been used, which contains 2746
handwritten Bangla digits. This is one of the biggest datasets of handwritten Bangla digits.
The dataset is divided into 10 subfolders. The 2746 images were collected from 30 people, and
image augmentation was then applied. The images are grayscale with dimensions 124 × 124.
Sample images of the “bangla-digit” dataset are shown in Fig-1.
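The folder layout described above can be indexed with a short helper. This is a minimal sketch, not the loading code used in the report: it assumes that each of the 10 subfolders holds the images of one digit class, and the function name, the accepted file extensions, and the example path are hypothetical.

```python
from collections import Counter
from pathlib import Path


def index_dataset(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return (samples, class_names) for a folder-per-class image dataset.

    samples is a list of (image_path, integer_label) pairs; labels follow the
    alphabetical order of the subfolder names (an assumption of this sketch).
    """
    root = Path(root)
    class_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    samples = []
    for label, class_dir in enumerate(class_dirs):
        for path in sorted(class_dir.iterdir()):
            # Skip anything that is not an image file.
            if path.suffix.lower() in extensions:
                samples.append((path, label))
    return samples, [d.name for d in class_dirs]


def images_per_class(samples):
    """Count how many images carry each integer label."""
    return Counter(label for _, label in samples)


# Hypothetical usage (the path is a placeholder):
# samples, class_names = index_dataset("bangla-digit")
# print(len(class_names), images_per_class(samples))
```

Counting images per class in this way is a quick check that all 10 classes are present and roughly balanced before training.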
4. Methodology
The experimentation process was conducted using two cloud-based platforms, namely
Google Colab and Kaggle. To run the experiments, we set up the required software packages
and libraries in both environments, including TensorFlow, Keras, and NumPy. We used
TensorFlow version 2.4.0 and Keras version 2.4.3, which were the latest stable versions at the
time of the experiments.
Recognizing handwritten digits involves three main steps: image preprocessing, feature
extraction, and classification. Because the CNN performs feature extraction and
classification together, our digit recognition has two main stages: preprocessing of the
images and the deep CNN model. The models included are EfficientNetB0 and MobileNet, which
were chosen for their proven performance in image classification tasks.
These steps are shown in Fig.
Input Image → Image Preprocessing → Feature Extraction → Image Classification → Detected Digit
Baseline model: A baseline model is a simple model that is used as a starting point for
developing more complex models. It is typically the simplest model that can be built for a
given problem, and it serves as a reference point against which other models can be
compared. The purpose of a baseline model is to establish a minimum level of performance
that must be exceeded by more complex models in order to justify their additional
complexity.
For example, in machine learning, a baseline model might be a simple linear regression
model that uses only one or two features to make predictions. This model can be used to
establish a baseline level of prediction accuracy, and more complex models can be developed
and compared to this baseline to determine if they are worth the additional computational cost
and complexity.
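For a classification task such as this one, the simplest possible reference point is a classifier that always predicts the most frequent class. The sketch below illustrates the idea of a baseline only; it is not the baseline model trained in this report, and the function names and the example labels are hypothetical.

```python
from collections import Counter


def majority_class_accuracy(labels):
    """Accuracy of a classifier that always predicts the most common label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


def beats_baseline(model_accuracy, baseline_accuracy, margin=0.0):
    """True if the model exceeds the baseline by more than the given margin."""
    return model_accuracy > baseline_accuracy + margin


# Illustrative labels: 10 digit classes with 5 samples each (placeholder data).
labels = [digit for digit in range(10) for _ in range(5)]
baseline = majority_class_accuracy(labels)
print(f"majority-class baseline accuracy: {baseline:.2f}")
print("model at 0.93 beats baseline:", beats_baseline(0.93, baseline))
```

With 10 balanced classes this trivial baseline sits at 10% accuracy, so any trained model should be well above that figure before its extra complexity is justified.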
Transfer learning: Transfer learning was used to train the pre-trained models on the
bangla-digit dataset. In this approach, the pre-trained models were used as feature
extractors, and the final layer(s) of the models were replaced with new layer(s) that were
trained on the bangla-digit dataset.
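The idea of a frozen feature extractor with a newly trained final layer can be illustrated without any deep learning framework. The following is a minimal NumPy sketch under loud assumptions: a fixed random projection with ReLU stands in for the pre-trained backbone (it is not MobileNet or EfficientNetB0), the images and labels are random placeholders, and all names are hypothetical. Only the new 10-class softmax layer is updated.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_head(features, labels, num_classes=10, lr=0.05, epochs=50):
    """Train only a new softmax layer on top of fixed (frozen) features."""
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    losses = []
    for _ in range(epochs):
        probs = softmax(features @ W + b)
        # Cross-entropy loss, recorded before each update.
        losses.append(-np.mean(np.log(probs[np.arange(n), labels] + 1e-12)))
        grad_logits = (probs - onehot) / n
        W -= lr * features.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return W, b, losses


rng = np.random.default_rng(0)

# Stand-in for a pre-trained backbone: fixed weights that are never updated.
frozen_W = rng.normal(size=(124 * 124, 64)) / 124.0
frozen_copy = frozen_W.copy()

images = rng.random((40, 124 * 124))       # placeholder data, not real digits
labels = rng.integers(0, 10, size=40)      # placeholder labels

features = np.maximum(images @ frozen_W, 0.0)  # forward pass through frozen part
W, b, losses = train_head(features, labels)
print(f"head loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In Keras the same separation is usually expressed by setting `trainable = False` on the base model and adding a new dense output layer on top of it.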
Dataset preprocessing: The images were first collected and cropped. They were then resized to
a standard size of 124x124 pixels and converted to JPEG format.
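The crop-then-resize step can be sketched as follows. The report does not say how the crop was chosen or which interpolation was used, so the centre crop and nearest-neighbour resizing here are assumptions made for illustration, and the function names are hypothetical. The sketch works on grayscale images held as 2-D NumPy arrays and leaves out the JPEG conversion.

```python
import numpy as np


def center_crop_square(img):
    """Crop the largest centred square from a 2-D grayscale image."""
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return img[top:top + side, left:left + side]


def resize_nearest(img, size=124):
    """Resize a 2-D image to size x size with nearest-neighbour sampling."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows[:, None], cols]


def preprocess(img, size=124):
    """Crop first, then resize, following the order described in the text."""
    return resize_nearest(center_crop_square(img), size)


# Placeholder image: 200 x 300 grayscale values (not a real digit).
example = np.arange(200 * 300, dtype=np.int64).reshape(200, 300)
print(preprocess(example).shape)
```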
Data Augmentation: Data augmentation was performed using the ImageDataGenerator class
from the keras.preprocessing.image module. The following transformations were applied to
the images:
Rotation: Randomly rotate the image by an angle within a range of 10 degrees.
Width and Height Shift: Randomly shift the image horizontally and vertically by a
fraction of the image size (ranges 0.1 and 0.2).
Zoom: Randomly zoom into the image, with the zoom range set to 0.0.
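To make the ranges above concrete, the sketch below draws one random set of transformation parameters per image. The dictionary keys reuse the keyword-argument names of Keras `ImageDataGenerator` (`rotation_range`, `width_shift_range`, `height_shift_range`, `zoom_range`), but the code itself is plain Python and only samples the parameters; it does not transform any pixels. Assigning 0.1 to the width shift and 0.2 to the height shift is an assumption, since the text lists the two values without saying which is which, and `sample_transform` is a hypothetical helper.

```python
import random

AUGMENTATION = {
    "rotation_range": 10,        # degrees; rotation drawn from [-10, 10]
    "width_shift_range": 0.1,    # assumption: 0.1 applies to the width
    "height_shift_range": 0.2,   # assumption: 0.2 applies to the height
    "zoom_range": 0.0,           # zoom factor drawn from [1 - 0.0, 1 + 0.0]
}


def sample_transform(rng, image_size=124, cfg=AUGMENTATION):
    """Draw one random set of augmentation parameters for a square image."""
    angle = rng.uniform(-cfg["rotation_range"], cfg["rotation_range"])
    # Shift ranges are fractions of the image size, converted here to pixels.
    dx = rng.uniform(-cfg["width_shift_range"], cfg["width_shift_range"]) * image_size
    dy = rng.uniform(-cfg["height_shift_range"], cfg["height_shift_range"]) * image_size
    zoom = rng.uniform(1 - cfg["zoom_range"], 1 + cfg["zoom_range"])
    return {"angle_deg": angle, "shift_x_px": dx, "shift_y_px": dy, "zoom": zoom}


rng = random.Random(0)
print(sample_transform(rng))
```

With a zoom range of 0.0 the sampled zoom factor is always 1, so under these settings only rotation and shifting actually change the image.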
Hyperparameters: Several hyperparameters have been used for model training and
optimization, such as:
Batch Size: This hyperparameter determines the number of samples that are processed
in a single batch. A batch size of 32 has been used.
Image Size: This hyperparameter defines the size of the input image after resizing. In the
code, the size has been set to 124x124 pixels.
Epochs: This hyperparameter defines the number of times the entire dataset is passed
through the model during training. The number of epochs has been set to 16, and it
varies during fine-tuning.
Learning Rate: This hyperparameter determines the step size at which the optimizer
adjusts the model parameters during training. The Adam optimizer has been used with a
learning rate of 0.001, which is also varied across the different models and their
variants.
Optimizer: This hyperparameter defines the optimization algorithm used to adjust the
weights of the model during training. Adam optimizer has been used.
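The hyperparameters listed above can be gathered in one configuration object, which also makes it easy to work out how many weight updates a training run performs. This is a sketch for illustration: the class and function names are hypothetical, and the figure of 2471 training images used in the example is an assumption (2746 images with roughly 10% held out), not a number stated in the report.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 32          # samples per batch
    image_size: int = 124         # images are resized to 124x124 pixels
    epochs: int = 16              # full passes over the training set
    learning_rate: float = 0.001  # step size for the optimizer
    optimizer: str = "adam"


def steps_per_epoch(num_train_images, cfg):
    """Number of batches per epoch; the last batch may be smaller."""
    return math.ceil(num_train_images / cfg.batch_size)


def total_updates(num_train_images, cfg):
    """Total number of optimizer steps over the whole run."""
    return steps_per_epoch(num_train_images, cfg) * cfg.epochs


cfg = TrainConfig()
n_train = 2471  # assumed training-set size, see the note above
print(steps_per_epoch(n_train, cfg), total_updates(n_train, cfg))
```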
5. Result and Analysis
The train and test split ratio for our dataset is 10%. Accuracy is calculated for every
training dataset. Three outcomes are obtained after 16 epochs on our own dataset, and the
results are given in Table-2. The processing time of training was also measured: eight
seconds are required per epoch.
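The split and the training time can be worked through with a few lines of arithmetic. The sketch assumes that the 10% ratio refers to the share of images held out for testing and that the split is a simple shuffled hold-out; neither detail is stated in the report, and the function names are hypothetical.

```python
import random


def holdout_split(items, test_fraction=0.10, seed=0):
    """Shuffle the items and hold out a fraction of them for testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


def training_seconds(epochs=16, seconds_per_epoch=8):
    """Total training time implied by a fixed time per epoch."""
    return epochs * seconds_per_epoch


# Image indices stand in for the 2746 images of the dataset.
train_ids, test_ids = holdout_split(range(2746))
print(len(train_ids), len(test_ids), training_seconds())
```

Under these assumptions, 16 epochs at eight seconds each amount to about 128 seconds of training per model.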
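[Figure: bar chart comparing EffNetB0 and MobileNet under the baseline, fine-tuning, and transfer learning settings. The axis runs from 0 to 80; the values 31 and 9 appear in the chart.]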
6. Stakeholders
The stakeholders for Bangla handwriting recognition for a report could include:
Limitations:
i. Sample size: The sample size used in this research was limited, which may have
affected the statistical power and generalizability of the findings.
ii. Participant Expression Accuracy: The study was limited by the participants' inability
to accurately convey their facial expressions, which may have impacted the reliability
and validity of the data collected.
iii. Data quality: The quality of the data used in this research was dependent on the
accuracy and completeness of the responses provided by the participants, which may
have been affected by recall bias or response.
Here are some potential future recommendations:
K5 (Design):
K6 (Technology):
K8 (Research):
2. P2: Range of Conflicting Requirements
3. P3: Depth of Analysis Required
4. P4: Familiarity of Issues
5. P5: Extent of Applicable Codes
6. P6: Extent of Stakeholder Involvement and Conflicting Requirements
7. P7: Interdependence
Code:
https://www.kaggle.com/code/khondokerabunaim/mobilenetv2transfer
https://www.kaggle.com/code/khondokerabunaim/mobilenetv2finetune
https://www.kaggle.com/code/khondokerabunaim/mobilenetv2baseline