
Bangla Handwritten Digit Recognition Using Deep Learning.

Khondoker Abu Naim (200101103) and Abu Rayhan Mouno (180201118)


Course Code: CSE 4132 Course Title: Artificial Neural Networks and Fuzzy Systems Sessional
Semester: Winter 2023
*Department of Computer Science and Engineering, Bangladesh Army University of Science and Technology (BAUST)

Abstract
Handwritten digit recognition has always been difficult because of the wide range of shapes,
sizes, and writing styles. For all of the recognition algorithms built in this thesis, the rewarded
method is used. Bangla-digit is a large collection of Bengali handwritten digits; it contains
2746 images, as detailed in Section 3. This dataset, however, is incredibly difficult to work with
due to its complexity. The aim of this project is to preprocess the images so that they can be
used to train deep learning models with high accuracy. Various preprocessing techniques are
developed for image processing, with a deep convolutional neural network (CNN) functioning as the
classification algorithm. The performance of this process is systematically evaluated on the
bangla-digit image dataset. Finally, 93% accuracy is obtained for the bangla-digit dataset.

1. Introduction
Bangla handwritten digit recognition is a classical problem in the field of computer vision.
It has various practical applications, such as OCR, postal code recognition, license plate
recognition, and bank check recognition. Recognizing Bangla digits in documents is becoming
more important. There are 10 unique Bangla digits, so the recognition task is to classify an
image into one of 10 classes. The critical difficulty of handwritten digit recognition is that
every person has their own writing style. Our contribution addresses a more challenging task:
achieving robust performance and high accuracy on the large, unbiased, unprocessed, and
highly augmented “bangla-digit” dataset. The dataset is a combination of ten class datasets
that were gathered from different sources and at different times, containing blurring, noise,
rotation, translation, shear, zooming, height/width shift, brightness, contrast, occlusions, and
superimposition. We have not processed every kind of augmentation in this dataset; we have
mainly processed blurred and noisy images. The processed images are then classified by a deep
convolutional neural network (CNN).

2. Literature Review
Bangla digits have been identified using the local binary pattern, with researchers reporting
improved accuracy. The convolutional neural network, a deep learning approach, has been used to
improve supervised learning performance. Handwritten digit recognition has also been studied on
EMNIST for both balanced and imbalanced datasets. As a result of the recent success of deep
learning, especially the convolutional neural network (CNN), in computer vision, many researchers
have been encouraged to use CNNs to recognize handwritten characters and digits.
One work proposed handwritten Bangla digit recognition using classifier combination by the
Dempster–Shafer (DS) technique. The authors used the DS technique and an MLP classifier on a
training dataset of 6000 handwritten samples, with threefold cross-validation.
Another work used this concept in a genetic algorithm-based handwritten digit recognition
technique. Islam and his colleagues suggested Bayanno-Net, in which they used a CNN to recognize
Bangla handwritten numerals. Another study offered a comparative analysis of existing
classification algorithms for recognizing Bengali handwritten numerals. A model based on
handwritten digits was proposed by Shawon et al.
Other work applies deep learning and automation outside digit recognition. Gulia et al. presented
an efficient automated design to generate UML diagrams from natural language specifications; the
main aim of their paper is the production of activity diagrams and sequence diagrams from natural
language specifications. "Intelligent human computing: A Deep Learning-Based Approach" applies a
conceptual architecture for creating a human computing environment involving deep learning
algorithms. Saifur Rahman, Partha Chakraborty et al. classified Bangla documents using a deep
recurrent neural network with BiLSTM; the collected data were processed for Bangla text
documents, the model architecture was designed, and the training data were fitted to the model.
Md Mahbubur Rahman et al. classified Bangla documents using transformer-based deep learning
models, applying the BERT and ELECTRA models to Bangla text classification. The main goal of
another reviewed paper is to identify fake profiles and eliminate fake accounts on online social
networks.

3. Dataset Description
For recognition and generation, the “bangla-digit” dataset has been used, which contains 2746
handwritten Bangla digits. This is one of the biggest datasets of handwritten Bangla digits.
The dataset is divided into 10 subfolders. The 2746 images were collected from 30 persons, and
image augmentation was then applied. The images are gray-scale with dimensions of 124 × 124
pixels. Sample images of the “bangla-digit” dataset are shown in Fig-1.

Fig-1: Sample images from the bangla-digit dataset.

Class    Number of Samples
0        274
1        276
2        276
3        276
4        276
5        275
6        275
7        276
8        275
9        267
Total    2746

Table-1: Number of samples per class in the dataset.
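
The per-class counts can be checked with a few lines of Python. The sketch below is illustrative only: the variable names are our own, and the counts are copied from Table-1. It sums the classes and reports how far the largest class is from the smallest.

```python
# Per-class sample counts copied from Table-1 (digit -> number of images).
class_counts = {0: 274, 1: 276, 2: 276, 3: 276, 4: 276,
                5: 275, 6: 275, 7: 276, 8: 275, 9: 267}

total = sum(class_counts.values())
largest = max(class_counts.values())
smallest = min(class_counts.values())

# Ratio of the largest class to the smallest; 1.0 would be perfectly balanced.
imbalance_ratio = largest / smallest

print(f"total images: {total}")
print(f"largest class: {largest}, smallest class: {smallest}")
print(f"imbalance ratio: {imbalance_ratio:.3f}")
```

The classes are close to balanced, so plain accuracy is a reasonable first metric for this dataset.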

4. Methodology
The experimentation process was conducted using two cloud-based platforms, namely
Google Colab and Kaggle. To run the experiments, we set up the required software packages
and libraries in both environments, including TensorFlow, Keras, and NumPy. We used
TensorFlow version 2.4.0 and Keras version 2.4.3, which were the latest stable versions at the
time of the experiments.

Three main steps are applied to recognize handwritten digits: image preprocessing, feature
extraction, and classification. Because the feature extraction and classification steps are
combined in the CNN, our digit recognition has two main stages: preprocessing of the images, and
a deep CNN model. The models used are EfficientNetB0 and MobileNet, which were chosen for their
proven performance in image classification tasks.
These steps are shown in Fig-2.

Input Image -> Image Preprocessing -> Feature Extraction -> Image Classification -> Detected Digit

Fig-2: Block diagram of the proposed Bangla digit recognition model

Baseline model: A baseline model is a simple model that is used as a starting point for
developing more complex models. It is typically the simplest model that can be built for a
given problem, and it serves as a reference point against which other models can be
compared. The purpose of a baseline model is to establish a minimum level of performance
that must be exceeded by more complex models in order to justify their additional
complexity.
For example, in machine learning, a baseline model might be a simple linear regression
model that uses only one or two features to make predictions. This model can be used to
establish a baseline level of prediction accuracy, and more complex models can be developed
and compared to this baseline to determine if they are worth the additional computational cost
and complexity.
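
As a concrete reference point for a ten-class problem, the sketch below computes the accuracy of two trivial predictors from the Table-1 counts: always guessing the most frequent class, and guessing uniformly at random. These predictors are an illustrative assumption and are not the baseline model trained in this project; they only show the level that any trained model should clearly exceed.

```python
# Per-class sample counts copied from Table-1 (digit -> number of images).
class_counts = {0: 274, 1: 276, 2: 276, 3: 276, 4: 276,
                5: 275, 6: 275, 7: 276, 8: 275, 9: 267}

total = sum(class_counts.values())

# Always predicting the most frequent class is correct for that class only.
# Ties are resolved in favour of the first class encountered.
majority_class = max(class_counts, key=class_counts.get)
majority_accuracy = class_counts[majority_class] / total

# Guessing uniformly at random is correct once in every len(class_counts) tries.
uniform_guess_accuracy = 1 / len(class_counts)

print(f"majority-class accuracy: {majority_accuracy:.4f}")
print(f"uniform-guess accuracy:  {uniform_guess_accuracy:.4f}")
```

Both values are close to 10%, so an accuracy near 10% on this dataset indicates that a model has learned essentially nothing.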
Transfer learning: Transfer learning was used to train the pre-trained models on the
bangla-digit dataset. In this approach, the pre-trained models were used as feature
extractors, and the final layer(s) of each model were replaced with new layer(s) that were
trained on the bangla-digit dataset.
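
A minimal Keras sketch of this idea is given below. It is not the code used in the experiments: the function name, the pooling layer, the loss, and the three-channel input shape are assumptions made for illustration (the gray-scale images are assumed to be replicated to three channels and already scaled as MobileNet expects). Only `tf.keras.applications.MobileNet`, the ten output classes, and the Adam learning rate of 0.001 come from this report.

```python
NUM_CLASSES = 10              # ten Bangla digits
IMAGE_SHAPE = (124, 124, 3)   # assumption: gray-scale images replicated to 3 channels


def build_transfer_model(weights="imagenet", learning_rate=0.001):
    """Return a frozen MobileNet feature extractor with a new 10-way softmax head."""
    # Imported here so that this file can be loaded without TensorFlow installed.
    import tensorflow as tf

    base = tf.keras.applications.MobileNet(
        input_shape=IMAGE_SHAPE, include_top=False, weights=weights
    )
    base.trainable = False  # freeze the pre-trained layers (feature extractor)

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # new head
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",  # assumes integer class labels
        metrics=["accuracy"],
    )
    return model
```

Calling `build_transfer_model()` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random initialization. Fine-tuning would typically unfreeze some or all of the base layers and recompile with a lower learning rate.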
Dataset preprocessing: The collected images were first cropped, then resized to a standard
size of 124x124 pixels and converted to JPEG format.

Data Augmentation: Data augmentation was performed using the ImageDataGenerator class
from the keras.preprocessing.image module. The following transformations were applied to
the images:
• Rotation: Randomly rotate the image by an angle of up to 10 degrees.
• Width and Height Shift: Randomly shift the image horizontally and vertically by a
fraction of the image size, with ranges of 0.1 and 0.2.
• Zoom: Randomly zoom into the image, with the zoom range set to 0.0.
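
The settings listed above can be written as keyword arguments for `ImageDataGenerator`. This is a sketch under stated assumptions rather than the project's actual code: the text does not say which of 0.1 and 0.2 is the width shift and which is the height shift, so the assignment below is a guess, and the names `AUGMENTATION_KWARGS` and `make_augmenter` are our own.

```python
# Keyword arguments for Keras' ImageDataGenerator, using the values listed above.
# Assumption: 0.1 is the width shift and 0.2 the height shift; the text does not
# say which value belongs to which axis.
AUGMENTATION_KWARGS = {
    "rotation_range": 10,        # degrees
    "width_shift_range": 0.1,    # fraction of image width
    "height_shift_range": 0.2,   # fraction of image height
    "zoom_range": 0.0,           # 0.0 disables random zoom
}


def make_augmenter():
    """Build the generator; TensorFlow is imported only when this is called."""
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    return ImageDataGenerator(**AUGMENTATION_KWARGS)
```

Keeping the values in one dictionary makes it easy to log the augmentation settings next to each experiment.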

Hyperparameters: Several hyperparameters have been used for model training and
optimization:
• Batch Size: This hyperparameter determines the number of samples that are processed
in a single batch. A batch size of 32 has been used.
• Image Size: This hyperparameter defines the size of the input image after resizing. In the
code, the size has been set to 124x124 pixels.
• Epochs: This hyperparameter defines the number of times the entire dataset is passed
through the model during training. The number of epochs has been set to 16, and it
varies during fine-tuning.
• Learning Rate: This hyperparameter determines the step size at which the optimizer
adjusts the model parameters during training. A learning rate of 0.001 has been used;
it also varies across the different models and training variations.
• Optimizer: This hyperparameter defines the optimization algorithm used to adjust the
weights of the model during training. The Adam optimizer has been used.
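
One way to keep these settings together is a small configuration object. The sketch below is an assumption about how this could be organized, not the project's code: the class name `TrainConfig` and the 2000-image example are hypothetical, and only the default values are taken from the list above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 32
    image_size: tuple = (124, 124)
    epochs: int = 16
    learning_rate: float = 0.001
    optimizer: str = "adam"

    def steps_per_epoch(self, num_train_images: int) -> int:
        """Number of batches needed to see every training image once."""
        return math.ceil(num_train_images / self.batch_size)


config = TrainConfig()

# Hypothetical example: 2000 training images (not a figure from this report).
steps = config.steps_per_epoch(2000)
total_updates = steps * config.epochs

print(f"steps per epoch: {steps}, weight updates over training: {total_updates}")
```

Because the dataclass is frozen, a variation such as a different learning rate for fine-tuning has to be created as a new `TrainConfig`, which keeps each experiment's settings explicit.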
5. Result and Analysis
The dataset is divided into training and test sets with a split ratio of 10%. Accuracy is
calculated for each training dataset. Three outcomes (baseline, fine-tuning, and transfer
learning) are obtained for each model after 16 epochs on our own dataset. Results are given in
Table-2. The processing time of training was also measured: eight seconds are required per epoch.
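
The arithmetic behind these figures can be sketched as follows. Two assumptions are made here: that the 10% ratio is the share of images held out for testing, and that the test set size is rounded up, as scikit-learn's `train_test_split` does for a fractional `test_size`. The report states neither point explicitly.

```python
import math

TOTAL_IMAGES = 2746      # from the dataset description
TEST_FRACTION = 0.10     # assumption: "10%" is the share held out for testing
EPOCHS = 16
SECONDS_PER_EPOCH = 8

# Round the test set up, as scikit-learn's train_test_split does for a float test_size.
num_test = math.ceil(TOTAL_IMAGES * TEST_FRACTION)
num_train = TOTAL_IMAGES - num_test

# Total training time for one model at the reported speed.
total_training_seconds = EPOCHS * SECONDS_PER_EPOCH

print(f"train: {num_train}, test: {num_test}")
print(f"training time: {total_training_seconds} s")
```

Under these assumptions, one 16-epoch run takes roughly two minutes.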

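Table-2: Result comparison in different modes.

[Figure: bar chart titled "Result Analysis of Model Comparison". Vertical axis: accuracy (%),
from 0 to 120. Horizontal axis: EffNetB0 and MobileNet. Series: Baseline, Fine-tuning, and
Transfer learning. Bar labels shown: 99.4, 95, 94, 92, 31, 9.]

Fig-3: Model Comparison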

EfficientNetB0: The EfficientNetB0 model achieved the higher transfer-learning accuracy of the
two models, with a score of 95%. The baseline model achieved 32% accuracy, which is
relatively low. Fine-tuning improved the accuracy to 99.4%, the best result obtained.
MobileNet: The MobileNet model did not perform as well as the EfficientNetB0 model. Transfer
learning achieved an accuracy of 92%. The baseline model achieved only 9% accuracy, which is
about chance level for ten classes. Fine-tuning improved the accuracy to 94%.
Fig-4: Graph of model accuracy and loss
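
The accuracies quoted in this section can be collected into one structure for comparison. The sketch below is illustrative; the dictionary layout and variable names are our own, and the numbers are those stated in the text.

```python
# Accuracies (%) as stated in the text of this section.
results = {
    "EfficientNetB0": {"baseline": 32.0, "transfer learning": 95.0, "fine-tuning": 99.4},
    "MobileNet": {"baseline": 9.0, "transfer learning": 92.0, "fine-tuning": 94.0},
}

# Best (model, mode, accuracy) triple overall.
best = max(
    ((model, mode, acc) for model, modes in results.items() for mode, acc in modes.items()),
    key=lambda item: item[2],
)

# Gain of fine-tuning over transfer learning, in percentage points.
gains = {
    model: round(modes["fine-tuning"] - modes["transfer learning"], 1)
    for model, modes in results.items()
}

print(f"best result: {best}")
print(f"fine-tuning gain over transfer learning: {gains}")
```

For both models, fine-tuning adds a few percentage points on top of transfer learning, while the gap between either mode and the baseline is far larger.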

6. Stakeholders
The stakeholders for Bangla handwriting recognition could include:

• Researchers and developers working in the field of Bangla handwriting recognition
technology.
• Users of the technology, such as businesses or organizations that utilize Bangla
handwriting recognition for data entry or analysis.
• Government bodies and policy-makers who are interested in promoting the
development and use of Bangla handwriting recognition technology.
• Educational institutions and instructors who teach Bangla language and writing, as
handwriting recognition technology could be used in language learning.
• Bangla language and culture advocates, who may be interested in preserving and
promoting the use of the Bangla language in the digital age.

7. Issues Encountered
• Large image size: The images in the dataset took up too much storage space, which made
them difficult to work with. As a result, preprocessing techniques had to be used to
reduce the image size to a more manageable level.
• Data augmentation: Problems were encountered while augmenting the data.
• Variability in handwriting: Bengali script has many complex characters, and each
character can be written in several different ways. This variability makes it difficult to
develop a reliable recognition system that can accurately recognize all variations of
each character.

8. Conclusion, Limitations and Future Recommendations


Despite the large variation in the test set, our approach, based on an ensemble of residual
networks with the Xception architecture, performed well, even though the final model has a
comparatively low number of parameters and the models were trained with limited
resources. The impact of using residual networks and batch normalization was prominent in
improving the overall performance of the classifier model. The number of parameters can be
reduced further with a more optimized selection of parameters, while introducing more
augmentation can further improve the overall performance of the model.

Limitations:
i. Sample size: The sample size used in this research was limited, which may have
affected the statistical power and generalizability of the findings.
ii. Participant Expression Accuracy: The study was limited by the participants' inability
to accurately convey their facial expressions, which may have impacted the reliability
and validity of the data collected.
iii. Data quality: The quality of the data used in this research was dependent on the
accuracy and completeness of the responses provided by the participants, which may
have been affected by recall bias or response bias.
Here are some potential future recommendations:

i. Explore additional pre-trained models: While the current implementation uses
pre-trained models, there are many other options available. Exploring
additional models could help improve the accuracy of the digit recognition
system.
ii. Improve dataset quality: Collecting a high-quality dataset is crucial for training
accurate machine learning models. To improve the performance of the current
system, more high-quality data could be collected, particularly for underrepresented
classes.
References
[1] Chakraborty, Partha, Syeda Surma Jahanapi, and Tanupriya Choudhury. "Bangla
Handwritten Digit Recognition." Cyber Intelligence and Information Retrieval: Proceedings
of CIIR 2021. Springer Singapore, 2022.
[2] Hossain, M. Zahid, M. Ashraful Amin, and Hong Yan. "Rapid feature extraction for
Bangla handwritten digit recognition." 2011 International Conference on Machine Learning
and Cybernetics. Vol. 4. IEEE, 2011.
[3] Fahim Sikder, Md. "Bangla handwritten digit recognition and generation." Proceedings of
International Joint Conference on Computational Intelligence: IJCCI 2018. Springer
Singapore, 2020.
[4] Hoq, Md Nazmul, et al. "A comparative overview of classification algorithm for Bangla
handwritten digit recognition." Proceedings of International Joint Conference on
Computational Intelligence: IJCCI 2018. Springer Singapore, 2020.
[5] Rabby, AKM Shahariar Azad, et al. "Bangla handwritten digit recognition using
convolutional neural network." Emerging Technologies in Data Mining and Information
Security: Proceedings of IEMIS 2018, Volume 1. Springer Singapore, 2019.
Appendix
Attainment of Complex Engineering Problem (CP)

S.L.  CP No.  Attainment  Remarks

1. P1: Depth of Knowledge Required
   K3 (Engineering Fundamentals):
   K4 (Engineering Specialization):
   K5 (Design):
   K6 (Technology):
   K8 (Research):
2. P2: Range of Conflicting Requirements
3. P3: Depth of Analysis Required
4. P4: Familiarity of Issues
5. P5: Extent of Applicable Codes
6. P6: Extent of Stakeholder Involvement and Conflicting Requirements
7. P7: Interdependence

Mapping of Complex Engineering Activities (CA)

S.L.  CA No.  Attainment  Remarks

1. A1: Range of resources
2. A2: Level of interaction
3. A3: Innovation
4. A4: Consequences for Society and the Environment
5. A5: Familiarity

Code:
• https://www.kaggle.com/code/khondokerabunaim/mobilenetv2transfer
• https://www.kaggle.com/code/khondokerabunaim/mobilenetv2finetune
• https://www.kaggle.com/code/khondokerabunaim/mobilenetv2baseline
