Sethi 2020
Authorized licensed use limited to: University of Melbourne. Downloaded on June 23,2020 at 19:11:12 UTC from IEEE Xplore. Restrictions apply.
the terms are self-explanatory: line segmentation involves division with respect to lines, character segmentation involves division with respect to characters, and word segmentation involves division with respect to words. In this paper, however, we have implemented a different approach to segmentation: the dataset is split during the segmentation process, and with this splitting the classifier predicts the labelled digit more accurately.[13]
• Feature Extraction: Feature extraction is the process in which the essential information about the subject of interest is extracted from the image.
• Classification: In classification, an unknown sample is assigned to a pre-defined class. Once the features are extracted, the digits are classified and recognised accordingly.
• Post-Processing: Post-processing techniques are used to achieve higher accuracy.

II. LITERATURE SURVEY
For hand-written digit classification or recognition in cursive hand-written documents, one study demonstrated that off-line analysis of a hand-written document proceeds through skew recognition, writing pressure detection and segmentation. The proposed segmentation method was based on modified vertical and horizontal projections; moreover, these projections are able to segment text lines and words accurately even in the presence of multi-skewed and overlapping text lines.[14] The method was tested on more than 550 text images from the IAM database and on sample handwriting images from different writers on different backgrounds. It correctly segmented around 92.56% of words and 95.65% of lines from the IAM dataset. Moreover, around 96% of lines were perfectly normalized with a minute error correction. The skew normalization method determines the skew angle accurately and compares it efficiently with other hands-on techniques [15].
The segmentation of text lines is based on the information energy calculated individually for every pixel, and an Artificial Neural Network recognised the characters and digits. The recognition accuracy was around 92% [16].
Another technique in this field uses a convex-hull-based feature set, in which 125 features are computed from diverse attributes of the bays of a pattern's convex hull, to recognise isolated Bangla basic characters and digits. The recognition accuracy was 76.86% for hand-written Bangla characters and 99.45% for Bangla numerals [17].
A novel technique has been applied to the recognition of Bangla cursive words, for which an MLP classifier is used.[18] The categorization of words at the initial stage of recognition analysis is carried out in three parts, i.e. upper, middle and lower. The diversified features are designed after considering octant-centroid features, modified shadow features and long-run features. An accuracy rate of 80.58% was recorded when the experiments were carried out on 300 samples [19].
The study of different segmentation techniques has also been part of hand-written character recognition. Three levels of segmentation are discussed, namely character, word and text-line, along with the requirements of the segmentation process and the factors that affect it.
Another study in this field demonstrates an entirely fresh approach that uses a series of algorithms for recognition and segmentation in the OCR of handwritten script and digits. A Hidden Markov Model (HMM) achieved a precision rate of 92.3% for recognition with a lexicon size of 50.[20] Word-level segmentation is derived from the combination of the lexicon and the HMM [21].
Various segmentation levels have been discussed in this field of study. The Hough transformation is used for the segmentation of text lines. A skeletonization process is applied so that vertically connected components are separated. The experiments were carried out on the ICDAR2007 dataset.
Segmentation in this field has also been based on a connectivity strength function. The "connectivity strength parameter" is the parameter that decides the constituents of a text line. This approach is language-adaptive in nature, with a precision rate of around 97.3%.[22]

III. SUPERVISED MACHINE LEARNING SYSTEM
Supervised machine learning is the technique in which pre-defined inputs or pre-defined labels are given to the machine learning model. These act as the supervisor and train the model, so that it learns from experience with the given set of inputs to classify or regress and obtain the desired set of outputs.

Figure 2. Multi Variate Scattered Data

This technique operates on a set of input-output pairs and is associated with regression and classification techniques to provide the desired outputs. Because the model is first trained with a labelled dataset, its outputs are ones that humans could already predict; supervised machine learning is used to bring the same ability to machines. The inputs given to the model are generally known as vectors and the desired outputs are known as supervisory signals.[23]
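As a minimal sketch of the input-output pairing described above, the snippet below keeps each input vector together with its supervisory signal and splits the pairs into a training part and a testing part. The function name `split_dataset`, the `test_fraction` and `seed` parameters, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
import random


def split_dataset(vectors, signals, test_fraction=0.25, seed=0):
    """Pair each input vector with its supervisory signal and split the pairs.

    Hypothetical helper for illustration only; the paper does not specify
    how its dataset split is performed.
    """
    if len(vectors) != len(signals):
        raise ValueError("every input vector needs exactly one label")
    indices = list(range(len(vectors)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [(vectors[i], signals[i]) for i in train_idx]
    test = [(vectors[i], signals[i]) for i in test_idx]
    return train, test


# Toy data (assumed): two-dimensional input vectors with binary labels.
vectors = [[0.0, 0.1], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8],
           [0.1, 0.0], [0.8, 0.9], [0.0, 0.2], [1.0, 1.0]]
signals = [0, 0, 1, 1, 0, 1, 0, 1]

train, test = split_dataset(vectors, signals)
```

The model would be fitted on `train` only, and `test` would be held back to check how well the learned input-output relation generalises.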
Since this technique develops a relation between input and output, the mathematical formula associated with it can be expressed as follows:
………..(1)
The supervised machine learning algorithms can be classified as:
• Regression: A statistical supervised machine learning algorithm that determines the statistical relationship between a single dependent variable and multiple independent variables; it is widely used in prediction and financial analysis. The main aim of this analysis is to predict the best value of the dependent variable on the basis of the independent variables.[24] The two types of regression analysis are linear and multiple regression.
• Classification: A statistical supervised machine learning technique that takes a set of inputs to learn from, observes and analyses the data, and applies that experience to classify new inputs once the model has been trained well. The main aim of this technique is to classify the data according to the set of features present in the input data, after the model has been successfully trained with the labelled dataset.

In this research paper, we have demonstrated and executed the supervised machine learning algorithm of classification to implement our project, "Hand-written digit classification".

IV. DIVERSE CLASSIFICATION ALGORITHMS IN MACHINE LEARNING
In statistics and machine learning, classification algorithms analyse the training dataset in order to make predictions for the target or testing dataset.

Figure 3. Classification Algorithm

The classification algorithms are explained as follows:
• Naïve Bayes Classifier: A classification algorithm that is mathematically based on Bayes' Theorem and assumes the independence of predictors. Under this assumption, the presence of a particular feature of a class is unrelated to the presence of any other feature. These properties contribute to the probability independently. It is useful for large datasets as it is not difficult to build.[25]
• Nearest Neighbour: The K-Nearest Neighbour supervised machine learning algorithm takes a labelled dataset as input and defers the computation needed for the output. The feature space contains the K nearest training examples. It is also a form of instance-based learning.
• Logistic Regression: A statistical technique that analyses a dataset in which one or more independent variables determine the outcome. The outcome of this classification technique is measured with a dichotomous variable, that is, a variable with only two possible outcomes. The main aim of this algorithm is to find the best-fitting model that describes the relationship between the dichotomous variable and the independent variables.[26]
• Decision Trees: A classification algorithm that builds regression and classification models in the form of a tree structure. The tree is formed by sequentially or incrementally dividing the dataset into subsets, and those subsets into still smaller subsets. The topmost decision node of the tree is known as the root node, which is the best predictor node of the decision tree. It can handle numerical as well as categorical data.
• Random Forest: A combinational (ensemble) learning method for regression and classification that constructs a multitude of decision trees at training time. Random forests correct for the tendency of decision trees to overfit their training dataset.
• Neural Network: The neural network classification algorithm in supervised machine learning consists of units, better known as neurons, which convert the input vector into a meaningful output and are structurally organized in layers. In this algorithm every unit takes an input, applies a non-linear function to it and passes the resulting output to the next layer for further computation. Weights are applied to the signals to make the neural network adaptable; these weights are tuned in the training phase. The main drawback of this technique is that it does not support a feedback system for every neuron involved in the network.

In this paper, we have applied the KNN classification algorithm to implement our project work, "Hand-written digit classification".
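For reference, the independence assumption behind the Naïve Bayes classifier described above can be written out as a worked formula. The notation is an assumption introduced here and does not appear in the paper: C_k denotes a class, x_1, ..., x_n the features of a sample, and ŷ the predicted class. Under that assumption, Bayes' Theorem factorises feature by feature:

```latex
P(C_k \mid x_1, \dots, x_n)
  = \frac{P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)}{P(x_1, \dots, x_n)},
\qquad
\hat{y} = \arg\max_{k} \; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)
```

Because the denominator is the same for every class, it can be dropped when choosing the most probable class, which is why each feature contributes to the decision independently.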
V. KNN-BASED ARCHITECTURE
The K-Nearest Neighbors algorithm is a supervised machine learning algorithm that is non-parametric in nature and is applied to classification and regression. In both cases, the input consists of the K closest training examples in the feature space. The output depends on whether the algorithm is used for regression or classification. In classification, the output is a class membership.[27] The object is classified purely by the vote of its neighbors, whereas in regression the output is the property value for the object.
KNN, a supervised non-parametric algorithm for classification or regression, is also known as lazy learning or late learning, because all computation is deferred to the final stage of classification.[28]
• Selection of the K parameter based on the data.
• A distance metric is required to define the proximity between any two data points.

Steps for KNN Algorithm
• Compute the distance metric between the test data point and all the labelled data points.
• Arrange the labelled data points in ascending order of the distance metric.
• Select the top K labelled data points.
• Match the class labels of the K selected data points and assign the resulting class to the test data point.

VI. IMPLEMENTATION
Here we propose the KNN algorithm to solve the problem of hand-written digit classification; the dataset used to solve the problem is the MNIST dataset.
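The KNN steps listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: it uses Euclidean distance as the distance metric, a simple majority vote over the K nearest labels, and a tiny two-dimensional toy dataset in place of MNIST. The names `euclidean` and `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled points."""
    # Step 1: distance from the test point to every labelled point.
    distances = [(euclidean(point, query), label)
                 for point, label in zip(train_points, train_labels)]
    # Step 2: arrange the labelled points in ascending order of distance.
    distances.sort(key=lambda pair: pair[0])
    # Step 3: keep the top K labelled points.
    top_k = distances[:k]
    # Step 4: assign the class label that occurs most often among them.
    votes = Counter(label for _, label in top_k)
    return votes.most_common(1)[0][0]


# Toy data (assumed): two well-separated clusters labelled 0 and 1.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = [0, 0, 0, 1, 1, 1]

near_first_cluster = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
near_second_cluster = knn_predict(train_points, train_labels, (5.1, 5.0), k=3)
```

For digit images, each point would be a flattened pixel vector and each label a digit from 0 to 9; the choice of K and of the distance metric would still need to be made from the data, as noted in the two requirements above.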
The block diagram for the system is as follows:
This work can be extended in future to achieve faster computation, which would decrease run time, increase efficiency and yield better results.

REFERENCES
[1] Pritam Dhande, Reena Kharat, "Recognition of cursive English handwritten characters", 2017 International Conference on Trends in Electronics and Informatics (ICEI), 2017.
[2] Mandal, S Shahnawazuddin, Rohit Sinha, S. R. Mahadeva Prasanna, Suresh Sundaram, "Exploring Sparse Representation for Improved Online Handwriting Recognition", 2018 16th International Conference on Frontiers in Handwriting Recognition, 2018.
[3] Khamparia, A., & Singh, K. M. (2019). A systematic review on deep learning architectures and applications. Expert Systems, 36(3). doi: 10.1111/exsy.12400
[4] Soni, S., & Bhushan, B. (2019). Use of Machine Learning algorithms for designing efficient cyber security solutions. 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT). doi: 10.1109/icicict46008.2019.8993253
[5] Manchanda, C., Rathi, R., & Sharma, N. (2019). Traffic Density Investigation & Road Accident Analysis in India using Deep Learning. 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS). doi: 10.1109/icccis48478.2019.8974528
[6] Yuanzhi Zhu, Zecheng Xie, Lianwen Jin, Xiaoxue Chen, Yaoxiong Huang and Ming Zhang, "SCUT-EPT: New Dataset and Benchmark for Offline Chinese Text Recognition in Examination Paper", IEEE Access, 2018.
[7] S.E Benita Galaxy, S. Selvin Ebenzer, "Enhancement of segmentation and zoning to improve the accuracy of handwritten character recognition", International Conference on Electrical, Electronics and Optimization Techniques (ICEEOT), 2016.
[8] Khamparia, A., Saini, G., Gupta, D., Khanna, A., Tiwari, S., & Albuquerque, V. H. C. D. (2019). Seasonal Crops Disease Prediction and Classification Using Deep Convolutional Encoder Network. Circuits, Systems, and Signal Processing, 39(2), 818–836. doi: 10.1007/s00034-019-01041-0
[9] Sil, R., Roy, A., Bhushan, B., & Mazumdar, A. (2019). Artificial Intelligence and Machine Learning based Legal Application: The State-of-the-Art and Future Research Trends. 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS). doi: 10.1109/icccis48478.2019.8974479
[10] Sugata Das, Sekhar Mandal, "Keyword Spotting in Historical Bangla Handwritten Document Image Using CNN", Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), 2019.
[11] Adeel Yousaf, M. Jaleed Khan, M. Imran, Khurram Khurshid, "Benchmark Dataset for Offline Handwritten Character Recognition", IEEE Xplore, 2017.
[12] Grover, M., Verma, B., Sharma, N., & Kaushik, I. (2019). Traffic control using V-2-V Based Method using Reinforcement Learning. 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS). doi: 10.1109/icccis48478.2019.8974540
[13] Momina Moetesum, Imran Siddiqi, Chawki Djeddi, Yaacoub Hannad, Somaya Al-Maadeed, "Data Driven Feature Extraction for Gender Classification using Multi-script Handwritten Texts", 16th International Conference on Frontiers in Handwriting Recognition, 2018.
[14] Reza Tavoli, Mohammadreza Keyvanpour, "A method for handwritten word spotting based on particle swarm optimisation and multi-layer perceptron", IET (The Institution of Engineering and Technology) Journals, 2018.
[15] Hamayun A. Khan, "MCS HOG Features and SVM Based Handwritten Digit Recognition System", Scientific Research, 2017.
[16] Irfan Ali, Insaf Ali, Subhash, Asif Khan, Syed Ahmed Raza, Basit Hassan, Priha Bhatti, "Sindhi Handwritten-Digits Recognition Using Machine Learning Techniques", International Journal of Computer Science and Network Security (IJCSNS), Vol. 19 No. 5, 2019.
[17] Huseyin Kusetogullari, Amir Yavariabdi, Abbas Cheddad, Hakan Grahn and Johan Hall, "A Swedish Historical Handwritten Digit Dataset", Springer, 2019.
[18] Khamparia, A., Gupta, D., Albuquerque, V. H. C. D., Sangaiah, A. K., & Jhaveri, R. H. (2020). Internet of health things-driven deep learning system for detection and classification of cervical cells using transfer learning. The Journal of Supercomputing. doi: 10.1007/s11227-020-03159-4
[19] Yuxiang Wang, Ruijin Wang, Dongfen Li, Daniel Adu-Gyamfi, Kaibin Tian and Yixin Zhu, "Improved Handwritten Digit Recognition using Quantum K-Nearest Neighbor Algorithm", Springer, 2019.
[20] Tiwari, R., Sharma, N., Kaushik, I., Tiwari, A., & Bhushan, B. (2019). Evolution of IoT & Data Analytics using Deep Learning. 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS). doi: 10.1109/icccis48478.2019.8974481
[21] Mahreen Ahmed, Asma Ghulam Rasool, Hammad Afzal, Imran Siddiqi, "Improving handwriting based gender classification using ensemble classifiers", ScienceDirect, 2017.
[22] Khamparia, A., Singh, A., Anand, D., Gupta, D., Khanna, A., Kumar, N. A., & Tan, J. (2018). A novel deep learning-based multi-model ensemble method for the prediction of neuromuscular disorders. Neural Computing and Applications. doi: 10.1007/s00521-018-3896-0
[23] Emad Sami Jaha, "Efficient Gabor-Based Recognition for Handwritten Arabic-Indic Digits", International Journal of Advanced Computer Science and Applications, Vol. 10 No. 1, 2019.
[24] Hui-huang Zhao and Han Liu, "Multiple classifiers fusion and CNN feature extraction for handwritten digits recognition", Springer, 2019.
[25] U. Chauhan, V. Kumar, V. Chauhan, S. Tiwary and A. Kumar, "Cardiac Arrest Prediction using Machine Learning Algorithms", 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), Kannur, Kerala, India, 2019, pp. 886-890.
[26] Saqib Ali, Zeeshan Shaukat, Muhammad Azeem, Zareen Sakhawat, Tariq Mahmood and Khali ur Rehman, "An efficient and improved scheme for handwritten digit recognition based on convolutional neural network", Springer, 2019.
[27] Nicole Dalia Cilia, Claudio De Stefano, Francesco Fontanella, Alessandra Scotto di Freca, "A ranking-based feature selection approach for handwritten character recognition", ScienceDirect, 2019.
[28] Jindal, M., Gupta, J., & Bhushan, B. (2019). Machine learning methods for IoT and their Future Applications. 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS). doi: 10.1109/icccis48478.2019.8974551
[29] Khamparia, A., Pandey, B., Pandey, D. K., Gupta, D., Khanna, A., & Albuquerque, V. H. C. D. (2020). Comparison of RSM, ANN and Fuzzy Logic for extraction of Oleonolic Acid from Ocimum sanctum. Computers in Industry, 117, 103200. doi: 10.1016/j.compind.2020.103200
[30] Harjani, M., Grover, M., Sharma, N., & Kaushik, I. (2019). Analysis of Various Machine Learning Algorithm for Cardiac Pulse Prediction. 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS). doi: 10.1109/icccis48478.2019.8974519