
Received December 7, 2021, accepted December 22, 2021, date of publication December 23, 2021, date of current version December 31, 2021.

Digital Object Identifier 10.1109/ACCESS.2021.3138240

Deep Learning Strategy for Braille Character Recognition

TASLEEM KAUSAR1, SAJJAD MANZOOR1, ADEEBA KAUSAR2, YUN LU3, MUHAMMAD WASIF4, AND M. ADNAN ASHRAF1


1 Mirpur Institute of Technology, Mirpur University of Science and Technology, Mirpur, Azad Jammu and Kashmir 10250, Pakistan
2 Department of Computer Science and Information Technology, University of Narowal, Narowal, Punjab 51600, Pakistan
3 School of Computer Science and Engineering, Huizhou University, Huizhou, Guangdong 516007, China
4 Department of Electrical Engineering, University of Gujrat, Gujrat, Punjab 50700, Pakistan

Corresponding author: Yun Lu (luyun_hit@163.com)


This work was supported in part by the Joint Fund of Basic and Applied Basic Research Fund of Guangdong Province under Grant
2020A1515110498, in part by the Professorial and Doctoral Scientific Research Foundation of Huizhou University under Grant
2020JB058, in part by the Planning Project of Enhanced Independent Innovation Ability of Huizhou University under Grant hzu202018,
and in part by the National Natural Science Foundation of China under Grant 62176102.

ABSTRACT People with vision impairment use the Braille language for reading, writing, and communication. The basic structure of Braille is a cell of six dots arranged in three rows and two columns, which visually impaired people identify by finger touch. However, it is difficult to memorize the patterns of dots that form the Braille characters. This research presents a novel approach for automatic Braille character recognition. The designed approach works in two main stages. In the first stage, image alignment and enhancement are performed using several image preprocessing techniques. In the second stage, character recognition is performed with a proposed lightweight convolutional neural network (CNN). Since CNNs show promise for accurate recognition of optical characters, we adopted several recently proposed state-of-the-art CNN networks for Braille character recognition. To make the networks lightweight and improve their recognition performance, we propose a strategy of replacing a few modules in the original CNNs with an inverted residual block (IRB) module of lower computational cost. The novelty of this work lies in the CNN model design and output performance. We demonstrated the effectiveness of the designed setup through experiments on two different publicly available benchmark Braille datasets obtained from visually impaired people. On the English Braille and Chinese double-sided Braille image (DSBI) datasets, the proposed model shows a prediction accuracy of 95.2% and 98.3%, respectively. The reported test time of the model is about 0.01 s for English and 0.03 s for DSBI Braille images. In comparison to the state-of-the-art, the designed method is robust, effective, and capable of identifying Braille characters efficiently. In the future, the functional performance of the proposed Braille recognition scheme will be tested through accessible user interfaces.

INDEX TERMS Braille images, image alignment, principal component analysis, Wiener filtering, convolution neural networks, inverted residual block.

I. INTRODUCTION

According to the World Health Organization (WHO), about 2.2 billion people in the world are blind or visually impaired [1]. It is hard for these visually impaired people to read and write text, so they use Braille, a system of raised dots that can be read by the sense of finger touch [2]. The basic structure of the Braille system is a matrix of six dots aligned in 3 × 2 order, as shown in Figure 1a. Each character in a Braille cell is formed by an arrangement of these six dots in a special manner. A dot may be raised at any combination of the six positions; hence, in total 64 combinations are available (2^6 = 64). In Braille, every character is identified by the pattern formed by the dots that are raised in the cell. The codes of Braille characters, alphabets, and symbols formed through different combinations of Braille dots are shown in Figure 1b. Due to its effectiveness, the Braille system is used worldwide by visually impaired people for written communication. However, a lot of people, especially ordinary people, are not able to identify the Braille characters.

The associate editor coordinating the review of this manuscript and approving it for publication was Shovan Barma.
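As a quick illustration of the 2^6 = 64 combinations mentioned above, the following short Python sketch (illustrative only; the 6-tuple encoding of the dot positions is our own assumption, not the paper's) enumerates every possible Braille cell:

from itertools import product

# Each Braille cell has six dot positions (3 rows x 2 columns);
# a pattern is a 6-tuple of 0/1 flags, one per position (row-major order).
patterns = list(product((0, 1), repeat=6))
print(len(patterns))  # 64 possible cells, including the empty cell

# Render one pattern as a 3x2 grid of raised (o) / flat (.) dots
def render(cell):
    return "\n".join(
        "".join("o" if cell[row * 2 + col] else "." for col in range(2))
        for row in range(3)
    )

print(render((1, 0, 1, 0, 0, 0)))  # top two dots of the left column raised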

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

FIGURE 1. (a) Braille dot matrix. (b) Alphabets formed using combinations of Braille dots.

Recently, deep learning has successfully made major advances in various domains [3]–[6]. Deep learning methods, particularly convolutional neural networks (CNNs), have dramatically improved the state-of-the-art in natural image classification [7]–[9], object detection [10]–[12], image segmentation [13]–[15], speech recognition [16], [17], and medical image analysis [18]–[21]. Although deep learning techniques have made tremendous advances in image recognition and achieved high performance, deep learning is still sparsely used for Braille character recognition. In this modern era, there is growing interest in exploring methods for Braille character recognition [22]–[24]. However, there are only a few papers on the use of deep models for Braille recognition. To bridge the gap between blind people and ordinary people and to help blind people read Braille quickly, work has been carried out on automatic Braille recognition [25]–[28]. In [29], the authors used a convolutional neural network (CNN) and a ratio character segmentation algorithm (RCSA) to recognize Braille characters and convert them into the English language, and claimed a high accuracy for their proposed method. A Chinese character recognition model proposed in [30] obtained an accuracy of 94.42%; the method in [30] was extended to read the characters out as natural speech. A combination of a conventional sequence mapping method and a deep learning method was used in [31] to convert Braille characters into the Hindi language, with the output generated as speech. In [32], the authors used YOLO for real-time Braille character recognition. In [33], a technique was proposed that translates Mandarin Braille words into Chinese characters using the N-Best algorithm; the authors claimed that the proposed technique obtained an overall translation precision of 94.38%.

In [34], the authors used a deep learning strategy for Braille music recognition from Braille music images, applying a few image preprocessing operations to increase the model performance. Another deep learning technique was used in [35] to help blind people read Braille characters; the results were compared with classical machine learning techniques such as Naïve Bayes (NB), Decision Trees (DT), SVM, and KNN.

There is almost no open-source dataset available for carrying out a fair comparison between the different algorithms. The lack of publicly available datasets also inhibits the training of deep neural networks and the evaluation of the accuracy of recognition algorithms. Because the performance values provided in the above-mentioned works were measured on proprietary datasets, a fair comparison of the proposed work with previous works is impossible in Braille image recognition. Li et al. [36] proposed a deep semantic segmentation framework named BraUNet. BraUNet is based on the standard U-Net model [37], in which an auxiliary learning strategy and a post-processing strategy are used to improve the recognition performance. Li et al. [38] also proposed another Braille recognition technique, named TS-OBR, that used a traditional machine learning cascaded classifier. In [28], the authors used an object detection network named RetinaNet [39] for optical Braille character recognition. The U-Net, BraUNet, TS-OBR, and RetinaNet models are all trained only on the DSBI dataset [40]. The DSBI dataset [40] is the only publicly available dataset with labeled Braille text images, and we would like to express our deep appreciation to its creators. We evaluated model performance on the two publicly available datasets: the English Braille dataset [41] and the Chinese double-sided Braille image dataset [40].

Early deep CNN models, e.g., AlexNet [42] and VGGNet [43], mainly consist of convolution layers and fully connected layers, with strict rules about the resolution of the input images. This is because these networks are composed of two types of layers, i.e., convolution layers and fully connected layers. A convolution layer can accept an input image of arbitrary size and generate an output feature map of any size, and the output feature maps of the convolution layers are used as input to the fully connected layers. A fully connected layer, however, accepts only a fixed-size input; therefore, the fixed-size constraint is imposed by the fully connected layers. Current CNNs (e.g., Inception [44], ResNet [45], and DenseNet [46]) are fully convolutional, but they also adopt a global average pooling (GAP) layer [47]. This layer solves the problem of a large number of parameters, so the network computations are considerably reduced. Meanwhile, the GAP layer squeezes the output feature maps by taking the average across the height and width dimensions, which enables image input of any size. Recently, another model, InceptionV3 [44], was designed using small convolution kernels: in this model, large convolution kernels are factorized into small kernels, and the depth and width are increased while unnecessary layers are removed. These modifications considerably reduce the parameters and computational cost of InceptionV3; meanwhile, the increase in network depth and width primarily improves on the traditional convolution layers in the network. DenseNet [46] uses dense connections in which the input of each layer uses the outputs of the previous layers. This pattern of dense connections alleviates the gradient vanishing problem; as a result, the output features are strengthened, which leads to better performance with fewer parameters. Based on the advantages of the DenseNet network, we used it as the backbone network of our final proposed deep CNN model.




Moreover, automatic recognition of characters from Braille images has some limitations. Images captured with portable camera devices and scanners are usually of low contrast, visually disturbed, misaligned, subject to illumination variation, and contaminated with nonregular noises. These input image distortions can affect the detection performance of deep CNN models and make the automatic recognition of Braille characters challenging. It is, therefore, necessary to process the images and align them in a standard orientation before feeding them to CNN models. Image preprocessing is important so that deep models can recognize the images correctly; it is also a promising way to obtain better regularization in deep CNNs.

In the literature, several image preprocessing techniques such as image filtering, histogram equalization [48], and chromatic adaptation [49] have been applied to increase image clarity. The authors in [50] discussed various image preprocessing methods used to develop security systems based on face, fingerprint, and iris biometric features of humans. Similarly, a lot of work has been done on image alignment. In [51], the authors proposed a principal component analysis (PCA) based image alignment algorithm. In [52], a two-stage image alignment scheme was proposed for image stitching using line-guided local warping. In [53], a method for joint face alignment was proposed and compared with existing techniques; the authors considered rigid variations of faces and nonrigid distortions in the process of joint image alignment. In [54], Likassa proposed a new image alignment algorithm that handles the impact of outliers and heavy sparse noise on a set of linearly correlated data using affine transformations and Frobenius norms. Recently, Huang et al. [55] proposed a low-rank pairwise alignment bilinear classification network (LRPABN); in the designed model, they used a feature alignment layer to match the image features with the query ones. In [56], an optimization framework was proposed that uses fiducial markers placed in the scene, reducing visual artifacts caused by misalignments.

There are two main challenges in the use of deep CNNs for image recognition: (i) how can the accuracy be further increased, and (ii) how can the network computational cost be reduced? In this paper, we have addressed the Braille character recognition problem as an image classification problem and established a more effective and suitable CNN-based algorithm for Braille character recognition. The high computational complexity of previously proposed deep CNNs is undesirable for real-time applications. Unlike previous approaches, we focused on the network computational cost issue and propose a new approach for Braille character recognition that uses the most appropriate image preprocessing techniques and a lightweight deep convolutional neural network. To solve the computational cost issue, a dynamic CNN modification technique using the inverted residual block (IRB) module [57] is employed in the original CNN model. We tested the performance of the proposed method using different widely used state-of-the-art networks [43]–[46] as the backbone of the modified model. As far as our knowledge goes, our work is the first to successfully perform Braille character recognition using a lightweight deep CNN model. The experimental analysis shows that the suggested technique not only improves recognition accuracy but also reduces the model parameters and computational cost. The main contributions of our work are as follows:
• A novel approach for automatic Braille character recognition is proposed. The proposed approach is based on a combination of image preprocessing techniques and an advanced lightweight convolutional neural network model.
• A fast lightweight CNN network is designed using an IRB module with low computational cost. Different effective strategies are also applied to improve network efficiency.
• The proposed network works effectively on two types of Braille datasets and outperforms the state-of-the-art in terms of prediction accuracy and recognition speed. On the English Braille and DSBI datasets, the proposed approach shows a prediction accuracy of 95.2% and 98.3%, respectively. The reported test times of the proposed model on the English and Chinese datasets are about 0.01 s and 0.03 s, respectively.

The rest of the paper is organized as follows. Section II gives the technical description of the proposed method, including the image preprocessing steps. The experimental setup and results are described in Section III. Section IV concludes the paper.

II. METHODOLOGY
A. PREPROCESSING
The Braille character recognition scheme is composed of two main stages. In the first stage, image alignment and enhancement are performed using a series of image preprocessing techniques; in the next stage, character recognition is performed with the proposed convolutional neural network. The flow diagram of Braille character recognition is shown in Figure 2. The preprocessing techniques used for image alignment and enhancement are discussed in the following section.

1) IMAGE ALIGNMENT WITH PRINCIPAL COMPONENT ANALYSIS
The Braille images obtained with scanners and camera devices are usually geometrically disturbed. A Braille image consists of six dots aligned in a 3 × 2 matrix that visually impaired people identify by finger touch. If the image is not in an accurate position, the visually impaired encounter a problem in the prediction of characters. Besides, the performance of automatic Braille character recognition systems trained on geometrically disturbed images is also remarkably affected by image rotation problems. To recognize such images correctly, it is important to resolve the geometric misalignment and align the images in a proper position.




FIGURE 2. Flow diagram of the Braille character recognition process.

In this study, we achieved the image alignment using the principal component analysis algorithm [51]. Image alignment with PCA [51] is described in the following steps:

Step 1: Given the input image, the coordinates (columns and rows) of the 1-valued pixels are collected in a single matrix $Z$:

$$Z = \begin{bmatrix} c_1 & r_1 \\ c_2 & r_2 \\ \vdots & \vdots \\ c_N & r_N \end{bmatrix} \quad (1)$$

where $r$, $c$, and $N$ denote the row, column, and total number of 1-valued pixels in the given input image, respectively. In the PCA algorithm, computing the mean and covariance matrix is essential to find the first component, which represents the direction of maximum variance of the data spread. The mean is computed as:

$$m_z = \frac{1}{N}\sum_{k=1}^{N} Z_k \quad (2)$$

The covariance is a general $2 \times 2$ matrix computed as:

$$c_z = \frac{1}{N-1}\sum_{k=1}^{N}(Z_k - m_z)(Z_k - m_z)^T \quad (3)$$

The eigenvalues and eigenvectors indicate the magnitude and direction of the variances, respectively. The eigenvalue problem is:

$$(c_z - \lambda I)e = 0 \quad (4)$$

where $\lambda$, $e$, and $I$ represent an eigenvalue, an eigenvector, and the identity matrix, respectively. For Eq. (4) to have a nonzero solution $e$, $(c_z - \lambda I)$ must be a singular matrix:

$$\det(c_z - \lambda I) = 0 \quad (5)$$

Writing $c_z = \begin{bmatrix} x & b \\ b & y \end{bmatrix}$, the solution of the determinant gives a second-degree equation:

$$\lambda^2 - \lambda(x + y) + (xy - b^2) = 0 \quad (6)$$

The eigenvalues can be computed by solving this second-degree equation with the quadratic formula:

$$\lambda_1, \lambda_2 = \frac{\operatorname{tr}(c_z) \pm \sqrt{\{\operatorname{tr}(c_z)\}^2 - 4\,|c_z|}}{2} \quad (7)$$

$$\operatorname{tr}(c_z) = x + y \quad \text{and} \quad |c_z| = xy - b^2 \quad (8)$$

The eigenvectors ($2 \times 1$ vectors) of the covariance matrix $c_z$ corresponding to the eigenvalues can be computed as:

$$u = \frac{1}{\sqrt{b^2 + (\lambda_1 - x)^2}} \begin{bmatrix} b \\ \lambda_1 - x \end{bmatrix} \quad (9)$$

$$v = \frac{1}{\sqrt{b^2 + (\lambda_2 - x)^2}} \begin{bmatrix} b \\ \lambda_2 - x \end{bmatrix} \quad (10)$$

Let $X$ be the matrix whose columns $u = [x_{11}, x_{21}]^T$ and $v = [x_{12}, x_{22}]^T$ are the first and second eigenvectors, respectively, where the first vector corresponds to the largest eigenvalue and the second to the smallest:

$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \quad (11)$$

Step 2: Using the matrix $X$, the rotation angle of any given misaligned image can be computed as:

$$\theta = \cos^{-1}\left(\frac{\operatorname{tr}(X)}{2}\right) \quad (12)$$

where $\operatorname{tr}(X) = x_{11} + x_{22}$. To find the standard orientation of the original image, the need for rotation is decided as:

$$\text{Rotation} = \begin{cases} \text{yes}, & \text{if } \theta \neq 0 \\ \text{no}, & \text{otherwise} \end{cases} \quad (13)$$

where $\theta$ indicates the true side of the rotated object.

Step 3: The desired angle $\theta'$ that converts the rotated image into its standard orientation is computed as:

$$\theta' = -(x_{11} \times x_{21})\,\theta \quad (14)$$

PCA aligns the images spatially in the direction of their principal spread. The misaligned images are converted back to the true standard orientation by rotating them with the angle calculated in Eq. (14) using bilinear interpolation. Results of image alignment are shown in Figure 3.
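A minimal NumPy sketch of Steps 1–3 (our own illustrative rendering of the PCA procedure of [51], assuming a binarized image where dot pixels are 1; not the authors' MATLAB implementation):

import numpy as np
from scipy import ndimage

def pca_align(binary_img):
    # Step 1: coordinates (column, row) of all 1-valued pixels -> matrix Z (Eq. 1)
    rows, cols = np.nonzero(binary_img)
    Z = np.stack([cols, rows], axis=1).astype(float)

    # Mean and covariance of the coordinates (Eqs. 2-3)
    mz = Z.mean(axis=0)
    cz = np.cov((Z - mz).T)  # 2x2 covariance matrix

    # Eigenvectors of cz give the principal spread directions (Eqs. 4-11)
    eigvals, eigvecs = np.linalg.eigh(cz)
    X = eigvecs[:, ::-1]  # columns ordered so the largest eigenvalue comes first

    # Step 2: rotation angle from the trace of X (Eq. 12)
    theta = np.degrees(np.arccos(np.clip(np.trace(X) / 2.0, -1.0, 1.0)))

    # Step 3: correction angle (Eq. 14), then rotate back with bilinear interpolation
    theta_prime = -(X[0, 0] * X[1, 0]) * theta
    return ndimage.rotate(binary_img.astype(float), theta_prime,
                          reshape=False, order=1)  # order=1: bilinear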




FIGURE 3. Standard orientation results on the English Braille character dataset [41]: visually disturbed Braille images and the corresponding aligned images; images are rotated around the vertical axis.

2) IMAGE ENHANCEMENT AND NOISE REMOVAL
Most Braille images are degraded by noise, and deep models trained on noisy images would yield poor recognition performance. Therefore, there is a high need to enhance image quality before feeding the images to any deep CNN model. The image enhancement steps are as follows:

Step 1: As the first step of object ROI extraction (dot patterns in the case of Braille images), the input noisy image $I$ is filtered with a Wiener filter [58]. Given the input noisy image, the filter smooths the noise and inverts the blur effect simultaneously; this filtering process enhances the dotted regions and preserves smooth boundaries.

Step 2: In the next step, a hysteresis threshold [59] is applied. The binary image obtained after thresholding can be expressed as:

$$I_w = \begin{cases} 0, & \text{if } I(x, y) < T_1 \\ 1, & \text{if } I(x, y) > T_2 \\ 0, & \text{if } T_1 \le I(x, y) \le T_2 \text{ and } I(x, y) \text{ is a neighbour of pixels with label } 0 \\ 1, & \text{if } T_1 \le I(x, y) \le T_2 \text{ and } I(x, y) \text{ is a neighbour of pixels with label } 1 \end{cases} \quad (15)$$

where $T_1$ and $T_2$ denote the lower and upper thresholds, respectively. In the experiments, we set the upper threshold $T_2$ to 0.9 and the lower threshold $T_1$ to 0.45.

Step 3: The binary images are then processed with morphological opening by reconstruction using a disc-shaped structuring element [60]:

$$\phi(I) = I - p_s(I_w \mid I) \quad (16)$$

where $p_s(I_w \mid I)$ represents the reconstruction operator with structuring element $s$ and mask image $I_w$. The mask image can be obtained after the opening process as follows:

$$I_w = (I \ominus S) \oplus S \quad (17)$$

Step 4: The mask image $I_w$ is then processed to the output image $I_o$ using morphological erosion and dilation operations [60]. Grayscale erosion is described as:

$$(I_w \ominus S)(x, y) = \min\{\, I_w(x + x', y + y') \mid (x', y') \in V_s \,\} \quad (18)$$

where $\ominus$ denotes erosion between the image and the structuring element, and $V_s$ is the domain of the structuring element.

Step 5: Similarly, grayscale dilation is written as:

$$(I_w \oplus S)(x, y) = \max\{\, I_w(x - x', y - y') \mid (x', y') \in V_s \,\} \quad (19)$$

The results of Wiener filtering, thresholding, and morphological operations are shown in Figure 4. It can be seen from Figure 4 that the Wiener filter highlights the core area and improves the image quality by eliminating additive noise even when the image is blurred and of low intensity. The morphological operations help in removing unnecessary objects and preserving smooth boundaries; in particular, the erosion and dilation remove unnecessary objects from the final ROI images. The ultimate goal of image alignment and enhancement is to improve image quality. From the experimental analysis, we found that image preprocessing significantly improves the performance of the deep CNN model in terms of recognition efficiency. Details of the image alignment and enhancement steps are given in Algorithm 1.

B. PROPOSED IMAGE CLASSIFICATION METHOD
In this section, the proposed deep CNN model is systematically described and the various techniques used are discussed. The convolutional neural network is one form of artificial neural network (ANN) that is mostly applied to pattern recognition tasks. Unlike other ANNs, CNNs use spatial kernels and strides to maintain temporal and spatial details, while plain ANNs compress the information into one-dimensional vectors. Many researchers have used CNNs in the field of image processing and obtained robust performance. A deep CNN mainly consists of alternating convolution, max-pooling, and fully connected layers that learn high-level details from low-level representations [42]. These layers are trained by tuning several hyper-parameters such as stride, kernel size, and padding.



FIGURE 4. (a) Original image, (b) filtered with the Wiener filter, (c) after hysteresis thresholding, (d) after morphological reconstruction, (e) after morphological flood-fill, i.e., erosion followed by dilation. The images are taken from the English Braille dataset.

Algorithm 1 Preprocessing of a Braille Image

Input: noisy image I
Output: processed image I_o
Procedure:
Image alignment
## Given the input image
X = [x11 x12; x21 x22]  / eigenvalues and corresponding eigenvectors are computed from the covariance matrix; the eigenvectors form the columns of X
θ = cos⁻¹(tr(X)/2)  / rotation angle of the given misaligned image (Eq. 12)
θ′ = −(x11 × x21)θ  / misaligned images are rotated back to the standard orientation using θ′ (Eq. 14)
Image enhancement
I_w = Wiener(I, noise)  / Wiener filtering
I_w = hysteresis(I, T1, T2)  / hysteresis thresholding as in Eq. (15)
φ(I_w) = I_w − p_s(I_m | I_w)  / morphological opening by reconstruction (Eq. 16)
(I_w ⊖ S)(x, y) = min{ I_w(x + x′, y + y′) | (x′, y′) ∈ V_s }  / morphological erosion (Eq. 18)
(I_w ⊕ S)(x, y) = max{ I_w(x − x′, y − y′) | (x′, y′) ∈ V_s }  / morphological dilation (Eq. 19)
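A minimal Python sketch of the enhancement half of Algorithm 1, using SciPy/scikit-image equivalents of the operations above (our own illustrative rendering under assumed parameter values; the paper's implementation is in MATLAB R2018a):

import numpy as np
from scipy.signal import wiener
from skimage.filters import apply_hysteresis_threshold
from skimage.morphology import disk, erosion, dilation, reconstruction

def enhance(image, t_low=0.45, t_high=0.9):
    # Step 1: adaptive Wiener filtering smooths noise and deblurs
    filtered = wiener(image.astype(float), mysize=5)

    # Step 2: hysteresis thresholding (Eq. 15); pixels between the thresholds
    # survive only if connected to pixels above the high threshold
    binary = apply_hysteresis_threshold(filtered, t_low, t_high)

    # Step 3: morphological opening by reconstruction with a disc-shaped
    # structuring element (Eqs. 16-17); disc radius is an assumption here
    s = disk(2)
    opened = reconstruction(erosion(binary, s), binary, method="dilation")

    # Steps 4-5: erosion followed by dilation removes residual small objects
    # while preserving smooth dot boundaries (Eqs. 18-19)
    return dilation(erosion(opened, s), s)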

In this paper, we have designed a deep CNN model consisting of two main parts, i.e., a feature extraction part and a classification part. For feature extraction, we selected four previously proposed deep CNN networks as backbones: VGG16 [43], ResNet50 [45], InceptionV3 [44], and DenseNet201 [46]. In each original network, the last few modules are replaced with the IRB module [57]. In the case of VGG16 as backbone, the last three convolution layers are replaced with the IRB module. In the case of ResNet50, the last six convolution layers are replaced with the IRB module. In the case of InceptionV3 as backbone, the last Inception module is replaced with the IRB module. Essentially, the main structure of DenseNet201 is composed of four different dense blocks and transition layers; the transition layers (a 1 × 1 convolution followed by 2 × 2 average pooling) are used between two contiguous dense blocks to connect them.




Each dense block consists of 6, 12, 48, and 32 bottleneck layers, respectively. In each bottleneck layer, BN-ReLU-1×1 Conv is used before BN-ReLU-3×3 Conv, where BN represents batch normalization. In the case of DenseNet201, we replaced 10 bottleneck layers in the last dense block with the IRB module. The network architecture of the designed deep CNN model (with the DenseNet201 backbone) is shown in Figure 5. In the IRB module, a 1 × 1 pointwise convolution is added before and after the depthwise convolution layer, because in depthwise convolution the number of channels in the output feature map ($C_{out}$) is the same as in the input feature map ($C_{in}$). The first pointwise convolutional layer increases the dimension of the output feature maps: as the feature maps are expanded by a factor $n$, the number of output channels increases to $n \times C_{in}$. In this way, the network extracts more discriminative features in a higher-dimensional space and increases the network performance. Using the second pointwise convolution layer, the output dimension is reduced so that the output channels equal the input channels and the feature information is concentrated per channel. The architecture of the inverted residual block is shown in Figure 6. Moreover, the modified network consists of convolution, batch normalization, and ReLU activation, followed by a global average pooling layer. In the IRB module, we used 512 filters, and for the comparative analysis the hyper-parameter expansion factor $n$ is set to 1, 2, and 3. The suggested modifications reduce the computational cost of the proposed model and lead to fewer network parameters.

We performed the classification process mainly with the Softmax function, which is used to calculate the probability value of each given class. Classification with Softmax is formulated as:

$$\sigma = \frac{e^{y_i}}{\sum_{j=1}^{J} e^{y_j}} \quad (20)$$

where $y_i$ is the input to the Softmax, $J$ is the total number of categories, and $\sigma$ is the normalized Softmax output probability. An image is assigned the class label for which the predicted Softmax class probability is highest. The Softmax class decision is computed as:

$$o = \arg\max(\sigma) \quad (21)$$

In the recognition model, the categorical cross-entropy loss function of the Softmax is used to find the proximity between the actual and desired output. The loss function is computed as:

$$J = -\frac{1}{N}\sum_{t=1}^{N} \log\!\left(\frac{e^{y_i}}{\sum_{j=1}^{J} e^{y_j}}\right) \quad (22)$$

where $N$ is the total number of training samples. Network training is considered an optimization process, in which a group of optimal solutions in the parameter space is found to minimize the loss. The gradient of the loss function is calculated and the weight parameters are updated until the training loss is minimized to its lowest value.

1) TEST TIME MINIMIZATION THROUGH INVERTED RESIDUAL BLOCK
Let a convolution layer with kernel size $k \times k$ take an input with $C_{in}$ channels from the previous layer and produce an output feature map with $C_{out}$ channels. The time computations involved in a standard convolution layer [61] can be expressed as:

$$O_{std}\left(C_{in} \cdot k^2 \cdot M_{out} \cdot N_{out} \cdot C_{out}\right) \quad (23)$$

where $M_{out}$ and $N_{out}$ are the spatial dimensions of the output feature maps. For instance, the time computation for the first convolution layer is written as:

$$O_{std}\left(C_0 \cdot k_1^2 \cdot M_1 \cdot N_1 \cdot C_1\right) \quad (24)$$

Equation (24) shows that the time complexity of a CNN architecture mainly relies on the spatial dimensions of the input image, the number of kernels, and the kernel size used. In our proposed architecture, the use of the IRB and a small input image size are the major contributors to reduced computational complexity. For instance, when the size of the input image is reduced from 28 × 28 × 3 to 14 × 14 × 3, the time complexity of the first convolution layer is reduced by 75%, computed (using Eq. (24)) as:

$$O_{std}(64 \times 3^2 \times 28 \times 28 \times 3) \rightarrow O_{std}(64 \times 3^2 \times 14 \times 14 \times 3) \quad (25)$$

Furthermore, the use of the IRB [57] effectively reduces the computational cost of the proposed model. The IRB uses depthwise separable convolution, which splits the traditional convolution into a depthwise convolution and a 1 × 1 pointwise convolution to offer lower computational cost. The computational cost of depthwise separable convolution is defined as the sum of the calculation amounts of the depthwise convolution and the 1 × 1 convolution:

$$O_{dpt}\left(C_{in} \cdot f^2 \cdot M_{out} \cdot N_{out} + C_{in} \cdot C_{out} \cdot M_{out} \cdot N_{out}\right) \quad (26)$$

The computational cost ratio of depthwise separable convolution to standard convolution can be computed as:

$$c_r = \frac{O_{dpt}}{O_{std}} = \frac{C_{in} \cdot f^2 \cdot M_{out} \cdot N_{out} + C_{in} \cdot C_{out} \cdot M_{out} \cdot N_{out}}{C_{in} \cdot f^2 \cdot M_{out} \cdot N_{out} \cdot C_{out}} \quad (27)$$

$$c_r = \frac{1}{C_{out}} + \frac{1}{f^2} \quad (28)$$

Equations (26) and (28) represent the advantage of depthwise separable convolution, which splits one large multiplication count into separate multiplication and addition counts. For instance, using the downsampled input Braille image of 14 × 14 pixels at the first layer, in standard convolution 64 kernels of size 3 × 3 × 3 move 12 × 12 times (i.e., 64 × 3 × 3 × 3 × 12 × 12 = 248,832 multiplications). In contrast, with separable convolution, in the depthwise convolution three 3 × 3 × 1 kernels move 12 × 12 times (i.e., 3 × 3 × 3 × 12 × 12 = 3,888 multiplications), and in the pointwise convolution 64 kernels of size 1 × 1 × 3 move 12 × 12 times (i.e., 64 × 1 × 1 × 3 × 12 × 12 = 27,648 multiplications). Summing the multiplications of the depthwise and pointwise convolutions gives 31,536 multiplications. These calculations show that, using depthwise separable convolution, the time complexity of the first convolution layer is decreased by 87.3%.
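The multiplication counts above can be verified with a few lines of Python (a sanity check of Eqs. (26)–(28) for the 14 × 14 first-layer example; the shapes are those quoted in the text):

# First layer on a 14x14x3 input: 64 kernels of size 3x3x3, valid output 12x12
c_in, c_out, k, m_out, n_out = 3, 64, 3, 12, 12

standard = c_out * k * k * c_in * m_out * n_out      # 248,832
depthwise = c_in * k * k * m_out * n_out             # 3,888
pointwise = c_out * 1 * 1 * c_in * m_out * n_out     # 27,648
separable = depthwise + pointwise                    # 31,536

print(standard, separable)        # 248832 31536
print(1 - separable / standard)   # ~0.873 -> 87.3% fewer multiplications
print(separable / standard, 1 / c_out + 1 / k**2)  # both ~0.1267, matching Eq. (28)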




FIGURE 5. The network architecture of the proposed deep CNN model (with the DenseNet201 backbone). Ten bottleneck layers in the last dense block of the DenseNet backbone network are replaced with the inverted residual block.

FIGURE 6. Inverted residual block.
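A minimal Keras sketch of the inverted residual block of Figure 6 (our own illustrative rendering of the IRB of [57]: expand with a 1 × 1 pointwise convolution by factor n, filter depthwise, then project; the 3 × 3 depthwise kernel size is an assumption for illustration):

import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual_block(x, filters=512, n=1):
    """Expand -> depthwise conv -> project, with a residual connection."""
    c_in = x.shape[-1]

    # 1x1 pointwise expansion: channels grow from c_in to n * c_in
    h = layers.Conv2D(n * c_in, 1, padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU()(h)

    # Depthwise convolution: one filter per channel (cheap spatial filtering)
    h = layers.DepthwiseConv2D(3, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU()(h)

    # 1x1 pointwise projection back to the block's output width
    h = layers.Conv2D(filters, 1, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)

    # Residual shortcut when input and output channel counts match
    if c_in == filters:
        h = layers.Add()([x, h])
    return h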

III. EXPERIMENTAL SETTINGS AND RESULTS ANALYSIS
This section discusses the architecture of the modified model, the experimental settings, the datasets used, and the output results. The image preprocessing algorithms are implemented in MATLAB R2018a, whereas the CNN algorithm is implemented with the TensorFlow [62] and Keras [63] libraries on a 2.4 GHz Intel(R) Xeon(R) E5-2630 CPU with one NVIDIA Tesla M40 GPU with 12 GB of memory. First, we investigate the impact of the different preprocessing techniques on performance. A further aim is to check the effects of the different previously proposed backbone networks on the output recognition performance.

A. STATISTICAL MEASURES
The following measures are used to evaluate the recognition performance of the proposed model:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \quad (29)$$

$$\text{Precision} = \frac{TP}{TP + FP} \quad (30)$$

$$F\,\text{score} = 2 \times \frac{\text{precision} \times \text{sensitivity}}{\text{precision} + \text{sensitivity}} \quad (31)$$

$$\text{Predictive Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (32)$$

where $TP$, $FP$, $TN$, and $FN$ represent true positives, false positives, true negatives, and false negatives, respectively.

The entropy [64] of a prediction is:

$$H = -\sum_{y \in Y} p(y \mid i, D)\,\log p(y \mid i, D) \quad (33)$$

The peak signal-to-noise ratio (PSNR) [65] is:

$$\text{PSNR} = 20 \log\!\left(\frac{MAX_I}{\sqrt{MSE}}\right) \quad (34)$$

where $MAX_I$ represents the maximum intensity value in the image and $MSE$ is the mean squared error between the original image and the noise-contaminated image.

B. DATASET
In this paper, the Braille character detection performance of the proposed method is checked on two publicly available Braille datasets, namely the English Braille character dataset [41] and the double-sided Braille image dataset [40]. A detailed description of the datasets is given in the subsequent paragraphs.
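The count-based measures of Eqs. (29)–(33) reduce to a few lines of Python (an illustrative helper, not the authors' evaluation code):

import numpy as np

def classification_metrics(tp, fp, tn, fn):
    sensitivity = tp / (tp + fn)                                       # Eq. (29)
    precision = tp / (tp + fp)                                         # Eq. (30)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (31)
    accuracy = (tp + tn) / (tp + tn + fp + fn)                         # Eq. (32)
    return sensitivity, precision, f_score, accuracy

def prediction_entropy(probs):
    """Entropy of a predicted class distribution (Eq. 33)."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + 1e-12))  # small epsilon avoids log(0)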

TABLE 1. Analysis of the impact on performance of image preprocessing (A: alignment, E: enhancement), data augmentations (dim: brightness correction, rot: rotation, whs: width-height shift), and downsampling for the proposed method with DenseNet201 as backbone. Results are evaluated on the English Braille dataset.

TABLE 2. Hyper-parameter settings of the experiments using the English Braille dataset.

1) ENGLISH BRAILLE CHARACTER DATASET [41]
This Braille image dataset is publicly available in an open-source Kaggle repository [41]. It is composed of 1560 images covering 26 characters. These Braille character images are grayscale, 3-channel images in black and white with a resolution of 28 × 28 × 3 pixels. We split the dataset into training, validation, and test sets: about 20% and 10% of the Braille English character dataset are selected for the test set and validation set, respectively, while the remaining images are used for model training. The size of the training dataset is relatively small, so augmented versions of the original images are generated with rotation, width-height shift, and brightness correction. These data augmentations increase the size of the data by 60 times, i.e., 3 augmentations × 20 different augmentation values.

2) CHINESE DOUBLE-SIDED BRAILLE IMAGE (DSBI) DATASET [40]
This Braille image dataset is also publicly available in an open-source GitHub repository [40]. It contains 114 double-sided Braille images obtained from several Braille books. Augmented versions of this dataset are also generated with rotation, width-height shift, and brightness correction of the original images. In this dataset, each Braille image has a different resolution. We divided this dataset into training, validation, and test sets; each set is proportionally sampled to construct training, validation, and test parts with 70, 14, and 30 images, respectively. We evaluated the results on both recto and verso Braille characters. Although this dataset is not large enough to provide full training of a deep CNN model, it allows us to compare different methods on the problem.

C. EVALUATION ON ENGLISH BRAILLE DATASET
In this section, we show the results obtained on the English Braille character dataset [41].

1) ABLATION EXPERIMENTS
To check the impact of the preprocessing, data augmentation, and image downsampling techniques on performance, we performed several experiments using our improved model (backbone: DenseNet201); the results are given in Table 1. In the first experiment, the model is trained with original images (without preprocessing). In the second experiment, the model is trained with images that are aligned using the PCA algorithm ('A' denotes image alignment). In the third experiment, the model is trained with images that are preprocessed with image enhancement techniques ('E' denotes image enhancement). Similarly, in the fourth experiment, the training is done by incorporating data augmentations (rot: rotation, whs: width-height shift, and dim: brightness correction). Furthermore, in the fifth experiment, the effect of image downsampling on model performance is analyzed: the resolution of the input images is downsampled from 28 × 28 × 3 pixels to 14 × 14 × 3 pixels using bilinear interpolation. In the sixth experiment, the training process is repeated involving all the image preprocessing, data augmentation, and downsampling techniques.
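The three augmentations named above map directly onto Keras's ImageDataGenerator; a minimal sketch (the specific ranges are illustrative assumptions, not the paper's exact augmentation values):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# rot: rotation, whs: width/height shift, dim: brightness correction
augmenter = ImageDataGenerator(
    rotation_range=15,            # random rotation in degrees
    width_shift_range=0.1,        # horizontal shift as a fraction of width
    height_shift_range=0.1,       # vertical shift as a fraction of height
    brightness_range=(0.7, 1.3),  # random brightness scaling
)

# Example usage: stream augmented Braille crops during training
# train_iter = augmenter.flow(x_train, y_train, batch_size=8)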




TABLE 3. Comparison of the F score and parameters of our modified network (using different backbones) with the corresponding original networks on the English Braille dataset. The F score and parameters are also compared for different values of the expansion factor n.

It can be observed from Table 1 that image preprocessing improves the image quality and significantly increases the recognition efficiency of the deep CNN model. In particular, with the image alignment and enhancement steps it is possible to obtain high accuracy in Braille character recognition. From the experimental analysis, we found that incorporating the data augmentations yields consistent improvements in classification accuracy. The impact of image downsampling on recognition performance can also be observed: the small input image size did not significantly affect the recognition accuracy, while the parameter count with downsampling is lower than without it. Overall, the results demonstrate that the model achieved superior performance when all the image preprocessing, augmentation, and downsampling techniques are considered. Consequently, the model obtained an overall F score of 95.2% for Braille character recognition.

Furthermore, three experiments were performed with different settings of the hyper-parameters, which govern the proper training of the deep CNN model. The hyper-parameter settings and training time of each experiment are given in Table 2. We repeatedly revised the model and adjusted its hyper-parameters to obtain better performance. The best performance was obtained in the third experiment, when the classifier was trained for 150 epochs at a learning rate of 0.0002 and a batch size of 8 using the root mean square propagation (RMSprop) optimizer. These hyper-parameter values were chosen heuristically after rounds of experimentation on the training data.
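The best-performing configuration translates into the following minimal Keras training sketch; the tiny model and random data are placeholders purely for illustration, and only the quoted hyper-parameters (RMSprop, learning rate 0.0002, batch size 8, 150 epochs) come from the paper:

import tensorflow as tf
from tensorflow.keras import layers

# Placeholder model and data; the paper's model is the DenseNet201
# backbone with the IRB (Figure 5), trained on the Braille datasets.
model = tf.keras.Sequential([
    layers.Input((14, 14, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(26, activation="softmax"),
])
x = tf.random.uniform((64, 14, 14, 3))
y = tf.one_hot(tf.random.uniform((64,), maxval=26, dtype=tf.int32), 26)

model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=2e-4),
    loss="categorical_crossentropy",  # Eq. (22)
    metrics=["accuracy"],
)
model.fit(x, y, batch_size=8, epochs=150, validation_split=0.1)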
2) EXPERIMENTS WITH DIFFERENT BACKBONES
To further evaluate the ability of the designed model, results are evaluated using different backbone networks while changing the expansion factor n. The F scores and parameters of our proposed models with the different backbone networks are reported in Table 3. The results show that our method obtains improvements to different degrees with the different backbones. Generally, our method with DenseNet201 as the backbone achieves the highest F score in most cases. It is also important to note that the use of different expansion factors of 1, 2, or 3 has little impact on the F score, which is almost the same for all values of n. Nevertheless, the computational parameters are much higher with n = 2 or n = 3 than with n = 1: in the deep convolution layers, the parameters rise sharply as the value of n increases. Therefore, we selected n = 1 in our experiments, which not only keeps the parameters fewer but also provides significant classification accuracy.

The precision-recall curves of the proposed model with the different backbone networks [43]–[46], evaluated on the test dataset, are shown in Figure 7. After comparing the performance with different backbones (using n = 1), it is found that the modified model achieved the best classification results with DenseNet201. The evaluated results show that our method improves the input Braille image quality and effectively decreases the model parameters. It also helps to achieve robust recognition accuracy for Braille characters.

3) CLASSIFICATION ACCURACY (%) COMPARISON WITH THE STATE-OF-THE-ART TECHNIQUES
In this section, we relate our results to the state-of-the-art in image classification. The performance of our method with the DenseNet201 backbone [46] (using expansion factor n = 1) is compared with the original InceptionV3 [44], ResNet50 [45], and VGG16 [43] networks, and even with the most advanced DenseNet201 network [46]. It can be noted that our modified model has fewer parameters than the original networks [43]–[46] (reported in Table 4). This is because we have replaced a few modules in the original networks with the IRB, which reduces the parameters. From the performance analysis, it is also clear that the F score of our modified method is relatively better compared to the corresponding original InceptionV3, DenseNet201, ResNet50, and VGG16 models (reported in Table 4). Overall, the experimental performance indicates that the proposed approach not only improves the character recognition accuracy but also effectively reduces the parametric load.

The entropy measure is used to compute the uncertainty of predictions; it captures the fact that misclassification mainly occurs for highly uncertain predictions. Therefore, to measure the aleatoric uncertainty [66] of the model, we computed the entropy of the predictions. In comparison to the state-of-the-art models (see Table 4), our proposed method (backbone: DenseNet201) shows a low prediction entropy, i.e., 19.2%.




FIGURE 7. The precision-recall curves of the proposed method with different backbone networks on the English Braille test data.

TABLE 4. Performance comparison with the state-of-the-art models on the English Braille dataset.

TABLE 5. Train and test times on the English and DSBI Braille datasets.

D. TIME ANALYSIS
Training a deep learning model is challenging and takes a long time. In real-time scenarios, however, it is a foremost priority to make the recognition speed of automatic classification models fast [67], [68]. The results show that our proposed method (backbone: DenseNet201) has fewer parameters compared to the original DenseNet201 model. This is because, in the original DenseNet201 model, a few modules are replaced with an IRB module (using n = 1), which has fewer parameters than the original bottleneck layers and therefore lowers the computational cost. In our improved versions of the DenseNet, InceptionV3, ResNet50, and VGG16 networks, the use of the IRB effectively reduced the memory requirement and inference time. The appropriate choice of the expansion factor n in the IRB provides a mechanism to control the parametric load. Furthermore, image downsampling also helps to control the computational cost. We repeated the training process several times until an optimized set of parameters was obtained. The training time of the proposed model for the different experiments is given in Table 5. Moreover, the overall train and test times of our improved versions of the DenseNet, InceptionV3, ResNet50, and VGG16 networks are also given in Table 5. The train time is the model training time, while the test time is the inference time of the model computed for a single image.
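Per-image test time of the kind reported here can be measured with a simple timing loop (an illustrative sketch; `model` stands for any trained Keras classifier, and the warm-up/run counts are assumptions):

import time
import numpy as np

def single_image_test_time(model, image, warmup=5, runs=50):
    """Average inference time for one image, excluding warm-up calls."""
    batch = image[np.newaxis, ...]       # add the batch dimension
    for _ in range(warmup):              # let kernels and caches warm up
        model.predict(batch, verbose=0)
    start = time.perf_counter()
    for _ in range(runs):
        model.predict(batch, verbose=0)
    return (time.perf_counter() - start) / runs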




TABLE 6. Analysis of robustness in terms of predictive accuracy (%) on the English Braille dataset, where m denotes the different noise levels.

TABLE 7. PSNR of images contaminated with different noise levels.

E. ROBUSTNESS AGAINST INPUT PERTURBATIONS
In this analysis, the capability of the proposed model (backbone: DenseNet201) is checked against input image noise perturbations. To evaluate the robustness of the model to input image perturbations, we added different levels of Gaussian noise to the test images and fed the contaminated noisy images into the classifiers. The image perturbation is described as:

$$\bar{i} = i + m \cdot \text{randn}(c_1, c_2, 3) \quad (35)$$

where $i$ represents the input image of size $c_1 \times c_2 \times 3$, $\bar{i}$ denotes the contaminated data, and $\text{randn}()$ represents the Gaussian noise function. We selected the noise levels $m = 1, 3, 5, 7, 9, 11$. Since the pixel values may exceed $[0, 255]$ under perturbation, we clipped the exceeding values to limit them to the range $[0, 255]$. We performed separate experiments using (i) original input images, (ii) preprocessed input images, and (iii) preprocessed images contaminated with different noise levels. From the robustness test results, given in Table 6, it can be concluded that our modified network has strong robustness compared to the original DenseNet201 network for all types of input images. This analysis also demonstrates the strong generalization capability of the modified model to accept a wider range of Braille data with inevitable noise perturbations. Moreover, we studied the effect of an increase in the noise level on the PSNR [65] values. The PSNR computed (using Eq. (34)) for the preprocessed images and the noisy images (contaminated with different noise levels) is given in Table 7. It can be seen in Table 7 that the PSNR values decrease for images degraded by high noise, confirming that the corrupting noise affects the fidelity of the image representation.
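Eq. (35), the [0, 255] clipping, and the PSNR of Eq. (34) amount to the following NumPy sketch (illustrative; the constant test image is a made-up example):

import numpy as np

def perturb(image, m, rng=np.random.default_rng(0)):
    """Add Gaussian noise at level m (Eq. 35), then clip to [0, 255]."""
    noisy = image.astype(float) + m * rng.standard_normal(image.shape)
    return np.clip(noisy, 0, 255)

def psnr(original, contaminated, max_i=255.0):
    """Peak signal-to-noise ratio (Eq. 34)."""
    mse = np.mean((original.astype(float) - contaminated.astype(float)) ** 2)
    return 20 * np.log10(max_i / np.sqrt(mse))

# Example: PSNR drops as the noise level m grows (cf. Table 7)
img = np.full((28, 28, 3), 128.0)
for m in (1, 3, 5, 7, 9, 11):
    print(m, round(psnr(img, perturb(img, m)), 2))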
F. EVALUATION ON CHINESE DOUBLE-SIDED BRAILLE (DSBI) DATASET
To further check the ability of the designed model, we evaluated its performance on the double-sided Braille image dataset [40]. Similar to the English Braille dataset [41], we performed image alignment and enhancement with the image preprocessing techniques, and we adopted the same training protocols as were set in the case of the English Braille images. On this dataset, the performance is also calculated in terms of precision, sensitivity, and F score. The application of deep learning models on the DSBI dataset has been reported in a few previously published works [37], [28], [40], [36], and we compared the effectiveness of our algorithm with these approaches. The compared methods used the U-Net [40] and BraUNet [36] models, both of which are deep CNN based models; BraUNet is a segmentation model based on a modified U-Net architecture with an auxiliary foreground segmentation task to determine the area occupied by characters. Furthermore, the results are compared with the recent TS-OBR method [38], which used a traditional machine learning cascaded classifier, and with a recently proposed Braille recognition method [28]. The authors of [40], [36] compared the accuracy of algorithms at the dot detection level, while in [27], [38] accuracies are compared at both the character and dot levels. To make a fair performance comparison, the methods are tested on the same hardware. We retrained and tested the U-Net [40], BraUNet [36], TS-OBR [38], and RetinaNet [28] methods on our newly divided sets of DSBI Braille images (i.e., training, validation, and test parts with 70, 14, and 30 images). The performance comparison of the tested methods is given in Table 8. The results show that our model achieved an accuracy of 98.3% in character-level detection on the DSBI dataset. Our method took about 0.03 s per image on a 2.4 GHz Intel(R) Xeon(R) E5-2630 CPU with one NVIDIA Tesla M40 GPU with 12 GB of memory. Our method obtained remarkable performance in terms of F score and outperforms the other methods in terms of recognition speed. This Braille character recognition speed can be acceptable in real-time applications.

All the compared methods [40], [38], [28], [36] applied several post-processing strategies, which increase their computational burden. The use of the IRB module in our proposed scheme provides a mechanism to control the parametric load and effectively reduces the memory requirement and inference time of the CNN model.

G. MCNEMAR'S STATISTICAL TEST
To evaluate the statistical significance of the difference between the results obtained with our method and other previously proposed methods, we performed McNemar's statistical test. This statistical test is widely used to check whether one machine learning classifier outperforms another on a given problem. The null hypothesis of McNemar's test states that the two methods disagree by the same amount; if the two methods disagree by different amounts, the null hypothesis is rejected. In this test, we set the significance level at alpha = 5%. Acceptance of the null hypothesis requires that the p-value be greater than the alpha value. The p-values obtained during the experimental analysis are given in Table 9; they are greater than the alpha value of 5%, so the null hypothesis holds for the given test, i.e., there is no difference in the disagreement. The tested methods have similar proportions of errors and similar performance metric values, and any difference between them is by chance only. From this analysis, it can be concluded that the proposed method, taking about 0.03 s per image on the DSBI dataset, outperforms the existing Braille recognition techniques and provides state-of-the-art performance for real-time applications.




TABLE 8. Performance comparison with the state-of-the-art models on the DSBI dataset.

TABLE 9. Statistical test on the DSBI dataset.

IV. CONCLUSION AND FUTURE WORK
In this paper, an automatic Braille image character recognition algorithm using a lightweight convolutional neural network is proposed. In the preprocessing stage, the designed approach applies a principal component analysis strategy to resolve the misalignment problem by assessing the orientation of the true side of the object, i.e., the Braille dot pattern in the Braille image. Furthermore, mathematical, adaptive, and geometric operators are used to improve the Braille image features. Subsequently, the proposed lightweight convolutional neural network with the IRB is used to perform character recognition. Comparison with state-of-the-art methods showed that the designed method performs fast character recognition with better accuracy (i.e., 95.2% and 98.3% for the English and Chinese Braille datasets, respectively) and is robust to input image noise perturbations. The time complexity of the proposed model is also compared with state-of-the-art methods: our model involves fewer time computations and performs Braille character recognition quickly (i.e., 0.01 s and 0.03 s for English and Chinese Braille images, respectively). The state-of-the-art character recognition accuracy and speed prove that the proposed method can be suited for real-time applications. In the future, this method can be extended to perform recognition on larger Braille datasets. Future work may also investigate the impact of Braille image quality on Braille character recognition systems due to differences in image sources.

REFERENCES
[1] S. Shokat, R. Riaz, S. S. Rizvi, K. Khan, F. Riaz, and S. J. Kwon, ''Analysis and evaluation of Braille to text conversion methods,'' Mobile Inf. Syst., vol. 2020, pp. 1–14, Jul. 2020, doi: 10.1155/2020/3461651.
[2] A. Mousa, H. Hiary, R. Alomari, and L. Alnemer, ''Smart Braille system recognizer,'' Int. J. Comput. Sci. Issues, vol. 10, no. 6, p. 52, 2013.
[3] Y. LeCun, Y. Bengio, and G. Hinton, ''Deep learning,'' Nature, vol. 521, no. 7553, p. 436, Feb. 2015.
[4] T. Patil, S. Pandey, and K. Visrani, ''A review on basic deep learning technologies and applications,'' in Data Science and Intelligent Applications (Lecture Notes on Data Engineering and Communications Technologies), 2021.
[5] R. Aggarwal, V. Sounderajah, G. Martin, D. S. W. Ting, A. Karthikesalingam, D. King, H. Ashrafian, and A. Darzi, ''Diagnostic accuracy of deep learning in medical imaging: A systematic review and meta-analysis,'' NPJ Digit. Med., vol. 4, no. 1, pp. 1–23, Dec. 2021, doi: 10.1038/s41746-021-00438-z.
[6] T. Kausar, M. A. Ashraf, A. Kausar, and I. Riaz, ''Convolution neural network based approach for breast cancer type classification,'' in Proc. Int. Bhurban Conf. Appl. Sci. Technol. (IBCAST), Jan. 2021, pp. 407–413, doi: 10.1109/IBCAST51254.2021.9393249.
[7] X. Wang, Y. Yan, P. Tang, X. Bai, and W. Liu, ''Revisiting multiple instance neural networks,'' Pattern Recognit., vol. 74, pp. 15–24, Feb. 2018, doi: 10.1016/j.patcog.2017.08.026.
[8] P. Wang, E. Fan, and P. Wang, ''Comparative analysis of image classification algorithms based on traditional machine learning and deep learning,'' Pattern Recognit. Lett., vol. 141, pp. 61–67, Jan. 2021, doi: 10.1016/j.patrec.2020.07.042.
[9] Y. Yuan, C. Wang, and Z. Jiang, ''Proxy-based deep learning framework for spectral–spatial hyperspectral image classification: Efficient and robust,'' IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–15, 2021, doi: 10.1109/TGRS.2021.3054008.
[10] Y. Liu, P. Sun, N. Wergeles, and Y. Shang, ''A survey and performance evaluation of deep learning methods for small object detection,'' Expert Syst. Appl., vol. 172, Jun. 2021, Art. no. 114602, doi: 10.1016/j.eswa.2021.114602.




[11] A. W. Kabani and M. R. El-Sakka, ''Object detection and localization using deep convolutional networks with softmax activation and multi-class log loss,'' in Proc. Int. Conf. Image Anal. Recognit., 2016, pp. 358–366, doi: 10.1007/978-3-319-41501-7_41.
[12] B. Wu, T. Kausar, Q. Xiao, M. Wang, W. Wang, B. Fan, and D. Sun, ''FF-CNN: An efficient deep neural network for mitosis detection in breast cancer histological images,'' in Proc. Annu. Conf. Med. Image Understanding Anal., 2017, pp. 249–260, doi: 10.1007/978-3-319-60964-5_22.
[13] J. Ji, X. Lu, M. Luo, M. Yin, Q. Miao, and X. Liu, ''Parallel fully convolutional network for semantic segmentation,'' IEEE Access, vol. 9, pp. 673–682, 2021, doi: 10.1109/ACCESS.2020.3042254.
[14] A. Ouahabi and A. Taleb-Ahmed, ''Deep learning for real-time semantic segmentation: Application in ultrasound imaging,'' Pattern Recognit. Lett., vol. 144, pp. 27–34, Apr. 2021, doi: 10.1016/j.patrec.2021.01.010.
[15] N. Alalwan, A. Abozeid, A. A. ElHabshy, and A. Alzahrani, ''Efficient 3D deep learning model for medical image semantic segmentation,'' Alexandria Eng. J., vol. 60, no. 1, pp. 1231–1239, Feb. 2021, doi: 10.1016/j.aej.2020.10.046.
[16] Z. Song, ''English speech recognition based on deep learning with multiple features,'' Computing, vol. 102, no. 3, pp. 663–682, Mar. 2020, doi: 10.1007/s00607-019-00753-0.
[17] Y. Jin, B. Wen, Z. Gu, X. Jiang, X. Shu, Z. Zeng, Y. Zhang, Z. Guo, Y. Chen, T. Zheng, Y. Yue, H. Zhang, and H. Ding, ''Deep-learning-enabled MXene-based artificial throat: Toward sound detection and speech recognition,'' Adv. Mater. Technol., vol. 9, Jul. 2020, Art. no. 2000262, doi: 10.1002/admt.202000262.
[18] T. Kausar, M. Wang, M. A. Ashraf, and A. Kausar, ''SmallMitosis: Small size mitotic cells detection in breast histopathology images,'' IEEE Access, vol. 9, pp. 905–922, 2021, doi: 10.1109/ACCESS.2020.3044625.
[19] T. Kausar, M. Wang, M. Idrees, and Y. Lu, ''HWDCNN: Multi-class recognition in breast histopathology with Haar wavelet decomposed image based convolution neural network,'' Biocybern. Biomed. Eng., vol. 39, no. 4, pp. 967–982, Oct. 2019, doi: 10.1016/j.bbe.2019.09.003.
[20] Y. Lu, M. Wang, W. Wu, Q. Zhang, Y. Han, T. Kausar, S. Chen, M. Liu, and B. Wang, ''Entropy-based pattern learning based on singular spectrum analysis components for assessment of physiological signals,'' Complexity, vol. 2020, pp. 1–17, Jan. 2020, doi: 10.1155/2020/4625218.
[21] M. Bakator and D. Radosav, ''Deep learning and medical diagnosis: A review of literature,'' Multimodal Technol. Interact., vol. 2, no. 3, p. 47, Aug. 2018, doi: 10.3390/mti2030047.
[22] V. V. Murthy, M. Hanumanthappa, and S. Vijayanand, ''Braille cell segmentation and removal of unwanted dots using Canny edge detector,'' in Advances in Artificial Intelligence and Data Engineering, 2021, doi: 10.1007/978-981-15-3514-7_7.
[23] I. G. Ovodov, ''Semantic-based annotation enhancement algorithm for semi-supervised machine learning efficiency improvement applied to optical Braille recognition,'' in Proc. IEEE Conf. Russian Young Res. Elect. Electron. Eng. (ElConRus), Jan. 2021, pp. 2190–2194, doi: 10.1109/ElConRus51938.2021.9396534.
[24] S. Alufaisan, W. Albur, S. Alsedrah, and G. Latif, ''Arabic Braille numeral recognition using convolutional neural networks,'' in Proc. Int. Conf. Commun., Comput. Electron. Syst., 2021, p. 87, doi: 10.1007/978-981-33-4909-4_7.
[25] R. F. Turkson, F. Yan, M. K. A. Ali, and J. Hu, ''Artificial neural network applications in the calibration of spark-ignition engines: An overview,'' Eng. Sci. Technol., Int. J., vol. 19, no. 3, pp. 1346–1359, Sep. 2016, doi: 10.1016/j.jestch.2016.03.003.
[26] S. T and V. Udayashankara, ''A review on software algorithms for optical
[31] P. Kaur, S. Ramu, S. Panchakshari, and N. Krupa, ''Conversion of Hindi Braille to speech using image and speech processing,'' in Proc. IEEE 7th Uttar Pradesh Section Int. Conf. Elect., Electron. Comput. Eng. (UPCON), Nov. 2020, pp. 1–6, doi: 10.1109/UPCON50219.2020.9376566.
[32] A. A. Choudhury, R. Saha, S. Z. H. Shoumo, S. R. Tulon, J. Uddin, and M. K. Rahman, ''An efficient way to represent Braille using YOLO algorithm,'' in Proc. Joint 7th Int. Conf. Inform., Electron. Vis. (ICIEV), Jun. 2019, pp. 379–383, doi: 10.1109/ICIEV.2018.8641038.
[33] M. Jiang, X. Zhu, G. Gielen, E. Drábek, Y. Xia, G. Tan, and T. Bao, ''Braille to print translations for Chinese,'' Inf. Softw. Technol., vol. 44, pp. 91–100, Feb. 2002, doi: 10.1016/S0950-5849(01)00220-8.
[34] W. Richards. (2020). Music Braille Pedagogy: The Intersection of Blindness, Braille, Music Learning Theory, and Laban. [Online]. Available: https://researchspace.auckland.ac.nz/handle/2292/51655.
[35] S. Shokat, R. Riaz, S. S. Rizvi, A. M. Abbasi, A. A. Abbasi, and S. J. Kwon, ''Deep learning scheme for character prediction with position-free touch screen-based Braille input method,'' Hum.-centric Comput. Inf. Sci., vol. 10, no. 1, pp. 1–24, Dec. 2020, doi: 10.1186/s13673-020-00246-6.
[36] R. Li, H. Liu, X. Wang, J. Xu, and Y. Qian, ''Optical Braille recognition based on semantic segmentation network with auxiliary learning strategy,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops, Jun. 2020, pp. 554–555, doi: 10.1109/CVPRW50498.2020.00285.
[37] O. Ronneberger, P. Fischer, and T. Brox, ''U-Net: Convolutional networks for biomedical image segmentation,'' in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2015, pp. 234–241, doi: 10.1007/978-3-319-24574-4_28.
[38] R. Li, H. Liu, X. Wang, and Y. Qian, ''Effective optical Braille recognition based on two-stage learning for double-sided Braille image,'' in Proc. Pacific Rim Int. Conf. Artif. Intell., 2019, pp. 150–163, doi: 10.1007/978-3-030-29894-4_12.
[39] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, ''Focal loss for dense object detection,'' IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 318–327, Feb. 2020, doi: 10.1109/TPAMI.2018.2858826.
[40] R. Li, H. Liu, X. Wang, and Y. Qian, ''DSBI: Double-sided Braille image dataset and algorithm evaluation for Braille dots detection,'' in Proc. 2nd Int. Conf. Video Image Process., 2018, pp. 65–69, doi: 10.1145/3301506.3301532.
[41] No Title. Accessed: Aug. 20, 2020. [Online]. Available: https://www.kaggle.com/shanks0465/braille-character-dataset/
[42] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ''ImageNet classification with deep convolutional neural networks,'' Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017, doi: 10.1145/3065386.
[43] K. Simonyan and A. Zisserman, ''Very deep convolutional networks for large-scale image recognition,'' Tech. Rep., 2015.
[44] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, ''Rethinking the inception architecture for computer vision,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2818–2826, doi: 10.1109/CVPR.2016.308.
[45] K. He, X. Zhang, S. Ren, and J. Sun, ''Deep residual learning for image recognition,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[46] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, ''Densely connected convolutional networks,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4700–4708, doi: 10.1109/CVPR.2017.243.
[47] M. Lin, Q. Chen, and S. Yan, ''Network in network,'' Tech. Rep., 2014.
[48] G. Yadav, S. Maheshwari, and A. Agarwal, ''Contrast limited adaptive
recognition of embossed Braille characters,’’ Int. J. Comput. Appl., vol. 81, histogram equalization based enhancement for real time video system,’’
no. 3, pp. 25–35, Nov. 2013, doi: 10.5120/13993-2015. in Proc. Int. Conf. Adv. Comput., Commun. Inform. (ICACCI), Sep. 2014,
[27] H. Burton, A. Z. Snyder, T. E. Conturo, E. Akbudak, J. M. Ollinger, and pp. 2392–2397, doi: 10.1109/ICACCI.2014.6968381.
M. E. Raichle, ‘‘Adaptive changes in early and late blind: A fMRI study of [49] R. Kreslin, P. M. Calvo, L. G. Corzo, and P. Peer, ‘‘Linear chromatic
Braille reading,’’ J. Neurophysiol., vol. 87, no. 1, pp. 589–607, Jan. 2002, adaptation transform based on Delaunay triangulation,’’ Math. Problems
doi: 10.1152/jn.00285.2001. Eng., vol. 2014, pp. 1–9, Jan. 2014, doi: 10.1155/2014/760123.
[28] I. G. Ovodov, ‘‘Optical Braille recognition using object detection CNN,’’ [50] A. K. Singh and B. K. Singh, ‘‘Applications of human biometrics in
2020, arXiv:2012.12412. digital image processing,’’ Int. J. Innov. Sci. Res. Technol., vol. 5, no. 7,
[29] B.-M. Hsu, ‘‘Braille recognition for reducing asymmetric communication pp. 1273–1276, Aug. 2020, doi: 10.38124/ijisrt20jul748.
between the blind and non-blind,’’ Symmetry, vol. 12, no. 7, p. 1069, [51] H. Z. U. Rehman and S. Lee, ‘‘Automatic image alignment using principal
Jun. 2020, doi: 10.3390/SYM12071069. component analysis,’’ IEEE Access, vol. 6, pp. 72063–72072, 2018, doi:
[30] J. Mao, J. Zhu, X. Wang, H. Liu, and Y. Qian, ‘‘Speech synthesis of 10.1109/ACCESS.2018.2882070.
Chinese Braille with limited training data,’’ in Proc. IEEE Int. Conf. [52] T. Z. Xiang, G. S. Xia, X. Bai, and L. Zhang, ‘‘Image stitching by line-
Multimedia Expo (ICME), Jul. 2021, pp. 1–6, doi: 10.1109/icme51207. guided local warping with global similarity constraint,’’ Pattern Recognit.,
2021.9428160. vol. 83, pp. 481–497, Nov. 2018, doi: 10.1016/j.patcog.2018.06.013.
[53] G. Zhang, L. Pan, J. Chen, Y. Gong, and F. Liu, ‘‘Joint alignment of image faces,’’ IEEE Access, vol. 8, pp. 114884–114891, 2020, doi: 10.1109/ACCESS.2020.3003332.
[54] H. T. Likassa, ‘‘New robust principal component analysis for joint image alignment and recovery via affine transformations, Frobenius and L2,1 norms,’’ Int. J. Math. Math. Sci., vol. 2020, pp. 1–9, Apr. 2020, doi: 10.1155/2020/8136384.
[55] H. Huang, J. Zhang, J. Zhang, J. Xu, and Q. Wu, ‘‘Low-rank pairwise alignment bilinear network for few-shot fine-grained image classification,’’ IEEE Trans. Multimedia, vol. 23, pp. 1666–1680, 2021, doi: 10.1109/TMM.2020.3001510.
[56] T. Madeira, M. Oliveira, and P. Dias, ‘‘Enhancement of RGB-D image alignment using fiducial markers,’’ Sensors, vol. 20, no. 5, p. 1497, Mar. 2020, doi: 10.3390/s20051497.
[57] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, ‘‘MobileNetV2: Inverted residuals and linear bottlenecks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 4510–4520, doi: 10.1109/CVPR.2018.00474.
[58] D. S. K. Jadwa, ‘‘Wiener filter based medical image de-noising,’’ Int. J. Sci. Eng. Appl., vol. 7, no. 9, pp. 318–323, Sep. 2018, doi: 10.7753/ijsea0709.1014.
[59] N. Otsu, ‘‘A threshold selection method from gray-level histograms,’’ IEEE Trans. Syst., Man, Cybern., vol. SMC-9, no. 1, pp. 62–66, Jan. 1979.
[60] Morphological Image Analysis: Principles and Applications, Sens. Rev., 2000, doi: 10.1108/sr.2000.08720cae.001.
[61] K. He and J. Sun, ‘‘Convolutional neural networks at constrained time cost,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 5353–5360, doi: 10.1109/CVPR.2015.7299173.
[62] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, and M. Kudlur, ‘‘TensorFlow: A system for large-scale machine learning,’’ in Proc. 12th USENIX Symp. Operating Syst. Design Implement. (OSDI), 2016, pp. 265–283.
[63] F. Chollet. (2015). Keras. [Online]. Available: https://keras.io
[64] L. Smith and Y. Gal, ‘‘Understanding measures of uncertainty for adversarial example detection,’’ Tech. Rep., 2018.
[65] A. Hore and D. Ziou, ‘‘Image quality metrics: PSNR vs. SSIM,’’ in Proc. 20th Int. Conf. Pattern Recognit., Aug. 2010, pp. 2366–2369, doi: 10.1109/ICPR.2010.579.
[66] A. D. Kiureghian and O. Ditlevsen, ‘‘Aleatory or epistemic? Does it matter?’’ Struct. Saf., vol. 31, no. 2, pp. 105–112, Mar. 2009, doi: 10.1016/j.strusafe.2008.06.020.
[67] T. Makkar, Y. Kumar, A. K. Dubey, A. Rocha, and A. Goyal, ‘‘Analogizing time complexity of KNN and CNN in recognizing handwritten digits,’’ in Proc. 4th Int. Conf. Image Inf. Process. (ICIIP), Dec. 2017, pp. 1–6, doi: 10.1109/ICIIP.2017.8313707.
[68] C. Malon, E. Brachtel, E. Cosatto, H. P. Graf, A. Kurata, M. Kuroda, J. S. Meyer, A. Saito, S. Wu, and Y. Yagi, ‘‘Mitotic figure recognition: Agreement among pathologists and computerized detector,’’ Anal. Cell. Pathol., vol. 35, pp. 97–100, Jan. 2012, doi: 10.3233/ACP-2011-0029.

TASLEEM KAUSAR was born in Pakistan, in 1988. She received the B.Sc. degree in electrical engineering from the University of Azad Jammu and Kashmir, Pakistan, in 2010, the M.Sc. degree in electronics engineering from the University of Engineering and Technology, Taxila, Pakistan, in 2013, and the Ph.D. degree in electronics from the Harbin Institute of Technology, China, in 2020. She has been working as a Lecturer at the Mirpur University of Science and Technology, Pakistan, since 2013. Her research interests include medical image analysis with deep learning techniques.

SAJJAD MANZOOR received the degree in electronics systems engineering from Hanyang University, South Korea, in 2016. He is currently an Assistant Professor with the Mirpur Institute of Technology, Mirpur University of Science and Technology (MUST). His research interests include robotics, vision, and target tracking.

ADEEBA KAUSAR received the B.Sc. degree in computer system engineering from the University of Azad Jammu and Kashmir, Pakistan, in 2017, and the M.Sc. degree in computer system engineering from the University of Engineering and Technology, Taxila, Pakistan, in 2020. She is currently a Lecturer with the Department of Computer Science, University of Narowal, Pakistan. Her research interests include computer vision, image processing, deep learning techniques, next-generation wireless communication systems, and wireless sensor networks.

YUN LU was born in Hengyang, Hunan, China, in 1985. He received the B.S. degree in microelectronics from Xiangtan University, Hunan, in 2009, the M.S. degree in optical engineering from Sun Yat-sen University, China, in 2011, and the Ph.D. degree in microelectronics and solid state electronics from the Harbin Institute of Technology at Shenzhen, Shenzhen, China, in 2020. From 2011 to 2014, he was with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, working on the design of mixed-signal front-end circuits for biomedical applications. From 2014 to 2017, he was a Senior Engineer with Launch Tech Company Ltd. Since August 2020, he has been working as an Associate Professor at Huizhou University, Guangdong, China. His current research interests include cognitive computing from neuroscience to engineering, innovative methods for brain–machine interfaces, and machine learning.

MUHAMMAD WASIF received the B.Sc. degree in electrical engineering from Riphah University, Pakistan, in 2006, and the M.Sc. degree in mechatronics and the Ph.D. degree in robotics from King’s College London, U.K., in 2008 and 2016, respectively. He is currently working as an Assistant Professor with the Department of Electrical Engineering, University of Gujrat, Pakistan, where he leads the Intelligent Systems Laboratory. His research interests include robotics, intelligent control, active vision, machine vision, and machine learning.

M. ADNAN ASHRAF received the B.Sc. and M.Sc. degrees in electrical engineering from the Mirpur University of Science and Technology, Pakistan, in 2009 and 2020, respectively, where he is currently pursuing the Ph.D. degree in control. He worked as a Trainee Engineer at the High Voltage and Short Circuit Laboratory, WAPDA, Pakistan, from 2011 to 2012, and as a Design Engineer with Transfopower Pvt. Ltd., Pakistan, from 2012 to 2014. He joined the Mirpur University of Science and Technology as a Lecturer in 2016.