
Real-Time Product Recognition and Currency Detection in Supermarket for Visually Impaired


Nishaben Sodha, Sanika Divekar, Sanket Jadhav, Tejas Pacharne, Vighnesh Ghadge, Atharva Jadhav
Department of Information Technology & Department of Artificial Intelligence and Data Science
Vishwakarma Institute of Technology, Pune, Maharashtra 411037, India
{nishaben.sodha, sanika.divekar20, sanket.jadhav20, tejas.pacharne20, vighnesh.ghadge19, atharv.jadhav19}@vit.edu

Abstract - We all come across instances where visually impaired people take time to locate expected items and wait for the cashier at a retail store. Counterfeit currency is another concern in the world of visually impaired people. Technical improvements have increased the likelihood of more fake currency being disseminated in the market, lowering the country's total economy. Because automatic product recognition and currency detection are more dependable and time-saving than manual operation, they make these tasks hassle-free. The suggested system employs image processing to determine whether the currency is genuine. More precisely, this study examines the major hurdles of deep learning for retail product detection; the other part is detecting currency value. In this paper, we introduce a new classification approach for a difficult problem in this project: the categorization of fruits, vegetables, and frozen items in supermarkets, such as milk packages and juice cartons. The entire system is self-contained and operates in real time. According to the test results, the suggested teachable-machine-based neural model achieves detection and recognition accuracies exceeding 99 percent and 99 percent, respectively.

Index Terms – Indian Currency Detection, Image Processing, Product Recognition, Visually Impaired

I. INTRODUCTION

Over 2.2 billion people worldwide suffer from a visual impairment, including 1 billion people with blindness or significant weakening of long-distance vision; the majority of them are over 50 years old. However, when faced with the range of obstacles and conditions in their daily lives, existing assistance devices have their own limitations. People frequently regard such individuals as a burden and leave them to run errands for themselves. As a result, the visually impaired individual is continuously in need of assistive technology that can support them with their everyday duties and rehabilitation. Blindness is a serious problem that affects many people and worsens with age.

There is a pressing need for technology that can support people in their everyday lives, increase their chances of success in a straightforward way, and ultimately aid their mobility. The most significant of these problems is visual impairment. [1]

The implementation of image recognition models in assistive technology for people with visual impairments is the subject of this research. Mobile applications, such as Microsoft's Seeing AI and Aipoly Vision, and wearable artificial vision devices, such as the OrCam MyEye and the Sound of Vision system, currently exist. These products can assist people with vision impairments in a variety of settings, including reading written documents, describing their surroundings, and recognizing people they may know. [2]

Currency plays an essential role as a medium of everyday exchange. Each nation has its own currency, which comes with a variety of colors, sizes, shapes, and patterns. Visually challenged people find it difficult to detect and count different denominations of currency. Due to continual use, pen marks on a banknote's surface become invisible or crumble; this makes it difficult for visually impaired people to recognize and distinguish banknotes accurately by feel or touch. Digital image processing is a large field that provides solutions to problems like these, in which patterns and identification markings are searched for, extracted, and then compared to actual banknote images. [3]

The suggested currency detection and recognition system's key contribution is to build a simple, easy-to-use standalone system that will assist visually impaired individuals in identifying currency in a real-time scenario.

II. LITERATURE REVIEW

Various papers were studied to learn about research done in the area. Some of the important surveys and techniques are explained below.

In paper [4], the authors proposed a system that implements an automatic caption generator using CNN and RNN-LSTM models. The Flickr8k dataset was used to train and test the model, and evaluation is done on the basis of BLEU scores. Image caption generator models are based on an encoder-decoder architecture that uses input vectors for generating valid and appropriate captions: the VGG16 CNN architecture works as the encoder and an RNN-LSTM works as the decoder. The CNN model has many layers, including input, convolution, pooling, fully-connected, softmax, and output layers. The proposed model gives a 0.68 BLEU score, which is still far from 1.0. The model is based on multi-label classification.

The study [5] examines the major issues of deep learning for retail product recognition and proposes various strategies that might aid research on the topic. According to the review, YOLO, SSD, Faster R-CNN, and Mask R-CNN are currently the cutting-edge object detection approaches, and they test their algorithms on the PASCAL VOC and MS COCO datasets. PASCAL VOC, however, only has images of 20 item classes, whereas MS COCO has photos of 80 object categories. The main challenges are large-scale classification, data limitation, and intra-class variation. To overcome these problems, the authors suggested techniques such as CNN-based feature descriptors, data augmentation, fine-grained classification, and one-shot learning.

The first aspect of paper [6] is gathering all the different information from image feature extraction, which was the first requirement. The paper describes taking information about an object and then that object's location in the image; it also covers the evaluation of contextual information of an image and its combination with feature extraction to generate an image description. The two main types of image captioning algorithms are those that use the template method and those that use the encoder-decoder structure. The template approach first gathers a range of important feature data; the system works on special properties taken from the key objects. Multiple kinds of classifiers, such as SVM, act as the middle part where the collected features are converted; the converted part is the feature information about the key object. Templates are then used to produce descriptive text with a lexical model. The encoder-decoder structure makes the system more flexible for image captioning and is the main structure presented in the paper. In the encoder, a convolutional neural network takes key objects from image features and extracts information; in the decoder, a back-propagation neural network combines the features and key objects to provide an image content description statement. The soft attention mechanism focuses on all parts of the picture, with each region having a varied weight value.

In paper [7], MSCOCO, Flickr30k, and Flickr8k are some of the most often used datasets for image captioning. The MSCOCO collection is quite vast, and all of the photos in it contain several captions. The Visual Genome dataset is used for picture captioning on regions. Image caption performance is measured using a variety of evaluation metrics. The BLEU metric is useful for evaluating short sentences. ROUGE is divided into several kinds, each of which can be used to assess various types of texts. METEOR has the ability to evaluate several parts of a caption. In comparison to other evaluation metrics, SPICE is better at understanding the semantic subtleties of captions.
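As a concrete illustration of the BLEU metric discussed in these surveys, the short sketch below scores a candidate caption against a reference using NLTK; the example sentences are our own and are not drawn from any of the cited papers.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Tokenized reference caption(s) and a candidate caption to score.
reference = [["a", "man", "holds", "a", "red", "apple"]]
candidate = ["a", "man", "holding", "an", "apple"]

# BLEU measures n-gram overlap between candidate and reference;
# smoothing avoids zero scores on short sentences.
score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.2f}")
```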

TABLE I
Summary of Related Papers

Title: Image Captioning: Transforming Objects into Words [8]
Journal: International Conference on Contemporary Computing and Applications
Authors: Simao Herdade, Armin Kappeler, Kofi Boakye, Joao Soares
Published: 11 Jan 2020
Methodology: A Transformer encodes 2D position and size relationships between detected objects in images, evaluated with the SPICE captioning metric.

Title: A Comprehensive Survey of Deep Learning for Image Captioning [9]
Journal: Journal of Physics: Conference Series
Authors: Md. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, Hamid Laga
Published: 14 Oct 2018
Methodology: The Visual Genome dataset is used for picture captioning on regions.

Title: Currency Detector for Visually Impaired using Machine Learning [10]
Journal: IEEE
Authors: Bibhudyuti Nayak; Natraja Praveen Thota; Jadapalli Dinesh Kumar; Dhanalakshmi R; Bairavel S
Published: 06 September 2021
Methodology: The elementary methods used in the planned system include pre-image classification, segmentation, histogram equalization, region of interest (ROI) extraction, and finally template matching using MobileNetV1-224 and TensorFlow.

Title: Object and Currency Detection with Audio Feedback for Visually Impaired [11]
Journal: 2020 IEEE Region 10 Symposium (TENSYMP)
Authors: Kanchi Kedar Sai Nadh Reddy; Challa Yashwanth; SreeHarsha KVS; Pavan Anvesh Tamidala Venkata Sai; Sonia Khetarpaul
Published: 02 November 2020
Methodology: The MobileNet architecture with OpenCV and a proprietary set of conditions were applied to calculate the tilt of the recognized item, and the KNN method was used to recognize currency notes with the OpenCV library's brute-force descriptor matcher.

III. PROPOSED METHODOLOGY

Fig. 1. System Architecture

The flow diagram in Fig. 1 represents the two main categories of the system: grocery detection and currency detection. Input belonging to either category can be presented to the system. In Fig. 1, the operation of the proposed system is described: the input from the camera is collected as an image frame, which is then pre-processed before being fed into the trained model. The image is passed into the model trained on the dataset, which produces output for the detected currency or grocery product and generates a description for each. The system can detect currency from a single picture, and there is no restriction on note type for the suggested system.
The label, which is a text output, is then converted into voice; for each distinct label, a text-to-speech platform is used to announce the relevant currency value or product name, which can then be listened to. As a result, the audio output for each detected and identified currency is relayed to the visually impaired individual through the speaker.
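To make this flow concrete, here is a minimal sketch of such a capture-classify-speak loop in Python. It assumes a Teachable Machine model exported as `keras_model.h5` with a companion `labels.txt`, OpenCV for capture, and the `pyttsx3` offline text-to-speech library; the file names, confidence threshold, and library choices are our assumptions rather than details given in the paper.

```python
import cv2                      # camera capture and preprocessing
import numpy as np
import pyttsx3                  # offline text-to-speech engine
from tensorflow.keras.models import load_model

# Hypothetical file names: a Teachable Machine export ships a Keras
# model together with a plain-text list of class labels.
model = load_model("keras_model.h5")
labels = open("labels.txt").read().splitlines()

tts = pyttsx3.init()
cap = cv2.VideoCapture(0)       # default camera

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Teachable Machine image models expect 224x224 RGB scaled to [-1, 1].
    img = cv2.cvtColor(cv2.resize(frame, (224, 224)), cv2.COLOR_BGR2RGB)
    x = img.astype(np.float32) / 127.5 - 1.0
    probs = model.predict(x[np.newaxis], verbose=0)[0]
    best = int(np.argmax(probs))
    if probs[best] > 0.9:       # speak only confident predictions
        tts.say(labels[best])   # relay the label as audio feedback
        tts.runAndWait()
    cv2.imshow("view", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```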

B. Proposed System
The proposed system contains two modules: Indian currency classification and grocery recognition. The currency classification module classifies Indian currency using a Convolutional Neural Network. The model was trained on Google Teachable Machine with 30 epochs, a batch size of 32, and a learning rate of 0.00001. The model shows 98 percent accuracy on training and testing data. The model improves its validation accuracy on real-time images by using Canny edge detection for note detection and the CNN module for classification of the different banknotes. [12]

The Currency Detection Dataset is a custom dataset trained on Google Teachable Machine, where the learning rate is 0.00001 and the batch size for the training model is 16 with 30 epochs. The model trained on this custom dataset gives around 98% accuracy on training and testing data. The training and testing loss was recorded as 0.4.
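The paper does not spell out how Canny edge detection feeds the classifier; one plausible reading is a contour-based region proposal ahead of classification, sketched below with OpenCV. The blur kernel, Canny thresholds, and largest-contour heuristic are our assumptions.

```python
import cv2

def locate_note(frame):
    """Return a rough crop of the banknote found in the frame,
    using the largest Canny edge contour as a region proposal."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)            # Canny edge map
    edges = cv2.dilate(edges, None, iterations=2)  # close small gaps
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None                                # nothing note-like found
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return frame[y:y + h, x:x + w]

# The returned crop is then resized to 224x224 and passed to the
# Teachable Machine CNN classifier shown earlier.
```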
The Grocery dataset was processed with data augmentation: a rotation range of 10, width shift of 0.1, height shift of 0.1, rescaling factor of 1/255, zoom range of 0.2, and validation split of 0.2. The training data is 80 percent and the test data 20 percent. The processed grocery data is passed to a VGG16 model pre-trained on the ImageNet dataset. The initial layers of the model are frozen, extra dense layers are added at the end, and a softmax function is added at the last layer to obtain the feature vector. The VGG16 model is trained for 50 epochs, with the learning rate managed by the Adam optimizer. The transfer learning model is trained on the Grocery dataset features; the accuracy of the trained model was 96%, as sketched below.
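A rough reconstruction of this grocery training pipeline in Keras, using the augmentation values and training settings quoted above; the dataset directory name, input image size, and the width of the extra dense layer are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16
from tensorflow.keras.preprocessing.image import ImageDataGenerator

NUM_CLASSES = 81  # categories in the grocery dataset

# Augmentation and 80/20 train/validation split, as described above.
datagen = ImageDataGenerator(
    rotation_range=10, width_shift_range=0.1, height_shift_range=0.1,
    rescale=1.0 / 255, zoom_range=0.2, validation_split=0.2)

train = datagen.flow_from_directory("grocery/", target_size=(224, 224),
                                    subset="training")
val = datagen.flow_from_directory("grocery/", target_size=(224, 224),
                                  subset="validation")

# VGG16 pre-trained on ImageNet, with its convolutional base frozen.
base = VGG16(weights="imagenet", include_top=False,
             input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),  # extra dense layer (assumed width)
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train, validation_data=val, epochs=50)
```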
D. Data Collection:
a. Indian Currency Dataset:
Because the available Indian rupee denomination dataset has only a limited number of images, additional images were added with the help of a Realme XT smartphone camera's 64 MP sensor. All of the currency images in the collection were photographed in portrait mode. To construct the dataset, a total of 50 photos were taken for each currency note. All market-acceptable currency notes are covered, such as old and new 10 rupee notes, old and new 20 rupee notes, old and new 50 rupee notes, old and new 100 rupee notes, and the new 200, 500, and 2000 rupee notes. [13]

Fig. 2. Currency Dataset

b. Common Grocery Dataset:
This resource offers a collection of natural photos of supermarket goods. All of the natural photographs were captured using a smartphone camera in various grocery shops. We collected 5125 natural photos from 81 different categories of fruits, vegetables, and other carton products (e.g. juice, milk, yogurt). The 81 classes are grouped into 42 coarse-grained classes, with fine-grained classes such as 'Royal Gala' and 'Granny Smith' belonging to the same coarse-grained class 'Apple.' We downloaded an iconic image and a product description for each fine-grained class. The dataset was presented at WACV 2019 in the article "A Hierarchical Grocery Store Image Dataset with Visual and Semantic Labels." [14]
E. Preprocessing
Images are gathered and then pre-processed, augmented, and annotated to train the neural network, in order to construct a real-time currency identification and product recognition system utilizing a neural network with Teachable Machine.

a. Image Augmentation and Annotation:
Under this approach, an image is acquired using a camera with a resolution of 1280*720 pixels in various settings such as occlusion, illumination (front, side, and dispersed lighting), and so on. Around 330 photographs of each currency were obtained from the camera as well as the internet (images were also in various formats such as .jpg, .jpeg, .png).
As demonstrated in Fig. 1, image augmentation methods including rotation, brightness, reflection, color, scaling, shear, added noise, background removal, and translation adjustments are used. To make matters more challenging, the picture data is supplemented with a mix of various modifications and augmentation procedures, as in the sketch below.
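A brief sketch of how such a mixed offline augmentation pass might look, assuming the Pillow and NumPy libraries; the parameter ranges are illustrative rather than values taken from the paper.

```python
import random
import numpy as np
from PIL import Image, ImageEnhance

def augment(img: Image.Image) -> Image.Image:
    """Apply a random mix of the augmentations listed above."""
    img = img.rotate(random.uniform(-15, 15), expand=True)    # rotation
    img = ImageEnhance.Brightness(img).enhance(
        random.uniform(0.6, 1.4))                             # brightness
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)            # reflection
    arr = np.asarray(img, dtype=np.float32)
    arr += np.random.normal(0, 8, arr.shape)                  # additive noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```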

F. Training and Testing


Fig. 3. Accuracy and epoch graph

Fig. 4. Loss and epoch graph

IV. RESULT AND DISCUSSION

A. Product Detection Results
Figure — depicts the product being analyzed with a precision of —— percent. Figure — also shows the application's output. When the product is analyzed under various situations, there are times when it is not successfully detected. The number of successful and unsuccessful detections, however, varied with the product, since some items were spotted with greater precision than others. For example, in the case of orange, banana, and kiwi in fruit detection, and lemon, ginger, and garlic in vegetable detection, the product was recognized on every attempt, representing 99% accuracy.

B. Currency Detection Results
Figure — depicts the reading of moving objects in frames while detecting the currency notes, and Figure — depicts the output. When the application launches and the sequence of images appears, the user must hold the currency note close to the screen, and the sequence of images is saved as an image file. To acquire the currency detection results, a Teachable Machine model was used. The image file is examined and the essential points that can be utilized to identify the currency are identified.

Fig. 5. Proposed Application Interface

Fig. 6. Proposed system detecting and recognizing product and currency

V. FUTURE SCOPE

In the future, additional features can be added to the object detection mechanism, such as color recognition and audio data conversion. The number of classes in grocery detection can also be increased, for example making the machine able to distinguish between sugar and rice, which differ in very few attributes. The system can also be enhanced to report the condition of a note, as written on it, and whether the currency is fake or real.
The system can further be extended to provide additional annotation, such as a description of the surroundings, by translating detected objects into words and making them audible. The system can take on various aspects to make it more deliverable and hassle-free for visually impaired people.

VI. CONCLUSION

The system is able to accurately detect fruits, vegetables, and other grocery products, such as orange, lemon, onion, potato, ginger, garlic, banana, etc. The application can also determine the value of Indian currency notes. Users have been able to give vocal instructions, which the software has detected and processed. In addition, we were able to properly translate these outputs into audible feedback. In this paper, an independent, real-time currency detection and recognition system based on a teachable machine and a neural network is provided. The model is trained using many photos of each class, with 330 input images. The banknote datasets are created under a variety of settings such as crowded backgrounds, rotation, occlusion, illumination intensity, scale, and so on.

REFERENCES

[1] R. C. Joshi, S. Yadav and M. K. Dutta, "YOLO-v3 Based Currency Detection and Recognition System for Visually Impaired Persons," 2020 International Conference on Contemporary Computing and Applications (IC3A), 2020, pp. 280-285, doi: 10.1109/IC3A48958.2020.233314.
[2] M. Klasson, C. Zhang and H. Kjellström, "A Hierarchical Grocery Store Image Dataset With Visual and Semantic Labels," 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, pp. 491-500, doi: 10.1109/WACV.2019.00058.
[3] Ç. Gider and S. V. Albayrak, "Identifing of Alphanumerical Codes in Promotional products by Using of Deep Neural Network," 2018 3rd International Conference on Computer Science and Engineering (UBMK), 2018, pp. 458-46.
[4] P. Khant, V. Deshmukh, A. Kude and P. Kiraula, "Image Caption Generator using CNN-LSTM," International Research Journal of Engineering and Technology, Vol. 08, pp. 4100-4105, July 2021.
[5] Y. Wei, S. Tran, S. Xu, B. Kang and M. Springer, "Deep Learning for Retail Product Recognition: Challenges and Techniques," Hindawi Computational Intelligence and Neuroscience, Vol. 1, 2020.
[6] C. Wang, Z. Zhou and L. Xu, "An Integrative Review of Image Captioning Research," The 2020 5th International Seminar on Computer Technology, Mechanical and Electrical Engineering (ISCME 2020), Journal of Physics: Conference Series, 2020.
[7] S. Herdade, A. Kappeler, K. Boakye and J. Soares, "Image Captioning: Transforming Objects into Words," arXiv:1906.05963v2, 11 Jan 2020.
[8] M. Z. Hossain, F. Sohel, M. F. Shiratuddin and H. Laga, "A Comprehensive Survey of Deep Learning for Image Captioning," arXiv:1810.04020v2, ACM Computing Surveys, Vol. 51, Issue 6, November 2019.
[9] Gaurav and P. Mathur, "A Survey on Various Deep Learning Models for Automatic Image Captioning," Journal of Physics: Conference Series, Vol. 1950, International Conference on Mechatronics and Artificial Intelligence (ICMAI) 2021, 27 February 2021.
[10] B. Nayak, N. P. Thota, J. D. Kumar, D. R and B. S, "Currency Detector for Visually Impaired using Machine Learning," 2021 International Conference on System, Computation, Automation and Networking (ICSCAN), 2021, pp. 1-5.
[11] K. K. S. N. Reddy, C. Yashwanth, S. KVS, P. A. T. V. Sai and S. Khetarpaul, "Object and Currency Detection with Audio Feedback for Visually Impaired," 2020 IEEE Region 10 Symposium (TENSYMP), 2020, pp. 1152-1155, doi: 10.1109/TENSYMP50017.2020.9230687.
[12] V. Veeramsetty, G. Singal and T. Badal, "Coinnet: platform independent application to recognize Indian currency notes using deep learning techniques," Multimedia Tools and Applications, Vol. 79, pp. 22569–22594, 2020.
[13] V. Veeramsetty, G. Singal and T. Badal, "Indian Currency Dataset," Mendeley Data, V1, 2020.
[14] M. Klasson, C. Zhang and H. Kjellström, "A Hierarchical Grocery Store Image Dataset With Visual and Semantic Labels," 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, pp. 491-500, doi: 10.1109/WACV.2019.00058.