
Intelligent Billing system using Object Detection

Neeraj Chidella
Department of Electronics and Communication Engineering
Visvesvaraya National Institute of Technology
Nagpur, India
chidellaneeraj55@gmail.com

N Kalyan Reddy
Department of Electronics and Communication Engineering
Visvesvaraya National Institute of Technology
Nagpur, India
kalyanreddynuthula@gmail.com

N Sai Dheeraj Reddy
Department of Electronics and Communication Engineering
Visvesvaraya National Institute of Technology
Nagpur, India
dheeraj.nannuru@gmail.com

Maddi Mohan
Department of Electronics and Communication Engineering
Visvesvaraya National Institute of Technology
Nagpur, India
maddimohan461@gmail.com

Joydeep Sengupta
Department of Electronics and Communication Engineering
Visvesvaraya National Institute of Technology
Nagpur, India
joydeep.sengupta44@gmail.com

Abstract: With the rapid development of machine learning, deep learning and artificial intelligence, improving the billing system is an effective means of reducing wasted time. Although barcode scanners are now faster than ever, fruits and vegetables still have to be entered manually into the computer, which is a time-consuming and hectic process. Vegetable and fruit markets have become an integral part of our lives, so in such places the environment must be hassle free and, more importantly, the billing should be efficient and less laborious. To overcome the existing problems associated with barcodes and RFID tags, we propose an automatic billing system that detects the fruits and vegetables and then displays the final bill. The main objective of this project is to detect the fruits, display the fruits detected and then bill these items. To achieve this, we have used two different algorithms: 1) a fine-tuned Convolutional Neural Network that we built from a base model; 2) to increase accuracy for real-time object detection and to display bounding boxes, the state-of-the-art YOLO based on PyTorch, as YOLO predicts the bounding boxes, detects objects faster than other detection algorithms and is more reliable.

Index Terms: YOLO, bounding boxes, Convolutional Neural Network (CNN), Automatic Billing, Object detection.

I. INTRODUCTION

People today are time conscious and very reluctant to waste time. In these fast-moving times, with technology advancing day by day, time has become the most important resource for people all over the world. Hence there is a need to speed up the billing process, mainly in grocery shops and vegetable and fruit markets, where there is a lot of rush. Many of us have had bad experiences in grocery shops and fruit markets where we had to wait in long queues even for one or two products. Object detection using machine learning and deep learning is widely used nowadays. It is implemented wherever semantic segmentation is required. Object detection using deep learning allows many existing problems to be solved productively and easily. At present in India, slow billing in supermarkets like Walmart, Spencer’s and Reliance is very prevalent even though more checkout stations are being added, and billing for fruits and vegetables is very difficult with barcodes and RFID tags. We used a basic convolutional neural network (CNN) for single fruit detection and achieved an accuracy of 98 percent. A CNN is a classification-based algorithm in which convolutional layers are used to select the regions of interest from the image and classify them. Even though this method is slow, we were able to achieve high accuracy by fine-tuning the model. However, this model does not give bounding boxes and cannot be used for multiple fruits in the same image. Hence we used the YOLO algorithm, which is based on regression: it predicts the classes, gives bounding boxes and detects multiple objects. The main problems that we addressed through this project are:
1) Reducing the use of barcodes, which are expensive because every product has to be barcoded and which take time at billing.
2) The generalization of prices: the price of fruits and vegetables is not fixed everywhere and depends on bargaining skills, so linking every store to the same data set would give common prices.
3) Decreasing manpower, which in turn reduces the management costs for the supermarkets.
II. LITERATURE SURVEY

YOLO (You Only Look Once) is an emerging and very efficient model for object detection. Hence, a lot of research is being done on it and on the idea of an automated smart billing system for supermarkets, vegetable stores, etc.

Real Time Object Detection with Yolo by Geetha Priya S, N DuraiMurugan, SP chokkalingam[1] proposes a YOLO algorithm for the detection of objects using only one neural network. The paper also describes the advantages that YOLO has over other object detection algorithms and concludes that YOLO is a more efficient and faster algorithm to use in real time. The paper by Marcus Klasson[2] focuses on object recognition models built into assistive technologies and explains the working of CNNs on multiple datasets. Food Image Recognition for Price Calculation using Convolutional Neural Network by Md Jan Nordin, Norshakirah Aziz, Ooi Wei Xin[3] helped us by providing a good way of billing the food items obtained after object detection. Object Detection Based on YOLO Network by Chengji Liu, Yufan Tao, Jiawei Liang, Kai Li, Yihang Chen[4] proposes a YOLO-based model to solve the problem of image degradation due to noise, blurring, etc. in the case of traffic signs. Kavan Patel[7] proposed a self-checkout portal for supermarkets using the YOLO algorithm but did not complete the implementation of this model. Another paper, by Huimin Yuan and Ming Yan[9], proposed an intelligent billing system using Cascade R-CNN. From this literature survey, we found that many authors have tried to build a smart billing system using deep learning but were not able to get a good result. So, in this paper, we propose a smart billing system for supermarkets based on the highly advanced YOLO model.

III. IMPLEMENTATION OF MODELS

A. Building the Models
1) CNN: For any object detection model, the first step in building the model is acquiring a dataset that serves the purpose of the project. After a thorough literature survey,
• A dataset called Fruits-360 was selected from Kaggle. It contains a whopping 90483 images, split into a training set of 67692 images and a test set of 22688 images, sorted into 131 fruit and vegetable classes. The main reason for choosing this dataset is its huge number of images, which helps in achieving good accuracy and hence good detection. All the images are already resized to 100x100.
• The immediate next step after acquiring the dataset is preprocessing; as the dataset is very large, preprocessing is a must for getting good results. As all the images are already resized, the next step is data augmentation. Convolutional neural networks are not invariant to rotation, hence the images are flipped horizontally and vertically and rotated through an angle found by trial and error. All of this data is imported in batches of 32/64 for ease of use.
2) YOLO:
• We collected images from the Open Images dataset by Google in YOLOv4 format. A custom dataset of 19 classes was constructed by us from Open Images Dataset V5 in YOLOv4 format, but this dataset was poor as it downloaded many irrelevant images. It consists of 19 classes of fruits and vegetables, each class having around 200 images, making a total of 3650 images in the training dataset; the test dataset consists of 200 mixed fruit images in various conditions. We used the Darknet framework, as the first iteration of YOLOv4 was built on Darknet.
• Here is a flow chart that depicts the dataset, training and detection process of the YOLO algorithm.

Fig. 1. Flowchart of data acquisition

• The acquisition of the third dataset was the toughest part of developing the model in YOLO, as all the images in the dataset must be in the YOLO format: every image must be labelled and annotated in a text file containing the data related to the bounding boxes of the objects in the image. Hence we had to create a custom dataset combining two different datasets from Kaggle. This custom dataset consists of 12 classes: ’apple’, ’banana’, ’cheetos’, ’cucumber’, ’eggplant’, ’hershey’, ’kitkat’, ’maggie’, ’mushroom’, ’orange’, ’pringle’, ’reese’. The dataset is preprocessed by resizing all the images to 416 x 416 and consists of 1705 images in total, with 1654 in the training set and 61 images in the validation set. These images are pre-annotated and hence easily convertible into any required YOLO format using Roboflow. We used PyTorch, a machine learning library used for computer vision purposes.
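The YOLO text-file annotation described for the custom dataset stores one line per object: a class index followed by the box centre, width and height, each normalised by the image size. The following is a minimal sketch of producing such a line from a pixel-space box. The helper name `to_yolo_line`, the (x_min, y_min, x_max, y_max) input convention and the ordering of `CLASSES` are assumptions made for illustration; the paper does not state how its class indices were assigned, and in practice the conversion was done with Roboflow.

```python
# Class names as listed in the paper; the index order here is an assumption.
CLASSES = ["apple", "banana", "cheetos", "cucumber", "eggplant", "hershey",
           "kitkat", "maggie", "mushroom", "orange", "pringle", "reese"]


def to_yolo_line(label, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    # YOLO stores the box centre and size as fractions of the image size.
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{CLASSES.index(label)} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# Hypothetical banana box inside a 416 x 416 image.
line = to_yolo_line("banana", (104, 52, 312, 260), 416, 416)
print(line)  # 1 0.500000 0.375000 0.500000 0.500000
```

Each image gets a text file with the same base name holding one such line per object, which is why every image in the dataset had to be annotated before training.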
B. Working of CNN:
The convolutional neural network consists of three main layers, namely
• Convolution Layer: Detection of meaningful information takes place in this layer, where a kernel (a matrix consisting of a set of learnable parameters) slides over the image, producing useful information. This layer reduces the memory requirement and also improves efficiency. The kernel is applied to the image to produce a feature map containing the important features of the image.
• Pooling Layer: The pooling layer is used for reducing the size of the feature map after the kernel has been applied. In this project, we use max pooling, which outputs the maximum value in each neighbourhood. One important property of pooling is translation invariance: the object that we are trying to detect remains identifiable/recognizable regardless of where it is in the image.
• Fully Connected Layer: This layer acts as a bridge between the input and the output, as it performs the final classification of the fruit. Non-linearity is introduced by ReLU (rectified linear unit), which outputs only the non-negative values while keeping the important features.
We also flatten the final feature map and input it to the neural network, in which every layer has different random weights. Sometimes multiple neurons extract the same features from the same data; this happens when they face identical data patterns. Extracting the same features again and again wastes computational resources, reduces the quality of the output and leads to overfitting. Adding dropout layers randomly shuts down some of the neurons. The number of dropout layers was chosen purely by trial and error.

C. Working of YOLO:
• We take an image and apply the YOLO algorithm to it. We divide the image into an n x n grid. The value of n can be anything and depends on the image complexity. We then perform classification and localization on each grid cell. Based on the presence of an object inside that particular cell, the objectness score of each cell is found, and a bounding box is generated over the image. If there is no object in the cell, the objectness score is zero and the bounding box coordinates of the cell are also zero; if there is an object in the cell, the objectness score is 1 and the bounding box values are generated from the bounds of that object. Bounding box prediction and the concept of anchor boxes, which are used to detect more than one object in a single grid cell, are explained in the following points.
• Bounding box prediction: Each individual grid cell is classified and labelled, and then image classification and object localization techniques are applied to each cell of the image. The YOLO algorithm checks every cell separately and marks its bounding boxes. A cell without an object is labelled as zero.
• There are chances of two or more cells containing the same object; in this case the cell containing the center point of the object is taken. In this case of multiple bounding boxes, IOU (Intersection over Union) and Non-Max Suppression techniques are used for accurate detection of the objects. In IOU, the area of intersection and the area of union of two bounding boxes are calculated, and the IOU value is calculated using the formula IOU = (Area of Intersection / Area of Union). We suppress the bounding boxes whose IOU value is less than a certain threshold. The threshold can be anything and is decided based on the type of objects and the spacing between them. In Non-Max Suppression, the boxes with higher probability (objectness score) are taken and the boxes with high IOU are suppressed. These steps are repeated until one bounding box is found.
• Anchor Boxes: With the help of bounding boxes alone, we can detect at most one object in a particular grid cell, so to detect more than one object we need to use anchor boxes. An anchor box is a matrix which contains the details of the bounding box and the presence of the object. Each cell in the grid contains an anchor box, but when there is more than one object in a cell, the dimension of the anchor box changes with the number of objects present in the cell.

Fig. 2. Model training
Fig. 3. Detection

IV. RESULTS

A. CNN:
Initially, we trained a basic CNN model, which gave an accuracy of around 65%. After we fine-tuned the model with
dropout layers, changed the preprocessing parameters and also changed the dropout ratio, we obtained an accuracy of 98%. Here are some of the predictions using CNN.

Fig. 4. Prediction using CNN
Fig. 5. Multiple results

As you can see, the predictions on the test images came out perfectly as expected, but as it is a CNN algorithm, we were not able to get bounding boxes or good accuracy on real-life test images.

B. YOLOv4:
The model was trained using YOLOv4 on an RTX 3090 graphics card with 24 GB of memory for 16 hours and obtained an accuracy of 70%. The following images represent the results of YOLOv4.

Fig. 6. Detection of fruits
Fig. 7. Prediction using YOLOv4

The model was trained for 30000 epochs. The accuracy and precision obtained for this YOLOv4 model are fairly good. We further tried to improve the accuracy using the YOLOv5 model, whose results are explained in the next sub-section.

C. YOLOv5:
YOLOv5 differs from the previous versions in that its implementation is completely PyTorch based, unlike its counterparts, which are built on Darknet. Its major improvements in mosaic data augmentation and auto-learning bounding box anchors give it an upper hand. So, by using YOLOv5 we were able to further improve our accuracy to 78%.

Fig. 8. Detection using YOLOv5
Fig. 9. Prediction using YOLOv5
Fig. 10. mAP and Loss of YOLOv5

D. Billing
A Python script was written which detects the objects in the given input image using our saved weights, which were
obtained from our training model. The script also maps the prices of the fruits detected. The prices of the classes are stored in a file and can be updated easily. Finally, our script stores every object in a dictionary (a data structure in Python) along with its number of occurrences and calculates the total bill. Following are a few results of the calculated bill.

Fig. 11. Billing of detected objects

The above figure shows the number of tomatoes detected as 11, and as we set the price of each tomato to Rs 5 in our dataset, it totals Rs 55.

Fig. 12. Billing of Detected Objects

V. CONCLUSION

A working model is built for the object detection and billing of fruits, vegetables and some grocery items. Fruit classification is done through three different methods in the project, from which we concluded that YOLO is the fastest and more accurate on real/unseen images. Even though the fine-tuned CNN model gave very good accuracy, it was poor on real-life images, hence we chose YOLO over CNN. This project revolutionises the complete system of billing. It helps in decreasing manpower or in self-checkout portals, and also helps in the generalization of prices across markets. The main aim of the project, to reduce the long queues and waiting hours at billing stations, is successfully addressed and a decent model is presented to solve it.

VI. FUTURE WORK

In this project, we have used the YOLOv5 algorithm for only 12 different classes, which also included some grocery items. Even though the generated model gives good results for our fruits, vegetables and grocery dataset, the mAP of the model turned out to be 0.80 with YOLOv5, whereas we used YOLOv4 for 19 different fruit and vegetable classes, which gave an accuracy of around 70 percent. There is a need to improve the dataset, which in turn would improve the accuracy; a better custom dataset can be made if we photograph the images ourselves, annotate them and convert them into YOLO format. The object detection and billing system must be linked together in the form of an application, which would directly help fruit/vegetable markets.

REFERENCES
[1] Geethapriya. S, N. Duraimurugan, S.P. Chokkalingam. “Real-Time object detection with yolo”. International Journal of Engineering and Advanced Technology (IJEAT) ISSN: 2249 – 8958, Volume-8, Issue-3S, February 2019.
[2] Marcus Klasson, Cheng Zhang, Hedvig Kjellstrom. “A hierarchical grocery store image dataset with visual and semantic labels”.
[3] Md Jan Nordin, Norshakirah Aziz, Ooi Wei Xin. “Food image recognition for price calculation using convolutional neural network”.
[4] Chengji Liu, Yufan Tao, Jiawei Liang, Kai Li, Yihang Chen. “Object detection based on YOLO network”. 2018 IEEE 4th Information Technology and Mechatronics Engineering Conference (ITOEC 2018).
[5] Ms. Y. Vineela Sravya, M. Keerthi, M. Kasturi, R. Lochana, A. Anusha. “Food calorie estimation and auto bill generation for grocery products using YOLO object detection”. Journal of Xi’an University of Architecture and Technology Volume XII, Issue V, 2020 ISSN No : 1006-7930.
[6] Xiaofeng Ning, Wen Zhu, Shifeng Chen. “Recognition, Object Detection and Segmentation of white background photos based on Deep Learning”.
[7] Kavan Patel. “Fruits and vegetable detection for POS with Deep Learning”.
[8] Ragesh N, Giridhar B, Lingeshwaran D, Siddharth P and K P Peeyush. “Deep Learning based automated billing cart”. International Conference on Communication and Signal Processing, April 4-6, 2019, India.
[9] Huimin Yuan, Ming Yan. “Food object recognition and intelligent billing system based on Cascade R-CNN”. 2020 International Conference on Culture-oriented Science and Technology (ICCST).
[10] Suraj Chopade, Prof. Smita Palnitkar, Sujit Chavan, Anirudha Deshpande. “Automated Super Shop using image processing (Python)”. International Journal of Future Generation Communication and Networking Vol. 13, No. 2s, (2020), pp. 382–388.
[11] E. K. Jose and Veni. S. “YOLO classification with multiple object tracking for vacant parking lot detection”. Journal of Advanced Research in Dynamical and Control Systems, vol. 10, pp. 683-689, 2018.
[12] J. Redmon and A. Farhadi. “YOLO9000: Better, Faster, Stronger”. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 6517-6525.
[13] Joseph Redmon, Santosh Divvala, Ross Girshick. “You Only Look Once: Unified, Real-Time Object Detection”. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779-788.
[14] G. M. Farinella, D. Allegra, M. Moltisanti. “Retrieval and classification of food images”. Computers in Biology and Medicine, vol. 77, pp. 23-39, 2016.
[15] Zhuang-Zhuang Wang, Kai Xie, Xin-Yu Zhang, Hua-Quan Chen, Chang Wen, Jian-Biao He. “Small-Object detection based on YOLO and Dense Block via Image Super-Resolution”. IEEE Access, vol. 9, pp. 56416-56429.
