EXPERIMENTAL BASED LICENSE PLATE RECOGNITION USING KERAS APPLICATION

Chapter 1
Introduction
Digital image processing means processing a digital image by means of a digital computer.
It can also be described as the use of computer algorithms either to obtain an enhanced image
or to extract useful information from it.
Image processing mainly includes the following steps:
1. Importing the image via image acquisition tools.
2. Analyzing and manipulating the image.
3. Producing output, which can be an altered image or a report based on the analysis of
that image.
1.1 What is an image
An image is defined as a two-dimensional function, F(x,y), where x and y are spatial
coordinates, and the amplitude of F at any pair of coordinates (x,y) is called the intensity
of the image at that point. When x, y, and the amplitude values of F are all finite, discrete
quantities, we call it a digital image.
In other words, an image can be defined by a two-dimensional array arranged
in rows and columns.
A digital image is composed of a finite number of elements, each of which has a
particular value at a particular location. These elements are referred to as picture
elements, image elements, or pixels. Pixel is the term most widely used to denote the elements
of a digital image.
1.2 Types of an image
1. BINARY IMAGE– The binary image, as its name suggests, contains only two
pixel values, 0 and 1, where 0 refers to black and 1 refers to white. This image
is also known as Monochrome.
2. BLACK AND WHITE IMAGE– An image which consists of only black and
white color is called a BLACK AND WHITE IMAGE.
3. 8-bit COLOR FORMAT– It is the most widely known image format. It has 256
different shades and is commonly known as a Grayscale Image. In this
format, 0 stands for black, 255 stands for white, and 127 stands for gray.
4. 16-bit COLOR FORMAT– It is a color image format with 65,536 different
colors. It is also known as High Color Format. In this format the distribution
of color is not the same as in a Grayscale image.
A 16-bit format is divided into three channels, Red, Green and
Blue: the well-known RGB format.
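The image types above can be illustrated with a minimal Python sketch. It assumes NumPy is available; the threshold of 128 and the 5-6-5 bit split between red, green and blue are assumptions chosen for illustration (the text only says that a 16-bit pixel is divided into three channels), and the function name pack_rgb565 is hypothetical.

```python
import numpy as np

# A tiny 8-bit grayscale image: 0 is black, 255 is white.
gray = np.array([[0, 64, 127],
                 [128, 200, 255]], dtype=np.uint8)

# Binary image: every pixel becomes 0 (black) or 1 (white).
# The threshold of 128 is an arbitrary choice for this illustration.
binary = (gray >= 128).astype(np.uint8)


def pack_rgb565(r, g, b):
    """Pack 8-bit R, G, B values into one 16-bit value (assumed 5-6-5 layout)."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


white = pack_rgb565(255, 255, 255)   # all 16 bits set
pure_red = pack_rgb565(255, 0, 0)    # only the top 5 bits set
```

With 16 bits per pixel there are 2**16 = 65,536 possible values, which matches the color count quoted for the 16-bit format.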
1.3 IMAGE AS A MATRIX
Because images are arranged in rows and columns, they can be written in the following matrix
form:

Fig 1: Image as a Matrix

The right side of this equation is a digital image by definition. Every element of this matrix
is called an image element, picture element, or pixel.
DIGITAL IMAGE REPRESENTATION IN MATLAB:

Fig 2: DIGITAL IMAGE REPRESENTATION IN MATLAB


In MATLAB the start index is 1 instead of 0. Therefore, the MATLAB element f(1,1) corresponds
to f(0,0) in the mathematical notation.
Hence the two representations of the image are identical, except for the shift in origin.
In MATLAB, matrices are stored in variables such as X, x, input_image, and so on. A
variable name must begin with a letter, as in most other programming languages.
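The matrix view of an image, and the shift in origin between 0-based and 1-based indexing, can be sketched in Python with NumPy. This is only an illustration: the pixel values are made up, and matlab_pixel is a hypothetical helper that mimics MATLAB's 1-based addressing.

```python
import numpy as np

# A 3x4 grayscale image stored as a matrix: 3 rows, 4 columns.
f = np.array([[10, 20, 30, 40],
              [50, 60, 70, 80],
              [90, 100, 110, 120]], dtype=np.uint8)

rows, cols = f.shape

# NumPy, like the mathematical notation f(0,0), starts counting at 0.
top_left = f[0, 0]


def matlab_pixel(image, r, c):
    """Return the pixel that MATLAB would address as image(r, c) (1-based)."""
    return image[r - 1, c - 1]
```

Here matlab_pixel(f, 1, 1) and f[0, 0] refer to the same pixel, which is the sense in which the two representations differ only by the shift in origin.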

1.4 PHASES OF IMAGE PROCESSING:


ACQUISITION–
It could be as simple as being given an image that is already in digital form. The main
work involves:
a) Scaling
b) Color conversion (RGB to gray or vice versa)

IMAGE ENHANCEMENT–
It is among the simplest and most appealing areas of image processing. It is
also used to bring out hidden details in an image, and it is subjective.

IMAGE RESTORATION–
It also deals with improving the appearance of an image, but it is objective: restoration is
based on mathematical or probabilistic models of image degradation.

COLOR IMAGE PROCESSING–
It deals with pseudo-color and full-color image processing, and with the color models
applicable to digital image processing.

WAVELETS AND MULTI-RESOLUTION PROCESSING–
It is the foundation for representing images at various degrees of resolution.

IMAGE COMPRESSION–
It involves developing functions that perform compression. It mainly deals
with image size or resolution.

MORPHOLOGICAL PROCESSING–
It deals with tools for extracting image components that are useful in the
representation and description of shape.

SEGMENTATION PROCEDURE–
It partitions an image into its constituent parts or objects. Autonomous
segmentation is one of the most difficult tasks in image processing.

REPRESENTATION & DESCRIPTION–
It follows the output of the segmentation stage. Choosing a representation is only part
of the solution for transforming raw data into processed data.

OBJECT DETECTION AND RECOGNITION–
It is a process that assigns a label to an object based on its descriptors.

1.5 OVERLAPPING FIELDS WITH IMAGE PROCESSING

Fig 3: Overlapping Fields with Image


According to block 1, if the input is an image and the output is also an image, then it is
termed Digital Image Processing.

According to block 2, if the input is an image and the output is some kind of information or
description, then it is termed Computer Vision.

According to block 3, if the input is some description or code and the output is an
image, then it is termed Computer Graphics.

According to block 4, if the input is a description, some keywords or some code and the output is
a description or some keywords, then it is termed Artificial Intelligence.
1.6 IMAGE EDGE DETECTION OPERATORS:
Edges are significant local changes of intensity in a digital image. An edge can be
defined as a set of connected pixels that forms a boundary between two disjoint regions.
There are three types of edges:
• Horizontal edges
• Vertical edges
• Diagonal edges
Edge detection is a method of segmenting an image into regions of discontinuity. It is
widely used in digital image processing tasks such as
• pattern recognition
• image morphology
• feature extraction
Edge detection allows users to observe the features of an image where there is a significant change
in the gray level. Such a change indicates the end of one region in the image and the
beginning of another. Edge detection reduces the amount of data in an image while preserving its
structural properties.
Edge detection operators are of two types:
• Gradient-based operators, which compute first-order derivatives of a digital
image, such as the Sobel, Prewitt and Roberts operators
• Gaussian-based operators, which smooth the image with a Gaussian before differentiating
it, such as the Canny edge detector and the Laplacian of Gaussian (the latter computes
second-order derivatives)

Fig 4: Image Edge Detection Operators


Sobel Operator: It is a discrete differentiation operator. It computes an approximation of the
gradient of the image intensity function for edge detection. At each pixel of an
image, the Sobel operator produces either the corresponding gradient vector or the norm of that
vector. It uses two 3 x 3 kernels or masks, which are convolved with the input
image to calculate the horizontal and vertical derivative approximations respectively.
Prewitt Operator: This operator is very similar to the Sobel operator. It also detects
vertical and horizontal edges of an image. It is one of the best ways to detect the
orientation and magnitude of edges in an image. It also uses a pair of 3 x 3 kernels or masks.
Roberts Operator: This gradient-based operator computes the sum of squares of the
differences between diagonally adjacent pixels in an image through discrete
differentiation, which gives the gradient approximation. It uses a pair of 2 x 2
kernels or masks.
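The three gradient-based operators can be sketched in Python as follows. The kernels shown are the commonly published textbook forms and are supplied here as an assumption, since the kernel figures are not reproduced in this text. The helpers filter2d and gradient_magnitude are hypothetical names, written with plain NumPy loops for clarity rather than speed.

```python
import numpy as np

# Commonly published textbook kernels (an assumption; the source figures are missing).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T
PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
PREWITT_Y = PREWITT_X.T
ROBERTS_1 = np.array([[1, 0], [0, -1]], dtype=float)
ROBERTS_2 = np.array([[0, 1], [-1, 0]], dtype=float)


def filter2d(image, kernel):
    """Slide the kernel over the image (cross-correlation, valid region only)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def gradient_magnitude(image, kx, ky):
    """Combine the two directional responses into one edge strength map."""
    gx = filter2d(image, kx)
    gy = filter2d(image, ky)
    return np.sqrt(gx ** 2 + gy ** 2)


# Test image: dark left half, bright right half (a single vertical edge).
image = np.zeros((5, 6))
image[:, 3:] = 1.0

sobel_mag = gradient_magnitude(image, SOBEL_X, SOBEL_Y)
prewitt_mag = gradient_magnitude(image, PREWITT_X, PREWITT_Y)
roberts_mag = gradient_magnitude(image, ROBERTS_1, ROBERTS_2)
```

On this test image all three operators respond only in the windows that straddle the step between the dark and bright halves, and return zero in the flat regions.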
Marr-Hildreth Operator or Laplacian of Gaussian (LoG): It is a Gaussian-based
operator which uses the Laplacian to take the second derivative of an image. It
works well when the transition of the grey level is abrupt. It relies on the zero-crossing
method, i.e., where the second-order derivative crosses zero, that
location corresponds to a maximum of the first derivative and is taken as an edge location. Here
the Gaussian operator reduces the noise and the Laplacian operator detects the sharp edges.
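The zero-crossing idea can be checked on a one-dimensional example. The sketch below is a simplification under stated assumptions: it uses a 1-D sigmoid as a stand-in for an already smoothed edge and finite differences from NumPy in place of a true 2-D Laplacian of Gaussian filter.

```python
import numpy as np

# A smooth 1-D step edge (a sigmoid centred between samples 10 and 11).
x = np.arange(21)
signal = 1.0 / (1.0 + np.exp(-(x - 10.5)))

first = np.diff(signal)        # discrete first derivative
second = np.diff(signal, n=2)  # discrete second derivative

# A zero crossing is a sign change between neighbouring second-derivative samples.
crossings = np.where(np.sign(second[:-1]) != np.sign(second[1:]))[0]

# Position of the strongest gradient, for comparison with the zero crossing.
strongest_gradient = int(np.argmax(first))
```

The second derivative changes sign exactly once, between the two samples on either side of the strongest first-derivative sample, which is where the edge lies.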
Canny Operator: It is a Gaussian-based operator for detecting edges. This operator is less
susceptible to noise than the gradient operators above. It extracts image features without
affecting or altering them.
The Canny edge detector is a more advanced algorithm derived from the earlier work on the
Laplacian of Gaussian operator. It is widely used as an optimal edge detection technique. It
detects edges based on three criteria:

Chapter 2
Literature survey

2.1 YANG YANG LEE, ZAINI ABDUL HALIM AND MOHD NADHIR AB
WAHAB “License Plate Detection Using Convolutional Neural Network–Back to
the Basic with Design of Experiments” IEEE March 4, 2022. Digital Object
Identifier 10.1109/ACCESS.2022.3153340

Automatic License Plate Recognition (ALPR) is one of the applications that has
benefited hugely from Convolutional Neural Network (CNN) processing, which has
become the mainstream processing method for complex data. Many ALPR studies have
proposed new CNN model designs and post-processing methods with various levels of
performance. However, well-performing models from more general object detection and
recognition tasks, such as YOLOv3 and SSD, can be effectively transferred to the
license plate detection application with a small effort in model tuning. This paper focuses
on the design of experiments (DOE) for the training parameters when transferring the YOLOv3 model
design and optimising the training specifically for license plate detection tasks. The
parameters are categorised to reduce the number of DOE runs required while gaining insights into
the YOLOv3 parameter interactions, in addition to seeking optimised training settings. The
results show that the DOE effectively improves the fit of the YOLOv3 model to the vehicle
license plate detection task. A series of simple DOEs was shown to improve the
ALPR performance of YOLOv3 with the CNN model untouched, specifically on the LP
detection task. An AP of 99% is achieved for Malaysian vehicle LP detection by
strategically tuning the YOLOv3 training parameters. A minor change relative to the
stock parameters did improve the performance. Adjusting other settings in the DOEs also
provides insights into the interactions of the YOLOv3 parameters. Images with more pixel
area are generally better because more features are available for CNN feature extraction.
It is also found that a smaller mini-batch size fits local minima better,
improving the overall AP. The warm-up period is useful in generalising the initial feature
map across all image batches before increasing the global learning rate, but a longer
learning rate ramp-up time will decrease the overall AP.

2.2 Dr. Vishwanath Burkpalli, Abhishek Joshi, Abhishek B Warand, Akash Patil
“AUTOMATIC NUMBER PLATE RECOGNITION USING TENSORFLOW AND
EASYOCR” IRJETS, Volume:04/Issue:09/September-2022, Impact Factor-6.752

Three main stages are identified in such applications. First, it is necessary to
locate and extract the license plate region from a larger scene image. Second, having a
license plate region to work with, the alphanumeric characters in the plate need to be
extracted from the background. Third, they are delivered to a character recognition system (BOX
APPROACH) for recognition. In order to identify a vehicle by reading its license plate
successfully, it is necessary to locate the plate in the scene image provided by
some acquisition system (e.g., a video or still camera). Locating the region of interest helps
in dramatically reducing both the computational expense and the algorithm complexity. The
paper mainly works with standard license plates, but the techniques, algorithms and
parameters that are used can be adjusted easily for any similar number plates, even with
another alphanumeric set. The objective of this paper is to study and resolve algorithmic
and mathematical aspects of automatic number plate recognition systems, such as
problems of machine vision, pattern recognition, OCR and neural networks.

2.3 Aman Jain, Jatin Gupta, Somya Khandelwal and Surinder Kaur “Vehicle
License Plate Recognition” Fusion: Practice and Applications (FPA) Vol. 4, No. 1,
PP. 15-21, 2021

One of the most significant parts of integrating computer technologies into
intelligent transportation systems (ITS) is vehicle license plate recognition (VLPR). In
most cases, however, to recognize a license plate successfully, the location of the license
plate must be determined first. Vehicle license plate recognition systems are used by law
enforcement agencies, traffic management agencies, control agencies, and various
government and non-government agencies. VLPR is also used in various commercial
applications, including electronic toll collection, personal security, visitor management
systems, parking management, and other corporate applications. As a result, calculating
the correct position of a license plate in a vehicle image is an essential stage of a
VLPR system, which substantially impacts the recognition rate and speed of the entire
system. In the fields of intelligent transportation systems and image recognition, VLPR is
a popular topic. This research paper addresses the problem of license plate detection
using a You Only Look Once (YOLO)-PyTorch deep learning architecture, with
YOLO version 5 used to recognize a single class in an image dataset. The authors expect
this work to benefit future object detection research, especially
vehicle detection. It uses the latest version of YOLO at the time, i.e., version 5, in its smallest
variant, for number plate detection. YOLO v5, in a traditional sense, does not bring any
novel architectures, losses, or techniques to the table.

2.4 RAYUDU SUSHMA, MADHURI RITHIKA DEVI, NIKHIL
MAHESHWARAM, DR. SREEDHAR BHUKYA “Automatic License Plate
Recognition with YOLOv5 and Easy-OCR method” June 2022 | IJIRT 155622 |
Volume 9 Issue 1 | ISSN: 2349-6002 | page 1243

In this era of ever-advancing technology, there is a vast demand among people for
a safe and secure day-to-day lifestyle and travel. The number of automobiles on the road
has risen steadily during the last decade. With the tremendous growth of the
vehicular sector, tracking individual automobiles has become a very difficult
task. Using surveillance cameras on the roadside, this work proposes an
automatic monitoring system for fast-moving vehicles. Obtaining real-time
CCTV footage is an extremely time-consuming task. To solve this problem, an
efficient deep learning model called You Only Look Once (YOLO) is used for
object detection and Easy-OCR for character recognition. ALPR (Automatic License
Plate Recognition), one of the most extensively used computer vision applications, is the
subject of the proposed work. It includes technologies such as Easy-OCR and YOLO.

2.5 WANWEI WANG, JUN YANG, MIN CHEN, AND PENG WANG “A Light
CNN for End-to-End Car License Plates Detection and Recognition” IEEE
December 13, 2019, Digital Object Identifier 10.1109/ACCESS.2019.2956357

In this paper, we propose a lightweight method for end-to-end car license plate
detection and recognition. We implement a multi-task license plate detection algorithm with a
three-layer cascaded network, which completes the license plate detection, corner detection
and colour recognition tasks using a single network structure. Compared with the
traditional detection algorithm, multi-task parallel processing improves the detection
precision and speed. The license plate recognition then adopts CRNN and CTC models to
realize an end-to-end recognition algorithm with high precision, which supports
single-layer and double-layer license plates of different colours. Compared with
traditional algorithms and other neural network algorithms, our method has a higher
recognition precision on the CCPD dataset. In the future, we will open our source code
and data set. The experimental results show that our method achieves up to 98%
recognition precision and is superior to other methods in the precision and speed of
detection and recognition.

2.6 Mohammed Salemdeeb, Sarp Erturk “Multi-national and Multi-language
License Plate Detection using Convolutional Neural Networks” Engineering,
Technology & Applied Science Research, Vol.10, No.4,2020,5979-5985, page 5979

Detecting country and language is important to build a global ALPR system,
while correct layout classification is essential in order to read the detected characters in
the right order. This paper focused on LP detection and classification of multi-national
and multi-language LPs with different layouts from BR, USA, EU, TR, KSA and UAE,
proposing a method that can detect LPs regardless of their country of origin, language or
layout. Furthermore, a second classification stage was used to recognize LPs’ issuing
country, language and layout. In addition, a new multi-national, multi-language and
multi-layout LP dataset was introduced in order to enable benchmarking and to close the
gap in this field. The developed detection and classification approach was based on deep
learning. The results were promising, and the LP detection average precision was
99.57%, while the LP classification accuracy was 99.33%. The current study paves the
way to designing a global ALPR system. In the future, an end-to-end training process
could be developed to test the whole system as a unified ALPR model.

2.7 Divya Rastogi, Mohammad Shahbaz Khan, Kanav jindak and Karan Singh “A
Real-Time Vehicle Number Plate Detection and Recognition System” Volume XII,
Issue IV, 2020, Page No:5005, Journal of Xi’an University of Architecture &
Technology

A real-time plate detection system supported by image processing provides a
definitive solution to the present problem. We propose a real-time vehicle number plate
recognition (RVNPR) system that extracts the
characters from the number plates of vehicles passing a particular location using
image processing algorithms. It is not necessary to install additional devices such as GPS or
radio frequency identification (RFID) to implement the proposed system. Using high-definition
cameras, the system takes images of every passing vehicle and sends each image
to the computer for processing by the RVNPR software. The plate recognition software uses
several algorithms, namely YOLO (You Only Look Once), segmentation, and finally character
recognition. The RVNPR system is used for
detection of number plates on vehicles and recognition of the characters on the number
plates. The advantage of our proposed method is its high accuracy in plate detection and
recognition.

Chapter 3

Existing System

3.1 BLOCK DIAGRAM


Fig 5: Block diagram of Existing system


3.2 RELATED WORKS
3.2.1 TRANSITION OF ALPR TO DEEP LEARNING ALGORITHM
Early ALPR systems mostly relied on hand-crafted algorithms. Image processing
techniques such as edge detection and coefficient correlation [4] were common, as were
ML algorithms such as k-nearest neighbour, the sparse autoencoder method, support
vector machine (SVM) and artificial neural network (ANN).
The paradigm shifted when a subset of ML algorithms, i.e., CNN deep learning,
started performing well and CNN computation became more viable. Unlike a conventional ANN,
a CNN can process multi-dimensional data such as images. Initial findings concluded that
accuracy would improve with more training data, since CNN is a data-driven algorithm that
differs from traditional hand-crafted coding. A later study attempted to recognize LPs and their
characters with a single CNN by retraining AlexNet, which was one of the most popular
CNN models in 2017. The CNN was trained with custom cropped images of car LPs
and achieved 95.24% accuracy. Another similar study also used CNN-based
character classification to replace traditional OCR, showing that a CNN can classify
characters from blurry LP images.
Vehicle ALPR research has also made heavy use of deep learning algorithms combining CNN and
long short-term memory (LSTM). The CNN was used to extract features, and the LSTM was
trained to process those features and recognize the characters. Such systems can discriminate both
private and public car plates of different colors and recognize the characters. Similarly,
another work used simpler parallel CNNs to identify the nature of the car plate, such as type,
dimensions and color, and then used an LSTM to recognize the car plate characters, achieving
99.8% precision. Another reference combined edge detection and CNN as a hybrid
processing pipeline to enhance ALPR performance. It was realized that although CNN is
superior in performance, the cost of computing is prohibitive in a real-world scenario.
Image sizes are limited to the CNN's input size.
CNN classification performance relies on cropped image batches. However, it does not
translate into the ability to localize and identify car plates in a large image area, which is
the more practical use case in ALPR applications. Thus, one study tried to speed up the recognition
process with region-based CNN (R-CNN) but achieved a precision of only 0.4 out of 1 on a single
large image, primarily due to R-CNN's limitations. Newer research also showed
that a much better variation of R-CNN, called Mask R-CNN, is capable of a complete
ALPR task at a comparable 98% precision and recall.
The data processing pipeline has also been unified to fully utilise CNN to detect and
recognise the LP characters, bypassing any unnecessary architectures for different tasks,
but this showed that CNN favours detection tasks over character recognition. Another
study exploited big data (about 250k images), namely the Chinese City Parking Dataset
(CCPD), with a one-pass CNN much like SSD to recognize and localize the LP. It proved
to be effective and robust in various conditions (blurry, angled, tilted LPs), and it avoids
the repeated CNN computation of R-CNN, which is the reason for the high computing cost
of CNN inference.
The YOLO algorithm has attracted interest for ALPR in recent years. YOLO was
introduced in 2015 and achieved one-pass CNN object classification and localization.
Further revisions of YOLO improved the detection capability and speed. The first use of a
YOLO CNN for ALPR attempted to detect LPs of vastly different plate orientations,
yielding a 99.5% F1-score. A YOLOv2 algorithm with a modified ResNet50 CNN was
proposed by [19] to localize and detect the nature of multi-national LPs (country, size, and
language, without recognizing the characters on the LPs), achieving 99.57%
detection precision.
Another work chose YOLOv2 because its authors claimed that YOLOv3 has more layers that slow
down the training, which is not entirely true, as it depends on which CNN model is utilised in the YOLO
workflow. They only compared the metrics for LPs of motorcyclists with or without
helmets, so the comparison might not be significant. Nevertheless, they
achieved a 95 to 97.5% precision score with the YOLOv2 algorithm. Similarly, another study extended the
dataset to multi-national and multi-language LPs, reaching 99.57% AP in LP detection.
A further reference enhanced datasets by synthesizing LPs to overcome small dataset size
and trained a custom CNN model ported to Fast-YOLO to perform ALPR.
Another reference utilized YOLOv3 for both the LP detection and recognition stages with 95-97%
accuracy. Overall, the YOLO-based algorithm is very promising to be repurposed for LP
detection. Another work is very similar to YOLO, but it used a branching method
at the end of the CNN to detect LPs. Besides the YOLO algorithm, a CNN can also perform image
segmentation with up-sampling layers, which suggests using an entirely semantic
segmentation method to detect and recognize the LP. That work is specifically on Arabic
LPs, so it is hard to compare with results on the CCPD dataset. Another reference also uses the CNN
segmentation method to extract features and perform complete ALPR with parallel
CNNs.
3.2.2 THE NATURE OF DATA-DRIVEN ALPR
Deep learning is one of the data-driven programming approaches. Instead of hard-coding a
feature extraction algorithm, feeding in as much data as possible will ‘‘code’’ the necessary
feature map. Thus, the reliance on big data is one of the key factors for ML
implementation. CNN model architecture plays a vital role in ML, but it depends heavily
on the methods or preferences of the data processing pipeline. ALPR approaches can
be classified into one-staged and two-staged processes. The two-staged process is more
straightforward because the data classes, i.e., the LP itself and its characters, are separated, so
the coding can be optimized for each data processing pipeline. This approach stems
from traditional image processing, where images are considered complex data. A two-staged
process usually detects and crops the region of interest (ROI) around the LP to
eliminate any unwanted background details, and only then attempts to recognize the characters
on the cropped LP images with optical character recognition (OCR). However, the image
processing pipeline of ALPR has been shifting to a one-staged process with the
emergence of ML, extracting both the LP and its characters in one pass. One-pass processing
is possible with some innovations in CNN model designs and post-processing techniques.
In the one-staged process, the CNN model mostly acts only as a feature extractor that
preserves the spatial information of the objects of interest, i.e., the LP and its characters in
ALPR. The classification and bounding box localization are passed to other post-processing
techniques such as non-max suppression (NMS) and intersection over union
(IoU) to compute the confidence level and the ROI of the object class within an image.
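The two post-processing techniques named above can be sketched in plain Python. This is a minimal greedy version for illustration, not the YOLOv3 implementation: boxes are assumed to be given as corner coordinates (x1, y1, x2, y2), and the threshold of 0.5, the example boxes and the scores are all made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap a box that was already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections of one plate and one separate detection.
boxes = [(10, 10, 110, 60), (12, 12, 112, 62), (200, 40, 300, 90)]
scores = [0.90, 0.75, 0.80]
kept = non_max_suppression(boxes, scores)
```

The second box overlaps the first almost completely and has a lower score, so it is suppressed, while the third box does not overlap either of them and is kept.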

3.2.3 THE CHALLENGES OF MALAYSIAN LP


Several ALPR works on Malaysian LPs exist, but none follow the global trend of ML-based
ALPR. There are some unique challenges to implementing ALPR on Malaysian
LPs. The first is the availability of a dataset, because there are no known open-source LP
images for Malaysian vehicles. The LP images are confidential or owned by specific
authorities, and cannot be accessed easily and openly. The second is the inconsistency
of the LP format. Many on-road Malaysian LPs have different fonts,
spacing and placements, or even non-standard stickers or labels, violating the official
LP guideline. There also exist many valid LPs with unique characters such as ‘‘XXIV’’,
‘‘SUKOM’’, ‘‘1M4U’’ and ‘‘Putrajaya’’. Newer LPs also place letters after the
numerals as the number of new on-road vehicles increases. These varying LP standards render
most overseas ALPR techniques inapplicable, because foreign LPs have a fixed number of
characters and fixed spacing, which suits processing with the character segmentation method.

3.3 METHODOLOGY
Many ALPR algorithms have been proposed in previous research, but they hardly
discuss the relationships between, and the reasoning behind, the related training parameters. In this
work, multi-level (2-level and 3-level) factorial DOE is utilized to study the
YOLOv3 training parameters’ interactions and optimize the LP detection performance for
stage one of ALPR, i.e., LP detection only. A self-prepared Malaysian vehicle LP dataset
is used, since this research has to tackle ALPR problems on Malaysian vehicles
specifically. Stage two of ALPR, i.e., character recognition, is not part of the
research for the time being because LP labels are geo-specific and highly dependent on
dataset labelling and algorithms.

3.3.1 MALAYSIAN VEHICLE LP DATASET


One source is photographs taken from the federal highway with a DSLR and telephoto lens, each
a single 32MP image containing multiple vehicles. Images with clear local LP labels were
also downloaded from social websites and some local car auction websites. Overall, a
total of 10k images were manually collected, processed and labelled. Only the LP spatial
locations are labelled, as (x,y,w,h), where ‘x’ and ‘y’ are the horizontal and
vertical locations of the bounding box centre, and ‘w’ and ‘h’ are the width and height of the
LP bounding box, respectively, as illustrated in Figure 6.
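The centre-based labelling convention can be related to the more familiar corner form with a short Python sketch. The function names and the example label are hypothetical and serve only to illustrate the (x, y, w, h) convention described above.

```python
def center_to_corners(x, y, w, h):
    """Convert a centre-based box (x, y, w, h) to corner form (x1, y1, x2, y2)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def corners_to_center(x1, y1, x2, y2):
    """Convert a corner-form box back to the centre-based (x, y, w, h) label."""
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


# Made-up label: a plate centred at (320, 240), 120 pixels wide and 40 high.
label = (320, 240, 120, 40)
corners = center_to_corners(*label)
```

Converting to corners and back returns the original label, so either form can be stored without loss.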

Fig 6: The labelling convention

Fig 7: YOLOv3 structure with ‘‘SqueezeNet’’ backbone CNN


3.4 THE YOLOv3 PARAMETERS


YOLOv3 is a state-of-the-art real-time object recognition algorithm that can accept various
CNN model designs as the backbone feature extractor (with a few CNN backend
requirements). The principal technique of YOLOv3 lies in the backend feature pyramid
network (FPN) and anchor box layer, which act as a post-processing pipeline to retrieve spatial
information and class confidence from the extracted features. The backbone CNN model
is ‘‘SqueezeNet’’, which remains untouched in order to isolate the
parameters specific to the CNN model. The overall structure of the ‘‘SqueezeNet’’-based
YOLOv3 algorithm is illustrated in Figure 7. YOLOv3 has a few training
parameters with default values that can be adjusted in the MATLAB native code, as shown in
Table 1. Some parameter values are limited to the original example datasets and not
strictly tied to the YOLOv3 algorithm.

Table 1: List of parameters For Yolov3 training

3.4: MULTI-LEVEL FACTORIAL DOE


The purpose of the DOE is to study the correlation of the training parameters and their
effects on the YOLOv3 LP detection performance. The DOE was performed with 2-level
and 3-level factorials, whereby the 2-level factorial design is applied in the initial DOE,
and the 3-level factorial design is applied in the subsequent DOEs. The choice of the number of
levels is explained along with the experiments in Section 3.5. Multi-level
factorial DOE design follows a function, as in
$$\text{run} = \text{level}^{\text{factor}} \qquad (1)$$
One consideration of DOE on CNN is that a complete epoch of CNN training can take
minutes or even hours, depending on the CNN algorithm's complexity and the computing
hardware. The number of runs grows exponentially with the number of factors and polynomially
with the number of levels. There are nine possible parameters for the YOLOv3 training, giving
nine factors. Referring to (1), a 2-level full factorial DOE would require 2^9 = 512 runs, and
a 3-level full factorial would require 3^9 = 19683 runs, which is impractical. Thus, reducing either
the number of levels or factors is necessary.
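The run counts quoted above follow directly from the run-count relation and can be checked with a few lines of Python. The function name full_factorial_runs and its replicates argument are hypothetical conveniences for this illustration.

```python
def full_factorial_runs(levels, factors, replicates=1):
    """Number of runs in a full factorial design: replicates * levels ** factors."""
    return replicates * levels ** factors


# All nine YOLOv3 training parameters treated as factors.
runs_2_level = full_factorial_runs(levels=2, factors=9)
runs_3_level = full_factorial_runs(levels=3, factors=9)

# Cutting the design down to three factors makes it tractable.
runs_reduced_2_level = full_factorial_runs(levels=2, factors=3)
runs_reduced_3_level = full_factorial_runs(levels=3, factors=3)
```

With nine factors the counts are 512 and 19683, whereas three factors need only 8 runs at two levels or 27 runs at three levels, which is why the number of factors is reduced.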

3.5 EXPERIMENTS
3.5.1 DOE I
DOE I is a 2-level full factorial experiment with three factors and two replicates,
resulting in a total of 16 runs. A complete 2-level factorial experiment only requires eight
runs, but the extra loop ensures the normal distribution of the data point. Only then the
scores are valid for the ANOVA. Three factors were tested, i.e., image aspect ratio,
number of anchors and the train-test ratio. The default number of epochs was 80, but it
was reduced to 10 for all DOEs. The purpose of the DOE is to study the interaction of the
parameters and their relative convergence capability. It also helps minimize the CNN
train time to have a faster design cycle for DOE since it takes several minutes to complete
an epoch. Thus, a comprehensive CNN training epoch is not required.
However, a complete 80 epochs of training is carried out at the end of all DOEs to
validate the DOE findings. “imageAspectRatio” is limited to a near-square ratio, as
described in Section III part A. A larger “numberofAnchors” could improve the mean
intersection over union (IoU) of the localization, thus improving the AP for more object
classes. Hence, its upper limit is set to double the lower limit. “trainTestRatio”
depends on the size of the supplied dataset: a bigger dataset can allocate more data for
training. A 70% training ratio is common in most ML approaches.
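As a minimal sketch of how the 16 runs of DOE I could be laid out, the snippet below enumerates a replicated 2-level full factorial design. The levels are written in coded units (-1 for the low setting, +1 for the high setting) because the actual values belong to Table 2. The function name, the seed and the randomized run order are assumptions made for illustration.

```python
import random
from itertools import product

# Factor names as used in the text; levels are coded, not the real settings.
FACTORS = ("imageAspectRatio", "numberofAnchors", "trainTestRatio")


def build_design(factors, levels=(-1, +1), replicates=2, seed=0):
    """Return a replicated full factorial design in randomized run order."""
    base = [dict(zip(factors, combo))
            for combo in product(levels, repeat=len(factors))]
    runs = [dict(run, replicate=r + 1)
            for r in range(replicates) for run in base]
    # Randomizing the order is common DOE practice; it is assumed here.
    random.Random(seed).shuffle(runs)
    return runs


design = build_design(FACTORS)
print(len(design))  # 16
for run in design[:4]:
    print(run)
```

Each of the eight factor combinations appears once per replicate, so the two scores per combination give the ANOVA an estimate of run-to-run variation.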

Table 2: DOE I list of parameters.

3.5.2 DOE II
DOE II is a 3-level full factorial experiment with three factors and one replicate,
resulting in 27 runs. A 3-level factorial design was found to provide more insight into
the interactions of the factors, and normally distributed data points, with just one
replicate. The extra 11 runs compared to DOE I are a good trade-off for the additional
information in the 3-level factorial interaction plots, while still ensuring the normal
data distribution required for the ANOVA outputs to be valid. Since DOE I already
provided insight into the interactions of its parameters, it is not repeated as a
3-level factorial DOE.
The settings of each factor, their aliases, and the default values of the other
parameters for DOE II are listed in Table 3, whilst its ANOVA and interaction plot
outputs are shown in Figure 4. The best parameter settings from DOE I were used in
DOE II, i.e., “imageAspectRatio”, “numberofAnchors” and “trainTestRatio” are 1:0.98, 6
and 0.7, respectively.
The “miniBatchSize” is set as 16 and 32, since the computing memory is the only
limiting factor. The original value for “warmupPeriod” is 41.6% of the total epochs. A
lower value gives a faster ramp to the target learning rate and therefore better initial
convergence, but it risks overfitting the CNN. The “penaltyThreshold” is a ceiling for
applying the penalty function to the CNN model. A higher threshold will improve the
object detection confidence score but decrease the anchor box detection overlapping
tolerance, lowering the AP.
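The effect of a shorter warm-up can be illustrated with a small schedule function. This is only a sketch: it assumes a linear ramp from zero to the target learning rate, whereas the ramp shape used in the experiments is not stated here, and all numeric values are hypothetical.

```python
def warmup_learning_rate(iteration: int, warmup_period: int,
                         target_rate: float) -> float:
    """Learning rate at `iteration`, ramping linearly during the warm-up.

    The linear shape is an assumption made for illustration.
    """
    if warmup_period <= 0 or iteration >= warmup_period:
        return target_rate
    return target_rate * iteration / warmup_period


# Hypothetical values: the same target rate with a short and a long warm-up.
target = 0.001
for it in (0, 25, 50, 100):
    short = warmup_learning_rate(it, warmup_period=50, target_rate=target)
    long_ = warmup_learning_rate(it, warmup_period=100, target_rate=target)
    print(it, short, long_)
```

With the shorter warm-up the schedule reaches the target rate at iteration 50, while the longer one is still at half of it, which is the faster ramp described above.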

Table 3: DOE II list of parameters.


3.5.3 DOE III
DOE III is also a 3-level full factorial experiment with three factors and one
replicate, resulting in 27 runs. DOE I found that increasing the train-test ratio could
have a statistically significant effect on the performance outcome. However, the ratio
can still be pushed to 80% instead of being limited to 70%. It is also unknown whether
the train-test ratio has any correlation with the L2 regularization and the learning
rate. Thus, the train-test ratio was included in this DOE to extend the study of its
behaviour to a 3-level factorial design. The settings of each factor, their aliases, and
the default values of the other parameters for DOE III are listed in Table 4, whilst its
ANOVA and interaction plot outputs are presented in Section 3.6.3. The
“L2Regularization” is the coefficient of the weight penalty added to the weight-update
gradient, which modifies the weight update rate, whilst the “learnRate” is the global
multiplier for updating the CNN trainable parameters. CNN training is sensitive to the
“L2Regularization” and “learnRate” values, so they are only adjusted in small margins.
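To make the roles of the two parameters concrete, the sketch below applies one plain gradient-descent step with an L2 penalty. It is a generic textbook update, not the optimizer used in the experiments, and every numeric value is hypothetical.

```python
def sgd_step(weights, gradients, learn_rate, l2_regularization):
    """One gradient-descent step with an L2 penalty on the weights.

    The penalty adds `l2_regularization * w` to each gradient, and
    `learn_rate` scales the whole update.
    """
    return [w - learn_rate * (g + l2_regularization * w)
            for w, g in zip(weights, gradients)]


# Hypothetical values chosen only to show the effect of each parameter.
weights = [1.0, -2.0]
gradients = [0.5, 0.5]
print(sgd_step(weights, gradients, learn_rate=0.001, l2_regularization=0.0005))
print(sgd_step(weights, gradients, learn_rate=0.01, l2_regularization=0.0005))
```

A tenfold change in `learn_rate` changes the size of every update tenfold, which is one way to see why only small adjustments are explored.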

Table 4: DOE III list of parameters.


3.5.4 FINAL TEST
The purpose of the final test is to validate the findings of the DOEs. A full 80 epochs
of training were executed on the YOLOv3 algorithm to check whether the DOEs effectively
tune the performance outcome. There are five tests. Setting E uses the optimum settings
from the DOEs, while Setting D uses the original parameters from before the DOEs.
Settings A, B and C are non-optimum settings. Each test is repeated with 3-fold
cross-validation so that the results are less likely to be data-dependent, and the
average of the three runs is taken as the final score of each test. The results are
listed in Table 5, and the precision-recall curve of each setting for the first-fold run
is used to compare the fitness of YOLOv3 to the Malaysian LP dataset.
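The averaging over folds can be sketched as follows. This is a generic k-fold partition; how the folds were actually drawn in the experiments is not stated here. The `evaluate` callback and the `dummy_evaluate` placeholder are hypothetical stand-ins for training the detector and measuring its AP.

```python
from statistics import mean


def k_fold_splits(n_samples: int, k: int = 3):
    """Partition sample indices into k folds; each fold is the test set once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_validated_score(n_samples: int, evaluate, k: int = 3) -> float:
    """Average the score returned by `evaluate(train, test)` over k folds."""
    return mean(evaluate(train, test)
                for train, test in k_fold_splits(n_samples, k))


def dummy_evaluate(train, test):
    # Placeholder only: a real version would train on `train` and return
    # the AP measured on `test`. Here it returns the test-set fraction.
    return len(test) / (len(train) + len(test))


print(cross_validated_score(9, dummy_evaluate))
```

Because every sample is used for testing exactly once, the averaged score depends less on any single split of the data.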
The DOE-optimized Setting E delivered the highest AP score of 99.00%, while the default
Setting D reached 98.53% AP. The improvement is only 0.47 percentage points because the
train-test ratio and the warm-up period are the only differences between the two
settings. The other settings underperformed, with AP scores ranging from 95.47% to
98.12%. The precision-recall curve also shows that the optimized setting has the best
average curve fitness of all settings, with a minimum precision of 0.993. This shows
that the DOEs tuned the performance of YOLOv3 in detecting the LP location.
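For readers unfamiliar with how an AP score is obtained from a precision-recall curve, the sketch below computes the area under the curve using all-point interpolation, the convention introduced by the PASCAL VOC challenge in 2010. The AP definition used by the toolchain in these experiments is not stated, so this is an assumption, and the sample points are made up.

```python
def average_precision(recalls, precisions):
    """Area under a precision-recall curve with all-point interpolation.

    `recalls` must be sorted in ascending order, with `precisions` aligned.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


# Made-up curve: perfect precision up to 50% recall, then 0.5 at full recall.
print(average_precision([0.5, 1.0], [1.0, 0.5]))  # 0.75
```

A detector whose precision stays near 1.0 across the whole recall range, as reported for Setting E, yields an AP close to 1.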
However, the setting could be specific to the Malaysian LP dataset. Other open datasets
such as CCPD and UFPR are yet to be tested. SSD is an alternative algorithm for ALPR,
but in initial training with the default setting it performed worse than YOLOv3,
achieving only 87.75% AP while taking ten times longer per training epoch. However, SSD
used the ResNet50 CNN model, so this is not a fair comparison. SSD also has some
different classes of parameters and might require different DOE strategies.

Table 5: Final test


3.6 RESULTS ANALYSIS
3.6.1 DOE I

Fig 8: DOE I result of analysis.


3.6.2 DOE II

Fig 9: DOE II result of analysis.


3.6.3 DOE III

Fig 10: DOE III result of analysis.


smaller LP belonging to a neighbouring country, Thailand. E) LP detection on blurry
images. F) LP detection on large vehicles such as lorries.
